Introduce a new %-literal, %d"a#{b}c", which returns ["a", b, "c"]. This eases sanitization and addresses various other issues.
The value of the dynamic string "a#{b}c" is "a" + b.to_s + "c". So the literal parts "a" and "c" cannot be distinguished from the dynamic part #{b} in the value.
This indistinguishability makes sanitization difficult.
Dynamic strings are used frequently for structured text generation: HTML, SQL, command lines, etc. For example, "<html><title>#{title}</title>...".
However, "<html><title>#{title}</title>..." has an XSS (cross-site scripting) security problem if the variable title contains "<script>...</script>" submitted by a malicious user. To avoid XSS, it should be escaped as "<html><title>#{CGI.escapeHTML title}</title>...". CGI.escapeHTML must be called for each #{} in each dynamic string. This is easy to forget, and forgetting it causes XSS.
%d eases this: HTML(%d"<html><title>#{title}</title>...") where HTML is a method that escapes the dynamic parts as HTML. The HTML method needs to be called only once for each dynamic string, so %d reduces the number of method calls that must be written. The HTML method can also do context-dependent escaping: only angle brackets and ampersands for PCDATA, but also double quotes or single quotes for attribute values. With the current "...#{...}...", such context dependency must be handled by the user. If the user misses a case, XSS results.
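The following is a minimal sketch of such an HTML method, not a definitive implementation. Since %d does not exist yet, the array it would return is written by hand. The escaping is uniform (CGI.escapeHTML on every dynamic part); context-dependent escaping is left out for brevity.

```ruby
require "cgi"

# Hypothetical helper: takes the array that %d"..." would produce
# (literal strings at even indexes, dynamic values at odd indexes)
# and HTML-escapes only the dynamic values.
def HTML(parts)
  parts.each_with_index.map { |part, i|
    i.even? ? part : CGI.escapeHTML(part.to_s)
  }.join
end

title = "<script>alert(1)</script>"
# Hand-written stand-in for HTML(%d"<html><title>#{title}</title>")
page = HTML(["<html><title>", title, "</title>"])
puts page
```

The literal markup passes through untouched, while the submitted title is escaped without any per-#{} call at the call site.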
%d helps fix not only XSS but also SQL injection, command injection and other security problems in structured text generation. For example, %d can be an elegant notation for SQL prepared statements. In general, %d makes it possible to validate that the dynamic parts do not break the structure formed by the literal parts, since %d distinguishes them.
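As a rough illustration of the prepared-statement idea, the sketch below turns the array %d would return into a statement with "?" placeholders plus a list of values to bind. The name sql_prepare and the "?" placeholder style are assumptions for this example; the array is written by hand because %d is not implemented.

```ruby
# Hypothetical helper: literal parts are copied into the statement,
# each dynamic part becomes a "?" placeholder and is collected
# separately so a database driver can bind it.
def sql_prepare(parts)
  sql = +""
  binds = []
  parts.each_with_index do |part, i|
    if i.even?
      sql << part
    else
      sql << "?"
      binds << part
    end
  end
  [sql, binds]
end

name = "x'; DROP TABLE users; --"
# Hand-written stand-in for %d"SELECT * FROM users WHERE name = #{name}"
sql, binds = sql_prepare(["SELECT * FROM users WHERE name = ", name])
puts sql
p binds
```

The malicious value never becomes part of the statement text, so it cannot change the structure formed by the literal part.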
%d can solve other problems as well.
For example, gettext can use %d to look up an internationalized message for a dynamic string. A literal or printf-format string works well with gettext, as follows.
$msgs_ja = {
'Hello' => 'Kon-nichi-wa',
'Hello %s' => 'Kon-nichi-wa %s-san'
}
def gettext(msg) $msgs_ja[msg] || msg end
... puts gettext('Hello') ...
... puts gettext('Hello %s') % [person] ...
But a dynamic string does not.
$msgs_ja = { 'Hello, ???' => 'Kon-nichi-wa, ???-san' }
def gettext(msg) $msgs_ja[msg] || msg end
... puts gettext("Hello, #{person}") ...
Because the argument to gettext varies with person, gettext cannot use it as a key for $msgs_ja. But %d can be used with gettext as follows.
$msgs_ja = { %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san" }
def gettext(msg) ... end
... gettext(%d"Hello, #{person}") ...
Since %d distinguishes the literal parts from the dynamic parts, gettext can use only the literal parts of the argument as the lookup key.
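One possible way to fill in the elided gettext body is sketched below. It is only an illustration: the helpers literals and dynamics, the fallback behavior, and the choice to return a String are all assumptions, and the %d arrays are written by hand.

```ruby
# Literal parts sit at even indexes, dynamic parts at odd indexes.
def literals(parts)
  parts.select.with_index { |_, i| i.even? }
end

def dynamics(parts)
  parts.select.with_index { |_, i| i.odd? }
end

# Hand-written stand-in for
# { %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san" }
$msgs_ja = {
  ["Hello, ", :person] => ["Kon-nichi-wa, ", :person, "-san"]
}

def gettext(parts)
  key = literals(parts)
  source, translated = $msgs_ja.find { |src, _| literals(src) == key }
  # No translation found: fall back to plain concatenation.
  return parts.map(&:to_s).join unless source
  # Pair each placeholder symbol with the value at the same position.
  values = dynamics(source).zip(dynamics(parts)).to_h
  translated.each_with_index.map { |part, i|
    i.even? ? part : values.fetch(part).to_s
  }.join
end

person = "Matz"
# Hand-written stand-in for gettext(%d"Hello, #{person}")
puts gettext(["Hello, ", person])
```

Because the placeholders are named by symbols, the translated message is free to place them in a different order from the source message.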
Another example is the 2nd argument of String#gsub. Many Ruby beginners are confused by the backslash escapes in that argument.
If gsub interpreted %d"...#{1}..." as '...\1...', the confusion could be avoided, because %d needs no extra escaping mechanism, unlike the \1 notation.
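The gsub idea can be approximated today with the block form of gsub. The helper gsub_d below is hypothetical, and it assumes that integer dynamic parts stand for capture group numbers; the %d array is again written by hand.

```ruby
# Hypothetical helper: parts is the array %d"..." would produce.
# Literal parts are inserted as-is; each dynamic part is treated as
# a capture group number and replaced by that group's text.
def gsub_d(str, pattern, parts)
  str.gsub(pattern) do
    m = Regexp.last_match
    parts.each_with_index.map { |part, i|
      i.even? ? part : m[part].to_s
    }.join
  end
end

# Hand-written stand-in for "a-b".gsub(/(\w)-(\w)/, %d"#{2}\\#{1}")
result = gsub_d("a-b", /(\w)-(\w)/, ["", 2, "\\", 1])
puts result
```

The literal part here is a single backslash, and it reaches the output unchanged without any second level of escaping for gsub itself.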
Introduce a new %-literal: %d"...". The syntax of the content of %d"..." is exactly the same as that of "...". But the resulting value is different.
The value of %d"lit1#{dyn1}lit2#{dyn2}..." is an array ["lit1", dyn1, "lit2", dyn2, ...]. The array contains the literal parts and the dynamic parts in a distinguishable format.
The array contains strings as literal parts at even indexes: 0, 2, 4, ... Some of them may be empty if a dynamic part is placed at the beginning or if two dynamic parts are consecutive.
The array contains values as dynamic parts at odd indexes: 1, 3, 5, ... They are the evaluated values of the expressions in #{}. They can be any objects and are not restricted to strings.
Example:
%d"a#{b}c" -> ["a", b, "c"]
%d"a#{b}c#{d}e#{f}" -> ["a", b, "c", d, "e", f]
%d"#{v1}#{v2}" -> ["", v1, "", v2]
%d"#{v}" -> ["", v]
%d"abc" -> ["abc"]
%d"" -> []
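The layout above can be explored with ordinary arrays. The sketch below writes one such array by hand, separates it by index parity, and checks that converting every element with to_s and concatenating gives the value of the corresponding ordinary dynamic string. The variable names are illustrative only.

```ruby
b, d, f = 1, [2, 3], nil
# Hand-written stand-in for %d"a#{b}c#{d}e#{f}"
parts = ["a", b, "c", d, "e", f]

# Even indexes hold literal parts, odd indexes hold dynamic parts.
literal_parts = parts.select.with_index { |_, i| i.even? }
dynamic_parts = parts.select.with_index { |_, i| i.odd? }

# to_s on each element followed by concatenation reproduces the
# ordinary dynamic string. A bare parts.join would flatten the
# nested array, so map(&:to_s) is applied first.
joined = parts.map(&:to_s).join

p literal_parts
p dynamic_parts
puts joined
```

Note that the dynamic parts keep their original classes (Integer, Array, NilClass here) until a consumer decides how to render them.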
Note that `d' is taken from `d'ynamic. Other candidates are `e'mbedded and `h'ole.
There are some design alternatives.
In the current design, %d"..." returns an array. But a special string could be returned instead, like Groovy's GString.
I chose an array because:
If the special string responds to to_str, its behavior will be concatenation. But concatenation is exactly the behavior that causes XSS. Such dangerous behavior shouldn't be the default. However, the special string has a benefit: the "..." syntax can be used instead of %d"...".
Another choice is %d'...', which interprets #{} but not the various backslash escapes. Because its syntax is identical to "...", %d"..." is more intuitive than %d'...'. So %d"..." should be introduced first. %d'...' should be considered after we have gained experience with %d"...".
An implementation is not available. But it should be easy to implement, because %d"...#{...}..." can be implemented as the implementation of "...#{...}..." without the string concatenation.