Submitted by akr (Thu Mar 25 07:03:31 UTC 2004)
Introduce a new %-literal, %d"a#{b}c", which returns ["a", b, "c"]. This eases sanitization and various other issues.
The value of the dynamic string "a#{b}c" is "a" + b.to_s + "c". So the literal parts "a" and "c" cannot be distinguished from the dynamic part #{b} in the value.
This indistinguishability makes sanitization difficult.
Dynamic strings are used frequently for structured text generation: HTML, SQL, command lines, etc. For example, "<html><title>#{title}</title>...".
However, "<html><title>#{title}</title>..." has an XSS (cross-site scripting) security problem if the variable title contains "<script>...</script>" submitted by a malicious user. To avoid XSS, it should be escaped as "<html><title>#{CGI.escapeHTML title}</title>...". CGI.escapeHTML must be called for each #{} in each dynamic string. This is easy to forget, and forgetting it causes XSS.
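To make the risk concrete, here is a minimal sketch of the two forms above. The value of title is a made-up example of hostile input, and CGI.escapeHTML is the standard-library method already named in the text.

```ruby
require 'cgi/escape'  # provides CGI.escapeHTML

# Hypothetical hostile input, for illustration only.
title = "<script>alert(1)</script>"

# Plain interpolation: the markup in title becomes part of the page.
unsafe = "<html><title>#{title}</title>..."

# Every #{} has to be escaped by hand; forgetting one reopens the hole.
safe = "<html><title>#{CGI.escapeHTML title}</title>..."

puts unsafe
puts safe
```

Once either string is built, nothing in its value records which characters came from title.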
%d eases this: HTML(%d"<html><title>#{title}</title>..."), where HTML is a method that escapes the dynamic parts as HTML. The HTML method needs to be called only once for each dynamic string, so %d reduces the number of method calls that must be written. The HTML method can also do context-dependent escaping: only angle brackets and ampersands for PCDATA, but also double or single quotes for attribute values. With the current "...#{...}...", such context dependency must be handled by the user. If the user misses a case, XSS results.
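Since %d is only proposed, the following sketch writes out by hand the array that the literal would return and passes it to one possible HTML method. The method body is an assumption for illustration; the proposal does not define it.

```ruby
require 'cgi/escape'  # provides CGI.escapeHTML

# Hypothetical HTML method: literal parts (even indexes) pass through
# untouched, dynamic parts (odd indexes) are stringified and escaped.
def HTML(parts)
  parts.each_with_index.map { |part, i|
    i.even? ? part : CGI.escapeHTML(part.to_s)
  }.join
end

title = "<script>alert(1)</script>"

# Stand-in for HTML(%d"<html><title>#{title}</title>...")
page = HTML(["<html><title>", title, "</title>..."])
puts page
```

This sketch escapes every dynamic part the same way. The context-dependent escaping described above would additionally need to inspect the preceding literal parts, which is not attempted here.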
%d eases fixing not only XSS but also SQL injection, command injection, and other security problems in structured text generation. For example, %d can be an elegant notation for SQL prepared statements. In general, %d makes it possible to validate that the dynamic parts don't break the structure formed by the literal parts, since %d distinguishes them.
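As an illustration of the prepared-statement idea, the sketch below turns a %d-style array into a query with ? placeholders plus a list of bind values. The helper name to_prepared, the table, and the column names are assumptions made up for this example, and no database library is involved.

```ruby
# Hypothetical helper: literal parts form the SQL text, each dynamic
# part becomes a "?" placeholder and is collected as a bind value.
def to_prepared(parts)
  sql   = parts.each_with_index.map { |part, i| i.even? ? part : "?" }.join
  binds = parts.each_with_index.select { |_, i| i.odd? }.map(&:first)
  [sql, binds]
end

name = "x'; DROP TABLE users; --"  # hostile input, for illustration only

# Stand-in for %d"SELECT * FROM users WHERE name = #{name} AND active = #{1}"
sql, binds = to_prepared(
  ["SELECT * FROM users WHERE name = ", name, " AND active = ", 1]
)

puts sql
p binds
```

Because the hostile value never enters the SQL text, it cannot change the structure of the statement.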
%d can solve other problems.
For example, gettext can use %d to look up an i18n'ed message from a dynamic string. A literal or printf-format string works well with gettext, as follows.
    $msgs_ja = {
      'Hello' => 'Kon-nichi-wa',
      'Hello %s' => 'Kon-nichi-wa %s-san'
    }
    def gettext(msg)
      $msgs_ja[msg] || msg
    end
    ...
    puts gettext('Hello')
    ...
    ...
    puts gettext('Hello %s') % [person]
    ...
But a dynamic string does not.
    $msgs_ja = {
      'Hello, ???' => 'Kon-nichi-wa, ???-san'
    }
    def gettext(msg)
      $msgs_ja[msg] || msg
    end
    ...
    puts gettext("Hello, #{person}")
    ...
Because the argument to gettext varies with person, gettext cannot use it as a key for $msgs_ja. But %d can be used with gettext as follows.
    $msgs_ja = {
      %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san"
    }
    def gettext(msg)
      ...
    end
    ...
    gettext(%d"Hello, #{person}")
    ...
Since %d distinguishes the literal parts from the dynamic parts, gettext can use only the literal parts of the argument.
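The body of gettext is elided in the proposal. One way it could work is sketched below, with the %d literals written out as the arrays they would produce. The helper literal_parts and the position-based matching of placeholders are assumptions of this sketch, not part of the proposal.

```ruby
# Catalog written as arrays, standing in for
# %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san"
$msgs_ja = {
  ["Hello, ", :person] => ["Kon-nichi-wa, ", :person, "-san"]
}

# Hypothetical helper: the literal parts sit at even indexes.
def literal_parts(parts)
  parts.each_slice(2).map(&:first)
end

def gettext(parts)
  # Look up by literal parts only, so the lookup ignores the dynamic values.
  key, translated = $msgs_ja.find { |k, _| literal_parts(k) == literal_parts(parts) }
  return parts.join unless translated

  # Bind each placeholder symbol in the key to the value at the same index.
  values = {}
  key.each_with_index { |name, i| values[name] = parts[i] if i.odd? }

  translated.each_with_index.map { |t, i| i.odd? ? values[t] : t }.join
end

# Stand-in for gettext(%d"Hello, #{person}")
person = "Matz"
puts gettext(["Hello, ", person])
```

A message with no catalog entry falls back to plain concatenation, matching the `|| msg` fallback in the earlier examples.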
Another example is the 2nd argument of String#gsub. Many Ruby beginners are confused by the backslash escapes in that argument.
If gsub interpreted %d"...#{1}..." as '...\1...', the confusion could be avoided, because %d needs no extra escaping mechanism, unlike the \1 notation.
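The gsub idea can be approximated today with a block. In this sketch, gsub_parts is a hypothetical helper (String#gsub itself accepts no such array), and the convention that an integer in a dynamic position names a capture group is taken from the %d"...#{1}..." example above.

```ruby
# Hypothetical helper: literal parts are copied as-is, and an integer in
# a dynamic position is replaced by the capture group with that number.
def gsub_parts(str, pattern, parts)
  str.gsub(pattern) do
    match = Regexp.last_match
    parts.each_with_index.map { |part, i| i.odd? ? match[part] : part }.join
  end
end

# Stand-in for "John Smith".gsub(/(\w+) (\w+)/, %d"#{2}, #{1}")
result = gsub_parts("John Smith", /(\w+) (\w+)/, ["", 2, ", ", 1])
puts result
```

No backslash appears in the replacement, so there is nothing for the string literal and gsub to interpret differently.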
Introduce a new %-literal: %d"...". The syntax of the content of %d"..." is exactly the same as that of "...", but the resulting value is different.
The value of %d"lit1#{dyn1}lit2#{dyn2}..." is an array ["lit1", dyn1, "lit2", dyn2, ...]. The array contains the literal parts and the dynamic parts in a distinguishable format.
The array contains strings, the literal parts, at even indexes: 0, 2, 4, ... Some of them may be empty if a dynamic part is placed at the beginning or if two dynamic parts are consecutive.
The array contains values, the dynamic parts, at odd indexes: 1, 3, 5, ... They are the evaluated values of the expressions in #{}. They can be any objects and are not restricted to strings.
Example:
    %d"a#{b}c"          -> ["a", b, "c"]
    %d"a#{b}c#{d}e#{f}" -> ["a", b, "c", d, "e", f]
    %d"#{v1}#{v2}"      -> ["", v1, "", v2]
    %d"#{v}"            -> ["", v]
    %d"abc"             -> ["abc"]
    %d""                -> []
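The even/odd layout can be explored with an ordinary array. The sketch below writes the proposed result by hand; the values of b and d are arbitrary and chosen only to show that dynamic parts keep their own classes.

```ruby
b = 1
d = :sym

# What %d"a#{b}c#{d}" would return under the proposal, written by hand.
parts = ["a", b, "c", d]

literals = parts.select.with_index { |_, i| i.even? }  # indexes 0, 2, ...
dynamics = parts.select.with_index { |_, i| i.odd? }   # indexes 1, 3, ...

p literals
p dynamics

# Joining the array gives the same text as the ordinary dynamic string.
puts parts.join == "a#{b}c#{d}"
```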
Note that `d' is taken from `d'ynamic. Other candidates are `e'mbedded and `h'ole.
There are some design alternatives.
In the current design, %d"..." returns an array. But a special string could be returned instead, like Groovy's GString.
I chose an array because, if the value of %d"..." responds to to_str, its behavior will be concatenation. But concatenation is the behavior that causes XSS. Such dangerous behavior shouldn't be the default. However, the special string has a benefit: the "..." syntax can be used instead of %d"...".
Another choice is %d'...', which interprets #{} but not the various backslash escapes. Because its syntax is identical to "...", %d"..." is more intuitive than %d'...'. So %d"..." should be introduced first. %d'...' should be considered after we have gained experience with %d"...".
An implementation is not available. But it should be easy to implement, because %d"...#{...}..." can be implemented in the same way as "...#{...}..." but without the string concatenation.
Comments
RCRchive copyright © David Alan Black, 2003-2005.
I've read this several times, and I certainly acknowledge that I know very little about i18n etc. But I'm rather certain that I don't support the %d{...} idea. Things like interpolating #{person} into a string for a method call can be done in other ways, and I don't think that people who have trouble understanding "\\1" in a replacement string are going to find the proposed %d{...} behavior easier to understand.
I'm also not convinced that this is really a matter of concatenating or not concatenating. I don't really think of "a#{b}c" as concatenated strings, but rather as one string with an interpolated component. The idea of breaking it into an array doesn't seem natural. -- David Black
From what I gather, the purpose behind this RCR is to write a user-defined string interpolator. The %d syntax allows that, but the syntax of the user-defined interpolation imo doesn't make it clear what is happening:
I think a better solution would be to use one of the template systems we already have as libraries and which don't change the syntax of Ruby:
Another interesting solution that does change Ruby might be to lazy-evaluate the string concatenation after the interpolation. So if I have:
Then foo() will be called, "this is a test" will be printed, and the two parts of the interpolation will be stored for later as ["number: ", foo] (much like the proposed %d does). Then, when s is used (by puts), the interpolation actually occurs.
In addition, a new method could be added to the String class to return the parts of the interpolated string so the user can do his own interpolation using the same syntax as standard interpolation.
I'm not sure if I really like this idea, though; it's just a thought that happened to come to my head as I was writing this comment.
-- Paul Brannan
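The lazy-evaluation idea in the last comment can be approximated in plain Ruby. This is only a sketch: LazyString is a hypothetical class, not part of Ruby, and the body of foo (including its return value 42) is an assumption based on the description in the comment.

```ruby
# Hypothetical class: keeps the parts and defers concatenation.
class LazyString
  attr_reader :parts  # lets the caller do its own interpolation

  def initialize(*parts)
    @parts = parts
  end

  # The parts are joined only when a string is actually needed.
  def to_s
    @parts.join
  end
end

def foo
  puts "this is a test"
  42
end

s = LazyString.new("number: ", foo)  # foo is called here
p s.parts                            # the stored parts
puts s                               # concatenation happens only now
```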