RCR 232: string interpolation without concatenation

Submitted by akr (Thu Mar 25 07:03:31 UTC 2004)

Abstract

Introduce a new %-literal, %d"a#{b}c", which returns ["a", b, "c"]. This eases sanitization and helps with various other issues.

Problem

The value of the dynamic string "a#{b}c" is "a" + b.to_s + "c". So, given only the value, we cannot distinguish the literal parts "a" and "c" from the dynamic part #{b}.

This inability to tell the parts apart makes sanitization difficult.

Dynamic strings are used frequently for structured text generation: HTML, SQL, command lines, etc. For example, "<html><title>#{title}</title>...".

However, "<html><title>#{title}</title>..." has an XSS (cross-site scripting) security problem if the variable title contains "<script>...</script>" submitted by a malicious user. To avoid XSS, it should be escaped as "<html><title>#{CGI.escapeHTML title}</title>...". CGI.escapeHTML must be called for each #{} in each dynamic string. This is easy to forget, and forgetting it causes XSS.

%d eases this: HTML(%d"<html><title>#{title}</title>...") where HTML is a method that escapes the dynamic parts as HTML. The HTML method needs to be called only once per dynamic string, so %d reduces the number of method calls that must be written. The HTML method can also do context-dependent escaping: only angle brackets and ampersands for PCDATA, but also double or single quotes for attribute values. With the current "...#{...}...", such context dependency must be handled by the user. If the user misses a case, XSS results.
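
The following is a minimal sketch of such an HTML method, not a definitive implementation. Since %d is only a proposal, the array it would return is written out by hand. The method body is an assumption for illustration, and it applies the same escaping everywhere rather than the context-dependent escaping described above.

```ruby
require 'cgi/escape'

# Hypothetical helper: receives the array that %d"..." would produce
# (literal strings at even indexes, dynamic values at odd indexes)
# and escapes only the dynamic parts.
def HTML(parts)
  parts.each_with_index.map { |part, i|
    i.even? ? part : CGI.escapeHTML(part.to_s)
  }.join
end

title = "<script>alert(1)</script>"

# Stand-in for %d"<html><title>#{title}</title>":
parts = ["<html><title>", title, "</title>"]

result = HTML(parts)
puts result
# prints: <html><title>&lt;script&gt;alert(1)&lt;/script&gt;</title>
```

The literal markup passes through untouched, while the value of title is escaped exactly once, in one place.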

%d eases fixing not only XSS but also SQL injection, command injection and other security problems in structured text generation. For example, %d can be an elegant notation for SQL prepared statements. In general, %d makes it possible to validate that the dynamic parts don't break the structure formed by the literal parts, since %d distinguishes them.
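
As a rough illustration of the SQL case, the sketch below turns a %d-style array into a statement with "?" placeholders plus a separate list of bind values. The method name sql, the table and the column names are assumptions made up for this example, and no real database library is involved.

```ruby
# Hypothetical helper: literal parts (even indexes) form the statement,
# dynamic parts (odd indexes) become bind values behind "?" placeholders.
def sql(parts)
  statement = parts.each_with_index.map { |part, i| i.even? ? part : "?" }.join
  binds = parts.select.with_index { |_, i| i.odd? }
  [statement, binds]
end

name = "x'; DROP TABLE users; --"

# Stand-in for %d"SELECT * FROM users WHERE name = #{name} AND age > #{18}":
parts = ["SELECT * FROM users WHERE name = ", name, " AND age > ", 18]

statement, binds = sql(parts)
puts statement
# prints: SELECT * FROM users WHERE name = ? AND age > ?
```

The hostile value never becomes part of the statement text; it is only carried along in binds for the database driver to handle.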

%d can solve other problems.

For example, gettext can use %d to look up an i18n-ed message from a dynamic string. A literal or printf-format string works well with gettext, as follows.

$msgs_ja = {
  'Hello' => 'Kon-nichi-wa',
  'Hello %s' => 'Kon-nichi-wa %s-san'
}
def gettext(msg) $msgs_ja[msg] || msg end
... puts gettext('Hello') ...
... puts gettext('Hello %s') % [person] ...

But a dynamic string does not.

$msgs_ja = { 'Hello, ???' => 'Kon-nichi-wa, ???-san' }
def gettext(msg) $msgs_ja[msg] || msg end
... puts gettext("Hello, #{person}") ...

Because the argument to gettext varies with person, gettext cannot use it as a key for $msgs_ja. But %d can be used with gettext as follows.

$msgs_ja = { %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san" }
def gettext(msg) ... end
... gettext(%d"Hello, #{person}") ...

Since %d distinguishes the literal parts from the dynamic parts, gettext can use only the literal parts of the argument for the lookup.
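
One possible way to fill in the elided gettext body is sketched below. It is only an assumption about how such a lookup might work: the arrays stand in for %d literals, the constant MSGS_JA and the helper literal_parts are names invented here, and symbols such as :person mark the holes in the catalog entries.

```ruby
# Stand-in for { %d"Hello, #{:person}" => %d"Kon-nichi-wa, #{:person}-san" }
MSGS_JA = {
  ["Hello, ", :person] => ["Kon-nichi-wa, ", :person, "-san"]
}.freeze

# Hypothetical helper: keep only the literal parts (even indexes).
def literal_parts(parts)
  parts.select.with_index { |_, i| i.even? }
end

def gettext(parts)
  # Look the message up by its literal parts only.
  key, translation = MSGS_JA.find { |k, _|
    k.length == parts.length && literal_parts(k) == literal_parts(parts)
  }
  # No translation: behave like ordinary interpolation.
  return parts.map(&:to_s).join unless key

  # Map each hole name in the key (e.g. :person) to the actual value.
  values = {}
  key.each_with_index { |name, i| values[name] = parts[i] if i.odd? }

  # Fill the holes of the translated message.
  translation.each_with_index.map { |part, i|
    i.even? ? part : values.fetch(part).to_s
  }.join
end

person = "matz"
puts gettext(["Hello, ", person])   # stand-in for gettext(%d"Hello, #{person}")
# prints: Kon-nichi-wa, matz-san
```

Because the holes are named in the catalog, a translation may place them in a different position than the original message does.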

Another example is the 2nd argument of String#gsub. Many Ruby beginners are confused by the backslash escapes in that argument.

If gsub interpreted %d"...#{1}..." as '...\1...', the confusion could be avoided, because %d needs no extra escaping mechanism, unlike the \1 notation.
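
The idea can be approximated today with the block form of gsub. The sketch below assumes a hypothetical helper gsub_parts that accepts a %d-style array in which integer dynamic parts refer to capture groups; it is not a proposed change to String#gsub itself.

```ruby
# Hypothetical helper: literal parts are inserted as-is, and a dynamic
# part n is replaced by the n-th capture group of the current match.
def gsub_parts(string, pattern, parts)
  string.gsub(pattern) do
    match = Regexp.last_match
    parts.each_with_index.map { |part, i|
      i.even? ? part : match[part].to_s
    }.join
  end
end

# Stand-in for "2004-03-25".gsub(/(\d+)-(\d+)-(\d+)/, %d"#{3}/#{2}/#{1}"):
date = gsub_parts("2004-03-25", /(\d+)-(\d+)-(\d+)/, ["", 3, "/", 2, "/", 1])
puts date
# prints: 25/03/2004
```

A backslash in a literal part stays a plain backslash here, since the block result is inserted without any further escape processing.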

Proposal

Introduce a new %-literal: %d"...". The syntax of the content of %d"..." is exactly the same as that of "...". But the resulting value is different.

The value of %d"lit1#{dyn1}lit2#{dyn2}..." is an array ["lit1", dyn1, "lit2", dyn2, ...]. The array contains the literal parts and the dynamic parts in a distinguishable format.

The array contains the literal parts as strings at even indexes: 0, 2, 4, ... Some of them may be empty if a dynamic part is placed at the beginning or if two dynamic parts are consecutive.

The array contains the dynamic parts as values at odd indexes: 1, 3, 5, ... They are the evaluated values of the expressions in #{}. They can be any objects, not only strings.

Example:

%d"a#{b}c" -> ["a", b, "c"]
%d"a#{b}c#{d}e#{f}" -> ["a", b, "c", d, "e", f]
%d"#{v1}#{v2}" -> ["", v1, "", v2]
%d"#{v}" -> ["", v]
%d"abc" -> ["abc"]
%d"" -> []
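
To make the index convention concrete, here is a small sketch that works on hand-written arrays of the shape shown above. The helper names split_parts and to_string are assumptions for illustration only.

```ruby
# Hypothetical helper: separate a %d-style array into its literal parts
# (even indexes) and its dynamic parts (odd indexes).
def split_parts(parts)
  literals, dynamics = parts.each_with_index.partition { |_, i| i.even? }
  [literals.map(&:first), dynamics.map(&:first)]
end

# Hypothetical helper: rebuild the string that ordinary "..." would give.
def to_string(parts)
  parts.map(&:to_s).join
end

b, d, f = 1, 2, 3
parts = ["a", b, "c", d, "e", f]   # stand-in for %d"a#{b}c#{d}e#{f}"

literals, dynamics = split_parts(parts)
p literals   # prints: ["a", "c", "e"]
p dynamics   # prints: [1, 2, 3]
puts to_string(parts)
# prints: a1c2e3
```

Joining the parts gives back the same text as "a#{b}c#{d}e#{f}", which is why %d only needs to skip the final concatenation step.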

Note that `d' is taken from `d'ynamic. Other candidates are `e'mbedded and `h'ole.

Analysis

There are some design alternatives.

In the current design, %d"..." returns an array. But a special string could be returned instead, like Groovy's GString.

I chose an array because:

However, the special string has a benefit: the "..." syntax can be used instead of %d"...".

Another choice is %d'...', which interprets #{} but not the various backslash escapes. Because its syntax is identical to "...", %d"..." is more intuitive than %d'...'. So %d"..." should be introduced first. %d'...' should be considered after we have gained experience with %d"...".

Implementation

Not available. But it should be easy to implement, because %d"...#{...}..." can reuse the implementation of "...#{...}..." and simply omit the string concatenation.

Comments and current voting

I've read this several times, and I certainly acknowledge that I know very little about i18n etc. But I'm rather certain that I don't support the %d{...} idea. Things like interpolating #{person} into a string for a method call can be done in other ways, and I don't think that people who have trouble understanding "\\1" in a replacement string are going to find the proposed %d{...} behavior easier to understand.

I'm also not convinced that this is really a matter of concatenating or not concatenating. I don't really think of "a#{b}c" as concatenated strings, but rather as one string with an interpolated component. The idea of breaking it into an array doesn't seem natural. -- David Black


From what I gather, the purpose behind this RCR is to write a user-defined string interpolator. The %d syntax allows that, but the syntax of the user-defined interpolation imo doesn't make it clear what is happening:

  foo(%d"string")

I think a better solution would be to use one of the template systems we already have as libraries and which don't change the syntax of Ruby:

  http://www.rubygarden.org/ruby?HtmlTemplates

Another interesting solution that does change Ruby might be to lazy-evaluate the string concatenation after the interpolation. So if I have:

  def foo
    puts "this is a test"
    return 42
  end

  s = "number: #{foo}"
  puts s

Then foo() will be called, "this is a test" will be printed, and the two parts of the interpolation will be stored for later as ["number: ", foo] (much like the proposed %d does). Then, when s is used (by puts), the interpolation actually occurs.

In addition, a new method could be added to the String class to return the parts of the interpolated string so the user can do his own interpolation using the same syntax as standard interpolation.

I'm not sure if I really like this idea, though; it's just a thought that happened to come to my head as I was writing this comment.

-- Paul Brannan


Strongly opposed 2
Opposed 2
Neutral 0
In favor 1
Strongly advocate 0