Submitted by petertje (Mon Sep 06 13:09:43 UTC 2004)
% literals.
% literals are especially useful in reducing code clutter for commonly recreated data structures. Presently, the built-in % literals (namely %q, %Q, %r, %s, %w and %x) are handled opaquely by the Ruby interpreter. User-defined % literals can be provided via % methods in the same way as the present `` (backquote) construct, which is itself equivalent to %x. Programmers would then be able to swiftly create data structures particular to their needs. For example, %y could be used for YAML::load.
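For comparison, current Ruby already routes the backquote construct through an ordinary method named `, and %x{...} goes through that same method. The sketch below overrides it so that neither form runs a shell command. The class name Recorder is made up for illustration; the proposal would extend this kind of method-backed literal to other letters.

```ruby
# Recorder is a hypothetical class used only for illustration.
class Recorder
  # Backquote and %x literals evaluated inside this class call this
  # method instead of Kernel#`, so no shell command is executed.
  def `(command)
    "would run: #{command}"
  end

  def demo
    [`ls`, %x{ls}]
  end
end

p Recorder.new.demo
```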
% literals conforms to the same rules as the current literals, i.e., matching braces, etc. Additionally, the literal's closing delimiter may be followed by any number of letters, serving as a limited form of parameter, congruent with the present behavior of %r.
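The delimiter and trailing-letter rules above can be sketched as a small scanner. This is only an illustration under simplifying assumptions, not how the Ruby parser works: the function name scan_percent_literal and the returned hash layout are invented here, only single-letter names are handled, and a backslash is treated as escaping the next character.

```ruby
# Hypothetical scanner for the form %<letter><open>...<close><flags>.
PAIRS = { "{" => "}", "(" => ")", "[" => "]", "<" => ">" }.freeze

def scan_percent_literal(src)
  m = /\A%([A-Za-z])(.)/m.match(src)
  return nil unless m

  letter = m[1]
  open = m[2]
  close = PAIRS.fetch(open, open) # non-bracket delimiters close themselves
  depth = 1
  i = 3
  while i < src.length
    ch = src[i]
    if ch == "\\"
      i += 1                      # skip the escaped character
    elsif ch == close
      depth -= 1
      break if depth.zero?
    elsif ch == open
      depth += 1                  # nested bracket pair
    end
    i += 1
  end
  return nil unless depth.zero? && i < src.length

  # Letters directly after the closing delimiter act as parameters.
  flags = src[(i + 1)..][/\A[A-Za-z]*/]
  { letter: letter, body: src[3...i], flags: flags }
end

p scan_percent_literal('%y{a: {b: 1}}u rest')
p scan_percent_literal('%w|a b|')
```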
% literal is evaluated (i.e. %m, where m is any lowercase letter), the literal and its parameters are passed as strings to a method of like name, e.g. def %m(string, options). This method then interprets the string according to any optional parameters and returns a representative object. In the case of an uppercase % literal (%M, where M is an uppercase letter) the lowercase method is also called, but only after the interpreter applies the additional substitutions for double-quoted strings.
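The evaluation rule just described can be emulated in today's Ruby as a rough sketch. Current Ruby does not accept def %m, so the handler is spelled percent_m here. The names PercentLiterals, evaluate and Demo are invented for the illustration, and the interpolation step is deliberately simplified to plain #{local_variable} references.

```ruby
# Hypothetical emulation of the proposed dispatch.
module PercentLiterals
  # letter: the literal's letter ("m" or "M")
  # raw:    the literal's text exactly as written in the source
  # flags:  letters that followed the closing delimiter
  # scope:  Binding in which the literal appeared
  def self.evaluate(receiver, letter, raw, flags, scope)
    text = raw
    if letter == letter.upcase
      # Uppercase form: substitute first. Simplification: only
      # #{local_variable} is handled, not arbitrary expressions.
      text = raw.gsub(/#\{(\w+)\}/) { scope.local_variable_get($1.to_sym).to_s }
    end
    # Both forms end up in the lowercase handler.
    receiver.send("percent_#{letter.downcase}", text, flags)
  end
end

class Demo
  def percent_m(string, options)
    [string, options]
  end
end

name = "world"
p PercentLiterals.evaluate(Demo.new, "m", 'hi #{name}', "x", binding)
p PercentLiterals.evaluate(Demo.new, "M", 'hi #{name}', "x", binding)
```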
% method, we will first give the trivial case of %q:
<pre>
module Kernel
  def %q(string)
    string
  end
end
</pre>
<pre>
class OurClass
  private

  def %q(string, params)
    params.split(//).each do |p|
      case p
      when 'u'
        string.upcase!
      else
        raise "unknown string option: #{p}"
      end
    end
    string
  end
end
</pre>
OurClass#%q as the method to call. An example would be:
<pre>
class OurClass
  def test
    %q{Hello #{world}!}u
  end
end
</pre>
u is passed in the same way as to the %r literal in Ruby now. When calling OurClass#test, the evaluation of the literal will result in a call to OurClass#%q with 'Hello #{world}!' and 'u' as parameters. This method will then return 'HELLO #{WORLD}!'.
<pre>
class OurClass
  def test
    world = 'ruby-talk'
    %Q{Hello #{world}!}u
  end
end
</pre>
OurClass#test will again result in a call to OurClass#%q, but this time with 'Hello ruby-talk!' and 'u' as parameters. This is because the uppercase variant does do string interpolation. The result of it all would be 'HELLO RUBY-TALK!'.
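The two results above can be reproduced in current Ruby with an ordinary method standing in for the proposed handler. In this sketch percent_q, test_lowercase and test_uppercase are illustrative names, and the single- versus double-quoted argument plays the role of %q versus %Q.

```ruby
# percent_q stands in for the proposed `def %q`, which current Ruby
# does not accept as a method name.
class OurClass
  def test_lowercase
    percent_q('Hello #{world}!', 'u') # like %q{...}u: no interpolation
  end

  def test_uppercase
    world = 'ruby-talk'
    percent_q("Hello #{world}!", 'u') # like %Q{...}u: interpolated first
  end

  private

  # Applies each parameter letter in turn; only 'u' (upcase) is known.
  def percent_q(string, params)
    params.each_char.inject(string) do |result, flag|
      case flag
      when 'u' then result.upcase
      else raise ArgumentError, "unknown string option: #{flag}"
      end
    end
  end
end

puts OurClass.new.test_lowercase
puts OurClass.new.test_uppercase
```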
% literal method can return an object other than a string. For instance, this is how the aforementioned YAML case is defined:
<pre>
module Kernel
  def %y(string, params)
    YAML::load(string)
  end
end
</pre>
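As a runnable approximation of the YAML case, the same idea can be written with a regular method and a heredoc in place of the literal. The method name percent_y is a stand-in chosen for this sketch, and the params argument is accepted but ignored, as in the definition above.

```ruby
require 'yaml'

# percent_y is a hypothetical stand-in for the proposed `def %y`.
def percent_y(string, params = '')
  YAML.load(string)
end

# The heredoc plays the role of the literal's body.
config = percent_y(<<~DOC)
  name: example
  ports:
    - 80
    - 443
DOC

p config
```

The handler returns a Hash here rather than a String, which is the point of the example.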
%r:
<pre>
module Kernel
  def %r(string, options)
    # Unknown option letters raise instead of silently mapping to nil.
    flags = Hash.new { |h, k| raise "unknown regexp option - #{k}" }
    flags.update("i" => Regexp::IGNORECASE, "m" => Regexp::MULTILINE, "x" => Regexp::EXTENDED)
    Regexp.new(string, options.split(//).inject(0) { |v, c| v | flags[c] })
  end
end
</pre>
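The option handling in the %r definition can be tried out in current Ruby by giving the method an ordinary name. In this sketch percent_r and REGEXP_FLAGS are illustrative names; the flag constants and Regexp.new are standard Ruby.

```ruby
# Maps each option letter to its Regexp flag bit.
REGEXP_FLAGS = {
  "i" => Regexp::IGNORECASE,
  "m" => Regexp::MULTILINE,
  "x" => Regexp::EXTENDED
}.freeze

# percent_r is a hypothetical stand-in for the proposed `def %r`.
def percent_r(string, options = "")
  # OR together the bit for every option letter; reject unknown letters.
  flags = options.each_char.inject(0) do |acc, c|
    acc | REGEXP_FLAGS.fetch(c) { raise ArgumentError, "unknown regexp option - #{c}" }
  end
  Regexp.new(string, flags)
end

p percent_r("hello", "i")
p percent_r("a.c", "im").match?("A\nC")
```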
% literals. So it would also be possible to allow more than one letter after the %, e.g., %yaml which is less cryptic than %y (although longer). While not a necessity, it increases the possibilities.
<h3>Pros</h3>
% literals without changing the Ruby interpreter. Requests for new % literals have come up a few times on ruby-talk; lots of people would know what to do with this feature: YAML literals, XML literals, syntax literals (on-the-fly parser generation), ...
% literals have: convenience, with less typing for commonly occurring data structures; conciseness, thus reducing code clutter; and less escaping of quotes in literals.
% literals and the `` notation.
% literals must be handled with care. When using other people's libraries, it may cause backward-compatibility issues. Then again, this comes with the territory of having open classes. While overriding the built-in % literals could be prohibited, a warning would probably suffice.
%r and %x, though both lowercase, are presently treated as double-quoted, thus differing from the general convention set forth in this proposal. Changing this can break backward compatibility, in which case a phased implementation is recommended, issuing a strong warning for a number of release cycles. The (less elegant) alternative is to make asymmetric exceptions for %r and %x regardless of whether they are overridden or not. These two will then always be parsed as if double-quoted, rendering the uppercase variants useless.
% literal syntax to allow any letter and allow any parameters to each literal. Again this strictly extends Ruby's syntax and does not clash with Ruby's syntax.
% literals. During the parsing phase, the literal and its flags are stored as strings, the method that will be called at evaluation is stored in some form, and it is flagged as being single- or double-quoted. At evaluation time all required substitutions are performed before passing the literal and its parameters as strings on to the corresponding method. The result of that method is the result of the evaluation of the literal.
Comments
RCRchive copyright © David Alan Black, 2003-2005.
Custom literals will make it more difficult to do syntax coloring. Ruby syntax coloring is already complicated... such a feature will make it even more complex. --Simon Strandgaard
Can't you simply color a %x literal as a single-quoted string and a %X literal as a double-quoted string?
-- Peter
Peter, first let me tell you that this is one of the better-written RCRs I've seen. IMO, we need more RCRs like this (not that I agree with the proposal, but I think it's really well-written).
Could you please provide some examples of how calling %q with parameters would look?
Also, do the new %-methods follow the normal method lookup rules? E.g. if I have a derived class, an included module, and a base class, and the derived class defines a new %x, and the module uses %x, does it get the derived class's version or the one in Kernel? If it gets the derived class's version, is it then good advice to avoid using %-literals from inside mixins?
Will multi-character %-literal names be allowed, e.g.:
If so (and even if not -- the same applies to single-character %-literals), then if I write this:
should this be interpreted as:
(that is, a method foo being called with the result from the %bar literal that was passed the string "baz"), or should it be interpreted as:
(that is, the result of mod'ing foo with the result of calling bar with {baz} as a block)?
Lastly, could you elaborate on the advantages that creating user-defined %-literals has over simply passing strings into methods? Since there are only 26 letters of the alphabet, and I think most people will tend toward defining single-letter literals, is this really a good idea (it seems to favor libraries that establish their %-literals over newer libraries).
Simon, I think syntax highlighting is a solvable problem (just highlight all unknown literals as you would a string), so long as the rules for opening and closing the literal are well-defined and not redefinable at run-time (since the parser is pretty much static at the moment, I think this is a reasonable requirement).
-- Paul Brannan
Paul, thanks for the compliment about the quality of the RCR, but the credit isn't all mine. T. Onoma helped me put it together at which was started exactly because we also think we need better quality RCRs. Now we know it was not an illusion :-)
First the easy part: the syntax ambiguity. It is already resolved in Ruby now. This:
is interpreted as
and this:
is interpreted as
The standard method lookup rules will indeed be used. This does pose some dangers when using literals in modules. The easy answer is that the same danger exists with ``, so the danger level must be acceptable. But really, the danger of name clashes when including modules exists anyhow, but poses very few problems in practice. I'm not sure if these literals will make it worse, even if there are only 26 letters in the alphabet. I think you would define many more methods than % literals, and in practice not that many different method names are actually used, so the danger is IMO not that much higher than everyday Ruby coding nowadays. Besides, if Matz implements namespaces, the issue becomes resolvable without having to revert to ugly notations like Kernel::%q{Hi}.
Lastly the battle for the literals. Defining the literals in Kernel or Object is bad just like any definition in Kernel or Object. But even if multiple libraries you use define the same literals in separate modules or so, you still can't use them both in the same place. But the libraries should always provide an alternative way to do the same thing, even if it is more verbose. That means you can't use the literals but that's tough luck then. And again, if Matz implements namespaces, this will be a problem no more.
I think a good reason to have these literals is twofold. One is that these % literals use less escaping. But you can still use m(%q{}) for that instead of %m{}. But secondly, IMO if your code uses m(%q{blah blah}) a lot, it will become more readable if you can just type %m{blah blah} all the time. It's not just less typing, it's one indirection less to interpret when reading the code. It's also a matter of taste: if you don't like %q, %r and the like now, you won't like it in any form. If you do like these literals now, it's a small step to wanting some of your own.
-- Peter
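The lookup question discussed in the reply above can be observed today with the one literal that is already method-backed, the backquote. In the sketch below, a mixin uses the literal and the class of the receiver decides which handler runs. Listing, Base and Derived are made-up names, and the handlers return strings instead of running a command.

```ruby
# A mixin that uses a method-backed literal.
module Listing
  def listing
    `ls` # dispatched to the receiver's ` method via normal lookup
  end
end

class Base
  include Listing

  def `(command)
    "Base handles #{command}"
  end
end

class Derived < Base
  def `(command)
    "Derived handles #{command}"
  end
end

puts Base.new.listing
puts Derived.new.listing
```

The literal written inside the mixin picks up the derived class's version, which is the behavior Paul asks about.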
In my Ruby lexer I attempt to color escapes and interpolated code inside literals. see these screenshots:
I want to color %r{} regexp literals as regexp; if I detect illegal regexp constructions I want to color them red. %r literals can have trailing options too. I want to color %w{} as an array literal, so that it's easier to see where the separators are. Uppercase literals, such as %W and %R, are resolved by Ruby.
I think allowing for custom single letters won't break too much. But allowing for arbitrary strings %tag content tag will cause problems. I don't know exactly what I want to say.
-- Simon Strandgaard
Simon, it would still be possible to color escape sequences and interpolations as well as the trailing options, because the rules for this will be the same for all of the literals. Detecting errors in the literals will have to be generic, unless it would be acceptable to you to color all %r literals as regexps and %w literals as word lists even if they are redefined. I don't remember the details, but I distinctly remember an editor doing something like that, and I also remember I hated that.
And for all clarity, I do not intend to allow arbitrary strings as delimiters, that's a bad idea. I meant arbitrary identifiers, like this:
-- Peter
I would like to point out that the main advantage here is that no one would need to pine to ruby's engineers, nor would the developers have to worry about such, any longer, when those cases arise that a new % literal is desired. For instance, _why would like a %y for YAML. Sean Russell might like a %xml, etc.
A couple of things happen. 1) The Ruby interpreter can actually be simplified. That's good. And 2) a generally acceptable syntax for creating constructor shortcuts is promoted. These two facts lead me to believe that either Ruby should do away with % literals altogether (if their perlism-ness is undesired) or embrace this RCR.
T.
Allowing programmers to change things like this doesn't seem to hurt the language at all and gives a large benefit of flexibility, which is what I see as a major benefit of Ruby (with things like open classes: new methods can be added, old methods can be changed, &c).
-- Olathe