Submitted by akr (Wed Mar 31 09:03:44 UTC 2004)
Introduce a new regexp construct for named capture with assignment.
Ruby uses $1, $2, ... to represent a substring captured by regexp matching. They are very useful but have several problems.
$1, $2, ... tend to be renumbered when a regexp is edited. This makes code fragile.
$-variables are ugly.
Oniguruma's named capture will provide a robust solution to the 1st problem. But the expected interface, $~[:name], doesn't solve the 2nd and 3rd problems.
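A small illustration of the renumbering problem, written as a sketch for Ruby 1.9 or later (where Oniguruma-style named captures are available); the date string and patterns are made up for the example. Wrapping two existing groups in a new group silently changes what $2 refers to, while $~[:mon] still finds the month.

```ruby
str = "2004-03-31"

# Original pattern: the month is the 2nd group.
str =~ /(\d+)-(\d+)-(\d+)/
month_before = $2

# The year-month part is wrapped in a new group. The month is now $3,
# and code that still reads $2 silently gets the year instead.
str =~ /((\d+)-(\d+))-(\d+)/
month_after = $2

# A named capture is looked up by name, not by position.
# (With named groups present, the plain outer group does not capture.)
str =~ /((?<year>\d+)-(?<mon>\d+))-(?<day>\d+)/
month_named = $~[:mon]
```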
Regexp#match solves the 2nd and 3rd problems, but it makes code longer.
The notation using $~[:name] or Regexp#match is longer than the notation using $1. So a programmer must choose between the shorter but fragile notation and the longer but robust one. It would be a good thing if Ruby encouraged the robust notation by making it shorter (and $-free).
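For comparison, here are the three existing notations side by side. This is a sketch for Ruby 1.9 or later, and the key=value string is only an example. The string is placed on the left of =~ in the first two forms so that Ruby does not also create local variables from the named groups. The robust forms need either the $~ variable or an extra MatchData variable.

```ruby
str = "lang=ruby"

# 1. Numbered group: short, but tied to the group's position.
if str =~ /(\w+)=(\w+)/
  v1 = $2
end

# 2. Named group read through $~: robust against renumbering, still a $-variable.
if str =~ /(?<key>\w+)=(?<val>\w+)/
  v2 = $~[:val]
end

# 3. Regexp#match: no $-variables, but an extra variable to name and test.
if (md = /(?<key>\w+)=(?<val>\w+)/.match(str))
  v3 = md[:val]
end
```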
Introduce a new construct for named capture with assignment: (?{var=}...) in Regexp literals, where var is an assignable expression such as a local variable, an instance variable, a writable attribute, etc.
For example, it can be used as follows.
if /...(?{var=}...).../ =~ str # captured substring is assigned to var.
  p var
end
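The proposed (?{var=}...) syntax is not valid Ruby, so the example above cannot be run as written. The closest existing behaviour is what Ruby 1.9 and later do with named captures: when a regexp literal containing (?<name>...) groups is on the left-hand side of =~, each captured substring is also assigned to a local variable of the same name. Unlike the proposal, this only targets local variables (not instance variables or attributes), and the variable name comes from the group name rather than from an explicit var= marker. The timestamp string below is just sample input.

```ruby
str = "Wed Mar 31 09:03:44 UTC 2004"

# Regexp literal on the LEFT of =~: Ruby 1.9+ assigns each named group
# to a local variable with the same name.
if /(?<hour>\d+):(?<min>\d+):(?<sec>\d+)/ =~ str
  p [hour, min, sec]
end

# With the operands swapped, no local variables are created;
# the capture is only reachable through $~.
if str =~ /(?<year>\d{4})\z/
  p $~[:year]
end
```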
To perform assignment, a Regexp object needs variable bindings, like a closure.
Since a Regexp object is then closure-like, its scope rule should be lexical.
RE = /...(?{v=}...).../
def m(str)
  /...#{RE}...(?{w=}...).../ =~ str # Are v and w assigned?
end
Since the scope rule should be lexical, v should be assigned at the top level and w should be assigned locally in the method. If such a scope rule is difficult to implement, it is acceptable for v not to be assigned at all. But it is not acceptable for v to be assigned as a method-local variable.
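The closure-like scoping rule can be emulated in current Ruby, purely as an illustration. BoundRegexp below is a hypothetical wrapper, not part of Ruby and not the proposal's implementation. It remembers the Binding in which it was created and, on a successful match, writes the named captures into that binding with Binding#local_variable_set (Ruby 2.1+). Because the binding is captured where the pattern is written, a match performed inside a method updates the top-level v, not a method-local one. One limitation: local_variable_set only updates a variable that already exists in that scope. For a new name, it creates a variable that is visible only through the binding.

```ruby
# Hypothetical wrapper: a regexp paired with the Binding of the scope where
# it was written, so captures are assigned lexically (like a closure).
class BoundRegexp
  def initialize(regexp, scope)
    @regexp = regexp
    @scope = scope
  end

  # Mimics Regexp#=~: returns the match position, or nil on failure.
  def =~(str)
    md = @regexp.match(str)
    return nil unless md
    md.names.each { |name| @scope.local_variable_set(name.to_sym, md[name]) }
    md.begin(0)
  end
end

v = nil                                     # top-level variable to be assigned
RE = BoundRegexp.new(/(?<v>\d+)/, binding)  # binding of the top level

def m(str)
  v = "method-local"                        # unrelated local inside m
  RE =~ str                                 # assigns the top-level v
  v                                         # still "method-local"
end

inner = m("rcr 42")
```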
The construct (?{var=}...) is different from Oniguruma's named capture (?<name>...): it is intended to make the side effect (the assignment) explicit.
Implementation: not available.
Comments
RCRchive copyright © David Alan Black, 2003-2005.
I don't love adding another obscure part to regexps, but I won't vote against this RCR, because I don't like $n either. Anyway, it seems that $-variables are going to be deprecated, so you should compare
with, I believe, something like
I'm not sure Oniguruma supports this, but IIRC it does (and if it does not, it should :)
gabriele renzi
That doesn't address my biggest problem with extracting values from regexps: type. For instance, I often want to extract integers from a regexp, so I have to call to_i. With this scheme I would get a String in the variable and would then have to write e.g. "var = var.to_i" (which I really don't like: a variable has a specific type that shouldn't change like that). What would be great is if we could embed this type information too.
maybe