
Submitted by akr (Wed Mar 31 09:03:44 UTC 2004)
Introduce a new regexp construct for named capture with assignment.
Ruby uses $1, $2, ... to represent substrings captured by regexp matching. They are very useful but have several problems:

1. $1, $2, ... tend to be renumbered when a group is added or removed. This makes code fragile.
2. $-variables are ugly.

Oniguruma's named capture will provide a robust solution for the 1st problem. But the expected interface, $~[:name], doesn't solve the 2nd and 3rd problems.
Regexp#match solves the 2nd and 3rd problems, but it makes code longer.
The notation using $~[:name] or Regexp#match is longer than the notation using $1. So a programmer must choose between the shorter but fragile notation and the longer but robust one. It would be good for Ruby to encourage the robust notation by making it shorter (and $-free).
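For comparison, here is a sketch of the three existing notations in present-day Ruby (1.9 and later, where Oniguruma-style (?<name>...) groups are built in). This postdates the 2004 proposal and is only an illustration. The regexp is placed on the right of =~ so that Ruby does not also create local variables from the group names.

```ruby
str = "2004-03-31"

# $1: shortest, but fragile. Adding a group before (\d+) would shift it to $2.
year1 = $1 if str =~ /(\d+)-(\d+)-(\d+)/

# $~[:name]: survives renumbering, but still relies on a $-variable.
year2 = $~[:year] if str =~ /(?<year>\d+)-(?<mon>\d+)-(?<day>\d+)/

# Regexp#match: robust and $-free, but the longest of the three.
md = /(?<year>\d+)-(?<mon>\d+)-(?<day>\d+)/.match(str)
year3 = md[:year] if md

p [year1, year2, year3] # => ["2004", "2004", "2004"]
```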
Introduce a new construct for named capture with assignment: (?{var=}...) in a Regexp literal, where var is an assignable expression such as a local variable, an instance variable, a writable attribute, etc.
For example, it can be used as follows.
```ruby
if /...(?{var=}...).../ =~ str # captured substring is assigned to var.
  p var
end
```
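The proposed (?{var=}...) syntax is not valid Ruby. The closest existing feature (Ruby 1.9 and later, not part of this RCR) assigns named captures to local variables when a regexp literal without interpolation is on the left of =~. A minimal sketch follows; unlike the proposal, this feature only targets local variables named after the groups, not instance variables or writable attributes.

```ruby
str = "lang=ruby"

# A regexp literal on the LEFT of =~ assigns each named group to a local
# variable of the same name (nil if the group did not participate).
if /(?<key>\w+)=(?<val>\w+)/ =~ str
  p key # => "lang"
  p val # => "ruby"
end
```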
To perform assignment, a Regexp object should carry variable bindings, like a closure.
Since a Regexp object is like a closure, its scope rule should be lexical.
```ruby
RE = /...(?{v=}...).../
def m(str)
  /...#{RE}...(?{w=}...).../ =~ str
  # Are v and w assigned or not?
end
```
Since the scope rule should be lexical, v should be assigned at the toplevel and w should be assigned as a method-local variable. If such a scope rule is difficult to implement, it is acceptable for v not to be assigned at all. But it is not acceptable for v to be assigned as a method-local variable.
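As a rough illustration of the lexical rule, the sketch below pairs a Regexp with the Binding of the scope where it was written, so a match performed inside m still assigns v at the toplevel. BoundRegexp is a hypothetical class, not part of Ruby; it uses named groups instead of (?{v=}...) and relies on Binding#local_variable_set (Ruby 2.1+), which only updates the real scope when the variable already exists there. As documented for Regexp#=~, Ruby's built-in named-capture assignment takes the most restrictive option: a regexp containing #{} interpolation assigns no local variables at all.

```ruby
# Hypothetical class: a Regexp bundled with the Binding of its defining scope,
# so named captures are assigned lexically, the way a closure would do it.
class BoundRegexp
  def initialize(regexp, scope)
    @regexp = regexp
    @scope = scope # Binding captured where the regexp was written
  end

  # On success, assign every named capture to the same-named local variable
  # in the defining scope and return the match offset; otherwise return nil.
  def =~(str)
    md = @regexp.match(str)
    return nil unless md
    md.names.each { |name| @scope.local_variable_set(name.to_sym, md[name]) }
    md.begin(0)
  end
end

v = nil # must already exist in the defining scope to be visible there
RE = BoundRegexp.new(/(?<v>\d+)/, binding)

def m(str)
  RE =~ str # assigns the toplevel v, not a method-local one
end

m("abc123")
p v # => "123"
```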
The construct (?{var=}...) is different from Oniguruma's named capture (?<name>...). It is intended to make the side effect explicit.
Implementation: not available.



RCRchive copyright © David Alan Black, 2003-2005.
I won't love another obscure part in regexps, but I won't vote against this RCR because I don't like $n either. It seems that $-variables are going to be deprecated anyway, so you should compare … with, I believe, something like … I'm not sure Oniguruma supports this, but IIRC it does (and if it does not, it should :)
gabriele renzi
That doesn't take into account my biggest problem with extracting values from regexps: type. For instance, many times I want to extract integers from a regexp, and I have to call to_i. With this scheme, I would get a String in the variable and would then have to write e.g. "var = var.to_i" (which I really don't like: a variable has a specific type that shouldn't change like that). What would be great is if we could embed this type information too.
Maybe:
```ruby
/...(?{var as Integer=}).../
```
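Without new syntax, the conversion suggested above can be approximated by a helper that takes a converter per group name. typed_match is a hypothetical name used only for illustration, and the conversion happens after matching, so it does not give the variable a fixed declared type the way (?{var as Integer=}) would.

```ruby
# Hypothetical helper: match str, then convert selected named captures with a
# caller-supplied converter (anything responding to #call).
def typed_match(regexp, str, converters = {})
  md = regexp.match(str)
  return nil unless md
  md.names.each_with_object({}) do |name, result|
    raw = md[name]
    conv = converters[name.to_sym]
    # Leave unconverted groups, and groups that did not match, as-is.
    result[name.to_sym] = conv && raw ? conv.call(raw) : raw
  end
end

r = typed_match(/(?<item>\w+):(?<count>\d+)/, "apples:42",
                count: ->(s) { Integer(s, 10) })
p r[:count] # => 42 (an Integer, not a String)
p r[:item]  # => "apples"
```

Using Integer(s, 10) rather than to_i means a malformed value raises instead of silently becoming 0.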