
RCR 237: named capture with assignment in regexp

Submitted by akr (Wed Mar 31 09:03:44 UTC 2004)

Abstract

Introduce a new regexp construct for named capture with assignment.

Problem

Ruby uses $1, $2, ... to represent the substrings captured by regexp matching. They are very useful but have several problems.

Oniguruma's named capture will provide a robust solution for the 1st problem. But the expected interface $~[:name] doesn't solve the 2nd and 3rd problems.

Regexp#match solves the 2nd and 3rd problems. But it makes the code longer.

The notation using $~[:name] or Regexp#match is longer than the notation using $1. So a programmer must choose between the shorter but fragile notation and the longer but robust one. It would be a good thing for Ruby to encourage the robust notation by making it shorter (and $-free).
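To make the length trade-off concrete, here is a small sketch of the three notations side by side. It assumes a Ruby with named-capture support (1.9 or later); the date string and the group names year, month and day are illustrative only.

```ruby
str = "2004-03-31"

# 1. Positional capture: the shortest notation, but the fragile one.
year_positional = $1 if /(\d+)-(\d+)-(\d+)/ =~ str

# 2. Named capture read through $~: more robust, but longer and still
#    dependent on a $ variable.
year_via_global = $~[:year] if str =~ /(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/

# 3. Regexp#match: no $ variables at all, but the longest of the three.
md = /(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/.match(str)
year_via_match = md[:year] if md
```

All three variables end up holding the same substring; only the amount of notation needed to get there differs.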

Proposal

Introduce a new construct for named capture with assignment, (?{var=}...), in Regexp literals, where var is an assignable expression such as a local variable, an instance variable, or a writable attribute.

For example, it can be used as follows.

if /...(?{var=}...).../ =~ str # captured substring is assigned to var.
  p var
end
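The (?{var=}...) syntax above is the proposal and is not valid Ruby. As a rough approximation of the intended effect, Ruby 1.9 and later assign named groups to local variables when a regexp literal appears on the left-hand side of =~. This is a related but different construct, and the names key, value and pair below are illustrative only.

```ruby
str = "lang=ruby"

# Named groups in a regexp literal on the left-hand side of =~ are assigned
# to local variables of the same name when the match succeeds.
if /(?<key>\w+)=(?<value>\w+)/ =~ str
  pair = [key, value]
end
```

Unlike the proposed construct, this only assigns local variables, and the assignment does not happen when the regexp is on the right-hand side, is not a literal, or contains #{} interpolation.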

For assignment in a Regexp to work, a Regexp object should carry variable bindings, like a closure.

Since a Regexp object is like a closure, the scope rule should be lexical.

RE = /...(?{v=}...).../

def m(str)
  /...#{RE}...(?{w=}...).../ =~ str
  # are v and w assigned or not?
end

Since the scope rule should be lexical, v should be assigned at the toplevel and w should be assigned locally in the method. If such a scope rule is difficult to implement, it is acceptable that v is not assigned. But it is not acceptable that v is assigned locally in the method.
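The lexical rule can be emulated in today's Ruby with an ordinary block, which may help in picturing the intended behaviour. This is only a sketch: AssigningPattern, scan, and the group names digits and word are hypothetical and not part of Ruby or of the proposal.

```ruby
# Hypothetical helper: a Regexp paired with a block that performs the
# assignment in the scope where the block was written.
class AssigningPattern
  def initialize(regexp, &assign)
    @regexp = regexp
    @assign = assign
  end

  # Returns the MatchData (or nil) and runs the assignment block on success.
  def match(str)
    md = @regexp.match(str)
    @assign.call(md) if md
    md
  end
end

v = nil
# The block closes over the toplevel v, like RE in the example above.
outer = AssigningPattern.new(/(?<digits>\d+)/) { |md| v = md[:digits] }

def scan(pattern, str)
  w = nil
  v = :method_local            # a different variable from the toplevel v
  pattern.match(str)           # assigns the toplevel v, not this one
  inner = AssigningPattern.new(/(?<word>[a-z]+)/) { |md| w = md[:word] }
  inner.match(str)             # assigns the method-local w
  [v, w]
end

local_v, local_w = scan(outer, "abc 123")
```

After the call, the toplevel v holds the digits, the method-local v is untouched, and w was assigned inside the method, which matches the scope rule described above.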

Analysis

The construct (?{var=}...) is different from Oniguruma's named capture (?<name>...). It is intended to make the side effect explicit.

Implementation

Not available.

Comments

I won't love another obscure part in regexps, but I won't vote against this RCR because I don't like $n either. Anyway, it seems that $variables are going to be deprecated, so you should compare

 m = /rgx/.match('str')
 p m[0]

with, I believe, something like

 m = /rgx/.match('str')
 p m['name'] # ok, I'd prefer a symbol, I admit it..

I'm not sure Oniguruma supports this, but IIRC it does (and if it does not, it should :)

gabriele renzi


That doesn't take into account my biggest problem with extracting values from regexps: type. For instance, many times I want to extract integers from a regexp, and I have to call to_i. With this scheme, I would get a String in the variable and then I would have to do e.g. "var = var.to_i" (which I really don't like; a var is of a specific type that shouldn't change like that). What would be great is if we could embed this type information too.

Maybe:

/...(?{var as Integer=}).../
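The "as Integer" annotation above is the commenter's suggestion and does not exist in Ruby. A minimal sketch of how the conversion can be done today without re-typing a variable, assuming named captures and using illustrative names (line, name, width):

```ruby
line = "width=640"

md = /(?<name>\w+)=(?<width>\d+)/.match(line)

# Convert the captured String while assigning, so that the width variable
# is only ever an Integer (or nil when the match fails).
width = md ? Integer(md[:width], 10) : nil
```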


Current voting

Strongly opposed 1
Opposed 1
Neutral 0
In favor 3
Strongly advocate 3