Submitted by: Jordan Sissel
This revision by: Jordan Sissel
Date: Tue Sep 18 13:18:23 -0400 2007
Regular expressions are very flexible, but Perl went a step further by allowing you to inject code into the execution of a regular expression. This is extremely powerful because it lets you extend the functionality of the regexp system without having to write your own engine.
In a regex, expressing ‘and’, submatches on groups, and general assertions is difficult. For example:
matching a URL with foo in it. It is easier to specify “match a URL with foo in it” by crafting a regex that matches any URL and specifying that it also matches ‘foo’ than by crafting a regex that matches any URL and weaving into it the sub-patterns needed to match foo anywhere inside. Sometimes look-ahead/look-behind assertions can be used, but not always.
A way to specify “and” in a regex is useful, but absent in all implementations other than Perl. That is, something of the form: =~ /(regex_matching_a_url)(?some_assertion_that_tests_$1)/
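For this particular case a look-ahead happens to be enough, so the idea can be sketched in unmodified Ruby. The URL pattern below is a deliberately loose, hypothetical stand-in (not a real URL grammar), and url_with_foo is an illustrative name:

```ruby
# Loose, hypothetical URL pattern: a scheme, then a run of non-space characters.
# The look-ahead (?=\S*foo) adds the "and it also contains foo" condition.
URL_WITH_FOO = %r{https?://(?=\S*foo)\S+}

# Return the first URL in text that contains "foo", or nil if there is none.
def url_with_foo(text)
  text[URL_WITH_FOO]
end

puts url_with_foo("see http://example.com/bar and http://example.com/foo")
```

This works only because the extra condition is itself a regex over the same span of text. A condition such as a numeric range test or a table lookup has no look-ahead equivalent, which is the gap this proposal targets.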
This change affects the regular expression syntax by adding a new pattern:
(?{ code })
where ‘code’ is a Ruby code block (or, alternatively, simply a function to be called). ‘perldoc perlre’ explains exactly what (?{ code }) does in Perl:
"(?{ code })"
This zero-width assertion evaluates any embedded Perl code. It
always succeeds, and its "code" is not interpolated.
I would like to relax this constraint: instead of always succeeding, ‘code’ can optionally make the match fail at that point if its return value is false. This gives you great control over pattern matching.
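The intended semantics can be roughly approximated in unpatched Ruby by retrying the search whenever a check rejects a candidate. This is only a sketch: match_where is a hypothetical helper, it vetoes whole matches rather than hooking into the engine's backtracking, and the two-argument Regexp#match it relies on needs Ruby 1.9 or later:

```ruby
# Hypothetical helper, not part of Ruby or of the proposed patch.
# Returns the first MatchData for which the block returns a true value,
# or nil when every candidate match is rejected.
def match_where(re, str)
  pos = 0
  while (m = re.match(str, pos))
    return m if yield(m)
    # Rejected: retry the search one character past where this match began.
    pos = m.begin(0) + 1
  end
  nil
end

# A dotted quad whose octets must each be at most 255, a numeric test
# that is clumsy to spell out as a pure regex.
DOTTED_QUAD = /\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b/

m = match_where(DOTTED_QUAD, "300.1.1.1 8.8.8.8") do |md|
  md[0].split(".").all? { |octet| octet.to_i <= 255 }
end
puts m[0] if m
```

With (?{ code }) in the engine, the same check could sit in the middle of a larger pattern and reject one alternative while the rest of the pattern carries on, which this wrapper cannot do.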
No changes to Ruby are needed beyond adding this to the regular expression engine.
Code samples:
This change is only truly beneficial when implemented in the regex engine itself, so that you can take advantage of the natural backtracking the regex engine performs when it hits a failure. It makes for more readable and more powerful regular expressions when you need to do more advanced matching.
I use this specific kind of advanced regular expression in grok, a pattern matching tool I wrote in Perl. It is extremely useful to have.
I have patches that mostly put this change into ruby1.8.6 with oniguruma. Here’s a sample Ruby invocation of (?{ code }):
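    # Needs the patched interpreter; stock Ruby rejects (?{ ... }).
    def check_private_network(ip)
      # RFC 1918 prefixes: 10/8, 172.16/12, 192.168/16 (dots escaped).
      priv_re = /^(192\.168\.|10\.|172\.(1[6-9]|2[0-9]|3[01])\.)/
      puts "Checking #{ip}"
      ip =~ priv_re ? true : false
    end

    # Single quotes keep the backslash in \. intact.
    ip_re = '((?:[0-9]{1,3}\.){3}(?:[0-9]{1,3}))'
    fun_re = Regexp.new("(#{ip_re})(?{ check_private_network($g0)})")

    # Sample input; the original listing does not define ips.
    ips = ["1.2.3.4", "192.168.0.1", "10.0.0.5"]
    ips.each do |x|
      y = fun_re.match(x)
      puts "#{y} is a private net" if y
    end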
Another example:
    mystr = "1.2.3.4 192.168.0.1"
    ip = fun_re.match(mystr)
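The result this example is after can be reproduced in stock Ruby, as a sketch, by collecting every candidate and filtering afterwards. The check below is a simplified, silent stand-in for check_private_network (named private_ip? here purely for illustration), and the assumption is that the patched regex would skip the first address because it fails the check:

```ruby
# Simplified stand-in for check_private_network (RFC 1918 prefixes).
PRIVATE_RE = /\A(?:10\.|192\.168\.|172\.(?:1[6-9]|2[0-9]|3[01])\.)/
IP_RE = /(?:[0-9]{1,3}\.){3}[0-9]{1,3}/

def private_ip?(ip)
  !(ip =~ PRIVATE_RE).nil?
end

mystr = "1.2.3.4 192.168.0.1"

# Checking only the first match is not enough: 1.2.3.4 fails and we stop.
first = mystr[IP_RE]
puts "first match private? #{private_ip?(first)}"

# Trying the later candidates as well finds the private address.
ip = mystr.scan(IP_RE).find { |candidate| private_ip?(candidate) }
puts ip
```

Filtering after the fact handles this simple case, but it needs a second pass over the candidates; with the check inside the pattern, the engine would move on to the next candidate by itself.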
Copyright © 2006, Ruby Power and Light, LLC