accidental duplicate rcr, please delete. (#17)

Submitted by: Jordan Sissel

This revision by: Jordan Sissel

Date: Tue Sep 18 13:18:23 -0400 2007


ABSTRACT

Regular expressions are very flexible, but Perl goes a step further by letting you inject code into the execution of a regular expression. This is extremely powerful because it lets you extend the functionality of the regexp system without having to write your own engine.

PROBLEM

Expressing ‘and’, submatches on groups, and general assertions is difficult in a regex. For example:

Matching a URL with foo in it. It is easier to specify “match a URL with foo in it” by crafting a regex that matches any URL and specifying that it also matches ‘foo’ than by crafting a regex that matches any URL and weaving into it the sub-patterns needed to match foo anywhere inside. Sometimes look-ahead/look-behind assertions can be used, but not always.

A way to specify “and” in a regex is useful, but absent in all implementations other than Perl. That is, something like:

=~ /(regex_matching_a_url)(?some_assertion_that_tests_$1)/
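For comparison, the same “and” can be approximated in stock Ruby, with no engine change, by matching first and testing afterwards. This is a minimal sketch. URL_RE is a deliberately simple, hypothetical URL pattern, and first_url_containing is an illustrative helper name. Neither is part of the proposal.

```ruby
# Hypothetical, deliberately simple URL pattern; real URL matching is more involved.
URL_RE = %r{https?://[^\s"'<>]+}

# Return the first URL in +text+ that also contains +needle+, or nil.
# The "and" is applied after the regex has finished matching.
def first_url_containing(text, needle)
  text.scan(URL_RE).find { |url| url.include?(needle) }
end

text = "see http://example.com/bar and http://example.com/foo/baz for details"
puts first_url_containing(text, "foo")
# prints http://example.com/foo/baz
```

The second test runs outside the regex engine, so the regex cannot react when it fails. The proposal aims to close that gap.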

PROPOSAL

This change affects the regular expression syntax by adding a new pattern:

(?{ code })

where ‘code’ is a Ruby code block (alternatively, simply a function to be called). ‘perldoc perlre’ explains exactly what (?{ code }) does in Perl:

"(?{ code })" 
          This zero-width assertion evaluates any embedded Perl code.  It
          always succeeds, and its "code" is not interpolated.

I would like to relax the “always succeeds” constraint: the assertion fails when the return value of ‘code’ is false. This gives you great control over pattern matching.

No changes to Ruby are necessary other than adding this pattern to the regular expression engine.

Code samples:

ANALYSIS

This change is only truly beneficial when implemented in the regex engine itself, so that you can take advantage of the natural backtracking the engine performs when it hits a failure. It makes for more readable and more powerful regular expressions when you need more advanced matching.
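To illustrate why backtracking matters, the sketch below contrasts filtering finished matches with a hand-rolled emulation of what an in-engine assertion could do. It assumes a proposed pattern along the lines of /(\d+)(?{ code })/. The helper match_with_predicate is hypothetical, and it emulates backtracking only for this one simple pattern.

```ruby
# Emulate /(\d+)(?{ predicate })/ in stock Ruby: try each start position left
# to right and, at each one, candidate lengths longest-first, which mimics a
# greedy quantifier giving back characters after the assertion fails.
def match_with_predicate(text)
  (0...text.length).each do |start|
    run = text[start..-1][/\A\d+/] # longest digit run at this position
    next unless run
    run.length.downto(1) do |len|
      candidate = run[0, len]
      return candidate if yield(candidate)
    end
  end
  nil
end

text = "id 12345"

# Post-filtering sees only the greedy match "12345", which fails the test.
filtered = text.scan(/\d+/).find { |n| n.to_i < 100 }

# The emulated assertion backs off to "12", which passes.
backtracked = match_with_predicate(text) { |n| n.to_i < 100 }

p filtered     # => nil
p backtracked  # => "12"
```

Writing such loops by hand for every pattern is impractical. The regex engine already performs this search, which is the argument for putting the assertion inside it.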

I use this kind of advanced regular expression in grok, a pattern-matching tool I wrote in Perl. It is extremely useful to have.

IMPLEMENTATION

I have patches that mostly put this change into Ruby 1.8.6 with Oniguruma. Here’s a sample Ruby invocation of (?{ code }):

    def check_private_network(ip)
      # Dots are escaped so they match literally. Note that 172\.16\. covers
      # only 172.16.x.x, not the full 172.16.0.0/12 private range.
      priv_re = /^(192\.168\.|10\.|172\.16\.)/
      puts "Checking #{ip}"
      !(ip =~ priv_re).nil?
    end

    # Single quotes keep the backslashes literal in the pattern source.
    ip_re = '((?:[0-9]{1,3}\.){3}(?:[0-9]{1,3}))'
    fun_re = Regexp.new("(#{ip_re})(?{ check_private_network($g0) })")

    # this set matches correctly, identifies ips[0] and ips[2] as private
    ips = ["192.168.0.1", "1.2.3.4", "10.8.3.44", "73.55.244.2"]

    ips.each do |x|
      y = fun_re.match(x)
      puts "#{y} is a private net" if y
    end

    # output should be:
    # Checking 192.168.0.1
    # 192.168.0.1 is a private net
    # Checking 1.2.3.4
    # Checking 10.8.3.44
    # 10.8.3.44 is a private net
    # Checking 73.55.244.2

Another example:

    mystr = "1.2.3.4 192.168.0.1"
    ip = fun_re.match(mystr)
    # ip should be '192.168.0.1' because the assertion will fail to match
    # 1.2.3.4 as a 'private' ip address, causing the regexp engine to skip it.
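The (?{ code }) syntax requires the patched interpreter, so the example above will not run on stock Ruby. As a rough stand-in, the sketch below reproduces the intended result with scan plus a predicate. IP_RE and first_private_ip are illustrative names. The check escapes the dots in the private-network pattern and omits the puts tracing.

```ruby
# Private-network test modelled on the RCR example (dots escaped).
# Covers 192.168.x.x, 10.x.x.x and 172.16.x.x only.
def check_private_network(ip)
  !(ip =~ /^(192\.168\.|10\.|172\.16\.)/).nil?
end

# Dotted-quad pattern from the RCR, without capture groups so that
# String#scan returns plain strings.
IP_RE = /(?:[0-9]{1,3}\.){3}[0-9]{1,3}/

# Return the first dotted quad in +str+ that passes the private-network test.
def first_private_ip(str)
  str.scan(IP_RE).find { |ip| check_private_network(ip) }
end

p first_private_ip("1.2.3.4 192.168.0.1") # => "192.168.0.1"
```

This works here because each candidate address is a complete, independent match. It would not generalize to patterns where the assertion has to influence how much the preceding group consumes.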


Copyright © 2006, Ruby Power and Light, LLC