RCR 179: Create a 'NoMatchData' object when a Regexp match fails.

Submitted by cyent (Mon Jan 05 17:01:15 UTC 2004)

Abstract

Extend MatchData to carry information about why a regular expression match failed.

Problem

How often have you written a Regex and expected it to match something and it just didn't?

You got it wrong, but you have no idea how.

Or worse, your user has entered invalid input, and you can't tell them what is wrong, only that something is.

Any sensible parser that chokes on input will tell you why, e.g. 'Syntax Error: Expecting one of "blah,bloo,foo"', so why can't Ruby's regexps?

For example, if I try to match

 "digger" =~ /dig(by|raph)/

expecting it to match either digby or digraph, it would be nice if Ruby could tell me: "No match, longest match 'dig', expecting a character in set [br] but found 'g' at position 3".
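For comparison, here is everything current Ruby reports for that example. =~ returns nil on failure and the last-match globals are cleared, so nothing about the partial match survives.

```ruby
# A failed match: =~ only reports the position of a successful match.
result = "digger" =~ /dig(by|raph)/
p result             # prints nil
p $~                 # prints nil: the last-match global is cleared on failure
p Regexp.last_match  # prints nil: same information as $~

# A successful match, for contrast: position plus MatchData in $~.
ok = "digby" =~ /dig(by|raph)/
p ok                 # prints 0
p $~[1]              # prints "by"
```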

Proposal

There are several ways this could be achieved:

   1. Rewrite a regexp engine in Ruby.
   2. Change Regexp#match to always return a MatchData holding the information. This would break some existing code.
   3. Add a new method, Regexp#try_match(string), that always returns a MatchData, even if the match fails.
   4. Add a new modifier 'e': "digger" =~ /dig(by|raph)/e would return nil, but set some global $whatever to the MatchData.
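Option 3 can be approximated in plain Ruby, as a minimal sketch rather than a proposed implementation. `try_match` and `NoMatchData` are hypothetical names. Since Ruby cannot step through an arbitrary regexp from the outside, the sketch assumes the caller passes the pattern as a list of sequential pieces and reports the longest prefix of those pieces that still matches. MatchData instances are normally created only by the regexp engine, so on failure it returns a plain Struct rather than a MatchData subclass.

```ruby
# Hypothetical failure report; not part of Ruby's standard library.
NoMatchData = Struct.new(:longest_match, :expected, :found, :position) do
  def to_s
    "No match, longest match #{longest_match.inspect}, " \
      "expecting #{expected.inspect} but found #{found.inspect} at position #{position}"
  end
end

# Hypothetical option-3 method. Assumption: each piece is a self-contained
# regexp fragment (no group split across pieces), joined in order.
def try_match(pieces, string)
  full = string.match(Regexp.new(pieces.join))
  return full if full # normal MatchData on success

  longest = nil   # MatchData of the longest matching prefix of pieces
  failed_at = 0   # index of the first piece that could not be added
  pieces.each_index do |i|
    m = string.match(Regexp.new(pieces[0..i].join))
    break unless m
    longest = m
    failed_at = i + 1
  end

  # Each prefix is searched independently, so positions are approximate.
  pos = longest ? longest.end(0) : 0 # character offset, not byte offset
  NoMatchData.new(longest ? longest[0] : "", pieces[failed_at], string[pos], pos)
end
```

With the example from the Problem section, `puts try_match(["dig", "(by|raph)"], "digger")` prints `No match, longest match "dig", expecting "(by|raph)" but found "g" at position 3`, while `try_match(["dig", "(by|raph)"], "digby")` returns an ordinary MatchData. The cost is one extra search per piece, paid only when the match fails.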

Analysis

Option 1 would probably be too slow, and option 2 would break too much existing code.

I would favour option 3 or 4.

Implementation

Comments

A failed match is not an error, so I'm not sure we need to report detailed information about why it failed. Besides, implementing this is harder than you might expect: in a multibyte environment, for example, even computing "position 3" is much more difficult.
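To illustrate the multibyte point, a small snippet for Ruby 1.9 or later (newer than this discussion). MatchData#begin reports offsets in characters, but the same position counted in bytes differs as soon as a multibyte character precedes it. A failure report would have to pick one unit and convert correctly. The example string is arbitrary.

```ruby
s = "día digger" # "í" is one character but two bytes in UTF-8
m = s.match(/dig/)

p m.begin(0)            # prints 4: offset in characters
p m.pre_match.bytesize  # prints 5: the same offset in bytes
```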

Among the choices, option 1 is not realistic, option 2 is too incompatible, and I hate that option 3 introduces a new global variable.

- matz.


It seems that the new Oniguruma regexp engine might make something like this easier, but it should be a specialized call on a regexp, not a normal match. Perhaps the RCR author should ask the Oniguruma author(s) whether something like this is feasible with Oniguruma and whether they would consider adding such a feature. Then a specialized method could be added to Regexp that attempts a match and returns an "error report" if it fails.

-- ntalbott


I have asked the author of Oniguruma whether it is possible; I will wait for his reply.

I think an additional method, regex.try_match(string), that returns a subclass of MatchData would be the best and least intrusive extension.

The other thing I noticed while considering your replies is that the failure information should probably refer to the longest matching string.


Just my 2 cents: I have personally spent many hours debugging complex regular expressions, and some feedback from a failed match saying where and why it failed would have been invaluable. As for the implementation, I agree with matz that adding a global variable is not a good solution. I favor the third option: an alternate method that always returns a MatchData.

-- JamisBuck


Why would option 3 introduce another global variable? IMHO it is the best choice because, unlike option 4, it does not change the behavior of =~: you do not have to look at the regexp's options to figure out what =~ does in a particular case (note that the regexp might be defined somewhere else).

robert


Current voting

Strongly opposed: 0
Opposed: 4
Neutral: 0
In favor: 4
Strongly advocate: 0