Submitted by cyent (Mon Jan 05 17:01:15 UTC 2004)
You got it wrong, but you have no idea how.
Or worse, your user has entered input wrong, and you can't tell him what, just that it is wrong.
Any sensible parser, when choking on input, will tell you 'Syntax Error: Expecting one of "blah,bloo,foo"', so why can't Ruby's Regexp?
For example, if I try to match
"digger" =~ /dig(by|raph)/
expecting to match either "digby" or "digraph", it would be nice if Ruby could tell me: No match, longest match "dig", expecting a character in set [br] but found 'g' at position 3.
Possible ways to provide this:
1. Rewrite a Regexp engine in Ruby.
2. Change Regexp#match to always return a MatchData that holds the info. This would break some existing code.
3. Add a new method Regexp#try_match(string) which would always return a MatchData, even if it didn't match.
4. Add a new modifier 'e': "digger" =~ /dig(by|raph)/e would return false, but set some global $whatever with the MatchData.
I would favour option 3 or 4.
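As a rough illustration of what option 3 could feel like from the caller's side, here is a minimal sketch in plain Ruby. The helper name try_match and the keys of the returned Hash are hypothetical and not part of Ruby. It does not look inside the regexp engine; it only approximates the "longest match" by retrying ever shorter prefixes of the pattern source, which is a heuristic and will mislead on many patterns.

```ruby
# Hypothetical helper approximating the proposed Regexp#try_match.
# Always returns a Hash, whether or not the match succeeds.
def try_match(regexp, string)
  md = regexp.match(string)
  if md
    return { matched: true, match_data: md, partial: md[0], position: md.end(0) }
  end

  src = regexp.source
  # Heuristic (an assumption of this sketch): shorten the pattern source
  # one character at a time until a valid prefix pattern matches.
  (src.length - 1).downto(1) do |len|
    begin
      prefix = Regexp.new(src[0, len], regexp.options)
    rescue RegexpError
      next # the truncated source is not a valid pattern, e.g. "dig("
    end
    partial = prefix.match(string)
    next unless partial
    # position is a character offset: where the partial match stopped
    return { matched: false, match_data: nil, partial: partial[0], position: partial.end(0) }
  end

  { matched: false, match_data: nil, partial: "", position: 0 }
end

attempt = try_match(/dig(by|raph)/, "digger")
unless attempt[:matched]
  puts "No match, longest match #{attempt[:partial].inspect}, " \
       "failed at position #{attempt[:position]}"
end
```

A real implementation would need the engine itself to report the expected character set ([br] in the example), which this sketch cannot do.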
RCRchive copyright © David Alan Black, 2003-2005.
A failed match is not an error, so I'm not sure whether we need to report detailed information about it. Besides, the implementation is harder than you may expect. In a multibyte environment, even getting "position 3", for example, is much more difficult.
Among the choices, option 1 is not realistic, option 2 is too incompatible, and I hate option 3 because it introduces a new global variable.
- matz.
It seems that the new Oniguruma regexp engine might make something like this more feasible, but it should be a specialized call on a regexp, not a normal match. Perhaps the RCR author should ask the Oniguruma author(s) whether something like this is feasible using Oniguruma, and whether they would consider adding such a feature. Then a specialized method could be added to Regexp that attempts a match and returns an "error report" if it fails.
-- ntalbott
I have asked the author of Oniguruma if it is possible, and I will wait for his reply.
I think an additional method regex.try_match(string) that returned a subclass of the MatchData object would be the best and least disruptive extension.
The other thing I noted whilst contemplating your replies is that the failure information should probably apply to the longest matching string.
Just my 2 cents: I know that I personally have spent many hours debugging complex regular expressions, and some return value from a failed match that said where and why it failed would have been invaluable. As for the implementation, I agree with matz that adding a global variable is not a good solution. I prefer the third option, which was to have an alternate method that would always return a MatchData.
-- JamisBuck
Why would option 3 introduce another global var? IMHO this is the best choice because it has the advantage over option 4 that the behavior of =~ does not change, i.e. you do not have to look at the regexp's options to figure out what =~ does in a particular case (note: the regexp might be defined somewhere else).
robert