String#scan, #gsub, and #sub to provide the MatchData to attached code blocks.
String#scan, #gsub, and #sub yield the string value of the matched regular expression to a provided block, which is of very limited value. Currently, we must rely upon either ugly numeric match variables ($1 - $9, etc.) or a class method (Regexp.last_match):

str = '<span id="1"> <span> ...</span> </span> '
re = /(<(\/?)span> )/i
str.scan(re) # => [["<span> ", ""], ["</span> ", "/"], ["</span> ", "/"]]

matches = []
str.scan(re) do
  matches << Regexp.last_match
end

matches.each do |match|
  match.captures.each_with_index do |capture, ii|
    soff, eoff = match.offset(ii + 1)
    puts %Q("#{capture}" #{soff} .. #{eoff})
  end
end
String#scan, #sub, and #gsub yield MatchData objects instead of Strings. I think that this could be achieved while breaking the least amount of code by adding a #to_str implementation to MatchData.
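To make the compatibility argument concrete, here is a minimal sketch in plain Ruby (not the C patch) of what a #to_str on MatchData buys. It assumes MatchData has no #to_str of its own, and it reopens a core class purely for illustration:

```ruby
# Sketch only: add the implicit-conversion hook this RCR proposes,
# unless the running Ruby already provides one.
unless MatchData.method_defined?(:to_str)
  class MatchData
    def to_str
      to_s
    end
  end
end

md = /b+/.match("aabbcc")

# String#+ asks its argument for #to_str, so the MatchData can stand in
# for the matched String...
puts "matched: " + md   # prints "matched: bb"

# ...while the extra match information stays available.
puts md.begin(0)        # prints 2
```

Code that only concatenates or interpolates the yielded value would keep working under this assumption; code that calls String-only methods on it (such as #upcase) would still break.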
--- re.c.old 2004-08-22 00:24:09 Eastern Daylight Time
+++ re.c 2004-08-22 00:18:50 Eastern Daylight Time
@@ -2320,6 +2320,7 @@
rb_define_method(rb_cMatch, "pre_match", rb_reg_match_pre, 0);
rb_define_method(rb_cMatch, "post_match", rb_reg_match_post, 0);
rb_define_method(rb_cMatch, "to_s", match_to_s, 0);
+ rb_define_method(rb_cMatch, "to_str", match_to_s, 0);
rb_define_method(rb_cMatch, "inspect", rb_any_to_s, 0); /* in object.c */
rb_define_method(rb_cMatch, "string", match_string, 0);
}
--- string.c.old 2004-08-22 00:24:10 Eastern Daylight Time
+++ string.c 2004-08-22 00:20:35 Eastern Daylight Time
@@ -1928,7 +1928,7 @@
if (iter) {
rb_match_busy(match);
- repl = rb_obj_as_string(rb_yield(rb_reg_nth_match(0, match)));
+ repl = rb_obj_as_string(rb_yield(match));
rb_backref_set(match);
}
else {
@@ -2043,7 +2043,7 @@
regs = RMATCH(match)-> regs;
if (iter) {
rb_match_busy(match);
- val = rb_obj_as_string(rb_yield(rb_reg_nth_match(0, match)));
+ val = rb_obj_as_string(rb_yield(match));
rb_backref_set(match);
}
else {
@@ -4164,15 +4164,7 @@
else {
*start = END(0);
}
- if (regs-> num_regs == 1) {
- return rb_reg_nth_match(0, match);
- }
- result = rb_ary_new2(regs-> num_regs);
- for (i=1; i < regs-> num_regs; i++) {
- rb_ary_push(result, rb_reg_nth_match(i, match));
- }
-
- return result;
+ return match;
}
return Qnil;
}
I'm not 100% sure that this is right, and I haven't tested it. The equivalent Ruby code would be (note: this code appears to work, but it does cause problems with irb):
class MatchData
  def to_str
    self.to_s
  end
end

class String
  alias_method :old_scan, :scan
  alias_method :old_gsub!, :gsub!
  alias_method :old_sub!, :sub!

  def scan(pattern)
    if block_given?
      old_scan(pattern) { yield Regexp.last_match }
    else
      old_scan(pattern)
    end
  end

  def gsub(pattern, repl = nil, &block)
    s = self.dup
    s.gsub!(pattern, repl, &block)
    s
  end

  def gsub!(pattern, repl = nil)
    if block_given? and repl.nil?
      old_gsub!(pattern) { yield Regexp.last_match }
    elsif repl.nil?
      old_gsub!(pattern)
    else
      old_gsub!(pattern, repl)
    end
  end

  def sub(pattern, repl = nil, &block)
    s = self.dup
    s.sub!(pattern, repl, &block)
    s
  end

  def sub!(pattern, repl = nil)
    if block_given? and repl.nil?
      old_sub!(pattern) { yield Regexp.last_match }
    elsif repl.nil?
      old_sub!(pattern)
    else
      old_sub!(pattern, repl)
    end
  end
end
If accepted, this RCR would do three good things, IMHO... it would:
As for the Regexp#gsub idea, I'm not convinced that it makes sense to have these methods, which are basically identical in function, attached to two different objects. It would make about as much sense to me as offering String#join(array), with it being a near parallel to Array#join(string). Then there's the lack of possible bang methods, which I would miss. --Mark Hubbart
Agreed. It is also 100% incompatible on #scan with groups in the regexp (e.g., "foobar".scan(/(..)(.)/) will yield [["fo", "o"], ["ba", "r"]]). This is the argument for Regexp#scan instead of modifying String#scan. However, this is something that I believe should be changed. An alternative is to yield both the normal values and the match, but that itself will be incompatible with #scan and most current uses of #gsub and #sub that use the match value.
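For reference, the current #scan behaviour being discussed can be checked with a short script. The two helper methods below are hypothetical names introduced only for this illustration:

```ruby
# Hypothetical helper: collect whatever String#scan yields to its block.
def scan_yields(str, pattern)
  seen = []
  str.scan(pattern) { |item| seen << item }
  seen
end

# Hypothetical helper: collect whole-match offsets via Regexp.last_match,
# which is the only way to reach the MatchData from inside the block today.
def scan_offsets(str, pattern)
  offsets = []
  str.scan(pattern) { offsets << Regexp.last_match.offset(0) }
  offsets
end

# Without groups, the block receives each matched String.
p scan_yields("foobar", /../)        # => ["fo", "ob", "ar"]

# With groups, the block receives an Array of captures instead.
p scan_yields("foobar", /(..)(.)/)   # => [["fo", "o"], ["ba", "r"]]

# The MatchData still knows where each match started and ended.
p scan_offsets("foobar", /(..)(.)/)  # => [[0, 3], [3, 6]]
```

Any block written against the Array-of-captures form is the code that would break if #scan yielded a MatchData instead.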
Yet another alternative is to add an optional parameter in all cases. String#gsub currently expects a regexp and a replace pattern OR a regexp and a block. #gsub could be modified such that when it gets a regexp, a "boolean", and a block, it yields something different. This could be, for example:
String#gsub(pattern, true) { |match_data| ... }
String#gsub(pattern) { |string| ... }
I would actually rather see the opposite form, if we do this:
String#gsub(pattern, true) { |string| ... }
String#gsub(pattern) { |match_data| ... }
This would encourage the use of the new form. By doing it this way, a transition period can be introduced (e.g., in 1.8.3 it may warn that the current replace will be changed to yield a match_data instead of a string; in 1.9 it yields a match_data instead of a string).
I have *not* analysed code out there that uses #gsub/#scan/#sub, but I think that this is an ideal change.
Adding a flag to change method behavior is usually regarded as bad OO practice. The usual solution is to have two different methods, one for each behavior. That increases modularity, simplifies the implementation and improves performance.
I'd rather have String#gsub_md, String#gsub_md! and String#scan_md than the flag although I have to admit that those method names are ugly.
The idea of having #gsub with the flag (as opposed to the #gsubm form, the name I had chosen instead of #gsub_md, which I rejected in writing my response to Nobu) is that IMO we should encourage the use of the new form with MatchData objects, not the other form. By using #gsubm, we discourage the use of the new form in favour of the old form. The only way that I think that this would really work is to have #gsub yield MatchData and #gsubs yield Strings, if we take that approach. --AustinZiegler
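For anyone who wants to try the separate-method variant, a rough sketch in plain Ruby follows. The name #gsub_md is taken from the suggestion above; the implementation is only an assumption about how such a method might behave, layered on the existing #gsub:

```ruby
class String
  # Hypothetical method: like #gsub with a block, but the block receives
  # the MatchData rather than the matched String.
  def gsub_md(pattern)
    # #gsub sets the last match for this method's frame before each yield,
    # so Regexp.last_match here is the match currently being replaced.
    gsub(pattern) { yield Regexp.last_match }
  end
end

# Captures are available by index, with no need for $1..$9.
puts "2004-08-22".gsub_md(/(\d+)-(\d+)-(\d+)/) { |m| "#{m[3]}/#{m[2]}/#{m[1]}" }
# prints "22/08/2004"

# Offsets are available too.
puts "hello world".gsub_md(/o/) { |m| m.begin(0).to_s }
# prints "hell4 w7rld"
```

A #gsubs that keeps yielding Strings would be the mirror image of this, with plain #gsub taking over the MatchData behaviour.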
String#gsub(pattern) { |string| ... }
String#gsub(pattern) { |string, match_data| ... }
-- Paul Brannan
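The two-argument form can be approximated today with a wrapper. This is a sketch under the assumption that the string comes first and the MatchData second; #gsub_with_match is a hypothetical name used only so that the real #gsub stays untouched:

```ruby
class String
  # Hypothetical method: yields the matched String and then the MatchData.
  def gsub_with_match(pattern)
    gsub(pattern) { |text| yield text, Regexp.last_match }
  end
end

# A one-parameter block ignores the extra argument, so existing-style
# blocks keep working.
puts "a1b22".gsub_with_match(/\d+/) { |s| "<#{s}>" }
# prints "a<1>b<22>"

# A two-parameter block gets the MatchData as well.
puts "a1b22".gsub_with_match(/\d+/) { |s, md| "#{s}@#{md.begin(0)}" }
# prints "a1@1b22@3"
```

This works for #gsub and #sub because an ordinary block drops arguments it does not declare. It does not carry over cleanly to #scan with groups, where the block already receives an Array of captures.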
String#gsub(pattern) { ... }
This, when I am not using the MatchData, is my common case for String#gsub.