ruby picture

RCR 308: Enumerable methods to select iterator other than 'each'

Submitted by bcandler (Wed Jul 13 09:39:01 UTC 2005)

Abstract

Add a new parameter to each of the Enumerable methods to choose the iterator method to run (defaults to :each)

Problem

1. Lots of people want #map_with_index. This is a subset of the RCR proposed here: map(:each_with_index) { ... }

2. Using #map and similar with Hash#each_value, Hash#each_pair, String#each_byte etc. is awkward. Currently it can be done by creating an intermediate wrapper object, which is a standard extension but is not documented by 'ri'

    require 'enumerator'
    obj.to_enum(:each_with_index).map { ... }

The intention of this code is difficult to read until you are familiar with this particular way of working, and IMO it separates the 'map' from the 'obj' excessively. It seems to be a complicated way of doing a simple job.

Proposal

Each of the Enumerable methods is extended to take an additional, optional parameter giving the symbolic name of the iterator method to call, defaulting to 'each'.

    def map(meth = :each)
    def inject(initial, meth = :each)
    def find(ifnone, meth = :each)
    ... etc

There are only two awkward cases.

- Enumerable#to_set (in set.rb): the current syntax allows you to pass in (klass,*args), which is actually not very useful. But it can be kept backwards-compatible by:

    meth = :each
    if klass.is_a?(Symbol)
      meth = klass
      args = args.dup
      klass = args.shift || Set
    end

- Enumerable#zip takes a list of arrays. Similarly, it can check if the first (or last) arg is a Symbol, and use that as the method name to call.

Analysis

This solution has virtually zero overhead - just the parsing of an extra argument.

In my opinion, it reads well:

   obj.collect(:each_with_index) { |x,i| ... }

"For this object, collect the results of each_with_index calling the following block"

It gives a straightforward answer to the question "how do I use {find/collect etc} but iterating over :each_byte instead of :each" - because it would be documented together with {find/collect etc} as one of its arguments.

It's a backwards-compatible change with minor implementation impact.

All objects which have additional iterator methods (like Hash#each_value, IO#each_byte, plus any user-written classes) benefit from this feature without having to be changed themselves.

On the minus side: there *is* another way to achieve this (using to_enum). However, Ruby embraces the "more than one way to do it" philosophy :-)


Aside: for the case of each_with_index, the concept of an 'index' isn't necessarily the same for all enumerable objects. For a hash, the index might be the key; for a File, the index might be the line number (starting from 1). This might be an argument for moving each_with_index into the underlying object, rather than Enumerable (although Enumerable could provide a default implementation giving an index starting from 0)

Might also be worth considering making Hash#each_with_index yield |value,key| for each element; then there would be a common way of iterating over both an Array and a Hash, which doesn't exist at present - but that would be an incompatible change.

Implementation

Trivial, but here are some samples:

    module Enumerable
      def nmap(meth = :each)
        res = []
        send(meth) { |*args| res << yield(*args) }
        res
      end
      def ninject(sum, meth = :each)
        send(meth) { |*args| sum = yield(sum,*args) }
        sum
      end
    end
    a = [10,20,30,40]
    p a.nmap { |x| x + 1 }
    p a.nmap(:each_with_index) { |x,i| x + i }
    p a.ninject(0,:each_with_index) { |sum,x,i| sum + x*i }           
    h = {1=>"one", 2=>"two", 3=>"three" }
    p h.nmap(:each_key) { |x| x + x }
    p h.nmap(:each_value) { |x| x + x }

I demonstrate with 'nmap' rather than 'map' because overriding Enumerable#map does not override Array#map. The actual implementation would change the C code for Enumerable.

ruby picture
Comments Current voting
Don't like it at all. It makes the code much more opaque (less expressive) and therefore less useful. I don't care if there's a transformation involved, but I do care if the code looks awful. --Austin


More opaque/less expressive as compared to the "to_enum" pattern, or compared to something else?

e.g. what's your preferred way of expressing "abc".collect(:each_byte) ?

--Brian.


I don't have a preferred way; aside from perhaps two uses of mwi, I haven't needed alternate iterators at this point.

I'm not sure that to_enum is good, but algorithm-changing parameters to me are a sure sign of code smell (e.g., this proposal, a few other proposals that have floated around for using booleans for algorithm changes, etc.). I've used them before, but I don't like using them.

If I had need of this, I could offer a better solution, maybe, but as I don't have need of it, I wouldn't trust any solution I'd offer right now. -Austin


If we were talking about map(true) versus map(false), I'd certainly agree with you. Same for map() versus map(:with_index). If the method says "if flag do X, else do Y" then they should be two different methods. But with map(:each), map(:each_with_index), map(:each_byte) etc you're doing the same in each case; call method M and collect the yielded values. Then, :each is the default just to save typing in the common case.

I wonder if those in favour of map_with_index() would also want to see find_with_index, find_all_with_index, sort_with_index etc? And then map_with_[each]_pair, map_with_[each]_value, map_with_[each]_byte? That seems pretty smelly too. --Brian.


I wonder if those in favour of map_with_index() would also want to
see find_with_index, find_all_with_index, sort_with_index etc? And then map_with_[each]_pair, map_with_[each]_value, map_with_[each]_byte?

As I've said on ruby-talk a few times, in response to the same question: no :-) I'm not sure where the whole "If we do this, then we would have to do this" thing comes from. Obviously each_with_index has not caused a cascade of *_with_index methods. I think the only question about adding map_with_index is whether or not map_with_index (not anything_else_with_index) would be a good addition.

Meanwhile, as to the RCR itself: Among the ways on offer of doing two iterators, it's one of the cleaner looking. (I'm a fan of the power of Enumerator, but not the look of to_enum(:method). And the proposed magical return of an enumerator in the absence of blocks is actually a bit *too* clean-looking (i.e., devoid of visual cues as to what it does).) It still leaves me with the feeling, though, that this would be palpably in the "added to the language as an afterthought" category. The whole business of passing method names around as symbols is, to me, at a different level of integratedness (or some such thing) than, say, method chaining, where the whole thing arises from the underlying message-sending principle.

I don't have any great other plan, though -- still mulling the whole thing over, and not voting yet.

David Black


I think you paritally mix apples and oranges here. On one hand you have the with_index (really a counter) and the other you have key, value or pair.

The later first. Enumerable has to hold things in common with the classes it appends. So likewise the eachness has to have a commonality. Right now Ruby barely manages that. I say barely b/c Ruby actually behaves incongruently depending on whether you use a Hash (or hash-like thing) or an Array (or array like thing). This is appraent in the underlying code. The difference:

  ary.each { |value| ... }
  hsh.each { |key, value| ... }

How it would be better is that an array index corresponds to a hash key, then:

  ary.each { |value| ... }
  hsh.each { |value| ... }

And further

  ary.each_pair { |index,value| ... }
  hsh.each_pair { |key,value| ... }
  ary.each_index { |index| ... }
  hsh.each_key { |key| ... }

Of course Ary.each_index is mostly usless but nonetheless it is just an alias of each_key (or visversa) anyway, we could just get rid of one or the other. Also, not to forget associative arrays either:

  ary.each_assoc { |a0, a1| }

Though specialized for an array, we shadow it in a hash with another alias for each_key.

So that's the first thing I think that would really improve this --and improve polymorhism. Still that does lead to a lot of potential methods: map_index, map_pair, map_assoc, select_index, select_pair ... (not sure they all make sense but still) (And It doesn't help that we have another term for map, i.e. collect).

A better way to handle would be to be able to indicate the desired each in the arguments. Perhaps like:

  ary.each { |v| ... }      # value only
  ary.each { |i,v| ... }    # index and value
  ary.each { |i,_| ... }    # just index (or key)
  ary.each { |(i,v)| ... }  # association

The trick is making this general and programable, so that an #each definition can respond appropriately.

This leaves the with_index. Mind you we've discussed on Ruby talk how that's a misnomer. It should be called 'with_counter' instead, b/c that's really all it is --it _counts_ the number of iterations and should not be confused with an array index (or hash key). While, it may not seem the most elegant solution, the most useful fix is to have a special iteration state object that is always available in a block. It could be a subclass of Fixnum, but have other methods like #first? and #last?. The trick is what it is called/how to name it in the syntax. For example, a possiblity if it can be parsed:

  hsh.each { c |i,v| p c }  #=> 0 1 2 3 4 ...

Finally and most importantly, arguments on the Enumerable methods should be left open for user defined information to pass through. I have an ParameterizedEnumerable module that does just that. I offerd it as an RCR to Enumerable itself (it's backward compat.), but Matz felt at the time that it overcomplexified enumerable and suggested I use Enumerator. But truthfully Enumerator's way too complex for me :-)

T.


Enumerator is simple when you realise that all it does is map calls to ":each" to ":some_other_method"

    require 'enumerator'
    a = Hash.new( ... )
    b = a.to_enum(:each_value)

b is just a, except that "b.each" does the same as "a.each_value". Period.

Using to_enum, all those Enumerable methods like collect() and find() and inject(), which have hard-coded logic which say they call "each" on the object to the left, gain the flexibility to call another method of your choice. I just think they should have had that flexibility in the first place. --BrianCandler


I like the concept, but not this implementation, it needs more work. Opposed. I have a few ideas on what I'd like to see the new behavior to be.

Supply the index regardless, and simply let people define and use it if they wish. "foo".each_byte {|char,index| or |char|}

For types without "natural" breaking points (arrays have natural divisions), make .each take a split argument. (example follows)

"This,is a test,string!".map! /,/ {|l,i| l = "Line #{i}: #{l}\n"} => "Line 1: This\nLine 2: is a ..."

-wn


Strongly opposed 3
Opposed 2
Neutral 0
In favor 2
Strongly advocate 0
ruby picture
If you have registered at RCRchive, you may now sign in below. If you have not registered, you may sign up for a username and password. Registering enables you to submit new RCRs, and vote and leave comments on existing RCRs.
Your username:
Your password:

ruby picture

Powered by .