ruby picture

RCR 335: Sather-like iterators

Submitted by orenbenkiki (Wed May 03 18:24:11 UTC 2006)

Abstract

Add support for Sather-like iterators in Ruby, allowing for easily iterating on more than one sequence at once and avoiding the need for threads where simple co-routines are sufficient.

Problem

It is often useful to iterate on more than one sequence at once. The current Ruby iteration sytnax does not allow to do this, short of writing stuff like:

  seq1.each_index do |i|
    v1 = seq1[i]
    v2 = seq2[i]
    ...
  end

This is awakward to write and does not generalize well. Sather is the first language (that I know of) to solve this problem. See for example.

This is one area where Python has the better technical solution (since each "iterator" has a "next" method and raising a specific exception aborts the loop), but being Python the syntax leaves much to be desired, and there is no support for "hot" arguments.

Proposal

The current implementation of Ruby iterators (using yield) actually lends itself to generalization into Sather-like iterators without changing the source code. For example:

  def ruby_iterator(args)
    loop # of any type
      yield stuff
    end
  end

Old style iteration:

  ruby_iterator { |stuff| ... single sequence ... }

Sather-style iteration:

  loop
    stuff = <<ruby_iterator
    other_stuff = <<other_iterator
    ... two sequences at once ...
  end

The above uses << to indicate "fetch the next value from an iterator". Sather uses "!" which is already used by Ruby for another purpose. The exact syntax does not matter.

Returning from either iterator methods would terminate the loop (just like in Sather).

The above does not allow for Sather "hot" arguments and is a relatively minor enhancement to Ruby's syntax. While it is very useful already, supporting "hot" arguments brings this to a whole new level. A common idiom in many programs is producer/consumer, where the natural way to write each one is by functional decomposition. In a language without Sather iterators and "hot" arguments, this requires an explicit additional thread (with all the nastiness this requires), or writing either the producer or consumer as a callback (with a different sort of nastiness). Some functional languages avoid this trap by using lazy evaluation, which isn't natural for Ruby due to side effects. Using "hot" arguments, the problem disappears:

  def produce
    if condition
      yield "A"
      produce_a
    else
      yield "B"
      produce_b
    end
  end
  def produce_a
    seq.each { |v| yield v }
    yield "."
  end
  def produce_b
    ... etc. ...
  end
  def consume(&stuff)
    if <<stuff == "A"
      consume_a(&stuff) # With & - pass it on
    else
      ... etc. ...
    end
  end
  def consume_a(&stuff)
    until <<stuff == "."
      p stuff # Note no "<<" - uses current value
    end
  end

And then tying them together:

  loop
    consume(&produce)
  end

The Sather compiler has demonstrated that it is possible to compile the above example into single-threaded code that works as expected without the need to muck around with threads on the one hand, and without forcing the developer to code either the producer or the consumer as a callback on the other hand.

Analysis

The need for iterating on more than one sequence is common enough, and allowing invoking iterators "from the outside" without "hot" arguments is the cleanest way of supporting that.

The need for complex consumer/producer is less obvious, but supporting it has a deep impact on program structure. In effect, it gives the programmer many of the benefits of threading and/or lazy evaluation functional languages, while maintaining Ruby's simple clean syntax and avoiding the issues of locking, non-deterministic execution etc.

Implementation

I really wish I could provide one here, but I'm afraid it is a bit too much for me. Sorry...

Support for Sather iterators requires some form of multiple stacks. This complicates the internal state of the interpreter (or virtual machine) in a way similar to that required for supporting multiple threads. So it might make sense to implement this feature as a stepping stone along the way to deep support for multiple threads within Ruby.

Even with deep built-in threading support, Sather iterators still have merit in that they don't have the overhead of an actual thread. Execution is completely deterministic. Since only one iterator is executing at any given time, there are no race conditions, dead/locking issues etc.

ruby picture
Comments Current voting
I admit Sather intatator has superior functionality, but it's more complex in both behavior and implementation. I don't think it's worth it.

matz.


I agree there are implementation complexity issues, but I think that if Ruby ever wants to take on multi-threaded processing, this would be inevitable. Given the trend to move to multi-core CPUs, such a move would be necessary in the next few years. Sather-like iterators would make a wonderful first step in that direction, without actually having to deal with the full complexity of multiple threads (for a while anyway).


I am not sure how Sather iterator relates to mult-core CPUs. Can you elaborate?

Concerning multi-core, I have a vague idea of Enumerable with concurrent iteration. FYI, see <>.

matz.


Here is a crude implementation, it demonstrates that implementation isn't that complex. I'm also uploading it as a gem in RubyForge.

  #
  # si.rb - Sather-like iterators for Ruby
  #
  # Author: Oren Ben-Kiki 2006
  #
  # == Overview
  #
  # This file extends Ruby with very basic support for sather-like iterators.
  # Simply use "si_loop" instead of "loop" and prefix each iterator call with
  # ".si".
  #
  # === Example
  #
  #   a = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
  #   h = { :A => :a, :B => :b, :C => :c, :D => :d }
  #   si_loop do
  #     puts "---"
  #     v1 = a.si(:foo1).each
  #     v2 = a.si(:foo1).each
  #     p [ v1, v2 ]
  #     p a.si.each
  #     p h.si.each
  #   end
  #
  # Produces:
  #
  #   ---
  #   [0, 1]
  #   0
  #   [:D, :d]
  #   ---
  #   [2, 3]
  #   1
  #   [:A, :a]
  #   ---
  #   [4, 5]
  #   2
  #   [:B, :b]
  #   ---
  #   [6, 7]
  #   3
  #   [:C, :c]
  #   ---
  #   [8, 9]
  #   4
  # Context for Sather iteration loops.
  #
  # This class is not meant to be used directly. See the documentation for si.rb
  # for an overview.
  class SiContext
    # Push a new context on the stack when a new loop begins.
    def SiContext.push
      @@maps ||= []
      @@maps.push({})
    end
    # Pop the current context from the stack when a loop ends.
    def SiContext.pop
      @@maps.pop
    end
    # Access an SiInterator in the current scope. The +id+ uniquely identifies
    # the iterator instance, and +object+ is the one that has the iterator
    # method that will be wrapped.
    def SiContext.iterator(id, object)
      @@maps.last[id] ||= SiIterator.new(object)
    end
  end
  # Wrap a Ruby iterator for a Sather iteration loop.
  #
  # This class is not meant to be used directly. Instances are created by calls
  # to SiContext.iterator. See the documentation for si.rb for an overview.
  class SiIterator
    # Create a new iterator wrapper.
    #
    # +object+ is the object that provides the Ruby iterator method.
    def initialize(object)
      @object = object
      @is_first_fetch = true
      @to_call_out = true
      @to_call_in = false
    end
    # Capture a call to a ruby iterator.
    #
    # It is implicitly assumed that the missing method is actually the call
    # to the Ruby iterator meant for the original object. As usual, +symbol+ is
    # the missing method name, and +args+ are the arguments passed to it.
    def method_missing(symbol, *args)
      si_fetch { |b| @object.send(symbol, *args, &b) }
    end
    # Fetch the next value from a Ruby iterator.
    def si_fetch
      callcc { |@out_cont| }
      if @is_first_fetch
        @is_first_fetch = false
        yield Proc.new { |value| si_capture(value) }
        si_break
      elsif @to_call_in
        @to_call_in = false
        @in_cont.call
      else
        @to_call_in = true
      end
      @value
    end
    # Capture the value yielded by a Ruby iterator.
    def si_capture(v)
      @value = v
      callcc { |@in_cont| }
      if @to_call_out
        @to_call_out = false
        @out_cont.call
      else
        @to_call_out = true
      end
    end
  end
  module Kernel
    # Return a Sather iterator wrapper for an iterator method. The optional +id+
    # parameter identifies the iterator. If it is not provided, the source file
    # and line number of the call serve as a unique id. This means that two calls
    # in the same line for the same object will be treated as a single iterator
    # unless explicitly given explicit +id+.
    def si(id = caller[0])
      SiContext.iterator(id, self)
    end
    # Begin a Sather iterator loop.
    def si_loop(&block)
      SiContext.push
      catch(:si_break) { loop(&block) }
    ensure
      SiContext.pop
    end
    # Break a Sather iteration loop. A normal break statement will also work, but
    # this one will work even if called from a sub-function, which may be useful.
    def si_break
      throw :si_break
    end
  end


On Sather iterators and multi-threading - my point was that creating a producer/consumer pattern in a language requires either seperate threads or co-routines support. This has little to do with employing multiple cores, although obviously using a seperate thread will make use of multi-core CPUs.

Multi-threading in an imperative language is *hard* - it raises a huge amount of issues about locking and indeterminism. Using co-routines, on the other hand, is *easy*. There are no locking and safery issues and the code is deterministic.

I view Ruby as an inherently single-threaded language. I happen to believe that concurrent languages should be based on side-effect-free code, write-once variables and encapsulating each thread in a monitor object. That's just me, though; The Java people would make a strong case that it is possible to do concurrency pretty well in an imperative language using a very tightly specified memory model and explicit locks. They at least have the advantage that the language and libraries were designed for this from the start. I find it hard to believe it would be possible to add concurrency to Ruby "well". I'd be more than happy to be proven wrong :-)

At any rate, a single-threaded Ruby doesn't mean that it should be barred from using producer/consumer patterns. With a bit of creative use of continuations (as demonstrated by the code I posted), it is possible to do so without any extensions to the run time system. Continuations are much too low level for every day use, however - they are the ultimate form of goto statement. Sather happens to contain the best thought-out framework for implementing producer/consumer/coroutines in a structured, practical manner.

My sample code above demonstrates that this is doable in Ruby as well. There are of course details to work out. For example, my code handles nested loops, but not as nicely as Sather does. Hot variables aren't quite supported. My detection of seperate iterator instances isn't as robust as it should be (it is based on the line number or an explicit identifier). Finally, the syntax sucks (using "si_loop" and prefixing the method call with ".si" are hacks).

If this were integrated into the Ruby language itself, the syntax could be cleaned up, the implementation could be more robust and probably more efficient, and Ruby programmers would gain a powerful new tool - support for producer/consumer/coroutine patterns, not to mention trivial iteration on several collections at once.

Oren.


Released as a Ruby Gem calles SI (version 0.0.1), accessible in

Share & Enjoy,

    Oren.


Strongly opposed 0
Opposed 1
Neutral 0
In favor 0
Strongly advocate 1
ruby picture
If you have registered at RCRchive, you may now sign in below. If you have not registered, you may sign up for a username and password. Registering enables you to submit new RCRs, and vote and leave comments on existing RCRs.
Your username:
Your password:

ruby picture

Powered by .