Single global constant collections (#11)

Submitted by: Robert Klemme

This revision by: Robert Klemme

Date: Thu May 17 07:49:23 -0400 2007

ABSTRACT

I propose adding a constant EMPTY to each collection class that holds a frozen empty instance. This could improve efficiency and, at times, make code more readable.

The concept can even be extended to String and IO. The IO instance would discard everything written to it and always report EOF on reading; #close would be a no-op.

PROBLEM

Some code needs an empty collection. While [] is easy to type, it creates a new instance every time, which can be an issue if it is evaluated often. Every user could define such a constant themselves, but a single default instance is certainly more efficient.
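
The allocation behaviour described above can be checked directly. The following is a minimal sketch; EMPTY_LIST is a hypothetical name used only for illustration and stands in for the proposed Array::EMPTY.

```ruby
# Each evaluation of the literal [] allocates a distinct Array object.
a = []
b = []
puts a.equal?(b)   # false: two separate objects

# A frozen constant is allocated once and shared by every reader.
# EMPTY_LIST is a hypothetical name, not part of core Ruby.
EMPTY_LIST = [].freeze
c = EMPTY_LIST
d = EMPTY_LIST
puts c.equal?(d)   # true: the same object
puts c.frozen?     # true
```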

Example code that would benefit:

class Foo
  def add(x)
    (@collected ||= []) << x
    self
  end

  def collected
    @collected || Array::EMPTY
  end
end

# do this with thousands of instances of Foo
f = Foo.new
# nothing added
f.collected.each {|col| p col}

PROPOSAL

Add Array::EMPTY, Hash::EMPTY etc. as frozen empty containers.

ANALYSIS

Pro

- simple mechanism that can be easily implemented and documented

- Java programmers will immediately feel at home, at least if they know: http://java.sun.com/j2se/1.5.0/docs/api/java/util/Collections.html#field_summary

- minor overhead (just one instance per collection class)

Con

- it has to be done and documented

- not many pieces of code might actually benefit

- the performance overhead of allocating fresh empty collections is not (yet) proven
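
Whether the saving is measurable can be checked with the standard Benchmark library. This is only a rough sketch: SHARED_EMPTY and N are hypothetical names chosen for the example, and the numbers will vary with Ruby version and machine, so no result is claimed here.

```ruby
require 'benchmark'

# SHARED_EMPTY is a hypothetical stand-in for the proposed Array::EMPTY.
SHARED_EMPTY = [].freeze
N = 1_000_000

Benchmark.bm(16) do |bm|
  # Allocates a new Array on every iteration.
  bm.report('fresh []')        { N.times { [] } }
  # Only reads the one shared, frozen instance.
  bm.report('shared constant') { N.times { SHARED_EMPTY } }
end
```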

IMPLEMENTATION

require 'set'  # needed on Ruby versions that do not load Set automatically

class Array
  EMPTY = [].freeze
end

class Hash
  EMPTY = {}.freeze
end

class Set
  EMPTY = Set.new.freeze
end

# extension
class String
  EMPTY = ''.freeze
end

class IO
  EMPTY = ... # custom IO object
end
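
The custom IO object is left open above. One possible shape is sketched below, assuming a duck-typed class rather than a real IO subclass. NullIO and its method selection are hypothetical and not part of the proposal; the sketch only mirrors the behaviour described in the abstract (writes are discarded, reads report EOF, #close does nothing).

```ruby
# NullIO is a hypothetical name; it imitates a small part of the IO
# interface and is not a subclass of IO.
class NullIO
  # Discard the data, but report the byte count like IO#write does.
  def write(*chunks)
    chunks.sum { |chunk| chunk.to_s.bytesize }
  end

  # Discard the object and return self so calls can be chained.
  def <<(_obj)
    self
  end

  # Reading behaves like a stream that is already at end of file:
  # "" when no length (or zero) is given, nil otherwise.
  def read(length = nil, _outbuf = nil)
    length.nil? || length.zero? ? '' : nil
  end

  def gets(*)
    nil
  end

  def eof?
    true
  end

  # Nothing to release.
  def close
    nil
  end
end

null = NullIO.new.freeze
null.write('discarded')   # returns 9, nothing is stored
null << 'a' << 'b'        # chaining works, nothing is stored
null.read                 # returns ""
null.gets                 # returns nil
null.close                # no-op
```

Freezing the instance is safe here because none of the methods touch instance state.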

Comments

from Joel VanderWerf, Thu May 17 10:07:14 -0400 2007


Are all those empty arrays really a problem?

It bothers me a little that the value returned by #collected behaves 
differently depending on whether or not #add has been called.

#-----------------------------
class Array
   EMPTY = [].freeze
end

class Foo
   def add(x)
     (@collected ||= []) << x
     self
   end

   def collected
     @collected || Array::EMPTY
   end
end

foo = Foo.new

begin
   foo.collected << 1
rescue TypeError, RuntimeError => ex  # TypeError in Ruby 1.8; later versions raise a RuntimeError (FrozenError)
   puts ex.message  # ==> can't modify frozen array
end

foo.add(2)

foo.collected << 3

p foo.collected  # ==> [2, 3]
#-----------------------------

One solution (not changing the RCR, but just the example) is to dup and 
freeze the return value of #collected, but that probably has more 
overhead (even with COW) than having lots of empty arrays:

#-----------------------------
class Foo
   def add(x)
     (@collected ||= []) << x
     self
   end

   def collected
     @collected ? @collected.dup.freeze : Array::EMPTY
   end
end
#-----------------------------

Why isn't the following a more suitable solution than this RCR? I'm 
guessing you have an example in mind where it isn't...

#-----------------------------
class Foo
   include Enumerable

   def add(x)
     (@collected ||= []) << x
     self
   end

   def each
     @collected && @collected.each {|x| yield x}
   end
end
#-----------------------------


from Gavin Kistner, Fri May 18 00:33:20 -0400 2007

On May 17, 2007, at 8:07 AM, vjoel@path.berkeley.edu wrote:
> It bothers me a little that the value returned by #collected behaves
> differently depending on whether or not #add has been called.

That's what bothers me in the supplied example, and (in my mind)  
calls into question the utility of having a frozen pseudo-singleton.  
Usage would require an assumption that the consumer will never want  
to mutate the value.

Given the few times where this would be useful, and that the  
implementation when it is useful can be accomplished in a few lines  
of pure Ruby, I'm not sure why this change should be considered for  
core.

No offense to the well-respected Robert Klemme. :)


from Robert Klemme, Fri May 18 04:49:12 -0400 2007

2007/5/18, gavin@refinery.com <gavin@refinery.com>:
> On May 17, 2007, at 8:07 AM, vjoel@path.berkeley.edu wrote:
> > It bothers me a little that the value returned by #collected behaves
> > differently depending on whether or not #add has been called.
>
> That's what bothers me in the supplied example, and (in my mind)
> calls into question the utility of having a frozen pseudo-singleton.
> Usage would require an assumption that the consumer will never want
> to mutate the value.

Well, there *are* cases where one allows only internal modification of
a collection (for example, to maintain some class invariant) but
wants to allow clients to iterate and apply other non-mutating
operations. Immutability of the return value would of course have to
be documented.  It's really an optimization helper.

> Given the few times where this would be useful, and that the
> implementation when it is useful can be accomplished in a few lines
> of pure Ruby, I'm not sure why this change should be considered for
> core.

OTOH the change is really minor and has no negative impact on existing
code as far as I can see (even reassigning to the constant will work).

> No offense to the well-respected Robert Klemme. :)

No offense taken from the well-respected Gavin. :-)

robert

from Wolfgang Nádasi-Donner, Fri May 18 11:51:37 -0400 2007

I see that there are no problems with existing code, and that it may 
save space for some constructs, but I don't know if it makes sense for 
class "Hash", because an initially empty "Hash" object "{}" will usually 
be used later on.

from Robert Klemme, Fri May 18 15:36:43 -0400 2007

2007/5/18, wonado@donnerweb.de <wonado@donnerweb.de>:
> I see that there are no problems with existing code, and that it may
> save space for some constructs, but I don't know if it makes sense for
> class "Hash", because an initially empty "Hash" object "{}" will usually
> be used later on.

Well, that could be said of any data structure. The use case I am
targeting is exactly one where an immutable instance is needed, since
clients are expected to access it read-only.

Btw, here's another use case, roughly the reverse of the one I cited
originally: take a class that has a Hash / Array / ... member that is
never modified internally but just read and set via accessor methods.
But this class may have some code that depends on iterating this
instance (for example in inspect or to_s). So you have the choice to
either initialize with nil and test at every use, or
initialize with the constant empty singleton and use it without a test.
Again, this is a performance optimization scenario.

Kind regards

robert
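
The second use case from the comment above (a member that is only read and set via accessors, and iterated without a nil test) might look roughly like this. It is a sketch only: the class name Settings, its options accessor, and the local EMPTY_HASH constant are hypothetical, with EMPTY_HASH standing in for the proposed Hash::EMPTY.

```ruby
# Settings is a hypothetical class used only for illustration.
class Settings
  # Stand-in for the proposed Hash::EMPTY.
  EMPTY_HASH = {}.freeze

  # The member is replaced as a whole via the accessor, never mutated here.
  attr_accessor :options

  def initialize
    # Start with the shared frozen instance instead of nil.
    @options = EMPTY_HASH
  end

  # Iterates without first checking for nil.
  def to_s
    pairs = @options.map { |key, value| "#{key}=#{value}" }
    "Settings(#{pairs.join(', ')})"
  end
end

s = Settings.new
puts s.to_s                   # Settings()
s.options = { verbose: true }
puts s.to_s                   # Settings(verbose=true)
```

Every new Settings object shares the same frozen hash until a caller assigns its own.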



Copyright © 2006, Ruby Power and Light, LLC