
RCR 178: Add Enum type to Ruby

Submitted by Paul Brannan (Mon Dec 29 13:12:58 UTC 2003)

Abstract

To avoid certain types of errors, and to make writing C/C++ extensions that use enums easier, we should add an Enum class to Ruby.

Problem

Currently, many Ruby methods that wrap C/C++ enum types take integers as parameters. This allows code like the following to be written:

  File.open("foo", Fcntl::F_SETFD) # opens a file read/write
  UDPSocket.new(IO::O_RDWR) # create a UDP socket with family AF_INET

Even good unit tests cannot find these bugs, since Fcntl::F_SETFD, IO::O_RDWR, and Socket::AF_INET all have the same value on my platform (2). Ruby should provide the programmer with the tools he needs to avoid these kinds of mistakes.
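The collision is easy to check on a given machine. The snippet below is only an illustration: it uses `File::RDWR`, the constant current Ruby exposes for the `O_RDWR` flag, in place of the `IO::O_RDWR` spelling above, and the printed values depend on the platform.

```ruby
require 'fcntl'
require 'socket'

# Three unrelated constants that wrap C values. The numeric values are
# platform-dependent; they coincide on many systems but need not on yours.
constants = {
  'Fcntl::F_SETFD'  => Fcntl::F_SETFD,
  'File::RDWR'      => File::RDWR,
  'Socket::AF_INET' => Socket::AF_INET
}

constants.each { |name, value| puts "#{name} = #{value}" }

# Nothing distinguishes them by type: every one is a plain Integer, so a
# method that expects one kind of constant will accept any of the others.
puts constants.values.all? { |value| value.is_a?(Integer) }
```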

Proposal

I propose an Enum class that can be used from extensions. It should have the following Ruby interface:

  Enum.new(*symbols)  # creates a new enumerated class in which the given
                      # symbols become unique enumerated values; the
                      # enumerated values are stored as constants in
                      # enumclass::Constants, which is a module that is mixed
                      # into enumclass.

Instances of Enum are themselves classes (called enumerated classes). Enumerated classes should include Enumerable and have the following interface:

  enumclass.each()     # iterates over all the values in the Enum
  enumclass.from_int() # convert an integer to an Enum type

Instances of enumerated classes are enumerated values. Enumerated values should include Comparable and have the following interface:

  enum.name           # return a stringified representation of the value
  enum.to_s           # return enum.name
  enum.value          # convert the enum to its underlying value
  enum.to_i           # convert the value to an integer, if possible
  enum.inspect        # inspect the enum
  enum.hash           # return enum.value.hash
  enum.eql?(x)        # return true if enum has the same value as x and
                      # enum and x are both of the same class
  enum.===(x)         # return true if enum has the same value as x and
                      # enum and x are both of the same class
  enum.<=>(x)         # raise an exception if enum and x are not of the same
                      # class, else return enum.value <=> x.value
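To experiment with the Ruby-level interface described above, a pure-Ruby approximation might look like the following. This is only a sketch under several assumptions, not a reference implementation: the helper names `Enum::Value` and `Enum::ClassMethods` are invented here, values are simply numbered 0, 1, 2, ... in argument order, and `==` is defined alongside `eql?` so that comparing across classes returns false instead of raising. Because Ruby does not allow subclassing `Class`, `Enum.new` returns an ordinary class rather than a true instance of `Enum`.

```ruby
# Hypothetical pure-Ruby sketch of the proposed interface.
class Enum
  # Base class for enumerated values.
  class Value
    include Comparable
    attr_reader :name, :value

    def initialize(name, value)
      @name = name.to_s
      @value = value
    end

    def to_s
      name
    end

    def to_i
      Integer(value)
    end

    def inspect
      "#<#{self.class.name}::#{name}>"
    end

    def hash
      value.hash
    end

    # Equal only if both the class and the underlying value match.
    def eql?(other)
      other.class == self.class && other.value == value
    end
    alias_method :==, :eql?
    alias_method :===, :eql?

    # Ordering is only defined between values of the same enumerated class.
    def <=>(other)
      unless other.class == self.class
        raise TypeError, "cannot compare #{self.class} with #{other.class}"
      end
      value <=> other.value
    end
  end

  # Class-level interface added to each enumerated class.
  module ClassMethods
    include Enumerable

    def each(&block)
      return enum_for(:each) unless block
      @values.each(&block)
      self
    end

    def from_int(integer)
      find { |v| v.value == integer } ||
        raise(ArgumentError, "no enumerated value for #{integer}")
    end
  end

  # Returns a new enumerated class whose values are also reachable as
  # constants through the mixed-in enumclass::Constants module.
  def self.new(*symbols)
    enumclass = Class.new(Value)
    constants = Module.new
    values = symbols.each_with_index.map do |sym, index|
      constants.const_set(sym, enumclass.new(sym, index))
    end
    enumclass.const_set(:Constants, constants)
    enumclass.include(constants)
    enumclass.instance_variable_set(:@values, values.freeze)
    enumclass.extend(ClassMethods)
    enumclass.private_class_method :new
    enumclass
  end
end

AddressFamily = Enum.new(:UNSPEC, :UNIX, :INET)
OpenMode      = Enum.new(:RDONLY, :WRONLY, :RDWR)

p AddressFamily.map(&:to_s)                  #=> ["UNSPEC", "UNIX", "INET"]
p AddressFamily.from_int(1)                  #=> #<AddressFamily::UNIX>
p AddressFamily::UNIX < AddressFamily::INET  #=> true
p AddressFamily::INET == OpenMode::RDWR      #=> false, although both wrap 2
```

With this sketch, `AddressFamily::INET <=> OpenMode::RDWR` raises a TypeError, which is the kind of mistake the proposal aims to surface.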

The Enum class is intended primarily for extension writers, so it should have a C interface that is easily accessible from extensions. The C interface should look something like the following:

  VALUE rb_enum_define(int argc, VALUE symbols[]); /* equivalent to Enum.new */
  VALUE rb_enum_s_each(VALUE enumclass);
  VALUE rb_enum_s_from_int(VALUE enumclass, long integer);
  VALUE rb_enum_name(VALUE enumvalue);
  VALUE rb_enum_value(VALUE enumvalue);

One might use the Enum class from Ruby code like this:

  class Socket
    # Create a new enumerated class.  If the values were left out here, then
    # we'd have AF_UNSPEC=0, AF_UNIX=1, AF_INET=2, and AF_INET6=3.
    Address_Family = Enum.new(
        0,  :UNSPEC,
        1,  :UNIX,
        2,  :INET,
        10, :INET6)
    a = Address_Family.to_a.map { |x| x.to_s }
    p a.join(', ')          #=> "UNSPEC, UNIX, INET, INET6"
    include Address_Family::Constants
    p Address_Family::UNIX  #=> #<Socket::Address_Family::UNIX>
    p UNIX                  #=> #<Socket::Address_Family::UNIX>
    p UNIX <=> INET         #=> -1
    p UNIX == INET          #=> false
    str = case UNIX
          when UNSPEC, INET, INET6 then "not unix"
          when UNIX then "unix"
          end
    p str                   #=> "unix"
  end

  

Analysis

This RCR does add an element of static typing to Ruby, which seems counter-intuitive, since Ruby is a dynamically typed language. Code that uses enumerated types in real life is often considered to be broken, as programs in C/C++ that use enumerated types often end up with many giant switch blocks, quickly becoming unwieldy and hard to read or maintain. Such code is generally much cleaner if it is rewritten to use the state pattern or the strategy pattern.

However, real-life C code does use enumerated types, and extension writers tend to wrap these enumerated types using ints instead of spending time to coerce the extension to use an appropriate design pattern. I believe that the danger of this RCR encouraging Ruby programmers to use a bad programming style is outweighed by the benefit gained by extension writers not using plain ints to wrap enumerated values.

Implementation

Some implementations of enumerated types in Ruby and in C++ are available from these locations:

>
>

Neither implements the exact interface proposed above, but they should be sufficient to allow users to experiment.

Comments

Some of them, for example O_RDWR, are not enum, but symbol name for particular bits. We need to examine more about this proposal, otherwise I don't want to say:

 open("foo", IO::RDONLY.value | IO::CREAT.value)

Besides, this problem exists among languages all over. I'm not sure how much it's worth to solve this "problem" which C/C++ programmers can live with it for long time. Perhaps more simple solution, for example number-like object that prints its symbol name, would do.

-matz.


-- Some of them, for example O_RDWR, are not enum, but symbol name for particular bits.

Good point. As written, it's not possible to use an enum like a number. This can be remedied by adding the appropriate arithmetic methods to the enumerated class (possibly as a module?). It can also be solved by making the enumerated value a delegate to the held value, though this may be a little tricky.
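The first remedy could be sketched roughly as follows. `Bitwise` and `FlagValue` are hypothetical names used only for illustration, with `FlagValue` standing in for an enumerated value, and the operators return plain integers so that the result can be handed directly to existing APIs such as `open`.

```ruby
# Hypothetical mixin: gives an enumerated value the bitwise operators of
# its underlying integer. Results are plain Integers, not enumerated values.
module Bitwise
  def |(other)
    to_i | other.to_i
  end

  def &(other)
    to_i & other.to_i
  end

  # Lets Integer#| and Integer#& accept an enumerated value on the right.
  def coerce(other)
    [other, to_i]
  end
end

# Stand-in for an enumerated value; a real Enum would supply name and value.
FlagValue = Struct.new(:name, :value) do
  include Bitwise

  def to_i
    value
  end

  def inspect
    "#<#{name}>"
  end
end

RDONLY = FlagValue.new(:RDONLY, File::RDONLY)
CREAT  = FlagValue.new(:CREAT,  File::CREAT)

mode = RDONLY | CREAT                   # no explicit .value calls needed
p mode == (File::RDONLY | File::CREAT)  #=> true
p RDONLY                                #=> #<RDONLY>
```

The trade-off is that the combined result loses its symbolic name, since it is an ordinary integer again.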

-- I'm not sure how much it's worth to solve this "problem" which C/C++ programmers can live with it for long time.

C programmers do tend to live with these problems, but C++ programmers do not; they tend to write wrappers around the [ugly/error-prone] C interfaces so they don't have to deal with them directly. This is part of the motivation behind and other C++ libraries.

-- Perhaps more simple solution, for example number-like object that prints its symbol name, would do.

I've considered this option before. One other alternative to a full-blown Enum class is to use symbols. My fear is that using symbols is too much trouble for most C programmers to bother, and so they will stick with what is being done currently. However, if Ruby itself avoided using integers directly, it might serve as an example to extension writers.

To make a number-like class easy to use, I think, would be just one small step away from what I've proposed, and may not be sufficiently simpler to be worthwhile. However, I haven't seen a proposal yet, so I can't say for sure.

I discussed some other options in .

-Paul Brannan


Perhaps we were mixing constant symbol (e.g. O_RDWR) and plain enum. After reading Java's JSR-201, I started to think that having enum like this (or maybe something simpler than this) can be useful.

But I still think enums should be enums. Making O_RDWR etc. into enums is overkill. A higher-level wrapper API is the way to solve those kinds of problems. For example, Lisp people use symbols and keyword arguments. I'm not sure yet about the way Ruby people should go.

-matz.

--- I think Enumeration should be a first-class citizen, like class and module. Anytime a conversion between the Enumeration's elements and numbers needs to be done, use a lookup table.

class Socket
  enumeration Address_Family
    enumerant :UNSPEC
    enumerant :UNIX
    enumerant :INET
    enumerant :INET6
  end

  a = Address_Family.to_a.map { |x| x.to_s }
  p a.join(', ')          #=> "UNSPEC, UNIX, INET, INET6"

  # Address_Family::Constants separated out
  h = { Address_Family.UNSPEC => 0, Address_Family.UNIX => 1, Address_Family.INET => 2, Address_Family.INET6 => 10 }

  p Address_Family.UNIX   #=> "UNIX"
  p UNIX                  #=> "UNIX"

  p UNIX <=> INET         #=> error, method missing OR 1, lexicographic comparison
  p h[UNIX] <=> h[INET]   #=> -1
  p UNIX == INET          #=> false

  str = case UNIX
        when UNSPEC, INET, INET6 then "not unix"
        when UNIX then "unix"
        end
  p str                   #=> "unix"
end

One possible implementation of this style would be that Enumeration creates a special Class, and each enumerant is a singleton subclass from that.

Additionally, C enumerations are used as bit flags. A possible extension to Enumeration follows. In this instance, the three flags can be combined in eight ways (2^3, counting the empty combination), and each combination would become a separate singleton subclass. These would also allow arithmetic operations to move between them.

class Colors
  enumeration Components
    flag :RED
    flag :BLUE
    flag :GREEN
  end
  c = Components.RED
  p c       #=> "RED"
  c += BLUE
  p c       #=> "[BLUE,RED]"
  c |= GREEN
  p c       #=> "[BLUE,GREEN,RED]"
  c -= BLUE
  p c       #=> "[GREEN,RED]"
  d = ~GREEN
  p d       #=> "[BLUE,RED]"
  c ^= d
  p c       #=> "[BLUE,GREEN]"
end
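The `enumeration` and `flag` keywords above do not exist in Ruby, but the operator behaviour can be approximated today. The following is a rough sketch only: `FlagSet` is a hypothetical class, and it uses ordinary immutable instances rather than the singleton subclasses suggested above.

```ruby
# Hypothetical FlagSet: an immutable set of flag names from a fixed universe.
class FlagSet
  attr_reader :universe, :members

  def initialize(universe, members)
    @universe = universe.sort.freeze
    # Array#& keeps the receiver's order, so members stay sorted by name.
    @members = (@universe & members).freeze
  end

  def |(other)
    derive(members | other.members)
  end
  alias_method :+, :|

  def &(other)
    derive(members & other.members)
  end

  def -(other)
    derive(members - other.members)
  end

  # Symmetric difference: flags present in exactly one operand.
  def ^(other)
    derive((members | other.members) - (members & other.members))
  end

  # Complement relative to the universe of declared flags.
  def ~
    derive(universe - members)
  end

  def ==(other)
    other.is_a?(FlagSet) && members == other.members
  end

  def to_s
    members.size == 1 ? members.first.to_s : "[#{members.join(',')}]"
  end

  private

  def derive(new_members)
    FlagSet.new(universe, new_members)
  end
end

COMPONENTS = [:RED, :BLUE, :GREEN]
RED   = FlagSet.new(COMPONENTS, [:RED])
BLUE  = FlagSet.new(COMPONENTS, [:BLUE])
GREEN = FlagSet.new(COMPONENTS, [:GREEN])

c = RED
c += BLUE
c |= GREEN
c -= BLUE
d = ~GREEN
c ^= d
puts d   # [BLUE,RED]
puts c   # [BLUE,GREEN]
```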

-chemdog


Current voting

Strongly opposed 1
Opposed 2
Neutral 2
In favor 6
Strongly advocate 1