RCR 280: Unified type conversion framework

Problem


Currently Ruby has no well-defined mechanism for converting one type of object into another; it only has a convention. The standard library methods such as to_s and to_i follow a naming convention that is not extensible (third-party users write to_class, spelling out the target, while the standard library methods abbreviate it, as in to_c). These methods also lack error checking and pollute the method namespaces of the classes that provide the conversions.
If error checking is desired, the user must call the global methods String() and Integer(), which don't match the usual convention and require a user-defined mechanism to dispatch on the source type.
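A quick illustration of the problem using only core Ruby: the to_i convention quietly accepts malformed input, while the checked global method Integer() rejects it by raising ArgumentError.

```ruby
# Core Ruby only: the lenient to_i convention versus the checked Integer().
lenient = "foo".to_i    # malformed input silently becomes 0
partial = "12abc".to_i  # leading digits are kept, the rest is ignored

strict_error =
  begin
    Integer("foo")      # the checked form rejects malformed input
    nil
  rescue ArgumentError => e
    e.class
  end

p lenient       # => 0
p partial       # => 12
p strict_error  # => ArgumentError
```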

Analysis

We dare to use the word type because this system is not limited to converting
classes or modules. The proposed system does not limit in any way the concept
of starting and target type. Even if the basic building blocks are still
classes and modules, we can use properties or state of an object to direct the
transformation. The proposal considers that types not represented by classes
or modules may be supplied as a symbol to the #to method, say:


   a=some_string.to(:CapitalizedString)

The Callable object stored in the registry may return a newly created object, wrap
it in a proxy, add singleton methods or include modules. It can even be used
as an assertion facility, say:


  a=3.to :Odd #=> 3
  a=4.to :Odd #=> TypeError
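A minimal sketch of what such Callables might look like. It does not use the sample implementation's ConvRegistry; it assumes a plain Hash named REGISTRY keyed by [source class, target] and a hypothetical convert(obj, target) helper standing in for #to. The entries show a newly created object (:CapitalizedString), a proxy built with the standard delegate library (:Proxied), a copy with an added singleton method (:Shouting), and an assertion (:Odd); the :Proxied and :Shouting names are illustrative only.

```ruby
require "delegate"

# Hypothetical registry: [source class or module, target] => callable.
REGISTRY = {}

# Hypothetical stand-in for #to: identity if obj already is a target,
# otherwise the first path registered for its class or an ancestor.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# Returns a newly created object.
REGISTRY[[String, :CapitalizedString]] = ->(s) { s.capitalize }

# Wraps the original in a proxy; SimpleDelegator forwards every call.
REGISTRY[[Object, :Proxied]] = ->(x) { SimpleDelegator.new(x) }

# Adds a singleton method (to a copy, so the caller's string is untouched).
REGISTRY[[String, :Shouting]] = lambda do |s|
  copy = s.dup
  copy.define_singleton_method(:shout) { upcase + "!" }
  copy
end

# Acts as an assertion: the object comes back untouched or TypeError is raised.
REGISTRY[[Integer, :Odd]] = ->(n) { n.odd? ? n : raise(TypeError, "#{n} is not odd") }

p convert("hello world", :CapitalizedString)  # => "Hello world"
p convert("abc", :Proxied).upcase             # => "ABC"
p convert("hi", :Shouting).shout              # => "HI!"
p convert(3, :Odd)                            # => 3
```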

This way the various is_a?, kind_of? and
respond_to? checks may be factored into a
conversion path that allows simple declarative usage, centralized management
and crash-early behaviour, say:


 def foo a,b
  a=a.to Bla
  b=b.to Boo
  # ...do stuff
 end
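Bla and Boo are placeholders, so this sketch assumes two pseudo-types instead: :Readable, which folds a respond_to? check into a conversion path (as in the tests of the sample implementation), and a hypothetical :PositiveInteger. The registry and the convert helper are minimal stand-ins for ConvRegistry and #to, defined inline so the block runs on its own.

```ruby
require "stringio"

REGISTRY = {}

# Hypothetical stand-in for #to (see the lookup rules in the text).
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# A respond_to? check expressed as a conversion path.
REGISTRY[[Object, :Readable]] = lambda do |x|
  x.respond_to?(:read) ? x : raise(TypeError, "#{x.inspect} is not readable")
end

# Hypothetical pseudo-type accepting only positive integers.
REGISTRY[[Integer, :PositiveInteger]] = lambda do |n|
  n > 0 ? n : raise(TypeError, "#{n} is not positive")
end

# Arguments are declared (and checked) up front, so bad input fails early.
def read_prefix(io, count)
  io = convert(io, :Readable)
  count = convert(count, :PositiveInteger)
  io.read(count)
end

puts read_prefix(StringIO.new("hello"), 3)  # prints hel
```

Calling read_prefix(42, 3) fails on the first line of the method with a TypeError, before any work is done.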

This is somewhat similar to the Lisp approach of type declarations, in that it
allows the author to hint types using the full power of the language.

The system is extensible in that it allows conversion paths to be defined
by the developer of the starting type, by the developer of the target type, or by
third-party users.

It can be used to enhance interoperability between different
libraries, because it allows DeveloperA to build LibA while DeveloperB
develops LibB, and ThirdUser can use both libraries if a simple conforming
path to LibT is provided. This is not something new, but a formalization of
a well-known practice. This approach has proven useful in other languages
(for example, PyProtocols).
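A sketch of that scenario, with two hypothetical classes standing in for the libraries (LibA::Message and LibB::Mail, neither aware of the other). Only the third party's glue code registers the path between them; the registry and convert helper are simplified stand-ins for ConvRegistry and #to.

```ruby
REGISTRY = {}

# Hypothetical stand-in for #to.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# "LibA": knows nothing about LibB.
module LibA
  Message = Struct.new(:subject, :body)
end

# "LibB": knows nothing about LibA.
module LibB
  class Mail
    attr_reader :headers, :text

    def initialize(headers, text)
      @headers = headers
      @text = text
    end
  end

  # LibB code accepts anything that can be converted to its own type.
  def self.archive(mail)
    mail = convert(mail, Mail)
    "#{mail.headers['Subject']}: #{mail.text}"
  end
end

# ThirdUser's glue code: the only place that knows about both libraries.
REGISTRY[[LibA::Message, LibB::Mail]] = lambda do |m|
  LibB::Mail.new({ "Subject" => m.subject }, m.body)
end

puts LibB.archive(LibA::Message.new("Hi", "hello there"))  # prints Hi: hello there
```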

The system integrates with current Ruby (see the sample implementation) but is not
limited to it.

The compiler/interpreter/VM may optimize the type conversions/declarations
based on compile-time or runtime analysis, for example in cases like this:


def sum a,b
  a=a.as Numeric; b=b.as Numeric
  a+b
end
sum 1,2

by removing the check once information about the arguments has been gathered.

Finally, this approach allows clean documentation of expected arguments, which
can be parsed by specialized tools (such as rdoc).

For examples of usage look at the provided implementation; it comes with
simple tests.

Implementation

A simple implementation is provided. The basic algorithm is:

1. if the object is_a? target_type, return the object itself
2. if the object's class has a conversion to target_type, return conversion(self)
3. otherwise, look for a conversion registered for one of the class's ancestors
4. if nothing is found or the conversion fails, a TypeError is raised.
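The steps above could be sketched as follows. This is a hedged variant, not the sample implementation: besides walking the ancestors, it also reports an exception raised inside a registered conversion as a TypeError, which is one possible reading of "or the conversion fails". REGISTRY and convert are hypothetical names.

```ruby
REGISTRY = {}

def convert(obj, target, *args)
  # 1. The object already is an instance of the target: return it unchanged.
  return obj if target.is_a?(Module) && obj.is_a?(target)

  # 2-3. The object's own class first, then its ancestors in lookup order.
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }

  # 4a. Nothing registered anywhere along the chain.
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key

  begin
    REGISTRY[[key, target]].call(obj, *args)
  rescue TypeError
    raise
  rescue StandardError => e
    # 4b. A path exists but the conversion itself failed.
    raise TypeError, "conversion from #{obj.class} to #{target.inspect} failed: #{e.message}"
  end
end

REGISTRY[[String, Integer]] = ->(s, *args) { Integer(s, *args) }

p convert(7, Integer)     # => 7, already an Integer
p convert("42", Integer)  # => 42
```

Here convert("4x2", Integer) raises TypeError instead of the ArgumentError that Integer() itself raises, and a String subclass finds the String path through its ancestors.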

Raising an error is better than returning false, nil,
or a default value because it allows the transformation of objects into Boolean
values or nil, allowing users to write their own boolean
transformation (often requested as #to_bool) without changing existing
classes. It also makes error checking possible, while still allowing easy use of
default values like:


    foo= bla.to(Integer) rescue 0
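Because failure is signalled by an exception, a conversion is free to return false or nil as a genuine result, and callers can still fall back to a default. A sketch assuming a minimal registry (not the sample ConvRegistry), a String to Integer path built on Integer(str, exception: false) (available since Ruby 2.6), and a hypothetical :Bool path whose accepted spellings are an arbitrary choice:

```ruby
REGISTRY = {}

# Hypothetical stand-in for #to.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# Checked path: Integer(..., exception: false) returns nil on bad input,
# which is turned into the TypeError the proposal asks for.
REGISTRY[[String, Integer]] = lambda do |s|
  Integer(s, exception: false) || raise(TypeError, "#{s.inspect} is not an integer")
end

# Hypothetical :Bool path; the accepted spellings are an arbitrary choice.
REGISTRY[[String, :Bool]] = lambda do |s|
  case s.strip.downcase
  when "true", "yes", "1" then true
  when "false", "no", "0" then false
  else raise TypeError, "#{s.inspect} is not a boolean"
  end
end

p convert("no", :Bool)                  # => false, a genuine result, not a failure
foo = convert("abc", Integer) rescue 0  # fall back to a default on failure
p foo                                   # => 0
```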

Note that optional arguments for the conversion (such as the base
in Integer conversions) can be passed to the conversion method (#to,
#as or #to_type), which hands them on to the conversion Callable, so they
still work as expected. For example,


    '200'.to Integer, 16

would convert the string into an Integer considering it encoded in
base 16.
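A sketch of the forwarding, with a stripped-down Object#to and a Hash called CONVERSIONS standing in for ConvRegistry (both simplified assumptions, not the RCR code). Everything after the target type reaches the Callable unchanged; here the String to Integer path hands the optional base to Kernel#Integer.

```ruby
# Simplified stand-ins for ConvRegistry and Object#to.
CONVERSIONS = {}

class Object
  def to(target, *args, &blk)
    return self if target.is_a?(Module) && is_a?(target)
    key = self.class.ancestors.find { |anc| CONVERSIONS.key?([anc, target]) }
    raise TypeError, "no conversion from #{self.class} to #{target.inspect}" unless key
    # Extra arguments after the target type are passed through as-is.
    CONVERSIONS[[key, target]].call(self, *args, &blk)
  end
end

# Kernel#Integer takes the base as its optional second argument.
CONVERSIONS[[String, Integer]] = ->(s, base = 10) { Integer(s, base) }

p '200'.to(Integer, 16)  # => 512
p '200'.to(Integer)      # => 200
```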

Writing transformation paths is really simple.
Suppose we created a SortedCollection class. Converting an
Enumerable object to it would be easy:


  ConvRegistry[Enumerable,SortedCollection]= proc do |enum|
    sc=SortedCollection.new
    enum.each {|i| sc.insert i }
    sc
  end
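SortedCollection is not part of Ruby's standard library, so this sketch supplies a minimal hypothetical one (ordered insertion via Array#bsearch_index, available since Ruby 2.3) and registers the same conversion path against a plain Hash registry so the block stands alone.

```ruby
# Hypothetical SortedCollection: keeps its elements in ascending order.
class SortedCollection
  include Enumerable

  def initialize
    @items = []
  end

  # Binary search for the first element greater than x and insert before it,
  # so equal elements keep their insertion order.
  def insert(x)
    idx = @items.bsearch_index { |y| y > x } || @items.size
    @items.insert(idx, x)
    self
  end

  def each(&blk)
    return to_enum(:each) unless blk
    @items.each(&blk)
    self
  end
end

REGISTRY = {}

# Hypothetical stand-in for #to.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# The conversion path from the text, registered for the Enumerable module.
REGISTRY[[Enumerable, SortedCollection]] = lambda do |enum|
  sc = SortedCollection.new
  enum.each { |i| sc.insert(i) }
  sc
end

p convert([3, 1, 2], SortedCollection).to_a  # => [1, 2, 3]
```

Because the path is registered for the Enumerable module, any source whose class includes Enumerable (an Array, a Range, an Enumerator) is found through its ancestors.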

Conversions to pseudoclasses are also simple; for example, a conversion from
Enumerable to a pseudoclass :Bool can be as easy as


  ConvRegistry[Enumerable,:Bool]=proc { |x| not x.empty? }
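One caveat on the one-liner above: Enumerable itself does not define #empty?, so that proc works for arrays and hashes but would raise NoMethodError for a class that only defines #each and includes Enumerable. A variant that needs nothing but #each can use Enumerable#first(1), which stops after one element. The registry and convert helper are again minimal stand-ins, and Countdown is a made-up example class.

```ruby
REGISTRY = {}

# Hypothetical stand-in for #to.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# Needs only #each: first(1) stops iterating after the first element.
REGISTRY[[Enumerable, :Bool]] = ->(x) { !x.first(1).empty? }

# Made-up Enumerable that defines #each but not #empty?.
class Countdown
  include Enumerable

  def initialize(n)
    @n = n
  end

  def each
    @n.downto(1) { |i| yield i }
  end
end

p convert(Countdown.new(3), :Bool)  # => true
p convert(Countdown.new(0), :Bool)  # => false
p convert([nil], :Bool)             # => true
```

The check is about emptiness, not truthiness: [nil] converts to true, whereas Enumerable#any? without a block would report false for it.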

In any case, the proc is not limited to simple expressions.
The system allows many of the interface-like systems proposed for Ruby over the
years to be embedded in it; for example, people could tag an object as implementing an
interface and write a conversion path that just checks this tag.
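A sketch of the tagging idea, with every name hypothetical: classes extend an Implements module to record the interfaces they claim, and the conversion to the :Stack pseudo-type only checks that tag, on the class or any superclass, without inspecting methods.

```ruby
REGISTRY = {}

# Hypothetical stand-in for #to.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

# Hypothetical tagging facility: a class records the interfaces it claims.
module Implements
  def implements(*names)
    interfaces.concat(names)
  end

  def interfaces
    @interfaces ||= []
  end
end

# The conversion path only checks the tag on the class or a superclass.
REGISTRY[[Object, :Stack]] = lambda do |obj|
  tagged = obj.class.ancestors.any? { |a| a.is_a?(Implements) && a.interfaces.include?(:Stack) }
  tagged ? obj : raise(TypeError, "#{obj.class} is not tagged as :Stack")
end

class ArrayStack
  extend Implements
  implements :Stack

  def initialize
    @items = []
  end

  def push(x)
    @items.push(x)
    self
  end

  def pop
    @items.pop
  end
end

stack = convert(ArrayStack.new, :Stack)  # passes: the class carries the tag
stack.push(1).push(2)
p stack.pop                              # => 2
```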

If this RCR is accepted, it would be nice if this mechanism were easily
accessible from C code. So the proposal is that, if this RCR is accepted,
it be rewritten in C, keeping the same interface from Ruby,
so that if this code can be written in Ruby:

                                                                   
  ConvRegistry[Enumerable,:Bool]=proc { ... }
  ConvRegistry[String,Integer]= proc { ... }
  ConvRegistry[Class,Enumerable]= proc { ... }

It would be possible to write this in C:


  rb_define_conversion(rb_mEnumerable, ID2SYM(rb_intern("Bool")), enum_to_bool);
  rb_define_conversion(rb_cString, rb_cInteger, string_to_int);
  rb_define_conversion(rb_cClass, rb_mEnumerable, class_to_enum);

One last thing:
The sample implementation does not allow automatic transitions between
arbitrary types. That is, if a path from T1 to T2 exists and another
one exists from T2 to T3, T1 is not automagically converted to T3.

This is not allowed because it would make the system more complex and less
predictable; in addition, there would be an ambiguity when two or more paths are defined
implicitly.
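To make the restriction concrete, take T1 = Integer, T2 = String and T3 = Symbol, with a minimal registry standing in for ConvRegistry. Both single-step paths exist, yet a direct request for Symbol fails; the user can still register the chained conversion explicitly, which also makes the choice of intermediate type unambiguous.

```ruby
REGISTRY = {}

# Hypothetical stand-in for #to: single-step lookups only.
def convert(obj, target, *args)
  return obj if target.is_a?(Module) && obj.is_a?(target)
  key = obj.class.ancestors.find { |anc| REGISTRY.key?([anc, target]) }
  raise TypeError, "no conversion from #{obj.class} to #{target.inspect}" unless key
  REGISTRY[[key, target]].call(obj, *args)
end

REGISTRY[[Integer, String]] = ->(n) { n.to_s }    # T1 -> T2
REGISTRY[[String, Symbol]]  = ->(s) { s.to_sym }  # T2 -> T3

begin
  convert(42, Symbol)  # no direct Integer -> Symbol path is registered
rescue TypeError => e
  puts e.message       # prints no conversion from Integer to Symbol
end

# The user opts in to the chain explicitly, choosing String as the intermediate type.
REGISTRY[[Integer, Symbol]] = ->(n) { convert(convert(n, String), Symbol) }
p convert(42, Symbol)  # => :"42"
```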


class ConvRegistry
    # maps [from_type, to_type] pairs to conversion callables
    @@reg=Hash.new

    # Look up a conversion: exact match first, then the ancestors of
    # from_type; raise TypeError when no path exists.
    def self.[] from_type,to_type
        if res=@@reg[[from_type,to_type]]
            res
        elsif res= find_in_ancestors(from_type,to_type)
            res
        else
            raise TypeError.new("no conversion for #{from_type},#{to_type}")
        end
    end

    def self.find_in_ancestors from_type,to_type
        # skip from_type itself, which was already checked
        from_type.ancestors[1..-1].each do |anc|
            if res2= @@reg[[anc,to_type]]
                return res2
            end
        end
        nil
    end

    def self.[]= from_type,to_type,func
        @@reg[[from_type,to_type]]=func
    end
end

class Object
    def to(something,*args,&blk)
        # already an instance of the requested class/module: nothing to do
        if something.is_a? Module and self.is_a? something
            return self
        end
        ConvRegistry[self.class,something].call(self, *args,&blk)
    end
end

if __FILE__== $0
    require 'test/unit'
    require 'stringio'
    class MyTest < Test::Unit::TestCase
        def setup
            ConvRegistry[Enumerable,:Bool]=proc { |x| not x.empty? }
            ConvRegistry[String,Integer]= proc { |x,*args| x.to_i(*args) }
            ConvRegistry[Class,Enumerable]= proc { |x| x.send :include,Enumerable }
            ConvRegistry[Object,Enumerable]= proc { |x| x.send :extend,Enumerable }
            # pseudoclass defined by a property of the object
            ConvRegistry[Object,:Frozen]= proc do |x|
                if x.frozen?
                    x
                else
                    raise TypeError.new
                end
            end
            ConvRegistry[Integer,:Odd]= proc do |x|
                if x%2==1
                    x
                else
                    raise TypeError.new
                end
            end
            # pseudoclass defined by the methods the object responds to
            ConvRegistry[Object,:Readable]= proc do |x|
                if x.respond_to? :read
                    x
                else
                    raise TypeError.new
                end
            end
        end

        def test_class_class
            assert_equal '5'.to(Integer),5
        end

        def test_subclass_class
            my=Class.new String
            m=my.new '5'
            assert_equal m.to(Integer),5
        end

        def test_class_module
           f_class=Class.new(Object)
           f_class.class_eval do
                def each
                    yield 1
                end
            end
            f_obj=f_class.to(Enumerable).new
            assert_equal f_obj.find_all {|x| x==1}, [1]
                                                                   
        end
                                                              
        def test_more_arguments
            a='aa'
            assert_equal 170,a.to(Integer, 16)
        end

        def test_instance_module
            f_class=Class.new Object
            f_class.class_eval do
                def each
                    yield 1
                end
            end
            f_obj=f_class.new
            f_obj=f_obj.to Enumerable
            assert_equal f_obj.find_all {|x| x==1}, [1]
        end

        def test_singleton_module
            f=Object.new
            def f.each
                yield 1
            end
            f=f.to(Enumerable)
            assert_equal f.find_all {|x| x==1}, [1]
        end
                                               
        def test_property_pseudoclass_ok
            a= 'ciao'
            a.freeze
            a=a.to(:Frozen)
                                      
            assert_equal a, 'ciao'
        end

        def test_property_pseudoclass_fail
            assert_raises(TypeError) {'ciao'.to :Frozen}
        end
                                           
        def test_property_pseudoclass_ok2
            a=5
            a=a.to :Odd
            assert_equal a,5
        end
                   
                                                                             
        def test_property_pseudoclass_fail2
            assert_raises(TypeError) {6.to :Odd}
        end
                          
        def test_object_superclass
            a=42
            assert_equal a, a.to(Integer)
        end

        def test_object_mixin_ok
            a=[]
            assert_equal a,a.to(Enumerable)
        end
                                            
        def test_object_mixin_fail
            a=5
            assert_raises(TypeError) {a.to Enumerable}
        end

        def test_methodbag_pseudoclass_ok
            a=StringIO.new
            assert_equal a,a.to(:Readable)
        end
                                
        def test_methodbag_pseudoclass_fail
            a=24
            assert_raises(TypeError) {a.to :Readable}
        end

        def test_enumerable_bool
            # String is not Enumerable since Ruby 1.9, so a Hash is used here
            a={}
            assert_equal false, a.to( :Bool)
            a[:x]=1
            assert_equal true , a.to( :Bool)
            a=[]
            assert_equal false, a.to( :Bool)
            a << 1
            assert_equal true , a.to( :Bool)
        end
    end
end

Comments

Current voting

I don't think ruby needs a general type conversion mechanism, because you don't have static typing that requires many conversions.

I prefer methods to_x b/c they describe precisely which conversions are supported by the developer. You have to write those methods anyways. Put another way, I think Object#to is syntactic sugar that makes a class more confusing.

~ patrick may (sorry I didn't sign originally)

First, I don't think static typing needs many more conversions than Ruby's type system. As for the second point, it is not just syntactic sugar. The to_x thing is nice because it is short, but it leaves the need for to_set, to_enum, Hash[] and Integer(). Not everyone can write single-character names :) and anyway you end up polluting every class with a conversion to everything else. The larger the code base, the larger the name pollution.

Also note that none of to_f, to_set, to_enum, Hash[] and Integer() follows the same convention as the others.

Also, you can't expect DeveloperA to write everything for you, simply because some things do not exist when he is writing. Say, Array can't support Set conversions, but set.rb does add Array#to_set. The proposed system just makes this consistent and extensible. Hope this makes things clearer. --gabriele renzi

I am in favor of something like this. I would like to understand it better, without having to tax my little brain so much. :-) If I may, it certainly wouldn't hurt if you found a tag-team partner to develop it with and brought it over to the RcrFoundry to improve (search for it on the Ruby Garden Wiki).

I see this proposal as a nice way of expressing a large number of conversions. The main thing I am not sold on is whether it is a good idea to convert all classes to each other. Modules and method conventions offer better possibilities for code re-use. For example, I prefer the convention of #each and Enumerable over a possible convention of #to_a (or #to_iterator).

I like that the current convention encourages standard methods (messages) over standard types.

~ patrick may

Patrick, I'm not sure what you are trying to say (I see a number of possibilities, but before I respond to your comment, I want to be sure I'm responding to the right comment). How does using #each provide any more code re-use than what is provided in this RCR?

-- Paul Brannan

Paul,

I don't think this proposal makes sense unless one wants to do lots of class conversions [1] . I don't think conversions to standard types qualifies as 'lots of conversions'. So I ask, why would one do lots of class conversions, especially conversions from one non-standard type to another?

The use case I come up with is class casting -- This method expects a Foo, so I need to convert this object to a Foo. I don't like that, I like this convention better: This method expects a parameter that responds to #foo and #bar.

Type conversion is a smell to me. Type conversions are about changing data. While necessary, IMHO data migrations (even in the small level of in-memory data) lead to hard and thorny problems [2]. The better design is to re-engineer and reduce the type conversions, instead of trying to make it easier to manage many type conversions.

Cheers,

Patrick

p.s. I don't see what's wrong with Hash[] or Integer(). Personally, I've never used them, b/c I use { } or #to_i. If we need a method convention for hashes, #to_h or #to_hash is fine with me.

1. If the number of class conversions is small, then most invocations of Object#to will be errors. There will be a documentation problem in identifying the valid invocations of Object#to. If most conversions are to standard types, I think #to_s or #to_a is better.

IMO, you do type conversions when you want to handle data in different ways, not really change it. Say, to_enum exists so that you can handle something as an Enumerable. A CGI->FCGI or CGI->WEBrick converter may allow you to use the same code with different backends. And notice that you need Integer() when you want to discriminate valid data, because 'foo'.to_i gives 0.

Related to "the other side" of this RCR, let me say I don't think most of the conversions would be needed between standard types, but rather in external code.

My point is that you can't refactor the whole RAA/RubyForge: you won't be able to write code based on a mixin available in both rubymail/tmail (or REXML/libxml2 or Cerise::FormHandler/Form::Validator). You would need to write an adaptation layer, and this RCR gives you a simple framework in which to do it. It is nothing strange, just a Grand High Exalted interface system ;)

--gabriele renzi

If there is a standard method that two classes support, and all you want is to make sure you have something of that type, then there should be a way to short circuit the transformation early. That is, given "Foo < DelegateClass(String)", and

  def bar( anyThing )
    x = anyThing.to_s
    ..use x
  end

there should be some way to abbreviate the fact that ConvRegistry[Foo, String] is just an identity transformation. The extra code in your implementation of Object#to works for class, but not type.

Also, #to isn't a very good, descriptive name for something that ends up being a bouncer method/assertion. Sure, it can do such a thing, but I'd rather it only attempt conversion. (x.to(:Even) would return (x/2).round * 2, while x.to(:Odd) would probably just add one if (x%10)<5, and subtract one if (x%10)>5, or something to that effect. However, it would still raise an exception if you fed it a string, or a regexp.)

--Zallus Kanite

About short-circuiting: it would be nice, sure. But how would you do this? The only ways I can think of are:

some kind of discovery based on partial evaluation (à la MrSpidey)
runtime optimization

If there are simpler approaches, I can't think of them. I hope real implementors would write great code to support this; I just provided a sample implementation to show the behaviour :)

About the name, I'm not sure I understand what you're saying: you first state that you don't like #to for a bouncer method, and I understand this. Actually, I'm leaning more and more towards #as these days. But then you say it is OK. I'm dumb; could you expand a little, please?

The only thing I get is that you want #as(Odd|Even) to change the object. :Odd and :Even are pseudotypes that I found more reasonable to have work just like assertions, but you're free to do what you want in your own code.

-- gabriele renzi

Very nice proposal. I think a syntax shortcut for declaring the types of arguments would be nice: def foo(arg as aType, arg2 as bType):

  #code

OR def foo(arg -> aType, arg2 as bType):

  #code

which would convert each parameter, in order, to the desired type, raising an ArgumentError or TypeError (suggestions?) if the conversion fails. This assumes -> or 'as' would be the type-conversion operator.

- C. Rebert

What about using the ClassName() functions, like Integer() and String(), to facilitate this?

They already have a specified meaning, so changing them could break things.

Strongly opposed	1
Opposed	1
Neutral	1
In favor	5
Strongly advocate	7
