
RCR 280: Unified type conversion framework

Submitted by grenzi (Thu Sep 09 03:09:27 UTC 2004)

Abstract

As of Ruby 1.8 there is no standard way to provide a transformation (conversion) from one "type" of object to another.
This RCR proposes a generalized framework that is fully extensible, and more general and powerful than the simple naming convention used now.

Problem

Currently Ruby has no well-defined mechanism for converting one type of object into another; it only has a convention. Standard library methods such as to_s and to_i follow a naming convention that is not extensible (third-party code writes to_class with the full class name, as in to_set, while the standard library abbreviates it to to_c, as in to_s), lack error checking ('foo'.to_i silently returns 0), and pollute the method namespace of every class that provides such conversions.
If error checking is desired, the user must call the global methods String() and Integer(), which don't match the usual convention and force the user to write their own dispatch mechanism based on the source type.
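A short illustration of the status quo described above. String#to_i and Kernel#Integer are the real standard methods; the helper to_integer is hypothetical and stands for the kind of dispatch code users currently have to write themselves:

```ruby
# String#to_i has no error checking: junk input silently becomes 0.
p 'foo'.to_i   # => 0

# Kernel#Integer does check (by raising), but does not follow the to_x naming.
begin
  Integer('foo')
rescue ArgumentError
  puts 'Integer() rejected "foo"'
end

# Supporting several source types means hand-writing the dispatch.
# (to_integer is a hypothetical helper, not a standard method.)
def to_integer(obj)
  case obj
  when Integer then obj
  when String  then Integer(obj, 10) # strict base-10 parse; raises on junk
  when nil     then 0
  else raise TypeError, "no conversion from #{obj.class} to Integer"
  end
end

p to_integer('42')   # => 42
```

Each such helper is written for a single target type, and a third-party library cannot add a new source type to it without editing the case expression.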

Proposal

A method is added to Object (possible names: #to, #to_type, #as; this RCR uses #to) accepting a parameter which represents the target type of the transformation.
Transformation paths are kept as Callable objects (anything responding to #call, such as a Proc) in a global registry keyed by source and target type.

Analysis

We dare to use the word type because this system is not limited to converting
between classes or modules. The proposed system does not constrain the notion
of source and target type in any way. Even if the basic building blocks are still
classes and modules, properties or state of an object can be used to direct the
transformation. Types not represented by a Class or Module may be supplied
to the #to method as a symbol, e.g.:

a=some_string.to(:CapitalizedString)

The Callable object stored in the registry may return a newly created object, wrap
the original in a proxy, add singleton methods to it or include modules. It can even
be used as an assertion facility, e.g.:

a=3.to :Odd #=> 3
a=4.to :Odd #=> TypeError
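A sketch of the :Odd assertion. It assumes a condensed copy of the sample ConvRegistry and Object#to given later in this RCR, repeated here so the block runs on its own (lookup tries the object's class, then its ancestors). The conversion either returns the object unchanged or raises TypeError:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# :Odd is a pseudotype: the conversion only asserts, it never changes x.
ConvRegistry[Integer, :Odd] = proc do |x|
  raise TypeError, "#{x} is not odd" unless x.odd?
  x
end

p 3.to(:Odd)      # => 3
begin
  4.to(:Odd)
rescue TypeError => e
  puts e.message  # prints "4 is not odd"
end
```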

This way the various is_a?, kind_of? and respond_to? checks can be factored
into a conversion path, which allows simple declarative usage, centralized
management and fail-early behaviour, e.g.:

def foo a,b
  a=a.to Bla
  b=b.to Boo
  # ...do stuff
end
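The same idea applied to method arguments, as a sketch: read_prefix is a hypothetical method, and the registry is again a condensed copy of the sample implementation. Each argument is checked (or converted) on entry, so a bad argument fails on the first line of the method rather than somewhere inside its body:

```ruby
require 'stringio'

# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# Duck-typed pseudotype: anything with #read qualifies.
ConvRegistry[Object, :Readable] = proc do |x|
  raise TypeError, "#{x.class} does not respond to #read" unless x.respond_to?(:read)
  x
end
# Extra arguments (e.g. a base) are forwarded to Kernel#Integer.
ConvRegistry[String, Integer] = proc { |s, *args| Integer(s, *args) }

# Hypothetical method: argument expectations are declared on entry.
def read_prefix(io, count)
  io = io.to(:Readable)
  count = count.to(Integer)
  io.read(count)
end

p read_prefix(StringIO.new('hello world'), '5')  # => "hello"
```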

This is somewhat similar to Lisp's type declarations, in that it lets the
author hint types using the full power of the language.


The system is extensible in that conversion paths can be defined by the
developer of the source type, by the developer of the target type, or by
third parties.


It can also improve interoperability between libraries: DeveloperA builds
LibA while DeveloperB develops LibB, and ThirdUser can use both libraries
together as long as a simple conforming path to LibT is provided. This is
nothing new, just the formalization of a well-known practice, and the approach
has proven useful in other languages (for example, PyProtocols for Python).
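A sketch of the third-party scenario, with two hypothetical libraries (LibA, LibB) whose message classes know nothing of each other, plus the condensed registry from the sample implementation. Only the glue line registering LibA::Message => LibB::Mail is written by the third user:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# Hypothetical LibA.
module LibA
  Message = Struct.new(:sender, :body)
end

# Hypothetical LibB: its code only ever deals with Mail objects.
module LibB
  class Mail
    attr_reader :from, :text

    def initialize(from, text)
      @from = from
      @text = text
    end
  end

  def self.summary(mail)
    mail = mail.to(Mail)
    "#{mail.from}: #{mail.text}"
  end
end

# Third-party glue: the only line that knows about both libraries.
ConvRegistry[LibA::Message, LibB::Mail] = proc { |m| LibB::Mail.new(m.sender, m.body) }

p LibB.summary(LibA::Message.new('alice', 'hi'))  # => "alice: hi"
```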


The system integrates with current Ruby (see the sample implementation) but is
not limited to it.

The compiler/interpreter/VM may optimize the type conversions/declarations
based on compile-time or runtime analysis, for example in cases like this:

def sum a,b
  a=a.to Numeric; b=b.to Numeric
  a+b
end
sum 1,2

by removing the checks once information about the arguments has been gathered.


Finally, this approach allows clean documentation of expected arguments, which
can be parsed by specialized tools such as RDoc.


For examples of usage look at the provided implementation; it comes with
simple tests.

Implementation

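A simple implementation is provided. The basic algorithm is:

1. if the target is a Class or Module and the object is already an instance of it, return the object itself;
2. otherwise, look up the registry for a conversion from the object's class to the target, falling back to each ancestor of the class in turn;
3. if a conversion is found, call it with the object and any extra arguments; otherwise raise TypeError.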

Raising an error is better than returning false, nil or a default value,
because it allows objects to be transformed into boolean values or nil, so
users can write their own boolean transformation (often requested as #to_bool)
without changing existing classes. It also makes error checking possible, while
still allowing easy use of default values, like:

foo= bla.to(Integer) rescue 0
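A sketch of why raising beats returning a sentinel, assuming the condensed registry from the sample implementation: a :Bool conversion can legitimately return false, and a missing path is still distinguishable because it raises. The :Bool path is registered on Array here rather than on Enumerable, because Enumerable itself does not define #empty?:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# A pseudotype conversion whose legitimate result may be false.
ConvRegistry[Array, :Bool] = proc { |a| !a.empty? }

p([].to(:Bool))    # => false (a real answer, not a failure)
p([1].to(:Bool))   # => true

# No Array -> Integer path exists, so #to raises TypeError and the
# rescue modifier supplies the default.
n = [1, 2].to(Integer) rescue 0
p n                # => 0
```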



Note that optional arguments for the conversion (such as the base
in Integer conversions) can be passed to #to (or #as, #to_type) and
are forwarded to the conversion Callable, so they still work as expected, e.g.

'200'.to Integer, 16

would convert the string into an Integer, interpreting it as base 16
(giving 512).
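A sketch of argument forwarding, again with the condensed registry. The conversion proc here uses Kernel#Integer with an explicit base (a stricter choice than the String#to_i used in the sample tests) and defaults to base 10 when no extra argument is given:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# Extra arguments given to #to are passed straight through to the proc.
ConvRegistry[String, Integer] = proc { |s, base = 10| Integer(s, base) }

p '200'.to(Integer, 16)  # => 512
p '200'.to(Integer)      # => 200
```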



Writing transformation paths is really simple.
Suppose we have written a SortedCollection class. Converting any
Enumerable object to it would be easy:

ConvRegistry[Enumerable,SortedCollection]= proc do |enum|
  sc=SortedCollection.new
  enum.each {|i| sc.insert i }
  sc
end
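To make the SortedCollection example runnable, here is a minimal hypothetical SortedCollection (only #insert and #each, keeping elements ordered with Array#bsearch_index, available since Ruby 2.3) plus the condensed registry. Because Array and Range both include Enumerable, the single registration covers them through the ancestor lookup:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# Hypothetical SortedCollection: elements are kept in ascending order.
class SortedCollection
  include Enumerable

  def initialize
    @items = []
  end

  # Binary-search the first element greater than item and insert before it.
  def insert(item)
    idx = @items.bsearch_index { |x| x > item } || @items.size
    @items.insert(idx, item)
    self
  end

  def each(&blk)
    @items.each(&blk)
    self
  end
end

ConvRegistry[Enumerable, SortedCollection] = proc do |enum|
  sc = SortedCollection.new
  enum.each { |i| sc.insert i }
  sc
end

p([3, 1, 2].to(SortedCollection).to_a)  # => [1, 2, 3]
```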

Conversions to pseudoclasses are also simple; e.g. a conversion from
Enumerable to a pseudoclass :Bool can be as easy as

ConvRegistry[Enumerable,:Bool]=proc { |x| not x.empty? }



The behaviour of the proc is not limited to simple expressions, though.
Many of the interface-like systems proposed for Ruby over the years could
be embedded in this one: for example, people could tag an object as implementing an
interface and write a conversion path that just checks this tag.
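A sketch of the tagging idea. Interfaces and ArrayStack are hypothetical names, and the registry is the condensed copy of the sample implementation. The conversion path trusts the tag; it does not verify that the methods are really there:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

# Hypothetical tagging scheme: a class declares which interfaces it claims.
# Tags live on the declaring class only (subclasses do not inherit them here).
module Interfaces
  def implements(*names)
    (@interfaces ||= []).concat(names)
  end

  def interfaces
    @interfaces || []
  end
end

class ArrayStack
  extend Interfaces
  implements :Stack

  def initialize
    @items = []
  end

  def push(x)
    @items.push(x)
    self
  end

  def pop
    @items.pop
  end
end

# The path only checks the tag.
ConvRegistry[Object, :Stack] = proc do |x|
  klass = x.class
  tagged = klass.respond_to?(:interfaces) && klass.interfaces.include?(:Stack)
  raise TypeError, "#{klass} is not tagged as :Stack" unless tagged
  x
end

p ArrayStack.new.to(:Stack).push(1).pop  # => 1
begin
  [].to(:Stack)   # Array has push/pop but carries no tag
rescue TypeError => e
  puts e.message  # prints "Array is not tagged as :Stack"
end
```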



If this RCR is accepted, it would be nice if this mechanism were easily
accessible from C code. So the proposal is that, if accepted, it be
rewritten in C with the same interface from Ruby, so that if this code
can be written in Ruby:

ConvRegistry[Enumerable,:Bool]=proc { ... }
ConvRegistry[String,Integer]= proc { ... }
ConvRegistry[Class,Enumerable]= proc { ... }

It would be possible to write this in C:

rb_define_conversion(rb_mEnumerable, ID2SYM(rb_intern("Bool")), enum_to_bool);
rb_define_conversion(rb_cString, rb_cInteger, string_to_int);
rb_define_conversion(rb_cClass, rb_mEnumerable, class_to_enum);

One last thing:
The sample implementation does not allow automatic transitive conversions
between arbitrary types. That is, if a path from T1 to T2 exists, and another
one exists from T2 to T3, T1 is still not automatically converted to T3.

This is not allowed because it would make the system more complex and less
predictable, and because there would be an ambiguity whenever two or more
implicit paths connect the same pair of types.
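A sketch of the non-transitive behaviour, using the condensed registry: String => Integer and Integer => :Odd are registered, but String => :Odd is not derived from them. The caller can chain the two steps, or someone can register the composite path explicitly, which keeps the choice of route visible:

```ruby
# Condensed copy of the sample ConvRegistry and Object#to.
class ConvRegistry
  @reg = {}

  def self.[]=(from, to, func)
    @reg[[from, to]] = func
  end

  # Try the class itself first, then each ancestor in lookup order.
  def self.[](from, to)
    from.ancestors.each do |anc|
      func = @reg[[anc, to]]
      return func if func
    end
    raise TypeError, "no conversion for #{from},#{to}"
  end
end

class Object
  def to(target, *args, &blk)
    # Already an instance of the target class/module: nothing to convert.
    return self if target.is_a?(Module) && is_a?(target)
    ConvRegistry[self.class, target].call(self, *args, &blk)
  end
end

ConvRegistry[String, Integer] = proc { |s| Integer(s, 10) }
ConvRegistry[Integer, :Odd] = proc do |n|
  raise TypeError, "#{n} is not odd" unless n.odd?
  n
end

# String -> Integer -> :Odd is not followed automatically...
begin
  '7'.to(:Odd)
rescue TypeError => e
  puts e.message            # prints "no conversion for String,Odd"
end

# ...so the caller chains the two steps explicitly,
p '7'.to(Integer).to(:Odd)  # => 7

# or the composite path is registered by hand, choosing one route on purpose.
ConvRegistry[String, :Odd] = proc { |s| s.to(Integer).to(:Odd) }
p '7'.to(:Odd)              # => 7
```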



class ConvRegistry
@@reg=Hash.new
def self.[] from_type,to_type
if res=@@reg[[from_type,to_type]]
res
elsif res= find_in_ancestors(from_type,to_type)
res
else
raise TypeError.new("no conversion for #{from_type},#{to_type}")

end
end
def self.find_in_ancestors from_type,to_type
from_type.ancestors[1..-1].each do |anc|
if res2= @@reg[[anc,to_type]]
return res2
end
end
nil
end
def self.[]= from_type,to_type,func
@@reg[[from_type,to_type]]=func
end

end

class Object
def to(something,*args,&blk)
if something.is_a? Module and self.is_a? something
return self
end
ConvRegistry[self.class,something].call(self, *args,&blk)
end
end

if __FILE__== $0
require 'test/unit'
require 'stringio'
class MyTest < Test::Unit::TestCase
def setup
ConvRegistry[Enumerable,:Bool]=proc { |x| not x.empty? }
ConvRegistry[String,Integer]= proc { |x,*args| x.to_i(*args) }
ConvRegistry[Class,Enumerable]= proc { |x| x.send :include,Enumerable }
ConvRegistry[Object,Enumerable]= proc {|x| x.send :extend, Enumerable }
ConvRegistry[Object,:Frozen]= proc do |x|
if x.frozen?
x
else
raise TypeError.new
end
end
ConvRegistry[Integer,:Odd]= proc do |x|
if x%2==1
x
else
raise TypeError.new
end
end
ConvRegistry[Object,:Readable]= proc do |x|
if x.respond_to? :read
x
else
raise TypeError.new
end
end
end

def test_class_class
assert_equal '5'.to(Integer),5
end

def test_subclass_class
my=Class.new String
m=my.new '5'
assert_equal m.to(Integer),5
end

def test_class_module
f_class=Class.new(Object)
f_class.class_eval do
def each
yield 1
end
end
f_obj=f_class.to(Enumerable).new
assert_equal f_obj.find_all {|x| x==1}, [1]

end

def test_more_arguments
a='aa'
assert_equal 170,a.to(Integer, 16)
end

def test_instance_module
f_class=Class.new Object
f_class.class_eval do
def each
yield 1
end
end
f_obj=f_class.new
f_obj=f_obj.to Enumerable
assert_equal f_obj.find_all {|x| x==1}, [1]
end

def test_singleton_module
f=Object.new
def f.each
yield 1
end
f=f.to(Enumerable)
assert_equal f.find_all {|x| x==1}, [1]
end

def test_property_pseudoclass_ok
a= 'ciao'
a.freeze
a=a.to(:Frozen)

assert_equal a, 'ciao'
end

def test_property_pseudoclass_fail
assert_raises(TypeError) {'ciao'.to :Frozen}
end

def test_property_pseudoclass_ok2
a=5
a=a.to :Odd
assert_equal a,5
end


def test_property_pseudoclass_fail2
assert_raises(TypeError) {6.to :Odd}
end

def test_object_superclass
a=42
assert_equal a, a.to(Integer)
end

def test_object_mixin_ok
a=[]
assert_equal a,a.to(Enumerable)
end

def test_object_mixin_fail
a=5
assert_raises(TypeError) {a.to Enumerable}
end

def test_methodbag_pseudoclass_ok
a=StringIO.new
assert_equal a,a.to(:Readable)
end

def test_methodbag_pseudoclass_fail
a=24
assert_raises(TypeError) {a.to :Readable}
end

def test_enumerable_bool
a=''
assert_equal false, a.to( :Bool)
a << 1
assert_equal true , a.to( :Bool)
a=[]
assert_equal false, a.to( :Bool)
a << 1
assert_equal true , a.to( :Bool)
end
end
end

Comments Current voting

I don't think ruby needs a general type conversion mechanism, because you don't have static typing that requires many conversions.

I prefer methods to_x b/c they describe precisely which conversions are supported by the developer. You have to write those methods anyways. Put another way, I think Object#to is syntactic sugar that makes a class more confusing.

~ patrick may (sorry I didn't sign originally)


First, I don't think static typing needs many more conversions than Ruby's type system. As for the second part, it is not just syntactic sugar. The to_x convention is nice because it is short, but it leaves the need for to_set, to_enum, Hash[] and Integer(). Not everyone can write single-character names :) and anyway you end up polluting every class with a conversion to everything else. The larger the code base, the larger the name pollution.

Also note that none of to_f, to_set, to_enum, Hash[] and Integer() follow the same convention as the others.

Also, you can't expect DeveloperA to write everything for you, simply because some things don't exist yet when he is writing. Say, Array can't support Set conversions, but set.rb does add Array#to_set. The proposed system just makes this consistent and extensible. Hope this makes things clearer. --gabriele renzi


I am in favor of something like this. I would like to understand it better, without having to overly tax me little brain. :-) If I may, it certainly wouldn't hurt if you found a tag-team partner to develop it with and brought it over to the RcrFoundry to improve (search for on Ruby Garden Wiki).

T.


I see this proposal as a nice way of expressing a large number of conversions. The main thing I am not sold on is whether it is a good idea to convert all classes to each other. Modules and method conventions offer better possibilities for code re-use. For example, I prefer the convention of #each and Enumerable over a possible convention of #to_a (or #to_iterator).

I like that the current convention encourages standard methods (messages) over standard types.

~ patrick may


Patrick, I'm not sure what you are trying to say (I see a number of possibilities, but before I respond to your comment, I want to be sure I'm responding to the right comment). How does using #each provide any more code re-use than what is provided in this RCR?

-- Paul Brannan


Paul,

I don't think this proposal makes sense unless one wants to do lots of class conversions [1] . I don't think conversions to standard types qualifies as 'lots of conversions'. So I ask, why would one do lots of class conversions, especially conversions from one non-standard type to another?

The use case I come up with is class casting -- This method expects a Foo, so I need to convert this object to a Foo. I don't like that, I like this convention better: This method expects a parameter that responds to #foo and #bar.

Type conversion is a smell to me. Type conversions are about changing data. While necessary, IMHO data migrations (even in the small level of in-memory data) lead to hard and thorny problems [2]. The better design is to re-engineer and reduce the type conversions, instead of trying to make it easier to manage many type conversions.

Cheers,

Patrick

p.s. I don't see what's wrong with Hash[] or Integer(). Personally, I've never used them, b/c I use { } or #to_i. If we need method convention for hashes, I think #to_h or #to_hash is fine for me.

1. If the number of class conversions is small, then most invocations of Object#to will be errors. There will be a documentation problem in identifying the valid invocations of Object#to. If most conversions are to standard types, I think #to_s or #to_a is better.

2.


IMO, you do type conversions when you want to handle data in different ways, not really to change it. Say, to_enum exists so that you can handle something as an Enumerable. A CGI->FCGI or CGI->WEBrick converter may allow you to use the same code with different backends. And notice that you need Integer() when you want to discriminate valid data, because 'foo'.to_i gives 0.

Regarding "the other side" of this RCR, let me say I don't think most of the conversions would be needed between standard types, but rather in external code.

My point is that you can't refactor the whole of RAA/RubyForge, so you won't be able to write code based on a mixin available in both rubymail and tmail (or REXML and libxml2, or Cerise::FormHandler and Form::Validator). You would need to write an adaptation layer, and this RCR gives you a simple framework in which to do it. It is nothing strange, just a Grand High Exalted interface system ;)

--gabriele renzi


If there is a standard method that two classes support, and all you want is to make sure you have something of that type, then there should be a way to short-circuit the transformation early. That is, given "Foo < DelegateClass(String)", and

  def bar( anyThing )
    x = anyThing.to_s
    # ...use x
  end

there should be some way to express the fact that ConvRegistry[Foo, String] is just an identity transformation. The extra check in your implementation of Object#to works for classes, but not for types.

Also, #to isn't a very good, descriptive name for something that ends up being a bouncer method/assertion. Sure, it can do such a thing, but I'd rather it only attempt conversion. (x.to(:Even) would return (x/2).round * 2, while x.to(:Odd) would probably just add one if (x%10)<5, and subtract one if (x%10)>5, or something to that effect. However, it would still raise an exception if you fed it a string, or a regexp.)

--Zallus Kanite


About short-circuiting: it would be nice, sure. But how would you do this? The only ways I can think of are:

  1. some kind of discovery based on partial evaluation (à la MrSpidey)
  2. runtime optimization

If there are simpler approaches, I can't think of them. I hope real implementors would write great code to support this; I just provided a sample implementation to show the behaviour :)

About the name, I'm not sure I understand what you're saying: you first state that you don't like #to for a bouncer method, and I understand this (actually, I'm leaning more and more towards #as these days). But then you say it is ok. I'm dumb; could you expand a little, please?

The only thing I get is that you want #as(:Odd) and #as(:Even) to change the object. :Odd and :Even are pseudotypes for which I found it more reasonable to work just like assertions, but you're free to do what you want in your own code.

-- gabriele renzi


Very nice proposal. I think a syntax shortcut for declaring the types of arguments would be nice:

  def foo(arg as aType, arg2 as bType):
    #code

or

  def foo(arg -> aType, arg2 -> bType):
    #code

which would convert each parameter, in order, to the desired type, raising an ArgumentError or TypeError (suggestions?) if the conversion fails. This assumes -> or 'as' would be the type-conversion operator.


- C. Rebert


What about using the ClassName() functions, like Integer() and String(), to facilitate this?


They have an already specified meaning as of now, so changing them could break things.


Strongly opposed 1
Opposed 1
Neutral 1
In favor 5
Strongly advocate 7