Expand proc in Marshal.load(source, proc) to be a filter instead of a monitor (#6)

Submitted by: Rutger Nijlunsing

This revision by: Rutger Nijlunsing

Date: Sat Mar 03 17:44:20 -0500 2007

ABSTRACT

Marshal.load() accepts a proc that can monitor the objects it creates, but the proc cannot change those objects on the fly. Allowing it to do so would be convenient for optimizing marshalling in both time and space.

PROBLEM

I’ve got an object graph to Marshal to a file. It contains a lot of duplicate objects (Strings with the same value, Arrays with the same Strings, etc.) because those objects were created by parsing structured files with a lot of fields containing the same kind of information. Objects with the same value can be shared, as long as they are not changed in place. Sharing saves marshalling time (which is the bottleneck I’m working on) and space (both disk and RAM), because fewer objects have to be marshalled.

PROPOSAL

Use the currently unused return value of the proc as the new object. This way the proc, which today can only monitor, becomes a filter when required.

So when implemented, ‘optimizing’ an object DAG to collapse multiple objects containing the same value would become:

objects_in_dag = {}
Marshal.load(some_io, Proc.new { |obj| objects_in_dag[obj] ||= obj })

In this case, the approach relies on having a good hash function, which is already implemented for most basic structures like Array, Hash (see accepted RCR#344), String and so on.
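To make the effect concrete, here is a minimal sketch of the collapsing filter applied to a small graph. It assumes an interpreter in which Marshal.load uses the proc's return value as the loaded object, which is the behaviour this RCR proposes; the helper name load_deduplicated and the sample data are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper (not part of Marshal): load marshalled data and replace
# every object by the first equal object seen so far.
# Assumes Marshal.load uses the proc's return value as the loaded object.
def load_deduplicated(data)
  seen = {}
  Marshal.load(data, proc { |obj| seen[obj] ||= obj })
end

# Three rows with equal contents, but each row and each String is a distinct object.
rows = Array.new(3) { ["GET", "/index.html"] }
data = Marshal.dump(rows)

plain  = Marshal.load(data)        # no proc: every duplicate is rebuilt separately
shared = load_deduplicated(data)   # filter proc: equal objects are collapsed

puts "values preserved: #{shared == rows}"
puts "rows shared without filter: #{plain[0].equal?(plain[1])}"
puts "rows shared with filter:    #{shared[0].equal?(shared[1])}"
```

Because the collapsed objects are shared, this is only safe when none of them is later modified in place, as noted in the PROBLEM section.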

An alternative interface would keep the proc argument as the monitor and use a given block as the filter:

objects_in_dag = {}
Marshal.load(some_io) { |obj| objects_in_dag[obj] ||= obj }
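The split between monitor and filter can be sketched as a wrapper around Marshal.load. This is only an emulation under stated assumptions: load_with_filter is a hypothetical name, not an existing Marshal method, and it again assumes that the proc's return value is used as the loaded object.

```ruby
# Hypothetical wrapper, not part of Marshal: the optional proc only observes,
# while the block decides which object ends up in the loaded graph.
# Assumes Marshal.load uses the proc's return value as the loaded object.
def load_with_filter(source, monitor = nil, &filter)
  Marshal.load(source, proc { |obj|
    monitor.call(obj) if monitor        # return value of the monitor is ignored
    filter ? filter.call(obj) : obj     # without a block, objects pass through unchanged
  })
end

seen = {}
visited_classes = []
data = Marshal.dump([["x"], ["x"]])

result = load_with_filter(data, proc { |obj| visited_classes << obj.class }) do |obj|
  seen[obj] ||= obj
end

puts "inner arrays shared: #{result[0].equal?(result[1])}"
puts "classes seen by the monitor: #{visited_classes.uniq.inspect}"
```

Keeping the two roles apart means existing monitor procs whose return value is meaningless would continue to work unchanged.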

ANALYSIS

Marshal.dump() and Marshal.load() are already used to make deep copies of object graphs. In a way, the current mechanism of supplying a Proc is already an object walker that visits arbitrary graphs. It would be useful to extend this walking so that it can also change the graph.
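Both existing uses can be shown in a few lines. The sketch below uses the dump/load round trip as a deep copy and a proc as a read-only walker; the helper names deep_copy and strings_in are hypothetical. The walker returns each object unchanged so that it behaves the same whether or not the interpreter uses the proc's return value.

```ruby
# Hypothetical helper: deep-copy an object graph via a dump/load round trip.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Hypothetical helper: walk a graph and collect every String that is visited.
def strings_in(obj)
  found = []
  Marshal.load(Marshal.dump(obj), proc { |o|
    found << o if o.is_a?(String)
    o  # hand the object back unchanged: this proc only monitors
  })
  found
end

original = { "names" => ["a", "b"] }
copy = deep_copy(original)
copy["names"] << "c"   # changes only the copy

puts original.inspect
puts copy.inspect
puts strings_in(original).sort.inspect
```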

This change is not backward compatible. However, I have not seen many users pass a Proc to Marshal.load(), so I expect the negative impact to be minimal.

IMPLEMENTATION

In marshal.c around line 1280 (ruby 1.9, 20060609), change:

if (proc) {
    rb_funcall(proc, rb_intern("call"), 1, v);
}

into:

if (proc) {
    v = rb_funcall(proc, rb_intern("call"), 1, v);
}


Copyright © 2006, Ruby Power and Light, LLC