
Make it so that the DATA pseudo-file can refer to the DATA section of the file that is currently being read, not just the DATA section of $0.

In Perl, the <DATA> file refers to the current file, whereas in Ruby DATA refers exclusively to the DATA section in $0. This means that libraries that have related data must either embed the data some other way or find a separate location for it (which has its own problems on installation).
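The limitation can be seen with a small experiment. This is only a sketch: the file name lib_with_data.rb and the global $lib_sees_data are made up for illustration, and it assumes the script running it has no __END__ section of its own. In that case Ruby defines no DATA constant at all, so the library has no way to reach its own section.

```ruby
require "tmpdir"

# Write a throwaway library (hypothetical name) that carries its own DATA
# section, then load it and record whether it can see a DATA constant.
Dir.mktmpdir do |dir|
  lib = File.join(dir, "lib_with_data.rb")
  File.write(lib, "$lib_sees_data = defined?(DATA) ? true : false\n" \
                  "__END__\nlibrary payload\n")
  require lib
end

puts "library could reach a DATA constant: #{$lib_sees_data}"
```

If the main script did have an __END__ section, the library would see DATA, but it would be the main script's section rather than its own.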

Create a new default class, ENDData, that will provide the DATA section of any file that has been required or explicitly requested. The class itself holds a cache hash of StringIO objects, each representing the DATA section of one file. The Ruby interpreter would generate the StringIO objects automatically when it encounters an __END__ marker in any file.

The question here is mostly one of performance. ENDData can be implemented purely in Ruby, but because the Ruby interpreter already has to handle __END__ markers in source files, letting the interpreter populate the cache would make the data available for required libraries without reading each file a second time. A drawback to this is that it keeps the data in memory even if it is not used. This could be fixed by the 'require' method calling ENDData.clear_cache(required_file) at the end of the require execution.
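The lifecycle described above could look roughly like the following. This is a sketch under assumptions, not the interpreter change itself: DataSectionCache and require_with_data are hypothetical names standing in for the proposed class and for a modified 'require', and the cache is filled by re-reading the file because plain Ruby cannot hook into the parser.

```ruby
require "stringio"

# Hypothetical stand-in for the proposed class: a cache keyed by file name.
module DataSectionCache
  @cache = {}

  class << self
    # Caches the text after the first __END__ line of path, if there is one.
    def store(path)
      text = File.read(path)[/^__END__\r?\n(.*)\z/m, 1]
      @cache[path] = StringIO.new(text) if text
    end

    def for_file(path)
      @cache[path]
    end

    def clear_cache(path)
      @cache.delete(path)
    end
  end
end

# Hypothetical wrapper around require: the section is cached only while the
# library's top-level code runs and is released afterwards, even on error.
def require_with_data(path)
  DataSectionCache.store(path)
  require path
ensure
  DataSectionCache.clear_cache(path)
end
```

With this arrangement a library has to consume its data while it is being loaded; anything it wants to keep must be copied out before 'require' returns.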

A simplistic implementation of ENDData follows:

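  require 'stringio'

  class ENDData
    class << self
      # Returns an IO-like object holding the DATA section of fn.
      def for_file(fn = $0)
        # The main script's section is already exposed as DATA, if it has one.
        return DATA if fn == $0 && defined?(DATA)
        @cache ||= {}
        return @cache[fn] if @cache[fn]
        # Capture everything after the first line that is exactly __END__.
        str = File.read(fn)[/^__END__\r?\n(.*)\z/m, 1]
        raise(IOError, "#{fn} doesn't have any DATA") unless str
        @cache[fn] = StringIO.new(str)
      end

      # Drops the cached section so that its memory can be reclaimed.
      def clear_cache(fn)
        (@cache ||= {}).delete(fn)
      end
    end
  end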

----

Why a StringIO object instead of a File object?  (Cf. last night's discussion
on #ruby-lang :-)

David Black
----

Most operating systems have a limit on the number of filehandles that can be open at any given time, or can restrict it on a per-user basis. If these filehandles are consumed by extensive use of unclosed DATA sections, that limit could be reached quickly. The likelihood of this is increased because I don't think that most people will be used to doing DATA.close.

I'm not opposed to ENDData returning filehandles, but I don't think that most of the data will be large enough to warrant the use of a filehandle for each one, because much of the data would already be loaded.
----

I think a StringIO solution is the simplest one, but an alternate solution would be to delegate to a File that opens the file the first time it is used.

-- Paul Brannan
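A lazily opened delegate along those lines might look like this. It is a minimal sketch, assuming a hypothetical class name LazyData and forwarding through method_missing; no filehandle is consumed until the first read.

```ruby
# Hypothetical LazyData: forwards File methods to a handle positioned just
# after the __END__ line, but only opens the file on first use.
class LazyData
  def initialize(path)
    @path = path
    @file = nil
  end

  # True once a real filehandle has been opened.
  def opened?
    !@file.nil?
  end

  def method_missing(name, *args, &block)
    return super unless File.public_method_defined?(name)
    file.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    File.public_method_defined?(name) || super
  end

  private

  def file
    @file ||= open_at_data
  end

  # Opens the file and skips ahead to the line after the __END__ marker.
  def open_at_data
    handle = File.open(@path, "r")
    handle.each_line { |line| return handle if line.chomp == "__END__" }
    handle.close
    raise IOError, "#{@path} doesn't have any DATA"
  end
end
```

A caller would write something like LazyData.new(path).gets; the handle still has to be closed explicitly once it has been opened.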

I second that.  DATA sections could be really large and thus they should not be kept in memory.  IMHO when accessing the data it could be a normal filehandle that overrides the seek operations in order to prevent setting the current position to something before the section tag.

-- Robert Klemme
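One way to read that suggestion is a File subclass whose seek operations are clamped to the start of the section. The sketch below assumes a hypothetical class name DataFile, keeps absolute file offsets, and silently clamps instead of raising; both of those are design choices made for illustration, not part of the proposal.

```ruby
# Hypothetical DataFile: a real filehandle that cannot be positioned before
# the first byte of the DATA section.
class DataFile < File
  attr_reader :data_start

  # Opens path and leaves the handle just after the __END__ line.
  def self.open_data(path)
    file = new(path, "rb")
    file.each_line do |line|
      next unless line.chomp == "__END__"
      file.instance_variable_set(:@data_start, file.pos)
      return file
    end
    file.close
    raise IOError, "#{path} doesn't have any DATA"
  end

  # Resolves the requested position, then clamps it to the section start.
  def seek(offset, whence = IO::SEEK_SET)
    target = case whence
             when IO::SEEK_SET, :SET then offset
             when IO::SEEK_CUR, :CUR then pos + offset
             when IO::SEEK_END, :END then size + offset
             else raise ArgumentError, "unknown whence: #{whence.inspect}"
             end
    super([target, @data_start].max, IO::SEEK_SET)
  end

  def pos=(offset)
    seek(offset)
  end

  # Rewinds to the start of the DATA section rather than the start of the file.
  def rewind
    seek(@data_start)
  end
end
```

Because the data is read from disk on demand, nothing is held in memory beyond the usual IO buffer, at the cost of one open filehandle per section in use.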

The data is in memory to some degree already, as Ruby has to have the file in memory at least temporarily.

-- Austin Ziegler


RCR 213: Extended Access to the DATA Pseudo-File

submitted by austin on Wed Feb 11 2004 08:00:46 PM -0800

Status: pending

Abstract

Problem

Proposal

Analysis

Implementation