
RCR 213: Extended Access to the DATA Pseudo-File

Submitted by austin (Thu Feb 12 01:02:46 UTC 2004)

Abstract

Make it so that the DATA pseudo-file can refer to the DATA section of the file that is currently being read, not just the DATA section of $0.

Problem

In Perl, the <DATA> filehandle refers to the data section of the current file, whereas in Ruby DATA refers exclusively to the DATA section of $0. This means that libraries with related data must either embed that data some other way or keep it in a separate location (which has its own problems on installation).
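
The behaviour described above can be reproduced with a small sketch. The file names lib.rb and main.rb and the method names are hypothetical; the script writes both files to a temporary directory, runs the main one, and returns what it printed.

```ruby
require 'tmpdir'
require 'rbconfig'

# Writes a hypothetical library and main script to a temporary directory,
# runs the main script in a child interpreter, and returns its output.
def data_demo_output
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'lib.rb'), <<~'RUBY')
      def lib_data
        DATA.read # the library author would like "library data" here
      end
      __END__
      library data
    RUBY

    File.write(File.join(dir, 'main.rb'), <<~'RUBY')
      require_relative 'lib'
      print lib_data
      __END__
      main data
    RUBY

    IO.popen([RbConfig.ruby, File.join(dir, 'main.rb')], &:read)
  end
end

puts data_demo_output # prints "main data", not "library data"
```

The library's own section is never reachable through DATA, which is the gap this RCR aims to close.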

Proposal

Create a new default class, ENDData, that will provide the DATA section of any file that has been required or explicitly requested. The class itself holds a cache hash of StringIO objects, each representing the DATA section of the file in question.

The Ruby interpreter would generate the StringIO objects automatically when it encounters an __END__ marker in any file.

Analysis

The problem here is mostly one of performance. ENDData can be implemented purely in Ruby, but since the Ruby interpreter already has to handle __END__ markers in source files, having it capture the data at that point would make the data available to required libraries without a second read of each file.

Implementation

A simplistic implementation of ENDData follows:

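  require 'stringio'

  class ENDData
    class << self
      # Returns an IO-like object for the DATA section of fn.
      def for_file(fn = $0)
        return DATA if fn == $0
        return @cache[fn] if (@cache ||= {})[fn]
        # Capture everything after the first line that is exactly __END__.
        str = File.read(fn)[/^__END__\n(.*)\z/m, 1] rescue nil
        raise(IOError, "#{fn} doesn't have any DATA") unless str
        @cache[fn] = StringIO.new(str)
      end
      # Forgets the cached StringIO for fn so it is re-read on the next request.
      def clear_cache(fn)
        (@cache ||= {})[fn] = nil
      end
    end
  end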

  
Comments

Why a StringIO object instead of a File object? (Cf. last night's discussion on #ruby-lang :-)

-- David Black


Most operating systems have a limit on the number of filehandles that can be open at any given time, or can restrict it on a per-user basis. If these filehandles are consumed by extensive use of unclosed DATA sections, that limit could be reached quickly. The likelihood of this is increased because I don't think that most people will be used to doing DATA.close.

I'm not opposed to ENDData returning filehandles, but I don't think that most of the data will be large enough to warrant the use of a filehandle for each one, because much of the data would be already loaded.


I think a StringIO solution is the simplest one, but an alternate solution would be to delegate to a File that opens the file the first time it is used.

-- Paul Brannan
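
A minimal sketch of the lazily opened File idea, under stated assumptions: the class name LazyEndData and its opened? helper are hypothetical and not part of the RCR, and the __END__ marker is assumed to be a line containing exactly that text. No filehandle is held until the data is first used.

```ruby
# LazyEndData is a hypothetical name used only for this sketch.
class LazyEndData
  def initialize(path)
    @path = path
    @io = nil # no filehandle is consumed until the data is first used
  end

  # True once the underlying File has been opened.
  def opened?
    !@io.nil?
  end

  # Releases the filehandle; a later call reopens the file at the data start.
  def close
    @io&.close
    @io = nil
  end

  def respond_to_missing?(name, include_private = false)
    File.public_method_defined?(name) || super
  end

  # Forwards any public File method (read, each_line, ...) to the File,
  # opening it on first use.
  def method_missing(name, *args, &block)
    return super unless File.public_method_defined?(name)
    io.public_send(name, *args, &block)
  end

  private

  def io
    @io ||= open_at_data
  end

  # Opens the file and advances past the __END__ line.
  def open_at_data
    file = File.open(@path)
    while (line = file.gets)
      return file if line.chomp == '__END__'
    end
    file.close
    raise IOError, "#{@path} doesn't have any DATA"
  end
end
```

A library could then hold LazyEndData.new(__FILE__) in a constant and only pay for a filehandle when something actually reads from it.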

I second that. DATA sections could be really large and thus they should not be kept in memory. IMHO when accessing the data it could be a normal filehandle that overrides the seek operations in order to prevent setting the current position to something before the section tag.

-- Robert Klemme
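
One way the seek restriction could look, as a rough sketch only: DataSectionFile, open_data, and mark_data_start are hypothetical names, positions are kept as absolute file offsets, and any attempt to move before the data is clamped to the first byte after the __END__ line.

```ruby
# DataSectionFile is a hypothetical name used only for this sketch.
class DataSectionFile < File
  # Opens path and positions the handle just after the __END__ line.
  def self.open_data(path)
    file = new(path)
    while (line = file.gets)
      if line.chomp == '__END__'
        file.mark_data_start
        return file
      end
    end
    file.close
    raise IOError, "#{path} doesn't have any DATA"
  end

  # Records the current position as the lowest position the handle may take.
  def mark_data_start
    @data_start = pos
  end

  # Seeks as usual, then moves forward to the data start if the result
  # landed before it.
  def seek(offset, whence = IO::SEEK_SET)
    result = super
    super(@data_start, IO::SEEK_SET) if @data_start && pos < @data_start
    result
  end

  # Clamps absolute positions to the data start.
  def pos=(offset)
    super([offset, @data_start.to_i].max)
  end

  # Rewinds to the start of the data rather than the start of the file.
  def rewind
    super
    seek(@data_start.to_i)
    0
  end
end
```

Clamping silently is only one possible policy; raising an error on an out-of-range seek would be an equally reasonable choice.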

The data is in memory to some degree already, as Ruby has to have the file in memory at least temporarily.

-- Austin Ziegler


Whether or not it is useful (I personally am in favor), if a DATA section is available at all, it is reasonable that it should be specific to each file. I madly love Ruby's POLS; however, at present, if I call DATA.read in a required script, the DATA section of the main script ($0) is read. That is somewhat surprising.


The above comment was written by me.

-- Gyoung-Yoon Noh


Current voting

Strongly opposed 0
Opposed 1
Neutral 0
In favor 6
Strongly advocate 0