Here Doc syntax that strips indenting whitespace (#2)

Submitted by: Gavin Kistner

This revision by: Gavin Kistner

Date: Sat Nov 25 13:48:53 -0500 2006

ABSTRACT

Add a new “Here Doc” syntax (or change the existing indented one; see below) so that here docs may be indented for code clarity without the additional leading whitespace being included in the string. This is a resubmission of the concepts in an earlier RCR, with a different proposed syntax.

PROBLEM

Here doc strings cannot be indented in source code without having the indentation included in the string itself. This causes programmers to do one of the following:

a) Unindent the entire Here Doc, making it difficult to visually scan the code.

  class Foo
    def bar
      if whee
        print <<END
Hello World...
   Are you listening?
END
      end
    end
  end

b) Indent the content, using markup within the string to indicate the first column, and processing the string at runtime to strip the content:

  class Foo
    def bar
      if whee
        print <<-END.gsub( /^\s*\|/, '' )
          |Hello World...
          |   Are you listening?
        END
      end
    end
  end

c) Indent the content, using more complicated heuristics to determine which whitespace is common or indented:

  class Foo
    def bar
      if whee
        print <<-END.unindent
          Hello World...
             Are you listening?
        END
      end
    end
  end

  class String
    def unindent
      leading_whitespace = self[/\A\s*/]
      self.gsub( /^#{leading_whitespace}/, '' )
    end
  end

As noted above, option ‘a’ results in source code that is difficult to read (and grates on anal-retentive programmers). Option ‘b’ requires the programmer to do extra work to prepare the source string, although the amount of work may be small, depending on the text editor in use. Option ‘c’ requires runtime effort for something that could be done just once, when the source code is parsed.
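The helper shown for option ‘c’ takes its cue from the first line only, so when the first line is indented more deeply than a later one, the relative indentation between the lines is lost. A slightly more careful heuristic, sketched here, strips the smallest indentation shared by all non-blank lines. The name strip_common_indent is made up for illustration, and the sketch assumes the indentation uses one kind of whitespace throughout (a tab and a space each count as one character).

```ruby
# Illustrative helper; the name is hypothetical, not part of Ruby's String API.
def strip_common_indent( text )
  # Indentation width (spaces or tabs) of every non-blank line.
  non_blank = text.lines.reject { |line| line.strip.empty? }
  widths    = non_blank.map { |line| line[/\A[ \t]*/].length }
  smallest  = widths.min || 0
  # Remove up to `smallest` leading spaces/tabs from the start of each line.
  text.gsub( /^[ \t]{0,#{smallest}}/, '' )
end

sample = "      Hello World...\n         Are you listening?\n"
puts strip_common_indent( sample )
# Prints:
# Hello World...
#    Are you listening?
```

This is still runtime work, which is the objection raised against option ‘c’; it only makes the heuristic less sensitive to the first line.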

Related to this problem is the fact that Ruby includes a Here Doc syntax (<<-) that allows the user to indent the END marker; this is of limited utility, however, when the text block itself cannot be indented without changing the string.
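A small sketch of that limitation, using the existing syntax: the terminator may be indented, but every line of the body keeps its source indentation. The method name indented_here_doc is made up for this example.

```ruby
# Hypothetical method, used only to give the here doc some surrounding indentation.
def indented_here_doc
  <<-END
    Hello World...
       Are you listening?
  END
end

p indented_here_doc
# => "    Hello World...\n       Are you listening?\n"
```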

PROPOSAL

I propose that the core interpreter be changed so that indented Here Docs are parsed without the leading indentation. I propose two alternatives for the syntax to support this feature:

Option 1 – New Indent-Stripping Here Doc Syntax

  class Foo
    def bar
      if whee
        print <<+END
        Hello World...
           Are you listening?
        END
      end
    end
  end

Option 2 – Modify the Behavior of the Existing Indented Here Doc Syntax

  class Foo
    def bar
      if whee
        print <<-END
        Hello World...
           Are you listening?
        END
      end
    end
  end

The behavior for both would be that the whitespace preceding the indented END token is removed from the start of every line of the string. Additional leading whitespace is preserved for lines that are indented further. Any line that does not begin with the exact sequence of characters that precedes the END token is left unaffected. (This includes lines that are indented less than the END token, lines that use tabs for indentation where the END line uses spaces, and vice versa.)
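The rule can be sketched line by line in plain Ruby. This is only an illustration of the intended result, not a suggestion that the work happen at runtime; strip_end_indent and its end_indent argument (the characters found in front of the END token) are hypothetical names.

```ruby
# Hypothetical helper illustrating the proposed rule.
# end_indent: the exact characters that precede the END token.
def strip_end_indent( body, end_indent )
  body.lines.map { |line|
    # Strip only an exact prefix match; leave every other line untouched.
    line.start_with?( end_indent ) ? line[ end_indent.length..-1 ] : line
  }.join
end

four_spaces = "    "
p strip_end_indent( "    A\n      B\n", four_spaces )  # => "A\n  B\n"
p strip_end_indent( "    A\n  B\n",     four_spaces )  # => "A\n  B\n"  (B indented less: untouched)
p strip_end_indent( "    A\n\tB\n",     four_spaces )  # => "A\n\tB\n"  (tab, not spaces: untouched)
```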

Option 1 is a convenient solution because any old code using the indented Here Doc notation will be unaffected. However, Option 2 is better (IMO) because it makes useful the largely useless indented Here Doc notation.

The following shows the results of a few lines of code before and after this RCR. These examples use syntax Option 2 from above, but either would work.

  if true
    p <<-END
      A
        B
      C
      END
#=> "      A\n        B\n      C\n"      Before RCR
#=> "A\n  B\nC\n"                        After RCR

    p <<-END
      A
    B
      END
#=> "      A\n    B\n"                   Before RCR
#=> "A\n    B\n"                         After RCR

# (In the next example the C line is indented with four spaces and a tab.)
    p <<-END
      A
    	C
      END
#=> "      A\n    \tC\n"                 Before RCR
#=> "A\n    \tC\n"                       After RCR

    p <<-END
      A
      C
    END
#=> "      A\n      C\n"                 Before RCR
#=> "  A\n  C\n"                         After RCR

    p <<-END
A
  B
C
    END
#=> "A\n  B\nC\n"                        Before RCR
#=> "A\n  B\nC\n"                        After RCR
  end

ANALYSIS

Pros:

Cons:

IMPLEMENTATION

This change should be made in the parser/interpreter, not in the Ruby runtime. For reference, however, here’s some Ruby pseudo-code showing the processing that should be applied to here docs using either syntax above.

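  def post_process_indented_heredoc( here_doc_string, leading_whitespace )
    # leading_whitespace: the exact characters preceding the END marker,
    # which the parser knows once it has found the terminator line.
    # Use gsub (not gsub!, which returns nil when nothing is replaced).
    here_doc_string.gsub( /^#{Regexp.escape( leading_whitespace )}/, '' )
  end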


Copyright © 2006, Ruby Power and Light, LLC