ABSTRACT

Add a new “Here Doc” syntax (or change the existing indented one; see below) so that here docs may be indented for code clarity without the additional leading whitespace being included in the string. This is a resubmission of the concepts in with a different proposed syntax.

PROBLEM

Here doc strings cannot be indented in source code without having the indentation included in the string itself. This causes programmers to do one of the following:

a) Unindent the entire Here Doc, making it difficult to visually scan the code.

  class Foo
    def bar
      if whee
        print <<END
Hello World...
   Are you listening?
END
      end
    end
  end

b) Indent the content, using markup within the string to indicate the first column, and processing the string at runtime to strip the markup and the indentation before it:

  class Foo
    def bar
      if whee
        print <<-END.gsub( /^\s*\|/, '' )
          |Hello World...
          |   Are you listening?
        END
      end
    end
  end

c) Indent the content, using more complicated heuristics to determine which whitespace is common or indented:

  class Foo
    def bar
      if whee
        print <<-END.unindent
          Hello World...
             Are you listening?
        END
      end
    end
  end

  class String
    def unindent
      leading_whitespace = self[/\A\s*/]
      self.gsub( /^#{leading_whitespace}/, '' )
    end
  end

As noted above, option ‘a’ results in source code that is difficult to read (and grates on anal-retentive programmers). Option ‘b’ requires the programmer to do extra work to prepare the source string. (The amount of work may be relatively small, however, depending on the text editor the programmer is using.) Option ‘c’ requires runtime effort for something that could be done once, when the source code is parsed.
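
The ‘more complicated heuristics’ of option ‘c’ can go further than the unindent method shown above, which trusts the indentation of the first line only. As a rough sketch (unindent_common is a hypothetical helper name, not part of Ruby or of this proposal), the following strips just the indentation shared by every non-blank line, counting each space or tab as one character:

```ruby
# Hypothetical helper, for illustration only: strips the indentation that
# all non-blank lines have in common, instead of trusting the first line.
# Assumption: each space or tab counts as one character of indentation.
def unindent_common( text )
  # Width of the leading whitespace of every non-blank line
  indents = text.lines.reject { |l| l.strip.empty? }.map { |l| l[/\A[ \t]*/].length }
  width = indents.min || 0
  # Remove at most `width` leading whitespace characters from each line
  text.gsub( /^[ \t]{0,#{width}}/, '' )
end

doc = "        Indented first line\n    Hello World...\n       Are you listening?\n"
p unindent_common( doc )
#=> "    Indented first line\nHello World...\n   Are you listening?\n"
```

Like the other runtime approaches, this still does its work every time the code is evaluated, which is the cost the proposal below tries to avoid.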

Related to this problem is the fact that Ruby already includes a Here Doc syntax (<<-END) that allows the user to indent the END marker; this is of limited utility, however, when the text block itself cannot be indented without changing the string.

PROPOSAL

I propose that the core interpreter be changed so that indented Here Docs are parsed without the leading indentation. I propose two alternatives for the syntax to support this feature:

Option 1 – New Indent-Stripping Here Doc Syntax

  class Foo
    def bar
      if whee
        print <<+END
        Hello World...
           Are you listening?
        END
      end
    end
  end

Option 2 – Modify the Behavior of the Existing Indented Here Doc Syntax

  class Foo
    def bar
      if whee
        print <<-END
        Hello World...
           Are you listening?
        END
      end
    end
  end

The behavior for both would be that the whitespace preceding the indented END token is removed from the start of every line of the string. Additional leading whitespace is preserved for lines that are additionally indented. Any line that does not begin with the exact sequence of characters that precedes the END token is left unaffected. (This includes lines that are indented less than the END token, and lines which use tabs for indentation where the END token uses spaces, or vice versa.)
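
The rule can be emulated at runtime to make its edge cases concrete. This is only a sketch: strip_marker_indent is a hypothetical name, and marker_indent stands for the exact characters preceding the END token, which in the real proposal only the parser would know and which is passed in by hand here.

```ruby
# Hypothetical runtime emulation of the proposed parse-time rule.
# `marker_indent` is the exact character sequence preceding the END token
# (an assumption for illustration; the parser would supply it).
def strip_marker_indent( body, marker_indent )
  body.lines.map { |line|
    # Only lines starting with the exact same characters are shortened
    line.start_with?( marker_indent ) ? line[ marker_indent.length..-1 ] : line
  }.join
end

four_spaces = ' ' * 4
p strip_marker_indent( "      A\n      C\n", four_spaces )  #=> "  A\n  C\n"  (extra indentation kept)
p strip_marker_indent( "    A\n  B\n", four_spaces )        #=> "A\n  B\n"    (less-indented line untouched)
p strip_marker_indent( "    A\n\tB\n", four_spaces )        #=> "A\n\tB\n"    (tab-indented line untouched)
```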

Option 1 is a convenient solution because any old code using the indented Here Doc notation will be unaffected. However, Option 2 is better (IMO) because it makes useful the largely useless indented Here Doc notation.

The following shows the results of a few lines of code before and after this RCR. These examples use syntax option #2 from above, but either option would work.

  if true
    p <<-END
      A
        B
      C
      END
#=> "      A\n        B\n      C\n"      Before RCR
#=> "A\n  B\nC\n"                        After RCR

    p <<-END
      A
    B
      END
#=> "      A\n    B\n"                   Before RCR
#=> "A\n    B\n"                         After RCR

    p <<-END
      A
    	C
      END
#=> "      A\n    \tC\n"                 Before RCR
#=> "A\n    \tC\n"                       After RCR (the C line starts with four spaces and a tab, so it is left as-is)

    p <<-END
      A
      C
    END
#=> "      A\n      C\n"                 Before RCR
#=> "  A\n  C\n"                         After RCR

    p <<-END
A
  B
C
    END
#=> "A\n  B\nC\n"                        Before RCR
#=> "A\n  B\nC\n"                        After RCR
  end

ANALYSIS

Pros:

Putting this functionality in the core gives access to source code formatting information not available to the scripter: the parser can look at the indentation level of the END marker and use it to make an intelligent change to the indentation of the string.
If Option #2 above is chosen, it makes the indented Here Doc syntax more useful.

Cons:

Removing only the exact indentation level of the END marker means that the END marker cannot be outdented, ‘enclosing’ the string content. (However, as shown in several of the above examples, the entire string may be indented along with the END marker.)
If Option #2 above is chosen, some old code may break. I suspect this will not be very common, however. Unindented content with an indented END tag will be unaffected. Only content indented as much as (or more than) the END marker will be affected; such code is likely already post-processing the string to remove indentation, and I suppose that much of this post-processing will be robust enough to be unaffected. Only code that ‘guesses’ at the indentation level based on the indented string may be affected. I’m guessing at all of this, however.
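
To illustrate the compatibility guess in the last point, here is a small sketch. The two strings are written out by hand to represent what one existing <<-END here doc (content indented 10 spaces, END marker indented 8) would yield before and after Option 2. The strip_flexible and strip_hardcoded methods are hypothetical examples of post-processing, the first in the style of option ‘b’ and the second a deliberately brittle variant that assumes a fixed width.

```ruby
# Hand-written stand-ins for the same here doc parsed before and after
# Option 2 (assumed layout: content at 10 spaces, END marker at 8).
BEFORE_RCR = "          |Hello World...\n          |   Are you listening?\n"
AFTER_RCR  = "  |Hello World...\n  |   Are you listening?\n"

# Post-processing in the style of option 'b': accepts any indentation.
def strip_flexible( s )
  s.gsub( /^\s*\|/, '' )
end

# Hypothetical brittle variant: hard-codes the old indentation width.
def strip_hardcoded( s )
  s.gsub( /^ {10}\|/, '' )
end

p strip_flexible( BEFORE_RCR ) == strip_flexible( AFTER_RCR )    #=> true  (unaffected)
p strip_hardcoded( BEFORE_RCR ) == strip_hardcoded( AFTER_RCR )  #=> false (would break)
```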

IMPLEMENTATION

This change should be in the parser/interpreter, not the Ruby runtime. For reference, however, here’s some Ruby pseudo-code for the processing that should be applied to here docs using either syntax above.

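  # leading_whitespace is the exact whitespace preceding the END marker;
  # only the parser knows it, so it is passed in here for illustration.
  def post_process_indented_heredoc( here_doc_string, leading_whitespace )
    # gsub rather than gsub!, which returns nil when nothing is replaced
    here_doc_string.gsub( /^#{Regexp.escape( leading_whitespace )}/, '' )
  end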

Here Doc syntax that strips indenting whitespace (#2)