RCR 281: Add String#chars

Submitted by flgr (Sat Oct 02 09:10:18 UTC 2004)


str.split(//) is not something you can understand intuitively. Therefore str.chars should be added to standard Ruby.


You have to know about Regexps and how empty Regexps get handled by .split in order to know that this returns an Array of characters. For newcomers, code involving this is not easy to understand, and even if you know what it does, it is still uglier than str.chars.
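A minimal sketch of the idiom in question, for readers who have not met it. The helper name chars_via_split is hypothetical and exists only for this illustration; it is not part of the proposal.

```ruby
# Hypothetical helper, not part of the RCR: wraps the idiom under discussion.
def chars_via_split(str)
  # // is an empty Regexp. It matches at every position between characters,
  # so split cuts the string into one-character pieces.
  str.split(//)
end

p chars_via_split("ruby")  # prints ["r", "u", "b", "y"]
p chars_via_split("")      # prints []
```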


Add String#chars. Ideally it would be slightly more efficient than str.split(//), since it would not need to invoke the Regexp engine.


There's also str.scan(/./) which is slightly more readable than str.split(//), but I think that the characters of a String are needed frequently enough (especially with .map, .inject and .each) that one shouldn't need to use a Regexp to accomplish this.
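One detail worth knowing when choosing between the two idioms: without the /m option, . does not match a newline, so str.scan(/./) and str.split(//) can return different results. The sketch below shows this, followed by the kind of .map and .inject calls mentioned above. All helper names are hypothetical.

```ruby
# Hypothetical helpers for comparing the two Regexp-based idioms.
def chars_via_split(str)
  str.split(//)
end

def chars_via_scan(str)
  # Without the /m option, . does not match "\n", so newlines are dropped.
  str.scan(/./)
end

def chars_via_scan_multiline(str)
  # With /m, . matches newlines too, which brings scan in line with split(//).
  str.scan(/./m)
end

text = "ab\ncd"
p chars_via_split(text)           # prints ["a", "b", "\n", "c", "d"]
p chars_via_scan(text)            # prints ["a", "b", "c", "d"]
p chars_via_scan_multiline(text)  # prints ["a", "b", "\n", "c", "d"]

# Typical follow-up calls on the resulting Array:
p chars_via_split("abc").map { |c| c.upcase }                 # prints ["A", "B", "C"]
p chars_via_split("123").inject(0) { |sum, c| sum + c.to_i }  # prints 6
```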


Here is a Ruby implementation. It doesn't do anything about performance, though. (Building a character array via str[i, 1] from Ruby would be way slower than just using the Regexp engine.)
class String
  def chars
    split(//)  # assumed: the body is cut off in this copy; the text says it relies on the Regexp engine
  end
end
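The parenthetical above mentions building the Array via str[i, 1]. A minimal sketch of that alternative follows, written as a standalone helper (chars_by_index is a hypothetical name) so that it does not touch String itself. It makes no claim about speed on any particular interpreter.

```ruby
# Hypothetical helper: builds the Array one character at a time by index.
def chars_by_index(str)
  result = []
  # str[i, 1] returns the one-element substring starting at position i.
  # On the Ruby 1.8 interpreters of the time this indexes bytes;
  # on Ruby 1.9 and later it indexes characters.
  0.upto(str.length - 1) { |i| result << str[i, 1] }
  result
end

p chars_by_index("ruby")  # prints ["r", "u", "b", "y"]
```

Timing this against str.split(//) with the standard benchmark library would be one way to check the performance remark on a given interpreter.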
Comments

"explode" would be a better name. It's been called that in other languages.

-- Unknown

At least in PHP that is an alias for .split -- I'm not sure if we should use that name for something that returns the characters of a String as an Array...

-- Florian Gross

I wonder: does this work with various encodings? Is Oniguruma actually able to understand what a character is?

-- gabriele renzi

As far as I know, the old regexp engine was already able to recognize UTF-8 characters via the /u option (but with no intelligent handling of combining characters or upper-to-lowercase conversion). I think there are also options for various Asian encodings.

So I would find it likely for Oniguruma to also support it. :)

-- Florian Gross
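For readers on newer interpreters, here is a small sketch of how the Regexp-based idioms treat a multibyte string. It reflects Ruby 1.9 and later, where every String carries an encoding; it is not a claim about the 1.8-era engine or about Oniguruma as it stood when this thread was written. The helper names are hypothetical.

```ruby
# Hypothetical helpers wrapping the two idioms from the discussion.
def chars_via_split(str)
  str.split(//)
end

def chars_via_scan_u(str)
  # /u asks the Regexp engine to treat the pattern as UTF-8.
  str.scan(/./u)
end

# "\u00e9" is e-acute, which occupies two bytes in UTF-8.
word = "n\u00e9e"

p word.bytesize                   # prints 4
p chars_via_split(word).length    # prints 3
p chars_via_scan_u(word).length   # prints 3
```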

string.chars.each seems much nicer than string.each_byte with conversions or string.split.

-- Olathe
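To make the comparison concrete, here is a sketch of the two styles side by side. str.split(//) stands in for the proposed str.chars, and the helper names are hypothetical. The each_byte version is only equivalent for single-byte characters.

```ruby
# Hypothetical helpers comparing the two styles mentioned above.
def upcased_via_each_byte(str)
  out = []
  # each_byte yields Integers, so each one has to be converted back with chr.
  str.each_byte { |b| out << b.chr.upcase }
  out
end

def upcased_via_split(str)
  # Stand-in for the proposed str.chars: the elements are already Strings.
  str.split(//).map { |c| c.upcase }
end

p upcased_via_each_byte("abc")  # prints ["A", "B", "C"]
p upcased_via_split("abc")      # prints ["A", "B", "C"]
```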

I suggest using String#to_a instead, since this would be the inverse of Array#to_s.

-- Ken Kunz

I'm afraid I have to oppose. We have a free, unused method with String#to_a we could redefine instead of adding yet another method to the API.

While redefining String#to_a will cause some breakage, it's merely using Object#to_a, which is deprecated anyway. So, one way or another, breakage is going to occur.

-- Dan

Wrong. String#to_a is Enumerable#to_a which returns all elements .each yields in an Array. This happens to be an array of lines in this case.

There has been talk about enhancing the .to_a interface with special boolean flags or named arguments, but I don't think this is the way to go.

I'd rather see different methods for different behavior.

-- Florian Gross
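A sketch of the distinction drawn in this comment: iterating a String line by line yields an Array of lines, not of characters. String#to_a was removed in Ruby 1.9, so the hypothetical helper below uses each_line to reproduce the line-based result while staying runnable on newer interpreters.

```ruby
# Hypothetical helper reproducing what String#to_a returned on Ruby 1.8,
# where String#each iterated over lines.
def lines_like_old_to_a(str)
  result = []
  str.each_line { |line| result << line }
  result
end

p lines_like_old_to_a("one\ntwo\n")  # prints ["one\n", "two\n"]
p "one\ntwo\n".split(//).length       # prints 8
```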

Current voting

Strongly opposed: 0
Opposed: 1
Neutral: 0
In favor: 7
Strongly advocate: 3