How about a String#chars method?


Warren Brown

I really think String#each_char ought to be added
to the core - it's one of the most basic String
operations, and in practice a lot more useful a
default case than each_byte.
[ruby-dev:23995] String#each -> String#each_char
(This thread contains various issues)

Shugo Maeda suggested that once the latest Ruby HEAD
includes M17N, String#each should become String#each_char.

Matz pointed out two problems.

(1) Compatibility: the current String#each is an alias
of String#each_line.
(2) In an M17N context, "character" means "codepoint",
so one part of a composite character is also a character.
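
To make point (2) concrete, here is a small sketch of the difference between codepoints and composite characters. It assumes a much later Ruby than the one under discussion (2.5 or newer, for String#grapheme_clusters) and a UTF-8 string; none of this existed when the thread was written.

```ruby
# Assumption: Ruby 2.5+ and a UTF-8 string.
# "e" followed by U+0301 COMBINING ACUTE ACCENT displays as a single "é".
s = "e\u0301"

codepoint_count = s.codepoints.length          # counts codepoints
char_count      = s.chars.length               # String#chars splits per codepoint
cluster_count   = s.grapheme_clusters.length   # counts user-perceived characters

p codepoint_count   # => 2
p char_count        # => 2
p cluster_count     # => 1
```

The combining accent is a "character" in the codepoint sense even though a reader sees only one letter.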

Minero Aoki objected, citing inconvenience.

This issue is still open.

[ruby-dev:23999] Re: String#each -> String#each_char

Matz explained that the M17N to be imported into Ruby 1.9
differs from the ruby_m17n branch in the following points:

* there are no ..._char methods
* a character is a String containing the encoded byte
sequence of a codepoint
* the assumption that made it possible to recognize
multibyte length is removed

So 1.9 after the M17N import will be incompatible with
the current 1.9.


With all of the recent discussions on String#each_char, I have been
thinking...

There is a need for processing the characters of a string with
each(), but in a way shorter and more efficient than string.split(//) or
string.scan(/./m). There is also a need to have the definition of
"character" be "codepoint" under M17N. One solution to all of this
would be a new method of String, let's call it String#chars, that would
efficiently translate a string into an array of "characters". I am not
fluent in all of the issues involved with multi-byte characters, but a
parameter to String#chars could control the definition of "character",
and could default to generating an array of single-byte characters (like
string.split(//)).

This would allow users who want to iterate over the individual
single-byte characters to write "string.chars.each", and those wishing
to split the string into codepoints could write
"string.chars(something).each" to select the correct definition of
"character".
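
A minimal sketch of what such a parameterized method might look like. The helper name split_chars and the :byte / :codepoint symbols are hypothetical, chosen only for illustration, and the sketch assumes a Ruby with encoding-aware strings (1.9 or later), which postdates this proposal.

```ruby
# Hypothetical helper, not part of Ruby: returns the "characters" of str,
# where the unit argument selects the definition of "character".
def split_chars(str, unit = :byte)
  case unit
  when :byte
    # One single-byte string per byte of the input.
    str.unpack("a" * str.bytesize)
  when :codepoint
    # One string per codepoint, according to the string's encoding.
    str.each_char.to_a
  else
    raise ArgumentError, "unknown unit: #{unit.inspect}"
  end
end

p split_chars("cab")        # => ["c", "a", "b"]
p split_chars("cab").sort   # => ["a", "b", "c"]

# "a", U+00F1 (n with tilde), "b": 3 codepoints, 4 bytes in UTF-8.
p split_chars("a\u00f1b", :codepoint).length   # => 3
p split_chars("a\u00f1b").length               # => 4
```

Because both modes return a plain Array, everything in Enumerable (each, sort, map, and so on) is available without adding more methods to String.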

This solution would seem to provide the maximum flexibility,
allowing constructs like "string.chars.sort", while maintaining
compatibility with the current behavior of String#each. However, much
of the benefit would depend on the speed at which the translation to an
array could be done.
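
One way to check the speed question is to time the existing idioms against each other. The following is only a rough sketch: the sample text, the run count, and the elapsed helper are arbitrary choices made for illustration, and the numbers will vary by Ruby version and machine.

```ruby
# Hypothetical timing helper: returns the seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

text = "hello world " * 1_000   # ASCII-only sample, 12,000 bytes
runs = 100

# Three ways of turning a string into an array of single characters.
candidates = {
  "split(//)"  => -> { text.split(//) },
  "scan(/./m)" => -> { text.scan(/./m) },
  "unpack"     => -> { text.unpack("a" * text.bytesize) },
}

candidates.each do |label, splitter|
  seconds = elapsed { runs.times { splitter.call } }
  puts format("%-12s %.4fs", label, seconds)
end
```

For ASCII-only input all three produce the same array, so the timings compare like with like; with multibyte input the unpack variant splits on bytes and is no longer equivalent.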

Comments?

- Warren Brown
 

Ara.T.Howard


something like this

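~ > cat b.rb
class String
  # Split the string into chunks of +width+ bytes (the last chunk may be shorter).
  def chars(width = 1)
    chunk_count = (bytesize + width - 1) / width   # ceiling division
    unpack("a#{width}" * chunk_count)
  end
end

s = 'foobar'
p s.chars
p s.chars(3)

~ > ruby b.rb
["f", "o", "o", "b", "a", "r"]
["foo", "bar"]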
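
Note that the unpack "a" directive counts bytes, so the version above would cut multibyte characters apart. A character-aware variant could be sketched as follows; char_chunks is a hypothetical name, and the sketch assumes a Ruby with encoding-aware strings (1.9 or later).

```ruby
# Hypothetical helper: group a string's characters into chunks of +width+,
# counting characters (codepoints) rather than bytes.
def char_chunks(str, width = 1)
  str.each_char.each_slice(width).map(&:join)
end

p char_chunks("foobar")       # => ["f", "o", "o", "b", "a", "r"]
p char_chunks("foobar", 3)    # => ["foo", "bar"]
p char_chunks("foobarx", 3)   # => ["foo", "bar", "x"]
```

This is likely slower than a single unpack call, since it walks the string one character at a time, but it never splits a multibyte character.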

??

-a
 
