Warren Brown
I really think String#each_char ought to be added
to the core - it's one of the most basic String
operations, and in practice a lot more useful a
default case than each_byte.
[ruby-dev:23995] String#each -> String#each_char
(This thread contains various issues)
Shugo Maeda suggested that if the latest ruby HEAD includes
M17N, String#each should become String#each_char.
Matz pointed out two problems.
(1) compatibility (the current String#each is an alias of
String#each_line)
(2) In an M17N context, "character" means "codepoint",
so each part of a composite character is also a character.
Minero Aoki raised an objection, citing the
inconvenience.
This issue is still open.
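The two problems Matz raised can be illustrated with a small sketch. It assumes UTF-8 strings and a Ruby new enough to accept "\u" escapes in string literals; the variable names are only for illustration.

```ruby
# Point (1): String#each_line yields one line at a time. The summary
# above says String#each was an alias of it at the time.
lines = []
"one\ntwo\n".each_line { |line| lines << line }

# Point (2): a composite character can span several codepoints.
composed   = "\u00e9"     # e-acute as a single precomposed codepoint
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT (U+0301)

composed_points   = composed.unpack("U*")
decomposed_points = decomposed.unpack("U*")

# If "character" means "codepoint", the combining accent counts as a
# character in its own right, separate from the "e" it modifies.
pieces = decomposed_points.map { |cp| [cp].pack("U") }
```

Both strings display as the same letter, yet the second one splits into two "characters" under the codepoint definition, which seems to be the inconvenience being objected to.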
[ruby-dev:23999] Re: String#each -> String#each_char
Matz explained that the M17N to be imported into ruby 1.9
differs from the ruby_m17n branch in the following points.
* no ..._char methods.
* a character is a String containing the encoded byte
sequence of a codepoint
* the assumption that makes it possible to recognize
multibyte length is removed
So 1.9 after the M17N import will be incompatible with the
current 1.9.
With all of the recent discussions on String#each_char, I have been
thinking...
There is a need for processing the characters of a string with
each(), but in a way shorter and more efficient than string.split(//) or
string.scan(/./m). There is also a need to have the definition of
"character" be "codepoint" under M17N. One solution to all of this
would be a new method of String, let's call it String#chars, that would
efficiently translate a string into an array of "characters". I am not
fluent in all of the issues involved with multi-byte characters, but a
parameter to String#chars could control the definition of "character",
and could default to generating an array of single-byte characters (like
string.split(//)).
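A minimal sketch of what such a method might look like, written as a standalone helper so that String itself is left untouched. The name chars_sketch and the :byte / :codepoint selector are hypothetical, chosen only for illustration, and the codepoint branch assumes the input is UTF-8.

```ruby
# Hypothetical helper sketching the proposed String#chars.
# unit selects the definition of "character":
#   :byte      - one single-byte string per byte (the proposed default)
#   :codepoint - one string per codepoint, assuming UTF-8 input
def chars_sketch(str, unit = :byte)
  case unit
  when :byte
    str.unpack("C*").map { |b| b.chr }
  when :codepoint
    str.unpack("U*").map { |cp| [cp].pack("U") }
  else
    raise ArgumentError, "unknown unit: #{unit.inspect}"
  end
end

# Iterate over single-byte characters.
chars_sketch("abc").each { |c| puts c }

# Iterate over codepoints instead.
chars_sketch("h\u00e9", :codepoint).each { |c| puts c }
```

A real implementation would presumably be written in C inside String; this version only shows the shape of the interface.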
This would allow users who want to iterate over the individual
single-byte characters to write "string.chars.each", and those wishing
to split the string into codepoints could write
"string.chars(something).each" to select the correct definition of
"character".
This solution would seem to provide the maximum flexibility,
allowing constructs like "string.chars.sort", while maintaining
compatibility with the current behavior of String#each. However, much
of the benefit would depend on the speed at which the translation to an
array could be done.
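The speed question could be explored with a rough benchmark along these lines. It uses the standard Benchmark module, compares only idioms that already exist, and sticks to ASCII input so that all three produce the same array; the test string and labels are arbitrary choices.

```ruby
require 'benchmark'

text = "the quick brown fox " * 5_000

# Each candidate turns the string into an array of one-character strings.
candidates = {
  "split(//)"  => lambda { text.split(//) },
  "scan(/./m)" => lambda { text.scan(/./m) },
  "unpack"     => lambda { text.unpack("C*").map { |b| b.chr } }
}

results = {}
Benchmark.bm(12) do |bm|
  candidates.each do |label, work|
    bm.report(label) { results[label] = work.call }
  end
end

# Once the characters are in an array, constructs like sort come for free.
sorted = "banana".split(//).sort
```

The timings will vary by Ruby version and platform, so any comparison is only indicative of what a built-in conversion would have to beat.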
Comments?
- Warren Brown