Why aren't Strings Iterable?

Googmeister

Anyone know why String in Java 1.5 does not implement
the Iterable<Character> interface? It seems reasonable
to write

String text = "Hello, World";
for (char c : text) {
// do something with c
}

and avoid charAt() and length(). It would do a lot of
unnecessary autoboxing, but if arrays are Iterable,
why not Strings?

Is the autoboxing somehow bypassed when iterating
over arrays of primitive types?

Thanks for any insight.
 
Stefan Ram

Googmeister said:
Anyone know why String in Java 1.5 does not implement
the Iterable<Character> interface?

I do not know. But I found that I usually want to
iterate over code points (not characters), so I wrote
two macros:

--- // $1 the string to loop
--- // $2 the name for the length variable
--- // $3 the name for the integral loop variable (current position)
--- // $4 the name for the code point variable
--- // $5 the block
--- // e.g. FOR_CODEPOINTS_OF(string,length,i,cp,print(cp);)

$define FOR_CODEPOINTS_OF
{ final int $2 = ($1).length();
for( int $3 = 0; $3 < $2; )
{ final int $4 = ($1).codePointAt( $3 );
{ $5 }
$3 += java.lang.Character.charCount( $4 ); }}

--- // $1 the string to loop
--- // $2 the name for the length variable
--- // $3 the name for the integral loop variable (current position)
--- // $4 the name for the character variable (a String holding one code point)
--- // $5 the name for the size variable (number of chars in that code point)
--- // $6 the block
--- // e.g. FOR_CHARACTERS_OF(string,length,position-name,target-name,size-name,inner-statement)

$define FOR_CHARACTERS_OF
{ final int $2 = ($1).length();
for( int $3 = 0; $3 < $2; )
{ final int $5 = java.lang.Character.charCount( ($1).codePointAt( $3 ));
final java.lang.String $4 = ($1).substring( $3, $3 + $5 );
{ $6 }
$3 += $5; }}
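
For readers without the macro preprocessor, the same two loops can be written as plain Java methods. This is only a sketch: the class and method names (CodePointLoop, codePointsOf, charactersOf) are made up for illustration, and only String.codePointAt() and Character.charCount() come from the post above.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointLoop {
    /** Visits each code point once, stepping by charCount() so a surrogate pair is not split. */
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<Integer>();
        final int length = s.length();
        for (int i = 0; i < length; ) {
            final int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 1 for BMP characters, 2 for surrogate pairs
        }
        return result;
    }

    /** Same walk, but yields each code point as the one- or two-char substring that encodes it. */
    static List<String> charactersOf(String s) {
        List<String> result = new ArrayList<String>();
        final int length = s.length();
        for (int i = 0; i < length; ) {
            final int size = Character.charCount(s.codePointAt(i));
            result.add(s.substring(i, i + size));
            i += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D11E, which is stored as a surrogate pair
        String s = "a\uD834\uDD1E";
        System.out.println(s.length());              // number of chars (code units)
        System.out.println(codePointsOf(s).size());  // number of code points
    }
}
```

On Java 8 and later, s.codePoints() gives an IntStream of the same values without a hand-written loop.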
 
Tim Tyler

Googmeister said:
Anyone know why String in Java 1.5 does not implement
the Iterable<Character> interface? It seems reasonable
to write

String text = "Hello, World";
for (char c : text) {
// do something with c
}

and avoid charAt() and length(). It would do a lot of
unnecessary autoboxing, but if arrays are Iterable,
why not Strings?

See the "holes in the syntax" thread in this forum for previous discussion
of this issue:

http://groups.google.com/group/comp...hread/9f906e61b2543d2/eaa314581bbebd54?lnk=st
 
Harry Bosch

can't you just call:

String text = "Hello, World";
for (char c : text.toCharArray()) {
// do something with c
}

I would suspect that the real reason is based on no real idea as to
what actual type of Iterator should be returned. Would people agree on
char, Character, Byte or String?
 
Mark Thornton

Tor said:
Or even stranger, that neither CharacterIterator nor CharSequence
extend that interface. Either of those would have helped.

Not strange at all. CharacterIterator iterates over a sequence of
primitive (char) values, and not over Character instances. Similarly a
CharSequence is a collection of primitives and most people are likely to
want to use it that way for performance reasons. Therefore there is
little value (and some nuisance for implementers) in having it extend
Iterable.

Mark Thornton
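
To make the cost concrete, here is a rough sketch of what an Iterable<Character> view over a CharSequence could look like. The helper name Chars.of is hypothetical and is not part of the JDK; the point is only that every next() call has to box a char into a Character.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class Chars {
    /** Hypothetical adapter: lets any CharSequence be used in a for-each loop. */
    static Iterable<Character> of(final CharSequence seq) {
        return new Iterable<Character>() {
            public Iterator<Character> iterator() {
                return new Iterator<Character>() {
                    private int index = 0;

                    public boolean hasNext() {
                        return index < seq.length();
                    }

                    public Character next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return seq.charAt(index++); // autoboxed from char to Character
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

With this in place, for (char c : Chars.of(text)) compiles, but each element is boxed on the way out of next() and unboxed again on assignment to c.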
 
Tor Iver Wilhelmsen

Googmeister said:
Anyone know why String in Java 1.5 does not implement
the Iterable<Character> interface?

Or even stranger, that neither CharacterIterator nor CharSequence
extend that interface. Either of those would have helped.
 
Mike Schilling

Is the autoboxing somehow bypassed when iterating
over arrays of primitive types?

No Iterator is used for that, nor is any boxing done.

int[] ia;
for (int i : ia)
{
...
}

is treated like:

int[] ia;
for (int _i = 0; _i < ia.length; _i++)
{
int i = ia[_i];
...
}
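
A small runnable check of this, as a sketch: the class name ArrayForEach and the call counter are invented for the demonstration. It also shows that the array expression is evaluated once, before the loop starts, as in the indexed form.

```java
final class ArrayForEach {
    static int calls = 0;

    /** Counts how often the loop asks for the array. */
    static int[] source() {
        calls++;
        return new int[] {1, 2, 3};
    }

    /** Enhanced for over a primitive array: no Iterator, no Integer objects. */
    static int sumForEach() {
        int total = 0;
        for (int i : source()) {
            total += i;
        }
        return total;
    }

    /** Hand-written equivalent: evaluate the array once, then index into it. */
    static int sumIndexed() {
        int[] a = source();
        int total = 0;
        for (int k = 0; k < a.length; k++) {
            int i = a[k];
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumForEach() == sumIndexed());
        System.out.println(calls); // one call per loop, two in total
    }
}
```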
 
Tim Tyler

Mike Schilling said:
int[] ia;
for (int i : ia)
{
...
}

is treated like:

int[] ia;
for (int _i = 0; _i < ia.length; _i++)
{
int i = ia[_i];
...
}

Analogously:

void iterate(String s) {
for (char c : s) {
process(c);
}
}

...could be defined to mean:

void iterate(String s) {
int size = s.length();
for (int i = 0; i < size; i++) {
char c = s.charAt(i);
process(c);
}
}
 
Roedy Green

String text = "Hello, World";
for (char c : text) {
// do something with c
}

I think the main reason is you often want not only c in your loop but
i.
 
Mike Schilling

Tim Tyler said:
Mike Schilling said:
int[] ia;
for (int i : ia)
{
...
}

is treated like:

int[] ia;
for (int _i = 0; _i < ia.length; _i++)
{
int i = ia[_i];
...
}

Analogously:

void iterate(String s) {
for (char c : s) {
process(c);
}
}

...could be defined to mean:

void iterate(String s) {
int size = s.length();
for (int i = 0; i < size; i++) {
char c = s.charAt(i);
process(c);
}
}


Sure. And auto-boxing and the Iterable interface, which the OP asked about,
are not really relevant.

The only thing about your suggestion that I'm a little concerned about is
surrogate pairs, which would be treated as separate characters. I suppose
code that cares about them could write:

for (int i: s) ...

which would simply zero-extend normal characters, pack any surrogate
pairs into an integer, and throw a defined exception if it found mismatches.
In fact, you could define the same semantics for any integer-valued iterator
over a character-valued aggregate.
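
Since for (int i : s) is not valid Java, the semantics described here can only be approximated with a helper method. The following is a sketch under assumptions: the name StrictCodePoints.decode is invented, and IllegalArgumentException is just one possible choice for the "defined exception". Note that String.codePointAt() does not throw on a mismatch; it returns the lone surrogate value.

```java
import java.util.ArrayList;
import java.util.List;

final class StrictCodePoints {
    /** Widens ordinary chars to int, packs surrogate pairs, and rejects unpaired surrogates. */
    static List<Integer> decode(CharSequence s) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    out.add(Character.toCodePoint(c, s.charAt(i + 1)));
                    i++; // skip the low surrogate that was just consumed
                } else {
                    throw new IllegalArgumentException("unpaired high surrogate at index " + i);
                }
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unpaired low surrogate at index " + i);
            } else {
                out.add((int) c); // zero-extend a normal char
            }
        }
        return out;
    }
}
```

A caller would then write for (int cp : StrictCodePoints.decode(s)), at the price of building a list of boxed Integers first.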
 
Googmeister

Mike said:
The only thing about your suggestion that I'm a little concerned about is
surrogate pairs, which would be treated as separate characters.

I didn't know what a surrogate pair or code point was until now, but
that sounds like a plausible explanation. I guess the thing the Java 1.5
String iterator should return is a 21-bit Unicode code point,
but this would be awfully confusing. :)

Thanks!
 
Alan Krueger

Harry said:
can't you just call:

String text = "Hello, World";
for (char c : text.toCharArray()) {
// do something with c
}

Yep, that works. This keeps you from having to construct object
wrappers to iterate over characters.

Harry said:
I would suspect that the real reason is based on no real idea as to
what actual type of Iterator should be returned. Would people agree on
char, Character, Byte or String?

Since String is a CharSequence, I think the last two don't make as much
sense as the first two, and an Iterable<> would have to return an Object
rather than a primitive. That suggests Character, but iterating over
the character array, as above, seems simpler.
 
Ross Bamford

Googmeister said:
I didn't know what a surrogate pair or code point was until now, but
that sounds like a plausible explanation. I guess the thing the Java 1.5
String iterator should return is a 21-bit Unicode code point,
but this would be awfully confusing. :)

Thanks!

The other (related) thing is that not all strings iterate the same way, so
wouldn't the whole iteration order be apt to reverse in certain
locales? That too probably wouldn't be the effect you'd expect...
 
Mike Schilling

Ross Bamford said:
The other (related) thing is that not all strings iterate the same way, so
wouldn't the whole iteration order be apt to reverse in certain
locales? That too probably wouldn't be the effect you'd expect...

I don't think so. The string might display either left-to-right or
right-to-left, but the first letter in a word is the first letter in a word,
in English or in Hebrew.
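
A quick way to see this, as a sketch with made-up names (LogicalOrder, codeUnitsInOrder): a Java String stores its chars in logical order, so charAt(0) is the first letter of a Hebrew word even though a renderer draws that letter at the right edge.

```java
final class LogicalOrder {
    /** Lists the chars of s as hex values, in the order charAt() returns them. */
    static String codeUnitsInOrder(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    /** True if the char's Unicode bidi class is plain right-to-left. */
    static boolean isRightToLeft(char c) {
        return Character.getDirectionality(c) == Character.DIRECTIONALITY_RIGHT_TO_LEFT;
    }

    public static void main(String[] args) {
        // The Hebrew word "shalom": shin, lamed, vav, final mem
        String shalom = "\u05E9\u05DC\u05D5\u05DD";
        System.out.println(codeUnitsInOrder(shalom));
        System.out.println(isRightToLeft(shalom.charAt(0)));
    }
}
```

Display direction is a property the renderer derives from the characters; it does not change the index order that a loop over the string would follow.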
 
