skeptic
Thomas Schodt said: My point is that charAt() is *still* a simple index lookup.
Any Unicode 4.0 supplementary codepoints units in Strings are stored as
two char values (surrogates).
This means that Strings can potentially display as few as half as many
codepoint units as String.length() reports.
What do you mean by "codepoint units"? The javadocs talk about
"code points" (ranging from U+0000 to U+10FFFF) as opposed to "code
units" (16-bit char values, U+0000 to U+FFFF) as a means to represent
"characters".
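
To make the code point / code unit distinction concrete, here is a minimal sketch. The class name and the sample string are only my own illustration; length() and codePointCount() are the standard String methods (Java 5 and later).

```java
final class CodeUnitsVsCodePoints {
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so a String
    // stores it as the surrogate pair \uD834 \uDD1E.
    static final String SAMPLE = "a\uD834\uDD1Eb";

    /** Number of 16-bit code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(SAMPLE));   // prints 4
        System.out.println(codePoints(SAMPLE));  // prints 3
    }
}
```

So the same String has 4 code units but only 3 code points.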
For Strings containing Unicode 4.0 supplementary codepoints the index
you must pass to charAt() no longer corresponds to the offset of the
codepoint unit in the visual representation of the String.
Visual representation is absolutely irrelevant here. Code points may
split (one code point may show as two glyphs), combine, or not show at
all.
Let's talk about characters instead (as listed in the big table of
Unicode characters at unicode.org).
You can use codePointAt() to get the 21-bit int value of codepoint units
in a String. When codePointAt() is called with the index of the first
surrogate of a Unicode 4.0 supplementary codepoint unit it returns the
21-bit int value of the entire codepoint unit (occupying the char values
at index and at index+1 in the String). When codePointAt() is called with
the index of a "regular" Unicode codepoint it returns the 16-bit int
value of the codepoint unit numerically equivalent to the value charAt()
would return.
You again missed the point. The really interesting thing is the
meaning of the argument to codePointAt(i). Just returning the i-th
member of the internal char[] array converted to int (no matter how) is
either wrong or contrary to common expectation.
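
A small sketch of what I mean, assuming the argument really is an index into the 16-bit code units (that is how the javadocs describe it). The class name, the hex() helper and the sample string are made up for illustration; codePointAt() and charAt() are the standard String methods.

```java
final class CodePointAtIndexDemo {
    // 'a', then U+1D11E stored as the pair \uD834 \uDD1E, then 'b'.
    static final String SAMPLE = "a\uD834\uDD1Eb";

    /** Hypothetical helper: lower-case hex form of an int value. */
    static String hex(int value) {
        return Integer.toHexString(value);
    }

    public static void main(String[] args) {
        // Index 1 is the high surrogate: the whole supplementary code point comes back.
        System.out.println(hex(SAMPLE.codePointAt(1)));  // prints 1d11e
        // Index 2 is the low surrogate on its own: only that code unit comes back.
        System.out.println(hex(SAMPLE.codePointAt(2)));  // prints dd1e
        // Index 3 holds 'b', although 'b' is the third character, not the fourth.
        System.out.println(hex(SAMPLE.codePointAt(3)));  // prints 62
        // charAt() at the same index only ever sees one code unit.
        System.out.println(hex(SAMPLE.charAt(1)));       // prints d834
    }
}
```

In other words, codePointAt(i) does not give the i-th character; it gives the code point that starts at code unit i.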
I (and, I suspect, most of us) am used to thinking of a String as a
*vector* of *characters*,
where the n-th element is the n-th character. Anything else renders
String methods like charAt(), substring(), indexOf() quite useless.
You'd say that the String now holds UTF-16-encoded data rather than
characters. OK, agreed. No problem. But then what is the point of
codePointAt()?
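
For comparison, here is roughly what a "give me the n-th character" lookup seems to require if the String holds UTF-16 code units. This is only a sketch: nthCodePoint() and the class name are my own hypothetical helpers, not part of the JDK; offsetByCodePoints(), codePointAt() and Character.charCount() are standard methods.

```java
final class NthCodePoint {
    /**
     * Hypothetical helper: returns the n-th code point (0-based) of s.
     * offsetByCodePoints() has to walk the code units, so this is not a
     * constant-time index lookup the way charAt() is.
     */
    static int nthCodePoint(String s, int n) {
        int unitIndex = s.offsetByCodePoints(0, n);
        return s.codePointAt(unitIndex);
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb";
        System.out.println(Integer.toHexString(nthCodePoint(s, 1)));  // prints 1d11e
        System.out.println(Integer.toHexString(nthCodePoint(s, 2)));  // prints 62

        // Visiting every code point in order: advance by 1 or 2 code units.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.println(Integer.toHexString(cp));
            i += Character.charCount(cp);
        }
    }
}
```

If that is right, codePointAt() is only a building block for walking the UTF-16 data, not a character-indexed accessor.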
[not so logical stuff skipped]
Best Regards
P.S. Just trying to put some logic into the mess. I may be wrong all
around.