String.charAt() returns wrong char

column.column · Mar 22, 2008

I need to have a byte (or an array of bytes) for some reason, and I want
to store it temporarily in a String. Unfortunately, String.charAt returns
bad characters when the byte value is greater than 127. Why?

int aa = 0x92;
byte a = (byte) aa; // a becomes -110, because byte ranges over -128..127.
                    // Anyway, the bit layout is the same.

byte[] aaa = new byte[] {a};
String ggg = new String(aaa); // creating the string

a = (byte) ggg.charAt(0); // a becomes 25 - why?

Thank You

Mark Space · Mar 22, 2008

Probably the string is trying to interpret the byte as Unicode...

Eric Sosman · Mar 22, 2008

Short answer: Because chars are not bytes.

Longer answer: When you construct a String from an array
of bytes, the bytes are decoded as representations of the
platform's default character set. On my machine (which may
be using the same encoding as yours, because we get the same
final result), the array "new byte[] { -110 }" decodes to a
String whose single character has the code 8217 or \u2019,
a Unicode right single quotation mark. When you convert this
char to a byte by chopping away the high-order half, you're
left with 25. Other systems might give you different results.
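
The arithmetic described above can be reproduced with a minimal sketch. It assumes the default encoding in question was windows-1252; the thread never confirms this, it is only consistent with the value 8217. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // Decodes a single byte with the named charset and returns the resulting char.
    static char decodeOne(byte b, String charsetName) {
        String s = new String(new byte[] { b }, Charset.forName(charsetName));
        return s.charAt(0);
    }

    public static void main(String[] args) {
        byte a = (byte) 0x92;                   // -110 as a signed byte
        // Assumption: windows-1252 stands in for the poster's default charset.
        char c = decodeOne(a, "windows-1252");
        System.out.println((int) c);            // 8217, i.e. U+2019
        System.out.println((byte) c);           // 25: only the low 8 bits (0x19) survive
        // new String(byte[]) without a charset argument uses this one instead:
        System.out.println(Charset.defaultCharset());
    }
}
```

With a different default charset the decoded char, and therefore the truncated byte, would differ.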

Your plan to store an array of "raw bytes" as a String
is flawed: Strings are not arrays, and they are made up not
of bytes but of chars. Why do you think you need to do it?

column.column · Mar 23, 2008

But maybe it is possible to create a string not in Unicode format, but
in single-byte coded characters? I found one more strange thing: my
serial communication class sends the string to the COM port as needed,
and the character is 0x92. That means there is a method to convert a
string to bytes in the right way.
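
One plausible explanation for that observation, sketched under assumptions: if the serial class encodes the String with the same charset that decoded the bytes, 0x92 comes back out unchanged. The thread does not show the serial class, so the charset (windows-1252) and all names here are hypothetical.

```java
import java.nio.charset.Charset;

final class SameCharsetRoundTrip {
    // Decodes bytes to a String and encodes the String back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        String text = new String(raw, cs);
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 on both the decoding and the encoding side.
        Charset cs = Charset.forName("windows-1252");
        byte[] out = roundTrip(new byte[] { (byte) 0x92 }, cs);
        System.out.println(Integer.toHexString(out[0] & 0xFF)); // 92
    }
}
```

This works for 0x92 because that byte maps to a real character in the assumed charset; a decode/encode round trip is not guaranteed to preserve every byte value in every charset.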

Lew · Mar 23, 2008

(please do not top-post)

column.column said:
But maybe it is possible to create a string not in Unicode format, but
in single-byte coded characters?

No.

One can create a String /from/ single-byte encoded characters, by specifying
the encoding for the conversion. The String itself will always comprise
16-bit-encoded characters.
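
A minimal sketch of specifying the encoding for the conversion, using ISO-8859-1 as the single-byte encoding (my choice for illustration; the class and method names are hypothetical). The resulting String still holds 16-bit chars, but under this charset each char carries the same numeric value as the byte it came from.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Decode {
    // Builds a String from single-byte encoded data with an explicit charset.
    static String fromLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = fromLatin1(new byte[] { (byte) 0x92 });
        char c = s.charAt(0);            // a 16-bit char with value 0x0092
        System.out.println((int) c);     // 146
        System.out.println((byte) c);    // -110, the original byte
    }
}
```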

rossum · Mar 23, 2008

There are ways to encode raw bytes as strings. Have you tried hex
(=Base16) encoding or Base64 encoding? Both of those will reversibly
convert between raw bytes and printable strings.

If you need the charAt() function for the string format then hex is
probably better because the mapping between bytes and character
positions is much simpler than with Base64.
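
To illustrate both options, here is a hedged sketch. It assumes Java 8 or later for java.util.Base64 (which did not exist when this thread was written) and an unseparated hex string; the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class TextEncodings {
    // In an unseparated hex string, byte i occupies characters 2*i and 2*i + 1.
    static byte byteAt(String hex, int i) {
        return (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }

    // Base64 round trip: bytes -> printable text -> bytes.
    static byte[] base64RoundTrip(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xEB, 0x33, 0x0F, (byte) 0x92 };
        System.out.println(Arrays.equals(raw, base64RoundTrip(raw))); // true
        System.out.println(byteAt("EB330F92", 3));                    // -110
    }
}
```

With Base64, three bytes share four characters, so there is no equally direct way to index a single byte by character position.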

rossum

Roedy Green · Mar 23, 2008

There are scores of ways of converting bytes to String. See
http://mindprod.com/jgloss/encoding.html

You want something quick, mindless and reversible, e.g. prepend a 0
byte.

ISO-8859-1 will do.
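
A small sketch of what "prepend a 0 byte" amounts to, and a check that it gives the same String as decoding with ISO-8859-1. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WidenBytes {
    // "Prepend a 0 byte": widen each byte to a char in the range 0..255.
    static String widen(byte[] raw) {
        char[] chars = new char[raw.length];
        for (int i = 0; i < raw.length; i++) {
            chars[i] = (char) (raw[i] & 0xFF);
        }
        return new String(chars);
    }

    // Reverse direction: keep only the low 8 bits of each char.
    static byte[] narrow(String s) {
        byte[] raw = new byte[s.length()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) s.charAt(i);
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;               // every possible byte value
        }
        String s = widen(all);
        System.out.println(Arrays.equals(all, narrow(s)));                          // true
        System.out.println(s.equals(new String(all, StandardCharsets.ISO_8859_1))); // true
    }
}
```

The round trip is only lossless if the String is not modified or re-encoded with another charset in between.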

If you want something compact, see
http://mindprod.com/jgloss/armouring.html

column.column · Mar 24, 2008

If you need the charAt() function for the string format then hex is
probably better because the mapping between bytes and character
positions is much simpler than with Base64.

You mean I must use charsetName when creating the string? I found the
following charsets using Charset.availableCharsets(), but there is no Base16:

{Big5=Big5, Big5-HKSCS=Big5-HKSCS, EUC-JP=EUC-JP, EUC-KR=EUC-KR,
GB18030=GB18030, GB2312=GB2312, GBK=GBK, IBM-Thai=IBM-Thai,
IBM00858=IBM00858, IBM01140=IBM01140, IBM01141=IBM01141,
IBM01142=IBM01142, IBM01143=IBM01143, IBM01144=IBM01144,
IBM01145=IBM01145, IBM01146=IBM01146, IBM01147=IBM01147,
IBM01148=IBM01148, IBM01149=IBM01149, IBM037=IBM037, IBM1026=IBM1026,
IBM1047=IBM1047, IBM273=IBM273, IBM277=IBM277, IBM278=IBM278,
IBM280=IBM280, IBM284=IBM284, IBM285=IBM285, IBM297=IBM297,
IBM420=IBM420, IBM424=IBM424, IBM437=IBM437, IBM500=IBM500,
IBM775=IBM775, IBM850=IBM850, IBM852=IBM852, IBM855=IBM855,
IBM857=IBM857, IBM860=IBM860, IBM861=IBM861, IBM862=IBM862,
IBM863=IBM863, IBM864=IBM864, IBM865=IBM865, IBM866=IBM866,
IBM868=IBM868, IBM869=IBM869, IBM870=IBM870, IBM871=IBM871,
IBM918=IBM918, ISO-2022-CN=ISO-2022-CN, ISO-2022-JP=ISO-2022-JP,
ISO-2022-JP-2=ISO-2022-JP-2, ISO-2022-KR=ISO-2022-KR,
ISO-8859-1=ISO-8859-1, ISO-8859-13=ISO-8859-13,
ISO-8859-15=ISO-8859-15, ISO-8859-2=ISO-8859-2, ISO-8859-3=ISO-8859-3,
ISO-8859-4=ISO-8859-4, ISO-8859-5=ISO-8859-5, ISO-8859-6=ISO-8859-6,
ISO-8859-7=ISO-8859-7, ISO-8859-8=ISO-8859-8, ISO-8859-9=ISO-8859-9,
JIS_X0201=JIS_X0201, JIS_X0212-1990=JIS_X0212-1990, KOI8-R=KOI8-R,
KOI8-U=KOI8-U, Shift_JIS=Shift_JIS, TIS-620=TIS-620,
US-ASCII=US-ASCII, UTF-16=UTF-16, UTF-16BE=UTF-16BE, UTF-16LE=UTF-16LE,
UTF-32=UTF-32, UTF-32BE=UTF-32BE, UTF-32LE=UTF-32LE, UTF-8=UTF-8,
windows-1250=windows-1250, windows-1251=windows-1251,
windows-1252=windows-1252, windows-1253=windows-1253,
windows-1254=windows-1254, windows-1255=windows-1255,
windows-1256=windows-1256, windows-1257=windows-1257,
windows-1258=windows-1258, windows-31j=windows-31j,
x-Big5-Solaris=x-Big5-Solaris, x-euc-jp-linux=x-euc-jp-linux,
x-EUC-TW=x-EUC-TW, x-eucJP-Open=x-eucJP-Open, x-IBM1006=x-IBM1006,
x-IBM1025=x-IBM1025, x-IBM1046=x-IBM1046, x-IBM1097=x-IBM1097,
x-IBM1098=x-IBM1098, x-IBM1112=x-IBM1112, x-IBM1122=x-IBM1122,
x-IBM1123=x-IBM1123, x-IBM1124=x-IBM1124, x-IBM1381=x-IBM1381,
x-IBM1383=x-IBM1383, x-IBM33722=x-IBM33722, x-IBM737=x-IBM737,
x-IBM834=x-IBM834, x-IBM856=x-IBM856, x-IBM874=x-IBM874,
x-IBM875=x-IBM875, x-IBM921=x-IBM921, x-IBM922=x-IBM922,
x-IBM930=x-IBM930, x-IBM933=x-IBM933, x-IBM935=x-IBM935,
x-IBM937=x-IBM937, x-IBM939=x-IBM939, x-IBM942=x-IBM942,
x-IBM942C=x-IBM942C, x-IBM943=x-IBM943, x-IBM943C=x-IBM943C,
x-IBM948=x-IBM948, x-IBM949=x-IBM949, x-IBM949C=x-IBM949C,
x-IBM950=x-IBM950, x-IBM964=x-IBM964, x-IBM970=x-IBM970,
x-ISCII91=x-ISCII91, x-ISO-2022-CN-CNS=x-ISO-2022-CN-CNS,
x-ISO-2022-CN-GB=x-ISO-2022-CN-GB, x-iso-8859-11=x-iso-8859-11,
x-JIS0208=x-JIS0208, x-JISAutoDetect=x-JISAutoDetect, x-Johab=x-Johab,
x-MacArabic=x-MacArabic, x-MacCentralEurope=x-MacCentralEurope,
x-MacCroatian=x-MacCroatian, x-MacCyrillic=x-MacCyrillic,
x-MacDingbat=x-MacDingbat, x-MacGreek=x-MacGreek,
x-MacHebrew=x-MacHebrew, x-MacIceland=x-MacIceland,
x-MacRoman=x-MacRoman, x-MacRomania=x-MacRomania,
x-MacSymbol=x-MacSymbol, x-MacThai=x-MacThai,
x-MacTurkish=x-MacTurkish, x-MacUkraine=x-MacUkraine,
x-MS950-HKSCS=x-MS950-HKSCS, x-mswin-936=x-mswin-936, x-PCK=x-PCK,
x-UTF-16LE-BOM=x-UTF-16LE-BOM, X-UTF-32BE-BOM=X-UTF-32BE-BOM,
X-UTF-32LE-BOM=X-UTF-32LE-BOM, x-windows-50220=x-windows-50220,
x-windows-50221=x-windows-50221, x-windows-874=x-windows-874,
x-windows-949=x-windows-949, x-windows-950=x-windows-950,
x-windows-iso2022jp=x-windows-iso2022jp}

rossum · Mar 24, 2008

You mean I must use charsetName when creating the string? I found the
following charsets using Charset.availableCharsets(), but there is no Base16

Base16 is another name for Hex. It only uses 16 characters
0123456789ABCDEF or 0123456789abcdef. Each byte is translated into
two characters.

This is the code I use:

/**
 * Converts a byte array into a hex string: "EB 33 0F 7E".
 * The string uses uppercase with leading zeros and spaces
 * for separators.
 *
 * @param inBytes The byte array to convert.
 * @return A hex string with spaces for separators.
 */
public static String asHex(byte[] inBytes) {
    final String separator = " ";
    final char leadingZero = '0';
    StringBuilder sb = new StringBuilder(inBytes.length * 3);
    for (int i = 0; i < inBytes.length; ++i) {
        if (i > 0) { sb.append(separator); }
        // Pad single-digit values (0x00..0x0F) to two characters.
        if (inBytes[i] >= 0 && inBytes[i] < 0x10) {
            sb.append(leadingZero);
        } // end if
        // Mask with 0xFF so negative bytes are not sign-extended.
        sb.append(Integer.toHexString(inBytes[i] & 0xFF));
    } // end for
    return sb.toString().toUpperCase();
} // end asHex(byte[])

You may wish to remove the separator so your output looks more like
"EB330F7E".

I leave it up to you to do the reverse conversion from the string back
to bytes.
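
For reference, one possible sketch of that reverse conversion. This is not rossum's code; the class name, method name and error handling are assumptions. It accepts the hex string with or without the space separators.

```java
import java.util.Arrays;

final class HexDecode {
    // Parses "EB 33 0F 7E" or "EB330F7E" back into a byte array.
    static byte[] fromHex(String hex) {
        String digits = hex.replace(" ", "");   // drop the separators, if any
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Two hex digits per byte; parseInt rejects non-hex input
            // with a NumberFormatException.
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fromHex("EB 33 0F 7E"))); // [-21, 51, 15, 126]
    }
}
```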

rossum

Roedy Green · Mar 24, 2008

You mean I must use charsetName when creating the string? I found the
following charsets using Charset.availableCharsets(), but there is no Base16

See http://mindprod.com/jgloss/base64.html
Base64 is not one of the supported encodings.
I don't think hex is either.

Mark Space · Mar 24, 2008

You mean I must use charsetName when creating the string? I found the
following charsets using Charset.availableCharsets(), but there is no Base16

Here is my question:

Why use Strings at all? Byte arrays are ideal for IO, just send the
array to the serial port you want.

If you are doing some text processing, there are methods that take
byte[] and convert large amounts of text quickly. Yes, you still need a
Charset for this.
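
A minimal sketch of sending the array directly, with no String in between. The thread does not show the serial API, so this assumes the port is exposed as a java.io.OutputStream and uses a ByteArrayOutputStream as a stand-in; all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class RawBytesOut {
    // Writes the payload as-is; no charset is involved at any point.
    static void send(OutputStream port, byte[] payload) throws IOException {
        port.write(payload);
        port.flush();
    }

    // Stand-in for a real port: captures what would have gone out on the wire.
    static byte[] capture(byte[] payload) {
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();
        try {
            send(fakePort, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fakePort.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sent = capture(new byte[] { (byte) 0x92 });
        System.out.println(sent[0] & 0xFF); // 146, i.e. 0x92 unchanged
    }
}
```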

(Can you tell us what charset you are using? What character is 0x92
anyway? You haven't even told us yet.)

EJP · Mar 25, 2008

I need to have a byte (or an array of bytes) for some reason, and I want
to store it temporarily in a String.

Why? That's where your problem is. String is not a container for binary
data.
