String.charAt() returns wrong char

column.column

I need to have a byte (or array of bytes) for some reason and I want to
store it temporarily in a String. Unfortunately String.charAt returns bad
characters when the byte value is above 127. Why?

int aa = 0x92;
byte a = (byte) aa; // a becomes -110, because a byte holds -128...127.
                    // Anyway, the bit layout is the same.

byte[] aaa = new byte[] {a};
String ggg = new String(aaa); // creating the string

a = (byte) ggg.charAt(0); // a becomes 25 - why?


Thank You
 
Mark Space

Probably the string is trying to interpret the byte as Unicode...
 
Eric Sosman

Short answer: Because chars are not bytes.

Longer answer: When you construct a String from an array
of bytes, the bytes are decoded using the platform's
default character set. On my machine (which may
be using the same encoding as yours, because we get the same
final result), the array "new byte[] { -110 }" decodes to a
String whose single character has the code 8217 or \u2019,
a Unicode right single quotation mark. When you convert this
char to a byte by chopping away the high-order half, you're
left with 25. Other systems might give you different results.
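
The arithmetic above can be reproduced with a small sketch. It assumes the default charset on both machines was windows-1252 (the thread never confirms this, but 0x92 maps to \u2019 in that charset) and that the JVM provides it. The class and method names are made up for illustration. UTF-8 is included only to show that another charset gives another result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes a one-byte array with the given charset and returns
    // the code of the first char of the resulting String.
    static int decodeSingle(byte b, Charset cs) {
        return new String(new byte[] {b}, cs).charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x92; // -110 as a signed byte

        // Assumed default charset of the original poster's machine.
        int cp1252 = decodeSingle(b, Charset.forName("windows-1252"));
        System.out.println(cp1252);        // 8217, i.e. \u2019
        System.out.println((byte) cp1252); // 25: only the low 8 bits (0x19) survive

        // A lone 0x92 is malformed UTF-8, so it decodes to the
        // replacement character \uFFFD instead.
        int utf8 = decodeSingle(b, StandardCharsets.UTF_8);
        System.out.println(utf8);          // 65533
        System.out.println((byte) utf8);   // -3
    }
}
```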

Your plan to store an array of "raw bytes" as a String
is flawed: Strings are not arrays, and they are made up not
of bytes but of chars. Why do you think you need to do it?
 
column.column

But maybe it is possible to create a String not in Unicode format, but
in single-byte coded characters? I found one more strange thing: my
serial communication class sends the string to the COM port as needed,
and the character arrives as 0x92. That means there is a method to
convert a string to bytes in the right way.


 
Lew

(please do not top-post)


column.column said:
But maybe it is possible to create a String not in Unicode format, but
in single-byte coded characters?

No.

One can create a String /from/ single-byte encoded characters, by specifying
the encoding for the conversion. The String itself will always comprise
16-bit-encoded characters.
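
As an illustration of specifying the encoding, here is a minimal sketch using ISO-8859-1, which maps each byte value 0x00..0xFF to the char with the same code, so the conversion is reversible. The choice of ISO-8859-1 and the helper names are assumptions for the example, not something prescribed in this thread. The String still holds 16-bit chars internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Each byte becomes the char with the same numeric value (0..255).
    static String toLatin1String(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Each char in the range 0..255 becomes the byte with the same value.
    static byte[] fromLatin1String(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }

        String s = toLatin1String(raw);
        byte[] back = fromLatin1String(s);

        System.out.println(Arrays.equals(raw, back)); // true
        System.out.println((int) s.charAt(0x92));     // 146
        System.out.println((byte) s.charAt(0x92));    // -110
    }
}
```

This only works if the same charset is named explicitly in both directions; relying on the platform default brings back the original problem.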
 
rossum

There are ways to encode raw bytes as strings. Have you tried hex
(=Base16) encoding or Base64 encoding? Both of those will reversibly
convert between raw bytes and printable strings.
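
A minimal Base64 sketch, assuming a Java version that ships java.util.Base64 (Java 8 or later, which is newer than this thread). The class and helper names are invented for the example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Demo {
    // Encodes arbitrary bytes as printable ASCII text.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x92, 0x00, 0x7F, (byte) 0xFF};

        String text = encode(raw);
        System.out.println(text); // printable characters only

        System.out.println(Arrays.equals(raw, decode(text))); // true
    }
}
```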

If you need the charAt() function for the string format then hex is
probably better because the mapping between bytes and character
positions is much simpler than with Base64.

rossum
 
Roedy Green

column.column

Do you mean I must pass a charsetName when creating the String? I found
the following charsets using Charset.availableCharsets(), but there is no
Base16:


{Big5=Big5, Big5-HKSCS=Big5-HKSCS, EUC-JP=EUC-JP, EUC-KR=EUC-KR,
GB18030=GB18030, GB2312=GB2312, GBK=GBK, IBM-Thai=IBM-Thai,
IBM00858=IBM00858, IBM01140=IBM01140, IBM01141=IBM01141,
IBM01142=IBM01142, IBM01143=IBM01143, IBM01144=IBM01144,
IBM01145=IBM01145, IBM01146=IBM01146, IBM01147=IBM01147,
IBM01148=IBM01148, IBM01149=IBM01149, IBM037=IBM037, IBM1026=IBM1026,
IBM1047=IBM1047, IBM273=IBM273, IBM277=IBM277, IBM278=IBM278,
IBM280=IBM280, IBM284=IBM284, IBM285=IBM285, IBM297=IBM297,
IBM420=IBM420, IBM424=IBM424, IBM437=IBM437, IBM500=IBM500,
IBM775=IBM775, IBM850=IBM850, IBM852=IBM852, IBM855=IBM855,
IBM857=IBM857, IBM860=IBM860, IBM861=IBM861, IBM862=IBM862,
IBM863=IBM863, IBM864=IBM864, IBM865=IBM865, IBM866=IBM866,
IBM868=IBM868, IBM869=IBM869, IBM870=IBM870, IBM871=IBM871,
IBM918=IBM918, ISO-2022-CN=ISO-2022-CN, ISO-2022-JP=ISO-2022-JP,
ISO-2022-JP-2=ISO-2022-JP-2, ISO-2022-KR=ISO-2022-KR,
ISO-8859-1=ISO-8859-1, ISO-8859-13=ISO-8859-13,
ISO-8859-15=ISO-8859-15, ISO-8859-2=ISO-8859-2, ISO-8859-3=ISO-8859-3,
ISO-8859-4=ISO-8859-4, ISO-8859-5=ISO-8859-5, ISO-8859-6=ISO-8859-6,
ISO-8859-7=ISO-8859-7, ISO-8859-8=ISO-8859-8, ISO-8859-9=ISO-8859-9,
JIS_X0201=JIS_X0201, JIS_X0212-1990=JIS_X0212-1990, KOI8-R=KOI8-R,
KOI8-U=KOI8-U, Shift_JIS=Shift_JIS, TIS-620=TIS-620, US-ASCII=US-
ASCII, UTF-16=UTF-16, UTF-16BE=UTF-16BE, UTF-16LE=UTF-16LE,
UTF-32=UTF-32, UTF-32BE=UTF-32BE, UTF-32LE=UTF-32LE, UTF-8=UTF-8,
windows-1250=windows-1250, windows-1251=windows-1251,
windows-1252=windows-1252, windows-1253=windows-1253,
windows-1254=windows-1254, windows-1255=windows-1255,
windows-1256=windows-1256, windows-1257=windows-1257,
windows-1258=windows-1258, windows-31j=windows-31j, x-Big5-Solaris=x-
Big5-Solaris, x-euc-jp-linux=x-euc-jp-linux, x-EUC-TW=x-EUC-TW, x-
eucJP-Open=x-eucJP-Open, x-IBM1006=x-IBM1006, x-IBM1025=x-IBM1025, x-
IBM1046=x-IBM1046, x-IBM1097=x-IBM1097, x-IBM1098=x-IBM1098, x-
IBM1112=x-IBM1112, x-IBM1122=x-IBM1122, x-IBM1123=x-IBM1123, x-
IBM1124=x-IBM1124, x-IBM1381=x-IBM1381, x-IBM1383=x-IBM1383, x-
IBM33722=x-IBM33722, x-IBM737=x-IBM737, x-IBM834=x-IBM834, x-IBM856=x-
IBM856, x-IBM874=x-IBM874, x-IBM875=x-IBM875, x-IBM921=x-IBM921, x-
IBM922=x-IBM922, x-IBM930=x-IBM930, x-IBM933=x-IBM933, x-IBM935=x-
IBM935, x-IBM937=x-IBM937, x-IBM939=x-IBM939, x-IBM942=x-IBM942, x-
IBM942C=x-IBM942C, x-IBM943=x-IBM943, x-IBM943C=x-IBM943C, x-IBM948=x-
IBM948, x-IBM949=x-IBM949, x-IBM949C=x-IBM949C, x-IBM950=x-IBM950, x-
IBM964=x-IBM964, x-IBM970=x-IBM970, x-ISCII91=x-ISCII91, x-ISO-2022-CN-
CNS=x-ISO-2022-CN-CNS, x-ISO-2022-CN-GB=x-ISO-2022-CN-GB, x-
iso-8859-11=x-iso-8859-11, x-JIS0208=x-JIS0208, x-JISAutoDetect=x-
JISAutoDetect, x-Johab=x-Johab, x-MacArabic=x-MacArabic, x-
MacCentralEurope=x-MacCentralEurope, x-MacCroatian=x-MacCroatian, x-
MacCyrillic=x-MacCyrillic, x-MacDingbat=x-MacDingbat, x-MacGreek=x-
MacGreek, x-MacHebrew=x-MacHebrew, x-MacIceland=x-MacIceland, x-
MacRoman=x-MacRoman, x-MacRomania=x-MacRomania, x-MacSymbol=x-
MacSymbol, x-MacThai=x-MacThai, x-MacTurkish=x-MacTurkish, x-
MacUkraine=x-MacUkraine, x-MS950-HKSCS=x-MS950-HKSCS, x-mswin-936=x-
mswin-936, x-PCK=x-PCK, x-UTF-16LE-BOM=x-UTF-16LE-BOM, X-UTF-32BE-
BOM=X-UTF-32BE-BOM, X-UTF-32LE-BOM=X-UTF-32LE-BOM, x-windows-50220=x-
windows-50220, x-windows-50221=x-windows-50221, x-windows-874=x-
windows-874, x-windows-949=x-windows-949, x-windows-950=x-windows-950,
x-windows-iso2022jp=x-windows-iso2022jp}
 
rossum

Base16 is another name for Hex. It only uses 16 characters
0123456789ABCDEF or 0123456789abcdef. Each byte is translated into
two characters.

This is the code I use:

/**
 * Converts a byte array into a hex string: "EB 33 0F 7E".
 * The string uses uppercase with leading zeros and spaces
 * for separators.
 *
 * @param inBytes The byte array to convert.
 * @return A hex string with spaces for separators.
 */
public static String asHex(byte[] inBytes) {
    final String separator = " ";
    final char leadingZero = '0';
    StringBuilder sb = new StringBuilder(inBytes.length * 3);
    for (int i = 0; i < inBytes.length; ++i) {
        if (i > 0) { sb.append(separator); }
        // Values 0x00..0x0F give a single hex digit, so pad them.
        if (inBytes[i] >= 0 && inBytes[i] < 0x10) {
            sb.append(leadingZero);
        } // end if
        // Mask with 0xFF so negative bytes are not sign-extended.
        sb.append(Integer.toHexString(inBytes[i] & 0xFF));
    } // end for
    return sb.toString().toUpperCase();
} // end asHex(byte[])

You may wish to remove the separator so your output looks more like
"EB330F7E".

I leave it up to you to do the reverse conversion from the string back
to bytes.
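
For reference, one possible sketch of that reverse conversion. This is not code from the thread: the class name, the method name fromHex and the choice to accept input both with and without space separators are assumptions made for the example.

```java
class HexDecode {
    /**
     * Parses hex text such as "EB 33 0F 7E" or "EB330F7E" back into bytes.
     * Spaces are ignored; upper- and lowercase digits are both accepted.
     */
    static byte[] fromHex(String hex) {
        String digits = hex.replace(" ", "");
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + hex);
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Byte i is stored in the two characters at 2*i and 2*i + 1.
            int hi = Character.digit(digits.charAt(2 * i), 16);
            int lo = Character.digit(digits.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("EB 33 0F 7E");
        System.out.println(bytes.length); // 4
        System.out.println(bytes[0]);     // -21, i.e. 0xEB as a signed byte
    }
}
```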

rossum
 
Mark Space

Here is my question:

Why use Strings at all? Byte arrays are ideal for IO, just send the
array to the serial port you want.
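
A minimal sketch of that idea, assuming the serial port is exposed as a java.io.OutputStream (the thread does not say which serial library is in use). A ByteArrayOutputStream stands in for the port here, and the names RawBytesOut and send are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class RawBytesOut {
    // Writes the bytes unchanged; no String and no charset is involved.
    static void send(OutputStream out, byte[] payload) throws IOException {
        out.write(payload);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real port's output stream.
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();

        send(fakePort, new byte[] {(byte) 0x92});

        // Mask with 0xFF to print the unsigned value of the byte.
        System.out.println(fakePort.toByteArray()[0] & 0xFF); // 146
    }
}
```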

If you are doing some text processing, there are methods that take
byte[] and convert large amounts of text quickly. Yes, you still need a
Charset for this.

(Can you tell us what charset you are using? What character is 0x92
anyway? You haven't even told us yet.)
 
EJP

I need to have a byte (or array of bytes) for some reason and I want to
store it temporarily in a String.

Why? That's where your problem is. String is not a container for binary
data.
 
