Question about aString.getBytes() and aByteArray.toString()

Andrew

Hello. I have a general question regarding the String.getBytes() and
toString() methods (in this case when applied to a byte array).

I have the following code:

byte[] syncML = {
    (byte) 0x0002, (byte) 0x01, (byte) 0x006a, (byte) 0x0000,
    (byte) 0x006d, (byte) 0x006c, (byte) 0x0071, (byte) 0x0003,
    (byte) '1', (byte) '.', (byte) '1', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0072, (byte) 0x0003, (byte) 'S', (byte) 'y', (byte) 'n',
    (byte) 'c', (byte) 'M', (byte) 'L', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0001, (byte) 0x0001
};

String str = syncML.toString();

byte[] bytesafterString = str.getBytes();

System.out.println(syncML.length + " " + bytesafterString.length);

I realised that the result is not what I was expecting. In fact, the
lengths of the two byte arrays were not identical! The result was:


25 9


How can a byte array be converted to a string and then back to an
array and be different? I really can't understand this. Shouldn't it
be the same? If not, how can I make sure my data is exactly the same
before and after string conversion?


Many thanks in advance


Andrew
 
Michael Borgwardt

Andrew said:
I realised that the result is not what I was expecting. In fact, the
lengths of the two byte arrays were not identical! The result was:

25 9

How can a byte array be converted to a string and then back to an
array and be different?

First: that's NOT what you're doing! Take a look at what the intermediate
String contains.

To convert a byte array to a String, use the String(byte[]) constructor.

Andrew said:
I really can't understand this. Shouldn't it be the same?

No, even if you use the correct method, they are not necessarily the same.
Every transition between byte[] and String makes use of an encoding: the
platform default encoding if (as in your code) none is specified.
The platform default encoding can (and does) differ between systems, and
even if it doesn't, the output can still be different because the input
byte array may contain byte sequences that are not legal in the
encoding and which will then be replaced by a substitute character such
as '?'.
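The effect of the encoding can be sketched as follows. This is only an illustration, assuming Java 7 or later for StandardCharsets; the class and method names are made up for the example. ISO-8859-1 maps every byte value to exactly one char, so the round trip is lossless, while UTF-8 replaces the illegal byte 0xff and the round trip changes the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Decode and re-encode with ISO-8859-1: every byte maps to one char.
    static byte[] viaLatin1(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode and re-encode with UTF-8: malformed input gets replaced.
    static byte[] viaUtf8(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xff can never occur in well-formed UTF-8.
        byte[] data = {0x02, 0x01, 0x6a, (byte) 0xff, 0x00};
        System.out.println(Arrays.equals(data, viaLatin1(data))); // true
        System.out.println(Arrays.equals(data, viaUtf8(data)));   // false
    }
}
```

Naming the charset explicitly on both sides at least removes the dependence on the platform default.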

Andrew said:
If not, how can I make sure my data is exactly the same before and
after string conversion?

The only really certain way is to NOT do a "string conversion" at all.
Strings are for text, not for binary data. If you really have to use
Strings, use something like Base64 encoding:
http://iharder.sourceforge.net/base64/
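A minimal sketch of the Base64 approach. Note the substitution: this uses java.util.Base64 from the standard library (Java 8 and later) in place of the library linked above, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Turn arbitrary bytes into plain ASCII text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes from that text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0x02, 0x01, 0x6a, 0x00, (byte) 0xff};
        String text = toText(data);
        System.out.println(text);
        System.out.println(Arrays.equals(data, fromText(text))); // true
    }
}
```

Because the text contains only ASCII letters, digits, '+', '/' and '=', it survives any later charset conversion unchanged.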
 
M. Uli Kusterer

Note: I'm mainly a C programmer, so I'm not really sure about Java's
typecasting behavior, and I may mix up C's and Java's data type sizes.
But usually Java and C are very close, so I'm pretty sure about what I'm
saying, but if someone knows better, let me know.

Andrew said:
byte[] syncML = {
    (byte) 0x0002, (byte) 0x01, (byte) 0x006a, (byte) 0x0000,
    (byte) 0x006d, (byte) 0x006c, (byte) 0x0071, (byte) 0x0003,
    (byte) '1', (byte) '.', (byte) '1', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0072, (byte) 0x0003, (byte) 'S', (byte) 'y', (byte) 'n',
    (byte) 'c', (byte) 'M', (byte) 'L', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0001, (byte) 0x0001
};

This is wrong. IIRC a byte is 8 bits long (i.e. two hexadecimal digits)
and characters are 2 bytes long (four hexadecimal digits). toString()
and getBytes() will thus expect each character to use 2 bytes. Your
array only contains the low byte of your characters.

I'm not sure whether Java is big-endian or little-endian, or whether it
uses the current platform's native endian-ness, but either you're
discarding the first two 00s only, or you're even only storing those,
and discarding the actual information.

Andrew said:
How can a byte array be converted to a string and then back to an
array and be different?

It isn't. This is simply an example of Garbage-in-Garbage-out.

I'm not sure what you're trying to achieve, but this is very likely the
wrong way to do it. Why would you want to have control characters like
0x0002 in a string anyway? They're unprintable and will thus be skipped
or replaced on output. Have a look at a DataOutputStream. That looks
more like what you may want.
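A rough sketch of keeping the data binary with DataOutputStream and DataInputStream. The length prefix, the in-memory buffer, and all names here are assumptions chosen for the example, not something taken from the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class BinaryStreamSketch {
    // Write a 4-byte length followed by the raw payload; no charset involved.
    static byte[] write(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the length, then exactly that many bytes.
    static byte[] read(byte[] framed) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x02, 0x01, 0x6a, 0x00, (byte) 0xff};
        System.out.println(Arrays.equals(data, read(write(data)))); // true
    }
}
```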

Cheers,
-- Uli
http://www.zathras.de
 
Thomas Schodt

M. Uli Kusterer said:
This is wrong. IIRC a byte is 8 bits long (i.e. two hexadecimal digits)
and characters are 2 bytes long (four hexadecimal digits). toString()
and getBytes() will thus expect each character to use 2 bytes. Your
array only contains the low byte of your characters.

Indeed.

I'm not sure whether Java is big-endian or little-endian, or whether it
uses the current platform's native endian-ness, but either you're
discarding the first two 00s only

That is the case.

It isn't. This is simply an example of Garbage-in-Garbage-out.

Well, it's just that OP used the wrong approach.

Michael Borgwardt already told OP to use new String(byte[]) instead.

Object.toString() returns the class name (for a byte[] that is "[B"),
an '@', and the object's hash code in hexadecimal. It says nothing about
the array's contents.
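To see the difference, here is a small sketch; the class and method names are made up for illustration. The "9" in the original output is consistent with a string of the form "[B@" followed by six hexadecimal digits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteArrayToString {
    // Inherited Object.toString(): type name, '@', hex hash code.
    static String identity(byte[] data) {
        return data.toString();
    }

    // Human-readable listing of the element values.
    static String listing(byte[] data) {
        return Arrays.toString(data);
    }

    // Actual decoding of the bytes as text, with an explicit charset.
    static String decoded(byte[] data) {
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 'S', (byte) 'y', (byte) 'n', (byte) 'c'};
        System.out.println(identity(data)); // starts with [B@, rest varies per run
        System.out.println(listing(data));  // [83, 121, 110, 99]
        System.out.println(decoded(data));  // Sync
    }
}
```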
 
Will Hartung

Coming in late...

anyway.

Java is NOT C. Remember that Java Strings are all Unicode, so there can be
issues with just converting bytes to Strings. This is also why we have
things like *Writer specifically to handle Strings and conversion, vs
*Stream which is more raw.

Regards,

Will Hartung
 
John C. Bollinger

M. Uli Kusterer said:
Note: I'm mainly a C programmer, so I'm not really sure about Java's
typecasting behavior, and I may mix up C's and Java's data type sizes.
But usually Java and C are very close, so I'm pretty sure about what I'm
saying, but if someone knows better, let me know.

byte[] syncML = {
    (byte) 0x0002, (byte) 0x01, (byte) 0x006a, (byte) 0x0000,
    (byte) 0x006d, (byte) 0x006c, (byte) 0x0071, (byte) 0x0003,
    (byte) '1', (byte) '.', (byte) '1', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0072, (byte) 0x0003, (byte) 'S', (byte) 'y', (byte) 'n',
    (byte) 'c', (byte) 'M', (byte) 'L', (byte) 0x0000, (byte) 0x0001,
    (byte) 0x0001, (byte) 0x0001
};


This is wrong. IIRC a byte is 8 bits long (i.e. two hexadecimal digits)
and characters are 2 bytes long (four hexadecimal digits).

Correct.

toString()
and getBytes() will thus expect each character to use 2 bytes. Your
array only contains the low byte of your characters.

Incorrect. Michael Borgwardt already explained that the OP, if he
really must do this at all, is using the wrong approach. What he is
trying to do is accomplished on the byte[] -> String side by an
appropriate String constructor. HOWEVER, that constructor (and Java
conversions from byte sequences to character sequences in general)
always operates by means of a character encoding, so that individual
bytes or byte pairs do not necessarily map directly onto chars.
Thinking of it as if it were similar to a C typecast is completely wrong.
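A short illustration of why bytes and chars do not line up one to one. It assumes Java 7 or later for StandardCharsets, and the names are invented for the example. The same single char takes a different number of bytes depending on the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharVsByteCount {
    // Number of bytes the text occupies in the given encoding.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // one char: 'e' with an acute accent
        System.out.println(text.length());                                    // 1
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));   // 2
    }
}
```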


John Bollinger
 
