Help please - Why does ByteBuffer return '?' as opposed to what was put?

Ed

All,

I have an annoying problem with ByteBuffer (BB).

I am reading from a telnet server - telnet sends bytes that can be any
hex value, some have special meaning to telnet.

I have the following code:

for (int i = 0; i < 255; i++)
{
    ch = (char) i;
    test = new StringBuffer();
    test.append((char) i);
    byteBuffer.put(test.toString().getBytes());
    byteBuffer.flip();

    back = byteBuffer.get();

    // Uncomment below to see more working ...

    /*
    if (back < 0)
    {
        System.out.println("... adding 256 ....");
        back = back + 256;
    }
    */

    backch = (char) back;

    if (ch != backch)
    {
        System.out.println("ERROR - following do not match:");
    }
    System.out.println("" + i + " [" + ch + "] = " + back + " [" +
        backch + "]");
    byteBuffer.clear();
}

The code works ok up to value 127. From 129 to 159 (hex 80 to 9F)
bytebuffer always gives back the '?' char (int 63). From 160 to 255 I
get a negative number.

You will see the commented out code - adding that solves the problem
of 160 to 255 but I still get the 63 in the 129 to 159 range.

I am obviously doing something wrong.

Any help would be much appreciated.

regards

Ed
 
Tor Iver Wilhelmsen

I have an annoying problem with ByteBuffer (BB).

Rather with the understanding of how SIGNED bytes work.

(Deleted insanely convoluted way of doing something)
back = byteBuffer.get();

Are we supposed to guess at the type of back? Is it char? int?
if (back < 0)
{
System.out.println("... adding 256 ....");
back = back + 256;
}

See, the need to do this is indicative of your misunderstanding.
The code works ok up to value 127.

Yes, that is Byte.MAX_VALUE.
From 129 to 159 (hex 80 to 9F)
bytebuffer always gives back the '?' char (int 63).

There are no printable characters in that range, so they are
substituted with a ?.
From 160 to 255 I get a negative number.

Yes, signed bytes do that to you when the 8th bit is set.
You will see the commented out code - adding that solves the problem
of 160 to 255 but I still get the 63 in the 129 to 159 range.

That's because you're supposed to.

char charValue = (char) (byteValue & 0xff);

should be somewhere in every Java IO tutorial.
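For anyone following along, here is a minimal sketch of that masking idiom. The class and method names (UnsignedByteDemo, toUnsigned) are made up for illustration and are not from the thread. The mask uses the bitwise & operator; the logical && does not compile with numeric operands.

```java
class UnsignedByteDemo {
    // Interpret a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        // b is sign-extended to int first; the mask keeps only the low 8 bits.
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xE9;          // stored as a negative byte
        int unsigned = toUnsigned(raw);  // the original 0..255 value
        char c = (char) unsigned;        // safe: the value is now non-negative
        System.out.println(raw + " -> " + unsigned + " -> U+00" + Integer.toHexString(c).toUpperCase());
    }
}
```

Casting the masked value to char only reproduces the original character if the bytes were produced with an encoding that maps code points 0 to 255 one-to-one, which is why the charset remark still matters.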

That said, you probably also need to consider specifying charset
somewhere along the line.
 
Tim Ward

Tor Iver Wilhelmsen said:
Rather with the understanding of how SIGNED bytes work.

Yet another victim of the utterly bizarre decision to make bytes signed in
the first place.
 
xarax

Tim Ward said:
Yet another victim of the utterly bizarre decision to make bytes signed in
the first place.

Or rather the bizarre decision not to include "unsigned"
integer types of all sizes.
 
John C. Bollinger

Ed said:
I have an annoying problem with ByteBuffer (BB).

No, you have a lack of appreciation for the difference between bytes and
characters, and for the accompanying niceties of character encoding.
I have the following code:

for (int i = 0; i < 255; i++)
{
ch = (char) i;
test = new StringBuffer();
test.append((char) i);

So far, so good.
byteBuffer.put(test.toString().getBytes());

But that's bad. You are using the no-arg version of String.getBytes(),
which encodes the characters of the string into an array of bytes
according to the system's default character encoding scheme. You should
_always_ use an explicit encoding scheme to (1) make your intention
clear and (2) ensure that your application works the same way on every
system. It will also get you thinking about what's happening; the
details of this conversion are part of what's tripping you up.
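As a rough illustration of naming the encoding explicitly, the sketch below encodes each char from 0 to 255 with ISO-8859-1 and checks that the resulting byte, read as unsigned, equals the original value. It assumes a Java version that has java.nio.charset.StandardCharsets (Java 7 or later); on older versions the string form getBytes("ISO-8859-1") plays the same role. The class and method names are invented for the example, and ISO-8859-1 is only an assumption here: the right encoding for a real telnet stream depends on what the server sends.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Encode a single char with ISO-8859-1 and return the byte as 0..255.
    static int encodeLatin1(char c) {
        byte[] encoded = String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1);
        return encoded[0] & 0xFF; // mask because Java bytes are signed
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int i = 0; i <= 255; i++) {
            if (encodeLatin1((char) i) != i) {
                mismatches++;
                System.out.println("Mismatch at " + i);
            }
        }
        System.out.println("Mismatches: " + mismatches);
    }
}
```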
byteBuffer.flip();

back = byteBuffer.get();

That bit is probably fine, although you omitted the declaration of back.
// Uncomment below to see more working ...

/*
if (back < 0)
{
System.out.println("... adding 256 ....");
back = back + 256;
}
*/

Well, yes, I can imagine that that might work better. byteBuffer.get()
returns a value of type *byte*, which is a *signed* 8-bit number. You
are comparing it (below) to a *char*, which (in one relevant sense) is
an *unsigned* 16-bit number. Try this:

if (((char) 0x80) == ((byte) 0x80)) {
    System.out.println("No duh!");
} else {
    System.out.println("Surprise!");
}

Note that, as for most numeric operations on bytes, chars, and shorts,
the operands are widened to type int for that comparison: the byte with
sign extension, the char (being unsigned) with zero extension.
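To see what the widening does to each operand, here is a small sketch (class and method names are hypothetical). A char is unsigned, so it widens to a non-negative int, while a byte carries its sign bit along.

```java
class PromotionDemo {
    // Widening a char to int never produces a negative value.
    static int widenChar(char c) {
        return c;
    }

    // Widening a byte to int preserves the sign.
    static int widenByte(byte b) {
        return b;
    }

    public static void main(String[] args) {
        int fromChar = widenChar((char) 0x80);
        int fromByte = widenByte((byte) 0x80);
        System.out.println("char 0x80 as int: " + fromChar);
        System.out.println("byte 0x80 as int: " + fromByte);
        System.out.println("equal? " + (fromChar == fromByte));
    }
}
```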
backch = (char) back;

if (ch != backch)
{
System.out.println("ERROR - following do not match:");
}
System.out.println("" + i + " [" + ch + "] = " + back + " [" +
backch + "]");
byteBuffer.clear();
}

The code works ok up to value 127. From 129 to 159 (hex 80 to 9F)
bytebuffer always gives back the '?' char (int 63). From 160 to 255 I
get a negative number.

The '?' characters are a dead giveaway of a character encoding problem.
Your default character encoding is apparently a one-byte encoding
that does not map characters U+0080 through U+009f (and, probably not
characters greater than U+00ff, either). It is probably some variant on
Latin-1 (aka ISO/IEC 8859-1, which is not quite the same as ISO-8859-1),
which in fact doesn't define mappings for those characters, but which
also doesn't define mappings for characters U+0000 - U+001F.
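The substitution itself is easy to reproduce. The sketch below uses US-ASCII as a stand-in, since the poster's actual default encoding is unknown: String.getBytes replaces any character the charset cannot encode with the charset's replacement byte, which for US-ASCII is '?' (63). The helper names are invented for the example, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // True if the charset has a mapping for this character.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    // First encoded byte, as 0..255; unmappable chars come back as the replacement byte.
    static int firstByte(Charset cs, char c) {
        return String.valueOf(c).getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        char c = '\u0085';
        Charset ascii = StandardCharsets.US_ASCII;
        System.out.println("US-ASCII can encode U+0085? " + canEncode(ascii, c));
        System.out.println("US-ASCII byte for U+0085: " + firstByte(ascii, c));
        System.out.println("ISO-8859-1 byte for U+0085: " + firstByte(StandardCharsets.ISO_8859_1, c));
    }
}
```

Running canEncode over the range 0 to 255 against Charset.defaultCharset() is one way to find out exactly which characters the platform default refuses to map.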
I am obviously doing something wrong.

You are improperly intermixing bytes and characters. Do not use
Strings, chars, StringBuffers, etc. to hold binary data. Binary data
that represents characters must be decoded according to the appropriate
character encoding scheme, and that scheme generally must be obtained
from an external source. Always specify the encoding explicitly when
producing binary representations of character data.

And remember that Java bytes are signed.
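A minimal sketch of the same round trip done with bytes only, so no character encoding is involved at any point. The class and method names are hypothetical, and a one-byte buffer is used purely to keep the example short.

```java
import java.nio.ByteBuffer;

class RawByteRoundTrip {
    // Put one value (0..255) into the buffer as a byte and read it back unsigned.
    static int roundTrip(ByteBuffer buffer, int value) {
        buffer.clear();
        buffer.put((byte) value);   // narrowing keeps the low 8 bits
        buffer.flip();
        return buffer.get() & 0xFF; // undo the sign extension on the way out
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(1);
        int errors = 0;
        for (int i = 0; i <= 255; i++) {
            if (roundTrip(buffer, i) != i) {
                errors++;
                System.out.println("ERROR at " + i);
            }
        }
        System.out.println("Errors: " + errors);
    }
}
```

Only the bytes that are known to be text need to be decoded into a String, and then with an explicitly named charset.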


John Bollinger
(e-mail address removed)
 
Davis

You can find a utility class to convert a byte array to hex,short,int,
or long Java primitives at
http://www.sqlmagic.com/resources/UtilUnsigned.html.
 
