convert CharArray to ByteArray

H

hierholzer

I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?
 
A

Arne Vajhøj

hierholzer said:
I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?


Just cast it with (byte).

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}

?

Arne
 
H

hierholzer

hierholzer said:
I'm converting an array of char to an array of bytes:
static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;
  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}
I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.
What is the suggested way around this in Java?

Just cast it with (byte).

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
     return (new String(ca)).getBytes("UTF-16");

}

?

Arne


I posted because the cast got me nowhere.

For
"byte upper8bits = ((byte)(ca >> (1<<3)) & mask);"
I get "Possible loss of precision. Found (int) required (byte)"
and it signals the AND operator.

I'll just use your suggested method though. However, this makes me
curious, should I try compiling with a different compiler?
 
L

Lew

hierholzer said:
For
"byte upper8bits = ((byte)(ca >> (1<<3)) & mask);"
I get "Possible loss of precision. Found (int) required (byte)"
and it signals the AND operator.


It's because you have an int as the result of the mask operation (&) and you
don't cast it down to byte as you assign.

Try instead

byte upper8bits = (byte)((ca[i] >> (1<<3)) & mask);

Also, the mask, being a negative number, would sign extend. Declare it 'int'
instead, and you really should make it 'final'.

final int mask = 0xFF;

<http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#5.6.2>
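
For reference, a minimal sketch that puts those two fixes together (cast applied last, mask declared as a final int) might look like:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  final int mask = 0xFF;
  byte[] ba = new byte[ca.length * 2];
  int j = 0;
  for (int i = 0; i < ca.length; ++i, j += 2) {
    // the shift and mask are carried out as int; cast down to byte last
    ba[j] = (byte) ((ca[i] >> 8) & mask);
    ba[j + 1] = (byte) (ca[i] & mask);
  }
  return ba;
}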
 
J

Joshua Cranmer

hierholzer said:
For
"byte upper8bits = ((byte)(ca >> (1<<3)) & mask);"
I get "Possible loss of precision. Found (int) required (byte)"
and it signals the AND operator.


(byte)((ca[i] >> (1 << 3)) & mask). The integer operators promote byte,
short, and char operands to (at least) int, per JLS §4.2.2:
If an integer operator other than a shift operator has at least one
operand of type long, then the operation is carried out using 64-bit
precision, and the result of the numerical operator is of type long. If
the other operand is not long, it is first widened (§5.1.5) to type
long by numeric promotion (§5.6). Otherwise, the operation is carried
out using 32-bit precision, and the result of the numerical operator is
of type int. If either operand is not an int, it is first widened to
type int by numeric promotion.

Granted, the utility of expansion for the bitwise operators is probably
negative, but it is forbidden nonetheless. I am going to guess that the
rationale involves either making the underlying mechanics obvious, or
perhaps it's a result derived from the signedness of bytes.
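
A two-line illustration of the promotion rule (not from the thread):

byte a = 0x0F, b = 0x03;
// byte c = a & b;        // does not compile: a & b is promoted to int
byte c = (byte) (a & b);  // fine: cast the int result back down to byte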
 
J

Jim

I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?


Why not simply

int i = 0;
for(char c : data) {
retArray[i++] = (byte) (c >> 8);
retArray[i++] = (byte) (c & 0xFF);
}
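
A self-contained sketch of the same approach (not from Jim's post; his implied data/retArray are filled in as ca/ba):

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length * 2];
  int i = 0;
  for (char c : ca) {
    ba[i++] = (byte) (c >> 8);    // high-order byte
    ba[i++] = (byte) (c & 0xFF);  // low-order byte
  }
  return ba;
}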

Jim
 
A

Arne Vajhøj

hierholzer said:
hierholzer said:
I'm converting an array of char to an array of bytes:
static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;
  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }
  return ba;
}
I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.
What is the suggested way around this in Java?

Just cast it with (byte).

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");

}

?


I posted because the cast got me nowhere.

For
"byte upper8bits = ((byte)(ca >> (1<<3)) & mask);"
I get "Possible loss of precision. Found (int) required (byte)"
and it signals the AND operator.


The cast is in the wrong place. It should be the last operation.
I'll just use your suggested method though. However, this makes me
curious, should I try compiling with a different compiler?

All Java compilers should give you the same error.

Arne
 
K

Karl Uppiano

Arne Vajhøj said:
hierholzer said:
I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?


Just cast it with (byte).

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}


The latter suggestion would definitely be the best approach if the char
array is actual (UTF-16) characters. You might be lucky, and the char array
is already from UTF-8 or a single-byte charset, but if not, look out! The
loss of precision warning is telling you something.
 
A

Arne Vajhøj

Karl said:
Arne Vajhøj said:
hierholzer said:
I'm converting an array of char to an array of bytes:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;

  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }

  return ba;
}

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?


Just cast it with (byte).

Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}


The latter suggestion would definitely be the best approach if the char
array is actual (UTF-16) characters. You might be lucky, and the char
array is already from UTF-8 or a single-byte charset, but if not, look
out! The loss of precision warning is telling you something.


Chars are supposed to contain UTF-16.

Arne
 
R

Roedy Green

I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.

What is the suggested way around this in Java?

There is usually an encoding involved. See
http://mindprod.com/jgloss/encoding.html

If all you want to do is discard the high-order byte, see
http://mindprod.com/jgloss/unsigned.html
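
If discarding the high-order byte really is acceptable, a plain narrowing cast is all it takes (illustration only):

char c = '\u00E9';     // the character é (U+00E9)
byte low = (byte) c;   // keeps only the low-order 8 bits: (byte) 0xE9, i.e. -23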
--
Roedy Green Canadian Mind Products
http://mindprod.com

If everyone lived the way people do in Vancouver, we would need three more entire planets to support us.
~ Guy Dauncey
 
C

charlesbos73

hierholzer said:
I'm converting an array of char to an array of bytes:
static public byte[] convertCharArrayToByteArray(char[] ca) {
  byte[] ba = new byte[ca.length*2];
  int j = 0;
  byte mask = 0xff;
  for(int i = 0; i < ca.length; ++i, j+=2) {
    byte upper8bits = ((byte)(ca[i] >> (1<<3)) & mask);
    byte lower8bits = ((byte) ca[i] & mask);
    ba[j] = upper8bits;
    ba[j+1] = lower8bits;
  }
  return ba;
}
I'm getting loss-of-precision errors because the primitive type byte is
represented as signed two's complement. Hence 0xff causes loss-of-precision
issues, as do the other bit-manipulation statements.
What is the suggested way around this in Java?

Just cast it with (byte).
Have you considered:
static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}

The latter suggestion would definitely be the best approach if the char
array is actual (UTF-16) characters. You might be lucky, and the char array
is already from UTF-8 or a single-byte charset, but if not, look out! The
loss of precision warning is telling you something.


This is nonsense.

A Java char is well defined.

UTF-16 is also well defined.

This has exactly *nothing* to do with UTF-8 nor "single byte charset".

There's not going to be any "loss of precision" [sic] when
doing:

(new String(ca)).getBytes("UTF-16");

Any character present in "ca" can be encoded in UTF-16
(including characters from Unicode 3.1 and later), and the
whole resulting byte[] can always be reused to recreate the
original char[]. Whether the OP's original char[] is correctly
formed when Unicode 3.1-and-later code points are involved is
another topic.

I don't care (performance excepted) if internally the
char[] is represented using the color of the boots little
fairies are wearing or if it's already UTF-16; the fact
is that:

static public byte[] convertCharArrayToByteArray(char[] ca) {
  return (new String(ca)).getBytes("UTF-16");
}

shall *always* produce a byte[] that can be reused to
construct the original char[] (there are exactly zero
issues with UTF-8 or "single byte encoding" [sic] in
this case).

Note that:

System.out.println( convertCharArrayToByteArray( new char[] {'a'} ).length );

shall print '4', and the OP probably wants to read up on
what a BOM is if he decides to use this method.
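
A sketch of that round trip (not from the original post; the last two chars form one surrogate pair):

import java.util.Arrays;

public class RoundTrip {
  public static void main(String[] args) throws Exception {
    char[] ca = { 'a', '\u00E9', '\uD835', '\uDD0A' };   // includes a supplementary character
    byte[] ba = new String(ca).getBytes("UTF-16");
    char[] back = new String(ba, "UTF-16").toCharArray();
    System.out.println(Arrays.equals(ca, back));         // prints: true
  }
}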


P.S.: Whether or not the UTF-16 encoding is mandated to be present
for the JVM to be compliant is a question better left to
the JLS-nazi bot who shall recognize himself. Note that
if it is mandatory, then you have to stupidly catch an
exception that can never happen, just like when
you do getBytes("UTF-8") (UTF-8 is mandatory for the JVM
to be compliant, which begs the question as to why we don't
have a getUTF8Bytes() method, but I digress, and the JLS-nazi
bot can certainly explain why the Java designers were right
when they mandated UTF-8 to be a supported JVM encoding
but did not provide a getUTF8Bytes() method, as everything
in Java is holy and has a logical explanation).
 
M

Mike Schilling

charlesbos73 said:
P.S.: Whether or not the UTF-16 encoding is mandated to be present
for the JVM

It is.
Note that
if it is mandatory, then you have to stupidly catch an
exception that can never happen,

Yeah, that's suboptimal.
just like when
you do getBytes("UTF-8") (UTF-8 is mandatory for the JVM
to be compliant, which begs the question as to why we don't
have a getUTF8Bytes() method

Not exactly difficult to construct.

public byte[] getUTF8Bytes(String s)
{
    try
    {
        return s.getBytes("UTF-8");
    }
    catch (UnsupportedEncodingException ex)
    {
        // Should never happen
        throw new RuntimeException(ex.getMessage(), ex);
    }
}
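
(Side note, assuming Java 6 or later: the String.getBytes(Charset) overload sidesteps the checked exception entirely.)

import java.nio.charset.Charset;

public static byte[] getUTF8Bytes(String s)
{
    // the Charset overload declares no checked exception
    return s.getBytes(Charset.forName("UTF-8"));
}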
 
M

Mayeul

charlesbos73 said:
P.S.: Whether or not the UTF-16 encoding is mandated to be present
for the JVM to be compliant is a question better left to
the JLS-nazi bot who shall recognize himself. Note that
if it is mandatory, then you have to stupidly catch an
exception that can never happen, just like when
you do getBytes("UTF-8") (UTF-8 is mandatory for the JVM
to be compliant, which begs the question as to why we don't
have a getUTF8Bytes() method, but I digress, and the JLS-nazi
bot can certainly explain why the Java designers were right
when they mandated UTF-8 to be a supported JVM encoding
but did not provide a getUTF8Bytes() method, as everything
in Java is holy and has a logical explanation).

Wow, your contribution is technically correct, but chill out already.

The absence of getUTF8Bytes() is just consistent with the existence of
getBytes("UTF-8") and the absence of getAsciiBytes(),
getUTF16Bytes(), getUTF16LEBytes()...

getUTF8Bytes() would only spare you checking for an impossible exception,
which is annoying to have to do, but only barely. Just throw an
InternalError and get on with it; if that exception actually fires every
now and then, you have a bigger problem anyway.

I mean really, this is hardly worth pointing out at all.
 
M

Mayeul

Lew said:
But can contain anything that fits. There is no enforcement.

Would you mind elaborating? java.lang.Character's javadoc seems to
indicate that chars are UTF-16, and therefore it is enforced by the char
type itself.

It seems to me that allowing otherwise would cause a rather serious
regression for out-of-BMP-enabled Java applications (where you at least
need the java.lang.Character methods, which depend on chars and Strings
being UTF-16, or something close enough).
 
M

Mayeul

Arne said:
Have you considered:

static public byte[] convertCharArrayToByteArray(char[] ca) {
return (new String(ca)).getBytes("UTF-16");
}

Actually, that will add a BOM at the start of the byte array. You'll
want "UTF-16BE" for the same behavior without a BOM.
 
M

markspace

Mike said:
// Should never happen
throw new RuntimeException(ex.getMessage(), ex);


Even allowing that this really should never happen, throwing a bare
RuntimeException makes me a bit ill. I'd suggest something further down
the class hierarchy.

"MissingResourceException" might work, if you consider that Charsets are
provided by CharsetProvider and loaded through the thread context class
loader.

Picky, I know, but it's best to discourage anyone from throwing an
actual RuntimeException, IMO.
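
One way that suggestion might look, as a sketch only (MissingResourceException takes a message, a class name, and a key; the cause has to be attached via initCause):

import java.io.UnsupportedEncodingException;
import java.util.MissingResourceException;

public static byte[] getUTF8Bytes(String s)
{
    try
    {
        return s.getBytes("UTF-8");
    }
    catch (UnsupportedEncodingException ex)
    {
        // should never happen: UTF-8 support is mandatory for a compliant JVM
        MissingResourceException mre = new MissingResourceException(
                "UTF-8 charset not available", "java.nio.charset.Charset", "UTF-8");
        mre.initCause(ex);
        throw mre;
    }
}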
 
M

markspace

Mayeul said:
Would you mind elaborating? java.lang.Character's javadoc seems to
indicate that chars are UTF-16, and therefore it is enforced by the char
type itself.


I think Lew means that the char data type is just an unsigned 16-bit
integer, and therefore can contain any 16-bit value at all. If you get
your char array from a String, then yes, it really should be constrained,
but any random char buffer may not be. Caveat emptor and all that.
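
Illustration only: a char can hold a value that is not well-formed UTF-16 on its own, such as an unpaired surrogate.

char c = '\uD800';                                // a lone high surrogate: a legal char value
System.out.println(Character.isHighSurrogate(c)); // prints: true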
 
M

markspace

Mike Schilling wrote:

Why? Even if it could, somehow, be thrown, what harm would result?


Well, you can't catch a RuntimeException without having to catch ALL
runtime exceptions and then try to sort out which one it actually was.

Sure, it never "should" happen, but weird things happen in test
environments, and sometimes there are funky customer environments too.
It's not too big of an issue if you have access to the source, but even
different groups in the same company may not be able to easily change
one another's source code.

Code defensively.
 
