encoding troubles


Matthijs Blaas

Hi all!

I have a byte array of data which I want to post to a PHP script. So I send
the string representing the byte array in a POST with UTF-8 encoding.
I figured Java works with UTF-16BE internally, but after receiving the UTF-8
encoded post in PHP and converting it back to UTF-16BE, it was not the
same...

I wrote a little test, and it turns out that if I convert a byte array to UTF-8
and back, it won't match the original byte array, but the string representing
it is the same:

String test="some test";
byte[] dest,temp;

dest = test.getBytes();

try {
temp=new String(dest).getBytes("UTF-8");
}catch(Exception e) { System.out.println(e); }

try {
test=new String(temp,"UTF-8");
System.out.println(test);
dest=test.getBytes();
} catch(Exception e) { System.out.println(e); }

if(dest==temp) System.out.println("Success!");
else System.out.println("failed");

How can I convert the UTF-8 bytes back to the original byte array? And will
this work when posting to PHP? It's important that the binary data remains
intact, not just the string representation...

Thanks in advance,

Matthijs
 

BigbooTAY


You can't use == to compare the contents of two arrays, so that test
is never going to work. Actually, by the time you do that test,
you've got two different representations anyway: 'temp' was created
using getBytes("UTF-8") while 'dest' was created with getBytes(),
which uses the system default encoding. They probably do contain
the same bytes, since the string only contains ASCII characters, but
you should be aware of that.
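To make the getBytes() point concrete: the no-argument form uses whatever the platform default charset happens to be, so the sketch below compares two explicit charsets instead, with ISO-8859-1 standing in for "some other default" (an assumption; the real default varies by system). It assumes Java 7+ for StandardCharsets; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDiffDemo {
    // Encode the same text under two explicit charsets and report whether the bytes match.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        // Pure ASCII: both charsets produce identical bytes.
        System.out.println(sameBytes("some test"));  // true
        // Non-ASCII: UTF-8 needs two bytes for the accented letter, ISO-8859-1 needs one.
        System.out.println(sameBytes("caf\u00e9"));  // false
        System.out.println("caf\u00e9".getBytes(StandardCharsets.UTF_8).length);      // 5
        System.out.println("caf\u00e9".getBytes(StandardCharsets.ISO_8859_1).length); // 4
    }
}
```

So an ASCII-only test like "some test" can hide an encoding mismatch that shows up as soon as the data contains anything else.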

This test works correctly:

try
{
String str0 = "some test";
byte[] ar0 = str0.getBytes("UTF-8");

String str1 = new String(ar0, "UTF-8");
System.out.println(str1);
}
catch (Exception ex)
{
ex.printStackTrace();
}

BTW, you don't need to know what encoding Java uses internally. All
you need to know is the encoding of the byte array, so you can do the
conversion correctly.
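A minimal sketch of what "doing the conversion correctly" means in practice: decode with the same charset that produced the bytes. Decoding UTF-8 bytes as ISO-8859-1 instead throws no error; it just silently yields different text. Assumes Java 7+; the names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Encode as UTF-8, then decode with the same charset: the text comes back intact.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: no exception, just different text.
    static String wrongDecode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";
        System.out.println(roundTrip(cafe).equals(cafe));                // true
        System.out.println(wrongDecode(cafe).equals("caf\u00c3\u00a9")); // true: the accent became two characters
    }
}
```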
 

Matthijs Blaas

Thanks for your reply; it indeed sounds more logical to have the byte array
encoding converted before I do anything with it :)

However, the thing is: I convert a string (containing my plaintext) to a byte
array (UTF-8 encoded) and feed this byte array to an encryption engine, which
returns a byte array (in the internal encoding, I presume?). If I want to
decrypt this encrypted block of bytes, it is of course important that the
bytes remain exactly as output by the engine when PHP receives the
posted message. I'm a little confused about how to achieve this, since the
original bytes will get lost if I convert them using another encoding. Must I
convert back to the encoding Java uses internally? And isn't that platform
specific?

Please help!

-Thijs


 

Alan Moore


What is this encryption engine? Is it a Java app? You say it works
with byte arrays, but does it know that the bytes are supposed to
represent characters? (I would think, if it's for encrypting text, it
would work with char arrays.) On the other hand, if it doesn't care
what the bytes represent, it shouldn't matter what encoding you use.
It will scramble the bytes in a certain way, and the decryption process
will unscramble them. You just have to make sure you use the same
character encoding at both ends.
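To illustrate that the cipher itself never sees a character encoding, here is a sketch that uses the standard javax.crypto RSA API as a stand-in for logi.crypto (the thread doesn't show logi.crypto's API, so none of it is assumed here). The charset appears only where text becomes bytes and where bytes become text again; the ciphertext in between is opaque binary. The OAEP padding choice is arbitrary for this sketch; whatever padding is used would have to match at the decrypting end too.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

public class CipherRoundTrip {
    // javax.crypto stand-in for the thread's logi.crypto engine (an assumption, not its API).
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-1AndMGF1Padding";

    static String encryptThenDecrypt(String plain) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // Text -> bytes: the only point where a character encoding is chosen.
        byte[] source = plain.getBytes(StandardCharsets.UTF_8);

        Cipher enc = Cipher.getInstance(TRANSFORMATION);
        enc.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] ciphertext = enc.doFinal(source); // opaque binary, not text in any encoding

        Cipher dec = Cipher.getInstance(TRANSFORMATION);
        dec.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        byte[] recovered = dec.doFinal(ciphertext); // must be fed the exact ciphertext bytes

        // Bytes -> text: use the same encoding as on the way in.
        return new String(recovered, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        System.out.println(encryptThenDecrypt("test")); // test
    }
}
```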

I don't have enough information to give you an answer; I'm just trying
to help you ask more fruitful questions.
 

Matthijs Blaas

I use this encryption library: http://logi.org/logi.crypto/
The engine I use encrypts blocks of bytes (it doesn't care whether they're
text or something else). Still, it matters what you feed it: I use an RSA
cipher to encrypt a block of data, but when I want to decrypt, I have to feed
it exactly the same block that it output. It should be compatible with
other RSA implementations like OpenSSL. I want the encrypted block of data to
be decrypted in PHP using OpenSSL. However, when I run a little test in Java
to check that the data isn't messed up, it throws an error saying the input
data is an invalid cipher block:

byte[] source,dest,res;
String plain="test",contents;

try { source=plain.getBytes("UTF-8"); } catch(Exception e) {}
try { dest=Encrypt(source,0,source.length); } catch(Exception e) {}
// write dest to a UTF-8 encoded file
// read the UTF-8 encoded file into contents

Now, when I try to decrypt the file contents, which differ from the
original byte array dest:
try { res=Decrypt(contents.getBytes(),0,contents.getBytes().length); }
catch(Exception e) {}
it throws an error; if I use the original byte array dest, it obviously
does work. The string representation seems to be the same for both byte
arrays (new String(dest) and contents), but the bytes are scrambled in the
UTF-8 encoded version... Is there any way to get around this problem?

Thanks,

Matthijs
 

Gordon Beaton

try { source=plain.getBytes("UTF-8"); } catch(Exception e) {}
try { dest=Encrypt(source,0,source.length); } catch(Exception e) {}

I think this part is the problem:
write dest to utf8 encoded file
read utf8 encoded file in contents

Compare source and dest arrays - are they equal?

Now create a new String from dest _without_ first storing the results
in a file. Is the new String equal() to the original String plain?

According to the following code, contents is a String, so you are
performing an additional conversion from byte[] to String to byte[]
when you write and read the file:
try { res=Decrypt(contents.getBytes(),0,contents.getBytes().length); }
catch(Exception e) {}

How do you write dest to the file? How do you read it back? Hint: use
Readers and Writers for text, InputStreams and OutputStreams for
byte[].
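A small in-memory sketch of why that hint matters: pushing arbitrary bytes through a String and a Writer loses information, while an OutputStream passes them through untouched. It uses UTF-8 explicitly rather than the platform default so the result is repeatable, and the sample bytes are made up to resemble ciphertext (an assumption, not real engine output).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterVsStream {
    // Treat the bytes as text and push them through a UTF-8 Writer (the problematic path).
    static byte[] viaWriter(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            // Malformed UTF-8 sequences are replaced by U+FFFD while building the String.
            w.write(new String(data, StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    // Write the bytes as bytes (the safe path).
    static byte[] viaStream(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(data);
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] cipherLike = {(byte) 0x8f, 0x00, (byte) 0xff, 0x41}; // not valid UTF-8
        System.out.println(Arrays.equals(cipherLike, viaWriter(cipherLike))); // false
        System.out.println(Arrays.equals(cipherLike, viaStream(cipherLike))); // true
    }
}
```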

/gordon
 

Matthijs Blaas

The array which I read from the file is not the same as the dest array,
but their string representations are. This must be because of Java's
internal encoding (which the dest array is encoded in).

This is how I write the array to a file:
output = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(aFile), "UTF8"));
output.write(new String(new String(dest).getBytes("UTF-8")));

This is how I read the contents back:
BufferedReader in = new BufferedReader(new InputStreamReader(new
FileInputStream(aFile), "UTF8"));
contents = in.readLine();

So at this point contents equals new String(dest), but contents.getBytes() != dest.
Could it be the Reader and Writer I'm using?


 

Gordon Beaton

this is how I write the array to a file:
output = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(aFile), "UTF8"));

The following line takes the dest byte array, converts it using the
default system encoding (which is what?) to a String, converts the
String to a second byte array (using UTF-8), then creates another
String from that (using the system default encoding), and finally (are
you still with me here) writes that second String to the file. If all
has gone well, the UTF8 (hmm, different spelling) representation of
the second String gets stored in the file:
output.write(new String(new String(dest).getBytes("UTF-8")));

Really, what are you hoping all of those conversions will achieve?

This is how I read the contents back:
BufferedReader in = new BufferedReader(new InputStreamReader(new
FileInputStream(aFile), "UTF8"));
contents = in.readLine();

So at this point contents=new String(dest) but contents.getBytes()
!= dest It might be the Reader & Writer used?

The Reader and Writer seem to be only part of the problem. All of the
unnecessary conversions certainly don't help.
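One way to see why that chain can't be undone: UTF-8 decoding replaces any malformed byte sequence with U+FFFD, and that replacement is irreversible. This sketch (assuming Java 7+; names are illustrative) counts how many single-byte values survive a bytes to String to bytes trip through UTF-8. Only the 128 ASCII values do.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyChain {
    // Count how many single-byte values survive bytes -> String -> bytes via UTF-8.
    static int survivors() {
        int count = 0;
        for (int i = 0; i < 256; i++) {
            byte[] original = {(byte) i};
            // A lone byte 0x80..0xFF is malformed UTF-8 and decodes to U+FFFD.
            String s = new String(original, StandardCharsets.UTF_8);
            byte[] back = s.getBytes(StandardCharsets.UTF_8);
            if (Arrays.equals(original, back)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(survivors() + " of 256 byte values survive"); // 128 of 256 byte values survive
    }
}
```

Ciphertext uses all 256 byte values roughly equally, so practically every encrypted block contains bytes that this kind of conversion destroys.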

Step 1: don't use any kind of Reader or Writer for handling non-text.

To write the byte[] to the file, use a FileOutputStream directly
(possibly wrapping it in a BufferedOutputStream):

OutputStream output = new FileOutputStream(afile);
output.write(dest);
output.close();

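To read the byte[] from the file, use a FileInputStream directly
(possibly wrapping it in a BufferedInputStream). A single read() call
may return fewer bytes than requested, so readFully() is safer
(this assumes afile is a java.io.File, so its length is known):

DataInputStream input = new DataInputStream(new FileInputStream(afile));
byte[] contents = new byte[(int) afile.length()];
input.readFully(contents);
input.close();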

(error checking etc left out but still necessary)
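On Java 7 and later, java.nio.file.Files offers a shorter route to the same byte-stream approach: Files.write() and Files.readAllBytes() move raw bytes with no charset involved. A sketch using a temporary file; the file name prefix and the sample bytes are arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BinaryFileRoundTrip {
    // Write raw bytes to a temp file, read them back, and check they are unchanged.
    static boolean roundTrip(byte[] dest) throws IOException {
        Path file = Files.createTempFile("cipher", ".bin");
        try {
            Files.write(file, dest);                    // raw bytes, no Writer, no charset
            byte[] contents = Files.readAllBytes(file); // reads the whole file
            return Arrays.equals(dest, contents);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] dest = {(byte) 0x8f, 0x00, (byte) 0xff, 0x41};
        System.out.println(roundTrip(dest)); // true
    }
}
```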

/gordon
 

Dave Monroe

Matthijs Blaas said:
if(dest==temp) System.out.println("Success!");
else System.out.println("failed");


Try

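if (java.util.Arrays.equals(dest, temp)) System.out.println("Success!");

The '==' operator checks whether the two references point to the same object.
Note that dest.equals(temp) would not help either: arrays inherit Object.equals(),
which is also an identity check. java.util.Arrays.equals() compares the contents.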
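A quick sketch of the three comparisons side by side: == and equals() both test identity for arrays, since arrays don't override Object.equals(), while java.util.Arrays.equals() compares element by element. The describe() helper is purely illustrative.

```java
import java.util.Arrays;

public class ArrayEqualityDemo {
    // Report the result of each way of comparing two byte arrays.
    static String describe(byte[] a, byte[] b) {
        return "==:" + (a == b)
                + " equals:" + a.equals(b)              // inherited Object.equals: identity
                + " Arrays.equals:" + Arrays.equals(a, b); // element-by-element
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(describe(a, b)); // ==:false equals:false Arrays.equals:true
    }
}
```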
 

John C. Bollinger


The encrypted byte array no longer represents characters in any
character encoding, so it is inappropriate to attempt to convert it to a
String. A byte[] is exactly what it is. If you MUST transfer it in
String form, then choose an *8-bit* encoding (ISO-8859-1 is probably a
good bet) to get a one-to-one correspondence between bytes and
characters. I'm not sure what your Java / PHP interface looks like, but
do make sure that PHP interprets the characters according to the correct
encoding in order to get the bytes back. Do note that THIS IS A HACK.
You should really be passing around the raw bytes without pretending
that they represent characters.
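A minimal sketch of the ISO-8859-1 hack described above: every byte value 0x00 to 0xFF maps to exactly one char and back, so a full round trip is lossless inside Java. Whether it survives the trip to PHP depends on nothing along the way re-encoding the text, which this sketch does not cover. Names are illustrative; assumes Java 7+.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Encode every possible byte value as ISO-8859-1 text and back again.
    static boolean roundTripsAllBytes() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // ISO-8859-1 maps each byte to one char with the same numeric value, and back.
        String asText = new String(all, StandardCharsets.ISO_8859_1);
        byte[] back = asText.getBytes(StandardCharsets.ISO_8859_1);
        return asText.length() == 256 && Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAllBytes()); // true
    }
}
```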


John Bollinger
 

Michael Borgwardt

John said:
The encrypted byte array no longer represents characters in any
character encoding, so it is inappropriate to attempt to convert it to a
String. A byte[] is exactly what it is. If you MUST transfer it in
String form, then choose an *8-bit* encoding (ISO-8859-1 is probably a
good bet)

Still too risky. Instead, use a base64 or hex encoding.
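A sketch of both options, using java.util.Base64 (Java 8+) and a hand-rolled hex codec; the helper names are illustrative. Either text form is plain ASCII, so it survives any charset conversion along the way. On the PHP side, base64_decode() (or hex2bin() for hex) should give back the raw bytes, though that end is untested here. If the Base64 string is sent as form data, it still needs ordinary URL encoding, since a literal '+' would otherwise be decoded as a space.

```java
import java.util.Arrays;
import java.util.Base64;

public class TextSafeBinary {
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String s) {
        return Base64.getDecoder().decode(s);
    }

    // Two lowercase hex digits per byte; mask with 0xff so negative bytes print correctly.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static byte[] fromHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] dest = {(byte) 0x8f, 0x00, (byte) 0xff, 0x41};
        String b64 = toBase64(dest);
        String hex = toHex(dest);
        System.out.println(b64); // jwD/QQ==
        System.out.println(hex); // 8f00ff41
        System.out.println(Arrays.equals(dest, fromBase64(b64)) && Arrays.equals(dest, fromHex(hex))); // true
    }
}
```

Base64 costs about 33% in size versus 100% for hex, but hex is trivial to inspect by eye when debugging.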
 
