Encoding problem with toString()

Peter Plumber

Hi,

I am a complete beginner at programming in Java.
I am trying to use java.beans.XMLEncoder to create a String
containing the XML serialization of my object.
I am using the following code (probably clumsy):

/**
 * Serialize object to XML as String.
 */
public String serialize() {
    ByteArrayOutputStream streamOut = new ByteArrayOutputStream();
    XMLEncoder xmlCreater = new XMLEncoder(streamOut);
    xmlCreater.writeObject(this);
    xmlCreater.close();
    return streamOut.toString();
}

My problem is that some characters are changed in the result,
e.g. "PhÃ?nomene" instead of "Phänomene".

How can I solve this problem?
Is there a less lengthy way to get the bean XML?

thanks

Peter
 
Dale King

Despite appearing to be a textual format, XML is really a binary
format. In general it is not valid to convert it to a string: it carries
internal information about character encodings, which can change from
one entity to another.

From the result you got, it looks like the XML output is UTF-8 encoded,
which I see is what XMLEncoder is specified to produce.

Why do you think you need to convert it to a string? If it is just for
display or debugging, you can use streamOut.toString("UTF-8"), but once
again, in general you really should not convert the XML output to a
string. If you are saving or transmitting the XML output, the raw bytes
are what should be used.
 
Peter Plumber

Thanks a lot for that info.
What would the function best look like?

/**
 * serialize object to XML as String.
 */
public ByteArrayOutputStream serialize() {
    ByteArrayOutputStream streamOut = new ByteArrayOutputStream();
    XMLEncoder xmlCreater = new XMLEncoder(streamOut);
    xmlCreater.writeObject(this);
    xmlCreater.close();
    return streamOut;
}

/**
 * serialize object to XML as String.
 */
public void serialize(OutputStream streamOut) {
    XMLEncoder xmlCreater = new XMLEncoder(streamOut);
    xmlCreater.writeObject(this);
    xmlCreater.close();
}

Or something completely different?

thx

Peter
 
Thomas Fritsch

Peter said:
public String serialize() {
    ByteArrayOutputStream streamOut = new ByteArrayOutputStream();
    XMLEncoder xmlCreater = new XMLEncoder(streamOut);
    xmlCreater.writeObject(this);
    xmlCreater.close();

So far, so good. Your streamOut contains a byte[] array with the
XML representation encoded in UTF-8. This is consistent with the first
generated XML line, which says

    <?xml version="1.0" encoding="UTF-8"?>

Note that in UTF-8 a German 'ä' character ('\u00e4') is encoded as 2 bytes
(0xc3, 0xa4), not as 1 byte (0xe4) as you might expect.

Peter said:
    return streamOut.toString();

According to the API doc of ByteArrayOutputStream#toString(), here you are
decoding the byte[] array to a String using the system's *default*
encoding, whatever that may be (and in your case it is definitely *not*
UTF-8).
What you really want is to decode the byte[] array to a String using the
UTF-8 encoding (that is, exactly reverse the UTF-8 encoding done by the
XMLEncoder). That means you have to use

    return streamOut.toString("UTF-8");

This will convert the 2 bytes (0xc3, 0xa4) back to the 1 character 'ä'.
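
To make the byte-level explanation above concrete, here is a minimal sketch. The class Utf8DecodeDemo and its helper methods are hypothetical names made up for illustration, and ISO-8859-1 only stands in for "some default encoding that is not UTF-8"; the original poster's actual default encoding is not known.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code discussed in this thread.
final class Utf8DecodeDemo {

    /** Fills a buffer with the UTF-8 bytes of the given text. */
    static ByteArrayOutputStream utf8Buffer(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        buffer.write(bytes, 0, bytes.length);
        return buffer;
    }

    /** Decodes the buffer with an explicitly named charset. */
    static String decode(ByteArrayOutputStream buffer, String charsetName) {
        try {
            return buffer.toString(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String original = "Ph\u00e4nomene"; // "Phänomene", written with an escape
        ByteArrayOutputStream buffer = utf8Buffer(original);

        // 9 characters, but the umlaut takes 2 bytes in UTF-8.
        System.out.println("bytes: " + buffer.size());

        // Decoding with the encoding that produced the bytes restores the text.
        System.out.println("UTF-8 round trip ok: "
                + decode(buffer, "UTF-8").equals(original));

        // Decoding with a different single-byte encoding turns the 2 bytes
        // (0xc3, 0xa4) into 2 separate characters.
        System.out.println("ISO-8859-1 length: "
                + decode(buffer, "ISO-8859-1").length());
    }
}
```

The point of passing the charset name explicitly is that the result no longer depends on the platform the code happens to run on.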
 
Dale King

Besides removing the word "String" from the comments, I would change the
first method to:

public byte[] serialize()
{
    ByteArrayOutputStream streamOut = new ByteArrayOutputStream();
    // serialize(OutputStream) closes the encoder, which flushes all output
    // into streamOut. Closing a ByteArrayOutputStream has no effect, and
    // close() is declared to throw IOException, so it is not called here.
    serialize(streamOut);
    return streamOut.toByteArray();
}

There are ways to actually convert the bytes of XML into a string, but
they are non-trivial. You would have to parse the XML data and re-encode
it using the correct encoding, changing the encoding declarations as you
go. But there really shouldn't be a need to do that.
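
As a rough sketch of the bytes-only approach described above, the two serialize variants can be paired with a matching decode step. The class XmlBytesDemo, the static helper names, and the use of a plain String as the object being written are all assumptions made for illustration; in the thread the methods live on the bean itself and write `this`.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Hypothetical helper class, not part of the code discussed in this thread.
final class XmlBytesDemo {

    /** Writes the XML form of the object to any stream (file, socket, buffer). */
    static void serialize(Object bean, OutputStream out) {
        // Closing the encoder flushes the output and closes the stream.
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
    }

    /** Returns the XML form of the object as raw bytes, never as a String. */
    static byte[] serialize(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serialize(bean, buffer);
        return buffer.toByteArray();
    }

    /** Reads the first object back from XML bytes produced by serialize. */
    static Object deserialize(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        String original = "Ph\u00e4nomene"; // "Phänomene", written with an escape
        byte[] xml = serialize(original);
        Object copy = deserialize(xml);
        System.out.println("round trip ok: " + original.equals(copy));
    }
}
```

Because the bytes go straight from the encoder to the decoder, no platform default encoding is ever involved, which is the reason for keeping the data as byte[] in the first place.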
 
Joined: Aug 11, 2006 | Messages: 1 | Reaction score: 0

I have a similar kind of problem: I am writing an object that contains Chinese characters to a file.
I have a method that accepts (OutputStream out, Object myobject):

    BufferedOutputStream buffOut = new BufferedOutputStream(out);
    XMLEncoder encoder = new XMLEncoder(buffOut);
    encoder.writeObject(myobject);

If there is an even number of Chinese characters, the output comes out properly; otherwise some hexadecimal value is written (for example, when I wrote 3 characters, a hexadecimal value appeared after the first 2 Chinese characters).
 
