Serializing XML with JAXP - help needed

Michael · Feb 22, 2004

Hi all,

I'm trying to serialize an xml document with JAXP. The xml may or may not
contain international characters, and so I want any text elements to be
UTF-8 encoded. Consider the following (a brief summary is included below the
code):

---- code begin ----

org.w3c.dom.Document doc =
javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder().
newDocument();

org.w3c.dom.Element el = doc.createElement("element");
el.setAttribute("attr1","attr1value");
el.appendChild(doc.createTextNode("Danish < æøå > characters!"));
doc.appendChild(el);

javax.xml.transform.TransformerFactory transformerFactory =
javax.xml.transform.TransformerFactory.newInstance();
javax.xml.transform.Transformer transformer =
transformerFactory.newTransformer();

transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT,"yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","4");

java.io.StringWriter xmlout = new java.io.StringWriter();
javax.xml.transform.stream.StreamResult result = new
javax.xml.transform.stream.StreamResult(xmlout);
transformer.transform(new javax.xml.transform.dom.DOMSource(doc),result);

System.out.println(xmlout.getBuffer());

---- code end ----

So, I'm creating a document (DOM), setting an attribute and appending a text
node with international characters (and a couple of brackets just for fun).
Then I create a transformer instance, I ask it to indent the output nicely
and finally to actually serialize my DOM into xml.

When I run this code (in a jsp file on a tomcat 4.1.x server with the latest
xerces2-j version installed) I get this output:

<?xml version="1.0" encoding="UTF-8"?>
<element attr1="attr1value">Danish &lt; æøå &gt; characters!</element>

Okay. So I got the < and > converted as I expected. However, the
international characters do not appear to have been encoded to UTF-8 or
anything else for that matter. In fact, the above isn't even a valid xml
document, and several parsers I tried (including Microsoft XML) reject it
because of the illegal character data. Clearly there is a mismatch between
what the encoding in the xml header specifies and what's actually appearing
in the text nodes of the document. It's very curious that JAXP will
transform a DOM into a result that isn't valid.
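The kind of mismatch described above can be reproduced in isolation. The following is a minimal sketch, not a diagnosis of the Tomcat setup: it assumes (this is an assumption, not something stated in the thread) that the characters were written out as ISO-8859-1 bytes while the header declared UTF-8. The class and method names are made up for illustration, and `StandardCharsets` requires a JDK newer than the one in use in 2004.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingMismatch {

    // Returns true if the given bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Danish" followed by the three Danish letters, written with escapes
        // so the source file's own encoding does not matter.
        String text = "Danish \u00E6\u00F8\u00E5";

        // Assumed scenario: same characters, two different byte encodings.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

A parser that trusts an encoding="UTF-8" declaration would reject the first byte sequence for the same reason this strict decoder does.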

Interestingly, when I run the same code interactively inside my WebSphere
Studio Application Developer 5 (using what is known as a scrapbook page), I
get this:

<?xml version="1.0" encoding="UTF-8"?>
<element attr1="attr1value">Danish &lt; &#230;&#248;&#229; &gt;
characters!</element>

Well. I'm not sure that &#230; is a correct UTF-8 encoding of "æ" (in fact I'm
sure it isn't), but at least the document is now valid and even Microsoft
XML will parse it without complaints.
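For reference, a numeric character reference such as &#230; is not a byte encoding at all: it names the Unicode code point 230 (U+00E6) using plain ASCII, which is why it survives any output encoding. The UTF-8 encoding of the same character is the two bytes C3 A6. A small sketch that prints both forms (the class and helper names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharRefVsUtf8 {

    // Hex dump of the UTF-8 bytes of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decimal numeric character reference for a single char.
    static String charRef(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        char ae = '\u00E6'; // the Danish letter ae
        System.out.println(utf8Hex(String.valueOf(ae))); // C3A6
        System.out.println(charRef(ae));                 // &#230;
    }
}
```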

I am hoping that someone out there can shed some light on this problem and
tell me what I am doing wrong. Exactly how do I instruct JAXP to encode the
text nodes in my DOM so that it doesn't break my XML parser?

Regards,
Michael Berg
www.hyperpal.com

Michael Berg · Feb 22, 2004

Hi all,

The problem is related to the use of a StringWriter to collect the XML
output. Apparently a StringWriter only holds characters, so the encoding
named in the XML header is never actually applied; the bytes are produced
later by whatever prints the string. Use an OutputStreamWriter instead, like
this, for example:

java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
javax.xml.transform.stream.StreamResult result = new
javax.xml.transform.stream.StreamResult(
new java.io.OutputStreamWriter(
baos,
"UTF-8"
)
);
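
A fuller sketch of the same idea, under a few assumptions that are not from the original posts: it targets a current JDK, sets OutputKeys.ENCODING explicitly, and hands the OutputStream straight to the StreamResult so that the serializer does the byte encoding itself. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class Utf8Serialize {

    // Builds the sample document from the question and returns it as UTF-8 bytes.
    static byte[] serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element el = doc.createElement("element");
            el.setAttribute("attr1", "attr1value");
            el.appendChild(doc.createTextNode(
                    "Danish < \u00E6\u00F8\u00E5 > characters!"));
            doc.appendChild(el);

            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            // Ask for UTF-8 explicitly rather than relying on the default.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");

            // A byte stream, not a Writer: the serializer encodes the characters.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(baos));
            return baos.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Decode with the same charset that was requested above.
        System.out.println(new String(serialize(), StandardCharsets.UTF_8));
    }
}
```

The important part is that the bytes and the declared encoding come from the same place; if the result is later copied into a response or a file, it should be written as bytes or re-encoded as UTF-8 there as well.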

/Michael
www.hyperpal.com
