Problems with UTF-8 characters and XSLT

jan00000

Hi,

I'm using Xalan to do some transforming of XML in Java. My problem is:

I have Unicode characters in my XML, namely German umlauts (ä, ö, ü; since
they already cause trouble, I have not tried any other Unicode characters). When
I do an identity transform and output the XML to a file, the word
'Glättegefahr', for example, appears in my file (viewed with the
XMLSpy Eclipse plug-in) as 'Glã³´egefahr' (except that the ? is a box
instead of a ?).

When I output it to System.out, I get: Glättegefahr. (This is also
what I get using XMLSpy directly, except that XMLSpy does not seem to
understand the <xsl:copy-of> tag).

Here is my Java Code for instantiating the transformer.

Transformer trans = TransformerFactory.newInstance().newTransformer();
trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
trans.transform(new DOMSource(source.getDocumentElement()),
        new StreamResult(new FileWriter(origin)));
// or: new StreamResult(System.out)

The XML file is shown in Eclipse as encoded in UTF-8, and each file
involved (XSLT, XML) has the encoding="UTF-8" attribute specified.

Any idea what else I can try would be most welcome. Thanks in
advance.
 
Martin Honnen

System.out is the console, which is usually set to display an 8-bit
codepage, so it seems the output is properly UTF-8 encoded: an ä in
UTF-8 takes two bytes (0xC3 0xA4), and on such a console those two bytes
are displayed as two separate characters, Ã¤.
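
To illustrate the point about the two bytes, here is a minimal sketch. The class and method names (UmlautBytes, viewAsLatin1) are made up for this example and are not part of the original code; ISO-8859-1 stands in for whatever 8-bit codepage the console actually uses, which is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is roughly what an 8-bit console does: one character per byte.
    static String viewAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "Gl\u00e4ttegefahr"; // \u00e4 is the umlaut a

        // Print the UTF-8 bytes in hex; the umlaut shows up as c3 a4.
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());

        // The misread string is one character longer than the original,
        // because the single umlaut became two characters.
        System.out.println(word.length() + " -> " + viewAsLatin1(word).length());
    }
}
```

Seeing two characters in place of one umlaut is therefore usually a sign that correctly encoded UTF-8 is being read with an 8-bit charset, not that the bytes themselves are wrong.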

Does the resulting file have an XML declaration
<?xml version="1.0" encoding="utf-8"?>
?
 
jan00000

Yes, it does. But thanks to your tip I reviewed the API for the
FileWriter class, which I used as the Result for the transform method,
and it showed that it uses the default charset rather than the encoding
set on the transformer. This was the problem.

I solved it by constructing an OutputStreamWriter using the UTF-8
charset instead, and now the file is transformed, or better, copied,
correctly.
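
For anyone finding this later, a minimal sketch of what the fix can look like. The names Utf8Copy and writeUtf8, the "warning" element, and the temp file are made up for this example; the transformer setup is the same as in my first post.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Utf8Copy {

    // Identity-transforms the node into the target file, encoding the
    // characters explicitly as UTF-8 instead of relying on FileWriter's
    // default charset.
    static void writeUtf8(Node source, File target)
            throws IOException, TransformerException {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            trans.transform(new DOMSource(source), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document containing an umlaut (hypothetical test data).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("warning");
        root.setTextContent("Gl\u00e4ttegefahr");
        doc.appendChild(root);

        File target = File.createTempFile("utf8copy", ".xml");
        writeUtf8(doc.getDocumentElement(), target);
        System.out.println("Wrote " + target.getAbsolutePath());
    }
}
```

An alternative that should work as well is to pass the FileOutputStream itself to StreamResult, so that the serializer does the encoding according to OutputKeys.ENCODING and no Writer charset can get out of step with the XML declaration.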

Thank you so much for your help.
 
