Transformer encoding not working for ISO-8859-1 only for UTF-8

janib · Aug 7, 2006

I have a problem when transforming text containing the Swedish letters
"å", "ä" and "ö". If I do

Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.METHOD, "xml");
t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1"); // <------- *
t.transform(new DOMSource(document), new StreamResult(output));
return output.toString();

I get an XML file in which the Swedish letters are broken (shown as
"?"):

<?xml version="1.0" encoding="ISO-8859-1"?>
....
<channelinfo confirmed="true" validate="false" name="Internet">
<publishdate>1154940455898</publishdate>
<unpublishdate>1154940455898</unpublishdate>
<attribute name="rooms"/>
<attribute name="year"/>
<attribute name="title">K?pes</attribute> <------------- *
<attribute name="price">20000</attribute>
<attribute name="area"/>
<attribute name="body">Vill k?pa en truck</attribute>
<-------------- *
</channelinfo>

but if I change the encoding to UTF-8:

t.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // <------- *

the letters are alright:

<?xml version="1.0" encoding="UTF-8"?>
....
<channelinfo confirmed="true" validate="false" name="Internet">
<publishdate>1154940455898</publishdate>
<unpublishdate>1154940455898</unpublishdate>
<attribute name="rooms"/>
<attribute name="year"/>
<attribute name="title">Köpes</attribute> <------------- *
<attribute name="price">20000</attribute>
<attribute name="area"/>
<attribute name="body">Vill köpa en truck</attribute>
<-------------- *
</channelinfo>

But the XML has to be encoded in ISO-8859-1, so it would be nice if I
could make it work with that encoding.

Does anyone know where I can alter this behavior, or why it behaves as
shown above?

Jono · Aug 7, 2006

Hi Janib,
Your code works fine for me (as expected, because "å", "ä" and "ö"
are part of the ISO-8859-1 character set), so I think the problem might
lie with one of the objects you're creating outside the scope of the
code snippet. Your "output" object might have a side effect if it's
doing some character encoding of its own. I tried with a StringWriter
and also with a FileOutputStream and it worked correctly (using Java
1.5).
Cheers,
Jono
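
A check along the lines Jono describes (a StringWriter as the target) could be sketched as follows. The class name WriterResultDemo, the one-element document and the sample text are assumptions made for illustration; none of them appear in the thread.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not taken from the thread.
class WriterResultDemo {

    // Builds a one-element document and serializes it to a String
    // through a StringWriter, declaring ISO-8859-1 as the output encoding.
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element attribute = doc.createElement("attribute");
            attribute.setAttribute("name", "title");
            attribute.setTextContent(text);
            doc.appendChild(attribute);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

            // A Writer receives characters, so no bytes have to be
            // decoded again when the result is turned into a String.
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00f6 is "ö"; the escape avoids depending on the source file encoding.
        System.out.println(serialize("K\u00f6pes"));
    }
}
```

With a Writer as the target, the transformer hands over characters instead of bytes, so the step of converting bytes back into a String never happens.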

janib · Aug 7, 2006

The output object is only a ByteArrayOutputStream...

ByteArrayOutputStream output = new ByteArrayOutputStream( );


Roland de Ruiter · Aug 7, 2006

The output object is only a ByteArrayOutputStream...

ByteArrayOutputStream output = new ByteArrayOutputStream( );

See my reply in comp.lang.java.help
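
One plausible explanation, given that output is a ByteArrayOutputStream: ByteArrayOutputStream.toString() with no argument decodes the bytes using the platform default charset, not the encoding set on the Transformer. If that default is UTF-8, a lone ISO-8859-1 byte such as 0xF6 ("ö") is malformed and gets replaced during decoding, which would match the symptom and would also explain why UTF-8 output looks fine. Whether this is what happened in janib's environment is an assumption. A minimal sketch that decodes with the same encoding the Transformer was told to write (the class name ByteStreamResultDemo and the one-element document are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not taken from the thread.
class ByteStreamResultDemo {

    // Serializes a one-element document into a byte stream using the
    // given encoding, then decodes the bytes with that same encoding.
    static String serialize(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element attribute = doc.createElement("attribute");
            attribute.setAttribute("name", "body");
            attribute.setTextContent(text);
            doc.appendChild(attribute);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, encoding);

            ByteArrayOutputStream output = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(output));

            // output.toString() would use the platform default charset here;
            // passing the charset name keeps writing and decoding consistent.
            return output.toString(encoding);
        } catch (ParserConfigurationException
                | TransformerException
                | UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00f6 is "ö"; the escape avoids depending on the source file encoding.
        System.out.println(serialize("Vill k\u00f6pa en truck", "ISO-8859-1"));
    }
}
```

If the XML is going to a file or a socket anyway, returning output.toByteArray() and skipping the String conversion avoids the decoding question entirely.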
