Transformer encoding not working for ISO-8859-1 only for UTF-8

Discussion in 'Java' started by janib, Aug 7, 2006.

  1. janib

    janib Guest

    I have a problem when transforming text containing the Swedish letters
    "å", "ä" and "ö". If I do

    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.METHOD, "xml");
    t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    t.setOutputProperty(OutputKeys.INDENT, "yes");
    t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1"); // <-- *
    t.transform(new DOMSource(document), new StreamResult(output));
    return output.toString();

    I get an XML file in which the Swedish letters come out as broken
    characters (shown as "?"):

    <?xml version="1.0" encoding="ISO-8859-1"?>
    ....
    <channelinfo confirmed="true" validate="false" name="Internet">
    <publishdate>1154940455898</publishdate>
    <unpublishdate>1154940455898</unpublishdate>
    <attribute name="rooms"/>
    <attribute name="year"/>
    <attribute name="title">K?pes</attribute> <------------- *
    <attribute name="price">20000</attribute>
    <attribute name="area"/>
    <attribute name="body">Vill k?pa en truck</attribute> <------------- *
    </channelinfo>

    but if I change the encoding to UTF-8:

    t.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // <-- *

    the letters are alright:

    <?xml version="1.0" encoding="UTF-8"?>
    ....
    <channelinfo confirmed="true" validate="false" name="Internet">
    <publishdate>1154940455898</publishdate>
    <unpublishdate>1154940455898</unpublishdate>
    <attribute name="rooms"/>
    <attribute name="year"/>
    <attribute name="title">Köpes</attribute> <------------- *
    <attribute name="price">20000</attribute>
    <attribute name="area"/>
    <attribute name="body">Vill köpa en truck</attribute> <------------- *
    </channelinfo>

    But the XML has to be encoded in ISO-8859-1, so it would be nice if I
    could make it work with that encoding.

    Does anyone know how I can change this behavior, or why it behaves as
    shown above?
     
    janib, Aug 7, 2006
    #1

  2. Jono

    Jono Guest

    Hi Janib,
    Your code works fine for me (as expected, because "å", "ä" and "ö"
    are part of the ISO-8859-1 character set), so I think the problem might
    lie with one of the objects you're creating outside the code snippet.
    Your "output" object might be the culprit if it's doing some character
    encoding of its own. I tried with a StringWriter and also with a
    FileOutputStream, and both worked correctly (using Java 1.5).
    Cheers,
    Jono
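For anyone following along, here is a minimal sketch of the StringWriter variant mentioned above. The class and method names (WriterOutputSketch, sampleDocument, toXml) and the one-element document are made up for illustration and are not taken from janib's code. The point it tries to show: when the result is a character stream, nothing has to decode bytes back into a String afterwards, so no second charset gets involved.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class WriterOutputSketch {

    // Builds a tiny stand-in document; the real "document" from the question is not shown there.
    static Document sampleDocument(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element attribute = doc.createElement("attribute");
        attribute.setAttribute("name", "title");
        attribute.setTextContent(text);
        doc.appendChild(attribute);
        return doc;
    }

    // Serializes to a Writer: the transformer emits characters, not bytes.
    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00f6 is "ö"; the escape keeps this source file independent of its own encoding.
        System.out.println(toXml(sampleDocument("K\u00f6pes")));
    }
}
```

If the XML is later written to a file or socket, the Writer doing that still has to be created with ISO-8859-1 so the bytes match the declaration.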


     
    Jono, Aug 7, 2006
    #2

  3. janib

    janib Guest

    The output object is just a ByteArrayOutputStream:

    ByteArrayOutputStream output = new ByteArrayOutputStream( );
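One possible explanation, assuming the platform default charset is something other than ISO-8859-1: the no-argument ByteArrayOutputStream.toString() decodes the bytes with the platform default charset, not with the encoding set on the Transformer. The sketch below is not the original code; the names ByteOutputSketch, toLatin1Bytes and decode are hypothetical, and the one-element document is a stand-in. It serializes to ISO-8859-1 bytes and then decodes them once with the matching charset and once with UTF-8 to show the difference.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ByteOutputSketch {

    // Serializes a one-element stand-in document to ISO-8859-1 encoded bytes.
    static byte[] toLatin1Bytes(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element body = doc.createElement("attribute");
        body.setAttribute("name", "body");
        body.setTextContent(text);
        doc.appendChild(body);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(output));
        return output.toByteArray();
    }

    // Decodes with an explicit charset instead of relying on the platform default.
    static String decode(byte[] bytes, String charsetName) throws Exception {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) throws Exception {
        // \u00f6 is "ö"; the escape keeps this source file independent of its own encoding.
        byte[] bytes = toLatin1Bytes("Vill k\u00f6pa en truck");
        // Matching charset: the text should survive the round trip.
        System.out.println(decode(bytes, "ISO-8859-1").contains("k\u00f6pa"));
        // Mismatched charset: the single byte for "ö" is not valid UTF-8 here.
        System.out.println(decode(bytes, "UTF-8").contains("k\u00f6pa"));
    }
}
```

In the original snippet, the equivalent change would be returning output.toString("ISO-8859-1") instead of output.toString(), or returning the byte array itself and letting the consumer honor the XML declaration.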

     
    janib, Aug 7, 2006
    #3
  4. On 7-8-2006 13:35, janib wrote:
    > The output object is just a ByteArrayOutputStream:
    >
    > ByteArrayOutputStream output = new ByteArrayOutputStream( );

    See my reply in comp.lang.java.help
    --
    Regards,

    Roland
     
    Roland de Ruiter, Aug 7, 2006
    #4