org.apache.xml.serialize.XMLSerializer problem with UTF-8

Discussion in 'XML' started by Jim Cobban, Dec 2, 2003.

  1. Jim Cobban

    I must be missing something.

    I am using org.apache.xml.serialize.XMLSerializer to save a DOM, but
    non-ASCII characters are not being converted to UTF-8.

    I create Text nodes in the DOM by, for example:

    Document doc;
    JTextArea textPrompt;
    Text newTextNode;
    Element descElt;
    ....
    newTextNode = doc.createTextNode(textPrompt.getText());
    descElt.appendChild(newTextNode);

    The code to serialize the DOM is:

    private void saveXml(Document document)
    {
        // rename the existing layout file
        new File(fileName).renameTo(new File(fileName + "~"));
        // write the document out
        OutputFormat format = new OutputFormat(document);
        format.setIndenting(true);
        format.setLineWidth(0);
        format.setPreserveSpace(true);
        try {
            XMLSerializer serializer;
            serializer = new XMLSerializer (
                new FileWriter(fileName),
                format);
            serializer.asDOMSerializer();
            serializer.serialize(document);
        }
        catch (IOException ioe)
        {
            ....
        }
    }

    If I enter a character such as é (e with acute accent) into the JTextArea
    and look at the XML file using a non-UTF-8-aware editor, I see that the é
    has been written as a single byte, not as the two-byte UTF-8 sequence. If I
    subsequently try to read the XML file using Xerces, it blows up because of
    the invalid byte sequence.

    How do I get a valid serialization of this DOM into XML using UTF-8?
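    To make the symptom concrete, here is a small Java sketch. It assumes the
    platform default encoding was ISO-8859-1 (a common Western default, but an
    assumption here; FileWriter uses whatever the platform default is). Under
    that encoding é is the single byte 0xE9, while UTF-8 needs the two bytes
    0xC3 0xA9, and a strict UTF-8 decoder rejects the lone 0xE9, which is
    roughly what the parser runs into when it reads the file as UTF-8. The
    class and method names (EncodingDemo, latin1Bytes, utf8Bytes, isValidUtf8)
    are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // What a default-encoding FileWriter would emit if the platform default
    // were ISO-8859-1 (an assumption; the default varies by platform).
    static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // What a UTF-8 Writer emits for the same text.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strictly decode the bytes as UTF-8, as a parser must for a UTF-8
    // document; malformed input is reported, not silently replaced.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        System.out.println(latin1Bytes(text).length);       // 4: é is the single byte 0xE9
        System.out.println(utf8Bytes(text).length);         // 5: é is the pair 0xC3 0xA9
        System.out.println(isValidUtf8(latin1Bytes(text))); // false
        System.out.println(isValidUtf8(utf8Bytes(text)));   // true
    }
}
```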


    --
    Jim Cobban
    34 Palomino Dr.
    Kanata, ON, CANADA
    K2M 1M1
    +1-613-592-9438
    Jim Cobban, Dec 2, 2003
    #1

  2. Soren Kuula

    Jim Cobban wrote:

    > I must be missing something.


    > XMLSerializer serializer;
    > serializer = new XMLSerializer (
    > new FileWriter(fileName),
    > format);
    > serializer.asDOMSerializer();
    > If I enter a character such as e' (e with acute accent) into the JTextArea
    > and I look at the XML file using a non-UTF-8-aware editor I see that the e'
    > has been inserted as a single byte, not as the 2 character UTF-8 escaped
    > value. If I subsequently try to read the XML file using XERCES it blows up
    > because of the invalid escape sequence.
    >
    > How do I get a valid serialization of this DOM into XML using UTF-8?


    As far as I know, it is the Writer that is responsible for the encoding.

    From the FileWriter API doc:

    public class FileWriter
    extends OutputStreamWriter

    Convenience class for writing character files. The constructors of this
    class assume that the default character encoding and the default
    byte-buffer size are acceptable. To specify these values yourself,
    construct an OutputStreamWriter on a FileOutputStream.


    - try that.
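    A minimal sketch of that suggestion, assuming Java 7 or later for
    StandardCharsets and try-with-resources (in 2003 you would pass the
    charset name "UTF-8" to OutputStreamWriter instead). The names
    Utf8WriterDemo, utf8Writer, openUtf8File and encodeWithUtf8Writer are
    hypothetical; the in-memory helper exists only so the bytes can be
    inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterDemo {
    // Wrap a byte stream in a Writer that always encodes as UTF-8,
    // instead of the platform default that FileWriter assumes.
    static Writer utf8Writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }

    // Hypothetical drop-in for `new FileWriter(fileName)` in saveXml().
    static Writer openUtf8File(String fileName) throws IOException {
        return utf8Writer(new FileOutputStream(fileName));
    }

    // Encode text through the UTF-8 Writer into memory, for inspection.
    static byte[] encodeWithUtf8Writer(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = utf8Writer(bytes)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] b = encodeWithUtf8Writer("caf\u00e9");
        // é should come out as the two-byte UTF-8 sequence C3 A9
        System.out.printf("%d bytes, last two: %02X %02X%n",
                b.length, b[b.length - 2] & 0xFF, b[b.length - 1] & 0xFF);
    }
}
```

    In saveXml() this would mean passing openUtf8File(fileName) where
    new FileWriter(fileName) is now. I believe the OutputFormat's encoding
    should also be UTF-8 (format.setEncoding("UTF-8")) so that the encoding
    named in the XML declaration matches the bytes actually written. On
    Java 11 and later, new FileWriter(fileName, StandardCharsets.UTF_8) does
    the same job directly.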

    Soren

    --
    Remove the 4 letter word meaning "junk mail" in my mail address.
    Soren Kuula, Dec 15, 2003
    #2

  3. Jim Cobban

    "Soren Kuula" <> wrote in message
    news:5K7Db.59147$...
    >
    > As far as I know it is the Writer responsible for the encoding.
    >
    > From FileWriter API doc:
    >
    > public class FileWriter
    > extends OutputStreamWriter
    >
    > Convenience class for writing character files. The constructors of this
    > class assume that the default character encoding and the default
    > byte-buffer size are acceptable. To specify these values yourself,
    > construct an OutputStreamWriter on a FileOutputStream.


    Thank you.

    The problem was that I copied the code from one of the examples that came
    with Xerces, and it was that example which constructed the FileWriter with
    the default encoding. Since there is a version of the XMLSerializer
    constructor that takes an OutputStream and internally constructs a Writer
    with the correct "utf-8" encoding, that is the form of the constructor I
    needed to use. I should have read the documentation in more detail rather
    than trusting that the example had been written correctly.
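    With Xerces, the fix described here amounts to
    new XMLSerializer(new FileOutputStream(fileName), format), so the
    serializer builds its own Writer for the encoding in the OutputFormat.
    The sketch below is not that Xerces code: it swaps in the JAXP Transformer
    that ships with the JDK (I believe later Xerces-J releases deprecated
    org.apache.xml.serialize in favour of JAXP or the DOM Level 3
    LSSerializer), but it follows the same principle of handing the serializer
    an OutputStream rather than a default-encoding Writer. It then parses the
    bytes back, which is the step that failed in the original question.
    DomToUtf8, sampleDoc, toUtf8Bytes, readBack and the <desc> element are
    hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToUtf8 {
    // Serialize a DOM straight to a byte stream. The Transformer encodes the
    // characters itself, so no default-encoding Writer is involved.
    static void writeUtf8(Document doc, OutputStream out) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Hypothetical sample mirroring the thread: one element holding user text.
    static Document sampleDoc(String text) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element desc = doc.createElement("desc");
        desc.appendChild(doc.createTextNode(text));
        doc.appendChild(desc);
        return doc;
    }

    // Build the sample document and serialize it into memory.
    static byte[] toUtf8Bytes(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeUtf8(sampleDoc(text), out);
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse the bytes back; a mis-encoded file would fail here.
    static String readBack(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = toUtf8Bytes("caf\u00e9");
        System.out.println(readBack(xml));
    }
}
```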
     
    Jim Cobban, Dec 15, 2003
    #3
