XML Parsing Problems with SAX xerces

Discussion in 'Java' started by John Smith, Sep 26, 2005.

  1. John Smith

    John Smith Guest

    I am trying to parse an XML document that starts with the following tag:

    <?xml version='1.0' encoding='windows-1252' ?>

    This is causing an error:

    Caused by: org.xml.sax.SAXParseException: The encoding "windows-1252" is not supported.
    at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1056)
    at org.apache.xerces.readers.DefaultEntityHandler.startReadingFromDocument(DefaultEntityHandler.java:541)
    at org.apache.xerces.framework.XMLParser.parseSomeSetup(XMLParser.java:305)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:947)

    Is there a way I can get it to support windows-1252, or to ignore the
    declaration? I cannot edit the document itself.

    Thanks

    Jon
    John Smith, Sep 26, 2005
    #1

  2. Roedy Green

    Roedy Green Guest

    On Mon, 26 Sep 2005 08:56:22 +0100, "John Smith" wrote or quoted:

    ><?xml version='1.0' encoding='windows-1252' ?>

    I thought XML had UTF-8 as the only supported encoding. That was
    one of the key features that made it a suitable interchange format.

    Now I see every XML utility listing its set of supported encodings!
    (Imagine an exorcist crossing his arms in horror.)

    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Sep 26, 2005
    #2

  3. John C. Bollinger

    Roedy Green wrote:

    > I thought the XML had UTF-8 as the only supported encoding. That was
    > one of its key features that made it a suitable interchange format.


    No, but you may have been thinking of this: "In the absence of
    information provided by an external transport protocol (e.g. HTTP or
    MIME), it is a fatal error for an entity including an encoding
    declaration to be presented to the XML processor in an encoding other
    than that named in the declaration, or for an entity which begins with
    neither a Byte Order Mark nor an encoding declaration to use an encoding
    other than UTF-8." [XML 1.1, section 4.3.3; the same appears in XML
    1.0, also in section 4.3.3]
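
    For the original question, one possible workaround follows from the way SAX treats character streams: the InputSource documentation says that when the application supplies a character stream, the parser reads it directly and disregards any encoding declaration in the document. The sketch below decodes the bytes itself and hands the parser a Reader. The names Cp1252Parse and parseText are made up for illustration, it assumes the JVM ships a windows-1252 charset, and it has not been tried against the old Xerces release shown in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name is an assumption for this sketch.
class Cp1252Parse {

    /** Decodes the stream as windows-1252 and returns the document's character data. */
    static String parseText(InputStream in) throws Exception {
        final StringBuilder text = new StringBuilder();

        // Decode the bytes ourselves, so the parser only ever sees characters.
        Reader reader = new InputStreamReader(in, Charset.forName("windows-1252"));

        // With a character stream set, SAX disregards the encoding declaration.
        InputSource source = new InputSource(reader);

        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 (e with acute accent) is the single byte 0xE9 in windows-1252.
        byte[] doc = "<?xml version='1.0' encoding='windows-1252' ?><a>caf\u00e9</a>"
                .getBytes("windows-1252");
        System.out.println(parseText(new ByteArrayInputStream(doc)));
    }
}
```

    The trade-off is that the application now has to know the encoding in advance, since the declaration is no longer consulted.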

    You might also have been thinking of the fact that XML is defined in
    terms of Unicode characters, which indeed is a key feature that makes it
    a suitable interchange format.

    > Now I see every XML utility listing its set of supported encodings!
    > (Imagine an exorcist crossing his arms in horror.)


    Given UTF-8's status as the default encoding, any utility that does not
    support that encoding is handicapped to the point of being downright
    broken. I know of none such, and never expect to see any. With that
    being the case it is safe to encode any XML document you create in
    UTF-8; any service or utility that fails to read it on account of the
    encoding has been designed specifically to prevent you from feeding it a
    document of your own creation. (So why fight it?)

    --
    John Bollinger
    John C. Bollinger, Sep 27, 2005
    #3
  4. Roedy Green

    Roedy Green Guest

    On Mon, 26 Sep 2005 22:19:23 -0500, "John C. Bollinger" wrote or quoted:

    >Given UTF-8's status as the default encoding, any utility that does not
    >support that encoding is handicapped to the point of being downright
    >broken. I know of none such, and never expect to see any. With that
    >being the case it is safe to encode any XML document you create in
    >UTF-8; any service or utility that fails to read it on account of the
    >encoding has been designed specifically to prevent you from feeding it a
    >document of your own creation. (So why fight it?)


    But the problem is that if you let people encode in CP278 (Scandinavian
    EBCDIC), you force any reader of that file to support obsolete baggage
    as well.

    There was no advantage in allowing anything but UTF-8 and perhaps
    UTF-16. If people want to write such files for internal purposes, that
    is their business, but those files have no business being passed
    around as interchange files.

    Java has to support all these old encodings to deal with legacy apps,
    but XML does not.

    The other thing: embedding the encoding in plain text is a bit of a
    chicken-and-egg problem. You have to know the encoding to interpret
    the encoding specification. Unicode has the advantage that you can tell
    what you have got just by examining the first few bytes.
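
    For what it is worth, the XML 1.0 recommendation addresses this in its non-normative Appendix F: the declaration is restricted to ASCII-range characters, so a parser can guess the encoding family from the first few bytes and then read the declared name. A rough sketch of that sniffing step is below. The class name EncodingSniffer and the returned labels are invented for illustration, and the UCS-4 cases from the appendix are left out for brevity.

```java
// Hypothetical sketch loosely following XML 1.0 Appendix F; not a complete detector.
class EncodingSniffer {

    /** Guesses the encoding family of an XML entity from its leading bytes. */
    static String sniff(byte[] b) {
        // Byte order marks. (UCS-4 BOMs are not handled, so a UTF-32LE BOM,
        // FF FE 00 00, would be misreported as UTF-16LE here.)
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE (BOM)";

        // No BOM: look for the shape of "<?" or "<?xm" in various encodings.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE (no BOM)";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE (no BOM)";
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible; read the declaration";
        if (startsWith(b, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC family; read the declaration";

        // Neither a BOM nor a declaration: the spec requires UTF-8.
        return "UTF-8 (default)";
    }

    /** True if b begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version='1.0' encoding='windows-1252' ?>".getBytes("US-ASCII");
        System.out.println(sniff(doc));
    }
}
```

    Once the family is known, the parser can decode just enough of the declaration to learn the exact name (windows-1252, CP278, and so on). Whether it then supports that encoding is a separate matter, which is what the original poster ran into.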

    Remember Bill the Cat from Bloom County? I think this decision
    deserves one of his hair ball spitting up noises.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Sep 27, 2005
    #4