XML Parsing Problems with SAX xerces

John Smith

I am trying to parse an XML document that starts with the following tag:

<?xml version='1.0' encoding='windows-1252' ?>

This is causing an error:

Caused by: org.xml.sax.SAXParseException: The encoding "windows-1252" is not
supported.
at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1056)
at
org.apache.xerces.readers.DefaultEntityHandler.startReadingFromDocument(DefaultEntityHandler.java:541)
at org.apache.xerces.framework.XMLParser.parseSomeSetup(XMLParser.java:305)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:947)

Is there a way I can get it to support windows-1252, or to ignore the
declaration? I cannot edit the document itself.

Thanks

Jon
 
Roedy Green

<?xml version='1.0' encoding='windows-1252' ?>
I thought XML had UTF-8 as the only supported encoding. That was
one of its key features that made it a suitable interchange format.

Now I see every XML utility listing its set of supported encodings!
(Imagine an exorcist crossing his arms in horror.)
 
John C. Bollinger

Roedy said:
I thought XML had UTF-8 as the only supported encoding. That was
one of its key features that made it a suitable interchange format.

No, but you may have been thinking of this: "In the absence of
information provided by an external transport protocol (e.g. HTTP or
MIME), it is a fatal error for an entity including an encoding
declaration to be presented to the XML processor in an encoding other
than that named in the declaration, or for an entity which begins with
neither a Byte Order Mark nor an encoding declaration to use an encoding
other than UTF-8." [XML 1.1, section 4.3.3; the same appears in XML
1.0, also in section 4.3.3]

You might also have been thinking of the fact that XML is defined in
terms of Unicode characters, which indeed is a key feature that makes it
a suitable interchange format.

Roedy said:
Now I see every XML utility listing its set of supported encodings!
(Imagine an exorcist crossing his arms in horror.)

Given UTF-8's status as the default encoding, any utility that does not
support that encoding is handicapped to the point of being downright
broken. I know of none such, and never expect to see any. With that
being the case it is safe to encode any XML document you create in
UTF-8; any service or utility that fails to read it on account of the
encoding has been designed specifically to prevent you from feeding it a
document of your own creation. (So why fight it?)
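
On the original question, one possible workaround is to decode the bytes yourself and hand the parser a character stream. The SAX documentation for InputSource says that when a character stream is supplied, the parser reads it directly and disregards any encoding declaration in the document. The sketch below is untested against the old Xerces release in the stack trace; it uses the JAXP factory, assumes the JVM knows the charset name "windows-1252" (it is not one of the charsets Java guarantees), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Cp1252Parse {
    /** Parses the stream as windows-1252 and returns all character data found. */
    static String parseText(InputStream in) throws Exception {
        // Decode the bytes ourselves; assumes this JVM supports "windows-1252".
        Reader reader = new InputStreamReader(in, Charset.forName("windows-1252"));
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // With a character stream the parser has no bytes to decode,
        // so the encoding named in the XML declaration is not consulted.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // 0xE9 is e-acute in windows-1252.
        byte[] xml = "<?xml version='1.0' encoding='windows-1252' ?><a>caf\u00e9</a>"
                .getBytes("windows-1252");
        System.out.println(parseText(new ByteArrayInputStream(xml)));
    }
}
```

If the document is read from a file or socket, the same idea applies: wrap that InputStream in the InputStreamReader instead of the byte array used here.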
 
Roedy Green

John C. Bollinger said:
Given UTF-8's status as the default encoding, any utility that does not
support that encoding is handicapped to the point of being downright
broken. I know of none such, and never expect to see any. With that
being the case it is safe to encode any XML document you create in
UTF-8; any service or utility that fails to read it on account of the
encoding has been designed specifically to prevent you from feeding it a
document of your own creation. (So why fight it?)

But the problem is if you let people encode in CP278 (Scandinavian
EBCDIC) you force any reader of that file to support obsolete baggage
as well.

There was no advantage in allowing anything but UTF-8 and perhaps
UTF-16. If people want to write such files for internal purposes that
is their business, but such files have no business being passed around
as interchange files.

Java has to support all these old encodings to deal with legacy apps,
but XML does not.

The other thing: embedding the encoding in plain text is a bit of a
chicken and egg problem. You have to know the encoding to interpret
the encoding specification. Unicode has the advantage that you can tell
what you have got just by examining the first few bytes.
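
For what it is worth, the way out of the chicken and egg problem is that every XML declaration starts with the same characters, "<?xm", so a reader can guess the encoding family from the first few bytes and then decode just enough to read the declared name. Here is a rough sketch of that first step, using byte patterns from the non-normative autodetection appendix of the XML spec. The class name and the returned labels are made up for illustration, and the UCS-4 patterns are left out, so a UCS-4 little-endian BOM would be misreported as UTF-16LE.

```java
public class EncodingSniffer {
    /** Guesses an encoding family from the first bytes of an XML entity. */
    static String sniff(byte[] b) {
        // Byte order marks identify the Unicode encodings outright.
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE (BOM)";
        // No BOM: look at how "<?" or "<?xm" is laid out in the bytes.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE (no BOM)";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE (no BOM)";
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) {
            return "ASCII-compatible; read the declaration";
        }
        if (startsWith(b, 0x4C, 0x6F, 0xA7, 0x94)) {
            return "EBCDIC family; read the declaration";
        }
        // Neither a BOM nor a declaration: the spec says UTF-8.
        return "UTF-8 (default)";
    }

    /** True if the array begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version='1.0' encoding='windows-1252' ?>"
                .getBytes("US-ASCII");
        System.out.println(sniff(doc));
    }
}
```

A windows-1252 document lands in the "ASCII-compatible" branch: the declaration itself is plain ASCII, so it can be read safely before the real decoder is chosen.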

Remember Bill the Cat from Bloom County? I think this decision
deserves one of his hair ball spitting up noises.
 