"encoding specified in XML declaration is incorrect"

Gustaf Liljegren

I'm using xml.sax.parseString to read an XML file. The file contains
a few words in Russian and was written in UTF-8 by a C# program. In the
examples below, MyParser() is my SAX ContentHandler class. My first try was:

f = open('words.xml', 'r')
s = f.read()
xml.sax.parseString(s, MyParser())

This produced the following error:

Traceback (most recent call last):
  File "sax5.py", line 87, in ?
    xml.sax.parseString(s, MyParser())
  File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "D:\Python\lib\xml\sax\xmlreader.py", line 125, in parse
    self.close()
  File "D:\Python\lib\xml\sax\expatreader.py", line 218, in close
    self._cont_handler.endDocument()
  File "sax5.py", line 81, in endDocument
    f.write(header + self.all + footer)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 745-751: ordinal not in range(128)

The XML declaration should be enough to tell the encoding. Anyway, I
read some earlier posts and found that the unicode() function might help:

f = open('words.xml', 'r')
s = f.read()
u = unicode(s, "utf-8")
xml.sax.parseString(u, MyParser())

But I just got another error:

Traceback (most recent call last):
  File "sax5.py", line 87, in ?
    xml.sax.parseString(u, MyParser())
  File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "D:\Python\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "D:\Python\lib\xml\sax\expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "D:\Python\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:30: encoding specified in XML declaration is incorrect

I see nothing wrong with my XML declaration:

<?xml version="1.0" encoding="utf-8"?>

And the file really is UTF-8 (otherwise I wouldn't be able to open it in
Internet Explorer and Firefox). I tried removing the BOM, but that didn't
help. What else could be wrong?

Gustaf

Martin v. Löwis

Gustaf said:
> f.write(header + self.all + footer)
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 745-751: ordinal not in range(128)
>
> The XML declaration should be enough to tell the encoding.

Sure, but that does not help at all. self.all is a Unicode string;
information about its original encoding is not available anymore.
If you want to write self.all to f, you need to encode it explicitly,
e.g.

f.write(header + self.all.encode("koi8-r") + footer)

Instead of koi8-r, you should of course use the encoding of f.
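
The same point, sketched in modern Python 3 terms (an assumption: the thread is about Python 2, where unicode() and str played the roles that str and bytes play today). WordCollector is a hypothetical stand-in for MyParser; the sketch shows that a SAX handler receives already-decoded text, so it must name an output encoding itself when writing.

```python
import io
import xml.sax


class WordCollector(xml.sax.ContentHandler):
    """Hypothetical stand-in for MyParser: collects all character data."""

    def __init__(self, out, encoding):
        super().__init__()
        self.out = out            # a binary stream, e.g. open(path, "wb")
        self.encoding = encoding  # the encoding the output should use
        self.parts = []

    def characters(self, content):
        # SAX hands over decoded text (str); the input encoding is gone.
        self.parts.append(content)

    def endDocument(self):
        text = "".join(self.parts)
        # Encode explicitly instead of relying on a default codec like ASCII.
        self.out.write(text.encode(self.encoding))


data = '<?xml version="1.0" encoding="utf-8"?><words>слово</words>'.encode("utf-8")
buf = io.BytesIO()
xml.sax.parseString(data, WordCollector(buf, "koi8-r"))
print(buf.getvalue().decode("koi8-r"))  # слово
```

Passing "utf-8" instead of "koi8-r" works the same way; what matters is that the output encoding is chosen explicitly rather than left to a default.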

> u = unicode(s, "utf-8")
> xml.sax.parseString(u, MyParser())

This is not really supposed to work (yet). You need to pass
byte strings to xml.sax.parseString, not Unicode strings.
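
In Python 3 the same advice reads: give xml.sax.parseString the raw bytes and let the parser apply the XML declaration. A minimal sketch, where TextCollector is a hypothetical handler and the document is built in memory rather than read from words.xml; it also shows that a UTF-8 BOM in front of the declaration is accepted when the input is bytes.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical minimal handler that records all character data."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


doc = '<?xml version="1.0" encoding="utf-8"?><words>привет</words>'
raw = doc.encode("utf-8")  # bytes, as read from a file opened in "rb" mode

results = []
for data in (raw, b"\xef\xbb\xbf" + raw):  # without and with a UTF-8 BOM
    handler = TextCollector()
    # The parser decodes the bytes itself, guided by the XML declaration.
    xml.sax.parseString(data, handler)
    results.append("".join(handler.parts))

print(results)  # ['привет', 'привет']
```

For a real file, read it with open("words.xml", "rb"), or pass the file name to xml.sax.parse, so the parser sees the undecoded bytes.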

Regards,
Martin