doc.toxml() gives ASCII encoding error

Jim Hefferon

Hello,

I'm having trouble with .xml files that contain non-ASCII characters.
Here is a small example.

.................................
#!/usr/bin/python2.2
import sys, os, os.path, re
import xml.dom.minidom

doc=xml.dom.minidom.parse(sys.argv[1])
print doc.toxml()
................................

On an .xml file that contains only ASCII characters, it works just fine.
But one of my documents contains the string
<name>Martin Schröder</name>
and running the above script on that file gives:
Traceback (most recent call last):
  File "/home/web/catalogue_read.py", line 6, in ?
    print doc.toxml()
UnicodeError: ASCII encoding error: ordinal not in range(128)

I had the idea that the parser reads the XML declaration in the .xml
file (it is UTF-8), decodes the text parts into whatever the
internal representation for Unicode is, and then .toxml() sends it back
out again as a Python Unicode string. But I can't reconcile that idea
with this outcome.
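
The model described above can be checked directly. What follows is a minimal sketch, not a reproduction of the Python 2.2 problem: it assumes a current Python 3 interpreter (where the Unicode string type is str), and the in-memory sample document and the variable names are made up for illustration. It only shows that minidom decodes the UTF-8 input into a Unicode string, that toxml() hands back a Unicode string, and that pushing such a string through the ASCII codec fails in the way the traceback reports.

```python
import xml.dom.minidom

# Hypothetical sample document, stored as UTF-8 bytes to match its declaration.
# "\u00f6" is the o-umlaut in "Schroeder" as spelled in the original file.
XML_BYTES = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<name>Martin Schr\u00f6der</name>"
).encode("utf-8")

doc = xml.dom.minidom.parseString(XML_BYTES)

# The parser has decoded the bytes: the text node holds a Unicode string.
text = doc.documentElement.firstChild.data

# Called without an encoding argument, toxml() returns a Unicode string too.
serialized = doc.toxml()

# Forcing that string through the ASCII codec fails on the non-ASCII character.
try:
    serialized.encode("ascii")
    ascii_failed = False
except UnicodeError:
    ascii_failed = True
```

So the idea itself holds up here; the failure only appears at the point where the Unicode string is converted to ASCII bytes.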

I'm simply lost; can anyone tell me what (no doubt clueless) thing
that I am doing wrong? I'm running under Fedora, so I have Python 2.2,
if that's any help.

Thanks,
Jim
 

Martin v. Löwis

Jim said:
I'm simply lost; can anyone tell me what (no doubt clueless) thing
that I am doing wrong?

You are not doing anything wrong; it's a bug. Try Python 2.3, or
try PyXML.

Regards,
Martin
 
Jim Hefferon

Martin v. Löwis said:
You are not doing anything wrong; it's a bug. Try Python 2.3, or
try PyXML.

Thanks. I understand that getting 2.3 to run on Fedora is non-trivial
(although I recently saw that RPMs are now available, so maybe now is
my chance).

I've decided that doc.toxml().encode('UTF-8') is what I want. I have
to admit that while I have gotten used to thinking of modules as black
boxes, the XML stuff seems to me to be such a big box that I often am
not sure just what I want to do. I don't think I have the whole
infoset thing inside my brain yet.

Thanks again,
Jim
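
The workaround Jim settled on, doc.toxml().encode('UTF-8'), can be sketched as follows. This is only an illustration under stated assumptions: it is written for a current Python 3 interpreter rather than the Python 2.2 discussed in the thread, and the helper name serialize_utf8 and the in-memory sample document are hypothetical.

```python
import xml.dom.minidom


def serialize_utf8(doc):
    """Hypothetical helper: return the document serialized as UTF-8 bytes."""
    # toxml() yields a Unicode string; encode it explicitly instead of
    # leaving the conversion to whatever default codec the output uses.
    return doc.toxml().encode("UTF-8")


# Hypothetical sample input; "\u00f6" is the o-umlaut from the original file.
doc = xml.dom.minidom.parseString(
    "<name>Martin Schr\u00f6der</name>".encode("utf-8")
)

data = serialize_utf8(doc)
```

The result is a byte string that can be written to a file opened in binary mode. Later versions of minidom also accept an encoding argument to toxml(), which additionally records the encoding in the XML declaration.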
 
