doc.toxml() gives ASCII encoding error

Jim Hefferon · Feb 18, 2004

Hello,

I'm having trouble with .xml files that have non-ASCII characters.
Here is a small example.

.................................
#!/usr/bin/python2.2
import sys, os, os.path, re
import xml.dom.minidom

doc=xml.dom.minidom.parse(sys.argv[1])
print doc.toxml()
................................

On an .xml file that contains only ASCII characters, it works just fine.
But one of my documents contains the string
<name>Martin Schröder</name>
and running the above script on that file gives:
Traceback (most recent call last):
  File "/home/web/catalogue_read.py", line 6, in ?
    print doc.toxml()
UnicodeError: ASCII encoding error: ordinal not in range(128)

I had the idea that the parser reads the XML declaration in the .xml
file (it says UTF-8), decodes the text parts into whatever Python uses
internally for Unicode, and then .toxml() hands the document back out
again as a Python unicode string. But I can't reconcile that idea
with this outcome.

I'm simply lost; can anyone tell me what (no doubt clueless) thing
that I am doing wrong? I'm running under Fedora, so I have Python 2.2,
if that's any help.

Thanks,
Jim
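Jim's model of what the parser does can be checked directly. The following is a minimal sketch for Python 3, where `str` is always Unicode; the thread itself uses Python 2.2, so this illustrates the general behaviour and does not reproduce the 2.2 problem Martin calls a bug. The sample document is assumed, built around the `<name>` element from the post. It shows the parser decoding the UTF-8 bytes into Unicode, `toxml()` returning Unicode text, and the ASCII encoding step being where the same kind of "ordinal not in range(128)" error appears.

```python
import xml.dom.minidom

# Assumed sample input: a UTF-8 document holding the name from the post.
data = '<?xml version="1.0" encoding="UTF-8"?><name>Martin Schröder</name>'.encode("utf-8")

# The parser reads the declaration and decodes the UTF-8 bytes into Unicode text.
doc = xml.dom.minidom.parseString(data)
text = doc.documentElement.firstChild.data

# Without an encoding argument, toxml() also returns Unicode text (str in Python 3).
xml_text = doc.toxml()
print(type(text).__name__, type(xml_text).__name__)

# Forcing that Unicode text into ASCII is the step that fails.
try:
    xml_text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc.reason
print("ASCII encoding error:", ascii_error)
```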

Martin v. Löwis · Feb 18, 2004

Jim said:
I'm simply lost; can anyone tell me what (no doubt clueless) thing
that I am doing wrong?

You are not doing anything wrong; it's a bug. Try Python 2.3, or
try PyXML.

Regards,
Martin
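Python 2.3, the upgrade suggested above, is also the release in which `toxml()` gained an optional `encoding` argument, and that argument is still available in Python 3. A minimal Python 3 sketch of what it does follows; the sample input is assumed, modeled on the element from the original post, and this is not claimed to be the fix for the 2.2 bug Martin refers to. With an encoding given, `toxml()` returns bytes, and the XML declaration it writes names that encoding.

```python
import xml.dom.minidom

# Assumed sample input modeled on the element from the original post.
data = '<?xml version="1.0" encoding="UTF-8"?><name>Martin Schröder</name>'.encode("utf-8")
doc = xml.dom.minidom.parseString(data)

# With an encoding argument, toxml() returns encoded bytes instead of text,
# and the XML declaration it emits records that encoding.
encoded = doc.toxml(encoding="utf-8")
print(encoded)  # a bytes object, so printing it needs no implicit encoding
```

Because the result is already bytes, it can be written to a file opened in binary mode without any further conversion.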

Jim Hefferon · Feb 19, 2004

Martin v. Löwis said:
You are not doing anything wrong; it's a bug. Try Python 2.3, or
try PyXML.

Thanks. I understand that getting 2.3 running on Fedora is non-trivial
(although I recently saw that RPMs are now available, so maybe now is
my chance).

I've decided that doc.toxml().encode('UTF-8') is what I want. I have
to admit that, while I have gotten used to thinking of modules as black
boxes, the XML stuff seems to me to be such a big box that I am often
not sure just what I want to do. I don't think I have the whole
infoset thing inside my brain yet.

Thanks again,
Jim
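Jim's workaround, `doc.toxml().encode('UTF-8')`, turns the Unicode result into explicit UTF-8 bytes before anything is written, so no implicit ASCII conversion is needed. A minimal Python 3 sketch of the same idea follows; the helper name `dump_utf8` is hypothetical, the sample input is assumed from the post, and an in-memory `io.BytesIO` stands in for a binary output stream such as `sys.stdout.buffer` or a file opened with mode `"wb"`.

```python
import io
import xml.dom.minidom


def dump_utf8(doc, stream):
    """Hypothetical helper: write a DOM document to a binary stream as UTF-8."""
    payload = doc.toxml().encode("utf-8")  # Unicode text -> explicit UTF-8 bytes
    stream.write(payload + b"\n")          # binary streams accept bytes directly
    return payload


# Assumed sample input modeled on the element from the original post.
doc = xml.dom.minidom.parseString("<name>Martin Schröder</name>".encode("utf-8"))

out = io.BytesIO()  # stands in for sys.stdout.buffer or a file opened in "wb" mode
payload = dump_utf8(doc, out)
print(payload)
```

Unlike `toxml(encoding="utf-8")`, this approach leaves the declaration as `<?xml version="1.0" ?>` with no encoding named. The output is still correct for UTF-8, because an XML document with neither a byte-order mark nor an encoding declaration is read as UTF-8.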

xml.dom.minidom character encoding	6	Apr 21, 2010
files.py (weird encoding error)	0	Jun 10, 2013
files.py (encoding error)	0	Jun 10, 2013
encoding error	1	Feb 20, 2013
Ascii to Unicode.	4	Jul 28, 2010
encoding error in python 27	4	Feb 22, 2013
encoding ascii data for xml	4	Oct 3, 2008
Unicode/ascii encoding nightmare	19	Nov 6, 2006
