XML parsing ExpatError with xml.dom.minidom at line 1, column 0

ming · Feb 13, 2014

Hi,
I have a Python script that stopped working about a month ago. Until then, it had worked flawlessly for months (if not years). A tiny self-contained 7-line script is provided below.

I ran into an XML parsing problem with xml.dom.minidom; the error message is included below. The weird thing is that if I point an online XML validator directly at this particular URL, everything is fine. Moreover, when I save the page source in Firefox or Chrome and validate the saved XML file, that is also fine.

Since the error happens at the very beginning of the input (line 1, column 0), as indicated below, I wondered whether this is an encoding mismatch. However, according to the page source saved from Firefox or Chrome, the document begins with:
<?xml version="1.0" encoding="UTF-8"?>

<program>
=================================================
#!/usr/bin/env python

import urllib2
from xml.dom.minidom import parseString

fd = urllib2.urlopen('http://api.worldbank.org/countries')
data = fd.read()
fd.close()
dom = parseString(data)
=================================================

<error msg>
=================================================
Traceback (most recent call last):
  File "./bugReport.py", line 9, in <module>
    dom = parseString(data)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1931, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 0
=================================================

I'm running Python 2.7.5+ on Ubuntu 13.10.

Thanks.

Peter Otten · Feb 13, 2014

Looking into the data returned from the server:
[1]+ Stopped python
$ file tmp.dat
tmp.dat: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)

OK, let's expand:

$ fg
python

<xml.dom.minidom.Document instance at 0x19a1320>

There may be a way to uncompress the gzipped data transparently, but I'm too
lazy to look it up...
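
The `file` output above suggests the server now sends a gzip-compressed body, which would explain why expat rejects the very first byte. Here is a minimal sketch of one way to handle that, assuming the body may or may not be compressed: check for the gzip magic bytes and decompress with `zlib` before parsing. The helper names `maybe_gunzip` and `parse_xml_bytes` are made up for illustration, and the sample bytes stand in for the real response, which is not fetched here.

```python
import gzip
import io
import zlib
from xml.dom.minidom import parseString

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data):
    """Return data decompressed if it looks like gzip, else unchanged."""
    if data[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(data, 16 + zlib.MAX_WBITS)
    return data


def parse_xml_bytes(data):
    """Hypothetical wrapper: decompress if needed, then build a DOM."""
    return parseString(maybe_gunzip(data))


# Stand-in for the server response (assumption: not the real API payload).
sample = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<countries><country id="ABW"/></countries>')
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(sample)
compressed = buf.getvalue()

dom = parse_xml_bytes(compressed)
print(dom.documentElement.tagName)
```

In the original script this would amount to replacing `parseString(data)` with `parse_xml_bytes(data)`. Checking the magic bytes rather than trusting a response header keeps the helper usable on saved files such as tmp.dat as well.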

MRAB · Feb 13, 2014

From a brief look at the docs, it looks like you can specify the
format. For example, for JSON:

fd = urlopen('http://api.worldbank.org/countries?format=json')
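
A small sketch of building that URL without hand-editing the string, in case other formats need to be tried. The helper `with_format` is hypothetical and assumes the base URL has no query string yet. Asking for JSON does not by itself say anything about compression, so the body may still need the same gzip check before decoding.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def with_format(base_url, fmt):
    """Append a format query parameter to a URL that has no query yet."""
    return base_url + "?" + urlencode({"format": fmt})


url = with_format("http://api.worldbank.org/countries", "json")
print(url)
```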
