ming
Hi,
I have a Python script that stopped working about a month ago. Until then, it had worked flawlessly for months (if not years). A tiny self-contained 7-line script is provided below.
I ran into an XML parsing problem with xml.dom.minidom; the error message is included below. The weird thing is that if I point an online XML validator directly at this particular URL, it validates fine. Moreover, when I saved the page source in Firefox or Chrome and validated the saved XML file, it also passed.
Since the error occurs at the very beginning of the input (line 1, column 0), as indicated below, I was wondering whether this is an encoding mismatch. However, according to the page source saved from Firefox or Chrome, the document begins with the following:
<?xml version="1.0" encoding="UTF-8"?>
<program>
=================================================
#!/usr/bin/env python
import urllib2
from xml.dom.minidom import parseString
fd = urllib2.urlopen('http://api.worldbank.org/countries')
data = fd.read()
fd.close()
dom = parseString(data)
=================================================
<error msg>
=================================================
Traceback (most recent call last):
  File "./bugReport.py", line 9, in <module>
    dom = parseString(data)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1931, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 0
=================================================
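One way to test the encoding suspicion mentioned above would be to look at the first raw bytes that urllib2 actually returns, since a browser may decode or decompress the response before showing the page source. The following is only a minimal sketch: `sniff_prefix` is a hypothetical helper name, not part of the script above, and the categories it reports are assumptions about what could sit at column 0, not a diagnosis.

```python
import codecs


def sniff_prefix(data):
    """Classify the first bytes of a raw response body (a byte string).

    Hypothetical helper: it only inspects the prefix and does not parse anything.
    """
    if data.startswith(b'\x1f\x8b'):
        # gzip magic number: the body is compressed, not plain XML text
        return 'gzip'
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 byte order mark precedes the document
        return 'utf-8-bom'
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # UTF-16 byte order mark, which would contradict encoding="UTF-8"
        return 'utf-16-bom'
    if data.startswith(b'<'):
        # looks like the start of an XML declaration or root element
        return 'xml-like'
    return 'unknown'
```

In the script, this could be called as `sniff_prefix(data)` right after `fd.read()`, alongside `repr(data[:16])` and `fd.info().getheader('Content-Encoding')`, to compare what the server sent with what the saved page source shows.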
I'm running Python 2.7.5+ on Ubuntu 13.10.
Thanks.