Orest Kozyar
I'm working on a CGI script that pulls XML data from a public database
(Medline) and caches this data using shelve to minimize load on the
database. In general, the script works quite well, but keeps crashing
every time I try to pickle a particular XML document. Below is a
script that illustrates the problem, followed by the stack trace that
is generated (thanks to Kent Johnson who helped me refine the
script). I'd appreciate any advice for solving this particular
problem. Someone on Python-Tutor suggested that the XML document has
a circular reference, but I'm not sure exactly what this means, or why
the document would have a reference to itself.

import urllib
from pickle import Pickler
from cStringIO import StringIO
from xml.dom import minidom

baseurl = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?'
params = {
    'db': 'pubmed',
    'retmode': 'xml',
    'rettype': 'medline',
}
badkey = '16842422'
params['id'] = badkey
url = baseurl + urllib.urlencode(params)

# Fetch the record and parse it into a minidom Document
doc = minidom.parseString(urllib.urlopen(url).read())
print 'Successfully retrieved and parsed XML document with ID %s' % badkey

f = StringIO()
p = Pickler(f, 0)
p.dump(doc)  # Will fail on this line
print 'Successfully shelved XML document with ID %s' % badkey

Here is the top of the stack trace:
  File "BadShelve.py", line 35, in <module>
    p.dump(doc)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 224, in dump
    self.save(obj)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 725, in save_inst
    save(stuff)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 663, in _batch_setitems
    save(v)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 725, in save_inst
    save(stuff)
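
On the "circular reference" suggestion: in a minidom tree every node holds a reference to its parent, and every parent holds a list of its children, so the objects do point back at each other. The sketch below shows that on a tiny stand-in document (SAMPLE_XML is made up for illustration; it is not the real Medline record). pickle normally copes with such cycles by remembering objects it has already saved, so the cycle alone may not explain the crash. The repeating save_inst / save_dict frames in the trace look more like very deep recursion through the linked nodes, though that is only a guess since the final exception line is not shown. The second half of the sketch is one possible workaround, not a confirmed fix: cache the serialized XML text instead of the DOM object and re-parse it on the way out.

```python
import pickle
from xml.dom import minidom

# Hypothetical stand-in for the document fetched from efetch.
SAMPLE_XML = (
    '<PubmedArticleSet><PubmedArticle>'
    '<PMID>16842422</PMID>'
    '</PubmedArticle></PubmedArticleSet>'
)

doc = minidom.parseString(SAMPLE_XML)
root = doc.documentElement
child = root.firstChild

# The child points up at its parent and the parent points down at the
# child: this mutual link is what "circular reference" means here.
is_cycle = child.parentNode is root and root.childNodes[0] is child

# Possible workaround: pickle the serialized bytes, not the Document.
xml_bytes = doc.toxml('utf-8')
cached = pickle.dumps(xml_bytes)

# Reading from the cache: unpickle the bytes and parse them again.
restored = minidom.parseString(pickle.loads(cached))
pmid = restored.getElementsByTagName('PMID')[0].firstChild.data
```

With shelve, the same idea would mean storing the value of doc.toxml('utf-8') under the record ID and calling minidom.parseString on whatever comes back. The cost is a re-parse on every cache hit, in exchange for pickling a flat string instead of a graph of linked node objects.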