Sequential XML parsing with xml.sax

Discussion in 'Python' started by peter@hardy.dropbear.id.au, Aug 23, 2005.

  1. Guest

    Hi hi.

    I'm trying to do sequential decompression of a bzipped XML file and
    feed it to a SAX parser with the following code.

    import bz2
    import sys
    import urllib
    import xml.sax

    remotefh = urllib.urlopen('file:///home/peter/catalog.rdf.bz2')
    decompressor = bz2.BZ2Decompressor()
    handler = CatalogueDocumentHandler(sys.stdout)
    chunksize = 2048
    data = remotefh.read(chunksize)
    while data != '':
        out = decompressor.decompress(data)
        if out != '':
            xml.sax.parseString(out, handler)
        data = remotefh.read(chunksize)

    This fails on the first chunk of decompressed data passed to
    xml.sax.parseString. I suspect that's because it's an incomplete fragment
    of XML. I've tried a number of different chunk sizes, putting the
    break in different places, but it always fails on the first call. For
    reference, the traceback looks like:

        xml.sax.parseString(out, handler)
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/__init__.py", line 47, in parseString
        parser.parse(inpsrc)
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/xmlreader.py", line 125, in parse
        self.close()
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/expatreader.py", line 226, in close
        self.feed("", isFinal = 1)
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
        self._err_handler.fatalError(exc)
      File "/usr/lib/python2.4/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: <unknown>:15132:63: no element found

    (Line 15132 is the last, incomplete line fed to parseString. FWIW,
    it's:
    <pgterms:friendlytitle rdf:parseType="Literal">Searchlights o
    )

    The API reference isn't clear on whether parseString can only handle
    discrete bits of valid XML, or if it's designed to be called in this
    way. So I'm not sure if I'm misusing the function, or if I've done
    something else wrong.

    Any pointers?
    Thanks,
    --
    Pete
    peter@hardy.dropbear.id.au, Aug 23, 2005

  2. Fredrik Lundh

    peter@hardy.dropbear.id.au wrote:

    > The API reference isn't clear on whether parseString can only handle
    > discrete bits of valid XML


    the documentation says that "parse" expects an XML document,
    and that "parseString" is the same thing, but parses from a buffer.
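
    To illustrate the point, here is a minimal sketch in Python 3 (the thread itself uses Python 2.4). The TitleCounter handler and the tiny sample document are hypothetical stand-ins, not taken from the original catalogue. It shows parseString accepting a complete document and rejecting a truncated one, which is what each decompressed chunk amounts to:

```python
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Hypothetical handler that just counts the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


# Made-up sample data; the real catalogue is much larger.
complete = b"<catalog><title>Searchlights</title></catalog>"
truncated = complete[:25]  # cut off mid-text, like one decompressed chunk

handler = TitleCounter()
xml.sax.parseString(complete, handler)  # a whole document parses fine

try:
    xml.sax.parseString(truncated, TitleCounter())
    fragment_error = None
except xml.sax.SAXParseException as exc:
    # parseString treats its argument as the entire document,
    # so running out of input before the root element closes is fatal.
    fragment_error = exc
```

    Each parseString call starts a fresh parse, so no state carries over from one chunk to the next.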

    it's probably easier to pass a BZ2File instance to "parse", but if you
    insist on doing incremental SAX parsing, the IncrementalParser class
    might be what you need:

    http://www.python.org/doc/current/lib/module-xml.sax.xmlreader.html
    http://www.python.org/doc/current/lib/incremental-parser-objects.html

    </F>
    Fredrik Lundh, Aug 23, 2005
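
    The first suggestion, passing a BZ2File to parse, might look like the following Python 3 sketch. The ElementCounter handler, the generated sample document and the temporary file path are all assumptions made for the sake of a runnable example; the original catalog.rdf.bz2 is not reproduced here.

```python
import bz2
import os
import tempfile
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


# Build a small compressed stand-in for the catalogue (made-up content).
doc = b"<catalog>" + b"<etext>Searchlights</etext>" * 1000 + b"</catalog>"
path = os.path.join(tempfile.mkdtemp(), "catalog.rdf.bz2")
with open(path, "wb") as fh:
    fh.write(bz2.compress(doc))

handler = ElementCounter()
# BZ2File is a file-like object that decompresses as it is read,
# so it can be handed straight to xml.sax.parse.
with bz2.BZ2File(path) as source:
    xml.sax.parse(source, handler)
```

    Here xml.sax.parse drives the reads itself, so there is no explicit chunk loop to maintain.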

  3. Guest

    Hi.

    Fredrik Lundh wrote:
    > peter@hardy.dropbear.id.au wrote:
    >
    > > The API reference isn't clear on whether parseString can only handle
    > > discrete bits of valid XML
    >
    > the documentation says that "parse" expects an XML document,
    > and that "parseString" is the same thing, but parses from a buffer.

    OK, so it sounded a lot more ambiguous at 4am. :)

    > it's probably easier to pass a BZ2File instance to "parse",


    It is easier to retrieve a remote file, and decompress and parse as
    separate steps. But I've been wondering if it would be faster / more
    efficient to do it without caching.

    > but if you
    > insist on doing incremental SAX parsing, the IncrementalParser class
    > might be what you need:


    That'll do the trick nicely. Thanks.

    Cheers,
    --
    Pete
    peter@hardy.dropbear.id.au, Aug 24, 2005
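
    For the incremental route settled on in this thread, a rough Python 3 sketch of the original loop rewritten around feed() and close() could look like this. It assumes the parser returned by xml.sax.make_parser() is the default expat-based reader, which implements the IncrementalParser interface. The ElementCounter handler and the in-memory BytesIO source are hypothetical stand-ins for CatalogueDocumentHandler and the urlopen() file handle.

```python
import bz2
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


# Made-up document; BytesIO stands in for the remote file handle.
doc = b"<catalog>" + b"<etext>Searchlights</etext>" * 1000 + b"</catalog>"
remotefh = io.BytesIO(bz2.compress(doc))

handler = ElementCounter()
parser = xml.sax.make_parser()  # assumed to be the expat reader
parser.setContentHandler(handler)
decompressor = bz2.BZ2Decompressor()

chunksize = 64
while True:
    data = remotefh.read(chunksize)
    if not data:
        break
    out = decompressor.decompress(data)
    if out:
        parser.feed(out)  # parser state persists between calls
parser.close()  # signals end of document; raises if it is incomplete
```

    Unlike parseString, feed() keeps the parser state between calls, so a chunk may end anywhere, even in the middle of a tag. Note that bz2 decompresses a whole compressed block at a time, so the output tends to arrive in bursts rather than evenly with each read.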
