high level, fast XML package for Python?

Discussion in 'Python' started by Gleb Rybkin, Sep 15, 2006.

  1. Gleb Rybkin

    Gleb Rybkin Guest

    I searched online, but couldn't really find a standard package for
    working with Python and XML -- everybody seems to suggest different
    ones.

    Is there a standard xml package for Python? Preferably high-level, fast
    and that can parse in-file, not in-memory since I have to deal with
    potentially MBs of data.

    Thanks.
     
    Gleb Rybkin, Sep 15, 2006
    #1

  2. Gleb Rybkin wrote:

    > I searched online, but couldn't really find a standard package for
    > working with Python and XML -- everybody seems to suggest different
    > ones.
    >
    > Is there a standard xml package for Python? Preferably high-level, fast
    > and that can parse in-file, not in-memory since I have to deal with
    > potentially MBs of data.


    cElementTree and lxml (which is API-compatible with the former). cElementTree
    has an incremental parser, which allows larger-than-memory files to be
    processed.

    Diez
     
    Diez B. Roggisch, Sep 15, 2006
    #2

  3. Diez B. Roggisch wrote:
    > Gleb Rybkin wrote:
    >
    >> I searched online, but couldn't really find a standard package for
    >> working with Python and XML -- everybody seems to suggest different
    >> ones.
    >>
    >> Is there a standard xml package for Python? Preferably high-level, fast
    >> and that can parse in-file, not in-memory since I have to deal with
    >> potentially MBs of data.

    >
    > cElementTree and lxml (which is API-compatible with the former). cElementTree
    > has an incremental parser, which allows larger-than-memory files to be
    > processed.


    In Python 2.5, cElementTree and ElementTree will be available in the
    standard library as xml.etree.cElementTree and xml.etree.ElementTree.
    So learning them now is a great idea.
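
    A minimal sketch of the standard-library module in use, assuming a
    recent Python where it is imported as xml.etree.ElementTree. The sample
    document and the titles_by_id helper are invented for illustration and
    do not come from this thread:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this example.
SAMPLE = """<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""


def titles_by_id(xml_text):
    """Map each <book> element's id attribute to the text of its <title> child."""
    root = ET.fromstring(xml_text)  # builds the whole tree in memory
    return {book.get("id"): book.findtext("title") for book in root.iter("book")}


if __name__ == "__main__":
    print(titles_by_id(SAMPLE))
```

    Note that ET.fromstring builds the complete tree in memory, so this
    form suits documents that fit comfortably in RAM.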

    STeVe
     
    Steven Bethard, Sep 15, 2006
    #3
  4. Gleb Rybkin

    Gleb Rybkin Guest

    Okay, thanks!

    Steven Bethard wrote:
    > Diez B. Roggisch wrote:
    > > Gleb Rybkin wrote:
    > >
    > >> I searched online, but couldn't really find a standard package for
    > >> working with Python and XML -- everybody seems to suggest different
    > >> ones.
    > >>
    > >> Is there a standard xml package for Python? Preferably high-level, fast
    > >> and that can parse in-file, not in-memory since I have to deal with
    > >> potentially MBs of data.

    > >
    > > cElementTree and lxml (which is API-compatible with the former). cElementTree
    > > has an incremental parser, which allows larger-than-memory files to be
    > > processed.

    >
    > In Python 2.5, cElementTree and ElementTree will be available in the
    > standard library as xml.etree.cElementTree and xml.etree.ElementTree.
    > So learning them now is a great idea.
    >
    > STeVe
     
    Gleb Rybkin, Sep 15, 2006
    #4
  5. Hi Gleb,

    Gleb Rybkin wrote:
    > I searched online, but couldn't really find a standard package for
    > working with Python and XML -- everybody seems to suggest different
    > ones.
    >
    > Is there a standard xml package for Python? Preferably high-level, fast
    > and that can parse in-file, not in-memory since I have to deal with
    > potentially MBs of data.
    >
    > Thanks.


    Another option is Amara; also quite high-level and also allows for
    incremental parsing. I would say Amara is somewhat higher level than
    ElementTree since it allows you to access your XML nodes as Python
    objects (with some extra attributes and some minor warts), as well as
    giving you XPath expressions on the object tree.

    URL:

    http://uche.ogbuji.net/tech/4suite/amara/

    Best version currently available is version 1.1.7

    It does work together with py2exe on Windows, if the need ever arises
    for you, but you have to fiddle with it a bit (ask for details on this
    list if you ever need to do that).

    Cheers,

    --Tim
     
    Tim N. van der Leeuw, Sep 15, 2006
    #5
  6. Tim N. van der Leeuw wrote:
    > Another option is Amara; also quite high-level and also allows for
    > incremental parsing. I would say Amara is somewhat higher level than
    > ElementTree since it allows you to access your XML nodes as Python
    > objects (with some extra attributes and some minor warts), as well as
    > giving you XPath expressions on the object tree.


    Then you should definitely give lxml.objectify a try. It combines the ET API
    with the lxml set of features (XPath, RelaxNG, XSLT, ...) and hides the actual
    XML behind a Python object interface. That gives you everything at the same time.

    http://codespeak.net/lxml/objectify.html

    It's part of the lxml distribution:
    http://codespeak.net/lxml/

    Stefan
     
    Stefan Behnel, Sep 16, 2006
    #6
  7. John J. Lee

    John J. Lee Guest

    Steven Bethard writes:
    [...]
    > In Python 2.5, cElementTree and ElementTree will be available in the
    > standard library as xml.etree.cElementTree and
    > xml.etree.ElementTree. So learning them now is a great idea.


    Only some of the original ElementTree software is going into 2.5,
    apparently. So you can get more on the effbot.org site than you get
    from just downloading Python 2.5. Probably future Python releases
    will add more of Fredrik's XML code.


    John
     
    John J. Lee, Sep 17, 2006
    #7
  8. Gleb Rybkin wrote:
    > I searched online, but couldn't really find a standard package for
    > working with Python and XML -- everybody seems to suggest different
    > ones.
    >
    > Is there a standard xml package for Python? Preferably high-level, fast
    > and that can parse in-file, not in-memory since I have to deal with
    > potentially MBs of data.


    It seems that everybody is proposing libraries that use in-memory
    representations. There is a standard xml package for Python, it's
    called "xml" (and comes with the standard library). It contains a
    SAX interface, xml.sax, which can parse files incrementally.

    Regards,
    Martin
     
    Martin v. Löwis, Sep 17, 2006
    #8
  9. Martin v. Löwis wrote:
    > Gleb Rybkin wrote:
    >> I searched online, but couldn't really find a standard package for
    >> working with Python and XML -- everybody seems to suggest different
    >> ones.
    >>
    >> Is there a standard xml package for Python? Preferably high-level, fast
    >> and that can parse in-file, not in-memory since I have to deal with
    >> potentially MBs of data.

    >
    > It seems that everybody is proposing libraries that use in-memory
    > representations. There is a standard xml package for Python, it's
    > called "xml" (and comes with the standard library). It contains a
    > SAX interface, xml.sax, which can parse files incrementally.


    To use ElementTree and keep your memory consumption down, consider using
    the iterparse function:

    http://effbot.org/zone/element-iterparse.htm

    Then you can get more SAX-like memory consumption while still enjoying
    the high-level interface of ElementTree.
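
    A rough sketch of that pattern, assuming the records of interest are
    repeated elements that can be discarded once handled. The count_records
    helper, the "record" tag and the sample data are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Stream through `source` (a filename or binary file object) and count <tag> elements."""
    count = 0
    # "end" events fire once an element has been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # discard the element's text, attributes and children
    return count


# Hypothetical sample input, made up for this example.
SAMPLE = b"<log><record>a</record><record>b</record><record>c</record></log>"

if __name__ == "__main__":
    print(count_records(io.BytesIO(SAMPLE)))
```

    Clearing each element keeps memory use low, although the root element
    still holds a reference to every (now empty) child, so extremely long
    documents may need those references removed as well.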

    STeVe
     
    Steven Bethard, Sep 18, 2006
    #9
  10. Paul Boddie

    Paul Boddie Guest

    Martin v. Löwis wrote:
    >
    > It seems that everybody is proposing libraries that use in-memory
    > representations. There is a standard xml package for Python, it's
    > called "xml" (and comes with the standard library). It contains a
    > SAX interface, xml.sax, which can parse files incrementally.


    What about xml.dom.pulldom? It quite possibly resembles ElementTree's
    iterparse, or at least promotes event-style handling of XML information
    using some kind of mainloop...

    import xml.dom.pulldom

    # s is a string holding the XML document
    for etype, node in xml.dom.pulldom.parseString(s):
        if etype == xml.dom.pulldom.START_ELEMENT:
            print(node.nodeName, node.attributes)

    ...instead of callbacks (as happens with SAX):

    import xml.sax

    class CH(xml.sax.ContentHandler):
        # called by the parser for every opening tag
        def startElement(self, name, attrs):
            print(name, attrs)

    xml.sax.parseString(s, CH())

    Paul
     
    Paul Boddie, Sep 19, 2006
    #10
  11. Martin v. Löwis wrote:

    >> Is there a standard xml package for Python? Preferably high-level, fast
    >> and that can parse in-file, not in-memory since I have to deal with
    >> potentially MBs of data.

    >
    > It seems that everybody is proposing libraries that use in-memory
    > representations. There is a standard xml package for Python, it's
    > called "xml" (and comes with the standard library). It contains a
    > SAX interface, xml.sax, which can parse files incrementally.


    note that the requirements included "high-level" and "fast"; sax is
    low-level, error-prone, and once you've finally fixed all the remaining
    bugs in your state machine, not that fast, really.
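
    To illustrate the kind of state machine meant here, the following is a
    small sketch of a SAX handler that only collects the text of <title>
    elements. The TitleCollector class, the collect_titles helper and the
    element name are assumptions chosen for the example:

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element, tracking parser state by hand."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False  # state flag: are we inside a <title>?
        self._chunks = []       # text may arrive in several pieces

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def collect_titles(xml_bytes):
    """Parse `xml_bytes` with SAX and return the list of title strings."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

    Even this small task needs a flag and a buffer, because SAX may deliver
    one element's text in several characters() calls; nested or repeated
    structures need correspondingly more bookkeeping.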

    </F>
     
    Fredrik Lundh, Sep 20, 2006
    #11
  12. Paul Boddie wrote:
    >> It seems that everybody is proposing libraries that use in-memory
    >> representations. There is a standard xml package for Python, it's
    >> called "xml" (and comes with the standard library). It contains a
    >> SAX interface, xml.sax, which can parse files incrementally.

    >
    > What about xml.dom.pulldom? It quite possibly resembles ElementTree's
    > iterparse, or at least promotes event-style handling of XML information
    > using some kind of mainloop...


    Right; that also meets the criteria of being standard and not
    in-memory (nobody had mentioned it so far).
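
    As a sketch of how pulldom can stay out of memory while still offering
    DOM access where it is wanted, the loop below expands only the elements
    of interest into small subtrees. The item_names helper and the
    <item>/<name> layout are hypothetical:

```python
import io
from xml.dom import pulldom


def item_names(stream):
    """Return the text of the <name> child of every <item> read from `stream`."""
    names = []
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Build a DOM subtree for this one <item> only.
            events.expandNode(node)
            for name_el in node.getElementsByTagName("name"):
                text = "".join(
                    child.data
                    for child in name_el.childNodes
                    if child.nodeType == child.TEXT_NODE
                )
                names.append(text)
    return names


# Hypothetical sample input, made up for this example.
SAMPLE = b"<items><item><name>a</name></item><item><name>b</name></item></items>"

if __name__ == "__main__":
    print(item_names(io.BytesIO(SAMPLE)))
```

    Only the expanded <item> subtrees are ever materialised; everything
    else passes through as events.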

    Whether it is high-level and fast is in the eyes of the beholder
    (as they are relative, rather than absolute properties).

    Regards,
    Martin
     
    Martin v. Löwis, Sep 20, 2006
    #12