high level, fast XML package for Python?

Gleb Rybkin · Sep 15, 2006

I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

Thanks.

Diez B. Roggisch · Sep 15, 2006

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

cElementTree and lxml (which is API-compatible to the former). cElementTree
has an incremental parser, which allows for lager-than-memory-files to be
processed.

Diez

Steven Bethard · Sep 15, 2006

Diez said:
cElementTree and lxml (which is API-compatible to the former). cElementTree
has an incremental parser, which allows for lager-than-memory-files to be
processed.

In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and xml.etree.ElementTree.
So learning them now is a great idea.

STeVe

Gleb Rybkin · Sep 15, 2006

Okay, thanks!

Steven said:
In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and xml.etree.ElementTree.
So learning them now is a great idea.

STeVe

Tim N. van der Leeuw · Sep 15, 2006

Hi Gleb,

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

Thanks.

Another option is Amara; also quite high-level and also allows for
incremental parsing. I would say Amara is somewhat higher level than
ElementTree since it allows you to access your XML nodes as Python
objects (with some extra attributes and some minor warts), as well as
giving you XPath expressions on the object tree.

URL:

http://uche.ogbuji.net/tech/4suite/amara/

Best version currently available is version 1.1.7

It does work together with py2exe on windows if the need ever arises
for you but you have to fiddle a bit with it (ask for details on this
list if you ever need to do that)

Cheers,

--Tim

Stefan Behnel · Sep 16, 2006

Tim said:
Another option is Amara; also quite high-level and also allows for
incremental parsing. I would say Amara is somewhat higher level than
ElementTree since it allows you to access your XML nodes as Python
objects (with some extra attributes and some minor warts), as well as
giving you XPath expressions on the object tree.

Then you should definitely give lxml.objectify a try. It combines the ET API
with the lxml set of features (XPath, RelaxNG, XSLT, ...) and hides the actual
XML behind a Python object interface. That gives you everything at the same time.

http://codespeak.net/lxml/objectify.html

It's part of the lxml distribution:
http://codespeak.net/lxml/

Stefan

John J. Lee · Sep 16, 2006

Steven Bethard said:
In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and
xml.etree.ElementTree. So learning them now is a great idea.

Only some of the original ElementTree software is going into 2.5,
apparently. So you can get more on the effbot.org site than you get
from just downloading Python 2.5. Probably future Python releases
will add more of Fredrik's XML code.

John

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= · Sep 17, 2006

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

Regards,
Martin

Steven Bethard · Sep 17, 2006

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

To use ElementTree and keep your memory consumption down, consider using
the iterparse function:

http://effbot.org/zone/element-iterparse.htm

Then you can get more SAX-like memory consumption while still enjoying
the high-level interface of ElementTree.

STeVe

Paul Boddie · Sep 19, 2006

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

What about xml.dom.pulldom? It quite possibly resembles ElementTree's
iterparse, or at least promotes event-style handling of XML information
using some kind of mainloop...

import xml.dom.pulldom

for etype, node in xml.dom.pulldom.parseString(s):
if etype == xml.dom.pulldom.START_ELEMENT:
print node.nodeName, node.attributes

....instead of callbacks (as happens with SAX):

import xml.sax

class CH(xml.sax.ContentHandler):
def startElement(self, name, attrs):
print name, attrs

xml.sax.parseString(s, CH())

Paul

Fredrik Lundh · Sep 20, 2006

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

note that the requirements included "high-level" and "fast"; sax is
low-level, error-prone, and once you've finally fixed all the remaining
bugs in your state machine, not that fast, really.

</F>

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= · Sep 20, 2006

Paul said:
What about xml.dom.pulldom? It quite possibly resembles ElementTree's
iterparse, or at least promotes event-style handling of XML information
using some kind of mainloop...

Right; that also meets the criteria of being standard and not
in-memory (nobody had mentioned it so far).

Whether it is high-level and fast is in the eyes of the beholder
(as they are relative, rather than absolute properties).

Regards,
Martin

Galry, a high-performance interactive visualization package in Python	0	Jan 29, 2013
[ANN] Struqtural: High level database interface library	3	Jul 17, 2010
High-performance Python websites	13	Nov 25, 2009
Forking PyPI package	9	May 28, 2014
fast video encoding	4	Jul 29, 2009
how to build and install multiple micro-level major.minor versionsof Python	0	Apr 29, 2014
Package for fast plotting of many data points in Python?	8	Jul 9, 2009
Suggestions for new Website project	3	Feb 1, 2024

high level, fast XML package for Python?

Gleb Rybkin

Diez B. Roggisch

Steven Bethard

Gleb Rybkin

Tim N. van der Leeuw

Stefan Behnel

John J. Lee

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Steven Bethard

Paul Boddie

Fredrik Lundh

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads