High-level, fast XML package for Python?

Gleb Rybkin

I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

Thanks.
 
Diez B. Roggisch

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

cElementTree and lxml (which is API-compatible with the former). cElementTree
has an incremental parser, which allows larger-than-memory files to be
processed.

Diez
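For readers on Python 3: cElementTree no longer exists as a separate module (it was deprecated in 3.3 and removed in 3.9), and xml.etree.ElementTree uses the C accelerator automatically. Below is a minimal sketch of incremental parsing with its XMLPullParser class (Python 3.4+): the document is fed in small pieces and each element is handled as soon as its end tag has arrived. The function name `count_items_incrementally`, the `item` tag, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_items_incrementally(chunks, tag="item"):
    """Count <tag> elements while feeding the document piece by piece.

    `chunks` is any iterable of str or bytes pieces; the full text of the
    document is never needed in one go.
    """
    parser = ET.XMLPullParser(events=("end",))
    count = 0

    def drain():
        nonlocal count
        # read_events() yields only events completed by the data fed so far
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                count += 1
                elem.clear()  # discard this element's text, attributes and children

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()  # signal end of input; leftover events are drained below
    drain()
    return count


# Hypothetical sample document, fed in 7-character pieces.
xml_text = "<root>" + "".join(f"<item n='{i}'/>" for i in range(5)) + "</root>"
pieces = [xml_text[i:i + 7] for i in range(0, len(xml_text), 7)]
print(count_items_incrementally(pieces))  # prints 5
```

For a real file opened in binary mode, something like `iter(lambda: f.read(65536), b"")` can serve as `chunks`. Cleared elements stay attached to their parent as empty shells, so for very large files you may also want to detach finished elements from the tree.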
 
Steven Bethard

Diez said:
cElementTree and lxml (which is API-compatible with the former). cElementTree
has an incremental parser, which allows larger-than-memory files to be
processed.

In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and xml.etree.ElementTree.
So learning them now is a great idea.

STeVe
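A small sketch of the import idiom that became common once these modules were in the standard library: prefer the C module where it exists, otherwise fall back to the plain ElementTree name. It is written for Python 3 (the `iter()` method, for instance, is not in the Python 2.5 version); the sample document and variable names are illustrative only.

```python
try:
    # Python 2.5 through 3.8: explicit C accelerator module
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+: cElementTree is gone; ElementTree uses C automatically
    import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<library>"
    "<book id='1'><title>first</title></book>"
    "<book id='2'><title>second</title></book>"
    "</library>"
)

# findall() returns matching direct children; findtext() returns a child's text.
titles = [book.findtext("title") for book in doc.findall("book")]
# iter() walks the whole subtree; get() reads an attribute.
ids = [book.get("id") for book in doc.iter("book")]
print(titles, ids)  # prints ['first', 'second'] ['1', '2']
```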
 
Gleb Rybkin

Okay, thanks!

Steven said:
In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and xml.etree.ElementTree.
So learning them now is a great idea.

STeVe
 
Tim N. van der Leeuw

Hi Gleb,

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

Thanks.

Another option is Amara; also quite high-level and also allows for
incremental parsing. I would say Amara is somewhat higher level than
ElementTree since it allows you to access your XML nodes as Python
objects (with some extra attributes and some minor warts), as well as
giving you XPath expressions on the object tree.

URL:

http://uche.ogbuji.net/tech/4suite/amara/

The best version currently available is 1.1.7.

It does work with py2exe on Windows, should the need ever arise, but
you have to fiddle with it a bit (ask on this list for details if you
ever need to do that).

Cheers,

--Tim
 
Stefan Behnel

Tim said:
Another option is Amara; also quite high-level and also allows for
incremental parsing. I would say Amara is somewhat higher level than
ElementTree since it allows you to access your XML nodes as Python
objects (with some extra attributes and some minor warts), as well as
giving you XPath expressions on the object tree.

Then you should definitely give lxml.objectify a try. It combines the ET API
with the lxml set of features (XPath, RelaxNG, XSLT, ...) and hides the actual
XML behind a Python object interface. That gives you everything at the same time.

http://codespeak.net/lxml/objectify.html

It's part of the lxml distribution:
http://codespeak.net/lxml/

Stefan
 
John J. Lee

Steven Bethard said:
In Python 2.5, cElementTree and ElementTree will be available in the
standard library as xml.etree.cElementTree and
xml.etree.ElementTree. So learning them now is a great idea.

Only some of the original ElementTree software is going into 2.5,
apparently, so the effbot.org site offers more than you get from just
downloading Python 2.5. Probably future Python releases will add more
of Fredrik's XML code.


John
 
Martin v. Löwis

Gleb said:
I searched online, but couldn't really find a standard package for
working with Python and XML -- everybody seems to suggest different
ones.

Is there a standard xml package for Python? Preferably high-level, fast
and that can parse in-file, not in-memory since I have to deal with
potentially MBs of data.

It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

Regards,
Martin
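A minimal Python 3 sketch of Martin's suggestion: an xml.sax ContentHandler that counts elements by name while xml.sax.parse() reads the input in buffered chunks, so no tree is ever built. The `ElementCounter` class, the `count_elements` helper, and the sample document are hypothetical.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count start tags by name; SAX builds no tree, so memory use stays flat."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag, as soon as the parser recognises it.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(source):
    """`source` may be a filename or a binary file object.

    xml.sax.parse() reads it in buffered chunks and calls the handler
    as each event is recognised.
    """
    handler = ElementCounter()
    xml.sax.parse(source, handler)
    return handler.counts


# Hypothetical in-memory stand-in for a large file on disk.
sample = io.BytesIO(b"<root><a/><a/><b/></root>")
print(count_elements(sample))  # prints {'root': 1, 'a': 2, 'b': 1}
```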
 
Steven Bethard

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

To use ElementTree and keep your memory consumption down, consider using
the iterparse function:

http://effbot.org/zone/element-iterparse.htm

Then you can get more SAX-like memory consumption while still enjoying
the high-level interface of ElementTree.

STeVe
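A Python 3 sketch of a commonly used iterparse pattern: take the root element from the first "start" event, process each finished record on its "end" event, then clear the root so processed records can be garbage-collected. The `record` tag, the `amount` attribute, and `sum_amounts` are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source, tag="record"):
    """Sum the `amount` attribute of every <tag> element in `source`.

    `source` may be a filename or a binary file object; iterparse reads
    it incrementally rather than loading the whole document first.
    """
    total = 0
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's start tag
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            total += int(elem.get("amount", "0"))
            # Drop every finished child of the root (this record included),
            # so processed records can be garbage-collected.
            root.clear()
    return total


# Hypothetical sample; the last record has no amount and counts as 0.
data = io.BytesIO(b"<ledger><record amount='3'/><record amount='4'/><record/></ledger>")
print(sum_amounts(data))  # prints 7
```

This assumes records are direct children of the root. Note that `root.clear()` also discards the root's own attributes and text, so read those first if you need them.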
 
Paul Boddie

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

What about xml.dom.pulldom? It quite possibly resembles ElementTree's
iterparse, or at least promotes event-style handling of XML information
using some kind of mainloop...

import xml.dom.pulldom

for etype, node in xml.dom.pulldom.parseString(s):
    if etype == xml.dom.pulldom.START_ELEMENT:
        print node.nodeName, node.attributes

...instead of callbacks (as happens with SAX):

import xml.sax

class CH(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        print name, attrs

xml.sax.parseString(s, CH())

Paul
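Paul's snippets use Python 2 syntax. A Python 3 sketch of what pulldom adds over plain SAX: while pulling events, DOMEventStream.expandNode() turns just one element of interest into a small DOM subtree, and the rest of the document keeps streaming. The `item` tag, the `item_texts` helper, and the sample input are hypothetical.

```python
import io
from xml.dom import Node, pulldom


def item_texts(stream):
    """Return the text of each <item>, expanding only those elements into DOM nodes."""
    events = pulldom.parse(stream)  # filename or file object, read in chunks
    texts = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.nodeName == "item":
            # Build a small DOM subtree for this element only; iteration
            # then resumes after its matching end tag.
            events.expandNode(node)
            texts.append("".join(child.data for child in node.childNodes
                                 if child.nodeType == Node.TEXT_NODE))
    return texts


sample = io.BytesIO(b"<list><item>alpha</item><other/><item>beta</item></list>")
print(item_texts(sample))  # prints ['alpha', 'beta']
```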
 
Fredrik Lundh

Martin said:
It seems that everybody is proposing libraries that use in-memory
representations. There is a standard xml package for Python, it's
called "xml" (and comes with the standard library). It contains a
SAX interface, xml.sax, which can parse files incrementally.

note that the requirements included "high-level" and "fast"; sax is
low-level, error-prone, and once you've finally fixed all the remaining
bugs in your state machine, not that fast, really.

</F>
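To illustrate the state-machine point, here is a Python 3 sketch that extracts the titles of <entry> elements from a hypothetical sample feed twice. The SAX handler must track whether it is inside an entry and buffer character data itself; with iterparse, each finished element already carries its children. The class and function names are made up for the example, and no performance claim is implied.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample: the feed-level <title> must NOT be collected.
SAMPLE = b"""<feed>
  <title>feed title</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>"""


class EntryTitles(xml.sax.ContentHandler):
    """SAX version: the handler has to remember where it is in the document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.in_entry = False  # state: inside an <entry>?
        self.in_title = False  # state: inside an entry's <title>?
        self.buffer = []       # characters() may deliver text in several pieces

    def startElement(self, name, attrs):
        if name == "entry":
            self.in_entry = True
        elif name == "title" and self.in_entry:
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title" and self.in_title:
            self.titles.append("".join(self.buffer))
            self.in_title = False
        elif name == "entry":
            self.in_entry = False


def titles_with_sax(data):
    handler = EntryTitles()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_iterparse(data):
    """ElementTree version: a finished <entry> already contains its <title>."""
    titles = []
    for _event, elem in ET.iterparse(io.BytesIO(data)):  # default: "end" events only
        if elem.tag == "entry":
            titles.append(elem.findtext("title"))
            elem.clear()
    return titles


print(titles_with_sax(SAMPLE), titles_with_iterparse(SAMPLE))
# prints ['first', 'second'] ['first', 'second']
```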
 
Martin v. Löwis

Paul said:
What about xml.dom.pulldom? It quite possibly resembles ElementTree's
iterparse, or at least promotes event-style handling of XML information
using some kind of mainloop...

Right; that also meets the criteria of being standard and not
in-memory (nobody had mentioned it so far).

Whether it is high-level and fast is in the eyes of the beholder
(as they are relative, rather than absolute properties).

Regards,
Martin
 
