Fast and capable XML parser?

Magnus Lycka

I'm looking for some library to parse XML code
much faster than the libs built into Python 2.4
(I'm stuck with 2.4 for quite a while) and I
also need XML Schema validation, and would
appreciate support for e.g. XPath and XInclude.
I also want an API which is more Pythonic than
e.g. a thin wrapper over a C or C++ API.

It should be available on at least Linux,
Solaris and AIX.

Some uses involve parsing lots of (often small)
XML files at reasonable speed, i.e. several
hundred files per second. That means that we
can't use anything like an os.system call to
xmllint for XML Schema validation; it gets too
slow. I also suspect that the standard Python
libs (in Python 2.4 at least) are slower than
we'd like them to be. (Not that their speed matters
much, since they don't support XML Schema validation anyway.)
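To put a number on the "several hundred files per second" target, a minimal throughput check like the one below parses many small documents in a single process with the standard library's ElementTree. It is a sketch assuming modern Python 3 (Python 2.4 has neither `xml.etree` nor `time.perf_counter`), and the sample documents and the count of 2000 are made up for illustration. Keeping all parsing in one process avoids the per-file start-up cost that makes one `xmllint` call per file too slow.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical workload: many small documents kept in memory, so the
# measurement covers parsing only, not disk I/O.
DOCS = [
    "<order id='%d'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>" % i
    for i in range(2000)
]

def parse_all(docs):
    """Parse every document in-process and return the total number of <item> elements."""
    total = 0
    for text in docs:
        root = ET.fromstring(text)
        total += len(root.findall("item"))  # direct <item> children only
    return total

start = time.perf_counter()
items = parse_all(DOCS)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
print("parsed %d docs (%d items) at %.0f docs/sec"
      % (len(DOCS), items, len(DOCS) / elapsed))
```

Swapping a different parser into `parse_all` and rerunning on your own files gives a rough like-for-like comparison.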

Any suggestions?
 
Larry Bates

I don't know if it meets ALL of your requirements, but this might
help:

http://www.reportlab.org/pyrxp.html

-Larry
 
Steven Bethard

Magnus said:
I'm looking for some library to parse XML code
much faster than the libs built into Python 2.4
(I'm stuck with 2.4 for quite a while) and I
also need XML Schema validation, and would
appreciate support for e.g. XPath and XInclude.
I also want an API which is more Pythonic than
e.g. a thin wrapper over a C or C++ API.

For a more Pythonic API, you probably want to look at cElementTree, which
is now in the Python 2.5 stdlib (as xml.etree.cElementTree). You can get it standalone from here:

http://effbot.org/downloads/#cElementTree
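A common way to pick up the C implementation when it is available is an import fallback chain, sketched below. This is not taken from the effbot page; it relies on these facts: `xml.etree.cElementTree` exists in Python 2.5 through 3.8, the standalone package installs as `cElementTree`, and from Python 3.3 on plain `xml.etree.ElementTree` uses the C accelerator automatically (the `cElementTree` alias was removed in 3.9). The sample `<config>` document is hypothetical.

```python
# Prefer the C accelerator when it exists, falling back step by step.
try:
    import xml.etree.cElementTree as ET      # Python 2.5 .. 3.8
except ImportError:
    try:
        import cElementTree as ET            # standalone package (e.g. Python 2.4)
    except ImportError:
        import xml.etree.ElementTree as ET   # Python 3.9+: C accelerator used automatically

# Hypothetical document: read every <server> element's port as an int.
doc = ET.fromstring(
    "<config><server host='a' port='80'/><server host='b' port='8080'/></config>"
)
ports = [int(s.get("port")) for s in doc.findall("server")]
print(ports)  # [80, 8080]
```

The rest of the code only sees the name `ET`, so it runs unchanged whichever implementation was found.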

This implementation doesn't fully support XSLT, XPath, etc., but lxml
exposes an ElementTree-style API through lxml.etree, and does, I
believe, support many of these other things:

http://codespeak.net/lxml/

I don't know much about lxml's speed, but since it's a wrapper around
the libxml2 and libxslt libraries, it should be reasonably fast.
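A minimal sketch of the lxml route, covering the XML Schema and XPath requirements from the original question. It assumes a current lxml install (early lxml releases from this era may lack some of these calls), and the `order`/`item` schema and the `check` helper are hypothetical. The schema is compiled once and reused for every document, which is the in-process alternative to running `xmllint` per file.

```python
from lxml import etree

# Hypothetical schema: an <order> holding one or more <item> elements,
# each with a required "sku" string and a required positive-integer "qty".
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="sku" type="xs:string" use="required"/>
            <xs:attribute name="qty" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile once, reuse for every document.
SCHEMA = etree.XMLSchema(etree.fromstring(XSD))

def check(text):
    """Parse one document, validate it, and return (is_valid, total_qty)."""
    doc = etree.fromstring(text)
    if not SCHEMA.validate(doc):
        # SCHEMA.error_log holds the reasons for the failed validation.
        return False, None
    # Full XPath 1.0 via libxml2; sum() returns a float.
    total = doc.xpath("sum(/order/item/@qty)")
    return True, int(total)

good = b"<order><item sku='A1' qty='2'/><item sku='B7' qty='3'/></order>"
bad = b"<order><item sku='A1' qty='zero'/></order>"
print(check(good))  # (True, 5)
print(check(bad))   # (False, None)
```

For XInclude, trees returned by `etree.parse()` also have an `xinclude()` method that resolves `xi:include` elements in place.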

STeVe
 
Magnus Lycka

Larry said:
I don't know if it meets ALL of your requirements but this might
help:

http://www.reportlab.org/pyrxp.html

AFAIK, there is no XML Schema support in PyRXP.
That alone is bad enough.

GPL is not an option for us, and a commercial
licence is less attractive than e.g. MIT or LGPL.
(Partly due to the cost, but also because it
causes much more work for me.)

Besides, I'm a bit suspicious about the
lack of benchmarks for the Unicode version...

It seems to me that lxml is better in all
aspects.
 
