Description of packages for XML processing

ankit

There are various packages available for XML processing in Python.
So which one to choose, and when? I have summarized some of the features,
advantages and disadvantages of some packages in the following text.
Have a look at it. May it get you out of the dilemma of choice.

Here we go:

OPTIONS
=========
- libxml2
- lxml
- PyXML
- 4Suite



DESCRIPTION
=============


-------
libxml2
-------
A quote by Mark Pilgrim: "Programming with libxml2 is like the
thrilling embrace of an exotic stranger. It seems to have the potential
to fulfill your wildest dreams, but there's a nagging voice somewhere
in your head warning you that you're about to get screwed in the worst
way."

Features:
=========
- Namespaces in XML
- XPath, XPointer, XInclude, XML Base
- XML Schemas Part 2: Datatypes
- Relax NG
- SAX: a SAX2-like interface and a minimal SAX1 implementation
  compatible with early expat versions
- No DOM: it supports the DOM to some extent but does not implement
  the API itself; gdome2 does that on top of libxml2.
- It is written in plain C, making as few assumptions as possible,
  and sticking closely to ANSI C/POSIX for easy embedding.
- Platform: Linux/Unix/Windows


Advantages
==========
- Standards-compliant XML support.
- Full-featured.
- Actively maintained by XML experts.
- fast. fast! FAST!
- Stable.

Disadvantages
=============
This library already ships with Python bindings, but these bindings have
some problems:
- Very low level and C-ish (not Pythonic).
- Underdocumented and huge; you get lost in them.
- UTF-8 in the API, instead of Python unicode strings.
- Can cause segfaults from Python.
- You have to do manual memory management: the library calls are more
  or less an exact mapping of the C API, and thus require you to think
  about memory management.

For those who want to go for the DOM API:
Packages for DOM
================
- gdome2: provides DOM support on top of libxml2. C-based.
  (http://gdome2.cs.unibo.it/)
- libxml2dom: another available option.
  (http://cheeseshop.python.org/pypi/libxml2dom/0.3.3)
- libxml_domlib: a Python extension module that enables you to use
  the DOM interface to libxml2.
  (http://www.rexx.com/~dkuhlman/libxml_domlib.html)
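
To give a feel for what "the DOM API" means in practice, here is a minimal
sketch. It does not use any of the three packages above; as a stand-in it
assumes the standard library's xml.dom.minidom, which exposes the same
W3C-style calls (getElementsByTagName, getAttribute, firstChild). The sample
document and the function name book_titles are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, made up for this illustration.
SAMPLE = (
    "<catalog>"
    "<book id='b1'>XML Basics</book>"
    "<book id='b2'>DOM in Depth</book>"
    "</catalog>"
)


def book_titles(xml_text):
    """Map each <book> id attribute to the text inside the element."""
    doc = minidom.parseString(xml_text)
    titles = {}
    for node in doc.getElementsByTagName("book"):
        # firstChild is the text node holding the title.
        titles[node.getAttribute("id")] = node.firstChild.data
    doc.unlink()  # release the node tree once we are done with it
    return titles


print(book_titles(SAMPLE))
```

The DOM style walks generic nodes and asks each one for children and
attributes, which is more verbose than ElementTree but matches the W3C
interface found in other languages.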


Resources
==========
- http://xmlsoft.org/index.html
- http://codespeak.net/lxml/intro.html


----
lxml
----
lxml follows the ElementTree API as much as possible, building it on
top of the native libxml2 tree.
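
As a rough illustration of the ElementTree style, here is a small sketch.
It assumes lxml.etree is importable and otherwise falls back to the standard
library's xml.etree.ElementTree, which offers the same basic calls used here
(fromstring, findall, get). The sample document and the name package_names
are hypothetical.

```python
# Prefer lxml if it is installed; otherwise fall back to the standard
# library's ElementTree, which supports the same calls used below.
try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
SAMPLE = "<packages><package name='lxml'/><package name='4Suite'/></packages>"


def package_names(xml_text):
    """Return the 'name' attribute of every <package> child of the root."""
    root = ET.fromstring(xml_text)
    return [pkg.get("name") for pkg in root.findall("package")]


print(package_names(SAMPLE))
```

Elements behave like ordinary Python objects here: attributes are read with
get() and children are found with a path expression, with no manual memory
management.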

Features
========
- lxml provides all of the libxml2 features above, but through the
  ElementTree API.

Advantages
==========
- Pythonic API.
- Documented.
- Uses Python unicode strings in the API.
- Safe (no segfaults).
- No manual memory management


Disadvantages
==============
- No DOM support, just as with libxml2.
- It is still at an early release (the latest is lxml 0.7).


Resources
=========
- http://codespeak.net/lxml/


------
PyXML
------
Features
=========
- xmlproc: a validating XML parser.
- Expat: a fast non-validating parser.
- sgmlop: a C helper module that can speed up xmllib.py and
  sgmllib.py by a factor of 5.
- PySAX: SAX 1 and SAX 2 libraries with drivers for most of the
  parsers.
- 4DOM: a fully compliant DOM Level 2 implementation.
- pulldom: a DOM implementation that supports lazy instantiation of
  nodes.
- marshal: a module with several options for serializing Python
  objects to XML.
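
For readers who have not used SAX, the sketch below shows the event-driven
style. It assumes the standard library's xml.sax package (which is driven by
Expat) rather than a PyXML installation; the handler class TagCounter and the
function count_tags are made-up names for this example.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Count how many times each element name is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    """Parse xml_text with SAX and return a dict of tag name -> count."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


print(count_tags("<a><b/><b/><c/></a>"))
```

Unlike DOM, nothing is kept in memory except what the handler chooses to
store, which is why SAX suits large documents.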


Advantages
==========
- A lot of documentation is available, and almost all resources and
  examples are based on it.

Disadvantages
=============
- No Schema support

Packages for Schema (for those who want schema support too)
===================
- XSV: currently in progress; provides XML Schema Part 1: Structures.
  Depends on another package, PyLTXML.
  (http://www.ltg.ed.ac.uk/~ht/xsv-status.html)




-------
4Suite
-------
Features:
=========
- XML, XSLT, XPath, DOM, XInclude, XPointer, XLink, XUpdate, RELAX NG,
  XML Catalogs
- Platform: POSIX, Windows

Advantages
============
- It provides RELAX NG, a simple schema language for XML based on
  [RELAX] and [TREX]. A RELAX NG schema specifies a pattern for the
  structure and content of an XML document.
  [1] http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDAGDYR
  [2] http://xmlbuddy.com/2.0/features.html
  [3] http://www.xml.com/pub/a/2001/12/12/schemacompare.html?page=2

* But RELAX NG is not a W3C standard. It is provided by OASIS.
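
To make "a pattern for the structure and content of an XML document"
concrete, here is a tiny RELAX NG schema and a validation sketch. Note that
it does not use 4Suite: it swaps in lxml (whose RelaxNG class sits on top of
libxml2) and assumes lxml may or may not be installed, returning None if it
is missing. The schema, the sample documents and the name is_valid_note are
all invented for this example.

```python
import io

try:
    from lxml import etree  # third-party; wraps libxml2's RELAX NG support
except ImportError:
    etree = None  # assumption: callers treat None as "could not check"

# Hypothetical schema: a <note> must contain <to> and then <body>, both text.
SCHEMA = b"""\
<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="to"><text/></element>
  <element name="body"><text/></element>
</element>
"""

GOOD = b"<note><to>ankit</to><body>Thanks for the summary</body></note>"
BAD = b"<note><body>The to element is missing</body></note>"


def is_valid_note(xml_bytes):
    """Return True or False from the validator, or None without lxml."""
    if etree is None:
        return None
    schema = etree.RelaxNG(etree.parse(io.BytesIO(SCHEMA)))
    doc = etree.parse(io.BytesIO(xml_bytes))
    return schema.validate(doc)


print(is_valid_note(GOOD), is_valid_note(BAD))
```

The schema reads almost like the document it describes, which is the main
appeal of RELAX NG compared with W3C XML Schema.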


Site:
======
[4] http://cheeseshop.python.org/pypi/4Suite-XML/1.0b3
 
Kent Johnson

ankit said:
There are various packages available for XML processing in Python.
So which one to choose, and when? I have summarized some of the features,
advantages and disadvantages of some packages in the following text.
Have a look at it. May it get you out of the dilemma of choice.

Here we go:

OPTIONS
=========
- libxml2
- lxml
- PyXML
- 4Suite

Also ElementTree, Amara

- No Windows release to date :-(

Kent
 
