Re: XML

Discussion in 'Python' started by Paul Boddie, Jun 24, 2003.

  1. Paul Boddie

    Paul Boddie Guest

    "A.M. Kuchling" wrote:
    >
    > I've come to the conclusion that the initial concern for supporting APIs
    > such as SAX and DOM in Python was a mistake; many bugs stem from trying to
    > support interfaces that don't map to Python very well. Instead we should
    > have made nice Pythonic interfaces such as effbot's ElementTree which are
    > simpler to implement and to use, and ignore the W3C's APIs.


    The thing is that Python and its developers quite often have to live
    (and work) alongside other technologies; having a set of common APIs
    is important if you consider them in that context. The other issue is
    that the Python community isn't always supreme at standardising
    things, working through the edge cases, and so on, and one might well
    argue that the DOM specification has at least had enough attention
    in most areas to be considered generally robust. I haven't seriously
    looked at most of the "Pythonic" APIs, but I would be quite concerned
    about interoperability, how comprehensive they are (with respect to
    representing all XML details), and what the options are for increasing
    performance without pervasive source code changes.

    > SAX isn't too bad in this respect[1] -- pull APIs look pretty similar,
    > differing only in interface and method names -- but the DOM has been a mess,
    > spawning several implementations all of which are fairly complex.


    I like the work that Andrew Clover did with regard to testing the
    different DOM implementations for Python:

    http://mail.python.org/pipermail/xml-sig/2003-June/009560.html

    I'll agree that it's probably quite demanding to write a DOM
    implementation and support various levels of compliance. However, I'd
    argue that as a user of such implementations, one does gain from the
    broad compatibility between implementations - certainly, I've used
    cDomlette and minidom interchangeably for some time with the only
    major issue being a library "collision" around Expat and mod_python
    with cDomlette.

    > [1] SAX is a _de facto_ API, not a W3C one; perhaps that explains why it's
    > not too bad.


    Certainly, the W3C DOM had dubious beginnings, but I don't personally
    buy into the widespread arguments that it is found seriously lacking
    in a number of supposedly key criteria. Or at least, I don't really
    see many of the supposedly better alternatives as being noticeably
    better, especially when DOM as a "platform" supports some very useful
    technologies indeed.

    Paul
     
    Paul Boddie, Jun 24, 2003
    #1

  2. On 24 Jun 2003 01:52:27 -0700,
    Paul Boddie wrote:
    > The thing is that Python and its developers quite often have to live
    > (and work) alongside other technologies; having a set of common APIs
    > is important if you consider them in that context.


    In practice, there's no way to access those other technologies from Python.
    There's a Python wrapper for the Xerces DOM implementation, but I never hear
    about anyone using it; there's a wrapper for libxml2, but it has its own
    API that's somewhat similar to ElementTree (but not as nice to use --
    someone should fix that, because libxml2 is blazingly fast). So any DOM
    implementation you use will likely have been built by the Python world, and
    could have been written to a standard interface. Jython users could use the
    Python interface or use the Jython mapping of Java interfaces.

    When you think about it: how useful is it that the Python DOM interface uses
    the same method names as the Java or Perl interface? What's gained by this?
    I initially thought there might be some gain from being able to use material
    written for other languages to learn the API, but don't know how most users
    learn the DOM; do they read the DOM Recommendation, look at tutorials, read
    the implementation source, or just copy existing code?

    In this context, I find the existence of jDOM, a Java-centric DOM-like API,
    to support this view. There's even a jDOM JSR, the Java world's equivalent
    of a PEP.

    --amk
     
    A.M. Kuchling, Jun 24, 2003
    #2

  3. Alan Kennedy

    Alan Kennedy Guest

    "A.M. Kuchling" wrote:

    > When you think about it: how useful is it that the Python DOM
    > interface uses the same method names as the Java or Perl interface?
    > What's gained by this?


    Code interoperability. This is very important, given the "glue"-like
    nature of many uses of Python, for scripting COM, Java, .NET, etc. So
    I can do things like this (off the top of my head, not tested):

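    def loadDOM(filename):
        """Parse filename with MSXML if pywin32 is present, else with minidom."""
        try:
            from win32com.client import Dispatch
            # MSXML's load() returns a success flag rather than the document,
            # so keep a reference to the document object itself.
            domtree = Dispatch('Msxml2.DOMDocument.4.0')
            domtree.load(filename)
        except ImportError:
            import xml.dom.minidom
            domtree = xml.dom.minidom.parse(filename)
        return domtree

    dom = loadDOM('myfile.xml')
    # Only W3C DOM method names are used from here on.
    for anchor in dom.getElementsByTagName('a'):
        print("Link: %s" % anchor.getAttribute('href'))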

    > I initially thought there might be some gain from being able to use
    > material written for other languages to learn the API, but don't
    > know how most users learn the DOM; do they read the DOM Recommendation,
    > look at tutorials, read the implementation source, or just copy
    > existing code?


    I read the DOM Recommendation :) But I did all of the others too,
    at different stages, and I think most people end up doing more than
    one as well.

    > In this context, I find the existence of jDOM, a Java-centric
    > DOM-like API, to support this view. There's even a jDOM JSR,
    > the Java world's equivalent of a PEP.


    It's a pity that the JDOM isn't very well designed for extensibility,
    as opposed to DOM4J, which is so extensible that it has a steep
    learning curve. (I actually ended up writing my own minimal read-only
    XOM for a Java app, because it was quicker than trying to get JDOM or
    DOM4J to do what I needed. I must find the time to open source that
    one of these days, with its jaxen adapter).

    Which I think illustrates the simple point that many interfaces are
    needed in different scenarios: it's "horses for courses". Sometimes
    one needs interoperability, as per the first example above. Sometimes
    one only needs simplicity, so something Pythonic like ElementTree or
    Pyxie is suitable. Other times, requirements are somewhere in the
    middle of those two.
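
    For the "simplicity" end of that range, here is a minimal sketch of
    the same link-listing task as the loadDOM example. It assumes the
    ElementTree API as it ships today in the standard library as
    xml.etree.ElementTree (at the time of this thread it was a separate
    download from effbot), and the sample markup and the link_targets
    name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, used only for illustration.
SAMPLE = (
    '<html><body>'
    '<a href="one.html">1</a>'
    '<p><a href="two.html">2</a></p>'
    '</body></html>'
)


def link_targets(xml_text):
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree; get() reads an attribute (None if absent).
    return [anchor.get("href") for anchor in root.iter("a")]


for href in link_targets(SAMPLE):
    print("Link: %s" % href)
```

    The code is shorter than the DOM version, but it is tied to one
    library's object model, which is the trade-off being discussed.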

    I generally find that interoperability is almost always worth the
    pain, if the code is going to be used for any period of time. When use
    cases change and, for example, processing volumes increase, or I need
    to (schema)validate documents, then interoperable code greatly simplifies
    the problem, because I can switch seamlessly to a high-performance DOM,
    or one that does validation, or supports
    xpath/relaxng/xpointer/events/whatever.
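
    A rough sketch of what that kind of switch can look like, assuming
    the processing code touches only W3C DOM names. The DOM_PARSERS
    registry and count_elements are hypothetical names; only minidom is
    registered here, and a validating or faster DOM would be added as
    another entry rather than by editing the processing code.

```python
from xml.dom import minidom

# Hypothetical registry mapping a short name to a "parse this string"
# callable. Another DOM implementation would be registered the same way.
DOM_PARSERS = {
    "minidom": minidom.parseString,
}


def count_elements(xml_text, tag, parser="minidom"):
    """Count <tag> elements using only standard DOM names."""
    document = DOM_PARSERS[parser](xml_text)
    # getElementsByTagName and NodeList.length are both part of the W3C DOM,
    # so this line does not depend on which implementation built the tree.
    return document.getElementsByTagName(tag).length


print(count_elements("<r><item/><item/><other/></r>", "item"))
```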

    regards,

    --
    alan kennedy
    -----------------------------------------------------
    check http headers here: http://xhaus.com/headers
    email alan: http://xhaus.com/mailto/alan
     
    Alan Kennedy, Jun 24, 2003
    #3
  4. Paul Boddie

    Paul Boddie Guest

    Paul Boddie wrote:
    >


    [libxml2]

    > Yes, it's very tempting to write a PyXML-style DOM API for it. Then we
    > can use XPath (whether it be from PyXML or 4Suite) on our documents
    > without having to port our source code just because some underlying
    > implementation detail has changed.


    Minor correction on my part, here: since libxml2 provides an XPath
    implementation, the use of the PyXML/4Suite XPath implementations on
    top of a libxml2 DOM layer wouldn't be strictly necessary, but it
    might be nice to harmonise the APIs so that XPath contexts and queries
    are accessed in the same way for all available implementations. Having
    tried libxml2 and libxslt out recently, I certainly agree that they
    seem very fast in comparison to other XML processing libraries.
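
    To make the "same way for all implementations" idea concrete, here
    is a small sketch of one possible shape for it. It is not libxml2's
    or PyXML's API: the ElementTreeQuery class and its attr_values
    method are hypothetical, and the backend is the standard library's
    xml.etree.ElementTree, which supports only a limited XPath subset.
    A libxml2-backed class exposing the same method could stand in
    without callers changing.

```python
import xml.etree.ElementTree as ET


class ElementTreeQuery:
    """Hypothetical adapter: one query method, backend hidden behind it."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def attr_values(self, path, name):
        # findall() accepts ElementTree's limited XPath subset and
        # returns matches in document order.
        return [node.get(name) for node in self._root.findall(path)]


def link_targets(query):
    # Caller code depends only on attr_values(), not on the library used.
    return query.attr_values(".//a[@href]", "href")


doc = ElementTreeQuery('<doc><a href="x.html"/><p><a href="y.html"/></p><a/></doc>')
print(link_targets(doc))
```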

    Paul
     
    Paul Boddie, Jun 27, 2003
    #4
