DOM implementation

Discussion in 'Python' started by Emanuele D'Arrigo, May 13, 2009.

  1. Hi everybody,

    I just spent the past hour or so trying to have a better understanding
    of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom)
    work. I've used etree and lxml successfully before but I wanted to
    understand how close I can get to the W3C DOM standards. Ok, I think
    more or less I got it all. A few questions emerged:

    1) classes in xml.dom.minidom (e.g. Element) seem to be old-style
    classes. Is there a good reason they are kept that way or simply
    nobody had the time/will to update the library to use new-style
    classes?

    2) for a lightweight implementation xml.dom.minidom comes with a lot
    of methods that aren't part of the W3C standards. I'm referring to
    toxml, toprettyxml, writexml and the _get_* family. Would it be better
    if there was a package offering W3C-faithful classes only, on top of
    which convenience and compatibility methods are added by another
    package (or two!) through subclassing?
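
To make the distinction in question 2 concrete, here is a minimal sketch using the standard library's xml.dom.minidom. The sample document and variable names are made up for illustration; getElementsByTagName and getAttribute come from the W3C DOM Core interfaces, while toxml, toprettyxml and writexml are minidom's own additions.

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<root><item id='1'>text</item></root>")
root = doc.documentElement

# Members defined by the W3C DOM Core interfaces.
item = root.getElementsByTagName("item")[0]
item_id = item.getAttribute("id")

# minidom-specific conveniences, not part of the W3C interfaces.
compact = root.toxml()
pretty = root.toprettyxml(indent="  ")

# writexml serialises into any object with a write() method.
buffer = io.StringIO()
root.writexml(buffer)
```

A W3C-only package would have to offer serialisation through the DOM Load and Save interfaces instead, which is considerably more verbose.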

    Manu
     
    Emanuele D'Arrigo, May 13, 2009
    #1

  2. Paul Boddie Guest

    On 13 May, 18:08, "Emanuele D'Arrigo" <> wrote:
    >
    > I just spent the past hour or so trying to have a better understanding
    > of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom)
    > work. I've used etree and lxml successfully before but I wanted to
    > understand how close I can get to the W3C DOM standards.


    You might want to look at pxdom if you want a high level of compliance
    with W3C DOM standards:

    http://www.doxdesk.com/software/py/pxdom.html

    > Ok, I  think more or less I got it all. A few questions emerged:
    >
    > 1) classes in xml.dom.minidom (i.e. Element) seem to be old style
    > classes. Is there a good reason they are kept that way or simply
    > nobody had the time/will to update the library to use new-style
    > classes?


    I imagine that no-one bothered to update the code. The built-in
    modules like minidom do get maintenance, but not much further
    development. (PyXML, which seemed to accumulate code from 4Suite,
    possibly contributed code to the standard library, but it doesn't seem
    to be actively maintained or developed any more.)

    > 2) for a lightweight implementation xml.dom.minidom comes with a lot
    > of methods that aren't part of the W3C standards. I'm referring to
    > toxml, toprettyxml, writexml and the _get_* family. Would it be better
    > if there was a package offering W3C-faithful classes only, on top of
    > which convenience and compatibility methods are added by another
    > package (or two!) through subclassing?


    Those methods probably don't add that much weight, considering the
    weight that the W3C facilities already necessitate. I attempted to
    make a somewhat W3C-compliant implementation with the libxml2dom
    package (http://pypi.python.org/pypi/libxml2dom), although I felt that
    providing PyXML-like conveniences (similar to those you describe) was
    beneficial: some of the W3C APIs for parsing and serialisation are
    baroque, and although I've tried to implement some of those, too, I
    feel that it isn't a good use of my time.

    Paul
     
    Paul Boddie, May 13, 2009
    #2

  3. Thank you Paul for your reply!

    I'm looking into pxdom right now and it looks very good and useful!

    Thank you again!

    Manu
     
    Emanuele D'Arrigo, May 14, 2009
    #3
  4. Hey Paul,

    would you mind continuing this thread on Python + DOM? I'm trying to
    implement a DOM Events-like set of classes and I could use another
    brain that has some familiarity with the DOM to bounce ideas off. If
    you are too busy never mind. Also, I thought of keeping the discussion
    here rather than via email, for the benefit of current and future
    readers.

    Manu
     
    Emanuele D'Arrigo, May 15, 2009
    #4
  5. Paul Boddie Guest

    On 15 May, 15:23, "Emanuele D'Arrigo" <> wrote:
    > Hey Paul,
    >
    > would you mind continuing this thread on Python + DOM? I'm trying to
    > implement a DOM Events-like set of classes and I could use another
    > brain that has some familiarity with the DOM to bounce ideas with. If
    > you are too busy never mind. Also, I thought of keeping the discussion
    > here rather than via email, for the benefit of current and future
    > readers.


    Sure! Just keep your observations coming! I've made a very lazy
    attempt at DOM Events support in libxml2dom, since it looked as if it
    might be necessary when providing elementary SVG Tiny support (which
    also isn't finished), although I find these things quite hard to
    figure out with the usual vagueness of the specifications on certain
    crucial implementation-related details (and that there's a mountain of
    specifications that one has to navigate).

    One of my tests tries to exercise the code, but I might be doing it
    all completely wrong:

    https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_events.py

    It occurs to me that various PyQt- and PyKDE-related bindings might
    also provide some exposure to DOM Events, although I had heard that
    WebKit, which should have support for lots of DOM features, exposes
    some pretty useless interfaces to languages like Python, currently.
    The situation with Mozilla and PyXPCOM may well be similar.

    Paul
     
    Paul Boddie, May 15, 2009
    #5
  6. Hi Paul, thank you for your swift reply!

    On May 15, 3:42 pm, Paul Boddie <> wrote:
    > Sure! Just keep your observations coming! I've made a very lazy
    > attempt at DOM Events support in libxml2dom,


    I just had a look at libxml2dom, in particular its events.py file.
    Given that we are working from a standard, your implementation is
    exceedingly similar to mine, and had I known before I started writing
    my own classes I would have started from it instead! =)
    Browsing through the code, the EventTarget class docstring reads:

    The listeners for a node are accessed through the global object. This
    common collection is consequently accessed by all nodes in a document,
    meaning that distinct objects representing the same node can still
    obtain the set of listeners registered for that node. In contrast, any
    attempt to directly store listeners on particular objects would result
    in the specific object which registered the listeners holding the
    record of such objects, whereas other objects obtained independently
    for the same node would hold no such record.

    Naively, I implemented my EventTarget class storing its own listeners
    rather than global ones. Nevertheless, I'm not quite understanding
    this issue. Why shouldn't the listeners be stored directly on the
    EventTarget? I have a glimpse of understanding that if the
    DOMImplementation keeps EventTarget and Nodes (or Elements? which
    entity is supposed to support Events?) separate this might be
    necessary. But beside the fact that it's just a fuzzy and potentially
    incorrect intuition, I seem to think that the appropriate way to
    proceed would be for the DOMImplementation to provide a Node class
    that also inherits from EventTarget. In so doing the listeners would
    be immediately accessible as soon as one has a handle to a Node.
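
The design described in this paragraph could be sketched roughly as follows. Every name here is hypothetical (this is neither pxdom's nor libxml2dom's code), and the useCapture argument of the real addEventListener is left out for brevity.

```python
class EventTarget:
    """Hypothetical mixin that stores listeners directly on the instance."""

    def __init__(self):
        self._listeners = {}  # event type -> list of callables

    def addEventListener(self, event_type, listener):
        listeners = self._listeners.setdefault(event_type, [])
        # The DOM Events spec discards duplicate registrations.
        if listener not in listeners:
            listeners.append(listener)

    def removeEventListener(self, event_type, listener):
        listeners = self._listeners.get(event_type, [])
        if listener in listeners:
            listeners.remove(listener)

    def listenersFor(self, event_type):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._listeners.get(event_type, []))


class Node(EventTarget):
    """Hypothetical DOM node that is also an event target."""

    def __init__(self, name, parent=None):
        super().__init__()
        self.nodeName = name
        self.parentNode = parent
```

This only works if the implementation always hands back the same Python object for a given node in the tree.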

    Furthermore, your code finds the bubbling route with the line:

    bubble_route = target.xpath("ancestor::*")

    That xpath method is a libxml method right?

    > (...) although I find these things quite hard to
    > figure out with the usual vagueness of the specifications on certain
    > crucial implementation-related details (and that there's a mountain of
    > specifications that one has to navigate).


    Indeed there is some vagueness in the W3C recommendations and the
    various documents offer very little redundancy with each other but
    require you to be knowledgeable about them all! I'm managing to piece
    together the pieces of the puzzle only after a couple of days having an
    in-depth read-through of DOM, DOM Events and a little bit of XML
    events to see how it all works in practice. XML events is also what's
    prompting me to think that Node/Elements classes of the implementation
    should also inherit from EventTarget as they can all be event
    targets.

    > One of my tests tries to exercise the code, but I might be doing it
    > all completely wrong:
    >
    > https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_event...


    Before I can comment I'd like to better understand what you are aiming
    for with libxml2dom. It seems to be providing some kind of conversion
    services from the xml structure generated by libxml to a dom-like
    structure (implemented by pxdom?).
    Is that correct?

    > It occurs to me that various PyQt- and PyKDE-related bindings might
    > also provide some exposure to DOM Events, although I had heard that
    > WebKit, which should have support for lots of DOM features, exposes
    > some pretty useless interfaces to languages like Python, currently.
    > The situation with Mozilla and PyXPCOM may well be similar.


    PyKDE is off-limits because it's Unix-only while I'm trying to be
    cross-platform. PyQt is interesting. Very. Further investigation is
    required. =)

    Manu
     
    Emanuele D'Arrigo, May 15, 2009
    #6
  7. Paul Boddie Guest

    On 15 May, 18:27, "Emanuele D'Arrigo" <> wrote:
    >
    > I just had a look at libxml2dom, in particular its events.py file.
    > Given that we are working from a standard your implementation is
    > exceedingly similar to mine and had I known before I started writing my
    > own classes I would have started from it instead! =)


    Another implementation is probably a good thing, though, since I don't
    trust my own interpretation of the specifications. ;-)

    > Browsing through the code, the EventTarget class docstring reads:


    [Long docstring cut]

    > Naively, I implemented my EventTarget class storing its own listeners
    > rather than global ones. Nevertheless, I'm not quite understanding
    > this issue. Why shouldn't the listeners be stored directly on the
    > EventTarget?


    One reason for this might well be due to the behaviour of libxml2 and
    libxml2dom: if I visit the same node in a document twice, obtaining a
    node instance each time, these two instances will be different;
    therefore, storing listeners on such instances is not very helpful
    because the expectation that you will automatically see previously
    added listeners on a node will not generally be fulfilled. With pxdom,
    it may be a different situation, but libxml2dom is constrained by the
    behaviour of libxml2: I don't attempt to check node equivalence and
    then expose the structures representing a single node using a single
    object; I generally try and instantiate as few Python objects,
    wrapping libxml2 structures, as I can.
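
A minimal sketch of the situation described above, under loudly stated assumptions: RawNode stands in for a low-level tree node and Wrapper for a Python object created afresh on each visit. Neither is libxml2's or libxml2dom's actual API. Keeping the listeners in one shared registry keyed by the underlying node lets independently created wrappers see the same registrations.

```python
class RawNode:
    """Stand-in for a low-level tree node (hypothetical)."""

    def __init__(self, name):
        self.name = name


class Wrapper:
    """A new wrapper is created on every visit to the same raw node."""

    # Shared registry: raw node -> {event type: [listeners]}.
    # A real implementation would also need to clean up entries.
    _registry = {}

    def __init__(self, raw):
        self.raw = raw

    def addEventListener(self, event_type, listener):
        per_node = Wrapper._registry.setdefault(self.raw, {})
        per_node.setdefault(event_type, []).append(listener)

    def listeners(self, event_type):
        return Wrapper._registry.get(self.raw, {}).get(event_type, [])


raw = RawNode("circle")
first, second = Wrapper(raw), Wrapper(raw)  # two visits, two objects
first.addEventListener("click", print)
```

Here second.listeners("click") finds the listener registered through first, even though the two wrappers are distinct objects.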

    > I have a glimpse of understanding that if the
    > DOMImplementation keeps EventTarget and Nodes (or Elements? which
    > entity is supposed to support Events?) separate this might be
    > necessary. But beside the fact that it's just a fuzzy and potentially
    > incorrect intuition, I seem to think that the appropriate way to
    > proceed would be for the DOMImplementation to provide a Node class
    > that also inherits from EventTarget. In so doing the listeners would
    > be immediately accessible as soon as one has a handle to a Node.


    The libxml2dom.svg module has classes which inherit from EventTarget.
    What I've tried to do is to make submodules to address particular
    formats and document models.

    > Furthermore, your code finds the bubbling route with the line:
    >
    > bubble_route = target.xpath("ancestor::*")
    >
    > That xpath method is a libxml method right?


    I use libxml2's XPath support exposed via libxml2dom.Node.
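
For implementations without XPath support, a comparable route can be collected by walking parentNode links. The following is only a sketch using the standard library's minidom, with a made-up helper name and sample document. It returns ancestors nearest first, which may differ from the order in which an XPath engine returns the ancestor::* node set.

```python
from xml.dom import minidom


def bubble_route(target):
    """Return the element ancestors of target, nearest first."""
    route = []
    node = target.parentNode
    # Stop at the Document node, which is not an element.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        route.append(node)
        node = node.parentNode
    return route


doc = minidom.parseString("<svg><g><rect/></g></svg>")
rect = doc.getElementsByTagName("rect")[0]
names = [node.tagName for node in bubble_route(rect)]
```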

    > Indeed there is some vagueness in the W3C recommendations and the
    > various documents offer very little redundancy with each other but
    > require you to be knowledgeable about them all! I'm managing to piece
    > together the pieces of the puzzle only after a couple of days having an
    > in-depth read-through of DOM, DOM Events and a little bit of XML
    > events to see how it all works in practice. XML events is also what's
    > prompting me to think that Node/Elements classes of the implementation
    > should also inherit from EventTarget as they can all be event
    > targets.


    I think that if I were to expose an event-capable DOM, other than that
    provided for SVG, I would just have a specific submodule for that
    purpose.

    > > One of my tests tries to exercise the code, but I might be doing it
    > > all completely wrong:
    > >
    > > https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_event...
    >
    > Before I can comment I'd like to better understand what you are aiming
    > for with libxml2dom. It seems to be providing some kind of conversion
    > services from the xml structure generated by libxml to a dom-like
    > structure (implemented by pxdom?).
    > Is that correct?


    Yes. The aim is to provide a PyXML DOM API on top of libxml2
    documents.

    Paul
     
    Paul Boddie, May 15, 2009
    #7
  8. Hello Paul, sorry for the long delay, I was trying to wrap my mind
    around DOM and Events implementations...

    On May 15, 7:08 pm, Paul Boddie <> wrote:
    > Another implementation is probably a good thing, though, since I don't
    > trust my own interpretation of the specifications. ;-)


    Tell me about it. In general I like the work the W3C is doing, but
    some things could use a little less freedom and a little more clarity.
    =) But then again, maybe it's for the best to leave things as they are
    so that we can figure it out for ourselves.

    > > Why shouldn't the listeners be stored directly on the EventTarget?
    >
    > One reason for this might well be due to the behaviour of libxml2 and
    > libxml2dom: if I visit the same node in a document twice, obtaining a
    > node instance each time, these two instances will be different;


    Mmmm.... I don't know the specifics of libxml... are you saying that
    once the object tree is created out of an XML file, requesting twice
    the same node object -does not- result in a pointer to the same
    instance in memory? How's that possible?

    > The libxml2dom.svg module has classes which inherit from EventTarget.


    And what does the EventTarget inherit from? Or are those classes
    inheriting from both Nodes and EventTargets?

    > What I've tried to do is to make submodules to address particular
    > formats and document models.


    I think the issue to consider there is that the DOM does not restrict
    a document from being a mash-up of multiple formats. I.e. it should be
    possible to have XHTML and SVG tags in the same document. As long as
    those modules work at element/tag level and do not obstruct each other
    I think you are on the right track!
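
As a small illustration of such a mixed document, here is a sketch with the standard library's minidom. The sample markup is invented; the namespace URIs are the standard XHTML and SVG ones. Elements from each vocabulary can be selected independently by namespace.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

source = (
    '<html xmlns="%s"><body><p>A circle:</p>'
    '<svg xmlns="%s"><circle r="10"/></svg>'
    '</body></html>' % (XHTML_NS, SVG_NS)
)
doc = minidom.parseString(source)

# "*" matches any local name within the given namespace.
svg_names = [e.localName for e in doc.getElementsByTagNameNS(SVG_NS, "*")]
xhtml_names = [e.localName for e in doc.getElementsByTagNameNS(XHTML_NS, "*")]
```

A format-specific submodule could key its behaviour on namespaceURI in the same way, leaving elements from other vocabularies alone.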

    > I think that if I were to expose an event-capable DOM, other than that
    > provided for SVG, I would just have a specific submodule for that
    > purpose.


    Ultimately I found it moderately easier to modify pxdom with the
    intention of releasing "pxdome", a fork of pxdom. Monkey-patching
    pxdom seemed to be a little too tricky and prone to error to create a
    separate module.

    > > > One of my tests tries to exercise the code, but I might be doing it
    > > > all completely wrong:
    > > >https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_event....


    I had a more in-depth look after having spent the weekend trying to
    wrap my head around all sorts of implementation issues.

    My understanding, also after a few exchanges in the
    mailing-list, is that initialization of an event can happen wherever
    you feel like doing it, except in Document.createEvent(). I.e. it
    could be a method on the event itself or an external function. In your
    code however, I believe the initialization method should be
    initMouseEventNS() rather than initEventNS() and the namespace for DOM
    3 Events should be -None-. Between the two implementations the first
    one seems to be more aligned with the DOM documentation.

    The way I'm doing it is that I invoke Document.createEvent(eventType),
    I initialize the resulting event in part manually and in part with
    type-related default settings and I finally use
    Document.pxdomTriggerEvent(event) to create a propagation path and
    iterate through its targets. I.e.:

    def _trigger_DOMSubtreeModified(target):

        # Only these node types are treated as targets for this event.
        relevantTargetTypes = (Node.DOCUMENT_NODE,
                               Node.DOCUMENT_FRAGMENT_NODE,
                               Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE)

        if target.nodeType not in relevantTargetTypes:
            return

        if target.ownerDocument:
            event = target.ownerDocument.createEvent("MutationEvent")

            event._target = target

            # Apply the type-related defaults, then dispatch the event.
            target.ownerDocument.pxdomEventDefaultInitNS(None,
                "DOMSubtreeModified", event)
            target.ownerDocument.pxdomTriggerEvent(event)

    Notice that I'm currently keeping this function as a loose function
    but it could very well be placed as a method in the Document class or
    in each relevant class. I'm not sure why one option would be better
    than all others and the DOM doesn't specify it.

    The dispatch of the event to each target on the propagation path is
    also a matter of implementation. In the discussion in www-dom three
    options have emerged: 1) the Document node establishes the propagation
    path and iterates through the targets listed to dispatch the event to
    each; 2) an unspecified, external object does the same job; 3) the
    propagation path is established, stored on the event and each event
    target is responsible for recursively dispatching the event to the
    next target if propagation hasn't been stopped. Apparently an earlier
    version of Mozilla's Gecko used option 3 but they eventually switched
    to option 1. Again, it's unclear in what circumstances to use one
    option or the other.
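
Option 1 could be sketched roughly as below. All classes and the dispatch function are hypothetical stand-ins rather than pxdom code, and default actions and preventDefault are left out. Following DOM Level 2 Events, capturing listeners are not run on the target itself, and stopPropagation still lets the remaining listeners on the current target run.

```python
class Event:
    def __init__(self, event_type, bubbles=True):
        self.type = event_type
        self.bubbles = bubbles
        self.target = None
        self.currentTarget = None
        self._stopped = False

    def stopPropagation(self):
        self._stopped = True


class Node:
    def __init__(self, name, parent=None):
        self.nodeName = name
        self.parentNode = parent
        self.listeners = {}  # (event type, use_capture) -> [callables]

    def addEventListener(self, event_type, listener, use_capture=False):
        self.listeners.setdefault((event_type, use_capture), []).append(listener)

    def _deliver(self, event, use_capture):
        event.currentTarget = self
        for listener in list(self.listeners.get((event.type, use_capture), [])):
            listener(event)


def dispatch(event, target):
    """One central routine builds the propagation path and walks it."""
    event.target = target
    ancestors = []
    node = target.parentNode
    while node is not None:
        ancestors.append(node)
        node = node.parentNode

    # Capture phase: outermost ancestor down to the target's parent.
    for node in reversed(ancestors):
        node._deliver(event, use_capture=True)
        if event._stopped:
            return
    # Target phase: only non-capturing listeners run on the target.
    target._deliver(event, use_capture=False)
    if event._stopped or not event.bubbles:
        return
    # Bubble phase: target's parent up to the outermost ancestor.
    for node in ancestors:
        node._deliver(event, use_capture=False)
        if event._stopped:
            return
```

Option 2 would move dispatch into some external object, and option 3 would store the path on the event and let each target forward it to the next one.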

    What I don't know at this time is how to merge all this with the
    specific file formats such as SVG and HTML. I.e. in an SVG example, do
    I create a GroupElement(Element) class and I override the
    Document.createElement() method to create an instance of it any time a
    <g> element is found in the input file? Or do I first create an
    application-neutral DOM tree out of the input file and I then
    instantiate a parallel application-specific structure, holding the
    objects that provide methods to actually draw and group shapes? If I
    get an answer from www-dom I'll report it here...
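
The first alternative (overriding createElement so that specific tags get specific classes) might look something like this sketch. The classes are hypothetical and deliberately minimal. They are not pxdom's, and a real parser would also have to be made to call this factory.

```python
class Element:
    def __init__(self, tag_name):
        self.tagName = tag_name
        self.childNodes = []


class GroupElement(Element):
    """Hypothetical SVG <g> element carrying application behaviour."""

    def describe(self):
        return "group of %d children" % len(self.childNodes)


class Document:
    element_classes = {}  # tag name -> Element subclass

    def createElement(self, tag_name):
        # Fall back to the generic Element for unknown tags.
        cls = self.element_classes.get(tag_name, Element)
        return cls(tag_name)


class SVGDocument(Document):
    element_classes = {"g": GroupElement}


group = SVGDocument().createElement("g")
other = SVGDocument().createElement("rect")
```

The second alternative would keep the DOM tree generic and build a separate structure of drawing objects that refer back to the nodes.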

    Manu
     
    Emanuele D'Arrigo, May 19, 2009
    #8
