DOM question

Discussion in 'Python' started by Richard Lewis, Jun 2, 2005.

  1. Hi there,

    I have an XML document which contains a mixture of structural nodes
    (called 'section' and with unique 'id' attributes) and non-structural
    nodes (called anything else). The structural elements ('section's) can
    contain, as well as non-structural elements, other structural elements.
    I'm processing this document with Python's DOM API and have got
    stuck on something.

    I want to be able to get all the non-structural elements which are
    children of a given 'section' element (identified by its 'id' attribute)
    but not children of any child 'section' elements of the given 'section'.

    e.g.:

    <section id="a">
      <foo>bar</foo>
    </section>
    <section id="b">
      <foo>baz</foo>
      <section id="c">
        <bar>foo</bar>
      </section>
    </section>

    Given this document, the working function would return "<foo>baz</foo>"
    for id='b' and "<bar>foo</bar>" for id='c'.
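    Since the starting 'section' is identified by its id attribute, the first
    step is locating it. Below is a minimal sketch using only the standard
    library's xml.dom.minidom and a plain linear scan over all section
    elements. The names SAMPLE and find_section, and the <doc> wrapper element,
    are hypothetical: the sample above has two top-level elements, so it needs
    a single root to be well-formed XML.

```python
from xml.dom import minidom

# The sample from the question, wrapped in a hypothetical <doc> root,
# because an XML document may have only one top-level element.
SAMPLE = """<doc>
<section id="a">
  <foo>bar</foo>
</section>
<section id="b">
  <foo>baz</foo>
  <section id="c">
    <bar>foo</bar>
  </section>
</section>
</doc>"""


def find_section(doc, section_id):
    """Return the first <section> whose id attribute matches, or None."""
    # getElementsByTagName walks the whole tree in document order.
    for section in doc.getElementsByTagName("section"):
        # getAttribute returns "" when the attribute is missing.
        if section.getAttribute("id") == section_id:
            return section
    return None


doc = minidom.parseString(SAMPLE)
print(find_section(doc, "c").toxml())
```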

    Normally, recursion is used for DOM traversals. I've tried this function,
    which uses recursion with a generator (can the two be mixed?):

    def content_elements(node):
        if node.hasChildNodes():
            node = node.firstChild

            if not page_node(node):
                yield node

            for e in self.content_elements(node):
                yield e

            node = node.nextSibling

    which didn't work. So I tried it without using a generator:

    def content_elements(node, elements):
        if node.hasChildNodes():
            node = node.firstChild

            if node.nodeType == Node.ELEMENT_NODE: print node.tagName
            if not page_node(node):
                elements.append(node)

            self.content_elements(node, elements)

            node = node.nextSibling

        return elements

    However, I got exactly the same problem: each time I use this function I
    just get a DOM Text node with a little whitespace (tabs and returns) in
    it. I guess this is the indentation in my source document? But why do I
    not get the proper element nodes?
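    Both versions above take node.firstChild once, inside an if rather than a
    loop, so the later node = node.nextSibling never moves on to the next
    sibling. In an indented document the first child of a section is the
    whitespace Text node before the first child element, which would explain
    getting back a single whitespace node (assuming page_node returns false for
    Text nodes). Both also call self.content_elements from what looks like a
    plain function. A minimal sketch of a fixed recursive generator follows;
    recursion and generators combine without problems. is_section is a
    hypothetical stand-in for the question's page_node, and yield from needs
    Python 3.3 or later (the question's for e in ...: yield e form does the
    same thing in older versions).

```python
from xml.dom import Node, minidom

# The question's sample, wrapped in a hypothetical <doc> root.
SAMPLE = """<doc>
<section id="a">
  <foo>bar</foo>
</section>
<section id="b">
  <foo>baz</foo>
  <section id="c">
    <bar>foo</bar>
  </section>
</section>
</doc>"""


def is_section(node):
    # Hypothetical stand-in for page_node(): true for <section> elements.
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "section"


def content_elements(node):
    """Yield non-section elements below node without entering nested sections."""
    for child in node.childNodes:          # visit every child, not just firstChild
        if child.nodeType != Node.ELEMENT_NODE:
            continue                       # skip whitespace Text nodes, comments, ...
        if is_section(child):
            continue                       # nested section: do not descend into it
        yield child
        yield from content_elements(child) # recurse into non-structural elements


doc = minidom.parseString(SAMPLE)
section_b = doc.getElementsByTagName("section")[1]
print(repr(section_b.firstChild.data))     # the whitespace Text node
print([e.toxml() for e in content_elements(section_b)])
```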

    Cheers,
    Richard
     
    Richard Lewis, Jun 2, 2005
    #1
    1. Advertising

  2. > However, I got exactly the same problem: each time I use this function I
    > just get a DOM Text node with a little whitespace (tabs and returns) in
    > it. I guess this is the indentation in my source document? But why do I
    > not get the proper element nodes?


    Welcome to the wonderful world of DOM, where insignificant whitespace
    becomes a first-class citizen!

    Use XPath. Really. It's well worth the effort, as it is suited for exactly
    the tasks you presented us, and allows for a concise formulation of these.
    Yours would be (untested):

    //section[id==$id_param]/node()[!name() == section]

    It searches from the root through all descendants

    //

    for elements named section

    section

    that fulfill the predicate

    [id==$id_param]

    From these we collect all immediate children

    /node()

    that are not named section: [!name() == section]
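    As written, the expression above is not valid XPath 1.0: comparison is =
    rather than ==, negation is the not() function rather than !, the
    attribute needs an @, and a bare section inside a predicate selects child
    elements named section rather than naming a string. The immediate-children
    query described could be spelled //section[@id=$id_param]/node()[not(self::section)].
    For a standard-library-only route, a minimal sketch with
    xml.etree.ElementTree (added to the standard library after this thread,
    in Python 2.5) is below. Its XPath subset handles the [@id='...']
    predicate in current versions but has no not() or name(), so the "not a
    section" test is done in Python. Unlike node(), iterating an Element yields
    only child elements, not text. SAMPLE and direct_content are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a hypothetical <doc> root element.
SAMPLE = """<doc>
<section id="a">
  <foo>bar</foo>
</section>
<section id="b">
  <foo>baz</foo>
  <section id="c">
    <bar>foo</bar>
  </section>
</section>
</doc>"""


def direct_content(root, section_id):
    """Return the immediate non-section child elements of the matching section."""
    # ElementTree's limited XPath supports the attribute-equality predicate...
    section = root.find(".//section[@id='%s']" % section_id)
    if section is None:
        return []
    # ...but has no not()/name(), so filter out nested sections here.
    return [child for child in section if child.tag != "section"]


root = ET.fromstring(SAMPLE)
for sid in ("a", "b", "c"):
    print(sid, [(e.tag, e.text) for e in direct_content(root, sid)])
```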


    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Jun 2, 2005
    #2
    1. Advertising

  3. On Thu, 02 Jun 2005 14:34:47 +0200, "Diez B. Roggisch"
    <> said:
    > > However, I got exactly the same problem: each time I use this function I
    > > just get a DOM Text node with a little whitespace (tabs and returns) in
    > > it. I guess this is the indentation in my source document? But why do I
    > > not get the proper element nodes?

    >
    > Welcome to the wonderful world of DOM, where insignificant whitespace
    > becomes a first-class citizen!
    >
    > Use XPath. Really. It's well worth the effort, as it is suited for exactly
    > the tasks you presented us, and allows for a concise formulation of these.
    > Yours would be (untested):
    >
    > //section[id==$id_param]/node()[!name() == section]
    >
    >

    Yes, in fact:

    //section[@id=$id_param]//*[name()!='section']

    would do the trick.
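    One thing to watch with this version: the second // is the descendant
    axis, so for id='b' it also returns <bar>, which sits inside the nested
    section 'c'. Pruning at nested sections while keeping depth could be
    expressed in XPath 1.0 as, for example,
    //section[@id=$id_param]//*[not(self::section)][ancestor::section[1]/@id=$id_param].
    The sketch below, again using the standard library's ElementTree and a
    hypothetical SAMPLE document, just contrasts the descendant and child
    selections.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a hypothetical <doc> root element.
SAMPLE = """<doc>
<section id="a">
  <foo>bar</foo>
</section>
<section id="b">
  <foo>baz</foo>
  <section id="c">
    <bar>foo</bar>
  </section>
</section>
</doc>"""

root = ET.fromstring(SAMPLE)
section_b = root.find(".//section[@id='b']")

# Roughly what //section[@id='b']//*[name()!='section'] selects:
# every descendant element not named 'section', at any depth.
# (iter() also yields section_b itself, so it is excluded explicitly.)
descendants = [e.tag for e in section_b.iter()
               if e is not section_b and e.tag != "section"]

# The child axis ('/') only looks one level down.
children = [e.tag for e in section_b if e.tag != "section"]

print(descendants)  # includes 'bar', which belongs to section 'c'
print(children)
```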

    I was trying to avoid using anything not in the standard Python
    distribution if I could help it; I need to be able to use my code on
    Linux, OS X and Windows.

    The xml.xpath package is from PyXML, yes? I'll just have to battle with
    installing PyXML on OS X ;-)

    Cheers,
    Richard
     
    Richard Lewis, Jun 2, 2005
    #3
  4. >
    > Yes, in fact:
    >
    > //section[@id=$id_param]//*[name()!='section']
    >
    > would do the trick.
    >
    > I was trying to avoid using anything not in the standard Python
    > distribution if I could help it; I need to be able to use my code on
    > Linux, OS X and Windows.
    >
    > The xml.xpath package is from PyXML, yes? I'll just have to battle with
    > installing PyXML on OS X ;-)


    As a fresh member of the Mac OS X community I can say that so far I've
    got everything except pygame running, so I don't expect that to be much
    of a problem.

    Diez
     
    Diez B. Roggisch, Jun 2, 2005
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Thorsten Meininger
    Replies:
    0
    Views:
    448
    Thorsten Meininger
    Jul 28, 2004
  2. Thorsten Meininger
    Replies:
    0
    Views:
    517
    Thorsten Meininger
    Jul 28, 2004
  3. mike
    Replies:
    1
    Views:
    1,225
    Martin Honnen
    Nov 20, 2004
  4. Replies:
    0
    Views:
    568
  5. Replies:
    3
    Views:
    548
    Stefan Behnel
    Aug 3, 2007
Loading...

Share This Page