python xml dom help please

deglog

Apologies if this post appears more than once.

The file -

---------------
<?xml version="1.0" encoding="utf-8"?>
<Game><A/><B/><C/></Game>
---------------

is processed by this program -

---------------
#!/usr/bin/env python

from xml.dom.ext.reader import PyExpat
from xml.dom.ext import PrettyPrint

import sys

def deepen(nodeList):
    for node in nodeList:
        print(node.nodeName)
        if node.previousSibling != None:
            if node.previousSibling.nodeType == node.ELEMENT_NODE:
                if node.previousSibling.hasChildNodes():
                    print("has children")
                    node.previousSibling.lastChild.appendChild(node)
                else:
                    node.previousSibling.appendChild(node)
        deepen(node.childNodes)

# get DOM object
reader = PyExpat.Reader()
doc = reader.fromUri(sys.argv[1])

# call func
deepen(doc.childNodes)

# display altered document
PrettyPrint(doc)
---------------

which outputs the following -

---------------
Game
Game
A
B
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B/>
  </A>
  <C/>
</Game>

---------------

Can anybody explain why the line 'print(node.nodeName)' never prints 'C'?

Also, why 'has children' is never printed?

I am trying to output

---------------
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B>
      <C/>
    </B>
  </A>
</Game>
---------------
 
Miklós

Without having any thorough look at your (recursive)'deepen' function, I can
see there's no termination condition for the recursion....
So that's one reason this won't work the way you want it to.

Miklós
 
Diez B. Roggisch

Miklós said:
Without having any thorough look at your (recursive)'deepen' function, I
can see there's no termination condition for the recursion....
So that's one reason this won't work the way you want it to.

Nope - he has a termination condition. deepen is called for all childNodes,
so he makes a traversal of all nodes.

Regards,

Diez
 
Diez B. Roggisch

Hi,
Also, why 'has children' is never printed?

The code is somewhat complicated, but the reason "has children" is never
printed is simply that, in this example, no node matches the condition:
nodes A, B and C are the only ones with siblings, and none of them has a child
node.
I know there are easier ways to do this, but i want to do it using dom.

I'm not sure what easier ways _you_ have in mind - but to me it looks like a
classic case for XSLT, which is much more convenient to deal with. DOM is
usually a pain, so don't mess around with it if you're not forced to.

Diez
 
Andrew Clover

def deepen(nodeList):
    for node in nodeList:
        [...]
            node.previousSibling.appendChild(node)

Bzzt: destructive iteration gotcha.

DOM NodeLists are 'live': when you move a child Element out of the parent,
it no longer exists in the childNodes list. So in the example:

<a/>
<b/>
<c/>

the first element (a) cannot be moved and is skipped; the second element (b)
is moved into its previousSibling (a); the third element... wait, there is no
third element any more because (c) is now the second element. So the loop
stops.
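
The effect is easy to reproduce. This is only a minimal sketch, and it assumes the standard library's xml.dom.minidom rather than 4DOM (its childNodes list is live in the same way). The helper name move_into_previous is made up for illustration:

```python
from xml.dom.minidom import parseString

def move_into_previous(parent):
    """Iterate the live childNodes list while moving nodes out of it."""
    seen = []
    for node in parent.childNodes:       # live list: shrinks as nodes move
        seen.append(node.nodeName)
        prev = node.previousSibling
        if prev is not None:
            prev.appendChild(node)       # also removes node from parent.childNodes
    return seen

doc = parseString("<Game><A/><B/><C/></Game>")
seen = move_into_previous(doc.documentElement)
print(seen)                              # C never shows up here
print(doc.documentElement.toxml())
```

Once B has been moved, C slides into the slot the loop has already passed, so the loop ends without visiting it.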

A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
    ...
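
A caveat for anyone reading this on Python 3: map() there returns a lazy iterator rather than a list, so the map(identity, ...) form no longer makes a copy, while a list comprehension still does. A small sketch of the difference, assuming xml.dom.minidom as a stand-in for 4DOM and a hypothetical helper called visit:

```python
from xml.dom.minidom import parseString

def visit(make_iterable):
    """Move each node into its previous sibling; return the names visited."""
    doc = parseString("<Game><A/><B/><C/></Game>")
    seen = []
    for node in make_iterable(doc.documentElement.childNodes):
        seen.append(node.nodeName)
        if node.previousSibling is not None:
            node.previousSibling.appendChild(node)
    return seen

# Lazy on Python 3: still reads the live list one item at a time.
lazy = visit(lambda nodes: map(lambda x: x, nodes))
# A list comprehension builds the whole copy before the loop starts.
copied = visit(lambda nodes: [n for n in nodes])

print(lazy)
print(copied)
```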
 
John J. Lee

(e-mail address removed) (deglog) wrote: [...]
A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
    ...

Why not just

for node in list(nodeList):
    ...

?


John
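
Applied to the original deepen function, the list() snapshot might look like the sketch below. It assumes xml.dom.minidom instead of the PyXML reader used in the thread, so there is no automatically inserted doctype to worry about:

```python
from xml.dom.minidom import parseString

def deepen(node_list):
    # list() takes a snapshot, so moving nodes cannot shorten the loop.
    for node in list(node_list):
        prev = node.previousSibling
        if prev is not None and prev.nodeType == node.ELEMENT_NODE:
            if prev.hasChildNodes():
                prev.lastChild.appendChild(node)
            else:
                prev.appendChild(node)
        deepen(node.childNodes)

doc = parseString("<Game><A/><B/><C/></Game>")
deepen(doc.childNodes)
result = doc.documentElement.toxml()
print(result)
```

Note that lastChild only descends one level, so with four or more siblings the fourth one ends up beside the third rather than inside it.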
 
deglog

Thanks for the help - this works and i understand how, and why.

Why not just

for node in list(nodeList):
    ...

?


John

the following also works (as i intended):

from xml.dom.NodeFilter import NodeFilter

def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)

walker = doc.createTreeWalker(doc.documentElement,NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print walker.currentNode.nodeName
    if walker.currentNode.previousSibling != None:
        print "ps "+walker.currentNode.previousSibling.nodeName
        if walker.currentNode.previousSibling.nodeName != "Game":
            if walker.currentNode.previousSibling.hasChildNodes():
                appendToDescendant(walker.currentNode)
            else:
                walker.currentNode.previousSibling.appendChild(walker.currentNode)
    next = walker.nextNode()
    if next is None: break

Strangely, the line checking "Game" is needed, because this first node
is its own previous sibling - how can this be right?

for example with the input file:
 
Andrew Clover

John J. Lee said:
Why not just for node in list(nodeList)?

You're right! I never trusted list() to make a copy if it was already a
native list (as it is sometimes in eg. minidom) but, bothering to check the
docs, it is guaranteed to after all. Hurrah.

def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)

Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to a node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().

The function as written above appends the argument node to the first sibling
to have no child nodes, starting from the TreeWalker's current node or its
previous sibling if there is one.

I'm not wholly sure I understand the problem you're trying to solve. If you
just want to nest sibling elements as first children, you could do it without
Traversal or recursion, for example:

def nestChildrenIntoFirstElements(parent):
    elements= [c for c in parent.childNodes if c.nodeType==c.ELEMENT_NODE]
    if len(elements)>=2:
        insertionPoint= elements[0]
        for element in elements[1:]:
            insertionPoint.appendChild(element)
            insertionPoint= element

(Untested but no reason it shouldn't work.)
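
For what it's worth, here is the same idea exercised against xml.dom.minidom. That library is an assumption, since the thread itself uses 4DOM. The function is renamed nest_children_into_first_elements here only to follow Python naming conventions:

```python
from xml.dom.minidom import parseString

def nest_children_into_first_elements(parent):
    # The comprehension is a static copy, so moving nodes is safe.
    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
    if len(elements) >= 2:
        insertion_point = elements[0]
        for element in elements[1:]:
            insertion_point.appendChild(element)
            insertion_point = element    # the next sibling nests one level deeper

doc = parseString("<Game><A/><B/><C/><D/></Game>")
nest_children_into_first_elements(doc.documentElement)
nested = doc.documentElement.toxml()
print(nested)
```

Because the insertion point advances on every step, each sibling ends up inside the one before it, however many there are.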

Strangely, the line checking "Game" is needed, because this first node
is its own previous sibling - how can this be right?

4DOM is fooling you. It has inserted a <!DOCTYPE> declaration automatically
for you. (It probably shouldn't do that.) So the previous sibling of the
documentElement is the doctype; of course the doctype has the same nodeName
as the documentElement, so the debugging output is misleading.
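
A more robust test than comparing nodeName is to look at nodeType. The sketch below assumes xml.dom.minidom, which does not insert a doctype on its own, so the input declares one explicitly to mimic what 4DOM produces:

```python
from xml.dom.minidom import parseString

doc = parseString("<!DOCTYPE Game><Game><A/></Game>")
root = doc.documentElement
prev = root.previousSibling                      # the doctype, not an element

print(prev.nodeName)                             # same name as the root element
print(prev.nodeType == prev.DOCUMENT_TYPE_NODE)
print(prev.nodeType == prev.ELEMENT_NODE)
```

Checking for ELEMENT_NODE, as the original deepen function already does, skips the doctype without hard-coding the root element's name.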
 
deglog

Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to a node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().
right


I'm not wholly sure I understand the problem you're trying to solve.

actually i'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document

my latest idea is to go to the end of the document, then walk it
backwards (for christmas?:) towards this end i wrote:
---
walker = doc.createTreeWalker(doc.documentElement,NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print '1 '+walker.currentNode.nodeName
    next = walker.nextNode()
    if next is None: break
print '2 '+walker.currentNode.nodeName
 
Andrew Clover

actually i'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document

Well, the snippet in the posting above should do that well enough. What
happens to any existing nested children is not defined.
How come the current node is back at the start after the loop has finished?

Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs, that can lead to infinite recursion.)

That said, I'm not sure how using a TreeWalker or walking backwards actually
helps you here! If you are just using it to filter out non-element children,
remember that moving the current node takes the position of the TreeWalker
with it. It's not like NodeIterator.
 
deglog

Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs, that can lead to infinite recursion.)

Thanks.

Does the function def __regress(self) from the same package need a similar fix?

(i am using PyXml 0.8.3)
 
Andrew Clover

Does the function def __regress(self) from the same package need a similar
fix?

Nope, looks OK to me. There's no 'in between' state where the current node
ends up pointing somewhere it shouldn't in this one, because of the different
order of the next/previous-sibling step and the move-through-ancestor/descendant
step.

I haven't checked all of the rest of the code, though, so I can't guarantee
there aren't any other problems with 4DOM's Traversal/Range implementation.
 
