python xml dom help please

deglog · Nov 23, 2003

Apologies if this post appears more than once.

The file -

---------------
<?xml version="1.0" encoding="utf-8"?>
<Game><A/><B/><C/></Game>
---------------

is processed by this program -

---------------
#!/usr/bin/env python

from xml.dom.ext.reader import PyExpat
from xml.dom.ext import PrettyPrint

import sys

def deepen(nodeList):
    for node in nodeList:
        print(node.nodeName)
        if node.previousSibling != None:
            if node.previousSibling.nodeType == node.ELEMENT_NODE:
                if node.previousSibling.hasChildNodes():
                    print("has children")
                    node.previousSibling.lastChild.appendChild(node)
                else:
                    node.previousSibling.appendChild(node)
        deepen(node.childNodes)

# get DOM object
reader = PyExpat.Reader()
doc = reader.fromUri(sys.argv[1])

# call func
deepen(doc.childNodes)

# display altered document
PrettyPrint(doc)
---------------

which outputs the following -

---------------
Game
Game
A
B
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B/>
  </A>
  <C/>
</Game>

---------------

Can anybody explain why the line 'print(node.nodeName)' never prints 'C'?

Also, why is 'has children' never printed?

I am trying to output:

---------------
<?xml version='1.0' encoding='UTF-8'?>
<Game>
  <A>
    <B>
      <C/>
    </B>
  </A>
</Game>
---------------

Miklós · Nov 23, 2003

Without having had a thorough look at your (recursive) 'deepen' function, I can
see there's no termination condition for the recursion...
So that's one reason this won't work the way you want it to.

Miklós

Diez B. Roggisch · Nov 23, 2003

Miklós said:
Without having any thorough look at your (recursive)'deepen' function, I
can see there's no termination condition for the recursion....
So that's one reason this won't work the way you want it to.

Nope - he has a termination condition. deepen is called on each node's
childNodes, so it is simply a traversal of all nodes, and the recursion stops
at leaf nodes, whose childNodes list is empty.

Regards,

Diez

Diez B. Roggisch · Nov 23, 2003

Hi,

Also, why 'has children' is never printed?

The code is somewhat complicated; however, the reason "has children" is not
being printed is simply that, for the example, no node matches the condition:
nodes A, B and C are the only ones with siblings, and none of them has a child
node at the point where it is checked...

I know there are easier ways to do this, but i want to do it using dom.

I'm not sure what easier ways _you_ think of - but to me it looks like a
classic case for XSLT, which is much more convenient to deal with. DOM is
usually a PIA (pain in the ass), so don't mess around with it unless you're
forced to.

Diez

Andrew Clover · Nov 24, 2003

def deepen(nodeList):
    for node in nodeList:
        [...]
                    node.previousSibling.appendChild(node)

Bzzt: destructive iteration gotcha.

DOM NodeLists are 'live': when you move a child Element out of the parent,
it no longer exists in the childNodes list. So in the example:

<a/>
<b/>
<c/>

the first element (a) cannot be moved and is skipped; the second element (b)
is moved into its previousSibling (a); the third element... wait, there is no
third element any more because (c) is now the second element. So the loop
stops.
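A minimal sketch reproducing this with the standard library's xml.dom.minidom, standing in for 4DOM (an assumption: minidom's childNodes is a plain list subclass rather than a 4DOM NodeList, but moving a node out of its parent shrinks it in the same way). It runs only the sibling-moving step of deepen, without the recursion, and records which nodes the loop actually reaches; the helper name move_siblings is hypothetical.

```python
from xml.dom import minidom

def move_siblings(nodeList):
    """Hypothetical helper: deepen's sibling-moving step over a live child list."""
    visited = []
    saw_children = False
    for node in nodeList:  # iterating the live list itself, as in the question
        visited.append(node.nodeName)
        prev = node.previousSibling
        if prev is not None and prev.nodeType == prev.ELEMENT_NODE:
            if prev.hasChildNodes():
                saw_children = True  # the branch that would print "has children"
                prev.lastChild.appendChild(node)
            else:
                # appendChild first removes node from its old parent,
                # so nodeList gets one item shorter mid-loop
                prev.appendChild(node)
    return visited, saw_children

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
visited, saw_children = move_siblings(doc.documentElement.childNodes)
print(visited)                      # ['A', 'B']: C is never reached
print(saw_children)                 # False
print(doc.documentElement.toxml())  # <Game><A><B/></A><C/></Game>
```

This also answers the "has children" question: the only node that could have seen a previous sibling with children is C, and the shrinking list means the loop never gets to it.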

A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
...
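One caveat when running this today: in 2003 map() returned a new list, so it really did make a snapshot, but in Python 3 map() returns a lazy iterator over the original list and no longer protects against it shrinking. A list comprehension still builds a real copy. A quick check with minidom (standing in for 4DOM); visit_and_move and fresh_children are hypothetical helpers:

```python
from xml.dom import minidom

identity = lambda x: x

def visit_and_move(iterable):
    """Move each element into its previous element sibling; return names visited."""
    visited = []
    for node in iterable:
        visited.append(node.nodeName)
        prev = node.previousSibling
        if prev is not None and prev.nodeType == prev.ELEMENT_NODE:
            prev.appendChild(node)
    return visited

def fresh_children():
    """A new <Game><A/><B/><C/></Game> tree each call; returns Game's child list."""
    return minidom.parseString("<Game><A/><B/><C/></Game>").documentElement.childNodes

# Python 3: map() is lazy, so it still walks the live list and skips C.
print(visit_and_move(map(identity, fresh_children())))  # ['A', 'B']
# A list comprehension copies first, so every node is visited.
print(visit_and_move([n for n in fresh_children()]))    # ['A', 'B', 'C']
```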

John J. Lee · Nov 25, 2003

deglog wrote: [...]
A solution would be to make a static copy of the list beforehand. There's no
standard-DOM way of doing that and the Python copy() method is not guaranteed
to work here, so use a list comprehension or map:

identity= lambda x: x
for node in map(identity, nodeList):
...

Why not just

for node in list(nodeList):
...

?

John
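Applied to the original deepen, the list() fix might look like the sketch below. It assumes the standard library's xml.dom.minidom in place of 4DOM's PyExpat reader (PyXML is long unmaintained), parses an inline string instead of sys.argv[1], and collects the visited names in a list instead of printing them.

```python
from xml.dom import minidom

def deepen(nodeList, visited):
    # Iterate over a static snapshot: moving a node out of its parent would
    # otherwise shrink the live list and make the loop skip nodes.
    for node in list(nodeList):
        visited.append(node.nodeName)
        prev = node.previousSibling
        if prev is not None and prev.nodeType == prev.ELEMENT_NODE:
            if prev.hasChildNodes():
                # prev already has children: nest under its last child instead
                prev.lastChild.appendChild(node)
            else:
                prev.appendChild(node)
        deepen(node.childNodes, visited)

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
visited = []
deepen(doc.childNodes, visited)
print(visited)                      # ['Game', 'A', 'B', 'C']
print(doc.documentElement.toxml())  # <Game><A><B><C/></B></A></Game>
```

For this input it produces the nesting deglog was after. With longer runs of siblings, the lastChild step only descends one level, so a fourth sibling D ends up next to C inside B rather than inside C.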

deglog · Nov 26, 2003

Thanks for the help - this works and I understand how, and why.

Why not just

for node in list(nodeList):
...

?

John

The following also works (as I intended):

from xml.dom.NodeFilter import NodeFilter

def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)

walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print(walker.currentNode.nodeName)
    if walker.currentNode.previousSibling != None:
        print("ps " + walker.currentNode.previousSibling.nodeName)
        if walker.currentNode.previousSibling.nodeName != "Game":
            if walker.currentNode.previousSibling.hasChildNodes():
                appendToDescendant(walker.currentNode)
            else:
                walker.currentNode.previousSibling.appendChild(walker.currentNode)
    next = walker.nextNode()
    if next is None: break

Strangely, the line checking "Game" is needed, because this first node
is its own previous sibling - how can this be right?

for example with the input file:

Andrew Clover · Nov 27, 2003

John J. Lee said:
Why not just for node in list(nodeList)?

You're right! I never trusted list() to make a copy if it was already a
native list (as it sometimes is in e.g. minidom) but, bothering to check the
docs, it is guaranteed to after all. Hurrah.
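A quick check of that guarantee: list() always builds a new list object, even when its argument is already a list, and minidom's NodeList does happen to be a list subclass.

```python
from xml.dom import minidom

children = minidom.parseString("<Game><A/><B/></Game>").documentElement.childNodes
snapshot = list(children)

print(isinstance(children, list))  # True: minidom's NodeList subclasses list
print(snapshot is children)        # False: list() made a separate object
print(type(snapshot) is list)      # True: a plain list, unaffected by later moves
```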

def appendToDescendant(node):
    walker.previousSibling()
    while 1:
        if walker.currentNode.hasChildNodes():
            next = walker.nextNode()
        else: break
    walker.currentNode.appendChild(node)

Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to a node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().

The function as written above appends the argument node to the first sibling
to have no child nodes, starting from the TreeWalker's current node or its
previous sibling if there is one.

I'm not wholly sure I understand the problem you're trying to solve. If you
just want to nest sibling elements as first children, you could do it without
Traversal or recursion, for example:

def nestChildrenIntoFirstElements(parent):
    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
    if len(elements) >= 2:
        insertionPoint = elements[0]
        for element in elements[1:]:
            insertionPoint.appendChild(element)
            insertionPoint = element

(Untested but no reason it shouldn't work.)
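Taking up the "untested" remark, a minimal check against xml.dom.minidom (standing in for 4DOM, an assumption) with a copy of the function. Because the comprehension builds a plain Python list before anything is moved, the live-NodeList problem from earlier in the thread does not arise, and a fourth sibling is nested properly too.

```python
from xml.dom import minidom

def nestChildrenIntoFirstElements(parent):
    # Plain-list snapshot of the element children, so moving them is safe.
    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
    if len(elements) >= 2:
        insertionPoint = elements[0]
        for element in elements[1:]:
            # Each element becomes the last child of the one before it.
            insertionPoint.appendChild(element)
            insertionPoint = element

doc = minidom.parseString("<Game><A/><B/><C/><D/></Game>")
nestChildrenIntoFirstElements(doc.documentElement)
print(doc.documentElement.toxml())  # <Game><A><B><C><D/></C></B></A></Game>
```

Only the direct element children of parent are touched; any whitespace text nodes between them stay where they were.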

Strangely, the line checking "Game" is needed, because this firstnode
is its own previous sibling - how can this be right?

4DOM is fooling you. It has inserted a <!DOCTYPE> declaration automatically
for you. (It probably shouldn't do that.) So the previous sibling of the
documentElement is the doctype; of course the doctype has the same nodeName
as the documentElement, so the debugging output is misleading.
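The same effect can be reproduced with minidom by creating a document that has a doctype, roughly what 4DOM is described as doing automatically here. Comparing nodeType instead of nodeName tells the two nodes apart, so a check like the hypothetical is_element helper below could replace the "Game" test (a sketch, not tried against 4DOM).

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doctype = impl.createDocumentType("Game", None, None)
doc = impl.createDocument(None, "Game", doctype)

root = doc.documentElement
prev = root.previousSibling
print(prev.nodeName)                             # Game: same name as the root
print(prev.nodeType == prev.DOCUMENT_TYPE_NODE)  # True: but it is the doctype

def is_element(node):
    """Hypothetical helper: True only for real element nodes."""
    return node is not None and node.nodeType == node.ELEMENT_NODE

print(is_element(prev))  # False
print(is_element(root))  # True
```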

deglog · Nov 28, 2003

Are you sure this is doing what you want? A TreeWalker's nextNode() method
goes to an node's next matching sibling, not into its children. To go into
the matching children you'd use TreeWalker.firstChild().
Right.

I'm not wholly sure I understand the problem you're trying to solve.

Actually, I'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document.

My latest idea is to go to the end of the document, then walk it
backwards (for Christmas?).

Towards this end I wrote:
---
walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                              None, 0)
while 1:
    print('1 ' + walker.currentNode.nodeName)
    next = walker.nextNode()
    if next is None: break
print('2 ' + walker.currentNode.nodeName)

Andrew Clover · Nov 29, 2003

actually i'm trying to change the relationship 'is next sibling of' to
'is child of' throughout a document

Well, the snippet in the posting above should do that well enough. What
happens to any existing nested children is not defined.

How come the current node is back at the start after the loop has finished?

Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs that can lead to infinite recursion.)

That said, I'm not sure how using a TreeWalker or walking backwards actually
helps you here! If you are just using it to filter out non-element children,
remember that moving the current node takes the position of the TreeWalker
with it. It's not like NodeIterator.

deglog · Nov 30, 2003

Bug. I've just submitted a patch to the PyXML tracker to address this issue.

(Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
bugs, that can lead to infinite recursion.)

Thanks.

Does the function def __regress(self) from the same package need a similar fix?

(I am using PyXML 0.8.3)

Andrew Clover · Dec 1, 2003

Does the function def __regress(self) from the same package need a similar
fix?

Nope, looks OK to me. There's no 'in between' state where the current node
ends up pointing somewhere it shouldn't in this one, because of the different
order of the next/previous-sibling step and the move-through-ancestor/descendant
step.

I haven't checked all of the rest of the code, though, so I can't guarantee
there aren't any other problems with 4DOM's Traversal/Range implementation.
