programmnig advise needed

M

mitsura

Hi,

I am writing a Python program that needs to read XML files and contruct
a tree object from the XML file (using wxTree).
The XML however is not an hiearchical XML file. It contains <elements>
and <association> tags. The <assiociation> tags link the elements
together.
Example:
<element1>
... element 1 attributes
</element1>
<element2>
... element 2 attributes
</element2>
<element3>
... element 3 attributes
</element3>
<association>
<Name>element1</Name>
<Parent>element3</Parent>
</association>
<association>
<Name>element2</Name>
<Parent>element3</Parent>
</association>

In this case <element3> is the parent of the two other elements.
The XML files I receive often contain thousands of elements so the tree
structure can be very large. Futhermore, the XML files are not
'sorted', like in the example above, the 'root' element object entry
might be somewhere in the middle of the XML file.

I don't really now to code this so any suggestions are welcome.

With kind regards,

Kris
 
A

Andrea Griffini

I am writing a Python program that needs to read XML files and contruct
a tree object from the XML file (using wxTree).

Supposing your XML file has a single top level node (so that it's a
legal XML file) then the following code should be able to read it...

#########
from elementtree import ElementTree as ET

class Node:
def __init__(self, xmlnode):
self.xmlnode = xmlnode
self.children = []

def loadtree(f):
root = ET.parse(f).getroot()
nodes = {}
for x in root:
if x.tag.startswith("element"):
nodes[x.tag] = Node(x)
for x in root.findall("association"):
name = x.find("Name").text
parent = x.find("Parent").text
nodes[parent].children.append(nodes.pop(name))
assert len(nodes) == 1
return nodes.popitem()[1]
##########

The idea is to create a node with an empty list of
logical children and then for every association
removing the child node from the global pool (a dict
indexed by node name) to place it in its parent.
I assumed that the nodes are in random order, but
that the associations are sorted bottom-up in
the tree. If this is not the case then you should
keep TWO dicts, removing from only one of them
the children when you process an association,
and looking up the parent in the *other* dict that
is not changed during processing of associations.
The dict from who you are removing the children
will allow you to detect logical errors in the file
(i.e. a node having two parents - you won't find
that node in the dict the second time - and absence
of a single root - you won't end up with a single
element in the dict after processing all
associations -).

HTH
Andrea
 
M

mitsura

Thanks for the feedback! It is a very good idea to use dictionaries. I
will try to implemented it this way.

Thanks,

Kris


Andrea Griffini schreef:
I am writing a Python program that needs to read XML files and contruct
a tree object from the XML file (using wxTree).

Supposing your XML file has a single top level node (so that it's a
legal XML file) then the following code should be able to read it...

#########
from elementtree import ElementTree as ET

class Node:
def __init__(self, xmlnode):
self.xmlnode = xmlnode
self.children = []

def loadtree(f):
root = ET.parse(f).getroot()
nodes = {}
for x in root:
if x.tag.startswith("element"):
nodes[x.tag] = Node(x)
for x in root.findall("association"):
name = x.find("Name").text
parent = x.find("Parent").text
nodes[parent].children.append(nodes.pop(name))
assert len(nodes) == 1
return nodes.popitem()[1]
##########

The idea is to create a node with an empty list of
logical children and then for every association
removing the child node from the global pool (a dict
indexed by node name) to place it in its parent.
I assumed that the nodes are in random order, but
that the associations are sorted bottom-up in
the tree. If this is not the case then you should
keep TWO dicts, removing from only one of them
the children when you process an association,
and looking up the parent in the *other* dict that
is not changed during processing of associations.
The dict from who you are removing the children
will allow you to detect logical errors in the file
(i.e. a node having two parents - you won't find
that node in the dict the second time - and absence
of a single root - you won't end up with a single
element in the dict after processing all
associations -).

HTH
Andrea
 
M

Magnus Lycka

Hi,

I am writing a Python program that needs to read XML files and contruct
a tree object from the XML file (using wxTree).
The XML however is not an hiearchical XML file. It contains <elements>
and <association> tags. The <assiociation> tags link the elements
together.

Are you sure that you get a tree? As I understand your description,
the XML format could well be used to describe arbitrary directed
graphs, even cyclic ones. This might not be a problem unless someone
tries to expand the entire tree...but you should probably have
some kind of cycle check, or disallow "expand all" in the GUI and
make sure you don't create the nodes before they are expanded.

If you really just permit trees, it's fairly simple, because a
node should just appear as a <Name> in an association once. You
can put those in a set, and check that they aren't already in the
set before you accept the content in a new association. Something
like the following semi-pseudo-code:

nodes = set()

for name, parent in associations:
if name in nodes:
raise KeyError, ("%s is already in the tree. "
"Can't put it under %s." % (name, parent)
nodes.add(name)
whatever...

If you want to allow arbitrary directed acyclic graphs (DAGs),
you have to work a little more. E.g. if you want to allow things
like:

A
..B
...C
....D
....E
..F
...C
....D
....E

(I know that you said a tree, but I've certainly met clever and
well educated people before who called DAGs trees...)

For DAGs, you might traverse the tree to make sure there are
no cycles, but there are probably better solutions as well. As
I said, if it's really trees you want, this is not a problem.

Actually, another risk is that your document describes several
disjunct structures... Oh well, that's another issue, and you
can always argue how defensive your code needs to be. It obviously
depends on how well you trust your input and the implications
of problems. Anyway, assuming cycle checks, it's easy to find
roots: They are the keys in the dictionary that don't occur as
values. If there's just one, you just have one tree.

To summarize: If you want to describe a tree structure in XML,
it's probably a good idea to use the natural hierarchy of XML
tags to do this. That way the file can't convey a non-tree
structure. If you need to use a construct like you describe
in your XML, this suggests that you need to convey non-tree
structures, and then you probably need to handle those as well.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,731
Messages
2,569,432
Members
44,832
Latest member
GlennSmall

Latest Threads

Top