Help with fast tree-like structure

D

David Hirschfield

I've written a tree-like data structure that stores arbitrary python
objects. The objective was for the tree structure to allow any number of
children per node, and any number of root nodes...and for it to be
speedy for trees with thousands of nodes.

At its core, the structure is just a list of lists arranged so that if
node A has children B and C and node B has child D the data looks like:

A = [<A node data>, B, C]
B = [<B node data>, D]
C = [<C node data>]

where B, C and D are all lists with similar structures to A. I am
holding references to the individual nodes so that access to individual
nodes by reference is quick. Access by "tree path" is done by giving a
tuple of integers indicating where in the tree the node you want lies.
The path (1,2,5) indicates the 6th child of the 3rd child of the 2nd
root node. Everything involving basic tree functions works fine. Finding
any particular node this way is just a function of the depth of the node
in the tree, so it's very quick unless you have some degenerate tree
structure where nodes end up miles deep.

Here's my problem: I need to allow the tree to "hide" the roots, so that
the top-level appears to the outside world to be the children under the
root nodes, not the root nodes themselves. That means a path of (5,2)
indicates the 3rd child of the 6th "pseudo-root" node. Unfortunately, in
a tree with many root nodes, each containing many children (a common use
case for me) finding the 6th pseudo-root is going to be slow, and the
ways I've thought of to make it fast all require slow bookkeeping when
new nodes are inserted or removed at the pseudo-root level.

I'm running out of ideas, so if anyone has any thoughts on how to deal
with fudging which nodes are the roots efficiently...I'm all ears.
Thanks in advance,
-David
 
L

Lonnie Princehouse

B = [<B node data>, D]
C = [<C node data>]

Why store node data in the same list that contains the children? It
seems like the OO approach would be more extensible, e.g.

class Node:
def __init__(self):
self.children = [] # list of nodes
# store node data as attributes...

# If you wanted to, you could even implement the [] operator for the
exact same syntax:

def __getitem__(self, i):
return self.children

(Note that using slots will make the above class more efficient,
particularly for trees with many nodes)

Of course, that doesn't answer your question. If I've interpreted it
right, then what you're trying to do could be restated as "make
multiple nodes at the root tier appear to be a single node", the
implication of which is that finding the n-th child of this meta-node
is now going to take O(R) time, where R is the number of concealed
roots. You could make a new list of pseudo-roots everytime one is
inserted or deleted --- the book-keeping you refer to, probably --- and
that would make your searches constant time at the expense of inserts
and deletes which are ~linear w.r.t the total number of pseudo-roots.
Not really any way around it... the best approach will likely be
determined by frequency of search operations vs. frequency of
insert/deletes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,189
Latest member
CryptoTaxSoftware

Latest Threads

Top