Red Black Tree implementation?

Dan Stromberg

What's the best Red Black Tree implementation for Python with an opensource
license?

I started out looking at
http://newcenturycomputers.net/projects/rbtree.html because it was
pretty high in Google and had the operators I wanted, but it
gets very slow at about half a million elements. I've been discussing this
with a C programmer who believes that Red Black Trees should perform very
similarly to an AVL tree, but that's not at all what I'm getting with the
newcenturycomputers implementation.

I'd prefer something that looks like a dictionary, runs on 2.x and 3.x, and
passes pylint, but if that's not yet available I might make it so.

This is part of a comparison of Python tree types I did a while back...
I've been thinking that I've given Red Black Trees short shrift by using a
poor implementation. The comparison so far is at
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/

Thanks!
 
duncan smith

I have an implementation that you can try out. It's not based on any
other implementation, so my bugs will be independent of any bugs in the
code you're currently using. It looks more like a set - add, remove,
discard. Not tried on Python 3 or run through pylint. I just tried
adding a million items to a tree, and it takes about 25% longer to add
items at the end compared to those at the beginning. Timing removals
uncovered a bug. So if you want the code I'll fix the bug and send it
(to your gmail e-mail address?). Cheers.

Duncan
 
Dan Stromberg

What license?

Thanks!
 
duncan smith

Here's the text I usually prepend.


##Copyright (c) 2013 duncan g. smith
##
##Permission is hereby granted, free of charge, to any person obtaining a
##copy of this software and associated documentation files (the "Software"),
##to deal in the Software without restriction, including without limitation
##the rights to use, copy, modify, merge, publish, distribute, sublicense,
##and/or sell copies of the Software, and to permit persons to whom the
##Software is furnished to do so, subject to the following conditions:
##
##The above copyright notice and this permission notice shall be included
##in all copies or substantial portions of the Software.
##
##THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
##OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
##FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
##THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
##OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
##ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
##OTHER DEALINGS IN THE SOFTWARE.


Basically, "do what you want with it but don't blame me if it goes tits
up". I'm happy to consider tidying it up a bit and using a more
recognized form of licence. Just had a bank holiday here, so bug not yet
squashed. But it is the sort of bug that might account for what you've
seen (if a similar bug exists in the code you've been using). The tree
doesn't always get properly rebalanced on node removals. I'll attack the
problem later tomorrow (technically, later today). Cheers.

Duncan
 
Chris Angelico

Is that the MIT license? If not, consider using it; it's well known
and trusted. I haven't eyeballed yours closely but it looks extremely
similar, at least.

ChrisA
 
duncan smith

[snip]

> I'm starting to think Red Black Trees are pretty complex.

A while ago I looked at a few different types of self-balancing binary
tree. Most look much easier to implement.

BTW, the licence might be MIT - I just copied it from someone else's code.

Duncan
 
duncan smith

[snip]

> I'd prefer Apache or MIT or BSD 3-clause, but I could probably work with
> this.
> http://joinup.ec.europa.eu/community/eupl/news/licence-proliferation-way-out
>
> I'm eager to see the code, and would love it if you sorted out the
> deletion rebalance issue.
>
> I just plunked some time into
> https://github.com/headius/redblack/blob/master/red_black_tree.py , only
> to find that it didn't appear to be doing deletions correctly - the tree
> would become unprintable after deleting one element. It's possible I
> introduced the bug, but right now I don't particularly suspect so,
> having not changed the __del__ method.
>
> I'm starting to think Red Black Trees are pretty complex.

Mine is fixed now (sent to your gmail address). Restoring the tree
properties after deletion is awkward to get right, and doesn't affect
the performance noticeably for smallish trees if you get it wrong.

I realised my code was buggy when I tried adding, then removing a
million items and ran into the recursion limit. It now passes a test
where I check the tree properties after each addition / deletion.
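A check of that kind can be sketched roughly as follows. This is only an illustration, not the test from the module: the `Node` class, `check_tree` and `_black_height` are hypothetical names, `None` stands in for a black leaf instead of a sentinel node, and items are assumed to be distinct.

```python
class Node(object):
    """Hypothetical node for illustration; None stands in for a black leaf."""

    def __init__(self, data, is_red, left=None, right=None):
        self.data = data
        self.is_red = is_red
        self.left = left
        self.right = right


def _black_height(node, low, high):
    """Return the black height of a subtree, raising ValueError on a violation."""
    if node is None:
        return 1  # leaves count as black
    # Search-tree ordering: every item must lie strictly between low and high.
    if low is not None and not low < node.data:
        raise ValueError('ordering violated at %r' % (node.data,))
    if high is not None and not node.data < high:
        raise ValueError('ordering violated at %r' % (node.data,))
    # A red node must not have a red child.
    if node.is_red:
        for child in (node.left, node.right):
            if child is not None and child.is_red:
                raise ValueError('red node %r has a red child' % (node.data,))
    # Every path down to a leaf must pass through the same number of black nodes.
    left = _black_height(node.left, low, node.data)
    right = _black_height(node.right, node.data, high)
    if left != right:
        raise ValueError('black heights differ below %r' % (node.data,))
    return left + (0 if node.is_red else 1)


def check_tree(root):
    """Return True if the tree satisfies the red-black properties."""
    if root is not None and root.is_red:
        raise ValueError('root must be black')
    _black_height(root, None, None)
    return True
```

Calling `check_tree(root)` after every addition and removal in a test loop catches a rebalancing bug at the operation that introduces it. The check itself is recursive, so on a badly unbalanced tree it could hit the recursion limit too.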

Duncan
 
Dan Stromberg

OK, I've got one copy of trees.py with md5
211f80c0fe7fb9cb42feb9645b4b3ffe. You seem to be saying I should have
two though, but I don't know that I do...


 
duncan smith

I've just re-sent it.

Duncan
 
duncan smith

Yes, 211f80c0fe7fb9cb42feb9645b4b3ffe is the correct checksum for the
latest version. The previous version had an issue when adding
non-distinct items (items that compare equal to items already in the
tree). Cheers.

Duncan
 
Dan Stromberg

I'm afraid I'm having some trouble with the module. I've checked it into
my SVN at http://stromberg.dnsalias.org/svn/red-black-tree-mod/trunk/duncan

I have two versions of your tests in there now - "t" is minimally changed,
and test-red_black_tree_mod is pretty restructured to facilitate adding
more tests later. I get the same problem with either version of the tests.

The problem I'm seeing is that the tree, when built from items, isn't
looking quite right. I inserted a print(tree) into the for loop, and I'm
getting the following, where I expected the tree to grow by one element on
each iteration:

$ python t
6 False None None
6 False 3 None
6 False 3 15
6 False 3 15
6 False 3 11
6 False 3 11
6 False 3 11
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15
11 False 6 15

Thoughts?

BTW, printing an empty tree seems to say "sentinel". I'm not sure if that was
intended.

Thanks!



 
Dan Stromberg

I figured out that this was printing a single node and some of its
attributes, not an entire tree. I changed it to print an entire tree using
self.in_order().

I've also changed around the comparisons a bit, to use a __cmp__ method but
still provide __eq__, __neq__ and a new __lt__.
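Python 3 never calls __cmp__, so code meant to run on both 2.x and 3.x usually defines the rich comparison methods directly (the inequality hook is spelled __ne__). A minimal sketch of that approach, assuming a hypothetical `Item` wrapper class that is not part of the module under discussion:

```python
import functools


@functools.total_ordering
class Item(object):
    """Hypothetical wrapper that orders items by a key, on 2.7 and 3.x."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return self.key == other.key

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self == other

    def __lt__(self, other):
        return self.key < other.key

    def __hash__(self):
        # Defining __eq__ disables the default hash on Python 3.
        return hash(self.key)
```

`functools.total_ordering` fills in `__le__`, `__gt__` and `__ge__` from `__eq__` and `__lt__`, so only those two carry the ordering logic.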

I'm up against a new problem now that it'd be nice if you could look at:
In BinaryTree.find(), it sometimes compares the item being searched for
against None. In 2.x, this gives strange results, but may be benign in
this code. In 3.x, this raises an exception. I've added a comment about
this in the SVN repo I mentioned above.

You can see the traceback yourself with python3 test-red_black_tree_mod .

What should BinaryTree.find() do if it finds a data.node that is None?

Thanks!

PS: Is it about time we moved this discussion off python-list?
 
duncan smith

The root node has parent equal to None. All tree nodes have two
children. One or both children may be sentinels, and a sentinel is
signified by having both left and right (children) equal to None. So an
empty tree is a sentinel node that is also root. So the string
"sentinel" is expected (although possibly not the most sensible option).

For non-sentinel nodes the string is generated by,

return '%s %s %s' % (self.data, self.left.data, self.right.data)

for the BinaryTree class, and by

return '%s %s %s %s' % (self.data, self.is_red, self.left.data,
self.right.data)

for the RedBlackTree class.


So what is being printed above is (in each case) the value contained in
the root node, followed by its colour (True if red), and the values
contained in the root node's left and right children.
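To make the sentinel convention concrete, here is a small sketch of a node whose string form follows the description above. It is an illustration rather than the module's code: the class name `RedBlackNode` and the methods `is_sentinel` and `set_data` are assumptions, and no rebalancing is done.

```python
class RedBlackNode(object):
    """Hypothetical node: a sentinel has both children equal to None."""

    def __init__(self, parent=None):
        self.data = None
        self.is_red = False
        self.parent = parent
        self.left = None
        self.right = None

    def is_sentinel(self):
        return self.left is None and self.right is None

    def set_data(self, data):
        # Turning a sentinel into a real node gives it two sentinel children.
        self.data = data
        self.left = RedBlackNode(parent=self)
        self.right = RedBlackNode(parent=self)

    def __str__(self):
        if self.is_sentinel():
            return 'sentinel'
        return '%s %s %s %s' % (self.data, self.is_red,
                                self.left.data, self.right.data)


root = RedBlackNode()
print(root)            # sentinel
root.set_data(6)
print(root)            # 6 False None None
root.left.set_data(3)
print(root)            # 6 False 3 None
```

The last two lines of output match the first two lines of the test output quoted in this thread.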

The root node remains root, although its value and its children (and
their values) might change due to tree rotations.

It looks OK to me. The empty tree would print "sentinel". After adding
the value 6 there is one tree node with sentinels as children (values
equal to None). Adding 3 results in 3 being the value of the root's left
child. Its right child is still a sentinel. Adding 15 results in that
value being assigned to the right child. Adding 9 results in no change
to the values in the root or its children. Adding 11 results in a tree
rotation and 11 becomes the value in the right child of the root. At a
later point a tree rotation results in the value of the root node being
changed.

I haven't implemented a way of representing the structure of the whole
red black tree. I would probably write some code to generate a dot file
and use that to generate a png. But you could add something like,

print tree.height, tree.size, list(tree)

and get output like,

0 1 [6]
1 2 [3, 6]
1 3 [3, 6, 15]
2 4 [3, 6, 9, 15]
3 5 [3, 6, 9, 11, 15]
4 6 [3, 6, 9, 11, 12, 15]
4 7 [3, 6, 9, 11, 12, 15, 16]
5 8 [3, 6, 9, 11, 12, 14, 15, 16]
5 9 [3, 6, 9, 11, 12, 14, 15, 16, 17]
5 10 [3, 6, 7, 9, 11, 12, 14, 15, 16, 17]
5 11 [3, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18]
5 12 [3, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18]
5 13 [3, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 17, 18]
6 14 [3, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18]
6 15 [0, 3, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18]
6 16 [0, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18]
6 17 [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18]
6 18 [-1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18]
6 19 [-1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]


It doesn't give you the structure, but it does show that it seems to be
growing reasonably. Cheers.

Duncan
 
duncan smith

> I figured out that this was printing a single node and some of its
> attributes, not an entire tree. I changed it to print an entire tree
> using self.in_order().

Yes, I've just posted regarding that.

> I've also changed around the comparisons a bit, to use a __cmp__ method
> but still provide __eq__, __neq__ and a new __lt__.

I have implemented a lot (maybe all?) of the set methods in a subclass.
I should probably root that out and have a think about what should be in
the RedBlackTree class and what subclasses might look like.

> I'm up against a new problem now that it'd be nice if you could look at:
> In BinaryTree.find(), it sometimes compares the item being searched for
> against None. In 2.x, this gives strange results, but may be benign in
> this code. In 3.x, this raises an exception. I've added a comment
> about this in the SVN repo I mentioned above.
>
> You can see the traceback yourself with python3 test-red_black_tree_mod .
>
> What should BinaryTree.find() do if it finds a data.node that is None?

A call to "find(data)" should find and return either a node containing
"data"; or the sentinel node where "data" should be added. It should not
get as far as the left or right child of a sentinel node (which would
equal None). I'll look at this tomorrow. I did have the truth value of a
node depending on its data value (None implying False). Then I
considered the possibility of actually wanting None as a value in the
tree and changed it, so I could have introduced a bug here.
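
The intended behaviour can be sketched like this. It is a simplified illustration, not the module's code: `Node`, `find` and `add` are hypothetical stand-ins, and no rebalancing is performed.

```python
class Node(object):
    """Hypothetical node: a sentinel has both children equal to None."""

    def __init__(self):
        self.data = None
        self.left = None
        self.right = None

    def is_sentinel(self):
        return self.left is None and self.right is None


def find(root, data):
    """Return the node holding data, or the sentinel where data belongs."""
    node = root
    # Stopping at a sentinel means data is never compared against None.
    while not node.is_sentinel():
        if data == node.data:
            return node
        node = node.left if data < node.data else node.right
    return node


def add(root, data):
    """Insert data if absent (no rebalancing); return True if it was added."""
    node = find(root, data)
    if not node.is_sentinel():
        return False
    node.data = data
    node.left = Node()
    node.right = Node()
    return True
```

Because the loop tests for a sentinel before comparing, the search works the same way on 2.x and 3.x, and `None` remains usable as a stored value in principle.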

> Thanks!
>
> PS: Is it about time we moved this discussion off python-list?

Maybe. You have my official e-mail address. Cheers.

Duncan
 
duncan smith

[snip]

> What should BinaryTree.find() do if it finds a data.node that is None?

It's a Python3 thing. The initial sentinel node was evaluating to True.
__nonzero__ needs to be changed to __bool__.
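
Python 3 ignores __nonzero__ and treats an object without __bool__ (or __len__) as true, which is why the sentinel evaluated to True. A minimal sketch of a node that behaves the same on both versions, assuming a hypothetical `Node` class using the sentinel convention described in this thread:

```python
class Node(object):
    """Hypothetical node: a sentinel has both children equal to None."""

    def __init__(self, data=None):
        self.data = data
        self.left = None
        self.right = None

    def __bool__(self):
        # Python 3 truth hook: only sentinels are false, whatever the data.
        return self.left is not None or self.right is not None

    # Python 2 looks up __nonzero__ instead, so alias it to the same method.
    __nonzero__ = __bool__
```

Tying truthiness to the children, not to the data, keeps `None` available as a stored value.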

As for moving this discussion off the list: let's do that from now on.

Duncan
 
