A
Antoninus Twink
This occurred to me on another thread when Ben Paff's name came up.
Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.
Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).
These are just ordinary red black trees, except that red links have to
lean left. The main benefit he claims for this is that the insert and
delete functions are less complex and shorter, because removing this
ambiguity essentially halves the number of cases you need to treat at
crucial points.
Of course, asymptotically there is no difference between the standard
tree functions for RB trees or LLRB trees, and Sedgewick claims that the
extra work to flip colors if an insertion or deletion produces a
right-leaning red link is negligible.
As I had some spare time over the summer, I thought it would be fun to
try implementing this. I found two things:
1) Despite there being detailed pseudocode on Sedgewick's website, the
actual implementation is still quite tricky. It brought back horrible
memories of working through all the nasty corner cases to debug my
original RB code. It's so long ago that I can't say whether the LLRBs
were slightly easier or not, but it's certainly not as trivial as he
makes out.
2) Compared to ordinary RB trees, the left leaning variety were *dog*
slow. For some reason, the extra balancing step needed to maintain the
left-leaning condition meant that a simple "read a large unsorted list
into the tree and traverse the tree to print it in sorted order" took 15
to 20% longer for the LLRB tree.
I never figured out whether that was a problem with my implementation,
or whether trying to cost the expected number of lean-balances would
reveal that it's genuinely a problem, or what. At that point, real life
intervened, and I don't think I'll be abandonding my trusty old RB trees
any time soon.
I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.
Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.
Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).
These are just ordinary red black trees, except that red links have to
lean left. The main benefit he claims for this is that the insert and
delete functions are less complex and shorter, because removing this
ambiguity essentially halves the number of cases you need to treat at
crucial points.
Of course, asymptotically there is no difference between the standard
tree functions for RB trees or LLRB trees, and Sedgewick claims that the
extra work to flip colors if an insertion or deletion produces a
right-leaning red link is negligible.
As I had some spare time over the summer, I thought it would be fun to
try implementing this. I found two things:
1) Despite there being detailed pseudocode on Sedgewick's website, the
actual implementation is still quite tricky. It brought back horrible
memories of working through all the nasty corner cases to debug my
original RB code. It's so long ago that I can't say whether the LLRBs
were slightly easier or not, but it's certainly not as trivial as he
makes out.
2) Compared to ordinary RB trees, the left leaning variety were *dog*
slow. For some reason, the extra balancing step needed to maintain the
left-leaning condition meant that a simple "read a large unsorted list
into the tree and traverse the tree to print it in sorted order" took 15
to 20% longer for the LLRB tree.
I never figured out whether that was a problem with my implementation,
or whether trying to cost the expected number of lean-balances would
reveal that it's genuinely a problem, or what. At that point, real life
intervened, and I don't think I'll be abandonding my trusty old RB trees
any time soon.
I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.