Left leaning red black trees

  • Thread starter Antoninus Twink
  • Start date
A

Antoninus Twink

This occurred to me on another thread when Ben Paff's name came up.

Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.

Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).

These are just ordinary red black trees, except that red links have to
lean left. The main benefit he claims for this is that the insert and
delete functions are less complex and shorter, because removing this
ambiguity essentially halves the number of cases you need to treat at
crucial points.

Of course, asymptotically there is no difference between the standard
tree functions for RB trees or LLRB trees, and Sedgewick claims that the
extra work to flip colors if an insertion or deletion produces a
right-leaning red link is negligible.

As I had some spare time over the summer, I thought it would be fun to
try implementing this. I found two things:

1) Despite there being detailed pseudocode on Sedgewick's website, the
actual implementation is still quite tricky. It brought back horrible
memories of working through all the nasty corner cases to debug my
original RB code. It's so long ago that I can't say whether the LLRBs
were slightly easier or not, but it's certainly not as trivial as he
makes out.

2) Compared to ordinary RB trees, the left leaning variety were *dog*
slow. For some reason, the extra balancing step needed to maintain the
left-leaning condition meant that a simple "read a large unsorted list
into the tree and traverse the tree to print it in sorted order" took 15
to 20% longer for the LLRB tree.

I never figured out whether that was a problem with my implementation,
or whether trying to cost the expected number of lean-balances would
reveal that it's genuinely a problem, or what. At that point, real life
intervened, and I don't think I'll be abandonding my trusty old RB trees
any time soon.

I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.
 
G

Gene

This occurred to me on another thread when Ben Paff's name came up.

Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.

Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).

These are just ordinary red black trees, except that red links have to
lean left.  The main benefit he claims for this is that the insert and
delete functions are less complex and shorter, because removing this
ambiguity essentially halves the number of cases you need to treat at
crucial points.

Of course, asymptotically there is no difference between the standard
tree functions for RB trees or LLRB trees, and Sedgewick claims that the
extra work to flip colors if an insertion or deletion produces a
right-leaning red link is negligible.

As I had some spare time over the summer, I thought it would be fun to
try implementing this. I found two things:

1) Despite there being detailed pseudocode on Sedgewick's website, the
actual implementation is still quite tricky. It brought back horrible
memories of working through all the nasty corner cases to debug my
original RB code. It's so long ago that I can't say whether the LLRBs
were slightly easier or not, but it's certainly not as trivial as he
makes out.

2) Compared to ordinary RB trees, the left leaning variety were *dog*
slow. For some reason, the extra balancing step needed to maintain the
left-leaning condition meant that a simple "read a large unsorted list
into the tree and traverse the tree to print it in sorted order" took 15
to 20% longer for the LLRB tree.

I never figured out whether that was a problem with my implementation,
or whether trying to cost the expected number of lean-balances would
reveal that it's genuinely a problem, or what. At that point, real life
intervened, and I don't think I'll be abandonding my trusty old RB trees
any time soon.

I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.

Here's another code to benchnark. The Sedgewick's paper at
http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf seems pretty clear
and coorect. I wrote mine using it in about 40 minutes without ever
having looked at the algorithm before. Of course it may not be
correct, but the test at the bottom ran clean on the first try, and
the tree depth looks about right.

#include <stdio.h>
#include <stdlib.h>

/* ----- public ----- */

typedef int (*LLRB_COMPARISON)(void *, void *);
typedef struct llrb_s {
struct llrb_node_s *root;
LLRB_COMPARISON comparison;
} LLRB_TREE;

void init(LLRB_TREE *tree, LLRB_COMPARISON comparison);
void *search(LLRB_TREE *tree, void *data);
int max_depth(LLRB_TREE *tree);

/* ----- private ----- */

#define RED 0
#define BLK 1

typedef struct llrb_node_s {
void *data;
struct llrb_node_s *left, *right;
int color;
} NODE;

void init(LLRB_TREE *tree, LLRB_COMPARISON comparison)
{
tree->root = NULL;
tree->comparison = comparison;
}

void *search(LLRB_TREE *tree, void *data)
{
NODE *x = tree->root;
while (x) {
int cmp = tree->comparison(data, x->data);
if (cmp < 0)
x = x->left;
else if (cmp > 0)
x = x->right;
else
return x->data;
}
return NULL;
}

static int do_max_depth(NODE *h)
{
int l, r;

if (h == NULL)
return 0;
l = do_max_depth(h->left);
r = do_max_depth(h->right);
return (l > r) ? l + 1 : r + 1;
}

int max_depth(LLRB_TREE *tree)
{
return do_max_depth(tree->root);
}

static void *safe_malloc(size_t n)
{
void *obj = malloc(n);
if (obj == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
return obj;
}

static NODE *new_node(void *data)
{
NODE *node = safe_malloc(sizeof *node);
node->data = data;
node->left = node->right = NULL;
node->color = RED;
return node;
}

static int is_red(NODE *h)
{
return h != NULL && h->color == RED;
}

static NODE *rotate_left(NODE *h)
{
NODE *x = h->right;
h->right = x->left;
x->left = h;
x->color = h->color;
h->color = RED;
return x;
}

static NODE *rotate_right(NODE *h)
{
NODE *x = h->left;
h->left = x->right;
x->right = h;
x->color = h->color;
h->color = RED;
return x;
}

static void flip_colors(NODE *h)
{
h->color ^= 1;
h->left->color ^= 1;
h->right->color ^= 1;
}

static NODE *do_insert(LLRB_TREE *tree, NODE *h, void *data)
{
if (h == NULL)
return new_node(data);

if (is_red(h->left) && is_red(h->right))
flip_colors(h);

int cmp = tree->comparison(data, h->data);
if (cmp < 0)
h->left = do_insert(tree, h->left, data);
else if (cmp > 0)
h->right = do_insert(tree, h->right, data);
else
h->data = data;

if (is_red(h->right) && !is_red(h->left))
h = rotate_left(h);
if (is_red(h->left) && is_red(h->left->left))
h = rotate_right(h);

return h;
}

static void insert(LLRB_TREE *tree, void *data)
{
tree->root = do_insert(tree, tree->root, data);
tree->root->color = BLK;
}

/* ----- test ----- */

#define N_DATA 10000

int int_comparison(void *a, void *b)
{
return *(int*)a - *(int*)b;
}

int main(void)
{
int i, *data;
LLRB_TREE tree[1];

data = safe_malloc(N_DATA * sizeof data[0]);
init(tree, int_comparison);
for (i = 0; i < N_DATA; i++) {
data = rand();
}
printf("starting insertion\n");
for (i = 0; i < N_DATA; i++) {
insert(tree, &data);
}
printf("done with insertion\n");
printf("max depth with %d nodes is %d\n", N_DATA, max_depth(tree));
printf("starting search...\n");
for (i = 0; i < N_DATA; i++) {
void *rtn = search(tree, &data);
if (rtn == NULL || *(int*)rtn != data) {
printf("could not find data[%d} = %d\n", i, data);
}
}
printf("done.\n");
}
 
G

Gene

This occurred to me on another thread when Ben Paff's name came up.

Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.

Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).

These are just ordinary red black trees, except that red links have to
lean left.  The main benefit he claims for this is that the insert and
delete functions are less complex and shorter, because removing this
ambiguity essentially halves the number of cases you need to treat at
crucial points.

Of course, asymptotically there is no difference between the standard
tree functions for RB trees or LLRB trees, and Sedgewick claims that the
extra work to flip colors if an insertion or deletion produces a
right-leaning red link is negligible.

As I had some spare time over the summer, I thought it would be fun to
try implementing this. I found two things:

1) Despite there being detailed pseudocode on Sedgewick's website, the
actual implementation is still quite tricky. It brought back horrible
memories of working through all the nasty corner cases to debug my
original RB code. It's so long ago that I can't say whether the LLRBs
were slightly easier or not, but it's certainly not as trivial as he
makes out.

2) Compared to ordinary RB trees, the left leaning variety were *dog*
slow. For some reason, the extra balancing step needed to maintain the
left-leaning condition meant that a simple "read a large unsorted list
into the tree and traverse the tree to print it in sorted order" took 15
to 20% longer for the LLRB tree.

I never figured out whether that was a problem with my implementation,
or whether trying to cost the expected number of lean-balances would
reveal that it's genuinely a problem, or what. At that point, real life
intervened, and I don't think I'll be abandonding my trusty old RB trees
any time soon.

I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.

Here's another code to benchmark. In this kind of algorithm, small
things can make a big difference.

Sedgewick's paper at http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
seems pretty clear and correct. I wrote my code based on it very
quickly without ever having seen the algorithm before. Of course it
may have errors, but the test at the bottom ran cleanly on the first
try, and the tree depth looks about right.

#include <stdio.h>
#include <stdlib.h>

/* ----- public ----- */

typedef int (*LLRB_COMPARISON)(void *, void *);
typedef struct llrb_s {
struct llrb_node_s *root;
LLRB_COMPARISON comparison;
} LLRB_TREE;

void init(LLRB_TREE *tree, LLRB_COMPARISON comparison);
void *search(LLRB_TREE *tree, void *data);
int max_depth(LLRB_TREE *tree);

/* ----- private ----- */

#define RED 0
#define BLK 1

typedef struct llrb_node_s {
void *data;
struct llrb_node_s *left, *right;
int color;
} NODE;

void init(LLRB_TREE *tree, LLRB_COMPARISON comparison)
{
tree->root = NULL;
tree->comparison = comparison;
}

void *search(LLRB_TREE *tree, void *data)
{
NODE *x = tree->root;
while (x) {
int cmp = tree->comparison(data, x->data);
if (cmp < 0)
x = x->left;
else if (cmp > 0)
x = x->right;
else
return x->data;
}
return NULL;
}

static int do_max_depth(NODE *h)
{
int l, r;

if (h == NULL)
return 0;
l = do_max_depth(h->left);
r = do_max_depth(h->right);
return (l > r) ? l + 1 : r + 1;
}

int max_depth(LLRB_TREE *tree)
{
return do_max_depth(tree->root);
}

static void *safe_malloc(size_t n)
{
void *obj = malloc(n);
if (obj == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
return obj;
}

static NODE *new_node(void *data)
{
NODE *node = safe_malloc(sizeof *node);
node->data = data;
node->left = node->right = NULL;
node->color = RED;
return node;
}

static int is_red(NODE *h)
{
return h != NULL && h->color == RED;
}

static NODE *rotate_left(NODE *h)
{
NODE *x = h->right;
h->right = x->left;
x->left = h;
x->color = h->color;
h->color = RED;
return x;
}

static NODE *rotate_right(NODE *h)
{
NODE *x = h->left;
h->left = x->right;
x->right = h;
x->color = h->color;
h->color = RED;
return x;
}

static void flip_colors(NODE *h)
{
h->color ^= 1;
h->left->color ^= 1;
h->right->color ^= 1;
}

static NODE *do_insert(LLRB_TREE *tree, NODE *h, void *data)
{
if (h == NULL)
return new_node(data);

if (is_red(h->left) && is_red(h->right))
flip_colors(h);

int cmp = tree->comparison(data, h->data);
if (cmp < 0)
h->left = do_insert(tree, h->left, data);
else if (cmp > 0)
h->right = do_insert(tree, h->right, data);
else
h->data = data;

if (is_red(h->right) && !is_red(h->left))
h = rotate_left(h);
if (is_red(h->left) && is_red(h->left->left))
h = rotate_right(h);

return h;
}

static void insert(LLRB_TREE *tree, void *data)
{
tree->root = do_insert(tree, tree->root, data);
tree->root->color = BLK;
}

/* ----- test ----- */

#define N_DATA 10000

int int_comparison(void *a, void *b)
{
return *(int*)a - *(int*)b;
}

int main(void)
{
int i, *data;
LLRB_TREE tree[1];

data = safe_malloc(N_DATA * sizeof data[0]);
init(tree, int_comparison);
for (i = 0; i < N_DATA; i++) {
data = rand() ^ (rand() << 14);
}
printf("starting insertion\n");
for (i = 0; i < N_DATA; i++) {
if (search(tree, &data))
printf("dupe\n");
insert(tree, &data);
}
printf("done with insertion\n");
printf("max depth with %d nodes is %d\n", N_DATA, max_depth(tree));
printf("starting search...\n");
for (i = 0; i < N_DATA; i++) {
void *rtn = search(tree, &data);
if (rtn == NULL || *(int*)rtn != data) {
printf("could not find data[%d} = %d\n", i, data);
}
}
printf("done.\n");
}
 
J

jacob navia

d:\lcc\mc77\test>timethis llrb

TimeThis : Command Line : llrb
TimeThis : Start Time : Thu Dec 03 08:34:03 2009

starting insertion
done with insertion
max depth with 10,000,000 nodes is 21
starting search...
done.

TimeThis : Command Line : llrb
TimeThis : Start Time : Thu Dec 03 08:34:03 2009
TimeThis : End Time : Thu Dec 03 08:34:12 2009
TimeThis : Elapsed Time : 00:00:09.092

Nine seconds with optimizations ON. 10 million nodes
 
J

jacob navia

d:\lcc\mc77\test>timethis llrb1

TimeThis : Command Line : llrb1
TimeThis : Start Time : Thu Dec 03 08:41:00 2009

starting insertion
done with insertion
max depth with 10000000 nodes is 33
starting search...
done.

TimeThis : Command Line : llrb1
TimeThis : Start Time : Thu Dec 03 08:41:00 2009
TimeThis : End Time : Thu Dec 03 08:41:41 2009
TimeThis : Elapsed Time : 00:00:40.811

40 seconds with 10 million nodes, approx 400% more
than previous version.
 
N

Nick Keighley

d:\lcc\mc77\test>timethis llrb

TimeThis :  Command Line :  llrb
TimeThis :    Start Time :  Thu Dec 03 08:34:03 2009

starting insertion
done with insertion
max depth with 10,000,000 nodes is 21
starting search...
done.

TimeThis :  Command Line :  llrb
TimeThis :    Start Time :  Thu Dec 03 08:34:03 2009
TimeThis :      End Time :  Thu Dec 03 08:34:12 2009
TimeThis :  Elapsed Time :  00:00:09.092

Nine seconds with optimizations ON. 10 million nodes

what are you timeing and what was your conclusion?
 
N

Nick Keighley

d:\lcc\mc77\test>timethis llrb1

TimeThis :  Command Line :  llrb1
TimeThis :    Start Time :  Thu Dec 03 08:41:00 2009

starting insertion
done with insertion
max depth with 10000000 nodes is 33
starting search...
done.

TimeThis :  Command Line :  llrb1
TimeThis :    Start Time :  Thu Dec 03 08:41:00 2009
TimeThis :      End Time :  Thu Dec 03 08:41:41 2009
TimeThis :  Elapsed Time :  00:00:40.811

40 seconds with 10 million nodes, approx 400% more
than previous version.

you're timing Gene's implementaion of an llrb? Why do you think his
version is slow?
 
E

Edwin van den Oetelaar

Gene said:
This occurred to me on another thread when Ben Paff's name came up.

Like most people, I have my own implementation of a generic balanced
tree (a simple red/black tree, in fact) that I rolled once many years
ago and have used in many projects ever since.

Over the summer, I was reading an interesting article about "Left
leaning red black trees" by Robert Sedgewick (of "Algorithms in C" fame,
though he seems to have moved to Java now - blech).
I'd be interested to hear if anyone else has experimented with this, and
what their experiences were. My guess is that Sedgewick is working in
Java, where the huge extra overheads mask this extra time, but that in C
it becomes a significant efficiency factor.
[snip]


Here's another code to benchmark. In this kind of algorithm, small
things can make a big difference.

Sedgewick's paper at http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf
seems pretty clear and correct. I wrote my code based on it very
quickly without ever having seen the algorithm before. Of course it
may have errors, but the test at the bottom ran cleanly on the first
try, and the tree depth looks about right.

[snip code]

I am trying to learn this stuff.
I compiled your code with gcc

$ gcc -Wall -o llbrtree llbrtree.c
$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

Your code produces following output on my machine. (I did not expect this)

$ ./llbrtree

starting insertion
done with insertion
max depth with 10000 nodes is 18
starting search...
could not find data[1486} = 1186691323
could not find data[1496} = 1457538724
could not find data[9497} = 1356477751
could not find data[9508} = 1208283185
could not find data[9510} = 996187988
could not find data[9517} = 1563393912
could not find data[9528} = 1650810318
could not find data[9549} = 1178144129
etc
etc

Is there a bug in the code or in my setup?

Thanks,
Edwin
 
B

Ben Bacarisse

Edwin van den Oetelaar said:
Gene wrote:
[snipped code re-instated:]

|| int int_comparison(void *a, void *b)
|| {
|| return *(int*)a - *(int*)b;
|| }

I am trying to learn this stuff.
I compiled your code with gcc

$ gcc -Wall -o llbrtree llbrtree.c
$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

Your code produces following output on my machine. (I did not expect this)

$ ./llbrtree

starting insertion
done with insertion
max depth with 10000 nodes is 18
starting search...
could not find data[1486} = 1186691323
could not find data[1496} = 1457538724
could not find data[9497} = 1356477751
could not find data[9508} = 1208283185
could not find data[9510} = 996187988
could not find data[9517} = 1563393912
could not find data[9528} = 1650810318
could not find data[9549} = 1178144129
etc
etc

Is there a bug in the code or in my setup?

A bug I think. The comparison function overflows. Gene might had
tested with a library that has RAND_MAX small enough that the integers
used can be compared that way.

One alternative is to use:

return (*(int *)a > *(int *)b) - (*(int *)a < *(int *)b);
 
G

Gene

d:\lcc\mc77\test>timethis llrb

TimeThis :  Command Line :  llrb
TimeThis :    Start Time :  Thu Dec 03 08:34:03 2009

starting insertion
done with insertion
max depth with 10,000,000 nodes is 21
starting search...
done.

TimeThis :  Command Line :  llrb
TimeThis :    Start Time :  Thu Dec 03 08:34:03 2009
TimeThis :      End Time :  Thu Dec 03 08:34:12 2009
TimeThis :  Elapsed Time :  00:00:09.092

Nine seconds with optimizations ON. 10 million nodes

Cool Jacob. Thanks. Are you sure you are generating 10 million unique
random numbers? If your RNG has a smaller cycle, you are generating a
tree that has only as many nodes as the cycle.

Gene
 
G

Gene

Edwin van den Oetelaar said:
Gene wrote:

[snipped code re-instated:]

|| int int_comparison(void *a, void *b)
|| {
||  return *(int*)a - *(int*)b;
|| }

<snip>




I am trying to learn this stuff.
I compiled your code with gcc
$ gcc -Wall -o llbrtree llbrtree.c
$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Your code produces following output on my machine. (I did not expect this)
$ ./llbrtree
starting insertion
done with insertion
max depth with 10000 nodes is 18
starting search...
could not find data[1486} = 1186691323
could not find data[1496} = 1457538724
could not find data[9497} = 1356477751
could not find data[9508} = 1208283185
could not find data[9510} = 996187988
could not find data[9517} = 1563393912
could not find data[9528} = 1650810318
could not find data[9549} = 1178144129
etc
etc
Is there a bug in the code or in my setup?

A bug I think.  The comparison function overflows.  Gene might had
tested with a library that has RAND_MAX small enough that the integers
used can be compared that way.

One alternative is to use:

  return (*(int *)a > *(int *)b) - (*(int *)a < *(int *)b);
Exactly. Thanks. RAND_MAX =~ 32K on my machine.

BTW, Sedgewick's codes for delete_min() and delete() do seem to be
broken for some edge cases where null pointers are not checked.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top