Efficient binary search tree stored in a flat array?

D

Douglas Alan

I couldn't find a good algorithms forum on the Internet, so I guess
I'll ask this question here instead: Is it possible to efficiently
maintain a binary search tree in a flat array (i.e., without using
pointers), as is typically done for a binary heap?

It *is* possible, of course, to keep an ordered list in a flat array,
and then do a binary search on the ordered array, but then insertion
and deletion are O(n), rather than O(log n). It would also clearly be
possible to store a balanced binary tree in a flat array, storing the
children of the node at index i at indices 2*i and 2*i + 1, just as
one does for a binary heap. But if you do that, I don't know if you
could then do insertions and deletions in O(log n) time.

One idea that came to mind, is that maybe it is possible using a
"treap", which is a combination of a binary heap and a binary search
tree. Insertions and deletions in binary heaps can be done in O(log n)
in a flat array, but I don't know if this is also true for a treap,
since you also have the binary search tree invariants to maintain, in
addition to the binary heap invariants. For all I know, this might
cause rotations to no longer be O(log n).

|>ouglas
 
P

Paul Rubin

Douglas Alan said:
It would also clearly be
possible to store a balanced binary tree in a flat array, storing the
children of the node at index i at indices 2*i and 2*i + 1, just as
one does for a binary heap. But if you do that, I don't know if you
could then do insertions and deletions in O(log n) time.

Probably not. Maybe you could organize a 2-3 tree like that, at the
expense of some space.
 
A

Aahz

I couldn't find a good algorithms forum on the Internet, so I guess
I'll ask this question here instead: Is it possible to efficiently
maintain a binary search tree in a flat array (i.e., without using
pointers), as is typically done for a binary heap?

It *is* possible, of course, to keep an ordered list in a flat array,
and then do a binary search on the ordered array, but then insertion
and deletion are O(n), rather than O(log n).

Still, unless your list is large (more than thousands of elements),
that's the way you should go. See the bisect module. Thing is, the
speed difference between C and Python means the constant for insertion
and deletion is very very small relative to bytecode speed. Keep in
mind that Python's object/binding model means that you're shuffling
pointers in the list rather than items.
 
D

Douglas Alan

Still, unless your list is large (more than thousands of elements),
that's the way you should go.  See the bisect module.  Thing is, the
speed difference between C and Python means the constant for insertion
and deletion is very very small relative to bytecode speed.  Keep in
mind that Python's object/binding model means that you're shuffling
pointers in the list rather than items.

Thank you. My question wasn't intended to be Python specific, though.
I am just curious for purely academic reasons about whether there is
such an algorithm. All the sources I've skimmed only seem to the
answer the question via omission. Which is kind of strange, since it
seems to me like an obvious question to ask.

If I find the free time, I might try to work out myself whether it can
be done with a treap.

|>ouglas
 
F

Florian Brucker

Douglas said:
Thank you. My question wasn't intended to be Python specific, though.
I am just curious for purely academic reasons about whether there is
such an algorithm. All the sources I've skimmed only seem to the
answer the question via omission. Which is kind of strange, since it
seems to me like an obvious question to ask.

IIRC comp.programming would be the place to ask such questions.


HTH,
Florian
 
P

Piet van Oostrum

DA> Thank you. My question wasn't intended to be Python specific, though.
DA> I am just curious for purely academic reasons about whether there is
DA> such an algorithm. All the sources I've skimmed only seem to the
DA> answer the question via omission. Which is kind of strange, since it
DA> seems to me like an obvious question to ask.

Of course you can take any BST algorithm and replace pointers by indices
in the array and allocate new elements in the array. But then you need
array elements to contain the indices for the children explicitely.
 
D

Douglas Alan

Douglas Alan wrote:
IIRC comp.programming would be the place to ask such questions.

Thanks, yes that does look like a good place to post such questions.
Unfortunately, it also looks to be overrun with stories on "hot girls
top and bottom sexy movie", though I'm sure I can ignore those. I'm
scared to look at the posting on "tricky bit twiddling", though.

|>ouglas
 
D

Douglas Alan

Of course you can take any BST algorithm and replace pointers by indices
in the array and allocate new elements in the array. But then you need
array elements to contain the indices for the children explicitely.

And why is this a problem? This is how binary heaps are typically
implemented, and it all works swimmingly. The node rotations for
keeping a binary heap balanced turn out to be suitable for
representation in a flat array. I.e., when you do the node rotations,
you only ever have to copy log n array elements.

In general, however, you can't move nodes around so easily when
represented in a flat array. A node movement in a tree represented
with pointers, might involves changing just two pointers, while when
represented as a flat array, might involve copying most of the array
to maintain the tree invariants. It just so turns out that maintaining
the invariants for a binary heap does not have this issue.

This is why I was curious about treaps, which are a type of binary
heap. The CLRS textbook on algorithms discusses treaps, but doesn't
ever mention whether they are as fortunate as less constrained binary
heaps. I'm sure I could work out for myself whether the treap
rotations are suitable for storage in a flat array, but I might make a
mistake somewhere in my reasoning, and then never know the true
answer!

|>ouglas
 
D

Douglas Alan

It may well be that there is no good simple solution, and people avoid
writing about non-existent algorithms.

I can't imagine why that should be the case. The CLRS textbook on
algorithms, for instance, goes to some pains to mathematically prove
that there is no comparison sort that can operate in faster than O(n
log n) time.

And any decent discussion of rabies would be sure to mention that
there is no known cure.

CLRS talks about binary heaps, binary search trees, and treaps, and it
shows how to maintain a binary heap in a flat array efficiently (n log
n time overall), but it never even seems to bring up the subject as to
whether a binary search tree or a treap can also be efficiently
maintained in a flat array.

Though it may be the case that these questions are left as exercises
for the student, and therefore buried in the exercises and problem
sets that I didn't read carefully.
Piet van Oostrum wrote:
And you loower your locality of reference (cache-friendliness).
Note the insert in Python, for example, is quite cache-friendly.

I can't see that a binary search tree would typically have
particularly good cache-friendliness, so I can't see why a flat-array
representation, such as is done for a binary heap, would have
particularly worse cache-reference.

|>ouglas
 
D

Douglas Alan

I said:
And why is this a problem?

Oh, I'm sorry -- I see what you are saying now. You're saying you can
just implement a normal binary search tree, but store the tree nodes
in an array, as if it were a chunk of memory, and use array indices as
pointers, rather than using memory addresses as pointers.

Fair enough, but I'm not sure what that would buy you. Other than,
perhaps some improved locality of reference, and the potential to
perhaps get the pointers take up less space if you know the array is
never going to grow to be very large.

|>ouglas
 
P

Paul Rubin

Douglas Alan said:
I can't see that a binary search tree would typically have
particularly good cache-friendliness, so I can't see why a flat-array
representation, such as is done for a binary heap, would have
particularly worse cache-reference.

That is a good point. Maybe we should be paying more attention to
cache-oblivious algorithms.

http://en.wikipedia.org/wiki/Cache-oblivious_algorithm

H. Prokop's masters' thesis cited in the wiki article explains the
subject very well. A fair amount of work has been done on it since
then, but not as much as one might expect.
 
P

Piet van Oostrum

DA> Oh, I'm sorry -- I see what you are saying now. You're saying you can
DA> just implement a normal binary search tree, but store the tree nodes
DA> in an array, as if it were a chunk of memory, and use array indices as
DA> pointers, rather than using memory addresses as pointers.
DA> Fair enough, but I'm not sure what that would buy you. Other than,
DA> perhaps some improved locality of reference, and the potential to
DA> perhaps get the pointers take up less space if you know the array is
DA> never going to grow to be very large.

My second sentence that you quoted more or less means `it doesn't buy
you much'.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top