Builtin classes list, set, dict reimplemented via B-trees

B

barnesc

Hi again,

Since my linear algebra library appears not to serve any practical
need (I found cgkit, and that works better for me), I've gotten bored
and went back to one of my other projects: reimplementing the Python
builtin classes list(), set(), dict(), and frozenset() with balanced
trees (specifically, counted B-trees stored in memory).

In short, this allows list lookup, insertion, deletion in O(log(N))
time. It allows the set and dictionary types to be maintained in
order, and insert/lookup/remove operations here take O(log(N)) time
as well. Getting the k least or k greatest elements takes
O(log(N)+k) time.

I'm about 50% done with the implementation of this module.

I thought I'd use this for Dijkstra's algorithm and some schedulers,
but then I noticed that a binary heap can be retrofitted to have the
DECREASE-KEY operation (see e.g. David Eppstein's priorityQueue).
Thus my B-tree based classes aren't really necessary, since there are
simpler heap implementations that do the same thing.

So my question is: are there any other *practical* applications of a
B-tree based list/set/dict ? In other words, is this module totally
worth coding, or is it just academic wankery and theoretical flim
flam ? :)

Thanks for your comments/opinions/etc...

- Connelly Barnes
'Y29ubmVsbHliYXJuZXNAeWFob28uY29t'.decode('base64')


---------------------------------------------------------------------
Detailed overview of the bcollections module (from the docstring)
---------------------------------------------------------------------

"""
....

The bset() and bdict() classes improve upon the builtin set and
dictionary types by maintaining the elements in sorted order.

Generally, the b* classes are a factor of O(log(N)) slower[2] than the
corresponding builtin classes, except for certain operations:

- For a bset(), it takes O(log(N)) time to get the minimum, maximum,
or ith element. Also, it takes O(log(N)+k) time to get the
k first, k last, or any slice of k elements from the sorted set.
- For a bdict(), dictionary keys can be retrieved in sorted order
with the above time bounds.
- For a blist(), items can be inserted and removed in O(log(N)) time.
It takes O(log(N)+k) time to get, set, or delete a slice of k
elements.

....

blist - List implemented via B-tree.
bset - Set implemented via B-tree.
Additional methods:
- s.min() - Minimum element
- s.max() - Maximum element
- s.index(x) - Index of x in sorted list of
elements.
- s.next(x) - Least element which is > x
(x need not be in the set).
- s.previous(x) - Greatest element which is < x
(x need not be in the set).
- s - Get element i from sorted list.
- s[i:j] - Get slice from sorted list.
bfrozenset - Immutable set implemented via B-tree.
bdict - Dict implemented via B-tree.
Additional methods:
- a.min() - Minimum key
- a.max() - Maximum key
- a.index(k) - Index of k in sorted list of
keys.
- s.next(x) - Least key which is > x
(x need not be an existing key).
- s.previous(x) - Greatest key which is < x
(x need not be an existing key).
- a.get_key_at(i) - Get key at index i from sorted
list of keys.
- a.get_key_at(i, j) - Get slice from sorted key list.
"""
 
B

Bryan Olson

> Hi again,
>
> Since my linear algebra library appears not to serve any practical
> need (I found cgkit, and that works better for me), I've gotten bored
> and went back to one of my other projects: reimplementing the Python
> builtin classes list(), set(), dict(), and frozenset() with balanced
> trees (specifically, counted B-trees stored in memory). [...]
> So my question is: are there any other *practical* applications of a
> B-tree based list/set/dict ? In other words, is this module totally
> worth coding, or is it just academic wankery and theoretical flim
> flam ? :)

B-trees are specifically designed for disk storage. Seeking to a
page takes much longer than reading a page of several kilobytes.
B-trees read the keys for a many-way comparison in one seek.

For in-memory comparison-based dictionaries, Red-Black trees,
AVL trees, 2-3 trees, or skip-lists, are a better choice.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,526
Members
44,997
Latest member
mileyka

Latest Threads

Top