Sorted dictionary

J

Jan Kaliszewski

Hello,

Inspired by some my needs as well as some discussions in the net, I've
implemented a sorted dictionary (i.e. a dict that returns keys, values and
items always sorted by keys):

http://code.activestate.com/recipes/576998/

Maybe it'll appear to be useful for somebody... And I'm curious about your
opinions.

Regards,
*j
 
C

Chris Rebert

What's the advantage of that over sorting the keys as needed?


E.g.

for key in sorted(dict):
   print key


works.

Well, it does spread out the cost of sorting the key list over each of
the N distinct key assignments, versus sorting the entire list all at
once, on-demand at the end of the process. And you avoid having to
traverse the M buckets in the dictionary to locate the N keys in the
first place. So it has different performance characteristics, which
could theoretically matter depending on the use case; e.g. it could
avoid a GUI application hanging for several seconds while it sorts a
large list of keys.

Cheers,
Chris
 
R

Raymond Hettinger

Hello,

Inspired by some my needs as well as some discussions in the net, I've  
implemented a sorted dictionary (i.e. a dict that returns keys, values and  
items always sorted by keys):

http://code.activestate.com/recipes/576998/

Maybe it'll appear to be useful for somebody... And I'm curious about your  
opinions.

Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

The other alternative is to just append items to the list
and sort the list on demand. Since the Timsort takes advantage
of existing order, the process will approach O(n) if sorting
after every few updates. This is close the O(n) time it takes
to iterate the list in the first place:

s = SortedDict(items)
list(s) # O(n log n) to sort and O(n) to iterate
s[newkey] = newval
list(s) # close to O(n)

my two cents,


Raymond
 
B

Bearophile

Raymond Hettinger:
Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

In the collections module it can be useful to have ordered dicts and
sets based on search trees.
I'm thinking about B+ trees to use CPU cache locality better. It can
be fun :)

Bye,
bearophile
 
J

Jan Kaliszewski

Dnia 21-01-2010 o 08:49:22 Chris Rebert said:
Well, it does spread out the cost of sorting the key list over each of
the N distinct key assignments, versus sorting the entire list all at
once, on-demand at the end of the process. And you avoid having to
traverse the M buckets in the dictionary to locate the N keys in the
first place. So it has different performance characteristics, which
could theoretically matter depending on the use case; e.g. it could
avoid a GUI application hanging for several seconds while it sorts a
large list of keys.

Generally, I see 3 possible approaches:

1. What Steven wrote about: simply sort all keys when you need to get them
sorted (it needs iteration over whole a dict). In many cases it's good
enough (so is the best, because of simplicity) -- e.g. when a dict is not
very long (very common case), or when that sorting it is a rare operation.
Sort always when a key is added/deleted, maintaining an auxiliary list of
sorted keys (sufficient especially when you modify a dict rarely and get
from it often).

2. Sort all keys on-demand -- but only if a dict is marked as "modified".
Mark a dict as "modified" when you add/delete any key; and mark a dict as
"unmodified" when sorting. (Of course, it can be automatized, see e.g.:
http://pypi.python.org/pypi/sorteddict). Nice, if your use-pattern is:
add/del a lot, *then* only get/iter a lot...

3. Continually (at each set/del of a key) maintain auxiliary sorted list
of keys -- using binary search (as I did), heap queue (see:
http://pypi.python.org/pypi/HeapDict) or such an algorithm. I think this
approach is the best when you have quite long dict, and you add/del fairly
often and get/iter very often.

Regards,
*j
 
J

Jan Kaliszewski

Dnia 21-01-2010 o 09:27:52 Raymond Hettinger said:
Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

Please note that I used funcions from bisect, that use binary search.

Doesn't it take O(log n) time?

*j
 
A

Arnaud Delobelle

Jan Kaliszewski said:
Please note that I used funcions from bisect, that use binary search.

Doesn't it take O(log n) time?

Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,014
Latest member
BiancaFix3

Latest Threads

Top