Sorted dictionary

Jan Kaliszewski · Jan 20, 2010

Hello,

Inspired by some my needs as well as some discussions in the net, I've
implemented a sorted dictionary (i.e. a dict that returns keys, values and
items always sorted by keys):

http://code.activestate.com/recipes/576998/

Maybe it'll appear to be useful for somebody... And I'm curious about your
opinions.

Regards,
*j

Steven D'Aprano · Jan 20, 2010

Hello,

Inspired by some my needs as well as some discussions in the net, I've
implemented a sorted dictionary (i.e. a dict that returns keys, values
and items always sorted by keys):

http://code.activestate.com/recipes/576998/

What's the advantage of that over sorting the keys as needed?

E.g.

for key in sorted(dict):
print key

works.

Chris Rebert · Jan 21, 2010

What's the advantage of that over sorting the keys as needed?

E.g.

for key in sorted(dict):
Â Â print key

works.

Well, it does spread out the cost of sorting the key list over each of
the N distinct key assignments, versus sorting the entire list all at
once, on-demand at the end of the process. And you avoid having to
traverse the M buckets in the dictionary to locate the N keys in the
first place. So it has different performance characteristics, which
could theoretically matter depending on the use case; e.g. it could
avoid a GUI application hanging for several seconds while it sorts a
large list of keys.

Cheers,
Chris

Raymond Hettinger · Jan 21, 2010

Hello,

Inspired by some my needs as well as some discussions in the net, I've
implemented a sorted dictionary (i.e. a dict that returns keys, values and
items always sorted by keys):

http://code.activestate.com/recipes/576998/

Maybe it'll appear to be useful for somebody... And I'm curious about your
opinions.

Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

The other alternative is to just append items to the list
and sort the list on demand. Since the Timsort takes advantage
of existing order, the process will approach O(n) if sorting
after every few updates. This is close the O(n) time it takes
to iterate the list in the first place:

s = SortedDict(items)
list(s) # O(n log n) to sort and O(n) to iterate
s[newkey] = newval
list(s) # close to O(n)

my two cents,

Raymond

Bearophile · Jan 21, 2010

Raymond Hettinger:

Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

In the collections module it can be useful to have ordered dicts and
sets based on search trees.
I'm thinking about B+ trees to use CPU cache locality better. It can
be fun

Bye,
bearophile

Jan Kaliszewski · Jan 21, 2010

Dnia 21-01-2010 o 08:49:22 Chris Rebert said:
Well, it does spread out the cost of sorting the key list over each of
the N distinct key assignments, versus sorting the entire list all at
once, on-demand at the end of the process. And you avoid having to
traverse the M buckets in the dictionary to locate the N keys in the
first place. So it has different performance characteristics, which
could theoretically matter depending on the use case; e.g. it could
avoid a GUI application hanging for several seconds while it sorts a
large list of keys.

Generally, I see 3 possible approaches:

1. What Steven wrote about: simply sort all keys when you need to get them
sorted (it needs iteration over whole a dict). In many cases it's good
enough (so is the best, because of simplicity) -- e.g. when a dict is not
very long (very common case), or when that sorting it is a rare operation.
Sort always when a key is added/deleted, maintaining an auxiliary list of
sorted keys (sufficient especially when you modify a dict rarely and get
from it often).

2. Sort all keys on-demand -- but only if a dict is marked as "modified".
Mark a dict as "modified" when you add/delete any key; and mark a dict as
"unmodified" when sorting. (Of course, it can be automatized, see e.g.:
http://pypi.python.org/pypi/sorteddict). Nice, if your use-pattern is:
add/del a lot, *then* only get/iter a lot...

3. Continually (at each set/del of a key) maintain auxiliary sorted list
of keys -- using binary search (as I did), heap queue (see:
http://pypi.python.org/pypi/HeapDict) or such an algorithm. I think this
approach is the best when you have quite long dict, and you add/del fairly
often and get/iter very often.

Regards,
*j

Jan Kaliszewski · Jan 21, 2010

Dnia 21-01-2010 o 09:27:52 Raymond Hettinger said:
Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

Please note that I used funcions from bisect, that use binary search.

Doesn't it take O(log n) time?

*j

Arnaud Delobelle · Jan 21, 2010

Jan Kaliszewski said:
Please note that I used funcions from bisect, that use binary search.

Doesn't it take O(log n) time?

Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.

Sorted indexes in Python	0	Oct 17, 2008
Testing dictionary results with doctest	3	Oct 31, 2008
Print dict in sorted order	6	Jan 29, 2006
R: Sorted list as an alternative to dictionary for when you onlyneedkeys?	0	Jul 19, 2004
Python 3: dict & dict.keys()	4	Jul 23, 2013
dictionary: sorting the values preserving the order	5	Apr 1, 2005
sorteddict PEP proposal [started off as orderedict]	60	Sep 25, 2007
PEP 372 -- Adding an ordered directory to collections	28	Jun 16, 2008

Sorted dictionary

Jan Kaliszewski

Steven D'Aprano

Chris Rebert

Raymond Hettinger

Bearophile

Jan Kaliszewski

Jan Kaliszewski

Arnaud Delobelle

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads