should python have a sort list-map object in the std-lib?

T

Tim Henderson

Hi

The question why are there no sorted dictionaries in python, seems to
pop up with unseeming regularity. That question in itself in
nonsensical sense dictionaries are hash-maps, however should python
have a sorted map type object is a good question.

clearly many people like have a sorted map, and sorting the keys every
time seems rather wasteful, as does keeping a separate sorted list of
the keys.

a related question is, in python is it more efficient to a maintain a
list type in sorted order by inserting each new object in the correct
location (by say finding it with a binary search and then using del
list[x]) or appending each new item to the end of said list and using
the built-in .sort method, or sorted() function?
 
A

Alex Martelli

Tim Henderson said:
Hi

The question why are there no sorted dictionaries in python, seems to
pop up with unseeming regularity. That question in itself in
nonsensical sense dictionaries are hash-maps, however should python
have a sorted map type object is a good question.

clearly many people like have a sorted map, and sorting the keys every
time seems rather wasteful, as does keeping a separate sorted list of
the keys.

It may be the most practical approach, though. Designing ONE "ordered
dictionary" needs to answer several questions that will affect (reduce)
its usefulness no matter how you answer them, such as: do the keys need
to be hashable *as well as* order-comparable? Do copies of key objects
get implicitly taken on insertion? Etc, etc.

a related question is, in python is it more efficient to a maintain a
list type in sorted order by inserting each new object in the correct
location (by say finding it with a binary search and then using del
list[x]) or appending each new item to the end of said list and using
the built-in .sort method, or sorted() function?

Depends on the patterns of occurrences of insertions and deletions
versus needs to use the sortedlist, in terms of how do they get bunched
or spread out in time wrt each other. For many cases, the best answer
is, neither of those you mentioned, but rather the functions in heapq.


Alex
 
T

Tim Henderson

ahh, i hadn't thought of using a proirity queue but that is the correct
solution most of the time, except i suppose when you have input that
causes you to excessively reheap which could be problematic.

i may still look into writing a general sorted map though, it could be
useful especially if there were and easy way to set the type of
functionality required with out resorting to several different types of
sorted-maps.
 
S

Steven D'Aprano

Hi

The question why are there no sorted dictionaries in python, seems to
pop up with unseeming regularity. That question in itself in
nonsensical sense dictionaries are hash-maps, however should python
have a sorted map type object is a good question.

clearly many people like have a sorted map, and sorting the keys every
time seems rather wasteful, as does keeping a separate sorted list of
the keys.

Another good question is, does it *seem* wasteful or is it *actually*
wasteful? More or less wasteful than having special code in Python to
implement "sorted dictionaries" (whatever that means)?

It is good to have a rich, powerful set of tools in a programming
language. It is bad to expect that, no matter what task you need, that
your language should have a single function or object to do it. That just
leads to a bloated language where it is harder to find the feature you
want than it is to simply write it yourself.


a related question is, in python is it more efficient to a maintain a
list type in sorted order by inserting each new object in the correct
location (by say finding it with a binary search and then using del
list[x]) or appending each new item to the end of said list and using
the built-in .sort method, or sorted() function?

I don't think using del to insert an object into a list would be very
efficient, no matter how little time the deletion took :)

Why don't you try it for yourself? You only need some quick and dirty code
to discover which approach is better (for a quick and dirty definition of
"better"). Don't forget to check out the bisect module too.
 
A

Alex Martelli

Tim Henderson said:
ahh, i hadn't thought of using a proirity queue but that is the correct
solution most of the time, except i suppose when you have input that
causes you to excessively reheap which could be problematic.

The worst case is still O(N logN) for heap as well as timsort. But
since timsort can take advantage of order or reverse order already
present in the sequence, it's surely possible to find real use cases in
which one sort at the end of all insertions is O(N) and thus much
faster. Nevertheless, that also depends on having a lot of insertions
before you need any sorting; if you need to "walk sorted" after each
insertion or thereabouts, I would guess heap would be faster again.

i may still look into writing a general sorted map though, it could be
useful especially if there were and easy way to set the type of
functionality required with out resorting to several different types of
sorted-maps.

You can record a callable equivalent to sort's key= once and for all at
the creation of the "sorted map". heapq's functions don't directly
support that in 2.4 (they will in 2.5) but it's easy to implement via
explicit key-extraction upon insertion (once known as the D step in the
DSU, decorate-sort-undecorate, idiom). It's unclear whether you want to
be able to key the sorting off keys only, or keys AND values: the latter
is more general but also takes more work and complication.

As for "different types", if I catch your drift...:

I would suggest avoiding the temptation to overload the type with a
bazillion options-flags requiring deeply different behavior, e.g.
copying keys and/or values versus just taking references to either or
both, or either requiring or foregoing hashable keys (with different
implementations). If such differences are warranted by use cases, it's
better to have several different types than one complicated one. I
would personally suggest mimicking dict's semantic: require hashable
keys, make no copies.


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,540
Members
45,025
Latest member
KetoRushACVFitness

Latest Threads

Top