An ordered dictionary for the Python library?

M

Mark Summerfield

I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++. I know that in Python
the sort function is v. fast, but often I prefer never to sort but
simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.)

I think other people must find such things useful. There are three
implementations on the Python Cookbook site, and one on PyPI, all in
pure Python (plus I have my own implementation, also pure Python).

I would suppose that it would be better if it was implemented in C---
for example, my own pure Python ordered dict loads data about eight
times slower than the built-in dict. Nonetheless, I still find it
worth using for the convenience it offers.

Do other Python programmers feel this lack? Is this worth a PEP?

[I originally asked about this on the P3K mailing list, but then
realised that it isn't version-specific really.]
 
M

Michele Simionato

I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++. I know that in Python
the sort function is v. fast, but often I prefer never to sort but
simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.)

I think other people must find such things useful. There are three
implementations on the Python Cookbook site, and one on PyPI, all in
pure Python (plus I have my own implementation, also pure Python).

I would suppose that it would be better if it was implemented in C---
for example, my own pure Python ordered dict loads data about eight
times slower than the built-in dict. Nonetheless, I still find it
worth using for the convenience it offers.

Do other Python programmers feel this lack? Is this worth a PEP?

Yes, this is a serious lack in the standard library.

Michele Simionato
 
S

stef

Mark said:
I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++. I know that in Python
the sort function is v. fast, but often I prefer never to sort but
simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.)

I think other people must find such things useful. There are three
implementations on the Python Cookbook site, and one on PyPI, all in
pure Python (plus I have my own implementation, also pure Python).

I would suppose that it would be better if it was implemented in C---
for example, my own pure Python ordered dict loads data about eight
times slower than the built-in dict. Nonetheless, I still find it
worth using for the convenience it offers.

Do other Python programmers feel this lack? Is this worth a PEP?
Yes I think it's really useful,
(or at least I'm used to it in other languages ;-)
If you're going to extend the dictionary,
there's one other flag I'm continuously missing:
"case-insensitive" key.

cheers,
Stef Mientki
[I originally asked about this on the P3K mailing list, but then
realised that it isn't version-specific really.]
 
T

thebjorn

I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++. I know that in Python
the sort function is v. fast, but often I prefer never to sort but
simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.) [...]
Do other Python programmers feel this lack? Is this worth a PEP?

I usually make a distinction between a sorted dict, where iteration
(and potentially positional indexing) happens in sorted key order; and
an ordered dict where items maintain insertion order. I use the latter
all the time, and e.g. Django's model metaclass does some minor magic
to overcome the fact that field-order is lost by the time your
metaclass gets control, since the attributes are passed as a regular
dict.

-- bjorn
 
S

Steven D'Aprano

I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++.
[snip]


Personally, I've never missed an ordered dict. What do people use them
for?

In fact, I'm not sure what people mean by ordered dicts. I assume they
mean the keys are ordered, not the values. But ordered by what? Insertion
order? Modification order? Alphabetical order?
 
M

Michele Simionato

On Wed, 12 Sep 2007 07:33:45 +0000, Mark Summerfield wrote:
In fact, I'm not sure what people mean by ordered dicts. I assume they
mean the keys are ordered, not the values. But ordered by what? Insertion
order? Modification order? Alphabetical order?

Insertion order.

M.S.
 
S

Steven D'Aprano

If you're going to extend the dictionary, there's one other flag I'm
continuously missing: "case-insensitive" key.

Completely untested and probably buggy as anything, but here's an
absolutely minimal case-insensitive dictionary.

class CaselessDict(dict):
def __setitem__(self, key, value):
super(CaselessDict, self).__setitem__(key.lower(), value)
def __getitem__(self, key):
return super(CaselessDict, self).__getitem__(key.lower())
 
M

Mark Summerfield

Insertion order.

M.S.


Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.

Another respondent asked about use cases.

I have found time and again that I needed (key, value) pairs where the
key is some string that provides a representation for human readers
and the value is some internal key (e.g., database ID) that the system
uses internally. In these cases I often need to present the user with
a list of items from which to choose and want to present them in
sorted order. Naturally, I could use an ordinary dict and then do
this:

for string in sorted(d.keys()):
process(string)

But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands? I'm having to sort again and
again, since it is often the case that the items in the list changes
during the runtime of the application. So my solution in C++ is to use
an ordered dictionary (a map in C++ jargon), which in Python means I
can simply write:

for string in od.keys():
process(string)

Another situation where this is useful is if you need to hold lists of
strings preserving case but want them to be presented case-
insensitively. Here, you populate the ordered dictionary like this:

for string in somelist:
od[string.lower()] = string

Now you can iterate over od in case-insensitive order and yet still
have access to the case-sensitive original, and never have to
explicitly sort the strings.

I think it comes down to the nail and hammer issue: if you have a
hammer all problems look nail shaped. In Python we have dict and list
so everything looks suitable for those data structures. But having had
a wrench (ordered dictionaries) in C++, I find that some problems need
a wrench. I think that once Python programmers have access to an
ordered dictionary some proportion of them will find it painful to
program without it, so I really think it belongs in the standard
library.
 
M

Michele Simionato

On 12 Sep, 13:46, Michele Simionato <[email protected]>

Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.

Another respondent asked about use cases.

I have found time and again that I needed (key, value) pairs where the
key is some string that provides a representation for human readers
and the value is some internal key (e.g., database ID) that the system
uses internally. In these cases I often need to present the user with
a list of items from which to choose and want to present them in
sorted order. Naturally, I could use an ordinary dict and then do
this:

for string in sorted(d.keys()):
process(string)

But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands? I'm having to sort again and
again, since it is often the case that the items in the list changes
during the runtime of the application. So my solution in C++ is to use
an ordered dictionary (a map in C++ jargon), which in Python means I
can simply write:

for string in od.keys():
process(string)

For your use case I would wrap a list [(key, value)] with a dict-like
object and I would use the bisect module in the standard library to
keep
the inner list ordered.

M.S.
 
D

Duncan Booth

Steven D'Aprano said:
In fact, I'm not sure what people mean by ordered dicts. I assume they
mean the keys are ordered, not the values. But ordered by what?
Insertion order? Modification order? Alphabetical order?

All of the above at different times. Usually I think they mean original
insertion order, I'd more often like modification order though.

Insertion order can certainly be useful: any situation where you currently
keep items in a list because the original order is important but also copy
them into a dictionary or set because you need fast lookup.
e.g. if you want only unique items out of a list but need to preserve order
of first occurrence.

Another example would be uml->code round tripping: you parse in an existing
file containing method definitions, then update the method definitions from
a UML model and write the file out again: ArchGenXML does exactly that, but
last I looked the methods were written out in whatever order the dictionary
stored them which really messes up version control. It really should
preserve the order that methods appeared in the file.

Modification order would be great for a cache. Touch items whenever they
are referenced and then whenever the cache gets too big just pop the oldest
until you get back to the desired size.

Alphabetical or other sorted order when you want to get smallest/largest or
next smaller/larger then a specific item. I don't agree you necessarily
mean ordered by key rather than value, I think you could specify any
sortkey and expect updating the value to restore the ordering. Maybe if you
are recording some stats and want to regularly output the top 10?
 
R

Roel Schroeven

Michele Simionato schreef:
Insertion order.

Mark referred to C++ and the sort function, so I assume he meant
alphabetical order (or the equivalent for types other than strings).
 
C

Chris Mellon

Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.

These sort of disagreements about what such a beast should "obviously"
do are a good part of the reason why one doesn't exist in the standard
library. A sorted dict is so trivial that it's not really seen as
necessary, since all you have to do is sort the keys(). An ordered
dict is generally better satisfied with a list, as you've seen.
There's implementations of all these for people who want something
wrapped up in the cookbook.

But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands?

keys = sorted(d.keys())
for k in key:
...

You've got two choices: either you insert more than you iterate, in
which case sorting at the point of iteration is the most efficient
option, or you iterate more than you insert, in which case sorting
after you insert is easy. There's no way that an ordered dict
implementation could satisfy both constraints. Can you honestly think
of a use case where the performance of your sorting is a bottleneck
*and* it's a suitably general case that a standard lib class should
optimize for that specific case? Even in this short thread we've seen
3 different ideas of what the dict should do.
 
C

Carl Banks

On 12 Sep, 13:46, Michele Simionato <[email protected]>
Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.
Another respondent asked about use cases.
I have found time and again that I needed (key, value) pairs where the
key is some string that provides a representation for human readers
and the value is some internal key (e.g., database ID) that the system
uses internally. In these cases I often need to present the user with
a list of items from which to choose and want to present them in
sorted order. Naturally, I could use an ordinary dict and then do
this:
for string in sorted(d.keys()):
process(string)
But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands? I'm having to sort again and
again, since it is often the case that the items in the list changes
during the runtime of the application. So my solution in C++ is to use
an ordered dictionary (a map in C++ jargon), which in Python means I
can simply write:
for string in od.keys():
process(string)

For your use case I would wrap a list [(key, value)] with a dict-like
object and I would use the bisect module in the standard library to
keep
the inner list ordered.

That would make insertion and deletion would be quite expensive.
Which might be ok for the OP, but it's often better to go with a
btree.

Which, you know, would be a reasonable candidate for collections
module.


Carl Banks
 
C

Carl Banks

I feel that Python lacks one useful data structure: an ordered
dictionary.
I find such data structures v. useful in C++.

[snip]

Personally, I've never missed an ordered dict. What do people use them
for?

I once had a GUI that displayed the contents of a dict as a tree. The
user could add entries, and the entries would display at the end.
Hence, the ordered dict.

I could have kept the order separately, but what was the point? I
just whipped up a homemade ordered dict class, it worked fine, and I
didn't have to worry about keeping them synchronized.


Carl Banks
 
N

Neil Cerutti

Actually I meant by key order, so insertion order doesn't
matter at all. If you need a dictionary-like data structure
that respects insertion order you could use a list of (key,
value) tuples.

Another respondent asked about use cases.

A mapping with sorted keys could theoretically be convenient when
your keys are comparable but not hashable.
 
M

Mark Summerfield

On 12 Sep, 13:46, Michele Simionato <[email protected]>
Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.
Another respondent asked about use cases.
I have found time and again that I needed (key, value) pairs where the
key is some string that provides a representation for human readers
and the value is some internal key (e.g., database ID) that the system
uses internally. In these cases I often need to present the user with
a list of items from which to choose and want to present them in
sorted order. Naturally, I could use an ordinary dict and then do
this:
for string in sorted(d.keys()):
process(string)
But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands? I'm having to sort again and
again, since it is often the case that the items in the list changes
during the runtime of the application. So my solution in C++ is to use
an ordered dictionary (a map in C++ jargon), which in Python means I
can simply write:
for string in od.keys():
process(string)

For your use case I would wrap a list [(key, value)] with a dict-like
object and I would use the bisect module in the standard library to
keep
the inner list ordered.

M.S.

Actually, in my own ordereddict I wrap a dict and keep the keys in
order using the bisect module.
This works fine, but I imagine that a C-based implementation based on
B*trees or skiplists would be a lot faster.

In response to some of the following emails, I note one responder says
such a class is trivial to implement. Yes, and so presumably is
defaultdict, but the latter is in the standard library. One of the
"problems" of Python is that the range of skills of its users varies
from casual "amateur" programmers to guru professionals and everywhere
inbetween. Sure at somewhere on that continuum implementing *anything*
is "trivial", but for some users it is worthwhile to have it out of
the box.

What I like about ordered dictionaries is the ability to have sorted
data without sorting. It lets me have multiple sort orders. For
example, in some programs I give the user the choice of how they want
their data ordered and can provide such orderings without sorting,
simply by maintaining a set of parallel data structures with keys
being ordered and values being pointers (object references in Python)
to the relevant values. This costs in memory (but not much since no
values are duplicated and the values are usually large the keys
usually small).

Psuedocode example:

class Person:
def __init__(self, name, payrollno, grade, dept):
self.name = name
# etc

For each person created:
personFromName["%s\t%012d" % (person.name.lower(),
person.payrollno)] = person
personFromPayrollNo[person.payrollno] = person
personFromGrade["%s\t%012d" % (person.grade, person.payrollno)] =
person
personFromDept["%s\t%012d" % (person.dept, person.payrollno)] =
person

So now I have four orderings with no sorting. (I have to disambiguate
to avoid duplicates and to provide subordering.)

If I have 10,000 people I would potentially have to resort 10,000
instances whenever the user chose a new sort order: this way I just
have to iterate over the right ordered dictionary.

So I do feel that there are good use cases, and I think that for some
Python users having this in the library would be a genuine benefit.

Most of the use cases I envisage involve lots of insertions at start
up, relatively few during runtime, and lots of iterations over the
data as the user changes their view of it.

Of course Python does have an ordered dictionary, it is just not
necessarily always installed. You can use the bsdbd3 module's btopen()
method and pass None as filename to get an in-memory Btree, but I
believe (not sure) that there may be length limits on the keys.
 
R

Russell E. Owen

Steven D'Aprano said:
I feel that Python lacks one useful data structure: an ordered
dictionary.

I find such data structures v. useful in C++.
[snip]


Personally, I've never missed an ordered dict. What do people use them
for?

In fact, I'm not sure what people mean by ordered dicts. I assume they
mean the keys are ordered, not the values. But ordered by what? Insertion
order? Modification order? Alphabetical order?

Opinions vary, but I think of it this way:
- A sorted dict keeps its keys sorted. This has an obvious alternative
just sort the keys later (though I can imagine cases where it would be
nicer not to).

- An ordered dict remembers the order of insertion. This is useful when
you want to retain the order of insertion yet you also want fast lookup
of individual items. Unlike a sorted dict, there is no trivial
alternative (though you can argue about "trivial") and I really wish
there was. There are several free versions available.

-- Russell
 
T

Tuomas

Mark said:
I feel that Python lacks one useful data structure: an ordered
dictionary.

Why it should be a dict. With it you can only maintain the order
x1<x2<x3... But by definition the order only requires x1<=x2<=x3...
There are many use cases where keys are not unique. I'd prefer a list
implementation, which would maintain the user defined order (cmp, key).
So you could map your objects with any order.

TV
 
P

Paul Rubin

Mark Summerfield said:
I feel that Python lacks one useful data structure: an ordered
dictionary.
Do other Python programmers feel this lack? Is this worth a PEP?

Yes. It should use a functional data structure.
 
M

Mark Summerfield

Yes. It should use a functional data structure.

Could you elaborate?

Here's my impression of the postings so far: there are 3 categories of
response:

(A) There is no need for such a thing.

(B) It is useful, but so easy to implement that it needn't be in the
library.

(C) It is worth having an ordered data structure of some kind.

Regarding (A) and (B): I remember Python before it had the sets
module. "There's no need for sets, you can do them with dicts".
Thankfully sets are in the language and I for one use them
extensively.

For (C) what I had in mind was:

An API that is identical to a dict, with the few additional methods/
behaviours listed below:
- If an item is inserted it is put in the right place (because the
underlying data structure, b*tree, skiplist or whatever is
intrinsically ordered by < on the key)
- Extra methods
key_at(index : int) -> key
item_at(index : int) -> (key, value)
value_at(index : int) -> value
set_value_at(index : int, value) # no return value necessary but
maybe key if convenient
od[x:y:z] -> list of values ordered by key # can read a slice but
not assign a slice
and maybe
keys_for(fromindex : int, uptobutexcludingindex : int) -> ordered
list of keys

I've given examples of the use cases I envisage (and that I use in
both Python with my own ordered dict and in C++ with the map class) in
a previous posting, and I've shown the API I envisage above. I know
that some people might prefer ordering by value or the option of case-
insensitive ordering, but both these can be achieved using the API
above, e.g.,
od[value.lower()] = value # case-insensitive ordering of list of
values
And as for duplicates there are two approaches I use: (1) make each
string unique as I showed in my previous post, or (2) use a value that
itself is a list, dict or set.
(As for those who have suggested an ordered data structure that isn't
a dict, I don't see a conflict, I'd be happy to see more data
structures in the collections module.)

Is what I've suggested sufficient (and sufficiently general) for the
standard library? If it is not, what should be done, or is there some
other approach and API that would better provide the functionality
envisaged?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,015
Latest member
AmbrosePal

Latest Threads

Top