Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date
T

Tom Anderson

What I actually wanted to say is that there may be a confusion between a
"sorted dictionary" (one where the keys are automatically sorted) and an
"ordered dictionary" (where the keys are not automatically ordered, but
have a certain order that is preserved). Those who suggested that the
"sorted" function would be helpful probably thought of a "sorted
dictionary" rather than an "ordered dictionary."

Exactly.

Python could also do with a sorted dict, like binary tree or something,
but that's another story.

tom
 
A

Alex Martelli

Christoph Zwerschke said:
* C++ has a Map template in the STL which is ordered (a "Sorted
Associative Container").

Ordered *by comparisons on keys*, NOT by order of insertion -- an
utterly and completely different idea.
So ordered dictionaries don't seem to be such an exotic idea.

Ordered *by order of key insertion*: Java, PHP
Ordered *by other criteria*: LISP, C++
Unordered: Python, Perl, Ruby, Smalltalk, Awk, Tcl

by classification of the languages you've listed.
I can't help but I still find it unambiguous and intuitive enough to
consider it "the one" standard implementation for ordered dictionaries.

Then you should be very careful not to call C++'s implementation
"ordered", because that makes it VERY HARD to argue that "the one"
thingy;-).

Nevertheless, since sorting by keys (or any function of the keys and
values, including one depending on an external table, which was claimed
to be otherwise in other parts of this thread) is so trivial, while
recovering insertion order is impossible without some auxiliary data
structure ``on the side'', I agree that a dictionary subclass that's
ordered based on insertion timing would have more added value than one
where the 'ordering' is based on any function of keys and values.


Alex
 
B

Bengt Richter

d1[0:0] + d1[2:2] ==> OrderedDict( (1, 11), (3, 13) )

Oops, sorry, that was nonsense again. I meant
d1[0:1] + d1[1:2] ==> OrderedDict( (1, 11), (3, 13) )
Ordered dictionaries could allow slicing and concatenation.

Some operations such as concatenation need of course special
considerations, since the keys must stay unique. A concatenation of
ordered dicts with overlapping keys should probably give an IndexError.
If you define the semantics like feeding overlapping keys in a tuple
sequence to dict, then later duplicate keys just replace prior ones
by same rules as d[k]=v1; d[k]=v2. I think that makes sense in this
context, and can be defined unambigously.

Regards,
Bengt Richter
 
A

Alex Martelli

Tom Anderson said:
Exactly.

Python could also do with a sorted dict, like binary tree or something,
but that's another story.

However, since Christoph himself just misclassified C++'s std::map as
"ordered" (it would be "sorted" in this new terminology he's now
introducing), it seems obvious that the terminological confusion is
rife. Many requests and offers in the past for "ordered dictionaries"
(e.g. on this group) were also "sorted", NOT "ordered", in this new
terminology.

Maybe it would therefore be clearer to have the name of the new
container reflect this, with a specific mention of *insertion* order...
rather than just call it "ordered" and risk probable confusion.


Alex
 
B

bonono

Alex said:
However, since Christoph himself just misclassified C++'s std::map as
"ordered" (it would be "sorted" in this new terminology he's now
introducing), it seems obvious that the terminological confusion is
rife. Many requests and offers in the past for "ordered dictionaries"
(e.g. on this group) were also "sorted", NOT "ordered", in this new
terminology.

Maybe it would therefore be clearer to have the name of the new
container reflect this, with a specific mention of *insertion* order...
rather than just call it "ordered" and risk probable confusion.
Um,

what would be the definition of "sorted" and "ordered", before we can
go on ?
 
B

Bengt Richter

Um,

what would be the definition of "sorted" and "ordered", before we can
go on ?
For me the implication of "sorted" is that there is a sorting algorithm
that can be used to create an ordering from a prior state of order,
whereas "ordered" could be the result of arbitrary permutation, e.g.,
manual shuffling, etc. Of course either way, a result can be said
to have a particular defined order, but "sorted" gets ordered
by sorting, and "ordered" _may_ get its order by any means.

Regards,
Bengt Richter
 
B

bonono

Bengt said:
For me the implication of "sorted" is that there is a sorting algorithm
that can be used to create an ordering from a prior state of order,
whereas "ordered" could be the result of arbitrary permutation, e.g.,
manual shuffling, etc. Of course either way, a result can be said
to have a particular defined order, but "sorted" gets ordered
by sorting, and "ordered" _may_ get its order by any means.
But Alex seems to think that based on another external table should be
classified as "sorted" whereas I would consider it as "manual
shuffling", thus "ordered".

I may be wrong it interpreting him though, which is why I want to
clarify.
 
G

Ganesan Rajagopal

*by order of key insertion*: Java, PHP > Ordered *by other
criteria*: LISP, C++ Java supports both ordered by key insertion
(LinkedHashMap) as well as ordered by key comparison (TreeMap).
Ganesan
 
A

Alex Martelli

But Alex seems to think that based on another external table should be
classified as "sorted" whereas I would consider it as "manual
shuffling", thus "ordered".

I may be wrong it interpreting him though, which is why I want to
clarify.

What you can obtain (or anyway easily simulate in terms of effects on a
loop) through an explicit call to the 'sorted' built-in, possibly with a
suitable 'key=' parameter, I would call "sorted" -- exactly because, as
Bengt put it, there IS a sorting algorithm which, etc, etc (if there
wasn't, you couldn't implement it through the 'sorted' built-in!).

So, any ordering that can be reconstructed from the key,value data held
in a dict (looking up some combinations of those in an external table is
nothing special in these terms) falls under this category. But, a dict
does not record anything about what was set or changed or deleted when;
any ordering which requires access to such information thus deserves to
be placed in a totally separate category.


Alex
 
B

bonono

Alex said:
What you can obtain (or anyway easily simulate in terms of effects on a
loop) through an explicit call to the 'sorted' built-in, possibly with a
suitable 'key=' parameter, I would call "sorted" -- exactly because, as
Bengt put it, there IS a sorting algorithm which, etc, etc (if there
wasn't, you couldn't implement it through the 'sorted' built-in!).

So, any ordering that can be reconstructed from the key,value data held
in a dict (looking up some combinations of those in an external table is
nothing special in these terms) falls under this category. But, a dict
does not record anything about what was set or changed or deleted when;
any ordering which requires access to such information thus deserves to
be placed in a totally separate category.
But I can also record these changes in a seperate table which then
becomes a "sorted" case ?
 
A

Alex Martelli

But I can also record these changes in a seperate table which then
becomes a "sorted" case ?

somedict['x']='y', per se, does no magic callback to let you record
anything when type(somedict) is dict. You can wrap or subclass to your
heart's content to record insertion/deletion/update history, but that
ever-changing "seperate [[sic]] table" is entirely coupled to
'somedict', not therefore "separate" at all, and should properly be kept
as an instance variable of your wrapper or subclass.

That's a pretty obvious difference from cases in which the auxiliary
table used to define the ordering is REALLY *separate* -- independent of
the insertion/etc history of the dictionaries it may be used on.


Alex
 
B

bonono

Alex said:
But I can also record these changes in a seperate table which then
becomes a "sorted" case ?

somedict['x']='y', per se, does no magic callback to let you record
anything when type(somedict) is dict. You can wrap or subclass to your
heart's content to record insertion/deletion/update history, but that
ever-changing "seperate [[sic]] table" is entirely coupled to
'somedict', not therefore "separate" at all, and should properly be kept
as an instance variable of your wrapper or subclass.

That's a pretty obvious difference from cases in which the auxiliary
table used to define the ordering is REALLY *separate* -- independent of
the insertion/etc history of the dictionaries it may be used on.
So, it depends.
 
S

Steve Holden

Christoph said:
This is probably a FAQ, but I dare to ask it nevertheless since I
haven't found a satisfying answer yet: Why isn't there an "ordered
dictionary" class at least in the standard list? Time and again I am
missing that feature. Maybe there is something wrong with my programming
style, but I rather think it is generally useful.

I fully agree with the following posting where somebody complains why so
very basic and useful things are not part of the standard library:
http://groups.google.com/group/comp.lang.python/msg/e652d2f771a49857

Are there plans to get it into the standard lib sometime?
We're always encouraging new posters to use "good" subject lines.
Several hundred posts after your original question I think everyone can
agree that this was a *very* good subject line :)

Perhaps now the answer top your question is more obvious: there is by no
means universal agreement on what an "ordered dictionary" should do.
Given the ease with which Python allows you to implement your chosen
functionality it would be presumptuous of the core developers to favour
any one of the several reasonable alternatives that might be chosen.

regards
Steve
 
B

bonono

Steve said:
Perhaps now the answer top your question is more obvious: there is by no
means universal agreement on what an "ordered dictionary" should do.
Given the ease with which Python allows you to implement your chosen
functionality it would be presumptuous of the core developers to favour
any one of the several reasonable alternatives that might be chosen.
It seems to be though as "ordered dictionary" are slowly to be confined
to only "ordered on order of change to the dictionary".
 
F

Fuzzyman

Christoph said:
Basically, it is what I had in mind. But I'm now seeing some things that
could be improved/supplemented, e.g.
- performance improvements

Implementation changes to follow, based on suggestions in this thread.
For *optimal* performance it needs to be implemented in C of course.
:)

As F points out, a list of tuples may give you a data structure that
can be accessed quicker (although some operations will be slower). An
ordered dictionary will be a lot easier to use and make your code more
readable though.
- the internal keys list should be hidden

I disagree. It is exposed so that you can manually change the order
(e.g. to create a "sorted" dict, rather than one ordered by key
insertion).

What do you *gain* by hiding it ?
- list methods should be mixed in instead

Hmm... I did consider this. Which list methods would you consider
appropriate ?

The difficulty is that integers are valid keys for a dictionary.
Perhaps a subclass that uses integers as indexes instead...

You can always perform list operations on the ``sequence`` attribute of
course.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
F

Fuzzyman

Bengt said:
Hmmm.. it's *the* repository for Python code. I expect quite a few
people use it...

:)
I hadn't realized how much stuff was there. I generally google for stuff,
but I will be looking directly now.

BTW, I made a mod[1] to your odict that I think/hope is going to be generally faster.
It requires 2.4 though. It passes the same doctest, so its probably close to the same.
It stores the ordering info differently, but is also a dict subclass.

odict maintains compatibility with Python 2.2 - so it's not a change
we'd put back into the module I don't think.

Changes that keep compatibility are very welcoemd though.
Do you happen to have a timing test that exercises various aspects, so I can try it
before I embarrass myself? Otherwise I guess I'll write something.

The only tests we have are the ones in the module.

If you create some timing tests we'd be interested in having access to
them though. They would be a useful addition.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
Would the would-be users chime in with some idea of what kinds of operations are
most important timing-wise? Which would get the most use? How dynamic would ordering typically be?

[1] fairly pervasive little mods actually
[ 9:15] C:\pywk\clp>diff odict.py odictb.py |wc
146 458 4948

[ 9:15] C:\pywk\clp>wc odict*.py
467 1228 12483 odict.py
511 1500 14728 odictb.py
978 2728 27211 Totals

Regards,
Bengt Richter
 
F

Fuzzyman

Christoph said:
One implementation detail that I think needs further consideration is in
which way to expose the keys and to mix in list methods for ordered
dictionaries.

In http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/107747
the keys are exposed via the keys() method which is bad. It should be a
copy only, like for ordinary dicts (one comment also mentions that).

In Foord/Larosa's odict, the keys are exposed as a public member which
also seems to be a bad idea ("If you alter the sequence list so that it
no longer reflects the contents of the dictionary, you have broken your
OrderedDict").

So don't do it. ;-)
I think it would be probably the best to hide the keys list from the
public, but to provide list methods for reordering them (sorting,
slicing etc.).

For instance:

d1 = OrderedDict( (1, 11), (2, 12), 3, 13) )

d1[1:] ==> OrderedDict( (2, 12), 3, 13) )

So what do you want returned when you ask for d1[1] ? The member keyed
by 1, or the item in position 1 ?

You can access the members using list operations on the sequence
attribute.

E.g. d1[d1.sequence[index]]

This could probably be made more convenient.
d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )

d1.reverse() ==> OrderedDict( (3, 13), (2, 12), 1, 11) )

Interesting idea.
d1.insert(1, (4, 14))
==> OrderedDict( (1, 11), (4, 14), (2, 12), 3, 13) )

Also an interesting idea.
etc.

But no other way to directly manipulate the keys should be provided.

Why - don't trust yourself with it ?

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,077
Latest member
SangMoor21

Latest Threads

Top