Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date
K

Kay Schluehr

It seems to be though as "ordered dictionary" are slowly to be confined
to only "ordered on order of change to the dictionary".

While I'm only +0 for a standard odict I'm wondering that discussing
this topic leads to the auctoritative conclusion that it is unsolvable,
we have to accept infinite diversity etc. where people like me seeing a
classification immediately ( mathematical education? ) . Of course this
matter is trivial but we already know about monster-threads revolving
around decorator syntax ( including hurt souls and semi-scientific
papers ) and abandoning the print statement in Python 3.0.
 
D

Duncan Booth

Christoph said:
Ok, I just did a little research an compared support for ordered dicts
in some other languages:
Just to add to your list:

In Javascript Object properties (often used as an associative array) are
defined as unordered although as IE seems to always store them in the order
of original insertion it wouldn't surprise me if there are a lot of
websites depending on that behaviour.

Javascript Array indexes are also stored as properties and are therefore
also unordered.
 
B

Bengt Richter

While I'm only +0 for a standard odict I'm wondering that discussing
this topic leads to the auctoritative conclusion that it is unsolvable,
we have to accept infinite diversity etc. where people like me seeing a
classification immediately ( mathematical education? ) . Of course this
matter is trivial but we already know about monster-threads revolving
around decorator syntax ( including hurt souls and semi-scientific
papers ) and abandoning the print statement in Python 3.0.
I think the concept has converged to a replace-or-append-by-key ordering
of key:value items with methods approximately like a dict. We're now
into usability aspects such as syntactic sugar vs essential primitives,
and default behaviour vs selectable modes, ISTM.

E.g., it might be nice to have a mode that assumes d[key] is d.items()[k][1] when
key is an integer, and otherwise uses dict lookup, for cases where the use
case is just string dict keys.

But feature creep is sure a threat to clean design.

Regards,
Bengt Richter
 
R

Rick Wotnaz

I disagree. It is exposed so that you can manually change the
order (e.g. to create a "sorted" dict, rather than one ordered
by key insertion).

What do you *gain* by hiding it ?

The internal list should be 'hidden' in the sense that it itself
would not be modifiable, though it should be routine to obtain a
copy of it at any time. That copy could then be arranged as needed.
Any rearrangement of the original list's order destroys the reason
for having an entry- or change-ordered dict in the first place.
 
C

Carsten Haese

That could easily be fixed by making the sequence a "managed property"
whose setter raises a ValueError if you try to set it to something
that's not a permutation of what it was.

I'm not a managed property expert (although there's a lovely studio in
Bayswater you might be interested in), but how does this stop you doing:

my_odict.sequence[0] = Shrubbery()

It would only break if the getter returns the internal list directly.
The getter should return a copy instead, which is what the keys() method
already does anyway. This would ensure that the only way to alter the
sequence is by replacing it in its entirety.

-Carsten.
 
F

Fuzzyman

Rick said:
The internal list should be 'hidden' in the sense that it itself
would not be modifiable, though it should be routine to obtain a
copy of it at any time. That copy could then be arranged as needed.
Any rearrangement of the original list's order destroys the reason
for having an entry- or change-ordered dict in the first place.

That's not the only use case. Other use cases are to have a specific
order, not based on entry time.

Simple example :

d1 = OrderedDict(some_dict.items())
d1.sequence.sort()

d1 is now an ordered dict with the keys in alphabetic order.

If you don't want to modify sequence, don't. If you want a copy do :

seq = d1.sequence[:]

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
A

Alex Martelli

Fuzzyman said:
There is already an update method of course. :)

Slicing an ordered dictionary is interesting - but how many people are
actually going to use it ? (What's your use case)

I detest and abhor almost-sequences which can't be sliced (I consider
that a defect of collections.deque). If the ordered dictionary records
by its sequencing the time order of key insertion, being able to ask for
"the last 5 keys entered" or "the first 3 keys entered" seem to me to be
perfectly natural use cases, and most naturally accomplished by slicing
of course, d[-5:] and d[:3] respectively.


Alex
 
A

Alex Martelli

Fuzzyman said:
I disagree. It is exposed so that you can manually change the order
(e.g. to create a "sorted" dict, rather than one ordered by key
insertion).

What do you *gain* by hiding it ?

Freedom of implementation, of course. E.g., I can perhaps get better
performance by keeping a red-black tree of (insertiontime,key) pairs, so
for example deleting a key is O(log(N)) rather than O(N) as it would
have to be if the sequence had to be kept as a Python list.

What ELSE do you ever gain by enforcing abstraction and information
hiding? Conceptual integrity and implementation freedom...

Hmm... I did consider this. Which list methods would you consider
appropriate ?

The difficulty is that integers are valid keys for a dictionary.
Perhaps a subclass that uses integers as indexes instead...

What about allowing slicing, taking advantage of the fact that slices
are NOT valid dictionary keys?
Presumably sort and reverse methods are also desired.
You can always perform list operations on the ``sequence`` attribute of
course.

But NOT having to expose .sequence as a real mutable list whose changes
affect ordering has many advantages, as above noticed.


Alex
 
C

Christoph Zwerschke

Steve said:
Perhaps now the answer top your question is more obvious: there is by no
means universal agreement on what an "ordered dictionary" should do.
Given the ease with which Python allows you to implement your chosen
functionality it would be presumptuous of the core developers to favour
any one of the several reasonable alternatives that might be chosen.

The discussion showed indeed that there are some points which are not so
straightforward as I believed. But I still don't see the "several
reasonable alternatives". So far all implementations I have seen are
basically pretty similar. Is there any implementation that is basically
different from Foord/Larosa's odict? I still don't see a principal
problem here, just the problem that the existing implementations are not
well enough thought-out and sophisticated enough.
> Given the ease with which Python allows you to implement your chosen
> functionality it would be presumptuous of the core developers to
> favour any one of the several reasonable alternatives that might be
> chosen.

You could say the same about the idea of sets in Python, and it is
probably the reason why it took so much time until they became a part of
Python. I wished that had happened earlier, since sets are "the" basic
mathematic structure. I'm now very happy to have sets not only in the
standard lib but even as built-in types and I'm sure there hasn't been
"universal agreement" on how sets should be implemented either. I don't
think it was presumptuous to finally settle down on one implementation.

Of course, ordered dictionaries are no candidates for becoming built-in
data types, and the existing implementations are not sophisticated and
mature enough to go to the standard lib right now. But principally, if
an improved version is developed (maybe on the occasion of this thread)
and will be tested and proven, why should it not go to the standard lib
some day?

BTW, somebody has suggested it could go to "collections" which sounds
like the right place, but the description says the module is intended
for "High-performance container datatypes", and, as has been discussed,
ordered dictionaries are not used for performance or efficiency reasons,
but rather for reasons of convenience. So *if* they ever go to the
standard lib, I'm not sure whether "collections" would be the right
place. Or collections will need a different description - maybe there
are other interesting basic collection types which are chosen for
convenience, not for performance (for instance, ordered sets)?

-- Christoph
 
C

Christoph Zwerschke

It seems to be though as "ordered dictionary" are slowly to be confined
to only "ordered on order of change to the dictionary".

"Ordered dictionary" means that the keys are not an ordinary set like in
an ordinary dictionary, but an "ordered set." I think this definition is
pretty straightforward and common. An ordered set is the same as a
unique list, i.e. a list where all elements are unique.

When there is automatical ordering using a comparison function, I would
not speak of an "ordered directory", but of a "sorted directory." This
would be basically a dictionary plus a comparison function. The keys
would always be iterated in the order given by the comparison function.
It would be nice to have a sorted dictionary as well, even if you can
achieve the same by calling the sorted built-in on the keys. Think of a
sorted directory as a subclass of ordered directories. Implementing it
that way would even have perfomance benefits if the keys are inserted
using the bisect algorithm. This is better than calling sorted() on the
keys of an ordinary dictionary many times.

By the way, you will find the same terminology in Smalltalk, where
"SortedCollection" is a subclass of "OrderedCollection".

-- Christoph
 
C

Christoph Zwerschke

Bengt said:
> I think the concept has converged to a replace-or-append-by-key ordering
> of key:value items with methods approximately like a dict. We're now
> into usability aspects such as syntactic sugar vs essential primitives,
> and default behaviour vs selectable modes, ISTM.

Yes, and we would be good if we do not stop the discussion at this point
with nothing, but really create such a sophisticated implementation.
Whether it will become popular or go to the standard lib some day is a
completely different question.
> E.g., it might be nice to have a mode that assumes d[key] is d.items()[k][1] when
> key is an integer, and otherwise uses dict lookup, for cases where the use
> case is just string dict keys.

I also thought about that and I think PHP has that feature, but it's
probably better to withstand the temptation to do that. It could lead to
an awful confusion if the keys are integers.

-- Christoph
 
C

Christoph Zwerschke

* C++ has a Map template in the STL which is ordered (a "Sorted
Alex said:
Ordered *by comparisons on keys*, NOT by order of insertion -- an
utterly and completely different idea.

Shame on me. I talked so much about the difference between "ordered" and
"sorted" and now I wrote such a confusing sentence. You're right, C++
Maps are not an example for "ordered dictionaries", but for "sorted
dictionaries".

-- Christoph
 
C

Christoph Zwerschke

Duncan said:
In Javascript Object properties (often used as an associative array) are
defined as unordered although as IE seems to always store them in the order
of original insertion it wouldn't surprise me if there are a lot of
websites depending on that behaviour.

You're right with both. The ECMA language definition says object
properties are an unordered collection, but MSIE and probably other
browsers keep the order in which they were created. Of course one should
not rely on that.

-- Christoph
 
C

Carsten Haese

Bengt said:
I think the concept has converged to a replace-or-append-by-key ordering
of key:value items with methods approximately like a dict. We're now
into usability aspects such as syntactic sugar vs essential primitives,
and default behaviour vs selectable modes, ISTM.

Yes, and we would be good if we do not stop the discussion at this point
with nothing, but really create such a sophisticated implementation.
Whether it will become popular or go to the standard lib some day is a
completely different question.
E.g., it might be nice to have a mode that assumes d[key] is d.items()[k][1] when
key is an integer, and otherwise uses dict lookup, for cases where the use
case is just string dict keys.

I also thought about that and I think PHP has that feature, but it's
probably better to withstand the temptation to do that. It could lead to
an awful confusion if the keys are integers.

Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

-Carsten
 
C

Christoph Zwerschke

Alex said:
However, since Christoph himself just misclassified C++'s std::map as
"ordered" (it would be "sorted" in this new terminology he's now
introducing), it seems obvious that the terminological confusion is
rife. Many requests and offers in the past for "ordered dictionaries"
(e.g. on this group) were also "sorted", NOT "ordered", in this new
terminology.

I'm sorry again. Please don't conclude that others are as uneducated as
I am (I haven't studied computer science but mathematics). Speaking
about "ordered" and "sorted" in the context of collections is not a new
terminology I am introducing, but seems to be pretty common in computer
science (see, e.g.
http://www.gamespp.com/java/dataStructuresInJavaPart6.html).

Perhaps Pythonists are not used to that terminology, since they use the
term "list" for an "ordered collection". An ordered dictionary is a
dictionary whose keys are a (unique) list. Sometimes it is also called a
"sequence" (therefore in the odict implementation, the keys attribute
has this name). A unique list, in turn, can also be called an "ordered
set", so you can also understand an ordered dictionary as a dictionary
whose keys are an ordered set. I think all of this is pretty logical ;-)

-- Christoph
 
C

Christoph Zwerschke

Ganesan said:
the definition of "sorted" and "ordered", before we can > go on ? Sorted
would be ordered by key comparison. Iterating over such a container will
give you the keys in sorted order. Java calls this a SortedMap. See
http://java.sun.com/j2se/1.4.2/docs/api/java/util/SortedMap.html C++ STL
map container is also a Sorted Associative container. See
http://www.sgi.com/tech/stl/Map.html Ganesan

Exactly, that's "sorted." "Ordered" means the same there is some order
between the existing elements, but there is no magic (i.e. a general
comparison function) for ordering new elements. Thus, if you add an
element to an ordered collection, it simply gets appended (is considered
as the greatest of all elements) by convention, whereas if you add an
element to a sorted collection, it will be inserted into the correct
place by using the comparison function.

-- Christoph
 
C

Christoph Zwerschke

Alex said:
I detest and abhor almost-sequences which can't be sliced (I consider
that a defect of collections.deque). If the ordered dictionary records
by its sequencing the time order of key insertion, being able to ask for
"the last 5 keys entered" or "the first 3 keys entered" seem to me to be
perfectly natural use cases, and most naturally accomplished by slicing
of course, d[-5:] and d[:3] respectively.

I agree. A use case was requested: Say your dictionary holds form
fields, and you know the last field is always a hidden field you wont to
ignore in your output, or your dictionary holds attributes of a
database, and you don't want to print the first attribute which is
always some kind of uninteresting OID, then you would write
"for field in d[1:]" or "for field in d[:-1]".

(Otherwise, you would have to write "for field in d.keys()[1:]" etc.)

-- Christoph
 
C

Christoph Zwerschke

Bengt said:
from odictb import OrderedDict
d1 = OrderedDict([(1, 11), (2, 12), (3, 13)])
d1 {1: 11, 2: 12, 3: 13}
d1[1:] {2: 12, 3: 13}
d1[0:1] + d1[2:3] {1: 11, 3: 13}
d1.reverse()
d1 {3: 13, 2: 12, 1: 11}
d1.insert(1, (4,14))
d1 {3: 13, 4: 14, 2: 12, 1: 11}
d1.items() [(3, 13), (4, 14), (2, 12), (1, 11)]
d1.keys() [3, 4, 2, 1]
d1.values() [13, 14, 12, 11]
d1[1:2] {4: 14}
d1[-1:]
{1: 11}

Que mas?

Eso es exactamente lo que yo queria haber!

-- Chris
 
C

Christoph Zwerschke

I think it would be probably the best to hide the keys list from the
Tom said:
I'm not too keen on this - there is conceptually a list here, even if
it's one with unusual constraints, so there should be a list i can
manipulate in code, and which should of course be bound by those
constraints.

Think of it similar as the case of an ordinary dictionary: There is
conceptually a set here (the set of keys), but you cannot manipulate it
directly, but only through the according dictionary methods.

For an ordedred dictionary, there is conceptually a list (or more
specifically a unique list). Again you should not manipulate it
directly, but only through methods of the ordered dictionary.

This sounds at first more complicated, but is in reality more easy.

For instance, if I want to put the last two keys of an ordered dict d at
the beginning, I would do it as d = d[:-2] + d[-2:].

With the list attribute (called "sequence" in odict, you would have to
write: d.sequence = d.sequence[:-2] + d.sequence[-2:]. This is not only
longer to write down, but you also have to know that the name of the
attribute is "sequence". Python's strength is that you don't have to
keep many details in mind because it has a small "basic vocabulary" and
orthogonal use. I prefer the ordered dictionary does not introduce new
concepts or attributes if everything can be done intuitively with the
existing Python methods and operators.

-- Christoph
 
C

Christoph Zwerschke

Fuzzyman said:
So what do you want returned when you ask for d1[1] ? The member keyed
by 1, or the item in position 1 ?

In case of conflict, the ordered dictionary should behave like a
dictionary, not like a list. So d1[1] should be the member keyed by 1,
not the item in position 1. Only in case there is no member keyed by 1,
the item in position 1 could be returned, but I think that would be too
adventurous a hattrick and can lead to big confusion. Better to raise a
KeyError in that case.
Why - don't trust yourself with it ?

No, because I think it is not needed if list operations like insert are
directly possible on your dictionary.

But maybe methods such as setkeys() and setvalues() would be nice to
have in addition.

Instead of writing d.sequence = new_sequence, I would write
d.setkeys(new_sequence). But I'm not sure what to do if new_sequence is
not a permutation of the old one. Raise a KeyError? Or even swallow
this? For instance

d = OrderedDict((1,11), (2,12))

d.setkeys((2, 1)) ==> d = OrderedDict((2, 11), (1, 12))

d.setkeys((3, 4)) ==> d = OrderedDict((3, 11), (4, 12)) (or KeyError?)

d.setvalues((12, 11)) ==> d = OrderedDict((1, 12), (2, 11))

d.setvalues((13, 14)) ==> d = OrderedDict((1, 13), (2, 14)) (always ok)

(Or maybe better set_keys in analogy to has_key?)

I hesitate making keys and values managed properties, because this would
conflict with them being methods in ordinary dicts. Ordered dicts should
resemble ordinary dicts as closely as possible. And giving it a
different name like "sequence" I find confusing and unintuitive.

A resort could be the following: If keys() is given a sequence as
argument, then use this as the new sequence of keys, and similar with
values(). This way, no new methods need to be introduced.

-- Christoph
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,076
Latest member
OrderKetoBeez

Latest Threads

Top