Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

I disagree. It is exposed so that you can manually change the order
(e.g. to create a "sorted" dict, rather than one ordered by key
insertion). What do you *gain* by hiding it ?

See my other posting. Of course hiding the list can only be done, if

In this case, you can change the order directly by using the list
methods on the dictionary. Sorting would be an example. Instead of

d.sequence = sorted(d.sequence)

you could simply write d.sort() which does the same.
Hmm... I did consider this. Which list methods would you consider
appropriate ?

Everything method that does not clash with the use as dictionary. For
instance, both lists and dicts have __getitem__ and __setitem__, so im
this case, the dictionary method must take precedence. But a dictionary
has not __getslice__ and __setslice__, so here the list methods can be
used (__getslice__ is actually deprecated, but you get the idea). In
some cases, like __delitem__, both have it, but there is no clash.

Other interesting methods are sort() and reverse().

Here, we have another problem however: There is not only the list of
keys, but also the list of values, and sometimes, as in the case of
sort() and reverse(), it would be also nice to have it operate on the
list of values. How should this be done? PHP solves it by using two
different methods ksort and asort for keys and values. In this notation:

d = OrderedDict( (1,13), (3,12), (2,11) )

d.ksort() ==> d = ( (1,13), (2,11), (3,12) )
d.asort() ==> d = ( (2,11), (3,12), (1,13) )

Similar for reverse().

If the keys() and values() methods would be extended to be setters, then
d.ksort() = d.keys(sorted(d.keys())) and
d.asort() = d.values(sorted(d.values()))

Anyway, I don't like "ksort" and "asort". If it must be, I'd rather use

d.ksort() = d.sortkeys()
d.asort() = d.sortvalues()

d.sort() could default to one of them (not sure which one).

-- Christoph
 
C

Christoph Zwerschke

Fuzzyman said:
That's not the only use case. Other use cases are to have a specific
order, not based on entry time.

Simple example :
d1 = OrderedDict(some_dict.items())
d1.sequence.sort()
d1 is now an ordered dict with the keys in alphabetic order.

As I said, I would not need to access the sequence, if I can write
d1.sort() or d1.sortkeys()
If you don't want to modify sequence, don't. If you want a copy do :
seq = d1.sequence[:]

This is not needed since you can do the same with: seq = d1.keys()

-- Christoph
 
C

Christoph Zwerschke

Carsten said:
Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

Exactly. But I don't think in this case such methods would be needed.
You easily get the i-th value in the ordered dict as d.values().

-- Chris
 
C

Christoph Zwerschke

Here is another old posting I just found which again gives the same use
cases and semantics:

http://mail.python.org/pipermail/python-dev/2005-March/051974.html

"Keys are iterated over in the order that they are added. Setting a
value using a key that compares equal to one already in the dict
replaces the value, but not the key, and does not change the iteration
order. Removing a key (and value) then re-adding it moves the key to the
end of the iteration order."

So far all those who would like to have an ordered dict seem to have the
same semantics in mind.
 
C

Carsten Haese

Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

Exactly. But I don't think in this case such methods would be
needed. You easily get the i-th value in the ordered dict as
d.values().

-- Chris[/QUOTE]

True enough, but unless the odict has its list of values on hand, you're
asking the odict to build a list of all its values just so that you can get
the i'th element. Having a method that does the equivalent of d[d.sequence]
would be cleaner and more efficient.

-Carsten
 
A

Alex Martelli

Christoph Zwerschke said:
d.ksort() = d.sortkeys()
d.asort() = d.sortvalues()

d.sort() could default to one of them (not sure which one).

Define JUST d.sort, you can trivially implement the other as
d.sort(key=d.get).


Alex
 
F

Fuzzyman

Alex said:
Freedom of implementation, of course. E.g., I can perhaps get better
performance by keeping a red-black tree of (insertiontime,key) pairs, so
for example deleting a key is O(log(N)) rather than O(N) as it would
have to be if the sequence had to be kept as a Python list.

What ELSE do you ever gain by enforcing abstraction and information
hiding? Conceptual integrity and implementation freedom...

The poster I was replying to inferred that we should hide it to protect
him from breaking it. :)

Your reason is much more valid than the one I inferred. (And probably
worth chanign the code for).
What about allowing slicing, taking advantage of the fact that slices
are NOT valid dictionary keys?

Hmm... ok, simple enough to implement. What would anyone *use* it for
though ?
Presumably sort and reverse methods are also desired.

Yeah - good idea. Insert is also good - I can't remember if we provide
this or not.

Thanks


Fuzzyman
http://www.voidspac.org.uk/python/index.shtml
 
F

Fuzzyman

Christoph said:
Fuzzyman said:
So what do you want returned when you ask for d1[1] ? The member keyed
by 1, or the item in position 1 ?

In case of conflict, the ordered dictionary should behave like a
dictionary, not like a list. So d1[1] should be the member keyed by 1,
not the item in position 1. Only in case there is no member keyed by 1,
the item in position 1 could be returned, but I think that would be too
adventurous a hattrick and can lead to big confusion. Better to raise a
KeyError in that case.

I agree - hard to trace bugs lie this way....

[snip..]
Instead of writing d.sequence = new_sequence, I would write
d.setkeys(new_sequence). But I'm not sure what to do if new_sequence is
not a permutation of the old one. Raise a KeyError? Or even swallow
this? For instance

This is an interesting question - I don't know which is the best
behaviour here.

It's probably better to raise a KeyError - that way the error will
occur when it happens.

The simplest way of doing this is to check that the new key list
(sorted) is the same as the original one (sorted). This is potentially
quite an expensive operation.

It might be faster to compare sets with Python 2.4 - but we intend to
retain compatibility with Python 2.2.
d = OrderedDict((1,11), (2,12))

d.setkeys((2, 1)) ==> d = OrderedDict((2, 11), (1, 12))

d.setkeys((3, 4)) ==> d = OrderedDict((3, 11), (4, 12)) (or KeyError?)

d.setvalues((12, 11)) ==> d = OrderedDict((1, 12), (2, 11))

d.setvalues((13, 14)) ==> d = OrderedDict((1, 13), (2, 14)) (always ok)

(Or maybe better set_keys in analogy to has_key?)

I hesitate making keys and values managed properties, because this would
conflict with them being methods in ordinary dicts. Ordered dicts should
resemble ordinary dicts as closely as possible. And giving it a
different name like "sequence" I find confusing and unintuitive.

You really want to be able to set values as a sequence ? I suppose it's
even easier to implement than changing keys, but I'd never considered
it.
A resort could be the following: If keys() is given a sequence as
argument, then use this as the new sequence of keys, and similar with
values(). This way, no new methods need to be introduced.

That's not a bad idea. I'll chat with Nicola Larosa and see what he
thinks.

It does break backawards compatibility of course...

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
D

Duncan Booth

Christoph said:
You're right with both. The ECMA language definition says object
properties are an unordered collection, but MSIE and probably other
browsers keep the order in which they were created. Of course one
should not rely on that.
Yes, the real fun bit is arrays. If you create an array using the 'new
Array' or [ ... ] then a for..in loop might well trick you into thinking it
is going to go through the elements in order, but if you assign to elements
directly it breaks down:

a = ['a', 'b', 'c'];
a[4] = 'e';
a[3] = 'd';
for (var k in a) {
alert(a[k]+'='+a[k]);
}

On IE this will go through elements in the order 0, 1, 2, 4, 3.

Also, it is original order of insertion which matters, not the 'last
in/last out' you might have expected. In other words it looks rather as
though IE simply sets deleted properties to undefined (and skips them on
iteration), it never really deletes a property, so anyone who tries to use
a Javascript object as an associative array with lots of rapidly changing
keys could be in for a nasty shock.
 
F

Fuzzyman

Carsten said:
Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

Exactly. But I don't think in this case such methods would be
needed. You easily get the i-th value in the ordered dict as
d.values().

-- Chris


True enough, but unless the odict has its list of values on hand, you're
asking the odict to build a list of all its values just so that you can get
the i'th element. Having a method that does the equivalent of d[d.sequence]
would be cleaner and more efficient.
[/QUOTE]

I'm going to add some of the sequence methods. I'm *not* going to allow
indexing, but I will allow slicing.

You can also do d[d.keys()]

This provides two ways of fetching values by index, so I don't want to
add another.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
C

Christoph Zwerschke

Duncan said:
On IE this will go through elements in the order 0, 1, 2, 4, 3.

Oops! I bet most people would not expect that, and it is probably not
mentioned in most Javascript tutorials. I think this is a weakpoint of
the ECMA definition, not MSIE.

-- Christoph
 
F

Fuzzyman

Tom said:
I'm not too keen on this - there is conceptually a list here, even if it's
one with unusual constraints, so there should be a list i can manipulate
in code, and which should of course be bound by those constraints.

I think I am now in favour of hiding hte sequence attribute.

You will be able to mutate the the keys list through :

d1 = OrderedDict(some_sequence_of_items)
keys = d1.keys()
keys.sort() # or other mutation
d1.keys(keys)

Admittedly this is a lot slower than :

d1 = OrderedDict(some_sequence_of_items)
d1.sequence.sort()

*but* it frees the squence attribute from any implementation details.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
B

Bengt Richter

Fuzzyman said:
So what do you want returned when you ask for d1[1] ? The member keyed
by 1, or the item in position 1 ?

In case of conflict, the ordered dictionary should behave like a
dictionary, not like a list. So d1[1] should be the member keyed by 1,
not the item in position 1. Only in case there is no member keyed by 1,
the item in position 1 could be returned, but I think that would be too
adventurous a hattrick and can lead to big confusion. Better to raise a
KeyError in that case.
Why - don't trust yourself with it ?

No, because I think it is not needed if list operations like insert are
directly possible on your dictionary.

But maybe methods such as setkeys() and setvalues() would be nice to
have in addition.

Instead of writing d.sequence = new_sequence, I would write
d.setkeys(new_sequence). But I'm not sure what to do if new_sequence is
not a permutation of the old one. Raise a KeyError? Or even swallow
this? For instance

d = OrderedDict((1,11), (2,12))

d.setkeys((2, 1)) ==> d = OrderedDict((2, 11), (1, 12))

d.setkeys((3, 4)) ==> d = OrderedDict((3, 11), (4, 12)) (or KeyError?)

d.setvalues((12, 11)) ==> d = OrderedDict((1, 12), (2, 11))

d.setvalues((13, 14)) ==> d = OrderedDict((1, 13), (2, 14)) (always ok)

Too many weird possibilities for unexpected key-value associations.
d.setitems() might be safer, but see d.items[i:j] below (without eliminating d.items() ;-)

The implication above is that OrderedDict takes an *args argument,
but really it takes a single argument that is a sequence of k,v pairs,
(and maybe some keyword options).
(Or maybe better set_keys in analogy to has_key?)

I hesitate making keys and values managed properties, because this would
conflict with them being methods in ordinary dicts. Ordered dicts should
resemble ordinary dicts as closely as possible. And giving it a
different name like "sequence" I find confusing and unintuitive.

A resort could be the following: If keys() is given a sequence as
argument, then use this as the new sequence of keys, and similar with
values(). This way, no new methods need to be introduced.
You could make keys, values, and items custom descriptors which would define __call__
for the old key() etc accesses, well as __getitem__ and __setitem__. Then you could write

d.items[0], d.items[-1] = d.items[-1], d.items[0]

to swap the order of the first and last items in the thing (I hesitate to say dict ;-)
You could also operate on slices. BTW, before I showed an example where d[2:3] returned
a new dict instance rather than d.items()[:]. I think the items list is better, since
they naturally add, sort, reverse, etc as lists, and you can write OrderedDict(d[2:3]+d[1:2])
if you want a new dict.

A slice assignment to d[i:j] is really sugar for something, but we have to decide exactly
what in case of duplicate keys in the incoming items, either with each other (effictively
a shorter incoming list with the last key:value pairs replacing prior pairs with the same key,
but the first key determining placement -- but where? what if a key doesn't exists outside of
the target slice (hence not within the eliminated slice, since the whole target dict item list
has unique keys)? One way to define it would be
d.items[i:j] = itemseq

to be implemented as sugar for
newitems = d.items[:i] + list(itemseq) + d.items[j:]
d.clear()
d.update(newitems)
so
d.reverse()
could be spelled
d.items[:] = d.items[::-1]

If you are using slices, you can safely use them directly in the __getitem__ of th dict interface,
as I did in the "Que mas?" post. So we don't have to write d.items[i:j] and could write d[i:j].
The thing is, d[i:j] will tempt to skip ".items" in d.items and write d, which has the dict key meaning,
not the item list index. It is faster not have a descriptor between though.

I think maybe allowing write to keys or values is pretty iffy. There are too many weird
re-associations of keys and values possible, and which you could do my other means if you
really really wanted to. But I do think full index and slice read access would be fine.

I think also that d1==d2 should effectively be implemented as d1[:] == d2[:] -- i.e, compare
the item lists to implement comparisons.

Detailed requirements are most of the work ;-)
I'm thinking now to try subclassing list in a constrained way instead of dict, but well see.

Regards,
Bengt Richter
 
C

Christoph Zwerschke

Fuzzyman said:
I'm going to add some of the sequence methods. I'm *not* going to allow
indexing, but I will allow slicing.

You can also do d[d.keys()]

This provides two ways of fetching values by index, so I don't want to
add another.


And this would be probably faster than d.values() in the usual
implementation where values() has to be built first as Carsten noted.

-- Chris
 
C

Christoph Zwerschke

Fuzzyman said:
You will be able to mutate the the keys list through :

d1 = OrderedDict(some_sequence_of_items)
keys = d1.keys()
keys.sort() # or other mutation
d1.keys(keys)

Admittedly this is a lot slower than :

d1 = OrderedDict(some_sequence_of_items)
d1.sequence.sort()

*but* it frees the squence attribute from any implementation details.

You should also implement

d1.sort() or d1.sortkeys()

which will have no performance drawbacks.

-- Christoph
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top