Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

Fuzzyman said:
Of course ours is ordered *and* orderable ! You can explicitly alter
the sequence attribute to change the ordering.

What I actually wanted to say is that there may be a confusion between a
"sorted dictionary" (one where the keys are automatically sorted) and an
"ordered dictionary" (where the keys are not automatically ordered, but
have a certain order that is preserved). Those who suggested that the
"sorted" function would be helpful probably thought of a "sorted
dictionary" rather than an "ordered dictionary."

-- Christoph
 
C

Christoph Zwerschke

Bengt said:
After finally reading that the odict.py in PyPI by Larosa/Foord was what was desired,
but slower in use than what Fredrik posted, I decided to see if I could speed up odict.py.
I now have a version that I think may be generally faster.

Hm, I wouldn't formulate it that way that I want the odict of
Larosa/Foord, but I want "the one" "obvious" odict for which
Larosa/Foord have already made one implementatin ;-)

Others are here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/438823
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/107747
http://pleac.sourceforge.net/pleac_python/hashes.html#AEN250

It is all essentially the same idea I think (though after having a
closer look I see implementation shortcomings in all of them).
> I now have a version that I think may be generally faster.

Great. I also wanted to do that. Also, I would like to add some
functionality to Larosa/Foord's odict, like creating or updating an
odict from an ordinary dict (keys taken over from the ordinary dict will
be either in random order or automatically sorted). An ordered
dictionary should also have methods for sorting (like PHP's ksort()).
This way, you could initialize an ordered dict from an ordinary dict,
sort it, and from then on never care to call keys().sorted() or
something when iterating over the dictionary. Probably there are other
methods from lists that could be taken over to ordered dicts.

-- Christoph
 
C

Christoph Zwerschke

def __init__(self, init_val = ()):
>> dict.__init__(self, init_val)
>> self.sequence = [x[0] for x in init_val]
> But that doesn't allow you to create an ordered dict from another
> ordered dict.

Right; I did not want to present a full implementation, just the
relevant part. Of course, a real implementation should also allow to
build an ordered dict from another ordered dict or an ordinary dict. (In
the latter case, maybe the keys should be automatically sorted.) But one
or two case disctinctions would not make things mentionable slower.

-- Christoph
 
C

Carsten Haese

Would the default semantics below really be that suprising?

"An ordered dictionary remembers the order in which keys are first seen
[...] Overwriting an entry replaces
its value, but does not affect its position in the key order. Removing
an entry (using 'del') _does_ remove it from the key order. Accordingly,
if the entry is later recreated, it will then occur last in the key
order. [...]"
I can't help but I still find it unambiguous and intuitive enough to
consider it "the one" standard implementation for ordered dictionaries.

I don't think it's intuitive if you can't describe it without
contradicting yourself. If the order of the keys really were the order
in which they were first seen by the dictionary, deleting and recreating
a key should maintain its original position.

-Carsten
 
K

Kay Schluehr

But, as has been mentioned**n, this is only one example of an ordering one
could make default for an "ordered" dictionary. Suppose you say it should
be ordered by insertion order, so
d = OrderedDict(); d[1]='one'; d[2]='two' =>> list(d) => [1, 2]
ok, now we do d[1]='ein' and what is the order? list(d) => [2, 1] ??
Or do replacements not count as "insertions"?

"Insertion-order" is not a good term. For a dictionary {key:value} pair
creation, updating and deletion are possible modifying operations. I
don't see how deletion influences the order so creation and updating
remains. Therefore you have to select beween a creation_order and an
update_order. This can be reduced to one additional keyword e.g.
"creation_order" that is assigned a boolean value which is true by
default.
The devil is always going
to be in the details. Maybe you want a model that works more like a list
of key:value pairs with just optimized access to a pair by key name as
well as position in the list. Or maybe you want to permit append and
NOT prevent [('a',1), ('a':2)] and maybe d['a'] => [1, 2] ???

As far as I understand the requirement an odict provides the same
interface as a dict. The only difference is a certain order of the keys
that is induced by operations on a dict and cannot be established by
properties of the keys ( or values ) itself.
Note that is isn't hard to snap a few pieces together to make an ordered
dict to your own specs. But IMO it belongs in pyPI or such, not in the system
library. At least until it gets a lot of mileage -- and MMV ;-)

It's also not very hard to write a hex2ascii converter. That's the
reason why 20 incompatible versions of it ( coded in C ) exists in my
department ;)

Kay

PS. Here is some attempt of my own to implement an odict, following the
discussion here.
The implementation highlights just the model and is incomplete:

class odict(dict):
def __init__(self, create_order = True):
dict.__init__(self)
self.create_order = create_order
self.__cnt = 0

def __setitem__(self, key, value):
val = dict.get(self,key)
if val and self.create_order:
dict.__setitem__(self, key, (val[0], value))
else:
self.__cnt+=1
dict.__setitem__(self, key, (self.__cnt, value))

def __getitem__(self, key):
return dict.__getitem__(self, key)[1]

def values(self):
return list(zip(*sorted(dict.values(self)))[1])

def keys(self):
ks = [(dict.get(self,k)[0],k) for k in dict.keys(self)]
return list(zip(*sorted(ks))[1])
od = odict()
od["a"] = 0
od["b"] = 8
od.keys()
["a", "b"]
od = odict(create_order = False)
od["a"] = 1
od["b"] = 2
od["a"] = 3
od.keys()
["b", "a"]
 
C

Christoph Zwerschke

One implementation detail that I think needs further consideration is in
which way to expose the keys and to mix in list methods for ordered
dictionaries.

In http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/107747
the keys are exposed via the keys() method which is bad. It should be a
copy only, like for ordinary dicts (one comment also mentions that).

In Foord/Larosa's odict, the keys are exposed as a public member which
also seems to be a bad idea ("If you alter the sequence list so that it
no longer reflects the contents of the dictionary, you have broken your
OrderedDict").

I think it would be probably the best to hide the keys list from the
public, but to provide list methods for reordering them (sorting,
slicing etc.).

For instance:

d1 = OrderedDict( (1, 11), (2, 12), 3, 13) )

d1[1:] ==> OrderedDict( (2, 12), 3, 13) )

d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )

d1.reverse() ==> OrderedDict( (3, 13), (2, 12), 1, 11) )

d1.insert(1, (4, 14))
==> OrderedDict( (1, 11), (4, 14), (2, 12), 3, 13) )

etc.

But no other way to directly manipulate the keys should be provided.

-- Christoph
 
C

Christoph Zwerschke

Sure. Others think so too. The problem is that if you and these other
people actually write down exactly how this unambigous ordered dict
should behave, you'll end up with a dozen different sets of semantics,
which can be divided in at least three main groups.

That's the point where I dare to disagree. As I have pointed out in
another posting in this thread, all other implementations have the same
semantics for the basic behavior. I cannot see three different groups.

Again, what's so surprising as the "natural" semantics described here:
http://mail.python.org/pipermail/python-dev/2005-March/052041.html

-- Christoph
 
C

Carsten Haese

In Foord/Larosa's odict, the keys are exposed as a public member which
also seems to be a bad idea ("If you alter the sequence list so that it
no longer reflects the contents of the dictionary, you have broken your
OrderedDict").

That could easily be fixed by making the sequence a "managed property"
whose setter raises a ValueError if you try to set it to something
that's not a permutation of what it was.
d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )

What do you think you're doing here?

-Carsten
 
C

Christoph Zwerschke

Carsten said:
I don't think it's intuitive if you can't describe it without
contradicting yourself. If the order of the keys really were the order
in which they were first seen by the dictionary, deleting and recreating
a key should maintain its original position.

Admitted that description was not perfect (anyway it was not mine ;-)).

If a key is deleted, it is deleted. I don't think it is intuitive to
expect that the dict "remembers" a deleted item. If it is added again,
it is like it is seen for the first time and thus appended.

I don't think your argument viliates what I said in my last post.

-- Chris
 
C

Christoph Zwerschke

I still believe that the concept of an "ordered dictionary" ("behave
Bengt said:
Does that mean that the Larosa/Foord odict.py implementation in PyPI
does not do what you want?

Basically, it is what I had in mind. But I'm now seeing some things that
could be improved/supplemented, e.g.
- performance improvements
- the internal keys list should be hidden
- list methods should be mixed in instead

-- Christoph
 
C

Christoph Zwerschke

Carsten said:
That could easily be fixed by making the sequence a "managed property"
whose setter raises a ValueError if you try to set it to something
that's not a permutation of what it was.

Ok, a managed attribute would be an option. You would allow people to do
what they want with the sequence, and then fix the dictionary
accordingly (if items were deleted from the sequence, they are deleted
from the dictionary, it items were added which are not in the directory,
a ValueError is raised etc.).

But probably it is still better to hide the sequence and instead of
letting people do list operations like sort() on the odict.sequence, let
them do these operations on odict directly.
d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )
What do you think you're doing here?

Sorry, what I meant was

d1[0:0] + d1[2:2] ==> OrderedDict( (1, 11), (3, 13) )

Ordered dictionaries could allow slicing and concatenation.

-- Christoph
 
C

Christoph Zwerschke

d1[0:0] + d1[2:2] ==> OrderedDict( (1, 11), (3, 13) )

Oops, sorry, that was nonsense again. I meant
d1[0:1] + d1[1:2] ==> OrderedDict( (1, 11), (3, 13) )
Ordered dictionaries could allow slicing and concatenation.

Some operations such as concatenation need of course special
considerations, since the keys must stay unique. A concatenation of
ordered dicts with overlapping keys should probably give an IndexError.

-- Christoph
 
B

Bengt Richter

Carsten said:
That could easily be fixed by making the sequence a "managed property"
whose setter raises a ValueError if you try to set it to something
that's not a permutation of what it was.

Ok, a managed attribute would be an option. You would allow people to do
what they want with the sequence, and then fix the dictionary
accordingly (if items were deleted from the sequence, they are deleted
from the dictionary, it items were added which are not in the directory,
a ValueError is raised etc.).

But probably it is still better to hide the sequence and instead of
letting people do list operations like sort() on the odict.sequence, let
them do these operations on odict directly.
d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )
What do you think you're doing here?

Sorry, what I meant was

d1[0:0] + d1[2:2] ==> OrderedDict( (1, 11), (3, 13) )

Ordered dictionaries could allow slicing and concatenation.
Those are zero-length slices in normal notation. ITYM [0:1] and [2:3]?Note that you'd have to define addition as possible replacement,
if all the keys happened to match. Or pure concat if none matched,
and variations mixing both ways.
But with the current version you can already write that as

OrderedDict(d1.items()[0:1]+d2.items()[2:3])

you just want the sugar? d1+d2 would be like using [:] in the above line
Not a biggie to do.

Regards,
Bengt Richter
 
B

Bengt Richter

def __init__(self, init_val = ()):
dict.__init__(self, init_val)
self.sequence = [x[0] for x in init_val]
But that doesn't allow you to create an ordered dict from another
ordered dict.

Right; I did not want to present a full implementation, just the
relevant part. Of course, a real implementation should also allow to
build an ordered dict from another ordered dict or an ordinary dict. (In
the latter case, maybe the keys should be automatically sorted.) But one
or two case disctinctions would not make things mentionable slower.
Since the OrderedDict constructor takes a sequence of tuples as a legitimate
argument, you can always create an ordered dict from an unordered by getting
unordered_dict.items() and sorting or ordering them any way you want and calling
the OrderedDict constructor. Ditto for ordered dicts, since they give your their
ordered items with the items() method as a start. I guess one could pass a
key=fun keyword arg to the OrderedDict constuctor to imply a pre-construction sort.

Regards,
Bengt Richter
 
B

Bengt Richter

It's also not very hard to write a hex2ascii converter. That's the
reason why 20 incompatible versions of it ( coded in C ) exists in my
department ;)
Bicycle shed effect, I guess ;-)
Kay

PS. Here is some attempt of my own to implement an odict, following the
discussion here.
The implementation highlights just the model and is incomplete:
This is essentially the tack I took in modifying odict.py, except I
added optional caching of sorted items, and other minor differences.
class odict(dict):
def __init__(self, create_order = True):
dict.__init__(self)
self.create_order = create_order
self.__cnt = 0

def __setitem__(self, key, value):
val = dict.get(self,key)
if val and self.create_order:
dict.__setitem__(self, key, (val[0], value))
else:
self.__cnt+=1
dict.__setitem__(self, key, (self.__cnt, value))

def __getitem__(self, key):
return dict.__getitem__(self, key)[1]

def values(self):
return list(zip(*sorted(dict.values(self)))[1])
maybe more directly
return [v for i,v in sorted(dict.values(self))]
def keys(self):
ks = [(dict.get(self,k)[0],k) for k in dict.keys(self)]
return list(zip(*sorted(ks))[1])
or (untested)
def keys(self):
return [k for k,v in sorted(dict.items(self), key=operator.itemgetter(1))]
def items(self):
return [(k,v[1]) for k,v in sorted(dict.items(self), key=operator.itemgetter(1))]
od = odict()
od["a"] = 0
od["b"] = 8
od.keys()
["a", "b"]
od = odict(create_order = False)
od["a"] = 1
od["b"] = 2
od["a"] = 3
od.keys()
["b", "a"]


Regards,
Bengt Richter
 
B

bonono

Bengt said:
Sorry, I thought you wanted an ordered dict too.
I want/need(well I am told I don't need) to loop over a dict in certain
order but I don't want or need a standard one as I don't think there is
ONE implementation of it. My original post was a response to the
question "why do one want ordered dict", in the tone of "there is no
way one wants it".
I like solving problems. I just get frustrated when people don't focus on getting
the problem defined, which IME is 2/3 of the way to a solution. I don't mind,
in fact enjoy, rambling musings, but if someone seems actually to want a solution
for something, I like to try to realize it concretely.
I tried to define the problem, and how I solve it(if it helps to convey
the message), but was told you don't have the problem in the first
place.
 
T

Tom Anderson

That could easily be fixed by making the sequence a "managed property"
whose setter raises a ValueError if you try to set it to something
that's not a permutation of what it was.

I'm not a managed property expert (although there's a lovely studio in
Bayswater you might be interested in), but how does this stop you doing:

my_odict.sequence[0] = Shrubbery()

Which would break the odict good and hard.

tom
 
T

Tom Anderson

One implementation detail that I think needs further consideration is in
which way to expose the keys and to mix in list methods for ordered
dictionaries.

In Foord/Larosa's odict, the keys are exposed as a public member which
also seems to be a bad idea ("If you alter the sequence list so that it
no longer reflects the contents of the dictionary, you have broken your
OrderedDict").

I think it would be probably the best to hide the keys list from the public,
but to provide list methods for reordering them (sorting, slicing etc.).

I'm not too keen on this - there is conceptually a list here, even if it's
one with unusual constraints, so there should be a list i can manipulate
in code, and which should of course be bound by those constraints.

I think the way to do it is to have a sequence property (which could be a
managed attribute to prevent outright clobberation) which walks like a
list, quacks like a list, but is in fact a mission-specific list subtype
whose mutator methods zealously enforce the invariants guaranteeing the
odict's integrity.

I haven't actually tried to write such a beast, so i don't know if this is
either of possible and straightforward.

tom
 
B

Bengt Richter

One implementation detail that I think needs further consideration is in
which way to expose the keys and to mix in list methods for ordered
dictionaries.

In http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/107747
the keys are exposed via the keys() method which is bad. It should be a
copy only, like for ordinary dicts (one comment also mentions that).

In Foord/Larosa's odict, the keys are exposed as a public member which
also seems to be a bad idea ("If you alter the sequence list so that it
no longer reflects the contents of the dictionary, you have broken your
OrderedDict").

I think it would be probably the best to hide the keys list from the
public, but to provide list methods for reordering them (sorting,
slicing etc.).

For instance:

d1 = OrderedDict( (1, 11), (2, 12), 3, 13) )

d1[1:] ==> OrderedDict( (2, 12), 3, 13) )

d1[0] + d1[2] ==> OrderedDict( (1, 11), (3, 13) )

d1.reverse() ==> OrderedDict( (3, 13), (2, 12), 1, 11) )

d1.insert(1, (4, 14))
==> OrderedDict( (1, 11), (4, 14), (2, 12), 3, 13) )

etc.

But no other way to directly manipulate the keys should be provided.
>>> from odictb import OrderedDict
>>> d1 = OrderedDict([(1, 11), (2, 12), (3, 13)])
>>> d1 {1: 11, 2: 12, 3: 13}
>>> d1[1:] {2: 12, 3: 13}
>>> d1[0:1] + d1[2:3] {1: 11, 3: 13}
>>> d1.reverse()
>>> d1 {3: 13, 2: 12, 1: 11}
>>> d1.insert(1, (4,14))
>>> d1 {3: 13, 4: 14, 2: 12, 1: 11}
>>> d1.items() [(3, 13), (4, 14), (2, 12), (1, 11)]
>>> d1.keys() [3, 4, 2, 1]
>>> d1.values() [13, 14, 12, 11]
>>> d1[1:2] {4: 14}
>>> d1[-1:]
{1: 11}

Que mas?

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,246
Latest member
softprodigy

Latest Threads

Top