Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date

Christoph Zwerschke

The implication above is that OrderedDict takes an *args argument,
but really it takes a single argument that is a sequence of k,v pairs,
(and maybe some keyword options).

Right. Interpret it as a short notation only, I did not want to change
the interface. I think it's a common mistake to forget the brackets
because you think one pair should be enough. At least I often forget it.
You could make keys, values, and items custom descriptors which would define __call__
for the old keys() etc. accesses, as well as __getitem__ and __setitem__. Then you could write

d.items[0], d.items[-1] = d.items[-1], d.items[0]

to swap the order of the first and last items in the thing (I hesitate to say dict ;-)
You could also operate on slices.

Nice. You could also get the i-th value with d.values[i].

But is this considered good style or would it be considered "dirty" to
have a callable member that also supports indexing and slicing? (I don't
know, just asking?) Plus, it opens a can of worms by increasing the
complexity tremendously. Let's see whether this can be handled.
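As a rough sketch of how such a callable-plus-indexable member could be handled (all class names here are invented for illustration and Python 3 is assumed; none of this is taken from an existing odict implementation):

```python
# A view object that can be called like the old keys() method
# but also supports indexing and slicing.
class KeysView:
    def __init__(self, d):
        self._d = d

    def __call__(self):
        # old-style d.keys() access: return a copy of the key list
        return list(self._d._sequence)

    def __getitem__(self, index):
        # new-style d.keys[i] / d.keys[i:j] access
        return self._d._sequence[index]

class MiniOrderedDict(dict):
    def __init__(self, items=()):
        dict.__init__(self)
        self._sequence = []
        for k, v in items:
            self[k] = v
        # instance attribute shadows the inherited keys() method
        self.keys = KeysView(self)

    def __setitem__(self, key, value):
        if key not in self:
            self._sequence.append(key)
        dict.__setitem__(self, key, value)

d = MiniOrderedDict([('a', 1), ('b', 2), ('c', 3)])
print(d.keys())    # ['a', 'b', 'c']  -- call syntax still works
print(d.keys[0])   # 'a'              -- index syntax works too
print(d.keys[1:])  # ['b', 'c']       -- and slices
```

This is exactly the "dirty or not?" question: the same attribute behaves as a method and as a sequence.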
BTW, earlier I showed an example where d[2:3] returned
a new dict instance rather than d.items()[:]. I think the items list
is better, since they naturally add, sort, reverse, etc. as lists,
and you can write OrderedDict(d[2:3]+d[1:2]) if you want a new dict.

Not sure about that. I would rather expect that if you slice an object,
you get an object of the same type. And you can also add, sort, reverse,
etc the ordered dict if you want.
A slice assignment to d[i:j] is really sugar for something, but we have to decide exactly
what in case of duplicate keys in the incoming items, either with each other ...

You mean slice assignments to d.items[i:j]. If you make the slice
assignment to d[i:j] then at least you cannot have conflicts in the
incoming items. The first question is: Should a slice assignment be
treated as deletion with subsequent addition (changing the order) or
as a replacement (i.e. try to not change the order)? I agree that the
second would be the expected semantics, but as you mentioned more
difficult to implement.
One way to define it would be

    d.items[i:j] = itemseq

to be implemented as sugar for

    newitems = d.items[:i] + list(itemseq) + d.items[j:]
    d.clear()
    d.update(newitems)

Which should be the same as

    d = OrderedDict(d.items[:i] + list(itemseq) + d.items[j:])
Sounds reasonable.
So

    d.reverse()

could be spelled

    d.items[:] = d.items[::-1]
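The sugar above, including the reverse spelling, can be sketched against a plain Python 3.7+ dict (insertion-ordered) standing in for an ordered dict; items_slice_assign is a hypothetical helper, not proposed interface:

```python
def items_slice_assign(d, i, j, itemseq):
    # the sugar: newitems = d.items[:i] + list(itemseq) + d.items[j:]
    items = list(d.items())
    newitems = items[:i] + list(itemseq) + items[j:]
    d.clear()
    d.update(newitems)

d = {'a': 1, 'b': 2, 'c': 3}
items_slice_assign(d, 1, 2, [('x', 10), ('y', 20)])
print(list(d.items()))  # [('a', 1), ('x', 10), ('y', 20), ('c', 3)]

# d.reverse() spelled as a full-slice assignment:
items_slice_assign(d, 0, len(d), list(d.items())[::-1])
print(list(d))  # ['c', 'y', 'x', 'a']
```

Note that duplicate keys in itemseq, or keys that also occur outside the slice, silently collapse in the update step; that is the semantic question discussed below.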

Slice assignment for keys and values would also allow setting a new key
order with

    d.keys[:] = newkeyseq

So no need to define a setkeys() method or let keys() take arguments.
If we want to allow this kind of thing at all.
If you are using slices, you can safely use them directly in the __getitem__ of the dict interface,
as I did in the "Que mas?" post. So we don't have to write d.items[i:j] and could write d[i:j].
The thing is, d[i:j] will tempt people to skip ".items" and write d, which has the dict
key meaning, not the item-list index meaning. It is faster not to have a descriptor
in between, though.

I still think d[i:j] should return an ordered dict, not an item list.
I think maybe allowing write to keys or values is pretty iffy. There are too many weird
re-associations of keys and values possible, and which you could do my other means if you
really really wanted to. But I do think full index and slice read access would be fine.

There are different opinions. Fuzzyman would probably say "Don't trust
yourself?" I myself am undecided. Perhaps you could expect that somebody
knows what he does if he makes a slice assignment to d.keys. In any case,
such an assignment should not "break" the dictionary (with "broken" I
mean that the internal keys sequence does not match the keys of the
internal dictionary). If anything does not match, it should raise a
KeyError. If it is implemented that way, I think we can allow it.
I think also that d1==d2 should effectively be implemented as d1[:] == d2[:] -- i.e, compare
the item lists to implement comparisons.

Wouldn't it be more performant to compare for
d1.internal_dict==d2.internal_dict and
d1.internal_sequence==d2.internal_sequence?
You don't keep track of the item lists; they need to be built on every
occasion.
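The trade-off can be illustrated with a toy class (the names MiniOD, _dict and _sequence are invented for this sketch):

```python
# Two ways to implement equality for an ordered dict.
class MiniOD:
    def __init__(self, items):
        self._dict = dict(items)
        self._sequence = [k for k, v in items]

    def items(self):
        # built afresh on every call, as noted above
        return [(k, self._dict[k]) for k in self._sequence]

    def __eq__(self, other):
        # compares the internals directly; no item lists are built
        return (self._dict == other._dict
                and self._sequence == other._sequence)

d1 = MiniOD([('a', 1), ('b', 2)])
d2 = MiniOD([('a', 1), ('b', 2)])
d3 = MiniOD([('b', 2), ('a', 1)])
print(d1 == d2)  # True
print(d1 == d3)  # False -- same pairs, different order
```

Both strategies give the same answer (the item lists are equal exactly when the internal dict and key sequence are equal), so comparing the internals is the cheaper equivalent.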

-- Christoph
 

Christoph Zwerschke

Fuzzyman said:
d.keys() will still return a copy of the list, so d.keys() will
still be slower than d.sequence


Right, I forgot that. Bengt suggested implementing __call__ as well as
__getitem__ and __setitem__ for keys, values and items.

In this case, you could access it very efficiently as d.values.

-- Christoph
 

Fuzzyman

Christoph said:
Fuzzyman said:
d.keys() will still return a copy of the list, so d.keys() will
still be slower than d.sequence


Right, I forgot that. Bengt suggested implementing __call__ as well as
__getitem__ and __setitem__ for keys, values and items.

In this case, you could access it very efficiently as d.values.


That means making keys, values, and items custom objects.
Creating a new instance would have the overhead of creating 4 new
objects instead of just 1. Is the added convenience worth it ? (Plus
the extra layers of method calls for each access).

I'm not sure. It's a nice idea in terms of using it (we could just
leave the sequence attribute as an alias for the new keys attribute -
for backwards compatibility).

All the best,



Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Christoph Zwerschke

I beg your pardon.

replace('haber', random.choice(['tener', 'hacer', 'lograr']))

My Spanish is worse than my Python.

-- Christoph
 

Fuzzyman

Sure - that was just an example of mutating the keys list without
having direct access to it.

If keys was implemented as an object (with a ``__call__`` method) then
we could also implement sequence methods on it - making it easier to
mutate the keys/values/items directly.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Fuzzyman

Alex said:
Fuzzyman said:
There is already an update method of course. :)

Slicing an ordered dictionary is interesting - but how many people are
actually going to use it ? (What's your use case)

I detest and abhor almost-sequences which can't be sliced (I consider
that a defect of collections.deque). If the ordered dictionary records
by its sequencing the time order of key insertion, being able to ask for
"the last 5 keys entered" or "the first 3 keys entered" seem to me to be
perfectly natural use cases, and most naturally accomplished by slicing
of course, d[-5:] and d[:3] respectively.
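Under the proposal those would be spelled d[-5:] and d[:3]; on a modern Python 3.7+ insertion-ordered dict the same use cases can be sketched with small hypothetical helpers:

```python
# "Last n keys entered" and "first n keys entered" on an
# insertion-ordered dict; the helper names are invented here.
def last_n_keys(d, n):
    return list(d)[-n:]

def first_n_keys(d, n):
    return list(d)[:n]

d = dict.fromkeys(['k1', 'k2', 'k3', 'k4', 'k5', 'k6'])
print(last_n_keys(d, 5))   # ['k2', 'k3', 'k4', 'k5', 'k6']
print(first_n_keys(d, 3))  # ['k1', 'k2', 'k3']
```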

If you slice an ordered dictionary, I assume you would expect to get an
ordered dictionary back ?

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 

Christoph Zwerschke

Fuzzyman said:
That means making keys, values, and items custom objects.
Creating a new instance would have the overhead of creating 4 new
objects instead of just 1. Is the added convenience worth it ? (Plus
the extra layers of method calls for each access).

I'm not sure about that either. But since you are using odict for
convenience reasons anyway, and not for performance reasons, it would be
consistent. Performance testing should be done anyway, so this could
be tested as well. I think that creating these 3 additional objects will
not matter much if the dict has more than a handful of items. And the
extra layers will only be called if you really access these keys, values
or items attributes, which will not happen very frequently. Normally, you
just loop over an ordered dictionary and access values by key.
I'm not sure. It's a nice idea in terms of using it (we could just
leave the sequence attribute as an alias for the new keys attribute -
for backwards compatibility).

Yes, you could make it a deprecated feature.

-- Christoph
 

Alex Martelli

Fuzzyman said:
If you slice an ordered dictionary, I assume you would expect to get an
ordered dictionary back ?

That would be helpful, yes. Though there are precedents for types whose
slicing doesn't return an instance of that type (e.g. slices of an mmap
are instances of str, not of mmap, if I recall correctly), most
sliceable sequences do follow that pattern.


Alex
 

Tom Anderson

Speaking about "ordered" and "sorted" in the context of collections is
not a new terminology I am introducing, but seems to be pretty common in
computer science.

This is quite true. I haven't seen any evidence for 'rife'
misunderstanding of these terms.

That said ...
Perhaps Pythonists are not used to that terminology, since they use the
term "list" for an "ordered collection". An ordered dictionary is a
dictionary whose keys are a (unique) list. Sometimes it is also called a
"sequence".

Maybe we should call it a 'sequenced dictionary' to fit better with
pythonic terminology?

tom
 

Tom Anderson

Bengt said:
E.g., it might be nice to have a mode that assumes d[key] is d.items()[k][1] when
key is an integer, and otherwise uses dict lookup, for cases where the use
case is just string dict keys.

I also thought about that and I think PHP has that feature, but it's
probably better to withstand the temptation to do that. It could lead
to an awful confusion if the keys are integers.

Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

+1

Overloading [] to sometimes refer to keys and sometimes to indices is a
really, really, REALLY bad idea. Let's have it refer to keys, and do
indices either via a sequence attribute or the return value of items().
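A sketch of the "explicitly named methods" route, built on an ordinary Python 3.7+ dict; the names ODict and key_at/value_at/item_at are invented for this example:

```python
# Positional access via explicitly named methods; [] keeps plain
# dict semantics throughout.
class ODict(dict):
    def key_at(self, i):
        return list(self)[i]

    def value_at(self, i):
        return self[self.key_at(i)]

    def item_at(self, i):
        k = self.key_at(i)
        return (k, self[k])

d = ODict([('a', 1), ('b', 2), ('c', 3)])
print(d.key_at(0))    # 'a'
print(d.value_at(1))  # 2
print(d.item_at(-1))  # ('c', 3)
```

Nothing inherited from dict changes its behaviour, which keeps the subtyping argument intact.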

More generally, if we're going to say odict is a subtype of dict, then we
have absolutely no choice but to make the methods that it inherits behave
the same way as in dict - that's what subtyping means. That means not
doing funky things with [], returning a copy from items() rather than a
live view, etc.

So, how do we provide mutatory access to the order of items? Of the
solutions discussed so far, i think having a separate attribute for it -
like items, a live view, not a copy (and probably being a variable rather
than a method) - is the cleanest, but i am starting to think that
overloading items to be a mutable sequence as well as a method is quite
neat. I like that it combines two things - a live view of the
order and a copy of the order - that are really two aspects of one thing,
which seems elegant. However, it does strike me as rather unpythonic; it's
trying to cram a lot of functionality in an unexpected combination into
one place. Sparse is better than dense and all that. I guess the thing to
do is to try both out and see which users prefer.

tom
 

Tom Anderson

Think of it as similar to the case of an ordinary dictionary: there is
conceptually a set here (the set of keys), but you cannot manipulate it
directly, only through the corresponding dictionary methods.

Which is a shame!
For an ordered dictionary, there is conceptually a list (or more
specifically a unique list). Again, you should not manipulate it
directly, but only through methods of the ordered dictionary.

This sounds more complicated at first, but is in reality easier.

For instance, if I want to put the last two keys of an ordered dict d at
the beginning, I would do it as d = d[-2:] + d[:-2].

As i mentioned elsewhere, i think using [] like this is a terrible idea -
and definitely not easier.
With the list attribute (called "sequence" in odict), you would have to
write: d.sequence = d.sequence[-2:] + d.sequence[:-2]. This is not only
longer to write down, but you also have to know that the name of the
attribute is "sequence".

True, but that's not exactly rocket science. I think the rules governing
when your [] acts like a dict [] and when it acts like a list [] are
vastly more complex than the name of one attribute.
Python's strength is that you don't have to keep many details in mind
because it has a small "basic vocabulary" and orthogonal use.

No it isn't - it's in having a wide set of basic building blocks which do
one simple thing well, and thus which are easy to use, but which can be
composed to do more complex things. What are other examples of this kind
of 'orthogonal use'?
I prefer that the ordered dictionary not introduce new concepts or
attributes if everything can be done intuitively with the existing
Python methods and operators.

I strongly agree. However, i don't think your overloading of [] is at all
intuitive.

tom
 

Christoph Zwerschke

It seems everybody is in full agreement here.

I have the same mixed feeling about letting keys/values/items become
both managed list attributes and still returning copies of the lists
when called in the usual way as methods.

I don't know any precedent for doing things that way, and I'm unsure
whether it meets the Zen of Python. But anyway, the idea is nice enough
to try it out.

-- Chris
 

Christoph Zwerschke

Tom said:
True, but that's not exactly rocket science. I think the rules governing
when your [] acts like a dict [] and when it acts like a list [] are
vastly more complex than the name of one attribute.

I think it's not really rocket science either to assume that an ordered
dictionary behaves like a dictionary if you access items by subscription
and like a list if you use slices (since slice indexes must evaluate to
integers anyway, they can only be used as indexes, not as keys).
No it isn't - it's in having a wide set of basic building blocks which
do one simple thing well, and thus which are easy to use, but which can
be composed to do more complex things. What are other examples of this
kind of 'orthogonal use'?

I mean for instance that you can deal with a tuple in the same way as
with a string and things like that. Concerning "small": Somebody coined
the good slogan "Python fits my brain" but I could also say "Python fits
*into* my brain" (I mean without the batteries of course ;-) - you
pretty soon have all the built-in functions, datatypes and their use in
your head. On the other hand, of course, Python is a huge field and you
can discuss endlessly, as we are doing here.

Anyway, my argument was not good (avoiding new attribute names in order
to keep the vocabulary small).

And by the way, what both of us listed as strengths of Python will be
probably claimed by protagonists of any other language as well.

-- Christoph
 

Tom Anderson

Tom said:
True, but that's not exactly rocket science. I think the rules governing
when your [] acts like a dict [] and when it acts like a list [] are vastly
more complex than the name of one attribute.

I think it's not really rocket science either to assume that an ordered
dictionary behaves like a dictionary if you access items by subscription
and like a list if you use slices (since slice indexes must evaluate to
integers anyway, they can only be used as indexes, not as keys).

When you put it that way, it makes a certain amount of sense - [:] is
always about index, and [] is always about key. It's still icky, but it is
completely unambiguous.
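That dispatch rule - a slice means position, anything else means key - can be sketched in a few lines (SliceableDict is a name invented here, and a plain Python 3.7+ dict supplies the ordering):

```python
# Slices dispatch to positional access and return an instance of the
# same type; everything else is ordinary key lookup.
class SliceableDict(dict):
    def __getitem__(self, key):
        if isinstance(key, slice):
            return type(self)(list(self.items())[key])
        return dict.__getitem__(self, key)

d = SliceableDict([('a', 1), ('b', 2), ('c', 3)])
print(d['b'])  # 2 -- plain subscription is key lookup
print(d[:2])   # {'a': 1, 'b': 2} -- slicing is positional
print(d[-1:])  # {'c': 3}
```

Since a slice is unhashable and so can never be a valid dict key, the isinstance check alone decides which behaviour applies, with no ambiguity.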

tom
 

Bengt Richter

The implication above is that OrderedDict takes an *args argument,
but really it takes a single argument that is a sequence of k,v pairs,
(and maybe some keyword options).

Right. Interpret it as a short notation only, I did not want to change
the interface. I think it's a common mistake to forget the brackets
because you think one pair should be enough. At least I often forget it.

Ok, I forget this too.
You could make keys, values, and items custom descriptors which would define __call__
for the old keys() etc. accesses, as well as __getitem__ and __setitem__. Then you could write

d.items[0], d.items[-1] = d.items[-1], d.items[0]

to swap the order of the first and last items in the thing (I hesitate to say dict ;-)
You could also operate on slices.

Nice. You could also get the i-th value with d.values[i].

But is this considered good style or would it be considered "dirty" to
have a callable member that also supports indexing and slicing? (I don't
know, just asking?) Plus, it opens a can of worms by increasing the
complexity tremendously. Let's see whether this can be handled.

I suspect not that complex, but it has the same performance disadvantage as
properties.
BTW, before I showed an example where d[2:3] returned
a new dict instance rather than d.items()[:]. I think the items list
is better, since they naturally add, sort, reverse, etc as lists,
and you can write OrderedDict(d[2:3]+d[1:2]) if you want a new dict.

Not sure about that. I would rather expect that if you slice an object,
you get an object of the same type. And you can also add, sort, reverse,
etc the ordered dict if you want.
A slice assignment to d[i:j] is really sugar for something, but we have to decide exactly
what in case of duplicate keys in the incoming items, either with each other ...

You mean slice assignments to d.items[i:j]. If you make the slice
assignment to d[i:j] then at least you cannot have conflicts in the
incoming items. The first question is: Should a slice assignment be
treated as deletion with subsequent addition (changing the order) or
as a replacement (i.e. try to not change the order)? I agree that the
second would be the expected semantics, but as you mentioned more
difficult to implement.
One way to define it would be

    d.items[i:j] = itemseq

to be implemented as sugar for

    newitems = d.items[:i] + list(itemseq) + d.items[j:]
    d.clear()
    d.update(newitems)

Which should be the same as

    d = OrderedDict(d.items[:i] + list(itemseq) + d.items[j:])
Sounds reasonable.
Actually it's not the same, which is why I wrote it with update.
Analogous to slice assignment in lists, the referred-to object
gets mutated. Hence preexisting references to the object see the change too.

If you just rebind d with a new OrderedDict object, previous bindings
are not affected. I.e.,
d = OrderedDict(sorted((k,i) for i,k in enumerate('abc')))
e = d
d = OrderedDict(d.items[:1] + [('b', 100)] + d.items[2:])
d.items()[1] # => ('b', 100)
e.items()[1] # => ('b', 1) # unaffected
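The mutation-versus-rebinding distinction works exactly as it does for plain lists, which makes it easy to demonstrate:

```python
# Slice assignment mutates in place, so existing references see the
# change; rebinding the name does not.
d = [('a', 0), ('b', 1), ('c', 2)]
e = d
d[1:2] = [('b', 100)]               # mutation: e sees it
assert e[1] == ('b', 100)

d = [('a', 0), ('b', 1), ('c', 2)]
e = d
d = d[:1] + [('b', 100)] + d[2:]    # rebinding: e is unaffected
assert e[1] == ('b', 1)
```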

So

    d.reverse()

could be spelled

    d.items[:] = d.items[::-1]

Slice assignment for keys and values would also allow setting a new key
order with

    d.keys[:] = newkeyseq
Do you really mean just re-ordering the keys without a corresponding reordering of values??
That would be a weird renaming of all values. Or do you mean that any key should still
retrieve the same value as before if used as d[key]? In which case the values must undergo
the same permutation as the keys. I.e., you are assuming key->value pairings remain stable
through any key reorderings?
So no need to define a setkeys() method or let keys() take arguments.
If we want to allow this kind of thing at all.
If you are using slices, you can safely use them directly in the __getitem__ of the dict interface,
as I did in the "Que mas?" post. So we don't have to write d.items[i:j] and could write d[i:j].
The thing is, d[i:j] will tempt people to skip ".items" and write d, which has the dict
key meaning, not the item-list index meaning. It is faster not to have a descriptor
in between, though.

I still think d[i:j] should return an ordered dict, not an item list.

Easy to do either way.
There are different opinions. Fuzzyman would probably say "Don't trust
yourself?" I myself am undecided. Perhaps you could expect that somebody
knows what he does if he makes a slice assignment to d.keys. In any case,
such an assignment should not "break" the dictionary (with "broken" I
mean that the internal keys sequence does not match the keys of the
internal dictionary). If anything does not match, it should raise a
KeyError. If it is implemented that way, I think we can allow it.
Exactly what, though? Should e.g.

    d.keys[3] = newk3

mean (not a suggested implementation, just to define semantics)

    keys = d.keys()
    if newk3 in keys and keys.index(newk3) != 3:
        raise ValueError, 'Attempt to introduce duplicate key'
    items = d.items()
    items[3] = (newk3, items[3][1])
    d.clear()
    d.update(items)

?
This would allow what you might call renaming in place.
Similarly

    d.keys[i:j] = newkeysij

might have the semantics of

    keys = d.keys()
    outside = set(keys[:i])+set(keys[j:])
    if outside & set(newkeysij) or len(newkeysij) != len(set(newkeysij)):
        raise ValueError, 'Attempt to introduce duplicate key(s)'
    items = d.items()
    items[i:j] = [(k, items[kx+i][1]) for kx, k in enumerate(newkeysij)]
    d.clear()
    d.update(items)

Is this what is desired?
I think also that d1==d2 should effectively be implemented as d1[:] == d2[:] -- i.e, compare
the item lists to implement comparisons.

Wouldn't it be more performant to compare for
d1.internal_dict==d2.internal_dict and
d1.internal_sequence==d2.internal_sequence?
You don't keep track of the item lists; they need to be built on every
occasion.
That depends on how you implement ;-)

Back from holiday, so maybe I'll hack something out.

Regards,
Bengt Richter
 

Bengt Richter

Bengt Richter wrote:

E.g., it might be nice to have a mode that assumes d[key] is
d.items()[k][1] when key is an integer, and otherwise uses dict lookup,
for cases where the use case is just string dict keys.

I also thought about that and I think PHP has that feature, but it's
probably better to withstand the temptation to do that. It could lead
to an awful confusion if the keys are integers.

Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

+1

Overloading [] to sometimes refer to keys and sometimes to indices is a
really, really, REALLY bad idea. Let's have it refer to keys, and do
indices either via a sequence attribute or the return value of items().

More generally, if we're going to say odict is a subtype of dict, then we
have absolutely no choice but to make the methods that it inherits behave
the same way as in dict - that's what subtyping means. That means not
doing funky things with [], returning a copy from items() rather than a
live view, etc.
OTOH,
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unhashable type
I.e., slices are not valid keys for ordinary dicts, and slices tie in
very well with the ordered aspect of ordered dicts, so that's an
argument for permitting it via the indexing syntax, not just items[:]
or items()[:] which have related but not identical semantics.
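The point is easy to verify: slices are unhashable, so they can never collide with real dict keys:

```python
# A slice cannot be a dict key, so [i:j] syntax is free for
# positional use on an ordered dict.
d = {}
try:
    d[1:2]          # slices cannot be hashed, hence not looked up
except TypeError as exc:
    print(exc)      # unhashable type: 'slice'
```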
So, how do we provide mutatory access to the order of items? Of the
solutions discussed so far, i think having a separate attribute for it -
like items, a live view, not a copy (and probably being a variable rather
than a method) - is the cleanest, but i am starting to think that
overloading items to be a mutable sequence as well as a method is quite
neat. I like that it combines two things - a live view of the
order and a copy of the order - that are really two aspects of one thing,
which seems elegant. However, it does strike me as rather unpythonic; it's
trying to cram a lot of functionality in an unexpected combination into
one place. Sparse is better than dense and all that. I guess the thing to
do is to try both out and see which users prefer.
I wonder who is going to use it for what.

Regards,
Bengt Richter
 

Christoph Zwerschke

Bengt said:
d.keys[:] = newkeyseq

Do you really mean just re-ordering the keys without a corresponding reordering of values??
That would be a weird renaming of all values. Or do you mean that any key should still
retrieve the same value as before if used as d[key]? In which case the values must undergo
the same permutation as the keys. I.e., you are assuming key->value pairings remain stable
through any key reorderings?

Since it is considered as being a dictionary in the first place, the
key->value pairings should of course stay stable. In the usual
implementation based on an ordinary dictionary with an additional key
list ("sequence" in the Foord/Larosa and "_keys" in the Benjamin/Winter
implementation), you would only set the key list, since the value list
is generated dynamically. But if your implementation keeps internal
values or items lists, these need to be adjusted as well.

I will assume that d is a Foord/Larosa ordered dict with "sequence"
attribute in the following.

Then, in other words,

    d.keys[:] = newkeyseq

should do the same as:

    d.sequence = newkeyseq
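That "reorder the keys, keep the pairings" semantics can be sketched against a plain Python 3.7+ dict; reorder_keys is a hypothetical helper, not part of the Foord/Larosa interface:

```python
# Re-order the keys while every key still retrieves the same value.
def reorder_keys(d, newkeyseq):
    if set(newkeyseq) != set(d):
        raise KeyError('new key sequence must be a permutation of the keys')
    items = [(k, d[k]) for k in newkeyseq]  # pairings stay stable
    d.clear()
    d.update(items)

d = {'a': 1, 'b': 2, 'c': 3}
reorder_keys(d, ['c', 'a', 'b'])
print(list(d.items()))  # [('c', 3), ('a', 1), ('b', 2)]
```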
Exactly what, though? Should e.g.

    d.keys[3] = newk3

mean (not a suggested implementation, just to define semantics)

    keys = d.keys()
    if newk3 in keys and keys.index(newk3) != 3:
        raise ValueError, 'Attempt to introduce duplicate key'
    items = d.items()
    items[3] = (newk3, items[3][1])
    d.clear()
    d.update(items)

Yes, that would be the correct semantics. Of course this should not be
the real implementation, and it should use KeyError instead of
ValueError. In other words,

    d.keys[i] = newkey

should be the same as:

    if d.sequence[i] != newkey:
        if newkey in d.sequence:
            raise KeyError, 'Attempt to introduce duplicate key'
        else:
            d.sequence[i] = newkey
This would allow what you might call renaming in place.
Similarly

    d.keys[i:j] = newkeysij

might have the semantics of

    keys = d.keys()
    outside = set(keys[:i])+set(keys[j:])
    if outside & set(newkeysij) or len(newkeysij) != len(set(newkeysij)):
        raise ValueError, 'Attempt to introduce duplicate key(s)'
    items = d.items()
    items[i:j] = [(k, items[kx+i][1]) for kx, k in enumerate(newkeysij)]
    d.clear()
    d.update(items)

Is this what is desired?

Not quite, because it does not preserve the key->value pairings (see
above) and it would behave strangely or raise an exception if the new
slice is larger. The following code would do:

    keys = d.keys()
    outside = set(keys[:i]) | set(keys[j:])
    if outside & set(newkeysij) or len(newkeysij) != len(set(newkeysij)):
        raise ValueError, 'Attempt to introduce duplicate key(s)'
    items = d.items()
    items[i:j] = [(k, d.get(k, None)) for k in newkeysij]
    d.clear()
    d.update(items)

(Note that there was a bug in the second line. You cannot add sets.)

Again, this would be equivalent to:

    seq = d.sequence
    newseq = seq[:]
    newseq[i:j] = newkeysij
    if len(newseq) != len(set(newseq)):
        raise KeyError, 'Attempt to introduce duplicate key(s)'
    for k in set(seq[i:j]) - set(newkeysij):
        del d[k]
    for k in set(newkeysij) - set(seq[i:j]):
        d[k] = None
    d.sequence = newseq
That depends on how you implement ;-)

Ok, I was thinking of the usual implementations.
Back from holiday, so maybe I'll hack something out.

Let us know when you have something to check out.

Maybe Fuzzyman can make some moderate improvements to the existing
odict.py, and you can do something more "radical". Then we have two
"reference implementations" and can compare how they prove regarding
performance and usability.

-- Christoph
 
