Why is dictionary.keys() a list and not a set?

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

Ok, the answer is easy: For historical reasons - built-in sets exist
only since Python 2.4.

Anyway, I was thinking about whether it would be possible and desirable
to change the old behavior in future Python versions and let dict.keys()
and dict.values() both return sets instead of lists.

If d is a dict, code like:

for x in d.keys():
...

or even

for x in sorted(d.keys()):
...

would still work and do the same.

However, the older idiom

k = d.keys()
k.sort()
for x in k:
...

would break (since you cannot sort a map directly).

So it seems not possible to change the behavior since it would break old
code. Maybe keys() and values() could be made deprecated functions,
superseded by keyset() and valueset(), but probably this would not be
worth the effort.

Another related question: Actually, dicts can be considered as
subclasses of sets and thus could inherit a lot of set methods and
operators. Only intersection and union must be treated with care,
because dictionaries must be mappings (i.e. map to exactly one value).

In fact, some inheritance from sets has already been implemented:

For instance, by allowing the set operator "in" for dictionaries,
instead of "has_key".

The len(), clear(), copy() and update() functions can also be
interpreted as inherited from set. The dictionary method popitem() is
actually the inherited set method pop(). (d.pop() without arguments
could actually do the same as d.popitem(); I guess the suffix "item" was
chosen to remind you that it returns a key/value pair, not only a value.)

But could other set methods also be useful?

A dictionary could inherit the remove() and discard() method from sets.
d.remove(x) would do the same as del d[x], but d.discard(x) would not
raise a KeyError if x not in d.

Or, for instance, if

a = { 1:11, 2:12 }; b = { 2:22, 3:13 }, c = { 2:32 }

then

c.issubset(b) == c <= b == True
b.issuperset(c) == b >= c == True
a.difference(b) == a - b == { 1:11 }
a.s.symmetric_difference(b) == a ^ b == { 1:11, 3:13 }
a.update(b) == a |= b == a = { 1:11, 2:22, 3:13 }
a.intersection_update(b) == a &= b == a = { 2:22 }
a.difference_update(b) == a -= b == a = { 1:11 }
a.symmetric_difference_update(b) == a ^= b == a = { 1:11, 3:13 }

Of these, a |= b may be particularly interesting as short notation for
a.update(b).

-- Christoph
 
M

Mike Meyer

Christoph Zwerschke said:
Ok, the answer is easy: For historical reasons - built-in sets exist
only since Python 2.4.

Anyway, I was thinking about whether it would be possible and
desirable to change the old behavior in future Python versions and let
dict.keys() and dict.values() both return sets instead of lists.

Two (rhetorical, since you dropped the idea) questions:

Are you sure dict.values() should be a set? After all, values aren't
guaranteed to be unique, so dict(a = 1, b = 1).values() currently
returns [1, 1], but would return set([1]) under your proposal.

What about dict.items()?
For instance, by allowing the set operator "in" for dictionaries,
instead of "has_key".

"in" already works for dicdtionaries:
But could other set methods also be useful?

Looks like it.

<mike
 
?

=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=

Christoph said:
Anyway, I was thinking about whether it would be possible and desirable
to change the old behavior in future Python versions and let dict.keys()
and dict.values() both return sets instead of lists.

You answer the question whether it would be possible (no, because of
backwards compatibility); you don't attempt to answer the question
whether it would be desirable. Why do you think that would be
a good idea?

If you want the set of keys of a dictionary d, then do set(d):
set([1, 3])

As Mike Meyer explains, the same is meaningless for .values():
they might not be unique. For .items(), conversion into a set
does would appear to be meaningful, but doesn't actually work:
if the values contain unhashable objects, the set of tuples
cannot be constructed (as set members must be immutable).

Regards,
Martin
 
P

Peter Hansen

Christoph said:
Ok, the answer is easy: For historical reasons - built-in sets exist
only since Python 2.4.

Anyway, I was thinking about whether it would be possible and desirable
to change the old behavior in future Python versions and let dict.keys()
and dict.values() both return sets instead of lists.

Definitely not. I believe it's currently guaranteed that the order of
the items in dict.keys() and dict.values() will match (i.e. the index of
any key in its list will be the same as the index of the corresponding
value in its list). This property is almost certainly used in some
code, so it can't be broken without good reason.

-Peter
 
J

jepler

You can already get a set from a dictionary's keys in an efficient manner:Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDhR6sJd01MZaTXX0RAvz5AJoD02CCRhNt5xsTpZI7rjiVrYUywACfYNzL
vhkMLH1qXOV1i8R1Je5cSzk=
=9crP
-----END PGP SIGNATURE-----
 
B

bonono

Peter said:
Definitely not. I believe it's currently guaranteed that the order of
the items in dict.keys() and dict.values() will match (i.e. the index of
any key in its list will be the same as the index of the corresponding
value in its list). This property is almost certainly used in some
code, so it can't be broken without good reason.

Interesting, but why it guaranteed this as functionality wise they are
two different things and since dict is unordered, there is no need for
such guarantee, would it be just implementation consequence ?

This to me is even more uncertain than the zip(it,it) case.
 
P

Peter Hansen

Interesting, but why it guaranteed this as functionality wise they are
two different things and since dict is unordered, there is no need for
such guarantee, would it be just implementation consequence ?

From the docs (http://docs.python.org/lib/typesmapping.html):

"""
Keys and values are listed in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary's
history of insertions and deletions. If items(), keys(), values(),
iteritems(), iterkeys(), and itervalues() are called with no intervening
modifications to the dictionary, the lists will directly correspond.
This allows the creation of (value, key) pairs using zip(): "pairs =
zip(a.values(), a.keys())". The same relationship holds for the
iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
a.iterkeys())" provides the same value for pairs. Another way to create
the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
"""

Does that explain it adequately?

-Peter
 
B

bonono

Peter said:
Interesting, but why it guaranteed this as functionality wise they are
two different things and since dict is unordered, there is no need for
such guarantee, would it be just implementation consequence ?

From the docs (http://docs.python.org/lib/typesmapping.html):

"""
Keys and values are listed in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary's
history of insertions and deletions. If items(), keys(), values(),
iteritems(), iterkeys(), and itervalues() are called with no intervening
modifications to the dictionary, the lists will directly correspond.
This allows the creation of (value, key) pairs using zip(): "pairs =
zip(a.values(), a.keys())". The same relationship holds for the
iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
a.iterkeys())" provides the same value for pairs. Another way to create
the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
"""

Does that explain it adequately?
Thanks, but yes and no.

Which is also my initial puzzle, items() and iteritems() already gives
you the tuples, why such gurantee or the others ? Doesn't that violate
the general idiom that if we can do certain thing in one way, there
better be one and only one way.

Are there other usage scenarios that would be benefit from this as the
example given is not convincing for me.
 
J

jepler

One reason might be Practicality. The zip() versions handily beat the
listcomp versions on a 10kitem dict. (python2.4)

$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.iteritems()]'
100 loops, best of 3: 5.05 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.items()]'
100 loops, best of 3: 7.14 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.itervalues(), d.iterkeys())'
100 loops, best of 3: 3.13 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.values(), d.keys())'
100 loops, best of 3: 3.28 msec per loop

For comparison,
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'd.items()'
100 loops, best of 3: 2.19 msec per loop

Maybe there are also other reasons to promise this property that I'm not
aware of. Certainly it seems like this property is useful and not hard
to provide for "non-perverse" implementations, much like the
now-documented stability of sort().

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDhTuAJd01MZaTXX0RAp2YAKCCaFalUL3ogLc/JKcIwuX7+N8QigCdG+28
8HTzKvaPYMZxKPamLPajBzw=
=zqeT
-----END PGP SIGNATURE-----
 
B

bonono

One reason might be Practicality. The zip() versions handily beat the
listcomp versions on a 10kitem dict. (python2.4)

$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.iteritems()]'
100 loops, best of 3: 5.05 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.items()]'
100 loops, best of 3: 7.14 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.itervalues(), d.iterkeys())'
100 loops, best of 3: 3.13 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.values(), d.keys())'
100 loops, best of 3: 3.28 msec per loop

For comparison,
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'd.items()'
100 loops, best of 3: 2.19 msec per loop

Maybe there are also other reasons to promise this property that I'm not
aware of. Certainly it seems like this property is useful and not hard
to provide for "non-perverse" implementations, much like the
Thanks, but we don't need the list comprehension(or do we) ? As the
return of d.iteritems() is already a list of tuples.
 
P

Peter Hansen

Which is also my initial puzzle, items() and iteritems() already gives
you the tuples, why such gurantee or the others ? Doesn't that violate
the general idiom that if we can do certain thing in one way, there
better be one and only one way.

Are there other usage scenarios that would be benefit from this as the
example given is not convincing for me.

As Jeff's reply emphasizes, the "examples" show tuples with *value* and
then *key*, not the other way around which is what .items() and
..itemitems() gives you.

Anyway, you didn't ask for good examples of use, just "why is it
guaranteed", and that's all I was showing you. Whether it was a good
idea might be an interesting discussion, but probably not a particularly
useful one given once again that it's unlikely this particular feature
could be undone now that code exists (we can assume) which is dependent
on it.

-Peter
 
B

bonono

Peter said:
As Jeff's reply emphasizes, the "examples" show tuples with *value* and
then *key*, not the other way around which is what .items() and
.itemitems() gives you.

Anyway, you didn't ask for good examples of use, just "why is it
guaranteed", and that's all I was showing you. Whether it was a good
idea might be an interesting discussion, but probably not a particularly
useful one given once again that it's unlikely this particular feature
could be undone now that code exists (we can assume) which is dependent
on it.
Please don't take it as a challenge, it was not. If it was, it was
about the need for the guarantee, not about you not giving me the
answer I want. But it is really wondering why ?

As for the (k,v) vs (v,k), I still don't think it is a good example. I
can always use index to access the tuple elements and most other
functions expect the first element to be the key. For example :

a=d.items()
do something about a
b = dict(a)

But using the imaginary example :

a = zip(d.values(), d.keys())
do something about a
b = dict(a) # probably not what I want.
 
M

Mike Meyer

Which is also my initial puzzle, items() and iteritems() already gives
you the tuples, why such gurantee or the others ? Doesn't that violate
the general idiom that if we can do certain thing in one way, there
better be one and only one way.

Backwards compatability. The guarantee on the order of keys() and
values() predates items() (and iteritems()). You can't remove the
guarantee without potentially breaking old code, and that's Not Done -
at least not to code that wasn't broken to begin with.

Maybe dropping the guarantee should be considered for P3K, on the off
chance that either keys or values could be made faster at some point
in the future. But I don't see it as a big deal.

<mike
 
B

bonono

Mike said:
Backwards compatability. The guarantee on the order of keys() and
values() predates items() (and iteritems()). You can't remove the
guarantee without potentially breaking old code, and that's Not Done -
at least not to code that wasn't broken to begin with.

Maybe dropping the guarantee should be considered for P3K, on the off
chance that either keys or values could be made faster at some point
in the future. But I don't see it as a big deal.
Thanks, that explain it. I don't know that there was no
items()/iteritems() in earlier version.
 
P

Peter Hansen

Please don't take it as a challenge, it was not. If it was, it was
about the need for the guarantee, not about you not giving me the
answer I want. But it is really wondering why ?

If you really want to know merely the *need* for the guarantee, as in
why was this ever put in in the first place, you will be sure of a
correct answer only if it's provided by those who wrote the code in the
first place. Anything else is guessing or mind-reading. I wasn't
trying to do even that, but just to point to where I had seen the
afore-mentioned guarantee. My responsibility ended there, and with that
I am outta here. :)

-Peter

(And don't worry, I didn't take it as a challenge. If you thought I
took offense somewhere, please re-read my words without that thought in
mind and I think you'll see merely straight-forward factual responses to
your questions, as they were meant to be. Cheers.)
 
S

Steve Holden

(e-mail address removed) wrote:
[...]
Maybe there are also other reasons to promise this property that I'm not
aware of. Certainly it seems like this property is useful and not hard
to provide for "non-perverse" implementations, much like the

Thanks, but we don't need the list comprehension(or do we) ? As the
return of d.iteritems() is already a list of tuples.
No it isn't: it's a dictionary item iterator. This can be used to
provide a list, but it's self-evidently an iterator, not a list:
>>> d = dict.fromkeys(range(5))
>>> d.iteritems
>>> d.iteritems()
>>> [x for x in d.iteritems()] [(0, None), (1, None), (2, None), (3, None), (4, None)]
>>>

regards
Steve
 
B

bonono

Steve said:
(e-mail address removed) wrote:
[...]
Maybe there are also other reasons to promise this property that I'm not
aware of. Certainly it seems like this property is useful and not hard
to provide for "non-perverse" implementations, much like the

Thanks, but we don't need the list comprehension(or do we) ? As the
return of d.iteritems() is already a list of tuples.
No it isn't: it's a dictionary item iterator. This can be used to
provide a list, but it's self-evidently an iterator, not a list:
Oops, typo. I was meant to say d.items(). stand corrected.
 
F

Fredrik Lundh

Mike said:
Backwards compatability. The guarantee on the order of keys() and
values() predates items() (and iteritems()).

according to the SVN repository, the type originally had "keys" and
"has_key" only. "values" and "items" were both added in the same
checkin (may 1993).

performance is of course another aspect; if you *need* two parallel
lists, creating a list full of tuples just to pull them apart and throw
them all away isn't exactly the most efficient way to do things.

(if performance didn't matter at all, we could get rid most dictionary
methods; "iterkeys", "in", and locking should be enough, right?)
Maybe dropping the guarantee should be considered for P3K, on the off
chance that either keys or values could be made faster at some point
in the future. But I don't see it as a big deal.

the guarantee only means that the type must use the same algorithm
to traverse the dictionary data structure for both "keys" and "values".
I'm not aware of any dictionary algorithm that doesn't maintain a link
between keys and values, so that's not really much of a restriction...

</F>
 
B

bonono

Fredrik said:
performance is of course another aspect; if you *need* two parallel
lists, creating a list full of tuples just to pull them apart and throw
them all away isn't exactly the most efficient way to do things.

(if performance didn't matter at all, we could get rid most dictionary
methods; "iterkeys", "in", and locking should be enough, right?)
If I need two parallel list(one key, one value), I can still use the
same [(k,v)] tuple, just access it as x[0], x[1].

But the history alone explains it anyway.
 
S

Steven D'Aprano

Peter said:
Definitely not. I believe it's currently guaranteed that the order of
the items in dict.keys() and dict.values() will match (i.e. the index of
any key in its list will be the same as the index of the corresponding
value in its list). This property is almost certainly used in some
code, so it can't be broken without good reason.

As I recall, that guarantee is only if the dict is not
modified between retrieving the keys and retrieving the
values.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top