Sets in Python

sapsi · Sep 19, 2007

Hello,
I recently tried using the set function in Python and was surprised to
find that

a=[ 1, 2,3, [1,2] ]

doesn't work with 'set', throwing TyperError (unhashable exception). I
found out that this is because lists can't be hashed.

So,this implies 'a' cannot be a set in python which i think is quite
unfortunate, after all 'a' does look like a mathematical set.

My question is,
1) Why can't lists be hashed?
and
2) This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list? I say neat because i'm
guessing using list comprehension might turn out be slow and there
might be other methods which are faster.

Thank you for your time
SM

Evil Bert · Sep 19, 2007

sapsi said:
2) This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list? I say neat because i'm
guessing using list comprehension might turn out be slow and there
might be other methods which are faster.

a = set([1, 2, 3, 4])
b = list(a)

Raymond Hettinger · Sep 19, 2007

I recently tried using the set function in Python and was surprised to
find that

a=[ 1, 2,3, [1,2] ]

doesn't work with 'set', throwing TyperError (unhashable exception). I
found out that this is because lists can't be hashed.
So,this implies 'a' cannot be a set in python which i think is quite
unfortunate, after all 'a' does look like a mathematical set.

This is written as:

a = set([1, 2, 3, frozenset([1, 2])])

This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list?

list(a)

Raymond

Asun Friere · Sep 19, 2007

My question is,
1) Why can't lists be hashed?

They are mutable.

Dustan · Sep 19, 2007

Hello,
I recently tried using the set function in Python and was surprised to
find that

a=[ 1, 2,3, [1,2] ]

doesn't work with 'set', throwing TyperError (unhashable exception). I
found out that this is because lists can't be hashed.

So,this implies 'a' cannot be a set in python which i think is quite
unfortunate, after all 'a' does look like a mathematical set.

It is not the variable *a* itself that's a problem when constructing a
set (ie. set(a)); it is the content. set() goes through each of the
items and adds that item to the set. 1, 2, and 3 are valid because
they can be hashed. The next item in the list, however, is [1,2], and
cannot be hashed because it is a mutable list.

The solution is as Raymond Hettinger said:

a = set([1, 2, 3, frozenset([1, 2])])

My question is,
1) Why can't lists be hashed?

They're mutable.

and
2) This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list? I say neat because i'm
guessing using list comprehension might turn out be slow and there
might be other methods which are faster.
list(a_set)

Thank you for your time

You're welcome.

Paddy · Sep 19, 2007

I recently tried using the set function in Python and was surprised to
find that

Click to expand...

a=[ 1, 2,3, [1,2] ]

Click to expand...

doesn't work with 'set', throwing TyperError (unhashable exception). I
found out that this is because lists can't be hashed.
So,this implies 'a' cannot be a set in python which i think is quite
unfortunate, after all 'a' does look like a mathematical set.

Click to expand...

This is written as:

a = set([1, 2, 3, frozenset([1, 2])])

This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list?

Click to expand...

list(a)

Raymond

frozenset over turning the embedded list into a tuple?
The tuple would preserve order in the item (1,2)
a = set([1,2,3, (1,2)])

- Paddy.

Francesco Guerrieri · Sep 19, 2007

frozenset over turning the embedded list into a tuple?
The tuple would preserve order in the item (1,2)
a = set([1,2,3, (1,2)])

The OP was probably thinking in mathematical terms as in "the set of
all the possible subsets of the set composed by 1, 2 and 3" and thus
order would not be important.

francesco

Sion Arrowsmith · Sep 19, 2007

sapsi said:
Why can't lists be hashed?

Several people have answered "because they're mutable" without
explaining why mutability precludes hashing. So:

Consider a dict (dicts have been in Python a *lot* longer than
sets, and have the same restriction) which allowed lists as
keys:

d = {}
k = [1, 2]
d[k] = None

Now, if I were to do:

k.append(3)

what would you expect:

d.keys()

to return? Did d magically rehash k when it was modified? Did d[k]
take a copy of k, and if so, how deep was the copy (consider
d[[1, k]] = None followed by a modification to k)? Leaving the hash
unchanged and relying on collision detection to resolve won't work,
since you may go directly for d[[1, 2, 3]] and not spot that
there's already an entry for it since it's been hashed under [1, 2].

"Practicality beats purity" and the design decision was to simply
sidestep these issues by disallowing mutable dict keys. And as the
set implementation is based on the dict implementation, it applies
to sets to.

Karthik Gurusamy · Sep 19, 2007

sapsi said:
sapsi said:

Why can't lists be hashed?

Click to expand...

Several people have answered "because they're mutable" without
explaining why mutability precludes hashing. So:

Consider a dict (dicts have been in Python a *lot* longer than
sets, and have the same restriction) which allowed lists as
keys:

d = {}
k = [1, 2]
d[k] = None

Now, if I were to do:

k.append(3)

what would you expect:

d.keys()

to return? Did d magically rehash k when it was modified? Did d[k]
take a copy of k, and if so, how deep was the copy (consider
d[[1, k]] = None followed by a modification to k)? Leaving the hash
unchanged and relying on collision detection to resolve won't work,
since you may go directly for d[[1, 2, 3]] and not spot that
there's already an entry for it since it's been hashed under [1, 2].

"Practicality beats purity" and the design decision was to simply
sidestep these issues by disallowing mutable dict keys. And as the
set implementation is based on the dict implementation, it applies
to sets to.

While it's easy to explain the behavior, I think the decision to dis-
allow mutable items as keys is a bit arbitrary. There is no need for
dict to recompute hash (first of all, a user doesn't even need to know
if underneath 'hashing' is used -- the service is just a mapping
between one item to another item).

Since we know hashing is used, all that is needed is, a well-defined
way to construct a hash out of a mutable. "Given a sequence, how to
get a hash" is the problem. If later the given sequence is different,
that's not the dict's problem.

d = {} a = 10
d[a] = 'foo'
d[5+5] = 'bar'
d[10]

Click to expand...

Click to expand...

'bar'

aren't the '5+5' which is 10, is different from the previous line's
a?.. so
why not allow similar behavior with lists/other sequence/even other
collections. As long as two objects compare equal the hash-result must
be the same. I guess this takes us to defining the equality operation
for lists-- which I think has a very obvious definition (ie same
length and the ith element of each list compare equal).

So if the list changes, it will result in a different hash and we will
get a hash-miss. I doubt this is in anyway less intuitive than dis-
allowing mutable items as keys.

Karthik

Vampire · Sep 19, 2007

Windows Ip conflict

Hi,
i am Frm malaysia. i am working in a cafe as network admin and we are having 151 work stations.

Since last 1 week i found that soe work stations keep sending the msg.." windows ip is colflicting from another location: and those pc s can not get online.

but the thing is this Pc keep changing.. not always the same pc is showing this msg:

what could be the reason? and how to ressolve it"??

---VaMpirE----

Paddy · Sep 19, 2007

Since we know hashing is used, all that is needed is, a well-defined
way to construct a hash out of a mutable. "Given a sequence, how to
get a hash" is the problem. If later the given sequence is different,
that's not the dict's problem.

Oh it is possible to construct a hash from a mutable. What is
difficult is creating the same hash when the mutable mutates. Or
indeed working out what it means when a hash key mutates and you
access the dictionary.
Ignoring this gives the programmer a big problem hence the limitation.

I don't think you have a better solution.

- Paddy.

Karthik Gurusamy · Sep 20, 2007

Oh it is possible to construct a hash from a mutable. What is
difficult is creating the same hash when the mutable mutates.

Why? There is no reason that the dict should maintain the same hash,
after all the user is calling with a different sequence as key (after
the mutation).

There seems to be an underlying assumption that the dictionary key-

value mapping should somehow maintain the mapping even when the key

changes behind its back.

The contract could very well be, hey if you give me a different
sequence later (by mutating the one you added), don't expect me to
find it in the dictionary.

Or
indeed working out what it means when a hash key mutates and you
access the dictionary.
Ignoring this gives the programmer a big problem hence the limitation.

I don't think you have a better solution.

But why would a programmer expect to find the match, when his/her code
has changed the sequence (or has somehow let the hash key mutate) from
the time of dictionary addition.

If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
If dict complains key error on d[a] now, I won't be surprised. If I do
d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
course, in today's behavior the above is syntax error.

Karthik

Mark Dickinson · Sep 20, 2007

If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
If dict complains key error on d[a] now, I won't be surprised. If I do
d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
course, in today's behavior the above is syntax error.

It sounds as though you're proposing something like the following:

k = mylist([1, 2])
d = {k : 'test'}
d[k] 'test'
k.append(3)
d[k]

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

So far, so good. But how do you explain the following to a confused
newcomer?

d.keys() [[1, 2, 3]]
k in d.keys() True
k in d False
d[k]

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

In other words, to repeat Sion Arrowsmith's question, what would you
expect d.keys() to return after a key of d has been modified?

Mark

Steven D'Aprano · Sep 20, 2007

While it's easy to explain the behavior, I think the decision to dis-
allow mutable items as keys is a bit arbitrary. There is no need for
dict to recompute hash

What???

Of course it does. How else can it look up the key? Because it (somehow)
just recognizes that it has seen the key before? How? By magic?

(first of all, a user doesn't even need to know
if underneath 'hashing' is used -- the service is just a mapping between
one item to another item).

The user doesn't need to know the mechanism, but the dict does. Dicts are
implemented as hash tables. I suppose they could be implemented as
something else (what? linear lists? some sort of tree?) but the same
considerations must be made: the dict must be able to find keys it has
seen before. How is the dict supposed to recognise the key if the key has
changed?

Since we know hashing is used, all that is needed is, a well-defined way
to construct a hash out of a mutable. "Given a sequence, how to get a
hash" is the problem.

Nonsense. That's not the problem. The problem is how to get exactly the
same hash when the sequence has changed.

In other words, if you have two mutable objects M1 and M2, then you
expect:

hash(M1) == hash(M2) if and only if M1 and M2 are equal
hash(M1) != hash(M2) if M1 and M2 are unequal

but also:

if M1 mutates to become equal to M2, hash(M1) must remain the same while
still being different from hash(M2).

That means that hash() now is a non-deterministic function. hash([1,2,3])
will vary according to how the list [1,2,3] was constructed!

Obviously something has to give, because not all of these things are
mutually compatible.

If later the given sequence is different, that's
not the dict's problem.

Data structures don't have problems. Programmers do. And language
designers with sense build languages that minimize the programmers
problems, not maximize them.

So if the list changes, it will result in a different hash and we will
get a hash-miss. I doubt this is in anyway less intuitive than dis-
allowing mutable items as keys.

The choices for the language designer are:

(1) Invent some sort of magical non-deterministic hash function which
always does the Right Thing.

(2) Allow the hash of mutable objects to change, which means you can use
mutable objects as keys in dicts but if you change them, you can no
longer find them in the dict. They'll still be there, using up memory,
but you can't get to them.

(3) Simply disallow mutable objects as keys.

Alternative 1 is impossible, as we've seen, because the requirements for
the Right Thing are not mutually compatible.

Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
store objects in a dict only for them to mysteriously disappear later.
Worse, it could lead to bugs like the following hypothetical:

M = [1, 2, 3]
D = {M: 'parrot'} # pretend this works
D {[1, 2, 3]: 'parrot'}
M.append(4)
D {[1, 2, 3, 4]: 'parrot'}
D[[1, 2, 3, 4]]

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3, 4]

Try explaining that one to programmers: they can SEE the key in the dict
when they print it, but they can't get it or delete it because the hash
has changed.

Alternative 3 is easy to deal with: simply don't use mutable objects as
keys. That's what Python does. Sure, the programmer sometimes needs to
work around the lack (convert the list into a tuple, or a string, or
pickle it, whatever...) which on rare occasions is hard to do, but at
least there are no mysterious, hard to track down bugs.

prikar20 · Sep 20, 2007

If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
If dict complains key error on d[a] now, I won't be surprised. If I do
d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
course, in today's behavior the above is syntax error.

Click to expand...

It sounds as though you're proposing something like the following:

k = mylist([1, 2])
d = {k : 'test'}
d[k] 'test'
k.append(3)
d[k]

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

So far, so good. But how do you explain the following to a confused
newcomer?

d.keys() [[1, 2, 3]]
k in d.keys() True
k in d False
d[k]

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

In other words, to repeat Sion Arrowsmith's question, what would you
expect d.keys() to return after a key of d has been modified?

In the new model, it should be the value at the time of addition.
That is [1,2] (not [1,2,3]). This does mean a copy of key in
maintained internally in the dict. I think today that's not needed
(just a reference to the key's object is sufficient).

Again, this may not be the most elegant solution; neither seems to be
the current requirement that keys must be immutable. Fundamentally
dict is a mapping; underneath it could even use other elaborate
algorithms (say for integer keys, an avl tree) -- there is no reason
to expose the the quirks of hashing to the programmer.

Karthik

Bryan Olson · Sep 20, 2007

Karthik said:
While it's easy to explain the behavior, I think the decision to dis-
allow mutable items as keys is a bit arbitrary.

Furthermore, it's not really true.

class Blurf (object):
def __init__(self, intval):
self.seti(intval)
def seti(self, intval):
self.i = intval

Blurf sure looks mutable, yet Python will happily use Blurfs as
dict keys. The trick is that the dict key is the Blurf's identity,
not its state. We could switch to using state as the key by
implementing methods __hash__ and either __cmp__ or __eq__.

[...]

why not allow similar behavior with lists/other sequence/even other
collections. As long as two objects compare equal the hash-result must
be the same.

That's a good rule to follow in programming one's own classes.
Keep __hash__, __cmp__ and __eq__ consistent.

I guess this takes us to defining the equality operation
for lists-- which I think has a very obvious definition (ie same
length and the ith element of each list compare equal).

Already done. The "==" operator compares lists by their contents;
the "is" operator, by identity.

So if the list changes, it will result in a different hash and we will
get a hash-miss. I doubt this is in anyway less intuitive than dis-
allowing mutable items as keys.

If it's any consolation, you can build your own:

class HashableList (list):
def __hash__(self):
return tuple(self).__hash__()

You'd probably also want to implement slicing.

Bad Python 3000 has no immutable type for byte-strings.
The new bytes type cannot serve for dict keys or set members.
Many things one would want to hash are unhashable -- for
example, the results of the hash functions in hashlib.

Karthik Gurusamy · Sep 20, 2007

What???

Of course it does. How else can it look up the key? Because it (somehow)
just recognizes that it has seen the key before? How? By magic?

You answered it yourself later. For a mapping service, hash is just
one way to do things. What you need is for each item in the
collection, a unique key.
How you go from the key to the value is not something a programmer
needs to know.
Your mind is set on thinking on hash alone and hence you don't see
beyond it.

The user doesn't need to know the mechanism, but the dict does. Dicts are
implemented as hash tables. I suppose they could be implemented as
something else (what? linear lists? some sort of tree?) but the same
considerations must be made:

Oh yes. If the keys are all integers (or any set of items that can be
ordered), why not an avl. It has guaranteed O(log N) while a hash in
worst case is O(N). Why you want to tie yourself to the drawbacks of
one datastructure? Understand your goal is not to provide a hash; but
to provide a mapping service.

the dict must be able to find keys it has

seen before. How is the dict supposed to recognise the key if the key has
changed?

Nonsense. That's not the problem. The problem is how to get exactly the
same hash when the sequence has changed.

Yes, if you keep thinking hash is the only tool you got.

In other words, if you have two mutable objects M1 and M2, then you
expect:

No. I don't expect. I expect the hash to be different. Why do you keep
thinking it's the mappings responsibility to take care of a changing
key.

hash(M1) == hash(M2) if and only if M1 and M2 are equal
hash(M1) != hash(M2) if M1 and M2 are unequal

but also:

if M1 mutates to become equal to M2, hash(M1) must remain the same while
still being different from hash(M2).

That means that hash() now is a non-deterministic function. hash([1,2,3])
will vary according to how the list [1,2,3] was constructed!

Obviously something has to give, because not all of these things are
mutually compatible.

If later the given sequence is different, that's
not the dict's problem.

Click to expand...

Data structures don't have problems. Programmers do. And language
designers with sense build languages that minimize the programmers
problems, not maximize them.

Yes, here you talk about a different goal altogether. Here comes the
'arbitrary' part I mentioned.

The choices for the language designer are:

(1) Invent some sort of magical non-deterministic hash function which
always does the Right Thing.

Nope, just say if the new sequence is different, you don't find the
item in the dict.

(2) Allow the hash of mutable objects to change, which means you can use
mutable objects as keys in dicts but if you change them, you can no
longer find them in the dict. They'll still be there, using up memory,
but you can't get to them.

In the new model, at the time of addition, you need to remember the
key at that time. If it's a list, you make a copy of the items.

(3) Simply disallow mutable objects as keys.

Alternative 1 is impossible, as we've seen, because the requirements for
the Right Thing are not mutually compatible.

Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
store objects in a dict only for them to mysteriously disappear later.
Worse, it could lead to bugs like the following hypothetical:

Of course they can be reached with.. for k in dict...

M = [1, 2, 3]
D = {M: 'parrot'} # pretend this works
D

Click to expand...

Click to expand...

{[1, 2, 3]: 'parrot'}>>> M.append(4)
{[1, 2, 3, 4]: 'parrot'}>>> D[[1, 2, 3, 4]]

No, in the new way, the key still remains [1, 2, 3]
What was changed is M. Not the key given to dict at the time of
addition.
Again I'm not describing today's behavior; it's in the new way.

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3, 4]

Try explaining that one to programmers: they can SEE the key in the dict
when they print it, but they can't get it or delete it because the hash
has changed.

No they don't. They see the key at the time of addition ([1,2,3])

Alternative 3 is easy to deal with: simply don't use mutable objects as
keys. That's what Python does. Sure, the programmer sometimes needs to
work around the lack (convert the list into a tuple, or a string, or
pickle it, whatever...) which on rare occasions is hard to do, but at
least there are no mysterious, hard to track down bugs.

When I first saw key's must'be be mutable, it did appear to me to be
mysterious. There was unnecessary tighter coupling between
implementation details and the service exposed to the programmer. (As
I see it, the root cause of all this is, the dict does not make a copy
of the key at the time of item addition, it just makes a new reference
to the same object)

Karthik

Paddy · Sep 20, 2007

In the new model, at the time of addition, you need to remember the
key at that time. If it's a list, you make a copy of the items.

In other words you ask the dict to freeze any mutable keys given to
it.
Try an implementation and you will find it is impractical. Checking
for mutability then doing deep copies of keys and would consume time
in something that greatly affects the speed of Python as a whole.
Pythons designers give *you* the option of doing this and leave the
underlying dict speedy. I can live with that.

- Paddy.

thebjorn · Sep 20, 2007

On Sep 20, 4:17 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]

Data structures don't have problems. Programmers do.

That's QOTW material

... And language
designers with sense build languages that minimize the programmers
problems, not maximize them.
....

The choices for the language designer are:

(1) Invent some sort of magical non-deterministic hash function which
always does the Right Thing.

(2) Allow the hash of mutable objects to change, which means you can use
mutable objects as keys in dicts but if you change them, you can no
longer find them in the dict. They'll still be there, using up memory,
but you can't get to them.

(3) Simply disallow mutable objects as keys.

(4) Allow mutable objects as keys, but have undefined (implementation
defined) behavior if keys are mutated.

This would seem a natural extension of the "we're all adults" paradigm
(if I name a variable _foo you should treat it as private).
Unfortunately there is no visual cue in this case, and tracking down
where you relied on undefined behavior is notoriously time-consuming
(just ask your nearest C++ multiplatform programmer).

Alternative 3 is easy to deal with: simply don't use mutable objects as
keys. That's what Python does. Sure, the programmer sometimes needs to
work around the lack (convert the list into a tuple, or a string, or
pickle it, whatever...) which on rare occasions is hard to do, but at
least there are no mysterious, hard to track down bugs.

Amen.

-- bjorn

Marc 'BlackJack' Rintsch · Sep 20, 2007

If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
If dict complains key error on d[a] now, I won't be surprised. If I do
d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
course, in today's behavior the above is syntax error.

Click to expand...

It sounds as though you're proposing something like the following:

k = mylist([1, 2])
d = {k : 'test'}
d[k] 'test'
k.append(3)
d[k]

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

So far, so good. But how do you explain the following to a confused
newcomer?

d.keys() [[1, 2, 3]]
k in d.keys() True
k in d False
d[k]

Click to expand...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

In other words, to repeat Sion Arrowsmith's question, what would you
expect d.keys() to return after a key of d has been modified?

Click to expand...

In the new model, it should be the value at the time of addition.
That is [1,2] (not [1,2,3]). This does mean a copy of key in
maintained internally in the dict.

A copy!? That has to be a deep copy. Which would make `dict`\s alot
slower and use more memory. Plus you can't store objects that can't be
copied anymore. That doesn't sound like a good trade off to me.

Ciao,
Marc 'BlackJack' Rintsch

Math python question	10	Mar 31, 2023
random.sample with large weighted sample-sets?	0	Feb 16, 2014
Python List Comprehension Error: Unexpected Output	1	Aug 28, 2023
Processing in Python help	0	Aug 31, 2022
Algorithm for generating pitch-class sets in prime form	1	Jan 28, 2011
Ordering python sets	31	Oct 22, 2008
Problem involving sets...	4	Apr 14, 2006
Python sets which support multiple same elements	2	May 20, 2011

Sets in Python

sapsi

Evil Bert

Raymond Hettinger

Asun Friere

Dustan

Paddy

Francesco Guerrieri

Sion Arrowsmith

Karthik Gurusamy

Vampire

Paddy

Karthik Gurusamy

Mark Dickinson

Steven D'Aprano

prikar20

Bryan Olson

Karthik Gurusamy

Paddy

thebjorn

Marc 'BlackJack' Rintsch

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads