Set and {} comparison confusion

  • Thread starter Roman Yakovenko
  • Start date
R

Roman Yakovenko

Hi. Could somebody explain why sets and dict are different ?

from sets import Set as set

class A(object):
def __init__( self, x ):
self.x = x
def __eq__( self, other ):
return self.__dict__ == other.__dict__


print 'set( [A(1)] ) == set( [A(1)] )', set( [A(1)] ) == set( [A(1)] )
print 'A(1) == A(1)', A(1) == A(1)
print '[A(1)] == [A(1)]', [A(1)] == [A(1)]
d1 = { A(1) : None }
d2 = { A(1) : None }
print 'd1 == d2', d1 == d2
print 'd1.keys() == d2.keys()', d1.keys() == d2.keys()
print 'd1.values() == d2.values()', d1.values() == d2.values()
print 'd1.items() == d2.items()', d1.items() == d2.items()


output:

set( [A(1)] ) == set( [A(1)] ) False
A(1) == A(1) True
[A(1)] == [A(1)] True
d1 == d2 False
d1.keys() == d2.keys() True
d1.values() == d2.values() True
d1.items() == d2.items() True

Thanks.
 
P

Peter Otten

Roman said:
Hi. Could somebody explain why sets and dict are different ?
.... def __eq__(self, other):
.... return True
....
dict.fromkeys([A()]) == dict.fromkeys([A()]) False
class B(object):
.... def __eq__(self, other):
.... return True
.... def __hash__(self):
.... return 0 # not recommended
....
dict.fromkeys([B()]) == dict.fromkeys([B()])
True

For two dictionary keys to be considered as equal they must have the same
hash value. Set is implemented on top of dictionaries and therefore shows
the same behaviour.

Peter
 
A

Alex Martelli

Roman Yakovenko said:
Hi. Could somebody explain why sets and dict are different ?

sets.Set is build around dict, so let's concentrate on dict, the set
just follows from there.
from sets import Set as set

class A(object):
def __init__( self, x ):
self.x = x
def __eq__( self, other ):
return self.__dict__ == other.__dict__

This class is NOT correctly hashable.

A crucial semantic condition for hashing is:

X==Y ***MUST imply*** hash(X)==hash(Y)

But this is not ensured by this class A. For example, as you've noticed
all A(1) instances compare equal to each other:
print 'A(1) == A(1)', A(1) == A(1)

and yet their hashes are all different (and happen to equal their id's,
since you have not overridden __hash__):

xx = []
for i in xrange(4):
xx.append(A(1))
print hash(xx[-1])

emits, e.g.:

255856
256752
359376
357840

((note we're careful to keep the A(1)'s around, otherwise, if one had
been garbage collected the next one might happen to be allocated just at
the same place, thus accidentally have the same id...)).

As you are violating a semantic precondition of dict (that it be only
keyed into by _validly_ hashable keys), don't be surprised if the
behavior of such dicts is weird. You can imagine equality comparison
starts by checking equality of hash values for keys to bail out fast in
most cases, here the hash values are different, so, poof.

Not easy to make that class A _correctly_ hashable, either (and it
really should be immutable too). Assuming self.x is always hashable and
nobody ever reassigns it nor assigns any other attribute, you could try
adding something like...:

def __hash__(self):
return hash(tuple(self.__dict__.iteritems()))


Alex
 
A

Alex Martelli

Heiko Wundram said:
Am Donnerstag, 9. September 2004 10:18 schrieb Alex Martelli:

Minor correction, shouldn't this be:

for item in onelist:
if item not in another: return False
...

Notice the nice little word not, which makes all the difference... ;)

Heh, yes, of course -- we're checking set equality, not complete
disjunction (which would only need half the amount of work!-)


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,479
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top