Rich comparison methods don't work in sets?

G

Gustavo Narea

Hello, everyone.

I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
"""
obj1 = RichComparisonClass()
obj2 = RichComparisonClass()

set1 = set([obj1])
set2 = set([obj2])

if obj1 == obj2:
print "Objects 1 and 2 are equivalent"
else:
print "Objects 1 and 2 aren't equivalent"

if set1 == set2:
print "Sets 1 and 2 are equivalent"
else:
print "Sets 1 and 2 aren't equivalent"
"""

Would output:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 aren't equivalent
"""

instead of:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 are equivalent
"""

How can I work around this? The only solution that comes to my mind is
to
write a function like this:
"""
def same_sets(set1, set2):
if len(set1) != len(set2): return False
for element1 in set1:
found = False
for element2 in set2:
if element1 == element2:
found = True
break;
if not found:
return False
return True
"""

But I see it pretty ugly; also I'd also have to roll out my own
"contains"
(e.g., `element in set`) function.

I expect and hope there's a pythonic way to do this. Does it exist?

Thanks in advance!
 
M

Matthew Wilson

Hello, everyone.

I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
"""
obj1 = RichComparisonClass()
obj2 = RichComparisonClass()

What does

return? I don't know anything about set internals.

Matt
 
P

Peter Otten

Gustavo said:
I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
"""
obj1 = RichComparisonClass()

How is RichComparisonClass implemented? Did you provide a __hash__() method
such that obj1 == obj2 implies hash(obj1) == hash(obj2)? This is necessary
for instances of the class to work properly with sets and dicts.

Peter
 
C

Chris Rebert

Hello, everyone.

I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
"""
obj1 = RichComparisonClass()
obj2 = RichComparisonClass()

set1 = set([obj1])
set2 = set([obj2])

if obj1 == obj2:
   print "Objects 1 and 2 are equivalent"
else:
   print "Objects 1 and 2 aren't equivalent"

if set1 == set2:
   print "Sets 1 and 2 are equivalent"
else:
   print "Sets 1 and 2 aren't equivalent"
"""

Would output:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 aren't equivalent
"""

instead of:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 are equivalent
"""

How can I work around this? The only solution that comes to my mind is

Note that sets are dict-based and thus use hash tables internally.
Thus, you must implement a sensible __hash__ method.
The default __hash__ provided by class `object` uses the object ID for the hash.

Since id(x) == id(y) iff x is y, and
hash(x) != hash(y) implies x != y,
If you don't implement __hash__, object's implementation causes every
distinct object of your type to be considered unequal a priori, so it
doesn't bother to check any further by calling the comparison methods.

Cheers,
Chris
 
A

Aaron Brady

Hello, everyone.
I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
snip
"""
obj1 = RichComparisonClass()
obj2 = RichComparisonClass()
set1 = set([obj1])
set2 = set([obj2])
if obj1 == obj2:
   print "Objects 1 and 2 are equivalent"
else:
   print "Objects 1 and 2 aren't equivalent"
if set1 == set2:
   print "Sets 1 and 2 are equivalent"
else:
   print "Sets 1 and 2 aren't equivalent"
"""
Would output:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 aren't equivalent
"""
instead of:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 are equivalent
"""
How can I work around this? The only solution that comes to my mind is

Note that sets are dict-based and thus use hash tables internally.
Thus, you must implement a sensible __hash__ method.
The default __hash__ provided by class `object` uses the object ID for the hash.

Since id(x) == id(y)  iff  x is y, and
hash(x) != hash(y) implies x != y,
If you don't implement __hash__, object's implementation causes every
distinct object of your type to be considered unequal a priori, so it
doesn't bother to check any further by calling the comparison methods.

Cheers,
Chris
--http://blog.rebertia.com

You're using sets for uniqueness and faster lookups than a list or
other collection. Uniqueness is determined by hash key first, then by
identity (due to an idiosyncrasy of 'RichCompareBool'), then by rich
equality. If you want to compare to every element, you can't use
sets, because they take short-cuts by omitting many of the
comparisons.
 
G

Gustavo Narea

Hello again, everybody.

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer. That's why I'm afraid I
cannot use hashables.

I guess I'll have to use something like the function of my first
post. :(

Thanks,

- Gustavo.
 
S

Steven D'Aprano

Gustavo said:
Hello again, everybody.

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer.

There is no need for hash to represent the data unambiguously.
2



The rule is, if x and y are equal, then hash(x) should equal hash(y). This
does NOT imply that if hash(x) == hash(y), then x must equal y, nor is
there any requirement for every unique piece of data to have a unique hash.
 
M

MRAB

Gustavo said:
Hello again, everybody.

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer. That's why I'm afraid I
cannot use hashables.

I guess I'll have to use something like the function of my first
post. :(
A hash doesn't have to be unambiguous. It's just a way to reduce the
number of equality checks that have to be made.

Could you create a hash from a tuple of attributes?

If all else fails you could define a hash function that returns a
constant. You would, however, lose the speed advantage that a hash
gives in narrowing down the possible matches.
 
G

Gabriel Genellina

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer. That's why I'm afraid I
cannot use hashables.

Combine the hash values of whatever objects are involved in __eq__ but
make sure they cannot change (in that case the hash value would be
invalid).
No need for hash to be unambiguous - objects that compare equal must have
the same hash value, but objects with the same hash value may be different.
 
R

Robert Kern

A hash doesn't have to be unambiguous. It's just a way to reduce the
number of equality checks that have to be made.

Could you create a hash from a tuple of attributes?

A typical way to implement this is to make a method that returns a tuple of
attributes to use in both the __eq__ and __hash__ methods.

class Foo(object):

def key(self):
return (self.x, self.y, self.bar)

def __eq__(self, other):
return self.key() == other.key()

def __ne__(self, other):
return not (self == other)

def __hash__(self):
return hash(self.key())


As long as those attributes are also hashable, this usually works well.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
A

Aaron Brady

A hash doesn't have to be unambiguous. It's just a way to reduce the
number of equality checks that have to be made.

Could you create a hash from a tuple of attributes?

If all else fails you could define a hash function that returns a
constant.

Ooo sneaky. +1 fancy.
You would, however, lose the speed advantage that a hash
gives in narrowing down the possible matches.

Are there /any/ immutable members of the objects?
 
T

Terry Reedy

Gustavo said:
Hello again, everybody.

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer. That's why I'm afraid I
cannot use hashables.

If the result of 'o1 == o2' changes over time, then the objects are not
very suitable as set members.
 
R

Robert Kern

If the result of 'o1 == o2' changes over time, then the objects are not
very suitable as set members.

Sometimes, I think it would be nice if sets and the other containers could be
modified to accept a user-supplied hash and comparison functions on
construction. A key function like list.sort() would probably also work. Even
when you have objects that aren't hashable in general, there are many times when
you know you will not be modifying them in a given context. Being able to do set
operations on them in that context would be really useful.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
G

Gustavo Narea

Hi, everyone.

OK, I got it now! The value of the hash is not decisive, as __eq__
will still be called when the hashes match. It's like a filter, for
performance reasons.

It's really nice, I just tried it and it worked.

Thank you very, very much!!

Cheers,

- Gustavo.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top