# Re: Rich Comparisons Gotcha

Discussion in 'Python' started by James Stroud, Dec 7, 2008.

1. ### James Stroud (Guest)

Rasmus Fogh wrote:
> Current behaviour is both inconsistent and counterintuitive, as these
> examples show.
>
> >>> x = float('NaN')
> >>> x == x
> False

Perhaps this should raise an exception? I think the problem is not with
comparisons in general but with the fact that nan is type float:

py> type(float('NaN'))
<type 'float'>

No float can be equal to nan, yet nan is a float. How can something be
not a number and a float at the same time? The illogic of nan's type is
what makes illogical comparison results possible, including nan comparing
unequal to itself.

> >>> ll = [x]
> >>> x in ll
> True
> >>> x == ll[0]
> False

But there is consistency on the basis of identity, which is the first
thing the test for containment (in) checks:

py> x is x
True
py> x in [x]
True

Identity and equality are two different concepts. Comparing identity to
equality is like comparing apples to oranges ;o)
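
To make the identity/equality distinction concrete, here is a minimal sketch in Python 3. The variable names are illustrative only. It assumes CPython's behaviour of trying identity before equality in containment tests, which is what the examples above show.

```python
nan = float("nan")

# Equality: NaN never compares equal, not even to itself.
nan_equals_itself = (nan == nan)

# Identity: it is still the very same object.
nan_is_itself = (nan is nan)

# Containment tries identity first, so the same object is found...
same_object_found = nan in [nan]

# ...but a different NaN object is not: identity and equality both fail.
other_nan_found = float("nan") in [nan]

print(nan_equals_itself, nan_is_itself, same_object_found, other_nan_found)
```

So `x in ll` can be true while `x == ll[0]` is false, because the two expressions ask different questions.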

> >>> import numpy
> >>> y = numpy.zeros((3,))
> >>> y
> array([ 0., 0., 0.])
> >>> bool(y==y)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()

But the equality test is not what fails here. It's the cast to bool that
fails, which for numpy works like a unary ufunc. The designers of numpy
thought that this would be a more desirable behavior. The test for
equality likewise is a binary ufunc and the behavior was chosen in numpy
for practical reasons. I don't know if you can overload the == operator
in C, but if you can, you would be able to achieve the same behavior.

> >>> ll1 = [y,1]
> >>> y in ll1
> True
> >>> ll2 = [1,y]
> >>> y in ll2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()

I think you could be safe calling this a bug with numpy. But the fact
that someone can create a bug with a language is not a condemnation of
the language. For example, C makes it real easy to crash a program by
overrunning the limits of an array, but no one would suggest to remove
arrays from C.

> Can anybody see a way this could be fixed (please)? I may well have to
> live with it, but I would really prefer not to.

Your only hope is to somehow convince the language designers to remove
the ability to overload == then get them to agree on what you think the
proper behavior should be for comparisons. I think the probability of
that happening is about zero, though, because such a change would run
counter to the dynamic nature of the language.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com

James Stroud, Dec 7, 2008

2. ### James Stroud (Guest)

James Stroud wrote:
>[cast to bool] for numpy works like a unary ufunc.

Scratch that. Not thinking and typing at same time.

James Stroud, Dec 7, 2008

3. ### Steven D'Aprano (Guest)

On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

> Rasmus Fogh wrote:
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>>
>> >>> x = float('NaN')
>> >>> x == x
>> False
>
> Perhaps this should raise an exception?

Why on earth would you want checking equality on NaN to raise an
exception??? What benefit does it give?

> I think the problem is not with
> comparisons in general but with the fact that nan is type float:
>
> py> type(float('NaN'))
> <type 'float'>
>
> No float can be equal to nan, but nan is a float. How can something be
> not a number and a float at the same time?

Because floats are not real numbers. They are *almost* numbers, they
often (but not always) behave like numbers, but they're actually not
numbers.

The difference is subtle enough that it is easy to forget that floats are
not numbers, but it's easy enough to find examples proving it:

Some perfectly good numbers don't exist as floats:

>>> 2**-10000 == 0.0

True

Try as you might, you can't get the number 0.1 *exactly* as a float:

>>> 0.1

0.10000000000000001

For any numbers x and y not equal to zero, x+y != x. But that fails for
floats:

>>> 1001.0 + 1e99 == 1e99

True

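The above is because of limited precision rather than overflow: 1001.0 is
far smaller than the gap between adjacent floats near 1e99, so it is
rounded away. But staying at ordinary magnitudes doesn't solve the problem
either. With a little effort, you can also find examples of "ordinary
sized" floats where (x+y)-y != x.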

>>> 0.9+0.1-0.9 == 0.1

False
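
The float examples above can be checked with the standard library. This is a small sketch in Python 3, with variable names chosen only for illustration. `Decimal` and `Fraction` show the exact value a float really stores, and `math.isclose` is one common way to compare with a tolerance.

```python
from decimal import Decimal
from fractions import Fraction
import math

# The float literal 0.1 is stored as the nearest binary fraction, not as 1/10.
exact_value = Decimal(0.1)
is_exactly_one_tenth = Fraction(0.1) == Fraction(1, 10)

# Rounding at each step means (x + y) - y need not give back x.
x, y = 0.1, 0.9
round_trip = (y + x) - y
round_trip_exact = (round_trip == x)

# A tolerance-based comparison is the usual workaround.
round_trip_close = math.isclose(round_trip, x)

print(exact_value)
print(is_exactly_one_tenth, round_trip_exact, round_trip_close)
```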

>> >>> import numpy
>> >>> y = numpy.zeros((3,))
>> >>> y
>> array([ 0., 0., 0.])
>> >>> bool(y==y)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> But the equality test is not what fails here. It's the cast to bool that
> fails

And it is right to do so, because it is ambiguous and the library
designers rightly avoided the temptation of guessing what result is
needed.

>> >>> ll1 = [y,1]
>> >>> y in ll1
>> True
>> >>> ll2 = [1,y]
>> >>> y in ll2
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> I think you could be safe calling this a bug with numpy.

Only in the sense that there are special cases where the array elements
are all true, or all false, and numpy *could* safely return a bool. But
special cases are not special enough to break the rules. Better for the
numpy caller to write this:

    a.all()  # or any()

instead of this:

    try:
        bool(a)
    except ValueError:
        a.all()

as they would need to do if numpy sometimes returned a bool and sometimes
raised an exception.

--
Steven

Steven D'Aprano, Dec 7, 2008

4. ### James Stroud (Guest)

Steven D'Aprano wrote:
> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>
>> Rasmus Fogh wrote:

>>> >>> ll1 = [y,1]
>>> >>> y in ll1
>>> True
>>> >>> ll2 = [1,y]
>>> >>> y in ll2
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> ValueError: The truth value of an array with more than one element is
>>> ambiguous. Use a.any() or a.all()
>>
>> I think you could be safe calling this a bug with numpy.
>
> Only in the sense that there are special cases where the array elements
> are all true, or all false, and numpy *could* safely return a bool. But
> special cases are not special enough to break the rules. Better for the
> numpy caller to write this:
>
>     a.all()  # or any()
>
> instead of this:
>
>     try:
>         bool(a)
>     except ValueError:
>         a.all()
>
> as they would need to do if numpy sometimes returned a bool and sometimes
> raised an exception.

I'm missing how a.all() solves the problem Rasmus describes, namely that
the order of a python *list* affects the results of containment tests by
numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
results in his example. It still seems like a bug in numpy to me, even
if too much other stuff is broken if you fix it (in which case it
apparently becomes an "issue").

James

James Stroud, Dec 8, 2008

5. ### Robert Kern (Guest)

James Stroud wrote:
> Steven D'Aprano wrote:
>> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>>
>>> Rasmus Fogh wrote:

>>>> >>> ll1 = [y,1]
>>>> >>> y in ll1
>>>> True
>>>> >>> ll2 = [1,y]
>>>> >>> y in ll2
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> ValueError: The truth value of an array with more than one element is
>>>> ambiguous. Use a.any() or a.all()
>>> I think you could be safe calling this a bug with numpy.
>>
>> Only in the sense that there are special cases where the array
>> elements are all true, or all false, and numpy *could* safely return a
>> bool. But special cases are not special enough to break the rules.
>> Better for the numpy caller to write this:
>>
>>     a.all()  # or any()
>>
>> instead of this:
>>
>>     try:
>>         bool(a)
>>     except ValueError:
>>         a.all()
>>
>> as they would need to do if numpy sometimes returned a bool and
>> sometimes raised an exception.

>
> I'm missing how a.all() solves the problem Rasmus describes, namely that
> the order of a python *list* affects the results of containment tests by
> numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
> results in his example. It still seems like a bug in numpy to me, even
> if too much other stuff is broken if you fix it (in which case it
> apparently becomes an "issue").

It's an issue, if anything, not a bug. There is no consistent implementation of
bool(some_array) that works in all cases. numpy's predecessor Numeric used to
implement this as returning True if at least one element was non-zero. This
works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not
work well for bool(x==y) (which should be (x==y).all()), but many people got
confused and thought that bool(x==y) worked. When we made numpy, we decided to
explicitly not allow bool(some_array) so that people will not write buggy code
like this again.
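
The any-versus-all point can be illustrated without numpy at all. The following is only a sketch in Python 3: `elementwise_eq` and `elementwise_ne` are hypothetical helpers that mimic array-style comparison on plain lists, not anything from Numeric or numpy.

```python
def elementwise_eq(xs, ys):
    """Compare two equal-length sequences item by item, like an array ==."""
    return [a == b for a, b in zip(xs, ys)]


def elementwise_ne(xs, ys):
    """Item-by-item inequality, like an array !=."""
    return [a != b for a, b in zip(xs, ys)]


x = [1, 2, 3]
y = [1, 2, 4]  # differs from x only in the last position

# "Are they different?" is answered correctly by any(): one mismatch suffices.
differ = any(elementwise_ne(x, y))

# "Are they the same?" needs all(); any() would wrongly say yes here,
# because the first two positions happen to match.
same_with_all = all(elementwise_eq(x, y))
same_with_any = any(elementwise_eq(x, y))

print(differ, same_with_all, same_with_any)
```

A single rule for collapsing an elementwise result to one bool therefore cannot be right for both `==` and `!=`, which is the reason given above for refusing to guess.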

The deficiency is in the feature of rich comparisons, not numpy's implementation
of it. __eq__() is allowed to return non-booleans; however, there are some parts
of Python's implementation like list.__contains__() that still expect the return
value of __eq__() to be meaningfully cast to a boolean.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Robert Kern, Dec 8, 2008

6. ### James Stroud (Guest)

Robert Kern wrote:
> James Stroud wrote:
>> I'm missing how a.all() solves the problem Rasmus describes, namely
>> that the order of a python *list* affects the results of containment
>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
>> different results in his example. It still seems like a bug in numpy
>> to me, even if too much other stuff is broken if you fix it (in which
>> case it apparently becomes an "issue").

>
> It's an issue, if anything, not a bug. There is no consistent
> implementation of bool(some_array) that works in all cases. numpy's
> predecessor Numeric used to implement this as returning True if at least
> one element was non-zero. This works well for bool(x!=y) (which is
> equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
> should be (x==y).all()), but many people got confused and thought that
> bool(x==y) worked. When we made numpy, we decided to explicitly not
> allow bool(some_array) so that people will not write buggy code like
> this again.
>
> The deficiency is in the feature of rich comparisons, not numpy's
> implementation of it. __eq__() is allowed to return non-booleans;
> however, there are some parts of Python's implementation like
> list.__contains__() that still expect the return value of __eq__() to be
> meaningfully cast to a boolean.
>

You have explained

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is
raised in the former case, which is perfectly reasonable to me.

All I can imagine is that something like the following lives in the
bowels of the python code for list:

def __contains__(self, other):
    foundit = False
    for i, v in enumerate(self):
        if i == 0:
            # evaluates to bool numpy array
            foundit = one_kind_of_test(v, other)
        else:
            # raises exception for numpy array
            foundit = another_kind_of_test(v, other)
        if foundit:
            break
    return foundit

I'm trying to imagine some other way to get the results mentioned but I
honestly can't. It's beyond me why someone would do such a thing, but
perhaps it's an optimization of some sort.

James

James Stroud, Dec 8, 2008

7. ### Robert Kern (Guest)

James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>> I'm missing how a.all() solves the problem Rasmus describes, namely
>>> that the order of a python *list* affects the results of containment
>>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
>>> different results in his example. It still seems like a bug in numpy
>>> to me, even if too much other stuff is broken if you fix it (in which
>>> case it apparently becomes an "issue").

>>
>> It's an issue, if anything, not a bug. There is no consistent
>> implementation of bool(some_array) that works in all cases. numpy's
>> predecessor Numeric used to implement this as returning True if at
>> least one element was non-zero. This works well for bool(x!=y) (which
>> is equivalent to (x!=y).any()) but does not work well for bool(x==y)
>> (which should be (x==y).all()), but many people got confused and
>> thought that bool(x==y) worked. When we made numpy, we decided to
>> explicitly not allow bool(some_array) so that people will not write
>> buggy code like this again.
>>
>> The deficiency is in the feature of rich comparisons, not numpy's
>> implementation of it. __eq__() is allowed to return non-booleans;
>> however, there are some parts of Python's implementation like
>> list.__contains__() that still expect the return value of __eq__() to
>> be meaningfully cast to a boolean.
>>

>
> You have explained
>
> py> ll2 = [1, y]
> py> y in ll2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is...
>
> but not
>
> py> ll1 = [y,1]
> py> y in ll1
> True
>
> It's this discrepancy that seems like a bug, not that a ValueError is
> raised in the former case, which is perfectly reasonable to me.

Nothing to do with numpy. list.__contains__() checks for identity with "is"
before it goes to __eq__().

Robert Kern, Dec 8, 2008

8. ### James Stroud (Guest)

Robert Kern wrote:
> James Stroud wrote:
>> py> ll2 = [1, y]
>> py> y in ll2
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is...
>>
>> but not
>>
>> py> ll1 = [y,1]
>> py> y in ll1
>> True
>>
>> It's this discrepancy that seems like a bug, not that a ValueError is
>> raised in the former case, which is perfectly reasonable to me.

>
> Nothing to do with numpy. list.__contains__() checks for identity with
> "is" before it goes to __eq__().

...but only for the first element of the list:

py> import numpy
py> y = numpy.array([1,2,3])
py> y
array([1, 2, 3])
py> y in [1, y]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.ValueError'>: The truth value of an array with more
than one element is ambiguous. Use a.any() or a.all()
py> y is [1, y][1]
True

I think it skips straight to __eq__ if the element is not the first in
the list. That no one acknowledges this makes me feel like a conspiracy
is afoot.

James Stroud, Dec 8, 2008

9. ### Robert Kern (Guest)

James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>> py> ll2 = [1, y]
>>> py> y in ll2
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> ValueError: The truth value of an array with more than one element is...
>>>
>>> but not
>>>
>>> py> ll1 = [y,1]
>>> py> y in ll1
>>> True
>>>
>>> It's this discrepancy that seems like a bug, not that a ValueError is
>>> raised in the former case, which is perfectly reasonable to me.

>>
>> Nothing to do with numpy. list.__contains__() checks for identity with
>> "is" before it goes to __eq__().

>
> ...but only for the first element of the list:
>
> py> import numpy
> py> y = numpy.array([1,2,3])
> py> y
> array([1, 2, 3])
> py> y in [1, y]
> ------------------------------------------------------------
> Traceback (most recent call last):
> File "<ipython console>", line 1, in <module>
> <type 'exceptions.ValueError'>: The truth value of an array with more
> than one element is ambiguous. Use a.any() or a.all()
> py> y is [1, y][1]
> True
>
> I think it skips straight to __eq__ if the element is not the first in
> the list.

No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is
checked. When y is a numpy array, this returns an array of bools.
list.__contains__() tries to convert this array to a bool and
ndarray.__nonzero__() raises the exception.

list.__contains__() checks "is" then __eq__() for each element before moving on
to the next element. It does not try "is" for all elements, then try __eq__()
for all elements.
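
The per-element "is, then ==" order described above can be modelled in a few lines of Python 3. This is only a rough sketch, not the actual C implementation. `Elementwise` and `AmbiguousTruth` are hypothetical stand-ins for an array and its comparison result, and `__bool__` is the Python 3 name for `__nonzero__`.

```python
class AmbiguousTruth:
    """Comparison result that refuses to be collapsed to a single bool."""

    def __init__(self, flags):
        self.flags = flags

    def __bool__(self):
        raise ValueError("truth value is ambiguous; use any() or all()")


class Elementwise:
    """Toy stand-in for an array: == returns per-item results."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return AmbiguousTruth([item == other for item in self.items])


def contains(seq, target):
    """Rough pure-Python model of list containment."""
    for element in seq:
        # Identity first, then equality, one element at a time.
        if element is target or element == target:
            return True
    return False


def raises_value_error(thunk):
    """Return True if calling thunk() raises ValueError."""
    try:
        thunk()
    except ValueError:
        return True
    return False


y = Elementwise([0, 0, 0])

# y is found by identity at index 0, so == is never evaluated.
found_first = y in [y, 1]

# Here 1 is compared first: "1 is y" fails, "1 == y" yields an
# AmbiguousTruth, and converting that to bool raises.
fails_second = raises_value_error(lambda: y in [1, y])

print(found_first, fails_second)
```

The model function `contains` behaves the same way as the built-in `in` for these two lists, which is why the order of the list matters.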

> That no one acknowledges this makes me feel like a conspiracy
> is afoot.

I don't know what you think I'm not acknowledging.

Robert Kern, Dec 8, 2008

10. ### James Stroud (Guest)

Robert Kern wrote:
> James Stroud wrote:
>> I think it skips straight to __eq__ if the element is not the first in
>> the list.

>
> No, it doesn't skip straight to __eq__(). "y is 1" returns False, so
> (y==1) is checked. When y is a numpy array, this returns an array of
> bools. list.__contains__() tries to convert this array to a bool and
> ndarray.__nonzero__() raises the exception.
>
> list.__contains__() checks "is" then __eq__() for each element before
> moving on to the next element. It does not try "is" for all elements,
> then try __eq__() for all elements.

Ok. Thanks for the explanation.

> > That no one acknowledges this makes me feel like a conspiracy
> > is afoot.

>
> I don't know what you think I'm not acknowledging.

Sorry. That was a failed attempt at humor.

James

James Stroud, Dec 8, 2008

11. ### Luis Zarrabeitia (Guest)

On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
> The deficiency is in the feature of rich comparisons, not numpy's
> implementation of it. __eq__() is allowed to return non-booleans; however,
> there are some parts of Python's implementation like list.__contains__()
> that still expect the return value of __eq__() to be meaningfully cast to a
> boolean.

list.__contains__, tuple.__contains__, the 'if' keyword...

How do you suggest fixing the list.__contains__ implementation?

Should I wrap all my "if"s with this?:

if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
    res = compare_numpy(a, b)
elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
    res = compare_someotherclass(a, b)
...
else:
    res = (a == b)
if res:
    # do whatever

--
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie

Luis Zarrabeitia, Dec 10, 2008

12. ### Steven D'Aprano (Guest)

On Wed, 10 Dec 2008 17:58:49 -0500, Luis Zarrabeitia wrote:

> On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
>> The deficiency is in the feature of rich comparisons, not numpy's
>> implementation of it. __eq__() is allowed to return non-booleans;
>> however, there are some parts of Python's implementation like
>> list.__contains__() that still expect the return value of __eq__() to
>> be meaningfully cast to a boolean.

>
> list.__contains__, tuple.__contains__, the 'if' keyword...
>
> How do you suggest fixing the list.__contains__ implementation?

I suggest you don't, because I don't think it's broken. I think it's
working as designed. It doesn't succeed with arbitrary data types which
may be broken, buggy or incompatible with __contains__'s design, but
that's okay, it's not supposed to.

> Should I wrap all my "if"s with this?:
>
> if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
>     res = compare_numpy(a, b)
> elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
>     res = compare_someotherclass(a, b)
> ...
> else:
>     res = (a == b)
> if res:
>     # do whatever

No, inlining that code everywhere you have an if would be stupid. What
you should do is write a single function equals(x, y) that does precisely
what you want it to do, in whatever way you want, and then call it:

if equals(a, b):
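
One possible shape for such a function, as a sketch in Python 3. It rests on a stated assumption: when `==` returns something other than a plain bool, "equal" should mean "equal everywhere", and the result object offers an `all()` method the way numpy arrays do. Code with different needs would pick a different rule.

```python
def equals(x, y):
    """Return a plain bool for x == y, even if == yields something richer."""
    result = (x == y)
    if isinstance(result, bool):
        return result
    # Assumption: a non-bool result is an elementwise answer, and "equal"
    # means "equal in every position". Fall back to ordinary truthiness
    # when the result has no all() method.
    reducer = getattr(result, "all", None)
    if callable(reducer):
        return bool(reducer())
    return bool(result)
```

The comparison policy then lives in one place, and every call site stays a simple `if equals(a, b):`.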

Or, put your data inside a wrapper. If you read back over my earlier
posts in this thread, I suggested a lightweight wrapper class you could
use. You could make it even more useful by using delegation to make the
wrapped class behave *exactly* like the original, except for __eq__.
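
The wrapper from the earlier posts is not reproduced in this part of the thread, so the following is a hypothetical reconstruction in Python 3 rather than the original code. It assumes identity of the wrapped objects is an acceptable notion of equality, and it uses `__getattr__` for delegation.

```python
class EqualityWrapper:
    """Hypothetical wrapper: acts like the wrapped object except for ==."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Called only when normal lookup fails, so ordinary attribute and
        # method access is forwarded to the wrapped object.
        if name == "_wrapped":
            # Guard against infinite recursion if _wrapped is not yet set.
            raise AttributeError(name)
        return getattr(self._wrapped, name)

    def __eq__(self, other):
        if isinstance(other, EqualityWrapper):
            other = other._wrapped
        # Assumption: two wrappers are equal when they wrap the same object.
        return self._wrapped is other

    def __hash__(self):
        # Keep hashing consistent with the identity-based __eq__.
        return id(self._wrapped)
```

Note that `__getattr__` does not forward operators and other special methods looked up on the type, so a wrapper meant to behave exactly like the original would also need to define those explicitly.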

You don't even need to wrap every single item:

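def wrap_or_not(obj):
    # The original condition is missing here; is_bad_data() is a
    # placeholder for whatever test identifies the problem objects.
    if is_bad_data(obj):
        return EqualityWrapper(obj)
    return obj

data = [1, 2, 3, BadData, 4]
data = map(wrap_or_not, data)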

It isn't really that hard to deal with these things, once you give up the
illusion that your code should automatically work with arbitrarily wacky
data types that you don't control.

--
Steven

Steven D'Aprano, Dec 11, 2008