Re: Rich Comparisons Gotcha

Discussion in 'Python' started by James Stroud, Dec 7, 2008.

  1. James Stroud

    James Stroud Guest

    Rasmus Fogh wrote:
    > Current behaviour is both inconsistent and counterintuitive, as these
    > examples show.
    >
    > >>> x = float('NaN')
    > >>> x == x
    > False


    Perhaps this should raise an exception? I think the problem is not with
    comparisons in general but with the fact that nan is type float:

    py> type(float('NaN'))
    <type 'float'>

    No float can be equal to nan, but nan is a float. How can something be
    not a number and a float at the same time? The illogicality of nan's
    type creates the possibility for the illogical results of comparisons to
    nan including comparing nan to itself.

    > >>> ll = [x]
    > >>> x in ll
    > True
    > >>> x == ll[0]
    > False


    But there is consistency on the basis of identity which is the test for
    containment (in):

    py> x is x
    True
    py> x in [x]
    True

    Identity and equality are two different concepts. Comparing identity to
    equality is like comparing apples to oranges ;o)
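
The distinction between identity and equality can be checked directly. The following is a minimal sketch using only the standard library; the variable names are illustrative and not from the thread. It assumes CPython's documented behaviour that containment tests try identity before equality.

```python
import math

x = float("nan")

# Equality: IEEE 754 says NaN compares unequal to everything, itself included.
nan_equals_itself = (x == x)

# Identity: x is the very same object as itself.
nan_is_itself = (x is x)

# Containment tries identity first, so the same object is always "found".
same_object_found = x in [x]

# A different NaN object is neither identical nor equal, so it is not found.
other_nan_found = float("nan") in [x]

# math.isnan is the reliable way to test for NaN.
detected = math.isnan(x)

print(nan_equals_itself, nan_is_itself, same_object_found, other_nan_found, detected)
# prints: False True True False True
```

So "x in ll" and "x == ll[0]" disagree only because they ask different questions.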

    >
    > >>> import numpy
    > >>> y = numpy.zeros((3,))
    > >>> y
    > array([ 0., 0., 0.])
    > >>> bool(y==y)
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > ValueError: The truth value of an array with more than one element is
    > ambiguous. Use a.any() or a.all()


    But the equality test is not what fails here. It's the cast to bool that
    fails, which for numpy works like a unary ufunc. The designers of numpy
    thought that this would be a more desirable behavior. The test for
    equality likewise is a binary ufunc and the behavior was chosen in numpy
    for practical reasons. I don't know if you can overload the == operator
    in C, but if you can, you would be able to achieve the same behavior.

    > >>> ll1 = [y,1]
    > >>> y in ll1
    > True
    > >>> ll2 = [1,y]
    > >>> y in ll2
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > ValueError: The truth value of an array with more than one element is
    > ambiguous. Use a.any() or a.all()


    I think you could be safe calling this a bug with numpy. But the fact
    that someone can create a bug with a language is not a condemnation of
    the language. For example, C makes it real easy to crash a program by
    overrunning the limits of an array, but no one would suggest to remove
    arrays from C.

    > Can anybody see a way this could be fixed (please)? I may well have to
    > live with it, but I would really prefer not to.


    Your only hope is to somehow convince the language designers to remove
    the ability to overload == then get them to agree on what you think the
    proper behavior should be for comparisons. I think the probability of
    that happening is about zero, though, because such a change would run
    counter to the dynamic nature of the language.

    James


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
     
    James Stroud, Dec 7, 2008
    #1

  2. James Stroud

    James Stroud Guest

    James Stroud wrote:
    >[cast to bool] for numpy works like a unary ufunc.


    Scratch that. Not thinking and typing at same time.


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
     
    James Stroud, Dec 7, 2008
    #2

  3. On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

    > Rasmus Fogh wrote:
    >> Current behaviour is both inconsistent and counterintuitive, as these
    >> examples show.
    >>
    >> >>> x = float('NaN')
    >> >>> x == x
    >> False

    >
    > Perhaps this should raise an exception?


    Why on earth would you want checking equality on NaN to raise an
    exception??? What benefit does it give?


    > I think the problem is not with
    > comparisons in general but with the fact that nan is type float:
    >
    > py> type(float('NaN'))
    > <type 'float'>
    >
    > No float can be equal to nan, but nan is a float. How can something be
    > not a number and a float at the same time?


    Because floats are not real numbers. They are *almost* numbers, they
    often (but not always) behave like numbers, but they're actually not
    numbers.

    The difference is subtle enough that it is easy to forget that floats are
    not numbers, but it's easy enough to find examples proving it:

    Some perfectly good numbers don't exist as floats:

    >>> 2**-10000 == 0.0

    True

    Try as you might, you can't get the number 0.1 *exactly* as a float:

    >>> 0.1

    0.10000000000000001


    For any numbers x and y not equal to zero, x+y != x. But that fails for
    floats:

    >>> 1001.0 + 1e99 == 1e99

    True

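    The above is because of limited precision rather than overflow: 1e99 is
    well within float range, but 1001.0 is far smaller than the gap between
    adjacent floats at that size, so it is lost to rounding. Avoiding huge
    magnitudes doesn't solve the problem either. With a little effort, you
    can also find examples of "ordinary sized" floats where (x+y)-y != x.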

    >>> 0.9+0.1-0.9 == 0.1

    False
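
These float claims can be reproduced with the standard library. This is a small sketch, not part of the original post; the variable names are made up for illustration. It uses fractions.Fraction to expose the exact value stored for 0.1 and sys.float_info.max to show that 1e99 is nowhere near the overflow limit.

```python
from fractions import Fraction
import sys

# The float nearest to 0.1 is not exactly one tenth.
exact_tenth = Fraction(1, 10)
stored_tenth = Fraction(0.1)          # exact rational value of the float 0.1
tenth_is_exact = (stored_tenth == exact_tenth)

# Absorption: 1001.0 is far below the spacing between floats near 1e99.
absorbed = (1001.0 + 1e99 == 1e99)
no_overflow = 1e99 < sys.float_info.max

# Round-off at ordinary magnitudes.
roundtrip_ok = (0.9 + 0.1 - 0.9 == 0.1)

print(tenth_is_exact, absorbed, no_overflow, roundtrip_ok)
# prints: False True True False
```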



    >> >>> import numpy
    >> >>> y = numpy.zeros((3,))
    >> >>> y
    >> array([ 0., 0., 0.])
    >> >>> bool(y==y)
    >> Traceback (most recent call last):
    >>   File "<stdin>", line 1, in <module>
    >> ValueError: The truth value of an array with more than one element is
    >> ambiguous. Use a.any() or a.all()

    >
    > But the equality test is not what fails here. It's the cast to bool that
    > fails


    And it is right to do so, because it is ambiguous and the library
    designers rightly avoided the temptation of guessing what result is
    needed.


    >> >>> ll1 = [y,1]
    >> >>> y in ll1
    >> True
    >> >>> ll2 = [1,y]
    >> >>> y in ll2
    >> Traceback (most recent call last):
    >>   File "<stdin>", line 1, in <module>
    >> ValueError: The truth value of an array with more than one element is
    >> ambiguous. Use a.any() or a.all()

    >
    > I think you could be safe calling this a bug with numpy.


    Only in the sense that there are special cases where the array elements
    are all true, or all false, and numpy *could* safely return a bool. But
    special cases are not special enough to break the rules. Better for the
    numpy caller to write this:

    a.all() # or any()

    instead of:

    try:
        bool(a)
    except ValueError:
        a.all()

    as they would need to do if numpy sometimes returned a bool and sometimes
    raised an exception.



    --
    Steven
     
    Steven D'Aprano, Dec 7, 2008
    #3
  4. James Stroud

    James Stroud Guest

    Steven D'Aprano wrote:
    > On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
    >
    >> Rasmus Fogh wrote:


    >>>>>> ll1 = [y,1]
    >>>>>> y in ll1
    >>> True
    >>>>>> ll2 = [1,y]
    >>>>>> y in ll2
    >>> Traceback (most recent call last):
    >>> File "<stdin>", line 1, in <module>
    >>> ValueError: The truth value of an array with more than one element is
    >>> ambiguous. Use a.any() or a.all()

    >> I think you could be safe calling this a bug with numpy.

    >
    > Only in the sense that there are special cases where the array elements
    > are all true, or all false, and numpy *could* safely return a bool. But
    > special cases are not special enough to break the rules. Better for the
    > numpy caller to write this:
    >
    > a.all() # or any()
    >
    > instead of:
    >
    > try:
    >     bool(a)
    > except ValueError:
    >     a.all()
    >
    > as they would need to do if numpy sometimes returned a bool and sometimes
    > raised an exception.


    I'm missing how a.all() solves the problem Rasmus describes, namely that
    the order of a python *list* affects the results of containment tests by
    numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
    results in his example. It still seems like a bug in numpy to me, even
    if too much other stuff is broken if you fix it (in which case it
    apparently becomes an "issue").

    James
     
    James Stroud, Dec 8, 2008
    #4
  5. Robert Kern

    Robert Kern Guest

    James Stroud wrote:
    > Steven D'Aprano wrote:
    >> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
    >>
    >>> Rasmus Fogh wrote:

    >
    >>>>>>> ll1 = [y,1]
    >>>>>>> y in ll1
    >>>> True
    >>>>>>> ll2 = [1,y]
    >>>>>>> y in ll2
    >>>> Traceback (most recent call last):
    >>>> File "<stdin>", line 1, in <module>
    >>>> ValueError: The truth value of an array with more than one element is
    >>>> ambiguous. Use a.any() or a.all()
    >>> I think you could be safe calling this a bug with numpy.

    >>
    >> Only in the sense that there are special cases where the array
    >> elements are all true, or all false, and numpy *could* safely return a
    >> bool. But special cases are not special enough to break the rules.
    >> Better for the numpy caller to write this:
    >>
    >> a.all() # or any()
    >>
    >> instead of:
    >>
    >> try:
    >>     bool(a)
    >> except ValueError:
    >>     a.all()
    >>
    >> as they would need to do if numpy sometimes returned a bool and
    >> sometimes raised an exception.

    >
    > I'm missing how a.all() solves the problem Rasmus describes, namely that
    > the order of a python *list* affects the results of containment tests by
    > numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
    > results in his example. It still seems like a bug in numpy to me, even
    > if too much other stuff is broken if you fix it (in which case it
    > apparently becomes an "issue").


    It's an issue, if anything, not a bug. There is no consistent implementation of
    bool(some_array) that works in all cases. numpy's predecessor Numeric used to
    implement this as returning True if at least one element was non-zero. This
    works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not
    work well for bool(x==y) (which should be (x==y).all()), but many people got
    confused and thought that bool(x==y) worked. When we made numpy, we decided to
    explicitly not allow bool(some_array) so that people will not write buggy code
    like this again.

    The deficiency is in the feature of rich comparisons, not numpy's implementation
    of it. __eq__() is allowed to return non-booleans; however, there are some parts
    of Python's implementation like list.__contains__() that still expect the return
    value of __eq__() to be meaningfully cast to a boolean.
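
The any() versus all() point can be seen without numpy. In this sketch, plain lists stand in for arrays and the comparisons are written out element by element; the names are illustrative only.

```python
# Plain lists stand in for arrays; comparisons are done element by element.
x = [1, 2, 3]
y = [1, 5, 6]   # equal to x in the first position only

elementwise_eq = [a == b for a, b in zip(x, y)]   # [True, False, False]
elementwise_ne = [a != b for a, b in zip(x, y)]   # [False, True, True]

# "True if at least one element is non-zero" is the right question for != ...
arrays_differ = any(elementwise_ne)

# ... but it gives a misleading answer for ==.
misleading_equal = any(elementwise_eq)

# Equality of whole arrays needs every position to match.
arrays_equal = all(elementwise_eq)

print(arrays_differ, misleading_equal, arrays_equal)
# prints: True True False
```

A single truth-value rule cannot serve both comparisons, which is why the caller has to choose any() or all() explicitly.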

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Dec 8, 2008
    #5
  6. James Stroud

    James Stroud Guest

    Robert Kern wrote:
    > James Stroud wrote:
    >> I'm missing how a.all() solves the problem Rasmus describes, namely
    >> that the order of a python *list* affects the results of containment
    >> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
    >> different results in his example. It still seems like a bug in numpy
    >> to me, even if too much other stuff is broken if you fix it (in which
    >> case it apparently becomes an "issue").

    >
    > It's an issue, if anything, not a bug. There is no consistent
    > implementation of bool(some_array) that works in all cases. numpy's
    > predecessor Numeric used to implement this as returning True if at least
    > one element was non-zero. This works well for bool(x!=y) (which is
    > equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
    > should be (x==y).all()), but many people got confused and thought that
    > bool(x==y) worked. When we made numpy, we decided to explicitly not
    > allow bool(some_array) so that people will not write buggy code like
    > this again.
    >
    > The deficiency is in the feature of rich comparisons, not numpy's
    > implementation of it. __eq__() is allowed to return non-booleans;
    > however, there are some parts of Python's implementation like
    > list.__contains__() that still expect the return value of __eq__() to be
    > meaningfully cast to a boolean.
    >


    You have explained

    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

    but not

    py> ll1 = [y,1]
    py> y in ll1
    True

    It's this discrepancy that seems like a bug, not that a ValueError is
    raised in the former case, which is perfectly reasonable to me.


    All I can imagine is that something like the following lives in the
    bowels of the python code for list:

    def __contains__(self, other):
        foundit = False
        for i, v in enumerate(self):
            if i == 0:
                # evaluates to bool numpy array
                foundit = one_kind_of_test(v, other)
            else:
                # raises exception for numpy array
                foundit = another_kind_of_test(v, other)
            if foundit:
                break
        return foundit

    I'm trying to imagine some other way to get the results mentioned but I
    honestly can't. It's beyond me why someone would do such a thing, but
    perhaps it's an optimization of some sort.

    James
     
    James Stroud, Dec 8, 2008
    #6
  7. Robert Kern

    Robert Kern Guest

    James Stroud wrote:
    > Robert Kern wrote:
    >> James Stroud wrote:
    >>> I'm missing how a.all() solves the problem Rasmus describes, namely
    >>> that the order of a python *list* affects the results of containment
    >>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
    >>> different results in his example. It still seems like a bug in numpy
    >>> to me, even if too much other stuff is broken if you fix it (in which
    >>> case it apparently becomes an "issue").

    >>
    >> It's an issue, if anything, not a bug. There is no consistent
    >> implementation of bool(some_array) that works in all cases. numpy's
    >> predecessor Numeric used to implement this as returning True if at
    >> least one element was non-zero. This works well for bool(x!=y) (which
    >> is equivalent to (x!=y).any()) but does not work well for bool(x==y)
    >> (which should be (x==y).all()), but many people got confused and
    >> thought that bool(x==y) worked. When we made numpy, we decided to
    >> explicitly not allow bool(some_array) so that people will not write
    >> buggy code like this again.
    >>
    >> The deficiency is in the feature of rich comparisons, not numpy's
    >> implementation of it. __eq__() is allowed to return non-booleans;
    >> however, there are some parts of Python's implementation like
    >> list.__contains__() that still expect the return value of __eq__() to
    >> be meaningfully cast to a boolean.
    >>

    >
    > You have explained
    >
    > py> ll2 = [1, y]
    > py> y in ll2
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > ValueError: The truth value of an array with more than one element is...
    >
    > but not
    >
    > py> ll1 = [y,1]
    > py> y in ll1
    > True
    >
    > It's this discrepancy that seems like a bug, not that a ValueError is
    > raised in the former case, which is perfectly reasonable to me.


    Nothing to do with numpy. list.__contains__() checks for identity with "is"
    before it goes to __eq__().

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Dec 8, 2008
    #7
  8. James Stroud

    James Stroud Guest

    Robert Kern wrote:
    > James Stroud wrote:
    >> py> ll2 = [1, y]
    >> py> y in ll2
    >> Traceback (most recent call last):
    >> File "<stdin>", line 1, in <module>
    >> ValueError: The truth value of an array with more than one element is...
    >>
    >> but not
    >>
    >> py> ll1 = [y,1]
    >> py> y in ll1
    >> True
    >>
    >> It's this discrepancy that seems like a bug, not that a ValueError is
    >> raised in the former case, which is perfectly reasonable to me.

    >
    > Nothing to do with numpy. list.__contains__() checks for identity with
    > "is" before it goes to __eq__().


    ...but only for the first element of the list:

    py> import numpy
    py> y = numpy.array([1,2,3])
    py> y
    array([1, 2, 3])
    py> y in [1, y]
    ------------------------------------------------------------
    Traceback (most recent call last):
    File "<ipython console>", line 1, in <module>
    <type 'exceptions.ValueError'>: The truth value of an array with more
    than one element is ambiguous. Use a.any() or a.all()
    py> y is [1, y][1]
    True

    I think it skips straight to __eq__ if the element is not the first in
    the list. That no one acknowledges this makes me feel like a conspiracy
    is afoot.
     
    James Stroud, Dec 8, 2008
    #8
  9. Robert Kern

    Robert Kern Guest

    James Stroud wrote:
    > Robert Kern wrote:
    >> James Stroud wrote:
    >>> py> ll2 = [1, y]
    >>> py> y in ll2
    >>> Traceback (most recent call last):
    >>> File "<stdin>", line 1, in <module>
    >>> ValueError: The truth value of an array with more than one element is...
    >>>
    >>> but not
    >>>
    >>> py> ll1 = [y,1]
    >>> py> y in ll1
    >>> True
    >>>
    >>> It's this discrepancy that seems like a bug, not that a ValueError is
    >>> raised in the former case, which is perfectly reasonable to me.

    >>
    >> Nothing to do with numpy. list.__contains__() checks for identity with
    >> "is" before it goes to __eq__().

    >
    > ...but only for the first element of the list:
    >
    > py> import numpy
    > py> y = numpy.array([1,2,3])
    > py> y
    > array([1, 2, 3])
    > py> y in [1, y]
    > ------------------------------------------------------------
    > Traceback (most recent call last):
    > File "<ipython console>", line 1, in <module>
    > <type 'exceptions.ValueError'>: The truth value of an array with more
    > than one element is ambiguous. Use a.any() or a.all()
    > py> y is [1, y][1]
    > True
    >
    > I think it skips straight to __eq__ if the element is not the first in
    > the list.


    No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is
    checked. When y is a numpy array, this returns an array of bools.
    list.__contains__() tries to convert this array to a bool and
    ndarray.__nonzero__() raises the exception.

    list.__contains__() checks "is" then __eq__() for each element before moving on
    to the next element. It does not try "is" for all elements, then try __eq__()
    for all elements.
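
The per-element "is, then ==" order can be modelled in pure Python. This is only a rough sketch: Elementwise is a hypothetical stand-in for a numpy array (its == returns one result per element and its truth value is refused), and contains() is an illustrative model, not CPython's actual implementation.

```python
class Elementwise:
    """Hypothetical stand-in for an array whose == compares element by element."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Like an array, return a result per element rather than a single bool.
        return Elementwise(v == other for v in self.values)

    def __bool__(self):
        # Refuse to guess between any() and all(), as numpy does.
        raise ValueError("truth value of a multi-element result is ambiguous")

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def contains(items, target):
    """Rough pure-Python model of list containment: identity, then equality."""
    for item in items:
        if item is target or item == target:
            return True
    return False


y = Elementwise([0.0, 0.0, 0.0])

first = contains([y, 1], y)   # identity matches at index 0; == is never called

try:
    contains([1, y], y)       # 1 is not y, so 1 == y is evaluated and bool() fails
    second = "no error"
except ValueError:
    second = "ValueError"

print(first, second)
# prints: True ValueError
```

The built-in "in" operator gives the same two outcomes for this stand-in class, so the order dependence comes from where the identity match happens to sit in the list.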

    > That no one acknowledges this makes me feel like a conspiracy
    > is afoot.


    I don't know what you think I'm not acknowledging.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Dec 8, 2008
    #9
  10. James Stroud

    James Stroud Guest

    Robert Kern wrote:
    > James Stroud wrote:
    >> I think it skips straight to __eq__ if the element is not the first in
    >> the list.

    >
    > No, it doesn't skip straight to __eq__(). "y is 1" returns False, so
    > (y==1) is checked. When y is a numpy array, this returns an array of
    > bools. list.__contains__() tries to convert this array to a bool and
    > ndarray.__nonzero__() raises the exception.
    >
    > list.__contains__() checks "is" then __eq__() for each element before
    > moving on to the next element. It does not try "is" for all elements,
    > then try __eq__() for all elements.


    Ok. Thanks for the explanation.

    > > That no one acknowledges this makes me feel like a conspiracy
    > > is afoot.

    >
    > I don't know what you think I'm not acknowledging.


    Sorry. That was a failed attempt at humor.

    James
     
    James Stroud, Dec 8, 2008
    #10
  11. On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
    > The deficiency is in the feature of rich comparisons, not numpy's
    > implementation of it. __eq__() is allowed to return non-booleans; however,
    > there are some parts of Python's implementation like list.__contains__()
    > that still expect the return value of __eq__() to be meaningfully cast to a
    > boolean.


    list.__contains__, tuple.__contains__, the 'if' keyword...

    How do you suggest fixing the list.__contains__ implementation?

    Should I wrap all my "if"s with this?:

    if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
        res = compare_numpy(a, b)
    elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
        res = compare_someotherclass(a, b)
    ...
    else:
        res = (a == b)
    if res:
        # do whatever

    --
    Luis Zarrabeitia (aka Kyrie)
    Fac. de Matemática y Computación, UH.
    http://profesores.matcom.uh.cu/~kyrie
     
    Luis Zarrabeitia, Dec 10, 2008
    #11
  12. On Wed, 10 Dec 2008 17:58:49 -0500, Luis Zarrabeitia wrote:

    > On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
    >> The deficiency is in the feature of rich comparisons, not numpy's
    >> implementation of it. __eq__() is allowed to return non-booleans;
    >> however, there are some parts of Python's implementation like
    >> list.__contains__() that still expect the return value of __eq__() to
    >> be meaningfully cast to a boolean.

    >
    > list.__contains__, tuple.__contains__, the 'if' keyword...
    >
    > How do you suggest fixing the list.__contains__ implementation?



    I suggest you don't, because I don't think it's broken. I think it's
    working as designed. It doesn't succeed with arbitrary data types which
    may be broken, buggy or incompatible with __contains__'s design, but
    that's okay, it's not supposed to.


    > Should I wrap all my "if"s with this?:
    >
    > if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
    >     res = compare_numpy(a, b)
    > elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
    >     res = compare_someotherclass(a, b)
    > ...
    > else:
    >     res = (a == b)
    > if res:
    >     # do whatever


    No, inlining that code everywhere you have an if would be stupid. What
    you should do is write a single function equals(x, y) that does precisely
    what you want it to do, in whatever way you want, and then call it:

    if equals(a, b):

    Or, put your data inside a wrapper. If you read back over my earlier
    posts in this thread, I suggested a lightweight wrapper class you could
    use. You could make it even more useful by using delegation to make the
    wrapped class behave *exactly* like the original, except for __eq__.

    You don't even need to wrap every single item:

    def wrap_or_not(obj):
        # Check the type, not the object: "obj in ..." would call == again
        if type(obj) in list_of_bad_types_i_know_about:
            return EqualityWrapper(obj)
        return obj

    data = [1, 2, 3, BadData, 4]
    data = map(wrap_or_not, data)



    It isn't really that hard to deal with these things, once you give up the
    illusion that your code should automatically work with arbitrarily wacky
    data types that you don't control.
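
One possible shape for the equals() function and the wrapper is sketched below. None of this is the code from the earlier posts: Elementwise is a hypothetical stand-in for an array type, and the rule "an array-like result counts as equal only if all elements match" is an assumption chosen for the example. Delegation of other attributes to the wrapped object is left out for brevity.

```python
class Elementwise:
    """Hypothetical stand-in for an array type with element-by-element ==."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if isinstance(other, Elementwise):
            return Elementwise(a == b for a, b in zip(self.values, other.values))
        return Elementwise(v == other for v in self.values)

    def __bool__(self):
        raise ValueError("truth value of a multi-element result is ambiguous")

    __nonzero__ = __bool__  # Python 2 name for the same hook

    def all(self):
        return all(self.values)


def equals(x, y):
    """Compare two objects and always hand back a plain bool."""
    if x is y:
        return True
    result = (x == y)
    if hasattr(result, "all"):
        # Assumption: an array-like result means "equal" only if every element matches.
        return bool(result.all())
    return bool(result)


class EqualityWrapper:
    """Wrap an object so that == on the wrapper always returns a bool."""

    def __init__(self, obj):
        self.obj = obj

    def __eq__(self, other):
        if isinstance(other, EqualityWrapper):
            other = other.obj
        return equals(self.obj, other)

    def __ne__(self, other):
        return not self == other


def wrap_or_not(obj, bad_types=(Elementwise,)):
    """Wrap only the types known to misbehave under ==."""
    return EqualityWrapper(obj) if isinstance(obj, bad_types) else obj


y = Elementwise([0.0, 0.0, 0.0])
data = [wrap_or_not(item) for item in [1, 2, y, 4]]

same_values = equals(y, Elementwise([0.0, 0.0, 0.0]))
different = equals(y, 1)
found = wrap_or_not(y) in data        # no ValueError: every == yields a bool
missing = wrap_or_not(Elementwise([9.0])) in data

print(same_values, different, found, missing)
# prints: True False True False
```

Note that the value being searched for is wrapped as well. Searching for the raw object would still trigger its own == against the plain integers in the list.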


    --
    Steven
     
    Steven D'Aprano, Dec 11, 2008
    #12