numpy.where() and multiple comparisons


J

John Ladasky

Hi folks,

I am awaiting my approval to join the numpy-discussion mailing list, at scipy.org. I realize that would be the best place to ask my question. However, numpy is so widely used, I figure that someone here would be able to help.

I like to use numpy.where() to select parts of arrays. I have encountered what I would consider to be a bug when you try to use where() in conjunction with the multiple comparison syntax of Python. Here's a minimal example:

Python 3.3.2+ (default, Oct 9 2013, 14:50:09)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
import numpy as np
a = np.arange(10)
a array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.where(a < 5)
b (array([0, 1, 2, 3, 4]),)
c = np.where(2 < a < 7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Defining b works as I want and expect. The array contains the indices (notthe values) of a where a < 5.

For my definition of c, I expect (array([3, 4, 5, 6]),). As you can see, Iget a ValueError instead. I have seen the error message about "the truth value of an array with more than one element" before, and generally I understand how I (accidentally) provoke it. This time, I don't see it. In defining c, I expect to be stepping through a, one element at a time, just as Idid when defining b.

Does anyone understand why this happens? Is there a smart work-around? Thanks.
 
Ad

Advertisements

D

duncan smith

Hi folks,

I am awaiting my approval to join the numpy-discussion mailing list, at scipy.org. I realize that would be the best place to ask my question. However, numpy is so widely used, I figure that someone here would be able to help.

I like to use numpy.where() to select parts of arrays. I have encountered what I would consider to be a bug when you try to use where() in conjunction with the multiple comparison syntax of Python. Here's a minimal example:

Python 3.3.2+ (default, Oct 9 2013, 14:50:09)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
import numpy as np
a = np.arange(10)
a array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.where(a < 5)
b (array([0, 1, 2, 3, 4]),)
c = np.where(2 < a < 7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Defining b works as I want and expect. The array contains the indices (not the values) of a where a < 5.

For my definition of c, I expect (array([3, 4, 5, 6]),). As you can see, I get a ValueError instead. I have seen the error message about "the truth value of an array with more than one element" before, and generally I understand how I (accidentally) provoke it. This time, I don't see it. In defining c, I expect to be stepping through a, one element at a time, just as I did when defining b.

Does anyone understand why this happens? Is there a smart work-around? Thanks.

a = np.arange(10)
c = np.where((2 < a) & (a < 7))
c (array([3, 4, 5, 6]),)

Duncan
 
P

Peter Otten

John said:
a = np.arange(10)
c = np.where((2 < a) & (a < 7))
c
(array([3, 4, 5, 6]),)

Nice! Thanks!

Now, why does the multiple comparison fail, if you happen to know?

2 < a < 7

is equivalent to

2 < a and a < 7

Unlike `&` `and` cannot be overridden (*), so the above implies that the
boolean value bool(2 < a) is evaluated. That triggers the error because the
numpy authors refused to guess -- and rightly so, as both implementable
options would be wrong in a common case like yours.

(*) I assume overriding would collide with short-cutting of boolean
expressions.
 
T

Tim Roberts

Peter Otten said:
John said:
a = np.arange(10)
c = np.where((2 < a) & (a < 7))
c
(array([3, 4, 5, 6]),)

Nice! Thanks!

Now, why does the multiple comparison fail, if you happen to know?

2 < a < 7

is equivalent to

2 < a and a < 7

Unlike `&` `and` cannot be overridden (*),,,,

And just in case it isn't obvious to the original poster, the expression "2
< a" only works because the numpy.array class has an override for the "<"
operator. Python natively has no idea how to compare an integer to a
numpy.array object.

Similarly, (2 < a) & (a > 7) works because numpy.array has an override for
the "&" operator. So, that expression is compiled as

numpy.array.__and__(
numpy.array.__lt__(2, a),
numpy.array.__lt__(a, 7)
)

As Peter said, there's no way to override the "and" operator.
 
Ad

Advertisements

T

Terry Reedy

Unlike `&` `and` cannot be overridden (*),
(*) I assume overriding would collide with short-cutting of boolean
expressions.

Yes. 'and' could be called a 'control-flow operator', but in Python it
is not a functional operator.

A functional binary operator expression like 'a + b' abbreviates a
function call, without using (). In this case, it could be written
'operator.add(a,b)'. This function, or it internal equivalent, calls
either a.__add__(b) or b.__radd__(a) or both. It is the overloading of
the special methods that overrides the operator.

The control flow expression 'a and b' cannot abbreviate a function call
because Python calls always evaluate all arguments first. It is
equivalent* to the conditional (control flow) *expression* (also not a
function operator) 'a if not a else b'. Evaluation of either expression
calls bool(a) and hence a.__bool__ or a.__len__.

'a or b' is equivalent* to 'a if a else b'

* 'a (and/or) b' evaluates 'a' once, whereas 'a if (not/)a else b'
evaluates 'a' twice. This is not equivalent when there are side-effects.
Here is an example where this matters.
input('enter a non-0 number :') or 1
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top