boolean operations on sets

Flavio

Hi, I have been playing with set operations lately and came across a
kind of surprising result given that it is not mentioned in the
standard Python tutorial:

with python sets, intersections and unions are supposed to be done
like this:
In [7]:set('casa') & set('porca')
Out[7]:set(['a', 'c'])

In [8]:set('casa') | set('porca')
Out[8]:set(['a', 'c', 'o', 'p', 's', 'r'])

and they work correctly. Now what is confusing is that if you do:

In [5]:set('casa') and set('porca')
Out[5]:set(['a', 'p', 'c', 'r', 'o'])

In [6]:set('casa') or set('porca')
Out[6]:set(['a', 'c', 's'])

The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.

If the Python developers wanted these operations to reflect the
traditional (Python) truth value for data structures (False for empty
data structures and True otherwise), why not simply return True or
False?

So my question is: Why has this been implemented in this way? I can
see this confusing many newbies...
 
Steve Holden

Flavio said:
[...]
So my question is: Why has this been implemented in this way? I can
see this confusing many newbies...

It has been implemented in this way to conform with the definitions of
"and" and "or", which were never intended to apply to set
operations. These operators have always returned one of their operands
rather than a plain True or False, and they continue to do so with set
operands.
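
A minimal sketch of what Steve describes, reusing Flavio's two sets (the variable names `a`, `b` and `empty` are just for illustration): `and` and `or` hand back one of their operands unchanged, while `&` and `|` are the operators that actually compute the intersection and union.

```python
a = set('casa')   # elements: 'c', 'a', 's'
b = set('porca')  # elements: 'p', 'o', 'r', 'c', 'a'

# `and` returns its first operand if that is false, otherwise its second.
and_result = a and b
# `or` returns its first operand if that is true, otherwise its second.
or_result = a or b

print(and_result is b)  # True: a is non-empty, so `and` hands back b itself
print(or_result is a)   # True: a is non-empty, so `or` stops at a

empty = set()
print((empty and b) is empty)  # True: `and` returns a false left operand
print((empty or b) is b)       # True: `or` falls through to the right operand

# The set algebra lives in the `&` and `|` operators.
print(sorted(a & b))  # ['a', 'c']
print(sorted(a | b))  # ['a', 'c', 'o', 'p', 'r', 's']

# When a plain truth value is wanted, ask for it explicitly.
print(bool(a and b))  # True
```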

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
 
Thomas Jollans

So my question is: Why has this been implemented in this way? I can
see this confusing many newbies...

I did not implement this, so I cannot say, but it does have useful
side-effects, for example:

x = A or B

is equivalent to:

if A:
    x = A
else:
    x = B

Also, in Python implementations without the (y if x else z) syntax, you can
use (x and y or z) with nearly the same result*. And this implementation of
and/or might well be faster ;-)


*: this doesn't work the same if y is a false value; (x and [y] or [z])[0] is
less readable, but works for all y
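
To illustrate the footnote, here is a small comparison sketch; the function names `old_style`, `list_trick` and `conditional` are hypothetical. The `y if x else z` form is the conditional expression introduced in Python 2.5.

```python
def old_style(cond, y, z):
    # Emulation of "y if cond else z"; breaks when y is a false value.
    return cond and y or z


def list_trick(cond, y, z):
    # Wrapping y and z in one-element lists makes both branches true,
    # so `or` can never skip past a false y.
    return (cond and [y] or [z])[0]


def conditional(cond, y, z):
    # The conditional expression syntax.
    return y if cond else z


for cond, y, z in [(True, "yes", "no"), (True, 0, "no"), (False, "yes", "no")]:
    print(old_style(cond, y, z), list_trick(cond, y, z), conditional(cond, y, z))
# Output:
# yes yes yes
# no 0 0      <- old_style returns z even though cond is true
# no no no
```

The second row is the corner case: with a false `y`, `x and y or z` silently falls through to `z`.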

--
Regards, Thomas Jollans
GPG key: 0xF421434B may be found on various keyservers, eg pgp.mit.edu
 
Diez B. Roggisch

Flavio said:
[...]

So my question is: Why has this been implemented in this way? I can
see this confusing many newbies...

It has nothing to do with sets - it stems from the fact that certain values
in Python are considered false, and all others true. And these semantics
were introduced at a point where there was no explicit True/False, so the
operators were defined in exactly the way you observed.

Consider this:

"foo" or "bar" -> "foo"

So - nothing to do with sets.
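
Diez's point can be made concrete with a rough pure-Python model of the two operators. `model_or` and `model_and` are hypothetical helpers, not anything in the standard library; the right operand is passed as a zero-argument callable so that, as with the real operators, it is only evaluated when needed.

```python
from typing import Any, Callable


def model_or(left: Any, right: Callable[[], Any]) -> Any:
    # `left or right`: keep left if it is true, otherwise evaluate right.
    return left if left else right()


def model_and(left: Any, right: Callable[[], Any]) -> Any:
    # `left and right`: keep left if it is false, otherwise evaluate right.
    return right() if left else left


# The same rule applies to every type, not just sets.
examples = [("foo", "bar"), ("", "bar"), (0, 42), ([], [1]), (None, "default")]
for left, right in examples:
    assert model_or(left, lambda: right) == (left or right)
    assert model_and(left, lambda: right) == (left and right)
    print(repr(left or right), repr(left and right))
```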

Diez
 
Stargaming

[...]

The results are not what you would expect from an AND or OR operation,
from the mathematical point of view! Apparently the "and" operation is
returning the second set, and the "or" operation is returning the
first.

That might be because `and` and `or` are not mathematical in Python (at
least not in the way you think). The symbolic operators, e.g. `|` and `&`,
are overloadable: you can give them any meaning you want.
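
A sketch of the distinction Stargaming draws, written for Python 3 (where the truth-value hook is `__bool__`; Python 2 used `__nonzero__`). The `Flag` class is hypothetical: it overloads `&` and `|` through `__and__` and `__or__`, but `and` and `or` have no hook of their own and only consult the object's truth value.

```python
class Flag:
    """Hypothetical wrapper around a bool, used only to show which hooks apply."""

    def __init__(self, value: bool) -> None:
        self.value = value

    def __and__(self, other: "Flag") -> "Flag":
        # `&` can be overloaded: here it builds a new Flag.
        return Flag(self.value and other.value)

    def __or__(self, other: "Flag") -> "Flag":
        # `|` can be overloaded in the same way.
        return Flag(self.value or other.value)

    def __bool__(self) -> bool:
        # `and`/`or` call only this to decide which operand to return.
        return self.value

    def __repr__(self) -> str:
        return f"Flag({self.value})"


t, f = Flag(True), Flag(False)
print(t & f)    # Flag(False): a new object built by __and__
print(t | f)    # Flag(True): a new object built by __or__
print(t and f)  # Flag(False): simply the right operand, f itself
print(f or t)   # Flag(True): simply the right operand, t itself
```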

The `and` and `or` operators, though, are built into Python itself, and
there is no way to make them behave differently from how they do by
default. Removing this behaviour or making them overloadable as well has
been discussed, but as far as I remember it hasn't got very far.

If the Python developers wanted these operations to reflect the traditional
(Python) truth value for data structures (False for empty data
structures and True otherwise), why not simply return True or False?

Because in most cases returning True or False simply has no
advantage, whereas returning the actual operands has proved quite
useful, e.g. for replacing the if expression ("ternary operator") ``THEN if
COND else DEFAULT`` with ``COND and THEN or DEFAULT`` (which has some bad
corner cases, though).

So my question is: Why has this been implemented in this way? I can see
this confusing many newbies...

Hmm, you could be right there. But then, they shouldn't be biased by
default Boolean behaviour anyway.
 
Michael J. Fromberger

Diez B. Roggisch said:
Flavio said:
Hi, I have been playing with set operations lately and came across a
kind of surprising result given that it is not mentioned in the
standard Python tutorial:

with python sets, intersections and unions are supposed to be done
like this:
[...]

The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.
[...]

It has nothing to do with sets - it stems from the fact that certain values
in Python are considered false, and all others true. And these semantics
were introduced at a point where there was no explicit True/False, so the
operators were defined in exactly the way you observed.

Consider this:

"foo" or "bar" -> "foo"

So - nothing to do with sets.

In addition to what Diez wrote above, it is worth noting that the
practice of returning the value of the determining expression turns out
to be convenient for the programmer in some cases. Consider the
following example:

x = some_function(a, b, c) or another_function(d, e)

This is a rather nice shorthand notation for the following behaviour:

t = some_function(a, b, c)
if t:
    x = t
else:
    x = another_function(d, e)

In other words, the short-circuit behaviour of the logical operators
gives you a compact notation for evaluating certain types of conditional
expressions and capturing their values. If the "or" operator converted
the result to True or False, you could not use it this way.
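
Michael's `or` shorthand and its long-hand expansion can be checked against each other. The bodies of `some_function` and `another_function` below are hypothetical stand-ins, since the post only names them.

```python
def some_function(a, b, c):
    # Hypothetical: returns an empty (false) list unless a is positive.
    return [a, b, c] if a > 0 else []


def another_function(d, e):
    # Hypothetical fallback value.
    return [d, e]


def shorthand(a, b, c, d, e):
    return some_function(a, b, c) or another_function(d, e)


def longhand(a, b, c, d, e):
    t = some_function(a, b, c)
    if t:
        x = t
    else:
        x = another_function(d, e)
    return x


print(shorthand(1, 2, 3, 4, 5))  # [1, 2, 3]
print(shorthand(0, 2, 3, 4, 5))  # [4, 5]
```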

Similarly,

x = some_function(a, b, c) and another_function(d, e)

... behaves as if you had written:

x = some_function(a, b, c)
if x:
    x = another_function(d, e)

Again, as above, if the results were forcibly converted to Boolean
values, you could not use the shorthand.
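
A common use of the `and` form is guarding an operation on a value that may be `None`. The helper name `leading_number` is hypothetical; the sketch relies on `re.match` returning either `None` or a match object, and match objects are always true.

```python
import re


def leading_number(text):
    # When there is no match, `m` is None and `and` returns it
    # without ever calling .group() on it.
    m = re.match(r"\d+", text)
    return m and m.group(0)


print(leading_number("42 apples"))  # 42
print(leading_number("apples"))     # None
```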

Now that Python provides an expression variety of "if", this is perhaps
not as useful as it once was; however, it still has a role. Suppose,
for example, that a call to some_function() is very time-consuming; you
would not want to write:

x = some_function(a, b, c) \
    if some_function(a, b, c) else another_function(d, e)

... because then some_function would get evaluated twice. Python does
not permit assignment within an expression, so you can't get rid of the
second call without changing the syntax.
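
The double evaluation Michael mentions is easy to observe with a call counter. Again `some_function` and `another_function` are hypothetical stand-ins, and the module-level `calls` counter exists only for this demonstration.

```python
calls = 0  # demo-only counter of some_function invocations


def some_function(a, b, c):
    """Hypothetical 'expensive' call; increments the global counter."""
    global calls
    calls += 1
    return a + b + c


def another_function(d, e):
    """Hypothetical fallback."""
    return d * e


calls = 0
x = some_function(1, 2, 3) or another_function(4, 5)
print(x, calls)  # 6 1

calls = 0
x = some_function(1, 2, 3) if some_function(1, 2, 3) else another_function(4, 5)
print(x, calls)  # 6 2

calls = 0
x = some_function(0, 0, 0) or another_function(4, 5)
print(x, calls)  # 20 1
```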

Also, it is a common behaviour in many programming languages for logical
connectives to both short-circuit and yield their values, so I'd argue
that most programmers are probably accustomed to it. The && and ||
operators of C and its descendants also behave in this manner, as do the
AND and OR of Lisp or Scheme. It is possible that beginners may find it
a little bit confusing at first, but I believe such confusion is minor
and easily remedied.

Cheers,
-M
 
Alex Martelli

Michael J. Fromberger wrote:
...
Also, it is a common behaviour in many programming languages for logical
connectives to both short-circuit and yield their values, so I'd argue
that most programmers are probably accustomed to it. The && and ||
operators of C and its descendants also behave in this manner, as do the

Untrue, alas...:

brain:~ alex$ cat a.c
#include <stdio.h>

int main()
{
    printf("%d\n", 23 && 45);
    return 0;
}
brain:~ alex$ gcc a.c
brain:~ alex$ ./a.out
1

In C, && and || _do_ "short circuit", BUT they always return 0 or 1,
*NOT* "yield their values" (interpreted as "return the false or true
value of either operand", as in Python).
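
For comparison, the Python counterpart of Alex's C program returns the deciding operand itself. Wrapping the result in `bool()` and `int()` is shown only as one way to reproduce the 0/1 that C's `&&` yields for these integer operands, not as a general C-compatibility recipe.

```python
# Python's `and`/`or` yield the deciding operand itself...
print(23 and 45)  # 45
print(0 and 45)   # 0
print(23 or 45)   # 23

# ...whereas C's && always produces 0 or 1. An explicit conversion
# reproduces the C results for these operands.
print(int(bool(23 and 45)))  # 1
print(int(bool(0 and 45)))   # 0
```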


Alex
 
Bruno Desthuilliers

Flavio wrote:
[...]

The results are not what you would expect from an AND or OR
operation, from the mathematical point of view! Apparently the "and"
operation is returning the second set, and the "or" operation is
returning the first.

The semantics of the 'and' and 'or' operators in Python are well defined
and work the same for all types AFAIK.

If the Python developers wanted these operations to reflect the
traditional (Python) truth value for data structures (False for empty
data structures and True otherwise), why not simply return True or
False?

So my question is: Why has this been implemented in this way?

Because Python lived for a long time without the 'bool' type, treating
None, numeric zero, the empty string and empty containers as false (i.e.
'nothing') and anything else as true (i.e. 'something').
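
A quick check of the truth values Bruno lists. The `falsy` and `truthy` lists are illustrative samples, not exhaustive. The last line shows that `bool` is a subclass of `int`, so `True` and `False` still compare equal to 1 and 0.

```python
# Illustrative values Python treats as false ("nothing").
falsy = [None, 0, 0.0, "", [], (), {}, set()]
# Illustrative values treated as true ("something"), including containers
# that merely hold a false-looking item.
truthy = [1, -3, "0", [0], (None,), {"k": None}, {0}]

print(all(not v for v in falsy))  # True
print(all(truthy))                # True

# bool subclasses int, so True == 1 and False == 0.
print(issubclass(bool, int), True == 1, False == 0)  # True True True
```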

I can see this confusing many newbies...

Yes, and this has been one of the arguments against the introduction of
the bool type. Changing this behaviour would have broken a lot of existing
code, and indeed, not changing it makes things confusing.

OTOH - and while I agree that there may be cases of needless complexity
in Python - stripping a language of everything that might confuse a
newbie doesn't make for a great language.
 
