Proto-PEP: Overloadable Boolean Operators

G

greg

Discussion is invited on the following proto-PEP.

-------------------------------------------------------------

PEP ??? - Overloadable Boolean Operators
========================================

SUMMARY

This PEP proposes an extension to permit objects to define their own
meanings for the boolean operators 'and', 'or' and 'not', and suggests
an efficient strategy for implementation. A prototype of this
implementation is available for download from:

http://www.cosc.canterbury.ac.nz/~greg/python/obo/Python_OBO.tar.gz


BACKGROUND

Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators. In the
case of 'and' and 'or', the most likely reason is that these operators
have short-circuiting semantics, i.e. the second operand is not
evaluated if the result can be determined from the first operand. The
usual techique of providing special methods for these operators
therefore would not work.

There is no such difficulty in the case of 'not', however, and it
would be straightforward to provide a special method for this
operator. The rest of this proposal will therefore concentrate on
providing a way to overload 'and' and 'or'.

MOTIVATION

There are many applications in which it is natural to provide custom
meanings for Python operators, and in some of these, having boolean
operators excluded from those able to be customised can be
inconvenient. Examples include:

1. Numeric/Numarray, in which almost all the operators are defined on
arrays so as to perform the appropriate operation between
corresponding elements, and return an array of the results. For
consistency, one would expect a boolean operation between two arrays
to return an array of booleans, but this is not currently possible.

There is a precedent for an extension of this kind: comparison
operators were originally restricted to returning boolean results, and
rich comparisons were added so that comparisons of Numeric arrays
could return arrays of booleans.

2. A symbolic algebra system, in which a Python expression is
evaluated in an environment which results in it constructing a tree of
objects corresponding to the structure of the expression.

3. A relational database interface, in which a Python expression is
used to construct an SQL query.

A workaround often suggested is to use the bitwise operators '&', '|'
and '~' in place of 'and', 'or' and 'not', but this has some
drawbacks. The precedence of these is different in relation to the
other operators, and they may already be in use for other purposes (as
in example 1). There is also the aesthetic consideration of forcing
users to use something other than the most obvious syntax for what
they are trying to express. This would be particularly acute in the
case of example 3, considering that boolean operations are a staple of
SQL queries.

REQUIREMENTS

The requirements for a successful solution to the problem of allowing
boolean operators to be customised are:

1. In the default case (where there is no customisation), the existing
short-circuiting semantics must be preserved.

2. There must not be any appreciable loss of speed in the default
case.

3. If possible, the customisation mechanism should allow the object to
provide either short-circuiting or non-short-circuiting semantics, at
its discretion.

One obvious strategy, that has been previously suggested, is to pass
into the special method the first argument and a function for
evaluating the second argument. This would satisfy requirements 1 and
3, but not requirement 2, since it would incur the overhead of
constructing a function object and possibly a Python function call on
every boolean operation. Therefore, it will not be considered further
here.

The following section proposes an implementation that addresses all
three requirements. A prototype of this implementation, in the form of
a patch to Python 2.3, is available from:

http://www.cosc.canterbury.ac.nz/~greg/python/obo/Python_OBO.tar.gz


PROPOSAL

Special Methods

At the Python level, objects may define the following special methods.

Unary: __not__(self)

Binary, phase 1: __and1__(self) __or1__(self)

Binary, phase 2: __and2__(self, other) __or2__(self, other)
__rand2__(self, other) __ror2__(self, other)

The __not__ method, if defined, implements the 'not' operator. If it
is not defined, or it returns NotImplemented, existing semantics are
used.

To permit short-circuiting, processing of the 'and' and 'or' operators
is split into two phases. Phase 1 occurs after evaluation of the first
operand but before the second. If the first operand defines the
appropriate phase 1 method, it is called with the first operand as
argument. If that method can determine the result without needing the
second operand, it returns the result, and further processing is
skipped.

If the phase 1 method determines that the second operand is needed, it
returns the special value NeedOtherOperand. This triggers the
evaluation of the second operand, and the calling of an appropriate
phase 2 method.

Processing falls back to existing semantics if at any stage a relevant
special method is not found or returns NotImplemented.

As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called. This allows an object which
does not want short-circuiting semantics to simply implement the
relevant phase 2 methods and ignore phase 1.


Bytecodes

The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the
bytecode generated for an 'and' expression looks like this:

.
.
.
evaluate first operand
LOGICAL_AND_1 L
evaluate second operand
LOGICAL_AND_2
L: .
.
.

The LOGICAL_AND_1 bytecode performs phase 1 processing. If it
determines that the second operand is needed, it leaves the first
operand on the stack and continues with the following code. Otherwise
it pops the first operand, pushes the result and branches to L.

The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
operands and pushing the result.


Type Slots

A the C level, the new special methods are manifested as five new
slots in the type object. In the patch, they are added to the
tp_as_number substructure, since this allowed making use of some
existing code for dealing with unary and binary operators. Their
existence is signalled by a new type flag,
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.

The new type slots are:

unaryfunc nb_logical_not;
unaryfunc nb_logical_and_1;
unaryfunc nb_logical_or_1;
binaryfunc nb_logical_and_2;
binaryfunc nb_logical_or_2;


Python/C API Functions

There are also five new Python/C API functions corresponding to the
new operations:

PyObject *PyObject_LogicalNot(PyObject *);
PyObject *PyObject_LogicalAnd1(PyObject *);
PyObject *PyObject_LogicalOr1(PyObject *);
PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);


AUTHOR

Gregory Ewing
(e-mail address removed)
 
A

Andrew Durdin

Discussion is invited on the following proto-PEP.

If I understand this correctly, then logical operations on instances
of the following class should duplicate the existing behaviour for the
boolean type -- is this correct?

class MyBoolean:
def __init__(self, value):
self._value = value

def __not__(self):
return MyBoolean(not self._value)

def __and1__(self):
if(self._value):
return NeedOtherOperand
else:
return self

def __and2__(self, other):
return self

def __or1__(self):
if(self._value):
return self
else:
return NeedOtherOperand

def __or2__(self, other):
return self


The PEP doesn't explicitly describe when __rand2__ (or __ror2__) is
used. I'd guess that it will be called for (A and B) when A defines
neither __and1__ nor __and2__ -- is this correct?

Anyway, I'll have to have a close look at your patch -- I'd been
toying with the idea of implementing [or attempting to implement] this
myself.
 
G

greg

Andrew said:
If I understand this correctly, then logical operations on instances
of the following class should duplicate the existing behaviour for the
boolean type -- is this correct?

def __and2__(self, other):
return self

That should be

return other

and similarly for __or2__. Otherwise it looks right.
The PEP doesn't explicitly describe when __rand2__ (or __ror2__) is
used. I'd guess that it will be called for (A and B) when A defines
neither __and1__ nor __and2__ -- is this correct?

Not exactly. It will get called whenever the second operand
is needed, the first operand doesn't define __and2__, and the
second operand does define __rand2__.

The exact rules are rather complicated to state completely. I
have some text which spells it all out in mind-numbing detail,
but I left it out of the first draft of the PEP for fear of
scaring people off before I'd got the idea across. Perhaps
I'll include it as an appendix or something.

Greg
 
M

Miki Tebeka

Hello greg,
PEP ??? - Overloadable Boolean Operators
========================================
<snip>
Unary: __not__(self)

Binary, phase 1: __and1__(self) __or1__(self)

Binary, phase 2: __and2__(self, other) __or2__(self, other)
__rand2__(self, other) __ror2__(self, other)

<snip>
Why not just __bool__(self) -> True, False?
IMO this will answer all of your needs and will require less changes.

Bye.
 
J

Jp Calderone

Miki said:
Hello greg,



Why not just __bool__(self) -> True, False?
IMO this will answer all of your needs and will require less changes.

There is already a __nonzero__ special. Presumably more control is
desired.

Jp
 
G

greg

Miki said:
Why not just __bool__(self) -> True, False?
IMO this will answer all of your needs and will require less changes.

Have you read any of the rationale in the PEP?
The whole reason for this is that overriding __bool__
is *not* sufficient for the use cases I have
in mind.

Greg
 
A

Alex Martelli

greg said:
Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators. In the ...
There is no such difficulty in the case of 'not', however, and it

Indeed, that's what the strangely-named __nonzero__ special method does:
it's invoked upon the 'not' operator and in no other case, so it would
be strange to claim it's anything but "a special method corresponding to
the 'not' boolean operator". Problem is, __nonzero__ is currently
typechecked -- it has to return an integer. Maybe relaxing that
constraint might be enough for 'not' (while your elaborate proposals may
well be necessary for 'and' & 'or').


Alex
 
M

Michael Hudson

Indeed, that's what the strangely-named __nonzero__ special method does:
it's invoked upon the 'not' operator and in no other case,

Erm. No.
.... def __nonzero__(self):
.... print 'hi!'
.... return 1
.... ....
hi!

Or do I misunderstand what you're saying?

Cheers,
mwh
 
A

Alex Martelli

Michael Hudson said:
Erm. No.

... def __nonzero__(self):
... print 'hi!'
... return 1
...
...
hi!

Or do I misunderstand what you're saying?

Oops, no, you're right, I was wrong -- not sure why I had that
misconception in my mind right now...


Alex
 
G

Greg Ewing

Alex said:
Problem is, __nonzero__ is currently
typechecked -- it has to return an integer.

Yes, that's the problem. I should probably elaborate
on that a bit in the PEP.
 
A

Andrew Durdin

Yes, that's the problem. I should probably elaborate
on that a bit in the PEP.

That is not the only issue with __nonzero__ versus __not__ -- in some
cases (e.g. the symbolic algebra or SQL query constructor) it is
useful to determine when an explicit "not" operator has been used.

I'm not at a machine with the patch installed on it at the moment, but
I just began to whether this patch would have an effect on expressions
like (a < b < c) (which are also short-circuiting)... Come to think
of it, how do objects which override __gt__ and the other comparisons
(particularly for expression construction) work in that case?
 
A

Alex Martelli

Andrew Durdin said:
That is not the only issue with __nonzero__ versus __not__ -- in some
cases (e.g. the symbolic algebra or SQL query constructor) it is
useful to determine when an explicit "not" operator has been used.

Yes, you're right, and my assertion was flawed. Typechecking is only
part of the issue.

I'm not at a machine with the patch installed on it at the moment, but
I just began to whether this patch would have an effect on expressions
like (a < b < c) (which are also short-circuiting)... Come to think
of it, how do objects which override __gt__ and the other comparisons
(particularly for expression construction) work in that case?

I believe a < b < c has exactly the semantics of [but with no doube
evaluation of b...]:
(a < b) and (b < c)

Overriding _lt_ to give a print seems to confirm that:

In [1]: class chatty:
...: def __init__(self, i): self.i = i
...: def __lt__(self, other):
...: print '%d < %d' % (self.i, other.i)
...: return self.i < other.i
...:

In [2]: chatty(3) < chatty(6) < chatty(9)
3 < 6
6 < 9
Out[2]: True

In [3]: chatty(3) < chatty(16) < chatty(9)
3 < 16
16 < 9
Out[3]: False

In [4]: chatty(3) < chatty(1) < chatty(9)
3 < 1
Out[4]: False


Alex
 
M

Michael Hudson

Andrew Durdin said:
That is not the only issue with __nonzero__ versus __not__ -- in some
cases (e.g. the symbolic algebra or SQL query constructor) it is
useful to determine when an explicit "not" operator has been used.

Yes, you're right, and my assertion was flawed. Typechecking is only
part of the issue.

I'm not at a machine with the patch installed on it at the moment, but
I just began to whether this patch would have an effect on expressions
like (a < b < c) (which are also short-circuiting)... Come to think
of it, how do objects which override __gt__ and the other comparisons
(particularly for expression construction) work in that case?

I believe a < b < c has exactly the semantics of [but with no doube
evaluation of b...]:
(a < b) and (b < c)

Overriding _lt_ to give a print seems to confirm that:

dis.dis can be used to give the same impression:
.... return a < b < c
.... .... return a < b and b < c
.... 2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 DUP_TOP
7 ROT_THREE
8 COMPARE_OP 0 (<)
11 JUMP_IF_FALSE 10 (to 24)
14 POP_TOP
15 LOAD_FAST 2 (c)
18 COMPARE_OP 0 (<)
21 JUMP_FORWARD 2 (to 26) 27 LOAD_CONST 0 (None)
30 RETURN_VALUE 2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 COMPARE_OP 0 (<)
9 JUMP_IF_FALSE 10 (to 22)
12 POP_TOP
13 LOAD_FAST 1 (b)
16 LOAD_FAST 2 (c)
19 COMPARE_OP 0 (<) 23 LOAD_CONST 0 (None)
26 RETURN_VALUE

Cheers,
mwh
 
G

Greg Ewing

Andrew said:
I'm not at a machine with the patch installed on it at the moment, but
I just began to whether this patch would have an effect on expressions
like (a < b < c)

No, it won't have any effect on those. I don't think
it should, either -- it should only apply when you
explicitly write an 'and'.
 
T

Tim Hochberg

Greg said:
No, it won't have any effect on those. I don't think
it should, either -- it should only apply when you
explicitly write an 'and'.

[I should probably download the patch and try this before commenting,
but no time, so: open mouth, insert foot].

Why? You specifically mention Numeric/Numarray as one motivating factor
for this patch, and as a long time Numeric/Numarray user I'd rather have
(a < b < c) work than (a and b).

My biggest concern is that if (a < b < c) is not changed in conjunction
with (a and b), the behaviour of the former becomes difficult to
explain. Currently one can explain it by saying, "a < b < c is
interpreted as (a < b) and (b < c), and 'and' doesn't work on numeric
arrays". However, if 'and' starts to work sensible for numarrays, but (a
< b < c) retains the old behaviour, the behaviour becomes both hard to
explain and hard to understand.

I'll try to download the patch and comment more completely later.

-tim
 
G

Greg Ewing

Tim said:
Why? You specifically mention Numeric/Numarray as one motivating factor
for this patch, and as a long time Numeric/Numarray user I'd rather have
(a < b < c) work than (a and b).

You have a point. I'll think about this some more.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Staff online

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,012
Latest member
RoxanneDzm

Latest Threads

Top