PEP 335: Overloadable Boolean Operators - Official Posting

G

Greg Ewing

PEP: 335
Title: Overloadable Boolean Operators
Version: $Revision: 1.2 $
Last-Modified: $Date: 2004/09/09 14:17:17 $
Author: Gregory Ewing <[email protected]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Aug-2004
Python-Version: 2.4
Post-History: 05-Sep-2004


Abstract
========

This PEP proposes an extension to permit objects to define their own
meanings for the boolean operators 'and', 'or' and 'not', and suggests
an efficient strategy for implementation. A prototype of this
implementation is available for download.


Background
==========

Python does not currently provide any '__xxx__' special methods
corresponding to the 'and', 'or' and 'not' boolean operators. In the
case of 'and' and 'or', the most likely reason is that these operators
have short-circuiting semantics, i.e. the second operand is not
evaluated if the result can be determined from the first operand. The
usual technique of providing special methods for these operators
therefore would not work.

There is no such difficulty in the case of 'not', however, and it
would be straightforward to provide a special method for this
operator. The rest of this proposal will therefore concentrate mainly
on providing a way to overload 'and' and 'or'.


Motivation
==========

There are many applications in which it is natural to provide custom
meanings for Python operators, and in some of these, having boolean
operators excluded from those able to be customised can be
inconvenient. Examples include:

1. Numeric/Numarray, in which almost all the operators are defined on
arrays so as to perform the appropriate operation between
corresponding elements, and return an array of the results. For
consistency, one would expect a boolean operation between two
arrays to return an array of booleans, but this is not currently
possible.

There is a precedent for an extension of this kind: comparison
operators were originally restricted to returning boolean results,
and rich comparisons were added so that comparisons of Numeric
arrays could return arrays of booleans.

2. A symbolic algebra system, in which a Python expression is
evaluated in an environment which results in it constructing a tree
of objects corresponding to the structure of the expression.

3. A relational database interface, in which a Python expression is
used to construct an SQL query.

A workaround often suggested is to use the bitwise operators '&', '|'
and '~' in place of 'and', 'or' and 'not', but this has some
drawbacks. The precedence of these is different in relation to the
other operators, and they may already be in use for other purposes (as
in example 1). There is also the aesthetic consideration of forcing
users to use something other than the most obvious syntax for what
they are trying to express. This would be particularly acute in the
case of example 3, considering that boolean operations are a staple of
SQL queries.


Rationale
=========

The requirements for a successful solution to the problem of allowing
boolean operators to be customised are:

1. In the default case (where there is no customisation), the existing
short-circuiting semantics must be preserved.

2. There must not be any appreciable loss of speed in the default
case.

3. If possible, the customisation mechanism should allow the object to
provide either short-circuiting or non-short-circuiting semantics,
at its discretion.

One obvious strategy, that has been previously suggested, is to pass
into the special method the first argument and a function for
evaluating the second argument. This would satisfy requirements 1 and
3, but not requirement 2, since it would incur the overhead of
constructing a function object and possibly a Python function call on
every boolean operation. Therefore, it will not be considered further
here.

The following section proposes a strategy that addresses all three
requirements. A `prototype implementation`_ of this strategy is
available for download.

... _prototype implementation:
http://www.cosc.canterbury.ac.nz/~greg/python/obo//Python_OBO.tar.gz


Specification
=============

Special Methods
---------------

At the Python level, objects may define the following special methods.

=============== ================= ========================
Unary Binary, phase 1 Binary, phase 2
=============== ================= ========================
* __not__(self) * __and1__(self) * __and2__(self, other)
* __or1__(self) * __or2__(self, other)
* __rand2__(self, other)
* __ror2__(self, other)
=============== ================= ========================

The __not__ method, if defined, implements the 'not' operator. If it
is not defined, or it returns NotImplemented, existing semantics are
used.

To permit short-circuiting, processing of the 'and' and 'or' operators
is split into two phases. Phase 1 occurs after evaluation of the first
operand but before the second. If the first operand defines the
appropriate phase 1 method, it is called with the first operand as
argument. If that method can determine the result without needing the
second operand, it returns the result, and further processing is
skipped.

If the phase 1 method determines that the second operand is needed, it
returns the special value NeedOtherOperand. This triggers the
evaluation of the second operand, and the calling of an appropriate
phase 2 method. During phase 2, the __and2__/__rand2__ and
__or2__/__ror2__ method pairs work as for other binary operators.

Processing falls back to existing semantics if at any stage a relevant
special method is not found or returns NotImplemented.

As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called. This allows an object which
does not want short-circuiting semantics to simply implement the
relevant phase 2 methods and ignore phase 1.


Bytecodes
---------

The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the
bytecode generated for an 'and' expression looks like this::

.
.
.
evaluate first operand
LOGICAL_AND_1 L
evaluate second operand
LOGICAL_AND_2
L: .
.
.

The LOGICAL_AND_1 bytecode performs phase 1 processing. If it
determines that the second operand is needed, it leaves the first
operand on the stack and continues with the following code. Otherwise
it pops the first operand, pushes the result and branches to L.

The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
operands and pushing the result.


Type Slots
----------

A the C level, the new special methods are manifested as five new
slots in the type object. In the patch, they are added to the
tp_as_number substructure, since this allowed making use of some
existing code for dealing with unary and binary operators. Their
existence is signalled by a new type flag,
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.

The new type slots are::

unaryfunc nb_logical_not;
unaryfunc nb_logical_and_1;
unaryfunc nb_logical_or_1;
binaryfunc nb_logical_and_2;
binaryfunc nb_logical_or_2;


Python/C API Functions
----------------------

There are also five new Python/C API functions corresponding to the
new operations::

PyObject *PyObject_LogicalNot(PyObject *);
PyObject *PyObject_LogicalAnd1(PyObject *);
PyObject *PyObject_LogicalOr1(PyObject *);
PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);


Copyright
=========

This document has been placed in the public domain.


...
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
 
G

Gerrit

Hello,

Greg said:
At the Python level, objects may define the following special methods.

=============== ================= ========================
Unary Binary, phase 1 Binary, phase 2
=============== ================= ========================
* __not__(self) * __and1__(self) * __and2__(self, other)
* __or1__(self) * __or2__(self, other)
* __rand2__(self, other)
* __ror2__(self, other)
=============== ================= ========================

The __not__ method, if defined, implements the 'not' operator. If it
is not defined, or it returns NotImplemented, existing semantics are
used.

I like the idea but I don't like the method names. I don't like numbers
in method names, since they don't intuitively tell the user about what
it does. In my opinion, and1 should be called and, or1 or, rand2 rand
and ror2 ror. The only two left are then and2 and or2. Another
possibility: always have a second argument in and, and default it to
None. It can return NeedOtherOperand or similar if it needs, and ignore
it otherwise. It results in a problem when the second operand is None,
but perhaps that can be circumvented. Something in this direction may be
more elegant than having 3 methods just for and.

def __and__(self, other):
if self.value % 3 == math.ceil(random.random()):
return self
elif other is None: # or SomeSpecialValue
return NeedOtherOperand
else:
return other

Another disadvantage is yet more cluttering of builtins, because
NeedOtherValue would need to be a builtin.
As a special case, if the first operand defines a phase 2 method but
no corresponding phase 1 method, the second operand is always
evaluated and the phase 2 method called. This allows an object which
does not want short-circuiting semantics to simply implement the
relevant phase 2 methods and ignore phase 1.

What happens if an object defines only rand2? Does it depend on the left
operand to be called, or is it always called? E.g. can the right operand
influence short-circuiting (I assume not)?

Excellent PEP, btw.

Gerrit.
 
C

Colin J. Williams

I understand that the intent, eventually, is that the basic types become
classes.

Would making Bool a class permit the achievement of these objectives?

If Bool were a class, then subclasses could use __repr__ to provide
alternative responses to False/True, such as No/Yes or Fail/OK.

Colin W.
 
G

Gerrit

Colin said:
I understand that the intent, eventually, is that the basic types become
classes.

Would making Bool a class permit the achievement of these objectives?

If Bool were a class, then subclasses could use __repr__ to provide
alternative responses to False/True, such as No/Yes or Fail/OK.

No, this has nothing to do with it. The representation is not the issue.
The issue is that currently, 'a and b' returns either a or b. Suppose we
have an array A = [a1, b1, c1], B = [a2, b2, c2], A and B will currently
always return A. However, we may want it to return [a1 and a2, b1 and
b2, c1 and c2]. That is what this PEP is about.

The lack op ability to subclass bool is something different. I don't
know why that is.

regards,
Gerrit.
 
C

Colin J. Williams

Gerrit said:
Colin said:
I understand that the intent, eventually, is that the basic types become
classes.

Would making Bool a class permit the achievement of these objectives?

If Bool were a class, then subclasses could use __repr__ to provide
alternative responses to False/True, such as No/Yes or Fail/OK.


No, this has nothing to do with it. The representation is not the issue.
The issue is that currently, 'a and b' returns either a or b. Suppose we
have an array A = [a1, b1, c1], B = [a2, b2, c2], A and B will currently
always return A. However, we may want it to return [a1 and a2, b1 and
b2, c1 and c2]. That is what this PEP is about.

I misled with my comment about reprsentation.

The basic suggestion I was trying to make is that, with a subclass,
the expression 'A and B' could return almost whatever one wishes.

The use of __repr__ is a secondary benefit.

Colin W.
 
J

John Roth

Colin J. Williams said:
....

I misled with my comment about reprsentation.

The basic suggestion I was trying to make is that, with a subclass,
the expression 'A and B' could return almost whatever one wishes.

Unfortunately, it can't. A subclass needs a magic method to hook
into, without that it's got no leverage whatsoever.

John Roth
 
J

John Roth

It's certainly an interesting PEP. I've got some comments off the
top.

One is that I'm not at all certain that I like the notion of
'and' and 'or' sometimes short circuiting and sometimes
not, especially since you can't reliably predict when it will
and when it won't if the program is playing fast and loose
with typing.

A second comment is that I think you need seven slots - the
'rand2' and 'ror2' slots seem to be missing.

A third comment is that I think the 'and1' and 'or1'
should return either True or False, meaning 'short
circuit with the current value on top of the stack',
or 'continue with the evaluation with the current
value on top of the stack'. I'd like to see a discussion
of why this (IMO conceptually simpler) option
wasn't chosen. In particular, I really don't like the
idea that and1 and or1 can arbitrarily change the result
of the first expression evaluation.

John Roth
 
G

Greg Ewing

Gerrit said:
I like the idea but I don't like the method names. I don't like numbers
in method names, since they don't intuitively tell the user about what
it does. In my opinion, and1 should be called and, or1 or, rand2 rand
and ror2 ror. The only two left are then and2 and or2.

I think it would be very confusing to have numbers in *some*
method names but not others, with little rhyme or reason as
to which.

The names I chose use the numbers in a very systematic way:
The 1 methods are called during phase 1 and have 1 operand;
the 2 methods are called during phase 2 and have 2 operands.

I'm open to suggestions about alternative naming schemes,
but your suggestion doesn't seem like an improvement to me.
Another
possibility: always have a second argument in and, and default it to
None. It can return NeedOtherOperand or similar if it needs, and ignore
it otherwise.

That would require all implementations of these methods to
do a dance to cope with None as the second argument, even
when there was no interest in short-circuiting. I'm not
convinced there's a net gain in elegance.
What happens if an object defines only rand2? Does it depend on the left
operand to be called, or is it always called? E.g. can the right operand
influence short-circuiting (I assume not)?

Obviously not, since that would require the right operand to
be evaluated in order to determine whether to evaluate the
right operand - kind of a catch-22.

This seems to be an inescapable consequence of having short
circuiting at all. Some things can only be controlled by the
left operand.
 
G

Greg Ewing

John said:
A second comment is that I think you need seven slots - the
'rand2' and 'ror2' slots seem to be missing.

If you're talking about the C-level slots, that's not
a mistake. If you look, you'll find that *none* of the
__rxxx__ methods exist at the C level. If the left
operand doesn't define a slot for an operator, the
*same* C-level slot of the right operand is called,
with the operands in the same order. The implementations
of these slots for user-defined classes sort this out
and call __xxx__ or __rxxx__ Python methods as
appropriate.
A third comment is that I think the 'and1' and 'or1'
should return either True or False, meaning 'short
circuit with the current value on top of the stack',
or 'continue with the evaluation with the current
value on top of the stack'. I'd like to see a discussion
of why this (IMO conceptually simpler) option
wasn't chosen.

It's in the interests of generality. Without use cases,
it's admittedly unknown whether this amount of generality
will ever be needed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top