533 lines
16 KiB
Plaintext
533 lines
16 KiB
Plaintext
PEP: 335
|
||
Title: Overloadable Boolean Operators
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Gregory Ewing <greg.ewing@canterbury.ac.nz>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 29-Aug-2004
|
||
Python-Version: 3.3
|
||
Post-History: 05-Sep-2004, 30-Sep-2011
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes an extension to permit objects to define their own
|
||
meanings for the boolean operators 'and', 'or' and 'not', and suggests
|
||
an efficient strategy for implementation. A prototype of this
|
||
implementation is available for download.
|
||
|
||
|
||
Background
|
||
==========
|
||
|
||
Python does not currently provide any '__xxx__' special methods
|
||
corresponding to the 'and', 'or' and 'not' boolean operators. In the
|
||
case of 'and' and 'or', the most likely reason is that these operators
|
||
have short-circuiting semantics, i.e. the second operand is not
|
||
evaluated if the result can be determined from the first operand. The
|
||
usual technique of providing special methods for these operators
|
||
therefore would not work.
|
||
|
||
There is no such difficulty in the case of 'not', however, and it
|
||
would be straightforward to provide a special method for this
|
||
operator. The rest of this proposal will therefore concentrate mainly
|
||
on providing a way to overload 'and' and 'or'.
|
||
|
||
|
||
Motivation
|
||
==========
|
||
|
||
There are many applications in which it is natural to provide custom
|
||
meanings for Python operators, and in some of these, having boolean
|
||
operators excluded from those able to be customised can be
|
||
inconvenient. Examples include:
|
||
|
||
1. NumPy, in which almost all the operators are defined on
|
||
arrays so as to perform the appropriate operation between
|
||
corresponding elements, and return an array of the results. For
|
||
consistency, one would expect a boolean operation between two
|
||
arrays to return an array of booleans, but this is not currently
|
||
possible.
|
||
|
||
There is a precedent for an extension of this kind: comparison
|
||
operators were originally restricted to returning boolean results,
|
||
and rich comparisons were added so that comparisons of NumPy
|
||
arrays could return arrays of booleans.
|
||
|
||
2. A symbolic algebra system, in which a Python expression is
|
||
evaluated in an environment which results in it constructing a tree
|
||
of objects corresponding to the structure of the expression.
|
||
|
||
3. A relational database interface, in which a Python expression is
|
||
used to construct an SQL query.
|
||
|
||
A workaround often suggested is to use the bitwise operators '&', '|'
|
||
and '~' in place of 'and', 'or' and 'not', but this has some
|
||
drawbacks. The precedence of these is different in relation to the
|
||
other operators, and they may already be in use for other purposes (as
|
||
in example 1). There is also the aesthetic consideration of forcing
|
||
users to use something other than the most obvious syntax for what
|
||
they are trying to express. This would be particularly acute in the
|
||
case of example 3, considering that boolean operations are a staple of
|
||
SQL queries.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
The requirements for a successful solution to the problem of allowing
|
||
boolean operators to be customised are:
|
||
|
||
1. In the default case (where there is no customisation), the existing
|
||
short-circuiting semantics must be preserved.
|
||
|
||
2. There must not be any appreciable loss of speed in the default
|
||
case.
|
||
|
||
3. Ideally, the customisation mechanism should allow the object to
|
||
provide either short-circuiting or non-short-circuiting semantics,
|
||
at its discretion.
|
||
|
||
One obvious strategy, that has been previously suggested, is to pass
|
||
into the special method the first argument and a function for
|
||
evaluating the second argument. This would satisfy requirements 1 and
|
||
3, but not requirement 2, since it would incur the overhead of
|
||
constructing a function object and possibly a Python function call on
|
||
every boolean operation. Therefore, it will not be considered further
|
||
here.
|
||
|
||
The following section proposes a strategy that addresses all three
|
||
requirements. A `prototype implementation`_ of this strategy is
|
||
available for download.
|
||
|
||
.. _prototype implementation:
|
||
http://www.cosc.canterbury.ac.nz/~greg/python/obo//Python_OBO.tar.gz
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
Special Methods
|
||
---------------
|
||
|
||
At the Python level, objects may define the following special methods.
|
||
|
||
=============== ================= ========================
|
||
Unary Binary, phase 1 Binary, phase 2
|
||
=============== ================= ========================
|
||
* __not__(self) * __and1__(self) * __and2__(self, other)
|
||
* __or1__(self) * __or2__(self, other)
|
||
* __rand2__(self, other)
|
||
* __ror2__(self, other)
|
||
=============== ================= ========================
|
||
|
||
The __not__ method, if defined, implements the 'not' operator. If it
|
||
is not defined, or it returns NotImplemented, existing semantics are
|
||
used.
|
||
|
||
To permit short-circuiting, processing of the 'and' and 'or' operators
|
||
is split into two phases. Phase 1 occurs after evaluation of the first
|
||
operand but before the second. If the first operand defines the
|
||
relevant phase 1 method, it is called with the first operand as
|
||
argument. If that method can determine the result without needing the
|
||
second operand, it returns the result, and further processing is
|
||
skipped.
|
||
|
||
If the phase 1 method determines that the second operand is needed, it
|
||
returns the special value NeedOtherOperand. This triggers the
|
||
evaluation of the second operand, and the calling of a relevant
|
||
phase 2 method. During phase 2, the __and2__/__rand2__ and
|
||
__or2__/__ror2__ method pairs work as for other binary operators.
|
||
|
||
Processing falls back to existing semantics if at any stage a relevant
|
||
special method is not found or returns NotImplemented.
|
||
|
||
As a special case, if the first operand defines a phase 2 method but
|
||
no corresponding phase 1 method, the second operand is always
|
||
evaluated and the phase 2 method called. This allows an object which
|
||
does not want short-circuiting semantics to simply implement the
|
||
phase 2 methods and ignore phase 1.
|
||
|
||
|
||
Bytecodes
|
||
---------
|
||
|
||
The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2,
|
||
LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the
|
||
bytecode generated for an 'and' expression looks like this::
|
||
|
||
.
|
||
.
|
||
.
|
||
evaluate first operand
|
||
LOGICAL_AND_1 L
|
||
evaluate second operand
|
||
LOGICAL_AND_2
|
||
L: .
|
||
.
|
||
.
|
||
|
||
The LOGICAL_AND_1 bytecode performs phase 1 processing. If it
|
||
determines that the second operand is needed, it leaves the first
|
||
operand on the stack and continues with the following code. Otherwise
|
||
it pops the first operand, pushes the result and branches to L.
|
||
|
||
The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both
|
||
operands and pushing the result.
|
||
|
||
|
||
Type Slots
|
||
----------
|
||
|
||
At the C level, the new special methods are manifested as five new
|
||
slots in the type object. In the patch, they are added to the
|
||
tp_as_number substructure, since this allows making use of some
|
||
existing code for dealing with unary and binary operators. Their
|
||
existence is signalled by a new type flag,
|
||
Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.
|
||
|
||
The new type slots are::
|
||
|
||
unaryfunc nb_logical_not;
|
||
unaryfunc nb_logical_and_1;
|
||
unaryfunc nb_logical_or_1;
|
||
binaryfunc nb_logical_and_2;
|
||
binaryfunc nb_logical_or_2;
|
||
|
||
|
||
Python/C API Functions
|
||
----------------------
|
||
|
||
There are also five new Python/C API functions corresponding to the
|
||
new operations::
|
||
|
||
PyObject *PyObject_LogicalNot(PyObject *);
|
||
PyObject *PyObject_LogicalAnd1(PyObject *);
|
||
PyObject *PyObject_LogicalOr1(PyObject *);
|
||
PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
|
||
|
||
|
||
|
||
Alternatives and Optimisations
|
||
==============================
|
||
|
||
This section discusses some possible variations on the proposal,
|
||
and ways in which the bytecode sequences generated for boolean
|
||
expressions could be optimised.
|
||
|
||
Reduced special method set
|
||
--------------------------
|
||
|
||
For completeness, the full version of this proposal includes a
|
||
mechanism for types to define their own customised short-circuiting
|
||
behaviour. However, the full mechanism is not needed to address the
|
||
main use cases put forward here, and it would be possible to
|
||
define a simplified version that only includes the phase 2
|
||
methods. There would then only be 5 new special methods (__and2__,
|
||
__rand2__, __or2__, __ror2__, __not__) with 3 associated type slots
|
||
and 3 API functions.
|
||
|
||
This simplified version could be expanded to the full version
|
||
later if desired.
|
||
|
||
Additional bytecodes
|
||
--------------------
|
||
|
||
As defined here, the bytecode sequence for code that branches on
|
||
the result of a boolean expression would be slightly longer than
|
||
it currently is. For example, in Python 2.7,
|
||
|
||
::
|
||
|
||
if a and b:
|
||
statement1
|
||
else:
|
||
statement2
|
||
|
||
generates
|
||
|
||
::
|
||
|
||
LOAD_GLOBAL a
|
||
POP_JUMP_IF_FALSE false_branch
|
||
LOAD_GLOBAL b
|
||
POP_JUMP_IF_FALSE false_branch
|
||
<code for statement1>
|
||
JUMP_FORWARD end_branch
|
||
false_branch:
|
||
<code for statement2>
|
||
end_branch:
|
||
|
||
Under this proposal as described so far, it would become something like
|
||
|
||
::
|
||
|
||
LOAD_GLOBAL a
|
||
LOGICAL_AND_1 test
|
||
LOAD_GLOBAL b
|
||
LOGICAL_AND_2
|
||
test:
|
||
POP_JUMP_IF_FALSE false_branch
|
||
<code for statement1>
|
||
JUMP_FORWARD end_branch
|
||
false_branch:
|
||
<code for statement2>
|
||
end_branch:
|
||
|
||
This involves executing one extra bytecode in the short-circuiting
|
||
case and two extra bytecodes in the non-short-circuiting case.
|
||
|
||
However, by introducing extra bytecodes that combine the logical
|
||
operations with testing and branching on the result, it can be
|
||
reduced to the same number of bytecodes as the original:
|
||
|
||
::
|
||
|
||
LOAD_GLOBAL a
|
||
AND1_JUMP true_branch, false_branch
|
||
LOAD_GLOBAL b
|
||
AND2_JUMP_IF_FALSE false_branch
|
||
true_branch:
|
||
<code for statement1>
|
||
JUMP_FORWARD end_branch
|
||
false_branch:
|
||
<code for statement2>
|
||
end_branch:
|
||
|
||
Here, AND1_JUMP performs phase 1 processing as above,
|
||
and then examines the result. If there is a result, it is popped
|
||
from the stack, its truth value is tested and a branch taken to
|
||
one of two locations.
|
||
|
||
Otherwise, the first operand is left on the stack and execution
|
||
continues to the next bytecode. The AND2_JUMP_IF_FALSE bytecode
|
||
performs phase 2 processing, pops the result and branches if
|
||
it tests false
|
||
|
||
For the 'or' operator, there would be corresponding OR1_JUMP
|
||
and OR2_JUMP_IF_TRUE bytecodes.
|
||
|
||
If the simplified version without phase 1 methods is used, then
|
||
early exiting can only occur if the first operand is false for
|
||
'and' and true for 'or'. Consequently, the two-target AND1_JUMP and
|
||
OR1_JUMP bytecodes can be replaced with AND1_JUMP_IF_FALSE and
|
||
OR1_JUMP_IF_TRUE, these being ordinary branch instructions with
|
||
only one target.
|
||
|
||
Optimisation of 'not'
|
||
---------------------
|
||
|
||
Recent versions of Python implement a simple optimisation in
|
||
which branching on a negated boolean expression is implemented
|
||
by reversing the sense of the branch, saving a UNARY_NOT opcode.
|
||
|
||
Taking a strict view, this optimisation should no longer be
|
||
performed, because the 'not' operator may be overridden to produce
|
||
quite different results from usual. However, in typical use cases,
|
||
it is not envisaged that expressions involving customised boolean
|
||
operations will be used for branching -- it is much more likely
|
||
that the result will be used in some other way.
|
||
|
||
Therefore, it would probably do little harm to specify that the
|
||
compiler is allowed to use the laws of boolean algebra to
|
||
simplify any expression that appears directly in a boolean
|
||
context. If this is inconvenient, the result can always be assigned
|
||
to a temporary name first.
|
||
|
||
This would allow the existing 'not' optimisation to remain, and
|
||
would permit future extensions of it such as using De Morgan's laws
|
||
to extend it deeper into the expression.
|
||
|
||
|
||
Usage Examples
|
||
==============
|
||
|
||
Example 1: NumPy Arrays
|
||
-----------------------
|
||
|
||
::
|
||
|
||
#-----------------------------------------------------------------
|
||
#
|
||
# This example creates a subclass of numpy array to which
|
||
# 'and', 'or' and 'not' can be applied, producing an array
|
||
# of booleans.
|
||
#
|
||
#-----------------------------------------------------------------
|
||
|
||
from numpy import array, ndarray
|
||
|
||
class BArray(ndarray):
|
||
|
||
def __str__(self):
|
||
return "barray(%s)" % ndarray.__str__(self)
|
||
|
||
def __and2__(self, other):
|
||
return self & other
|
||
|
||
def __or2__(self, other):
|
||
return self & other
|
||
|
||
def __not__(self):
|
||
return self == 0
|
||
|
||
def barray(*args, **kwds):
|
||
return array(*args, **kwds).view(type=BArray)
|
||
|
||
a0 = barray([0, 1, 2, 4])
|
||
a1 = barray([1, 2, 3, 4])
|
||
a2 = barray([5, 6, 3, 4])
|
||
a3 = barray([5, 1, 2, 4])
|
||
|
||
print("a0:", a0)
|
||
print("a1:", a1)
|
||
print("a2:", a2)
|
||
print("a3:", a3)
|
||
print("not a0:", not a0)
|
||
print("a0 == a1 and a2 == a3:", a0 == a1 and a2 == a3)
|
||
print("a0 == a1 or a2 == a3:", a0 == a1 or a2 == a3)
|
||
|
||
Example 1 Output
|
||
----------------
|
||
|
||
::
|
||
|
||
a0: barray([0 1 2 4])
|
||
a1: barray([1 2 3 4])
|
||
a2: barray([5 6 3 4])
|
||
a3: barray([5 1 2 4])
|
||
not a0: barray([ True False False False])
|
||
a0 == a1 and a2 == a3: barray([False False False True])
|
||
a0 == a1 or a2 == a3: barray([False False False True])
|
||
|
||
|
||
Example 2: Database Queries
|
||
---------------------------
|
||
|
||
::
|
||
|
||
#-----------------------------------------------------------------
|
||
#
|
||
# This example demonstrates the creation of a DSL for database
|
||
# queries allowing 'and' and 'or' operators to be used to
|
||
# formulate the query.
|
||
#
|
||
#-----------------------------------------------------------------
|
||
|
||
class SQLNode:
|
||
|
||
def __and2__(self, other):
|
||
return SQLBinop("and", self, other)
|
||
|
||
def __rand2__(self, other):
|
||
return SQLBinop("and", other, self)
|
||
|
||
def __eq__(self, other):
|
||
return SQLBinop("=", self, other)
|
||
|
||
|
||
class Table(SQLNode):
|
||
|
||
def __init__(self, name):
|
||
self.__tablename__ = name
|
||
|
||
def __getattr__(self, name):
|
||
return SQLAttr(self, name)
|
||
|
||
def __sql__(self):
|
||
return self.__tablename__
|
||
|
||
|
||
class SQLBinop(SQLNode):
|
||
|
||
def __init__(self, op, opnd1, opnd2):
|
||
self.op = op.upper()
|
||
self.opnd1 = opnd1
|
||
self.opnd2 = opnd2
|
||
|
||
def __sql__(self):
|
||
return "(%s %s %s)" % (sql(self.opnd1), self.op, sql(self.opnd2))
|
||
|
||
|
||
class SQLAttr(SQLNode):
|
||
|
||
def __init__(self, table, name):
|
||
self.table = table
|
||
self.name = name
|
||
|
||
def __sql__(self):
|
||
return "%s.%s" % (sql(self.table), self.name)
|
||
|
||
|
||
class SQLSelect(SQLNode):
|
||
|
||
def __init__(self, targets):
|
||
self.targets = targets
|
||
self.where_clause = None
|
||
|
||
def where(self, expr):
|
||
self.where_clause = expr
|
||
return self
|
||
|
||
def __sql__(self):
|
||
result = "SELECT %s" % ", ".join(sql(target) for target in self.targets)
|
||
if self.where_clause:
|
||
result = "%s WHERE %s" % (result, sql(self.where_clause))
|
||
return result
|
||
|
||
|
||
def sql(expr):
|
||
if isinstance(expr, SQLNode):
|
||
return expr.__sql__()
|
||
elif isinstance(expr, str):
|
||
return "'%s'" % expr.replace("'", "''")
|
||
else:
|
||
return str(expr)
|
||
|
||
|
||
def select(*targets):
|
||
return SQLSelect(targets)
|
||
|
||
|
||
#--------------------------------------------------------------------------------
|
||
|
||
::
|
||
dishes = Table("dishes")
|
||
customers = Table("customers")
|
||
orders = Table("orders")
|
||
|
||
query = select(customers.name, dishes.price, orders.amount).where(
|
||
customers.cust_id == orders.cust_id and orders.dish_id == dishes.dish_id
|
||
and dishes.name == "Spam, Eggs, Sausages and Spam")
|
||
|
||
print(repr(query))
|
||
print(sql(query))
|
||
|
||
Example 2 Output
|
||
----------------
|
||
|
||
::
|
||
|
||
<__main__.SQLSelect object at 0x1cc830>
|
||
SELECT customers.name, dishes.price, orders.amount WHERE
|
||
(((customers.cust_id = orders.cust_id) AND (orders.dish_id =
|
||
dishes.dish_id)) AND (dishes.name = 'Spam, Eggs, Sausages and Spam'))
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
End:
|