PEP: 532
Title: Defining a conditional result management protocol
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Oct-2016
Python-Version: 3.7

Abstract
========

Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP
proposes the addition of a new conditional result management protocol to Python
that allows objects to customise the behaviour of the following expressions:

* ``if-else`` conditional expressions
* the ``and`` logical conjunction operator
* the ``or`` logical disjunction operator
* chained comparisons (which implicitly invoke ``and``)
* the ``not`` logical negation operator

Each of these expressions is ultimately a variant on the underlying pattern::

    THEN_RESULT if CONDITION else ELSE_RESULT

Currently, the ``CONDITION`` expression can control *which* branch is taken
(based on whether it evaluates to ``True`` or ``False`` in a boolean context),
but it can't influence the *result* of taking that branch.

This PEP proposes the addition of two new "conditional result management"
protocol methods that allow conditional result managers to influence the
results of each branch directly:

* ``__then__(self, result)``, to alter the result when the condition is ``True``
* ``__else__(self, result)``, to alter the result when the condition is ``False``

While there are some practical complexities arising from the current handling
of single-valued arrays in NumPy, this should be sufficient to allow elementwise
chained comparison operations for matrices, where the result is a matrix of
boolean values, rather than tautologically returning ``True`` or raising
``ValueError``.

To properly support logical negation of conditional result managers, a new
``__not__`` protocol method would also be introduced, allowing objects to
control the result of ``not obj`` expressions.

The PEP further proposes the addition of new ``exists`` and ``missing`` builtins
that allow conditional branching based on whether or not an object is ``None``,
but return the original object rather than the existence checking wrapper as
the result of any conditional expressions. In addition to being usable as
a simple boolean operator (e.g. as in ``assert all(map(exists, items))``), this
allows existence checking fallback operations (aka null-coalescing operations)
to be written as::

    value = exists(expr1) or exists(expr2) or expr3

and existence checking precondition operations (aka null-propagating
or null-severing operations) to be written as either::

    value = exists(obj) and obj.field.of.interest
    value = exists(obj) and obj["field"]["of"]["interest"]

or::

    value = missing(obj) or obj.field.of.interest
    value = missing(obj) or obj["field"]["of"]["interest"]


Relationship with other PEPs
============================

This PEP is a direct successor to PEP 531, replacing the existence checking
protocol and the new ``?then`` and ``?else`` syntactic operators defined there
with the ability to customise the behaviour of the established ``not``,
``and`` and ``or`` operators. The existence checking use case is taken from
that PEP.

It is also a direct successor to PEP 335, which proposed the ability to
overload the ``and`` and ``or`` operators directly, rather than indirectly
via interpretation as variants of the more general ``if-else`` conditional
expressions. The discussion of the element-wise comparison use case is
drawn from Guido's rejection of that PEP.

This PEP competes with a number of aspects of PEP 505, proposing that improved
support for null-coalescing operations be offered through a new protocol and
new builtin, rather than through new syntax. It doesn't compete specifically
with the proposed shorthands for existence checking attribute access and
subscripting, but instead offers an alternative underlying semantic framework
for defining them:

* ``LHS ?? RHS`` would mean ``exists(LHS) or RHS``
* ``EXPR?.attr`` would mean ``missing(EXPR) or EXPR.attr``
* ``EXPR?[key]`` would mean ``missing(EXPR) or EXPR[key]``


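As a rough illustration of the ``EXPR?.attr`` mapping listed above, the
following sketch emulates the proposed ``or`` semantics in current Python.
The ``proposed_or`` helper, the ``Point`` class and ``x_or_none`` are
hypothetical names introduced only for this example; ``missing`` is a condensed
copy of the builtin proposed in this PEP. The right-hand operand is passed as a
callable so that it is only evaluated when the left-hand side is false.

```python
class missing:
    """Condensed copy of the proposed 'EXPR is None' result manager."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value is None

    def __then__(self, result):
        # Short-circuited: unwrap and hand back the original value
        return result.value if result is self else result

    def __else__(self, result):
        return result.value if result is self else result


def proposed_or(lhs, rhs_thunk):
    """Hypothetical emulation of 'LHS or RHS' under the proposed protocol."""
    if lhs:
        result, hook = lhs, getattr(type(lhs), "__then__", None)
    else:
        result, hook = rhs_thunk(), getattr(type(lhs), "__else__", None)
    return result if hook is None else hook(lhs, result)


class Point:
    """Hypothetical object with a single attribute of interest."""

    def __init__(self, x):
        self.x = x


def x_or_none(obj):
    # Emulates 'obj?.x', read as 'missing(obj) or obj.x'
    return proposed_or(missing(obj), lambda: obj.x)
```

Under these assumptions, ``x_or_none(None)`` passes ``None`` through without
touching the attribute, while ``x_or_none(Point(3))`` evaluates the attribute
access as usual.
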
Specification
=============

Conditional expressions (``if-else``)
-------------------------------------

The conditional expression ``THEN_RESULT if CONDITION else ELSE_RESULT`` is
currently approximately equivalent to the following code::

    if CONDITION:
        _expr_result = THEN_RESULT
    else:
        _expr_result = ELSE_RESULT

The new protocol proposed in this PEP would change that to::

    _condition = CONDITION
    _condition_type = type(_condition)
    if _condition:
        _then_result = THEN_RESULT
        if hasattr(_condition_type, "__then__"):
            _then_result = _condition_type.__then__(_condition, _then_result)
        _expr_result = _then_result
    else:
        _else_result = ELSE_RESULT
        if hasattr(_condition_type, "__else__"):
            _else_result = _condition_type.__else__(_condition, _else_result)
        _expr_result = _else_result

The key change is that the value determining which branch of the conditional
expression gets executed *also* gets a chance to postprocess the results of
the expressions on each of the branches.

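Since the protocol does not exist in any released Python version, the
expansion above can only be emulated. The sketch below does so with a plain
function: ``conditional_expr`` and the ``Tagged`` manager are hypothetical
names used purely for illustration, and the two branches are passed as
callables so that only the selected branch is evaluated.

```python
def conditional_expr(condition, then_thunk, else_thunk):
    """Emulate the proposed 'THEN_RESULT if CONDITION else ELSE_RESULT'."""
    condition_type = type(condition)
    if condition:
        result = then_thunk()
        # Look the hook up on the type, as for other special methods
        if hasattr(condition_type, "__then__"):
            result = condition_type.__then__(condition, result)
    else:
        result = else_thunk()
        if hasattr(condition_type, "__else__"):
            result = condition_type.__else__(condition, result)
    return result


class Tagged:
    """Hypothetical manager that labels each result with the branch taken."""

    def __init__(self, flag):
        self.flag = flag

    def __bool__(self):
        return self.flag

    def __then__(self, result):
        return ("then", result)

    def __else__(self, result):
        return ("else", result)
```

Objects that don't define the new methods (such as the integer ``0``) behave
exactly as they do today, which is the backwards compatible case the
specification relies on.
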
Interpreter implementations may check eagerly for the new protocol methods
on condition objects in order to retain an optimised fast path for the great
many objects that support use in a boolean context, but don't implement the new
protocol.


Logical conjunction (``and``)
-----------------------------

Logical conjunction is affected by this proposal as if::

    LHS and RHS

was internally implemented by the interpreter as::

    _lhs_result = LHS
    _expr_result = RHS if _lhs_result else _lhs_result

Conditional result managers can force non-shortcircuiting evaluation under
logical conjunction by always returning ``True`` from ``__bool__``, and can
enforce this at runtime by raising ``NotImplementedError`` in ``__else__``.

Alternatively, conditional result managers can detect short-circuited evaluation
of logical conjunction in ``__else__`` implementations by looking for cases
where ``self`` and ``result`` are the exact same object.

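The short-circuit detection described above can be sketched in current Python
by emulating the expansion with a helper function. Both ``proposed_and`` and
the ``Positive`` manager are hypothetical names introduced for this example;
the right-hand operand is passed as a callable to preserve lazy evaluation.

```python
def proposed_and(lhs, rhs_thunk):
    """Emulate 'LHS and RHS' as 'RHS if LHS else LHS' under the protocol."""
    lhs_type = type(lhs)
    if lhs:
        result = rhs_thunk()
        hook = getattr(lhs_type, "__then__", None)
    else:
        result = lhs  # short-circuit: the LHS itself is the raw result
        hook = getattr(lhs_type, "__else__", None)
    return result if hook is None else hook(lhs, result)


class Positive:
    """Hypothetical manager that is true only for positive numbers."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value > 0

    def __then__(self, result):
        return result

    def __else__(self, result):
        # 'result is self' only holds on the short-circuited path,
        # so unwrap rather than leaking the wrapper to the caller
        return self.value if result is self else result
```

With this emulation, ``proposed_and(Positive(-1), ...)`` yields ``-1`` rather
than the ``Positive`` wrapper, while ordinary operands keep their current
behaviour.
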
Logical disjunction (``or``)
----------------------------

Logical disjunction is affected by this proposal as if::

    LHS or RHS

was internally implemented by the interpreter as::

    _lhs_result = LHS
    _expr_result = _lhs_result if _lhs_result else RHS

Conditional result managers can force non-shortcircuiting evaluation under
logical disjunction by always returning ``False`` from ``__bool__``, and can
enforce this at runtime by raising ``NotImplementedError`` in ``__then__``.

Alternatively, conditional result managers can detect short-circuited evaluation
of logical disjunction in ``__then__`` implementations by looking for cases
where ``self`` and ``result`` are the exact same object.

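To make the non-shortcircuiting option concrete, the sketch below emulates the
``or`` expansion and pairs it with a manager that is never true, so the
right-hand side is always evaluated. ``proposed_or`` and ``Both`` are
hypothetical names used only for this illustration.

```python
def proposed_or(lhs, rhs_thunk):
    """Emulate 'LHS or RHS' as 'LHS if LHS else RHS' under the protocol."""
    lhs_type = type(lhs)
    if lhs:
        result = lhs  # short-circuit: the LHS itself is the raw result
        hook = getattr(lhs_type, "__then__", None)
    else:
        result = rhs_thunk()
        hook = getattr(lhs_type, "__else__", None)
    return result if hook is None else hook(lhs, result)


class Both:
    """Hypothetical manager that always evaluates and keeps both operands."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return False  # never short-circuit under 'or'

    def __then__(self, result):
        # Unreachable while __bool__ returns False; guards against misuse
        raise NotImplementedError("Both is never true")

    def __else__(self, result):
        return (self.value, result)
```

Here ``proposed_or(Both(1), lambda: 2)`` combines the operands into
``(1, 2)`` instead of selecting one of them.
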
Chained comparisons
-------------------

Chained comparisons are affected by this proposal as if::

    LEFT_BOUND left_op EXPR right_op RIGHT_BOUND

was internally implemented by the interpreter as::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    _expr_result = (_expr right_op RIGHT_BOUND) if _lhs_result else _lhs_result

As with any logical conjunction, conditional result managers returned by
comparison operations can force non-shortcircuiting evaluation in these
cases by always returning ``True`` from ``__bool__``.

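The effect on element-wise comparisons can be sketched without NumPy. In the
example below, ``Vec`` and ``Mask`` are hypothetical pure-Python stand-ins for
an array and its boolean comparison result, and ``chained`` is a hypothetical
helper that emulates the chained comparison expansion using functions from the
standard ``operator`` module. Unlike the real expansion, it takes the right
bound as an already evaluated value.

```python
import operator


class Mask:
    """Hypothetical element-wise comparison result."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        return True  # always evaluate the right-hand comparison

    def __then__(self, result):
        # Combine both comparison results element by element
        return Mask(a and b for a, b in zip(self.flags, result.flags))

    def __else__(self, result):
        raise NotImplementedError("Mask is never false")


class Vec:
    """Hypothetical vector with element-wise '<' and '>' against scalars."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return Mask(x < other for x in self.items)

    def __gt__(self, other):
        return Mask(x > other for x in self.items)


def chained(left_bound, left_op, expr, right_op, right_bound):
    """Emulate 'LEFT_BOUND left_op EXPR right_op RIGHT_BOUND'."""
    lhs_result = left_op(left_bound, expr)
    lhs_type = type(lhs_result)
    if lhs_result:
        result = right_op(expr, right_bound)
        hook = getattr(lhs_type, "__then__", None)
    else:
        result = lhs_result
        hook = getattr(lhs_type, "__else__", None)
    return result if hook is None else hook(lhs_result, result)
```

For example, ``chained(0, operator.lt, Vec(range(5)), operator.lt, 4)``
emulates ``0 < vec < 4`` and produces a ``Mask`` that is true only for the
elements strictly between the two bounds.
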
Existence checking comparisons
------------------------------

Two new builtins implementing the new protocol are proposed to encapsulate the
notion of "existence checking": seeing if a value is ``None`` and either
falling back to an alternative value (an operation known as "None-coalescing")
or passing it through as the result of the overall expression (an operation
known as "None-severing" or "None-propagating").

These builtins would be defined as follows::

    class exists:
        """Conditional result manager for 'EXPR is not None' checks"""
        def __init__(self, value):
            self.value = value
        def __not__(self):
            return missing(self.value)
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            if result is self:
                return result.value
            return result

    class missing:
        """Conditional result manager for 'EXPR is None' checks"""
        def __init__(self, value):
            self.value = value
        def __not__(self):
            return exists(self.value)
        def __bool__(self):
            return self.value is None
        def __then__(self, result):
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            if result is self:
                return result.value
            return result

Aside from changing the definition of ``__bool__`` to be based on
``is not None`` rather than normal truth checking, the key characteristic of
``exists`` is that when it is used as a conditional result manager, it is
*ephemeral*: when it detects that short circuiting has taken place, it returns
the original value, rather than the existence checking wrapper.

``missing`` is defined as the logically inverted counterpart of ``exists``:
``not exists(obj)`` is semantically equivalent to ``missing(obj)``.


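The ephemeral behaviour of ``exists`` can be checked today by emulating the
proposed operators with helper functions. In this sketch, ``or_``, ``and_``
and ``_apply`` are hypothetical helpers, and ``exists`` is a condensed copy of
the class above without ``__not__``. One assumption deserves emphasis: a chain
such as ``a or b or c`` is emulated by nesting to the right, mirroring the way
Python stops evaluating the whole chain as soon as one operand is true. The PEP
text does not spell out how chains would be handled.

```python
class exists:
    """Condensed copy of the proposed 'EXPR is not None' result manager."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value is not None

    def __then__(self, result):
        return result.value if result is self else result

    def __else__(self, result):
        return result.value if result is self else result


def _apply(lhs, result, hook_name):
    # Call the protocol hook if the left operand's type defines it
    hook = getattr(type(lhs), hook_name, None)
    return result if hook is None else hook(lhs, result)


def or_(lhs, rhs_thunk):
    """Emulate 'LHS or RHS' under the proposed protocol."""
    if lhs:
        return _apply(lhs, lhs, "__then__")
    return _apply(lhs, rhs_thunk(), "__else__")


def and_(lhs, rhs_thunk):
    """Emulate 'LHS and RHS' under the proposed protocol."""
    if lhs:
        return _apply(lhs, rhs_thunk(), "__then__")
    return _apply(lhs, lhs, "__else__")


def first_existing(expr1, expr2, expr3):
    # Emulates: exists(expr1) or exists(expr2) or expr3
    return or_(exists(expr1), lambda: or_(exists(expr2), lambda: expr3))
```

Note that ``first_existing(0, None, "default")`` returns ``0``: unlike a plain
``or`` chain, a false-but-present value is not skipped.
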
Other conditional constructs
----------------------------

No changes are proposed to if statements, while statements, comprehensions, or
generator expressions, as the boolean clauses they contain are purely used for
control flow purposes and don't have programmatically accessible "results".

However, it's worth noting that while such proposals are outside the scope of
this PEP, the conditional result management protocol defined here would be
sufficient to support constructs like::

    while exists(dynamic_query()) as result:
        ... # Code using result


Rationale
=========

Avoiding new syntax
-------------------

Adding new syntax to Python to make particular software design problems easier
to handle is considered a solution of last resort. As a successor to PEP 335,
this PEP focuses on making the existing ``and`` and ``or`` operators less rigid
in their interpretation, rather than on proposing new operators.


Element-wise chained comparisons
--------------------------------

In ultimately rejecting PEP 335, Guido van Rossum noted [1_]:

    The NumPy folks brought up a somewhat separate issue: for them,
    the most common use case is chained comparisons (e.g. A < B < C).

To understand this observation, we first need to look at how comparisons work
with NumPy arrays::

    >>> import numpy as np
    >>> increasing = np.arange(5)
    >>> increasing
    array([0, 1, 2, 3, 4])
    >>> decreasing = np.arange(4, -1, -1)
    >>> decreasing
    array([4, 3, 2, 1, 0])
    >>> increasing < decreasing
    array([ True, True, False, False, False], dtype=bool)

Here we see that NumPy array comparisons are element-wise by default, comparing
each element in the lefthand array to the corresponding element in the righthand
array, and producing a matrix of boolean results.

If either side of the comparison is a scalar value, then it is broadcast across
the array and compared to each individual element::

    >>> 0 < increasing
    array([False, True, True, True, True], dtype=bool)
    >>> increasing < 4
    array([ True, True, True, True, False], dtype=bool)

However, this broadcasting idiom breaks down if we attempt to use chained
comparisons::

    >>> 0 < increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The problem is that internally, Python implicitly expands this chained
comparison into the form::

    >>> 0 < increasing and increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

And NumPy only permits implicit coercion to a boolean value for single-element
arrays where ``a.any()`` and ``a.all()`` can be assured of having the same
result::

    >>> np.array([False]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([False]) and np.array([True])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([True])
    array([ True], dtype=bool)

The proposal in this PEP would allow this situation to be changed by updating
the definition of element-wise comparison operations in NumPy to return a
dedicated subclass that both implements the new protocol methods and also
changes the result array's interpretation in a boolean context to always
return true and hence avoid Python's default short-circuiting behaviour::

    class ComparisonResultArray(np.ndarray):
        def __bool__(self):
            return True
        def __then__(self, result):
            if result is self:
                msg = ("Comparison array truth values are ambiguous outside "
                       "chained comparisons. Use a.any() or a.all()")
                raise ValueError(msg)
            return np.logical_and(self, result.view(ComparisonResultArray))
        def __else__(self, result):
            raise NotImplementedError("Comparison result arrays are never False")

With this change, the chained comparison example above would be able to return::

    >>> 0 < increasing < 4
    ComparisonResultArray([ False, True, True, True, False], dtype=bool)


Existence checking expressions
------------------------------

An increasingly common requirement in modern software development is the need
to work with "semi-structured data": data where the structure of the data is
known in advance, but pieces of it may be missing at runtime, and the software
manipulating that data is expected to degrade gracefully (e.g. by omitting
results that depend on the missing data) rather than failing outright.

Some particularly common cases where this issue arises are:

* handling optional application configuration settings and function parameters
* handling external service failures in distributed systems
* handling data sets that include some partial records

At the moment, writing such software in Python can be genuinely awkward, as
your code ends up littered with expressions like:

* ``value1 = expr1.field.of.interest if expr1 is not None else None``
* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None``
* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5``

PEP 531 goes into more detail on some of the challenges of working with this
kind of data, particularly in data transformation pipelines where dealing with
potentially missing content is the norm rather than the exception.

The combined impact of the proposals in this PEP is to allow the above sample
expressions to instead be written as:

* ``value1 = exists(expr1) and expr1.field.of.interest``
* ``value2 = exists(expr2) and expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) or exists(expr4) or expr5``

In these forms, significantly more of the text presented to the reader is
immediately relevant to the question "What does this code do?", while the
boilerplate code to handle missing data by passing it through to the output
or falling back to an alternative input has shrunk to four uses of the new
``exists`` builtin, two uses of the ``and`` keyword, and two uses of the
``or`` keyword.

In the first two examples, the 31 character boilerplate suffix
``if exprN is not None else None`` (minimally 27 characters for a single letter
variable name) has been replaced by a 20 character ``exists(expr1) and``
prefix (minimally 16 characters with a single letter variable name), somewhat
improving the signal-to-pattern-noise ratio of the lines (especially if it
encourages the use of more meaningful variable and field names rather than
making them shorter purely for the sake of expression brevity).

In the last example, not only are two instances of the 26 character boilerplate
``if exprN is not None else`` (minimally 22 characters) replaced with the
14 character function call ``exists() or``, but that function call is also
placed directly around the original expression, eliminating the need to
duplicate it in the conditional existence check.


Risks and concerns
==================

Readability
-----------

Python has a long history of disallowing customisation of the control flow
operators, and overloading them isn't particularly common in other languages
either. Even languages which do permit overloading may lose the property of
short-circuiting evaluation when overloaded (e.g. that happens when overloading
``&&`` and ``||`` in C++).

This history means that the idea of ``and`` and ``or`` suddenly gaining the
ability to be interpreted differently based on the type of the left-hand
operand is a potentially controversial one from a readability and
maintainability perspective, to the point where it may be *less* controversial
to define a single new ``??`` operator as proposed in PEP 505, or separate
``?then`` and ``?else`` operators as suggested in PEP 531 than it would be to
redefine the existing operators (as currently proposed in this PEP).

Such an approach would also address one of Guido's key concerns with PEP 335
[1_] that would also apply to this PEP as currently written:

    Amongst other reasons, I really dislike that the PEP adds to the bytecode
    for all uses of these operators even though almost no call sites will ever
    need the feature.

If the protocol in this PEP was combined with the core syntactic proposals in
PEP 531, then the end result would look something like:

* ``value1 = exists(expr1) ?then expr1.field.of.interest``
* ``value2 = exists(expr2) ?then expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) ?else exists(expr4) ?else expr5``

Rather than indicating use of the existence protocol as suggested in PEP 531,
the ``?`` here would indicate use of the conditional result management protocol,
and hence the fact the result may be something other than the LHS as written
when the short-circuiting path is executed.

Alternatively, if only a single new operator was added as proposed in PEP
505, but it used the semantics proposed for ``or`` in this PEP, then the end
result would look something like:

* ``value1 = missing(expr1) ?? expr1.field.of.interest``
* ``value2 = missing(expr2) ?? expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) ?? exists(expr4) ?? expr5``

If new operators were added rather than redefining the semantics of ``and``,
``or`` and ``if-else``, then it would make sense to *require* that their left
hand operand be a conditional result manager that defines both ``__then__``
and ``__else__``, rather than accepting arbitrary objects as ``and`` and ``or``
do.

With that approach, chained comparisons would be conditionally redefined in
terms of the new protocol when the left comparison produces a conditional result
manager, while continuing to be defined in terms of ``and`` for any other
left comparison result.


Compatibility
-------------

At least CPython's peephole optimizer, and presumably other Python optimizers,
include a lot of assumptions about the semantics of ``and`` and ``or``
expressions. This means that any changes to those semantics are likely to
require interpreter implementors to closely review a whole lot of code
related not only to the way those operations are implemented, but also to the
way they're optimized.

By contrast, new operators would be substantially lower risk, as existing
optimizers couldn't be making any assumptions about how they work.


Speed of execution
------------------

Making relatively common operations like ``and`` and ``or`` check for additional
protocol methods is likely to slow them down in the common case. The additional
overhead should be small relative to the cost of boolean truth checking, but
it won't be zero.

Defining new operators rather than reusing existing ones would address this
concern as well.


Design Discussion
=================

Arbitrary sentinel objects
--------------------------

Unlike PEP 531, this proposal readily handles custom sentinel objects::

    # Definition of a base configurable sentinel check that defaults to None
    class SentinelCheck:
        sentinel = None
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not self.sentinel
        def __then__(self, result):
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            if result is self:
                return result.value
            return result

    # Local subclass using a custom sentinel object
    class if_defined(SentinelCheck):
        sentinel = object()

    # Using the sentinel to check whether or not an argument was supplied
    def my_func(arg=if_defined.sentinel):
        arg = if_defined(arg) or calculate_default()

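A runnable approximation of the sentinel example is sketched below. Because
``or`` cannot currently be customised, the expression
``if_defined(arg) or calculate_default()`` is emulated through a hypothetical
``proposed_or`` helper, and ``calculate_default`` is given a placeholder body
purely so that the example can execute; ``my_func`` also returns the resolved
value, which the original snippet does not do.

```python
class SentinelCheck:
    """Configurable sentinel check that defaults to None (as defined above)."""
    sentinel = None

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value is not self.sentinel

    def __then__(self, result):
        return result.value if result is self else result

    def __else__(self, result):
        return result.value if result is self else result


class if_defined(SentinelCheck):
    sentinel = object()  # private marker meaning "argument not supplied"


def proposed_or(lhs, rhs_thunk):
    """Hypothetical emulation of 'LHS or RHS' under the proposed protocol."""
    if lhs:
        result, hook = lhs, getattr(type(lhs), "__then__", None)
    else:
        result, hook = rhs_thunk(), getattr(type(lhs), "__else__", None)
    return result if hook is None else hook(lhs, result)


def calculate_default():
    return "default"  # placeholder value, assumed for this example


def my_func(arg=if_defined.sentinel):
    # Emulates: arg = if_defined(arg) or calculate_default()
    return proposed_or(if_defined(arg), calculate_default)
```

The point of the custom sentinel is visible in the results: ``my_func(None)``
returns ``None`` because ``None`` is treated as a supplied value, and only
``my_func()`` falls back to the default.
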
Implementation
==============

As with PEP 505, actual implementation has been deferred pending in-principle
interest in the idea of making these changes. Aside from the compatibility
concerns noted above, the implementation isn't really the hard part of these
proposals. The hard part is deciding whether or not this is a change where the
long term benefits for new and existing Python users outweigh the short term
costs involved in the wider ecosystem (including developers of other
implementations, language curriculum developers, and authors of other Python
related educational material) adjusting to the change.

...TBD...


References
==========

.. [1] PEP 335 rejection notification
   (http://mail.python.org/pipermail/python-dev/2012-March/117510.html)

Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: