PEP: 532
Title: Defining a conditional result management protocol
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Oct-2016
Python-Version: 3.7


Abstract
========

Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP
proposes the addition of a new conditional result management protocol to
Python that allows objects to customise the behaviour of the following
expressions:

* ``if-else`` conditional expressions
* the ``and`` logical conjunction operator
* the ``or`` logical disjunction operator
* chained comparisons (which implicitly invoke ``and``)
* the ``not`` logical negation operator

Each of these expressions is ultimately a variant on the underlying pattern::

    THEN_RESULT if CONDITION else ELSE_RESULT

Currently, the ``CONDITION`` expression can control *which* branch is taken
(based on whether it evaluates to ``True`` or ``False`` in a boolean
context), but it can't influence the *result* of taking that branch.

This PEP proposes the addition of two new "conditional result management"
protocol methods that allow conditional result managers to influence the
results of each branch directly:

* ``__then__(self, result)``, to alter the result when the condition is ``True``
* ``__else__(self, result)``, to alter the result when the condition is ``False``

While there are some practical complexities arising from the current handling
of single-valued arrays in NumPy, this should be sufficient to allow
elementwise chained comparison operations for matrices, where the result is
a matrix of boolean values, rather than tautologically returning ``True`` or
raising ``ValueError``.

To properly support logical negation of conditional result managers, a new
``__not__`` protocol method would also be introduced, allowing objects to
control the result of ``not obj`` expressions.
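The intended effect of ``__not__`` can be sketched in current Python with an
ordinary function. This is a minimal emulation that assumes the method would
be looked up on the operand's type (as this PEP does for ``__then__`` and
``__else__``) and that ``not`` keeps its current behaviour for objects that
don't define it; ``not_expr`` and ``Flag`` are hypothetical names used only
for illustration.

```python
def not_expr(obj):
    """Emulate ``not obj``, assuming a type-level ``__not__`` lookup."""
    method = getattr(type(obj), "__not__", None)
    if method is None:
        # Objects without the new method keep today's behaviour.
        return not obj
    # Otherwise the object itself decides what its negation is.
    return method(obj)


class Flag:
    """Toy object whose negation is another Flag rather than a bool."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value

    def __not__(self):
        return Flag(not self.value)


negated = not_expr(Flag(True))
print(type(negated).__name__, negated.value)   # Flag False
print(not_expr([]))                            # True
```

The design point is that negating a conditional result manager can produce
another manager (in the way ``exists.__not__`` returns a ``missing`` instance
in the Specification) instead of collapsing to a plain ``bool``.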
The PEP further proposes the addition of new ``exists`` and ``missing``
builtins that allow conditional branching based on whether or not an object
is ``None``, but return the original object rather than the existence
checking wrapper as the result of any conditional expressions. In addition
to being usable in simple boolean checks (e.g. as in
``assert all(map(exists, items))``), this allows existence checking fallback
operations (aka null-coalescing operations) to be written as::

    value = exists(expr1) or exists(expr2) or expr3

and existence checking precondition operations (aka null-propagating or
null-severing operations) to be written as either::

    value = exists(obj) and obj.field.of.interest
    value = exists(obj) and obj["field"]["of"]["interest"]

or::

    value = missing(obj) or obj.field.of.interest
    value = missing(obj) or obj["field"]["of"]["interest"]


Relationship with other PEPs
============================

This PEP is a direct successor to PEP 531, replacing the existence checking
protocol and the new ``?then`` and ``?else`` syntactic operators defined
there with the ability to customise the behaviour of the established
``not``, ``and`` and ``or`` operators. The existence checking use case is
taken from that PEP.

It is also a direct successor to PEP 335, which proposed the ability to
overload the ``and`` and ``or`` operators directly, rather than indirectly
via interpretation as variants of the more general ``if-else`` conditional
expressions. The discussion of the element-wise comparison use case is drawn
from Guido's rejection of that PEP.

This PEP competes with a number of aspects of PEP 505, proposing that
improved support for null-coalescing operations be offered through a new
protocol and new builtins, rather than through new syntax. It doesn't
compete specifically with the proposed shorthands for existence checking
attribute access and subscripting, but instead offers an alternative
underlying semantic framework for defining them:

* ``LHS ?? RHS`` would mean ``exists(LHS) or RHS``
* ``EXPR?.attr`` would mean ``missing(EXPR) or EXPR.attr``
* ``EXPR?[key]`` would mean ``missing(EXPR) or EXPR[key]``


Specification
=============

Conditional expressions (``if-else``)
-------------------------------------

The conditional expression ``THEN_RESULT if CONDITION else ELSE_RESULT`` is
currently approximately equivalent to the following code::

    if CONDITION:
        _expr_result = THEN_RESULT
    else:
        _expr_result = ELSE_RESULT

The new protocol proposed in this PEP would change that to::

    _condition = CONDITION
    _condition_type = type(_condition)
    if _condition:
        _then_result = THEN_RESULT
        if hasattr(_condition_type, "__then__"):
            _then_result = _condition_type.__then__(_condition, _then_result)
        _expr_result = _then_result
    else:
        _else_result = ELSE_RESULT
        if hasattr(_condition_type, "__else__"):
            _else_result = _condition_type.__else__(_condition, _else_result)
        _expr_result = _else_result

The key change is that the value determining which branch of the
conditional expression gets executed *also* gets a chance to postprocess the
results of the expressions on each of the branches.

Interpreter implementations may check eagerly for the new protocol methods
on condition objects in order to retain an optimised fast path for the great
many objects that support use in a boolean context, but don't implement the
new protocol.

Logical conjunction (``and``)
-----------------------------

Logical conjunction is affected by this proposal as if::

    LHS and RHS

was internally implemented by the interpreter as::

    _lhs_result = LHS
    _expr_result = RHS if _lhs_result else _lhs_result

Conditional result managers can force non-short-circuiting evaluation under
logical conjunction by always returning ``True`` from ``__bool__``, and
enforce this at runtime by raising ``NotImplementedError`` in ``__else__``.
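The ``if-else`` and ``and`` expansions above can be approximated in today's
Python by passing each branch as a zero-argument callable, so that only the
selected branch is evaluated. The sketch below assumes exactly that;
``cond_expr``, ``and_expr`` and the toy manager ``Tagged`` are hypothetical
names for illustration only, since a real interpreter would apply the
protocol natively rather than through helper functions.

```python
def cond_expr(condition, then_thunk, else_thunk):
    """Emulate ``THEN if CONDITION else ELSE`` under the proposed protocol.

    Branches are zero-argument callables so only the chosen one runs.
    """
    condition_type = type(condition)
    if condition:
        result = then_thunk()
        if hasattr(condition_type, "__then__"):
            result = condition_type.__then__(condition, result)
    else:
        result = else_thunk()
        if hasattr(condition_type, "__else__"):
            result = condition_type.__else__(condition, result)
    return result


def and_expr(lhs, rhs_thunk):
    """Emulate ``LHS and RHS`` as ``RHS if LHS else LHS``."""
    return cond_expr(lhs, rhs_thunk, lambda: lhs)


class Tagged:
    """Toy conditional result manager that labels which hook ran."""

    def __init__(self, flag):
        self.flag = flag

    def __bool__(self):
        return self.flag

    def __then__(self, result):
        return ("then", result)

    def __else__(self, result):
        return ("else", result)


# Ordinary objects keep their current behaviour, including short-circuiting.
print(cond_expr(0, lambda: "yes", lambda: "no"))   # no
print(and_expr([], lambda: 1 / 0))                 # []

# A conditional result manager post-processes whichever branch was taken.
print(and_expr(Tagged(True), lambda: 42))          # ('then', 42)
lhs = Tagged(False)
tag, result = and_expr(lhs, lambda: 42)
print(tag, result is lhs)                          # else True
```

The last line illustrates the identity check described in the next
paragraph: when ``and`` short-circuits, ``__else__`` receives the condition
object itself as ``result``.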
Alternatively, conditional result managers can detect short-circuited
evaluation of logical conjunction in ``__else__`` implementations by looking
for cases where ``self`` and ``result`` are the exact same object.

Logical disjunction (``or``)
-----------------------------

Logical disjunction is affected by this proposal as if::

    LHS or RHS

was internally implemented by the interpreter as::

    _lhs_result = LHS
    _expr_result = _lhs_result if _lhs_result else RHS

Conditional result managers can force non-short-circuiting evaluation under
logical disjunction by always returning ``False`` from ``__bool__``, and
enforce this at runtime by raising ``NotImplementedError`` in ``__then__``.

Alternatively, conditional result managers can detect short-circuited
evaluation of logical disjunction in ``__then__`` implementations by looking
for cases where ``self`` and ``result`` are the exact same object.

Chained comparisons
-------------------

Chained comparisons are affected by this proposal as if::

    LEFT_BOUND left_op EXPR right_op RIGHT_BOUND

was internally implemented by the interpreter as::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    _expr_result = (_expr right_op RIGHT_BOUND) if _lhs_result else _lhs_result

As with any logical conjunction, conditional result managers returned by
comparison operations can force non-short-circuiting evaluation in these
cases by always returning ``True`` from ``__bool__``.

Existence checking comparisons
------------------------------

Two new builtins implementing the new protocol are proposed to encapsulate
the notion of "existence checking": seeing if a value is ``None`` and either
falling back to an alternative value (an operation known as
"None-coalescing") or passing it through as the result of the overall
expression (an operation known as "None-severing" or "None-propagating").
These builtins would be defined as follows::

    class exists:
        """Conditional result manager for 'EXPR is not None' checks"""
        def __init__(self, value):
            self.value = value
        def __not__(self):
            return missing(self.value)
        def __bool__(self):
            return self.value is not None
        def __then__(self, result):
            # Short-circuited: unwrap and return the original value
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            # Short-circuited: unwrap and return the original value
            if result is self:
                return result.value
            return result

    class missing:
        """Conditional result manager for 'EXPR is None' checks"""
        def __init__(self, value):
            self.value = value
        def __not__(self):
            return exists(self.value)
        def __bool__(self):
            return self.value is None
        def __then__(self, result):
            # Short-circuited: unwrap and return the original value
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            # Short-circuited: unwrap and return the original value
            if result is self:
                return result.value
            return result

Aside from changing the definition of ``__bool__`` to be based on
``is not None`` rather than normal truth checking, the key characteristic of
``exists`` is that when it is used as a conditional result manager, it is
*ephemeral*: when it detects that short circuiting has taken place, it
returns the original value, rather than the existence checking wrapper.

``missing`` is defined as the logically inverted counterpart of ``exists``:
``not exists(obj)`` is semantically equivalent to ``missing(obj)``.

Other conditional constructs
----------------------------

No changes are proposed to if statements, while statements, comprehensions,
or generator expressions, as the boolean clauses they contain are purely
used for control flow purposes and don't have programmatically accessible
"results".

However, it's worth noting that while such proposals are outside the scope
of this PEP, the conditional result management protocol defined here would
be sufficient to support constructs like::

    while exists(dynamic_query()) as result:
        ... # Code using result


Rationale
=========

Avoiding new syntax
-------------------

Adding new syntax to Python to make particular software design problems
easier to handle is considered a solution of last resort. As a successor to
PEP 335, this PEP focuses on making the existing ``and`` and ``or``
operators less rigid in their interpretation, rather than on proposing new
operators.

Element-wise chained comparisons
--------------------------------

In ultimately rejecting PEP 335, Guido van Rossum noted [1_]:

    The NumPy folks brought up a somewhat separate issue: for them,
    the most common use case is chained comparisons (e.g. A < B < C).

To understand this observation, we first need to look at how comparisons
work with NumPy arrays::

    >>> import numpy as np
    >>> increasing = np.arange(5)
    >>> increasing
    array([0, 1, 2, 3, 4])
    >>> decreasing = np.arange(4, -1, -1)
    >>> decreasing
    array([4, 3, 2, 1, 0])
    >>> increasing < decreasing
    array([ True,  True, False, False, False], dtype=bool)

Here we see that NumPy array comparisons are element-wise by default,
comparing each element in the left-hand array to the corresponding element
in the right-hand array, and producing an array of boolean results.

If either side of the comparison is a scalar value, then it is broadcast
across the array and compared to each individual element::

    >>> 0 < increasing
    array([False,  True,  True,  True,  True], dtype=bool)
    >>> increasing < 4
    array([ True,  True,  True,  True, False], dtype=bool)

However, this broadcasting idiom breaks down if we attempt to use chained
comparisons::

    >>> 0 < increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The problem is that internally, Python implicitly expands this chained
comparison into the form::

    >>> 0 < increasing and increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

And NumPy only permits implicit coercion to a boolean value for
single-element arrays where ``a.any()`` and ``a.all()`` can be assured of
having the same result::

    >>> np.array([False]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([False]) and np.array([True])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([True])
    array([ True], dtype=bool)

The proposal in this PEP would allow this situation to be changed by
updating the definition of element-wise comparison operations in NumPy to
return a dedicated subclass that both implements the new protocol methods
and also changes the result array's interpretation in a boolean context to
always return true and hence avoid Python's default short-circuiting
behaviour::

    class ComparisonResultArray(np.ndarray):
        def __bool__(self):
            # Always true, so ``and`` never short-circuits on these arrays
            return True
        def __then__(self, result):
            if result is self:
                # The array itself was the branch result (e.g. ``arr or other``),
                # which has no element-wise interpretation
                msg = ("Comparison array truth values are ambiguous outside "
                       "chained comparisons. Use a.any() or a.all()")
                raise ValueError(msg)
            # Combine the two comparison results element by element
            return np.logical_and(self, result.view(ComparisonResultArray))
        def __else__(self, result):
            raise NotImplementedError("Comparison result arrays are never False")

With this change, the chained comparison example above would be able to
return::

    >>> 0 < increasing < 4
    ComparisonResultArray([False,  True,  True,  True, False], dtype=bool)

Existence checking expressions
------------------------------

An increasingly common requirement in modern software development is the
need to work with "semi-structured data": data where the structure of the
data is known in advance, but pieces of it may be missing at runtime, and
the software manipulating that data is expected to degrade gracefully (e.g.
by omitting results that depend on the missing data) rather than failing
outright.

Some particularly common cases where this issue arises are:

* handling optional application configuration settings and function parameters
* handling external service failures in distributed systems
* handling data sets that include some partial records

At the moment, writing such software in Python can be genuinely awkward, as
your code ends up littered with expressions like:

* ``value1 = expr1.field.of.interest if expr1 is not None else None``
* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None``
* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5``

PEP 531 goes into more detail on some of the challenges of working with this
kind of data, particularly in data transformation pipelines where dealing
with potentially missing content is the norm rather than the exception.
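Whether the proposed protocol really reproduces these verbose expressions
can be checked with a small emulation in current Python. The sketch below
condenses the ``exists`` and ``missing`` definitions from the Specification
into a shared base class (omitting ``__not__``) and pairs them with
hypothetical helpers ``and_expr`` and ``or_expr`` that follow the ``and`` and
``or`` expansions given there, with the right-hand operand passed as a
callable. It also assumes that a chain such as ``a or b or c`` groups as
``a or (b or c)``, matching the left-to-right short-circuiting order;
``record`` is made-up sample data.

```python
from types import SimpleNamespace


def _hook(condition, took_then, result):
    """Apply __then__/__else__ from the condition's type, if it defines one."""
    name = "__then__" if took_then else "__else__"
    method = getattr(type(condition), name, None)
    return result if method is None else method(condition, result)


def and_expr(lhs, rhs_thunk):
    """Emulate ``LHS and RHS`` under the proposed protocol."""
    if lhs:
        return _hook(lhs, True, rhs_thunk())
    return _hook(lhs, False, lhs)


def or_expr(lhs, rhs_thunk):
    """Emulate ``LHS or RHS`` under the proposed protocol."""
    if lhs:
        return _hook(lhs, True, lhs)
    return _hook(lhs, False, rhs_thunk())


class _Unwrapping:
    """Shared hooks: return the wrapped value when short-circuiting."""

    def __init__(self, value):
        self.value = value

    def __then__(self, result):
        return result.value if result is self else result

    def __else__(self, result):
        return result.value if result is self else result


class exists(_Unwrapping):
    def __bool__(self):
        return self.value is not None


class missing(_Unwrapping):
    def __bool__(self):
        return self.value is None


record = SimpleNamespace(field=SimpleNamespace(of=SimpleNamespace(interest=42)))

# None-propagation: both spellings must match the verbose form.
for expr1 in (record, None):
    verbose = expr1.field.of.interest if expr1 is not None else None
    assert and_expr(exists(expr1), lambda: expr1.field.of.interest) == verbose
    assert or_expr(missing(expr1), lambda: expr1.field.of.interest) == verbose

# None-coalescing: exists(expr3) or exists(expr4) or expr5, grouped rightwards.
for expr3, expr4, expr5 in [(1, 2, 3), (None, 2, 3), (None, None, 3), (0, None, 3)]:
    verbose = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5
    proposed = or_expr(exists(expr3), lambda: or_expr(exists(expr4), lambda: expr5))
    assert proposed == verbose
```

The ``(0, None, 3)`` case shows why ``exists`` is needed rather than plain
``or``: ``0`` is falsy, but it is not ``None``, so it is kept.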
The combined impact of the proposals in this PEP is to allow the above
sample expressions to instead be written as:

* ``value1 = exists(expr1) and expr1.field.of.interest``
* ``value2 = exists(expr2) and expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) or exists(expr4) or expr5``

In these forms, significantly more of the text presented to the reader is
immediately relevant to the question "What does this code do?", while the
boilerplate code to handle missing data by passing it through to the output
or falling back to an alternative input has shrunk to four uses of the new
``exists`` builtin, two uses of the ``and`` keyword, and two uses of the
``or`` keyword.

In the first two examples, the 31 character boilerplate suffix
``if exprN is not None else None`` (minimally 27 characters for a single
letter variable name) has been replaced by an 18 character
``exists(expr1) and`` prefix (minimally 14 characters with a single letter
variable name), somewhat improving the signal-to-pattern-noise ratio of the
lines (especially if it encourages the use of more meaningful variable and
field names rather than making them shorter purely for the sake of
expression brevity).

In the last example, not only are two instances of the 26 character
boilerplate ``if exprN is not None else`` (minimally 22 characters) replaced
with the 12 character function call ``exists() or``, but that function call
is also placed directly around the original expression, eliminating the
need to duplicate it in the conditional existence check.


Risks and concerns
==================

Readability
-----------

Python has a long history of disallowing customisation of the control flow
operators, and overloading them isn't particularly common in other languages
either. Even languages which do permit overloading may lose the property of
short-circuiting evaluation when overloaded (e.g. that happens when
overloading ``&&`` and ``||`` in C++).
This history means that the idea of ``and`` and ``or`` suddenly gaining the
ability to be interpreted differently based on the type of the left-hand
operand is a potentially controversial one from a readability and
maintainability perspective, to the point where it may be *less*
controversial to define a single new ``??`` operator as proposed in PEP 505,
or separate ``?then`` and ``?else`` operators as suggested in PEP 531, than
it would be to redefine the existing operators (as currently proposed in
this PEP).

Such an approach would also address one of Guido's key concerns with
PEP 335 [1_] that would also apply to this PEP as currently written:

    Amongst other reasons, I really dislike that the PEP adds to the
    bytecode for all uses of these operators even though almost no call
    sites will ever need the feature.

If the protocol in this PEP was combined with the core syntactic proposals
in PEP 531, then the end result would look something like:

* ``value1 = exists(expr1) ?then expr1.field.of.interest``
* ``value2 = exists(expr2) ?then expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) ?else exists(expr4) ?else expr5``

Rather than indicating use of the existence protocol as suggested in
PEP 531, the ``?`` here would indicate use of the conditional result
management protocol, and hence the fact that the result may be something
other than the LHS as written when the short-circuiting path is executed.

Alternatively, if only a single new operator was added as proposed in
PEP 505, but it used the semantics proposed for ``or`` in this PEP, then the
end result would look something like:

* ``value1 = missing(expr1) ?? expr1.field.of.interest``
* ``value2 = missing(expr2) ?? expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) ?? exists(expr4) ?? expr5``

If new operators were added rather than redefining the semantics of
``and``, ``or`` and ``if-else``, then it would make sense to *require* that
their left-hand operand be a conditional result manager that defines both
``__then__`` and ``__else__``, rather than accepting arbitrary objects as
``and`` and ``or`` do. With that approach, chained comparisons would be
conditionally redefined in terms of the new protocol when the left
comparison produces a conditional result manager, while continuing to be
defined in terms of ``and`` for any other left comparison result.

Compatibility
-------------

At least CPython's peephole optimizer, and presumably other Python
optimizers, include a lot of assumptions about the semantics of ``and`` and
``or`` expressions. This means that any changes to those semantics are
likely to require interpreter implementors to closely review a large amount
of code related not only to the way those operations are implemented, but
also to the way they're optimized.

By contrast, new operators would be substantially lower risk, as existing
optimizers couldn't be making any assumptions about how they work.

Speed of execution
------------------

Making relatively common operations like ``and`` and ``or`` check for
additional protocol methods is likely to slow them down in the common case.
The additional overhead should be small relative to the cost of boolean
truth checking, but it won't be zero.

Defining new operators rather than reusing existing ones would address this
concern as well.
Design Discussion
=================

Arbitrary sentinel objects
--------------------------

Unlike PEP 531, this proposal readily handles custom sentinel objects::

    # Definition of a base configurable sentinel check that defaults to None
    class SentinelCheck:
        sentinel = None
        def __init__(self, value):
            self.value = value
        def __bool__(self):
            return self.value is not self.sentinel
        def __then__(self, result):
            if result is self:
                return result.value
            return result
        def __else__(self, result):
            if result is self:
                return result.value
            return result

    # Local subclass using a custom sentinel object
    class if_defined(SentinelCheck):
        sentinel = object()

    # Using the sentinel to check whether or not an argument was supplied
    def my_func(arg=if_defined.sentinel):
        arg = if_defined(arg) or calculate_default()


Implementation
==============

As with PEP 505, actual implementation has been deferred pending
in-principle interest in the idea of making these changes. Aside from the
compatibility concerns noted above, the implementation isn't really the hard
part of these proposals; the hard part is deciding whether or not this is a
change where the long term benefits for new and existing Python users
outweigh the short term costs involved in the wider ecosystem (including
developers of other implementations, language curriculum developers, and
authors of other Python related educational material) adjusting to the
change.

...TBD...


References
==========

.. [1] PEP 335 rejection notification
   (http://mail.python.org/pipermail/python-dev/2012-March/117510.html)

Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: