PEP: 532
Title: Defining "else" as a protocol based circuit breaking operator
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Oct-2016
Python-Version: 3.7

Abstract
========

Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP
proposes the addition of a new protocol-driven circuit breaking operator to
Python that allows the left operand to decide whether or not the expression
should short circuit and return a result immediately, or else continue
on with evaluation of the right operand::

    exists(foo) else bar
    missing(foo) else foo.bar()

These two expressions can be read as:

* "the expression result is 'foo' if it exists, otherwise it is 'bar'"
* "the expression result is 'foo' if it is missing, otherwise it is 'foo.bar()'"

Execution of these expressions relies on a new circuit breaking protocol that
implicitly avoids repeated evaluation of the left operand while letting
that operand fully control the result of the expression, regardless of whether
it skips evaluation of the right operand or not::

    _lhs = LHS
    type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)

To properly support logical negation of circuit breakers, a new ``__not__``
protocol method would also be introduced allowing objects to control
the result of ``not obj`` expressions.

As shown in the basic example above, the PEP further proposes the addition of
builtin ``exists`` and ``missing`` circuit breakers that provide conditional
branching based on whether or not an object is ``None``, but return the
original object rather than the existence checking wrapper when the expression
evaluation short circuits.

In addition to being usable as simple boolean operators (e.g. as in
``assert all(map(exists, items))`` or ``if any(map(missing, items)):``), these
circuit breakers will allow existence checking fallback operations (aka
None-coalescing operations) to be written as::

    value = exists(expr1) else exists(expr2) else expr3

and existence checking precondition operations (aka None-propagating
or None-severing operations) to be written as::

    value = missing(obj) else obj.field.of.interest
    value = missing(obj) else obj["field"]["of"]["interest"]

A change to the definition of chained comparisons is also proposed, where
the comparison chaining will be updated to use the circuit breaking operator
rather than the logical conjunction (``and``) operator if the left hand
comparison returns a circuit breaker as its result.

While there are some practical complexities arising from the current handling
of single-valued arrays in NumPy, this change should be sufficient to allow
elementwise chained comparison operations for matrices, where the result
is a matrix of boolean values, rather than tautologically returning ``True``
or raising ``ValueError``.

Relationship with other PEPs
============================

This PEP is a direct successor to PEP 531, replacing the existence checking
protocol and the new ``?then`` and ``?else`` syntactic operators defined there
with a single protocol driven ``else`` operator and adjustments to the ``not``
operator. The existence checking use cases are taken from that PEP.

It is also a direct successor to PEP 335, which proposed the ability to
overload the ``and`` and ``or`` operators directly, with the ability to
overload the semantics of comparison chaining being one of the consequences
of that change. The proposal in this PEP to instead handle the element-wise
comparison use case by changing the semantic definition of comparison chaining
is drawn from Guido's rejection of PEP 335.

This PEP competes with the dedicated null-coalescing operator in PEP 505,
proposing that improved support for null-coalescing operations be offered
through a more general protocol-driven short circuiting operator and related
builtins, rather than through a dedicated single-purpose operator.

It doesn't compete with PEP 505's proposed shorthands for existence checking
attribute access and subscripting, but instead offers an alternative underlying
semantic framework for defining them:

* ``EXPR?.attr`` would be syntactic sugar for ``missing(EXPR) else EXPR.attr``
* ``EXPR?[key]`` would be syntactic sugar for ``missing(EXPR) else EXPR[key]``

In both cases, the dedicated syntactic form could be optimised to avoid
actually creating the circuit breaker instance.

Specification
=============

The circuit breaking operator (``else``)
----------------------------------------

Circuit breaking expressions would be written using ``else`` as a new binary
operator, akin to the existing ``and`` and ``or`` logical operators::

    LHS else RHS

Ignoring the hidden variable assignment, this is semantically equivalent to::

    _lhs = LHS
    type(_lhs).__then__(_lhs) if _lhs else type(_lhs).__else__(_lhs, RHS)

The key difference relative to the existing ``or`` operator is that the value
determining which branch of the conditional expression gets executed *also*
gets a chance to postprocess the results of the expressions on each of the
branches.

As part of the short-circuiting behaviour, interpreter implementations
are expected to access only the protocol method needed for the branch
that is actually executed, but it is still recommended that circuit
breaker authors that always return ``True`` or always return ``False`` from
``__bool__`` explicitly raise ``NotImplementedError`` with a suitable
message from branch methods that are never expected to be executed (see the
comparison chaining use case in the Rationale section below for an example
of that).
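
The expansion above can be approximated in current Python, which has no
``else`` operator. The following is only a sketch under stated assumptions:
``circuit_break`` and ``Tracing`` are hypothetical names that are not part of
this proposal, and the right operand is passed as a zero-argument callable
because a plain function call cannot otherwise delay its evaluation.

```python
class Tracing:
    """Hypothetical circuit breaker that records which protocol method ran."""

    def __init__(self, value, condition):
        self.value = value
        self.condition = condition
        self.calls = []

    def __bool__(self):
        return self.condition

    def __then__(self):
        self.calls.append("__then__")
        return self.value

    def __else__(self, other):
        self.calls.append("__else__")
        return other


def circuit_break(lhs, rhs_thunk):
    """Emulate ``LHS else RHS``; rhs_thunk stands in for the unevaluated RHS."""
    if lhs:
        # Short circuit: the right operand is never evaluated
        return type(lhs).__then__(lhs)
    return type(lhs).__else__(lhs, rhs_thunk())


evaluated = []

def rhs():
    evaluated.append("rhs")
    return "fallback"

short = Tracing("original", True)
result_short = circuit_break(short, rhs)   # uses __then__ only

full = Tracing("original", False)
result_full = circuit_break(full, rhs)     # evaluates rhs, then uses __else__
```

In this emulation only one of the two protocol methods is looked up for each
evaluation, matching the expectation stated above for interpreter
implementations.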

It is proposed that the ``else`` operator use the same precedence level as
the existing ``and`` and ``or`` operators.

The precedence/associativity in the otherwise ambiguous case of
``expr1 if cond else expr2 else expr3`` is TBD based on ease of implementation
(a guideline will be added to PEP 8 to say "don't do that", as this construct
will be inherently confusing for readers, regardless of how the interpreter
executes it).

Overloading logical inversion (``not``)
---------------------------------------

Any circuit breaker definition will have a logical inverse that is still a
circuit breaker, but inverts the answer as to whether or not to short circuit
the expression evaluation. For example, the ``exists`` and ``missing`` circuit
breakers proposed in this PEP are each other's logical inverse.

A new protocol method, ``__not__(self)``, will be introduced to permit circuit
breakers and other types to override ``not`` expressions to return their
logical inverse rather than a coerced boolean result.
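
The dispatch this implies can be sketched in current Python as a helper
function. This is an illustration only: ``logical_not``, ``IsEven`` and
``IsOdd`` are hypothetical names introduced for the example, and the real
change would live in the interpreter's handling of ``not`` rather than in a
function.

```python
def logical_not(obj):
    """Emulate the proposed ``not obj``: use type(obj).__not__ when defined."""
    hook = getattr(type(obj), "__not__", None)
    if hook is None:
        # No override: fall back to today's coerced boolean result
        return not obj
    return hook(obj)


class IsEven:
    """Hypothetical breaker that is true when the wrapped integer is even."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value % 2 == 0

    def __not__(self):
        return IsOdd(self.value)


class IsOdd:
    """Logical inverse of IsEven."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value % 2 == 1

    def __not__(self):
        return IsEven(self.value)


inverse = logical_not(IsEven(3))   # an IsOdd wrapper rather than a bool
plain = logical_not([])            # ordinary objects still yield a bool
```

The inverted object is still a wrapper around the original value, so it can
itself be used wherever a circuit breaker is expected.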

To preserve the semantics of existing language optimisations, ``__not__``
implementations will be obliged to respect the following invariant::

    assert not bool(obj) == bool(not obj)

Chained comparisons
-------------------

A chained comparison like ``0 < x < 10`` written as::

    LEFT_BOUND left_op EXPR right_op RIGHT_BOUND

is currently roughly semantically equivalent to::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    _expr_result = _lhs_result and (_expr right_op RIGHT_BOUND)

This PEP proposes that this be changed to explicitly check if the left
comparison returns a circuit breaker, and if so, use ``else`` rather than
``and`` to implement the comparison chaining::

    _expr = EXPR
    _lhs_result = LEFT_BOUND left_op _expr
    if hasattr(type(_lhs_result), "__then__"):
        _expr_result = _lhs_result else (_expr right_op RIGHT_BOUND)
    else:
        _expr_result = _lhs_result and (_expr right_op RIGHT_BOUND)

This allows types like NumPy arrays to control the behaviour of chained
comparisons by returning circuit breakers from comparison operations.

Existence checking comparisons
------------------------------

Two new builtins implementing the new protocol are proposed to encapsulate the
notion of "existence checking": seeing if a value is ``None`` and either
falling back to an alternative value (an operation known as "None-coalescing")
or passing it through as the result of the overall expression (an operation
known as "None-severing" or "None-propagating").

These builtins would be defined as follows::

    class CircuitBreaker:
        """Base circuit breaker type (available as types.CircuitBreaker)"""
        def __init__(self, value, condition, inverse_type):
            self.value = value
            self._condition = condition
            self._inverse_type = inverse_type
        def __bool__(self):
            return self._condition
        def __not__(self):
            return self._inverse_type(self.value)
        def __then__(self):
            return self.value
        def __else__(self, other):
            if other is self:
                return self.value
            return other

    class exists(types.CircuitBreaker):
        """Circuit breaker for 'EXPR is not None' checks"""
        def __init__(self, value):
            super().__init__(value, value is not None, missing)

    class missing(types.CircuitBreaker):
        """Circuit breaker for 'EXPR is None' checks"""
        def __init__(self, value):
            super().__init__(value, value is None, exists)

Aside from changing the definition of ``__bool__`` to be based on
``is not None`` rather than normal truth checking, the key characteristic of
``exists`` is that when it is used as a circuit breaker, it is *ephemeral*:
when it is told that short circuiting has taken place, it returns the original
value, rather than the existence checking wrapper.

``missing`` is defined as the logically inverted counterpart of ``exists``:
``not exists(obj)`` is semantically equivalent to ``missing(obj)``.

The ``__else__`` implementations for both builtin circuit breakers are defined
such that the wrapper will always be removed even if you explicitly pass the
circuit breaker to both sides of the ``else`` expression::

    breaker = exists(foo)
    assert (breaker else breaker) is foo
    breaker = missing(foo)
    assert (breaker else breaker) is foo

Other conditional constructs
----------------------------

No changes are proposed to if statements, while statements, conditional
expressions, comprehensions, or generator expressions, as the boolean clauses
they contain are already used for control flow purposes.

However, it's worth noting that while such proposals are outside the scope of
this PEP, the circuit breaking protocol defined here would be sufficient to
support constructs like::

    while exists(dynamic_query()) as result:
        ... # Code using result

and::

    if exists(re.search(pattern, text)) as match:
        ... # Code using match

Leaving the door open to such a future extension is the main reason for
recommending that circuit breaker implementations handle the ``self is other``
case in ``__else__`` implementations the same way as they handle the
short-circuiting behaviour in ``__then__``.

Style guide recommendations
---------------------------

The following additions to PEP 8 are proposed in relation to the new features
introduced by this PEP:

* In the absence of other considerations, prefer the use of the builtin
  circuit breakers ``exists`` and ``missing`` over the corresponding
  conditional expressions

* Do not combine conditional expressions (``if-else``) and circuit breaking
  expressions (the ``else`` operator) in a single expression - use one or the
  other depending on the situation, but not both.

Rationale
=========

Adding a new operator
---------------------

Similar to PEP 335, early drafts of this PEP focused on making the existing
``and`` and ``or`` operators less rigid in their interpretation, rather than
proposing new operators. However, this proved to be problematic for a few
reasons:

* defining a shared protocol for both ``and`` and ``or`` was confusing, as
  ``__then__`` was the short-circuiting outcome for ``or``, while ``__else__``
  was the short-circuiting outcome for ``and``

* the ``and`` and ``or`` operators have a long established and stable meaning,
  so readers would inevitably be surprised if their meaning now became
  dependent on the type of the left operand. Even new users would be confused
  by this change due to 25+ years of teaching material that assumes the
  current well-known semantics for these operators

* Python interpreter implementations, including CPython, have taken advantage
  of the existing semantics of ``and`` and ``or`` when defining runtime and
  compile time optimisations, which would all need to be reviewed and
  potentially discarded if the semantics of those operations changed

Proposing a single new operator instead resolves all of those issues -
``__then__`` always indicates short circuiting, ``__else__`` only indicates
"short circuiting" if the circuit breaker itself is also passed in as the
right operand, and the semantics of ``and`` and ``or`` remain entirely
unchanged. While the semantics of the unary ``not`` operator do change, the
invariant required of ``__not__`` implementations means that existing
expression optimisations in boolean contexts will remain valid.

As a result of that design simplification, the new protocol and operator would
even allow us to expose ``operator.logical_and`` and ``operator.logical_or``
as circuit breaker definitions if we chose to do so::

    class logical_or(types.CircuitBreaker):
        """Circuit breaker for 'bool(EXPR)' checks"""
        def __init__(self, value):
            super().__init__(value, bool(value), logical_and)

    class logical_and(types.CircuitBreaker):
        """Circuit breaker for 'not bool(EXPR)' checks"""
        def __init__(self, value):
            super().__init__(value, not bool(value), logical_or)

Naming the operator and protocol
--------------------------------

The names "circuit breaking operator", "circuit breaking protocol" and
"circuit breaker" are all inspired by the phrase "short circuiting operator":
the general language design term for operators that only conditionally
evaluate their right operand.

The electrical analogy is that circuit breakers in Python detect and handle
short circuits in expressions before they trigger any exceptions similar to the
way that circuit breakers detect and handle short circuits in electrical
systems before they damage any equipment or harm any humans.

The Python level analogy is that just as a ``break`` statement lets you
terminate a loop before it reaches its natural conclusion, a circuit breaking
expression lets you terminate evaluation of the expression and produce a result
immediately.

Using an existing keyword
-------------------------

Using an existing keyword has the benefit of allowing the new expression to
be introduced without a ``__future__`` statement.

``else`` is semantically appropriate for the proposed new protocol, and the
only syntactic ambiguity introduced arises when the new operator is combined
with the explicit ``if-else`` conditional expression syntax.

Element-wise chained comparisons
--------------------------------

In ultimately rejecting PEP 335, Guido van Rossum noted [1_]:

    The NumPy folks brought up a somewhat separate issue: for them,
    the most common use case is chained comparisons (e.g. A < B < C).

To understand this observation, we first need to look at how comparisons work
with NumPy arrays::

    >>> import numpy as np
    >>> increasing = np.arange(5)
    >>> increasing
    array([0, 1, 2, 3, 4])
    >>> decreasing = np.arange(4, -1, -1)
    >>> decreasing
    array([4, 3, 2, 1, 0])
    >>> increasing < decreasing
    array([ True, True, False, False, False], dtype=bool)

Here we see that NumPy array comparisons are element-wise by default, comparing
each element in the lefthand array to the corresponding element in the
righthand array, and producing a matrix of boolean results.

If either side of the comparison is a scalar value, then it is broadcast across
the array and compared to each individual element::

    >>> 0 < increasing
    array([False, True, True, True, True], dtype=bool)
    >>> increasing < 4
    array([ True, True, True, True, False], dtype=bool)

However, this broadcasting idiom breaks down if we attempt to use chained
comparisons::

    >>> 0 < increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The problem is that internally, Python implicitly expands this chained
comparison into the form::

    >>> 0 < increasing and increasing < 4
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

And NumPy only permits implicit coercion to a boolean value for single-element
arrays where ``a.any()`` and ``a.all()`` can be assured of having the same
result::

    >>> np.array([False]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([False]) and np.array([True])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([False])
    array([False], dtype=bool)
    >>> np.array([True]) and np.array([True])
    array([ True], dtype=bool)

The proposal in this PEP would allow this situation to be changed by updating
the definition of element-wise comparison operations in NumPy to return a
dedicated subclass that implements the new circuit breaking protocol and also
changes the result array's interpretation in a boolean context to always
return ``False`` and hence never trigger the short-circuiting behaviour::

    class ComparisonResultArray(np.ndarray):
        def __bool__(self):
            return False
        def _raise_NotImplementedError(self):
            msg = ("Comparison array truth values are ambiguous outside "
                   "chained comparisons. Use a.any() or a.all()")
            raise NotImplementedError(msg)
        def __not__(self):
            self._raise_NotImplementedError()
        def __then__(self):
            self._raise_NotImplementedError()
        def __else__(self, other):
            return np.logical_and(self, other.view(ComparisonResultArray))

With this change, the chained comparison example above would be able to return::

    >>> 0 < increasing < 4
    ComparisonResultArray([ False, True, True, True, False], dtype=bool)

Existence checking expressions
------------------------------

An increasingly common requirement in modern software development is the need
to work with "semi-structured data": data where the structure of the data is
known in advance, but pieces of it may be missing at runtime, and the software
manipulating that data is expected to degrade gracefully (e.g. by omitting
results that depend on the missing data) rather than failing outright.

Some particularly common cases where this issue arises are:

* handling optional application configuration settings and function parameters
* handling external service failures in distributed systems
* handling data sets that include some partial records

At the moment, writing such software in Python can be genuinely awkward, as
your code ends up littered with expressions like:

* ``value1 = expr1.field.of.interest if expr1 is not None else None``
* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None``
* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5``

PEP 531 goes into more detail on some of the challenges of working with this
kind of data, particularly in data transformation pipelines where dealing with
potentially missing content is the norm rather than the exception.
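
To make the three sample expressions concrete, the following sketch runs them
against small stand-in values in current Python. The sample data is entirely
hypothetical (``SimpleNamespace`` is used only to give ``expr1`` a nested
attribute path), and the final line is an added contrast showing why a
truthiness-based ``or`` chain is not a drop-in replacement for the explicit
``is not None`` checks.

```python
from types import SimpleNamespace

# Hypothetical sample inputs; None stands for a missing record
expr1 = SimpleNamespace(field=SimpleNamespace(of=SimpleNamespace(interest=42)))
expr2 = None
expr3, expr4, expr5 = None, 0, "default"

# The three idioms listed above, unchanged
value1 = expr1.field.of.interest if expr1 is not None else None
value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None
value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5

# A plain "or" chain treats the valid value 0 as missing and skips past it
value3_or = expr3 or expr4 or expr5
```

Here ``value3`` keeps the legitimate ``0`` while ``value3_or`` falls through
to ``"default"``, which is the distinction between existence checking and
ordinary truth checking.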

The combined impact of the proposals in this PEP is to allow the above sample
expressions to instead be written as:

* ``value1 = missing(expr1) else expr1.field.of.interest``
* ``value2 = missing(expr2) else expr2["field"]["of"]["interest"]``
* ``value3 = exists(expr3) else exists(expr4) else expr5``

In these forms, significantly more of the text presented to the reader is
immediately relevant to the question "What does this code do?", while the
boilerplate code to handle missing data by passing it through to the output
or falling back to an alternative input, has shrunk to two uses of the new
``missing`` builtin, and two uses of the new ``exists`` builtin.

In the first two examples, the 31 character boilerplate suffix
``if exprN is not None else None`` (counting its leading space; minimally 27
characters for a single letter variable name) has been replaced by a
19 character ``missing(expr1) else`` prefix (minimally 15 characters with a
single letter variable name), markedly improving the signal-to-pattern-noise
ratio of the lines (especially if it encourages the use of more meaningful
variable and field names rather than making them shorter purely for the
sake of expression brevity). The additional syntactic sugar proposals in
PEP 505 would further reduce this boilerplate to a single ``?`` character that
also eliminated the repetition of the expression being checked for existence.

In the last example, not only are two instances of the 21 character
boilerplate ``if exprN is not None`` (counting its leading space; minimally 17
characters) replaced with the 8 character function call ``exists()``, but that
function call is placed directly around the original expression, eliminating
the need to duplicate it in the conditional existence check.

Risks and concerns
==================

This PEP has been designed specifically to address the risks and concerns
raised when discussing PEPs 335, 505 and 531.

* it defines a new operator and adjusts the definition of chained comparison
  rather than impacting the existing ``and`` and ``or`` operators
* the changes to the ``not`` unary operator are defined in such a way that
  control flow optimisations based on the existing semantics remain valid
* rather than the cryptic ``??``, it uses ``else`` as the operator keyword in
  exactly the same sense as it is already used in conditional expressions
* it defines a general purpose short-circuiting binary operator that can even
  be used to express the existing semantics of ``and`` and ``or`` rather than
  focusing solely and inflexibly on existence checking
* it names the proposed builtins in such a way that they provide a strong
  mnemonic hint as to when the expression containing them will short-circuit
  and skip evaluating the right operand

Design Discussion
=================

Arbitrary sentinel objects
--------------------------

Unlike PEPs 505 and 531, this proposal readily handles custom sentinel
objects::

    class defined(types.CircuitBreaker):

        MISSING = object()

        def __init__(self, value):
            # Wrap the checked value itself, not the breaker instance
            super().__init__(value, value is not self.MISSING, not_defined)

    class not_defined(types.CircuitBreaker):
        def __init__(self, value):
            super().__init__(value, value is defined.MISSING, defined)

    # Using the sentinel to check whether or not an argument was supplied
    def my_func(arg=defined.MISSING):
        arg = defined(arg) else calculate_default()

Implementation
==============

As with PEP 505, actual implementation has been deferred pending in-principle
interest in the idea of making these changes. Aside from the possible
syntactic ambiguity concerns noted above, the implementation isn't really the
hard part of these proposals: the hard part is deciding whether or not this is
a change where the long term benefits for new and existing Python users
outweigh the short term costs involved in the wider ecosystem (including
developers of other implementations, language curriculum developers, and
authors of other Python related educational material) adjusting to the change.

...TBD...

Acknowledgements
================

Thanks go to Mark E. Haase for feedback on and contributions to earlier drafts
of this proposal. However, his clear and exhaustive explanation of the original
protocol design that modified the semantics of ``if-else`` conditional
expressions to use an underlying ``__then__``/``__else__`` protocol helped
convince me it was too complicated to keep, so this iteration contains neither
that version of the protocol, nor Mark's explanation of it.

References
==========

.. [1] PEP 335 rejection notification
   (http://mail.python.org/pipermail/python-dev/2012-March/117510.html)

Copyright
=========

This document has been placed in the public domain under the terms of the
CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: