diff --git a/pep-0531.txt b/pep-0531.txt index 83640b802..d86c1fe7a 100644 --- a/pep-0531.txt +++ b/pep-0531.txt @@ -8,25 +8,35 @@ Type: Standards Track Content-Type: text/x-rst Created: 25-Oct-2016 Python-Version: 3.7 +Post-History: 28-Oct-2016 Abstract ======== Inspired by PEP 505 and the related discussions, this PEP proposes the addition -of two new logical operators to Python: +of two new control flow operators to Python: -* Existence-checking fallback: ``expr1 ?else expr2`` -* Existence-checking precondition: ``expr1 ?and expr2`` +* Existence-checking precondition ("exists-then"): ``expr1 ?then expr2`` +* Existence-checking fallback ("exists-else"): ``expr1 ?else expr2`` as well as the following abbreviations for common existence checking expressions and statements: -* Existence-checking attribute access: ``obj?.attr`` (for ``obj ?and obj.attr``) -* Existence-checking subscripting: ``obj?[expr]`` (for ``obj ?and obj[expr]``) -* Existence-checking assignment: ``target ?= expr`` +* Existence-checking attribute access: + ``obj?.attr`` (for ``obj ?then obj.attr``) +* Existence-checking subscripting: + ``obj?[expr]`` (for ``obj ?then obj[expr]``) +* Existence-checking assignment: + ``value ?= expr`` (for ``value = value ?else expr``) -These expressions will be defined in terms of a new "existence" protocol, -accessible as ``operator.exists``, with the following characteristics: +The common ``?`` symbol in these new operator definitions indicates that they +use a new "existence checking" protocol rather than the established +truth-checking protocol used by if statements, while loops, comprehensions, +generator expressions, conditional expressions, logical conjunction, and +logical disjunction. + +This new protocol would be made available as ``operator.exists``, with the +following characteristics: * types can define a new ``__exists__`` magic method (Python) or ``tp_exists`` slot (C) to override the default behaviour. 
This optional @@ -34,12 +44,13 @@ accessible as ``operator.exists``, with the following characteristics: * ``operator.exists(None)`` returns ``False`` * ``operator.exists(NotImplemented)`` returns ``False`` * ``operator.exists(Ellipsis)`` returns ``False`` -* Python's builtin and standard library numeric types will override the - existence check such that ``NaN`` values return ``False`` and other - values return ``True`` +* ``float``, ``complex`` and ``decimal.Decimal`` will override the existence + check such that ``NaN`` values return ``False`` and other values (including + zero values) return ``True`` * for any other type, ``operator.exists(obj)`` returns True by default. Most - importantly, values that evaluate to False in a boolean context (zeroes, - empty containers) evaluate to True in an existence checking context + importantly, values that evaluate to False in a truth checking context + (zeroes, empty containers) will still evaluate to True in an existence + checking context Relationship with other PEPs @@ -47,15 +58,15 @@ Relationship with other PEPs While this PEP was inspired by and builds on Mark Haase's excellent work in putting together PEP 505, it ultimately competes with that PEP due to -differences in the specifics of the proposed syntax and semantics for the -feature. +significant differences in the specifics of the proposed syntax and semantics +for the feature. It also presents a different perspective on the rationale for the change by focusing on the benefits to existing Python users as the typical demands of application and service development activities are genuinely changing. 
It isn't an accident that similar features are now appearing in multiple -programming languages, and it's a good idea for us to learn from how other -language designers are handling the problem, but precedents set elsewhere +programming languages, and while it's a good idea for us to learn from how other +language designers are handling the problem, precedents being set elsewhere are more relevant to *how* we would go about tackling this problem than they are to whether or not we think it's a problem we should address in the first place. @@ -91,39 +102,71 @@ an essential feature of modern programming environments. At the moment, writing such software in Python can be genuinely awkward, as your code ends up littered with expressions like: -* ``value = expr.field.of.interest if expr is not None else None`` -* ``value = expr["field"]["of"]["interest"] if expr is not None else None`` -* ``value = expr1 if expr1 is not None else expr2 if expr2 is not None else expr3`` +* ``value1 = expr1.field.of.interest if expr1 is not None else None`` +* ``value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None`` +* ``value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5`` If these are only occasional, then expanding out to full statement forms may help improve readability, but if you have 4 or 5 of them in a row (which is a fairly common situation in data transformation pipelines), then replacing them with 16 or 20 lines of conditional logic really doesn't help matters. 
+Expanding the three examples above that way hopefully helps illustrate that:: + + _expr1 = expr1 + if _expr1 is not None: + value1 = _expr1.field.of.interest + else: + value1 = None + _expr2 = expr2 + if _expr2 is not None: + value2 = _expr2["field"]["of"]["interest"] + else: + value2 = None + _expr3 = expr3 + if _expr3 is not None: + value3 = _expr3 + else: + _expr4 = expr4 + if _expr4 is not None: + value3 = _expr4 + else: + value3 = expr5 + The combined impact of the proposals in this PEP is to allow the above sample expressions to instead be written as: -* ``value = expr?.field.of.interest`` -* ``value = expr?["field"]["of"]["interest"]`` -* ``value = expr1 ?else expr2 ?else expr3`` +* ``value1 = expr1?.field.of.interest`` +* ``value2 = expr2?["field"]["of"]["interest"]`` +* ``value3 = expr3 ?else expr4 ?else expr5`` -In the first two examples, the 30 character boilerplate clause -`` if expr is not None else None`` (minimally 27 characters for a single letter +In these forms, almost all of the information presented to the reader is +immediately relevant to the question "What does this code do?", while the +boilerplate code to handle missing data by passing it through to the output +or falling back to an alternative input, has shrunk to two uses of the ``?`` +symbol and two uses of the ``?else`` keyword. + +In the first two examples, the 31 character boilerplate clause +`` if exprN is not None else None`` (minimally 27 characters for a single letter variable name) has been replaced by a single ``?`` character, substantially improving the signal-to-pattern-noise ratio of the lines (especially if it encourages the use of more meaningful variable and field names rather than making them shorter purely for the sake of expression brevity). 
In the last example, two instances of the 21 character boilerplate, -`` if expr1 is not None`` (minimally 17 characters) are replaced with single +`` if exprN is not None`` (minimally 17 characters) are replaced with single characters, again substantially improving the signal-to-pattern-noise ratio. +Furthermore, each of our 5 "subexpressions of potential interest" is included +exactly once, rather than 4 of them needing to be duplicated or pulled out +to a named variable in order to first check if they exist. + The existence checking precondition operator is mainly defined to provide a -common conceptual basis for the existence checking attribute access and +clear conceptual basis for the existence checking attribute access and subscripting operators: -* ``obj?.attr`` is roughly equivalent to ``obj ?and obj.attr`` -* ``obj?[expr]``is roughly equivalent to ``obj ?and obj[expr]`` +* ``obj?.attr`` is roughly equivalent to ``obj ?then obj.attr`` +* ``obj?[expr]`` is roughly equivalent to ``obj ?then obj[expr]`` The main semantic difference between the shorthand forms and their expanded equivalents is that the common subexpression to the left of the existence @@ -140,18 +183,31 @@ handling idiom: * ``value = value if value is not None else expensive_default()`` -allowing that to instead be abbreviated as: +by allowing that to instead be abbreviated as: * ``value ?= expensive_default()`` This is mainly beneficial when the target is a subscript operation or -subattribute - -Even without this specific change, the PEP would still +subattribute, as even without this specific change, the PEP would still permit this idiom to be updated to: * ``value = value ?else expensive_default()`` +The main argument *against* adding this form is that it's arguably ambiguous +and could mean either: + +* ``value = value ?else expensive_default()``; or +* ``value = value ?then value.subfield.of.interest`` + +The second form isn't at all useful, but if this concern was deemed significant 
+enough to address while still keeping the augmented assignment feature, +the full keyword could be included in the syntax: + +* ``value ?else= expensive_default()`` + +Alternatively, augmented assignment could just be dropped from the current +proposal entirely and potentially reconsidered at a later date. + Existence checking protocol --------------------------- @@ -172,16 +228,50 @@ Similarly, it seems reasonable to declare that the other placeholder builtin singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that represent the absence of data moreso than they represent data. -Proposed syntax ---------------- -Without a mathematical precedent to draw on (as Python historically has for -other operations), the proposed use of ``?`` as the key syntactic marker for -this feature is primarily derived from the corresponding syntax in other -languages that offer similar features. +Proposed symbolic notation +-------------------------- -Drawing from the excellent summary in PEP 505 and the Wikipedia articles on -the "safe navigation operator [1] and the "null coalescing operator" [2], +Python has historically only had one kind of implied boolean context: truth +checking, which can be invoked directly via the ``bool()`` builtin. As this PEP +proposes a new kind of control flow operation based on existence checking rather +than truth checking, it is considered valuable to have a reminder directly +in the code when existence checking is being used rather than truth checking. + +The mathematical symbol for existence assertions is U+2203 'THERE EXISTS': ``∃`` + +Accordingly, one possible approach to the syntactic additions proposed in this +PEP would be to use that already defined mathematical notation: + +* ``expr1 ∃then expr2`` +* ``expr1 ∃else expr2`` +* ``obj∃.attr`` +* ``obj∃[expr]`` +* ``target ∃= expr`` + +However, there are two major problems with that approach, one practical, and +one pedagogical. 
+ +The practical problem is the usual one that most keyboards don't offer any easy +way of entering mathematical symbols other than those used in basic arithmetic +(even the symbols appearing in this PEP were ultimately copied & pasted +from [3]_ rather than being entered directly). + +The pedagogical problem is that the symbols for existence assertions (``∃``) +and universal assertions (``∀``) aren't going to be familiar to most people +the way basic arithmetic operators are, so we wouldn't actually be making the +proposed syntax easier to understand by adopting ``∃``. + +By contrast, ``?`` is one of the few remaining unused ASCII punctuation +characters in Python's syntax, making it available as a candidate syntactic +marker for "this control flow operation is based on an existence check, not a +truth check". + +Taking that path would also have the advantage of aligning Python's syntax +with corresponding syntax in other languages that offer similar features. + +Drawing from the existing summary in PEP 505 and the Wikipedia articles on +the "safe navigation operator" [1]_ and the "null coalescing operator" [2]_, we see: * The ``?.`` existence checking attribute access syntax precisely aligns with: @@ -203,17 +293,60 @@ we see: * the "nil-coalescing" operator in Swift (``??``) To be clear, these aren't the only spelling of these operators used in other -languages, but they're the most common ones, and the ``?`` symbol is far and away -the most common syntactic marker (presumably prompted by the use of ``?`` in -C-style conditional expressions, which many of these languages also offer). +languages, but they're the most common ones, and the ``?`` symbol is the most +common syntactic marker by far (presumably prompted by the use of ``?`` to +introduce the "then" clause in C-style conditional expressions, which many +of these languages also offer). 
-``?else`` is proposed over ``?or`` for the existence checking fallback syntax -simply because it reads more clearly as "choose the first subexpression that -exists" when multiple instances of the expression are chained together. -``?and`` is proposed as the spelling for the existence checking precondition -syntax as it semantically relates to ``?else`` in the same way that ``and`` -relates to ``or``. +Proposed keywords +----------------- + +Given the symbolic marker ``?``, it would be syntactically unambiguous to spell +the existence checking precondition and fallback operations using the same +keywords as their truth checking counterparts: + +* ``expr1 ?and expr2`` (instead of ``expr1 ?then expr2``) +* ``expr1 ?or expr2`` (instead of ``expr1 ?else expr2``) + +However, while syntactically unambiguous when written, this approach makes +the code incredibly hard to *pronounce* (What's the pronunciation of "?"?) and +also hard to *describe* (given reused keywords, there's no obvious shorthand +terms for "existence checking precondition (?and)" and "existence checking +fallback (?or)" that would distinguish them from "logical conjunction (and)" +and "logical disjunction (or)"). + +We could try to encourage folks to pronounce the ``?`` symbol as "exists", +making the shorthand names the "exists-and expression" and the +"exists-or expression", but there'd be no way of guessing those names purely +from seeing them written in a piece of code. + +Instead, this PEP takes advantage of the proposed symbolic syntax to introduce +a new keyword (``?then``) and borrow an existing one (``?else``) in a way +that allows people to refer to "then expressions" and "else expressions" +without ambiguity. + +These keywords also align well with the conditional expressions that are +semantically equivalent to the proposed expressions. 
+ +For ``?else`` expressions, ``expr1 ?else expr2`` is equivalent to:: + + _lhs_result = expr1 + _lhs_result if operator.exists(_lhs_result) else expr2 + +Here the parallel is clear, since the ``else expr2`` appears at the end of +both the abbreviated and expanded forms. + +For ``?then`` expressions, ``expr1 ?then expr2`` is equivalent to:: + + _lhs_result = expr1 + expr2 if operator.exists(_lhs_result) else _lhs_result + +Here the parallel isn't as immediately obvious due to Python's traditionally +anonymous "then" clauses (introduced by ``:`` in ``if`` statements and suffixed +by ``if`` in conditional expressions), but it's still reasonably clear as long +as you're already familiar with the "if-then-else" explanation of conditional +control flow. Risks and concerns @@ -225,7 +358,7 @@ Readability Learning to read and write the new syntax effectively mainly requires internalising two concepts: -* expressions containing ``?`` will return None if their input is None +* expressions containing ``?`` include an existence check and may short circuit * if ``None`` or another "non-existent" value is an expected input, and the correct handling is to propagate that to the result, then the existence checking operators are likely what you want @@ -234,6 +367,7 @@ Currently, these concepts aren't explicitly represented at the language level, so it's a matter of learning to recognise and use the various idiomatic patterns based on conditional expressions and statements. + Magic syntax ------------ @@ -242,11 +376,11 @@ There's nothing about ``?`` as a syntactic element that inherently suggests symbol in Python code is as a trailing suffix in IPython environments to request help information for the result of the preceding expression. 
-However, the notion of existence checking really does make the most sense -as a modifier on existing operators (aside from the proposed spelling of -the fallback operator as ``?else`` rather than ``?or``), and that calls for +However, the notion of existence checking really does benefit from a pervasive +visual marker that distinguishes it from truth checking, and that calls for a single-character symbolic syntax if we're going to do it at all. + Conceptual complexity --------------------- @@ -259,7 +393,8 @@ of the language, as many more expectations will map correctly between truth checking with ``bool(expr)`` and existence checking with ``operator.exists(expr)`` than currently map between truth checking and existence checking with ``expr is not None`` (or ``expr is not NotImplemented`` -in the context of operand coercion). +in the context of operand coercion, or the various NaN-checking operations +in mathematical libraries). As a simple example of the new parallels introduced by this PEP, compare:: @@ -269,14 +404,117 @@ As a simple example of the new parallels introduced by this PEP, compare:: at_least_one_exists = any(map(operator.exists, iterable)) +Design Discussion +================= + +Subtleties in chaining existence checking expressions +----------------------------------------------------- + +Similar subtleties arise in chaining existence checking expressions as already +exist in chaining logical operators: the behaviour can be surprising if the +right hand side of one of the expressions in the chain itself returns a +value that doesn't exist. + +As a result, ``value = arg1 ?then f(arg1) ?else default()`` would be dubious for +essentially the same reason that ``value = cond and expr1 or expr2`` is dubious: +the former will evaluate ``default()`` if ``f(arg1)`` returns ``None``, just +as the latter will evaluate ``expr2`` if ``expr1`` evaluates to ``False`` in +a boolean context. 
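The chaining subtlety described above can be reproduced in current Python with a rough emulation. This is only a sketch under stated assumptions: ``exists``, ``exists_then``, ``exists_else``, ``lookup``, ``default`` and ``chained`` are hypothetical helpers invented for illustration, the right-hand operands are passed as zero-argument callables to mimic short-circuiting, and the chain is assumed to group left to right, as the prose implies.

```python
def exists(obj):
    """Hypothetical stand-in for the proposed operator.exists().

    Simplified: only the three placeholder singletons count as missing.
    """
    return obj is not None and obj is not NotImplemented and obj is not Ellipsis


def exists_then(lhs, rhs):
    """Emulate ``lhs ?then rhs``; ``rhs`` is a zero-argument callable."""
    return rhs() if exists(lhs) else lhs


def exists_else(lhs, rhs):
    """Emulate ``lhs ?else rhs``; ``rhs`` is a zero-argument callable."""
    return lhs if exists(lhs) else rhs()


PORTS = {"http": 80, "https": 443}
fallback_calls = []


def lookup(name):
    # Returns None for unknown names, i.e. a value that "doesn't exist".
    return PORTS.get(name)


def default():
    # Records each call so the fallback evaluations are observable.
    fallback_calls.append("default")
    return 0


def chained(arg1):
    # Emulates: arg1 ?then lookup(arg1) ?else default()
    # (assumed grouping: (arg1 ?then lookup(arg1)) ?else default())
    return exists_else(exists_then(arg1, lambda: lookup(arg1)), default)


known = chained("http")      # lookup succeeds, default() is not called
unknown = chained("gopher")  # lookup returns None, so default() runs
missing = chained(None)      # arg1 itself doesn't exist, so default() runs
```

In this emulation the fallback runs both when the argument is missing and when the lookup itself returns ``None``, which is the behaviour the section flags as potentially surprising.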
+ + +Ambiguous interaction with conditional expressions +-------------------------------------------------- + +In the proposal as currently written, the following is a syntax error: + +* ``value = f(arg) if arg ?else default`` + +While the following is a valid operation that checks a second condition if the +first doesn't exist rather than merely being false: + +* ``value = expr1 if cond1 ?else cond2 else expr2`` + +The expression chaining problem described above means that the argument can be +made that the first operation should instead be equivalent to: + +* ``value = f(arg) if operator.exists(arg) else default`` + +requiring the second to be written in the arguably clearer form: + +* ``value = expr1 if (cond1 ?else cond2) else expr2`` + +Alternatively, the first form could remain a syntax error, and the existence +checking symbol could instead be attached to the ``if`` keyword: + +* ``value = expr1 if? cond else expr2`` + + +Existence checking in other truth-checking contexts +--------------------------------------------------- + +The truth-checking protocol is currently used in the following syntactic +constructs: + +* logical conjunction (and-expressions) +* logical disjunction (or-expressions) +* conditional expressions (if-else expressions) +* if statements +* while loops +* filter clauses in comprehensions and generator expressions + +In the current PEP, switching from truth-checking with ``and`` and ``or`` to +existence-checking is a matter of substituting in the new keywords, ``?then`` +and ``?else`` in the appropriate places. + +For other truth-checking contexts, it proposes either importing and +using the ``operator.exists`` API, or else continuing with the current idiom +of checking specifically for ``expr is not None`` (or the context appropriate +equivalent). + +The simplest possible enhancement in that regard would be to elevate the +proposed ``exists()`` API from an operator module function to a new builtin +function. 
+ +Alternatively, the ``?`` existence checking symbol could be supported as a +modifier on the ``if`` and ``while`` keywords to indicate the use of an +existence check rather than a truth check. + +However, it isn't at all clear that the potential consistency benefits gained +for either suggestion would justify the additional disruption, so they've +currently been omitted from the proposal. + + +Defining expected invariant relations between ``__bool__`` and ``__exists__`` +----------------------------------------------------------------------------- + +The PEP currently leaves the definition of ``__bool__`` on all existing types +unmodified, which ensures the entire proposal remains backwards compatible, +but results in the following cases where ``bool(obj)`` returns ``True``, but +the proposed ``operator.exists(obj)`` would return ``False``: + +* ``NaN`` values for ``float``, ``complex``, and ``decimal.Decimal`` +* ``Ellipsis`` +* ``NotImplemented`` + +The main argument for potentially changing these is that it becomes easier to +reason about potential code behaviour if we have a recommended invariant in +place saying that values which indicate they don't exist in an existence +checking context should also report themselves as being ``False`` in a truth +checking context. + +Failing to define such an invariant would lead to arguably odd outcomes like +``float("NaN") ?else 0.0`` returning ``0.0`` while ``float("NaN") or 0.0`` +returns ``NaN``. 
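The divergence between truth checking and existence checking for NaN values can be illustrated in current Python. This is a minimal sketch, not the proposed implementation: ``exists`` below is a hypothetical stand-in for ``operator.exists`` that ignores the ``__exists__`` hook, and treating a ``complex`` value as missing when either component is NaN is an assumption, since the PEP text does not spell that out.

```python
import cmath
import math
from decimal import Decimal


def exists(obj):
    """Hypothetical stand-in for the proposed operator.exists()."""
    if obj is None or obj is NotImplemented or obj is Ellipsis:
        return False
    if isinstance(obj, float):
        return not math.isnan(obj)
    if isinstance(obj, complex):
        # Assumption: a NaN in either component means "doesn't exist".
        return not cmath.isnan(obj)
    if isinstance(obj, Decimal):
        return not obj.is_nan()
    return True


nan = float("nan")

# Truth checking: NaN is truthy, so ``or`` keeps it.
truth_result = nan or 0.0

# Existence checking, emulating ``nan ?else 0.0``.
exists_result = nan if exists(nan) else 0.0
```

Here ``truth_result`` is still NaN while ``exists_result`` is ``0.0``, which is the pair of outcomes the proposed invariant would be intended to rule out.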
+ + Limitations =========== Arbitrary sentinel objects -------------------------- -This proposal doesn't currently attempt to provide syntactic support for the -"sentinel object" idiom, where ``None`` is a permitted explicit value, so a +This proposal doesn't attempt to provide syntactic support for the "sentinel +object" idiom, where ``None`` is a permitted explicit value, so a separate sentinel object is defined to indicate missing values:: _SENTINEL = object() @@ -306,8 +544,8 @@ Given that change, the sentinel object idiom could be rewritten as:: def f(obj=Maybe.SENTINEL): return Maybe(obj) ?else default_value() -However, I don't think cases where none of the 3 standard sentinel values (i.e. -``None``, ``Ellipsis`` and ``NotImplemented``) can be used are going to be +However, I don't think cases where the 3 proposed standard sentinel values (i.e. +``None``, ``Ellipsis`` and ``NotImplemented``) can't be used are going to be anywhere near common enough for the additional protocol complexity and the loss of symmetry between ``__bool__`` and ``__exists__`` to be worth it. @@ -318,7 +556,8 @@ Specification The Abstract already gives the gist of the proposal and the Rationale gives some specific examples. If there's enough interest in the basic idea, then a full specification will need to provide a precise correspondence between the -proposed syntactic sugar and the underlying conditional expressions. +proposed syntactic sugar and the underlying conditional expressions that is +sufficient to guide the creation of a reference implementation. ...TBD... @@ -346,13 +585,15 @@ References .. [2] Wikipedia: Null coalescing operator (https://en.wikipedia.org/wiki/Null_coalescing_operator) +.. [3] FileFormat.info: Unicode Character 'THERE EXISTS' (U+2203) + (http://www.fileformat.info/info/unicode/char/2203/index.htm) Copyright ========= -This document has been placed in the public domain. 
- +This document has been placed in the public domain under the terms of the +CC0 1.0 license: https://creativecommons.org/publicdomain/zero/1.0/ ..