From 551e80bcd8b7133e0e7b9a357b3fd4a431ffa156 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Tue, 25 Oct 2016 17:15:18 +1000 Subject: [PATCH] PEP 531: Existence checking operators Based on the last round of PEP 505 discussions, I'm seeing a lot more merit in the general idea, but have some strong opinions on how to describe the concepts to current and new Python users. Those opinions are different enough from what's currently in PEP 505 that I think writing a competing PEP is the most useful way to articulate them. --- pep-0531.txt | 357 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 357 insertions(+) create mode 100644 pep-0531.txt diff --git a/pep-0531.txt b/pep-0531.txt new file mode 100644 index 000000000..74993d730 --- /dev/null +++ b/pep-0531.txt @@ -0,0 +1,357 @@ +PEP: 531 +Title: Existence checking operators +Version: $Revision$ +Last-Modified: $Date$ +Author: Nick Coghlan +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 25-Oct-2016 +Python-Version: 3.7 + +Abstract +======== + +Inspired by PEP 505 and the related discussions, this PEP proposes the addition +of two new logical operators to Python: + +* Existence-checking fallback: ``expr1 ?else expr2`` +* Existence-checking precondition: ``expr1 ?and expr2`` + +as well as the following abbreviations for common existence checking +expressions and statements: + +* Existence-checking attribute access: ``obj?.attr`` (for ``obj ?and obj.attr``) +* Existence-checking subscripting: ``obj?[expr]`` (for ``obj ?and obj[expr]``) +* Existence-checking assignment: ``target ?= expr`` + +These expressions will be defined in terms of a new "existence" protocol, +accessible as ``operator.exists``, with the following characteristics: + + * types can define a new ``__exists__`` magic method (Python) or + ``tp_exists`` slot (C) to override the default behaviour. This optional + method has the same signature and possible return values as ``__bool__``. + * ``operator.exists(None)`` returns ``False`` + * ``operator.exists(NotImplemented)`` returns ``False`` + * ``operator.exists(Ellipsis)`` returns ``False`` + * Python's builtin and standard library numeric types will override the + existence check such that ``NaN`` values return ``False`` and other + values return ``True`` + * for any other type, ``operator.exists(obj)`` returns True by default. Most + importantly, values that evaluate to False in a boolean context (zeroes, + empty containers) evaluate to True in an existence checking context + + +Relationship with other PEPs +============================ + +While this PEP was inspired by and builds on Mark Haase's excellent work in +putting together PEP 505, it ultimately competes with that PEP due to +differences in the specifics of the proposed syntax and semantics for the +feature. + +It also presents a different perspective on the rationale for the change by +focusing on the benefits to existing Python users as the typical demands of +application and service development activities are genuinely changing. It +isn't an accident that similar features are now appearing in multiple +programming languages, and it's a good idea for us to learn from how other +language designers are handling the problem, but precedents set elsewhere +are more relevant to *how* we would go about tackling this problem than they +are to whether or not we think it's a problem we should address in the first +place. + + +Rationale +========= + +Existence checking expressions +------------------------------ + +An increasingly common requirement in modern software development is the need +to work with "semi-structured data": data where the structure of the data is +known in advance, but pieces of it may be missing at runtime, and the software +manipulating that data is expected to degrade gracefully (e.g. by omitting +results that depend on the missing data) rather than failing outright. + +Some particularly common cases where this issue arises are: + +- handling optional application configuration settings and function parameters +- handling external service failures in distributed systems +- handling data sets that include some partial records + +It is the latter two cases that are the primary motivation for this PEP - while +needing to deal with optional configuration settings and parameters is a design +requirement at least as old as Python itself, the rise of public cloud +infrastructure, the development of software systems as collaborative networks +of distributed services, and the availability of large public and private data +sets for analysis means that the ability to degrade operations gracefully in +the face of partial service failures or partial data availability is becoming +an essential feature of modern programming environments. + +At the moment, writing such software in Python can be genuinely awkward, as +your code ends up littered with expressions like: + +* ``value = expr.field.of.interest if expr is not None else None`` +* ``value = expr["field"]["of"]["interest"] if expr is not None else None`` +* ``value = expr1 if expr1 is not None else expr2 if expr2 is not None else expr3`` + +If these are only occasional, then expanding out to full statement forms may +help improve readability, but if you have 4 or 5 of them in a row (which is a +fairly common situation in data transformation pipelines), then replacing them +with 16 or 20 lines of conditional logic really doesn't help matters. + +The combined impact of the proposals in this PEP is to allow the above sample +expressions to instead be written as: + +* ``value = expr?.field.of.interest`` +* ``value = expr?["field"]["of"]["interest"]`` +* ``value = expr1 ?else expr2 ?else expr3`` + +In the first two examples, the 30 character boilerplate clause +`` if expr is not None else None`` (minimally 27 characters for a single letter +variable name) has been replaced by a single ``?`` character, substantially +improving the signal-to-pattern-noise ratio of the lines (especially if it +encourages the use of more meaningful variable and field names rather than +making them shorter purely for the sake of expression brevity). + +In the last example, two instances of the 21 character boilerplate, +`` if expr1 is not None`` (minimally 17 characters) are replaced with single +characters, again substantially improving the signal-to-pattern-noise ratio. + +The existence checking precondition operator is mainly defined to provide a +common conceptual basis for the existence checking attribute access and +subscripting operators: + +* ``obj?.attr`` is roughly equivalent to ``obj ?and obj.attr`` +* ``obj?[expr]``is roughly equivalent to ``obj ?and obj[expr]`` + +The main semantic difference between the shorthand forms and their expanded +equivalents is that the common subexpression to the left of the existence +checking operator is evaluated only once in the shorthand form (similar to +the benefit offered by augmented assignment statements). + + +Existence checking assignment +----------------------------- + +Existence-checking assignment is proposed as a relatively straightforward +expansion of the concepts in this PEP to also cover the common configuration +handling idiom: + +* ``value = value if value is not None else expensive_default()`` + +to instead be abbreviated as: + +* ``value ?= expensive_default()`` + +This is mainly beneficial when the target is a subscript operation or +subattribute + +Even without this specific change, the PEP would still +permit this idiom to be updated to: + +* ``value = value ?else expensive_default()`` + + +Existence checking protocol +--------------------------- + +The existence checking protocol is including in this proposal primarily to +allow for proxy objects (e.g. local representations of remote resources) and +mock objects used in testing to correctly indicate non-existence of target +resources, even though the proxy or mock object itself is not None. + +However, with that protocol defined, it then seems natural to expand it to +provide a type independent way of checking for ``NaN`` values in numeric types +- at the moment you need to be aware of the exact data type you're working with +(e.g. builtin floats, builtin complex numbers, the decimal module) and use the +appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``, +``decimal.getcontext().is_nan()``, respectively) + +Similarly, it seems reasonable to declare that the other placeholder builtin +singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that +represent the absence of data moreso than they represent data. + +Proposed syntax +--------------- + +Without a mathematical precedent to draw on (as Python historically has for +other operations), the proposed use of ``?`` as the key syntactic marker for +this feature is primarily derived from the corresponding syntax in other +languages that offer similar features. + +Drawing from the excellent summary in PEP 505 and the Wikipedia articles on +the "safe navigation operator [1] and the "null coalescing operator" [2], +we see: + +* The ``?.`` existence checking attribute access syntax precisely aligns with: + + * the "safe navigation" attribute access operator in C# (``?.``) + * the "optional chaining" operator in Swift (``?.``) + * the "safe navigation" attribute access operator in Groovy (``?.``) + * the "conditional member access" operator in Dart (``?.``) + +* The ``?[]`` existence checking attribute access syntax precisely aligns with: + + * the "safe navigation" attribute access operator in C# (``?[]``) + * the "optional subscript" operator in Swift (``?[].``) + +* The ``?else`` existence checking fallback syntax semantically aligns with: + + * the "null-coalescing" operator in C# (``??``) + * the "null-coalescing" operator in PHP (``??``) + * the "nil-coalescing" operator in Swift (``??``) + +To be clear, these aren't the only spelling of these operators used in other +languages, but they're the most common ones, and the ``?`` is far and away +the most common syntactic marker (presumably prompted by the use of ``?`` in +C-style conditional expressions, which many of these languages also offer). + + +Risks and concerns +================== + +Readability +----------- + +Learning to read and write the new syntax effectively mainly requires +internalising two concepts: + +* expressions containing ``?`` will return None if their input is None +* if ``None`` or another "non-existent" value is an expected input, and the + correct handling is to propagate that to the result, then the existence + checking operators are likely what you want + +Currently, these concepts aren't explicitly represented at the language level, +so it's a matter of learning to recognise and use the various idiomatic +patterns based on conditional expressions and statements. + +Magic syntax +------------ + +There's nothing about ``?`` as a syntactic element that inherently suggests +``is not None`` or ``operator.exists``. The main current use of ``?`` as a +symbol in Python code is as a trailing suffix in IPython environments to +request help information for the result of the preceding expression. + +However, the notion of existence checking really does make the most sense +as a modifier on existing operators (aside from the proposed spelling of +the fallback operator as ``?else`` rather than ``?or``), and that calls for +a single-character symbolic syntax if we're going to do it at all. + +Conceptual complexity +--------------------- + +This proposal takes the currently ad hoc and informal concept of "existence +checking" and elevates it to the status of being a syntactic language feature +with a clearly defined operator protocol. + +In many ways, this should actually *reduce* the overall conceptual complexity +of the language, as many more expectations will map correctly between truth +checking with ``bool(expr)`` and existence checking with +``operator.exists(expr)`` than currently map between truth checking and +existence checking with ``expr is not None`` (or ``expr is not NotImplemented`` +in the context of operand coercion). + +As a simple example of the new parallels introduced by this PEP, compare:: + + all_are_true = all(map(bool, iterable)) + at_least_one_is_true = any(map(bool, iterable)) + all_exist = all(map(operator.exists, iterable)) + at_least_one_exists = any(map(operator.exists, iterable)) + + +Limitations +=========== + +Arbitrary sentinel objects +-------------------------- + +This proposal doesn't currently attempt to provide syntactic support for the +"sentinel object" idiom, where ``None`` is a permitted explicit value, so a +separate sentinel object is defined to indicate missing values:: + + _SENTINEL = object() + def f(obj=_SENTINEL): + return obj if obj is not _SENTINEL else default_value() + +This could potentially be supported at the expense of making the existence +protocol definition significantly more complex, both to define and to use: + +* at the Python layer, ``operator.exists`` and ``__exists__`` implementations + would return the empty tuple to indicate non-existence, and otherwise return + a singleton tuple containing a reference to the object to be used as the + result of the existence check +* at the C layer, ``tp_exists`` implementations would return NULL to indicate + non-existence, and otherwise return a `PyObject *` pointer as the + result of the existence check + +Given that change, the sentinel object idiom could be rewritten as:: + + class Maybe: + SENTINEL = object() + def __init__(self, value): + self._result = (value,) is value is not self.SENTINEL else () + def __exists__(self): + return self._result + + def f(obj=Maybe.SENTINEL): + return Maybe(obj) ?else default_value() + +However, I don't think cases where none of the 3 standard sentinel values (i.e. +``None``, ``Ellipsis`` and ``NotImplemented``) can be used are going to be +anywhere near common enough for the additional protocol complexity and the loss +of symmetry between ``__bool__`` and ``__exists__`` to be worth it. + + +Specification +============= + +The Abstract already gives the gist of the proposal and the Rationale gives +some specific examples. If there's enough interest in the basic idea, then a +full specification will need to provide a precise correspondence between the +proposed syntactic sugar and the underlying conditional expressions. + +...TBD... + + +Implementation +============== + +As with PEP 505, actual implementation has been deferred pending in-principle +interest in the idea of adding these operators - the implementation isn't +the hard part of these proposals, the hard part is deciding whether or not +this is a change where the long term benefits for new and existing Python users +outweigh the short term costs involved in the wider ecosystem (including +developers of other implementations, language curriculum developers, and +authors of other Python related educational material) adjusting to the change. + +...TBD... + + +References +========== + +.. [1] Wikipedia: Safe navigation operator + (https://en.wikipedia.org/wiki/Safe_navigation_operator) + +.. [2] Wikipedia: Null coalescing operator + (https://en.wikipedia.org/wiki/Null_coalescing_operator) + + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: