PEP 531: Existence checking operators

Based on the last round of PEP 505 discussions, I'm seeing a
lot more merit in the general idea, but have some strong opinions
on how to describe the concepts to current and new Python users.

Those opinions are different enough from what's currently in PEP
505 that I think writing a competing PEP is the most useful way to
articulate them.
This commit is contained in:
Nick Coghlan 2016-10-25 17:15:18 +10:00
parent d309d32728
commit 551e80bcd8
1 changed files with 357 additions and 0 deletions

357
pep-0531.txt Normal file
View File

@ -0,0 +1,357 @@
PEP: 531
Title: Existence checking operators
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 25-Oct-2016
Python-Version: 3.7
Abstract
========
Inspired by PEP 505 and the related discussions, this PEP proposes the addition
of two new logical operators to Python:
* Existence-checking fallback: ``expr1 ?else expr2``
* Existence-checking precondition: ``expr1 ?and expr2``
as well as the following abbreviations for common existence checking
expressions and statements:
* Existence-checking attribute access: ``obj?.attr`` (for ``obj ?and obj.attr``)
* Existence-checking subscripting: ``obj?[expr]`` (for ``obj ?and obj[expr]``)
* Existence-checking assignment: ``target ?= expr``
These expressions will be defined in terms of a new "existence" protocol,
accessible as ``operator.exists``, with the following characteristics:
* types can define a new ``__exists__`` magic method (Python) or
``tp_exists`` slot (C) to override the default behaviour. This optional
method has the same signature and possible return values as ``__bool__``.
* ``operator.exists(None)`` returns ``False``
* ``operator.exists(NotImplemented)`` returns ``False``
* ``operator.exists(Ellipsis)`` returns ``False``
* Python's builtin and standard library numeric types will override the
existence check such that ``NaN`` values return ``False`` and other
values return ``True``
* for any other type, ``operator.exists(obj)`` returns True by default. Most
importantly, values that evaluate to False in a boolean context (zeroes,
empty containers) evaluate to True in an existence checking context
Relationship with other PEPs
============================
While this PEP was inspired by and builds on Mark Haase's excellent work in
putting together PEP 505, it ultimately competes with that PEP due to
differences in the specifics of the proposed syntax and semantics for the
feature.
It also presents a different perspective on the rationale for the change by
focusing on the benefits to existing Python users as the typical demands of
application and service development activities are genuinely changing. It
isn't an accident that similar features are now appearing in multiple
programming languages, and it's a good idea for us to learn from how other
language designers are handling the problem, but precedents set elsewhere
are more relevant to *how* we would go about tackling this problem than they
are to whether or not we think it's a problem we should address in the first
place.
Rationale
=========
Existence checking expressions
------------------------------
An increasingly common requirement in modern software development is the need
to work with "semi-structured data": data where the structure of the data is
known in advance, but pieces of it may be missing at runtime, and the software
manipulating that data is expected to degrade gracefully (e.g. by omitting
results that depend on the missing data) rather than failing outright.
Some particularly common cases where this issue arises are:
- handling optional application configuration settings and function parameters
- handling external service failures in distributed systems
- handling data sets that include some partial records
It is the latter two cases that are the primary motivation for this PEP - while
needing to deal with optional configuration settings and parameters is a design
requirement at least as old as Python itself, the rise of public cloud
infrastructure, the development of software systems as collaborative networks
of distributed services, and the availability of large public and private data
sets for analysis means that the ability to degrade operations gracefully in
the face of partial service failures or partial data availability is becoming
an essential feature of modern programming environments.
At the moment, writing such software in Python can be genuinely awkward, as
your code ends up littered with expressions like:
* ``value = expr.field.of.interest if expr is not None else None``
* ``value = expr["field"]["of"]["interest"] if expr is not None else None``
* ``value = expr1 if expr1 is not None else expr2 if expr2 is not None else expr3``
If these are only occasional, then expanding out to full statement forms may
help improve readability, but if you have 4 or 5 of them in a row (which is a
fairly common situation in data transformation pipelines), then replacing them
with 16 or 20 lines of conditional logic really doesn't help matters.
The combined impact of the proposals in this PEP is to allow the above sample
expressions to instead be written as:
* ``value = expr?.field.of.interest``
* ``value = expr?["field"]["of"]["interest"]``
* ``value = expr1 ?else expr2 ?else expr3``
In the first two examples, the 30 character boilerplate clause
`` if expr is not None else None`` (minimally 27 characters for a single letter
variable name) has been replaced by a single ``?`` character, substantially
improving the signal-to-pattern-noise ratio of the lines (especially if it
encourages the use of more meaningful variable and field names rather than
making them shorter purely for the sake of expression brevity).
In the last example, two instances of the 21 character boilerplate,
`` if expr1 is not None`` (minimally 17 characters) are replaced with single
characters, again substantially improving the signal-to-pattern-noise ratio.
The existence checking precondition operator is mainly defined to provide a
common conceptual basis for the existence checking attribute access and
subscripting operators:
* ``obj?.attr`` is roughly equivalent to ``obj ?and obj.attr``
* ``obj?[expr]``is roughly equivalent to ``obj ?and obj[expr]``
The main semantic difference between the shorthand forms and their expanded
equivalents is that the common subexpression to the left of the existence
checking operator is evaluated only once in the shorthand form (similar to
the benefit offered by augmented assignment statements).
Existence checking assignment
-----------------------------
Existence-checking assignment is proposed as a relatively straightforward
expansion of the concepts in this PEP to also cover the common configuration
handling idiom:
* ``value = value if value is not None else expensive_default()``
to instead be abbreviated as:
* ``value ?= expensive_default()``
This is mainly beneficial when the target is a subscript operation or
subattribute
Even without this specific change, the PEP would still
permit this idiom to be updated to:
* ``value = value ?else expensive_default()``
Existence checking protocol
---------------------------
The existence checking protocol is including in this proposal primarily to
allow for proxy objects (e.g. local representations of remote resources) and
mock objects used in testing to correctly indicate non-existence of target
resources, even though the proxy or mock object itself is not None.
However, with that protocol defined, it then seems natural to expand it to
provide a type independent way of checking for ``NaN`` values in numeric types
- at the moment you need to be aware of the exact data type you're working with
(e.g. builtin floats, builtin complex numbers, the decimal module) and use the
appropriate operation (e.g. ``math.isnan``, ``cmath.isnan``,
``decimal.getcontext().is_nan()``, respectively)
Similarly, it seems reasonable to declare that the other placeholder builtin
singletons, ``Ellipsis`` and ``NotImplemented``, also qualify as objects that
represent the absence of data moreso than they represent data.
Proposed syntax
---------------
Without a mathematical precedent to draw on (as Python historically has for
other operations), the proposed use of ``?`` as the key syntactic marker for
this feature is primarily derived from the corresponding syntax in other
languages that offer similar features.
Drawing from the excellent summary in PEP 505 and the Wikipedia articles on
the "safe navigation operator [1] and the "null coalescing operator" [2],
we see:
* The ``?.`` existence checking attribute access syntax precisely aligns with:
* the "safe navigation" attribute access operator in C# (``?.``)
* the "optional chaining" operator in Swift (``?.``)
* the "safe navigation" attribute access operator in Groovy (``?.``)
* the "conditional member access" operator in Dart (``?.``)
* The ``?[]`` existence checking attribute access syntax precisely aligns with:
* the "safe navigation" attribute access operator in C# (``?[]``)
* the "optional subscript" operator in Swift (``?[].``)
* The ``?else`` existence checking fallback syntax semantically aligns with:
* the "null-coalescing" operator in C# (``??``)
* the "null-coalescing" operator in PHP (``??``)
* the "nil-coalescing" operator in Swift (``??``)
To be clear, these aren't the only spelling of these operators used in other
languages, but they're the most common ones, and the ``?`` is far and away
the most common syntactic marker (presumably prompted by the use of ``?`` in
C-style conditional expressions, which many of these languages also offer).
Risks and concerns
==================
Readability
-----------
Learning to read and write the new syntax effectively mainly requires
internalising two concepts:
* expressions containing ``?`` will return None if their input is None
* if ``None`` or another "non-existent" value is an expected input, and the
correct handling is to propagate that to the result, then the existence
checking operators are likely what you want
Currently, these concepts aren't explicitly represented at the language level,
so it's a matter of learning to recognise and use the various idiomatic
patterns based on conditional expressions and statements.
Magic syntax
------------
There's nothing about ``?`` as a syntactic element that inherently suggests
``is not None`` or ``operator.exists``. The main current use of ``?`` as a
symbol in Python code is as a trailing suffix in IPython environments to
request help information for the result of the preceding expression.
However, the notion of existence checking really does make the most sense
as a modifier on existing operators (aside from the proposed spelling of
the fallback operator as ``?else`` rather than ``?or``), and that calls for
a single-character symbolic syntax if we're going to do it at all.
Conceptual complexity
---------------------
This proposal takes the currently ad hoc and informal concept of "existence
checking" and elevates it to the status of being a syntactic language feature
with a clearly defined operator protocol.
In many ways, this should actually *reduce* the overall conceptual complexity
of the language, as many more expectations will map correctly between truth
checking with ``bool(expr)`` and existence checking with
``operator.exists(expr)`` than currently map between truth checking and
existence checking with ``expr is not None`` (or ``expr is not NotImplemented``
in the context of operand coercion).
As a simple example of the new parallels introduced by this PEP, compare::
all_are_true = all(map(bool, iterable))
at_least_one_is_true = any(map(bool, iterable))
all_exist = all(map(operator.exists, iterable))
at_least_one_exists = any(map(operator.exists, iterable))
Limitations
===========
Arbitrary sentinel objects
--------------------------
This proposal doesn't currently attempt to provide syntactic support for the
"sentinel object" idiom, where ``None`` is a permitted explicit value, so a
separate sentinel object is defined to indicate missing values::
_SENTINEL = object()
def f(obj=_SENTINEL):
return obj if obj is not _SENTINEL else default_value()
This could potentially be supported at the expense of making the existence
protocol definition significantly more complex, both to define and to use:
* at the Python layer, ``operator.exists`` and ``__exists__`` implementations
would return the empty tuple to indicate non-existence, and otherwise return
a singleton tuple containing a reference to the object to be used as the
result of the existence check
* at the C layer, ``tp_exists`` implementations would return NULL to indicate
non-existence, and otherwise return a `PyObject *` pointer as the
result of the existence check
Given that change, the sentinel object idiom could be rewritten as::
class Maybe:
SENTINEL = object()
def __init__(self, value):
self._result = (value,) is value is not self.SENTINEL else ()
def __exists__(self):
return self._result
def f(obj=Maybe.SENTINEL):
return Maybe(obj) ?else default_value()
However, I don't think cases where none of the 3 standard sentinel values (i.e.
``None``, ``Ellipsis`` and ``NotImplemented``) can be used are going to be
anywhere near common enough for the additional protocol complexity and the loss
of symmetry between ``__bool__`` and ``__exists__`` to be worth it.
Specification
=============
The Abstract already gives the gist of the proposal and the Rationale gives
some specific examples. If there's enough interest in the basic idea, then a
full specification will need to provide a precise correspondence between the
proposed syntactic sugar and the underlying conditional expressions.
...TBD...
Implementation
==============
As with PEP 505, actual implementation has been deferred pending in-principle
interest in the idea of adding these operators - the implementation isn't
the hard part of these proposals, the hard part is deciding whether or not
this is a change where the long term benefits for new and existing Python users
outweigh the short term costs involved in the wider ecosystem (including
developers of other implementations, language curriculum developers, and
authors of other Python related educational material) adjusting to the change.
...TBD...
References
==========
.. [1] Wikipedia: Safe navigation operator
(https://en.wikipedia.org/wiki/Safe_navigation_operator)
.. [2] Wikipedia: Null coalescing operator
(https://en.wikipedia.org/wiki/Null_coalescing_operator)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: