python-peps/peps/pep-0661.rst

454 lines
18 KiB
ReStructuredText
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

PEP: 661
Title: Sentinel Values
Author: Tal Einat <tal@python.org>
Discussions-To: https://discuss.python.org/t/pep-661-sentinel-values/9126
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jun-2021
Post-History: `20-May-2021 <https://discuss.python.org/t/sentinel-values-in-the-stdlib/8810>`__, `06-Jun-2021 <https://discuss.python.org/t/pep-661-sentinel-values/9126>`__
TL;DR: See the `Specification`_ and `Reference Implementation`_.
Abstract
========
Unique placeholder values, commonly known as "sentinel values", are common in
programming. They have many uses, such as for:
* Default values for function arguments, for when a value was not given::
def foo(value=None):
...
* Return values from functions when something is not found or unavailable::
>>> "abc".find("d")
-1
* Missing data, such as NULL in relational databases or "N/A" ("not
available") in spreadsheets
Python has the special value ``None``, which is intended to be used as such
a sentinel value in most cases. However, sometimes an alternative sentinel
value is needed, usually when it needs to be distinct from ``None`` since
``None`` is a valid value in that context. Such cases are common enough that
several idioms for implementing such sentinels have arisen over the years, but
uncommon enough that there hasn't been a clear need for standardization.
However, the common implementations, including some in the stdlib, suffer from
several significant drawbacks.
This PEP proposes adding a utility for defining sentinel values, to be used
in the stdlib and made publicly available as part of the stdlib.
Note: Changing all existing sentinels in the stdlib to be implemented this
way is not deemed necessary, and whether to do so is left to the discretion
of the maintainers.
Motivation
==========
In May 2021, a question was brought up on the python-dev mailing list
[1]_ about how to better implement a sentinel value for
``traceback.print_exception``. The existing implementation used the
following common idiom::
_sentinel = object()
However, this object has an uninformative and overly verbose repr, causing the
function's signature to be overly long and hard to read::
>>> help(traceback.print_exception)
Help on function print_exception in module traceback:
print_exception(exc, /, value=<object object at
0x000002825DF09650>, tb=<object object at 0x000002825DF09650>,
limit=None, file=None, chain=True)
Additionally, two other drawbacks of many existing sentinels were brought up
in the discussion:
1. Some do not have a distinct type, hence it is impossible to define clear
type signatures for functions with such sentinels as default values.
2. They behave unexpectedly after being copied or unpickled, due to a separate
instance being created and thus comparisons using ``is`` failing.
In the ensuing discussion, Victor Stinner supplied a list of currently used
sentinel values in the Python standard library [2]_. This showed that the
need for sentinels is fairly common, that there are various implementation
methods used even within the stdlib, and that many of these suffer from at
least one of the three above drawbacks.
The discussion did not lead to any clear consensus on whether a standard
implementation method is needed or desirable, whether the drawbacks mentioned
are significant, nor which kind of implementation would be good. The author
of this PEP created an issue on bugs.python.org (now a GitHub issue [3]_)
suggesting options for improvement, but that focused on only a single
problematic aspect of a few cases, and failed to gather any support.
A poll [4]_ was created on discuss.python.org to get a clearer sense of
the community's opinions. After nearly two weeks, significant further,
discussion, and 39 votes, the poll's results were not conclusive. 40% had
voted for "The status-quo is fine / theres no need for consistency in
this", but most voters had voted for one or more standardized solutions.
Specifically, 37% of the voters chose "Consistent use of a new, dedicated
sentinel factory / class / meta-class, also made publicly available in the
stdlib".
With such mixed opinions, this PEP was created to facilitate making a decision
on the subject.
While working on this PEP, iterating on various options and implementations
and continuing discussions, the author has come to the opinion that a simple,
good implementation available in the standard library would be worth having,
both for use in the standard library itself and elsewhere.
Rationale
=========
The criteria guiding the chosen implementation were:
1. The sentinel objects should behave as expected by a sentinel object: When
compared using the ``is`` operator, it should always be considered
identical to itself but never to any other object.
2. Creating a sentinel object should be a simple, straightforward one-liner.
3. It should be simple to define as many distinct sentinel values as needed.
4. The sentinel objects should have a clear and short repr.
5. It should be possible to use clear type signatures for sentinels.
6. The sentinel objects should behave correctly after copying and/or
unpickling.
7. Such sentinels should work when using CPython 3.x and PyPy3, and ideally
also with other implementations of Python.
8. As simple and straightforward as possible, in implementation and especially
in use. Avoid this becoming one more special thing to learn when learning
Python. It should be easy to find and use when needed, and obvious enough
when reading code that one would normally not feel a need to look up its
documentation.
With so many uses in the Python standard library [2]_, it would be useful to
have an implementation in the standard library, since the stdlib cannot use
implementations of sentinel objects available elsewhere (such as the
``sentinels`` [5]_ or ``sentinel`` [6]_ PyPI packages).
After researching existing idioms and implementations, and going through many
different possible implementations, an implementation was written which meets
all of these criteria (see `Reference Implementation`_).
Specification
=============
A new ``Sentinel`` class will be added to a new ``sentinels`` module.
Its initializer will accept a single required argument, the name of the
sentinel object, and three optional arguments: the repr of the object, its
boolean value, and the name of its module::
>>> from sentinels import Sentinel
>>> NotGiven = Sentinel('NotGiven')
>>> NotGiven
<NotGiven>
>>> MISSING = Sentinel('MISSING', repr='mymodule.MISSING')
>>> MISSING
mymodule.MISSING
>>> MEGA = Sentinel('MEGA',
repr='<MEGA>',
bool_value=False,
module_name='mymodule')
<MEGA>
Checking if a value is such a sentinel *should* be done using the ``is``
operator, as is recommended for ``None``. Equality checks using ``==`` will
also work as expected, returning ``True`` only when the object is compared
with itself. Identity checks such as ``if value is MISSING:`` should usually
be used rather than boolean checks such as ``if value:`` or ``if not value:``.
Sentinel instances are truthy by default, unlike ``None``. This parallels the
default for arbitrary classes, as well as the boolean value of ``Ellipsis``.
The names of sentinels are unique within each module. When calling
``Sentinel()`` in a module where a sentinel with that name was already
defined, the existing sentinel with that name will be returned. Sentinels
with the same name in different modules will be distinct from each other.
Creating a copy of a sentinel object, such as by using ``copy.copy()`` or by
pickling and unpickling, will return the same object.
Type annotations for sentinel values should use ``Literal[<sentinel_object>]``.
For example::
def foo(value: int | Literal[MISSING] = MISSING) -> int:
...
The ``module_name`` optional argument should normally not need to be supplied,
as ``Sentinel()`` will usually be able to recognize the module in which it was
called. ``module_name`` should be supplied only in unusual cases when this
automatic recognition does not work as intended, such as perhaps when using
Jython or IronPython. This parallels the designs of ``Enum`` and
``namedtuple``. For more details, see :pep:`435`.
The ``Sentinel`` class may not be sub-classed, to avoid overly-clever uses
based on it, such as attempts to use it as a base for implementing singletons.
It is considered important that the addition of Sentinel to the stdlib should
add minimal complexity.
Ordering comparisons are undefined for sentinel objects.
Backwards Compatibility
=======================
While not breaking existing code, adding a new "sentinels" stdlib module could
cause some confusion with regard to existing modules named "sentinels", and
specifically with the "sentinels" package on PyPI.
The existing "sentinels" package on PyPI [10]_ appears to be abandoned, with
the latest release being made on Aug. 2016. Therefore, using this name for a
new stdlib module seems reasonable.
If and when this PEP is accepted, it may be worth verifying if this has indeed
been abandoned, and if so asking to transfer ownership to the CPython
maintainers to reduce the potential for confusion with the new stdlib module.
How to Teach This
=================
The normal types of documentation of new stdlib modules and features, namely
doc-strings, module docs and a section in "What's New", should suffice.
Security Implications
=====================
This proposal should have no security implications.
Reference Implementation
========================
The reference implementation is found in a dedicated GitHub repo [7]_. A
simplified version follows::
_registry = {}
class Sentinel:
"""Unique sentinel values."""
def __new__(cls, name, repr=None, bool_value=True, module_name=None):
name = str(name)
repr = str(repr) if repr else f'<{name.split(".")[-1]}>'
bool_value = bool(bool_value)
if module_name is None:
try:
module_name = \
sys._getframe(1).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
module_name = __name__
registry_key = f'{module_name}-{name}'
sentinel = _registry.get(registry_key, None)
if sentinel is not None:
return sentinel
sentinel = super().__new__(cls)
sentinel._name = name
sentinel._repr = repr
sentinel._bool_value = bool_value
sentinel._module_name = module_name
return _registry.setdefault(registry_key, sentinel)
def __repr__(self):
return self._repr
def __bool__(self):
return self._bool_value
def __reduce__(self):
return (
self.__class__,
(
self._name,
self._repr,
self._module_name,
),
)
Rejected Ideas
==============
Use ``NotGiven = object()``
---------------------------
This suffers from all of the drawbacks mentioned in the `Rationale`_ section.
Add a single new sentinel value, such as ``MISSING`` or ``Sentinel``
--------------------------------------------------------------------
Since such a value could be used for various things in various places, one
could not always be confident that it would never be a valid value in some use
cases. On the other hand, a dedicated and distinct sentinel value can be used
with confidence without needing to consider potential edge-cases.
Additionally, it is useful to be able to provide a meaningful name and repr
for a sentinel value, specific to the context where it is used.
Finally, this was a very unpopular option in the poll [4]_, with only 12%
of the votes voting for it.
Use the existing ``Ellipsis`` sentinel value
--------------------------------------------
This is not the original intended use of Ellipsis, though it has become
increasingly common to use it to define empty class or function blocks instead
of using ``pass``.
Also, similar to a potential new single sentinel value, ``Ellipsis`` can't be
as confidently used in all cases, unlike a dedicated, distinct value.
Use a single-valued enum
------------------------
The suggested idiom is::
class NotGivenType(Enum):
NotGiven = 'NotGiven'
NotGiven = NotGivenType.NotGiven
Besides the excessive repetition, the repr is overly long:
``<NotGivenType.NotGiven: 'NotGiven'>``. A shorter repr can be defined, at
the expense of a bit more code and yet more repetition.
Finally, this option was the least popular among the nine options in the
poll [4]_, being the only option to receive no votes.
A sentinel class decorator
--------------------------
The suggested idiom is::
@sentinel(repr='<NotGiven>')
class NotGivenType: pass
NotGiven = NotGivenType()
While this allows for a very simple and clear implementation of the decorator,
the idiom is too verbose, repetitive, and difficult to remember.
Using class objects
-------------------
Since classes are inherently singletons, using a class as a sentinel value
makes sense and allows for a simple implementation.
The simplest version of this is::
class NotGiven: pass
To have a clear repr, one would need to use a meta-class::
class NotGiven(metaclass=SentinelMeta): pass
... or a class decorator::
@Sentinel
class NotGiven: pass
Using classes this way is unusual and could be confusing. The intention of
code would be hard to understand without comments. It would also cause
such sentinels to have some unexpected and undesirable behavior, such as
being callable.
Define a recommended "standard" idiom, without supplying an implementation
--------------------------------------------------------------------------
Most common existing idioms have significant drawbacks. So far, no idiom
has been found that is clear and concise while avoiding these drawbacks.
Also, in the poll [4]_ on this subject, the options for recommending an
idiom were unpopular, with the highest-voted option being voted for by only
25% of the voters.
Additional Notes
================
* This PEP and the initial implementation are drafted in a dedicated GitHub
repo [7]_.
* For sentinels defined in a class scope, to avoid potential name clashes,
one should use the fully-qualified name of the variable in the module. Only
the part of the name after the last period will be used for the default
repr. For example::
>>> class MyClass:
... NotGiven = sentinel('MyClass.NotGiven')
>>> MyClass.NotGiven
<NotGiven>
* One should be careful when creating sentinels in a function or method, since
sentinels with the same name created by code in the same module will be
identical. If distinct sentinel objects are needed, make sure to use
distinct names.
* There is no single desirable value for the "truthiness" of sentinels, i.e.
their boolean value. It is sometimes useful for the boolean value to be
``True``, and sometimes ``False``. Of the built-in sentinels in Python,
``None`` evaluates to ``False``, while ``Ellipsis`` (a.k.a. ``...``)
evaluates to ``True``. The desire for this to be set as needed came up in
discussions as well.
* The boolean value of ``NotImplemented`` is ``True``, but using this is
deprecated since Python 3.9 (doing so generates a deprecation warning.)
This deprecation is due to issues specific to ``NotImplemented``, as
described in bpo-35712 [8]_.
* To define multiple, related sentinel values, possibly with a defined
ordering among them, one should instead use ``Enum`` or something similar.
* There was a discussion on the typing-sig mailing list [9]_ about the typing
for these sentinels, where different options were discussed.
Open Issues
===========
* **Is adding a new stdlib module the right way to go?** I could not find any
existing module which seems like a logical place for this. However, adding
new stdlib modules should be done judiciously, so perhaps choosing an
existing module would be preferable even if it is not a perfect fit?
Footnotes
=========
.. [1] Python-Dev mailing list: `The repr of a sentinel <https://mail.python.org/archives/list/python-dev@python.org/thread/ZLVPD2OISI7M4POMTR2FCQTE6TPMPTO3/>`_
.. [2] Python-Dev mailing list: `"The stdlib contains tons of sentinels" <https://mail.python.org/archives/list/python-dev@python.org/message/JBYXQH3NV3YBF7P2HLHB5CD6V3GVTY55/>`_
.. [3] `bpo-44123: Make function parameter sentinel values true singletons <https://github.com/python/cpython/issues/88289>`_
.. [4] discuss.python.org Poll: `Sentinel Values in the Stdlib <https://discuss.python.org/t/sentinel-values-in-the-stdlib/8810/>`_
.. [5] `The "sentinels" package on PyPI <https://pypi.org/project/sentinels/>`_
.. [6] `The "sentinel" package on PyPI <https://pypi.org/project/sentinel/>`_
.. [7] `Reference implementation at the taleinat/python-stdlib-sentinels GitHub repo <https://github.com/taleinat/python-stdlib-sentinels>`_
.. [8] `bpo-35712: Make NotImplemented unusable in boolean context <https://github.com/python/cpython/issues/79893>`_
.. [9] `Discussion thread about type signatures for these sentinels on the typing-sig mailing list <https://mail.python.org/archives/list/typing-sig@python.org/thread/NDEJ7UCDPINP634GXWDARVMTGDVSNBKV/#LVCPTY26JQJW7NKGKGAZXHQKWVW7GOGL>`_
.. [10] `sentinels package on PyPI <https://pypi.org/project/sentinels/>`_
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.