PEP 661: Update draft in anticipation of proposal (#2785)

Tal Einat 2022-09-08 21:02:39 +03:00 committed by GitHub
parent 6c72c58229
commit 7bad8d9f10
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 192 additions and 126 deletions


@ -15,27 +15,43 @@ TL;DR: See the `Specification`_ and `Reference Implementation`_.
Abstract
========
Unique placeholder values, commonly known as "sentinel values", are common in
programming. They have many uses, such as for:

* Default values for function arguments, for when a value was not given::

      def foo(value=None):
          ...

* Return values from functions when something is not found or unavailable::

      >>> "abc".find("d")
      -1

* Missing data, such as NULL in relational databases or "N/A" ("not
  available") in spreadsheets

Python has the special value ``None``, which is intended to be used as such
a sentinel value in most cases. However, sometimes an alternative sentinel
value is needed, usually when it needs to be distinct from ``None``. These
cases are common enough that several idioms for implementing such sentinels
have arisen over the years, but uncommon enough that there hasn't been a
clear need for standardization. However, the common implementations,
including some in the stdlib, suffer from several significant drawbacks.
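
As a concrete illustration of needing a value distinct from ``None``, the following sketch mirrors the behavior of ``dict.pop()``, where ``None`` is a perfectly valid default that a caller may pass explicitly. The function ``pop_setting`` and the private ``_NOT_GIVEN`` marker are hypothetical names, and the marker uses the common ``object()`` idiom.

```python
# Hypothetical example: a private marker object distinguishes "no default
# was passed" from "the caller passed None as the default".
_NOT_GIVEN = object()


def pop_setting(settings: dict, key: str, default=_NOT_GIVEN):
    """Remove and return settings[key], mimicking dict.pop()."""
    if key in settings:
        return settings.pop(key)
    if default is _NOT_GIVEN:
        # No default was given at all: signal the missing key.
        raise KeyError(key)
    # A default was given, and it may legitimately be None.
    return default
```

Here ``pop_setting({}, 'timeout', None)`` returns ``None``, while ``pop_setting({}, 'timeout')`` raises ``KeyError``. A ``None`` default could not tell these two calls apart.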
This PEP proposes adding a utility for defining sentinel values, to be used
in the stdlib and made publicly available as part of the stdlib.
Note: Changing all existing sentinels in the stdlib to be implemented this
way is not deemed necessary, and whether to do so is left to the discretion
of the maintainers.
Motivation
==========
In May 2021, a question was brought up on the python-dev mailing list
[1]_ about how to better implement a sentinel value for
``traceback.print_exception``. The existing implementation used the
following common idiom::
@ -54,22 +70,25 @@ function's signature to be overly long and hard to read::
Additionally, two other drawbacks of many existing sentinels were brought up
in the discussion:
1. Not having a distinct type, hence making it impossible to define clear
   type signatures for functions with sentinels as default values
2. Incorrect behavior after being copied or unpickled, due to a separate
   instance being created and thus comparisons using ``is`` failing
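
Both drawbacks can be observed with the common ``NotGiven = object()`` idiom. The sketch below uses only the standard ``copy`` and ``pickle`` modules. The exact repr text includes a memory address and will vary between runs.

```python
import copy
import pickle

# The common idiom: a bare object() instance used as a sentinel.
NotGiven = object()

# Drawback 1: its type is plain `object`, shared with every other such
# sentinel, so no type annotation can single it out.
print(type(NotGiven) is object)  # True

# Its repr is also uninformative, e.g. "<object object at 0x7f...>".
print(repr(NotGiven))

# Drawback 2: copying or pickling builds a *new* object, so identity
# checks against the original sentinel silently fail.
copied = copy.copy(NotGiven)
unpickled = pickle.loads(pickle.dumps(NotGiven))
print(copied is NotGiven, unpickled is NotGiven)  # False False
```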
In the ensuing discussion, Victor Stinner supplied a list of currently used
sentinel values in the Python standard library [2]_. This showed that the
need for sentinels is fairly common, that there are various implementation
methods used even within the stdlib, and that many of these suffer from at
least one of the three aforementioned drawbacks.
The discussion did not lead to any clear consensus on whether a standard
implementation method is needed or desirable, whether the drawbacks mentioned
are significant, nor which kind of implementation would be good. The author
of this PEP created an issue on bugs.python.org [3]_ suggesting options for
improvement, but that focused on only a single problematic aspect of a few
cases, and failed to gather any support.
A poll [4]_ was created on discuss.python.org to get a clearer sense of
the community's opinions. The poll's results were not conclusive, with 40%
voting for "The status-quo is fine / there's no need for consistency in
this", but most voters voted for one or more standardized solutions.
@ -80,6 +99,11 @@ stdlib".
With such mixed opinions, this PEP was created to facilitate making a decision
on the subject.
While working on this PEP, iterating on various options and implementations
and continuing discussions, the author has come to the opinion that a simple,
good implementation available in the standard library would be worth having,
both for use in the standard library itself and elsewhere.
Rationale
=========
@ -87,17 +111,26 @@ Rationale
The criteria guiding the chosen implementation were:
1. The sentinel objects should behave as expected by a sentinel object: When
   compared using the ``is`` operator, it should always be considered
   identical to itself but never to any other object.
2. Creating a sentinel object should be a simple, straightforward one-liner.
3. It should be simple to define as many distinct sentinel values as needed.
4. The sentinel objects should have a clear and short repr.
5. It should be possible to use clear type signatures for sentinels.
6. The sentinel objects should behave correctly after copying and/or
   unpickling.
7. Such sentinels should work when using CPython 3.x and PyPy3, and ideally
   also with other implementations of Python.
8. As simple and straightforward as possible, in implementation and especially
   in use. Avoid this becoming one more special thing to learn when learning
   Python. It should be easy to find and use when needed, and obvious enough
   when reading code that one would normally not feel a need to look up its
   documentation.

With so many uses in the Python standard library [2]_, it would be useful to
have an implementation in the standard library, since the stdlib cannot use
implementations of sentinel objects available elsewhere (such as the
``sentinels`` [5]_ or ``sentinel`` [6]_ PyPI packages).
After researching existing idioms and implementations, and going through many
different possible implementations, an implementation was written which meets
@ -107,79 +140,99 @@ all of these criteria (see `Reference Implementation`_).
Specification
=============
A new ``Sentinel`` class will be added to a new ``sentinels`` module.
Its initializer will accept a single required argument, the name of the
sentinel object, and two optional arguments: the repr of the object, and the
name of its module::

    >>> from sentinels import Sentinel
    >>> NotGiven = Sentinel('NotGiven')
    >>> NotGiven
    <NotGiven>
    >>> MISSING = Sentinel('MISSING', repr='mymodule.MISSING')
    >>> MISSING
    mymodule.MISSING
    >>> MEGA = Sentinel('MEGA', repr='<MEGA>', module_name='mymodule')
    >>> MEGA
    <MEGA>

Checking if a value is such a sentinel *should* be done using the ``is``
operator, as is recommended for ``None``. Equality checks using ``==`` will
also work as expected, returning ``True`` only when the object is compared
with itself. Identity checks such as ``if value is MISSING:`` should usually
be used rather than boolean checks such as ``if value:`` or ``if not value:``.
Sentinel instances are truthy by default.
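
The difference between identity and boolean checks matters whenever a falsy value such as ``0`` is a valid argument. Since the proposed ``sentinels`` module is not yet available, this sketch uses a hypothetical minimal stand-in class, ``_StandInSentinel``, which only provides a name and a repr.

```python
class _StandInSentinel:
    """Hypothetical stand-in for the proposed Sentinel class."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return f"<{self._name}>"


MISSING = _StandInSentinel("MISSING")


def scale(value=MISSING, factor=2):
    # Correct: only the sentinel itself means "not given".
    if value is MISSING:
        value = 1
    return value * factor


def scale_buggy(value=MISSING, factor=2):
    # Buggy: 0 is falsy and gets replaced, while the (truthy) sentinel
    # is never replaced and reaches the multiplication.
    if not value:
        value = 1
    return value * factor
```

``scale(0)`` returns ``0`` as intended, whereas ``scale_buggy(0)`` returns ``2`` and ``scale_buggy()`` raises ``TypeError``.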
The names of sentinels are unique within each module. When calling
``Sentinel()`` in a module where a sentinel with that name was already
defined, the existing sentinel with that name will be returned. Sentinels
with the same name in different modules will be distinct from each other.
Creating a copy of a sentinel object, such as by using ``copy.copy()`` or by
pickling and unpickling, will return the same object.
Type annotations for sentinel values should use ``Sentinel``. For example::

    def foo(value: int | Sentinel = MISSING) -> int:
        ...

The ``module_name`` optional argument should normally not need to be supplied,
as ``Sentinel()`` will usually be able to recognize the module in which it was
called. ``module_name`` should be supplied only in unusual cases when this
automatic recognition does not work as intended, such as perhaps when using
Jython or IronPython. This parallels the designs of ``Enum`` and
``namedtuple``. For more details, see :pep:`435`.
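
The automatic recognition of the calling module can be done by inspecting the caller's stack frame with ``sys._getframe()``, as the simplified reference implementation does. The helper ``detect_module_name`` below is hypothetical and isolates just that step. ``sys._getframe()`` is documented as a CPython implementation detail that is not guaranteed to exist in all implementations, which is one reason an explicit ``module_name`` is useful as an override.

```python
import sys


def detect_module_name(module_name=None):
    """Hypothetical helper: return module_name, or the caller's module name."""
    if module_name is not None:
        # An explicitly supplied name always wins.
        return module_name
    try:
        # Frame 0 is this function; frame 1 is its caller, whose globals
        # hold the calling module's __name__.
        return sys._getframe(1).f_globals.get("__name__", "__main__")
    except (AttributeError, ValueError):
        # sys._getframe may be missing on some implementations.
        return __name__


# Simulate a call made from inside a module named "mypkg.config".
namespace = {"__name__": "mypkg.config", "detect_module_name": detect_module_name}
exec("detected = detect_module_name()", namespace)
print(namespace["detected"])           # mypkg.config
print(detect_module_name("mymodule"))  # mymodule
```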
The ``Sentinel`` class may be sub-classed. Instances of each sub-class will
be unique, even if using the same name and module. This allows for
customizing the behavior of sentinels, such as controlling their truthiness.
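
For example, a subclass could make its instances falsy, like ``None``. The base class below is a hypothetical minimal stand-in for ``Sentinel``, without the per-module registry or copying support, so only the subclassing pattern itself should be read from it.

```python
class Sentinel:
    """Hypothetical minimal stand-in for the proposed Sentinel class."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return f"<{self._name}>"


class FalsySentinel(Sentinel):
    """Sentinels of this subclass are falsy, like None."""

    def __bool__(self) -> bool:
        return False


MISSING = Sentinel("MISSING")
EMPTY = FalsySentinel("EMPTY")

print(bool(MISSING), bool(EMPTY))  # True False
print(repr(EMPTY))                 # <EMPTY>
```

Identity checks such as ``value is EMPTY`` remain the recommended test either way.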
Reference Implementation
========================
The reference implementation is found in a dedicated GitHub repo [7]_. A
simplified version follows::

    import sys

    _registry = {}


    class Sentinel:
        """Unique sentinel values."""

        def __new__(cls, name, repr=None, module_name=None):
            name = str(name)
            # Default repr: "<name>", using only the part after the last period.
            repr = str(repr) if repr else f'<{name.split(".")[-1]}>'
            if module_name is None:
                # Detect the module from which Sentinel() was called.
                try:
                    module_name = \
                        sys._getframe(1).f_globals.get('__name__', '__main__')
                except (AttributeError, ValueError):
                    module_name = __name__

            registry_key = f'{module_name}-{name}'

            # Return the existing sentinel with this name in this module, if any.
            sentinel = _registry.get(registry_key, None)
            if sentinel is not None:
                return sentinel

            sentinel = super().__new__(cls)
            sentinel._name = name
            sentinel._repr = repr
            sentinel._module_name = module_name

            return _registry.setdefault(registry_key, sentinel)

        def __repr__(self):
            return self._repr

        def __reduce__(self):
            # Copying and unpickling call Sentinel(name, repr, module_name),
            # which returns the registered instance.
            return (
                self.__class__,
                (
                    self._name,
                    self._repr,
                    self._module_name,
                ),
            )

Rejected Ideas
==============
@ -192,8 +245,8 @@ Use ``NotGiven = object()``
This suffers from all of the drawbacks mentioned in the `Rationale`_ section.
Add a single new sentinel value, such as ``MISSING`` or ``Sentinel``
--------------------------------------------------------------------
Since such a value could be used for various things in various places, one
could not always be confident that it would never be a valid value in some use
@ -203,7 +256,7 @@ with confidence without needing to consider potential edge-cases.
Additionally, it is useful to be able to provide a meaningful name and repr
for a sentinel value, specific to the context where it is used.
Finally, this was a very unpopular option in the poll [4]_, with only 12%
of the votes voting for it.
@ -221,9 +274,7 @@ as confidently used in all cases, unlike a dedicated, distinct value.
Use a single-valued enum
------------------------
The suggested idiom is::

    class NotGivenType(Enum):
        NotGiven = 'NotGiven'
@ -233,23 +284,21 @@ Besides the excessive repetition, the repr is overly long:
``<NotGivenType.NotGiven: 'NotGiven'>``. A shorter repr can be defined, at
the expense of a bit more code and yet more repetition.
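
The repr problem, and the extra code needed to work around it, can be seen directly with the standard ``enum`` module. ``NotGivenType`` follows the idiom above, while ``ShortNotGivenType`` is an illustrative name for the variant with a custom repr.

```python
import copy
from enum import Enum


class NotGivenType(Enum):
    NotGiven = 'NotGiven'


NotGiven = NotGivenType.NotGiven
print(repr(NotGiven))  # <NotGivenType.NotGiven: 'NotGiven'>


class ShortNotGivenType(Enum):
    NotGiven = 'NotGiven'

    # Overriding __repr__ shortens the repr, at the cost of more boilerplate.
    def __repr__(self) -> str:
        return '<NotGiven>'


print(repr(ShortNotGivenType.NotGiven))  # <NotGiven>

# To its credit, an enum member does survive copying as the same object.
print(copy.copy(NotGiven) is NotGiven)  # True
```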
Finally, this option was the least popular among the nine options in the
poll [4]_, being the only option to receive no votes.
A sentinel class decorator
--------------------------
The suggested idiom is::

    @sentinel(repr='<NotGiven>')
    class NotGivenType: pass
    NotGiven = NotGivenType()

While this allows for a very simple and clear implementation of the decorator,
the idiom is too verbose, repetitive, and difficult to remember.
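
To show why the decorator itself is simple, here is one hypothetical way it could be written. The name ``sentinel`` and its ``repr`` parameter follow the idiom above, but the body is an illustrative sketch rather than a known implementation. Using the result still takes three lines and two names per sentinel.

```python
import copy


def sentinel(repr):
    """Hypothetical class decorator making a class produce one sentinel instance."""
    def decorate(cls):
        instance = object.__new__(cls)
        cls.__repr__ = lambda self: repr
        # Every later call to cls() returns the single instance created above.
        cls.__new__ = lambda klass: instance
        # copy and pickle reconstruct by calling cls(), yielding that instance.
        cls.__reduce__ = lambda self: (cls, ())
        return cls
    return decorate


@sentinel(repr='<NotGiven>')
class NotGivenType: pass
NotGiven = NotGivenType()

print(repr(NotGiven))                   # <NotGiven>
print(NotGivenType() is NotGiven)       # True
print(copy.copy(NotGiven) is NotGiven)  # True
```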
Using class objects
-------------------
@ -258,33 +307,23 @@ Using class objects
Since classes are inherently singletons, using a class as a sentinel value
makes sense and allows for a simple implementation.
The simplest version of this is::

    class NotGiven: pass

To have a clear repr, one would need to use a meta-class::

    class NotGiven(metaclass=SentinelMeta): pass

... or a class decorator::

    @Sentinel
    class NotGiven: pass

Using classes this way is unusual and could be confusing. The intention of
code would be hard to understand without comments. It would also cause
such sentinels to have some unexpected and undesirable behavior, such as
being callable.
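
Both sides of this trade-off can be seen with the standard ``copy`` module: a class used as a sentinel is copied as itself, but it is also callable, and calling it by mistake silently creates an unrelated instance.

```python
import copy

class NotGiven: pass

# Classes are treated as atomic by the copy module, so identity survives.
print(copy.copy(NotGiven) is NotGiven)      # True
print(copy.deepcopy(NotGiven) is NotGiven)  # True

# But the sentinel is callable, and calling it yields a different object
# that will never compare identical to NotGiven.
print(callable(NotGiven))                   # True
accidental = NotGiven()
print(accidental is NotGiven)               # False
```

Without a meta-class, its repr is also just the class repr, such as ``<class '__main__.NotGiven'>``.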
Define a recommended "standard" idiom, without supplying an implementation
----------------------------------------------------------------------------
@ -293,38 +332,65 @@ Define a recommended "standard" idiom, without supplying an implementation
Most common existing idioms have significant drawbacks. So far, no idiom
has been found that is clear and concise while avoiding these drawbacks.
Also, in the poll [4]_ on this subject, the options for recommending an
idiom were unpopular, with the highest-voted option being voted for by only
25% of the voters.
Specific type signatures for each sentinel value
------------------------------------------------
For a long time, the author of this PEP strove to have type signatures for
such sentinels that were specific to each value. A leading proposal
(supported by Guido and others) was to expand the use of ``Literal``, e.g.
``Literal[MISSING]``. After much thought and discussion, especially on the
typing-sig mailing list [8]_, it seems that all such solutions would require
special-casing and/or added complexity in the implementations of static type
checkers, while also constraining the implementation of sentinels.
Therefore, this PEP no longer proposes such signatures. Instead, this PEP
suggests using ``Sentinel`` as the type signature for sentinel values.
It is somewhat unfortunate that static type checkers will sometimes not be
able to deduce more specific types due to this, such as inside a conditional
block like ``if value is not MISSING: ...``. However, this is a minor issue
in practice, as type checkers can be easily made to understand these cases
using ``typing.cast()``.
Additional Notes
================
* This PEP and the initial implementation are drafted in a dedicated GitHub
  repo [7]_.

* For sentinels defined in a class scope, to avoid potential name clashes,
  one should use the fully-qualified name of the variable in the module. Only
  the part of the name after the last period will be used for the default
  repr. For example::

      >>> class MyClass:
      ...     NotGiven = Sentinel('MyClass.NotGiven')
      >>> MyClass.NotGiven
      <NotGiven>

* One should be careful when creating sentinels in a function or method, since
  sentinels with the same name created by code in the same module will be
  identical. If distinct sentinel objects are needed, make sure to use
  distinct names.

References
==========
.. [1] Python-Dev mailing list: `The repr of a sentinel <https://mail.python.org/archives/list/python-dev@python.org/thread/ZLVPD2OISI7M4POMTR2FCQTE6TPMPTO3/>`_
.. [2] Python-Dev mailing list: `"The stdlib contains tons of sentinels" <https://mail.python.org/archives/list/python-dev@python.org/message/JBYXQH3NV3YBF7P2HLHB5CD6V3GVTY55/>`_
.. [3] `bpo-44123: Make function parameter sentinel values true singletons <https://bugs.python.org/issue44123>`_
.. [4] discuss.python.org Poll: `Sentinel Values in the Stdlib <https://discuss.python.org/t/sentinel-values-in-the-stdlib/8810/>`_
.. [5] `The "sentinels" package on PyPI <https://pypi.org/project/sentinels/>`_
.. [6] `The "sentinel" package on PyPI <https://pypi.org/project/sentinel/>`_
.. [7] `Reference implementation at the taleinat/python-stdlib-sentinels GitHub repo <https://github.com/taleinat/python-stdlib-sentinels>`_
.. [8] `Discussion thread about type signatures for these sentinels on the typing-sig mailing list <https://mail.python.org/archives/list/typing-sig@python.org/thread/NDEJ7UCDPINP634GXWDARVMTGDVSNBKV/#LVCPTY26JQJW7NKGKGAZXHQKWVW7GOGL>`_
Copyright