PEP 637: Default sentinel value for no value passed as index (#1626)

* PEP 637: more rationale for function call not being an option

* PEP 637 Commentary on the choice of tuple as sentinel when no index is specified

* Finalised explanation of rationale behind the empty tuple

* Clarified the proposed default hack

* Clarified asymmetry to functions

* Added review comments
Stefano Borini 2020-09-28 21:13:20 +01:00 committed by GitHub
parent fa61557ccc
commit 43834b1add
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 140 additions and 27 deletions


@@ -393,6 +393,9 @@ The successful implementation of this PEP will result in the following behavior:
       del obj[spam=1, eggs=2]
       # calls type(obj).__delitem__(obj, (), spam=1, eggs=2)

   The choice of the empty tuple as a sentinel has been debated. Details are provided in
   the Rejected Ideas section.

10. Keyword arguments must allow slice syntax::

        obj[3:4, spam=1:4, eggs=2]
@@ -772,44 +775,148 @@ make the dictionary type accept it automatically, to insert or refer to the valu
the empty tuple as key. Moreover, a typing notation such as ``Tuple[]`` can easily
be written as ``Tuple`` without the indexing notation.

Sentinel value for no given positional index
--------------------------------------------

The topic of which value to pass as the index in the case of::

    obj[k=3]

has been considerably debated.

One apparently rational choice would be to pass no value at all, making use of
the keyword-only argument feature. Unfortunately, this does not work well with
the ``__setitem__`` dunder: a positional element for the value is always
passed, and the index cannot be "skipped over" without introducing a very weird
behavior where the first argument refers to the index when one is specified, and
to the value when it is not. This would be extremely misleading and error prone.
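
The problem can be mimicked today with ordinary calls, as a rough sketch. The helper ``call_without_index`` and both signatures below are hypothetical: they model an interpreter that passes only the value, positionally, for ``obj[k=3] = 5``. The dunders are called directly, since keyword indexing is not valid syntax in current Python.

```python
def setitem_plain(self, index, value, *, k=None):
    """Conventional signature: the index comes first, then the value."""
    return {"index": index, "value": value, "k": k}


def setitem_defaulted(self, index, value=None, *, k=None):
    """Same signature, but with a default for ``value``."""
    return {"index": index, "value": value, "k": k}


def call_without_index(setitem, value, **kwargs):
    """Hypothetical translation of ``obj[k=3] = 5`` that passes no index:
    only the value is given positionally (``None`` stands in for ``self``)."""
    return setitem(None, value, **kwargs)


# With the plain signature the call fails: ``value`` is never bound.
try:
    call_without_index(setitem_plain, 5, k=3)
    plain_failed = False
except TypeError:
    plain_failed = True

# With a default, the call "works", but the value silently lands in ``index``.
defaulted = call_without_index(setitem_defaulted, 5, k=3)
print(plain_failed)  # True
print(defaulted)     # {'index': 5, 'value': None, 'k': 3}
```

Either the call fails outright, or the value is silently misread as the index.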

The above consideration makes it impossible to have a keyword-only dunder, and
opens up the question of what entity to pass for the index position when no index
is given::

    obj[k=3] = 5  # would call type(obj).__setitem__(obj, ???, 5, k=3)

A proposed hack would be to let the user choose which entity to use when no
index is given, by specifying a default for ``index``. This, however, also forces
a default for ``value`` (one that will never be used, since a value is always
passed by design), because a parameter without a default cannot follow one
with a default::

    def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k): ...

This seems ugly, redundant and confusing. We must therefore accept that some
form of sentinel index must be passed by the Python implementation when the
``obj[k=3]`` notation is used. This also means that default arguments for those
parameters are simply never going to be used (but this is already the case with
the current implementation, so nothing changes there).
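
The signature constraint behind this hack can be checked directly, as a sketch. ``SENTINEL``, ``NEVERUSED`` and the class ``Hacked`` are hypothetical names, and the dunder is called by hand because keyword indexing is not valid syntax in current Python. The last call assumes an interpreter that would pass the value by keyword, which is not how indexing works today.

```python
SENTINEL = object()   # hypothetical user-chosen "no index" marker
NEVERUSED = object()  # hypothetical filler default for ``value``

# Giving ``index`` a default without one for ``value`` is rejected at compile time.
try:
    compile("def __setitem__(self, index=SENTINEL, value, *, k): pass",
            "<hack>", "exec")
    defaults_optional = True
except SyntaxError:
    defaults_optional = False


class Hacked:
    # Both positional parameters need defaults, even though a value is
    # always supplied when the dunder is invoked by an assignment.
    def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k):
        return index, value, k


h = Hacked()
# Ordinary call shape with an index: the NEVERUSED default stays unused.
print(h.__setitem__(1, 5, k=3))  # (1, 5, 3)
# The index default only applies if the value is passed by keyword.
print(h.__setitem__(value=5, k=3)[0] is SENTINEL)  # True
```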

Additionally, some classes may want to use ``**kwargs`` instead of a keyword-only
argument. Given a definition like::

    def __setitem__(self, index, value, **kwargs): ...

a user who wants to pass a keyword named ``value``::

    obj[value=1] = 0

expecting a call like::

    obj.__setitem__(SENTINEL, 0, **{"value": 1})

would instead see the keyword accidentally caught by the parameter named
``value``, raising a ``TypeError`` because the argument receives multiple values.
The user should not have to worry about the actual local names of those two
arguments if they are, for all practical purposes, positional only.
Declaring them positional-only would prevent this clash, but it would still not
remove the need to pass both ``index`` and ``value`` even when no index is
provided. The point is that the user should not be prevented from using keyword
arguments to refer to a column named ``index``, ``value`` (or ``self``) just
because the class implementor happens to use those names in the parameter list.
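
The clash, and the positional-only workaround, can be reproduced with a direct call. This is a sketch: the classes ``Named`` and ``PositionalOnly`` are illustrative, ``()`` stands in for whatever sentinel the interpreter would pass, and the ``/`` marker requires Python 3.8 or later.

```python
SENTINEL = ()  # stand-in for whatever the interpreter would pass as the index


class Named:
    # Conventional, keyword-addressable parameter names.
    def __setitem__(self, index, value, **kwargs):
        return index, value, kwargs


class PositionalOnly:
    # ``/`` makes ``index`` and ``value`` positional-only,
    # so keywords with the same names end up in ``kwargs``.
    def __setitem__(self, index, value, /, **kwargs):
        return index, value, kwargs


# Mimic the call that ``obj[value=1] = 0`` would translate into.
try:
    Named().__setitem__(SENTINEL, 0, **{"value": 1})
    named_ok = True
except TypeError:  # got multiple values for argument 'value'
    named_ok = False

print(named_ok)  # False
print(PositionalOnly().__setitem__(SENTINEL, 0, **{"value": 1}))  # ((), 0, {'value': 1})
```

Positional-only parameters remove the name clash, but both ``index`` and ``value`` must still be passed.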

Moreover, we also require the three dunders to behave in the same way: it would
be extremely inconvenient if only ``__setitem__`` were to receive this
sentinel, while ``__getitem__`` and ``__delitem__`` did not, merely because
they can get away with a signature that allows for no index specification, and
thus for a user-specified default index.

Whatever the choice of sentinel, it will make the following cases
degenerate and thus impossible to differentiate in the dunder::

    obj[k=3]
    obj[SENTINEL, k=3]

The question then becomes which entity should represent the sentinel.
The options considered were:

1. Empty tuple
2. ``None``
3. ``NotImplemented``
4. A new sentinel object (e.g. ``NoIndex``)

For option 1, the call will become::

    type(obj).__getitem__(obj, (), k=3)

therefore making ``obj[k=3]`` and ``obj[(), k=3]`` degenerate and indistinguishable.

This option sounds appealing because:

1. The NumPy community was consulted [#numpy-ml]_, and the general consensus
   of the responses was that the empty tuple felt appropriate.

2. It shows a parallel with the behavior of ``*args`` in a function when
   no positional arguments are given::

       >>> def foo(*args, **kwargs):
       ...     print(args, kwargs)
       ...
       >>> foo(k=3)
       () {'k': 3}

   We do, however, accept the following asymmetry with respect to functions
   when a single value is passed, as that ship has sailed::

       >>> foo(5, k=3)
       (5,) {'k': 3}

   For indexing, ``obj[5, k=3]`` passes a plain ``5``, not a 1-tuple.
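
The single-value asymmetry already exists in current Python, where exactly one index object reaches ``__getitem__``. A small sketch (the ``Echo`` class is illustrative) compares what today's indexing passes with what ``*args`` collects.

```python
class Echo:
    """Returns whatever index object Python passes to ``__getitem__``."""

    def __getitem__(self, index):
        return index


def foo(*args):
    return args


e = Echo()
print(e[5], foo(5))     # 5 (5,)      single value: plain for indexing, 1-tuple for calls
print(e[5,], foo(5,))   # (5,) (5,)   a trailing comma makes an explicit 1-tuple index
print(e[1, 2])          # (1, 2)
print(e[()], foo())     # () ()
```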

For option 2, using ``None``, it was objected that NumPy uses it to indicate
inserting a new axis/dimension (``np.newaxis`` is an alias for it)::

    import numpy as np

    arr = np.array(5)
    arr.ndim == 0
    arr[None].ndim == arr[None,].ndim == 1

While this is not an insurmountable issue, it would certainly ripple through to NumPy.

The only issue with both of the above is that the empty tuple and ``None`` are
both potentially legitimate indexes, and there might be value in being able to
differentiate the two degenerate cases.

An alternative strategy (option 3) would therefore be to use an existing entity
that is unlikely to be used as a valid index. One candidate is the built-in
constant ``NotImplemented``, which is returned by rich comparison methods to
report that they do not implement the comparison, so that a different strategy
should be attempted (e.g. asking the other object). Unfortunately, its name and
traditional use evoke a feature that is not available, rather than the fact
that something was not passed by the user.

This leaves us with option 4: a new built-in constant. This constant
must be unhashable (so that it can never be a valid key) and have a name that
makes its context obvious: ``NoIndex``. This would solve all of the above
issues, but the question is whether it is worth it.

From a quick inquiry, most people on python-ideas seem to believe it is not
crucial, and that the empty tuple is an acceptable option. Hence the
resulting series will be::

    obj[k=3]        # __getitem__((), k=3). Empty tuple
    obj[1, k=3]     # __getitem__(1, k=3). Integer
    obj[1, 2, k=3]  # __getitem__((1, 2), k=3). Tuple

and the following two notations will be degenerate::

    obj[(), k=3]  # __getitem__((), k=3)
    obj[k=3]      # __getitem__((), k=3)
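
The resulting series can be modelled in plain Python as a sketch. ``index_argument``, ``getitem`` and ``Recorder`` are hypothetical helpers that mimic the proposed translation (keyword indexing is not valid syntax today); the last call shows that ``obj[(), k=3]`` and ``obj[k=3]`` reach the dunder identically.

```python
def index_argument(*positional):
    """Hypothetical model of the proposed rule for the index passed to the
    dunders: no positional index -> (), one -> that object, several -> a tuple."""
    if len(positional) == 1:
        return positional[0]
    return positional  # () when empty, otherwise a tuple of all of them


class Recorder:
    def __getitem__(self, index, **kwargs):
        return index, kwargs


def getitem(obj, *positional, **kwargs):
    """Mimics ``obj[positional..., **kwargs]`` under the proposed rules."""
    return type(obj).__getitem__(obj, index_argument(*positional), **kwargs)


r = Recorder()
print(getitem(r, k=3))        # ((), {'k': 3})      obj[k=3]
print(getitem(r, 1, k=3))     # (1, {'k': 3})       obj[1, k=3]
print(getitem(r, 1, 2, k=3))  # ((1, 2), {'k': 3})  obj[1, 2, k=3]
print(getitem(r, (), k=3))    # ((), {'k': 3})      obj[(), k=3], same as obj[k=3]
```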

Common objections
=================

@@ -832,6 +939,11 @@ Common objections
dict(i=float, j=float) # would create a dictionary, not a type

Finally, function calls do not allow for a setitem-like notation, as shown
in the Overview: an assignment such as ``f(1, x=3) = 5`` is not allowed,
whereas the same form is allowed for indexing operations.
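
This restriction is easy to confirm against the current parser. ``is_valid_statement`` is a throwaway helper written for this check; it only compiles the source, so the names ``f`` and ``obj`` need not exist. Keyword arguments inside a subscript are also rejected today, since they are not part of the current grammar.

```python
def is_valid_statement(source):
    """Return True if ``source`` compiles as a statement, False on SyntaxError."""
    try:
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_statement("f(1, x=3) = 5"))    # False: cannot assign to a call
print(is_valid_statement("obj[1] = 5"))       # True: subscript assignment
print(is_valid_statement("obj[1, x=3] = 5"))  # False: no keyword indexing yet
```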

References
==========

@@ -845,7 +957,8 @@ References
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)

.. [#pep-0001] "PEP 1 -- PEP Purpose and Guidelines"
   (https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep)

.. [#numpy-ml] "[Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments"
   (http://numpy-discussion.10968.n7.nabble.com/Request-for-comments-on-PEP-637-Support-for-indexing-with-keyword-arguments-td48489.html)

Copyright
=========