PEP 637: Default sentinel value for no value passed as index (#1626)
* PEP 637: more rationale for function call not being an option
* PEP 637 Commentary on the choice of tuple as sentinel when no index is specified
* Finalised explanation of rationale behind the empty tuple
* Clarified the proposed default hack
* Clarified asymmetry to functions
* Added review comments

parent fa61557ccc
commit 43834b1add

167 pep-0637.rst

@@ -393,6 +393,9 @@ The successful implementation of this PEP will result in the following behavior:
       del obj[spam=1, eggs=2]
       # calls type(obj).__delitem__(obj, (), spam=1, eggs=2)

   The choice of the empty tuple as a sentinel has been debated. Details are provided in
   the Rejected Ideas section.

10. Keyword arguments must allow slice syntax::

        obj[3:4, spam=1:4, eggs=2]

@@ -772,44 +775,148 @@ make the dictionary type accept it automatically, to insert or refer to the value
the empty tuple as key. Moreover, a typing notation such as ``Tuple[]`` can easily
be written as ``Tuple`` without the indexing notation.

Use None instead of the empty tuple when no positional index is given
---------------------------------------------------------------------

Sentinel value for no given positional index
--------------------------------------------

The case ``obj[k=3]`` will lead to a call ``__getitem__((), k=3)``.
The alternative ``__getitem__(None, k=3)`` was considered but rejected:
NumPy uses `None` to indicate inserting a new axis/dimensions (there's
a ``np.newaxis`` alias as well)::

The topic of which value to pass as the index in the case of::

    obj[k=3]

has been considerably debated.

One apparently rational choice would be to pass no value at all, by making use of
the keyword-only argument feature. Unfortunately, this does not work well with
the ``__setitem__`` dunder: a positional argument for the value is always
passed, and we cannot "skip over" the index unless we introduce a very weird behavior
where the first argument refers to the index when one is specified, and to the value when
it is not. This would be extremely deceptive and error prone.

The above consideration makes it impossible to have a keyword-only dunder, and
opens up the question of what entity to pass for the index position when no index
is passed::

    obj[k=3] = 5    # would call type(obj).__setitem__(???, 5, k=3)

A proposed hack would be to let the user specify which entity to use when an
index is not specified, by specifying a default for the ``index``. This, however,
necessarily forces the user to also specify a default for the ``value`` (one that
is never going to be used, as a value is always passed by design), because a
non-default argument cannot follow a defaulted one::

    def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k)

which seems ugly, redundant and confusing. We must therefore accept that some
form of sentinel index must be passed by the Python implementation when the
``obj[k=3]`` notation is used. This also means that default arguments to those
parameters are simply never going to be used (but this is already the
case with the current implementation, so no change there).

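The ordering constraint behind this hack can be checked directly. The sketch
below compiles two hypothetical signatures from source strings; ``None`` stands
in for ``SENTINEL`` and ``NEVERUSED``, and the helper name ``compiles`` is an
assumption made only for illustration. It suggests that a defaulted ``index``
followed by a non-defaulted ``value`` is rejected before any code runs.

```python
# Hypothetical signatures, kept as source strings so that the rejected one
# can be inspected instead of aborting the whole module at import time.
REJECTED = "def __setitem__(self, index=None, value, *, k): pass"
ACCEPTED = "def __setitem__(self, index=None, value=None, *, k): pass"


def compiles(source):
    """Return True if the source string is syntactically valid Python."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


rejected_ok = compiles(REJECTED)   # non-default parameter after a default one
accepted_ok = compiles(ACCEPTED)   # both positional parameters have defaults
print(rejected_ok, accepted_ok)
```

The second form compiles, but only at the cost of a default for ``value`` that
can never take effect.
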
Additionally, some classes may want to use ``**kwargs`` instead of a keyword-only
argument. Given a definition like::

    def __setitem__(self, index, value, **kwargs):

and a user that wants to pass a keyword ``value``::

    obj[value=1] = 0

expecting a call like::

    obj.__setitem__(SENTINEL, 0, **{"value": 1})

the keyword will instead accidentally be caught by the named parameter ``value``,
producing a ``TypeError`` (multiple values for the same argument). The user should
not have to worry about the actual local names of those two parameters if they are,
for all practical purposes, positional only. Unfortunately, declaring them
positional-only ensures this does not happen, but it still does not remove the need
to pass both ``index`` and ``value`` even when the index is not provided. The point
is that the user should not be prevented from using keyword arguments to refer to a
column ``index``, ``value`` (or ``self``) just because the class implementor happens
to use those names in the parameter list.

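The name collision described above can be reproduced in current Python by
calling the dunder explicitly, since the keyword subscript syntax itself is not
available. This is a minimal sketch, assuming the empty tuple as ``SENTINEL``
and two hypothetical classes, ``Plain`` and ``PositionalOnly``, that only record
what they receive.

```python
SENTINEL = ()  # assumed stand-in for the index the interpreter would pass


class Plain:
    def __setitem__(self, index, value, **kwargs):
        self.last = (index, value, kwargs)


class PositionalOnly:
    # The "/" marker (Python 3.8+) makes self, index and value positional-only,
    # so those names become free to appear among the keywords.
    def __setitem__(self, index, value, /, **kwargs):
        self.last = (index, value, kwargs)


plain = Plain()
try:
    plain.__setitem__(SENTINEL, 0, **{"value": 1})
    collision = None
except TypeError as exc:
    collision = str(exc)   # the keyword clashes with the parameter name

safe = PositionalOnly()
safe.__setitem__(SENTINEL, 0, **{"value": 1})
print(collision)
print(safe.last)
```

Positional-only parameters remove the clash, but the caller still has to supply
something in the ``index`` position, which is the problem the sentinel addresses.
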
Moreover, we also require the three dunders to behave in the same way: it would
be extremely inconvenient if only ``__setitem__`` were to receive this
sentinel, while ``__getitem__`` and ``__delitem__`` would not because they can get
away with a signature that allows for no index specification, thus allowing for a
user-specified default index.

Whatever the choice of the sentinel, it will make the following cases
degenerate and thus impossible to differentiate in the dunder::

    obj[k=3]
    obj[SENTINEL, k=3]

The question now shifts to which entity should represent the sentinel.
The options were:

1. Empty tuple
2. None
3. NotImplemented
4. a new sentinel object (e.g. NoIndex)

For option 1, the call will become::

    type(obj).__getitem__((), k=3)

therefore making ``obj[k=3]`` and ``obj[(), k=3]`` degenerate and indistinguishable.

This option sounds appealing because:

1. The NumPy community was consulted [#numpy-ml]_, and the general consensus
   of the responses was that the empty tuple felt appropriate.
2. It shows a parallel with the behavior of ``*args`` in a function, when
   no positional arguments are given::

       >>> def foo(*args, **kwargs):
       ...     print(args, kwargs)
       ...
       >>> foo(k=3)
       () {'k': 3}

   We do accept the following asymmetry in behavior compared to functions
   when a single value is passed, but that ship has sailed::

       >>> foo(5, k=3)
       (5,) {'k': 3}   # for indexing, a plain 5, not a 1-tuple is passed

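The asymmetry mentioned above already exists in current Python and can be
observed without any new syntax. In this small sketch, the class ``Echo`` and
the variable names are hypothetical and serve only to expose what each
mechanism receives.

```python
class Echo:
    """Returns the index exactly as __getitem__ receives it."""

    def __getitem__(self, index):
        return index


def foo(*args, **kwargs):
    return args, kwargs


echo = Echo()
single_index = echo[5]       # a plain 5, not wrapped in a tuple
one_tuple_index = echo[5,]   # an explicit trailing comma gives a 1-tuple
pair_index = echo[1, 2]      # several indexes are packed in a tuple
call_args, _ = foo(5)        # a function call always packs into a tuple
print(single_index, one_tuple_index, pair_index, call_args)
```

A function receives ``(5,)`` where an indexing operation receives ``5``, so the
parallel with ``*args`` holds only for zero, or for two or more, positional values.
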
For option 2, using ``None``, it was objected that NumPy uses it to indicate
inserting a new axis/dimension (there's a ``np.newaxis`` alias as well)::

    arr = np.array(5)
    arr.ndim == 0
    arr[None].ndim == arr[None,].ndim == 1

So the final conclusion is that we favor the following series::

While this is not an insurmountable issue, it certainly will ripple onto NumPy.

The only issue with both of the above is that the empty tuple and ``None`` are
both potential legitimate indexes, and there might be value in being able to
differentiate the two degenerate cases.

So, an alternative strategy (option 3) would be to use an existing entity that is
unlikely to be used as a valid index. One option could be the current built-in constant
``NotImplemented``, which is currently returned by comparison operators to
report that they do not implement the comparison, and that a different strategy
should be attempted (e.g. to ask the other object). Unfortunately, its name and
traditional use refer to a feature that is not available, rather than to the
fact that something was not passed by the user.

This leaves us with option 4: a new built-in constant. This constant
must be unhashable (so it's never going to be a valid key) and have a clear
name that makes its context obvious: ``NoIndex``. This
would solve all the above issues, but the question is: is it worth it?

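What option 4 might look like can be sketched in a few lines. Everything below
is an assumption for illustration: no ``NoIndex`` constant exists in Python,
the class names ``NoIndexType`` and ``Table`` are invented here, and the
subscript is emulated with an explicit dunder call.

```python
class NoIndexType:
    """Hypothetical sentinel type; nothing like it exists in Python."""

    __hash__ = None  # instances are unhashable, so never a valid dict key

    def __repr__(self):
        return "NoIndex"


NoIndex = NoIndexType()


class Table:
    """Hypothetical container that tells a missing index from an empty tuple."""

    def __getitem__(self, index, **kwargs):
        if index is NoIndex:
            return "no index", kwargs
        return "index", index, kwargs


table = Table()
missing = type(table).__getitem__(table, NoIndex, k=3)   # emulates table[k=3]
explicit = type(table).__getitem__(table, (), k=3)       # emulates table[(), k=3]
print(missing, explicit)
```

With a dedicated sentinel the two cases stay distinguishable, at the price of a
new built-in name.
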
From a quick inquiry, it seems that most people on python-ideas believe
it's not crucial, and the empty tuple is an acceptable option. Hence the
resulting series will be::

    obj[k=3]        # __getitem__((), k=3). Empty tuple
    obj[1, k=3]     # __getitem__(1, k=3). Integer
    obj[1, 2, k=3]  # __getitem__((1, 2), k=3). Tuple

more than this::

    obj[k=3]        # __getitem__(None, k=3). None
    obj[1, k=3]     # __getitem__(1, k=3). Integer
    obj[1, 2, k=3]  # __getitem__((1, 2), k=3). Tuple

With the first more in line with a ``*args`` semantics for calling a routine with
no positional arguments::

    >>> def foo(*args, **kwargs):
    ...     print(args, kwargs)
    ...
    >>> foo(k=3)
    () {'k': 3}

Although we accept the following asymmetry::

    >>> foo(1, k=3)
    (1,) {'k': 3}

and the following two notations will be degenerate::

    obj[(), k=3]    # __getitem__((), k=3)
    obj[k=3]        # __getitem__((), k=3)

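The series with the empty tuple, and the degenerate pair that follows from it,
can be emulated in current Python, which does not accept the keyword subscript
syntax. This is a minimal sketch: the helper ``subscript`` and the class
``Recorder`` are hypothetical and only mimic the translation described above.

```python
class Recorder:
    """Returns whatever index and keywords its __getitem__ receives."""

    def __getitem__(self, index, **kwargs):
        return index, kwargs


def subscript(obj, *positional, **keywords):
    """Hypothetical stand-in for the proposed obj[...] translation."""
    if len(positional) == 0:
        index = ()              # no positional index: the empty tuple sentinel
    elif len(positional) == 1:
        index = positional[0]   # a single index is passed as-is
    else:
        index = positional      # several indexes are packed in a tuple
    return type(obj).__getitem__(obj, index, **keywords)


obj = Recorder()
no_index = subscript(obj, k=3)            # emulates obj[k=3]
one_index = subscript(obj, 1, k=3)        # emulates obj[1, k=3]
two_indexes = subscript(obj, 1, 2, k=3)   # emulates obj[1, 2, k=3]
empty_tuple = subscript(obj, (), k=3)     # emulates obj[(), k=3]
print(no_index, one_index, two_indexes, empty_tuple)
```

Here ``no_index`` and ``empty_tuple`` compare equal, which is the degeneracy the
text accepts as the cost of not introducing a new sentinel.
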
Common objections
=================

@@ -832,6 +939,11 @@ Common objections

    dict(i=float, j=float)  # would create a dictionary, not a type

Finally, function calls do not allow for a setitem-like notation, as shown
in the Overview: operations such as ``f(1, x=3) = 5`` are not allowed, whereas
the corresponding assignment is allowed for indexing operations.

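The difference between the two assignment targets can be checked with the
compiler alone, without running either statement. In this small sketch the
helper name ``is_valid_statement`` is an assumption; the source strings are
only compiled, so ``f`` and ``obj`` do not need to exist.

```python
def is_valid_statement(source):
    """Return True if the source string compiles as a Python statement."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


call_target = is_valid_statement("f(1, x=3) = 5")    # assignment to a call
index_target = is_valid_statement("obj[1, 3] = 5")   # assignment to a subscript
print(call_target, index_target)
```

Only the subscript form is a legal assignment target, which is one reason a
function call cannot replace indexing.
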
References
==========

@@ -845,7 +957,8 @@ References
       (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)
.. [#pep-0001] "PEP 1 -- PEP Purpose and Guidelines"
       (https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep)

.. [#numpy-ml] "[Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments"
       (http://numpy-discussion.10968.n7.nabble.com/Request-for-comments-on-PEP-637-Support-for-indexing-with-keyword-arguments-td48489.html)

Copyright
=========