PEP: 637
Title: Support for indexing with keyword arguments
Version: $Revision$
Last-Modified: $Date$
Author: Stefano Borini
Sponsor: Steven D'Aprano
Discussions-To: python-ideas@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Aug-2020
Python-Version: 3.10
Post-History: 23-Sep-2020
Resolution: https://mail.python.org/archives/list/python-dev@python.org/thread/6TAQ2BEVSJNV4JM2RJYSSYFJUT3INGZD/

.. note::
    This PEP has been rejected. In general, the cost of introducing new syntax
    was not outweighed by the perceived benefits. See the link in the Resolution
    header field for details.
|
||
Abstract
|
||
========
|
||
|
||
At present keyword arguments are allowed in function calls, but not in
|
||
item access. This PEP proposes that Python be extended to allow keyword
|
||
arguments in item access.
|
||
|
||
The following example shows keyword arguments for ordinary function calls::
|
||
|
||
>>> val = f(1, 2, a=3, b=4)
|
||
|
||
The proposal would extend the syntax to allow a similar construct
|
||
to indexing operations::
|
||
|
||
>>> val = x[1, 2, a=3, b=4] # getitem
|
||
>>> x[1, 2, a=3, b=4] = val # setitem
|
||
>>> del x[1, 2, a=3, b=4] # delitem
|
||
|
||
and would also provide appropriate semantics. Single- and double-star unpacking of
|
||
arguments is also provided::
|
||
|
||
>>> val = x[*(1, 2), **{'a': 3, 'b': 4}] # Equivalent to above.
|
||
|
||
This PEP is a successor to :pep:`472`, which was rejected in 2019 due to lack of
interest. Since then there has been renewed interest in the feature.
|
||
|
||
Overview
|
||
========
|
||
|
||
Background
|
||
----------
|
||
|
||
:pep:`472` was opened in 2014. The PEP detailed various use cases and was created by
|
||
extracting implementation strategies from a broad discussion on the
|
||
python-ideas mailing list, although no clear consensus was reached on which strategy
|
||
should be used. Many corner cases were examined more closely and found to be
awkward, backward incompatible, or both.
|
||
|
||
The PEP was eventually rejected in 2019 [#rejection]_ mostly
|
||
due to lack of interest in the feature, despite its five years of existence.
|
||
|
||
However, with the introduction of type hints in :pep:`484` the
|
||
square bracket notation has been used consistently to enrich the typing
|
||
annotations, e.g. to specify a list of integers as Sequence[int]. Additionally,
|
||
there has been substantial growth in packages for data analysis such as pandas
and xarray, which use names to describe columns in a table (pandas) or axes in
|
||
an nd-array (xarray). These packages allow users to access specific data by
|
||
names, but cannot currently use index notation ([]) for this functionality.
|
||
|
||
As a result, a renewed interest in a more flexible syntax that would allow for
|
||
named information has been expressed occasionally in many different threads on
|
||
python-ideas, recently by Caleb Donovick [#request-1]_ in 2019 and Andras
|
||
Tantos [#request-2]_ in 2020. These requests prompted considerable activity on the
|
||
python-ideas mailing list, where the various options have been re-discussed and
|
||
a general consensus on an implementation strategy has now been reached.
|
||
|
||
Use cases
|
||
---------
|
||
|
||
The following practical use cases show situations where a keyword
specification would improve the notation and provide additional value:
|
||
|
||
1. To provide a more communicative meaning to the index, preventing e.g. accidental
|
||
inversion of indexes::
|
||
|
||
>>> grid_position[x=3, y=5, z=8]
|
||
>>> rain_amount[time=0:12, location=location]
|
||
>>> matrix[row=20, col=40]
|
||
|
||
2. To enrich the typing notation with keywords, especially during the use of generics::
|
||
|
||
def function(value: MyType[T=int]):
|
||
|
||
3. In some domains, such as computational physics and chemistry, the use of a
|
||
notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent
|
||
a level of accuracy::
|
||
|
||
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
|
||
|
||
4. Pandas currently uses a notation such as::
|
||
|
||
>>> df[df['x'] == 1]
|
||
|
||
which could be replaced with ``df[x=1]``.
|
||
|
||
5. xarray has named dimensions. Currently these are handled with methods such as ``.isel``::
|
||
|
||
>>> data.isel(row=10) # Returns the row at index 10
|
||
|
||
which could also be replaced with ``data[row=10]``. A more complex example::
|
||
|
||
>>> # old syntax
|
||
>>> da.isel(space=0, time=slice(None, 2))[...] = spam
|
||
>>> # new syntax
|
||
>>> da[space=0, time=:2] = spam
|
||
|
||
Another example::
|
||
|
||
>>> # old syntax
|
||
>>> ds["empty"].loc[dict(lon=5, lat=6)] = 10
|
||
>>> # new syntax
|
||
>>> ds["empty"][lon=5, lat=6] = 10
|
||
|
||
>>> # old syntax
|
||
>>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
|
||
>>> # new syntax
|
||
>>> ds["empty"][lon=1:5, lat=3:] = 10
|
||
|
||
6. Functions/methods whose argument is another function (plus its
|
||
arguments) need some way to determine which arguments are destined for
|
||
the target function, and which are used to configure how they run the
|
||
target. This is simple (if non-extensible) for positional parameters,
|
||
but we need some way to distinguish these for keywords. [#trio-run]_
|
||
|
||
An indexed notation would afford a Pythonic way to pass keyword
|
||
arguments to these functions without cluttering the caller's code.
|
||
|
||
::
|
||
|
||
>>> # Let's start this example with basic syntax without keywords.
|
||
>>> # the positional values are arguments to `func` while
|
||
>>> # `name=` is processed by `trio.run`.
|
||
>>> trio.run(func, value1, value2, name="func")
|
||
>>> # `trio.run` ends up calling `func(value1, value2)`.
|
||
|
||
>>> # If we want/need to pass value2 by keyword (keyword-only argument,
|
||
>>> # additional arguments that won't break backwards compatibility ...),
|
||
>>> # currently we need to resort to functools.partial:
|
||
>>> trio.run(functools.partial(func, param2=value2), value1, name="func")
|
||
>>> trio.run(functools.partial(func, value1, param2=value2), name="func")
|
||
|
||
>>> # One possible workaround is to convert `trio.run` to an object
|
||
>>> # with a `__call__` method, and use an "option" helper,
|
||
>>> trio.run.option(name="func")(func, value1, param2=value2)
|
||
>>> # However, foo(bar)(baz) is uncommon and thus disruptive to the reader.
|
||
>>> # Also, you need to remember the name of the `option` method.
|
||
|
||
>>> # This PEP allows us to replace `option` with `__getitem__`.
|
||
>>> # The call is now shorter, more mnemonic, and looks+works like typing
|
||
>>> trio.run[name="func"](func, value1, param2=value2)
|
||
|
||
7. Availability of star arguments would benefit :pep:`646` Variadic Generics,
|
||
especially in the forms ``a[*x]`` and ``a[*x, *y, p, q, *z]``. The PEP details
|
||
exactly this notation in its "Unpacking: Star Operator" section.
|
||
|
||
It is important to note that how the notation is interpreted is up to the
|
||
implementation. This PEP only defines and dictates the behavior of Python
|
||
regarding passed keyword arguments, not how these arguments should be
|
||
interpreted and used by the implementing class.
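As a minimal sketch of use case 6 in current Python, the snippet below shows why ``functools.partial`` is needed today. The ``run`` function is a hypothetical stand-in for a runner such as ``trio.run`` (it is not the real trio API), and ``func``, ``param1`` and ``param2`` are illustrative names:

```python
import functools


def run(func, *args, name=None):
    """Hypothetical stand-in for a runner such as trio.run (not the real API)."""
    # Positional arguments are forwarded to `func`; `name` configures the runner.
    return name, func(*args)


def func(param1, *, param2):
    """Illustrative target function with a keyword-only parameter."""
    return param1 + param2


# `param2` can only be passed by keyword, so it has to be bound beforehand;
# a plain `param2=2` in the call would be consumed (and rejected) by `run`.
result = run(functools.partial(func, param2=2), 1, name="func")
print(result)
```

Any keyword written directly in the call to ``run`` is interpreted by the runner, which is the ambiguity the indexed notation was meant to resolve.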
|
||
|
||
Current status of indexing operation
|
||
------------------------------------
|
||
|
||
Before detailing the new syntax and semantics of the indexing notation, it is
|
||
relevant to analyse how the indexing notation works today, in which contexts,
|
||
and how it is different from a function call.
|
||
|
||
Subscripting ``obj[x]`` is, effectively, an alternate and specialised form of
|
||
function call syntax with a number of differences and restrictions compared to
|
||
``obj(x)``. The current Python syntax focuses exclusively on position to express
|
||
the index, and also contains syntactic sugar to refer to non-punctiform
|
||
selection (slices). Some common examples::
|
||
|
||
>>> a[3] # returns the fourth element of 'a'
|
||
>>> a[1:10:2] # slice notation (extract a non-trivial data subset)
|
||
>>> a[3, 2] # multiple indexes (for multidimensional arrays)
|
||
|
||
This translates into a ``__(get|set|del)item__`` dunder call which is passed a single
|
||
parameter containing the index (for ``__getitem__`` and ``__delitem__``) or two parameters
|
||
containing index and value (for ``__setitem__``).
|
||
|
||
The behavior of the indexing call is fundamentally different from a function call
|
||
in various aspects:
|
||
|
||
The first difference is in meaning to the reader. A function call says
|
||
"arbitrary function call potentially with side-effects". An indexing operation
|
||
says "lookup", typically to point at a subset or specific sub-aspect of an
|
||
entity (as in the case of typing notation). This fundamental difference means
|
||
that, while we cannot prevent abuse, implementors should be aware that the
|
||
introduction of keyword arguments to alter the behavior of the lookup may
|
||
violate this intrinsic meaning.
|
||
|
||
The second difference of the indexing notation compared to a function
|
||
is that indexing can be used for both getting and setting operations.
|
||
In Python, a function cannot be on the left hand side of an assignment. In
|
||
other words, both of these are valid::
|
||
|
||
>>> x = a[1, 2]
|
||
>>> a[1, 2] = 5
|
||
|
||
but only the first one of these is valid::
|
||
|
||
>>> x = f(1, 2)
|
||
>>> f(1, 2) = 5 # invalid
|
||
|
||
This asymmetry is important, and makes one understand that there is a natural
|
||
imbalance between the two forms. It is therefore not a given that the two
|
||
should behave transparently and symmetrically.
|
||
|
||
The third difference is that functions have names assigned to their
|
||
arguments, unless the passed parameters are captured with ``*args``, in which case
|
||
they end up as entries in the args tuple. In other words, functions already
|
||
have anonymous argument semantics, exactly like the indexing operation. However,
|
||
``__(get|set|del)item__`` is not always receiving a tuple as the ``index`` argument
|
||
(to be uniform in behavior with ``*args``). In fact, given a trivial class::
|
||
|
||
    class X:
        def __getitem__(self, index):
            print(index)
|
||
|
||
The index operation basically forwards the content of the square brackets "as is"
|
||
in the ``index`` argument::
|
||
|
||
>>> x=X()
|
||
>>> x[0]
|
||
0
|
||
>>> x[0, 1]
|
||
(0, 1)
|
||
>>> x[(0, 1)]
|
||
(0, 1)
|
||
>>>
|
||
>>> x[()]
|
||
()
|
||
>>> x[{1, 2, 3}]
|
||
{1, 2, 3}
|
||
>>> x["hello"]
|
||
hello
|
||
>>> x["hello", "hi"]
|
||
('hello', 'hi')
|
||
|
||
The fourth difference is that the indexing operation knows how to convert
|
||
colon notations to slices, thanks to support from the parser. This is valid::
|
||
|
||
a[1:3]
|
||
|
||
this one isn't::
|
||
|
||
f(1:3)
|
||
|
||
The fifth difference is that there's no zero-argument form. This is valid::
|
||
|
||
f()
|
||
|
||
this one isn't::
|
||
|
||
a[]
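The differences listed in this section can be observed in current Python with a small probe class. ``Probe`` is a hypothetical helper, introduced here only for illustration, that returns whatever the interpreter passes as the index:

```python
class Probe:
    """Hypothetical helper: returns the index exactly as it was received."""

    def __getitem__(self, index):
        return index


probe = Probe()

single = probe[3]               # a single value is passed as is
pair = probe[3, 2]              # comma-separated values are packed into a tuple
explicit_pair = probe[(3, 2)]   # indistinguishable from the line above
sliced = probe[1:10:2]          # colon notation becomes a slice object
mixed = probe[1:3, ...]         # slices may also appear inside the tuple

print(single, pair, explicit_pair, sliced, mixed)
```

Because ``pair`` and ``explicit_pair`` are equal, the method cannot tell whether the caller wrote the parentheses, which matters for the tuple rules discussed later.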
|
||
|
||
Specification
|
||
=============
|
||
|
||
Before describing the specification, it is important to stress the difference in
|
||
nomenclature between *positional index*, *final index* and *keyword argument*, as it is important to
|
||
understand the fundamental asymmetries at play. The ``__(get|set|del)item__``
|
||
is fundamentally an indexing operation, and the way the element is retrieved,
|
||
set, or deleted is through an index, the *final index*.
|
||
|
||
The current status quo is to directly build the *final index* from what is passed between
|
||
square brackets, the *positional index*. In other words, what is passed in the
|
||
square brackets is trivially used to generate what the code in ``__getitem__`` then uses
|
||
for the indexing operation. For a dict, for example, ``d[1]`` has a
|
||
positional index of ``1`` and also a final index of ``1`` (because it's the element that is
|
||
then added to the dictionary) and ``d[1, 2]`` has positional index of ``(1, 2)`` and
|
||
final index also of ``(1, 2)`` (because yet again it's the element that is added to the dictionary).
|
||
However, the positional index ``d[1,2:3]`` is not accepted by the dictionary, because
|
||
there's no way to transform the positional index into a final index, as the slice object is
|
||
unhashable. The positional index is what is currently known as the ``index`` parameter in
|
||
``__getitem__``. Nevertheless, nothing prevents the construction of a dictionary-like class that
|
||
creates the final index by e.g. converting the positional index to a string.
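The distinction between positional index and final index can be sketched with such a dictionary-like class. ``StringKeyedStore`` is a hypothetical name, and using ``repr()`` to build the final index is an assumption chosen for brevity, not a recommendation:

```python
class StringKeyedStore:
    """Hypothetical dict-like class; its final index is repr(positional index)."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, index):
        return self._data[repr(index)]

    def __setitem__(self, index, value):
        self._data[repr(index)] = value

    def __delitem__(self, index):
        del self._data[repr(index)]


store = StringKeyedStore()
store[1, 2:3] = "value"             # positional index: (1, slice(2, 3, None))
final_indexes = list(store._data)   # final index: the string form of that tuple
looked_up = store[1, 2:3]
print(final_indexes, looked_up)
```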
|
||
|
||
This PEP extends the current status quo, and grants more flexibility to
|
||
create the final index via an enhanced syntax that combines the positional index
|
||
and keyword arguments, if passed.
|
||
|
||
The above brings an important point across. Keyword arguments, in the context of the index
|
||
operation, may be used to take indexing decisions to obtain the final index, and therefore
|
||
will have to accept values that are unconventional for functions. See for
|
||
example use case 1, where a slice is accepted.
|
||
|
||
The successful implementation of this PEP will result in the following behavior:
|
||
|
||
1. An empty subscript is still illegal, regardless of context (see Rejected Ideas)::
|
||
|
||
obj[] # SyntaxError
|
||
|
||
2. A single index value remains a single index value when passed::
|
||
|
||
obj[index]
|
||
# calls type(obj).__getitem__(obj, index)
|
||
|
||
obj[index] = value
|
||
# calls type(obj).__setitem__(obj, index, value)
|
||
|
||
del obj[index]
|
||
# calls type(obj).__delitem__(obj, index)
|
||
|
||
This remains the case even if the index is followed by keywords; see point 5 below.
|
||
|
||
3. Comma-separated arguments are still parsed as a tuple and passed as
|
||
a single positional argument::
|
||
|
||
obj[spam, eggs]
|
||
# calls type(obj).__getitem__(obj, (spam, eggs))
|
||
|
||
obj[spam, eggs] = value
|
||
# calls type(obj).__setitem__(obj, (spam, eggs), value)
|
||
|
||
del obj[spam, eggs]
|
||
# calls type(obj).__delitem__(obj, (spam, eggs))
|
||
|
||
The points above mean that classes which do not want to support keyword
|
||
arguments in subscripts need do nothing at all, and the feature is therefore
|
||
completely backwards compatible.
|
||
|
||
4. Keyword arguments, if any, must follow positional arguments::
|
||
|
||
obj[1, 2, spam=None, 3] # SyntaxError
|
||
|
||
This is like function calls, where intermixing positional and keyword
|
||
arguments gives a ``SyntaxError``.
|
||
|
||
5. Keyword subscripts, if any, will be handled like they are in
|
||
function calls. Examples::
|
||
|
||
# Single index with keywords:
|
||
|
||
obj[index, spam=1, eggs=2]
|
||
# calls type(obj).__getitem__(obj, index, spam=1, eggs=2)
|
||
|
||
obj[index, spam=1, eggs=2] = value
|
||
# calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2)
|
||
|
||
del obj[index, spam=1, eggs=2]
|
||
# calls type(obj).__delitem__(obj, index, spam=1, eggs=2)
|
||
|
||
# Comma-separated indices with keywords:
|
||
|
||
obj[foo, bar, spam=1, eggs=2]
|
||
# calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2)
|
||
|
||
obj[foo, bar, spam=1, eggs=2] = value
|
||
# calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2)
|
||
|
||
del obj[foo, bar, spam=1, eggs=2]
|
||
# calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2)
|
||
|
||
Note that:
|
||
|
||
- a single positional index will not turn into a tuple
|
||
just because one adds a keyword value.
|
||
|
||
- for ``__setitem__``, the same order is retained for index and value.
|
||
The keyword arguments go at the end, as is normal for a function
|
||
definition.
|
||
|
||
6. The same rules apply with respect to keyword subscripts as for
|
||
keywords in function calls:
|
||
|
||
- the interpreter matches up each keyword subscript to a named parameter
|
||
in the appropriate method;
|
||
|
||
- if a named parameter is used twice, that is an error;
|
||
|
||
- if there are any named parameters left over (without a value) when the
|
||
keywords are all used, they are assigned their default value (if any);
|
||
|
||
- if any such parameter doesn't have a default, that is an error;
|
||
|
||
- if there are any keyword subscripts remaining after all the named
|
||
parameters are filled, and the method has a ``**kwargs`` parameter,
|
||
they are bound to the ``**kwargs`` parameter as a dict;
|
||
|
||
- but if no ``**kwargs`` parameter is defined, it is an error.
|
||
|
||
|
||
7. Sequence unpacking is allowed inside subscripts::
|
||
|
||
obj[*items]
|
||
|
||
This allows notations such as ``[:, *args, :]``, which could be treated
|
||
as ``[(slice(None), *args, slice(None))]``. Multiple star unpackings are
|
||
allowed::
|
||
|
||
obj[1, *(2, 3), *(4, 5), 6, foo=5]
|
||
# Equivalent to obj[(1, 2, 3, 4, 5, 6), foo=5]
|
||
|
||
The following notation equivalence must be honored::
|
||
|
||
obj[*()]
|
||
# Equivalent to obj[()]
|
||
|
||
obj[*(), foo=3]
|
||
# Equivalent to obj[(), foo=3]
|
||
|
||
obj[*(x,)]
|
||
# Equivalent to obj[(x,)]
|
||
|
||
obj[*(x,),]
|
||
# Equivalent to obj[(x,)]
|
||
|
||
Note in particular case 3: sequence unpacking of a single element will
|
||
not behave as if only one single argument was passed. A related case is
|
||
the following example::
|
||
|
||
obj[1, *(), foo=5]
|
||
# Equivalent to obj[(1,), foo=5]
|
||
# calls type(obj).__getitem__(obj, (1,), foo=5)
|
||
|
||
However, as we saw earlier, for backward compatibility a single index will be passed as is::
|
||
|
||
obj[1, foo=5]
|
||
# calls type(obj).__getitem__(obj, 1, foo=5)
|
||
|
||
In other words, a single positional index will be passed "as is" only if no sequence
|
||
unpacking is present. If a sequence unpacking is present, then the index will become a tuple,
|
||
regardless of the resulting number of elements in the index after the unpacking has taken place.
|
||
|
||
8. Dict unpacking is permitted::
|
||
|
||
items = {'spam': 1, 'eggs': 2}
|
||
obj[index, **items]
|
||
# equivalent to obj[index, spam=1, eggs=2]
|
||
|
||
The following notation equivalences should be honored::
|
||
|
||
obj[**{}]
|
||
# Equivalent to obj[()]
|
||
|
||
obj[3, **{}]
|
||
# Equivalent to obj[3]
|
||
|
||
9. Keyword-only subscripts are permitted. The positional index will be the empty tuple::
|
||
|
||
obj[spam=1, eggs=2]
|
||
# calls type(obj).__getitem__(obj, (), spam=1, eggs=2)
|
||
|
||
obj[spam=1, eggs=2] = 5
|
||
# calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2)
|
||
|
||
del obj[spam=1, eggs=2]
|
||
# calls type(obj).__delitem__(obj, (), spam=1, eggs=2)
|
||
|
||
The choice of the empty tuple as a sentinel has been debated. Details are provided in
|
||
the Rejected Ideas section.
|
||
|
||
10. Keyword arguments must allow slice syntax::
|
||
|
||
obj[3:4, spam=1:4, eggs=2]
|
||
# calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2)
|
||
|
||
This may open up the possibility to accept the same syntax for general function
|
||
calls, but this is not part of this recommendation.
|
||
|
||
11. Keyword arguments allow for default values::
|
||
|
||
# Given type(obj).__getitem__(obj, index, spam=True, eggs=2)
|
||
obj[3] # Valid. index = 3, spam = True, eggs = 2
|
||
obj[3, spam=False] # Valid. index = 3, spam = False, eggs = 2
|
||
obj[spam=False] # Valid. index = (), spam = False, eggs = 2
|
||
obj[] # Invalid.
|
||
|
||
12. The same semantics given above must be extended to ``__class_getitem__``:
|
||
Since :pep:`560`, type hints are dispatched so that for ``x[y]``, if no
|
||
``__getitem__`` method is found, and ``x`` is a type (class) object,
|
||
and ``x`` has a class method ``__class_getitem__``, that method is
|
||
called. The same changes should be applied to this method as well,
|
||
so that a notation like ``list[T=int]`` can be accepted.
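Since this PEP was rejected, the keyword subscript syntax cannot be run directly. The sketch below approximates rules 1, 2, 3, 5, 9 and 10 with an ordinary function call. ``emulate_getitem`` and ``Recorder`` are hypothetical names introduced for illustration, and star unpacking (rule 7) is deliberately not modelled:

```python
def emulate_getitem(obj, *positional, **keywords):
    """Hypothetical helper approximating the proposed obj[...] dispatch.

    Star unpacking (rule 7) is not modelled: a function call cannot tell
    obj[*(x,)] apart from obj[x].
    """
    if not positional and not keywords:
        # Rule 1: an empty subscript stays illegal.
        raise SyntaxError("empty subscript")
    if not positional:
        index = ()                # rule 9: keyword-only subscript
    elif len(positional) == 1:
        index = positional[0]     # rule 2: a single value is passed as is
    else:
        index = positional        # rule 3: several values become a tuple
    # Rules 5 and 6: keywords are bound like ordinary keyword arguments.
    return type(obj).__getitem__(obj, index, **keywords)


class Recorder:
    """Illustrative class that reports what its __getitem__ received."""

    def __getitem__(self, index, *, spam=True, eggs=2):
        return index, spam, eggs


obj = Recorder()
single = emulate_getitem(obj, 3, spam=False)             # models obj[3, spam=False]
several = emulate_getitem(obj, 1, 2, eggs=5)             # models obj[1, 2, eggs=5]
keyword_only = emulate_getitem(obj, spam=False)          # models obj[spam=False]
with_slice = emulate_getitem(obj, 3, spam=slice(1, 4))   # models obj[3, spam=1:4]
print(single, several, keyword_only, with_slice)
```

The explicit ``slice(1, 4)`` stands in for the ``1:4`` notation, which current Python only accepts in the positional part of a subscript.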
|
||
|
||
Indexing behavior in standard classes (dict, list, etc.)
|
||
--------------------------------------------------------
|
||
|
||
None of what is proposed in this PEP will change the behavior of the current
|
||
core classes that use indexing. Adding keywords to the index operation for
|
||
custom classes is not the same as modifying e.g. the standard dict type to
|
||
handle keyword arguments. In fact, dict (as well as list and other stdlib
|
||
classes with indexing semantics) will remain the same and will continue not to
|
||
accept keyword arguments. In other words, if ``d`` is a ``dict``, the
|
||
statement ``d[1, a=2]`` will raise ``TypeError``, as its implementation will
|
||
not support the use of keyword arguments. The same holds for all other classes
|
||
(list, tuple, etc.).
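The rejection of keywords by the built-in types can already be observed by spelling out the method call that the proposed ``d[1, a=2]`` would translate to. This is a sketch of the dispatch described in this PEP, not of any new behavior:

```python
d = {1: "one"}

# Today's behavior is unchanged: the index is the only argument.
plain = type(d).__getitem__(d, 1)

# Model of the proposed d[1, a=2]: dict's method rejects the keyword.
try:
    type(d).__getitem__(d, 1, a=2)
except TypeError as exc:
    error = exc
else:
    error = None

print(plain, type(error).__name__)
```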
|
||
|
||
Corner cases and Gotchas
------------------------
|
||
|
||
With the introduction of the new notation, a few corner cases need to be analysed.
|
||
|
||
1. Technically, if a class defines its getter like this::
|
||
|
||
def __getitem__(self, index):
|
||
|
||
then the caller could call that using keyword syntax, like these two cases::
|
||
|
||
obj[3, index=4]
|
||
obj[index=1]
|
||
|
||
The resulting behavior would be an error automatically, since it would be like
|
||
attempting to call the method with two values for the ``index`` argument, and
|
||
a ``TypeError`` will be raised. In the first case, the ``index`` would be ``3``,
|
||
in the second case, it would be the empty tuple ``()``.
|
||
|
||
Note that this behavior applies for all currently existing classes that rely on
|
||
indexing, meaning that there is no way for the new behavior to introduce
|
||
backward compatibility issues in this respect.
|
||
|
||
Classes that wish to stress this behavior explicitly can define their
|
||
parameters as positional-only::
|
||
|
||
def __getitem__(self, index, /):
|
||
|
||
2. A similar case occurs with setter notation::
|
||
|
||
# Given type(obj).__setitem__(obj, index, value):
|
||
obj[1, value=3] = 5
|
||
|
||
This poses no issue because the value is passed automatically, and the Python interpreter will raise
|
||
``TypeError: got multiple values for keyword argument 'value'``
|
||
|
||
|
||
3. If the subscript dunders are declared to use positional-or-keyword
|
||
parameters, there may be some surprising cases when arguments are passed
|
||
to the method. Given the signature::
|
||
|
||
def __getitem__(self, index, direction='north'): ...
|
||
|
||
if the caller uses this::
|
||
|
||
obj[0, 'south']
|
||
|
||
they will probably be surprised by the method call::
|
||
|
||
# expected type(obj).__getitem__(obj, 0, direction='south')
|
||
# but actually get:
|
||
type(obj).__getitem__(obj, (0, 'south'), direction='north')
|
||
|
||
Solution: best practice suggests that keyword subscripts should be
|
||
flagged as keyword-only when possible::
|
||
|
||
def __getitem__(self, index, *, direction='north'): ...
|
||
|
||
The interpreter need not enforce this rule, as there could be scenarios
|
||
where this is the desired behaviour. But linters may choose to warn
|
||
about subscript methods which don't use the keyword-only flag.
|
||
|
||
4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.:
|
||
``d[1, a=3]`` is treated as ``__getitem__(d, 1, a=3)``, NOT ``__getitem__(d, (1,), a=3)``. It would be
|
||
extremely confusing if adding keyword arguments were to change the type of the passed index.
|
||
In other words, adding a keyword to a single-valued subscript will not change it into a tuple.
|
||
For those cases where an actual tuple needs to be passed, a proper syntax will have to be used::
|
||
|
||
obj[(1,), a=3]
|
||
# calls type(obj).__getitem__(obj, (1,), a=3)
|
||
|
||
In this case, the call is passing a single element (which is passed as is, as from rule above),
|
||
only that the single element happens to be a tuple.
|
||
|
||
Note that this behavior just reveals the truth that the ``obj[1,]`` notation is shorthand for
|
||
``obj[(1,)]`` (and also ``obj[1]`` is shorthand for ``obj[(1)]``, with the expected behavior).
|
||
When keywords are present, the rule that you can omit this outermost pair of parentheses is no
|
||
longer true::
|
||
|
||
obj[1]
|
||
# calls type(obj).__getitem__(obj, 1)
|
||
|
||
obj[1, a=3]
|
||
# calls type(obj).__getitem__(obj, 1, a=3)
|
||
|
||
obj[1,]
|
||
# calls type(obj).__getitem__(obj, (1,))
|
||
|
||
obj[(1,), a=3]
|
||
# calls type(obj).__getitem__(obj, (1,), a=3)
|
||
|
||
This is particularly relevant in the case where two entries are passed::
|
||
|
||
obj[1, 2]
|
||
# calls type(obj).__getitem__(obj, (1, 2))
|
||
|
||
obj[(1, 2)]
|
||
# same as above
|
||
|
||
obj[1, 2, a=3]
|
||
# calls type(obj).__getitem__(obj, (1, 2), a=3)
|
||
|
||
obj[(1, 2), a=3]
|
||
# calls type(obj).__getitem__(obj, (1, 2), a=3)
|
||
|
||
And particularly when the tuple is extracted as a variable::
|
||
|
||
t = (1, 2)
|
||
obj[t]
|
||
# calls type(obj).__getitem__(obj, (1, 2))
|
||
|
||
obj[t, a=3]
|
||
# calls type(obj).__getitem__(obj, (1, 2), a=3)
|
||
|
||
Why? Because in the case ``obj[1, 2, a=3]`` we are passing two elements (which
|
||
are then packed as a tuple and passed as the index). In the case ``obj[(1, 2), a=3]``
|
||
we are passing a single element (which is passed as is) which happens to be a tuple.
|
||
The final result is that they are the same.
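Corner cases 1 and 3 can be reproduced in current Python by writing out the method calls that the proposed syntax would generate. ``Compass`` and ``SafeCompass`` are illustrative names, and the positional-only marker ``/`` assumes Python 3.8 or later:

```python
class Compass:
    """Illustrative class with a positional-or-keyword `direction`."""

    def __getitem__(self, index, direction="north"):
        return index, direction


class SafeCompass:
    """Same idea, with a positional-only index and a keyword-only option."""

    def __getitem__(self, index, /, *, direction="north"):
        return index, direction


compass = Compass()

# Valid today: both values are packed into the index (corner case 3).
surprising = compass[0, "south"]

# Model of the proposed compass[0, direction="south"].
intended = type(compass).__getitem__(compass, 0, direction="south")

# Model of the proposed compass[3, index=4] (corner case 1).
try:
    type(compass).__getitem__(compass, 3, index=4)
except TypeError as exc:
    duplicate_error = exc

# The keyword-only flag makes a stray second positional value an error
# when the method is called directly.
try:
    SafeCompass().__getitem__(0, "south")
except TypeError as exc:
    positional_error = exc

print(surprising, intended)
```

Note that the keyword-only flag does not change what ``obj[0, 'south']`` passes: the subscript still delivers a single tuple as the index.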
|
||
|
||
C Interface
|
||
===========
|
||
|
||
Resolution of the indexing operation is performed through a call to the following functions:
|
||
|
||
- ``PyObject_GetItem(PyObject *o, PyObject *key)`` for the get operation
|
||
- ``PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value)`` for the set operation
|
||
- ``PyObject_DelItem(PyObject *o, PyObject *key)`` for the del operation
|
||
|
||
These functions are used extensively within the Python executable, and are
|
||
also part of the public C API, as exported by ``Include/abstract.h``. It is clear that
|
||
the signatures of these functions cannot be changed, and different C level functions
need to be implemented to support the extended call. We propose:
|
||
|
||
- ``PyObject_GetItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)``
|
||
- ``PyObject_SetItemWithKeywords(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)``
|
||
- ``PyObject_DelItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)``
|
||
|
||
New opcodes will be needed for the enhanced call. Currently, the
|
||
implementation uses ``BINARY_SUBSCR``, ``STORE_SUBSCR`` and ``DELETE_SUBSCR``
|
||
to invoke the old functions. We propose ``BINARY_SUBSCR_KW``,
|
||
``STORE_SUBSCR_KW`` and ``DELETE_SUBSCR_KW`` for the new operations. The
|
||
compiler will have to generate these new opcodes. The
|
||
old C implementations will call the extended methods passing ``NULL``
|
||
as kwargs.
|
||
|
||
Finally, the following new slots must be added to the ``PyMappingMethods`` struct:
|
||
|
||
- ``mp_subscript_kw``
|
||
- ``mp_ass_subscript_kw``
|
||
|
||
These slots will have the appropriate signature to handle the dictionary object
|
||
containing the keywords.
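The C functions themselves are not reproduced here. As a rough Python model only, the delegation from the old entry point to the extended one could look like the sketch below. The lower-case function names and the ``Labelled`` class are hypothetical, and ``None`` stands in for a ``NULL`` kwargs pointer:

```python
def get_item_with_keywords(o, key, kwargs):
    """Python model of the proposed PyObject_GetItemWithKeywords."""
    if kwargs:
        return type(o).__getitem__(o, key, **kwargs)
    # None stands in for a NULL kwargs pointer: plain subscript call.
    return type(o).__getitem__(o, key)


def get_item(o, key):
    """Python model of PyObject_GetItem delegating to the extended call."""
    return get_item_with_keywords(o, key, None)


class Labelled:
    """Illustrative class accepting an optional keyword in __getitem__."""

    def __getitem__(self, index, *, unit="m"):
        return f"{index}{unit}"


old_path = get_item(Labelled(), 5)
new_path = get_item_with_keywords(Labelled(), 5, {"unit": "km"})
builtin_path = get_item([10, 20, 30], 1)
print(old_path, new_path, builtin_path)
```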
|
||
|
||
"How to teach" recommendations
|
||
==============================
|
||
|
||
One request that occurred during feedback sessions was to detail a possible narrative
|
||
for teaching the feature, e.g. to students, data scientists, and similar audiences.
|
||
This section addresses that need.
|
||
|
||
We will only describe the indexing from the perspective of use, not of
|
||
implementation, because it is the aspect that the above mentioned audience will
|
||
likely encounter. Only a subset of users will have to implement their own
dunder methods, which can be considered advanced usage. A proper explanation could be:
|
||
|
||
The indexing operation is generally used to refer to a subset of a larger
|
||
dataset by means of an index. In the commonly seen cases, the index is made up of
|
||
one or more numbers, strings, slices, etc.
|
||
|
||
Some types may allow indexing to occur not only with the index, but also with
|
||
named values. These named values are given between square brackets using the
|
||
same syntax used for function call keyword arguments. The meaning of the names
|
||
and their use is found in the documentation of the type, as it varies from one
|
||
type to another.
|
||
|
||
The teacher will now show some practical real world examples, explaining the
|
||
semantics of the feature in the shown library. At the time of writing these
|
||
examples do not exist, obviously, but the libraries most likely to implement
|
||
the feature are pandas and numpy, possibly as a method to refer to columns by
|
||
name.
|
||
|
||
Reference Implementation
|
||
========================
|
||
|
||
A reference implementation is currently being developed here [#reference-impl]_.
|
||
|
||
|
||
Workarounds
|
||
===========
|
||
|
||
Every PEP that changes the Python language should :pep:`"clearly explain why
|
||
the existing language specification is inadequate to address the
|
||
problem that the PEP solves" <1#what-belongs-in-a-successful-pep>`.
|
||
|
||
Some rough equivalents to the proposed extension, which we call work-arounds,
|
||
are already possible. The work-arounds provide an alternative to enabling the
|
||
new syntax, while leaving the semantics to be defined elsewhere.
|
||
|
||
These work-arounds follow. In them the helpers ``H`` and ``P`` are not intended to
|
||
be universal. For example, a module or package might require the use of its own
|
||
helpers.
|
||
|
||
1. User defined classes can be given ``getitem`` and ``delitem`` methods,
|
||
that respectively get and delete values stored in a container::
|
||
|
||
>>> val = x.getitem(1, 2, a=3, b=4)
|
||
>>> x.delitem(1, 2, a=3, b=4)
|
||
|
||
The same can't be done for ``setitem``. It's not valid syntax::
|
||
|
||
>>> x.setitem(1, 2, a=3, b=4) = val
|
||
SyntaxError: can't assign to function call
|
||
|
||
2. A helper class, here called ``H``, can be used to swap the container
|
||
and parameter roles. In other words, we use::
|
||
|
||
H(1, 2, a=3, b=4)[x]
|
||
|
||
as a substitute for::
|
||
|
||
x[1, 2, a=3, b=4]
|
||
|
||
This method will work for ``getitem``, ``delitem`` and also for
|
||
``setitem``. This is because::
|
||
|
||
>>> H(1, 2, a=3, b=4)[x] = val
|
||
|
||
is valid syntax, which can be given the appropriate semantics.
|
||
|
||
3. A helper function, here called ``P``, can be used to store the
|
||
arguments in a single object. For example::
|
||
|
||
>>> x[P(1, 2, a=3, b=4)] = val
|
||
|
||
is valid syntax, and can be given the appropriate semantics.
|
||
|
||
4. The ``lo:hi:step`` syntax for slices is sometimes very useful. This
|
||
syntax is not directly available in the work-arounds. However::
|
||
|
||
s[lo:hi:step]
|
||
|
||
provides a work-around that is available everywhere, where::
|
||
|
||
    class S:
        def __getitem__(self, key): return key

    s = S()
|
||
|
||
defines the helper object ``s``.
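The work-arounds above leave ``H`` and ``P`` unspecified. One possible shape for them, under stated assumptions, is sketched below: ``P`` is written as a class rather than a function, ``Table`` is a hypothetical container that accepts ``P`` objects as its index, and the ``key()`` method is an invented detail used to make the stored arguments hashable:

```python
class P:
    """Hypothetical bundle holding the arguments of one subscript."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def key(self):
        # A summary of the arguments that ignores keyword order.
        return self.args, tuple(sorted(self.kwargs.items()))


class H(P):
    """Hypothetical helper that swaps the container and parameter roles."""

    def __getitem__(self, container):
        return container[P(*self.args, **self.kwargs)]

    def __setitem__(self, container, value):
        container[P(*self.args, **self.kwargs)] = value

    def __delitem__(self, container):
        del container[P(*self.args, **self.kwargs)]


class Table:
    """Hypothetical container that accepts P objects as its index."""

    def __init__(self):
        self._cells = {}

    def __getitem__(self, params):
        return self._cells[params.key()]

    def __setitem__(self, params, value):
        self._cells[params.key()] = value

    def __delitem__(self, params):
        del self._cells[params.key()]


class S:
    """The slice helper from work-around 4."""

    def __getitem__(self, key):
        return key


s = S()
x = Table()
x[P(1, 2, a=3, b=4)] = "val"       # work-around 3
via_p = x[P(1, 2, b=4, a=3)]       # keyword order does not matter here
H(1, 2, a=3, b=4)[x] = "other"     # work-around 2, setitem form
via_h = H(1, 2, a=3, b=4)[x]
del H(1, 2, a=3, b=4)[x]
remaining = len(x._cells)
sliced_key = P(time=s[0:12]).key()  # work-around 4 combined with P
print(via_p, via_h, remaining, sliced_key)
```

Each library would have to ship and document its own helpers of this kind, which is part of the argument for dedicated syntax.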
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Previous PEP 472 solutions
|
||
--------------------------
|
||
|
||
:pep:`472` presents a good number of ideas that are now all to be considered
rejected. A personal email from D'Aprano to the author specifically said:
|
||
|
||
I have now carefully read through PEP 472 in full, and I am afraid I
|
||
cannot support any of the strategies currently in the PEP.
|
||
|
||
We agree that those options are inferior to the one currently presented, for one
|
||
reason or another.
|
||
|
||
To keep this document compact, we will not present here the objections for
|
||
all options presented in :pep:`472`. Suffice to say that they were discussed,
|
||
and each proposed alternative had one or more dealbreakers.
|
||
|
||
Adding new dunders
------------------

It was proposed to introduce new dunders ``__(get|set|del)item_ex__``
that are invoked in preference to the ``__(get|set|del)item__`` triad, if they are present.

The rationale for this choice is to make adding keyword argument support to
square brackets more intuitive and in line with function call
behavior. Given::

    def __getitem_ex__(self, x, y): ...

These all just work and produce the same result effortlessly::

    obj[1, 2]
    obj[1, y=2]
    obj[y=2, x=1]

In other words, this solution would unify the behavior of ``__getitem__`` with the traditional
function signature, but since we can't change ``__getitem__`` and break backward compatibility,
we would have an extended version that is used preferentially.
|
||
The problems with this approach were found to be:

- It will slow down subscripting. For every subscript access, this new dunder
  attribute gets investigated on the class, and if it is not present then the
  default key translation function is executed.
  Different ideas were proposed to handle this, from wrapping the method
  only at class instantiation time, to adding a bit flag to signal the availability
  of these methods. Regardless of the solution, the new dunder would be effective
  only if added at class creation time, not if it's added later. This would
  be unusual, and monkeypatching the methods later, for whatever reason it
  might be needed, would either be disallowed or behave unexpectedly.

- It adds complexity to the mechanism.

- It will require a long and painful transition period during which
  libraries will have to somehow support both calling conventions, because most
  likely, the extended methods will delegate to the traditional ones when the
  right conditions are matched in the arguments, or some classes will support
  the traditional dunder and others the extended dunder. While this will not
  affect calling code, it will affect development.

- It would potentially lead to mixed situations where the extended version is
  defined for the getter, but not for the setter.

- In the ``__setitem_ex__`` signature, the value would have to be made the first
  element, because the index is of arbitrary length depending on the specified
  indexes. This would look awkward because the visual notation does not match
  the signature::

      obj[1, 2] = 3
      # calls type(obj).__setitem_ex__(obj, 3, 1, 2)

- The solution relies on the assumption that all keyword indices necessarily map
  into positional indices, or that they must have a name. This assumption may be
  false: xarray, which is the primary Python package for numpy arrays with
  labelled dimensions, supports indexing by additional dimensions (so called
  "non-dimension coordinates") that don't correspond directly to the dimensions
  of the underlying numpy array, and those have no position to match up to.
  In other words, anonymous indexes are a plausible use case that this solution
  would remove, although it could be argued that using ``*args`` would solve
  that issue.
|
||
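As a rough illustration of the lookup order described in the first point, the
rejected dispatch could be emulated in pure Python as below. The function
``subscript`` and the classes ``Matrix`` and ``Legacy`` are hypothetical names
introduced for this sketch; a real implementation would live in the
interpreter, and the fallback branch assumes the key translation used
elsewhere in this PEP.

```python
def subscript(obj, *args, **kwargs):
    """Emulate ``obj[*args, **kwargs]`` under the rejected ``__getitem_ex__`` idea."""
    cls = type(obj)
    # This extra class lookup on every access is the slowdown described above.
    extended = getattr(cls, "__getitem_ex__", None)
    if extended is not None:
        # Function-like binding: indices map straight onto parameters.
        return extended(obj, *args, **kwargs)
    # Fallback: translate the positional indices into a single key.
    key = args[0] if len(args) == 1 else args
    return cls.__getitem__(obj, key, **kwargs)


class Matrix:
    """Hypothetical class opting in to the extended dunder."""

    def __getitem_ex__(self, x, y):
        return (x, y)


class Legacy:
    """Hypothetical class that only defines the traditional dunder."""

    def __getitem__(self, key):
        return key


m = Matrix()
positional = subscript(m, 1, 2)        # obj[1, 2]
mixed = subscript(m, 1, y=2)           # obj[1, y=2]
keywords = subscript(m, y=2, x=1)      # obj[y=2, x=1]
legacy_pair = subscript(Legacy(), 1, 2)
legacy_single = subscript(Legacy(), 1)
```

Because the lookup goes through ``type(obj)``, an attribute patched onto an
instance later is ignored in this sketch, which mirrors the monkeypatching
concern above.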
|
||
Adding an adapter function
--------------------------

This is similar to the above, in the sense that a pre-function would be called to
convert the "new style" indexing into "old style" indexing that is then passed
on to the traditional dunders. It has problems similar to the above.
|
||
Create a new "kwslice" object
-----------------------------

This proposal has already been explored in "New arguments contents" P4 in :pep:`472`::

    obj[a, b:c, x=1]
    # calls type(obj).__getitem__(obj, a, slice(b, c), key(x=1))

This solution requires everyone who needs keyword arguments to parse the tuple
and/or key object by hand to extract them. This is painful, and it opens the door to the
get/set/del function always accepting arbitrary keyword arguments, whether they
make sense or not. We want the developer to be able to specify which arguments
make sense and which ones do not.
|
||
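A minimal sketch of the hand parsing this would impose, assuming a
hypothetical ``key`` class standing in for the "kwslice" object and a
hypothetical ``Table`` container. Since the keyword syntax does not exist, the
``key`` object is written out explicitly at the call site.

```python
class key:
    """Hypothetical stand-in for the "kwslice" object carrying keywords."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Table:
    """Hypothetical container that must unpack its index by hand."""

    def __getitem__(self, index):
        # Normalise a single index to a tuple, then split out the keywords.
        items = index if isinstance(index, tuple) else (index,)
        positional = tuple(i for i in items if not isinstance(i, key))
        keywords = {}
        for item in items:
            if isinstance(item, key):
                keywords.update(item.kwargs)
        # Nothing stops a caller from passing keywords that make no sense
        # here; rejecting them would need yet more hand-written checks.
        return positional, keywords


t = Table()
parsed = t[1, 2:5, key(x=1)]
unexpected = t[1, key(nonsense=True)]
```

Every class wanting keyword support would need some variant of this
boilerplate, whereas an ordinary parameter list would let Python reject
unknown names automatically.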
|
||
|
||
Using a single bit to change the behavior
-----------------------------------------

A special class dunder flag::

    __keyfn__ = True

would change the signature of the ``__(get|set|del)item__`` methods to a "function-like" dispatch,
meaning that this::

    >>> d[1, 2, z=3]

would result in a call to::

    >>> type(d).__getitem__(d, 1, 2, z=3)
    # instead of type(d).__getitem__(d, (1, 2), z=3)

This option has been rejected because it feels odd that the signature of a method
depends on a specific value of another dunder. It would be confusing for both
static type checkers and for humans: a static type checker would have to hard-code
a special case for this, because there really is nothing else in Python
where the signature of a dunder depends on the value of another dunder.
A human who has to implement a ``__getitem__`` dunder would have to check whether the
class (or any of its subclasses) defines ``__keyfn__`` before the dunder can be written.
Moreover, adding a base class that has the ``__keyfn__`` flag set would break
the signature of the current methods. This would be even more problematic if the
flag is changed at runtime, or if the flag is generated by calling a function
that randomly returns True or something else.
|
||
Allowing for empty index notation obj[]
---------------------------------------

The current proposal prevents ``obj[]`` from being valid notation. However,
a commenter stated:

    We have ``Tuple[int, int]`` as a tuple of two integers. And we have ``Tuple[int]``
    as a tuple of one integer. And occasionally we need to spell a tuple of *no*
    values, since that's the type of ``()``. But we currently are forced to write
    that as ``Tuple[()]``. If we allowed ``Tuple[]`` that odd edge case would be
    removed.

    So I probably would be okay with allowing ``obj[]`` syntactically, as long as the
    dict type could be made to reject it.

This proposal already established that, in case no positional index is given, the
passed value must be the empty tuple. Allowing for the empty index notation would
make the dictionary type accept it automatically, to insert or refer to the value with
the empty tuple as key. Moreover, a typing notation such as ``Tuple[]`` can easily
be written as ``Tuple`` without the indexing notation.

However, subsequent discussion with Brandt Bucher during implementation has revealed
that the case ``obj[]`` would fit a natural evolution for variadic generics, giving
more strength to the above comment. In the end, after a discussion between D'Aprano,
Bucher and the author, we decided to leave the ``obj[]`` notation as a syntax
error for now, and possibly extend the notation with an additional PEP to establish
the equivalence of ``obj[]`` and ``obj[()]``.
|
||
Sentinel value for no given positional index
--------------------------------------------

The topic of which value to pass as the index in the case of::

    obj[k=3]

has been considerably debated.

One apparently rational choice would be to pass no value at all, by making use of
the keyword-only argument feature, but unfortunately this will not work well with
the ``__setitem__`` dunder, as a positional element for the value is always
passed, and we can't "skip over" the index one unless we introduce a very weird behavior
where the first argument refers to the index when specified, and to the value when
the index is not specified. This is extremely deceptive and error prone.

The above consideration makes it impossible to have a keyword-only dunder, and
opens up the question of what entity to pass for the index position when no index
is passed::

    obj[k=3] = 5
    # would call type(obj).__setitem__(obj, ???, 5, k=3)

A proposed hack would be to let the user specify which entity to use when an
index is not specified, by specifying a default for the ``index``, but this
necessarily forces the user to also specify a default for the ``value`` (one
that is never going to be used, as a value is always passed by design), as we
can't have non-default arguments after defaulted ones::

    def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k): ...

which seems ugly, redundant and confusing. We must therefore accept that some
form of sentinel index must be passed by the Python implementation when the
``obj[k=3]`` notation is used. This also means that default arguments to those
parameters are simply never going to be used (but it's already the
case with the current implementation, so no change there).
|
||
Additionally, some classes may want to use ``**kwargs`` instead of a keyword-only
argument. Given a definition like::

    def __setitem__(self, index, value, **kwargs): ...

a user who wants to pass a keyword ``value``::

    x[value=1] = 0

expecting a call like::

    type(x).__setitem__(x, SENTINEL, 0, **{"value": 1})

will instead find the keyword accidentally caught by the parameter named ``value``,
producing a ``TypeError`` ("got multiple values for argument"). The user should not
have to worry about the actual
local names of those two arguments if they are, for all practical purposes,
positional only. Unfortunately, using positional-only parameters will ensure this
does not happen, but it will still not solve the need to pass both ``index`` and
``value`` even when the index is not provided. The point is that the user should not
be prevented from using keyword arguments to refer to a column ``index``, ``value``
(or ``self``) just because the class implementor happens to use those names
in the parameter list.
|
||
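The name clash can be reproduced today by calling the dunder directly, as in
the sketch below. ``Frame`` and ``SafeFrame`` are hypothetical classes, and
``SENTINEL`` is set to the empty tuple purely as a placeholder for whatever
the interpreter would pass.

```python
SENTINEL = ()  # placeholder for the value passed when no index is given


class Frame:
    """Hypothetical class using ordinary parameter names."""

    def __setitem__(self, index, value, **kwargs):
        self.last = (index, value, kwargs)


class SafeFrame:
    """Same class with positional-only parameters (Python 3.8+)."""

    def __setitem__(self, index, value, /, **kwargs):
        self.last = (index, value, kwargs)


x = Frame()
try:
    # What ``x[value=1] = 0`` would translate to under this proposal.
    type(x).__setitem__(x, SENTINEL, 0, **{"value": 1})
except TypeError as exc:
    clash = str(exc)   # the keyword collides with the ``value`` parameter
else:
    clash = None

y = SafeFrame()
# With positional-only parameters the keyword lands in ``kwargs`` instead.
type(y).__setitem__(y, SENTINEL, 0, **{"value": 1})
```

The positional-only version avoids the clash, but as noted above the
implementation must still supply something in the ``index`` position.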
|
||
Moreover, we also require the three dunders to behave in the same way: it would
be extremely inconvenient if only ``__setitem__`` were to receive this
sentinel, while ``__(get|del)item__`` did not because they can get away with a
signature that allows for no index specification, thus allowing for a
user-specified default index.

Whatever the choice of the sentinel, it will make the following cases
degenerate and thus impossible to differentiate in the dunder::

    obj[k=3]
    obj[SENTINEL, k=3]

The question now shifts to which entity should represent the sentinel.
The options were:

1. Empty tuple
2. None
3. NotImplemented
4. a new sentinel object (e.g. NoIndex)

For option 1, the call will become::

    type(obj).__getitem__(obj, (), k=3)

therefore making ``obj[k=3]`` and ``obj[(), k=3]`` degenerate and indistinguishable.
|
||
This option sounds appealing because:

1. The numpy community was consulted [#numpy-ml]_, and the general consensus
   of the responses was that the empty tuple felt appropriate.
2. It shows a parallel with the behavior of ``*args`` in a function, when
   no positional arguments are given::

       >>> def foo(*args, **kwargs):
       ...     print(args, kwargs)
       ...
       >>> foo(k=3)
       () {'k': 3}

   We do accept the following asymmetry in behavior compared to functions
   when a single value is passed, but that ship has sailed::

       >>> foo(5, k=3)
       (5,) {'k': 3}   # for indexing, a plain 5, not a 1-tuple is passed

For option 2, using ``None``, it was objected that NumPy uses it to indicate
inserting a new axis/dimension (there's a ``np.newaxis`` alias as well)::

    import numpy as np

    arr = np.array(5)
    arr.ndim == 0
    arr[None].ndim == arr[None,].ndim == 1

While this is not an insurmountable issue, it certainly will ripple onto numpy.

The only issue with both of the above is that both the empty tuple and None are
potentially legitimate indexes, and there might be value in being able to differentiate
the two degenerate cases.
|
||
So, an alternative strategy (option 3) would be to use an existing entity that is
unlikely to be used as a valid index. One option could be the current built-in constant
``NotImplemented``, which is currently returned by operator methods to
report that they do not implement a particular operation, and a different strategy
should be attempted (e.g. to ask the other object). Unfortunately, its name and
traditional use call back to a feature that is not available, rather than the
fact that something was not passed by the user.

This leaves us with option 4: a new built-in constant. This constant
must be unhashable (so it's never going to be a valid key) and have a clear
name that makes its context obvious: ``NoIndex``. This
would solve all the above issues, but the question is: is it worth it?
|
||
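For illustration only, option 4 might look like the following sketch.
``NoIndex`` was never added to Python; the class ``_NoIndexType`` and the
container ``Store`` are hypothetical, and the dunder is called directly
because the keyword syntax does not exist.

```python
class _NoIndexType:
    """Hypothetical singleton meaning "no positional index was given"."""

    __slots__ = ()
    __hash__ = None  # unhashable, so it can never be used as a dict key

    def __repr__(self):
        return "NoIndex"


NoIndex = _NoIndexType()


class Store:
    """Hypothetical container that tells the two cases apart."""

    def __getitem__(self, index, **kwargs):
        if index is NoIndex:
            return ("keywords only", kwargs)
        return ("indexed", index, kwargs)


obj = Store()
only_keywords = type(obj).__getitem__(obj, NoIndex, k=3)  # would be obj[k=3]
empty_tuple = type(obj).__getitem__(obj, (), k=3)         # would be obj[(), k=3]

try:
    {NoIndex: 1}
except TypeError:
    usable_as_key = False
else:
    usable_as_key = True
```

With a dedicated sentinel the dunder can distinguish ``obj[k=3]`` from
``obj[(), k=3]``, at the price of a new built-in name.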
|
||
From a quick inquiry, it seems that most people on python-ideas believe
it's not crucial, and the empty tuple is an acceptable option. Hence the
resulting series will be::

    obj[k=3]
    # type(obj).__getitem__(obj, (), k=3). Empty tuple

    obj[1, k=3]
    # type(obj).__getitem__(obj, 1, k=3). Integer

    obj[1, 2, k=3]
    # type(obj).__getitem__(obj, (1, 2), k=3). Tuple

and the following two notations will be degenerate::

    obj[(), k=3]
    # type(obj).__getitem__(obj, (), k=3)

    obj[k=3]
    # type(obj).__getitem__(obj, (), k=3)
|
||
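The translation listed above can be emulated with a small helper, shown here
as a sketch rather than as the reference implementation. The names
``translate_index``, ``emulate_getitem`` and ``Recorder`` are hypothetical;
the function call stands in for the proposed square-bracket syntax.

```python
def translate_index(*args):
    """Positional index a dunder would receive for ``obj[*args, ...]``."""
    if len(args) == 1:
        return args[0]   # a single index is passed through unchanged
    return args          # no index gives (), several give a tuple


class Recorder:
    """Hypothetical container that reports what it was called with."""

    def __getitem__(self, index, **kwargs):
        return index, kwargs


def emulate_getitem(obj, *args, **kwargs):
    # Stand-in for the proposed ``obj[...]`` syntax with keywords.
    return type(obj).__getitem__(obj, translate_index(*args), **kwargs)


obj = Recorder()
no_index = emulate_getitem(obj, k=3)             # obj[k=3]
one_index = emulate_getitem(obj, 1, k=3)         # obj[1, k=3]
two_indices = emulate_getitem(obj, 1, 2, k=3)    # obj[1, 2, k=3]
explicit_empty = emulate_getitem(obj, (), k=3)   # obj[(), k=3]
```

Here ``no_index`` and ``explicit_empty`` compare equal, which is exactly the
degenerate pair the text accepts.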
|
||
Common objections
=================

1. Just use a method call.

   One of the use cases is typing, where indexing is used exclusively, and
   function calls are out of the question. Moreover, function calls do not handle
   slice notation, which is commonly used in some cases for arrays.

   One problem is that type hint creation has been extended to built-ins in Python 3.9,
   so that you do not have to import Dict, List, et al. anymore.

   Without keyword arguments inside ``[]``, you would not be able to do this::

       Vector = dict[i=float, j=float]

   but for obvious reasons, call syntax using builtins to create custom type hints
   isn't an option::

       dict(i=float, j=float)
       # would create a dictionary, not a type

   Finally, function calls do not allow for a setitem-like notation, as shown
   in the Overview: operations such as ``f(1, x=3) = 5`` are not allowed, whereas
   the equivalent indexing operations are.
|
||
References
==========

.. [#rejection] "Rejection of PEP 472"
       (https://mail.python.org/pipermail/python-dev/2019-March/156693.html)
.. [#request-1] "Allow kwargs in __{get|set|del}item__"
       (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)
.. [#request-2] "PEP 472 -- Support for indexing with keyword arguments"
       (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)
.. [#trio-run] "trio.run() should take \*\*kwargs in addition to \*args"
       (https://github.com/python-trio/trio/issues/470)
.. [#numpy-ml] "[Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments"
       (http://numpy-discussion.10968.n7.nabble.com/Request-for-comments-on-PEP-637-Support-for-indexing-with-keyword-arguments-td48489.html)
.. [#reference-impl] "Reference implementation"
       (https://github.com/python/cpython/compare/master...stefanoborini:PEP-637-implementation-attempt-2)
|
||
Copyright
=========

This document has been placed in the public domain.
|
||
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: