python-peps/pep-0637.txt

1016 lines
33 KiB
Plaintext
Raw Normal View History

PEP: 637
Title: Support for indexing with keyword arguments
Version: $Revision$
Last-Modified: $Date$
Author: Stefano Borini, Jonathan Fine
Sponsor: Steven D'Aprano
Discussions-To: python-ideas@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Aug-2020
Python-Version: 3.10
Post-History: 23-Sep-2020
Resolution:
Abstract
========
At present keyword arguments are allowed in function calls, but not in
item access. This PEP proposes that Python be extended to allow keyword
arguments in item access.
The following example shows keyword arguments for ordinary function calls:
::
>>> val = f(1, 2, a=3, b=4)
The proposal would extend the syntax to allow a similar construct
to indexing operations:
::
>>> val = x[1, 2, a=3, b=4] # getitem
>>> x[1, 2, a=3, b=4] = val # setitem
>>> del x[1, 2, a=3, b=4] # delitem
and would also provide appropriate semantics.
This PEP is a successor to PEP 472, which was rejected due to lack of
interest in 2019. Since then there's been renewed interest in the feature.
Overview
========
Background
----------
PEP 472 was opened in 2014. The PEP detailed various use cases and was created by
extracting implementation strategies from a broad discussion on the
python-ideas mailing list, although no clear consensus was reached on which strategy
should be used. Many corner cases have been examined more closely and felt
awkward, backward incompatible or both.
The PEP was eventually rejected in 2019 [#rejection]_ mostly
due to lack of interest for the feature despite its 5 years of existence.
However, with the introduction of type hints in PEP 484 [#pep-0484]_ the
square bracket notation has been used consistently to enrich the typing
annotations, e.g. to specify a list of integers as Sequence[int]. Additionally,
there has been an expanded growth of packages for data analysis such as pandas
and xarray, which use names to describe columns in a table (pandas) or axis in
an nd-array (xarray). These packages allow users to access specific data by
names, but cannot currently use index notation ([]) for this functionality.
As a result, a renewed interest in a more flexible syntax that would allow for
named information has been expressed occasionally in many different threads on
python-ideas, recently by Caleb Donovick [#request-1]_ in 2019 and Andras
Tantos [#request-2]_ in 2020. These requests prompted a strong activity on the
python-ideas mailing list, where the various options have been re-discussed and
a general consensus on an implementation strategy has now been reached.
Use cases
---------
The following practical use cases present different cases where a keyworded
specification would improve notation and provide additional value:
1. To provide a more communicative meaning to the index, preventing e.g. accidental
inversion of indexes
::
>>> grid_position[x=3, y=5, z=8]
>>> rain_amount[time=0:12, location=location]
>>> matrix[row=20, col=40]
2. To enrich the typing notation with keywords, especially during the use of generics
::
def function(value: MyType[T=int]):
3. In some domain, such as computational physics and chemistry, the use of a
notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent
a level of accuracy
::
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
4. Pandas currently uses a notation such as
::
>>> df[df['x'] == 1]
which could be replaced with df[x=1].
5. xarray has named dimensions. Currently these are handled with functions .isel:
::
>>> data.isel(row=10) # Returns the tenth row
which could also be replaced with `data[row=10]`. A more complex example:
::
>>> # old syntax
>>> da.isel(space=0, time=slice(None, 2))[...] = spam
>>> # new syntax
>>> da[space=0, time=:2] = spam
Another example:
::
>>> # old syntax
>>> ds["empty"].loc[dict(lon=5, lat=6)] = 10
>>> # new syntax
>>> ds["empty"][lon=5, lat=6] = 10
>>> # old syntax
>>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
>>> # new syntax
>>> ds["empty"][lon=1:5, lat=6:] = 10
It is important to note that how the notation is interpreted is up to the
implementation. This PEP only defines and dictates the behavior of python
regarding passed keyword arguments, not how these arguments should be
interpreted and used by the implementing class.
Syntax and Semantics
====================
Current status
--------------
Before attacking the problem of detailing the new syntax and semantics to the
indexing notation, it is relevant to analyse how the indexing notation works
today, in which contexts, and how it is different from a function call.
Subscripting ``obj[x]`` is, effectively, an alternate and specialised form of
function call syntax with a number of differences and restrictions compared to
``obj(x)``. The current python syntax focuses exclusively on position to express
the index, and also contains syntactic sugar to refer to non-punctiform
selection (slices). Some common examples:
::
>>> a[3] # returns the fourth element of 'a'
>>> a[1:10:2] # slice notation (extract a non-trivial data subset)
>>> a[3, 2] # multiple indexes (for multidimensional arrays)
This translates into a ``__(get|set|del)item__`` dunder call which is passed a single
parameter containing the index (for ``__getitem__`` and ``__delitem__``) or two parameters
containing index and value (for ``__setitem__``).
The behavior of the indexing call is fundamentally different from a function call
in various aspects:
The first difference is in meaning to the reader. A function call says
"arbitrary function call potentially with side-effects". An indexing operation
says "lookup", typically to point at a subset or specific sub-aspect of an
entity (as in the case of typing notation). This fundamental difference means
that, while we cannot prevent abuse, implementors should be aware that the
introduction of keyword arguments to alter the behavior of the lookup may
violate this intrinsic meaning.
The second difference of the indexing notation compared to a function
is that indexing can be used for both getting and setting operations.
In python, a function cannot be on the left hand side of an assignment. In
other words, both of these are valid
::
>>> x = a[1, 2]
>>> a[1, 2] = 5
but only the first one of these is valid
::
>>> x = f(1, 2)
>>> f(1, 2) = 5 # invalid
This asymmetry is important, and makes one understand that there is a natural
imbalance between the two forms. It is therefore not a given that the two
should behave transparently and symmetrically.
The third difference is that functions have names assigned to their
arguments, unless the passed parameters are captured with \*args, in which case
they end up as entries in the args tuple. In other words, functions already
have anonymous argument semantic, exactly like the indexing operation. However,
__(get|set|del)item__ is not always receiving a tuple as the ``index`` argument
(to be uniform in behavior with \*args). In fact, given a trivial class:
::
class X:
def __getitem__(self, index):
print(index)
The index operation basically forwards the content of the square brackets "as is"
in the ``index`` argument:
::
>>> x=X()
>>> x[0]
0
>>> x[0, 1]
(0, 1)
>>> x[(0, 1)]
(0, 1)
>>>
>>> x[()]
()
>>> x[{1, 2, 3}]
{1, 2, 3}
>>> x["hello"]
hello
>>> x["hello", "hi"]
('hello', 'hi')
The fourth difference is that the indexing operation knows how to convert
colon notations to slices, thanks to support from the parser. This is valid
::
a[1:3]
this one isn't
::
f(1:3)
The fifth difference is that there's no zero-argument form. This is valid
::
f()
this one isn't
::
a[]
New Proposal
------------
Before describing the new proposal, it is important to stress the difference in
nomenclature between _index_ and keyword _argument_, as it is important to
understand the fundamental asymmetry between the two. The ``__(get|set|del)item__``
is fundamentally an indexing operation, and the way the element is retrieved,
set, or deleted is through an index.
The current status quo is to build a _final_ index from what is passed between
square brackets, the _positional_ index. In other words, what is passed in the
square brackets is trivially used to generate what the code in ``__getitem__`` then uses
for the indicisation operation. As we already saw for the dict, ``d[1]`` has a
positional index of ``1`` and also a final index of ``1`` (because it's the element that is
then added to the dictionary) and ``d[1, 2]`` has positional index of ``(1, 2)`` and
final index also of ``(1, 2)`` (because yet again it's the element that is added to the dictionary).
However, the positional index ``d[1,2:3]`` is not accepted by the dictionary, because
there's no way to transform the positional index into a final index, as the slice object is
unhashable. The positional index is what is currently known as the ``index`` parameter in
``__getitem__``. Nevertheless, nothing prevents to construct a dictionary-like class that
creates the final index by e.g. converting the positional index to a string.
The new proposal extends the current status quo, and grants more flexibility to
create the _final_ index via an enhanced syntax that combines the positional index
and keyword arguments, if passed.
The above brings an important point across. Keyword arguments, in the context of the index
operation, may be used to take indexing decisions to obtain the final index, and therefore
will have to accept values that are unconventional for functions. See for
example use case 1, where a slice is accepted.
The new notation will make all of the following valid notation:
::
>>> a[1] # Current case, single index
>>> a[1, 2] # Current case, multiple indexes
>>> a[1, 2:5] # Current case, slicing.
>>> a[3, R=3, K=4] # New case. Single index, and keyword arguments
>>> a[K=3, R=2] # New case. No index with keyword arguments
>>> a[3, R=3:10, K=4] # New case. Slice in keyword argument
>>> a[3, R=..., K=4] # New case. Ellipsis in keyword argument
The new notation will NOT make the following valid notation:
::
>>> a[] # INVALID. No index and no keyword arguments.
It is worth stressing out that none of what is proposed in this PEP will change
the behavior of the current core classes that use indexing. Adding keywords to
the index operation for custom classes is not the same as modifying e.g. the
standard dict type to handle keyword arguments. In fact, dict (as well as list and other
stdlib classes with indexing semantics) will remain the same and will continue
not to accept keyword arguments.
Syntax and Semantics
====================
The following old semantics are preserved:
1. As said above, an empty subscript is still illegal, regardless of context.
::
obj[] # SyntaxError
2. A single index value remains a single index value when passed:
::
obj[index]
# calls type(obj).__getitem__(obj, index)
obj[index] = value
# calls type(obj).__setitem__(obj, index, value)
del obj[index]
# calls type(obj).__delitem__(obj, index)
This remains the case even if the index is followed by keywords; see point 5 below.
3. Comma-seperated arguments are still parsed as a tuple and passed as
a single positional argument:
::
obj[spam, eggs]
# calls type(obj).__getitem__(obj, (spam, eggs))
obj[spam, eggs] = value
# calls type(obj).__setitem__(obj, (spam, eggs), value)
del obj[spam, eggs]
# calls type(obj).__delitem__(obj, (spam, eggs))
The points above mean that classes which do not want to support keyword
arguments in subscripts need do nothing at all, and the feature is therefore
completely backwards compatible.
4. Keyword arguments, if any, must follow positional arguments.
::
obj[1, 2, spam=None, 3] # SyntaxError
This is like function calls, where intermixing positional and keyword
arguments give a SyntaxError.
5. Keyword subscripts, if any, will be handled like they are in
function calls. Examples:
::
# Single index with keywords:
obj[index, spam=1, eggs=2]
# calls type(obj).__getitem__(obj, index, spam=1, eggs=2)
obj[index, spam=1, eggs=2] = value
# calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2)
del obj[index, spam=1, eggs=2]
# calls type(obj).__delitem__(obj, index, spam=1, eggs=2)
# Comma-separated indices with keywords:
obj[foo, bar, spam=1, eggs=2]
# calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2)
obj[foo, bar, spam=1, eggs=2] = value
# calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2)
del obj[foo, bar, spam=1, eggs=2]
# calls type(obj).__detitem__(obj, (foo, bar), spam=1, eggs=2)
Note that:
- a single positional index will not turn into a tuple
just because one adds a keyword value.
- for ``__setitem__``, the same order is retained for index and value.
The keyword arguments go at the end, as is normal for a function
definition.
6. The same rules apply with respect to keyword subscripts as for
keywords in function calls:
- the interpeter matches up each keyword subscript to a named parameter
in the appropriate method;
- if a named parameter is used twice, that is an error;
- if there are any named parameters left over (without a value) when the
keywords are all used, they are assigned their default value (if any);
- if any such parameter doesn't have a default, that is an error;
- if there are any keyword subscripts remaining after all the named
parameters are filled, and the method has a ``**kwargs`` parameter,
they are bound to the ``**kwargs`` parameter as a dict;
- but if no ``**kwargs`` parameter is defined, it is an error.
7. Sequence unpacking remains a syntax error inside subscripts:
::
obj[*items]
Reason: unpacking items would result it being immediately repacked into
a tuple. Anyone using sequence unpacking in the subscript is probably
confused as to what is happening, and it is best if they receive an
immediate syntax error with an informative error message.
This restriction has however been considered arbitrary by some, and it might
be lifted at a later stage for symmetry with kwargs unpacking, see next.
8. Dict unpacking is permitted:
::
items = {'spam': 1, 'eggs': 2}
obj[index, **items]
# equivalent to obj[index, spam=1, eggs=2]
9. Keyword-only subscripts are permitted. The positional index will be the empty tuple:
::
obj[spam=1, eggs=2]
# calls type(obj).__getitem__(obj, (), spam=1, eggs=2)
obj[spam=1, eggs=2] = 5
# calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2)
del obj[spam=1, eggs=2]
# calls type(obj).__delitem__(obj, (), spam=1, eggs=2)
10. Keyword arguments must allow slice syntax.
::
obj[3:4, spam=1:4, eggs=2]
# calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2)
This may open up the possibility to accept the same syntax for general function
calls, but this is not part of this recommendation.
11. Keyword arguments must allow Ellipsis
::
obj[..., spam=..., eggs=2]
# calls type(obj).__getitem__(obj, Ellipsis, spam=Ellipsis, eggs=2)
12. Keyword arguments allow for default values
::
# Given type(obj).__getitem__(obj, index, spam=True, eggs=2)
obj[3] # Valid. index = 3, spam = True, eggs = 2
obj[3, spam=False] # Valid. index = 3, spam = False, eggs = 2
obj[spam=False] # Valid. index = (), spam = False, eggs = 2
obj[] # Invalid.
13. The same semantics given above must be extended to ``__class__getitem__``:
Since PEP 560, type hints are dispatched so that for ``x[y]``, if no
``__getitem__`` method is found, and ``x`` is a type (class) object,
and ``x`` has a class method ``__class_getitem__``, that method is
called. The same changes should be applied to this method as well,
so that a writing like ``list[T=int]`` can be accepted.
Existing indexing implementations in standard classes
-----------------------------------------------------
As said before, we recommend that current classes that use indexing operations
do not modify their behavior. In other words, if ``d`` is a ``dict``, the
statement ``d[1, a=2]`` will raise ``TypeError``, as their implementation will
not support the use of keyword arguments. The same holds for all other classes
(list, frozendict, etc.)
Corner case and Gotchas
-----------------------
With the introduction of the new notation, a few corner cases need to be analysed:
1. Technically, if a class defines their getter like this:
::
def __getitem__(self, index):
then the caller could call that using keyword syntax, like these two cases:
::
obj[3, index=4]
obj[index=1]
The resulting behavior would be an error automatically, since it would be like
attempting to call the method with two values for the ``index`` argument, and
a ``TypeError`` will be raised. In the first case, the ``index`` would be ``3``,
in the second case, it would be the empty tuple ``()``.
Note that this behavior applies for all currently existing classes that rely on
indexing, meaning that there is no way for the new behavior to introduce
backward compatibility issues on this respect.
Classes that wish to stress this behavior explicitly can define their
parameters as positional-only:
::
def __getitem__(self, index, /):
2. a similar case occurs with setter notation
::
# Given type(obj).__getitem__(self, index, value):
obj[1, value=3] = 5
This poses no issue because the value is passed automatically, and the python interpreter will raise
``TypeError: got multiple values for keyword argument 'value'``
3. If the subscript dunders are declared to use positional-or-keyword
parameters, there may be some surprising cases when arguments are passed
to the method. Given the signature:
::
def __getitem__(self, index, direction='north')
if the caller uses this:
::
obj[0, 'south']
they will probably be surprised by the method call:
::
# expected type(obj).__getitem__(0, direction='south')
# but actually get:
obj.__getitem__((0, 'south'), direction='north')
Solution: best practice suggests that keyword subscripts should be
flagged as keyword-only when possible:
::
def __getitem__(self, index, *, direction='north')
The interpreter need not enforce this rule, as there could be scenarios
where this is the desired behaviour. But linters may choose to warn
about subscript methods which don't use the keyword-only flag.
4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.:
``d[1, a=3]`` is treated as ``__getitem__(1, a=3)``, NOT ``__getitem__((1,), a=3)``. It would be
extremely confusing if adding keyword arguments were to change the type of the passed index.
In other words, adding a keyword to a single-valued subscript will not change it into a tuple.
For those cases where an actual tuple needs to be passed, a proper syntax will have to be used:
::
obj[(1,), a=3] # calls __getitem__((1,), a=3)
In this case, the call is passing a single element (which is passed as is, as from rule above),
only that the single element happens to be a tuple.
Note that this behavior just reveals the truth that the ``obj[1,]`` notation is shorthand for
``obj[(1,)]`` (and also ``obj[1]`` is shorthand for ``obj[(1)]``, with the expected behavior).
When keywords are present, the rule that you can omit this outermost pair of parentheses is no
longer true.
::
obj[1] # calls __getitem__(1)
obj[1, a=3] # calls __getitem__(1, a=3)
obj[1,] # calls __getitem__((1,))
obj[(1,), a=3] # calls __getitem__((1,), a=3)
This is particularly relevant in the case where two entries are passed:
::
obj[1, 2] # calls __getitem__((1, 2))
obj[(1, 2)] # same as above
obj[1, 2, a=3] # calls __getitem__((1, 2), a=3)
obj[(1, 2), a=3] # calls __getitem__((1, 2), a=3)
And particularly when the tuple is extracted as a variable:
::
t = (1, 2)
obj[t] # calls __getitem__((1, 2))
obj[t, a=3] # calls __getitem__((1, 2), a=3)
Why? because in the case ``obj[1, 2, a=3]`` we are passing two elements (which
are then packed as a tuple and passed as the index). In the case ``obj[(1, 2), a=3]``
we are passing a single element (which is passed as is) which happens to be a tuple.
The final result is that they are the same.
C Interface
===========
Resolution of the indexing operation is performed through a call to
``PyObject_GetItem(PyObject *o, PyObject *key)`` for the get operation,
``PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value)`` for the set operation, and
``PyObject_DelItem(PyObject *o, PyObject *key)`` for the del operation.
These functions are used extensively within the python executable, and are
also part of the public C API, as exported by ``Include/abstract.h``. It is clear that
the signature of this function cannot be changed, and different C level functions
need to be implemented to support the extended call. We propose
``PyObject_GetItemEx(PyObject *o, PyObject *key, PyObject *kwargs)``,
``PyObject_SetItemEx(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)`` and
``PyObject_DetItemEx(PyObject *o, PyObject *key, PyObject *kwargs)``.
Additionally, new opcodes will be needed for the enhanced call.
Currently, the implementation uses ``BINARY_SUBSCR``, ``STORE_SUBSCR`` and ``DELETE_SUBSCR``
to invoke the old functions. We propose ``BINARY_SUBSCR_EX``,
``STORE_SUBSCR_EX`` and ``DELETE_SUBSCR_EX`` for the extended operation. The parser will
have to generate these new opcodes. The ``PyObject_(Get|Set|Del)Item`` implementations
will call the extended methods passing ``NULL`` as kwargs.
Workarounds
===========
Every PEP that changes the Python language should "clearly explain why
the existing language specification is inadequate to address the
problem that the PEP solves." [#pep-0001]_
Some rough equivalents to the proposed extension, which we call work-arounds,
are already possible. The work-arounds provide an alternative to enabling the
new syntax, while leaving the semantics to be defined elsewhere.
These work-arounds follow. In them the helpers ``H`` and ``P`` are not intended to
be universal. For example, a module or package might require the use of its own
helpers.
1. User defined classes can be given ``getitem`` and ``delitem`` methods,
that respectively get and delete values stored in a container.
::
>>> val = x.getitem(1, 2, a=3, b=4)
>>> x.delitem(1, 2, a=3, b=4)
The same can't be done for ``setitem``. It's not valid syntax.
::
>>> x.setitem(1, 2, a=3, b=4) = val
SyntaxError: can't assign to function call
2. A helper class, here called ``H``, can be used to swap the container
and parameter roles. In other words, we use
::
H(1, 2, a=3, b=4)[x]
as a substitute for
::
x[1, 2, a=3, b=4]
This method will work for ``getitem``, ``delitem`` and also for
``setitem``. This is because
::
>>> H(1, 2, a=3, b=4)[x] = val
is valid syntax, which can be given the appropriate semantics.
3. A helper function, here called ``P``, can be used to store the
arguments in a single object. For example
::
>>> x[P(1, 2, a=3, b=4)] = val
is valid syntax, and can be given the appropriate semantics.
4. The ``lo:hi:step`` syntax for slices is sometimes very useful. This
syntax is not directly available in the work-arounds. However
::
s[lo:hi:step]
provides a work-around that is available everything, where
::
class S:
def __getitem__(self, key): return key
s = S()
defines the helper object `s`.
Rejected Ideas
==============
Previous PEP 472 solutions
--------------------------
PEP 472 presents a good amount of ideas that are now all to be considered
Rejected. A personal email from D'Aprano to one of the authors (Stefano Borini)
specifically said:
I have now carefully read through PEP 472 in full, and I am afraid I
cannot support any of the strategies currently in the PEP.
We agree that those options are inferior to the currently presented, for one
reason or another.
To keep this document compact, we will not present here the objections for
all options presented in PEP 472. Suffice to say that they were discussed,
and each proposed alternative had one or few dealbreakers.
Adding new dunders
------------------
It was proposed to introduce new dunders ``__(get|set|del)item_ex__``
that are invoked over the ``__(get|set|del)item__`` triad, if they are present.
The rationale around this choice is to make the intuition around how to add kwd
arg support to square brackets more obvious and in line with the function
behavior. Given:
::
def __getitem_ex__(self, x, y): ...
These all just work and produce the same result effortlessly:
::
obj[1, 2]
obj[1, y=2]
obj[y=2, x=1]
In other words, this solution would unify the behavior of ``__getitem__`` to the traditional
function signature, but since we can't change ``__getitem__`` and break backward compatibility,
we would have an extended version that is used preferentially.
The problems with this approach were found to be:
- It will slow down subscripting. For every subscript access, this new dunder
attribute gets investigated on the class, and if it is not present then the
default key translation function is executed.
Different ideas were proposed to handle this, from wrapping the method
only at class instantiation time, to add a bit flag to signal the availability
of these methods. Regardess of the solution, the new dunder would be effective
only if added at class creation time, not if it's added later. This would
be unusual and would disallow (and behave unexpectedly) monkeypatching of the
methods for whatever reason it might be needed.
- It adds complexity to the mechanism.
- Will require a long and painful transition period during which time
libraries will have to somehow support both calling conventions, because most
likely, the extended methods will delegate to the traditional ones when the
right conditions are matched in the arguments, or some classes will support
the traditional dunder and others the extended dunder. While this will not
affect calling code, it will affect development.
- it would potentially lead to mixed situations where the extended version is
defined for the getter, but not for the setter.
- In the __setitem_ex__ signature, value would have to be made the first
element, because the index is of arbitrary length depending on the specified
indexes. This would look awkward because the visual notation does not match
the signature:
::
obj[1, 2] = 3 # calls obj.__setitem_ex__(3, 1, 2)
- the solution relies on the assumption that all keyword indices necessarily map
into positional indices, or that they must have a name. This assumption may be
false: xarray, which is the primary python package for numpy arrays with
labelled dimensions, supports indexing by additional dimensions (so called
"non-dimension coordinates") that don't correspond directly to the dimensions
of the underlying numpy array, and those have no position to match up to.
In other words, anonymous indexes are a plausible use case that this solution
would remove, although it could be argued that using ``*args`` would solve
that issue.
Adding an adapter function
--------------------------
Similar to the above, in the sense that a pre-function would be called to
convert the "new style" indexing into "old style indexing" that is then passed.
Has problems similar to the above.
create a new "kwslice" object
-----------------------------
This proposal has already been explored in "New arguments contents" P4 in PEP 472.
::
obj[a, b:c, x=1] # calls __getitem__(a, slice(b, c), key(x=1))
This solution requires everyone who needs keyword arguments to parse the tuple
and/or key object by hand to extract them. This is painful and opens up to the
get/set/del function to always accept arbitrary keyword arguments, whether they
make sense or not. We want the developer to be able to specify which arguments
make sense and which ones do not.
Using a single bit to change the behavior
-----------------------------------------
A special class dunder flag
::
__keyfn__ = True
would change the signature of the ``__get|set|delitem__`` to a "function like" dispatch,
meaning that this
::
>>> d[1, 2, z=3]
would result in a call to
::
>>> d.__getitem__(1, 2, z=3) # instead of d.__getitem__((1, 2), z=3)
This option has been rejected because it feels odd that a signature of a method
depends on a specific value of another dunder. It would be confusing for both
static type checkers and for humans: a static type checker would have to hard-code
a special case for this, because there really is nothing else in Python
where the signature of a dunder depends on the value of another dunder.
A human that has to implement a ``__getitem__`` dunder would have to look if in the
class (or in any of its subclasses) for a ``__keyfn__`` before the dunder can be written.
Moreover, adding a base classes that have the ``__keyfn__`` flag set would break
the signature of the current methods. This would be even more problematic if the
flag is changed at runtime, or if the flag is generated by calling a function
that returns randomly True or something else.
Allowing for empty index notation obj[]
---------------------------------------
The current proposal prevents ``obj[]`` from being valid notation. However
a commenter stated
We have ``Tuple[int, int]`` as a tuple of two integers. And we have `Tuple[int]`
as a tuple of one integer. And occasionally we need to spell a tuple of *no*
values, since that's the type of ``()``. But we currently are forced to write
that as ``Tuple[()]``. If we allowed ``Tuple[]`` that odd edge case would be
removed.
So I probably would be okay with allowing ``obj[]`` syntactically, as long as the
dict type could be made to reject it.
This proposal already established that, in case no positional index is given, the
passed value must be the empty tuple. Allowing for the empty index notation would
make the dictionary type accept it automatically, to insert or refer to the value with
the empty tuple as key. Moreover, a typing notation such as ``Tuple[]`` can easily
be written as ``Tuple`` without the indexing notation.
Use None instead of the empty tuple when no positional index is given
---------------------------------------------------------------------
The case ``obj[k=3]`` will lead to a call ``__getitem__((), k=3)``.
The alternative ``__getitem__(None, k=3)`` was considered but rejected:
NumPy uses `None` to indicate inserting a new axis/dimensions (there's
a ``np.newaxis`` alias as well):
::
arr = np.array(5)
arr.ndim == 0
arr[None].ndim == arr[None,].ndim == 1
So the final conclusion is that we favor the following series:
::
obj[k=3] # __getitem__((), k=3). Empty tuple
obj[1, k=3] # __getitem__(1, k=3). Integer
obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple
more than this:
::
obj[k=3] # __getitem__(None, k=3). None
obj[1, k=3] # __getitem__(1, k=3). Integer
obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple
With the first more in line with a \*args semantics for calling a routine with
no positional arguments
::
>>> def foo(*args, **kwargs):
... print(args, kwargs)
...
>>> foo(k=3)
() {'k': 3}
Although we accept the following asymmetry:
::
>>> foo(1, k=3)
(1,) {'k': 3}
Common objections
=================
1. Just use a method call.
One of the use cases is typing, where the indexing is used exclusively, and
function calls are out of the question. Moreover, function calls do not handle
slice notation, which is commonly used in some cases for arrays.
One problem is type hint creation has been extended to built-ins in python 3.9,
so that you do not have to import Dict, List, et al anymore.
Without kwdargs inside ``[]``, you would not be able to do this:
::
Vector = dict[i=float, j=float]
but for obvious reasons, call syntax using builtins to create custom type hints
isn't an option:
::
dict(i=float, j=float) # would create a dictionary, not a type
References
==========
.. [#rejection] "Rejection of PEP 472"
(https://mail.python.org/pipermail/python-dev/2019-March/156693.html)
.. [#pep-0484] "PEP 484 -- Type hints"
(https://www.python.org/dev/peps/pep-0484)
.. [#request-1] "Allow kwargs in __{get|set|del}item__"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)
.. [#request-2] "PEP 472 -- Support for indexing with keyword arguments"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)
.. [#pep-0001] "PEP 1 -- PEP Purpose and Guidelines"
(https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: