2014-07-13 20:40:44 -04:00
|
|
|
|
PEP: 472
|
|
|
|
|
Title: Support for indexing with keyword arguments
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Author: Stefano Borini, Joseph Martinot-Lagarde
|
2014-07-18 12:22:40 -04:00
|
|
|
|
Discussions-To: python-ideas@python.org
|
2019-03-15 18:02:57 -04:00
|
|
|
|
Status: Rejected
|
2014-07-13 20:40:44 -04:00
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
|
Created: 24-Jun-2014
|
|
|
|
|
Python-Version: 3.6
|
|
|
|
|
Post-History: 02-Jul-2014
|
2019-03-15 18:02:57 -04:00
|
|
|
|
Resolution: https://mail.python.org/pipermail/python-dev/2019-March/156693.html
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
This PEP proposes an extension of the indexing operation to support keyword
|
|
|
|
|
arguments. Notations in the form ``a[K=3,R=2]`` would become legal syntax.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
For future-proofing considerations, ``a[1:2, K=3, R=4]`` are considered and
|
2014-07-13 20:40:44 -04:00
|
|
|
|
may be allowed as well, depending on the choice for implementation. In addition
|
|
|
|
|
to a change in the parser, the index protocol (``__getitem__``, ``__setitem__``
|
|
|
|
|
and ``__delitem__``) will also potentially require adaptation.
|
|
|
|
|
|
|
|
|
|
Motivation
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
The indexing syntax carries a strong semantic content, differentiating it from
|
|
|
|
|
a method call: it implies referring to a subset of data. We believe this
|
|
|
|
|
semantic association to be important, and wish to expand the strategies allowed
|
|
|
|
|
to refer to this data.
|
|
|
|
|
|
|
|
|
|
As a general observation, the number of indices needed by an indexing operation
|
|
|
|
|
depends on the dimensionality of the data: one-dimensional data (e.g. a list)
|
|
|
|
|
requires one index (e.g. ``a[3]``), two-dimensional data (e.g. a matrix) requires
|
|
|
|
|
two indices (e.g. ``a[2,3]``) and so on. Each index is a selector along one of the
|
|
|
|
|
axes of the dimensionality, and the position in the index tuple is the
|
|
|
|
|
metainformation needed to associate each index to the corresponding axis.
|
|
|
|
|
|
|
|
|
|
The current python syntax focuses exclusively on position to express the
|
|
|
|
|
association to the axes, and also contains syntactic sugar to refer to
|
|
|
|
|
non-punctiform selection (slices)
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[3] # returns the fourth element of a
|
|
|
|
|
>>> a[1:10:2] # slice notation (extract a non-trivial data subset)
|
|
|
|
|
>>> a[3,2] # multiple indexes (for multidimensional arrays)
|
|
|
|
|
|
|
|
|
|
The additional notation proposed in this PEP would allow notations involving
|
|
|
|
|
keyword arguments in the indexing operation, e.g.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[K=3, R=2]
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
which would allow to refer to axes by conventional names.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
One must additionally consider the extended form that allows both positional
|
|
|
|
|
and keyword specification
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[3,R=3,K=4]
|
|
|
|
|
|
|
|
|
|
This PEP will explore different strategies to enable the use of these notations.
|
|
|
|
|
|
|
|
|
|
Use cases
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
The following practical use cases present two broad categories of usage of a
|
|
|
|
|
keyworded specification: Indexing and contextual option. For indexing:
|
|
|
|
|
|
|
|
|
|
1. To provide a more communicative meaning to the index, preventing e.g. accidental
|
|
|
|
|
inversion of indexes
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> gridValues[x=3, y=5, z=8]
|
|
|
|
|
>>> rain[time=0:12, location=location]
|
|
|
|
|
|
|
|
|
|
2. In some domain, such as computational physics and chemistry, the use of a
|
2017-03-24 17:11:33 -04:00
|
|
|
|
notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent
|
2014-07-13 20:40:44 -04:00
|
|
|
|
a level of accuracy
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
|
|
|
|
|
|
|
|
|
|
In this case, the index operation would return a basis set at the chosen level
|
|
|
|
|
of accuracy (represented by the parameter Z). The reason behind an indexing is that
|
|
|
|
|
the BasisSet object could be internally represented as a numeric table, where
|
2017-03-24 17:11:33 -04:00
|
|
|
|
rows (the "coefficient" axis, hidden to the user in this example) are associated
|
2014-07-13 20:40:44 -04:00
|
|
|
|
to individual elements (e.g. row 0:5 contains coefficients for element 1,
|
|
|
|
|
row 5:8 coefficients for element 2) and each column is associated to a given
|
|
|
|
|
degree of accuracy ("accuracy" or "Z" axis) so that first column is low
|
2017-03-24 17:11:33 -04:00
|
|
|
|
accuracy, second column is medium accuracy and so on. With that indexing,
|
2014-07-13 20:40:44 -04:00
|
|
|
|
the user would obtain another object representing the contents of the column
|
|
|
|
|
of the internal table for accuracy level 3.
|
|
|
|
|
|
|
|
|
|
Additionally, the keyword specification can be used as an option contextual to
|
|
|
|
|
the indexing. Specifically:
|
|
|
|
|
|
|
|
|
|
1. A "default" option allows to specify a default return value when the index
|
|
|
|
|
is not present
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> lst = [1, 2, 3]
|
|
|
|
|
>>> value = lst[5, default=0] # value is 0
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
2. For a sparse dataset, to specify an interpolation strategy
|
2014-07-13 20:40:44 -04:00
|
|
|
|
to infer a missing point from e.g. its surrounding data.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> value = array[1, 3, interpolate=spline_interpolator]
|
|
|
|
|
|
|
|
|
|
3. A unit could be specified with the same mechanism
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> value = array[1, 3, unit="degrees"]
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
How the notation is interpreted is up to the implementing class.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Current implementation
|
|
|
|
|
======================
|
|
|
|
|
|
|
|
|
|
Currently, the indexing operation is handled by methods ``__getitem__``,
|
|
|
|
|
``__setitem__`` and ``__delitem__``. These methods' signature accept one argument
|
|
|
|
|
for the index (with ``__setitem__`` accepting an additional argument for the set
|
|
|
|
|
value). In the following, we will analyze ``__getitem__(self, idx)`` exclusively,
|
|
|
|
|
with the same considerations implied for the remaining two methods.
|
|
|
|
|
|
|
|
|
|
When an indexing operation is performed, ``__getitem__(self, idx)`` is called.
|
|
|
|
|
Traditionally, the full content between square brackets is turned into a single
|
|
|
|
|
object passed to argument ``idx``:
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
- When a single element is passed, e.g. ``a[2]``, ``idx`` will be ``2``.
|
2016-05-03 04:18:02 -04:00
|
|
|
|
- When multiple elements are passed, they must be separated by commas: ``a[2, 3]``.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
In this case, ``idx`` will be a tuple ``(2, 3)``. With ``a[2, 3, "hello", {}]``
|
2016-05-03 04:18:02 -04:00
|
|
|
|
``idx`` will be ``(2, 3, "hello", {})``.
|
|
|
|
|
- A slicing notation e.g. ``a[2:10]`` will produce a slice object, or a tuple
|
|
|
|
|
containing slice objects if multiple values were passed.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Except for its unique ability to handle slice notation, the indexing operation
|
|
|
|
|
has similarities to a plain method call: it acts like one when invoked with
|
|
|
|
|
only one element; If the number of elements is greater than one, the ``idx``
|
|
|
|
|
argument behaves like a ``*args``. However, as stated in the Motivation section,
|
|
|
|
|
an indexing operation has the strong semantic implication of extraction of a
|
|
|
|
|
subset out of a larger set, which is not automatically associated to a regular
|
|
|
|
|
method call unless appropriate naming is chosen. Moreover, its different visual
|
|
|
|
|
style is important for readability.
|
|
|
|
|
|
|
|
|
|
Specifications
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
The implementation should try to preserve the current signature for
|
|
|
|
|
``__getitem__``, or modify it in a backward-compatible way. We will present
|
|
|
|
|
different alternatives, taking into account the possible cases that need
|
|
|
|
|
to be addressed
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0. a[1]; a[1,2] # Traditional indexing
|
2017-03-24 17:11:33 -04:00
|
|
|
|
C1. a[Z=3]
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C2. a[Z=3, R=4]
|
|
|
|
|
C3. a[1, Z=3]
|
|
|
|
|
C4. a[1, Z=3, R=4]
|
|
|
|
|
C5. a[1, 2, Z=3]
|
|
|
|
|
C6. a[1, 2, Z=3, R=4]
|
|
|
|
|
C7. a[1, Z=3, 2, R=4] # Interposed ordering
|
|
|
|
|
|
|
|
|
|
Strategy "Strict dictionary"
|
|
|
|
|
----------------------------
|
|
|
|
|
|
|
|
|
|
This strategy acknowledges that ``__getitem__`` is special in accepting only
|
|
|
|
|
one object, and the nature of that object must be non-ambiguous in its
|
|
|
|
|
specification of the axes: it can be either by order, or by name. As a result
|
|
|
|
|
of this assumption, in presence of keyword arguments, the passed entity is a
|
|
|
|
|
dictionary and all labels must be specified.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2)
|
|
|
|
|
C1. a[Z=3] -> idx = {"Z": 3}
|
|
|
|
|
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
|
|
|
|
|
C3. a[1, Z=3] -> raise SyntaxError
|
|
|
|
|
C4. a[1, Z=3, R=4] -> raise SyntaxError
|
|
|
|
|
C5. a[1, 2, Z=3] -> raise SyntaxError
|
|
|
|
|
C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
|
|
|
|
|
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError
|
|
|
|
|
|
|
|
|
|
Pros
|
|
|
|
|
''''
|
|
|
|
|
|
|
|
|
|
- Strong conceptual similarity between the tuple case and the dictionary case.
|
|
|
|
|
In the first case, we are specifying a tuple, so we are naturally defining
|
|
|
|
|
a plain set of values separated by commas. In the second, we are specifying a
|
|
|
|
|
dictionary, so we are specifying a homogeneous set of key/value pairs, as
|
|
|
|
|
in ``dict(Z=3, R=4)``;
|
2017-03-24 17:11:33 -04:00
|
|
|
|
- Simple and easy to parse on the ``__getitem__`` side: if it gets a tuple,
|
|
|
|
|
determine the axes using positioning. If it gets a dictionary, use
|
2014-07-13 20:40:44 -04:00
|
|
|
|
the keywords.
|
|
|
|
|
- C interface does not need changes.
|
|
|
|
|
|
|
|
|
|
Neutral
|
|
|
|
|
'''''''
|
|
|
|
|
|
|
|
|
|
- Degeneracy of ``a[{"Z": 3, "R": 4}]`` with ``a[Z=3, R=4]`` means the notation
|
|
|
|
|
is syntactic sugar.
|
|
|
|
|
|
|
|
|
|
Cons
|
|
|
|
|
''''
|
|
|
|
|
|
|
|
|
|
- Very strict.
|
|
|
|
|
- Destroys ordering of the passed arguments. Preserving the
|
2022-01-21 06:03:51 -05:00
|
|
|
|
order would be possible with an OrderedDict as drafted by :pep:`468`.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
- Does not allow use cases with mixed positional/keyword arguments such as
|
2014-07-13 20:40:44 -04:00
|
|
|
|
``a[1, 2, default=5]``.
|
|
|
|
|
|
|
|
|
|
Strategy "mixed dictionary"
|
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
|
|
This strategy relaxes the above constraint to return a dictionary containing
|
|
|
|
|
both numbers and strings as keys.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2)
|
|
|
|
|
C1. a[Z=3] -> idx = {"Z": 3}
|
|
|
|
|
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
|
|
|
|
|
C3. a[1, Z=3] -> idx = { 0: 1, "Z": 3}
|
|
|
|
|
C4. a[1, Z=3, R=4] -> idx = { 0: 1, "Z": 3, "R": 4}
|
|
|
|
|
C5. a[1, 2, Z=3] -> idx = { 0: 1, 1: 2, "Z": 3}
|
|
|
|
|
C6. a[1, 2, Z=3, R=4] -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4}
|
|
|
|
|
C7. a[1, Z=3, 2, R=4] -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4}
|
|
|
|
|
|
|
|
|
|
Pros
|
|
|
|
|
''''
|
|
|
|
|
- Opens for mixed cases.
|
|
|
|
|
|
|
|
|
|
Cons
|
|
|
|
|
''''
|
|
|
|
|
- Destroys ordering information for string keys. We have no way of saying if
|
|
|
|
|
``"Z"`` in C7 was in position 1 or 3.
|
|
|
|
|
- Implies switching from a tuple to a dict as soon as one specified index
|
|
|
|
|
has a keyword argument. May be confusing to parse.
|
|
|
|
|
|
|
|
|
|
Strategy "named tuple"
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
Return a named tuple for ``idx`` instead of a tuple. Keyword arguments would
|
|
|
|
|
obviously have their stated name as key, and positional argument would have an
|
|
|
|
|
underscore followed by their order:
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0. a[1]; a[1,2] -> idx = 1; idx = (_0=1, _1=2)
|
|
|
|
|
C1. a[Z=3] -> idx = (Z=3)
|
|
|
|
|
C2. a[Z=3, R=2] -> idx = (Z=3, R=2)
|
2017-03-24 17:11:33 -04:00
|
|
|
|
C3. a[1, Z=3] -> idx = (_0=1, Z=3)
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C4. a[1, Z=3, R=2] -> idx = (_0=1, Z=3, R=2)
|
|
|
|
|
C5. a[1, 2, Z=3] -> idx = (_0=1, _2=2, Z=3)
|
|
|
|
|
C6. a[1, 2, Z=3, R=4] -> (_0=1, _1=2, Z=3, R=4)
|
2017-03-24 17:11:33 -04:00
|
|
|
|
C7. a[1, Z=3, 2, R=4] -> (_0=1, Z=3, _1=2, R=4)
|
2014-07-13 20:40:44 -04:00
|
|
|
|
or (_0=1, Z=3, _2=2, R=4)
|
|
|
|
|
or raise SyntaxError
|
|
|
|
|
|
|
|
|
|
The required typename of the namedtuple could be ``Index`` or the name of the
|
|
|
|
|
argument in the function definition, it keeps the ordering and is easy to
|
|
|
|
|
analyse by using the ``_fields`` attribute. It is backward compatible, provided
|
|
|
|
|
that C0 with more than one entry now passes a namedtuple instead of a plain
|
2017-03-24 17:11:33 -04:00
|
|
|
|
tuple.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Pros
|
2014-07-13 20:40:44 -04:00
|
|
|
|
''''
|
|
|
|
|
- Looks nice. namedtuple transparently replaces tuple and gracefully
|
|
|
|
|
degrades to the old behavior.
|
|
|
|
|
- Does not require a change in the C interface
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Cons
|
2014-07-13 20:40:44 -04:00
|
|
|
|
''''
|
|
|
|
|
- According to some sources [#namedtuple]_ namedtuple is not well developed.
|
|
|
|
|
To include it as such important object would probably require rework
|
|
|
|
|
and improvement;
|
|
|
|
|
- The namedtuple fields, and thus the type, will have to change according
|
|
|
|
|
to the passed arguments. This can be a performance bottleneck, and makes
|
|
|
|
|
it impossible to guarantee that two subsequent index accesses get the same
|
|
|
|
|
Index class;
|
|
|
|
|
- the ``_n`` "magic" fields are a bit unusual, but ipython already uses them
|
|
|
|
|
for result history.
|
|
|
|
|
- Python currently has no builtin namedtuple. The current one is available
|
|
|
|
|
in the "collections" module in the standard library.
|
|
|
|
|
- Differently from a function, the two notations ``gridValues[x=3, y=5, z=8]``
|
|
|
|
|
and ``gridValues[3,5,8]`` would not gracefully match if the order is modified
|
2017-03-24 17:11:33 -04:00
|
|
|
|
at call time (e.g. we ask for ``gridValues[y=5, z=8, x=3])``. In a function,
|
2014-07-13 20:40:44 -04:00
|
|
|
|
we can pre-define argument names so that keyword arguments are properly
|
|
|
|
|
matched. Not so in ``__getitem__``, leaving the task for interpreting and
|
|
|
|
|
matching to ``__getitem__`` itself.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Strategy "New argument contents"
|
|
|
|
|
--------------------------------
|
|
|
|
|
|
|
|
|
|
In the current implementation, when many arguments are passed to ``__getitem__``,
|
2017-03-24 17:11:33 -04:00
|
|
|
|
they are grouped in a tuple and this tuple is passed to ``__getitem__`` as the
|
2014-07-13 20:40:44 -04:00
|
|
|
|
single argument ``idx``. This strategy keeps the current signature, but expands the
|
2017-03-24 17:11:33 -04:00
|
|
|
|
range of variability in type and contents of ``idx`` to more complex representations.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
We identify four possible ways to implement this strategy:
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
- **P1**: uses a single dictionary for the keyword arguments.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
- **P2**: uses individual single-item dictionaries.
|
|
|
|
|
- **P3**: similar to **P2**, but replaces single-item dictionaries with a ``(key, value)`` tuple.
|
|
|
|
|
- **P4**: similar to **P2**, but uses a special and additional new object: ``keyword()``
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Some of these possibilities lead to degenerate notations, i.e. indistinguishable
|
|
|
|
|
from an already possible representation. Once again, the proposed notation
|
2014-07-13 20:40:44 -04:00
|
|
|
|
becomes syntactic sugar for these representations.
|
|
|
|
|
|
|
|
|
|
Under this strategy, the old behavior for C0 is unchanged.
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0: a[1] -> idx = 1 # integer
|
|
|
|
|
a[1,2] -> idx = (1,2) # tuple
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
In C1, we can use either a dictionary or a tuple to represent key and value pair
|
|
|
|
|
for the specific indexing entry. We need to have a tuple with a tuple in C1
|
|
|
|
|
because otherwise we cannot differentiate ``a["Z", 3]`` from ``a[Z=3]``.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C1: a[Z=3] -> idx = {"Z": 3} # P1/P2 dictionary with single key
|
2017-03-24 17:11:33 -04:00
|
|
|
|
or idx = (("Z", 3),) # P3 tuple of tuples
|
|
|
|
|
or idx = keyword("Z", 3) # P4 keyword object
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
As you can see, notation P1/P2 implies that ``a[Z=3]`` and ``a[{"Z": 3}]`` will
|
|
|
|
|
call ``__getitem__`` passing the exact same value, and is therefore syntactic
|
|
|
|
|
sugar for the latter. Same situation occurs, although with different index, for
|
|
|
|
|
P3. Using a keyword object as in P4 would remove this degeneracy.
|
|
|
|
|
|
|
|
|
|
For the C2 case:
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4} # P1 dictionary/ordereddict
|
|
|
|
|
or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict
|
|
|
|
|
or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples
|
|
|
|
|
or idx = (keyword("Z", 3),
|
2014-07-13 20:40:44 -04:00
|
|
|
|
keyword("R", 4) ) # P4 keyword objects
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
P1 naturally maps to the traditional ``**kwargs`` behavior, however it breaks
|
|
|
|
|
the convention that two or more entries for the index produce a tuple. P2
|
|
|
|
|
preserves this behavior, and additionally preserves the order. Preserving the
|
2022-01-21 06:03:51 -05:00
|
|
|
|
order would also be possible with an OrderedDict as drafted by :pep:`468`.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
The remaining cases are here shown:
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C3. a[1, Z=3] -> idx = (1, {"Z": 3}) # P1/P2
|
|
|
|
|
or idx = (1, ("Z", 3)) # P3
|
|
|
|
|
or idx = (1, keyword("Z", 3)) # P4
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4}) # P1
|
|
|
|
|
or idx = (1, {"Z": 3}, {"R": 4}) # P2
|
|
|
|
|
or idx = (1, ("Z", 3), ("R", 4)) # P3
|
2017-03-24 17:11:33 -04:00
|
|
|
|
or idx = (1, keyword("Z", 3),
|
2014-07-13 20:40:44 -04:00
|
|
|
|
keyword("R", 4)) # P4
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
|
|
|
|
C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3}) # P1/P2
|
2014-07-13 20:40:44 -04:00
|
|
|
|
or idx = (1, 2, ("Z", 3)) # P3
|
|
|
|
|
or idx = (1, 2, keyword("Z", 3)) # P4
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4}) # P1
|
|
|
|
|
or idx = (1, 2, {"Z": 3}, {"R": 4}) # P2
|
|
|
|
|
or idx = (1, 2, ("Z", 3), ("R", 4)) # P3
|
2017-03-24 17:11:33 -04:00
|
|
|
|
or idx = (1, 2, keyword("Z", 3),
|
2014-07-13 20:40:44 -04:00
|
|
|
|
keyword("R", 4)) # P4
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4}) # P1. Pack the keyword arguments. Ugly.
|
|
|
|
|
or raise SyntaxError # P1. Same behavior as in function calls.
|
|
|
|
|
or idx = (1, {"Z": 3}, 2, {"R": 4}) # P2
|
|
|
|
|
or idx = (1, ("Z", 3), 2, ("R", 4)) # P3
|
2017-03-24 17:11:33 -04:00
|
|
|
|
or idx = (1, keyword("Z", 3),
|
2014-07-13 20:40:44 -04:00
|
|
|
|
2, keyword("R", 4)) # P4
|
|
|
|
|
|
|
|
|
|
Pros
|
2017-03-24 17:11:33 -04:00
|
|
|
|
''''
|
|
|
|
|
- Signature is unchanged;
|
|
|
|
|
- P2/P3 can preserve ordering of keyword arguments as specified at indexing,
|
|
|
|
|
- P1 needs an OrderedDict, but would destroy interposed ordering if allowed:
|
2014-07-13 20:40:44 -04:00
|
|
|
|
all keyword indexes would be dumped into the dictionary;
|
|
|
|
|
- Stays within traditional types: tuples and dicts. Evt. OrderedDict;
|
|
|
|
|
- Some proposed strategies are similar in behavior to a traditional function call;
|
|
|
|
|
- The C interface for ``PyObject_GetItem`` and family would remain unchanged.
|
|
|
|
|
|
|
|
|
|
Cons
|
2017-03-24 17:11:33 -04:00
|
|
|
|
''''
|
2019-06-25 00:58:50 -04:00
|
|
|
|
- Apparently complex and wasteful;
|
2014-07-13 20:40:44 -04:00
|
|
|
|
- Degeneracy in notation (e.g. ``a[Z=3]`` and ``a[{"Z":3}]`` are equivalent and
|
|
|
|
|
indistinguishable notations at the ``__[get|set|del]item__`` level).
|
|
|
|
|
This behavior may or may not be acceptable.
|
|
|
|
|
- for P4, an additional object similar in nature to slice() is needed,
|
|
|
|
|
but only to disambiguate the above degeneracy.
|
|
|
|
|
- ``idx`` type and layout seems to change depending on the whims of the caller;
|
|
|
|
|
- May be complex to parse what is passed, especially in the case of tuple of tuples;
|
|
|
|
|
- P2 Creates a lot of single keys dictionary as members of a tuple. Looks ugly.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
P3 would be lighter and easier to use than the tuple of dicts, and still
|
|
|
|
|
preserves order (unlike the regular dict), but would result in clumsy
|
2014-07-13 20:40:44 -04:00
|
|
|
|
extraction of keywords.
|
|
|
|
|
|
|
|
|
|
Strategy "kwargs argument"
|
|
|
|
|
---------------------------
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
``__getitem__`` accepts an optional ``**kwargs`` argument which should be keyword only.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
``idx`` also becomes optional to support a case where no non-keyword arguments are allowed.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
The signature would then be either
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
__getitem__(self, idx)
|
2014-07-13 20:40:44 -04:00
|
|
|
|
__getitem__(self, idx, **kwargs)
|
2017-03-24 17:11:33 -04:00
|
|
|
|
__getitem__(self, **kwargs)
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Applied to our cases would produce:
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
C0. a[1,2] -> idx=(1,2); kwargs={}
|
|
|
|
|
C1. a[Z=3] -> idx=None ; kwargs={"Z":3}
|
|
|
|
|
C2. a[Z=3, R=4] -> idx=None ; kwargs={"Z":3, "R":4}
|
|
|
|
|
C3. a[1, Z=3] -> idx=1 ; kwargs={"Z":3}
|
2017-03-24 17:11:33 -04:00
|
|
|
|
C4. a[1, Z=3, R=4] -> idx=1 ; kwargs={"Z":3, "R":4}
|
2014-07-13 20:40:44 -04:00
|
|
|
|
C5. a[1, 2, Z=3] -> idx=(1,2); kwargs={"Z":3}
|
|
|
|
|
C6. a[1, 2, Z=3, R=4] -> idx=(1,2); kwargs={"Z":3, "R":4}
|
|
|
|
|
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement to function behavior
|
|
|
|
|
|
|
|
|
|
Empty indexing ``a[]`` of course remains invalid syntax.
|
|
|
|
|
|
|
|
|
|
Pros
|
2017-03-24 17:11:33 -04:00
|
|
|
|
''''
|
2014-07-13 20:40:44 -04:00
|
|
|
|
- Similar to function call, evolves naturally from it;
|
|
|
|
|
- Use of keyword indexing with an object whose ``__getitem__``
|
|
|
|
|
doesn't have a kwargs will fail in an obvious way.
|
|
|
|
|
That's not the case for the other strategies.
|
|
|
|
|
|
|
|
|
|
Cons
|
2017-03-24 17:11:33 -04:00
|
|
|
|
''''
|
2014-07-13 20:40:44 -04:00
|
|
|
|
- It doesn't preserve order, unless an OrderedDict is used;
|
|
|
|
|
- Forbids C7, but is it really needed?
|
|
|
|
|
- Requires a change in the C interface to pass an additional
|
|
|
|
|
PyObject for the keyword arguments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C interface
|
|
|
|
|
===========
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
As briefly introduced in the previous analysis, the C interface would
|
2014-07-13 20:40:44 -04:00
|
|
|
|
potentially have to change to allow the new feature. Specifically,
|
2017-03-24 17:11:33 -04:00
|
|
|
|
``PyObject_GetItem`` and related routines would have to accept an additional
|
2014-07-13 20:40:44 -04:00
|
|
|
|
``PyObject *kw`` argument for Strategy "kwargs argument". The remaining
|
|
|
|
|
strategies would not require a change in the C function signatures, but the
|
2017-03-24 17:11:33 -04:00
|
|
|
|
different nature of the passed object would potentially require adaptation.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Strategy "named tuple" would behave correctly without any change: the class
|
|
|
|
|
returned by the factory method in collections returns a subclass of tuple,
|
|
|
|
|
meaning that ``PyTuple_*`` functions can handle the resulting object.
|
|
|
|
|
|
|
|
|
|
Alternative Solutions
|
|
|
|
|
=====================
|
|
|
|
|
|
|
|
|
|
In this section, we present alternative solutions that would workaround the
|
|
|
|
|
missing feature and make the proposed enhancement not worth of implementation.
|
|
|
|
|
|
|
|
|
|
Use a method
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
One could keep the indexing as is, and use a traditional ``get()`` method for those
|
|
|
|
|
cases where basic indexing is not enough. This is a good point, but as already
|
|
|
|
|
reported in the introduction, methods have a different semantic weight from
|
2017-03-24 17:11:33 -04:00
|
|
|
|
indexing, and you can't use slices directly in methods. Compare e.g.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
``a[1:3, Z=2]`` with ``a.get(slice(1,3), Z=2)``.
|
|
|
|
|
|
|
|
|
|
The authors however recognize this argument as compelling, and the advantage
|
|
|
|
|
in semantic expressivity of a keyword-based indexing may be offset by a rarely
|
|
|
|
|
used feature that does not bring enough benefit and may have limited adoption.
|
|
|
|
|
|
|
|
|
|
Emulate requested behavior by abusing the slice object
|
|
|
|
|
------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
This extremely creative method exploits the slice objects' behavior, provided
|
|
|
|
|
that one accepts to use strings (or instantiate properly named placeholder
|
|
|
|
|
objects for the keys), and accept to use ":" instead of "=".
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a["K":3]
|
|
|
|
|
slice('K', 3, None)
|
|
|
|
|
>>> a["K":3, "R":4]
|
|
|
|
|
(slice('K', 3, None), slice('R', 4, None))
|
2017-03-24 17:11:33 -04:00
|
|
|
|
>>>
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
While clearly smart, this approach does not allow easy inquire of the key/value
|
|
|
|
|
pair, it's too clever and esotheric, and does not allow to pass a slice as in
|
|
|
|
|
``a[K=1:10:2]``.
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
However, Tim Delaney comments
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
"I really do think that ``a[b=c, d=e]`` should just be syntax sugar for
|
|
|
|
|
``a['b':c, 'd':e]``. It's simple to explain, and gives the greatest backwards
|
|
|
|
|
compatibility. In particular, libraries that already abused slices in this
|
2014-07-13 20:40:44 -04:00
|
|
|
|
way will just continue to work with the new syntax."
|
|
|
|
|
|
|
|
|
|
We think this behavior would produce inconvenient results. The library Pandas uses
|
|
|
|
|
strings as labels, allowing notation such as
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[:, "A":"F"]
|
|
|
|
|
|
|
|
|
|
to extract data from column "A" to column "F". Under the above comment, this notation
|
|
|
|
|
would be equally obtained with
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[:, A="F"]
|
|
|
|
|
|
|
|
|
|
which is weird and collides with the intended meaning of keyword in indexing, that
|
|
|
|
|
is, specifying the axis through conventional names rather than positioning.
|
|
|
|
|
|
|
|
|
|
Pass a dictionary as an additional index
|
|
|
|
|
----------------------------------------
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
>>> a[1, 2, {"K": 3}]
|
|
|
|
|
|
|
|
|
|
this notation, although less elegant, can already be used and achieves similar
|
|
|
|
|
results. It's evident that the proposed Strategy "New argument contents" can be
|
|
|
|
|
interpreted as syntactic sugar for this notation.
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Additional Comments
|
2014-07-13 20:40:44 -04:00
|
|
|
|
===================
|
|
|
|
|
|
|
|
|
|
Commenters also expressed the following relevant points:
|
|
|
|
|
|
|
|
|
|
Relevance of ordering of keyword arguments
|
|
|
|
|
------------------------------------------
|
|
|
|
|
|
|
|
|
|
As part of the discussion of this PEP, it's important to decide if the ordering
|
|
|
|
|
information of the keyword arguments is important, and if indexes and keys can
|
2022-01-21 06:03:51 -05:00
|
|
|
|
be ordered in an arbitrary way (e.g. ``a[1,Z=3,2,R=4]``). :pep:`468`
|
2014-07-13 20:40:44 -04:00
|
|
|
|
tries to address the first point by proposing the use of an ordereddict,
|
|
|
|
|
however one would be inclined to accept that keyword arguments in indexing are
|
|
|
|
|
equivalent to kwargs in function calls, and therefore as of today equally
|
|
|
|
|
unordered, and with the same restrictions.
|
|
|
|
|
|
|
|
|
|
Need for homogeneity of behavior
|
|
|
|
|
--------------------------------
|
|
|
|
|
|
|
|
|
|
Relative to Strategy "New argument contents", a comment from Ian Cordasco
|
2017-03-24 17:11:33 -04:00
|
|
|
|
points out that
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
"it would be unreasonable for just one method to behave totally
|
|
|
|
|
differently from the standard behaviour in Python. It would be confusing for
|
|
|
|
|
only ``__getitem__`` (and ostensibly, ``__setitem__``) to take keyword
|
|
|
|
|
arguments but instead of turning them into a dictionary, turn them into
|
|
|
|
|
individual single-item dictionaries." We agree with his point, however it must
|
|
|
|
|
be pointed out that ``__getitem__`` is already special in some regards when it
|
|
|
|
|
comes to passed arguments.
|
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Chris Angelico also states:
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
"it seems very odd to start out by saying "here, let's give indexing the
|
|
|
|
|
option to carry keyword args, just like with function calls", and then come
|
|
|
|
|
back and say "oh, but unlike function calls, they're inherently ordered and
|
|
|
|
|
carried very differently"." Again, we agree on this point. The most
|
|
|
|
|
straightforward strategy to keep homogeneity would be Strategy "kwargs
|
|
|
|
|
argument", opening to a ``**kwargs`` argument on ``__getitem__``.
|
|
|
|
|
|
|
|
|
|
One of the authors (Stefano Borini) thinks that only the "strict dictionary"
|
|
|
|
|
strategy is worth of implementation. It is non-ambiguous, simple, does not
|
|
|
|
|
force complex parsing, and addresses the problem of referring to axes either
|
|
|
|
|
by position or by name. The "options" use case is probably best handled with
|
|
|
|
|
a different approach, and may be irrelevant for this PEP. The alternative
|
2017-03-24 17:11:33 -04:00
|
|
|
|
"named tuple" is another valid choice.
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
Having .get() become obsolete for indexing with default fallback
|
|
|
|
|
----------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
Introducing a "default" keyword could make ``dict.get()`` obsolete, which would be
|
2017-03-24 17:11:33 -04:00
|
|
|
|
replaced by ``d["key", default=3]``. Chris Angelico however states:
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
"Currently, you need to write ``__getitem__`` (which raises an exception on
|
|
|
|
|
finding a problem) plus something else, e.g. ``get()``, which returns a default
|
|
|
|
|
instead. By your proposal, both branches would go inside ``__getitem__``, which
|
2017-03-24 17:11:33 -04:00
|
|
|
|
means they could share code; but there still need to be two branches."
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
2017-03-24 17:11:33 -04:00
|
|
|
|
Additionally, Chris continues:
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
"There'll be an ad-hoc and fairly arbitrary puddle of names (some will go
|
|
|
|
|
``default=``, others will say that's way too long and go ``def=``, except that
|
|
|
|
|
that's a keyword so they'll use ``dflt=`` or something...), unless there's a
|
2017-03-24 17:11:33 -04:00
|
|
|
|
strong force pushing people to one consistent name.".
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
This argument is valid but it's equally valid for any function call, and is
|
|
|
|
|
generally fixed by established convention and documentation.
|
|
|
|
|
|
|
|
|
|
On degeneracy of notation
|
|
|
|
|
-------------------------
|
|
|
|
|
|
|
|
|
|
User Drekin commented: "The case of ``a[Z=3]`` and ``a[{"Z": 3}]`` is similar to
|
|
|
|
|
current ``a[1, 2]`` and ``a[(1, 2)]``. Even though one may argue that the parentheses
|
|
|
|
|
are actually not part of tuple notation but are just needed because of syntax,
|
|
|
|
|
it may look as degeneracy of notation when compared to function call: ``f(1, 2)``
|
2017-03-24 17:11:33 -04:00
|
|
|
|
is not the same thing as ``f((1, 2))``.".
|
2014-07-13 20:40:44 -04:00
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
.. [#keyword-1] "keyword-only args in __getitem__"
|
|
|
|
|
(http://article.gmane.org/gmane.comp.python.ideas/27584)
|
|
|
|
|
|
|
|
|
|
.. [#keyword-2] "Accepting keyword arguments for __getitem__"
|
|
|
|
|
(https://mail.python.org/pipermail/python-ideas/2014-June/028164.html)
|
|
|
|
|
|
|
|
|
|
.. [#keyword-3] "PEP pre-draft: Support for indexing with keyword arguments"
|
|
|
|
|
https://mail.python.org/pipermail/python-ideas/2014-July/028250.html
|
|
|
|
|
|
|
|
|
|
.. [#namedtuple] "namedtuple is not as good as it should be"
|
|
|
|
|
(https://mail.python.org/pipermail/python-ideas/2013-June/021257.html)
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
End:
|