Add PEP 472: "Support for indexing with keyword arguments" by Stefano Borini, Joseph Martinot-Lagarde.
parent 689e1bff5e
commit 84d82f06f2
|
@ -0,0 +1,653 @@
|
|||
PEP: 472
Title: Support for indexing with keyword arguments
Version: $Revision$
Last-Modified: $Date$
Author: Stefano Borini, Joseph Martinot-Lagarde
Discussion-To: python-ideas@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Jun-2014
Python-Version: 3.6
Post-History: 02-Jul-2014
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes an extension of the indexing operation to support keyword
arguments. Notations of the form ``a[K=3, R=2]`` would become legal syntax.
For future-proofing, mixed notations such as ``a[1:2, K=3, R=4]`` are also
considered and may be allowed as well, depending on the implementation chosen.
In addition to a change in the parser, the index protocol (``__getitem__``,
``__setitem__`` and ``__delitem__``) may also require adaptation.
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
The indexing syntax carries a strong semantic content, differentiating it from
|
||||
a method call: it implies referring to a subset of data. We believe this
|
||||
semantic association to be important, and wish to expand the strategies allowed
|
||||
to refer to this data.
|
||||
|
||||
As a general observation, the number of indices needed by an indexing operation
|
||||
depends on the dimensionality of the data: one-dimensional data (e.g. a list)
|
||||
requires one index (e.g. ``a[3]``), two-dimensional data (e.g. a matrix) requires
|
||||
two indices (e.g. ``a[2,3]``) and so on. Each index is a selector along one of the
|
||||
axes of the dimensionality, and the position in the index tuple is the
|
||||
metainformation needed to associate each index to the corresponding axis.
|
||||
|
||||
The current Python syntax focuses exclusively on position to express the
association to the axes, and also contains syntactic sugar to refer to
selections that span more than a single point (slices)
|
||||
|
||||
::
|
||||
|
||||
>>> a[3] # returns the fourth element of a
|
||||
>>> a[1:10:2] # slice notation (extract a non-trivial data subset)
|
||||
>>> a[3,2] # multiple indexes (for multidimensional arrays)
|
||||
|
||||
The additional notation proposed in this PEP would allow notations involving
|
||||
keyword arguments in the indexing operation, e.g.
|
||||
|
||||
::
|
||||
|
||||
>>> a[K=3, R=2]
|
||||
|
||||
which would allow referring to axes by conventional names.
|
||||
|
||||
One must additionally consider the extended form that allows both positional
|
||||
and keyword specification
|
||||
|
||||
::
|
||||
|
||||
>>> a[3,R=3,K=4]
|
||||
|
||||
This PEP will explore different strategies to enable the use of these notations.
|
||||
|
||||
Use cases
|
||||
=========
|
||||
|
||||
The following practical use cases present two broad categories of usage of a
|
||||
keyworded specification: indexing and contextual options. For indexing:
|
||||
|
||||
1. To provide a more communicative meaning to the index, preventing e.g. accidental
|
||||
inversion of indexes
|
||||
|
||||
::
|
||||
|
||||
>>> gridValues[x=3, y=5, z=8]
|
||||
>>> rain[time=0:12, location=location]
|
||||
|
||||
2. In some domains, such as computational physics and chemistry, the use of a
|
||||
notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent
|
||||
a level of accuracy
|
||||
|
||||
::
|
||||
|
||||
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
|
||||
|
||||
In this case, the index operation would return a basis set at the chosen level
of accuracy (represented by the parameter Z). Indexing is appropriate here because
the BasisSet object could be internally represented as a numeric table, where
rows (the "coefficient" axis, hidden from the user in this example) are associated
with individual elements (e.g. rows 0:5 contain coefficients for element 1,
rows 5:8 coefficients for element 2) and each column is associated with a given
degree of accuracy (the "accuracy" or "Z" axis), so that the first column is low
accuracy, the second column is medium accuracy and so on. With that indexing,
the user would obtain another object representing the contents of the column
of the internal table for accuracy level 3.
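The table layout described above can be sketched with today's syntax, using the
dictionary form ``BasisSet[{"Z": 3}]`` as a stand-in for the proposed
``BasisSet[Z=3]``. The class name ``BasisSetTable``, the 1-based accuracy levels
and the coefficient values are all assumptions made for illustration only.

```python
class BasisSetTable:
    """Hypothetical table: one row per coefficient, one column per accuracy level."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, idx):
        # Stand-in for the proposed BasisSet[Z=3]: today the caller must
        # spell it BasisSet[{"Z": 3}].
        if isinstance(idx, dict) and set(idx) == {"Z"}:
            column = idx["Z"] - 1  # assumption: accuracy levels start at 1
            return [row[column] for row in self._rows]
        raise TypeError("only selection along the Z axis is sketched here")


# Placeholder coefficients, not real basis set data.
BasisSet = BasisSetTable([
    [0.10, 0.11, 0.12],
    [0.20, 0.21, 0.22],
    [0.30, 0.31, 0.32],
])

high_accuracy = BasisSet[{"Z": 3}]  # third column of the internal table
```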
|
||||
|
||||
Additionally, the keyword specification can be used as an option contextual to
|
||||
the indexing. Specifically:
|
||||
|
||||
1. A "default" option allows specifying a default return value when the index
|
||||
is not present
|
||||
|
||||
::
|
||||
|
||||
>>> lst = [1, 2, 3]
|
||||
>>> value = lst[5, default=0] # value is 0
|
||||
|
||||
2. For a sparse dataset, to specify an interpolation strategy
|
||||
to infer a missing point from e.g. its surrounding data.
|
||||
|
||||
::
|
||||
|
||||
>>> value = array[1, 3, interpolate=spline_interpolator]
|
||||
|
||||
3. A unit could be specified with the same mechanism
|
||||
|
||||
::
|
||||
|
||||
>>> value = array[1, 3, unit="degrees"]
|
||||
|
||||
How the notation is interpreted is up to the implementing class.
|
||||
|
||||
Current implementation
|
||||
======================
|
||||
|
||||
Currently, the indexing operation is handled by methods ``__getitem__``,
|
||||
``__setitem__`` and ``__delitem__``. The signatures of these methods accept one argument
|
||||
for the index (with ``__setitem__`` accepting an additional argument for the set
|
||||
value). In the following, we will analyze ``__getitem__(self, idx)`` exclusively,
|
||||
with the same considerations implied for the remaining two methods.
|
||||
|
||||
When an indexing operation is performed, ``__getitem__(self, idx)`` is called.
|
||||
Traditionally, the full content between square brackets is turned into a single
|
||||
object passed to argument ``idx``:
|
||||
|
||||
- When a single element is passed, e.g. ``a[2]``, ``idx`` will be ``2``.
|
||||
- When multiple elements are passed, they must be separated by commas: ``a[2, 3]``.
|
||||
In this case, ``idx`` will be a tuple ``(2, 3)``. With ``a[2, 3, "hello", {}]``
|
||||
``idx`` will be ``(2, 3, "hello", {})``.
|
||||
- A slicing notation e.g. ``a[2:10]`` will produce a slice object, or a tuple
|
||||
containing slice objects if multiple values were passed.
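The rules above can be observed directly with a small class that returns
whatever object reaches ``__getitem__``. ``IndexRecorder`` is a hypothetical
helper introduced only for this illustration.

```python
class IndexRecorder:
    """Hypothetical helper: hands back the object passed as ``idx``."""

    def __getitem__(self, idx):
        return idx


a = IndexRecorder()

single = a[2]          # 2
several = a[2, 3]      # (2, 3)
sliced = a[2:10]       # slice(2, 10, None)
mixed = a[1:3, 5]      # (slice(1, 3, None), 5)
```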
|
||||
|
||||
Except for its unique ability to handle slice notation, the indexing operation
|
||||
has similarities to a plain method call: it acts like one when invoked with
|
||||
only one element; if the number of elements is greater than one, the ``idx``
|
||||
argument behaves like a ``*args``. However, as stated in the Motivation section,
|
||||
an indexing operation has the strong semantic implication of extraction of a
|
||||
subset out of a larger set, which is not automatically associated to a regular
|
||||
method call unless appropriate naming is chosen. Moreover, its different visual
|
||||
style is important for readability.
|
||||
|
||||
Specifications
|
||||
==============
|
||||
|
||||
The implementation should try to preserve the current signature for
|
||||
``__getitem__``, or modify it in a backward-compatible way. We will present
|
||||
different alternatives, taking into account the possible cases that need
|
||||
to be addressed
|
||||
|
||||
::
|
||||
|
||||
C0. a[1]; a[1,2] # Traditional indexing
|
||||
C1. a[Z=3]
|
||||
C2. a[Z=3, R=4]
|
||||
C3. a[1, Z=3]
|
||||
C4. a[1, Z=3, R=4]
|
||||
C5. a[1, 2, Z=3]
|
||||
C6. a[1, 2, Z=3, R=4]
|
||||
C7. a[1, Z=3, 2, R=4] # Interposed ordering
|
||||
|
||||
Strategy "Strict dictionary"
|
||||
----------------------------
|
||||
|
||||
This strategy acknowledges that ``__getitem__`` is special in accepting only
|
||||
one object, and the nature of that object must be non-ambiguous in its
|
||||
specification of the axes: it can be either by order, or by name. As a result
|
||||
of this assumption, in presence of keyword arguments, the passed entity is a
|
||||
dictionary and all labels must be specified.
|
||||
|
||||
::
|
||||
|
||||
C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2)
|
||||
C1. a[Z=3] -> idx = {"Z": 3}
|
||||
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
|
||||
C3. a[1, Z=3] -> raise SyntaxError
|
||||
C4. a[1, Z=3, R=4] -> raise SyntaxError
|
||||
C5. a[1, 2, Z=3] -> raise SyntaxError
|
||||
C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
|
||||
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError
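A minimal sketch of how a ``__getitem__`` could consume the index under this
strategy. Since ``a[Z=3, R=4]`` is not valid syntax today, the calls below use
the equivalent dictionary form ``a[{"Z": 3, "R": 4}]``. The ``Grid`` class and
its axis names are assumptions made for the example.

```python
class Grid:
    """Hypothetical 2-D container whose axes are named "Z" and "R"."""

    AXES = ("Z", "R")

    def __init__(self, data):
        self._data = data  # nested lists, addressed as data[z][r]

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # Keyword form: every axis label must be specified.
            if set(idx) != set(self.AXES):
                raise KeyError("all axes must be named: %r" % (self.AXES,))
            z, r = idx["Z"], idx["R"]
        else:
            # Positional form: axes are identified by order.
            z, r = idx
        return self._data[z][r]


grid = Grid([[10, 11], [20, 21]])

by_position = grid[1, 0]
by_name = grid[{"Z": 1, "R": 0}]  # stands in for grid[Z=1, R=0]
```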
|
||||
|
||||
Pros
|
||||
''''
|
||||
|
||||
- Strong conceptual similarity between the tuple case and the dictionary case.
|
||||
In the first case, we are specifying a tuple, so we are naturally defining
|
||||
a plain set of values separated by commas. In the second, we are specifying a
|
||||
dictionary, so we are specifying a homogeneous set of key/value pairs, as
|
||||
in ``dict(Z=3, R=4)``;
|
||||
- Simple and easy to parse on the ``__getitem__`` side: if it gets a tuple,
|
||||
determine the axes using positioning. If it gets a dictionary, use
|
||||
the keywords.
|
||||
- C interface does not need changes.
|
||||
|
||||
Neutral
|
||||
'''''''
|
||||
|
||||
- Degeneracy of ``a[{"Z": 3, "R": 4}]`` with ``a[Z=3, R=4]`` means the notation
|
||||
is syntactic sugar.
|
||||
|
||||
Cons
|
||||
''''
|
||||
|
||||
- Very strict.
|
||||
- Destroys ordering of the passed arguments. Preserving the
|
||||
order would be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_.
|
||||
- Does not allow use cases with mixed positional/keyword arguments such as
|
||||
``a[1, 2, default=5]``.
|
||||
|
||||
Strategy "mixed dictionary"
|
||||
---------------------------
|
||||
|
||||
This strategy relaxes the above constraint to return a dictionary containing
|
||||
both numbers and strings as keys.
|
||||
|
||||
::
|
||||
|
||||
C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2)
|
||||
C1. a[Z=3] -> idx = {"Z": 3}
|
||||
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}
|
||||
C3. a[1, Z=3] -> idx = { 0: 1, "Z": 3}
|
||||
C4. a[1, Z=3, R=4] -> idx = { 0: 1, "Z": 3, "R": 4}
|
||||
C5. a[1, 2, Z=3] -> idx = { 0: 1, 1: 2, "Z": 3}
|
||||
C6. a[1, 2, Z=3, R=4] -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4}
|
||||
C7. a[1, Z=3, 2, R=4] -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4}
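As a sketch, the dictionary for the non-interposed cases (C0 to C6) can be
emulated with an ordinary function call, and ``__getitem__`` would then need to
separate integer keys from string keys. The helper names ``mixed_index`` and
``split_mixed`` are hypothetical. C7 is not covered, because a function call
cannot express interposed ordering.

```python
def mixed_index(*args, **kwargs):
    """Build the "mixed dictionary": positions and names share one dict."""
    idx = dict(enumerate(args))  # positional entries keyed by position
    idx.update(kwargs)           # keyword entries keyed by name
    return idx


def split_mixed(idx):
    """Recover positional values and keyword values from a mixed dictionary."""
    positions = sorted(k for k in idx if isinstance(k, int))
    positional = tuple(idx[k] for k in positions)
    keywords = {k: v for k, v in idx.items() if isinstance(k, str)}
    return positional, keywords


c6 = mixed_index(1, 2, Z=3, R=4)   # stands in for a[1, 2, Z=3, R=4]
positional, keywords = split_mixed(c6)
```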
|
||||
|
||||
Pros
|
||||
''''
|
||||
- Allows mixed positional/keyword cases.
|
||||
|
||||
Cons
|
||||
''''
|
||||
- Destroys ordering information for string keys. We have no way of saying if
|
||||
``"Z"`` in C7 was in position 1 or 3.
|
||||
- Implies switching from a tuple to a dict as soon as one specified index
|
||||
has a keyword argument. May be confusing to parse.
|
||||
|
||||
Strategy "named tuple"
|
||||
-----------------------
|
||||
|
||||
Return a named tuple for ``idx`` instead of a tuple. Keyword arguments would
|
||||
obviously have their stated name as key, and positional argument would have an
|
||||
underscore followed by their order:
|
||||
|
||||
::
|
||||
|
||||
C0. a[1]; a[1,2] -> idx = 1; idx = (_0=1, _1=2)
|
||||
C1. a[Z=3] -> idx = (Z=3)
|
||||
C2. a[Z=3, R=2] -> idx = (Z=3, R=2)
|
||||
C3. a[1, Z=3] -> idx = (_0=1, Z=3)
|
||||
C4. a[1, Z=3, R=2] -> idx = (_0=1, Z=3, R=2)
|
||||
C5. a[1, 2, Z=3] -> idx = (_0=1, _1=2, Z=3)
|
||||
C6. a[1, 2, Z=3, R=4] -> (_0=1, _1=2, Z=3, R=4)
|
||||
C7. a[1, Z=3, 2, R=4] -> (_0=1, Z=3, _1=2, R=4)
|
||||
or (_0=1, Z=3, _2=2, R=4)
|
||||
or raise SyntaxError
|
||||
|
||||
The required typename of the namedtuple could be ``Index`` or the name of the
argument in the function definition. The named tuple keeps the ordering and is
easy to analyse through its ``_fields`` attribute. It is backward compatible,
with the caveat that C0 with more than one entry now passes a namedtuple
instead of a plain tuple.
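A rough sketch of how such an index object could be built with
``collections.namedtuple``. The helper ``build_index`` is hypothetical. Note
that ``namedtuple`` rejects field names starting with an underscore unless
``rename=True`` is passed, in which case it assigns the positional names
``_0``, ``_1``, ... itself.

```python
from collections import namedtuple


def build_index(positional, keywords):
    """Hypothetical builder for the named tuple passed as ``idx``."""
    names = ["_%d" % i for i in range(len(positional))] + list(keywords)
    # rename=True lets namedtuple accept the underscore-prefixed names by
    # rewriting each invalid name to _<position>.
    index_type = namedtuple("Index", names, rename=True)
    return index_type(*positional, *keywords.values())


idx = build_index((1, 2), {"Z": 3})       # stands in for a[1, 2, Z=3]
other = build_index((1,), {"Z": 3})       # stands in for a[1, Z=3]
```

The result still behaves as a plain tuple (``idx[0]``, unpacking, comparison
with ``(1, 2, 3)``), while ``idx.Z`` and ``idx._fields`` expose the names. The
two calls above produce two distinct ``Index`` classes, which is the type
instability listed among the cons.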
|
||||
|
||||
Pros
|
||||
''''
|
||||
- Looks nice. namedtuple transparently replaces tuple and gracefully
|
||||
degrades to the old behavior.
|
||||
- Does not require a change in the C interface
|
||||
|
||||
Cons
|
||||
''''
|
||||
- According to some sources [#namedtuple]_ namedtuple is not well developed.
  Including it as such an important object would probably require rework
  and improvement;
|
||||
- The namedtuple fields, and thus the type, will have to change according
|
||||
to the passed arguments. This can be a performance bottleneck, and makes
|
||||
it impossible to guarantee that two subsequent index accesses get the same
|
||||
Index class;
|
||||
- The ``_n`` "magic" fields are a bit unusual, but IPython already uses them
  for result history.
|
||||
- Python currently has no builtin namedtuple. The current one is available
|
||||
in the "collections" module in the standard library.
|
||||
- Unlike in a function call, the two notations ``gridValues[x=3, y=5, z=8]``
  and ``gridValues[3, 5, 8]`` would not gracefully match if the order is modified
  at call time (e.g. we ask for ``gridValues[y=5, z=8, x=3]``). In a function,
  we can pre-define argument names so that keyword arguments are properly
  matched. Not so in ``__getitem__``, which leaves the task of interpreting and
  matching to ``__getitem__`` itself.
|
||||
|
||||
|
||||
Strategy "New argument contents"
|
||||
--------------------------------
|
||||
|
||||
In the current implementation, when many arguments are passed to ``__getitem__``,
|
||||
they are grouped in a tuple and this tuple is passed to ``__getitem__`` as the
|
||||
single argument ``idx``. This strategy keeps the current signature, but expands the
|
||||
range of variability in type and contents of ``idx`` to more complex representations.
|
||||
|
||||
We identify four possible ways to implement this strategy:
|
||||
|
||||
- **P1**: uses a single dictionary for the keyword arguments.
|
||||
- **P2**: uses individual single-item dictionaries.
|
||||
- **P3**: similar to **P2**, but replaces single-item dictionaries with a ``(key, value)`` tuple.
|
||||
- **P4**: similar to **P2**, but uses a special and additional new object: ``keyword()``
|
||||
|
||||
Some of these possibilities lead to degenerate notations, i.e. indistinguishable
|
||||
from an already possible representation. Once again, the proposed notation
|
||||
becomes syntactic sugar for these representations.
|
||||
|
||||
Under this strategy, the old behavior for C0 is unchanged.
|
||||
|
||||
::
|
||||
|
||||
C0: a[1] -> idx = 1 # integer
|
||||
a[1,2] -> idx = (1,2) # tuple
|
||||
|
||||
In C1, we can use either a dictionary or a tuple to represent the key/value pair
for the specific indexing entry. In the tuple case, we need a tuple containing a
tuple, because otherwise we cannot differentiate ``a["Z", 3]`` from ``a[Z=3]``.
|
||||
|
||||
::
|
||||
|
||||
C1: a[Z=3] -> idx = {"Z": 3} # P1/P2 dictionary with single key
|
||||
or idx = (("Z", 3),) # P3 tuple of tuples
|
||||
or idx = keyword("Z", 3) # P4 keyword object
|
||||
|
||||
As you can see, notation P1/P2 implies that ``a[Z=3]`` and ``a[{"Z": 3}]`` will
call ``__getitem__`` passing the exact same value, so the former is syntactic
sugar for the latter. The same situation occurs, although with a different index
(``a[("Z", 3),]``), for P3. Using a keyword object as in P4 would remove this
degeneracy.
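The degenerate forms can be written with today's syntax. The sketch below uses
a hypothetical ``IndexRecorder`` class that returns the received ``idx``, and
shows the objects that P1/P2 and P3 would also produce for ``a[Z=3]``.

```python
class IndexRecorder:
    """Hypothetical helper: hands back the object passed as ``idx``."""

    def __getitem__(self, idx):
        return idx


a = IndexRecorder()

p1_form = a[{"Z": 3}]      # same object P1/P2 would pass for a[Z=3]
p3_form = a[("Z", 3),]     # same object P3 would pass for a[Z=3]
flat_pair = a["Z", 3]      # a flat tuple, which is why P3 nests the pair
```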
|
||||
|
||||
For the C2 case:
|
||||
|
||||
::
|
||||
|
||||
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4} # P1 dictionary/ordereddict
|
||||
or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict
|
||||
or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples
|
||||
or idx = (keyword("Z", 3),
|
||||
keyword("R", 4) ) # P4 keyword objects
|
||||
|
||||
|
||||
P1 naturally maps to the traditional ``**kwargs`` behavior, however it breaks
|
||||
the convention that two or more entries for the index produce a tuple. P2
|
||||
preserves this behavior, and additionally preserves the order. Preserving the
|
||||
order would also be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_.
|
||||
|
||||
The remaining cases are shown here:
|
||||
|
||||
::
|
||||
|
||||
C3. a[1, Z=3] -> idx = (1, {"Z": 3}) # P1/P2
|
||||
or idx = (1, ("Z", 3)) # P3
|
||||
or idx = (1, keyword("Z", 3)) # P4
|
||||
|
||||
C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4}) # P1
|
||||
or idx = (1, {"Z": 3}, {"R": 4}) # P2
|
||||
or idx = (1, ("Z", 3), ("R", 4)) # P3
|
||||
or idx = (1, keyword("Z", 3),
|
||||
keyword("R", 4)) # P4
|
||||
|
||||
C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3}) # P1/P2
|
||||
or idx = (1, 2, ("Z", 3)) # P3
|
||||
or idx = (1, 2, keyword("Z", 3)) # P4
|
||||
|
||||
C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4}) # P1
|
||||
or idx = (1, 2, {"Z": 3}, {"R": 4}) # P2
|
||||
or idx = (1, 2, ("Z", 3), ("R", 4)) # P3
|
||||
or idx = (1, 2, keyword("Z", 3),
|
||||
keyword("R", 4)) # P4
|
||||
|
||||
C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4}) # P1. Pack the keyword arguments. Ugly.
|
||||
or raise SyntaxError # P1. Same behavior as in function calls.
|
||||
or idx = (1, {"Z": 3}, 2, {"R": 4}) # P2
|
||||
or idx = (1, ("Z", 3), 2, ("R", 4)) # P3
|
||||
or idx = (1, keyword("Z", 3),
|
||||
2, keyword("R", 4)) # P4
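To give an idea of the parsing burden on ``__getitem__``, here is a minimal
sketch for the P4 variant. The ``keyword`` class is a hypothetical stand-in for
the new object mentioned above (no such builtin exists), and ``split_index`` is
an illustrative helper.

```python
class keyword:
    """Hypothetical stand-in for the P4 ``keyword()`` object."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def split_index(idx):
    """Separate positional entries from keyword entries in a P4-style idx."""
    # A single entry arrives bare, several entries arrive as a tuple.
    entries = idx if isinstance(idx, tuple) else (idx,)
    positional = tuple(e for e in entries if not isinstance(e, keyword))
    keywords = {e.name: e.value for e in entries if isinstance(e, keyword)}
    return positional, keywords


# What a[1, 2, Z=3, R=4] would pass under P4.
c6 = (1, 2, keyword("Z", 3), keyword("R", 4))
positional, keywords = split_index(c6)
```

The same shape of helper would be needed for P2 and P3, with the added
difficulty that a dictionary or a two-item tuple may also be a legitimate
positional index.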
|
||||
|
||||
Pros
|
||||
''''
|
||||
- Signature is unchanged;
|
||||
- P2/P3 can preserve ordering of keyword arguments as specified at indexing;
|
||||
- P1 needs an OrderedDict, but would destroy interposed ordering if allowed:
|
||||
all keyword indexes would be dumped into the dictionary;
|
||||
- Stays within traditional types: tuples and dicts, possibly OrderedDict;
|
||||
- Some proposed strategies are similar in behavior to a traditional function call;
|
||||
- The C interface for ``PyObject_GetItem`` and family would remain unchanged.
|
||||
|
||||
Cons
|
||||
''''
|
||||
- Apparently complex and wasteful;
|
||||
- Degeneracy in notation (e.g. ``a[Z=3]`` and ``a[{"Z":3}]`` are equivalent and
|
||||
indistinguishable notations at the ``__[get|set|del]item__`` level).
|
||||
This behavior may or may not be acceptable.
|
||||
- For P4, an additional object similar in nature to ``slice()`` is needed,
  but only to disambiguate the above degeneracy;
- ``idx`` type and layout seem to change depending on the whims of the caller;
- May be complex to parse what is passed, especially in the case of tuple of tuples;
- P2 creates many single-key dictionaries as members of a tuple, which looks ugly.
  P3 would be lighter and easier to use than the tuple of dicts, and still
  preserves order (unlike the regular dict), but would result in clumsy
  extraction of keywords.
|
||||
|
||||
Strategy "kwargs argument"
|
||||
---------------------------
|
||||
|
||||
``__getitem__`` accepts an optional ``**kwargs`` argument, which collects the keyword entries.
``idx`` also becomes optional, to support the case where no non-keyword arguments are allowed.
The signature would then be one of
|
||||
|
||||
::
|
||||
|
||||
__getitem__(self, idx)
|
||||
__getitem__(self, idx, **kwargs)
|
||||
__getitem__(self, **kwargs)
|
||||
|
||||
Applied to our cases, this would produce:
|
||||
|
||||
::
|
||||
|
||||
C0. a[1,2] -> idx=(1,2); kwargs={}
|
||||
C1. a[Z=3] -> idx=None ; kwargs={"Z":3}
|
||||
C2. a[Z=3, R=4] -> idx=None ; kwargs={"Z":3, "R":4}
|
||||
C3. a[1, Z=3] -> idx=1 ; kwargs={"Z":3}
|
||||
C4. a[1, Z=3, R=4] -> idx=1 ; kwargs={"Z":3, "R":4}
|
||||
C5. a[1, 2, Z=3] -> idx=(1,2); kwargs={"Z":3}
|
||||
C6. a[1, 2, Z=3, R=4] -> idx=(1,2); kwargs={"Z":3, "R":4}
|
||||
C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement with function behavior
|
||||
|
||||
Empty indexing ``a[]`` of course remains invalid syntax.
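The receiving side of this strategy can already be written today. Only the
bracket syntax is missing, so the sketch below calls ``__getitem__`` explicitly
wherever keywords are involved. The ``Array`` class is hypothetical and simply
returns what it received.

```python
class Array:
    """Hypothetical class using the __getitem__(self, idx, **kwargs) form."""

    def __getitem__(self, idx=None, **kwargs):
        return idx, kwargs


a = Array()

c0 = a[1, 2]                        # ordinary indexing is unaffected
c1 = a.__getitem__(Z=3)             # stands in for a[Z=3]
c3 = a.__getitem__(1, Z=3)          # stands in for a[1, Z=3]
c5 = a.__getitem__((1, 2), Z=3)     # stands in for a[1, 2, Z=3]
```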
|
||||
|
||||
Pros
|
||||
''''
|
||||
- Similar to function call, evolves naturally from it;
|
||||
- Use of keyword indexing with an object whose ``__getitem__``
|
||||
doesn't have a kwargs will fail in an obvious way.
|
||||
That's not the case for the other strategies.
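The "fails in an obvious way" property can be illustrated with an explicit
call, which stands in for the proposed syntax. ``Legacy`` is a hypothetical
class whose ``__getitem__`` has the traditional signature.

```python
class Legacy:
    """Hypothetical class with the traditional one-argument __getitem__."""

    def __getitem__(self, idx):
        return idx


try:
    Legacy().__getitem__(1, Z=3)   # stands in for Legacy()[1, Z=3]
except TypeError as exc:
    error = exc                    # unexpected keyword argument 'Z'
else:
    error = None
```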
|
||||
|
||||
Cons
|
||||
''''
|
||||
- It doesn't preserve order, unless an OrderedDict is used;
|
||||
- Forbids C7, but is it really needed?
|
||||
- Requires a change in the C interface to pass an additional
|
||||
PyObject for the keyword arguments.
|
||||
|
||||
|
||||
C interface
|
||||
===========
|
||||
|
||||
As briefly introduced in the previous analysis, the C interface would
|
||||
potentially have to change to allow the new feature. Specifically,
|
||||
``PyObject_GetItem`` and related routines would have to accept an additional
|
||||
``PyObject *kw`` argument for Strategy "kwargs argument". The remaining
|
||||
strategies would not require a change in the C function signatures, but the
|
||||
different nature of the passed object would potentially require adaptation.
|
||||
|
||||
Strategy "named tuple" would behave correctly without any change: the class
|
||||
returned by the factory method in collections returns a subclass of tuple,
|
||||
meaning that ``PyTuple_*`` functions can handle the resulting object.
|
||||
|
||||
Alternative Solutions
|
||||
=====================
|
||||
|
||||
In this section, we present alternative solutions that would work around the
missing feature and make the proposed enhancement not worth implementing.
|
||||
|
||||
Use a method
|
||||
------------
|
||||
|
||||
One could keep the indexing as is, and use a traditional ``get()`` method for those
|
||||
cases where basic indexing is not enough. This is a good point, but as already
|
||||
reported in the introduction, methods have a different semantic weight from
|
||||
indexing, and you can't use slices directly in methods. Compare e.g.
|
||||
``a[1:3, Z=2]`` with ``a.get(slice(1,3), Z=2)``.
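For comparison, the method-based alternative works today. In this sketch the
``Table`` class is hypothetical and its ``get()`` simply returns what it
received, to show that the caller has to build the slice object by hand.

```python
class Table:
    """Hypothetical container exposing a get() method next to indexing."""

    def __getitem__(self, idx):
        return idx

    def get(self, idx, **options):
        return idx, options


t = Table()

via_index = t[1:3]                       # slice sugar works inside brackets
via_method = t.get(slice(1, 3), Z=2)     # the slice must be spelled out
```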
|
||||
|
||||
The authors however recognize this argument as compelling, and the advantage
|
||||
in semantic expressivity of a keyword-based indexing may be offset by a rarely
|
||||
used feature that does not bring enough benefit and may have limited adoption.
|
||||
|
||||
Emulate requested behavior by abusing the slice object
|
||||
------------------------------------------------------
|
||||
|
||||
This extremely creative method exploits the behavior of slice objects, provided
that one accepts using strings (or instantiating properly named placeholder
objects for the keys) and using ":" instead of "=".
|
||||
|
||||
::
|
||||
|
||||
>>> a["K":3]
|
||||
slice('K', 3, None)
|
||||
>>> a["K":3, "R":4]
|
||||
(slice('K', 3, None), slice('R', 4, None))
|
||||
>>>
|
||||
|
||||
While clearly smart, this approach does not allow easy inspection of the key/value
pair, is too clever and esoteric, and does not allow passing a slice as in
``a[K=1:10:2]``.
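A sketch of the extra work this emulation pushes onto ``__getitem__``: each
slice has to be unpacked into a name and a value. The helper
``keywords_from_slices`` and the class ``SliceKeyed`` are hypothetical.

```python
def keywords_from_slices(idx):
    """Turn 'name':value slices into a {name: value} dictionary."""
    entries = idx if isinstance(idx, tuple) else (idx,)
    result = {}
    for entry in entries:
        if not (isinstance(entry, slice) and isinstance(entry.start, str)):
            raise TypeError("expected entries of the form 'name':value")
        if entry.step is not None:
            # The third slice field has no meaning in this emulation.
            raise TypeError("unexpected step in 'name':value entry")
        result[entry.start] = entry.stop
    return result


class SliceKeyed:
    def __getitem__(self, idx):
        return keywords_from_slices(idx)


a = SliceKeyed()
one = a["K":3]
two = a["K":3, "R":4]
```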
|
||||
|
||||
However, Tim Delaney comments
|
||||
|
||||
"I really do think that ``a[b=c, d=e]`` should just be syntax sugar for
|
||||
``a['b':c, 'd':e]``. It's simple to explain, and gives the greatest backwards
|
||||
compatibility. In particular, libraries that already abused slices in this
|
||||
way will just continue to work with the new syntax."
|
||||
|
||||
We think this behavior would produce inconvenient results. The library Pandas uses
|
||||
strings as labels, allowing notation such as
|
||||
|
||||
::
|
||||
|
||||
>>> a[:, "A":"F"]
|
||||
|
||||
to extract data from column "A" to column "F". Under the above comment, this notation
|
||||
would be equally obtained with
|
||||
|
||||
::
|
||||
|
||||
>>> a[:, A="F"]
|
||||
|
||||
which is weird and collides with the intended meaning of keyword in indexing, that
|
||||
is, specifying the axis through conventional names rather than positioning.
|
||||
|
||||
Pass a dictionary as an additional index
|
||||
----------------------------------------
|
||||
|
||||
::
|
||||
|
||||
>>> a[1, 2, {"K": 3}]
|
||||
|
||||
This notation, although less elegant, can already be used and achieves similar
results. It is evident that the proposed Strategy "New argument contents" can be
interpreted as syntactic sugar for this notation.
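A minimal sketch of how a class could support this notation today, by treating
a trailing dictionary as the keyword part of the index. The names
``split_trailing_dict`` and ``Array`` are hypothetical, and the convention that
the dictionary comes last is an assumption of this example.

```python
def split_trailing_dict(idx):
    """Split idx into (positional entries, keyword dictionary)."""
    entries = idx if isinstance(idx, tuple) else (idx,)
    if entries and isinstance(entries[-1], dict):
        return entries[:-1], entries[-1]
    return entries, {}


class Array:
    def __getitem__(self, idx):
        return split_trailing_dict(idx)


a = Array()
with_options = a[1, 2, {"K": 3}]
without_options = a[1, 2]
```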
|
||||
|
||||
Additional Comments
|
||||
===================
|
||||
|
||||
Commenters also expressed the following relevant points:
|
||||
|
||||
Relevance of ordering of keyword arguments
|
||||
------------------------------------------
|
||||
|
||||
As part of the discussion of this PEP, it's important to decide if the ordering
|
||||
information of the keyword arguments is important, and if indexes and keys can
|
||||
be ordered in an arbitrary way (e.g. ``a[1,Z=3,2,R=4]``). PEP-468 [#PEP-468]_
|
||||
tries to address the first point by proposing the use of an ordereddict,
|
||||
however one would be inclined to accept that keyword arguments in indexing are
|
||||
equivalent to kwargs in function calls, and therefore as of today equally
|
||||
unordered, and with the same restrictions.
|
||||
|
||||
Need for homogeneity of behavior
|
||||
--------------------------------
|
||||
|
||||
Relative to Strategy "New argument contents", a comment from Ian Cordasco
|
||||
points out that
|
||||
|
||||
"it would be unreasonable for just one method to behave totally
|
||||
differently from the standard behaviour in Python. It would be confusing for
|
||||
only ``__getitem__`` (and ostensibly, ``__setitem__``) to take keyword
|
||||
arguments but instead of turning them into a dictionary, turn them into
|
||||
individual single-item dictionaries." We agree with his point, however it must
|
||||
be pointed out that ``__getitem__`` is already special in some regards when it
|
||||
comes to passed arguments.
|
||||
|
||||
Chris Angelico also states:
|
||||
|
||||
"it seems very odd to start out by saying "here, let's give indexing the
|
||||
option to carry keyword args, just like with function calls", and then come
|
||||
back and say "oh, but unlike function calls, they're inherently ordered and
|
||||
carried very differently"." Again, we agree on this point. The most
|
||||
straightforward strategy to keep homogeneity would be Strategy "kwargs
|
||||
argument", opening to a ``**kwargs`` argument on ``__getitem__``.
|
||||
|
||||
One of the authors (Stefano Borini) thinks that only the "strict dictionary"
strategy is worth implementing. It is non-ambiguous, simple, does not
|
||||
force complex parsing, and addresses the problem of referring to axes either
|
||||
by position or by name. The "options" use case is probably best handled with
|
||||
a different approach, and may be irrelevant for this PEP. The alternative
|
||||
"named tuple" is another valid choice.
|
||||
|
||||
Having .get() become obsolete for indexing with default fallback
|
||||
----------------------------------------------------------------
|
||||
|
||||
Introducing a "default" keyword could make ``dict.get()`` obsolete, which would be
|
||||
replaced by ``d["key", default=3]``. Chris Angelico however states:
|
||||
|
||||
"Currently, you need to write ``__getitem__`` (which raises an exception on
|
||||
finding a problem) plus something else, e.g. ``get()``, which returns a default
|
||||
instead. By your proposal, both branches would go inside ``__getitem__``, which
|
||||
means they could share code; but there still need to be two branches."
|
||||
|
||||
Additionally, Chris continues:
|
||||
|
||||
"There'll be an ad-hoc and fairly arbitrary puddle of names (some will go
|
||||
``default=``, others will say that's way too long and go ``def=``, except that
|
||||
that's a keyword so they'll use ``dflt=`` or something...), unless there's a
|
||||
strong force pushing people to one consistent name.".
|
||||
|
||||
This argument is valid but it's equally valid for any function call, and is
|
||||
generally fixed by established convention and documentation.
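The "two branches inside ``__getitem__``" that Chris Angelico describes could
look like the following sketch. ``DefaultingDict`` is a hypothetical subclass,
the keyword name ``default`` is only one possible convention, and the explicit
``__getitem__`` call stands in for the proposed ``d["b", default=3]``.

```python
_MISSING = object()  # sentinel: tells "no default given" apart from default=None


class DefaultingDict(dict):
    """Hypothetical dict whose __getitem__ accepts a default keyword."""

    def __getitem__(self, key, default=_MISSING):
        try:
            return super().__getitem__(key)
        except KeyError:
            if default is _MISSING:
                raise          # branch 1: behave like plain indexing
            return default     # branch 2: behave like dict.get()


d = DefaultingDict(a=1)

present = d["a"]
fallback = d.__getitem__("b", default=3)   # stands in for d["b", default=3]
```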
|
||||
|
||||
On degeneracy of notation
|
||||
-------------------------
|
||||
|
||||
User Drekin commented: "The case of ``a[Z=3]`` and ``a[{"Z": 3}]`` is similar to
|
||||
current ``a[1, 2]`` and ``a[(1, 2)]``. Even though one may argue that the parentheses
|
||||
are actually not part of tuple notation but are just needed because of syntax,
|
||||
it may look as degeneracy of notation when compared to function call: ``f(1, 2)``
|
||||
is not the same thing as ``f((1, 2))``.".
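The comparison in this comment can be checked with current Python. The class
``IndexRecorder`` and the function ``f`` are hypothetical helpers that return
what they receive.

```python
class IndexRecorder:
    """Hypothetical helper: hands back the object passed as ``idx``."""

    def __getitem__(self, idx):
        return idx


def f(*args):
    return args


a = IndexRecorder()

index_bare = a[1, 2]        # (1, 2)
index_paren = a[(1, 2)]     # (1, 2): indistinguishable from the line above
call_bare = f(1, 2)         # (1, 2)
call_paren = f((1, 2))      # ((1, 2),): one argument that is a tuple
```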
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [#keyword-1] "keyword-only args in __getitem__"
   (http://article.gmane.org/gmane.comp.python.ideas/27584)

.. [#keyword-2] "Accepting keyword arguments for __getitem__"
   (https://mail.python.org/pipermail/python-ideas/2014-June/028164.html)

.. [#keyword-3] "PEP pre-draft: Support for indexing with keyword arguments"
   (https://mail.python.org/pipermail/python-ideas/2014-July/028250.html)

.. [#namedtuple] "namedtuple is not as good as it should be"
   (https://mail.python.org/pipermail/python-ideas/2013-June/021257.html)

.. [#PEP-468] "Preserving the order of \*\*kwargs in a function."
   (http://legacy.python.org/dev/peps/pep-0468/)
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|