249 lines
8.7 KiB
Plaintext
249 lines
8.7 KiB
Plaintext
PEP: 357
|
|
Title: Allowing Any Object to be Used for Slicing
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Travis Oliphant <oliphant@ee.byu.edu>
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 09-Feb-2006
|
|
Python-Version: 2.5
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes adding an ``nb_index`` slot in ``PyNumberMethods`` and an
|
|
``__index__`` special method so that arbitrary objects can be used
|
|
whenever integers are explicitly needed in Python, such as in slice
|
|
syntax (from which the slot gets its name).
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Currently integers and long integers play a special role in
|
|
slicing in that they are the only objects allowed in slice
|
|
syntax. In other words, if X is an object implementing the
|
|
sequence protocol, then ``X[obj1:obj2]`` is only valid if ``obj1`` and
|
|
``obj2`` are both integers or long integers. There is no way for ``obj1``
|
|
and ``obj2`` to tell Python that they could be reasonably used as
|
|
indexes into a sequence. This is an unnecessary limitation.
|
|
|
|
In NumPy, for example, there are 8 different integer scalars
|
|
corresponding to unsigned and signed integers of 8, 16, 32, and 64
|
|
bits. These type-objects could reasonably be used as integers in
|
|
many places where Python expects true integers but cannot inherit from
|
|
the Python integer type because of incompatible memory layouts.
|
|
There should be some way to be able to tell Python that an object can
|
|
behave like an integer.
|
|
|
|
It is not possible to use the ``nb_int`` (and ``__int__`` special method)
|
|
for this purpose because that method is used to *coerce* objects
|
|
to integers. It would be inappropriate to allow every object that
|
|
can be coerced to an integer to be used as an integer everywhere
|
|
Python expects a true integer. For example, if ``__int__`` were used
|
|
to convert an object to an integer in slicing, then float objects
|
|
would be allowed in slicing and ``x[3.2:5.8]`` would not raise an error
|
|
as it should.
|
|
|
|
|
|
Proposal
|
|
========
|
|
|
|
Add an ``nb_index`` slot to ``PyNumberMethods``, and a corresponding
|
|
``__index__`` special method. Objects could define a function to
|
|
place in the ``nb_index`` slot that returns a Python integer
|
|
(either an int or a long). This integer can
|
|
then be appropriately converted to a ``Py_ssize_t`` value whenever
|
|
Python needs one such as in ``PySequence_GetSlice``,
|
|
``PySequence_SetSlice``, and ``PySequence_DelSlice``.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
1) The ``nb_index`` slot will have the following signature::
|
|
|
|
PyObject *index_func (PyObject *self)
|
|
|
|
The returned object must be a Python ``IntType`` or
|
|
Python ``LongType``. NULL should be returned on
|
|
error with an appropriate error set.
|
|
|
|
2) The ``__index__`` special method will have the signature::
|
|
|
|
def __index__(self):
|
|
return obj
|
|
|
|
where obj must be either an int or a long.
|
|
|
|
3) 3 new abstract C-API functions will be added
|
|
|
|
a) The first checks to see if the object supports the index
|
|
slot and if it is filled in.
|
|
|
|
::
|
|
|
|
int PyIndex_Check(obj)
|
|
|
|
This will return true if the object defines the ``nb_index``
|
|
slot.
|
|
|
|
b) The second is a simple wrapper around the ``nb_index`` call that
|
|
raises ``PyExc_TypeError`` if the call is not available or if it
|
|
doesn't return an int or long. Because the
|
|
``PyIndex_Check`` is performed inside the ``PyNumber_Index`` call
|
|
you can call it directly and manage any error rather than
|
|
check for compatibility first.
|
|
|
|
::
|
|
|
|
PyObject *PyNumber_Index (PyObject *obj)
|
|
|
|
c) The third call helps deal with the common situation of
|
|
actually needing a ``Py_ssize_t`` value from the object to use for
|
|
indexing or other needs.
|
|
|
|
::
|
|
|
|
Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)
|
|
|
|
The function calls the ``nb_index`` slot of obj if it is
|
|
available and then converts the returned Python integer into
|
|
a ``Py_ssize_t`` value. If this goes well, then the value is
|
|
returned. The second argument allows control over what
|
|
happens if the integer returned from ``nb_index`` cannot fit
|
|
into a ``Py_ssize_t`` value.
|
|
|
|
If exc is NULL, then the returned value will be clipped to
|
|
``PY_SSIZE_T_MAX`` or ``PY_SSIZE_T_MIN`` depending on whether the
|
|
``nb_index`` slot of obj returned a positive or negative
|
|
integer. If exc is non-NULL, then it is the error object
|
|
that will be set to replace the ``PyExc_OverflowError`` that was
|
|
raised when the Python integer or long was converted to ``Py_ssize_t``.
|
|
|
|
4) A new ``operator.index(obj)`` function will be added that calls
|
|
equivalent of obj.``__index__``() and raises an error if obj does not implement
|
|
the special method.
|
|
|
|
|
|
Implementation Plan
|
|
===================
|
|
|
|
1) Add the ``nb_index`` slot in object.h and modify typeobject.c to
|
|
create the ``__index__`` method
|
|
|
|
2) Change the ``ISINT`` macro in ``ceval.c`` to ``ISINDEX`` and alter it to
|
|
accommodate objects with the index slot defined.
|
|
|
|
3) Change the ``_PyEval_SliceIndex`` function to accommodate objects
|
|
with the index slot defined.
|
|
|
|
4) Change all builtin objects (e.g. lists) that use the as_mapping
|
|
slots for subscript access and use a special-check for integers to
|
|
check for the slot as well.
|
|
|
|
5) Add the ``nb_index`` slot to integers and long_integers
|
|
(which just return themselves)
|
|
|
|
6) Add ``PyNumber_Index`` C-API to return an integer from any
|
|
Python Object that has the ``nb_index`` slot.
|
|
|
|
7) Add the ``operator.index(x)`` function.
|
|
|
|
8) Alter ``arrayobject.c`` and ``mmapmodule.c`` to use the new C-API for their
|
|
sub-scripting and other needs.
|
|
|
|
9) Add unit-tests
|
|
|
|
|
|
Discussion Questions
|
|
====================
|
|
|
|
Speed
|
|
-----
|
|
|
|
Implementation should not slow down Python because integers and long
|
|
integers used as indexes will complete in the same number of
|
|
instructions. The only change will be that what used to generate
|
|
an error will now be acceptable.
|
|
|
|
Why not use ``nb_int`` which is already there?
|
|
----------------------------------------------
|
|
|
|
The ``nb_int`` method is used for coercion and so means something
|
|
fundamentally different than what is requested here. This PEP
|
|
proposes a method for something that *can* already be thought of as
|
|
an integer communicate that information to Python when it needs an
|
|
integer. The biggest example of why using ``nb_int`` would be a bad
|
|
thing is that float objects already define the ``nb_int`` method, but
|
|
float objects *should not* be used as indexes in a sequence.
|
|
|
|
Why the name ``__index__``?
|
|
---------------------------
|
|
|
|
Some questions were raised regarding the name ``__index__`` when other
|
|
interpretations of the slot are possible. For example, the slot
|
|
can be used any time Python requires an integer internally (such
|
|
as in "mystring" \* 3). The name was suggested by Guido because
|
|
slicing syntax is the biggest reason for having such a slot and
|
|
in the end no better name emerged. See the discussion thread [1]_
|
|
for examples of names that were suggested such as "``__discrete__``" and
|
|
"``__ordinal__``".
|
|
|
|
Why return ``PyObject *`` from ``nb_index``?
|
|
--------------------------------------------
|
|
|
|
Initially ``Py_ssize_t`` was selected as the return type for the
|
|
``nb_index`` slot. However, this led to an inability to track and
|
|
distinguish overflow and underflow errors without ugly and brittle
|
|
hacks. As the ``nb_index`` slot is used in at least 3 different ways
|
|
in the Python core (to get an integer, to get a slice end-point,
|
|
and to get a sequence index), there is quite a bit of flexibility
|
|
needed to handle all these cases. The importance of having the
|
|
necessary flexibility to handle all the use cases is critical.
|
|
For example, the initial implementation that returned ``Py_ssize_t`` for
|
|
``nb_index`` led to the discovery that on a 32-bit machine with >=2GB of RAM
|
|
``s = 'x' * (2**100)`` works but ``len(s)`` was clipped at 2147483647.
|
|
Several fixes were suggested but eventually it was decided that
|
|
``nb_index`` needed to return a Python Object similar to the ``nb_int``
|
|
and nb_long slots in order to handle overflow correctly.
|
|
|
|
Why can't ``__index__`` return any object with the ``nb_index`` method?
|
|
-----------------------------------------------------------------------
|
|
|
|
This would allow infinite recursion in many different ways that are not
|
|
easy to check for. This restriction is similar to the requirement that
|
|
``__nonzero__`` return an int or a bool.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
Submitted as patch 1436368 to SourceForge.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Travis Oliphant, PEP for adding an sq_index slot so that any object, a
|
|
or b, can be used in X[a:b] notation,
|
|
|
|
http://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End: |