PEP: 357
Title: Allowing Any Object to be Used for Slicing
Version: $Revision$
Last-Modified: $Date$
Author: Travis Oliphant <oliphant@ee.byu.edu>
Status: Final
Type: Standards Track
Created: 09-Feb-2006
Python-Version: 2.5

Abstract

This PEP proposes adding an nb_index slot in PyNumberMethods and an
__index__ special method so that arbitrary objects can be used
whenever integers are explicitly needed in Python, such as in slice
syntax (from which the slot gets its name).

Rationale

Currently integers and long integers play a special role in
slicing in that they are the only objects allowed in slice
syntax. In other words, if X is an object implementing the
sequence protocol, then X[obj1:obj2] is only valid if obj1 and
obj2 are both integers or long integers. There is no way for obj1
and obj2 to tell Python that they could be reasonably used as
indexes into a sequence. This is an unnecessary limitation.

In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64
bits. These type-objects could reasonably be used as integers in
many places where Python expects true integers, but they cannot
inherit from the Python integer type because of incompatible memory
layouts. There should be a way to tell Python that an object can
behave like an integer.

It is not possible to use the nb_int slot (and the __int__ special
method) for this purpose because that method is used to *coerce*
objects to integers. It would be inappropriate to allow every object
that can be coerced to an integer to be used as an integer everywhere
Python expects a true integer. For example, if __int__ were used
to convert an object to an integer in slicing, then float objects
would be allowed in slicing and x[3.2:5.8] would not raise an error
as it should.

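The difference between coercion and index-like behaviour can be observed directly. The following is only an illustrative sketch, assuming a current Python 3 interpreter (the PEP itself targets Python 2.5); the list `data` is a hypothetical example sequence, not something defined by the PEP.

```python
# Illustrative sketch, assuming a Python 3 interpreter.
data = list(range(10))  # hypothetical example sequence

# __int__-style coercion silently discards the fractional part.
truncated = int(3.2)

# Slicing refuses float bounds instead of coercing them.
try:
    data[3.2:5.8]
    float_slice_rejected = False
except TypeError:
    float_slice_rejected = True
```

If slicing relied on __int__, the slice above would quietly behave like data[3:5] instead of reporting the mistake.
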
Proposal

Add an nb_index slot to PyNumberMethods, and a corresponding
__index__ special method. Objects could define a function to
place in the nb_index slot that returns a Python integer
(either an int or a long). This integer can
then be appropriately converted to a Py_ssize_t value whenever
Python needs one, such as in PySequence_GetSlice,
PySequence_SetSlice, and PySequence_DelSlice.

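At the Python level, the proposal might look like the following minimal sketch. The class name `Int8Like` and the list `letters` are hypothetical, chosen only for illustration as a stand-in for a fixed-width integer scalar; the sketch assumes a Python 3 interpreter, where int and long are a single type.

```python
class Int8Like:
    """Hypothetical stand-in for a fixed-width integer scalar."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Must return a true Python integer.
        return self._value


letters = ["a", "b", "c", "d", "e"]
start, stop = Int8Like(1), Int8Like(4)

middle = letters[start:stop]   # slice bounds are obtained via __index__
single = letters[Int8Like(2)]  # plain subscripting uses it as well
```

The object does not inherit from int and defines no __int__ method; defining __index__ alone is what makes it acceptable as an index.
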
Specification:

1) The nb_index slot will have the following signature:

       PyObject *index_func (PyObject *self)

   The returned object must be a Python IntType or
   Python LongType. NULL should be returned on
   error with an appropriate error set.

2) The __index__ special method will have the signature:

       def __index__(self):
           return obj

   where obj must be either an int or a long.

3) Three new abstract C-API functions will be added:

   a) The first checks to see if the object supports the index
      slot and if it is filled in.

          int PyIndex_Check(obj)

      This will return true if the object defines the nb_index
      slot.

   b) The second is a simple wrapper around the nb_index call that
      raises PyExc_TypeError if the call is not available or if it
      doesn't return an int or long. Because the
      PyIndex_Check is performed inside the PyNumber_Index call,
      you can call it directly and manage any error rather than
      check for compatibility first.

          PyObject *PyNumber_Index (PyObject *obj)

   c) The third call helps deal with the common situation of
      actually needing a Py_ssize_t value from the object to use for
      indexing or other needs.

          Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)

      The function calls the nb_index slot of obj if it is
      available and then converts the returned Python integer into
      a Py_ssize_t value. If this goes well, then the value is
      returned. The second argument allows control over what
      happens if the integer returned from nb_index cannot fit
      into a Py_ssize_t value.

      If exc is NULL, then the returned value will be clipped to
      PY_SSIZE_T_MAX or PY_SSIZE_T_MIN depending on whether the
      nb_index slot of obj returned a positive or negative
      integer. If exc is non-NULL, then it is the error object
      that will be set to replace the PyExc_OverflowError that was
      raised when the Python integer or long was converted to
      Py_ssize_t.

4) A new operator.index(obj) function will be added that calls the
   equivalent of obj.__index__() and raises an error if obj does not
   implement the special method.

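The behaviour described for the three C-API functions in item 3 can be approximated in pure Python. This is only a sketch under stated assumptions, not the C implementation: the helper names `index_check`, `number_index`, and `as_ssize_t` are hypothetical, and `sys.maxsize` is used as the Python-visible counterpart of PY_SSIZE_T_MAX on a Python 3 interpreter.

```python
import operator
import sys

SSIZE_T_MAX = sys.maxsize        # counterpart of PY_SSIZE_T_MAX
SSIZE_T_MIN = -sys.maxsize - 1   # counterpart of PY_SSIZE_T_MIN


def index_check(obj):
    """Rough analogue of PyIndex_Check: does the type define __index__?"""
    return hasattr(type(obj), "__index__")


def number_index(obj):
    """Rough analogue of PyNumber_Index; raises TypeError on failure."""
    return operator.index(obj)


def as_ssize_t(obj, exc=None):
    """Rough analogue of PyNumber_AsSsize_t.

    With exc=None an out-of-range value is clipped; otherwise an
    exception of the given class is raised instead.
    """
    value = number_index(obj)
    if SSIZE_T_MIN <= value <= SSIZE_T_MAX:
        return value
    if exc is None:
        # Clip according to the sign of the oversized integer.
        return SSIZE_T_MAX if value > 0 else SSIZE_T_MIN
    raise exc("cannot fit integer into an index-sized value")
```

A caller that wants clipping, as for slice end-points, would use as_ssize_t(obj); a caller that wants a hard failure, as for a sequence index, would pass an exception class such as IndexError.
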
Implementation Plan

1) Add the nb_index slot in object.h and modify typeobject.c to
   create the __index__ method.

2) Change the ISINT macro in ceval.c to ISINDEX and alter it to
   accommodate objects with the index slot defined.

3) Change the _PyEval_SliceIndex function to accommodate objects
   with the index slot defined.

4) Change all builtin objects (e.g. lists) that use the as_mapping
   slots for subscript access and use a special-check for integers to
   check for the slot as well.

5) Add the nb_index slot to integers and long integers
   (which just return themselves).

6) Add PyNumber_Index C-API to return an integer from any
   Python Object that has the nb_index slot.

7) Add the operator.index(x) function.

8) Alter arrayobject.c and mmapmodule.c to use the new C-API for their
   sub-scripting and other needs.

9) Add unit-tests.

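The kind of unit tests meant by item 9 might look like the sketch below. It is not the test suite that accompanied the patch; the class names `Indexable` and `IndexSketchTests` are hypothetical, and the sketch assumes a Python 3 interpreter with the standard unittest and array modules.

```python
import array
import operator
import unittest


class Indexable:
    """Hypothetical helper whose __index__ returns a fixed integer."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


class IndexSketchTests(unittest.TestCase):
    def test_int_returns_itself(self):
        # Item 5: integers simply return themselves.
        self.assertEqual(operator.index(5), 5)

    def test_list_subscript_and_slice(self):
        # Items 3 and 4: builtin sequences accept index-like objects.
        seq = [10, 20, 30, 40]
        self.assertEqual(seq[Indexable(2)], 30)
        self.assertEqual(seq[Indexable(1):Indexable(3)], [20, 30])

    def test_array_subscript(self):
        # Item 8: the array module honours the slot as well.
        arr = array.array("i", [10, 20, 30])
        self.assertEqual(arr[Indexable(1)], 20)

    def test_float_is_rejected(self):
        # Floats define __int__ but must still be refused as indexes.
        with self.assertRaises(TypeError):
            [10, 20, 30][1.0]
```

Saved to a file, the sketch could be run with the standard unittest command-line runner.
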
Discussion Questions

Speed:

Implementation should not slow down Python because integers and long
integers used as indexes will complete in the same number of
instructions. The only change will be that what used to generate
an error will now be acceptable.

Why not use nb_int which is already there?

The nb_int method is used for coercion and so means something
fundamentally different than what is requested here. This PEP
proposes a method for something that *can* already be thought of as
an integer to communicate that information to Python when it needs an
integer. The biggest example of why using nb_int would be a bad
thing is that float objects already define the nb_int method, but
float objects *should not* be used as indexes in a sequence.

Why the name __index__?

Some questions were raised regarding the name __index__ when other
interpretations of the slot are possible. For example, the slot
can be used any time Python requires an integer internally (such
as in "mystring" * 3). The name was suggested by Guido because
slicing syntax is the biggest reason for having such a slot and
in the end no better name emerged. See the discussion thread:
http://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594
for examples of names that were suggested such as "__discrete__" and
"__ordinal__".

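The point that the slot serves wherever Python needs an integer internally, not only in slicing, can be illustrated with a small sketch. The class `Count` is hypothetical, and the sketch assumes a Python 3 interpreter.

```python
class Count:
    """Hypothetical integer-like object that only defines __index__."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


# Sequence repetition asks for an integer through the index slot.
repeated = "ab" * Count(3)

# range() also accepts integer-like arguments in the same way.
numbers = list(range(Count(3)))
```

Neither operation involves slice syntax, which is why alternative names for the slot were discussed.
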
Why return PyObject * from nb_index?

Initially Py_ssize_t was selected as the return type for the
nb_index slot. However, this led to an inability to track and
distinguish overflow and underflow errors without ugly and brittle
hacks. As the nb_index slot is used in at least 3 different ways
in the Python core (to get an integer, to get a slice end-point,
and to get a sequence index), quite a bit of flexibility is
needed to handle all these cases, and having that flexibility
is critical.
For example, the initial implementation that returned Py_ssize_t for
nb_index led to the discovery that on a 32-bit machine with >=2GB of RAM
s = 'x' * (2**100) worked but len(s) was clipped at 2147483647.
Several fixes were suggested but eventually it was decided that
nb_index needed to return a Python Object similar to the nb_int
and nb_long slots in order to handle overflow correctly.

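The three uses named above treat an oversized integer differently, which is what returning a full Python object makes possible. The sketch below shows this on a Python 3 interpreter (an assumption; the PEP describes Python 2.5); the names `big` and `seq` are illustrative only.

```python
import operator

big = 2**100        # far larger than any Py_ssize_t
seq = [1, 2, 3]

# 1) Getting an integer: the exact value is preserved.
exact = operator.index(big)

# 2) Slice end-point: the oversized bound is clipped.
clipped = seq[:big]

# 3) Sequence index: the overflow is reported as an IndexError.
try:
    seq[big]
    index_outcome = "no error"
except IndexError:
    index_outcome = "IndexError"

# Sequence repetition reports the overflow rather than clipping.
try:
    "x" * big
    repeat_outcome = "no error"
except OverflowError:
    repeat_outcome = "OverflowError"
```

With a Py_ssize_t return type, the slot itself would have had to pick one of these behaviours for every caller.
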
Why can't __index__ return any object with the nb_index method?

This would allow infinite recursion in many different ways that are not
easy to check for. This restriction is similar to the requirement that
__nonzero__ return an int or a bool.

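The restriction can be seen in a short sketch, assuming a Python 3 interpreter. The classes `Inner` and `Outer` are hypothetical: `Outer.__index__` returns an object that is itself index-like, and the result is refused instead of being followed recursively.

```python
import operator


class Inner:
    """Hypothetical well-behaved index-like object."""

    def __index__(self):
        return 2


class Outer:
    """Hypothetical object whose __index__ returns a non-integer."""

    def __index__(self):
        return Inner()  # index-like, but not a true integer


inner_value = operator.index(Inner())

try:
    operator.index(Outer())
    rejected = False
except TypeError:
    rejected = True
```
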
Reference Implementation

Submitted as patch 1436368 to SourceForge.

Copyright

This document is placed in the public domain.