PEP: 357
Title: Allowing Any Object to be Used for Slicing
Version: $Revision$
Last-Modified: $Date$
Author: Travis Oliphant
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Feb-2006
Python-Version: 2.5
Post-History:

Abstract
========

This PEP proposes adding an ``nb_index`` slot in ``PyNumberMethods`` and an
``__index__`` special method so that arbitrary objects can be used
whenever integers are explicitly needed in Python, such as in slice
syntax (from which the slot gets its name).

Rationale
=========

Currently integers and long integers play a special role in
slicing in that they are the only objects allowed in slice
syntax. In other words, if X is an object implementing the
sequence protocol, then ``X[obj1:obj2]`` is only valid if ``obj1`` and
``obj2`` are both integers or long integers.  There is no way for ``obj1``
and ``obj2`` to tell Python that they could be reasonably used as
indexes into a sequence.  This is an unnecessary limitation.

In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64
bits.  These type-objects could reasonably be used as integers in
many places where Python expects true integers but cannot inherit from
the Python integer type because of incompatible memory layouts.
There should be some way to be able to tell Python that an object can
behave like an integer.

It is not possible to use the ``nb_int`` (and ``__int__`` special method)
for this purpose because that method is used to *coerce* objects
to integers.  It would be inappropriate to allow every object that
can be coerced to an integer to be used as an integer everywhere
Python expects a true integer.  For example, if ``__int__`` were used
to convert an object to an integer in slicing, then float objects
would be allowed in slicing and ``x[3.2:5.8]`` would not raise an error
as it should.

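The distinction drawn above can be illustrated with a short sketch.  It
assumes a current Python 3 interpreter rather than Python 2.5, and
``SmallInt`` is a hypothetical stand-in for an integer-like scalar (such
as one of the NumPy types mentioned earlier), not a real class from any
library.

```python
import operator


class SmallInt:
    """Hypothetical integer-like scalar that does not inherit from int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Must return a true Python integer, not another index-like object.
        return self._value


data = list(range(10))
start, stop = SmallInt(2), SmallInt(5)

# Slicing consults __index__ on both bounds.
sliced = data[start:stop]

# operator.index is the explicit form of the same conversion.
as_int = operator.index(start)

# Floats can be coerced with int(), but they do not define __index__,
# so they are still rejected as slice bounds.
try:
    data[3.2:5.8]
    float_rejected = False
except TypeError:
    float_rejected = True
```

Here ``sliced`` is ``[2, 3, 4]`` and ``as_int`` is ``2``, while the float
slice raises ``TypeError`` even though ``int(3.2)`` would succeed.
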
Proposal
========

Add an ``nb_index`` slot to ``PyNumberMethods``, and a corresponding
``__index__`` special method.  Objects could define a function to
place in the ``nb_index`` slot that returns a Python integer (either an
int or a long).  This integer can
then be appropriately converted to a ``Py_ssize_t`` value whenever
Python needs one such as in ``PySequence_GetSlice``,
``PySequence_SetSlice``, and ``PySequence_DelSlice``.

Specification
=============

1) The ``nb_index`` slot will have the following signature::

       PyObject *index_func (PyObject *self)

   The returned object must be a Python ``IntType`` or
   Python ``LongType``.  NULL should be returned on
   error with an appropriate error set.

2) The ``__index__`` special method will have the signature::

       def __index__(self):
           return obj

   where obj must be either an int or a long.

3) 3 new abstract C-API functions will be added

   a) The first checks to see if the object supports the index
      slot and if it is filled in.

      ::

          int PyIndex_Check(obj)

      This will return true if the object defines the ``nb_index``
      slot.

   b) The second is a simple wrapper around the ``nb_index`` call that
      raises ``PyExc_TypeError`` if the call is not available or if it
      doesn't return an int or long.  Because the
      ``PyIndex_Check`` is performed inside the ``PyNumber_Index`` call
      you can call it directly and manage any error rather than
      check for compatibility first.

      ::

          PyObject *PyNumber_Index (PyObject *obj)

   c) The third call helps deal with the common situation of
      actually needing a ``Py_ssize_t`` value from the object to use for
      indexing or other needs.

      ::

          Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)

      The function calls the ``nb_index`` slot of obj if it is available
      and then converts the returned Python integer into a ``Py_ssize_t``
      value.  If this goes well, then the value is returned.  The
      second argument allows control over what happens if the integer
      returned from ``nb_index`` cannot fit into a ``Py_ssize_t`` value.

      If exc is NULL, then the returned value will be clipped to
      ``PY_SSIZE_T_MAX`` or ``PY_SSIZE_T_MIN`` depending on whether the
      ``nb_index`` slot of obj returned a positive or negative integer.
      If exc is non-NULL, then it is the error object that will be set
      to replace the ``PyExc_OverflowError`` that was raised when the
      Python integer or long was converted to ``Py_ssize_t``.

4) A new ``operator.index(obj)`` function will be added that calls
   the equivalent of ``obj.__index__()`` and raises an error if obj does
   not implement the special method.

Implementation Plan
===================

1) Add the ``nb_index`` slot in ``object.h`` and modify ``typeobject.c``
   to create the ``__index__`` method.

2) Change the ``ISINT`` macro in ``ceval.c`` to ``ISINDEX`` and alter it to
   accommodate objects with the index slot defined.

3) Change the ``_PyEval_SliceIndex`` function to accommodate objects
   with the index slot defined.

4) Change all builtin objects (e.g. lists) that use the ``as_mapping``
   slots for subscript access and use a special check for integers to
   check for the slot as well.

5) Add the ``nb_index`` slot to integers and long integers
   (which just return themselves).

6) Add the ``PyNumber_Index`` C-API to return an integer from any
   Python object that has the ``nb_index`` slot.

7) Add the ``operator.index(x)`` function.

8) Alter ``arrayobject.c`` and ``mmapmodule.c`` to use the new C-API
   for their subscripting and other needs.

9) Add unit tests.

Discussion Questions
====================

Speed
-----

Implementation should not slow down Python because integers and long
integers used as indexes will complete in the same number of
instructions.  The only change will be that what used to generate
an error will now be acceptable.

Why not use ``nb_int`` which is already there?
----------------------------------------------

The ``nb_int`` method is used for coercion and so means something
fundamentally different than what is requested here.
This PEP proposes a way for something that *can* already be thought of
as an integer to communicate that information to Python when it needs
an integer.  The biggest example of why using ``nb_int`` would be a bad
thing is that float objects already define the ``nb_int`` method, but
float objects *should not* be used as indexes in a sequence.

Why the name ``__index__``?
---------------------------

Some questions were raised regarding the name ``__index__`` when other
interpretations of the slot are possible.  For example, the slot
can be used any time Python requires an integer internally (such
as in "mystring" \* 3).  The name was suggested by Guido because
slicing syntax is the biggest reason for having such a slot and
in the end no better name emerged. See the discussion thread [1]_
for examples of names that were suggested such as "``__discrete__``" and
"``__ordinal__``".

Why return ``PyObject *`` from ``nb_index``?
--------------------------------------------

Initially ``Py_ssize_t`` was selected as the return type for the
``nb_index`` slot.  However, this led to an inability to track and
distinguish overflow and underflow errors without ugly and brittle
hacks.  As the ``nb_index`` slot is used in at least 3 different ways
in the Python core (to get an integer, to get a slice end-point,
and to get a sequence index), there is quite a bit of flexibility
needed to handle all these cases.  Having the flexibility to handle
all the use cases is critical.  For example, the initial
implementation that returned ``Py_ssize_t`` for ``nb_index`` led to the
discovery that on a 32-bit machine with >=2GB of RAM
``s = 'x' * (2**100)`` works but ``len(s)`` was clipped at 2147483647.
Several fixes were suggested but eventually it was decided that
``nb_index`` needed to return a Python object, similar to the ``nb_int``
and ``nb_long`` slots, in order to handle overflow correctly.

Why can't ``__index__`` return any object with the ``nb_index`` method?
-----------------------------------------------------------------------

This would allow infinite recursion in many different ways that are not
easy to check for.  This restriction is similar to the requirement that
``__nonzero__`` return an int or a bool.

Reference Implementation
========================

Submitted as patch 1436368 to SourceForge.

References
==========

.. [1] Travis Oliphant, PEP for adding an sq_index slot so that any object,
   a or b, can be used in X[a:b] notation,
   http://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594

Copyright
=========

This document is placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: