Add PEP 353.

2006-01-10 21:06:23 +00:00 · 2006-01-10 21:06:23 +00:00 · bb269df4c0
parent a6c22be775
commit bb269df4c0
2 changed files with 248 additions and 0 deletions
--- a/pep-0000.txt
+++ b/pep-0000.txt
@ -107,6 +107,7 @@ Index by Category
 I   350  Codetags                                     Elliott
 S   351  The freeze protocol                          Warsaw
 S   352  Required Superclass for Exceptions           GvR, Cannon
+ S   353  Using ssize_t as the index type              von Loewis
 S   754  IEEE 754 Floating Point Special Values       Warnes

 Finished PEPs (done, implemented in Subversion)
@ -400,6 +401,7 @@ Numerical Index
 I   350  Codetags                                     Elliott
 S   351  The freeze protocol                          Warsaw
 S   352  Required Superclass for Exceptions           GvR, Cannon
+ S   353  Using ssize_t as the index type              von Loewis
 SR  666  Reject Foolish Indentation                   Creighton
 S   754  IEEE 754 Floating Point Special Values       Warnes
 I  3000  Python 3.0 Plans                             Kuchling, Cannon
--- a/pep-0353.txt
+++ b/pep-0353.txt
@ -0,0 +1,246 @@
+PEP: 353
+Title: Using ssize_t as the index type
+Version: $Revision$
+Last-Modified: $Date$
+Author: Martin v. Löwis <martin@v.loewis.de>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 18-Dec-2005
+Post-History:
+
+
+Abstract
+========
+
+In Python 2.4, indices of sequences are restricted to the C type
+int. On 64-bit machines, sequences therefore cannot use the full
+address space, and are restricted to 2**31 elements. This PEP proposes
+to change this, introducing a platform-specific index type
+Py_ssize_t. An implementation of the proposed change is in
+http://svn.python.org/projects/python/branches/ssize_t.
+
+
+Rationale
+=========
+
+64-bit machines are becoming more popular, and the size of main memory
+increases beyond 4GiB. On such machines, Python currently is limited,
+in that sequences (strings, unicode objects, tuples, lists,
+array.arrays, ...)  cannot contain more than 2GiElements.
+
+Today, very few machines have memory to represent larger lists: as
+each pointer is 8B (in a 64-bit machine), one needs 16GiB to just hold
+the pointers of such a list; with data in the list, the memory
+consumption grows even more.  However, there are three container types
+for which users request improvements today:
+
+* strings (currently restricted to 2GiB)
+* mmap objects (likewise; plus the system typically
+  won't keep the whole object in memory concurrently)
+* Numarray objects (from Numerical Python)
+
+As the proposed change will cause incompatibilities on 64-bit
+machines, it should be carried out while such machines are not in wide
+use (IOW, as early as possible).
+
+
+Specification
+=============
+
+A new type Py_ssize_t is introduced, which has the same size as the
+compiler's size_t type, but is signed. It will be a typedef for
+ssize_t where available.
+
+The internal representation of the length fields of all container
+types is changed from int to ssize_t, for all types included in the
+standard distribution.  In particular, PyObject_VAR_HEAD is changed to
+use Py_ssize_t, affecting all extension modules that use that macro.
+
+All occurrences of index and length parameters and results are changed
+to use Py_ssize_t, including the sequence slots in type objects.
+
+New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t, are
+introduced. PyInt_FromSsize_t will transparently return a long int
+object if the value exceeds the LONG_MAX; PyInt_AsSsize_t will
+transparently process long int objects.
+
+New function pointer typedefs ssizeargfunc, ssizessizeargfunc,
+ssizeobjargproc, and ssizessizeobjargproc are introduced.
+
+A new conversion code 'n' is introduced for PyArg_ParseTuple
+and Py_BuildValue, which operates on Py_ssize_t.
+
+The conversion codes 's#' and 't#' will output Py_ssize_t
+if the macro PY_SIZE_T_CLEAN is defined before Python.h
+is included, and continue to output int if that macro
+isn't defined.
+
+At places where a conversion from size_t/Py_ssize_t to
+int is necessary, the strategy for conversion is chosen
+on a case-by-case basis (see next section).
+
+
+Conversion guidelines
+=====================
+
+Module authors have the choice whether they support this PEP in their
+code or not; if they support it, they have the choice of different
+levels of compatibility.
+
+If a module is not converted to support this PEP, it will continue to
+work unmodified on a 32-bit system.  On a 64-bit system, compile-time
+errors and warnings might be issued, and the module might crash the
+interpreter if the warnings are ignored.
+
+Conversion of a module can either attempt to continue using int
+indices, or use Py_ssize_t indices throughout.
+
+If the module should continue to use int indices, care must be taken
+when calling functions that return Py_ssize_t or size_t, in
+particular, for functions that return the length of an object (this
+includes the strlen function and the sizeof operator). A good compiler
+will warn when a Py_ssize_t/size_t value is truncated into an int.
+In these cases, three strategies are available:
+
+*  statically determine that the size can never exceed an int
+   (e.g. when taking the sizeof a struct, or the strlen of
+   a file pathname). In this case, write::
+
+     some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);
+
+   This will add an assertion in debug mode that the value
+   really fits into an int, and just add a cast otherwise.
+
+*  statically determine that the value shouldn't overflow an
+   int unless there is a bug in the C code somewhere. Test
+   whether the value is smaller than INT_MAX, and raise an
+   InternalError if it isn't.
+*  otherwise, check whether the value fits an int, and raise
+   a ValueError if it doesn't.
+
+The same care must be taking for tp_as_sequence slots, in
+addition, the signatures of these slots change, and the
+slots must be explicitly recast (e.g. from intargfunc
+to ssizeargfunc). Compatibility with previous Python
+versions can be achieved with the test::
+
+ #if PY_VERSION_HEX < 0x02050000
+ typedef int Py_ssize_t;
+ #endif
+
+and then using Py_ssize_t in the rest of the code. For
+the tp_as_sequence slots, additional typedefs might
+be necessary; alternatively, by replacing::
+
+ PyObject* foo_item(struct MyType* obj, int index)
+ {
+   ...
+ }
+
+with::
+
+ PyObject* foo_item(PyObject* _obj, Py_ssize_t index)
+ {
+    struct MyType* obj = (struct MyType*)_obj;
+    ...
+ }
+
+it becomes possible to drop the cast entirely; the type
+of foo_item should then match the sq_item slot in all
+Python versions.
+
+If the module should be extended to use Py_ssize_t indices, all usages
+of the type int should be reviewed, to see whether it should be
+changed to Py_ssize_t. The compiler will help in finding the spots,
+but a manual review is still necessary. 
+
+Particular care must be taken for PyArg_ParseTuple calls:
+they need all be checked for s# and t# converters, and
+PY_SIZE_T_CLEAN must be defined before including Python.h
+if the calls have been updated accordingly.
+
+
+Discussion
+==========
+
+Why not size_t
+--------------
+
+An initial attempt to implement this feature tried to use
+size_t. It quickly turned out that this cannot work: Python
+uses negative indices in many places (to indicate counting
+from the end). Even in places where size_t would be usable,
+to many reformulations of code where necessary, e.g. in
+loops like::
+
+  for(index = length-1; index >= 0; index--)
+
+This loop will never terminate if index is changed from
+int to size_t.
+
+Why not Py_intptr_t
+-------------------
+
+Conceptually, Py_intptr_t and Py_ssize_t are different things:
+Py_intptr_t needs to be the same size as void*, and Py_ssize_t
+the same size as size_t. These could differ, e.g. on machines
+where pointers have segment and offset. On current flat-address
+space machines, there is no difference, so for all practical
+purposes, Py_intptr_t would have worked as well.
+
+Doesn't this break much code?
+-----------------------------
+
+With the changes proposed, code breakage is fairly
+minimal. On a 32-bit system, no code will break, as
+Py_ssize_t is just a typedef for int.
+
+On a 64-bit system, the compiler will warn in many
+places. If these warnings are ignored, the code will
+continue to work as long as the container sizes don't
+exceeed 2**31, i.e. it will work nearly as good as
+it does currently. There are two exceptions to this
+statement: if the extension module implements the
+sequence protocol, it must be updated, or the calling
+conventions will be wrong. The other exception is
+the places where Py_ssize_t is output through a 
+pointer (rather than a return value); this applies
+most notably to codecs and slice objects.
+
+If the conversion of the code is made, the same code
+can continue to work on earlier Python releases.
+
+Doesn't this consume too much memory?
+-------------------------------------
+
+One might think that using Py_ssize_t in all tuples,
+strings, lists, etc. is a waste of space. This is
+not true, though: on a 32-bit machine, there is no
+change. On a 64-bit machine, the size of many
+containers doesn't change, e.g.
+
+* in lists and tuples, a pointer immediately follows
+  the ob_size member. This means that the compiler
+  currently inserts a 4 padding bytes; with the 
+  change, these padding bytes become part of the size.
+* in strings, the ob_shash field follows ob_size.
+  This field is of type long, which is a 64-bit
+  type on most 64-bit systems (except Win64), so
+  the compiler inserts padding before it as well.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   End: