Add PEP 353.
parent
a6c22be775
commit
bb269df4c0
|
@ -107,6 +107,7 @@ Index by Category
|
|||
I 350 Codetags Elliott
|
||||
S 351 The freeze protocol Warsaw
|
||||
S 352 Required Superclass for Exceptions GvR, Cannon
|
||||
S 353 Using ssize_t as the index type von Loewis
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
|
||||
Finished PEPs (done, implemented in Subversion)
|
||||
|
@ -400,6 +401,7 @@ Numerical Index
|
|||
I 350 Codetags Elliott
|
||||
S 351 The freeze protocol Warsaw
|
||||
S 352 Required Superclass for Exceptions GvR, Cannon
|
||||
S 353 Using ssize_t as the index type von Loewis
|
||||
SR 666 Reject Foolish Indentation Creighton
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
I 3000 Python 3.0 Plans Kuchling, Cannon
|
||||
|
|
|
@ -0,0 +1,246 @@
|
|||
PEP: 353
|
||||
Title: Using ssize_t as the index type
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Martin v. Löwis <martin@v.loewis.de>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 18-Dec-2005
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
In Python 2.4, indices of sequences are restricted to the C type
|
||||
int. On 64-bit machines, sequences therefore cannot use the full
|
||||
address space, and are restricted to 2**31 elements. This PEP proposes
|
||||
to change this, introducing a platform-specific index type
|
||||
Py_ssize_t. An implementation of the proposed change is in
|
||||
http://svn.python.org/projects/python/branches/ssize_t.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
64-bit machines are becoming more popular, and the size of main memory
|
||||
increases beyond 4GiB. On such machines, Python currently is limited,
|
||||
in that sequences (strings, unicode objects, tuples, lists,
|
||||
array.arrays, ...) cannot contain more than 2GiElements.
|
||||
|
||||
Today, very few machines have memory to represent larger lists: as
|
||||
each pointer is 8B (in a 64-bit machine), one needs 16GiB to just hold
|
||||
the pointers of such a list; with data in the list, the memory
|
||||
consumption grows even more. However, there are three container types
|
||||
for which users request improvements today:
|
||||
|
||||
* strings (currently restricted to 2GiB)
|
||||
* mmap objects (likewise; plus the system typically
|
||||
won't keep the whole object in memory concurrently)
|
||||
* Numarray objects (from Numerical Python)
|
||||
|
||||
As the proposed change will cause incompatibilities on 64-bit
|
||||
machines, it should be carried out while such machines are not in wide
|
||||
use (IOW, as early as possible).
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
A new type Py_ssize_t is introduced, which has the same size as the
|
||||
compiler's size_t type, but is signed. It will be a typedef for
|
||||
ssize_t where available.
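A minimal sketch of how such a type could be declared; it is not the definition used by the implementation. The names demo_ssize_t, demo_width_check and demo_ssize_is_signed are hypothetical, and the sketch assumes a platform where intptr_t is as wide as size_t (true on flat-address-space machines):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_ssize_t: a signed integer type that is
   assumed to have the same width as size_t on this platform. */
typedef intptr_t demo_ssize_t;

/* Compile-time check of the assumption above: the array size becomes
   -1 (a compile error) if the two widths differ. */
typedef char demo_width_check[
    (sizeof(demo_ssize_t) == sizeof(size_t)) ? 1 : -1];

/* Returns 1 if the type is signed, i.e. -1 converted to it is below 0. */
int demo_ssize_is_signed(void)
{
    return (demo_ssize_t)-1 < 0;
}
```

Where the platform provides ssize_t, that type would be the natural choice instead of intptr_t.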
|
||||
|
||||
The internal representation of the length fields of all container
|
||||
types is changed from int to Py_ssize_t, for all types included in the
|
||||
standard distribution. In particular, PyObject_VAR_HEAD is changed to
|
||||
use Py_ssize_t, affecting all extension modules that use that macro.
|
||||
|
||||
All occurrences of index and length parameters and results are changed
|
||||
to use Py_ssize_t, including the sequence slots in type objects.
|
||||
|
||||
New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t are
|
||||
introduced. PyInt_FromSsize_t will transparently return a long int
|
||||
object if the value exceeds LONG_MAX; PyInt_AsSsize_t will
|
||||
transparently process long int objects.
|
||||
|
||||
New function pointer typedefs ssizeargfunc, ssizessizeargfunc,
|
||||
ssizeobjargproc, and ssizessizeobjargproc are introduced.
|
||||
|
||||
A new conversion code 'n' is introduced for PyArg_ParseTuple
|
||||
and Py_BuildValue, which operates on Py_ssize_t.
|
||||
|
||||
The conversion codes 's#' and 't#' will output Py_ssize_t
|
||||
if the macro PY_SIZE_T_CLEAN is defined before Python.h
|
||||
is included, and continue to output int if that macro
|
||||
isn't defined.
|
||||
|
||||
At places where a conversion from size_t/Py_ssize_t to
|
||||
int is necessary, the strategy for conversion is chosen
|
||||
on a case-by-case basis (see next section).
|
||||
|
||||
|
||||
Conversion guidelines
|
||||
=====================
|
||||
|
||||
Module authors have the choice whether they support this PEP in their
|
||||
code or not; if they support it, they have the choice of different
|
||||
levels of compatibility.
|
||||
|
||||
If a module is not converted to support this PEP, it will continue to
|
||||
work unmodified on a 32-bit system. On a 64-bit system, compile-time
|
||||
errors and warnings might be issued, and the module might crash the
|
||||
interpreter if the warnings are ignored.
|
||||
|
||||
Conversion of a module can either attempt to continue using int
|
||||
indices, or use Py_ssize_t indices throughout.
|
||||
|
||||
If the module should continue to use int indices, care must be taken
|
||||
when calling functions that return Py_ssize_t or size_t, in
|
||||
particular, for functions that return the length of an object (this
|
||||
includes the strlen function and the sizeof operator). A good compiler
|
||||
will warn when a Py_ssize_t/size_t value is truncated into an int.
|
||||
In these cases, three strategies are available:
|
||||
|
||||
* statically determine that the size can never exceed an int
|
||||
(e.g. when taking the sizeof a struct, or the strlen of
|
||||
a file pathname). In this case, write::
|
||||
|
||||
some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);
|
||||
|
||||
This will add an assertion in debug mode that the value
|
||||
really fits into an int, and just add a cast otherwise.
|
||||
|
||||
* statically determine that the value shouldn't overflow an
|
||||
int unless there is a bug in the C code somewhere. Test
|
||||
whether the value is smaller than INT_MAX, and raise an
|
||||
InternalError if it isn't.
|
||||
* otherwise, check whether the value fits an int, and raise
|
||||
a ValueError if it doesn't.
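The three strategies above can be sketched in plain C roughly as follows. This is an illustration only: DEMO_SAFE_DOWNCAST and demo_size_to_int are hypothetical names, ptrdiff_t stands in for Py_ssize_t, and the macro is merely modelled on the behaviour described for Py_SAFE_DOWNCAST (assert in debug builds, plain cast otherwise):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Strategy 1: hypothetical macro modelled on the Py_SAFE_DOWNCAST idea.
   With assertions enabled it checks that the value survives the round
   trip through the narrow type; with NDEBUG it is only a cast. */
#define DEMO_SAFE_DOWNCAST(value, wide, narrow) \
    (assert((wide)(narrow)(value) == (value)), (narrow)(value))

/* Strategies 2 and 3: explicit runtime check.  Returns 0 and stores the
   narrowed value on success, or -1 if the value does not fit in an int.
   A real extension module would set a Python exception before
   returning the error. */
int demo_size_to_int(ptrdiff_t value, int *result)
{
    if (value > INT_MAX || value < INT_MIN) {
        return -1;
    }
    *result = (int)value;
    return 0;
}
```

The second and third strategies differ only in which exception the caller raises when the check fails, so a single helper can serve both.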
|
||||
|
||||
The same care must be taken for tp_as_sequence slots; in
|
||||
addition, the signatures of these slots change, and the
|
||||
slots must be explicitly recast (e.g. from intargfunc
|
||||
to ssizeargfunc). Compatibility with previous Python
|
||||
versions can be achieved with the test::
|
||||
|
||||
#if PY_VERSION_HEX < 0x02050000
|
||||
typedef int Py_ssize_t;
|
||||
#endif
|
||||
|
||||
and then using Py_ssize_t in the rest of the code. For
|
||||
the tp_as_sequence slots, additional typedefs might
|
||||
be necessary; alternatively, by replacing::
|
||||
|
||||
PyObject* foo_item(struct MyType* obj, int index)
|
||||
{
|
||||
...
|
||||
}
|
||||
|
||||
with::
|
||||
|
||||
PyObject* foo_item(PyObject* _obj, Py_ssize_t index)
|
||||
{
|
||||
struct MyType* obj = (struct MyType*)_obj;
|
||||
...
|
||||
}
|
||||
|
||||
it becomes possible to drop the cast entirely; the type
|
||||
of foo_item should then match the sq_item slot in all
|
||||
Python versions.
|
||||
|
||||
If the module should be extended to use Py_ssize_t indices, all usages
|
||||
of the type int should be reviewed, to see whether it should be
|
||||
changed to Py_ssize_t. The compiler will help in finding the spots,
|
||||
but a manual review is still necessary.
|
||||
|
||||
Particular care must be taken for PyArg_ParseTuple calls:
|
||||
they all need to be checked for s# and t# converters, and
|
||||
PY_SIZE_T_CLEAN must be defined before including Python.h
|
||||
if the calls have been updated accordingly.
|
||||
|
||||
|
||||
Discussion
|
||||
==========
|
||||
|
||||
Why not size_t
|
||||
--------------
|
||||
|
||||
An initial attempt to implement this feature tried to use
|
||||
size_t. It quickly turned out that this cannot work: Python
|
||||
uses negative indices in many places (to indicate counting
|
||||
from the end). Even in places where size_t would be usable,
|
||||
too many reformulations of code were necessary, e.g. in
|
||||
loops like::
|
||||
|
||||
for(index = length-1; index >= 0; index--)
|
||||
|
||||
This loop will never terminate if index is changed from
|
||||
int to size_t.
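The difference can be checked with a small, self-contained sketch. The function names are hypothetical, and ptrdiff_t stands in for the signed index type:

```c
#include <stddef.h>
#include <stdint.h>

/* Counts iterations of the loop from the text with a signed index
   (ptrdiff_t stands in for Py_ssize_t here).  The loop ends once
   index reaches -1. */
int demo_count_down_signed(ptrdiff_t length)
{
    ptrdiff_t index;
    int iterations = 0;
    for (index = length - 1; index >= 0; index--) {
        iterations++;
    }
    return iterations;
}

/* With an unsigned index, decrementing 0 wraps around to SIZE_MAX, so
   the condition "index >= 0" can never become false.  Returns 1 if the
   wrap-around happens as described. */
int demo_unsigned_wraps(void)
{
    size_t index = 0;
    index--;
    return index == SIZE_MAX;
}
```

Unsigned wrap-around is well defined in C, which is exactly why the compiler cannot flag the size_t version of the loop as an error.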
|
||||
|
||||
Why not Py_intptr_t
|
||||
-------------------
|
||||
|
||||
Conceptually, Py_intptr_t and Py_ssize_t are different things:
|
||||
Py_intptr_t needs to be the same size as void*, and Py_ssize_t
|
||||
the same size as size_t. These could differ, e.g. on machines
|
||||
where pointers have segment and offset. On current flat-address
|
||||
space machines, there is no difference, so for all practical
|
||||
purposes, Py_intptr_t would have worked as well.
|
||||
|
||||
Doesn't this break much code?
|
||||
-----------------------------
|
||||
|
||||
With the changes proposed, code breakage is fairly
|
||||
minimal. On a 32-bit system, no code will break, as
|
||||
Py_ssize_t is just a typedef for int.
|
||||
|
||||
On a 64-bit system, the compiler will warn in many
|
||||
places. If these warnings are ignored, the code will
|
||||
continue to work as long as the container sizes don't
|
||||
exceed 2**31, i.e. it will work nearly as well as
|
||||
it does currently. There are two exceptions to this
|
||||
statement: if the extension module implements the
|
||||
sequence protocol, it must be updated, or the calling
|
||||
conventions will be wrong. The other exception is
|
||||
the places where Py_ssize_t is output through a
|
||||
pointer (rather than a return value); this applies
|
||||
most notably to codecs and slice objects.
|
||||
|
||||
If the conversion of the code is made, the same code
|
||||
can continue to work on earlier Python releases.
|
||||
|
||||
Doesn't this consume too much memory?
|
||||
-------------------------------------
|
||||
|
||||
One might think that using Py_ssize_t in all tuples,
|
||||
strings, lists, etc. is a waste of space. This is
|
||||
not true, though: on a 32-bit machine, there is no
|
||||
change. On a 64-bit machine, the size of many
|
||||
containers doesn't change, e.g.
|
||||
|
||||
* in lists and tuples, a pointer immediately follows
|
||||
the ob_size member. This means that the compiler
|
||||
currently inserts 4 padding bytes; with the
|
||||
change, these padding bytes become part of the size.
|
||||
* in strings, the ob_shash field follows ob_size.
|
||||
This field is of type long, which is a 64-bit
|
||||
type on most 64-bit systems (except Win64), so
|
||||
the compiler inserts padding before it as well.
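The padding argument can be illustrated with two hypothetical struct layouts. They are loosely modelled on a list-like object and are not the actual object headers; ptrdiff_t stands in for Py_ssize_t, and the field names are assumptions made for this sketch:

```c
#include <stddef.h>

/* Hypothetical object layouts, loosely modelled on a list-like object:
   a header, a length field, then a pointer to the items. */
struct demo_list_int {
    long refcount;
    void *type;
    int size;        /* on LP64 the compiler pads 4 bytes after this */
    void **items;
};

struct demo_list_ssize {
    long refcount;
    void *type;
    ptrdiff_t size;  /* stand-in for Py_ssize_t; fills the former padding */
    void **items;
};

/* Returns 1 if widening the length field left the struct size unchanged. */
int demo_same_size(void)
{
    return sizeof(struct demo_list_int) == sizeof(struct demo_list_ssize);
}
```

On common ABIs with 8-byte pointer alignment, the pointer that follows the length field forces the same total size in both layouts.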
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|