Add PEP 353.
This commit is contained in:
parent
a6c22be775
commit
bb269df4c0
|
@ -107,6 +107,7 @@ Index by Category
|
||||||
I 350 Codetags Elliott
|
I 350 Codetags Elliott
|
||||||
S 351 The freeze protocol Warsaw
|
S 351 The freeze protocol Warsaw
|
||||||
S 352 Required Superclass for Exceptions GvR, Cannon
|
S 352 Required Superclass for Exceptions GvR, Cannon
|
||||||
|
S 353 Using ssize_t as the index type von Loewis
|
||||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||||
|
|
||||||
Finished PEPs (done, implemented in Subversion)
|
Finished PEPs (done, implemented in Subversion)
|
||||||
|
@ -400,6 +401,7 @@ Numerical Index
|
||||||
I 350 Codetags Elliott
|
I 350 Codetags Elliott
|
||||||
S 351 The freeze protocol Warsaw
|
S 351 The freeze protocol Warsaw
|
||||||
S 352 Required Superclass for Exceptions GvR, Cannon
|
S 352 Required Superclass for Exceptions GvR, Cannon
|
||||||
|
S 353 Using ssize_t as the index type von Loewis
|
||||||
SR 666 Reject Foolish Indentation Creighton
|
SR 666 Reject Foolish Indentation Creighton
|
||||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||||
I 3000 Python 3.0 Plans Kuchling, Cannon
|
I 3000 Python 3.0 Plans Kuchling, Cannon
|
||||||
|
|
|
@ -0,0 +1,246 @@
|
||||||
|
PEP: 353
|
||||||
|
Title: Using ssize_t as the index type
|
||||||
|
Version: $Revision$
|
||||||
|
Last-Modified: $Date$
|
||||||
|
Author: Martin v. Löwis <martin@v.loewis.de>
|
||||||
|
Status: Draft
|
||||||
|
Type: Standards Track
|
||||||
|
Content-Type: text/x-rst
|
||||||
|
Created: 18-Dec-2005
|
||||||
|
Post-History:
|
||||||
|
|
||||||
|
|
||||||
|
Abstract
|
||||||
|
========
|
||||||
|
|
||||||
|
In Python 2.4, indices of sequences are restricted to the C type
|
||||||
|
int. On 64-bit machines, sequences therefore cannot use the full
|
||||||
|
address space, and are restricted to 2**31 elements. This PEP proposes
|
||||||
|
to change this, introducing a platform-specific index type
|
||||||
|
Py_ssize_t. An implementation of the proposed change is in
|
||||||
|
http://svn.python.org/projects/python/branches/ssize_t.
|
||||||
|
|
||||||
|
|
||||||
|
Rationale
|
||||||
|
=========
|
||||||
|
|
||||||
|
64-bit machines are becoming more popular, and the size of main memory
|
||||||
|
increases beyond 4GiB. On such machines, Python currently is limited,
|
||||||
|
in that sequences (strings, unicode objects, tuples, lists,
|
||||||
|
array.arrays, ...) cannot contain more than 2GiElements.
|
||||||
|
|
||||||
|
Today, very few machines have memory to represent larger lists: as
|
||||||
|
each pointer is 8B (in a 64-bit machine), one needs 16GiB to just hold
|
||||||
|
the pointers of such a list; with data in the list, the memory
|
||||||
|
consumption grows even more. However, there are three container types
|
||||||
|
for which users request improvements today:
|
||||||
|
|
||||||
|
* strings (currently restricted to 2GiB)
|
||||||
|
* mmap objects (likewise; plus the system typically
|
||||||
|
won't keep the whole object in memory concurrently)
|
||||||
|
* Numarray objects (from Numerical Python)
|
||||||
|
|
||||||
|
As the proposed change will cause incompatibilities on 64-bit
|
||||||
|
machines, it should be carried out while such machines are not in wide
|
||||||
|
use (IOW, as early as possible).
|
||||||
|
|
||||||
|
|
||||||
|
Specification
|
||||||
|
=============
|
||||||
|
|
||||||
|
A new type Py_ssize_t is introduced, which has the same size as the
|
||||||
|
compiler's size_t type, but is signed. It will be a typedef for
|
||||||
|
ssize_t where available.
|
||||||
|
|
||||||
|
The internal representation of the length fields of all container
|
||||||
|
types is changed from int to ssize_t, for all types included in the
|
||||||
|
standard distribution. In particular, PyObject_VAR_HEAD is changed to
|
||||||
|
use Py_ssize_t, affecting all extension modules that use that macro.
|
||||||
|
|
||||||
|
All occurrences of index and length parameters and results are changed
|
||||||
|
to use Py_ssize_t, including the sequence slots in type objects.
|
||||||
|
|
||||||
|
New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t, are
|
||||||
|
introduced. PyInt_FromSsize_t will transparently return a long int
|
||||||
|
object if the value exceeds the LONG_MAX; PyInt_AsSsize_t will
|
||||||
|
transparently process long int objects.
|
||||||
|
|
||||||
|
New function pointer typedefs ssizeargfunc, ssizessizeargfunc,
|
||||||
|
ssizeobjargproc, and ssizessizeobjargproc are introduced.
|
||||||
|
|
||||||
|
A new conversion code 'n' is introduced for PyArg_ParseTuple
|
||||||
|
and Py_BuildValue, which operates on Py_ssize_t.
|
||||||
|
|
||||||
|
The conversion codes 's#' and 't#' will output Py_ssize_t
|
||||||
|
if the macro PY_SIZE_T_CLEAN is defined before Python.h
|
||||||
|
is included, and continue to output int if that macro
|
||||||
|
isn't defined.
|
||||||
|
|
||||||
|
At places where a conversion from size_t/Py_ssize_t to
|
||||||
|
int is necessary, the strategy for conversion is chosen
|
||||||
|
on a case-by-case basis (see next section).
|
||||||
|
|
||||||
|
|
||||||
|
Conversion guidelines
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Module authors have the choice whether they support this PEP in their
|
||||||
|
code or not; if they support it, they have the choice of different
|
||||||
|
levels of compatibility.
|
||||||
|
|
||||||
|
If a module is not converted to support this PEP, it will continue to
|
||||||
|
work unmodified on a 32-bit system. On a 64-bit system, compile-time
|
||||||
|
errors and warnings might be issued, and the module might crash the
|
||||||
|
interpreter if the warnings are ignored.
|
||||||
|
|
||||||
|
Conversion of a module can either attempt to continue using int
|
||||||
|
indices, or use Py_ssize_t indices throughout.
|
||||||
|
|
||||||
|
If the module should continue to use int indices, care must be taken
|
||||||
|
when calling functions that return Py_ssize_t or size_t, in
|
||||||
|
particular, for functions that return the length of an object (this
|
||||||
|
includes the strlen function and the sizeof operator). A good compiler
|
||||||
|
will warn when a Py_ssize_t/size_t value is truncated into an int.
|
||||||
|
In these cases, three strategies are available:
|
||||||
|
|
||||||
|
* statically determine that the size can never exceed an int
|
||||||
|
(e.g. when taking the sizeof a struct, or the strlen of
|
||||||
|
a file pathname). In this case, write::
|
||||||
|
|
||||||
|
some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);
|
||||||
|
|
||||||
|
This will add an assertion in debug mode that the value
|
||||||
|
really fits into an int, and just add a cast otherwise.
|
||||||
|
|
||||||
|
* statically determine that the value shouldn't overflow an
|
||||||
|
int unless there is a bug in the C code somewhere. Test
|
||||||
|
whether the value is smaller than INT_MAX, and raise an
|
||||||
|
InternalError if it isn't.
|
||||||
|
* otherwise, check whether the value fits an int, and raise
|
||||||
|
a ValueError if it doesn't.
|
||||||
|
|
||||||
|
The same care must be taking for tp_as_sequence slots, in
|
||||||
|
addition, the signatures of these slots change, and the
|
||||||
|
slots must be explicitly recast (e.g. from intargfunc
|
||||||
|
to ssizeargfunc). Compatibility with previous Python
|
||||||
|
versions can be achieved with the test::
|
||||||
|
|
||||||
|
#if PY_VERSION_HEX < 0x02050000
|
||||||
|
typedef int Py_ssize_t;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
and then using Py_ssize_t in the rest of the code. For
|
||||||
|
the tp_as_sequence slots, additional typedefs might
|
||||||
|
be necessary; alternatively, by replacing::
|
||||||
|
|
||||||
|
PyObject* foo_item(struct MyType* obj, int index)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
with::
|
||||||
|
|
||||||
|
PyObject* foo_item(PyObject* _obj, Py_ssize_t index)
|
||||||
|
{
|
||||||
|
struct MyType* obj = (struct MyType*)_obj;
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
it becomes possible to drop the cast entirely; the type
|
||||||
|
of foo_item should then match the sq_item slot in all
|
||||||
|
Python versions.
|
||||||
|
|
||||||
|
If the module should be extended to use Py_ssize_t indices, all usages
|
||||||
|
of the type int should be reviewed, to see whether it should be
|
||||||
|
changed to Py_ssize_t. The compiler will help in finding the spots,
|
||||||
|
but a manual review is still necessary.
|
||||||
|
|
||||||
|
Particular care must be taken for PyArg_ParseTuple calls:
|
||||||
|
they need all be checked for s# and t# converters, and
|
||||||
|
PY_SIZE_T_CLEAN must be defined before including Python.h
|
||||||
|
if the calls have been updated accordingly.
|
||||||
|
|
||||||
|
|
||||||
|
Discussion
|
||||||
|
==========
|
||||||
|
|
||||||
|
Why not size_t
|
||||||
|
--------------
|
||||||
|
|
||||||
|
An initial attempt to implement this feature tried to use
|
||||||
|
size_t. It quickly turned out that this cannot work: Python
|
||||||
|
uses negative indices in many places (to indicate counting
|
||||||
|
from the end). Even in places where size_t would be usable,
|
||||||
|
to many reformulations of code where necessary, e.g. in
|
||||||
|
loops like::
|
||||||
|
|
||||||
|
for(index = length-1; index >= 0; index--)
|
||||||
|
|
||||||
|
This loop will never terminate if index is changed from
|
||||||
|
int to size_t.
|
||||||
|
|
||||||
|
Why not Py_intptr_t
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
Conceptually, Py_intptr_t and Py_ssize_t are different things:
|
||||||
|
Py_intptr_t needs to be the same size as void*, and Py_ssize_t
|
||||||
|
the same size as size_t. These could differ, e.g. on machines
|
||||||
|
where pointers have segment and offset. On current flat-address
|
||||||
|
space machines, there is no difference, so for all practical
|
||||||
|
purposes, Py_intptr_t would have worked as well.
|
||||||
|
|
||||||
|
Doesn't this break much code?
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
With the changes proposed, code breakage is fairly
|
||||||
|
minimal. On a 32-bit system, no code will break, as
|
||||||
|
Py_ssize_t is just a typedef for int.
|
||||||
|
|
||||||
|
On a 64-bit system, the compiler will warn in many
|
||||||
|
places. If these warnings are ignored, the code will
|
||||||
|
continue to work as long as the container sizes don't
|
||||||
|
exceeed 2**31, i.e. it will work nearly as good as
|
||||||
|
it does currently. There are two exceptions to this
|
||||||
|
statement: if the extension module implements the
|
||||||
|
sequence protocol, it must be updated, or the calling
|
||||||
|
conventions will be wrong. The other exception is
|
||||||
|
the places where Py_ssize_t is output through a
|
||||||
|
pointer (rather than a return value); this applies
|
||||||
|
most notably to codecs and slice objects.
|
||||||
|
|
||||||
|
If the conversion of the code is made, the same code
|
||||||
|
can continue to work on earlier Python releases.
|
||||||
|
|
||||||
|
Doesn't this consume too much memory?
|
||||||
|
-------------------------------------
|
||||||
|
|
||||||
|
One might think that using Py_ssize_t in all tuples,
|
||||||
|
strings, lists, etc. is a waste of space. This is
|
||||||
|
not true, though: on a 32-bit machine, there is no
|
||||||
|
change. On a 64-bit machine, the size of many
|
||||||
|
containers doesn't change, e.g.
|
||||||
|
|
||||||
|
* in lists and tuples, a pointer immediately follows
|
||||||
|
the ob_size member. This means that the compiler
|
||||||
|
currently inserts a 4 padding bytes; with the
|
||||||
|
change, these padding bytes become part of the size.
|
||||||
|
* in strings, the ob_shash field follows ob_size.
|
||||||
|
This field is of type long, which is a 64-bit
|
||||||
|
type on most 64-bit systems (except Win64), so
|
||||||
|
the compiler inserts padding before it as well.
|
||||||
|
|
||||||
|
|
||||||
|
Copyright
|
||||||
|
=========
|
||||||
|
|
||||||
|
This document has been placed in the public domain.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
..
|
||||||
|
Local Variables:
|
||||||
|
mode: indented-text
|
||||||
|
indent-tabs-mode: nil
|
||||||
|
sentence-end-double-space: t
|
||||||
|
fill-column: 70
|
||||||
|
End:
|
Loading…
Reference in New Issue