PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Last-Modified: $Date$
Author: Moshe Zadka, Guido van Rossum
Status: Draft
Type: Standards Track
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 14-Aug-2001, 23-Aug-2001


Abstract

Python currently distinguishes between two kinds of integers
(ints): regular or short ints, limited by the size of a C long
(typically 32 or 64 bits), and long ints, which are limited only
by available memory. When operations on short ints yield results
that don't fit in a C long, they raise an error. There are some
other distinctions too. This PEP proposes to do away with most of
the differences in semantics, unifying the two types from the
perspective of the Python user.


Rationale

Many programs find a need to deal with larger numbers after the
fact, and changing the algorithms later is bothersome. The
alternative, performing all arithmetic using long ints whether or
not they are needed, can hinder performance in the normal case.

Having the machine word size exposed to the language hinders
portability. For example, Python source files and .pyc's are not
portable between 32-bit and 64-bit machines because of this.

There is also the general desire to hide unnecessary details from
the Python user when they are irrelevant for most applications.
An example is memory allocation, which is explicit in C but
automatic in Python, giving us the convenience of unlimited sizes
on strings, lists, etc. It makes sense to extend this convenience
to numbers.

It will give new Python programmers (whether they are new to
programming in general or not) one less thing to learn before they
can start using the language.


Implementation

Initially, two alternative implementations were proposed (one by
each author):

1. The PyInt type's slot for a C long will be turned into a:

       union {
           long i;
           struct {
               unsigned long length;
               digit digits[1];
           } bignum;
       };

   Only the n-1 lower bits of the long have any meaning; the top
   bit is always set. This distinguishes the union. All PyInt
   functions will check this bit before deciding which types of
   operations to use.

2. The existing short and long int types remain, but operations
   return a long int instead of raising OverflowError when a
   result cannot be represented as a short int. A new type,
   integer, may be introduced that is an abstract base type from
   which both the int and long implementation types are
   subclassed. This is useful so that programs can check
   integer-ness with a single test:

       if isinstance(i, integer): ...

After some consideration, the second implementation plan was
selected, since it is far easier to implement, is backwards
compatible at the C API level, and in addition can be implemented
partially as a transitional measure.
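
As an illustration only, the behavior chosen under the second plan
can be modeled in present-day Python, where int is already
unbounded. The names MAXINT, MININT, fits_short, add_old and add_new
are hypothetical, and the 32-bit bound is an assumption standing in
for sys.maxint. This is a sketch of the visible semantics, not the
actual C implementation.

```python
# Assumed bounds of a short int backed by a 32-bit C long.
MAXINT = 2**31 - 1
MININT = -MAXINT - 1


def fits_short(value):
    """Return True if value is representable as a simulated short int."""
    return MININT <= value <= MAXINT


def add_old(x, y):
    """Behavior before this PEP: short int addition raises on overflow."""
    result = x + y
    if not fits_short(result):
        raise OverflowError("integer addition")
    return result


def add_new(x, y):
    """Plan 2: never raise; report which implementation type holds the result."""
    result = x + y
    kind = "int" if fits_short(result) else "long"
    return kind, result
```

With these definitions add_new(MAXINT, 1) yields a value labeled
"long", where add_old(MAXINT, 1) raises OverflowError.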


Incompatibilities

The following operations have (usually subtly) different semantics
for short and for long integers, and one or the other will have to
be changed somehow. This is intended to be an exhaustive list.
If you know of any other operations that differ in outcome
depending on whether a short or a long int with the same value is
passed, please write the second author.

- Currently, all arithmetic operators on short ints except <<
  raise OverflowError if the result cannot be represented as a
  short int. This will be changed to return a long int instead.
  The following operators can currently raise OverflowError: x+y,
  x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four
  can only overflow when the value -sys.maxint-1 is involved.)

- Currently, x<<n can lose bits for short ints. This will be
  changed to return a long int containing all the shifted-out
  bits, if returning a short int would lose bits (where changing
  sign is considered a special case of losing bits).

- Currently, hex and oct literals for short ints may specify
  negative values; for example 0xffffffff == -1 on a 32-bit
  machine. This will be changed to equal 0xffffffffL (2**32-1).

- Currently, the '%u', '%x', '%X' and '%o' string formatting
  operators and the hex() and oct() built-in functions behave
  differently for negative numbers: negative short ints are
  formatted as unsigned C long, while negative long ints are
  formatted with a minus sign. This will be changed to use the
  long int semantics in all cases (but without the trailing 'L'
  that currently distinguishes the output of hex() and oct() for
  long ints). Note that this means that '%u' becomes an alias for
  '%d'. It will eventually be removed.

- Currently, repr() of a long int returns a string ending in 'L'
  while repr() of a short int doesn't. The 'L' will be dropped,
  but not before Python 3.0.

- Currently, an operation with long operands will never return a
  short int. This *may* change, since it allows some
  optimization. (No changes have been made in this area yet, and
  none are planned.)

- The expression type(x).__name__ depends on whether x is a short
  or a long int. Since implementation alternative 2 is chosen,
  this difference will remain. (In Python 3.0, we *may* be able
  to deploy a trick to hide the difference, because it *is*
  annoying to reveal the difference to user code, and more so as
  the difference between the two types is less visible.)

- Long and short ints are handled differently by the marshal module,
  and by the pickle and cPickle modules. This difference will
  remain (at least until Python 3.0).

- Short ints with small values (typically between -1 and 99
  inclusive) are "interned" -- whenever a result has such a value,
  an existing short int with the same value is returned. This is
  not done for long ints with the same values. This difference
  will remain. (Since there is no guarantee of this interning, it
  is debatable whether this is a semantic difference -- but code
  may exist that uses 'is' for comparisons of short ints and
  happens to work because of this interning. Such code may fail
  if used with long ints.)
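
The left-shift, hex-literal and formatting items above can be made
concrete with a small model written in present-day Python. The
helpers as_short, old_lshift and old_hex are hypothetical names, and
the 32-bit width is an assumption; the model only imitates how a
value wraps when it is stored in a C long.

```python
BITS = 32                         # assumed width of a C long
MASK = (1 << BITS) - 1            # 0xffffffff
MAXINT = (1 << (BITS - 1)) - 1    # stand-in for sys.maxint


def as_short(value):
    """Wrap value into the signed BITS-bit range, as a C long would."""
    value &= MASK
    return value - (1 << BITS) if value > MAXINT else value


def old_lshift(x, n):
    """Old short int x << n: bits shifted past the top are lost."""
    return as_short(x << n)


def old_hex(x):
    """Old hex() of a short int: negatives shown as an unsigned C long."""
    return hex(x & MASK)
```

Under this model as_short(0xffffffff) is -1 and old_lshift(1, 31)
changes sign, whereas the unified semantics keep 0xffffffff equal to
2**32-1 and let 1 << 31 grow into a long int.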


Literals

A trailing 'L' at the end of an integer literal will stop having
any meaning, and will eventually become illegal. The compiler
will choose the appropriate type solely based on the value.
(Until Python 3.0, a trailing 'L' will force the literal to be a
long; but literals without a trailing 'L' may also be long, if they
are not representable as short ints.)


Built-in Functions

The function int() will return a short or a long int depending on
the argument value. In Python 3.0, the function long() will call
the function int(); before then, it will continue to force the
result to be a long int, but otherwise work the same way as int().
The built-in name 'long' will remain in the language to represent
the long implementation type (unless it is completely eradicated
in Python 3.0), but using the int() function is still recommended,
since it will automatically return a long when needed.


C API

The C API remains unchanged; C code will still need to be aware of
the difference between short and long ints. (The Python 3.0 C API
will probably be completely incompatible.)

The PyArg_Parse*() APIs already accept long ints, as long as they
are within the range representable by C ints or longs, so that
functions taking C int or long arguments won't have to worry about
dealing with Python longs.


Transition

There are three major phases to the transition:

A. Short int operations that currently raise OverflowError return
   a long int value instead. This is the only change in this
   phase. Literals will still distinguish between short and long
   ints. The other semantic differences listed above (including
   the behavior of <<) will remain. Because this phase only
   changes situations that currently raise OverflowError, it is
   assumed that this won't break existing code. (Code that
   depends on this exception would have to be so convoluted that
   we need not be concerned about it.) For those concerned about
   extreme backwards compatibility, a command line option (or a
   call to the warnings module) will allow a warning or an error
   to be issued at this point, but this is off by default.

B. The remaining semantic differences are addressed. In all cases
   the long int semantics will prevail. Since this will introduce
   backwards incompatibilities which will break some old code,
   this phase may require a future statement and/or warnings, and
   a prolonged transition phase. The trailing 'L' will continue
   to be used for longs as input and by repr().

C. The trailing 'L' is dropped from repr(), and made illegal on
   input. (If possible, the 'long' type completely disappears.)
   The trailing 'L' is also dropped from hex() and oct().

Phase A will be implemented in Python 2.2.

Phase B will be implemented gradually in Python 2.3 and Python
2.4. Envisioned stages of phase B:

B0. Warnings are enabled about operations that will change their
    numeric outcome in stage B1, in particular hex() and oct(),
    '%u', '%x', '%X' and '%o', hex and oct literals in the
    (inclusive) range [sys.maxint+1, sys.maxint*2+1], and left
    shifts losing bits.

B1. The new semantics for these operations are implemented.
    Operations that give different results than before will *not*
    issue a warning.

We propose the following timeline:

B0. Python 2.3.

B1. Python 2.4.

Phase C will be implemented in Python 3.0 (at least two years
after Python 2.4 is released).


OverflowWarning

Here are the rules that guide warnings generated in situations
that currently raise OverflowError. This applies to transition
phase A. Historical note: although phase A was completed in
Python 2.2, and phase B0 in Python 2.3, nobody noticed that
OverflowWarning was still generated in Python 2.3. It was finally
disabled in Python 2.4. The Python builtin OverflowWarning, and
the corresponding C API PyExc_OverflowWarning, are no longer
generated or used in Python 2.4, but will remain for the (unlikely)
case of user code until Python 2.5.

- A new warning category is introduced, OverflowWarning. This is
  a built-in name.

- If an int result overflows, an OverflowWarning warning is
  issued, with a message argument indicating the operation,
  e.g. "integer addition". This may or may not cause a warning
  message to be displayed on sys.stderr, or may cause an exception
  to be raised, all under control of the -W command line and the
  warnings module.

- The OverflowWarning warning is ignored by default.

- The OverflowWarning warning can be controlled like all warnings,
  via the -W command line option or via the
  warnings.filterwarnings() call. For example:

      python -Wdefault::OverflowWarning

  causes the OverflowWarning to be displayed the first time it
  occurs at a particular source line, and

      python -Werror::OverflowWarning

  causes the OverflowWarning to be turned into an exception
  whenever it happens. The following code enables the warning
  from inside the program:

      import warnings
      warnings.filterwarnings("default", "", OverflowWarning)

  See the python man page for the -W option and the warnings
  module documentation for filterwarnings().

- If the OverflowWarning warning is turned into an error,
  OverflowError is substituted. This is needed for backwards
  compatibility.

- Unless the warning is turned into an exception, the result of
  the operation (e.g., x+y) is recomputed after converting the
  arguments to long ints.
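
The rules above can be sketched with the warnings module as it
exists today. Because the built-in OverflowWarning was later removed,
the sketch defines its own category; SketchOverflowWarning,
checked_add and the 32-bit MAXINT bound are all hypothetical names
and assumptions made for illustration, not the interpreter's actual
code.

```python
import warnings


class SketchOverflowWarning(Warning):
    """Hypothetical stand-in for the removed built-in OverflowWarning."""


MAXINT = 2**31 - 1   # assumed short int bound (32-bit C long)

# Ignored by default, mirroring the rule for OverflowWarning.
warnings.simplefilter("ignore", SketchOverflowWarning)


def checked_add(x, y):
    """Add two simulated short ints following the rules listed above."""
    result = x + y
    if -MAXINT - 1 <= result <= MAXINT:
        return result
    try:
        warnings.warn("integer addition", SketchOverflowWarning, stacklevel=2)
    except SketchOverflowWarning:
        # An "error" filter raised the warning itself; substitute
        # OverflowError for backwards compatibility.
        raise OverflowError("integer addition") from None
    # Warning ignored or merely displayed: keep the unbounded result.
    return result
```

Calling warnings.filterwarnings("error", "", SketchOverflowWarning)
before checked_add(MAXINT, 1) makes the call raise OverflowError;
with the default "ignore" filter it returns 2**31.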


Example

If you pass a long int to a C function or built-in operation that
takes an integer, it will be treated the same as a short int as
long as the value fits (by virtue of how PyArg_ParseTuple() is
implemented). If the long value doesn't fit, it will still raise
an OverflowError. For example:

    def fact(n):
        if n <= 1:
            return 1
        return n*fact(n-1)

    A = "ABCDEFGHIJKLMNOPQ"
    n = input("Gimme an int: ")
    print A[fact(n)%17]

For n >= 13 on a 32-bit machine, this currently raises
OverflowError (unless the user enters a trailing 'L' as part of
their input), even though the calculated index would always be in
range(17). With the new approach this code will do the right
thing: the index will be calculated as a long int, but its value
will be in range.
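
For comparison, the same computation can be written in present-day
Python, where int is already the unified type. This is only an
illustrative sketch; letter_for and MAXINT_32 are hypothetical names,
and MAXINT_32 assumes the value sys.maxint had on a 32-bit build.

```python
def fact(n):
    """Recursive factorial, as in the example above."""
    if n <= 1:
        return 1
    return n * fact(n - 1)


A = "ABCDEFGHIJKLMNOPQ"   # 17 letters, so valid indexes are 0..16
MAXINT_32 = 2**31 - 1     # assumed sys.maxint on a 32-bit build


def letter_for(n):
    """Index A with fact(n) % 17; the intermediate product may be huge."""
    return A[fact(n) % 17]
```

Here fact(12) still fits below MAXINT_32 while fact(13) does not,
which is why the original program started failing at n = 13 on
32-bit machines; letter_for(13) nevertheless returns a valid letter.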


Resolved Issues

These issues, previously open, have been resolved.

- hex() and oct() applied to longs will continue to produce a
  trailing 'L' until Python 3000. The original text above wasn't
  clear about this, but since it didn't happen in Python 2.4 it
  was thought better to leave it alone. BDFL pronouncement here:

  http://mail.python.org/pipermail/python-dev/2006-June/065918.html

- What to do about sys.maxint? Leave it in, since it is still
  relevant whenever the distinction between short and long ints is
  still relevant (e.g. when inspecting the type of a value).

- Should we remove '%u' completely? Remove it.

- Should we warn about << not truncating integers? Yes.

- Should the overflow warning be on a portable maximum size? No.


Implementation

The implementation work for the Python 2.x line is completed;
phase A was released with Python 2.2, phase B0 with Python 2.3,
and phase B1 will be released with Python 2.4 (and is already in
CVS).


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: