PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Author: pep@zadka.site.co.il (Moshe Zadka), guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 14-Aug-2001, 23-Aug-2001
Abstract
Python currently distinguishes between two kinds of integers
(ints): regular or short ints, limited by the size of a C long
(typically 32 or 64 bits), and long ints, which are limited only
by available memory. When operations on short ints yield results
that don't fit in a C long, they raise an error. There are some
other distinctions too. This PEP proposes to do away with most of
the differences in semantics, unifying the two types from the
perspective of the Python user.
Rationale
Many programs find a need to deal with larger numbers after the
fact, and changing the algorithms later is bothersome. It can
hinder performance in the normal case, when all arithmetic is
performed using long ints whether or not they are needed.
Having the machine word size exposed to the language hinders
portability. For example, Python source files and .pyc's are not
portable between 32-bit and 64-bit machines because of this.
There is also the general desire to hide unnecessary details from
the Python user when they are irrelevant for most applications.
An example is memory allocation, which is explicit in C but
automatic in Python, giving us the convenience of unlimited sizes
on strings, lists, etc. It makes sense to extend this convenience
to numbers.
It will give new Python programmers (whether they are new to
programming in general or not) one less thing to learn before they
can start using the language.
Implementation
Initially, two alternative implementations were proposed (one by
each author):
1. The PyInt type's slot for a C long will be turned into a
        union {
            long i;
            struct {
                unsigned long length;
                digit digits[1];
            } bignum;
        };
Only the n-1 lower bits of the long have any meaning; the top
bit is always set. This distinguishes the union. All PyInt
functions will check this bit before deciding which types of
operations to use.
2. The existing short and long int types remain, but operations
return a long int instead of raising OverflowError when a
result cannot be represented as a short int. A new type,
integer, may be introduced that is an abstract base type of
which both the int and long implementation types are
subclassed. This is useful so that programs can check
integer-ness with a single test:
if isinstance(i, integer): ...
After some consideration, the second implementation plan was
selected, since it is far easier to implement, is backwards
compatible at the C API level, and in addition can be implemented
partially as a transitional measure.
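
The behavior chosen in alternative 2 can be sketched in a modern
Python, where ints are already unbounded, by simulating the range of a
short int. This is only an illustration under stated assumptions: a
32-bit C long is assumed, and the names SHORT_MIN, SHORT_MAX,
fits_short and add_with_promotion are hypothetical. The actual change
is made in the C implementation of the int type.

```python
# Illustrative simulation only; a 32-bit C long is assumed.
SHORT_MAX = 2**31 - 1
SHORT_MIN = -2**31


def fits_short(value):
    """Return True if value is representable as a simulated short int."""
    return SHORT_MIN <= value <= SHORT_MAX


def add_with_promotion(x, y):
    """Add two short ints and report which implementation type holds the result.

    Under the old rules the overflowing case raised OverflowError.
    Under alternative 2 the result is returned as a long int instead.
    """
    result = x + y
    kind = "int" if fits_short(result) else "long"
    return kind, result


print(add_with_promotion(1, 2))          # result still fits in a short int
print(add_with_promotion(SHORT_MAX, 1))  # result is promoted to a long int
```

The caller never sees an exception; only the implementation type of
the result differs, which is why the plan is backwards compatible for
code that did not rely on OverflowError.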
Incompatibilities
The following operations have (usually subtly) different semantics
for short and for long integers, and one or the other will have to
be changed somehow. This is intended to be an exhaustive list.
If you know of any other operations that differ in outcome
depending on whether a short or a long int with the same value is
passed, please write the second author.
- Currently, all arithmetic operators on short ints except <<
raise OverflowError if the result cannot be represented as a
short int. This will be changed to return a long int instead.
The following operators can currently raise OverflowError: x+y,
x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four
can only overflow when the value -sys.maxint-1 is involved.)
- Currently, x<<n can lose bits for short ints. This will be
changed to return a long int containing all the shifted-out
bits, if returning a short int would lose bits.
- Currently, hex and oct literals for short ints may specify
negative values; for example 0xffffffff == -1 on a 32-bit
machine. This will be changed to equal 0xffffffffL (2**32-1).
- Currently, the '%u', '%x' and '%o' string formatting operators
and the hex() and oct() built-in functions behave differently
for negative numbers: negative short ints are formatted as
unsigned C long, while negative long ints are formatted with a
minus sign. This will be changed to use the long int semantics
in all cases (but without the trailing 'L' that currently
distinguishes the output of hex() and oct() for long ints).
Note that this means that '%u' becomes an alias for '%d'. It
will eventually be removed.
- Currently, repr() of a long int returns a string ending in 'L'
while repr() of a short int doesn't. The 'L' will be dropped.
- Currently, an operation with long operands will never return a
short int. This *may* change, since it allows some
optimization.
- The expression type(x).__name__ depends on whether x is a short
or a long int. Since implementation alternative 2 is chosen,
this difference will remain.
- Long and short ints are handled differently by the marshal module,
and by the pickle and cPickle modules. This difference will
remain.
- Short ints with small values (typically between -1 and 99
inclusive) are "interned" -- whenever a result has such a value,
an existing short int with the same value is returned. This is
not done for long ints with the same values. This difference
will remain. (Since there is no guarantee of this interning, it
is debatable whether this is a semantic difference -- but code
may exist that uses 'is' for comparisons of short ints and
happens to work because of this interning. Such code may fail
if used with long ints.)
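
Three of the differences listed above (hex literals, x<<n, and hex()
of negative numbers) can be illustrated by simulating a fixed-width
short int in a Python where ints are already unbounded. This is a
sketch, not the interpreter's code: a 32-bit C long is assumed, and
the helpers as_signed, old_lshift, new_lshift and old_hex are
hypothetical names introduced for the illustration.

```python
BITS = 32  # assumed width of a C long on the simulated machine


def as_signed(value, bits=BITS):
    """Reinterpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def old_lshift(x, n, bits=BITS):
    """Old short int x<<n: bits shifted past the word size are lost."""
    return as_signed(x << n, bits)


def new_lshift(x, n):
    """Proposed x<<n: every shifted bit is kept; the result may be a long int."""
    return x << n


def old_hex(x, bits=BITS):
    """Old hex() of a short int: the value is formatted as an unsigned C long."""
    return hex(x & ((1 << bits) - 1))


print(as_signed(0xffffffff), 0xffffffff)     # old vs. proposed literal value
print(old_lshift(1, 32), new_lshift(1, 32))  # old vs. proposed shift result
print(old_hex(-1), hex(-1))                  # old vs. proposed formatting
```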
Literals
A trailing 'L' at the end of an integer literal will stop having
any meaning, and will eventually become illegal. The compiler
will choose the appropriate type solely based on the value.
Built-in Functions
The function int() will return a short or a long int depending on
the argument value. The function long() will call the function
int(). The built-in name 'long' will remain in the language to
represent the long implementation type, but using the int()
function is still recommended, since it will automatically return
a long when needed.
C API
The C API remains unchanged; C code will still need to be aware of
the difference between short and long ints.
The PyArg_Parse*() APIs already accept long ints, as long as they
are within the range representable by C ints or longs, so that
functions taking C int or long arguments won't have to worry about
dealing with Python longs.
Transition
There are two major phases to the transition:
A. Short int operations that currently raise OverflowError return
a long int value instead. This is the only change in this
phase. Literals will still distinguish between short and long
ints. The other semantic differences listed above (including
the behavior of <<) will remain. Because this phase only
changes situations that currently raise OverflowError, it is
assumed that this won't break existing code. (Code that
depends on this exception would have to be too convoluted to be
concerned about it.) For those concerned about extreme
backwards compatibility, a command line option (or a call to
the warnings module) will allow a warning or an error to be
issued at this point, but this is off by default.
B. The remaining semantic differences are addressed. In most
cases the long int semantics will prevail; however, the
trailing 'L' from long int representations will be dropped.
Eventually, support for integer literals with a trailing 'L'
will be removed. Since this will introduce backwards
incompatibilities which will break some old code, this phase
may require a future statement and/or warnings, and a
prolonged transition phase.
Phase A will be implemented in Python 2.2.
Phase B will be implemented starting with Python 2.3. Envisioned
stages of phase B:
B1. The remaining semantic differences are addressed. Operations
that give different results than before will issue a warning
that is on by default. A warning for the use of long literals
(with a trailing 'L') may be enabled through a command line
option, but it is off by default.
B2. The warning for long literals is turned on by default.
B3. The warnings about operations that give different results than
before are turned off by default.
B4. Long literals are no longer legal. All warnings related to
this issue are gone.
We propose the following timeline:
B1. Python 2.3.
B2. Python 2.4.
B3. The rest of the Python 2.x line.
B4. Python 3.0 (at least two years in the future).
OverflowWarning
Here are the rules that guide warnings generated in situations
that currently raise OverflowError. This applies to transition
phase A.
- A new warning category is introduced, OverflowWarning. This is
a built-in name.
- If an int result overflows, an OverflowWarning warning is
issued, with a message argument indicating the operation,
e.g. "integer addition". This may or may not cause a warning
message to be displayed on sys.stderr, or may cause an exception
to be raised, all under control of the -W command line and the
warnings module.
- The OverflowWarning warning is ignored by default.
- The OverflowWarning warning can be controlled like all warnings,
via the -W command line option or via the
warnings.filterwarnings() call. For example:
    python -Wdefault::OverflowWarning
causes the OverflowWarning to be displayed the first time it
occurs at a particular source line, and
    python -Werror::OverflowWarning
causes the OverflowWarning to be turned into an exception
whenever it happens. The following code enables the warning
from inside the program:
    import warnings
    warnings.filterwarnings("default", "", OverflowWarning)
See the python man page for the -W option and the warnings
module documentation for filterwarnings().
- If the OverflowWarning warning is turned into an error,
OverflowError is substituted. This is needed for backwards
compatibility.
- Unless the warning is turned into an exception, the result of
the operation (e.g., x+y) is recomputed after converting the
arguments to long ints.
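
The rules above can be sketched with the standard warnings module.
This is a simulation under stated assumptions, not the interpreter's
implementation: a 32-bit C long is assumed, OverflowWarning is defined
locally as a stand-in for the proposed built-in category, and
checked_add is a hypothetical helper that plays the role of short int
addition.

```python
import warnings

SHORT_MAX = 2**31 - 1  # assumed 32-bit C long
SHORT_MIN = -2**31


class OverflowWarning(Warning):
    """Local stand-in for the built-in category proposed by this PEP."""


def checked_add(x, y):
    """Simulate short int addition under the phase A rules."""
    result = x + y  # the value as computed with long ints
    if not (SHORT_MIN <= result <= SHORT_MAX):
        try:
            warnings.warn("integer addition", OverflowWarning, stacklevel=2)
        except OverflowWarning as exc:
            # A warning turned into an error is reported as OverflowError.
            raise OverflowError(str(exc)) from None
    return result


# With the warning ignored (the proposed default), the long result is returned.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", OverflowWarning)
    ignored_result = checked_add(SHORT_MAX, 1)
print(ignored_result)

# Counterpart of running with: python -Werror::OverflowWarning
with warnings.catch_warnings():
    warnings.simplefilter("error", OverflowWarning)
    try:
        checked_add(SHORT_MAX, 1)
        error_message = None
    except OverflowError as exc:
        error_message = str(exc)
print(error_message)
```

Substituting OverflowError in the "error" case means that code which
already catches OverflowError keeps working when the warning is
escalated.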
Example
If you pass a long int to a C function or built-in operation that
takes an integer, it will be treated the same as a short int as
long as the value fits (by virtue of how PyArg_ParseTuple() is
implemented). If the long value doesn't fit, it will still raise
an OverflowError. For example:
    def fact(n):
        if n <= 1:
            return 1
        return n*fact(n-1)

    A = "ABCDEFGHIJKLMNOPQ"
    n = input("Gimme an int: ")
    print A[fact(n)%17]
For n >= 13, this currently raises OverflowError (unless the user
enters a trailing 'L' as part of their input), even though the
calculated index would always be in range(17). With the new
approach this code will do the right thing: the index will be
calculated as a long int, but its value will be in range.
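
For comparison, here is the same computation as it behaves once the
unification is complete, written for a present-day Python 3
interpreter. The fixed value n = 13 replaces the interactive prompt
and is chosen only for illustration; a 32-bit C long is assumed when
comparing against 2**31 - 1.

```python
def fact(n):
    """Return n! computed recursively, as in the example above."""
    if n <= 1:
        return 1
    return n * fact(n - 1)


A = "ABCDEFGHIJKLMNOPQ"  # 17 letters, so valid indexes are range(17)

n = 13  # smallest n for which n! exceeds 2**31 - 1
index = fact(n) % 17

print(fact(n) > 2**31 - 1)  # the intermediate value no longer fits a 32-bit long
print(index in range(17))   # the index itself is still in range
print(A[index])
```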
Resolved Issues
These issues, previously open, have been resolved.
- What to do about sys.maxint? Leave it in, since it is still
relevant whenever the distinction between short and long ints is
still relevant (e.g. when inspecting the type of a value).
- Should we remove '%u' completely? Remove it.
- Should we warn about << not truncating integers? Yes.
- Should the overflow warning be on a portable maximum size? No.
Implementation
A complete implementation of phase A is present in the current CVS
tree and will be released with Python 2.2a3. (It didn't make it
into 2.2a2.) Still missing are documentation and a test suite.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: