Completely revamped. Pick implementation plan 2. Propose to do the
easy bit in 2.2, the rest (introducing incompatibilities) in following
releases.
Guido van Rossum 2001-08-14 18:12:48 +00:00
parent d4979cd62a
commit 3206040daf
1 changed files with 134 additions and 135 deletions


@ -11,66 +11,43 @@ Post-History: 16-Mar-2001
Abstract
Python has both integers (machine word size integral) types, and
long integers (unbounded integral) types. When integer
operations overflow the machine registers, they raise an error.
This PEP proposes to do away with the distinction, and unify the
types from the perspective of both the Python interpreter and the
C API.
Note from second author: this PEP requires more thought about
implementation details. I've started to make a list of semantic
differences but I doubt it's complete.
Python currently distinguishes between two kinds of integers
(ints): regular or short ints, limited by the size of a C long
(typically 32 or 64 bits), and long ints, which are limited only
by available memory. When operations on short ints yield results
that don't fit in a C long, they raise an error. There are some
other distinctions too. This PEP proposes to do away with most of
the differences in semantics, unifying the two types from the
perspective of the Python user.
Rationale
Many programs find a need to deal with larger numbers after the
fact, and changing the algorithms later is bothersome. It can
hinder performance in the normal case, when all arithmetic is
performed using long ints whether or not they are needed.
Having the machine word size exposed to the language hinders
portability. For example, Python source files and .pyc's are not
portable between 32-bit and 64-bit machines because of this. Many
programs find a need to deal with larger numbers after the fact,
and changing the algorithms later is not only bothersome, but
hinders performance in the normal case.
portable between 32-bit and 64-bit machines because of this.
There is also the general desire to hide unnecessary details from
the Python user when they are irrelevant for most applications.
(Another example is memory allocation, which is explicit in C but
automatic in Python, giving us the convenience of unlimited sizes
on strings, lists, etc.)
An example is memory allocation, which is explicit in C but automatic
in Python, giving us the convenience of unlimited sizes on
strings, lists, etc. It makes sense to extend this convenience to
numbers.
It will give new Python programmers (whether they are new to
programming in general or not) one less thing to learn before they
can start using the language.
Transition
There are three phases of the transition:
1. Ints and longs are treated the same, no warnings are issued for
code that uses longs. Warnings for the use of longs (either
long literals, ending in 'L' or 'l', or use of the long()
function) may be enabled through a command line option.
2. Longs are treated the same as ints but their use triggers a
warning (which may be turned off or turned into an error using
the -W command line option).
3. Long literals and (if we choose implementation plan 1 below)
the long() built-in are no longer legal.
We propose the following timeline:
1. Python 2.2.
2. The rest of the Python 2.x line.
3. Python 3.0 (at least two years in the future).
Implementation
There are two alternative implementations to choose from.
Initially, two alternative implementations were proposed (one by
each author):
1. The PyInt type's slot for a C long will be turned into a
@ -82,10 +59,10 @@ Implementation
} bignum;
};
Only the n-1 lower bits of the long have any meaning; the top bit
is always set. This distinguishes the union. All PyInt functions
will check this bit before deciding which types of operations to
use.
Only the n-1 lower bits of the long have any meaning; the top
bit is always set. This distinguishes the union. All PyInt
functions will check this bit before deciding which types of
operations to use.
2. The existing short and long int types remain, but the short int
returns a long int instead of raising OverflowError when a
@ -97,98 +74,135 @@ Implementation
if isinstance(i, integer): ...
Literals
A trailing 'L' at the end of an integer literal will stop having
any meaning, and will be eventually phased out.
After some consideration, the second implementation plan was
selected, since it is far easier to implement, is backwards
compatible at the C API level, and in addition can be implemented
partially as a transitional measure.
Built-in Functions
The function long() will call the function int(). If
implementation plan 1 is chosen, it will eventually be phased out;
with implementation plan 2, it remains in the language to
represent the long implementation type -- but the int() function
is still recommended, since it will automatically return a long
when needed.
C API
If implementation plan 1 is chosen, all PyLong_As* will call
PyInt_As*. If PyInt_As* does not exist, it will be added.
Similarly for PyLong_From*. A similar path of warnings as for the
Python built-ins will be followed.
If implementation plan 2 is chosen, the C API remains unchanged.
(The PyArg_Parse*() APIs already accept long ints, as long as they
are within the range representable by C ints or longs. This will
remain unchanged.)
Overflows
When an arithmetic operation on two numbers whose internal
representation is as machine-level integers returns something
whose internal representation is a bignum, a warning which is
turned off by default will be issued. This is only a debugging
aid, and has no guaranteed semantics.
A command line option may be used to enable these warnings (the
regular warning framework supports warnings that are off by
default, but this would be too slow -- it makes a call to a
complex piece of Python code).
This warning is not part of the transition plan; it will always be
off by default, and the feature will probably disappear in Python
3.0.
Semantic Changes
Incompatibilities
The following operations have (usually subtly) different semantics
for short and for long integers, and one will have to change
somehow. This is intended to be an exhaustive list; if you know
of anything else that might change, please write the author.
for short and for long integers, and one or the other will have to
be changed somehow. This is intended to be an exhaustive list.
If you know of any other operations that differ in outcome
depending on whether a short or a long int with the same value is
passed, please write the second author.
- Currently, all arithmetic operators on short ints except <<
raise OverflowError if the result cannot be represented as a
short int. This will change (of course).
short int. This will be changed to return a long int instead.
The following operators can currently raise OverflowError: x+y,
x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four
can only overflow when the value -sys.maxint-1 is involved.)
- Currently x<<n can lose bits for short ints. No more.
- Currently, x<<n can lose bits for short ints. This will be
changed to return a long int containing all the shifted-out
bits, if returning a short int would lose bits.
- Currently, hex and oct literals for short ints may specify
negative values; for example 0xffffffff == -1 on a 32-bit
machine. No more; this will equal 0xffffffffL which is 2**32-1.
machine. This will be changed to equal 0xffffffffL (2**32-1).
- Currently, the '%u', '%x' and '%o' string formatting operators
and the hex() and oct() built-in functions behave differently
for negative numbers: negative short ints are formatted as
unsigned C long, while negative long ints are formatted with a
minus sign. The long int semantics will rule (but without the
trailing 'L' that currently distinguishes the output of hex()
and oct() for long ints).
minus sign. This will be changed to use the long int semantics
in all cases (but without the trailing 'L' that currently
distinguishes the output of hex() and oct() for long ints).
Note that this means that '%u' becomes an alias for '%d'. It
will eventually be removed.
- Currently, repr() of a long int returns a string ending in 'L'
while repr() of a short int doesn't. The 'L' will be dropped.
- Currently, an operation with long operands will never return a
short int. This may change (it allows an optimization). This
is only relevant if implementation plan 2 is chosen.
- Currently, type(x) may reveal the difference between short and
long ints. This will change if implementation plan 1 is chosen.
short int. This *may* change, since it allows some
optimization.
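The differences listed above can be sketched in modern Python, where
only the unified semantics exist. This is a rough model, not the
interpreter's actual code: it assumes a 32-bit C long, and BITS,
MAXINT, fits_short, wrap_signed and the old_*/new_* helpers are
hypothetical names introduced only for illustration.

```python
# Illustrative model only: a 32-bit C long is assumed, and every name
# defined here is hypothetical (none is part of any Python API).
BITS = 32
MAXINT = 2 ** (BITS - 1) - 1   # stands in for sys.maxint
MININT = -MAXINT - 1


def fits_short(value):
    """True if value is representable as a BITS-wide signed C long."""
    return MININT <= value <= MAXINT


def wrap_signed(value):
    """Keep the low BITS bits and read them as a two's-complement number."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value > MAXINT else value


def old_add(x, y):
    """Short int addition before the change: overflow is an error."""
    result = x + y
    if not fits_short(result):
        raise OverflowError("integer addition")
    return result


def new_add(x, y):
    """Addition after the change: the exact result is returned."""
    return x + y


def old_lshift(x, n):
    """Short int shift before the change: shifted-out bits are lost."""
    return wrap_signed(x << n)


def new_lshift(x, n):
    """Shift after the change: no bits are lost."""
    return x << n


def old_hex(x):
    """hex() of a short int before the change: negatives print as unsigned."""
    return hex(x & ((1 << BITS) - 1))


def new_hex(x):
    """hex() after the change: negatives print with a minus sign."""
    return hex(x)
```

Under these assumptions, old_add(MAXINT, 1) raises OverflowError while
new_add(MAXINT, 1) returns 2**31, and wrap_signed(0xffffffff) models
the old reading of the literal 0xffffffff as -1.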
Jython Issues
Literals
Jython will have a PyInt interface which is implemented by both
PyFixNum and PyBigNum.
A trailing 'L' at the end of an integer literal will stop having
any meaning, and will eventually become illegal. The compiler
will choose the appropriate type solely based on the value.
(Question for the Jython developers -- do you foresee any other
problems?)
Built-in Functions
The function int() will return a short or a long int depending on
the argument value. The function long() will call the function
int(). The built-in name 'long' will remain in the language to
represent the long implementation type, but using the int()
function is still recommended, since it will automatically return
a long when needed.
C API
The C API remains unchanged; C code will still need to be aware of
the difference between short and long ints.
The PyArg_Parse*() APIs already accept long ints, as long as they
are within the range representable by C ints or longs, so that
functions taking C int or long arguments won't have to worry about
dealing with Python longs.
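The range restriction can be observed from Python itself. The sketch
below is only an analogy to the PyArg_Parse*() behavior, not a
description of it: it uses the standard struct module, whose native
'l' format packs a value into a C long and rejects values that do not
fit. The helper name fits_c_long is hypothetical.

```python
import struct


def fits_c_long(value):
    """Return True if value can be stored in this platform's C long.

    Hypothetical helper: struct.pack() with the native 'l' format
    raises struct.error when the value is out of range.
    """
    try:
        struct.pack("l", value)
    except struct.error:
        return False
    return True


# Width of a C long on this platform, in bits (typically 32 or 64).
LONG_BITS = 8 * struct.calcsize("l")
```

A value such as 2**(LONG_BITS - 1) - 1 fits, while 2**(LONG_BITS - 1)
does not, regardless of whether Python represents it as a short or a
long int.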
Transition
There are two major phases to the transition:
A. Short int operations that currently raise OverflowError return
a long int value instead. This is the only change in this
phase. Literals will still distinguish between short and long
ints. The other semantic differences listed above (including
the behavior of <<) will remain. Because this phase only
changes situations that currently raise OverflowError, it is
assumed that this won't break existing code. (Code that
depends on this exception would have to be too convoluted to be
concerned about it.) For those concerned about extreme
backwards compatibility, a command line option will allow a
warning to be issued at this point, but this is off by default.
B. The remaining semantic differences are addressed. In most
cases the long int semantics will prevail; however, the
trailing 'L' from long int representations will be dropped.
Eventually, support for integer literals with a trailing 'L'
will be removed. Since this will introduce backwards
incompatibilities which will break some old code, this phase
may require a future statement and/or warnings, and a
prolonged transition phase.
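The idea of a warning that is off by default but can be switched on
can be sketched with the standard warnings module. This is a minimal
illustration under stated assumptions, not the proposed
implementation: the OverflowWarning class, add_with_warning and
count_warnings are hypothetical names defined here, and a 32-bit C
long is assumed.

```python
import warnings

MAXINT = 2 ** 31 - 1   # assumed 32-bit C long
MININT = -MAXINT - 1


class OverflowWarning(Warning):
    """Hypothetical category: a short int result had to become a long int."""


def add_with_warning(x, y):
    """Return x + y exactly, warning if short operands overflow."""
    result = x + y
    operands_short = MININT <= x <= MAXINT and MININT <= y <= MAXINT
    if operands_short and not MININT <= result <= MAXINT:
        warnings.warn("integer addition overflowed into a long int",
                      OverflowWarning, stacklevel=2)
    return result


def count_warnings(action):
    """Run one overflowing addition under the given filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, OverflowWarning)
        add_with_warning(MAXINT, 1)
    return len(caught)
```

With the "ignore" action no warning is recorded; with "always" one is.
The filter used here is the in-program counterpart of the -W command
line option.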
Phase A will be implemented in Python 2.2.
Phase B will be implemented starting with Python 2.3. Envisioned
stages of phase B:
B1. The remaining semantic differences are addressed. Operations
that give different results than before will issue a warning
that is on by default. A warning for the use of long literals
(with a trailing 'L') may be enabled through a command line
option, but it is off by default.
B2. The warning for long literals is turned on by default.
B3. The warnings about operations that give different results than
before are turned off by default.
B4. Long literals are no longer legal. All warnings related to
this issue are gone.
We propose the following timeline:
B1. Python 2.3.
B2. Python 2.4.
B3. The rest of the Python 2.x line.
B4. Python 3.0 (at least two years in the future).
Open Issues
@ -197,30 +211,15 @@ Open Issues
feedback is received or we gather more experience with the initial
implementation.
- Which implementation plan to choose? Moshe is for plan 1, Guido
is for plan 2. Plan 2 seems less work. Plan 1 probably breaks
more at the C API level, e.g. PyInt_AS_LONG below.
- What to do about sys.maxint? Leave it in, since it is still
relevant whenever the distinction between short and long ints is
still relevant (e.g. when inspecting the type of a value).
- What to do about sys.maxint? (If implementation plan 1 is
chosen, it should probably be phased out; for plan 2, it is
still meaningful.)
- Should we remove '%u' completely? Remove it.
- What to do about PyInt_AS_LONG failures? (Only relevant with
implementation plan 1.)
- Should we warn about << not truncating integers? Yes.
- What to do about %u, %o, %x formatting operators?
- Should we warn about << not cutting integers?
- Should the overflow warning be on a portable maximum size?
- Will unification of types and classes help with a more
straightforward implementation? (Yes, it allows a common base
class.)
- Define a C API that can be used to find out what the
representation of an int is (only relevant for implementation
plan 1).
- Should the overflow warning be on a portable maximum size? No.
Copyright