Completely revamped. Pick implementation plan 2. Propose to do the
easy bit in 2.2, the rest (introducing incompatibilities) in following releases.
parent
d4979cd62a
commit
3206040daf
269
pep-0237.txt
|
@ -11,66 +11,43 @@ Post-History: 16-Mar-2001
|
|||
|
||||
Abstract
|
||||
|
||||
Python has both integers (machine word size integral) types, and
|
||||
long integers (unbounded integral) types. When integer
|
||||
operations overflow the machine registers, they raise an error.
|
||||
This PEP proposes to do away with the distinction, and unify the
|
||||
types from the perspective of both the Python interpreter and the
|
||||
C API.
|
||||
|
||||
Note from second author: this PEP requires more thought about
|
||||
implementation details. I've started to make a list of semantic
|
||||
differences but I doubt it's complete.
|
||||
Python currently distinguishes between two kinds of integers
|
||||
(ints): regular or short ints, limited by the size of a C long
|
||||
(typically 32 or 64 bits), and long ints, which are limited only
|
||||
by available memory. When operations on short ints yield results
|
||||
that don't fit in a C long, they raise an error. There are some
|
||||
other distinctions too. This PEP proposes to do away with most of
|
||||
the differences in semantics, unifying the two types from the
|
||||
perspective of the Python user.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
Many programs find a need to deal with larger numbers after the
|
||||
fact, and changing the algorithms later is bothersome. It can
|
||||
hinder performance in the normal case, when all arithmetic is
|
||||
performed using long ints whether or not they are needed.
|
||||
|
||||
Having the machine word size exposed to the language hinders
|
||||
portability. For example, Python source files and .pyc's are not
|
||||
portable between 32-bit and 64-bit machines because of this. Many
|
||||
programs find a need to deal with larger numbers after the fact,
|
||||
and changing the algorithms later is not only bothersome, but
|
||||
hinders performance in the normal case.
|
||||
portable between 32-bit and 64-bit machines because of this.
|
||||
|
||||
There is also the general desire to hide unnecessary details from
|
||||
the Python user when they are irrelevant for most applications.
|
||||
(Another example is memory allocation, which is explicit in C but
|
||||
automatic in Python, giving us the convenience of unlimited sizes
|
||||
on strings, lists, etc.)
|
||||
An example is memory allocation, which is explicit in C but automatic
|
||||
in Python, giving us the convenience of unlimited sizes on
|
||||
strings, lists, etc. It makes sense to extend this convenience to
|
||||
numbers.
|
||||
|
||||
It will give new Python programmers (whether they are new to
|
||||
programming in general or not) one less thing to learn before they
|
||||
can start using the language.
|
||||
|
||||
|
||||
Transition
|
||||
|
||||
There are three phases of the transition:
|
||||
|
||||
1. Ints and longs are treated the same, no warnings are issued for
|
||||
code that uses longs. Warnings for the use of longs (either
|
||||
long literals, ending in 'L' or 'l', or use of the long()
|
||||
function) may be enabled through a command line option.
|
||||
|
||||
2. Longs are treated the same as ints but their use triggers a
|
||||
warning (which may be turned off or turned into an error using
|
||||
the -W command line option).
|
||||
|
||||
3. Long literals and (if we choose implementation plan 1 below)
|
||||
the long() built-in are no longer legal.
|
||||
|
||||
We propose the following timeline:
|
||||
|
||||
1. Python 2.2.
|
||||
|
||||
2. The rest of the Python 2.x line.
|
||||
|
||||
3. Python 3.0 (at least two years in the future).
|
||||
|
||||
|
||||
Implementation
|
||||
|
||||
There are two alternative implementations to choose from.
|
||||
Initially, two alternative implementations were proposed (one by
|
||||
each author):
|
||||
|
||||
1. The PyInt type's slot for a C long will be turned into a
|
||||
|
||||
|
@ -82,10 +59,10 @@ Implementation
|
|||
} bignum;
|
||||
};
|
||||
|
||||
Only the n-1 lower bits of the long have any meaning; the top bit
|
||||
is always set. This distinguishes the union. All PyInt functions
|
||||
will check this bit before deciding which types of operations to
|
||||
use.
|
||||
Only the n-1 lower bits of the long have any meaning; the top
|
||||
bit is always set. This distinguishes the union. All PyInt
|
||||
functions will check this bit before deciding which types of
|
||||
operations to use.
|
||||
|
||||
2. The existing short and long int types remain, but the short int
|
||||
returns a long int instead of raising OverflowError when a
|
||||
|
@ -97,98 +74,135 @@ Implementation
|
|||
|
||||
if isinstance(i, integer): ...
|
||||
|
||||
|
||||
Literals
|
||||
|
||||
A trailing 'L' at the end of an integer literal will stop having
|
||||
any meaning, and will eventually be phased out.
|
||||
After some consideration, the second implementation plan was
|
||||
selected, since it is far easier to implement, is backwards
|
||||
compatible at the C API level, and in addition can be implemented
|
||||
partially as a transitional measure.
|
||||
|
||||
|
||||
Built-in Functions
|
||||
|
||||
The function long() will call the function int(). If
|
||||
implementation plan 1 is chosen, it will eventually be phased out;
|
||||
with implementation plan 2, it remains in the language to
|
||||
represent the long implementation type -- but the int() function
|
||||
is still recommended, since it will automatically return a long
|
||||
when needed.
|
||||
|
||||
|
||||
C API
|
||||
|
||||
If implementation plan 1 is chosen, all PyLong_As* will call
|
||||
PyInt_As*. If PyInt_As* does not exist, it will be added.
|
||||
Similarly for PyLong_From*. A similar path of warnings as for the
|
||||
Python built-ins will be followed.
|
||||
|
||||
If implementation plan 2 is chosen, the C API remains unchanged.
|
||||
|
||||
(The PyArg_Parse*() APIs already accept long ints, as long as they
|
||||
are within the range representable by C ints or longs. This will
|
||||
remain unchanged.)
|
||||
|
||||
|
||||
Overflows
|
||||
|
||||
When an arithmetic operation on two numbers whose internal
|
||||
representation is as machine-level integers returns something
|
||||
whose internal representation is a bignum, a warning which is
|
||||
turned off by default will be issued. This is only a debugging
|
||||
aid, and has no guaranteed semantics.
|
||||
|
||||
A command line option may be used to enable these warnings (the
|
||||
regular warning framework supports warnings that are off by
|
||||
default, but this is too slow -- it makes a call to a
|
||||
complex piece of Python code).
|
||||
|
||||
This warning is not part of the transition plan; it will always be
|
||||
off by default, and the feature will probably disappear in Python
|
||||
3.0.
|
||||
|
||||
|
||||
Semantic Changes
|
||||
Incompatibilities
|
||||
|
||||
The following operations have (usually subtly) different semantics
|
||||
for short and for long integers, and one will have to change
|
||||
somehow. This is intended to be an exhaustive list; if you know
|
||||
of anything else that might change, please write the author.
|
||||
for short and for long integers, and one or the other will have to
|
||||
be changed somehow. This is intended to be an exhaustive list.
|
||||
If you know of any other operations that differ in outcome
|
||||
depending on whether a short or a long int with the same value is
|
||||
passed, please write the second author.
|
||||
|
||||
- Currently, all arithmetic operators on short ints except <<
|
||||
raise OverflowError if the result cannot be represented as a
|
||||
short int. This will change (of course).
|
||||
short int. This will be changed to return a long int instead.
|
||||
The following operators can currently raise OverflowError: x+y,
|
||||
x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four
|
||||
can only overflow when the value -sys.maxint-1 is involved.)
|
||||
|
||||
- Currently x<<n can lose bits for short ints. No more.
|
||||
- Currently, x<<n can lose bits for short ints. This will be
|
||||
changed to return a long int containing all the shifted-out
|
||||
bits, if returning a short int would lose bits.
|
||||
|
||||
- Currently, hex and oct literals for short ints may specify
|
||||
negative values; for example 0xffffffff == -1 on a 32-bit
|
||||
machine. No more; this will equal 0xffffffffL which is 2**32-1.
|
||||
machine. This will be changed to equal 0xffffffffL (2**32-1).
|
||||
|
||||
- Currently, the '%u', '%x' and '%o' string formatting operators
|
||||
and the hex() and oct() built-in functions behave differently
|
||||
for negative numbers: negative short ints are formatted as
|
||||
unsigned C long, while negative long ints are formatted with a
|
||||
minus sign. The long int semantics will rule (but without the
|
||||
trailing 'L' that currently distinguishes the output of hex()
|
||||
and oct() for long ints).
|
||||
minus sign. This will be changed to use the long int semantics
|
||||
in all cases (but without the trailing 'L' that currently
|
||||
distinguishes the output of hex() and oct() for long ints).
|
||||
Note that this means that '%u' becomes an alias for '%d'. It
|
||||
will eventually be removed.
|
||||
|
||||
- Currently, repr() of a long int returns a string ending in 'L'
|
||||
while repr() of a short int doesn't. The 'L' will be dropped.
|
||||
|
||||
- Currently, an operation with long operands will never return a
|
||||
short int. This may change (it allows an optimization). This
|
||||
is only relevant if implementation plan 2 is chosen.
|
||||
|
||||
- Currently, type(x) may reveal the difference between short and
|
||||
long ints. This will change if implementation plan 1 is chosen.
|
||||
short int. This *may* change, since it allows some
|
||||
optimization.
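
The incompatibilities listed above can be checked against the end state of the transition. The following is a minimal sketch, assuming a current Python 3 interpreter in which short and long ints are fully unified; `sys.maxsize` is used here as a stand-in for the `sys.maxint` of the PEP's era, and the specific values are illustrative only.

```python
import sys

# Arithmetic that leaves the machine-word range yields a larger
# integer instead of raising OverflowError.
big = sys.maxsize + 1
assert big > sys.maxsize
assert -(-sys.maxsize - 1) == sys.maxsize + 1

# x << n keeps all shifted-out bits.
assert 1 << 64 == 2 ** 64

# Hex literals never denote negative values.
assert 0xffffffff == 2 ** 32 - 1

# Negative numbers are formatted with a minus sign, not as an
# unsigned C long.
assert hex(-1) == "-0x1"
assert "%x" % -255 == "-ff"
assert "%u" % -1 == "-1"  # '%u' behaves like '%d'

# repr() carries no trailing 'L'.
assert repr(2 ** 64) == "18446744073709551616"
```

Each assertion corresponds to one bullet in the list; none of them depends on the word size of the machine running the interpreter.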
|
||||
|
||||
|
||||
Jython Issues
|
||||
Literals
|
||||
|
||||
Jython will have a PyInt interface which is implemented by both
|
||||
PyFixNum and PyBigNum.
|
||||
A trailing 'L' at the end of an integer literal will stop having
|
||||
any meaning, and will eventually become illegal. The compiler
|
||||
will choose the appropriate type solely based on the value.
|
||||
|
||||
(Question for the Jython developers -- do you foresee any other
|
||||
problems?)
|
||||
|
||||
Built-in Functions
|
||||
|
||||
The function int() will return a short or a long int depending on
|
||||
the argument value. The function long() will call the function
|
||||
int(). The built-in name 'long' will remain in the language to
|
||||
represent the long implementation type, but using the int()
|
||||
function is still recommended, since it will automatically return
|
||||
a long when needed.
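
As an illustration of int() choosing its result purely by value, here is a small sketch assuming a current Python 3 interpreter, where the transition has finished and only one integer type remains visible to the user.

```python
# int() accepts values of any magnitude; the caller does not pick
# between a short and a long conversion function.
small = int("42")
large = int("1" + "0" * 30)

assert small == 42
assert large == 10 ** 30

# Both results have the same user-visible type.
assert type(small) is type(large) is int
```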
|
||||
|
||||
|
||||
C API
|
||||
|
||||
The C API remains unchanged; C code will still need to be aware of
|
||||
the difference between short and long ints.
|
||||
|
||||
The PyArg_Parse*() APIs already accept long ints, as long as they
|
||||
are within the range representable by C ints or longs, so that
|
||||
functions taking C int or long arguments won't have to worry about
|
||||
dealing with Python longs.
|
||||
|
||||
|
||||
Transition
|
||||
|
||||
There are two major phases to the transition:
|
||||
|
||||
A. Short int operations that currently raise OverflowError return
|
||||
a long int value instead. This is the only change in this
|
||||
phase. Literals will still distinguish between short and long
|
||||
ints. The other semantic differences listed above (including
|
||||
the behavior of <<) will remain. Because this phase only
|
||||
changes situations that currently raise OverflowError, it is
|
||||
assumed that this won't break existing code. (Code that
|
||||
depends on this exception would have to be too convoluted to be
|
||||
concerned about it.) For those concerned about extreme
|
||||
backwards compatibility, a command line option will allow a
|
||||
warning to be issued at this point, but this is off by default.
|
||||
|
||||
B. The remaining semantic differences are addressed. In most
|
||||
cases the long int semantics will prevail; however, the
|
||||
trailing 'L' from long int representations will be dropped.
|
||||
Eventually, support for integer literals with a trailing 'L'
|
||||
will be removed. Since this will introduce backwards
|
||||
incompatibilities which will break some old code, this phase
|
||||
may require a future statement and/or warnings, and a
|
||||
prolonged transition phase.
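
The phase A rule can be modelled in a few lines. This is only a sketch of the idea, not the interpreter's implementation: the 64-bit word size, the names `checked_add` and `phase_a_add`, and the "short"/"long" tags are all assumptions made for illustration.

```python
# Assumed word size for the model; a real C long may be 32 or 64 bits.
WORD_BITS = 64
SHORT_MIN = -(1 << (WORD_BITS - 1))
SHORT_MAX = (1 << (WORD_BITS - 1)) - 1


def checked_add(x, y):
    """Model of the pre-phase-A rule: fail if the sum is out of range."""
    result = x + y
    if not SHORT_MIN <= result <= SHORT_MAX:
        raise OverflowError("integer addition")
    return result


def phase_a_add(x, y):
    """Model of phase A: fall back to an unbounded result on overflow."""
    try:
        return ("short", checked_add(x, y))
    except OverflowError:
        # Only the former error path changes; in-range results are
        # exactly what they were before.
        return ("long", x + y)


print(phase_a_add(1, 2))
print(phase_a_add(SHORT_MAX, 1))
```

Because only the path that used to raise OverflowError behaves differently, code that never triggered the exception sees no change, which is the argument given above for why this phase is expected to be safe.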
|
||||
|
||||
Phase A will be implemented in Python 2.2.
|
||||
|
||||
Phase B will be implemented starting with Python 2.3. Envisioned
|
||||
stages of phase B:
|
||||
|
||||
B1. The remaining semantic differences are addressed. Operations
|
||||
that give different results than before will issue a warning
|
||||
that is on by default. A warning for the use of long literals
|
||||
(with a trailing 'L') may be enabled through a command line
|
||||
option, but it is off by default.
|
||||
|
||||
B2. The warning for long literals is turned on by default.
|
||||
|
||||
B3. The warnings about operations that give different results than
|
||||
before are turned off by default.
|
||||
|
||||
B4. Long literals are no longer legal. All warnings related to
|
||||
this issue are gone.
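
The staged warnings in B1 through B4 rely on warnings that can be off, on, or turned into errors. The sketch below shows those three states using the standard `warnings` module. The category `LongLiteralWarning` and the function `use_long_literal` are hypothetical names for illustration; the PEP does not name a warning category.

```python
import warnings


class LongLiteralWarning(DeprecationWarning):
    """Hypothetical category for uses of a trailing 'L'."""


def use_long_literal():
    # Stands in for code that the compiler would flag.
    warnings.warn("long literal with trailing 'L'",
                  LongLiteralWarning, stacklevel=2)
    return 1


# Off: the warning is silently ignored.
with warnings.catch_warnings(record=True) as ignored:
    warnings.simplefilter("ignore", LongLiteralWarning)
    use_long_literal()

# On: the warning is reported (recorded here instead of printed).
with warnings.catch_warnings(record=True) as reported:
    warnings.simplefilter("always", LongLiteralWarning)
    use_long_literal()

# Error: the warning is raised as an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", LongLiteralWarning)
    try:
        use_long_literal()
        escalated = False
    except LongLiteralWarning:
        escalated = True
```

The same filter actions ("ignore", "always", "error") are what the -W command line option selects from outside the program.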
|
||||
|
||||
We propose the following timeline:
|
||||
|
||||
B1. Python 2.3.
|
||||
|
||||
B2. Python 2.4.
|
||||
|
||||
B3. The rest of the Python 2.x line.
|
||||
|
||||
B4. Python 3.0 (at least two years in the future).
|
||||
|
||||
|
||||
Open Issues
|
||||
|
@ -197,30 +211,15 @@ Open Issues
|
|||
feedback is received or we gather more experience with the initial
|
||||
implementation.
|
||||
|
||||
- Which implementation plan to choose? Moshe is for plan 1, Guido
|
||||
is for plan 2. Plan 2 seems less work. Plan 1 probably breaks
|
||||
more at the C API level, e.g. PyInt_AS_LONG below.
|
||||
- What to do about sys.maxint? Leave it in, since it is still
|
||||
relevant whenever the distinction between short and long ints is
|
||||
still relevant (e.g. when inspecting the type of a value).
|
||||
|
||||
- What to do about sys.maxint? (If implementation plan 1 is
|
||||
chosen, it should probably be phased out; for plan 2, it is
|
||||
still meaningful.)
|
||||
- Should we remove '%u' completely? Remove it.
|
||||
|
||||
- What to do about PyInt_AS_LONG failures? (Only relevant with
|
||||
implementation plan 1.)
|
||||
- Should we warn about << not truncating integers? Yes.
|
||||
|
||||
- What to do about %u, %o, %x formatting operators?
|
||||
|
||||
- Should we warn about << not cutting integers?
|
||||
|
||||
- Should the overflow warning be on a portable maximum size?
|
||||
|
||||
- Will unification of types and classes help with a more
|
||||
straightforward implementation? (Yes, it allows a common base
|
||||
class.)
|
||||
|
||||
- Define a C API that can be used to find out what the
|
||||
representation of an int is (only relevant for implementation
|
||||
plan 1).
|
||||
- Should the overflow warning be on a portable maximum size? No.
|
||||
|
||||
|
||||
Copyright
|
||||
|
|