2001-03-15 23:11:01 -05:00
PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Author: pep@zadka.site.co.il (Moshe Zadka), guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001


Abstract

Python has both integer (machine word size integral) types and
long integer (unbounded integral) types. When integer
operations overflow the machine registers, they raise an error.
This PEP proposes to do away with the distinction, and unify the
types from the perspective of both the Python interpreter and the
C API.

Note from second author: this PEP requires more thought about
implementation details. I've started to make a list of semantic
differences but I doubt it's complete.


Rationale

Having the machine word size exposed to the language hinders
portability. For example, Python source files and .pyc's are not
portable between 32-bit and 64-bit machines because of this. Many
programs find a need to deal with larger numbers after the fact,
and changing the algorithms later is not only bothersome, but
hinders performance in the normal case.

There is also the general desire to hide unnecessary details from
the Python user when they are irrelevant for most applications.
(Another example is memory allocation, which is explicit in C but
automatic in Python, giving us the convenience of unlimited sizes
on strings, lists, etc.)

Unification will also give new Python programmers (whether they are
new to programming in general or not) one less thing to learn before
they can start using the language.


Transition

There are three phases of the transition:

1. Ints and longs are treated the same; no warnings are issued for
   code that uses longs. Warnings for the use of longs (either
   long literals, ending in 'L' or 'l', or use of the long()
   function) may be enabled through a command line option.

2. Longs are treated the same as ints but their use triggers a
   warning (which may be turned off or turned into an error using
   the -W command line option).

3. Long literals and (if we choose implementation plan 1 below)
   the long() built-in are no longer legal.
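
As an illustration of phase 2, the following sketch uses the
standard warnings module to show a warning being turned off and
being turned into an error, which is what the -W option controls
from the command line. The names LongUseWarning, long_ and
call_with_filter are hypothetical and chosen only for this example;
the PEP does not specify a warning category.

```python
import warnings


class LongUseWarning(DeprecationWarning):
    """Hypothetical category for warnings about the use of longs."""


def long_(value):
    """Hypothetical stand-in for long(): warn, then delegate to int()."""
    warnings.warn("long() is deprecated; use int()", LongUseWarning,
                  stacklevel=2)
    return int(value)


def call_with_filter(action, value):
    """Call long_() with one warning filter active, then restore filters."""
    with warnings.catch_warnings():
        warnings.simplefilter(action, LongUseWarning)
        return long_(value)


# "ignore" turns the warning off; the call simply returns an int.
quiet_result = call_with_filter("ignore", "42")

# "error" turns the warning into an exception.
try:
    call_with_filter("error", "42")
    raised = False
except LongUseWarning:
    raised = True
```

The same two filter actions can be given on the command line in the
form -W ignore and -W error.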

We propose the following timeline for the three phases:

1. Python 2.2.

2. The rest of the Python 2.x line.

3. Python 3.0 (at least two years in the future).


Implementation

There are two alternative implementations to choose from.

1. The PyInt type's slot for a C long will be turned into a

       union {
           long i;
           struct {
               unsigned long length;
               digit digits[1];
           } bignum;
       };

   Only the n-1 lower bits of the long have any meaning; the top bit
   is always set. This distinguishes the union. All PyInt functions
   will check this bit before deciding which type of operation to
   use.
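
The tag-bit idea in plan 1 can be modelled in a few lines of Python.
This is only a sketch under stated assumptions: a 32-bit word, a set
top bit meaning "the remaining n-1 bits hold a small two's-complement
integer", and the helper names below, none of which come from the
PEP. It is not the proposed C implementation.

```python
WORD_BITS = 32                      # assumed word size (n)
PAYLOAD_BITS = WORD_BITS - 1        # the n-1 meaningful bits
TAG = 1 << PAYLOAD_BITS             # top bit, used as the discriminator
PAYLOAD_MASK = TAG - 1


def fits_small(value):
    """True if value fits in n-1 bits as a two's-complement integer."""
    return -(1 << (PAYLOAD_BITS - 1)) <= value < (1 << (PAYLOAD_BITS - 1))


def pack_small(value):
    """Store a small integer in the low n-1 bits and set the tag bit."""
    if not fits_small(value):
        raise ValueError("value needs the bignum representation")
    return TAG | (value & PAYLOAD_MASK)


def is_small(word):
    """Check the tag bit, as every PyInt function would have to."""
    return bool(word & TAG)


def unpack_small(word):
    """Recover the signed value from a tagged word."""
    payload = word & PAYLOAD_MASK
    if payload >> (PAYLOAD_BITS - 1):       # sign bit of the payload
        payload -= 1 << PAYLOAD_BITS
    return payload
```

For example, unpack_small(pack_small(-1)) gives back -1, while
fits_small(2**30) is false, so that value would have to use the
bignum member of the union.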

2. The existing short and long int types remain, but the short int
   returns a long int instead of raising OverflowError when a
   result cannot be represented as a short int. A new type,
   integer, may be introduced that is an abstract base type from
   which both the int and long implementation types are
   subclassed. This is useful so that programs can check
   integer-ness with a single test:

       if isinstance(i, integer): ...
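
A toy model of plan 2 may make the behaviour concrete. The classes
ShortInt and LongInt, the factory make_integer and the 32-bit limits
are assumptions made for this sketch; only the name integer comes
from the text above. Addition on two short values returns a long
value when the result does not fit, and a single isinstance() test
covers both.

```python
MAX_SHORT = 2**31 - 1       # assumed 32-bit limits for the short type
MIN_SHORT = -2**31


class integer:
    """Abstract base type shared by both implementation types."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # The result type depends only on the size of the result.
        return make_integer(self.value + other.value)


class ShortInt(integer):
    """Stands in for the machine-word int type."""


class LongInt(integer):
    """Stands in for the unbounded long type."""


def make_integer(value):
    """Pick the implementation type that can represent value."""
    if MIN_SHORT <= value <= MAX_SHORT:
        return ShortInt(value)
    return LongInt(value)


total = make_integer(MAX_SHORT) + make_integer(1)
```

Here total is a LongInt rather than an OverflowError, and
isinstance(total, integer) is true. Because make_integer() looks
only at the result, this model also returns a ShortInt from long
operands when the result is small, which is an optional optimization
rather than a requirement of the plan.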


Literals

A trailing 'L' at the end of an integer literal will stop having
any meaning, and will eventually be phased out.


Built-in Functions

The function long() will call the function int(). If
implementation plan 1 is chosen, it will eventually be phased out;
with implementation plan 2, it remains in the language to
represent the long implementation type -- but the int() function
is still recommended, since it will automatically return a long
when needed.


C API

If implementation plan 1 is chosen, all PyLong_As* functions will
call their PyInt_As* counterparts. If a PyInt_As* counterpart does
not exist, it will be added. Similarly for PyLong_From*. A similar
path of warnings as for the Python built-ins will be followed.

If implementation plan 2 is chosen, the C API remains unchanged.

(The PyArg_Parse*() APIs already accept long ints, as long as they
are within the range representable by C ints or longs. This will
remain unchanged.)


Overflows

When an arithmetic operation on two numbers whose internal
representation is as machine-level integers returns something
whose internal representation is a bignum, a warning that is
turned off by default will be issued. This is only a debugging
aid, and has no guaranteed semantics.

A command line option may be used to enable these warnings (the
regular warning framework supports warnings that are off by
default, but this would be too slow -- it makes a call to a
complex piece of Python code).

This warning is not part of the transition plan; it will always be
off by default, and the feature will probably disappear in Python
3.0.
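
The condition that triggers this warning can be sketched as follows.
The 32-bit limits, the PromotionWarning category and the helper
names are assumptions made for illustration. The sketch goes through
the regular warnings module for simplicity, which is the route the
text above considers too slow for the real implementation.

```python
import warnings

WORD_MAX = 2**31 - 1        # assumed machine word limits
WORD_MIN = -2**31


class PromotionWarning(RuntimeWarning):
    """Hypothetical category for int-to-bignum promotion."""


def fits_machine_word(value):
    return WORD_MIN <= value <= WORD_MAX


def add_with_overflow_warning(a, b):
    """Add two integers; warn if machine-sized operands give a bignum."""
    result = a + b
    if (fits_machine_word(a) and fits_machine_word(b)
            and not fits_machine_word(result)):
        warnings.warn("int + int produced a bignum", PromotionWarning,
                      stacklevel=2)
    return result


# Record warnings instead of printing them, with the category enabled.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", PromotionWarning)
    total = add_with_overflow_warning(WORD_MAX, 1)   # crosses the limit
    small = add_with_overflow_warning(1, 2)          # stays in range
```

Only the first call adds an entry to caught; the arithmetic result
is the same whether or not the warning is enabled.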


Semantic Changes

The following operations have (usually subtly) different semantics
for short and for long integers, and one or the other will have to
change somehow. This is intended to be an exhaustive list; if you
know of anything else that might change, please write to the authors.

- Currently, all arithmetic operators on short ints except <<
  raise OverflowError if the result cannot be represented as a
  short int. This will change (of course).

- Currently x<<n can lose bits for short ints. No more.

- Currently, hex and oct literals for short ints may specify
  negative values; for example, 0xffffffff == -1 on a 32-bit
  machine. No more; this will equal 0xffffffffL, which is 2**32-1.

- Currently, the '%u', '%x' and '%o' string formatting operators
  and the hex() and oct() built-in functions behave differently
  for negative numbers: negative short ints are formatted as
  unsigned C long, while negative long ints are formatted with a
  minus sign. The long int semantics will rule (but without the
  trailing 'L' that currently distinguishes the output of hex()
  and oct() for long ints).

- Currently, repr() of a long int returns a string ending in 'L'
  while repr() of a short int doesn't. The 'L' will be dropped.

- Currently, an operation with long operands will never return a
  short int. This may change (it allows an optimization). This
  is only relevant if implementation plan 2 is chosen.

- Currently, type(x) may reveal the difference between short and
  long ints. This will change if implementation plan 1 is chosen.
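
The proposed semantics in the list above can be checked on an
interpreter where the unification is complete. The snippet below
assumes Python 3, which has a single int type; the variable names
are chosen only for this example. It avoids the 'L' suffix and the
oct() built-in, whose syntax and output prefix differ there.

```python
largest_32bit = 2**31 - 1

# Arithmetic past the machine word no longer raises OverflowError.
sum_past_limit = largest_32bit + 1

# Left shift keeps every bit instead of truncating to the word size.
shifted = 1 << 40

# Hex literals are always non-negative.
hex_literal = 0xffffffff

# Negative numbers are formatted with a minus sign.
formatted_hex = "%x" % -255
formatted_oct = "%o" % -8
formatted_unsigned = "%u" % -1
hex_builtin = hex(-255)

# repr() of a large value carries no trailing 'L'.
big_repr = repr(2**64)

# type() does not distinguish small from large values; this matches
# the outcome described for implementation plan 1.
same_type = type(1) is type(2**64)
```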


Jython Issues

Jython will have a PyInt interface which is implemented by both
PyFixNum and PyBigNum.

(Question for the Jython developers -- do you foresee any other
problems?)


Open Issues

We expect that these issues will be resolved over time, as more
feedback is received or we gather more experience with the initial
implementation.

- Which implementation plan to choose? Moshe is for plan 1, Guido
  is for plan 2. Plan 2 seems to be less work. Plan 1 probably
  breaks more at the C API level, e.g. PyInt_AS_LONG below.

- What to do about sys.maxint? (If implementation plan 1 is
  chosen, it should probably be phased out; for plan 2, it is
  still meaningful.)

- What to do about PyInt_AS_LONG failures? (Only relevant with
  implementation plan 1.)

- What to do about the %u, %o, %x formatting operators?

- Should we warn about << not cutting integers?

- Should the overflow warning be on a portable maximum size?

- Will unification of types and classes help with a more
  straightforward implementation? (Yes, it allows a common base
  class.)

- Define a C API that can be used to find out what the
  representation of an int is (only relevant for implementation
  plan 1).


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: