PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Last-Modified: $Date$
Author: Moshe Zadka, Guido van Rossum
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 14-Aug-2001, 23-Aug-2001


Abstract
========

Python currently distinguishes between two kinds of integers (ints): regular
or short ints, limited by the size of a C long (typically 32 or 64 bits), and
long ints, which are limited only by available memory. When operations on
short ints yield results that don't fit in a C long, they raise an error.
There are some other distinctions too. This PEP proposes to do away with most
of the differences in semantics, unifying the two types from the perspective
of the Python user.


Rationale
=========

Many programs find a need to deal with larger numbers after the fact, and
changing the algorithms later is bothersome. Using long ints everywhere from
the start is not a good alternative either: it hinders performance in the
normal case, because all arithmetic is then performed using long ints whether
or not they are needed.

Having the machine word size exposed to the language hinders portability. For
example, Python source files and ``.pyc`` files are not portable between
32-bit and 64-bit machines because of this.

There is also the general desire to hide unnecessary details from the Python
user when they are irrelevant for most applications. An example is memory
allocation, which is explicit in C but automatic in Python, giving us the
convenience of unlimited sizes on strings, lists, etc. It makes sense to
extend this convenience to numbers.

It will give new Python programmers (whether they are new to programming in
general or not) one less thing to learn before they can start using the
language.


Implementation
==============

Initially, two alternative implementations were proposed (one by each author):

1. The ``PyInt`` type's slot for a C long will be turned into a::

       union {
           long i;
           struct {
               unsigned long length;
               digit digits[1];
           } bignum;
       };

   Only the ``n-1`` lower bits of the ``long`` have any meaning; the top bit
   is always set. This distinguishes the ``union``. All ``PyInt`` functions
   will check this bit before deciding which types of operations to use.

2. The existing short and long int types remain, but operations return
   a long int instead of raising ``OverflowError`` when a result cannot be
   represented as a short int. A new type, ``integer``, may be introduced
   that is an abstract base type from which both the ``int`` and ``long``
   implementation types are subclassed. This is useful so that programs can
   check integer-ness with a single test::

       if isinstance(i, integer): ...

After some consideration, the second implementation plan was selected, since
it is far easier to implement, is backwards compatible at the C API level, and
in addition can be implemented partially as a transitional measure.
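
The difference that the selected plan makes to a single operation can be
modelled in a few lines. The following is only a sketch written for a
present-day Python 3 interpreter, not the C implementation: the 32-bit bounds
and the helper names ``fits_short``, ``add_old`` and ``add_new`` are
assumptions made for illustration.

```python
# Assumed 32-bit short int range; real builds used the platform's C long.
SHORT_MAX = 2**31 - 1
SHORT_MIN = -2**31


def fits_short(value):
    """Return True if value is representable as the modelled short int."""
    return SHORT_MIN <= value <= SHORT_MAX


def add_old(x, y):
    """Model of the old behaviour: raise when the sum does not fit."""
    result = x + y
    if not fits_short(result):
        raise OverflowError("integer addition")
    return result


def add_new(x, y):
    """Model of plan 2: return the result, labelled with the type it needs."""
    result = x + y
    kind = "int" if fits_short(result) else "long"
    return kind, result
```

With these definitions ``add_old(SHORT_MAX, 1)`` raises ``OverflowError``,
while ``add_new(SHORT_MAX, 1)`` reports a ``long`` result holding
``2147483648``.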


Incompatibilities
=================

The following operations have (usually subtly) different semantics for short
and for long integers, and one or the other will have to be changed somehow.
This is intended to be an exhaustive list. If you know of any other operations
that differ in outcome depending on whether a short or a long int with the
same value is passed, please write to the second author.

- Currently, all arithmetic operators on short ints except ``<<`` raise
  ``OverflowError`` if the result cannot be represented as a short int. This
  will be changed to return a long int instead. The following operators can
  currently raise ``OverflowError``: ``x+y``, ``x-y``, ``x*y``, ``x**y``,
  ``divmod(x, y)``, ``x/y``, ``x%y``, and ``-x``. (The last four can only
  overflow when the value ``-sys.maxint-1`` is involved.)

- Currently, ``x<<n`` can lose bits for short ints. This will be changed to
  return a long int containing all the shifted-out bits, if returning a short
  int would lose bits (where changing sign is considered a special case of
  losing bits).

- Currently, hex and oct literals for short ints may specify negative values;
  for example ``0xffffffff == -1`` on a 32-bit machine. This will be changed
  to equal ``0xffffffffL`` (``2**32-1``).

- Currently, the ``%u``, ``%x``, ``%X`` and ``%o`` string formatting operators
  and the ``hex()`` and ``oct()`` built-in functions behave differently for
  negative numbers: negative short ints are formatted as unsigned C long,
  while negative long ints are formatted with a minus sign. This will be
  changed to use the long int semantics in all cases (but without the trailing
  *L* that currently distinguishes the output of ``hex()`` and ``oct()`` for
  long ints). Note that this means that ``%u`` becomes an alias for ``%d``.
  It will eventually be removed.

- Currently, ``repr()`` of a long int returns a string ending in *L* while
  ``repr()`` of a short int doesn't. The *L* will be dropped; but not before
  Python 3.0.

- Currently, an operation with long operands will never return a short int.
  This *may* change, since it allows some optimization. (No changes have been
  made in this area yet, and none are planned.)

- The expression ``type(x).__name__`` depends on whether *x* is a short or a
  long int. Since implementation alternative 2 is chosen, this difference
  will remain. (In Python 3.0, we *may* be able to deploy a trick to hide the
  difference, because it *is* annoying to reveal the difference to user code,
  and more so as the difference between the two types is less visible.)

- Long and short ints are handled differently by the ``marshal`` module, and
  by the ``pickle`` and ``cPickle`` modules. This difference will remain (at
  least until Python 3.0).

- Short ints with small values (typically between -1 and 99 inclusive) are
  *interned* -- whenever a result has such a value, an existing short int with
  the same value is returned. This is not done for long ints with the same
  values. This difference will remain. (Since there is no guarantee of this
  interning, it is debatable whether this is a semantic difference -- but code
  may exist that uses ``is`` for comparisons of short ints and happens to work
  because of this interning. Such code may fail if used with long ints.)
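
Three of the differences listed above (hex literals, ``hex()`` of negative
numbers, and left shifts) can be reproduced side by side. This is a sketch
for a present-day Python 3 interpreter, whose ints already follow the long
int semantics. The old short int results are emulated by masking to an
assumed 32-bit width, and the helpers ``as_signed_short`` and
``as_unsigned_short`` are hypothetical names, not part of any Python API.

```python
BITS = 32  # assumed short int width; 64-bit builds behaved analogously


def as_signed_short(value):
    """Reinterpret the low BITS bits of value as a two's-complement int."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value >= 1 << (BITS - 1) else value


def as_unsigned_short(value):
    """Reinterpret value as an unsigned BITS-bit quantity."""
    return value & ((1 << BITS) - 1)


# Hex literal: the old short int reading versus the unified value.
old_literal = as_signed_short(0xffffffff)   # -1
new_literal = 0xffffffff                    # 4294967295, i.e. 2**32-1

# hex() of a negative number: unsigned C long versus a minus sign.
old_hex = hex(as_unsigned_short(-1))        # '0xffffffff'
new_hex = hex(-1)                           # '-0x1'

# Left shift: the old result lost the sign bit, the new one keeps all bits.
old_shift = as_signed_short(1 << 31)        # -2147483648
new_shift = 1 << 31                         # 2147483648
```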


Literals
========

A trailing *L* at the end of an integer literal will stop having any
meaning, and will eventually become illegal. The compiler will choose the
appropriate type solely based on the value. (Until Python 3.0, it will force
the literal to be a long; but literals without a trailing *L* may also be
long, if they are not representable as short ints.)


Built-in Functions
==================

The function ``int()`` will return a short or a long int depending on the
argument value. In Python 3.0, the function ``long()`` will call the function
``int()``; before then, it will continue to force the result to be a long int,
but otherwise work the same way as ``int()``. The built-in name ``long`` will
remain in the language to represent the long implementation type (unless it is
completely eradicated in Python 3.0), but using the ``int()`` function is
still recommended, since it will automatically return a long when needed.
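
For reference, the end state of these two sections can be observed on a
present-day Python 3 interpreter, where the trailing *L* is rejected and the
name ``long`` is gone. The snippet below assumes such an interpreter; the
helper ``literal_is_accepted`` is a hypothetical name introduced here.

```python
import builtins

# int() alone handles values far beyond any C long.
big = int("123456789012345678901234567890")


def literal_is_accepted(source):
    """Return True if source compiles as an expression on this interpreter."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


plain_ok = literal_is_accepted("10")         # ordinary literal
suffix_ok = literal_is_accepted("10L")       # trailing L, rejected in Python 3
has_long = hasattr(builtins, "long")         # the name long was removed
```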


C API
=====

The C API remains unchanged; C code will still need to be aware of the
difference between short and long ints. (The Python 3.0 C API will probably
be completely incompatible.)

The ``PyArg_Parse*()`` APIs already accept long ints, as long as they are
within the range representable by C ints or longs, so that functions taking C
int or long arguments won't have to worry about dealing with Python longs.
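
The same kind of range check at the boundary to C can still be observed from
Python code. The sketch below assumes CPython 3 and uses sequence repetition
as a stand-in for a C-level integer argument. Repetition does not go through
``PyArg_Parse*()`` itself, so this illustrates the principle rather than
that particular API. The helper name ``try_repeat`` is hypothetical.

```python
def try_repeat(count):
    """Repeat a string count times; count must fit a C-level size."""
    try:
        return "ok", len("ab" * count)
    except OverflowError:
        # The Python int was valid, but too large for the C-level integer.
        return "overflow", None


small = try_repeat(3)        # the value fits, so the operation proceeds
huge = try_repeat(2**100)    # the value cannot be converted
```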


Transition
==========

There are three major phases to the transition:

1. Short int operations that currently raise ``OverflowError`` return a long
   int value instead. This is the only change in this phase. Literals will
   still distinguish between short and long ints. The other semantic
   differences listed above (including the behavior of ``<<``) will remain.
   Because this phase only changes situations that currently raise
   ``OverflowError``, it is assumed that this won't break existing code.
   (Code that depends on this exception would have to be too convoluted to be
   concerned about it.) For those concerned about extreme backwards
   compatibility, a command line option (or a call to the warnings module)
   will allow a warning or an error to be issued at this point, but this is
   off by default.

2. The remaining semantic differences are addressed. In all cases the long
   int semantics will prevail. Since this will introduce backwards
   incompatibilities which will break some old code, this phase may require a
   future statement and/or warnings, and a prolonged transition phase. The
   trailing *L* will continue to be used for longs as input and by
   ``repr()``.

   A. Warnings are enabled about operations that will change their numeric
      outcome in stage 2B, in particular ``hex()`` and ``oct()``, ``%u``,
      ``%x``, ``%X`` and ``%o``, ``hex`` and ``oct`` literals in the
      (inclusive) range ``[sys.maxint+1, sys.maxint*2+1]``, and left shifts
      losing bits.
   B. The new semantics for these operations are implemented. Operations that
      give different results than before will *not* issue a warning.

3. The trailing *L* is dropped from ``repr()``, and made illegal on input.
   (If possible, the ``long`` type completely disappears.) The trailing *L*
   is also dropped from ``hex()`` and ``oct()``.

Phase 1 will be implemented in Python 2.2.

Phase 2 will be implemented gradually, with 2A in Python 2.3 and 2B in
Python 2.4.

Phase 3 will be implemented in Python 3.0 (at least two years after Python 2.4
is released).


OverflowWarning
===============

Here are the rules that guide warnings generated in situations that currently
raise ``OverflowError``. This applies to transition phase 1. Historical
note: although phase 1 was completed in Python 2.2, and phase 2A in Python
2.3, nobody noticed that OverflowWarning was still generated in Python 2.3.
It was finally disabled in Python 2.4. The Python builtin
``OverflowWarning``, and the corresponding C API ``PyExc_OverflowWarning``,
are no longer generated or used in Python 2.4, but will remain for the
(unlikely) case of user code until Python 2.5.

- A new warning category is introduced, ``OverflowWarning``. This is a
  built-in name.

- If an int result overflows, an ``OverflowWarning`` warning is issued, with a
  message argument indicating the operation, e.g. "integer addition". This
  may or may not cause a warning message to be displayed on ``sys.stderr``, or
  may cause an exception to be raised, all under control of the ``-W`` command
  line and the warnings module.

- The ``OverflowWarning`` warning is ignored by default.

- The ``OverflowWarning`` warning can be controlled like all warnings, via the
  ``-W`` command line option or via the ``warnings.filterwarnings()`` call.
  For example::

      python -Wdefault::OverflowWarning

  causes the ``OverflowWarning`` to be displayed the first time it occurs at a
  particular source line, and::

      python -Werror::OverflowWarning

  causes the ``OverflowWarning`` to be turned into an exception whenever it
  happens. The following code enables the warning from inside the program::

      import warnings
      warnings.filterwarnings("default", "", OverflowWarning)

  See the Python ``man`` page for the ``-W`` option and the ``warnings``
  module documentation for ``filterwarnings()``.

- If the ``OverflowWarning`` warning is turned into an error,
  ``OverflowError`` is substituted. This is needed for backwards
  compatibility.

- Unless the warning is turned into an exception, the result of the operation
  (e.g., ``x+y``) is recomputed after converting the arguments to long ints.
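
The rules above can be modelled in pure Python. The following is a sketch for
a present-day Python 3 interpreter, not the historical C implementation.
Because the built-in ``OverflowWarning`` no longer exists, a stand-in class of
the same name is defined locally. The 32-bit bounds and the function
``checked_add`` are likewise assumptions made for illustration.

```python
import warnings


class OverflowWarning(Warning):
    """Local stand-in; the built-in of this name no longer exists."""


SHORT_MAX = 2**31 - 1   # assumed 32-bit short int range
SHORT_MIN = -2**31


def checked_add(x, y):
    """Add two ints, following the phase 1 warning rules."""
    result = x + y
    if SHORT_MIN <= result <= SHORT_MAX:
        return result
    try:
        # Ignored, displayed, or raised, depending on the active filters.
        warnings.warn("integer addition", OverflowWarning, stacklevel=2)
    except OverflowWarning:
        # A warning turned into an error surfaces as OverflowError.
        raise OverflowError("integer addition") from None
    # Otherwise the operation yields the long int result.
    return result


# With the warning ignored, the overflowing sum is simply returned.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", OverflowWarning)
    quiet = checked_add(SHORT_MAX, 1)

# With the warning turned into an error, OverflowError is raised instead.
with warnings.catch_warnings():
    warnings.simplefilter("error", OverflowWarning)
    try:
        checked_add(SHORT_MAX, 1)
        raised = False
    except OverflowError:
        raised = True
```

The ``simplefilter("error", ...)`` call plays the role of
``-Werror::OverflowWarning`` on the command line.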


Example
=======

If you pass a long int to a C function or built-in operation that takes an
integer, it will be treated the same as a short int as long as the value fits
(by virtue of how ``PyArg_ParseTuple()`` is implemented). If the long value
doesn't fit, it will still raise an ``OverflowError``. For example::

    def fact(n):
        if n <= 1:
            return 1
        return n*fact(n-1)

    A = "ABCDEFGHIJKLMNOPQ"
    n = input("Gimme an int: ")
    print A[fact(n)%17]

For ``n >= 13``, this currently raises ``OverflowError`` (unless the user
enters a trailing *L* as part of their input), even though the calculated
index would always be in ``range(17)``. With the new approach this code will
do the right thing: the index will be calculated as a long int, but its value
will be in range.
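
The same computation can be checked on a present-day Python 3 interpreter,
where the unification is complete. This is a non-interactive variant of the
example: the wrapper ``letter_for`` is a name introduced here so that the
result can be inspected without ``input()``.

```python
def fact(n):
    """Return n! using plain recursion, as in the example above."""
    if n <= 1:
        return 1
    return n * fact(n - 1)


A = "ABCDEFGHIJKLMNOPQ"  # 17 letters, so every index in range(17) is valid


def letter_for(n):
    """Index A by n! modulo 17; the intermediate factorial may be large."""
    return A[fact(n) % 17]


# 13! no longer fits in a 32-bit C long, yet the index stays in range.
too_big_for_32_bits = fact(13) > 2**31 - 1   # True
letter = letter_for(13)                      # 'D', since 13! % 17 == 3
```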


Resolved Issues
===============

These issues, previously open, have been resolved.

- ``hex()`` and ``oct()`` applied to longs will continue to produce a trailing
  *L* until Python 3000. The original text above wasn't clear about this,
  but since it didn't happen in Python 2.4 it was thought better to leave it
  alone. BDFL pronouncement here:

  https://mail.python.org/pipermail/python-dev/2006-June/065918.html

- What to do about ``sys.maxint``? Leave it in, since it is still relevant
  whenever the distinction between short and long ints is still relevant (e.g.
  when inspecting the type of a value).

- Should we remove ``%u`` completely? Remove it.

- Should we warn about ``<<`` not truncating integers? Yes.

- Should the overflow warning be on a portable maximum size? No.


Implementation
==============

The implementation work for the Python 2.x line is completed; phase 1 was
released with Python 2.2, phase 2A with Python 2.3, and phase 2B will be
released with Python 2.4 (and is already in CVS).


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   End: