PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Last-Modified: $Date$
Author: Moshe Zadka, Guido van Rossum
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 14-Aug-2001, 23-Aug-2001


Abstract
========

Python currently distinguishes between two kinds of integers (ints):
regular or short ints, limited by the size of a C long (typically 32
or 64 bits), and long ints, which are limited only by available
memory.  When operations on short ints yield results that don't fit
in a C long, they raise an error.  There are some other distinctions
too.  This PEP proposes to do away with most of the differences in
semantics, unifying the two types from the perspective of the Python
user.


Rationale
=========

Many programs find a need to deal with larger numbers after the fact,
and changing the algorithms later is bothersome.  It can hinder
performance in the normal case, when all arithmetic is performed
using long ints whether or not they are needed.

Having the machine word size exposed to the language hinders
portability.  For example, Python source files and .pyc's are not
portable between 32-bit and 64-bit machines because of this.

There is also the general desire to hide unnecessary details from the
Python user when they are irrelevant for most applications.  An
example is memory allocation, which is explicit in C but automatic in
Python, giving us the convenience of unlimited sizes on strings,
lists, etc.  It makes sense to extend this convenience to numbers.

It will give new Python programmers (whether they are new to
programming in general or not) one less thing to learn before they
can start using the language.


Implementation
==============

Initially, two alternative implementations were proposed (one by each
author):

1.
   The ``PyInt`` type's slot for a C long will be turned into a::

       union {
           long i;
           struct {
               unsigned long length;
               digit digits[1];
           } bignum;
       };

   Only the ``n-1`` lower bits of the ``long`` have any meaning; the
   top bit is always set.  This distinguishes the ``union``.  All
   ``PyInt`` functions will check this bit before deciding which types
   of operations to use.

2. The existing short and long int types remain, but operations return
   a long int instead of raising ``OverflowError`` when a result
   cannot be represented as a short int.  A new type, ``integer``, may
   be introduced that is an abstract base type of which both the
   ``int`` and ``long`` implementation types are subclassed.  This is
   useful so that programs can check integer-ness with a single test::

       if isinstance(i, integer): ...

After some consideration, the second implementation plan was selected,
since it is far easier to implement, is backwards compatible at the C
API level, and in addition can be implemented partially as a
transitional measure.


Incompatibilities
=================

The following operations have (usually subtly) different semantics for
short and for long integers, and one or the other will have to be
changed somehow.  This is intended to be an exhaustive list.  If you
know of any other operations that differ in outcome depending on
whether a short or a long int with the same value is passed, please
write to the second author.

- Currently, all arithmetic operators on short ints except ``<<``
  raise ``OverflowError`` if the result cannot be represented as a
  short int.  This will be changed to return a long int instead.  The
  following operators can currently raise ``OverflowError``: ``x+y``,
  ``x-y``, ``x*y``, ``x**y``, ``divmod(x, y)``, ``x/y``, ``x%y``, and
  ``-x``.  (The last four can only overflow when the value
  ``-sys.maxint-1`` is involved.)
- Currently, ``x<= 13``, this currently raises ``OverflowError``
  (unless the user enters a trailing *L* as part of their input), even
  though the calculated index would always be in ``range(17)``.  With
  the new approach this code will do the right thing: the index will
  be calculated as a long int, but its value will be in range.


Resolved Issues
===============

These issues, previously open, have been resolved.

- ``hex()`` and ``oct()`` applied to longs will continue to produce a
  trailing *L* until Python 3000.  The original text above wasn't
  clear about this, but since it didn't happen in Python 2.4 it was
  thought better to leave it alone.  BDFL pronouncement here:
  http://mail.python.org/pipermail/python-dev/2006-June/065918.html

- What to do about ``sys.maxint``?  Leave it in, since it is still
  relevant whenever the distinction between short and long ints is
  still relevant (e.g. when inspecting the type of a value).

- Should we remove ``%u`` completely?  Remove it.

- Should we warn about ``<<`` not truncating integers?  Yes.

- Should the overflow warning be on a portable maximum size?  No.


Implementation
==============

The implementation work for the Python 2.x line is completed; phase 1
was released with Python 2.2, phase 2A with Python 2.3, and phase 2B
will be released with Python 2.4 (and is already in CVS).


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   End: