PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Author: pep@zadka.site.co.il (Moshe Zadka), guido@python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001


Abstract

    Python has both integer (machine word size integral) types and
    long integer (unbounded integral) types.  When integer
    operations overflow the machine registers, they raise an error.
    This PEP proposes to do away with the distinction, and unify the
    types from the perspective of both the Python interpreter and the
    C API.

    Note from second author: this PEP requires more thought about
    implementation details.  I've started to make a list of semantic
    differences but I doubt it's complete.


Rationale

    Having the machine word size exposed to the language hinders
    portability.  For example, Python source files and .pyc's are not
    portable between 32-bit and 64-bit machines because of this.  Many
    programs find a need to deal with larger numbers after the fact,
    and changing the algorithms later is not only bothersome, but
    hinders performance in the normal case.

    There is also the general desire to hide unnecessary details from
    the Python user when they are irrelevant for most applications.
    (Another example is memory allocation, which is explicit in C but
    automatic in Python, giving us the convenience of unlimited sizes
    on strings, lists, etc.)

    It will give new Python programmers (whether they are new to
    programming in general or not) one less thing to learn before they
    can start using the language.


Transition

    There are three phases of the transition:

    1. Ints and longs are treated the same; no warnings are issued for
       code that uses longs.  Warnings for the use of longs (either
       long literals, ending in 'L' or 'l', or use of the long()
       function) may be enabled through a command line option.

    2. Longs are treated the same as ints but their use triggers a
       warning (which may be turned off or turned into an error using
       the -W command line option).

    3. Long literals and (if we choose implementation plan 1 below)
       the long() built-in are no longer legal.

    We propose the following timeline:

    1. Python 2.2.

    2. The rest of the Python 2.x line.

    3. Python 3.0 (at least two years in the future).


Implementation

    There are two alternative implementations to choose from.

    1. The PyInt type's slot for a C long will be turned into a 

        union {
            long i;
            struct {
                unsigned long length;
                digit digits[1];
            } bignum;
        };

       Only the n-1 lower bits of the long have any meaning; the top bit
       is always set.  This distinguishes the union.  All PyInt functions
       will check this bit before deciding which types of operations to
       use.

    2. The existing short and long int types remain, but the short int
       returns a long int instead of raising OverflowError when a
       result cannot be represented as a short int.  A new type,
       integer, may be introduced that is an abstract base type of
       which both the int and long implementation types are
       subclassed.  This is useful so that programs can check
       integer-ness with a single test:

           if isinstance(i, integer): ...
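
    The second plan can be sketched in pure Python as follows.  This is
    only an illustration under stated assumptions, not the proposed C
    implementation: the class names Integer, ShortInt and LongInt, the
    helper make_integer(), and the 32-bit width are all hypothetical.
    The sketch also narrows any result that fits back to ShortInt,
    which is a design choice of the example rather than something the
    plan requires.

```python
# Assumed width of the "short int" for this illustration (32-bit signed).
SHORT_BITS = 32
SHORT_MAX = 2 ** (SHORT_BITS - 1) - 1
SHORT_MIN = -(2 ** (SHORT_BITS - 1))


class Integer:
    """Hypothetical abstract base, standing in for the proposed 'integer'."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Compute the exact result, then pick the implementation type.
        return make_integer(self.value + other.value)

    def __mul__(self, other):
        return make_integer(self.value * other.value)


class ShortInt(Integer):
    """Value fits in the assumed machine word."""


class LongInt(Integer):
    """Value needs the unbounded representation."""


def make_integer(value):
    """Return a ShortInt when the value fits, otherwise a LongInt."""
    if SHORT_MIN <= value <= SHORT_MAX:
        return ShortInt(value)
    return LongInt(value)


# Overflow yields a LongInt instead of raising OverflowError.
total = make_integer(SHORT_MAX) + make_integer(1)
print(type(total).__name__, total.value)  # LongInt 2147483648
print(isinstance(total, Integer))         # True
```

    A single isinstance() test against the common base covers both
    implementation types, which is the point of introducing it.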


Literals

    A trailing 'L' at the end of an integer literal will stop having
    any meaning, and will be eventually phased out.


Built-in Functions

    The function long() will call the function int().  If
    implementation plan 1 is chosen, it will eventually be phased out;
    with implementation plan 2, it remains in the language to
    represent the long implementation type -- but the int() function
    is still recommended, since it will automatically return a long
    when needed.
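
    As a rough sketch of long() delegating to int() while its use is
    being discouraged, consider the function below.  It assumes a
    Python 3 interpreter (which has no built-in long), and the choice
    of DeprecationWarning as the category is an assumption of this
    example, not something the text above specifies.

```python
import warnings


def long(value=0, base=None):
    """Hypothetical stand-in for the built-in long(): delegates to int()."""
    warnings.warn(
        "long() is deprecated; use int() instead",
        DeprecationWarning,  # assumed category, for illustration only
        stacklevel=2,
    )
    if base is None:
        return int(value)
    return int(value, base)


# Record the warning instead of printing it, then inspect the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(long("ff", 16))            # 255
    print(caught[0].category.__name__)  # DeprecationWarning
```

    Calling warnings.simplefilter("error") before the call would turn
    the warning into an exception, much as the -W command line option
    can.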


C API

    If implementation plan 1 is chosen, all PyLong_As* will call
    PyInt_As*.  If PyInt_As* does not exist, it will be added.
    Similarly for PyLong_From*.  A similar path of warnings as for the
    Python built-ins will be followed.

    If implementation plan 2 is chosen, the C API remains unchanged.

    (The PyArg_Parse*() APIs already accept long ints, as long as they
    are within the range representable by C ints or longs.  This will
    remain unchanged.)


Overflows

    When an arithmetic operation on two numbers whose internal
    representation is as machine-level integers returns something
    whose internal representation is a bignum, a warning which is
    turned off by default will be issued.  This is only a debugging
    aid, and has no guaranteed semantics.

    A command line option may be used to enable these warnings (the
    regular warning framework supports warnings that are off by
    default, but this is too slow -- it makes a call to a
    complex piece of Python code).

    This warning is not part of the transition plan; it will always be
    off by default, and the feature will probably disappear in Python
    3.0.
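
    The behaviour described in this section might look roughly like
    the sketch below.  The names IntOverflowWarning, checked_add() and
    the 32-bit WORD_MIN/WORD_MAX bounds are hypothetical.  For brevity
    the sketch uses the regular warnings module, which the text above
    considers too slow for the real implementation.

```python
import warnings


class IntOverflowWarning(Warning):
    """Hypothetical category for 'machine ints produced a bignum'."""


# Assumed machine word range (32-bit signed) for this illustration.
WORD_MAX = 2 ** 31 - 1
WORD_MIN = -(2 ** 31)


def fits_word(n):
    """Return True if n is representable in the assumed machine word."""
    return WORD_MIN <= n <= WORD_MAX


def checked_add(a, b):
    """Add two integers, warning if word-sized operands give a bignum."""
    result = a + b
    if fits_word(a) and fits_word(b) and not fits_word(result):
        warnings.warn(
            "integer addition overflowed to a bignum",
            IntOverflowWarning,
            stacklevel=2,
        )
    return result


# Off by default, as proposed; a debugging session would enable it.
warnings.simplefilter("ignore", IntOverflowWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", IntOverflowWarning)
    print(checked_add(WORD_MAX, 1))  # 2147483648
    print(len(caught))               # 1
```

    The result is the same whether or not the warning is enabled; only
    the diagnostic differs.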


Semantic Changes

    The following operations have (usually subtly) different semantics
    for short and for long integers, and one will have to change
    somehow.  This is intended to be an exhaustive list; if you know
    of anything else that might change, please write to the authors.

    - Currently, all arithmetic operators on short ints except <<
      raise OverflowError if the result cannot be represented as a
      short int.  This will change (of course).

    - Currently x<<n can lose bits for short ints.  No more.

    - Currently, hex and oct literals for short ints may specify
      negative values; for example 0xffffffff == -1 on a 32-bit
      machine.  No more; this will equal 0xffffffffL which is 2**32-1.

    - Currently, the '%u', '%x' and '%o' string formatting operators
      and the hex() and oct() built-in functions behave differently
      for negative numbers: negative short ints are formatted as
      unsigned C long, while negative long ints are formatted with a
      minus sign.  The long int semantics will rule (but without the
      trailing 'L' that currently distinguishes the output of hex()
      and oct() for long ints).

    - Currently, repr() of a long int returns a string ending in 'L'
      while repr() of a short int doesn't.  The 'L' will be dropped.

    - Currently, an operation with long operands will never return a
      short int.  This may change (it allows an optimization).  This
      is only relevant if implementation plan 2 is chosen.

    - Currently, type(x) may reveal the difference between short and
      long ints.  This will change if implementation plan 1 is chosen.
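
    Several of the items above can be checked side by side.  The
    sketch below assumes a Python 3 interpreter, in which int is
    already unbounded, and simulates the old 32-bit short int reading
    with a hypothetical helper, as_signed32(); it is an illustration,
    not a specification of the final behaviour.

```python
def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer.

    Simulates how a 32-bit short int would read the same bit pattern.
    """
    value &= 0xFFFFFFFF
    return value - 2 ** 32 if value >= 2 ** 31 else value


literal = 0xffffffff
print(as_signed32(literal))   # simulated short int reading: -1
print(literal == 2 ** 32 - 1)  # unified reading: True

shifted = 1 << 40
print(as_signed32(shifted))   # simulated 32-bit shift, bits lost: 0
print(shifted == 2 ** 40)     # unified result keeps every bit: True

print('%x' % -255)            # minus sign, not unsigned C long: -ff
print(hex(-255))              # -0xff
print(repr(2 ** 64))          # no trailing 'L': 18446744073709551616
```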


Jython Issues

    Jython will have a PyInt interface which is implemented by both
    PyFixNum and PyBigNum.

    (Question for the Jython developers -- do you foresee any other
    problems?)


Open Issues

    We expect that these issues will be resolved over time, as more
    feedback is received or we gather more experience with the initial
    implementation.

    - Which implementation plan to choose?  Moshe is for plan 1, Guido
      is for plan 2.  Plan 2 seems less work.  Plan 1 probably breaks
      more at the C API level, e.g. PyInt_AS_LONG below.

    - What to do about sys.maxint?  (If implementation plan 1 is
      chosen, it should probably be phased out; for plan 2, it is
      still meaningful.)

    - What to do about PyInt_AS_LONG failures?  (Only relevant with
      implementation plan 1.)

    - What to do about %u, %o, %x formatting operators?

    - Should we warn about << not cutting integers?

    - Should the overflow warning be on a portable maximum size?

    - Will unification of types and classes help with a more
      straightforward implementation?  (Yes, it allows a common base
      class.)

    - Define a C API that can be used to find out what the
      representation of an int is (only relevant for implementation
      plan 1).


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: