PEP: 238
Title: Changing the Division Operator
Version: $Revision$
Last-Modified: $Date$
Author: moshez@zadka.site.co.il (Moshe Zadka), guido@python.org (Guido van Rossum)
Status: Final
Type: Standards Track
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 26-Jul-2001, 27-Jul-2001


Abstract

The current division (/) operator has an ambiguous meaning for
numerical arguments: it returns the floor of the mathematical
result of division if the arguments are ints or longs, but it
returns a reasonable approximation of the division result if the
arguments are floats or complex. This makes expressions expecting
float or complex results error-prone when integers are not
expected but possible as inputs.

We propose to fix this by introducing different operators for
different operations: x/y to return a reasonable approximation of
the mathematical result of the division ("true division"), x//y to
return the floor ("floor division"). We call the current, mixed
meaning of x/y "classic division".

Because of severe backwards compatibility issues, not to mention a
major flamewar on c.l.py, we propose the following transitional
measures (starting with Python 2.2):

- Classic division will remain the default in the Python 2.x
  series; true division will be standard in Python 3.0.

- The // operator will be available to request floor division
  unambiguously.

- The future division statement, spelled "from __future__ import
  division", will change the / operator to mean true division
  throughout the module.

- A command line option will enable run-time warnings for classic
  division applied to int or long arguments; another command line
  option will make true division the default.

- The standard library will use the future division statement and
  the // operator when appropriate, so as to completely avoid
  classic division.


Motivation

The classic division operator makes it hard to write numerical
expressions that are supposed to give correct results from
arbitrary numerical inputs. For all other operators, one can
write down a formula such as x*y**2 + z, and the calculated result
will be close to the mathematical result (within the limits of
numerical accuracy, of course) for any numerical input type (int,
long, float, or complex). But division poses a problem: if the
expressions for both arguments happen to have an integral type, it
implements floor division rather than true division.

The problem is unique to dynamically typed languages: in a
statically typed language like C, the inputs, typically function
arguments, would be declared as double or float, and when a call
passes an integer argument, it is converted to double or float at
the time of the call. Python doesn't have argument type
declarations, so integer arguments can easily find their way into
an expression.

The problem is particularly pernicious since ints are perfect
substitutes for floats in all other circumstances: math.sqrt(2)
returns the same value as math.sqrt(2.0), 3.14*100 and 3.14*100.0
return the same value, and so on. Thus, the author of a numerical
routine may test the code only with floating point numbers and
believe that it works correctly, while a user may accidentally
pass in an integer input value and get incorrect results.

Another way to look at this is that classic division makes it
difficult to write polymorphic functions that work well with
either float or int arguments; all other operators already do the
right thing. No algorithm that works for both ints and floats has
a need for truncating division in one case and true division in
the other.

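The hazard described above can be sketched as follows. This is only
an illustration written for Python 3, where / already means true
division; the helper classic_div() is hypothetical and merely
imitates the classic rule (floor for two integers, true division
otherwise) so that both behaviors can be compared side by side.

```python
import math


def classic_div(x, y):
    """Hypothetical emulation of classic '/': floor for two ints, true otherwise."""
    if isinstance(x, int) and isinstance(y, int):
        return x // y
    return x / y


def mean_classic(values):
    # A routine whose author only ever tried it with float inputs.
    return classic_div(sum(values), len(values))


def mean_true(values):
    # The same routine under true division.
    return sum(values) / len(values)


float_inputs = [1.0, 2.0, 4.0]
int_inputs = [1, 2, 4]

# Other operators and functions treat ints as stand-ins for floats.
same_sqrt = math.sqrt(2) == math.sqrt(2.0)
same_product = 3.14 * 100 == 3.14 * 100.0

# Division is the odd one out: the int inputs silently lose the fraction.
classic_on_floats = mean_classic(float_inputs)
classic_on_ints = mean_classic(int_inputs)
true_on_ints = mean_true(int_inputs)
```

With float inputs the two versions agree; with int inputs only the
classic version drops the fractional part.
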
The correct work-around is subtle: casting an argument to float()
is wrong if it could be a complex number; adding 0.0 to an
argument doesn't preserve the sign of the argument if it was minus
zero. The only solution without either downside is multiplying an
argument (typically the first) by 1.0. This leaves the value and
sign unchanged for float and complex, and turns int and long into
a float with the corresponding value.

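The three work-arounds can be compared directly. The following is a
small sketch, assuming IEEE 754 floats and Python 3; the helper
is_negative() is introduced here only for illustration.

```python
import math


def is_negative(x):
    """Report the sign bit of a float, so that -0.0 can be told from 0.0."""
    return math.copysign(1.0, x) < 0


minus_zero = -0.0

# Adding 0.0 loses the sign of minus zero ...
added = minus_zero + 0.0
# ... while multiplying by 1.0 keeps it.
scaled = minus_zero * 1.0

# float() cannot be applied to a complex argument.
try:
    float(1 + 2j)
    float_accepts_complex = True
except TypeError:
    float_accepts_complex = False

# Multiplying by 1.0 works for ints and for complex numbers alike.
int_scaled = 3 * 1.0
complex_scaled = (1 + 2j) * 1.0
```
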
It is the opinion of the authors that this is a real design bug in
Python, and that it should be fixed sooner rather than later.
Assuming Python usage will continue to grow, the cost of leaving
this bug in the language will eventually outweigh the cost of
fixing old code -- there is an upper bound to the amount of code
to be fixed, but the amount of code that might be affected by the
bug in the future is unbounded.

Another reason for this change is the desire to ultimately unify
Python's numeric model. This is the subject of PEP 228[0] (which
is currently incomplete). A unified numeric model removes most of
the user's need to be aware of different numerical types. This is
good for beginners, but also takes away concerns about different
numeric behavior for advanced programmers. (Of course, it won't
remove concerns about numerical stability and accuracy.)

In a unified numeric model, the different types (int, long, float,
complex, and possibly others, such as a new rational type) serve
mostly as storage optimizations, and to some extent to indicate
orthogonal properties such as inexactness or complexity. In a
unified model, the integer 1 should be indistinguishable from the
floating point number 1.0 (except for its inexactness), and both
should behave the same in all numeric contexts. Clearly, in a
unified numeric model, if a==b and c==d, a/c should equal b/d
(taking some liberties due to rounding for inexact numbers), and
since everybody agrees that 1.0/2.0 equals 0.5, 1/2 should also
equal 0.5. Likewise, since 1//2 equals zero, 1.0//2.0 should also
equal zero.

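The consistency rule stated above (if a==b and c==d then a/c equals
b/d) can be checked mechanically. The sketch below assumes Python 3
semantics, i.e. / is true division and // is floor division; the
function name and the sample values are made up for the example and
are small enough that rounding plays no role.

```python
def divides_consistently(a, c):
    """Check that an int pair and its float twin give equal quotients."""
    b, d = float(a), float(c)
    # The twins compare equal, so both kinds of division should agree.
    return a == b and c == d and a / c == b / d and a // c == b // d


samples = [(1, 2), (7, 2), (-7, 2), (10, 5)]
results = [divides_consistently(a, c) for a, c in samples]
```
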


Variations

Aesthetically, x//y doesn't please everyone, and hence several
variations have been proposed. They are addressed here:

- x div y. This would introduce a new keyword. Since div is a
  popular identifier, this would break a fair amount of existing
  code, unless the new keyword was only recognized under a future
  division statement. Since it is expected that the majority of
  code that needs to be converted is dividing integers, this would
  greatly increase the need for the future division statement.
  Even with a future statement, the general sentiment against
  adding new keywords unless absolutely necessary argues against
  this.

- div(x, y). This makes the conversion of old code much harder.
  Replacing x/y with x//y or x div y can be done with a simple
  query replace; in most cases the programmer can easily verify
  that a particular module only works with integers so all
  occurrences of x/y can be replaced. (The query replace is still
  needed to weed out slashes occurring in comments or string
  literals.) Replacing x/y with div(x, y) would require a much
  more intelligent tool, since the extent of the expressions to
  the left and right of the / must be analyzed before the
  placement of the "div(" and ")" part can be decided.

- x \ y. The backslash is already a token, meaning line
  continuation, and in general it suggests an "escape" to Unix
  eyes. In addition (this is due to Terry Reedy) this would make
  things like eval("x\y") harder to get right.


Alternatives

In order to reduce the amount of old code that needs to be
converted, several alternative proposals have been put forth.
Here is a brief discussion of each proposal (or category of
proposals). If you know of an alternative that was discussed on
c.l.py that isn't mentioned here, please mail the second author.

- Let / keep its classic semantics; introduce // for true
  division. This still leaves a broken operator in the language,
  and invites use of the broken behavior. It also shuts off the
  road to a unified numeric model a la PEP 228[0].

- Let int division return a special "portmanteau" type that
  behaves as an integer in integer context, but like a float in a
  float context. The problem with this is that after a few
  operations, the int and the float value could be miles apart,
  it's unclear which value should be used in comparisons, and of
  course many contexts (like conversion to string) don't have a
  clear integer or float preference.

- Use a directive to use specific division semantics in a module,
  rather than a future statement. This retains classic division
  as a permanent wart in the language, requiring future
  generations of Python programmers to be aware of the problem and
  the remedies.

- Use "from __past__ import division" to use classic division
  semantics in a module. This also retains the classic division
  as a permanent wart, or at least for a long time (eventually the
  past division statement could raise an ImportError).

- Use a directive (or some other way) to specify the Python
  version for which a specific piece of code was developed. This
  requires future Python interpreters to be able to emulate
  *exactly* several previous versions of Python, and moreover to
  do so for multiple versions within the same interpreter. This
  is way too much work. A much simpler solution is to keep
  multiple interpreters installed. Another argument against this
  is that the version directive is almost always overspecified:
  most code written for Python X.Y works for Python X.(Y-1) and
  X.(Y+1) as well, so specifying X.Y as a version is more
  constraining than it needs to be. At the same time, there's no
  way to know at which future or past version the code will break.


API Changes

During the transitional phase, we have to support *three* division
operators within the same program: classic division (for / in
modules without a future division statement), true division (for /
in modules with a future division statement), and floor division
(for //). Each operator comes in two flavors: regular, and as an
augmented assignment operator (/= or //=).

The names associated with these variations are:

- Overloaded operator methods:

    __div__(), __floordiv__(), __truediv__();

    __idiv__(), __ifloordiv__(), __itruediv__().

- Abstract API C functions:

    PyNumber_Divide(), PyNumber_FloorDivide(),
    PyNumber_TrueDivide();

    PyNumber_InPlaceDivide(), PyNumber_InPlaceFloorDivide(),
    PyNumber_InPlaceTrueDivide().

- Byte code opcodes:

    BINARY_DIVIDE, BINARY_FLOOR_DIVIDE, BINARY_TRUE_DIVIDE;

    INPLACE_DIVIDE, INPLACE_FLOOR_DIVIDE, INPLACE_TRUE_DIVIDE.

- PyNumberMethod slots:

    nb_divide, nb_floor_divide, nb_true_divide,

    nb_inplace_divide, nb_inplace_floor_divide,
    nb_inplace_true_divide.

The added PyNumberMethod slots require an additional flag in
tp_flags; this flag will be named Py_TPFLAGS_HAVE_NEWDIVIDE and
will be included in Py_TPFLAGS_DEFAULT.

The true and floor division APIs will look for the corresponding
slot and call it; when that slot is NULL, they will raise an
exception. There is no fallback to the classic divide slot.

In Python 3.0, the classic division semantics will be removed; the
classic division APIs will become synonymous with true division.

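At the Python level, the operator methods listed above can be
sketched as follows. The example targets Python 3, where only the
true and floor division hooks are consulted for / and //. The
classes Length and TrueOnly and the helper supports_floor_division()
are hypothetical and exist only for this illustration.

```python
class Length:
    """Hypothetical value type that opts in to both kinds of division."""

    def __init__(self, metres):
        self.metres = metres

    def __truediv__(self, divisor):       # called for: x / y
        return Length(self.metres / divisor)

    def __floordiv__(self, divisor):      # called for: x // y
        return Length(self.metres // divisor)

    def __itruediv__(self, divisor):      # called for: x /= y
        self.metres /= divisor
        return self


class TrueOnly:
    """Hypothetical type that defines true division only."""

    def __truediv__(self, divisor):
        return "true division"


def supports_floor_division(obj):
    # A missing __floordiv__ is an error; there is no fallback to
    # another division method.
    try:
        obj // 2
    except TypeError:
        return False
    return True


halved = Length(7) / 2
floored = Length(7) // 2
in_place = Length(9)
in_place /= 2
```
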


Command Line Option

The -Q command line option takes a string argument that can take
four values: "old", "warn", "warnall", or "new". The default is
"old" in Python 2.2 but will change to "warn" in later 2.x
versions. The "old" value means the classic division operator
acts as described. The "warn" value means the classic division
operator issues a warning (a DeprecationWarning using the standard
warning framework) when applied to ints or longs. The "warnall"
value also issues warnings for classic division when applied to
floats or complex; this is for use by the fixdiv.py conversion
script mentioned below. The "new" value changes the default
globally so that the / operator is always interpreted as true
division. The "new" option is only intended for use in certain
educational environments, where true division is required, but
asking the students to include the future division statement in
all their code would be a problem.

This option will not be supported in Python 3.0; Python 3.0 will
always interpret / as true division.

(This option was originally proposed as -D, but that turned out to
be an existing option for Jython, hence the Q -- mnemonic for
Quotient. Other names have been proposed, like -Qclassic,
-Qclassic-warn, -Qtrue, or -Qold_division etc.; these seem more
verbose to me without much advantage. After all the term classic
division is not used in the language at all (only in the PEP), and
the term true division is rarely used in the language -- only in
__truediv__.)


Semantics of Floor Division

Floor division will be implemented in all the Python numeric
types, and will have the semantics of

    a // b == floor(a/b)

except that the result type will be the common type into which a
and b are coerced before the operation.

Specifically, if a and b are of the same type, a//b will be of
that type too. If the inputs are of different types, they are
first coerced to a common type using the same rules used for all
other arithmetic operators.

In particular, if a and b are both ints or longs, the result has
the same type and value as for classic division on these types
(including the case of mixed input types; int//long and long//int
will both return a long).

For floating point inputs, the result is a float. For example:

    3.5//2.0 == 1.0

For complex numbers, // raises an exception, since floor() of a
complex number is not allowed.

For user-defined classes and extension types, all semantics are up
to the implementation of the class or type.

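The rules above can be exercised with a few small values. This is a
sketch for Python 3, where int and long have been merged, so the
int//long case cannot be shown separately. The helper names are made
up for the example, and the inputs are kept small so that floor(a/b)
is not disturbed by float rounding.

```python
import math


def matches_floor(a, b):
    """Compare a // b against floor(a / b), with / as true division."""
    return a // b == math.floor(a / b)


def floor_div_raises(a, b):
    try:
        a // b
    except TypeError:
        return True
    return False


int_result = 7 // 2          # both ints: the result is an int
negative_result = -7 // 2    # rounds towards negative infinity
float_result = 3.5 // 2.0    # both floats: the result is a float
mixed_result = 7 // 2.0      # mixed: coerced to float first
complex_rejected = floor_div_raises(3 + 4j, 2)
```
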


Semantics of True Division

True division for ints and longs will convert the arguments to
float and then apply a float division. That is, even 2/1 will
return a float (2.0), not an int. For floats and complex, it will
be the same as classic division.

The 2.2 implementation of true division acts as if the float type
had unbounded range, so that overflow doesn't occur unless the
magnitude of the mathematical *result* is too large to represent
as a float. For example, after "x = 1L << 40000", float(x) raises
OverflowError (note that this is also new in 2.2: previously the
outcome was platform-dependent, most commonly a float infinity). But
x/x returns 1.0 without exception, while x/1 raises OverflowError.

Note that for int and long arguments, true division may lose
information; this is in the nature of true division (as long as
rationals are not in the language). Algorithms that consciously
use longs should consider using //, as true division of longs
retains no more than 53 bits of precision (on most platforms).

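The overflow and precision behavior just described can be
reproduced as follows. The sketch uses Python 3 spelling (no L
suffix) and assumes IEEE 754 doubles with a 53-bit significand; the
helper raises_overflow() is hypothetical.

```python
def raises_overflow(operation):
    """Run a zero-argument callable and report whether it overflowed."""
    try:
        operation()
    except OverflowError:
        return True
    return False


x = 1 << 40000

float_overflows = raises_overflow(lambda: float(x))   # input out of range
ratio = x / x                                         # quotient is in range
quotient_overflows = raises_overflow(lambda: x / 1)   # quotient out of range

# True division keeps at most 53 bits; floor division stays exact.
big = 2**53 + 1
lossy = big / 1
exact = big // 1
```
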
If and when a rational type is added to Python (see PEP 239[2]),
true division for ints and longs should probably return a
rational. This avoids the problem with true division of ints and
longs losing information. But until then, for consistency, float is
the only choice for true division.


The Future Division Statement

If "from __future__ import division" is present in a module, or if
-Qnew is used, the / and /= operators are translated to true
division opcodes; otherwise they are translated to classic
division (until Python 3.0 comes along, where they are always
translated to true division).

The future division statement has no effect on the recognition or
translation of // and //=.

See PEP 236[4] for the general rules for future statements.

(It has been proposed to use a longer phrase, like "true_division"
or "modern_division". These don't seem to add much information.)

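The feature can also be inspected and requested programmatically
through the standard __future__ module. The following is a sketch
run under Python 3, where the compiler flag is accepted but changes
nothing because / is always true division; the file name
"<example>" is arbitrary.

```python
import __future__

feature = __future__.division

# Release in which the statement became available, and the release
# in which true division became the default.
optional_release = feature.getOptionalRelease()
mandatory_release = feature.getMandatoryRelease()

# Dynamically compiled code can ask for the feature with a flag
# instead of a "from __future__ import division" line.
code = compile("1 / 2", "<example>", "eval", flags=feature.compiler_flag)
result = eval(code)
```
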


Open Issues

We expect that these issues will be resolved over time, as more
feedback is received or we gather more experience with the initial
implementation.

- It has been proposed to call // the quotient operator, and the /
  operator the ratio operator. I'm not sure about this -- for
  some people quotient is just a synonym for division, and ratio
  suggests rational numbers, which is wrong. I prefer the
  terminology to be slightly awkward if that avoids ambiguity.
  Also, for some folks "quotient" suggests truncation towards
  zero, not towards negative infinity as "floor division" says
  explicitly.

- It has been argued that a command line option to change the
  default is evil. It can certainly be dangerous in the wrong
  hands: for example, it would be impossible to combine a 3rd
  party library package that requires -Qnew with another one that
  requires -Qold. But I believe that the VPython folks need a way
  to enable true division by default, and other educators might
  need the same. These usually have enough control over the
  library packages available in their environment.

- For classes to have to support all three of __div__(),
  __floordiv__() and __truediv__() seems painful; and what to do
  in 3.0? Maybe we only need __div__() and __floordiv__(), or
  maybe at least true division should try __truediv__() first and
  __div__() second.


Resolved Issues

- Issue: For very large long integers, the definition of true
  division as returning a float causes problems, since the range of
  Python longs is much larger than that of Python floats. This
  problem will disappear if and when rational numbers are supported.

  Resolution: For long true division, Python uses an internal
  float type with native double precision but unbounded range, so
  that OverflowError doesn't occur unless the quotient is too large
  to represent as a native double.

- Issue: In the interim, maybe the long-to-float conversion could be
  made to raise OverflowError if the long is out of range.

  Resolution: This has been implemented, but, as above, the
  magnitude of the inputs to long true division doesn't matter; only
  the magnitude of the quotient matters.

- Issue: Tim Peters will make sure that whenever an in-range float
  is returned, decent precision is guaranteed.

  Resolution: Provided the quotient of long true division is
  representable as a float, it suffers no more than 3 rounding
  errors: one each for converting the inputs to an internal float
  type with native double precision but unbounded range, and
  one more for the division. However, note that if the magnitude
  of the quotient is too *small* to represent as a native double,
  0.0 is returned without exception ("silent underflow").

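Two of the resolutions above lend themselves to a quick check: only
the magnitude of the quotient matters, and a quotient that is too
small silently becomes 0.0. The sketch below is written for Python
3 integers and the variable names are illustrative only.

```python
huge = 1 << 40000
larger = huge * 3

# Both operands are far outside the float range, yet the quotient
# is representable, so no exception is raised.
in_range_quotient = larger / huge

# A quotient too small for a native double underflows silently.
tiny_quotient = 1 / huge
```
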


FAQ

Q. When will Python 3.0 be released?

A. We don't plan that long ahead, so we can't say for sure. We
   want to allow at least two years for the transition. If Python
   3.0 comes out sooner, we'll keep the 2.x line alive for
   backwards compatibility until at least two years from the
   release of Python 2.2. In practice, you will be able to
   continue to use the Python 2.x line for several years after
   Python 3.0 is released, so you can take your time with the
   transition. Sites are expected to have both Python 2.x and
   Python 3.x installed simultaneously.

Q. Why isn't true division called float division?

A. Because I want to keep the door open to *possibly* introducing
   rationals and making 1/2 return a rational rather than a
   float. See PEP 239[2].

Q. Why is there a need for __truediv__ and __itruediv__?

A. We don't want to make user-defined classes second-class
   citizens. Certainly not with the type/class unification going
   on.

Q. How do I write code that works under the classic rules as well
   as under the new rules without using // or a future division
   statement?

A. Use x*1.0/y for true division, divmod(x, y)[0] for int
   division. Especially the latter is best hidden inside a
   function. You may also write float(x)/y for true division if
   you are sure that you don't expect complex numbers. If you
   know your integers are never negative, you can use int(x/y) --
   while the documentation of int() says that int() can round or
   truncate depending on the C implementation, we know of no C
   implementation that doesn't truncate, and we're going to change
   the spec for int() to promise truncation. Note that classic
   division (and floor division) round towards negative infinity,
   while int() rounds towards zero, giving different answers for
   negative numbers.

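The portable spellings from the answer above can be wrapped in
helper functions, as the answer suggests. The sketch below runs
under Python 3; the function names true_div() and floor_div() are
hypothetical. Under Python 3, x/y is already true division, so
int(x/y) here shows the truncation towards zero that the answer
warns about.

```python
def true_div(x, y):
    """True division that does not depend on the meaning of '/'."""
    return x * 1.0 / y


def floor_div(x, y):
    """Floor division without using the // operator."""
    return divmod(x, y)[0]


half = true_div(1, 2)
complex_half = true_div(1 + 2j, 2)

floor_positive = floor_div(7, 2)
floor_negative = floor_div(-7, 2)    # rounds towards negative infinity

truncated_positive = int(7 / 2)
truncated_negative = int(-7 / 2)     # rounds towards zero
```

For non-negative integers the two integer spellings agree; for
negative ones they differ by one.
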
Q. How do I specify the division semantics for input(), compile(),
   execfile(), eval() and exec?

A. They inherit the choice from the invoking module. PEP 236[4]
   now lists this as a resolved problem, referring to PEP 264[5].

Q. What about code compiled by the codeop module?

A. This is dealt with properly; see PEP 264[5].

Q. Will there be conversion tools or aids?

A. Certainly. While these are outside the scope of the PEP, I
   should point out two simple tools that will be released with
   Python 2.2a3: Tools/scripts/finddiv.py finds division operators
   (slightly smarter than "grep /") and Tools/scripts/fixdiv.py
   can produce patches based on run-time analysis.

Q. Why is my question not answered here?

A. Because we weren't aware of it. If it's been discussed on
   c.l.py and you believe the answer is of general interest,
   please notify the second author. (We don't have the time or
   inclination to answer every question sent in private email,
   hence the requirement that it be discussed on c.l.py first.)


Implementation

Essentially everything mentioned here is implemented in CVS and
will be released with Python 2.2a3; most of it was already
released with Python 2.2a2.


References

[0] PEP 228, Reworking Python's Numeric Model, Zadka, van Rossum,
    http://www.python.org/peps/pep-0228.html

[1] PEP 237, Unifying Long Integers and Integers, Zadka,
    http://www.python.org/peps/pep-0237.html

[2] PEP 239, Adding a Rational Type to Python, Zadka,
    http://www.python.org/peps/pep-0239.html

[3] PEP 240, Adding a Rational Literal to Python, Zadka,
    http://www.python.org/peps/pep-0240.html

[4] PEP 236, Back to the __future__, Peters,
    http://www.python.org/peps/pep-0236.html

[5] PEP 264, Future statements in simulated shells, Hudson,
    http://www.python.org/peps/pep-0264.html


Copyright

This document has been placed in the public domain.