python-peps/pep-0327.txt

653 lines
22 KiB
Plaintext
Raw Normal View History

PEP: 327
Title: Decimal Data Type
Version: $Revision$
Last-Modified: $Date$
Author: Facundo Batista <facundo@taniquetil.com.ar>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 17-Oct-2003
Python-Version: 2.4
Post-History: 30-Nov-2003, 02-Jan-2004
Abstract
========
The idea is to have a Decimal data type, for every use where decimals
are needed but binary floating point is too inexact.
The Decimal data type will support the Python standard functions and
operations, and must comply the decimal arithmetic ANSI standard
X3.274-1996 [1]_.
Decimal will be floating point (as opposed to fixed point) and will
have bounded precision (the precision is the upper limit on the
quantity of significant digits in a result).
This work is based on code and test functions written by Eric Price,
Aahz and Tim Peters. Actually I'll work on the Decimal.py code in the
sandbox (at python/nondist/sandbox/decimal in the SourceForge CVS
repository). Much of the explanation in this PEP is taken from
Cowlishaw's work [2]_ and comp.lang.python.
Motivation
==========
Here I'll expose the reasons of why I think a Decimal data type is
needed and why others numeric data types are not enough.
I wanted a Money data type, and after proposing a pre-PEP in
comp.lang.python, the community agreed to have a numeric data type
with the needed arithmetic behaviour, and then build Money over it:
all the considerations about quantity of digits after the decimal
point, rounding, etc., will be handled through Money. It is not the
purpose of this PEP to have a data type that can be used as Money
without further effort.
One of the biggest advantages of implementing a standard is that
someone already thought all the creepy cases for you. And to a
standard GvR redirected me: Mike Cowlishaw's General Decimal
Arithmetic specification [2]_. This document defines a general
purpose decimal arithmetic. A correct implementation of this
specification will conform to the decimal arithmetic defined in
ANSI/IEEE standard 854-1987, except for some minor restrictions, and
will also provide unrounded decimal arithmetic and integer arithmetic
as proper subsets.
The problem with binary float
-----------------------------
In decimal math, there are many numbers that can't be represented with
a fixed number of decimal digits, e.g. 1/3 = 0.3333333333.......
In base 2 (the way that standard floating point is calculated), 1/2 =
0.1, 1/4 = 0.01, 1/8 = 0.001, etc. Decimal 0.2 equals 2/10 equals
1/5, resulting in the binary fractional number
0.001100110011001... As you can see, the problem is that some decimal
numbers can't be represented exactly in binary, resulting in small
roundoff errors.
So we need a decimal data type that represents exactly decimal
numbers. Instead of a binary data type, we need a decimal one.
Why floating point?
-------------------
So we go to decimal, but why *floating point*?
Floating point numbers use a fixed quantity of digits (precision) to
represent a number, working with an exponent when the number gets too
big or too small. For example, with a precision of 5::
1234 ==> 1234e0
12345 ==> 12345e0
123456 ==> 12345e1
In contrast, we have the example of a ``long`` integer with infinite
precision, meaning that you can have the number as big as you want,
and you'll never lose any information.
In a fixed point number, the position of the decimal point is fixed.
For a fixed point data type, check Tim Peter's FixedPoint at
SourceForge [4]_. I'll go for floating point because it's easier to
implement the arithmetic behaviour of the standard, and then you can
implement a fixed point data type over Decimal.
But why can't we have a floating point number with infinite precision?
It's not so easy, because of inexact divisions. E.g.: 1/3 =
0.3333333333333... ad infinitum. In this case you should store a
infinite amount of 3s, which takes too much memory, ;).
John Roth proposed to eliminate the division operator and force the
user to use an explicit method, just to avoid this kind of trouble.
This generated adverse reactions in comp.lang.python, as everybody
wants to have support for the ``/`` operator in a numeric data type.
With this exposed maybe you're thinking "Hey! Can we just store the 1
and the 3 as numerator and denominator?", which take us to the next
point.
Why not rational
----------------
Rational numbers are stored using two integer numbers, the numerator
and the denominator. This implies that the arithmetic operations
can't be executed directly (e.g. to add two rational numbers you first
need to calculate the common denominator).
Quoting Alex Martelli:
The performance implications of the fact that summing two
rationals (which take O(M) and O(N) space respectively) gives a
rational which takes O(M+N) memory space is just too troublesome.
There are excellent Rational implementations in both pure Python
and as extensions (e.g., gmpy), but they'll always be a "niche
market" IMHO. Probably worth PEPping, not worth doing without
Decimal -- which is the right way to represent sums of money, a
truly major use case in the real world.
Anyway, if you're interested in this data type, you maybe will want to
take a look at PEP 239: Adding a Rational Type to Python.
So, what do we have?
--------------------
The result is a Decimal data type, with bounded precision and floating
point.
Will it be useful? I can't say it better than Alex Martelli:
Python (out of the box) doesn't let you have binary floating point
numbers *with whatever precision you specify*: you're limited to
what your hardware supplies. Decimal, be it used as a fixed or
floating point number, should suffer from no such limitation:
whatever bounded precision you may specify on number creation
(your memory permitting) should work just as well. Most of the
expense of programming simplicity can be hidden from application
programs and placed in a suitable decimal arithmetic type. As per
http://www2.hursley.ibm.com/decimal/, *a single data type can be
used for integer, fixed-point, and floating-point decimal
arithmetic* -- and for money arithmetic which doesn't drive the
application programmer crazy.
There are several uses for such a data type. As I said before, I will
use it as base for Money. In this case the bounded precision is not
an issue; quoting Tim Peters:
A precision of 20 would be way more than enough to account for
total world economic output, down to the penny, since the
beginning of time.
General Decimal Arithmetic Specification
========================================
Here I'll include information and descriptions that are part of the
specification [2]_ (the structure of the number, the context, etc.).
All the requirements included in this section are not for discussion
(barring typos or other mistakes), as they are in the standard, and
the PEP is just for implementing the standard.
Because of copyright restrictions, I can not copy here explanations
taken from the specification, so I'll try to explain it in my own
words. I firmly encourage you to read the original specification
document [2]_ for details or if you have any doubt.
The Arithmetic Model
--------------------
The specification is based on a decimal arithmetic model, as defined
by the relevant standards: IEEE 854 [3]_, ANSI X3-274 [1]_, and the
proposed revision [5]_ of IEEE 754 [6]_.
The model has three components:
- Numbers: just the values that the operation uses as input or output.
- Operations: addition, multiplication, etc.
- Context: a set of parameters and rules that the user can select and
which govern the results of operations (for example, the precision
to be used).
Numbers
-------
Numbers may be finite or special values. The former can be
represented exactly. The latter are infinites and undefined (such as
0/0).
Finite numbers are defined by three parameters:
- Sign: 0 (positive) or 1 (negative).
- Coefficient: a non-negative integer.
- Exponent: a signed integer, the power of ten of the coefficient
multiplier.
The numerical value of a finite number is given by::
(-1)**sign * coefficient * 10**exponent
Special values are named as following:
- Infinity: a value which is infinitely large. Could be positive or
negative.
- Quiet NaN ("qNaN"): represent undefined results (*Not a Number*).
Does not cause an Invalid operation condition. The sign in a NaN
has no meaning.
- Signaling NaN ("sNaN"): also *Not a Number*, but will cause an
Invalid operation condition if used in any operation.
Context
-------
The context is a set of parameters and rules that the user can select
and which govern the results of operations (for example, the precision
to be used).
The context gets that name because surrounds the Decimal numbers.
It's up to the implementation to work with one or several contexts,
but definitely the idea is not to get a context per Decimal number.
These definitions don't affect the internal storage of the Decimal
numbers, just the way that the arithmetic operations are performed.
The context is defined by the following parameters:
- Precision: The maximum number of significant digits that can result
from an arithmetic operation (integer > 0).
- Rounding: The name of the algorithm to be used when rounding is
necessary, one of "round-down", "round-half-up", "round-half-even",
"round-ceiling", "round-floor", "round-half-down", and "round-up".
See `Rounding Algorithms`_ below.
- Flags and trap-enablers: `Exceptional conditions`_ are grouped into
signals, controllable individually, each consisting of a flag
(boolean, set when the signal occurs) and a trap-enabler (a boolean
that controls behavior). The signals are: "clamped",
"division-by-zero", "inexact", "invalid-operation", "overflow",
"rounded", "subnormal" and "underflow".
Default Contexts
----------------
The specification defines two default contexts, which should be easily
selectable by the user.
Basic Default Context:
- flags: all set to 0
- trap-enablers: inexact, rounded, and subnormal are set to 0; all
others are set to 1
- precision: is set to 9
- rounding: is set to round-half-up
Extended Default Context:
- flags: all set to 0
- trap-enablers: all set to 0
- precision: is set to the designated single precision
- rounding: is set to round-half-even
Exceptional Conditions
----------------------
The table below lists the exceptional conditions that may arise during
the arithmetic operations, the corresponding signal, and the defined
result. For details, see the specification [2]_.
==================== ================= ===================================
Condition Signal Result
==================== ================= ===================================
Clamped clamped see spec [2]_
Conversion syntax invalid-operation [0,qNaN]
Division by zero division-by-zero [sign,inf]
Division impossible invalid-operation [0,qNaN]
Division undefined invalid-operation [0,qNaN]
Inexact inexact unchanged
Insufficient storage [0,qNaN]
Invalid context invalid-operation [0,qNaN]
Invalid operation invalid-operation [0,qNaN] (or [s,qNaN] or [s,qNaN,d]
when the cause is a signaling NaN)
Overflow overflow depends on the rounding mode
Rounded rounded unchanged
Subnormal subnormal unchanged
Underflow underflow see spec [2]_
==================== ================= ===================================
Rounding Algorithms
-------------------
``round-down``: The discarded digits are ignored; the result is
unchanged (round toward 0, truncate)::
1.123 --> 1.12
1.128 --> 1.12
1.125 --> 1.12
1.135 --> 1.13
``round-half-up``: If the discarded digits represent greater than or
equal to half (0.5) then the result should be incremented by 1;
otherwise the discarded digits are ignored::
1.123 --> 1.12
1.128 --> 1.13
1.125 --> 1.13
1.135 --> 1.14
``round-half-even``: If the discarded digits represent greater than
half (0.5) then the result coefficient is incremented by 1; if they
represent less than half, then the result is not adjusted; otherwise
the result is unaltered if its rightmost digit is even, or incremented
by 1 if its rightmost digit is odd (to make an even digit)::
1.123 --> 1.12
1.128 --> 1.13
1.125 --> 1.12
1.135 --> 1.14
``round-ceiling``: If all of the discarded digits are zero or if the
sign is negative the result is unchanged; otherwise, the result is
incremented by 1::
1.123 --> 1.13
1.128 --> 1.13
-1.123 --> -1.12
-1.128 --> -1.12
``round-floor``: If all of the discarded digits are zero or if the
sign is positive the result is unchanged; otherwise, the absolute
value of the result is incremented by 1::
1.123 --> 1.12
1.128 --> 1.12
-1.123 --> -1.13
-1.128 --> -1.13
``round-half-down``: If the discarded digits represent greater than
half (0.5) then the result is incremented by 1; otherwise the
discarded digits are ignored::
1.123 --> 1.12
1.128 --> 1.13
1.125 --> 1.12
1.135 --> 1.13
``round-up``: If all of the discarded digits are zero the result is
unchanged, otherwise the result is incremented by 1 (round away from
0)::
1.123 --> 1.13
1.128 --> 1.13
1.125 --> 1.13
1.135 --> 1.14
Rationale
=========
I must separate the requirements in two sections. The first is to
comply with the ANSI standard. All the requirements for this are
specified in the Mike Cowlishaw's work [2]_. He also provided a
**comprehensive** suite of test cases.
The second section of requirements (standard Python functions support,
usability, etc.) is detailed from here, where I'll include all the
decisions made and why, and all the subjects still being discussed.
Explicit construction
---------------------
The explicit construction does not get affected by the context (there
is no rounding, no limits by the precision, etc.), because the context
affects just operations' results.
**From int or long**: There's no loss and no need to specify any other
information::
Decimal(35)
Decimal(-124)
**From string**: Strings with floats in normal and engineering
notation will be supported. In this transformation there is no loss
of information, as the string is directly converted to Decimal (there
is not an intermediate conversion through float)::
Decimal("-12")
Decimal("23.2e-7")
**From float**: The initial discussion on this item was what should
happen when passing floating point to the constructor:
1. ``Decimal(1.1) == Decimal('1.1')``
2. ``Decimal(1.1) ==
Decimal('110000000000000008881784197001252...e-51')``
3. an exception is raised
Several people alleged that (1) is the better option here, because
it's what you expect when writing ``Decimal(1.1)``. And quoting John
Roth, it's easy to implement:
It's not at all difficult to find where the actual number ends and
where the fuzz begins. You can do it visually, and the algorithms
to do it are quite well known.
But If I *really* want my number to be
``Decimal('110000000000000008881784197001252...e-51')``, why can not
write ``Decimal(1.1)``? Why should I expect Decimal to be "rounding"
it? Remember that ``1.1`` *is* binary floating point, so I can
predict the result. It's not intuitive to a beginner, but that's the
way it is.
Anyway, Paul Moore shown that (1) can't be, because::
(1) says D(1.1) == D('1.1')
but 1.1 == 1.1000000000000001
so D(1.1) == D(1.1000000000000001)
together: D(1.1000000000000001) == D('1.1')
which is wrong, because if I write ``Decimal('1.1')`` it is exact, not
``D(1.1000000000000001)``. He also proposed to have an explicit
conversion to float. bokr says you need to put the precision in the
constructor and mwilson has the idea to::
d = Decimal (1.1, 1) # take float value to 1 decimal place
d = Decimal (1.1) # gets `places` from pre-set context
But Alex Martelli says that:
Constructing with some specified precision would be fine. Thus,
I think "construction from float with some default precision" runs
a substantial risk of tricking naive users.
So, I think that the best solution is to have a parameter that says in
which position after the decimal point you apply a round-half-up
rounding. If you do not specify this parameter, you get an exact
conversion. In this way::
Decimal(1.1, 2) == Decimal('1.1')
Decimal(1.1, 16) == Decimal('1.1000000000000001')
Decimal(1.1) == Decimal('110000000000000008881784197001252...e-51')
**From tuples**: Aahz suggested to construct from tuples: it's easier
to implement ``eval()``'s round trip and "someone who has numeric
values representing a Decimal does not need to convert them to a
string."
The structure will be a tuple of three elements: sign, number and
exponent. The sign is 1 or 0, the number is a tuple of decimal digits
and the exponent is a signed int or long::
Decimal((1, (3, 2, 2, 5), -2)) # for -32.25
**From Decimal**: No mystery here, just a copy.
**Syntax to all the cases**::
Decimal(value, [decimal_digits])
where ``value`` can be any of the data types just mentioned and
``decimal_digits`` is allowed only when value is float.
Implicit construction
---------------------
As the implicit construction is the consequence of an operation, it
will be affected by the context as is detailed in each point.
John Roth suggested that "The other type should be handled in the same
way the decimal() constructor would handle it". But Alex Martelli
thinks that
this total breach with Python tradition would be a terrible
mistake. 23+"43" is NOT handled in the same way as 23+int("45"),
and a VERY good thing that is too. It's a completely different
thing for a user to EXPLICITLY indicate they want construction
(conversion) and to just happen to sum two objects one of which by
mistake could be a string.
So, here I define the behaviour again for each data type.
**From int or long**: Aahz suggested the need of an explicit
conversion from int, but also thinks it's OK if the precision in the
current Context is not exceeded; in that case you raise ValueError.
Votes in comp.lang.python agreed with this.
**From string**: Everybody agrees to raise an exception here.
**From float**: Aahz is strongly opposed to interact with float,
suggesting an explicit conversion:
The problem is that Decimal is capable of greater precision,
accuracy, and range than float.
But in Python it's OK to do ``35 + 1.1``, so why can't I do
``Decimal(35) + 1.1``? We agree that when a naive user writes ``1.1``
doesn't know that he's being inexact, but that happens in the both
examples I just mentioned.
So, what should we do? I propose to allow the interaction with float,
making an exact conversion and raising ValueError if exceeds the
precision in the current context (this is maybe too tricky, because
for example with a precision of 9, ``Decimal(35) + 1.2`` is OK but
``Decimal(35) + 1.1`` raises an error).
**From Decimal**: There isn't any issue here.
Use of Context
--------------
In the last pre-PEP I said that "The Context must be omnipresent,
meaning that changes to it affects all the current and future Decimal
instances". I was wrong. In response, John Roth said:
The context should be selectable for the particular usage. That
is, it should be possible to have several different contexts in
play at one time in an application.
In comp.lang.python, Aahz explained that the idea is to have a
"context per thread". So, all the instances of a thread belongs to a
context, and you can change a context in thread A (and the behaviour
of the instances of that thread) without changing nothing in thread B.
Also, and again correcting me, he said:
(the) Context applies only to operations, not to Decimal
instances; changing the Context does not affect existing instances
if there are no operations on them.
Arguing about special cases when there's need to perform operations
with other rules that those of the current context, Tim Peters said
that the context will have the operations as methods. This way, the
user "can create whatever private context object(s) it needs, and
spell arithmetic as explicit method calls on its private context
object(s), so that the default thread context object is neither
consulted nor modified".
Python Usability
----------------
- Decimal should support the basic arithmetic (``+, -, *, /, //, **,
%, divmod``) and comparison (``==, !=, <, >, <=, >=, cmp``)
operators in the following cases (check `Implicit Construction`_ to
see what types could OtherType be, and what happens in each case):
- Decimal op Decimal
- Decimal op otherType
- otherType op Decimal
- Decimal op= Decimal
- Decimal op= otherType
- Decimal should support unary operators (``-, +, abs``).
- Decimal should support the built-in methods:
- min, max
- float, int, long
- str, repr
- hash
- copy, deepcopy
- bool (0 is false, otherwise true)
- Calling repr() should do round trip, meaning that::
m = Decimal(...)
m == eval(repr(m))
- Decimal should be immutable.
Reference Implementation
========================
To be included later:
- code
- test code
- documentation
References
==========
.. [1] ANSI standard X3.274-1996 (Programming Language REXX):
http://www.rexxla.org/Standards/ansi.html
.. [2] General Decimal Arithmetic specification (Cowlishaw):
http://www2.hursley.ibm.com/decimal/decarith.html (related
documents and links at http://www2.hursley.ibm.com/decimal/)
.. [3] ANSI/IEEE standard 854-1987 (Radix-Independent Floating-Point
Arithmetic):
http://www.cs.berkeley.edu/~ejr/projects/754/private/drafts/854-1987/dir.html
(unofficial text; official copies can be ordered from
http://standards.ieee.org/catalog/ordering.html)
.. [4] Tim Peter's FixedPoint at SourceForge:
http://fixedpoint.sourceforge.net/
.. [5] IEEE 754 revision:
http://grouper.ieee.org/groups/754/revision.html
.. [6] IEEE 754 references:
http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: