Update PEP 485 from Chris Barker's edits

Chris Angelico 2015-02-05 10:17:42 +11:00
parent f19d84a049
commit 1641c51ddd
1 changed files with 397 additions and 87 deletions

@ -15,8 +15,10 @@ Abstract
========
This PEP proposes the addition of a function to the standard library
that determines whether one value is approximately equal or "close"
to another value.
that determines whether one value is approximately equal or "close" to
another value. It is also proposed that an assertion be added to the
``unittest.TestCase`` class to provide easy access for those using
unittest for testing.
Rationale
@ -37,27 +39,35 @@ the standard library.
Existing Implementations
------------------------
The standard library includes the
``unittest.TestCase.assertAlmostEqual`` method, but it:
The standard library includes the ``unittest.TestCase.assertAlmostEqual``
method, but it:
* Is buried in the unittest.TestCase class
* Is an assertion, so you can't use it as a general test (easily)
* Is an assertion, so you can't use it as a general test at the command
line, etc. (easily)
* Uses number of decimal digits or an absolute delta, which are
particular use cases that don't provide a general relative error.
* Is an absolute difference test. Often the measure of difference
requires, particularly for floating point numbers, a relative error,
i.e. "Are these two values within x% of each other?", rather than an
absolute error, particularly when the magnitude of the values is
unknown a priori.
The numpy package has the ``allclose()`` and ``isclose()`` functions.
The numpy package has the ``allclose()`` and ``isclose()`` functions,
but they are only available with numpy.
The statistics package tests include an implementation, used for its
unit tests.
One can also find discussion and sample implementations on Stack
Overflow, and other help sites.
Overflow and other help sites.
These existing implementations indicate that this is a common need,
and not trivial to write oneself, making it a candidate for the
standard library.
Many other non-python systems provide such a test, including the Boost C++
library and the APL language (reference?).
These existing implementations indicate that this is a common need and
not trivial to write oneself, making it a candidate for the standard
library.
Proposed Implementation
@ -68,21 +78,22 @@ python-ideas list [1]_.
The new function will have the following signature::
is_close_to(actual, expected, tol=1e-8, abs_tol=0.0)
is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0)
``actual``: is the value that has been computed, measured, etc.
``a`` and ``b``: are the two values to be tested for relative closeness
``expected``: is the "known" value.
``rel_tolerance``: is the relative tolerance -- it is the amount of
error allowed, relative to the magnitude of a and b. For example, to set
a tolerance of 5%, pass rel_tolerance=0.05. The default tolerance is 1e-9,
which assures that the two values are the same within about 9 decimal
digits.
``tol``: is the relative tolerance -- it is the amount of error
allowed, relative to the magnitude of the expected value.
``abs_tol``: is an minimum absolute tolerance level -- useful for
``abs_tolerance``: is a minimum absolute tolerance level -- useful for
comparisons near zero.
Modulo error checking, etc, the function will return the result of::
abs(expected-actual) <= max(tol*expected, abs_tol)
abs(a-b) <= max( rel_tolerance * min(abs(a), abs(b)), abs_tolerance )
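A minimal sketch of the expression above as a complete function. The
argument validation and the early return for equal values are assumptions
added for illustration; the sample implementation may handle these cases
differently.

```python
def is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    """Illustrative sketch of the proposed test, not the reference code."""
    # Assumed error checking: negative tolerances are rejected.
    if rel_tolerance < 0.0 or abs_tolerance < 0.0:
        raise ValueError("tolerances must be non-negative")
    # Assumed shortcut: identical values (including equal infinities)
    # are always close.
    if a == b:
        return True
    # Scale the relative tolerance by the smaller magnitude, then use
    # abs_tolerance as a floor on the allowed difference.
    allowed = max(rel_tolerance * min(abs(a), abs(b)), abs_tolerance)
    return abs(a - b) <= allowed


nearly_equal = is_close(1.0, 1.0 + 1e-10)
clearly_different = is_close(100.0, 100.1)
```

With the defaults, ``nearly_equal`` is true and ``clearly_different`` is
false.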
Handling of non-finite numbers
@ -116,32 +127,82 @@ accommodate these types:
Behavior near zero
------------------
Relative comparison is problematic if either value is zero. In this
case, the difference is relative to zero, and thus will always be
smaller than the prescribed tolerance. To handle this case, an
optional parameter, ``abs_tol`` (default 0.0) can be used to set a
minimum tolerance to be used in the case of very small relative
tolerance. That is, the values will be considered close if::
Relative comparison is problematic if either value is zero. By
definition, no value is small relative to zero. And computationally,
if either value is zero, the difference is the absolute value of the
other value, and the computed absolute tolerance will be ``rel_tolerance``
times that value. ``rel_tolerance`` is always less than one, so the
difference will never be less than the tolerance.
abs(a-b) <= abs(tol*expected) or abs(a-b) <= abs_tol
However, while mathematically correct, there are many use cases where
a user will need to know if a computed value is "close" to zero. This
calls for an absolute tolerance test. If the user needs to call this
function inside a loop or comprehension, where some, but not all, of
the expected values may be zero, it is important that both a relative
tolerance and absolute tolerance can be tested for with a single
function with a single set of parameters.
If the user sets the rel_tol parameter to 0.0, then only the absolute
tolerance will effect the result, so this function provides an
absolute tolerance check as well.
There is a similar issue if the two values to be compared straddle zero:
if a is approximately equal to -b, then a and b will never be computed
as "close".
To handle this case, an optional parameter, ``abs_tolerance`` can be
used to set a minimum tolerance used in the case of very small or zero
computed absolute tolerance. That is, the values will always be
considered close if the difference between them is less than the
abs_tolerance.
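The behavior near zero can be illustrated with a short sketch. The
``is_close`` helper below is a bare restatement of the proposed expression
written for this example, and the sample values are arbitrary.

```python
def is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Bare restatement of the proposed expression, for illustration only.
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)), abs_tolerance)


# A purely relative comparison against zero fails for any non-zero value.
tiny_vs_zero = is_close(1e-300, 0.0)
# Values that straddle zero fail as well.
straddle = is_close(1e-12, -1e-12)

# An absolute tolerance lets both comparisons pass.
tiny_vs_zero_abs = is_close(1e-300, 0.0, abs_tolerance=1e-9)
straddle_abs = is_close(1e-12, -1e-12, abs_tolerance=1e-9)

# One call with one set of parameters covers a mix of zero and
# non-zero expected values.
expected = [0.0, 1.0, 1000.0]
computed = [1e-12, 1.0 + 1e-12, 1000.0 + 1e-8]
all_close = all(
    is_close(c, e, abs_tolerance=1e-9) for c, e in zip(computed, expected)
)
```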
The default absolute tolerance value is set to zero because there is
no value that is appropriate for the general case. It is impossible to
know an appropriate value without knowing the likely values expected
for a given use case. If all the values tested are on the order of one,
then a value of about 1e-8 might be appropriate, but that would be far
too large if expected values are on the order of 1e-12 or smaller.
Any non-zero default might result in users' tests passing totally
inappropriately. If, on the other hand, a test against zero fails the
first time with defaults, a user will be prompted to select an
appropriate value for the problem at hand in order to get the test to
pass.
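As a sketch of the risk, consider two small values that differ by a factor
of five. The ``is_close`` helper restates the proposed expression, and the
non-zero floor of 1e-8 is a hypothetical default chosen only for this
example.

```python
def is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Bare restatement of the proposed expression, for illustration only.
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)), abs_tolerance)


a, b = 1e-12, 5e-12  # differ by a factor of five

# A hypothetical non-zero default floor hides the discrepancy.
passes_with_floor = is_close(a, b, abs_tolerance=1e-8)
# The proposed default of 0.0 reports the values as not close.
passes_with_default = is_close(a, b)
```

Here ``passes_with_floor`` is true even though the values are far apart in
relative terms, while ``passes_with_default`` is false.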
NOTE: the author of this PEP has resolved to go back over many of
his tests that use the numpy ``allclose()`` function, which provides
a default absolute tolerance, and make sure that the default value is
appropriate.
If the user sets the rel_tolerance parameter to 0.0, then only the
absolute tolerance will affect the result. While not the goal of the
function, it does allow it to be used as a purely absolute tolerance
check as well.
unittest assertion
-------------------
[need text here]
implementation
--------------
A sample implementation is available (as of Jan 22, 2015) on GitHub:
https://github.com/PythonCHB/close_pep/blob/master/is_close_to.py
https://github.com/PythonCHB/close_pep/blob/master
This implementation has a flag that lets the user select which
relative tolerance test to apply -- this PEP does not suggest that
that be retained, but rather that the strong test be selected.
Relative Difference
===================
There are essentially two ways to think about how close two numbers
are to each-other: absolute difference: simply ``abs(a-b)``, and
relative difference: ``abs(a-b)/scale_factor`` [2]_. The absolute
difference is trivial enough that this proposal focuses on the
relative difference.
are to each other:
Absolute difference: simply ``abs(a-b)``
Relative difference: ``abs(a-b)/scale_factor`` [2]_.
The absolute difference is trivial enough that this proposal focuses
on the relative difference.
Usually, the scale factor is some function of the values under
consideration, for instance:
@ -152,7 +213,106 @@ consideration, for instance:
3) The minimum absolute value of the two.
4) The arithmetic mean of the two
4) The absolute value of the arithmetic mean of the two
These lead to the following possibilities for determining if two
values, a and b, are close to each other.
1) ``abs(a-b) <= tol*abs(a)``
2) ``abs(a-b) <= tol * max( abs(a), abs(b) )``
3) ``abs(a-b) <= tol * min( abs(a), abs(b) )``
4) ``abs(a-b) <= tol * abs(a + b)/2``
NOTE: (2) and (3) can also be written as:
2) ``(abs(a-b) <= tol*abs(a)) or (abs(a-b) <= tol*abs(b))``
3) ``(abs(a-b) <= tol*abs(a)) and (abs(a-b) <= tol*abs(b))``
(Boost refers to these as the "weak" and "strong" formulations [3]_)
These can be a tiny bit more computationally efficient, and thus are
used in the example code.
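The four formulations, and the ``or``/``and`` forms of the weak and strong
tests, can be sketched as follows. The function names are made up for this
example and do not come from the sample implementation.

```python
def close_first(a, b, tol):
    # (1) scale by the first value only (asymmetric)
    return abs(a - b) <= tol * abs(a)


def close_weak(a, b, tol):
    # (2) scale by the larger magnitude (Boost "weak")
    return abs(a - b) <= tol * max(abs(a), abs(b))


def close_strong(a, b, tol):
    # (3) scale by the smaller magnitude (Boost "strong")
    return abs(a - b) <= tol * min(abs(a), abs(b))


def close_mean(a, b, tol):
    # (4) scale by the absolute value of the arithmetic mean
    return abs(a - b) <= tol * abs(a + b) / 2


def close_weak_or(a, b, tol):
    # (2) rewritten with "or"
    return abs(a - b) <= tol * abs(a) or abs(a - b) <= tol * abs(b)


def close_strong_and(a, b, tol):
    # (3) rewritten with "and"
    return abs(a - b) <= tol * abs(a) and abs(a - b) <= tol * abs(b)


pairs = [(10.0, 9.0), (9.0, 10.0), (1.0, 1.05), (-3.0, -3.2), (0.0, 1e-9)]
forms_agree = all(
    close_weak(a, b, 0.1) == close_weak_or(a, b, 0.1)
    and close_strong(a, b, 0.1) == close_strong_and(a, b, 0.1)
    for a, b in pairs
)
```

For ``a = 10``, ``b = 9`` and ``tol = 0.1``, the weak test passes and the
strong test fails, which is the largest disagreement among the options.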
Each of these formulations can lead to slightly different results.
However, if the tolerance value is small, the differences are quite
small. In fact, often less than available floating point precision.
How much difference does it make?
---------------------------------
When selecting a method to determine closeness, one might want to know
how much of a difference it could make to use one test or the other
-- i.e. how many values are there (or what range of values) that will
pass one test, but not the other.
The largest difference is between options (2) and (3) where the
allowable absolute difference is scaled by either the larger or
smaller of the values.
Define ``delta`` to be the difference between the allowable absolute
tolerance defined by the larger value and that defined by the smaller
value. That is, the amount that the two input values need to be
different in order to get a different result from the two tests.
``tol`` is the relative tolerance value.
Assume that ``a`` is the larger value and that both ``a`` and ``b``
are positive, to make the analysis a bit easier. ``delta`` is
therefore::
delta = tol * (a-b)
or::
delta / tol = (a-b)
The largest absolute difference that would pass the test: ``(a-b)``,
equals the tolerance times the larger value::
(a-b) = tol * a
Substituting into the expression for delta::
delta / tol = tol * a
so::
delta = tol**2 * a
For example, for ``a = 10``, ``b = 9``, ``tol = 0.1`` (10%):
maximum tolerance ``tol * a == 0.1 * 10 == 1.0``
minimum tolerance ``tol * b == 0.1 * 9.0 == 0.9``
delta = ``1.0 - 0.9 = 0.1`` or ``tol**2 * a = 0.1**2 * 10 = 0.1``
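The arithmetic of this example can be checked with a few lines. This is only
a numerical check of the derivation; note that ``delta = tol**2 * a`` holds
here because ``a - b`` equals ``tol * a`` exactly.

```python
a, b, tol = 10.0, 9.0, 0.1

max_tolerance = tol * a   # tolerance scaled by the larger value
min_tolerance = tol * b   # tolerance scaled by the smaller value

# delta from its definition, and from the closed form derived above.
delta = max_tolerance - min_tolerance
delta_from_formula = tol**2 * a

# Both routes agree to within floating point rounding.
routes_agree = abs(delta - delta_from_formula) < 1e-12
```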
The absolute difference between the maximum and minimum tolerance
tests in this case could be substantial. However, the primary use
case for the proposed function is testing the results of computations.
In that case a relative tolerance is likely to be selected of much
smaller magnitude.
For example, a relative tolerance of ``1e-8`` is about half the
precision available in a python float. In that case, the difference
between the two tests is ``1e-8**2 * a`` or ``1e-16 * a``, which is
close to the limit of precision of a python float. If the relative
tolerance is set to the proposed default of 1e-9 (or smaller), the
difference between the two tests will be lost to the limits of
precision of floating point. That is, each of the four methods will
yield exactly the same results for all values of a and b.
In addition, in common use, tolerances are defined to 1 significant
figure -- that is, 1e-8 is specifying about 8 decimal digits of
accuracy. So the difference between the various possible tests is well
below the precision to which the tolerance is specified.
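To put numbers on this, the relative size of the gap between the two tests,
``delta / a == tol**2``, can be compared with the spacing of floats near 1.0.
This is a rough illustration using ``sys.float_info.epsilon``, not a proof
that the tests never disagree.

```python
import sys

# Spacing between 1.0 and the next representable float.
eps = sys.float_info.epsilon

# Relative gap between the "weak" and "strong" tests: delta / a == tol**2.
gap_at_1e8 = (1e-8) ** 2
gap_at_1e9 = (1e-9) ** 2

gap_below_eps_at_1e8 = gap_at_1e8 < eps
gap_below_eps_at_1e9 = gap_at_1e9 < eps
```

At a tolerance of 1e-8 the gap is just under the float spacing; at 1e-9 it
is roughly two orders of magnitude smaller.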
Symmetry
@ -161,46 +321,113 @@ Symmetry
A relative comparison can be either symmetric or non-symmetric. For a
symmetric algorithm:
``is_close_to(a,b)`` is always equal to ``is_close_to(b,a)``
``is_close(a,b)`` is always the same as ``is_close(b,a)``
This is an appealing consistency -- it mirrors the symmetry of
equality, and is less likely to confuse people. However, often the
question at hand is:
If a relative closeness test uses only one of the values (such as (1)
above), then the result is asymmetric, i.e. ``is_close(a,b)`` is not
necessarily the same as ``is_close(b,a)``.
"Is this computed or measured value within some tolerance of a known
value?"
Which approach is most appropriate depends on what question is being
asked. If the question is: "are these two numbers close to each
other?", there is no obvious ordering, and a symmetric test is most
appropriate.
In this case, the user wants the relative tolerance to be specifically
scaled against the known value. It is also easier for the user to
reason about.
However, if the question is: "Is the computed value within x% of this
known value?", then it is appropriate to scale the tolerance to the
known value, and an asymmetric test is most appropriate.
This proposal uses this asymmetric test to allow this specific
definition of relative tolerance.
From the previous section, it is clear that either approach would
yield the same or similar results in the common use cases. In that
case, the goal of this proposal is to provide a function that is least
likely to produce surprising results.
Example:
The symmetric approach provides an appealing consistency -- it
mirrors the symmetry of equality, and is less likely to confuse
people. A symmetric test also relieves the user of the need to think
about the order in which to set the arguments. It was also pointed
out that there may be some cases where the order of evaluation may not
be well defined, for instance in the case of comparing a set of values
all against each other.
For the question: "Is the value of a within 10% of b?", Using b to
scale the percent error clearly defines the result.
There may be cases when a user does need to know that a value is
within a particular range of a known value. In that case, it is easy
enough to simply write the test directly::
However, as this approach is not symmetric, a may be within 10% of b,
but b is not within 10% of a. Consider the case::
if a-b <= tol*a:
a = 9.0
b = 10.0
(assuming a > b in this case). There is little need to provide a
function for this particular case.
The difference between a and b is 1.0. 10% of a is 0.9, so b is not
within 10% of a. But 10% of b is 1.0, so a is within 10% of b.
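The asymmetry in this example can be sketched with a test that scales the
tolerance by one designated value. The helper name ``within_fraction_of`` is
hypothetical and used only for illustration.

```python
def within_fraction_of(value, reference, tol):
    # Asymmetric test: the tolerance is scaled by ``reference`` only.
    return abs(value - reference) <= tol * abs(reference)


a, b = 9.0, 10.0

a_within_10pct_of_b = within_fraction_of(a, b, 0.1)  # scaled by b
b_within_10pct_of_a = within_fraction_of(b, a, 0.1)  # scaled by a
```

The first comparison passes and the second fails, so swapping the arguments
changes the answer.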
This proposal uses a symmetric test.
Casual users might reasonably expect that if a is close to b, then b
would also be close to a. However, in the common cases, the tolerance
is quite small and often poorly defined, i.e. 1e-8, defined to only
one significant figure, so the result will be very similar regardless
of the order of the values. And if the user does care about the
precise result, s/he can take care to always pass in the two
parameters in sorted order.
Which symmetric test?
---------------------
This proposed implementation uses asymmetric criteria with the scaling
value clearly identified.
There are three symmetric tests considered:
The case that uses the arithmetic mean of the two values requires that
the values either be added together before dividing by 2, which could
result in extra overflow to inf for very large numbers, or require
each value to be divided by two before being added together, which
could result in underflow to zero for very small numbers. This effect
would only occur at the very limit of float values, but it was decided
there was no benefit to the method worth reducing the range of
functionality.
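Both failure modes of the arithmetic mean can be reproduced at the limits of
the float range. This sketch uses ``sys.float_info.max`` and the smallest
positive subnormal float as sample values.

```python
import math
import sys

big = sys.float_info.max
tiny = 5e-324  # smallest positive (subnormal) float

# Adding first overflows for very large values ...
mean_add_first_big = (big + big) / 2
# ... while halving first gives the right answer.
mean_halve_first_big = big / 2 + big / 2

# Halving first loses very small values ...
mean_halve_first_tiny = tiny / 2 + tiny / 2
# ... while adding first gives the right answer.
mean_add_first_tiny = (tiny + tiny) / 2

overflowed = math.isinf(mean_add_first_big)
underflowed = mean_halve_first_tiny == 0.0
```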
This leaves the Boost "weak" test (2), which uses the larger value to
scale the tolerance, or the Boost "strong" test (3), which uses the
smaller of the values to scale the tolerance. For small tolerance,
they yield the same result, but this proposal uses the Boost "strong"
test case: it is symmetric and provides a slightly stricter criterion
for tolerance.
Defaults
========
Default values are required for the relative and absolute tolerance.
Relative Tolerance Default
--------------------------
The relative tolerance required for two values to be considered
"close" is entirely use-case dependent. Nevertheless, the relative
tolerance needs to be less than 1.0, and greater than 1e-16
(approximate precision of a python float). The value of 1e-9 was
selected because it is the largest relative tolerance for which the
various possible methods will yield the same result, and it is also
about half of the precision available to a python float. In the
general case, a good numerical algorithm is not expected to lose more
than about half of available digits of accuracy, and if a much larger
tolerance is acceptable, the user should be considering the proper
value in that case. Thus 1e-9 is expected to "just work" for many
cases.
Absolute tolerance default
--------------------------
The absolute tolerance value will be used primarily for comparing to
zero. The absolute tolerance required to determine if a value is
"close" to zero is entirely use-case dependent. There are also
essentially no bounds to the useful range -- expected values would
conceivably be anywhere within the limits of a python float. Thus a
default of 0.0 is selected.
If, for a given use case, a user needs to compare to zero, the test
will be guaranteed to fail the first time, and the user can select an
appropriate value.
It was suggested that comparing to zero is, in fact, a common use case
(evidence suggests that the numpy functions are often used with zero).
In this case, it would be desirable to have a "useful" default. Values
around 1e-8 were suggested, being about half of floating point
precision for values around 1.
However, to quote The Zen: "In the face of ambiguity, refuse the
temptation to guess." Guessing that users will most often be concerned
with values close to 1.0 would lead to spurious passing tests when used
with smaller values -- this is potentially more damaging than
requiring the user to thoughtfully select an appropriate value.
Expected Uses
@ -208,10 +435,23 @@ Expected Uses
The primary expected use case is various forms of testing -- "are the
results computed near what I expect as a result?" This sort of test
may or may not be part of a formal unit testing suite.
may or may not be part of a formal unit testing suite. Such testing
could be used one-off at the command line, in an IPython notebook,
as part of doctests, or as simple asserts in an ``if __name__ == "__main__"``
block.
The function might be used also to determine if a measured value is
within an expected value.
The proposed ``unittest.TestCase`` assertion would of course be used in
unit testing.
It would also be an appropriate function to use for the termination
criteria for a simple iterative solution to an implicit function::
    guess = something
    while True:
        new_guess = implicit_function(guess, *args)
        if is_close(new_guess, guess):
            break
        guess = new_guess
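A runnable variant of this loop, solving ``x = cos(x)`` by fixed-point
iteration. The ``is_close`` helper restates the proposed expression, and the
iteration cap and the choice of ``math.cos`` are assumptions made for this
example.

```python
import math


def is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Bare restatement of the proposed expression, for illustration only.
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)), abs_tolerance)


def solve_fixed_point(func, guess, max_iterations=1000):
    """Iterate guess = func(guess) until successive guesses are close."""
    for _ in range(max_iterations):
        new_guess = func(guess)
        if is_close(new_guess, guess):
            return new_guess
        guess = new_guess
    # Assumed safeguard: give up rather than loop forever.
    raise RuntimeError("iteration did not converge")


root = solve_fixed_point(math.cos, 1.0)
residual = abs(root - math.cos(root))
```

The cap on iterations guards against inputs for which successive guesses
never become close.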
Inappropriate uses
@ -238,8 +478,8 @@ Tests that values are approximately (or not approximately) equal by
computing the difference, rounding to the given number of decimal
places (default 7), and comparing to zero.
This method was not selected for this proposal, as the use of decimal
digits is a specific, not generally useful or flexible test.
This method is purely an absolute tolerance test, and does not address
the need for a relative tolerance test.
numpy ``isclose()``
--------------------
@ -262,13 +502,16 @@ all_close, for similar use cases as this proposal:
In this approach, the absolute and relative tolerance are added
together, rather than the ``or`` method used in this proposal. This is
computationally more simple, and if relative tolerance is larger than
the absolute tolerance, then the addition will have no effect. But if
the absolute and relative tolerances are of similar magnitude, then
the absolute tolerance, then the addition will have no effect. However,
if the absolute and relative tolerances are of similar magnitude, then
the allowed difference will be about twice as large as expected.
Also, if the value passed in are small compared to the absolute
tolerance, then the relative tolerance will be completely swamped,
perhaps unexpectedly.
This makes the function harder to understand, with no computational
advantage in this context.
Even more critically, if the values passed in are small compared to
the absolute tolerance, then the relative tolerance will be
completely swamped, perhaps unexpectedly.
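The two effects can be sketched in plain Python. ``numpy_style_close``
restates the additive numpy formula with numpy's documented defaults
(``rtol=1e-5``, ``atol=1e-8``) and does not call numpy itself; ``is_close``
restates the expression proposed here.

```python
def numpy_style_close(a, b, rtol=1e-5, atol=1e-8):
    # Additive form: the two tolerances are summed.
    return abs(a - b) <= atol + rtol * abs(b)


def is_close(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Proposed form: the larger of the two tolerances is used.
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)), abs_tolerance)


# Small values: the absolute term swamps the relative one, so values
# that differ by a factor of two are reported as close.
swamped = numpy_style_close(1e-10, 2e-10)
not_swamped = is_close(1e-10, 2e-10)

# Similar magnitudes: the summed tolerance is about twice as large.
a, b = 1.0 + 1.5e-8, 1.0
additive = numpy_style_close(a, b, rtol=1e-8, atol=1e-8)
maximum = is_close(a, b, rel_tolerance=1e-8, abs_tolerance=1e-8)
```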
This is why, in this proposal, the absolute tolerance defaults to zero
-- the user will be required to choose a value appropriate for the
@ -279,25 +522,92 @@ Boost floating-point comparison
-------------------------------
The Boost project ( [3]_ ) provides a floating point comparison
function. Is is a symetric approach, with both "weak" (larger of the
function. It is a symmetric approach, with both "weak" (larger of the
two relative errors) and "strong" (smaller of the two relative errors)
options.
options. This proposal uses the Boost "strong" approach. There is no
need to complicate the API by providing the option to select different
methods when the results will be similar in most cases, and the user
is unlikely to know which to select in any case.
Alternate Proposals
-------------------
A Recipe
'''''''''
The primary alternate proposal was to not provide a standard library
function at all, but rather, provide a recipe for users to refer to.
This would have the advantage that the recipe could provide and
explain the various options, and let the user select that which is
most appropriate. However, that would require anyone needing such a
test to, at the very least, copy the function into their code base,
and select the comparison method to use.
In addition, adding the function to the standard library allows it to
be used in the ``unittest.TestCase.assertIsClose()`` method, providing
a substantial convenience to those using unittest.
``zero_tol``
''''''''''''
One possibility was to provide a zero tolerance parameter, rather than
the absolute tolerance parameter. This would be an absolute tolerance
that would only be applied in the case of one of the arguments being
exactly zero. This would have the advantage of retaining the full
relative tolerance behavior for all non-zero values, while allowing
tests against zero to work. However, it would also result in the
potentially surprising result that a small value could be "close" to
zero, but not "close" to an even smaller value. e.g., 1e-10 is "close"
to zero, but not "close" to 1e-11.
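A sketch of how such a parameter might behave, showing the surprising result
described above. The function ``is_close_zero_tol`` is hypothetical and was
written only for this example; it is not part of the proposal.

```python
def is_close_zero_tol(a, b, rel_tolerance=1e-9, zero_tol=0.0):
    # Hypothetical variant: zero_tol applies only when one argument
    # is exactly zero; otherwise the test is purely relative.
    if a == 0.0 or b == 0.0:
        return abs(a - b) <= zero_tol
    return abs(a - b) <= rel_tolerance * min(abs(a), abs(b))


close_to_zero = is_close_zero_tol(1e-10, 0.0, zero_tol=1e-9)
close_to_smaller = is_close_zero_tol(1e-10, 1e-11, zero_tol=1e-9)
```

Here ``close_to_zero`` is true while ``close_to_smaller`` is false, even
though 1e-11 lies between 1e-10 and zero.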
No absolute tolerance
'''''''''''''''''''''
Given the issues with comparing to zero, another possibility would
have been to only provide a relative tolerance, and let every
comparison to zero fail. In this case, the user would need to do a
simple absolute test: ``abs(val) < zero_tol`` in the case where the
comparison involved zero.
However, this would not allow the same call to be used for a sequence
of values, such as in a loop or comprehension, or in the
``TestCase.assertIsClose()`` method, making the function far less
useful. It is noted that the default abs_tolerance=0.0 achieves the
same effect if the default is not overridden.
Other tests
''''''''''''
The other tests considered are all discussed in the Relative Error
section above.
It was decided that a method that clearly defined which value was used
to scale the relative error would be more appropriate for the standard
library.
References
==========
.. [1] Python-ideas list discussion thread
(https://mail.python.org/pipermail/python-ideas/2015-January/030947.html)
.. [1] Python-ideas list discussion threads
.. [2] Wikipedaia page on relative difference
(http://en.wikipedia.org/wiki/Relative_change_and_difference)
https://mail.python.org/pipermail/python-ideas/2015-January/030947.html
https://mail.python.org/pipermail/python-ideas/2015-January/031124.html
https://mail.python.org/pipermail/python-ideas/2015-January/031313.html
.. [2] Wikipedia page on relative difference
http://en.wikipedia.org/wiki/Relative_change_and_difference
.. [3] Boost project floating-point comparison algorithms
(http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/floating_point_comparison.html)
http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/floating_point_comparison.html
.. Bruce Dawson's discussion of floating point.
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Copyright