First public draft of new PEP 485 for an is_close_to() function

2015-01-22 12:15:51 +11:00 · 2015-01-22 12:15:51 +11:00 · 6c5d998ce0
parent 837ab51351
commit 6c5d998ce0
1 changed files with 311 additions and 0 deletions
--- a/pep-0485.txt
+++ b/pep-0485.txt
@ -0,0 +1,311 @@
+PEP: 485
+Title: A Function for testing approximate equality
+Version: $Revision$
+Last-Modified: $Date$
+Author: Christopher Barker <Chris.Barker@noaa.gov>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 20-Jan-2015
+Python-Version: 3.5
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes the addition of a function to the standard library
+that determines whether one value is approximately equal or "close"
+to another value. 
+
+
+Rationale
+=========
+
+Floating point values contain limited precision, which results in
+their being unable to exactly represent some values, and for error to
+accumulate with repeated computation.  As a result, it is common
+advice to only use an equality comparison in very specific situations.
+Often a inequality comparison fits the bill, but there are times
+(often in testing) where the programmer wants to determine whether a
+computed value is "close" to an expected value, without requiring them
+to be exactly equal. This is common enough, particularly in testing,
+and not always obvious how to do it, so it would be useful addition to
+the standard library.
+
+
+Existing Implementations
+------------------------
+
+The standard library includes the
+``unittest.TestCase.assertAlmostEqual`` method, but it:
+
+* Is buried in the unittest.TestCase class
+
+* Is an assertion, so you can't use it as a general test (easily)
+
+* Uses number of decimal digits or an absolute delta, which are
+  particular use cases that don't provide a general relative error.
+
+The numpy package has the ``allclose()`` and ``isclose()`` functions.
+
+The statistics package tests include an implementation, used for its
+unit tests.
+
+One can also find discussion and sample implementations on Stack
+Overflow, and other help sites.
+
+These existing implementations indicate that this is a common need,
+and not trivial to write oneself, making it a candidate for the
+standard library.
+
+
+Proposed Implementation
+=======================
+
+NOTE: this PEP is the result of an extended discussion on the
+python-ideas list [1]_.
+
+The new function will have the following signature::
+
+  is_close_to(actual, expected, tol=1e-8, abs_tol=0.0)
+
+``actual``: is the value that has been computed, measured, etc.
+
+``expected``: is the "known" value.
+
+``tol``: is the relative tolerance -- it is the amount of error
+allowed, relative to the magnitude of the expected value.
+
+``abs_tol``: is an minimum absolute tolerance level -- useful for
+comparisons near zero.
+
+Modulo error checking, etc, the function will return the result of::
+
+    abs(expected-actual) <= max(tol*actual, abs_tol)
+
+
+Handling of non-finite numbers
+------------------------------
+
+The IEEE 754 special values of NaN, inf, and -inf will be handled
+according to IEEE rules. Specifically, NaN is not considered close to
+any other value, including NaN. inf and -inf are only considered close
+to themselves.
+
+
+Non-float types
+---------------
+
+The primary use-case is expected to be floating point numbers.
+However, users may want to compare other numeric types similarly. In
+theory, it should work for any type that supports ``abs()``,
+comparisons, and subtraction.  The code will be written and tested to
+accommodate these types:
+
+ * ``Decimal``
+
+ * ``int``
+
+ * ``Fraction``
+ 
+ * ``complex``: for complex, ``abs(z)`` will be used for scaling and
+   comparison.
+
+
+Behavior near zero
+------------------
+
+Relative comparison is problematic if either value is zero. In this
+case, the difference is relative to zero, and thus will always be
+smaller than the prescribed tolerance. To handle this case, an
+optional parameter, ``abs_tol`` (default 0.0) can be used to set a
+minimum tolerance to be used in the case of very small relative
+tolerance. That is, the values will be considered close if::
+
+    abs(a-b) <= abs(tol*actual) or abs(a-b) <= abs_tol
+
+If the user sets the rel_tol parameter to 0.0, then only the absolute
+tolerance will effect the result, so this function provides an
+absolute tolerance check as well.
+
+
+Relative Difference
+===================
+
+There are essentially two ways to think about how close two numbers
+are to each-other: absolute difference: simple ``abs(a-b)``, and
+relative difference: ``abs(a-b)/scale_factor`` [2]_. The absolute
+difference is trivial enough that this proposal focuses on the
+relative difference.
+
+Usually, the scale factor is some function of the values under
+consideration, for instance: 
+
+ 1) The absolute value of one of the input values
+
+ 2) The maximum absolute value of the two
+
+ 3) The minimum absolute value of the two.
+
+ 4) The arithmetic mean of the two
+
+
+Symmetry
+--------
+
+A relative comparison can be either symmetric or non-symmetric. For a
+symmetric algorithm:
+
+``is_close_to(a,b)`` is always equal to ``is_close_to(b,a)``
+
+This is an appealing consistency -- it mirrors the symmetry of
+equality, and is less likely to confuse people. However, often the
+question at hand is:
+
+"Is this computed or measured value within some tolerance of a known
+value?"
+
+In this case, the user wants the relative tolerance to be specifically
+scaled against the known value. It is also easier for the user to
+reason about.
+
+This proposal uses this asymmetric test to allow this specific
+definition of relative tolerance.
+
+Example:
+
+For the question: "Is the value of a within x% of b?", Using b to
+scale the percent error clearly defines the result.
+
+However, as this approach is not symmetric, a may be within 10% of b,
+but b is not within x% of a. Consider the case::
+
+  a =  9.0
+  b = 10.0
+
+The difference between a and b is 1.0. 10% of a is 0.9, so b is not
+within 10% of a. But 10% of b is 1.0, so a is within 10% of b. 
+
+Casual users might reasonably expect that if a is close to b, then b
+would also be close to a. However, in the common cases, the tolerance
+is quite small and often poorly defined, i.e. 1e-8, defined to only
+one significant figure, so the result will be very similar regardless
+of the order of the values. And if the user does care about the
+precise result, s/he can take care to always pass in the two
+parameters in sorted order.
+
+This proposed implementation uses asymmetric criteria with the scaling
+value clearly identified.
+
+
+Expected Uses
+=============
+
+The primary expected use case is various forms of testing -- "are the
+results computed near what I expect as a result?" This sort of test
+may or may not be part of a formal unit testing suite.
+
+The function might be used also to determine if a measured value is
+within an expected value.
+
+
+Inappropriate uses
+------------------
+
+One use case for floating point comparison is testing the accuracy of
+a numerical algorithm. However, in this case, the numerical analyst
+ideally would be doing careful error propagation analysis, and should
+understand exactly what to test for. It is also likely that ULP (Unit
+in the Last Place) comparison may be called for. While this function
+may prove useful in such situations, It is not intended to be used in
+that way.
+
+
+Other Approaches
+================
+
+``unittest.TestCase.assertAlmostEqual``
+---------------------------------------
+
+(https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual)
+
+Tests that values are approximately (or not approximately) equal by
+computing the difference, rounding to the given number of decimal
+places (default 7), and comparing to zero.
+
+This method was not selected for this proposal, as the use of decimal
+digits is a specific, not generally useful or flexible test.
+
+numpy ``is_close()``
+--------------------
+
+http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html
+
+The numpy package provides the vectorized functions is_close() and
+all_close, for similar use cases as this proposal:
+
+``isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)``
+
+      Returns a boolean array where two arrays are element-wise equal
+      within a tolerance.
+
+      The tolerance values are positive, typically very small numbers.
+      The relative difference (rtol * abs(b)) and the absolute
+      difference atol are added together to compare against the
+      absolute difference between a and b
+
+In this approach, the absolute and relative tolerance are added
+together, rather than the ``or`` method used in this proposal. This is
+computationally more simple, and if relative tolerance is larger than
+the absolute tolerance, then the addition will have no effect. But if
+the absolute and relative tolerances are of similar magnitude, then
+the allowed difference will be about twice as large as expected.
+
+Also, if the value passed in are small compared to the absolute
+tolerance, then the relative tolerance will be completely swamped,
+perhaps unexpectedly.
+
+This is why, in this proposal, the absolute tolerance defaults to zero
+-- the user will be required to choose a value appropriate for the
+values at hand.
+
+
+Boost floating-point comparison
+-------------------------------
+
+The Boost project ( [3]_ ) provides a floating point comparison
+function. Is is a symetric approach, with both "weak" (larger of the
+two relative errors) and "strong" (smaller of the two relative errors)
+options.
+
+It was decided that a method that clearly defined which value was used
+to scale the relative error would be more appropriate for the standard
+library.
+
+References
+==========
+
+.. [1] Python-ideas list discussion thread
+   (https://mail.python.org/pipermail/python-ideas/2015-January/030947.html)
+
+.. [2] Wikipedaia page on relative difference
+   (http://en.wikipedia.org/wiki/Relative_change_and_difference)
+
+.. [3] Boost project floating-point comparison algorithms
+   (http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/floating_point_comparison.html)
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8