Checking in some text. Most of this is simply an Appendix repeating

what David Ascher said in 1998 (before the starship bites the dust again).
2000-12-04 20:32:13 +00:00 · 2000-12-04 20:32:13 +00:00 · ce5b583ec5
parent 9599e69426
commit ce5b583ec5
1 changed files with 322 additions and 1 deletions
--- a/pep-0207.txt
+++ b/pep-0207.txt
@ -1,11 +1,332 @@
 PEP: 207
 Title: Rich Comparisions
 Version: $Revision$
-Author: mal@lemburg.com (Marc-Andre Lemburg), guido@python.org (Guido van Rossum)
+Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
 Python-Version: 2.1
 Status: Incomplete


+Abstract
+
+    This PEP proposes several new features for comparisons:
+
+    - Allow separately overloading of <, >, <=, >=, ==, !=, both in
+      classes and in C extensions.
+
+    - Allow any of those overloaded operators to return something else
+      besides a Boolean result.
+
+
+Motivation
+
+    The main motivation comes from NumPy, whose users agree that A<B
+    should return an array of elementwise comparison outcomes; they
+    currently have to spell this as less(A,B) because A<B can only
+    return a Boolean result or an exception.
+
+    An additional motivation is that frequently, types don't have a
+    natural ordering, but still need to be compared for equality.
+    Currenlty such a type *must* implement comparison and thus assign
+    an arbitrary ordering, just so that equality can be tested.
+
+    More motivation can be found in the proposals listed under
+    previous work below.
+
+
+Previous Work
+
+    Rich Comparisons have been proposed before; in particular by David
+    Ascher, after experience with Numerical Python:
+
+      http://starship.python.net/crew/da/proposals/richcmp.html
+
+    It is also included as an appendix.  In this proposal, David also
+    proposes the addition of an optional 3rd argument to cmp(), as in:
+    cmp(a, b, "<") or cmp(a, b, "!=").
+
+
+Concerns
+
+    - Backwards compatibility, both at the Python level (classes using
+      __cmp__ need not be changed) and at the C level (extensions
+      defining tp_compare need not be changed).
+
+    - When A<B returns a matrix of elementwise comparisons, an easy
+      mistake to make is to use this expression in a Boolean context.
+      Without special precautions, it would always be true.  This use
+      should raise an exception instead.
+
+    - If a class overrides x==y but nothing else, should x!=y be
+      computed as not(x==y), or fail?  (I think this is OK; David
+      disagrees.)
+
+    - Similarly, should we allow x<y to be calculated from y>x?  (I
+      think this is OK; David agrees.)
+
+    - Similarly, should we allow x<=y to be calculated from not(x>y)?
+      (I think this is *not* OK; neither does David.)
+
+    - When using comparisons to generate elementwise comparisons, what
+      to do about shortcut operators like A<B<C or ``A<B and C<D''?
+      (David proposes a solution for A<B<C, but it means that ``if
+      A<B:...'' will assume ``if true:...''.
+
+
+Solution
+
+    To be done.
+
+
+Copyright
+
+    This document has been placed in the public domain.
+
+
+Appendix
+
+    Here, for posterity, is most of David Ascher's original proposal.
+    It addresses almost all concerns.
+
+
+Abstract
+
+    A new mechanism allowing comparisons of Python objects to return
+    values other than -1, 0, or 1 (or raise exceptions) is
+    proposed. This mechanism is entirely backwards compatible, and can
+    be controlled at the level of the C PyObject type or of the Python
+    class definition. There are three cooperating parts to the
+    proposed mechanism:
+
+    - the use of the last slot in the type object structure to store a
+    pointer to a rich comparison function
+
+    - the addition of special methods for classes
+
+    - the addition of an optional argument to the builtin cmp()
+    function.
+
+
+Motivation
+
+    The current comparison protocol for Python objects assumes that
+    any two Python objects can be compared (as of Python 1.5, object
+    comparisons can raise exceptions), and that the return value for
+    any comparison should be -1, 0 or 1. -1 indicates that the first
+    argument to the comparison function is less than the right one, +1
+    indicating the contrapositive, and 0 indicating that the two
+    objects are equal. While this mechanism allows the establishment
+    of a order relationship (e.g. for use by the sort() method of list
+    objects), it has proven to be limited in the context of Numeric
+    Python (NumPy).
+
+    Specifically, NumPy allows the creation of multidimensional
+    arrays, which support most of the numeric operators. Thus:
+
+             x = array((1,2,3,4))        y = array((2,2,4,4))
+
+    are two NumPy arrays. While they can be added elementwise,:
+
+             z = x + y   # z == array((3,4,7,8))
+
+    they cannot be compared in the current framework - the released
+    version of NumPy compares the pointers, (thus yielding junk
+    information) which was the only solution before the recent
+    addition of the ability (in 1.5) to raise exceptions in comparison
+    functions.
+
+    Even with the ability to raise exceptions, the current protocol
+    makes array comparisons useless. To deal with this fact, NumPy
+    includes several functions which perform the comparisons: less(),
+    less_equal(), greater(), greater_equal(), equal(),
+    not_equal(). These functions return arrays with the same shape as
+    their arguments (modulo broadcasting), filled with 0's and 1's
+    depending on whether the comparison is true or not for each
+    element pair. Thus, for example, using the arrays x and y defined
+    above:
+
+             less(x,y) 
+
+    would be an array containing the numbers (1,0,0,0).
+
+    The current proposal is to modify the Python object interface to
+    allow the NumPy package to make it so that x < y returns the same
+    thing as less(x,y). The exact return value is up to the NumPy
+    package -- what this proposal really asks for is changing the
+    Python core so that extension objects have the ability to return
+    something other than -1, 0, 1, should their authors choose to do
+    so.
+
+Current State of Affairs
+
+    The current protocol is, at the C level, that each object type
+    defines a tp_compare slot, which is a pointer to a function which
+    takes two PyObject* references and returns -1, 0, or 1. This
+    function is called by the PyObject_Compare() function defined in
+    the C API. PyObject_Compare() is also called by the builtin
+    function cmp() which takes two arguments.
+
+    1. Changes to the C structure for type objects
+
+    The last availabel slot in the PyTypeObject, reserved up to now
+    for future expansion, is used to optionally store a pointer to a
+    new comparison function, of type richcmpfunc defined by:
+
+           typedef PyObject *(*richcmpfunc)
+                Py_PROTO((PyObject *, PyObject *, int));
+
+    This function takes three arguments. The first two are the objects
+    to be compared, and the third is an integer corresponding to an
+    opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
+    then rich comparison for that object type is not supported (except
+    for class instances whose class provide the special methods
+    described below).
+
+    The above opcodes need to be added to the published Python/C API
+    (probably under the names Py_LT, Py_LE, etc.)
+
+    2. Additions of special methods for classes
+
+    Classes wishing to support the rich comparison mechanisms must add
+    one or more of the following new special methods:
+
+             def __lt__(self, other):
+                ...
+             def __le__(self, other):
+                ...
+             def __gt__(self, other):
+                ...
+             def __ge__(self, other):
+                ...
+             def __eq__(self, other):
+                ...
+             def __ne__(self, other):
+                ...
+
+    Each of these is called when the class instance is the on the
+    left-hand-side of the corresponding operators (<, <=, >, >=, ==,
+    and != or <>). The argument other is set to the object on the
+    right side of the operator. The return value of these methods is
+    up to the class implementor (after all, that's the entire point of
+    the proposal).
+
+    If the object on the left side of the operator does not define an
+    appropriate rich comparison operator (either at the C level or
+    with one of the special methods, then the comparison is reversed,
+    and the right hand operator is called with the opposite operator,
+    and the two objects are swapped. This assumes that a < b and b > a
+    are equivalent, as are a <= b and b >= a, and that == and != are
+    commutative (e.g. a == b if and only if b == a).
+
+    For example, if obj1 is an object which supports the rich
+    comparison protocol and x and y are objects which do not support
+    the rich comparison protocol, then obj1 < x will call the __lt__
+    method of obj1 with x as the second argument. x < obj1 will call
+    obj1's __gt__ method with x as a second argument, and x < y will
+    just use the existing (non-rich) comparison mechanism.
+
+    The above mechanism is such that classes can get away with not
+    implementing either __lt__ and __le__ or __gt__ and
+    __ge__. Further smarts could have been added to the comparison
+    mechanism, but this limited set of allowed "swaps" was chosen
+    because it doesn't require the infrastructure to do any processing
+    (negation) of return values. The choice of six special methods was
+    made over a single (e.g. __richcmp__) method to allow the
+    dispatching on the opcode to be performed at the level of the C
+    implementation rather than the user-defined method.
+
+    3. Addition of an optional argument to the builtin cmp()
+
+    The builtin cmp() is still used for simple comparisons. For rich
+    comparisons, it is called with a third argument, one of "<", "<=",
+    ">", ">=", "==", "!=", "<>" (the last two have the same
+    meaning). When called with one of these strings as the third
+    argument, cmp() can return any Python object. Otherwise, it can
+    only return -1, 0 or 1 as before.
+
+Chained Comparisons
+
+    Problem
+
+    It would be nice to allow objects for which the comparison returns
+    something other than -1, 0, or 1 to be used in chained
+    comparisons, such as:
+
+             x < y < z
+
+    Currently, this is interpreted by Python as:
+
+             temp1 = x < y
+             if temp1:
+               return y < z
+             else:
+               return temp1     
+
+    Note that this requires testing the truth value of the result of
+    comparisons, with potential "shortcutting" of the right-side
+    comparison testings. In other words, the truth-value of the result
+    of the result of the comparison determines the result of a chained
+    operation. This is problematic in the case of arrays, since if x,
+    y and z are three arrays, then the user expects:
+
+            x < y < z
+
+    to be an array of 0's and 1's where 1's are in the locations
+    corresponding to the elements of y which are between the
+    corresponding elements in x and z. In other words, the right-hand
+    side must be evaluated regardless of the result of x < y, which is
+    incompatible with the mechanism currently in use by the parser.
+
+    Solution
+
+    Guido mentioned that one possible way out would be to change the
+    code generated by chained comparisons to allow arrays to be
+    chained-compared intelligently. What follows is a mixture of his
+    idea and my suggestions. The code generated for x < y < z would be
+    equivalent to:
+
+             temp1 = x < y
+             if temp1:
+               temp2 = y < z
+               return boolean_combine(temp1, temp2)
+             else:
+               return temp1     
+
+    where boolean_combine is a new function which does something like
+    the following:
+
+             def boolean_combine(a, b):
+                 if hasattr(a, '__boolean_and__') or \
+                    hasattr(b, '__boolean_and__'):
+                     try:
+                         return a.__boolean_and__(b)
+                     except:
+                         return b.__boolean_and__(a)
+                 else: # standard behavior
+                     if a:
+                         return b
+                     else: 
+                         return 0
+
+    where the __boolean_and__ special method is implemented for
+    C-level types by another value of the third argument to the
+    richcmp function. This method would perform a boolean comparison
+    of the arrays (currently implemented in the umath module as the
+    logical_and ufunc).
+
+    Thus, objects returned by rich comparisons should always test
+    true, but should define another special method which creates
+    boolean combinations of them and their argument.
+
+    This solution has the advantage of allowing chained comparisons to
+    work for arrays, but the disadvantage that it requires comparison
+    arrays to always return true (in an ideal world, I'd have them
+    always raise an exception on truth testing, since the meaning of
+    testing "if a>b:" is massively ambiguous.
+
+    The inlining already present which deals with integer comparisons
+    would still apply, resulting in no performance cost for the most
+    common cases.
+

 Local Variables:
 mode: indented-text