python-peps/pep-0207.txt

PEP: 207
Title: Rich Comparisions
Version: $Revision$
Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
Python-Version: 2.1
Status: Incomplete


Abstract

    This PEP proposes several new features for comparisons:

    - Allow separately overloading of <, >, <=, >=, ==, !=, both in
      classes and in C extensions.

    - Allow any of those overloaded operators to return something else
      besides a Boolean result.


Motivation

    The main motivation comes from NumPy, whose users agree that A<B
    should return an array of elementwise comparison outcomes; they
    currently have to spell this as less(A,B) because A<B can only
    return a Boolean result or an exception.

    An additional motivation is that frequently, types don't have a
    natural ordering, but still need to be compared for equality.
    Currenlty such a type *must* implement comparison and thus assign
    an arbitrary ordering, just so that equality can be tested.

    More motivation can be found in the proposals listed under
    previous work below.


Previous Work

    Rich Comparisons have been proposed before; in particular by David
    Ascher, after experience with Numerical Python:

      http://starship.python.net/crew/da/proposals/richcmp.html

    It is also included as an appendix.  In this proposal, David also
    proposes the addition of an optional 3rd argument to cmp(), as in:
    cmp(a, b, "<") or cmp(a, b, "!=").


Concerns

    - Backwards compatibility, both at the Python level (classes using
      __cmp__ need not be changed) and at the C level (extensions
      defining tp_compare need not be changed).

    - When A<B returns a matrix of elementwise comparisons, an easy
      mistake to make is to use this expression in a Boolean context.
      Without special precautions, it would always be true.  This use
      should raise an exception instead.

    - If a class overrides x==y but nothing else, should x!=y be
      computed as not(x==y), or fail?  (I think this is OK; David
      disagrees.)

    - Similarly, should we allow x<y to be calculated from y>x?  (I
      think this is OK; David agrees.)

    - Similarly, should we allow x<=y to be calculated from not(x>y)?
      (I think this is *not* OK; neither does David.)

    - When using comparisons to generate elementwise comparisons, what
      to do about shortcut operators like A<B<C or ``A<B and C<D''?
      (David proposes a solution for A<B<C, but it means that ``if
      A<B:...'' will assume ``if true:...''.


Solution

    To be done.


Copyright

    This document has been placed in the public domain.


Appendix

    Here, for posterity, is most of David Ascher's original proposal.
    It addresses almost all concerns.


Abstract

    A new mechanism allowing comparisons of Python objects to return
    values other than -1, 0, or 1 (or raise exceptions) is
    proposed. This mechanism is entirely backwards compatible, and can
    be controlled at the level of the C PyObject type or of the Python
    class definition. There are three cooperating parts to the
    proposed mechanism:

    - the use of the last slot in the type object structure to store a
    pointer to a rich comparison function

    - the addition of special methods for classes

    - the addition of an optional argument to the builtin cmp()
    function.


Motivation

    The current comparison protocol for Python objects assumes that
    any two Python objects can be compared (as of Python 1.5, object
    comparisons can raise exceptions), and that the return value for
    any comparison should be -1, 0 or 1. -1 indicates that the first
    argument to the comparison function is less than the right one, +1
    indicating the contrapositive, and 0 indicating that the two
    objects are equal. While this mechanism allows the establishment
    of a order relationship (e.g. for use by the sort() method of list
    objects), it has proven to be limited in the context of Numeric
    Python (NumPy).

    Specifically, NumPy allows the creation of multidimensional
    arrays, which support most of the numeric operators. Thus:

             x = array((1,2,3,4))        y = array((2,2,4,4))

    are two NumPy arrays. While they can be added elementwise,:

             z = x + y   # z == array((3,4,7,8))

    they cannot be compared in the current framework - the released
    version of NumPy compares the pointers, (thus yielding junk
    information) which was the only solution before the recent
    addition of the ability (in 1.5) to raise exceptions in comparison
    functions.

    Even with the ability to raise exceptions, the current protocol
    makes array comparisons useless. To deal with this fact, NumPy
    includes several functions which perform the comparisons: less(),
    less_equal(), greater(), greater_equal(), equal(),
    not_equal(). These functions return arrays with the same shape as
    their arguments (modulo broadcasting), filled with 0's and 1's
    depending on whether the comparison is true or not for each
    element pair. Thus, for example, using the arrays x and y defined
    above:

             less(x,y)

    would be an array containing the numbers (1,0,0,0).

    The current proposal is to modify the Python object interface to
    allow the NumPy package to make it so that x < y returns the same
    thing as less(x,y). The exact return value is up to the NumPy
    package -- what this proposal really asks for is changing the
    Python core so that extension objects have the ability to return
    something other than -1, 0, 1, should their authors choose to do
    so.

Current State of Affairs

    The current protocol is, at the C level, that each object type
    defines a tp_compare slot, which is a pointer to a function which
    takes two PyObject* references and returns -1, 0, or 1. This
    function is called by the PyObject_Compare() function defined in
    the C API. PyObject_Compare() is also called by the builtin
    function cmp() which takes two arguments.

    1. Changes to the C structure for type objects

    The last availabel slot in the PyTypeObject, reserved up to now
    for future expansion, is used to optionally store a pointer to a
    new comparison function, of type richcmpfunc defined by:

           typedef PyObject *(*richcmpfunc)
                Py_PROTO((PyObject *, PyObject *, int));

    This function takes three arguments. The first two are the objects
    to be compared, and the third is an integer corresponding to an
    opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
    then rich comparison for that object type is not supported (except
    for class instances whose class provide the special methods
    described below).

    The above opcodes need to be added to the published Python/C API
    (probably under the names Py_LT, Py_LE, etc.)

    2. Additions of special methods for classes

    Classes wishing to support the rich comparison mechanisms must add
    one or more of the following new special methods:

             def __lt__(self, other):
                ...
             def __le__(self, other):
                ...
             def __gt__(self, other):
                ...
             def __ge__(self, other):
                ...
             def __eq__(self, other):
                ...
             def __ne__(self, other):
                ...

    Each of these is called when the class instance is the on the
    left-hand-side of the corresponding operators (<, <=, >, >=, ==,
    and != or <>). The argument other is set to the object on the
    right side of the operator. The return value of these methods is
    up to the class implementor (after all, that's the entire point of
    the proposal).

    If the object on the left side of the operator does not define an
    appropriate rich comparison operator (either at the C level or
    with one of the special methods, then the comparison is reversed,
    and the right hand operator is called with the opposite operator,
    and the two objects are swapped. This assumes that a < b and b > a
    are equivalent, as are a <= b and b >= a, and that == and != are
    commutative (e.g. a == b if and only if b == a).

    For example, if obj1 is an object which supports the rich
    comparison protocol and x and y are objects which do not support
    the rich comparison protocol, then obj1 < x will call the __lt__
    method of obj1 with x as the second argument. x < obj1 will call
    obj1's __gt__ method with x as a second argument, and x < y will
    just use the existing (non-rich) comparison mechanism.

    The above mechanism is such that classes can get away with not
    implementing either __lt__ and __le__ or __gt__ and
    __ge__. Further smarts could have been added to the comparison
    mechanism, but this limited set of allowed "swaps" was chosen
    because it doesn't require the infrastructure to do any processing
    (negation) of return values. The choice of six special methods was
    made over a single (e.g. __richcmp__) method to allow the
    dispatching on the opcode to be performed at the level of the C
    implementation rather than the user-defined method.

    3. Addition of an optional argument to the builtin cmp()

    The builtin cmp() is still used for simple comparisons. For rich
    comparisons, it is called with a third argument, one of "<", "<=",
    ">", ">=", "==", "!=", "<>" (the last two have the same
    meaning). When called with one of these strings as the third
    argument, cmp() can return any Python object. Otherwise, it can
    only return -1, 0 or 1 as before.

Chained Comparisons

    Problem

    It would be nice to allow objects for which the comparison returns
    something other than -1, 0, or 1 to be used in chained
    comparisons, such as:

             x < y < z

    Currently, this is interpreted by Python as:

             temp1 = x < y
             if temp1:
               return y < z
             else:
               return temp1

    Note that this requires testing the truth value of the result of
    comparisons, with potential "shortcutting" of the right-side
    comparison testings. In other words, the truth-value of the result
    of the result of the comparison determines the result of a chained
    operation. This is problematic in the case of arrays, since if x,
    y and z are three arrays, then the user expects:

            x < y < z

    to be an array of 0's and 1's where 1's are in the locations
    corresponding to the elements of y which are between the
    corresponding elements in x and z. In other words, the right-hand
    side must be evaluated regardless of the result of x < y, which is
    incompatible with the mechanism currently in use by the parser.

    Solution

    Guido mentioned that one possible way out would be to change the
    code generated by chained comparisons to allow arrays to be
    chained-compared intelligently. What follows is a mixture of his
    idea and my suggestions. The code generated for x < y < z would be
    equivalent to:

             temp1 = x < y
             if temp1:
               temp2 = y < z
               return boolean_combine(temp1, temp2)
             else:
               return temp1

    where boolean_combine is a new function which does something like
    the following:

             def boolean_combine(a, b):
                 if hasattr(a, '__boolean_and__') or \
                    hasattr(b, '__boolean_and__'):
                     try:
                         return a.__boolean_and__(b)
                     except:
                         return b.__boolean_and__(a)
                 else: # standard behavior
                     if a:
                         return b
                     else:
                         return 0

    where the __boolean_and__ special method is implemented for
    C-level types by another value of the third argument to the
    richcmp function. This method would perform a boolean comparison
    of the arrays (currently implemented in the umath module as the
    logical_and ufunc).

    Thus, objects returned by rich comparisons should always test
    true, but should define another special method which creates
    boolean combinations of them and their argument.

    This solution has the advantage of allowing chained comparisons to
    work for arrays, but the disadvantage that it requires comparison
    arrays to always return true (in an ideal world, I'd have them
    always raise an exception on truth testing, since the meaning of
    testing "if a>b:" is massively ambiguous.

    The inlining already present which deals with integer comparisons
    would still apply, resulting in no performance cost for the most
    common cases.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: