Checking in some text. Most of this is simply an Appendix repeating

what David Ascher said in 1998 (before the starship bites the dust
again).
Guido van Rossum 2000-12-04 20:32:13 +00:00
parent 9599e69426
commit ce5b583ec5
1 changed files with 322 additions and 1 deletions


@ -1,11 +1,332 @@
PEP: 207
Title: Rich Comparisons
Version: $Revision$
Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
Python-Version: 2.1
Status: Incomplete
Abstract
This PEP proposes several new features for comparisons:
- Allow separate overloading of <, >, <=, >=, ==, !=, both in
classes and in C extensions.
- Allow any of those overloaded operators to return something else
besides a Boolean result.
Motivation
The main motivation comes from NumPy, whose users agree that A<B
should return an array of elementwise comparison outcomes; they
currently have to spell this as less(A,B) because A<B can only
return a Boolean result or raise an exception.
An additional motivation is that frequently, types don't have a
natural ordering, but still need to be compared for equality.
Currently such a type *must* implement comparison and thus assign
an arbitrary ordering, just so that equality can be tested.
More motivation can be found in the proposals listed under
previous work below.
Previous Work
Rich Comparisons have been proposed before; in particular by David
Ascher, after experience with Numerical Python:
http://starship.python.net/crew/da/proposals/richcmp.html
It is also included as an appendix. In this proposal, David also
proposes the addition of an optional 3rd argument to cmp(), as in:
cmp(a, b, "<") or cmp(a, b, "!=").
Concerns
- Backwards compatibility, both at the Python level (classes using
__cmp__ need not be changed) and at the C level (extensions
defining tp_compare need not be changed).
- When A<B returns a matrix of elementwise comparisons, an easy
mistake to make is to use this expression in a Boolean context.
Without special precautions, it would always be true. This use
should raise an exception instead.
- If a class overrides x==y but nothing else, should x!=y be
computed as not(x==y), or fail? (I think this is OK; David
disagrees.)
- Similarly, should we allow x<y to be calculated from y>x? (I
think this is OK; David agrees.)
- Similarly, should we allow x<=y to be calculated from not(x>y)?
(I think this is *not* OK; neither does David.)
- When using comparisons to generate elementwise comparisons, what
to do about shortcut operators like A<B<C or ``A<B and C<D''?
(David proposes a solution for A<B<C, but it means that ``if
A<B:...'' will behave like ``if true:...''.)
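The Boolean-context concern above can be made concrete with a minimal sketch. The classes Mask and Vec and the helper truth_test_fails are hypothetical names invented for this illustration, not part of NumPy or of this proposal, and the sketch uses the Python 3 hook __bool__ (Python 2 spelled it __nonzero__).

```python
class Mask:
    """Hypothetical elementwise comparison result (not a NumPy type)."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Python 3 spelling; Python 2 used __nonzero__ for the same hook.
        raise TypeError("truth value of an elementwise result is ambiguous")


class Vec:
    """Hypothetical array-like whose < compares element by element."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return Mask(a < b for a, b in zip(self.items, other.items))


def truth_test_fails(value):
    """Return True if using value in a Boolean context raises TypeError."""
    try:
        if value:
            pass
    except TypeError:
        return True
    return False
```

With these definitions, Vec([1, 2, 3]) < Vec([2, 2, 4]) yields a Mask, and writing ``if'' on that Mask raises instead of silently taking the true branch.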
Solution
To be done.
Copyright
This document has been placed in the public domain.
Appendix
Here, for posterity, is most of David Ascher's original proposal.
It addresses almost all concerns.
Abstract
A new mechanism allowing comparisons of Python objects to return
values other than -1, 0, or 1 (or raise exceptions) is
proposed. This mechanism is entirely backwards compatible, and can
be controlled at the level of the C PyObject type or of the Python
class definition. There are three cooperating parts to the
proposed mechanism:
- the use of the last slot in the type object structure to store a
pointer to a rich comparison function
- the addition of special methods for classes
- the addition of an optional argument to the builtin cmp()
function.
Motivation
The current comparison protocol for Python objects assumes that
any two Python objects can be compared (as of Python 1.5, object
comparisons can raise exceptions), and that the return value for
any comparison should be -1, 0 or 1. -1 indicates that the first
argument to the comparison function is less than the second one, +1
indicates that it is greater, and 0 indicates that the two
objects are equal. While this mechanism allows the establishment
of an order relationship (e.g. for use by the sort() method of list
objects), it has proven to be limited in the context of Numeric
Python (NumPy).
Specifically, NumPy allows the creation of multidimensional
arrays, which support most of the numeric operators. Thus:
x = array((1,2,3,4))
y = array((2,2,4,4))
are two NumPy arrays. While they can be added elementwise:
z = x + y # z == array((3,4,7,8))
they cannot be compared in the current framework - the released
version of NumPy compares the pointers, (thus yielding junk
information) which was the only solution before the recent
addition of the ability (in 1.5) to raise exceptions in comparison
functions.
Even with the ability to raise exceptions, the current protocol
makes array comparisons useless. To deal with this fact, NumPy
includes several functions which perform the comparisons: less(),
less_equal(), greater(), greater_equal(), equal(),
not_equal(). These functions return arrays with the same shape as
their arguments (modulo broadcasting), filled with 0's and 1's
depending on whether the comparison is true or not for each
element pair. Thus, for example, using the arrays x and y defined
above:
less(x,y)
would be an array containing the numbers (1,0,1,0).
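As a rough illustration of what such a function computes, here is a plain-Python stand-in for less(). It is only a sketch under simplifying assumptions: it works on equal-length sequences, ignores broadcasting, and is not NumPy's implementation.

```python
def less(a, b):
    """Elementwise a < b over two equal-length sequences, as 0/1 flags."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [1 if p < q else 0 for p, q in zip(a, b)]


x = (1, 2, 3, 4)
y = (2, 2, 4, 4)
flags = less(x, y)  # one flag per element pair
```

Each flag depends only on its own pair of elements, which is why a single -1, 0 or 1 result cannot express the outcome.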
The current proposal is to modify the Python object interface to
allow the NumPy package to make it so that x < y returns the same
thing as less(x,y). The exact return value is up to the NumPy
package -- what this proposal really asks for is changing the
Python core so that extension objects have the ability to return
something other than -1, 0, 1, should their authors choose to do
so.
Current State of Affairs
The current protocol is, at the C level, that each object type
defines a tp_compare slot, which is a pointer to a function which
takes two PyObject* references and returns -1, 0, or 1. This
function is called by the PyObject_Compare() function defined in
the C API. PyObject_Compare() is also called by the builtin
function cmp() which takes two arguments.
1. Changes to the C structure for type objects
The last available slot in the PyTypeObject, reserved up to now
for future expansion, is used to optionally store a pointer to a
new comparison function, of type richcmpfunc defined by:
typedef PyObject *(*richcmpfunc)
Py_PROTO((PyObject *, PyObject *, int));
This function takes three arguments. The first two are the objects
to be compared, and the third is an integer corresponding to an
opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
then rich comparison for that object type is not supported (except
for class instances whose class provides the special methods
described below).
The above opcodes need to be added to the published Python/C API
(probably under the names Py_LT, Py_LE, etc.)
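The shape of such a three-argument function can be modelled in Python. This is only a sketch of the calling convention, not C API code: the numeric opcode values and the function name pairs_richcmp are assumptions made for illustration, since the text only names the opcodes.

```python
import operator

# Opcode numbering is an assumption for illustration; the text only names them.
LT, LE, EQ, NE, GT, GE = range(6)

_OPS = {
    LT: operator.lt, LE: operator.le, EQ: operator.eq,
    NE: operator.ne, GT: operator.gt, GE: operator.ge,
}


def pairs_richcmp(a, b, op):
    """Model of a rich comparison slot: two operands plus an opcode."""
    compare = _OPS[op]
    # The return value is an arbitrary object, here a list of per-pair results.
    return [compare(p, q) for p, q in zip(a, b)]
```

A single function receives the opcode and decides what to return, so one slot covers all six operators.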
2. Additions of special methods for classes
Classes wishing to support the rich comparison mechanisms must add
one or more of the following new special methods:
def __lt__(self, other):
    ...
def __le__(self, other):
    ...
def __gt__(self, other):
    ...
def __ge__(self, other):
    ...
def __eq__(self, other):
    ...
def __ne__(self, other):
    ...
Each of these is called when the class instance is on the
left-hand-side of the corresponding operators (<, <=, >, >=, ==,
and != or <>). The argument other is set to the object on the
right side of the operator. The return value of these methods is
up to the class implementor (after all, that's the entire point of
the proposal).
If the object on the left side of the operator does not define an
appropriate rich comparison operator (either at the C level or
with one of the special methods), then the comparison is reversed:
the two objects are swapped, and the right-hand operand's method for
the mirrored operator is called. This assumes that a < b and b > a
are equivalent, as are a <= b and b >= a, and that == and != are
commutative (i.e. a == b if and only if b == a).
For example, if obj1 is an object which supports the rich
comparison protocol and x and y are objects which do not support
the rich comparison protocol, then obj1 < x will call the __lt__
method of obj1 with x as the second argument. x < obj1 will call
obj1's __gt__ method with x as a second argument, and x < y will
just use the existing (non-rich) comparison mechanism.
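The swap described above can be observed with a small sketch. Tagged is a hypothetical class invented for this illustration, and the example assumes Python 3, where the built-in int declines a comparison with an unknown type by returning NotImplemented, which triggers the swap.

```python
class Tagged:
    """Hypothetical class whose rich comparisons return descriptive tuples."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return ("lt", self.value, other)

    def __gt__(self, other):
        return ("gt", self.value, other)


obj1 = Tagged(10)
left = obj1 < 3    # obj1 is on the left: its __lt__ runs
right = 3 < obj1   # int cannot handle Tagged, so obj1's __gt__ runs
```

Neither result is a Boolean; each records which method ran, which makes the swap visible.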
The above mechanism is such that classes can get away with not
implementing either __lt__ and __le__ or __gt__ and
__ge__. Further smarts could have been added to the comparison
mechanism, but this limited set of allowed "swaps" was chosen
because it doesn't require the infrastructure to do any processing
(negation) of return values. The choice of six special methods was
made over a single (e.g. __richcmp__) method to allow the
dispatching on the opcode to be performed at the level of the C
implementation rather than the user-defined method.
3. Addition of an optional argument to the builtin cmp()
The builtin cmp() is still used for simple comparisons. For rich
comparisons, it is called with a third argument, one of "<", "<=",
">", ">=", "==", "!=", "<>" (the last two have the same
meaning). When called with one of these strings as the third
argument, cmp() can return any Python object. Otherwise, it can
only return -1, 0 or 1 as before.
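A minimal sketch of the proposed three-argument form follows. The name cmp3 and its dispatch table are hypothetical, written for this illustration as a stand-in for the proposed builtin.

```python
import operator

_RICH_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne, "<>": operator.ne,
}


def cmp3(a, b, op=None):
    """Hypothetical stand-in for the proposed cmp(a, b[, op])."""
    if op is None:
        # Classic three-way result: -1, 0 or 1.
        return (a > b) - (a < b)
    # Rich form: return whatever the operands' comparison produces.
    return _RICH_OPS[op](a, b)
```

Called with two arguments, cmp3 keeps the three-way contract; called with an operator string, it passes through whatever object the comparison returns.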
Chained Comparisons
Problem
It would be nice to allow objects for which the comparison returns
something other than -1, 0, or 1 to be used in chained
comparisons, such as:
x < y < z
Currently, this is interpreted by Python as:
temp1 = x < y
if temp1:
    return y < z
else:
    return temp1
Note that this requires testing the truth value of the result of
comparisons, with potential "shortcutting" of the right-side
comparison. In other words, the truth value of the result
of the first comparison determines the result of a chained
operation. This is problematic in the case of arrays, since if x,
y and z are three arrays, then the user expects:
x < y < z
to be an array of 0's and 1's where 1's are in the locations
corresponding to the elements of y which are between the
corresponding elements in x and z. In other words, the right-hand
side must be evaluated regardless of the result of x < y, which is
incompatible with the mechanism currently in use by the parser.
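The mismatch can be reproduced with a small sketch. FlagVec is a hypothetical array-like written for this illustration; its < returns a list of 0/1 flags, and a non-empty list always tests true.

```python
class FlagVec:
    """Hypothetical array-like; < returns a list of 0/1 flags."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return [1 if a < b else 0 for a, b in zip(self.items, other.items)]


x, y, z = FlagVec([1, 5, 1]), FlagVec([2, 2, 2]), FlagVec([3, 3, 1])

# The chain is evaluated as (x < y) and (y < z); because the first result
# tests true, only the flags of y < z survive.
chained = x < y < z

# What the user expects: 1 only where both comparisons hold.
wanted = [a & b for a, b in zip(x < y, y < z)]
```

Here chained and wanted differ, because the flags of x < y are used only for their truth value and then discarded.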
Solution
Guido mentioned that one possible way out would be to change the
code generated by chained comparisons to allow arrays to be
chained-compared intelligently. What follows is a mixture of his
idea and my suggestions. The code generated for x < y < z would be
equivalent to:
temp1 = x < y
if temp1:
    temp2 = y < z
    return boolean_combine(temp1, temp2)
else:
    return temp1
where boolean_combine is a new function which does something like
the following:
def boolean_combine(a, b):
    if hasattr(a, '__boolean_and__') or \
       hasattr(b, '__boolean_and__'):
        try:
            return a.__boolean_and__(b)
        except Exception:
            # a lacks the method or cannot handle b; try b instead
            return b.__boolean_and__(a)
    else: # standard behavior
        if a:
            return b
        else:
            return 0
where the __boolean_and__ special method is implemented for
C-level types by another value of the third argument to the
richcmp function. This method would perform a boolean comparison
of the arrays (currently implemented in the umath module as the
logical_and ufunc).
Thus, objects returned by rich comparisons should always test
true, but should define another special method which creates
boolean combinations of them and their argument.
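To see how the pieces would fit together, here is a self-contained sketch. Flags, Arr and chained_lt are hypothetical names invented for this illustration; boolean_combine is a simplified variant of the function above; and chained_lt spells out by hand the code that the compiler would generate for x < y < z under this proposal.

```python
class Flags:
    """Hypothetical comparison result: always true, combinable elementwise."""

    def __init__(self, bits):
        self.bits = list(bits)

    def __bool__(self):
        return True  # the proposal has comparison results always test true

    def __boolean_and__(self, other):
        return Flags(a & b for a, b in zip(self.bits, other.bits))


class Arr:
    """Hypothetical array-like whose < returns a Flags object."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return Flags(int(a < b) for a, b in zip(self.items, other.items))


def boolean_combine(a, b):
    """Simplified variant: try a's hook, then b's, else plain `and`."""
    for first, second in ((a, b), (b, a)):
        hook = getattr(first, "__boolean_and__", None)
        if hook is not None:
            return hook(second)
    return b if a else 0


def chained_lt(x, y, z):
    """Hand-written equivalent of `x < y < z` under the proposal."""
    temp1 = x < y
    if temp1:
        return boolean_combine(temp1, y < z)
    return temp1
```

For plain numbers chained_lt behaves like today's chained comparison, while for Arr operands it combines both comparison results element by element.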
This solution has the advantage of allowing chained comparisons to
work for arrays, but the disadvantage that it requires comparison
arrays to always test true (in an ideal world, I'd have them
always raise an exception on truth testing, since the meaning of
testing "if a>b:" is massively ambiguous).
The inlining already present which deals with integer comparisons
would still apply, resulting in no performance cost for the most
common cases.
Local Variables:
mode: indented-text