Checking in some text. Most of this is simply an Appendix repeating
what David Ascher said in 1998 (before the starship bites the dust again).
PEP: 207
Title: Rich Comparisons
Version: $Revision$
Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
Python-Version: 2.1
Status: Incomplete


Abstract

This PEP proposes several new features for comparisons:

- Allow separate overloading of <, >, <=, >=, ==, !=, both in
  classes and in C extensions.

- Allow any of those overloaded operators to return something other
  than a Boolean result.


Motivation

The main motivation comes from NumPy, whose users agree that A<B
should return an array of elementwise comparison outcomes; they
currently have to spell this as less(A,B) because A<B can only
return a Boolean result or raise an exception.

An additional motivation is that types frequently have no
natural ordering but still need to be compared for equality.
Currently such a type *must* implement comparison, and thus impose
an arbitrary ordering, just so that equality can be tested.
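
As an illustration of the equality-only case, here is a minimal sketch in present-day Python 3 rather than the 2.1 design proposed here; the class `Color` is hypothetical. It defines only `__eq__` (plus `__hash__`), so equality works while ordering comparisons raise TypeError instead of relying on an arbitrary order:

```python
class Color:
    """Hypothetical type with a notion of equality but no natural order."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Color):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Defining __eq__ alone would make instances unhashable; keep them hashable.
        return hash(self.name)


red, also_red, blue = Color("red"), Color("red"), Color("blue")
print(red == also_red, red != blue)  # prints True True

try:
    red < blue  # no ordering is defined, so this raises
except TypeError:
    print("no ordering defined")  # this branch runs
```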

More motivation can be found in the proposals listed under
Previous Work below.


Previous Work

Rich Comparisons have been proposed before, in particular by David
Ascher after experience with Numerical Python:

    http://starship.python.net/crew/da/proposals/richcmp.html

It is also included as an appendix. In this proposal, David also
proposes the addition of an optional third argument to cmp(), as in
cmp(a, b, "<") or cmp(a, b, "!=").


Concerns

- Backwards compatibility, both at the Python level (classes using
  __cmp__ need not be changed) and at the C level (extensions
  defining tp_compare need not be changed).

- When A<B returns a matrix of elementwise comparisons, an easy
  mistake to make is to use this expression in a Boolean context.
  Without special precautions, it would always be true. This use
  should raise an exception instead.

- If a class overrides x==y but nothing else, should x!=y be
  computed as not(x==y), or fail? (I think this is OK; David
  disagrees.)

- Similarly, should we allow x<y to be calculated from y>x? (I
  think this is OK; David agrees.)

- Similarly, should we allow x<=y to be calculated from not(x>y)?
  (I don't think this is OK; neither does David.)

- When using comparisons to generate elementwise comparisons, what
  to do about short-circuiting operators like A<B<C or
  ``A<B and C<D''? (David proposes a solution for A<B<C, but it
  means that ``if A<B:...'' will assume ``if true:...''.)
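
For comparison, Python 3 eventually settled the questions above about deriving one operator from another roughly as sketched below. This reflects later CPython behavior, not a decision recorded in this PEP, and the class `Version` is hypothetical: x!=y falls back to not(x==y), x<y is retried as y>x, and x<=y is never derived from not(x>y).

```python
class Version:
    """Hypothetical class defining only __eq__ and __gt__."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n == other.n

    def __gt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n > other.n


a, b = Version(1), Version(2)

# x != y: the inherited object.__ne__ calls __eq__ and inverts the result.
print(a != b)  # prints True

# x < y: object.__lt__ returns NotImplemented, so Python tries b.__gt__(a).
print(a < b)  # prints True

# x <= y: neither __le__ nor __ge__ is defined, and not(x > y) is never tried.
try:
    a <= b
except TypeError:
    print("<= is not derived from >")  # this branch runs
```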


Solution

To be done.


Copyright

This document has been placed in the public domain.


Appendix

Here, for posterity, is most of David Ascher's original proposal.
It addresses almost all concerns.


Abstract

A new mechanism allowing comparisons of Python objects to return
values other than -1, 0, or 1 (or raise exceptions) is
proposed. This mechanism is entirely backwards compatible, and can
be controlled at the level of the C PyObject type or of the Python
class definition. There are three cooperating parts to the
proposed mechanism:

- the use of the last slot in the type object structure to store a
  pointer to a rich comparison function

- the addition of special methods for classes

- the addition of an optional argument to the builtin cmp()
  function.


Motivation

The current comparison protocol for Python objects assumes that
any two Python objects can be compared (as of Python 1.5, object
comparisons can raise exceptions), and that the return value for
any comparison should be -1, 0 or 1. -1 indicates that the first
argument to the comparison function is less than the second, +1
indicates the reverse, and 0 indicates that the two objects are
equal. While this mechanism allows the establishment of an ordering
relationship (e.g. for use by the sort() method of list objects),
it has proven to be limited in the context of Numeric Python
(NumPy).

Specifically, NumPy allows the creation of multidimensional
arrays, which support most of the numeric operators. Thus:

    x = array((1,2,3,4))
    y = array((2,2,4,4))

are two NumPy arrays. While they can be added elementwise:

    z = x + y   # z == array((3,4,7,8))

they cannot be compared in the current framework: the released
version of NumPy compares the pointers (thus yielding junk
information), which was the only solution before the recent
addition of the ability (in 1.5) to raise exceptions in comparison
functions.

Even with the ability to raise exceptions, the current protocol
makes array comparisons useless. To deal with this fact, NumPy
includes several functions which perform the comparisons: less(),
less_equal(), greater(), greater_equal(), equal(),
not_equal(). These functions return arrays with the same shape as
their arguments (modulo broadcasting), filled with 0's and 1's
depending on whether the comparison is true or not for each
element pair. Thus, for example, using the arrays x and y defined
above:

    less(x,y)

would be an array containing the numbers (1,0,1,0).

The current proposal is to modify the Python object interface to
allow the NumPy package to make it so that x < y returns the same
thing as less(x,y). The exact return value is up to the NumPy
package -- what this proposal really asks for is changing the
Python core so that extension objects have the ability to return
something other than -1, 0, 1, should their authors choose to do
so.
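
A pure-Python sketch of what the proposal enables, in present-day Python 3 syntax. The toy class `Array` is hypothetical (it is not NumPy, and it ignores broadcasting and multiple dimensions); its `__lt__` and `__gt__` return an elementwise array of 0's and 1's, mirroring less() and greater(), instead of a single Boolean:

```python
class Array:
    """Hypothetical 1-D stand-in for a NumPy array."""

    def __init__(self, values):
        self.values = tuple(values)

    def _elementwise(self, other, test):
        # Compare pairwise; shapes must match exactly (no broadcasting).
        if not isinstance(other, Array) or len(other.values) != len(self.values):
            return NotImplemented
        return Array(int(test(p, q)) for p, q in zip(self.values, other.values))

    def __lt__(self, other):
        return self._elementwise(other, lambda p, q: p < q)

    def __gt__(self, other):
        return self._elementwise(other, lambda p, q: p > q)

    def __repr__(self):
        return f"Array({list(self.values)})"


a = Array([1, 5, 3])
b = Array([2, 2, 3])
print(a < b)  # prints Array([1, 0, 0])
```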

Current State of Affairs

The current protocol is, at the C level, that each object type
defines a tp_compare slot, which is a pointer to a function which
takes two PyObject* references and returns -1, 0, or 1. This
function is called by the PyObject_Compare() function defined in
the C API. PyObject_Compare() is also called by the builtin
function cmp() which takes two arguments.
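
The three-way contract can be modelled in Python as below. The builtin cmp() no longer exists in Python 3, so `three_way_cmp` is a hypothetical stand-in built from the common `(a > b) - (a < b)` idiom; `functools.cmp_to_key` shows the kind of sort() use this contract was designed for:

```python
import functools


def three_way_cmp(a, b):
    """Model of the old cmp()/tp_compare contract: return -1, 0 or 1."""
    # Booleans are ints, so True - False == 1 and False - True == -1.
    return (a > b) - (a < b)


print(three_way_cmp(1, 2), three_way_cmp(2, 2), three_way_cmp("b", "a"))  # prints -1 0 1

# A -1/0/1 function is exactly what an ordering-based sort needs.
print(sorted([3, 1, 2], key=functools.cmp_to_key(three_way_cmp)))  # prints [1, 2, 3]
```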

1. Changes to the C structure for type objects

The last available slot in the PyTypeObject, reserved up to now
for future expansion, is used to optionally store a pointer to a
new comparison function, of type richcmpfunc defined by:

    typedef PyObject *(*richcmpfunc)
        Py_PROTO((PyObject *, PyObject *, int));

This function takes three arguments. The first two are the objects
to be compared, and the third is an integer corresponding to an
opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
then rich comparison for that object type is not supported (except
for class instances whose classes provide the special methods
described below).

The above opcodes need to be added to the published Python/C API
(probably under the names Py_LT, Py_LE, etc.).
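
The opcode-based dispatch can be modelled in Python as below. The constant values 0 through 5 match the Py_LT to Py_GE macros that CPython's object.h eventually published; `richcmp` itself is a hypothetical Python stand-in for a C richcmpfunc, built on the standard operator module:

```python
import operator

# Opcode values as later published in CPython's object.h.
Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE = range(6)

# Map each opcode to the Python comparison it stands for.
_OPS = {
    Py_LT: operator.lt,
    Py_LE: operator.le,
    Py_EQ: operator.eq,
    Py_NE: operator.ne,
    Py_GT: operator.gt,
    Py_GE: operator.ge,
}


def richcmp(a, b, op):
    """Hypothetical model of a richcmpfunc: compare a and b under opcode op."""
    func = _OPS.get(op)
    if func is None:
        raise ValueError(f"bad comparison opcode: {op!r}")
    # The result is whatever the comparison returns, not necessarily a bool.
    return func(a, b)


print(richcmp(1, 2, Py_LT), richcmp(3, 3, Py_NE))  # prints True False
```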

2. Addition of special methods for classes

Classes wishing to support the rich comparison mechanisms must add
one or more of the following new special methods:

    def __lt__(self, other):
        ...
    def __le__(self, other):
        ...
    def __gt__(self, other):
        ...
    def __ge__(self, other):
        ...
    def __eq__(self, other):
        ...
    def __ne__(self, other):
        ...

Each of these is called when the class instance is on the
left-hand side of the corresponding operator (<, <=, >, >=, ==,
and != or <>). The argument other is set to the object on the
right side of the operator. The return value of these methods is
up to the class implementor (after all, that's the entire point of
the proposal).

If the object on the left side of the operator does not define an
appropriate rich comparison operator (either at the C level or
with one of the special methods), then the comparison is reversed:
the two objects are swapped and the right-hand object's method for
the reflected operator is called. This assumes that a < b and
b > a are equivalent, as are a <= b and b >= a, and that == and !=
are commutative (i.e. a == b if and only if b == a).

For example, if obj1 is an object which supports the rich
comparison protocol and x and y are objects which do not support
the rich comparison protocol, then obj1 < x will call the __lt__
method of obj1 with x as the second argument. x < obj1 will call
obj1's __gt__ method with x as the second argument, and x < y will
just use the existing (non-rich) comparison mechanism.
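
The swapping rule can be sketched in Python as follows. This is a hypothetical model, not the C implementation: it uses the NotImplemented return value that Python eventually adopted to mean "does not handle this operator", and it raises TypeError where the proposal would instead fall back to the old cmp() mechanism. The made-up class `Box` defines only `__gt__`, so `rich_compare(5, Box(), "__lt__")` ends up calling the box's `__gt__` with 5, just as x < obj1 calls obj1's __gt__ in the example above:

```python
# Reflected counterpart of each comparison method (a < b is retried as b > a).
_SWAPPED = {
    "__lt__": "__gt__", "__le__": "__ge__",
    "__gt__": "__lt__", "__ge__": "__le__",
    "__eq__": "__eq__", "__ne__": "__ne__",
}


def rich_compare(a, b, method):
    """Hypothetical model: try a.<method>(b), else b.<reflected method>(a)."""
    left = getattr(type(a), method, None)
    if left is not None:
        result = left(a, b)
        if result is not NotImplemented:
            return result
    # The left operand declined: swap the operands and use the reflected method.
    right = getattr(type(b), _SWAPPED[method], None)
    if right is not None:
        result = right(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(f"{method} not supported for {type(a).__name__} and {type(b).__name__}")


class Box:
    """Made-up class that only knows how to answer '>'."""

    def __gt__(self, other):
        return ("Box.__gt__ called with", other)


print(rich_compare(5, Box(), "__lt__"))  # prints ('Box.__gt__ called with', 5)
```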

The above mechanism is such that classes can get away with not
implementing either __lt__ and __le__ or __gt__ and __ge__.
Further smarts could have been added to the comparison
mechanism, but this limited set of allowed "swaps" was chosen
because it doesn't require the infrastructure to do any processing
(negation) of return values. The choice of six special methods was
made over a single (e.g. __richcmp__) method to allow the
dispatching on the opcode to be performed at the level of the C
implementation rather than the user-defined method.

3. Addition of an optional argument to the builtin cmp()

The builtin cmp() is still used for simple comparisons. For rich
comparisons, it is called with a third argument, one of "<", "<=",
">", ">=", "==", "!=", "<>" (the last two have the same
meaning). When called with one of these strings as the third
argument, cmp() can return any Python object. Otherwise, it can
only return -1, 0 or 1 as before.
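
A hypothetical Python model of the extended cmp() follows. The name `cmp` is reused only for illustration (Python 3 has no builtin cmp()), and mapping the operator strings onto the standard operator module is an assumption about one way to implement it; the two-argument form keeps the -1, 0, 1 contract:

```python
import operator

# Operator strings accepted as the optional third argument ("<>" means "!=").
_RICH_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne, "<>": operator.ne,
}


def cmp(a, b, op=None):
    """Hypothetical model of the proposed cmp(a, b[, op])."""
    if op is None:
        # Simple comparison: -1, 0 or 1 as before.
        return (a > b) - (a < b)
    if op not in _RICH_OPS:
        raise ValueError(f"unknown comparison operator: {op!r}")
    # Rich comparison: return whatever the operands' comparison returns.
    return _RICH_OPS[op](a, b)


print(cmp(1, 2), cmp(1, 2, "<"), cmp(2, 2, "<>"))  # prints -1 True False
```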

Chained Comparisons

Problem

It would be nice to allow objects for which the comparison returns
something other than -1, 0, or 1 to be used in chained
comparisons, such as:

    x < y < z

Currently, this is interpreted by Python as:

    temp1 = x < y
    if temp1:
        return y < z
    else:
        return temp1
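
The shortcutting can be observed directly. The sketch below uses present-day Python 3, and the `Traced` class and `calls` list are hypothetical; it logs every < comparison and shows that when x < y is false, y < z is never evaluated, which is what rules out an elementwise result:

```python
calls = []


class Traced:
    """Hypothetical wrapper that records each '<' comparison it performs."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __lt__(self, other):
        calls.append(f"{self.name} < {other.name}")
        return self.value < other.value


x, y, z = Traced("x", 3), Traced("y", 1), Traced("z", 2)
result = x < y < z
print(result, calls)  # prints False ['x < y']
```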

Note that this requires testing the truth value of the result of
comparisons, with potential "shortcutting" of the right-side
comparison. In other words, the truth value of the result of the
comparison determines the result of a chained operation. This is
problematic in the case of arrays, since if x, y and z are three
arrays, then the user expects:

    x < y < z

to be an array of 0's and 1's where 1's are in the locations
corresponding to the elements of y which are between the
corresponding elements in x and z. In other words, the right-hand
side must be evaluated regardless of the result of x < y, which is
incompatible with the mechanism currently in use by the parser.

Solution

Guido mentioned that one possible way out would be to change the
code generated by chained comparisons to allow arrays to be
chained-compared intelligently. What follows is a mixture of his
idea and my suggestions. The code generated for x < y < z would be
equivalent to:

    temp1 = x < y
    if temp1:
        temp2 = y < z
        return boolean_combine(temp1, temp2)
    else:
        return temp1

where boolean_combine is a new function which does something like
the following:

    def boolean_combine(a, b):
        if hasattr(a, '__boolean_and__') or \
           hasattr(b, '__boolean_and__'):
            try:
                return a.__boolean_and__(b)
            except AttributeError:
                # a lacks the method, so b must provide it
                return b.__boolean_and__(a)
        else: # standard behavior
            if a:
                return b
            else:
                return 0

where the __boolean_and__ special method is implemented for
C-level types by another value of the third argument to the
richcmp function. This method would compute the elementwise
logical AND of the arrays (currently implemented in the umath
module as the logical_and ufunc).

Thus, objects returned by rich comparisons should always test
true, but should define another special method which creates
boolean combinations of them and their argument.
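
A runnable sketch of the proposed scheme, under stated assumptions: `__boolean_and__` is the special method name suggested above and was never added to Python; `BoolArray`, `Array` and `chained_lt` are hypothetical; and `chained_lt` hand-writes the code the compiler would generate for x < y < z. Because the comparison result always tests true, both halves of the chain run and are combined elementwise:

```python
class BoolArray:
    """Hypothetical comparison result: always true, combinable elementwise."""

    def __init__(self, values):
        self.values = tuple(values)

    def __bool__(self):
        # Per the proposal, rich comparison results always test true.
        return True

    def __boolean_and__(self, other):
        # Elementwise logical AND, playing the role of NumPy's logical_and.
        return BoolArray(int(p and q) for p, q in zip(self.values, other.values))


class Array:
    """Hypothetical 1-D array whose '<' returns a BoolArray."""

    def __init__(self, values):
        self.values = tuple(values)

    def __lt__(self, other):
        return BoolArray(int(p < q) for p, q in zip(self.values, other.values))


def boolean_combine(a, b):
    # Simplified form of the helper described above.
    if hasattr(a, "__boolean_and__"):
        return a.__boolean_and__(b)
    if hasattr(b, "__boolean_and__"):
        return b.__boolean_and__(a)
    return b if a else 0  # standard behavior


def chained_lt(x, y, z):
    """Hand-written version of the code proposed for ``x < y < z``."""
    temp1 = x < y
    if temp1:
        temp2 = y < z
        return boolean_combine(temp1, temp2)
    return temp1


x, y, z = Array([1, 5, 0]), Array([2, 6, 1]), Array([3, 4, 2])
print(chained_lt(x, y, z).values)  # prints (1, 0, 1)
```

Plain values keep their usual meaning: chained_lt(1, 2, 3) is True and chained_lt(3, 1, 2) is False.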

This solution has the advantage of allowing chained comparisons to
work for arrays, but the disadvantage that it requires comparison
arrays to always return true (in an ideal world, I'd have them
always raise an exception on truth testing, since the meaning of
testing "if a>b:" is massively ambiguous).
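
The "ideal world" alternative can be sketched as follows, with hypothetical `StrictBoolArray` and `Array` classes in Python 3 syntax. Truth testing raises, so an ambiguous "if a>b:" fails loudly and callers must ask an explicit question such as all(); the catch is that Python's current chained comparisons also test truth values, so x < y < z would raise too. NumPy itself later made truth testing of multi-element arrays raise ValueError.

```python
class StrictBoolArray:
    """Hypothetical comparison result that refuses to act as a Boolean."""

    def __init__(self, values):
        self.values = tuple(values)

    def __bool__(self):
        raise ValueError("truth value of a comparison array is ambiguous")

    def all(self):
        # Explicit reduction: true only if every elementwise test held.
        return all(self.values)


class Array:
    """Hypothetical 1-D array whose comparisons return StrictBoolArray."""

    def __init__(self, values):
        self.values = tuple(values)

    def __gt__(self, other):
        return StrictBoolArray(int(p > q) for p, q in zip(self.values, other.values))

    def __lt__(self, other):
        return StrictBoolArray(int(p < q) for p, q in zip(self.values, other.values))


a, b = Array([3, 4]), Array([1, 5])
print((a > b).all())  # prints False

try:
    if a > b:
        pass
except ValueError as exc:
    print(exc)  # prints: truth value of a comparison array is ambiguous
```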

The inlining already present which deals with integer comparisons
would still apply, resulting in no performance cost for the most
common cases.



Local Variables:
mode: indented-text