diff --git a/pep-0207.txt b/pep-0207.txt index 2feb11b73..bf963c5b3 100644 --- a/pep-0207.txt +++ b/pep-0207.txt @@ -5,459 +5,479 @@ Last-Modified: $Date$ Author: guido@python.org (Guido van Rossum), DavidA@ActiveState.com (David Ascher) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: Python-Version: 2.1 Post-History: Abstract +======== - This PEP proposes several new features for comparisons: +This PEP proposes several new features for comparisons: - - Allow separately overloading of <, >, <=, >=, ==, !=, both in - classes and in C extensions. +- Allow separately overloading of <, >, <=, >=, ==, !=, both in + classes and in C extensions. - - Allow any of those overloaded operators to return something else - besides a Boolean result. +- Allow any of those overloaded operators to return something else + besides a Boolean result. Motivation +========== - The main motivation comes from NumPy, whose users agree that A=, or between > and <=? +3. If a class overrides x==y but nothing else, should x!=y be + computed as not(x==y), or fail? What about the similar + relationship between < and >=, or between > and <=? - 4 Similarly, should we allow xx? And - x<=y from not(x>y)? And x==y from y==x, or x!=y from y!=x? +4. Similarly, should we allow xx? And + x<=y from not(x>y)? And x==y from y==x, or x!=y from y!=x? - 5 When comparison operators return elementwise comparisons, what - to do about shortcut operators like A is tried and if - it returns true, +1 is returned. If any operator tried returns - a non-Boolean value (see below), the exception raised by - conversion to Boolean is passed through. If none of the - operators tried returns true, the classic comparison fallbacks - are tried next. + Full forward compatibility can be achieved as follows. When a + classic comparison is requested on an object that implements + ``tp_richcompare()``, up to three comparisons are used: first == is + tried, and if it returns true, 0 is returned; next, < is tried + and if it returns true, -1 is returned; next, > is tried and if + it returns true, +1 is returned. If any operator tried returns + a non-Boolean value (see below), the exception raised by + conversion to Boolean is passed through. If none of the + operators tried returns true, the classic comparison fallbacks + are tried next. - (I thought long and hard about the order in which the three - comparisons should be tried. At one point I had a convincing - argument for doing it in this order, based on the behavior of - comparisons for cyclical data structures. But since that code - has changed again, I'm not so sure that it makes a difference - any more.) + (I thought long and hard about the order in which the three + comparisons should be tried. At one point I had a convincing + argument for doing it in this order, based on the behavior of + comparisons for cyclical data structures. But since that code + has changed again, I'm not so sure that it makes a difference + any more.) - 2 Any type that returns a collection of Booleans instead of a - single boolean should define nb_nonzero() to raise an exception. - Such a type is considered a non-Boolean. +2. Any type that returns a collection of Booleans instead of a + single boolean should define ``nb_nonzero()`` to raise an exception. + Such a type is considered a non-Boolean. - 3 The == and != operators are not assumed to be each other's - complement (e.g. IEEE 754 floating point numbers do not satisfy - this). It is up to the type to implement this if desired. - Similar for < and >=, or > and <=; there are lots of examples - where these assumptions aren't true (e.g. tabnanny). +3. The == and != operators are not assumed to be each other's + complement (e.g. IEEE 754 floating point numbers do not satisfy + this). It is up to the type to implement this if desired. + Similar for < and >=, or > and <=; there are lots of examples + where these assumptions aren't true (e.g. tabnanny). - 4 The reflexivity rules *are* assumed by Python. Thus, the - interpreter may swap y>x with x=x with x<=y, and may swap - the arguments of x==y and x!=y. (Note: Python currently assumes - that x==x is always true and x!=x is never true; this should not - be assumed.) +4. The reflexivity rules **are** assumed by Python. Thus, the + interpreter may swap y>x with x=x with x<=y, and may swap + the arguments of x==y and x!=y. (Note: Python currently assumes + that x==x is always true and x!=x is never true; this should not + be assumed.) - 5 In the current proposal, when A operator. The 'in' and - 'not in' operators and dictionary lookup will only use the == - operator. +6. The ``min()`` and ``list.sort()`` operations will only use the + < operator; max() will only use the > operator. The 'in' and + 'not in' operators and dictionary lookup will only use the == + operator. Implementation Proposal +======================= - This closely follows David Ascher's proposal. +This closely follows David Ascher's proposal. - C API +C API +----- - - New functions: +- New functions:: PyObject *PyObject_RichCompare(PyObject *, PyObject *, int) - This performs the requested rich comparison, returning a Python - object or raising an exception. The 3rd argument must be one of - Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE. + This performs the requested rich comparison, returning a Python + object or raising an exception. The 3rd argument must be one of + Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE. + + :: int PyObject_RichCompareBool(PyObject *, PyObject *, int) - This performs the requested rich comparison, returning a - Boolean: -1 for exception, 0 for false, 1 for true. The 3rd - argument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or - Py_GE. Note that when PyObject_RichCompare() returns a - non-Boolean object, PyObject_RichCompareBool() will raise an - exception. + This performs the requested rich comparison, returning a + Boolean: -1 for exception, 0 for false, 1 for true. The 3rd + argument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or + Py_GE. Note that when ``PyObject_RichCompare()`` returns a + non-Boolean object, ``PyObject_RichCompareBool()`` will raise an + exception. - - New typedef: +- New typedef:: typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int); - - New slot in type object, replacing spare tp_xxx7: +- New slot in type object, replacing spare tp_xxx7:: richcmpfunc tp_richcompare; - This should be a function with the same signature as - PyObject_RichCompare(), and performing the same comparison. - At least one of the arguments is of the type whose - tp_richcompare slot is being used, but the other may have a - different type. If the function cannot compare the particular - combination of objects, it should return a new reference to - Py_NotImplemented. + This should be a function with the same signature as + ``PyObject_RichCompare()``, and performing the same comparison. + At least one of the arguments is of the type whose + tp_richcompare slot is being used, but the other may have a + different type. If the function cannot compare the particular + combination of objects, it should return a new reference to + ``Py_NotImplemented``. - - PyObject_Compare() is changed to try rich comparisons if they - are defined (but only if classic comparisons aren't defined). +- ``PyObject_Compare()`` is changed to try rich comparisons if they + are defined (but only if classic comparisons aren't defined). - Changes to the interpreter +Changes to the interpreter +-------------------------- - - Whenever PyObject_Compare() is called with the intent of getting - the outcome of a particular comparison (e.g. in list.sort(), and - of course for the comparison operators in ceval.c), the code is - changed to call PyObject_RichCompare() or - PyObject_RichCompareBool() instead; if the C code needs to know - the outcome of the comparison, PyObject_IsTrue() is called on - the result (which may raise an exception). +- Whenever ``PyObject_Compare()`` is called with the intent of getting + the outcome of a particular comparison (e.g. in ``list.sort()``, and + of course for the comparison operators in ceval.c), the code is + changed to call ``PyObject_RichCompare()`` or + ``PyObject_RichCompareBool()`` instead; if the C code needs to know + the outcome of the comparison, ``PyObject_IsTrue()`` is called on + the result (which may raise an exception). - - Most built-in types that currently define a comparison will be - modified to define a rich comparison instead. (This is - optional; I've converted lists, tuples, complex numbers, and - arrays so far, and am not sure whether I will convert others.) +- Most built-in types that currently define a comparison will be + modified to define a rich comparison instead. (This is + optional; I've converted lists, tuples, complex numbers, and + arrays so far, and am not sure whether I will convert others.) - Classes +Classes +------- - - Classes can define new special methods __lt__, __le__, __eq__, - __ne__,__gt__, __ge__ to override the corresponding operators. - (I.e., <, <=, ==, !=, >, >=. You gotta love the Fortran - heritage.) If a class defines __cmp__ as well, it is only used - when __lt__ etc. have been tried and return NotImplemented. +- Classes can define new special methods ``__lt__``, ``__le__``, ``__eq__``, + ``__ne__``, ``__gt__``, ``__ge__`` to override the corresponding operators. + (I.e., <, <=, ==, !=, >, >=. You gotta love the Fortran + heritage.) If a class defines ``__cmp__`` as well, it is only used + when ``__lt__`` etc. have been tried and return ``NotImplemented``. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - Appendix +======== - Here is most of David Ascher's original proposal (version 0.2.1, - dated Wed Jul 22 16:49:28 1998; I've left the Contents, History - and Patches sections out). It addresses almost all concerns - above. +Here is most of David Ascher's original proposal (version 0.2.1, +dated Wed Jul 22 16:49:28 1998; I've left the Contents, History +and Patches sections out). It addresses almost all concerns +above. Abstract +======== - A new mechanism allowing comparisons of Python objects to return - values other than -1, 0, or 1 (or raise exceptions) is - proposed. This mechanism is entirely backwards compatible, and can - be controlled at the level of the C PyObject type or of the Python - class definition. There are three cooperating parts to the - proposed mechanism: +A new mechanism allowing comparisons of Python objects to return +values other than -1, 0, or 1 (or raise exceptions) is +proposed. This mechanism is entirely backwards compatible, and can +be controlled at the level of the C ``PyObject`` type or of the Python +class definition. There are three cooperating parts to the +proposed mechanism: - - the use of the last slot in the type object structure to store a - pointer to a rich comparison function +- the use of the last slot in the type object structure to store a + pointer to a rich comparison function - - the addition of special methods for classes +- the addition of special methods for classes - - the addition of an optional argument to the builtin cmp() - function. +- the addition of an optional argument to the builtin ``cmp()`` + function. Motivation +========== - The current comparison protocol for Python objects assumes that - any two Python objects can be compared (as of Python 1.5, object - comparisons can raise exceptions), and that the return value for - any comparison should be -1, 0 or 1. -1 indicates that the first - argument to the comparison function is less than the right one, +1 - indicating the contrapositive, and 0 indicating that the two - objects are equal. While this mechanism allows the establishment - of an order relationship (e.g. for use by the sort() method of list - objects), it has proven to be limited in the context of Numeric - Python (NumPy). +The current comparison protocol for Python objects assumes that +any two Python objects can be compared (as of Python 1.5, object +comparisons can raise exceptions), and that the return value for +any comparison should be -1, 0 or 1. -1 indicates that the first +argument to the comparison function is less than the right one, +1 +indicating the contrapositive, and 0 indicating that the two +objects are equal. While this mechanism allows the establishment +of an order relationship (e.g. for use by the ``sort()`` method of list +objects), it has proven to be limited in the context of Numeric +Python (NumPy). - Specifically, NumPy allows the creation of multidimensional - arrays, which support most of the numeric operators. Thus: +Specifically, NumPy allows the creation of multidimensional +arrays, which support most of the numeric operators. Thus:: - x = array((1,2,3,4)) y = array((2,2,4,4)) + x = array((1,2,3,4)) y = array((2,2,4,4)) - are two NumPy arrays. While they can be added elementwise,: +are two NumPy arrays. While they can be added elementwise,:: - z = x + y # z == array((3,4,7,8)) + z = x + y # z == array((3,4,7,8)) - they cannot be compared in the current framework - the released - version of NumPy compares the pointers, (thus yielding junk - information) which was the only solution before the recent - addition of the ability (in 1.5) to raise exceptions in comparison - functions. +they cannot be compared in the current framework - the released +version of NumPy compares the pointers, (thus yielding junk +information) which was the only solution before the recent +addition of the ability (in 1.5) to raise exceptions in comparison +functions. - Even with the ability to raise exceptions, the current protocol - makes array comparisons useless. To deal with this fact, NumPy - includes several functions which perform the comparisons: less(), - less_equal(), greater(), greater_equal(), equal(), - not_equal(). These functions return arrays with the same shape as - their arguments (modulo broadcasting), filled with 0's and 1's - depending on whether the comparison is true or not for each - element pair. Thus, for example, using the arrays x and y defined - above: +Even with the ability to raise exceptions, the current protocol +makes array comparisons useless. To deal with this fact, NumPy +includes several functions which perform the comparisons: ``less()``, +``less_equal()``, ``greater()``, ``greater_equal()``, ``equal()``, +``not_equal()``. These functions return arrays with the same shape as +their arguments (modulo broadcasting), filled with 0's and 1's +depending on whether the comparison is true or not for each +element pair. Thus, for example, using the arrays x and y defined +above:: - less(x,y) + less(x,y) - would be an array containing the numbers (1,0,0,0). +would be an array containing the numbers (1,0,0,0). - The current proposal is to modify the Python object interface to - allow the NumPy package to make it so that x < y returns the same - thing as less(x,y). The exact return value is up to the NumPy - package -- what this proposal really asks for is changing the - Python core so that extension objects have the ability to return - something other than -1, 0, 1, should their authors choose to do - so. +The current proposal is to modify the Python object interface to +allow the NumPy package to make it so that x < y returns the same +thing as less(x,y). The exact return value is up to the NumPy +package -- what this proposal really asks for is changing the +Python core so that extension objects have the ability to return +something other than -1, 0, 1, should their authors choose to do +so. Current State of Affairs +======================== - The current protocol is, at the C level, that each object type - defines a tp_compare slot, which is a pointer to a function which - takes two PyObject* references and returns -1, 0, or 1. This - function is called by the PyObject_Compare() function defined in - the C API. PyObject_Compare() is also called by the builtin - function cmp() which takes two arguments. +The current protocol is, at the C level, that each object type +defines a ``tp_compare`` slot, which is a pointer to a function which +takes two ``PyObject*`` references and returns -1, 0, or 1. This +function is called by the ``PyObject_Compare()`` function defined in +the C API. ``PyObject_Compare()`` is also called by the builtin +function ``cmp()`` which takes two arguments. Proposed Mechanism +------------------ - 1. Changes to the C structure for type objects +1. Changes to the C structure for type objects - The last available slot in the PyTypeObject, reserved up to now - for future expansion, is used to optionally store a pointer to a - new comparison function, of type richcmpfunc defined by: + The last available slot in the ``PyTypeObject``, reserved up to now + for future expansion, is used to optionally store a pointer to a + new comparison function, of type richcmpfunc defined by:: - typedef PyObject *(*richcmpfunc) - Py_PROTO((PyObject *, PyObject *, int)); + typedef PyObject *(*richcmpfunc) + Py_PROTO((PyObject *, PyObject *, int)); - This function takes three arguments. The first two are the objects - to be compared, and the third is an integer corresponding to an - opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL, - then rich comparison for that object type is not supported (except - for class instances whose class provide the special methods - described below). + This function takes three arguments. The first two are the objects + to be compared, and the third is an integer corresponding to an + opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL, + then rich comparison for that object type is not supported (except + for class instances whose class provide the special methods + described below). - The above opcodes need to be added to the published Python/C API - (probably under the names Py_LT, Py_LE, etc.) + The above opcodes need to be added to the published Python/C API + (probably under the names Py_LT, Py_LE, etc.) - 2. Additions of special methods for classes +2. Additions of special methods for classes - Classes wishing to support the rich comparison mechanisms must add - one or more of the following new special methods: + Classes wishing to support the rich comparison mechanisms must add + one or more of the following new special methods:: - def __lt__(self, other): - ... - def __le__(self, other): - ... - def __gt__(self, other): - ... - def __ge__(self, other): - ... - def __eq__(self, other): - ... - def __ne__(self, other): - ... + def __lt__(self, other): + ... + def __le__(self, other): + ... + def __gt__(self, other): + ... + def __ge__(self, other): + ... + def __eq__(self, other): + ... + def __ne__(self, other): + ... - Each of these is called when the class instance is the on the - left-hand-side of the corresponding operators (<, <=, >, >=, ==, - and != or <>). The argument other is set to the object on the - right side of the operator. The return value of these methods is - up to the class implementor (after all, that's the entire point of - the proposal). + Each of these is called when the class instance is the on the + left-hand-side of the corresponding operators (<, <=, >, >=, ==, + and != or <>). The argument other is set to the object on the + right side of the operator. The return value of these methods is + up to the class implementor (after all, that's the entire point of + the proposal). - If the object on the left side of the operator does not define an - appropriate rich comparison operator (either at the C level or - with one of the special methods, then the comparison is reversed, - and the right hand operator is called with the opposite operator, - and the two objects are swapped. This assumes that a < b and b > a - are equivalent, as are a <= b and b >= a, and that == and != are - commutative (e.g. a == b if and only if b == a). + If the object on the left side of the operator does not define an + appropriate rich comparison operator (either at the C level or + with one of the special methods, then the comparison is reversed, + and the right hand operator is called with the opposite operator, + and the two objects are swapped. This assumes that a < b and b > a + are equivalent, as are a <= b and b >= a, and that == and != are + commutative (e.g. a == b if and only if b == a). - For example, if obj1 is an object which supports the rich - comparison protocol and x and y are objects which do not support - the rich comparison protocol, then obj1 < x will call the __lt__ - method of obj1 with x as the second argument. x < obj1 will call - obj1's __gt__ method with x as a second argument, and x < y will - just use the existing (non-rich) comparison mechanism. + For example, if obj1 is an object which supports the rich + comparison protocol and x and y are objects which do not support + the rich comparison protocol, then obj1 < x will call the ``__lt__`` + method of obj1 with x as the second argument. x < obj1 will call + obj1's ``__gt__`` method with x as a second argument, and x < y will + just use the existing (non-rich) comparison mechanism. - The above mechanism is such that classes can get away with not - implementing either __lt__ and __le__ or __gt__ and - __ge__. Further smarts could have been added to the comparison - mechanism, but this limited set of allowed "swaps" was chosen - because it doesn't require the infrastructure to do any processing - (negation) of return values. The choice of six special methods was - made over a single (e.g. __richcmp__) method to allow the - dispatching on the opcode to be performed at the level of the C - implementation rather than the user-defined method. + The above mechanism is such that classes can get away with not + implementing either ``__lt__`` and ``__le__`` or ``__gt__`` and + ``__ge__``. Further smarts could have been added to the comparison + mechanism, but this limited set of allowed "swaps" was chosen + because it doesn't require the infrastructure to do any processing + (negation) of return values. The choice of six special methods was + made over a single (e.g. ``__richcmp__``) method to allow the + dispatching on the opcode to be performed at the level of the C + implementation rather than the user-defined method. - 3. Addition of an optional argument to the builtin cmp() +3. Addition of an optional argument to the builtin ``cmp()`` - The builtin cmp() is still used for simple comparisons. For rich - comparisons, it is called with a third argument, one of "<", "<=", - ">", ">=", "==", "!=", "<>" (the last two have the same - meaning). When called with one of these strings as the third - argument, cmp() can return any Python object. Otherwise, it can - only return -1, 0 or 1 as before. + The builtin ``cmp()`` is still used for simple comparisons. For rich + comparisons, it is called with a third argument, one of "<", "<=", + ">", ">=", "==", "!=", "<>" (the last two have the same + meaning). When called with one of these strings as the third + argument, ``cmp()`` can return any Python object. Otherwise, it can + only return -1, 0 or 1 as before. Chained Comparisons +------------------- - Problem +Problem +''''''' - It would be nice to allow objects for which the comparison returns - something other than -1, 0, or 1 to be used in chained - comparisons, such as: +It would be nice to allow objects for which the comparison returns +something other than -1, 0, or 1 to be used in chained +comparisons, such as:: - x < y < z + x < y < z - Currently, this is interpreted by Python as: +Currently, this is interpreted by Python as:: - temp1 = x < y - if temp1: - return y < z + temp1 = x < y + if temp1: + return y < z + else: + return temp1 + +Note that this requires testing the truth value of the result of +comparisons, with potential "shortcutting" of the right-side +comparison testings. In other words, the truth-value of the result +of the result of the comparison determines the result of a chained +operation. This is problematic in the case of arrays, since if x, +y and z are three arrays, then the user expects:: + + x < y < z + +to be an array of 0's and 1's where 1's are in the locations +corresponding to the elements of y which are between the +corresponding elements in x and z. In other words, the right-hand +side must be evaluated regardless of the result of x < y, which is +incompatible with the mechanism currently in use by the parser. + +Solution +'''''''' + +Guido mentioned that one possible way out would be to change the +code generated by chained comparisons to allow arrays to be +chained-compared intelligently. What follows is a mixture of his +idea and my suggestions. The code generated for x < y < z would be +equivalent to:: + + temp1 = x < y + if temp1: + temp2 = y < z + return boolean_combine(temp1, temp2) + else: + return temp1 + +where boolean_combine is a new function which does something like +the following:: + + def boolean_combine(a, b): + if hasattr(a, '__boolean_and__') or \ + hasattr(b, '__boolean_and__'): + try: + return a.__boolean_and__(b) + except: + return b.__boolean_and__(a) + else: # standard behavior + if a: + return b else: - return temp1 + return 0 - Note that this requires testing the truth value of the result of - comparisons, with potential "shortcutting" of the right-side - comparison testings. In other words, the truth-value of the result - of the result of the comparison determines the result of a chained - operation. This is problematic in the case of arrays, since if x, - y and z are three arrays, then the user expects: +where the ``__boolean_and__`` special method is implemented for +C-level types by another value of the third argument to the +richcmp function. This method would perform a boolean comparison +of the arrays (currently implemented in the umath module as the +logical_and ufunc). - x < y < z +Thus, objects returned by rich comparisons should always test +true, but should define another special method which creates +boolean combinations of them and their argument. - to be an array of 0's and 1's where 1's are in the locations - corresponding to the elements of y which are between the - corresponding elements in x and z. In other words, the right-hand - side must be evaluated regardless of the result of x < y, which is - incompatible with the mechanism currently in use by the parser. +This solution has the advantage of allowing chained comparisons to +work for arrays, but the disadvantage that it requires comparison +arrays to always return true (in an ideal world, I'd have them +always raise an exception on truth testing, since the meaning of +testing "if a>b:" is massively ambiguous. - Solution +The inlining already present which deals with integer comparisons +would still apply, resulting in no performance cost for the most +common cases. - Guido mentioned that one possible way out would be to change the - code generated by chained comparisons to allow arrays to be - chained-compared intelligently. What follows is a mixture of his - idea and my suggestions. The code generated for x < y < z would be - equivalent to: - - temp1 = x < y - if temp1: - temp2 = y < z - return boolean_combine(temp1, temp2) - else: - return temp1 - - where boolean_combine is a new function which does something like - the following: - - def boolean_combine(a, b): - if hasattr(a, '__boolean_and__') or \ - hasattr(b, '__boolean_and__'): - try: - return a.__boolean_and__(b) - except: - return b.__boolean_and__(a) - else: # standard behavior - if a: - return b - else: - return 0 - - where the __boolean_and__ special method is implemented for - C-level types by another value of the third argument to the - richcmp function. This method would perform a boolean comparison - of the arrays (currently implemented in the umath module as the - logical_and ufunc). - - Thus, objects returned by rich comparisons should always test - true, but should define another special method which creates - boolean combinations of them and their argument. - - This solution has the advantage of allowing chained comparisons to - work for arrays, but the disadvantage that it requires comparison - arrays to always return true (in an ideal world, I'd have them - always raise an exception on truth testing, since the meaning of - testing "if a>b:" is massively ambiguous. - - The inlining already present which deals with integer comparisons - would still apply, resulting in no performance cost for the most - common cases. - - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: