update from Josiah Carlson

This commit is contained in:
David Goodger 2004-01-09 02:36:27 +00:00
parent cef13d2601
commit 778f8da1d3
1 changed files with 134 additions and 175 deletions

View File

@ -9,34 +9,34 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 20-Dec-2003
Python-Version: 2.4
Post-History: 20-Dec-2003, 03-Jan-2004, 05-Jan-2004
Post-History: 20-Dec-2003, 03-Jan-2004, 05-Jan-2004, 07-Jan-2004
Abstract
========
This PEP proposes two new attributes to the ``cmp`` built-in that
represent a top and bottom [4]_ value: ``high`` and ``low`` (or a pair
of similarly named attributes [5]_).
This PEP proposes two singleton constants that represent a top and
bottom [3]_ value: ``Max`` and ``Min`` (or two similarly suggestive
names [4]_; see `Open Issues`_).
As suggested by their names, ``cmp.high`` and ``cmp.low`` would
compare higher or lower than any other object (respectively). Such
behavior results in easier to understand code and fewer special cases
in which a temporary minimum or maximum is required, and an actual
minimum or maximum numeric value is not limited.
As suggested by their names, ``Max`` and ``Min`` would compare higher
or lower than any other object (respectively). Such behavior results
in easier to understand code and fewer special cases in which a
temporary minimum or maximum value is required, and an actual minimum
or maximum numeric value is not limited.
Rationale
=========
While ``None`` can be used as an absolute minimum that any value can
attain [1]_, this may be depreciated [5]_ in Python 3.0, and shouldn't
attain [1]_, this may be depreciated [4]_ in Python 3.0, and shouldn't
be relied upon.
As a replacement for ``None`` being used as an absolute minimum, as
well as the introduction of an absolute maximum, attaching ``low`` and
``high`` to ``cmp`` addresses concerns for namespace pollution and
serves to make both self-documenting.
well as the introduction of an absolute maximum, the introduction of
two singleton constants ``Max`` and ``Min`` address concerns for the
constants to be self-documenting.
What is commonly done to deal with absolute minimum or maximum values,
is to set a value that is larger than the script author ever expects
@ -69,24 +69,23 @@ infinity). However, each has their drawbacks.
- These same drawbacks exist when numbers are small.
Introducing ``high`` and ``low`` attributes to ``cmp`` that work as
described does not take much effort. A sample Python `reference
implementation`_ of both attributes is included.
Introducing ``Max`` and ``Min`` that work as described above does not
take much effort. A sample Python `reference implementation`_ of both
is included.
Motivation
==========
``cmp.high`` Examples
---------------------
There are hundreds of algorithms that begin by initializing some set
of values to a logical (or numeric) infinity or negative infinity.
Python lacks either infinity that works consistently or really is the
most extreme value that can be attained. By adding the ``cmp.high``
and ``cmp.low`` attributes, Python would have a real maximum and
minimum value, and such algorithms can become clearer due to the
reduction of special cases.
most extreme value that can be attained. By adding ``Max`` and
``Min``, Python would have a real maximum and minimum value, and such
algorithms can become clearer due to the reduction of special cases.
``Max`` Examples
---------------------
Take for example, finding the minimum in a sequence::
@ -114,26 +113,26 @@ Take for example, finding the minimum in a sequence::
::
def findmin_high(seq):
cur = cmp.high
def findmin_Max(seq):
cur = Max
for obj in seq:
cur = min(obj, cur)
return cur
Please note that there are an arbitrarily large number of ways to find
the minimum (or maximum) of a sequence, these seek to show a simple
example where using ``cmp.high`` makes the algorithm easier to
understand and results in simplification of code.
example where using ``Max`` makes the algorithm easier to understand
and results in the simplification of code.
Guido brought up the idea of just negating everything and comparing
[2]_. Certainly this does work when using numbers, but it does not
remove the special case (actually adds one) and results in the code
being less readable. ::
#we have cmp.high available
#we have Max available
a = min(a, b)
#we don't have cmp.high available
#we don't have Max available
if a is not None:
if b is None:
a = b
@ -166,10 +165,10 @@ table (a bit more understandable) and one using a heap (much faster)::
table = {}
for node in graph.iterkeys():
#(visited, distance, node, parent)
table[node] = (0, cmp.high, node, None)
table[node] = (0, Max, node, None)
table[S] = (0, 0, S, None)
cur = min(table.values())
while (not cur[0]) and cur[1] < cmp.high:
while (not cur[0]) and cur[1] < Max:
(visited, distance, node, parent) = cur
table[node] = (1, distance, node, parent)
for cdist, child in graph[node]:
@ -194,10 +193,10 @@ table (a bit more understandable) and one using a heap (much faster)::
#runs in O(NlgN) time using a minheap
#find the shortest path
import heapq
Q = [(cmp.high, i, None) for i in graph.iterkeys()]
Q = [(Max, i, None) for i in graph.iterkeys()]
heapq.heappush(Q, (0, S, None))
V = {}
while Q[0][0] < cmp.high and T not in V:
while Q[0][0] < Max and T not in V:
dist, node, parent = heapq.heappop(Q)
if node in V:
continue
@ -215,20 +214,20 @@ table (a bit more understandable) and one using a heap (much faster)::
path.reverse()
return path
Readers should note that replacing ``cmp.high`` in the above code with
an arbitrarily large number does not guarantee that the actual
Readers should note that replacing ``Max`` in the above code with an
arbitrarily large number does not guarantee that the shortest path
distance to a node will never exceed that number. Well, with one
caveat: one could certainly sum up the weights of every edge in the
graph, and set the 'arbitrarily large number' to that total. However,
doing so does not make the algorithm any easier to understand and has
potential problems with various numeric overflows.
potential problems with numeric overflows.
A ``cmp.low`` Example
---------------------
A ``Min`` Example
-----------------
An example of usage for ``cmp.low`` is an algorithm that solves the
following problem [7]_:
An example of usage for ``Min`` is an algorithm that solves the
following problem [6]_:
Suppose you are given a directed graph, representing a
communication network. The vertices are the nodes in the network,
@ -246,10 +245,10 @@ Such an algorithm is a 7 line modification to the DijkstraSP_table
algorithm given above::
#only showing the changed to lines with the proper indentation
table[node] = (0, cmp.low, node, None)
table[node] = (0, Min, node, None)
table[S] = (0, 1, S, None)
cur = max(table.values())
while (not cur[0]) and cur[1] > cmp.low:
while (not cur[0]) and cur[1] > Min:
ndist = distance*cdist
if not table[child][0] and ndist > table[child][1]:
cur = max(table.values())
@ -260,9 +259,9 @@ to)::
#only showing the changed to lines with the proper indentation
import maxheapq
Q = [(cmp.low, i, None) for i in graph.iterkeys()]
Q = [(Min, i, None) for i in graph.iterkeys()]
maxheapq.heappush(Q, (1, S, None))
while Q[0][0] > cmp.low and T not in V:
while Q[0][0] > Min and T not in V:
dist, node, parent = maxheapq.heappop(Q)
maxheapq.heappush(Q, (next*dist, dest, node))
@ -270,45 +269,69 @@ Note that there is an equivalent way of translating the graph to
produce something that can be passed unchanged into the original
Dijkstra shortest path algorithm.
Further usage examples of both ``cmp.high`` and ``cmp.low`` are
available in the realm of graph algorithms.
Other Examples
--------------
Andrew P. Lentvorski, Jr. [7]_ has pointed out that various data
structures involving range searching have immediate use for ``Max``
and ``Min`` values. More specifically; Segment trees, Range trees,
k-d trees and database keys:
...The issue is that a range can be open on one side and does not
always have an initialized case.
The solutions I have seen are to either overload None as the
extremum or use an arbitrary large magnitude number. Overloading
None means that the built-ins can't really be used without special
case checks to work around the undefined (or "wrongly defined")
ordering of None. These checks tend to swamp the nice performance
of built-ins like max() and min().
Choosing a large magnitude number throws away the ability of
Python to cope with arbitrarily large integers and introduces a
potential source of overrun/underrun bugs.
Further use examples of both ``Max`` and ``Min`` are available in the
realm of graph algorithms, range searching algorithms, computational
geometry algorithms, and others.
Independent Implementations?
----------------------------
Independent implementations of the top/bottom concept by users
Independent implementations of the ``Min``/``Max`` concept by users
desiring such functionality are not likely to be compatible, and
certainly will be inconsistent. The following examples seek to show
how inconsistent they can be.
certainly will produce inconsistent orderings. The following examples
seek to show how inconsistent they can be.
- Let us pretend we have created proper separate implementations of
Myhigh, Mylow, Yourhigh and Yourlow with the same code as given in
MyMax, MyMin, YourMax and YourMin with the same code as given in
the sample implementation (with some minor renaming)::
>>> lst = [Yourlow, Mylow, Mylow, Yourlow, Myhigh, Yourlow,
Myhigh, Yourhigh, Myhigh]
>>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
YourMax, MyMax]
>>> lst.sort()
>>> lst
[Yourlow, Yourlow, Mylow, Mylow, Yourlow, Myhigh, Myhigh,
Yourhigh, Myhigh]
[YourMin, YourMin, MyMin, MyMin, YourMin, MyMax, MyMax, YourMax,
MyMax]
Notice that while all the "low"s are before the "high"s, there is no
guarantee that all instances of Yourlow will come before Mylow, the
reverse, or the equivalent Myhigh and Yourhigh.
Notice that while all the "Min"s are before the "Max"s, there is no
guarantee that all instances of YourMin will come before MyMin, the
reverse, or the equivalent MyMax and YourMax.
- The problem is evident even when using the heapq module::
- The problem is also evident when using the heapq module::
>>> lst = [Yourlow, Mylow, Mylow, Yourlow, Myhigh, Yourlow,
Myhigh, Yourhigh, Myhigh]
>>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
YourMax, MyMax]
>>> heapq.heapify(lst) #not needed, but it can't hurt
>>> while lst: print heapq.heappop(lst),
...
Yourlow Mylow Yourlow Yourlow Mylow Myhigh Myhigh Yourhigh Myhigh
YourMin MyMin YourMin YourMin MyMin MyMax MyMax YourMax MyMax
- Furthermore, the findmin_high code and both versions of Dijkstra
- Furthermore, the findmin_Max code and both versions of Dijkstra
could result in incorrect output by passing in secondary versions of
high.
``Max``.
Reference Implementation
@ -316,42 +339,36 @@ Reference Implementation
::
class _HighType(object):
class _ExtremeType(object):
def __init__(self, cmpr, rep):
object.__init__(self)
self._cmpr = cmpr
self._rep = rep
def __cmp__(self, other):
if isinstance(other, self.__class__):
if isinstance(other, self.__class__) and\
other._cmpr == self._cmpr:
return 0
return 1
return self._cmpr
def __repr__(self):
return 'cmp.high'
return self._rep
class _LowType(object):
Max = _ExtremeType(1, "Max")
Min = _ExtremeType(-1, "Min")
def __cmp__(self, other):
if isinstance(other, self.__class__):
return 0
return -1
Results of Test Run::
def __repr__(self):
return 'cmp.low'
# please note that the following code doesn't
# work due to built-ins being read-only
cmp.high = _HighType()
cmp.low = _LowType()
Results of Test Run if we could set cmp.high and cmp.low::
>>> max(cmp.high, 2**65536)
cmp.high
>>> min(cmp.high, 2**65536)
>>> max(Max, 2**65536)
Max
>>> min(Max, 2**65536)
20035299304068464649790...
(lines removed for brevity)
...72339445587895905719156736L
>>> min(cmp.low, -2**65536)
cmp.low
>>> max(cmp.low, -2**65536)
>>> min(Min, -2**65536)
Min
>>> max(Min, -2**65536)
-2003529930406846464979...
(lines removed for brevity)
...072339445587895905719156736L
@ -360,66 +377,15 @@ Results of Test Run if we could set cmp.high and cmp.low::
Open Issues
===========
- Previously, ``Some`` and ``All`` were names for the idea that
``cmp.high`` now represents. They have been subsumed by
``cmp.high`` due to the relative ambiguity of ``Some`` and ``All``.
Current options for the naming and namespace for ``Min``/``Max``, in
no particular order:
- Terry Reedy [5]_ and others have offered alternate names for the
``high/low`` objects: ``ObjHi/ObjLo``, ``NoneHi/NoneLo``,
``Highest/Lowest``, ``Infinity/NegativeInfinity``, ``hi/lo`` and
``High/Low``.
- Terry Reedy has also offered possible default behaviors of ``min``
and ``max`` on empty lists using these values. Some have voiced
that changing the behavior of min and max are not desirable due to
the amount of code that actively uses min and max, which may rely on
min and max raising exceptions on empty lists.
- Choosing ``high`` and ``low`` to be the attributes of ``cmp`` is
arbitrary, but meaningful. Other meaningful parent locations
include, but are not limited to: ``math``, ``int`` and ``number``
(if such a numeric superclass existed). ``sys`` probably does not
make sense, as such maximum and minimum values are not platform
dependent.
- The base class of the high and low objects do not necessarily have
to be ``object``. ``object``, ``NoneType`` or even a new class
called ``cmp.extreme`` have been suggested.
- Various built-in names such as ``All`` and ``Some`` have been
rejected by many users. Based on comments, it seems that regardless
of name, any new built-in would be rejected. [6]_
- Should ``-cmp.high == cmp.low``? This seems to make logical sense.
- Certainly ``bool(cmp.high) == True``, but should ``bool(cmp.low)``
be ``True`` or ``False``? Due to ``bool(1) == bool(-1) == True``,
it seems to follow that ``bool(cmp.high) == bool(cmp.low) == True``.
- Whatever name the concepts of a top and bottom value come to have,
the question of whether or not math can be done on them may or may
not make sense. If math were not allowed, it brings up a potential
ambiguity that while ``-cmp.high == cmp.low``, ``-1 * cmp.high``
would produce an exception.
Most-Preferred Options
======================
Through a non-rigorous method, the following behavior of the objects
seem to be preferred by those who are generally in favor of this PEP
in python-dev.
- ``high`` and ``low`` objects should be attached to the ``cmp``
built-in as ``cmp.high`` and ``cmp.low`` (or ``cmp.High/cmp.Low``).
- ``-cmp.high == cmp.low`` and equivalently ``-cmp.low == cmp.high``.
- ``bool(cmp.high) == bool(cmp.low) == True``
- The base type seems to be a cosmetic issue and has not resulted in
any real preference other than ``cmp.extreme`` making the most
sense.
1. Give the built-in ``max`` and ``min`` appropriate ``__cmp__``
methods to allow them to double as ``Min``/``Max``.
2. Attach them to attributes of the ``cmp()`` built-in.
3. Attach them to attributes of an appropriate type object.
4. Make them an appropriate module object.
5. Create two new built-ins with appropriate names.
References
@ -431,61 +397,54 @@ References
.. [2] Re: [Python-Dev] Got None. Maybe Some?, von Rossum, Guido
(http://mail.python.org/pipermail/python-dev/2003-December/041352.html)
.. [3] [Python-Dev] Re: Got None. Maybe Some?, Reedy, Terry
(http://mail.python.org/pipermail/python-dev/2003-December/041337.html)
.. [4] RE: [Python-Dev] Got None. Maybe Some?, Peters, Tim
.. [3] RE: [Python-Dev] Got None. Maybe Some?, Peters, Tim
(http://mail.python.org/pipermail/python-dev/2003-December/041332.html)
.. [5] [Python-Dev] Re: PEP 326 now online, Reedy, Terry
.. [4] [Python-Dev] Re: PEP 326 now online, Reedy, Terry
(http://mail.python.org/pipermail/python-dev/2004-January/041685.html)
.. [6] [Python-Dev] PEP 326 now online, Chermside, Michael
.. [5] [Python-Dev] PEP 326 now online, Chermside, Michael
(http://mail.python.org/pipermail/python-dev/2004-January/041704.html)
.. [7] Homework 6, Problem 7, Dillencourt, Michael
.. [6] Homework 6, Problem 7, Dillencourt, Michael
(link may not be valid in the future)
(http://www.ics.uci.edu/~dillenco/ics161/hw/hw6.pdf)
.. [7] RE: [Python-Dev] PEP 326 now online, Lentvorski, Andrew P., Jr.
(http://mail.python.org/pipermail/python-dev/2004-January/041727.html)
.. [8] Re: It's not really Some is it?, Ippolito, Bob
(http://www.livejournal.com/users/chouyu_31/138195.html?thread=274643#t274643)
Changes
=======
- Added this section.
- Renamed ``Some`` to ``All``: "Some" was an arbitrary name that
suffered from being unclear. [3]_
- Made ``All`` a subclass of ``object`` in order for it to become a
new-style class.
- Removed mathematical negation and casting to float in Open Issues.
``None`` is not a number and is not treated as one, ``All``
shouldn't be either.
- Added Motivation_ section.
- Changed markup to reStructuredText.
- Renamed ``All`` to ``cmp.hi`` to remove builtin requirement and to
provide a better better name, as well as adding an equivalent
future-proof bottom value ``cmp.lo``. [5]_
- Concept gets a possible name and location. [5]_
- Clarified Abstract_, Motivation_, `Reference Implementation`_ and
`Open Issues`_ based on the simultaneous concepts of ``cmp.hi`` and
``cmp.lo``.
`Open Issues`_ based on the simultaneous concepts of ``Max`` and
``Min``.
- Added two implementations of Dijkstra's Shortest Path algorithm that
show where ``cmp.hi`` can be used to remove special cases.
show where ``Max`` can be used to remove special cases.
- Renamed ``hi`` to ``high`` and ``lo`` to ``low`` to address concerns
for non-native english speakers.
- Added an example of use for ``Min`` to Motivation_.
- Added an example of use for ``cmp.low`` to Motivation_.
- Added some `Open Issues`_ and clarified some others.
- Added a couple `Open Issues`_ and clarified some others.
- Added an example and `Other Examples`_ subheading.
- Added `Most-Preferred Options`_ section.
- Modified `Reference Implementation`_ to instantiate both items from
a single class/type.
- Removed a large number of open issues that are not within the scope
of this PEP.
Copyright