updates from Josiah Carlson, edited

This commit is contained in:
David Goodger 2004-01-06 15:34:49 +00:00
parent f5cddb1e9c
commit b9523879ea
2 changed files with 374 additions and 88 deletions

View File

@ -121,7 +121,7 @@ Index by Category
S 323 Copyable Iterators Martelli
S 324 popen5 - New POSIX process module Astrand
S 325 Resource-Release Support for Generators Pedroni
S 326 A Case for All Carlson
S 326 A Case for Top and Bottom Values Carlson
S 754 IEEE 754 Floating Point Special Values Warnes
Finished PEPs (done, implemented in CVS)
@ -344,7 +344,7 @@ Numerical Index
S 323 Copyable Iterators Martelli
S 324 popen5 - New POSIX process module Astrand
S 325 Resource-Release Support for Generators Pedroni
S 326 A Case for All Carlson
S 326 A Case for Top and Bottom Values Carlson
SR 666 Reject Foolish Indentation Creighton
S 754 IEEE 754 Floating Point Special Values Warnes

View File

@ -1,56 +1,59 @@
PEP: 326
Title: A Case for All
Title: A Case for Top and Bottom Values
Version: $Revision$
Last-Modified: $Date$
Author: Josiah Carlson <jcarlson@uci.edu>
Author: Josiah Carlson <jcarlson@uci.edu>,
Terry Reedy <tjreedy@udel.edu>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Dec-2003
Python-Version: 2.4
Post-History: 20-Dec-2003, 03-Jan-2004
Post-History: 20-Dec-2003, 03-Jan-2004, 05-Jan-2004
Abstract
========
This PEP proposes a new named constant or built-in: ``All``.
This PEP proposes two new attributes to the ``cmp`` built-in that
represent a top and bottom [4]_ value: ``high`` and ``low`` (or a pair
of similarly named attributes [5]_).
Users of Python have had the constant None, which represents a lack of
value, for quite some time (possibly from the beginning, this is
unknown to the author at the current time). The author believes that
``All`` should be introduced in order to represent a functionally
infinite value, with similar behavior corresponding to None.
As suggested by their names, ``cmp.high`` and ``cmp.low`` would
compare higher or lower than any other object (respectively). Such
behavior results in easier to understand code and fewer special cases
in which a temporary minimum or maximum is required, and an actual
minimum or maximum numeric value is not limited.
Rationale
=========
While None can be used as an absolute minimum that any value can
attain [1]_, there does not exist an equivalent absolute maximum. For
example::
While ``None`` can be used as an absolute minimum that any value can
attain [1]_, this may be depreciated [5]_ in Python 3.0, and shouldn't
be relied upon.
>>> print min(None, -2**1000)
None
As a replacement for ``None`` being used as an absolute minimum, as
well as the introduction of an absolute maximum, attaching ``low`` and
``high`` to ``cmp`` addresses concerns for namespace pollution and
serves to make both self-documenting.
All comparisons including None and another object results in None
being the smaller of the two.
However, there does not exist a value such that the comparison of any
other object results in the constant being the larger of the two.
What is commonly done to deal with such cases, is to set a value in a
script that is larger than the author ever expects the input to reach,
and hope that it isn't reached.
What is commonly done to deal with absolute minimum or maximum values,
is to set a value that is larger than the script author ever expects
the input to reach, and hope that it isn't reached.
Guido has brought up [2]_ the fact that there exists two constants
that can be used in the interim: sys.maxint and floating point
positive infinity (1e309 will evaluate to positive infinity).
However, each has their drawbacks. On most architectures sys.maxint
is arbitrarily small (2**31-1 or 2**63-1), and can be easily eclipsed
by large 'long' integers or floating point numbers. Furthermore,
comparing long integers larger than the largest floating point number
representable against any float will result in an exception being
raised::
that can be used in the interim for maximum values: sys.maxint and
floating point positive infinity (1e309 will evaluate to positive
infinity). However, each has their drawbacks.
- On most architectures sys.maxint is arbitrarily small (2**31-1 or
2**63-1) and can be easily eclipsed by large 'long' integers or
floating point numbers.
- Comparing long integers larger than the largest floating point
number representable against any float will result in an exception
being raised::
>>> cmp(1.0, 10**309)
Traceback (most recent call last):
@ -64,19 +67,26 @@ Even when large integers are compared against positive infinity::
File "<stdin>", line 1, in ?
OverflowError: long int too large to convert to float
Introducing an ``All`` built-in that works as described does not take
much effort. A sample Python implementation is included.
- These same drawbacks exist when numbers are small.
Introducing ``high`` and ``low`` attributes to ``cmp`` that work as
described does not take much effort. A sample Python `reference
implementation`_ of both attributes is included.
Motivation
==========
``cmp.high`` Examples
---------------------
There are hundreds of algorithms that begin by initializing some set
of values to a logical (or numeric) infinity. Python lacks a positive
infinity that works consistently or really is the largest value that
can be attained. By adding the ``All`` constant (or built-in), Python
would have a real maximum value, and such algorithms can become
clearer due to the reduction of special cases.
of values to a logical (or numeric) infinity or negative infinity.
Python lacks either infinity that works consistently or really is the
most extreme value that can be attained. By adding the ``cmp.high``
and ``cmp.low`` attributes, Python would have a real maximum and
minimum value, and such algorithms can become clearer due to the
reduction of special cases.
Take for example, finding the minimum in a sequence::
@ -91,6 +101,8 @@ Take for example, finding the minimum in a sequence::
cur = min(cur, obj)
return cur
::
def findmin_None(seq):
cur = None
for obj in seq:
@ -100,34 +112,211 @@ Take for example, finding the minimum in a sequence::
return cur
return cur
def findmin_All(seq):
cur = All
::
def findmin_high(seq):
cur = cmp.high
for obj in seq:
cur = min(obj, cur)
return cur
Please note that there are an arbitrarily large number of ways to find
the minimum (or maximum) of a sequence, these seek to show a simple
example where using ``cmp.high`` makes the algorithm easier to
understand and results in simplification of code.
Guido brought up the idea of just negating everything and comparing
[2]_. Certainly this does work when using numbers, but it does not
remove the special case (actually adds one) and results in the code
being less readable. ::
#we have All available
#we have cmp.high available
a = min(a, b)
#we don't have All available
#we don't have cmp.high available
if a is not None:
if b is None:
a = b
else:
a = -max(-a, -b)
As another example, in Dijkstra's shortest path algorithm on a graph
with weighted edges (all positive).
Implementation
==============
1. Set distances to every node in the graph to infinity.
2. Set the distance to the start node to zero.
3. Set visited to be an empty mapping.
4. While shortest distance of a node that has not been visited is less
than infinity and the destination has not been visited.
a. Get the node with the shortest distance.
b. Visit the node.
c. Update neighbor distances and parent pointers if necessary for
neighbors that have not been visited.
5. If the destination has been visited, step back through parent
pointers to find the reverse of the path to be taken.
To be complete, below are two versions of the algorithm, one using a
table (a bit more understandable) and one using a heap (much faster)::
def DijkstraSP_table(graph, S, T):
#runs in O(N^2) time using a table
#find the shortest path
table = {}
for node in graph.iterkeys():
#(visited, distance, node, parent)
table[node] = (0, cmp.high, node, None)
table[S] = (0, 0, S, None)
cur = min(table.values())
while (not cur[0]) and cur[1] < cmp.high:
(visited, distance, node, parent) = cur
table[node] = (1, distance, node, parent)
for cdist, child in graph[node]:
ndist = distance+cdist
if not table[child][0] and ndist < table[child][1]:
table[child] = (0, ndist, child, node)
cur = min(table.values())
#backtrace through results
if not table[T][0]:
return None
cur = T
path = [T]
while table[cur][3] is not None:
path.append(table[cur][3])
cur = path[-1]
path.reverse()
return path
::
class _AllType(object):
def DijkstraSP_heap(graph, S, T):
#runs in O(NlgN) time using a minheap
#find the shortest path
import heapq
Q = [(cmp.high, i, None) for i in graph.iterkeys()]
heapq.heappush(Q, (0, S, None))
V = {}
while Q[0][0] < cmp.high and T not in V:
dist, node, parent = heapq.heappop(Q)
if node in V:
continue
V[node] = (dist, parent)
for next, dest in graph[node]:
heapq.heappush(Q, (next+dist, dest, node))
#backtrace through results
if T not in V:
return None
cur = T
path = [T]
while V[cur][1] is not None:
path.append(V[cur][1])
cur = path[-1]
path.reverse()
return path
Readers should note that replacing ``cmp.high`` in the above code with
an arbitrarily large number does not guarantee that the actual
distance to a node will never exceed that number. Well, with one
caveat: one could certainly sum up the weights of every edge in the
graph, and set the 'arbitrarily large number' to that total. However,
doing so does not make the algorithm any easier to understand and has
potential problems with various numeric overflows.
A ``cmp.low`` Example
---------------------
An example of usage for ``cmp.low`` is an algorithm that solves the
following problem [7]_:
Suppose you are given a directed graph, representing a
communication network. The vertices are the nodes in the network,
and each edge is a communication channel. Each edge ``(u, v)`` has
an associated value ``r(u, v)``, with ``0 <= r(u, v) <= 1``, which
represents the reliability of the channel from ``u`` to ``v``
(i.e., the probability that the channel from ``u`` to ``v`` will
**not** fail). Assume that the reliability probabilities of the
channels are independent. (This implies that the reliability of
any path is the product of the reliability of the edges along the
path.) Now suppose you are given two nodes in the graph, ``A``
and ``B``.
Such an algorithm is a 7 line modification to the DijkstraSP_table
algorithm given above::
#only showing the changed to lines with the proper indentation
table[node] = (0, cmp.low, node, None)
table[S] = (0, 1, S, None)
cur = max(table.values())
while (not cur[0]) and cur[1] > cmp.low:
ndist = distance*cdist
if not table[child][0] and ndist > table[child][1]:
cur = max(table.values())
Or a 6 line modification to the DijkstraSP_heap algorithm given above
(if we assume that ``maxheapq`` exists and does what it is supposed
to)::
#only showing the changed to lines with the proper indentation
import maxheapq
Q = [(cmp.low, i, None) for i in graph.iterkeys()]
maxheapq.heappush(Q, (1, S, None))
while Q[0][0] > cmp.low and T not in V:
dist, node, parent = maxheapq.heappop(Q)
maxheapq.heappush(Q, (next*dist, dest, node))
Note that there is an equivalent way of translating the graph to
produce something that can be passed unchanged into the original
Dijkstra shortest path algorithm.
Further usage examples of both ``cmp.high`` and ``cmp.low`` are
available in the realm of graph algorithms.
Independent Implementations?
----------------------------
Independent implementations of the top/bottom concept by users
desiring such functionality are not likely to be compatible, and
certainly will be inconsistent. The following examples seek to show
how inconsistent they can be.
- Let us pretend we have created proper separate implementations of
Myhigh, Mylow, Yourhigh and Yourlow with the same code as given in
the sample implementation (with some minor renaming)::
>>> lst = [Yourlow, Mylow, Mylow, Yourlow, Myhigh, Yourlow,
Myhigh, Yourhigh, Myhigh]
>>> lst.sort()
>>> lst
[Yourlow, Yourlow, Mylow, Mylow, Yourlow, Myhigh, Myhigh,
Yourhigh, Myhigh]
Notice that while all the "low"s are before the "high"s, there is no
guarantee that all instances of Yourlow will come before Mylow, the
reverse, or the equivalent Myhigh and Yourhigh.
- The problem is evident even when using the heapq module::
>>> lst = [Yourlow, Mylow, Mylow, Yourlow, Myhigh, Yourlow,
Myhigh, Yourhigh, Myhigh]
>>> heapq.heapify(lst) #not needed, but it can't hurt
>>> while lst: print heapq.heappop(lst),
...
Yourlow Mylow Yourlow Yourlow Mylow Myhigh Myhigh Yourhigh Myhigh
- Furthermore, the findmin_high code and both versions of Dijkstra
could result in incorrect output by passing in secondary versions of
high.
Reference Implementation
========================
::
class _HighType(object):
def __cmp__(self, other):
if isinstance(other, self.__class__):
@ -135,38 +324,102 @@ Implementation
return 1
def __repr__(self):
return 'All'
return 'cmp.high'
import sys
sys.modules['__builtin__'].All = _AllType()
class _LowType(object):
Results of Test Run in Python 2.3.3::
def __cmp__(self, other):
if isinstance(other, self.__class__):
return 0
return -1
>>> max(All, 2**65536)
All
>>> min(All, 2**65536)
def __repr__(self):
return 'cmp.low'
# please note that the following code doesn't
# work due to built-ins being read-only
cmp.high = _HighType()
cmp.low = _LowType()
Results of Test Run if we could set cmp.high and cmp.low::
>>> max(cmp.high, 2**65536)
cmp.high
>>> min(cmp.high, 2**65536)
20035299304068464649790...
(lines removed for brevity)
...72339445587895905719156736L
>>> max(All, None)
All
>>> min(All, None)
>>> print min(All, None)
None
>>> bool(All)
True
>>> not None
1
>>> not All
0
>>> min(cmp.low, -2**65536)
cmp.low
>>> max(cmp.low, -2**65536)
-2003529930406846464979...
(lines removed for brevity)
...072339445587895905719156736L
Open Issues
===========
``All`` seemed to be an awkward object to various Python developers on
the python-dev mailing list. The author hopes that with this updated
PEP, developers will no longer find ``All`` awkward.
- Previously, ``Some`` and ``All`` were names for the idea that
``cmp.high`` now represents. They have been subsumed by
``cmp.high`` due to the relative ambiguity of ``Some`` and ``All``.
- Terry Reedy [5]_ and others have offered alternate names for the
``high/low`` objects: ``ObjHi/ObjLo``, ``NoneHi/NoneLo``,
``Highest/Lowest``, ``Infinity/NegativeInfinity``, ``hi/lo`` and
``High/Low``.
- Terry Reedy has also offered possible default behaviors of ``min``
and ``max`` on empty lists using these values. Some have voiced
that changing the behavior of min and max are not desirable due to
the amount of code that actively uses min and max, which may rely on
min and max raising exceptions on empty lists.
- Choosing ``high`` and ``low`` to be the attributes of ``cmp`` is
arbitrary, but meaningful. Other meaningful parent locations
include, but are not limited to: ``math``, ``int`` and ``number``
(if such a numeric superclass existed). ``sys`` probably does not
make sense, as such maximum and minimum values are not platform
dependent.
- The base class of the high and low objects do not necessarily have
to be ``object``. ``object``, ``NoneType`` or even a new class
called ``cmp.extreme`` have been suggested.
- Various built-in names such as ``All`` and ``Some`` have been
rejected by many users. Based on comments, it seems that regardless
of name, any new built-in would be rejected. [6]_
- Should ``-cmp.high == cmp.low``? This seems to make logical sense.
- Certainly ``bool(cmp.high) == True``, but should ``bool(cmp.low)``
be ``True`` or ``False``? Due to ``bool(1) == bool(-1) == True``,
it seems to follow that ``bool(cmp.high) == bool(cmp.low) == True``.
- Whatever name the concepts of a top and bottom value come to have,
the question of whether or not math can be done on them may or may
not make sense. If math were not allowed, it brings up a potential
ambiguity that while ``-cmp.high == cmp.low``, ``-1 * cmp.high``
would produce an exception.
Most-Preferred Options
======================
Through a non-rigorous method, the following behavior of the objects
seem to be preferred by those who are generally in favor of this PEP
in python-dev.
- ``high`` and ``low`` objects should be attached to the ``cmp``
built-in as ``cmp.high`` and ``cmp.low`` (or ``cmp.High/cmp.Low``).
- ``-cmp.high == cmp.low`` and equivalently ``-cmp.low == cmp.high``.
- ``bool(cmp.high) == bool(cmp.low) == True``
- The base type seems to be a cosmetic issue and has not resulted in
any real preference other than ``cmp.extreme`` making the most
sense.
References
@ -181,25 +434,58 @@ References
.. [3] [Python-Dev] Re: Got None. Maybe Some?, Reedy, Terry
(http://mail.python.org/pipermail/python-dev/2003-December/041337.html)
.. [4] RE: [Python-Dev] Got None. Maybe Some?, Peters, Tim
(http://mail.python.org/pipermail/python-dev/2003-December/041332.html)
.. [5] [Python-Dev] Re: PEP 326 now online, Reedy, Terry
(http://mail.python.org/pipermail/python-dev/2004-January/041685.html)
.. [6] [Python-Dev] PEP 326 now online, Chermside, Michael
(http://mail.python.org/pipermail/python-dev/2004-January/041704.html)
.. [7] Homework 6, Problem 7, Dillencourt, Michael
(link may not be valid in the future)
(http://www.ics.uci.edu/~dillenco/ics161/hw/hw6.pdf)
Changes
=======
Added this section.
- Added this section.
Renamed ``Some`` to ``All``: Some was an arbitrary name that suffered
from being unclear. [3]_
- Renamed ``Some`` to ``All``: "Some" was an arbitrary name that
suffered from being unclear. [3]_
Made ``All`` a subclass of object in order for it to become a new
style class.
- Made ``All`` a subclass of ``object`` in order for it to become a
new-style class.
Removed mathematical negation and casting to float in Open Issues.
None is not a number and is not treated as one, ``All`` shouldn't be
either.
- Removed mathematical negation and casting to float in Open Issues.
``None`` is not a number and is not treated as one, ``All``
shouldn't be either.
Added Motivation section.
- Added Motivation_ section.
Changed markup to reStructuredText.
- Changed markup to reStructuredText.
- Renamed ``All`` to ``cmp.hi`` to remove builtin requirement and to
provide a better better name, as well as adding an equivalent
future-proof bottom value ``cmp.lo``. [5]_
- Clarified Abstract_, Motivation_, `Reference Implementation`_ and
`Open Issues`_ based on the simultaneous concepts of ``cmp.hi`` and
``cmp.lo``.
- Added two implementations of Dijkstra's Shortest Path algorithm that
show where ``cmp.hi`` can be used to remove special cases.
- Renamed ``hi`` to ``high`` and ``lo`` to ``low`` to address concerns
for non-native english speakers.
- Added an example of use for ``cmp.low`` to Motivation_.
- Added a couple `Open Issues`_ and clarified some others.
- Added `Most-Preferred Options`_ section.
Copyright
@ -213,5 +499,5 @@ This document has been placed in the public domain.
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 74
fill-column: 70
End: