PEP: 326
Title: A Case for Top and Bottom Values
Version: $Revision$
Last-Modified: $Date$
Author: Josiah Carlson <jcarlson@uci.edu>,
        Terry Reedy <tjreedy@udel.edu>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Dec-2003
Python-Version: 2.4
Post-History: 20-Dec-2003, 03-Jan-2004, 05-Jan-2004, 07-Jan-2004


Abstract
========

This PEP proposes two singleton constants that represent a top and
bottom [3]_ value: ``Max`` and ``Min`` (or two similarly suggestive
names [4]_; see `Open Issues`_).

As suggested by their names, ``Max`` and ``Min`` would compare higher
or lower than any other object (respectively).  Such behavior results
in easier-to-understand code and fewer special cases wherever a
temporary minimum or maximum value is required and the actual minimum
or maximum numeric value is unbounded.


Rationale
=========

While ``None`` can be used as an absolute minimum that any value can
attain [1]_, this may be deprecated [4]_ in Python 3.0, and shouldn't
be relied upon.

As a replacement for ``None`` being used as an absolute minimum, as
well as the introduction of an absolute maximum, the introduction of
two singleton constants ``Max`` and ``Min`` addresses the concern that
such constants be self-documenting.

What is commonly done to deal with absolute minimum or maximum values
is to set a value that is larger than the script author ever expects
the input to reach, and hope that it isn't reached.

Guido has brought up [2]_ the fact that there exist two constants
that can be used in the interim for maximum values: ``sys.maxint`` and
floating point positive infinity (``1e309`` will evaluate to positive
infinity).  However, each has its drawbacks.

- On most architectures sys.maxint is arbitrarily small (2**31-1 or
  2**63-1) and can be easily eclipsed by large 'long' integers or
  floating point numbers.

- Comparing long integers larger than the largest floating point
  number representable against any float will result in an exception
  being raised::

        >>> cmp(1.0, 10**309)
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        OverflowError: long int too large to convert to float

  Even when large integers are compared against positive infinity::

        >>> cmp(1e309, 10**309)
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        OverflowError: long int too large to convert to float

- These same drawbacks exist when numbers are small.
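
The first drawback can be illustrated with a short sketch.  It assumes
a later Python in which ``sys.maxsize`` plays the role of
``sys.maxint``; the helper name ``findmin_sentinel`` is hypothetical
and not part of this proposal::

    import sys

    def findmin_sentinel(seq, big=sys.maxsize):
        # 'big' stands in for "larger than any input we expect"
        cur = big
        for obj in seq:
            cur = min(obj, cur)
        return cur

    values = [2**70, 2**80]   # every element exceeds sys.maxsize
    result = findmin_sentinel(values)
    # 'result' is the sentinel itself, which never appeared in 'values'

Because arbitrarily large integers can exceed the sentinel, the
function silently returns a value that was never in its input.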

Introducing ``Max`` and ``Min`` that work as described above does not
take much effort.  A sample Python `reference implementation`_ of both
is included.


Motivation
==========

There are hundreds of algorithms that begin by initializing some set
of values to a logical (or numeric) infinity or negative infinity.
Python lacks an infinity that either works consistently or really is
the most extreme value that can be attained.  By adding ``Max`` and
``Min``, Python would have a real maximum and minimum value, and such
algorithms can become clearer due to the reduction of special cases.

``Max`` Examples
---------------------

Take, for example, finding the minimum in a sequence::

    def findmin_Num(seq):
        BIG = 0
        cur = BIG
        for obj in seq:
            if cur == BIG:
                cur = obj
                BIG = max(cur, BIG) + 1
            else:
                cur = min(cur, obj)
        return cur

::

    def findmin_None(seq):
        cur = None
        for obj in seq:
            if obj < cur or (cur is None):
                cur = obj
                if cur is None:
                    return cur
        return cur

::

    def findmin_Max(seq):
        cur = Max
        for obj in seq:
            cur = min(obj, cur)
        return cur

Please note that there are arbitrarily many ways to find the minimum
(or maximum) of a sequence; these examples seek to show a simple case
where using ``Max`` makes the algorithm easier to understand and
simplifies the code.

Guido brought up the idea of just negating everything and comparing
[2]_.  Certainly this does work when using numbers, but it does not
remove the special case (actually adds one) and results in the code
being less readable. ::

    #we have Max available
    a = min(a, b)

    #we don't have Max available
    if a is not None:
        if b is None:
            a = b
        else:
            a = -max(-a, -b)

As another example, consider Dijkstra's shortest path algorithm on a
graph with weighted edges (all positive):

1. Set distances to every node in the graph to infinity.
2. Set the distance to the start node to zero.
3. Set visited to be an empty mapping.
4. While the shortest distance of a node that has not been visited is
   less than infinity and the destination has not been visited:

   a. Get the node with the shortest distance.
   b. Visit the node.
   c. Update neighbor distances and parent pointers if necessary for
      neighbors that have not been visited.

5. If the destination has been visited, step back through parent
   pointers to find the reverse of the path to be taken.

To be complete, below are two versions of the algorithm, one using a
table (a bit more understandable) and one using a heap (much faster)::

    def DijkstraSP_table(graph, S, T):
        #runs in O(N^2) time using a table
        #find the shortest path
        table = {}
        for node in graph.iterkeys():
            #(visited, distance, node, parent)
            table[node] = (0, Max, node, None)
        table[S] = (0, 0, S, None)
        cur = min(table.values())
        while (not cur[0]) and cur[1] < Max:
            (visited, distance, node, parent) = cur
            table[node] = (1, distance, node, parent)
            for cdist, child in graph[node]:
                ndist = distance+cdist
                if not table[child][0] and ndist < table[child][1]:
                    table[child] = (0, ndist, child, node)
            cur = min(table.values())
        #backtrace through results
        if not table[T][0]:
            return None
        cur = T
        path = [T]
        while table[cur][3] is not None:
            path.append(table[cur][3])
            cur = path[-1]
        path.reverse()
        return path

::

    def DijkstraSP_heap(graph, S, T):
        #runs in O(NlgN) time using a minheap
        #find the shortest path
        import heapq
        Q = [(Max, i, None) for i in graph.iterkeys()]
        heapq.heappush(Q, (0, S, None))
        V = {}
        while Q[0][0] < Max and T not in V:
            dist, node, parent = heapq.heappop(Q)
            if node in V:
                continue
            V[node] = (dist, parent)
            for next, dest in graph[node]:
                heapq.heappush(Q, (next+dist, dest, node))
        #backtrace through results
        if T not in V:
            return None
        cur = T
        path = [T]
        while V[cur][1] is not None:
            path.append(V[cur][1])
            cur = path[-1]
        path.reverse()
        return path

Readers should note that replacing ``Max`` in the above code with an
arbitrarily large number does not guarantee that the shortest path
distance to a node will never exceed that number.  Well, with one
caveat: one could certainly sum up the weights of every edge in the
graph, and set the 'arbitrarily large number' to that total.  However,
doing so does not make the algorithm any easier to understand and has
potential problems with numeric overflows.


A ``Min`` Example
-----------------

An example of usage for ``Min`` is an algorithm that solves the
following problem [6]_:

    Suppose you are given a directed graph, representing a
    communication network.  The vertices are the nodes in the network,
    and each edge is a communication channel. Each edge ``(u, v)`` has
    an associated value ``r(u, v)``, with ``0 <= r(u, v) <= 1``, which
    represents the reliability of the channel from ``u`` to ``v``
    (i.e., the probability that the channel from ``u`` to ``v`` will
    **not** fail).  Assume that the reliability probabilities of the
    channels are independent.  (This implies that the reliability of
    any path is the product of the reliability of the edges along the
    path.)  Now suppose you are given two nodes in the graph, ``A``
    and ``B``.

Such an algorithm is a 7-line modification to the DijkstraSP_table
algorithm given above::

    #only showing the changed lines with the proper indentation
            table[node] = (0, Min, node, None)
        table[S] = (0, 1, S, None)
        cur = max(table.values())
        while (not cur[0]) and cur[1] > Min:
                ndist = distance*cdist
                if not table[child][0] and ndist > table[child][1]:
            cur = max(table.values())

Or a 6-line modification to the DijkstraSP_heap algorithm given above
(if we assume that ``maxheapq`` exists and does what it is supposed
to)::

    #only showing the changed lines with the proper indentation
        import maxheapq
        Q = [(Min, i, None) for i in graph.iterkeys()]
        maxheapq.heappush(Q, (1, S, None))
        while Q[0][0] > Min and T not in V:
            dist, node, parent = maxheapq.heappop(Q)
                maxheapq.heappush(Q, (next*dist, dest, node))

Note that there is an equivalent way of translating the graph to
produce something that can be passed unchanged into the original
Dijkstra shortest path algorithm.
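
One such translation, sketched here under the assumption that it is
the one intended, replaces each reliability ``r`` with the weight
``-log(r)``, so that multiplying reliabilities becomes adding
non-negative weights and the most reliable path becomes the shortest
one.  The name ``to_additive_graph`` and the sample graph are
hypothetical; the graph layout (a mapping from each node to
``(weight, child)`` pairs) follows the code above, and the sketch
assumes a later Python that provides ``math.inf``::

    import math

    def to_additive_graph(graph):
        # Map each reliability r in [0, 1] to the weight -log(r);
        # a reliability of 0 becomes an infinitely long edge.
        converted = {}
        for node, edges in graph.items():
            converted[node] = [
                (-math.log(r) if r > 0 else math.inf, child)
                for r, child in edges
            ]
        return converted

    graph = {
        "A": [(0.9, "B"), (0.5, "C")],
        "B": [(0.8, "C")],
        "C": [],
    }
    additive = to_additive_graph(graph)

    # A -> B -> C: weights add where reliabilities multiply
    via_b = additive["A"][0][0] + additive["B"][0][0]
    # A -> C directly
    direct = additive["A"][1][0]

The reliability of a path is recovered from its total weight ``w`` as
``math.exp(-w)``; in the sample graph the route through ``B`` has the
smaller weight and therefore the higher reliability.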


Other Examples
--------------

Andrew P. Lentvorski, Jr. [7]_ has pointed out that various data
structures involving range searching have immediate use for ``Max``
and ``Min`` values.  More specifically, segment trees, range trees,
k-d trees and database keys:

    ...The issue is that a range can be open on one side and does not
    always have an initialized case.

    The solutions I have seen are to either overload None as the
    extremum or use an arbitrary large magnitude number.  Overloading
    None means that the built-ins can't really be used without special
    case checks to work around the undefined (or "wrongly defined")
    ordering of None.  These checks tend to swamp the nice performance
    of built-ins like max() and min().

    Choosing a large magnitude number throws away the ability of
    Python to cope with arbitrarily large integers and introduces a
    potential source of overrun/underrun bugs.

Further use examples of both ``Max`` and ``Min`` are available in the
realm of graph algorithms, range searching algorithms, computational
geometry algorithms, and others.


Independent Implementations?
----------------------------

Independent implementations of the ``Min``/``Max`` concept by users
desiring such functionality are not likely to be compatible, and
certainly will produce inconsistent orderings.  The following examples
seek to show how inconsistent they can be.

- Let us pretend we have created proper separate implementations of
  MyMax, MyMin, YourMax and YourMin with the same code as given in
  the sample implementation (with some minor renaming)::

    >>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
    YourMax, MyMax]
    >>> lst.sort()
    >>> lst
    [YourMin, YourMin, MyMin, MyMin, YourMin, MyMax, MyMax, YourMax,
    MyMax]

  Notice that while all the "Min"s are before the "Max"s, there is no
  guarantee that all instances of YourMin will come before MyMin, the
  reverse, or the equivalent MyMax and YourMax.

- The problem is also evident when using the heapq module::

    >>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
    YourMax, MyMax]
    >>> heapq.heapify(lst)  #not needed, but it can't hurt
    >>> while lst: print heapq.heappop(lst),
    ...
    YourMin MyMin YourMin YourMin MyMin MyMax MyMax YourMax MyMax

- Furthermore, the findmin_Max code and both versions of Dijkstra
  could result in incorrect output by passing in secondary versions of
  ``Max``.
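
The root of the inconsistency can be sketched as follows, assuming a
later Python with rich comparison methods rather than ``__cmp__``.
The class ``IndependentMax`` is hypothetical; it stands for any
user-written top value that knows nothing about other
implementations::

    class IndependentMax:
        # Claims to be greater than every object other than itself.
        def __init__(self, name):
            self._name = name

        def __eq__(self, other):
            return other is self

        def __hash__(self):
            return id(self)

        def __gt__(self, other):
            return other is not self

        def __ge__(self, other):
            return True

        def __lt__(self, other):
            return False

        def __le__(self, other):
            return other is self

        def __repr__(self):
            return self._name

    MyMax = IndependentMax("MyMax")
    YourMax = IndependentMax("YourMax")

    # Each object reports that it is greater than the other.
    both_greater = (MyMax > YourMax) and (YourMax > MyMax)

    # The "largest" element therefore depends on argument order.
    first = max([MyMax, YourMax])
    second = max([YourMax, MyMax])

With a single shared pair of constants there is only one top value, so
this ambiguity cannot arise.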


Reference Implementation
========================

::

    class _ExtremeType(object):

        def __init__(self, cmpr, rep):
            object.__init__(self)
            self._cmpr = cmpr
            self._rep = rep

        def __cmp__(self, other):
            if isinstance(other, self.__class__) and\
               other._cmpr == self._cmpr:
                return 0
            return self._cmpr

        def __repr__(self):
            return self._rep

    Max = _ExtremeType(1, "Max")
    Min = _ExtremeType(-1, "Min")

Results of Test Run::

    >>> max(Max, 2**65536)
    Max
    >>> min(Max, 2**65536)
    20035299304068464649790...
    (lines removed for brevity)
    ...72339445587895905719156736L
    >>> min(Min, -2**65536)
    Min
    >>> max(Min, -2**65536)
    -2003529930406846464979...
    (lines removed for brevity)
    ...072339445587895905719156736L
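
The ``__cmp__`` hook used above belongs to the Python of this PEP's
era.  As a sketch only, and not part of the proposal, the same idea
could be expressed with rich comparison methods, assuming a later
Python that provides ``functools.total_ordering``::

    import functools

    @functools.total_ordering
    class _ExtremeType(object):

        def __init__(self, cmpr, rep):
            self._cmpr = cmpr   # 1 for the top value, -1 for the bottom
            self._rep = rep

        def __eq__(self, other):
            return (isinstance(other, _ExtremeType)
                    and other._cmpr == self._cmpr)

        def __hash__(self):
            return hash((_ExtremeType, self._cmpr))

        def __lt__(self, other):
            # Only the bottom value is less than a different object.
            if self == other:
                return False
            return self._cmpr < 0

        def __repr__(self):
            return self._rep

    Max = _ExtremeType(1, "Max")
    Min = _ExtremeType(-1, "Min")

    ordered = sorted([3, Max, Min, 1.5])
    smallest = min(Max, 2**65536)

``total_ordering`` derives the remaining comparisons from ``__eq__``
and ``__lt__``.  Comparisons written with the number on the left, such
as ``3 < Max``, fall back to the reflected method on the extreme value
because the numeric types do not know how to compare against it.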


Open Issues
===========

Current options for the naming and namespace for ``Min``/``Max``, in
no particular order:

1. Give the built-in ``max`` and ``min`` appropriate ``__cmp__``
   methods to allow them to double as ``Min``/``Max``.
2. Attach them to attributes of the ``cmp()`` built-in.
3. Attach them to attributes of an appropriate type object.
4. Make them an appropriate module object.
5. Create two new built-ins with appropriate names.


References
==========

.. [1] RE: [Python-Dev] Re: Got None. Maybe Some?, Peters, Tim
   (http://mail.python.org/pipermail/python-dev/2003-December/041374.html)

.. [2] Re: [Python-Dev] Got None. Maybe Some?, van Rossum, Guido
   (http://mail.python.org/pipermail/python-dev/2003-December/041352.html)

.. [3] RE: [Python-Dev] Got None. Maybe Some?, Peters, Tim
   (http://mail.python.org/pipermail/python-dev/2003-December/041332.html)

.. [4] [Python-Dev] Re: PEP 326 now online, Reedy, Terry
   (http://mail.python.org/pipermail/python-dev/2004-January/041685.html)

.. [5] [Python-Dev] PEP 326 now online, Chermside, Michael
   (http://mail.python.org/pipermail/python-dev/2004-January/041704.html)

.. [6] Homework 6, Problem 7, Dillencourt, Michael
   (link may not be valid in the future)
   (http://www.ics.uci.edu/~dillenco/ics161/hw/hw6.pdf)

.. [7] RE: [Python-Dev] PEP 326 now online, Lentvorski, Andrew P., Jr.
   (http://mail.python.org/pipermail/python-dev/2004-January/041727.html)

.. [8] Re: It's not really Some is it?, Ippolito, Bob
   (http://www.livejournal.com/users/chouyu_31/138195.html?thread=274643#t274643)

Changes
=======

- Added this section.

- Added Motivation_ section.

- Changed markup to reStructuredText.

- Concept gets a possible name and location. [5]_

- Clarified Abstract_, Motivation_, `Reference Implementation`_ and
  `Open Issues`_ based on the simultaneous concepts of ``Max`` and
  ``Min``.

- Added two implementations of Dijkstra's Shortest Path algorithm that
  show where ``Max`` can be used to remove special cases.

- Added an example of use for ``Min`` to Motivation_.

- Added some `Open Issues`_ and clarified some others.

- Added an example and `Other Examples`_ subheading.

- Modified `Reference Implementation`_ to instantiate both items from
  a single class/type.

- Removed a large number of open issues that are not within the scope
  of this PEP.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: