PEP: 211
Title: Adding New Linear Algebra Operators to Python
Version: $Revision$
Owner: gvwilson@nevex.com (Greg Wilson)
Python-Version: 2.1
Created: 15-Jul-2000
Status: Draft
Post-History:


Introduction

    This PEP describes a proposal to add linear algebra operators to
    Python 2.0.  It discusses why such operators are desirable, and
    alternatives that have been considered and discarded.  This PEP
    summarizes discussions held in mailing list forums, and provides
    URLs for further information, where appropriate.  The CVS revision
    history of this file contains the definitive historical record.


Proposal

    Add a single new infix binary operator '@' ("across"), and
    corresponding special methods "__across__()" and "__racross__()".
    This operator will perform mathematical matrix multiplication on
    NumPy arrays, and generate cross-products when applied to built-in
    sequence types.  No existing operator definitions will be changed.


Background

    Computers were invented to do arithmetic, as was the first
    high-level programming language, Fortran.  While Fortran was a
    great advance on its machine-level predecessors, there was still a
    very large gap between its syntax and the notation used by
    mathematicians.  The most influential effort to close this gap was
    APL [1]:

        The language [APL] was invented by Kenneth E. Iverson while at
        Harvard University. The language, originally titled "Iverson
        Notation", was designed to overcome the inherent ambiguities
        and points of confusion found when dealing with standard
        mathematical notation. It was later described in 1962 in a
        book simply titled "A Programming Language" (hence APL).
        Towards the end of the sixties, largely through the efforts of
        IBM, the computer community gained its first exposure to
        APL. Iverson received the Turing Award in 1980 for this work.

    APL's operators supported both familiar algebraic operations, such
    as vector dot product and matrix multiplication, and a wide range
    of structural operations, such as stitching vectors together to
    create arrays.  Its notation was exceptionally cryptic: many of
    its symbols did not exist on standard keyboards, and expressions
    had to be read right to left.

    Most subsequent work on numerical languages, such as Fortran-90,
    MATLAB, and Mathematica, has tried to provide the power of APL
    without the obscurity.  Python's NumPy [2] has most of the
    features that users of such languages expect, but these are
    provided through named functions and methods, rather than
    overloaded operators.  This makes NumPy clumsier than its
    competitors.

    One way to make NumPy more competitive is to provide greater
    syntactic support in Python itself for linear algebra.  This
    proposal therefore examines the requirements that new linear
    algebra operators in Python must satisfy, and proposes a syntax
    and semantics for those operators.


Requirements

    The most important requirement is that there be minimal impact on
    the existing definition of Python.  The proposal must not break
    existing programs, except possibly those that use NumPy.

    The second most important requirement is to be able to do both
    elementwise and mathematical matrix multiplication using infix
    notation.  The nine cases that must be handled are:

        |5 6| *   9   = |45 54|      MS: matrix-scalar multiplication
        |7 8|           |63 72|

          9   * |5 6| = |45 54|      SM: scalar-matrix multiplication
                |7 8|   |63 72|

        |2 3| * |4 5| = |8 15|       VE: vector elementwise multiplication


        |2 3| *  |4|  =   23         VD: vector dot product
                 |5|

         |2|  * |4 5| = | 8 10|      VO: vector outer product
         |3|            |12 15|

        |1 2| * |5 6| = | 5 12|      ME: matrix elementwise multiplication
        |3 4|   |7 8|   |21 32|

        |1 2| * |5 6| = |19 22|      MM: mathematical matrix multiplication
        |3 4|   |7 8|   |43 50|

        |1 2| * |5 6| = |19 22|      VM: vector-matrix multiplication
                |7 8|

        |5 6| *  |1|  =   |17|       MV: matrix-vector multiplication
        |7 8|    |2|      |23|

    Note that 1-dimensional vectors are treated as rows in VM, as
    columns in MV, and as both in VD and VO.  Both are special cases
    of 2-dimensional matrices (Nx1 and 1xN respectively).  It may
    therefore be reasonable to define the new operator only for
    2-dimensional arrays, and provide an easy (and efficient) way for
    users to convert 1-dimensional structures to 2-dimensional.
    Behavior of a new multiplication operator for built-in types may
    then:

    (a) be a parsing error (possible only if a constant is one of the
        arguments, since names are untyped in Python);

    (b) generate a runtime error; or

    (c) be derived by plausible extension from its behavior in the
        two-dimensional case.

    Third, syntactic support should be considered for three other
    operations:

                         T
    (a) transposition:  A   => A[j, i] for A[i, j]

                         -1
    (b) inverse:        A   => A' such that A' * A = I (the identity matrix)

    (c) solution:       A/b => x  such that A * x = b
                        A\b => x  such that x * A = b

    With regard to (c), it is worth noting that the two syntaxes used
    were invented by programmers, not mathematicians.  Mathematicians
    do not have a standard, widely-used notation for matrix solution.

    It is also worth noting that dozens of matrix inversion and
    solution algorithms are widely used.  MATLAB and its kin bind
    their inversion and/or solution operators to one which is
    reasonably robust in most cases, and require users to call
    functions or methods to access others.

    Fourth, confusion between Python's notation and those of MATLAB
    and Fortran-90 should be avoided.  In particular, mathematical
    matrix multiplication (case MM) should not be represented as '.*',
    since:

    (a) MATLAB uses prefix-'.' forms to mean 'elementwise', and raw
        forms to mean "mathematical" [4]; and

    (b) even if the Python parser can be taught how to handle dotted
        forms, '1.*A' will still be visually ambiguous [4].

    One anti-requirement is that new operators are not needed for
    addition, subtraction, bitwise operations, and so on, since
    mathematicians already treat them elementwise.


Proposal:

    The meanings of all existing operators will be unchanged.  In
    particular, 'A*B' will continue to be interpreted elementwise.
    This takes care of the cases MS, SM, VE, and ME, and ensures
    minimal impact on existing programs.

    A new operator '@' (pronounced "across") will be added to Python,
    along with two special methods, "__across__()" and
    "__racross__()", with the usual semantics.

    NumPy will overload "@" to perform mathematical multiplication of
    arrays where shapes permit, and to throw an exception otherwise.
    The matrix class's implementation of "@" will treat built-in
    sequence types as if they were column vectors.  This takes care of
    the cases MM and MV.

    An attribute "T" will be added to the NumPy array type, such that
    "m.T" is:

    (a) the transpose of "m" for a 2-dimensional array

    (b) the 1xN matrix transpose of "m" if "m" is a 1-dimensional
        array; or

    (c) a runtime error for an array with rank >= 3.

    This attribute will alias the memory of the base object.  NumPy's
    "transpose()" function will be extended to turn built-in sequence
    types into row vectors.  This takes care of the VM, VD, and VO
    cases.  We propose an attribute because:

    (a) the resulting notation is similar to the 'superscript T' (at
        least, as similar as ASCII allows), and

    (b) it signals that the transposition aliases the original object.

    No new operators will be defined to mean "solve a set of linear
    equations", or "invert a matrix".  Instead, NumPy will define a
    value "inv", which will be recognized by the exponentiation
    operator, such that "A ** inv" is the inverse of "A".  This is
    similar in spirit to NumPy's existing "newaxis" value.

    (Optional) When applied to sequences, the operator will return a
    list of tuples containing the cross-product of their elements in
    left-to-right order:

    >>> [1, 2] @ (3, 4)
    [(1, 3), (1, 4), (2, 3), (2, 4)]

    >>> [1, 2] @ (3, 4) @ (5, 6)
    [(1, 3, 5), (1, 3, 6), 
     (1, 4, 5), (1, 4, 6),
     (2, 3, 5), (2, 3, 6),
     (2, 4, 5), (2, 4, 6)]

    This will require the same kind of special support from the parser
    as chained comparisons (such as "a<b<c<=d").  However, it would
    permit the following:

    >>> for (i, j) in [1, 2] @ [3, 4]:
    >>>     print i, j
    1 3
    1 4
    2 3
    2 4

    as a short-hand for the common nested loop idiom:

    >>> for i in [1, 2]:
    >>>    for j in [3, 4]:
    >>>        print i, j

    Response to the 'lockstep loop' questionnaire [5] indicated that
    newcomers would be comfortable with this (so comfortable, in fact,
    that most of them interpreted most multi-loop 'zip' syntaxes [6]
    as implementing single-stage nesting).


Alternatives:

    01. Don't add new operators --- stick to functions and methods.

    Python is not primarily a numerical language.  It is not worth
    complexifying the language for this special case --- NumPy's
    success is proof that users can and will use functions and methods
    for linear algebra.

    On the positive side, this maintains Python's simplicity.  Its
    weakness is that support for real matrix multiplication (and, to a
    lesser extent, other linear algebra operations) is frequently
    requested, as functional forms are cumbersome for lengthy
    formulas, and do not respect the operator precedence rules of
    conventional mathematics.  In addition, the method form is
    asymmetric in its operands.

    02. Introduce prefixed forms of existing operators, such as "@*"
        or "~*", or used boxed forms, such as "[*]" or "%*%".

    There are (at least) three objections to this.  First, either form
    seems to imply that all operators exist in both forms.  This is
    more new entities than the problem merits, and would require the
    addition of many new overloadable methods, such as __at_mul__.

    Second, while it is certainly possible to invent semantics for
    these new operators for built-in types, this would be a case of
    the tail wagging the dog, i.e. of letting the existence of a
    feature "create" a need for it.

    Finally, the boxed forms make human parsing more complex, e.g.:

        A[*] = B    vs.    A[:] = B

    03. (From Moshe Zadka [7], and also considered by Huaiyu Zhou [8]
        in his proposal [9]) Retain the existing meaning of all
        operators, but create a behavioral accessor for arrays, such
        that:

            A * B

        is elementwise multiplication (ME), but:

            A.m() * B.m()

        is mathematical multiplication (MM).  The method "A.m()" would
        return an object that aliased A's memory (for efficiency), but
        which had a different implementation of __mul__().

    The advantage of this method is that it has no effect on the
    existing implementation of Python: changes are localized in the
    Numeric module.  The disadvantages are:

    (a) The semantics of "A.m() * B", "A + B.m()", and so on would
        have to be defined, and there is no "obvious" choice for them.

    (b) Aliasing objects to trigger different operator behavior feels
        less Pythonic than either calling methods (as in the existing
        Numeric module) or using a different operator.  This PEP is
        primarily about look and feel, and about making Python more
        attractive to people who are not already using it.

    04. (From a proposal [9] by Huaiyu Zhou [8]) Introduce a "delayed
        inverse" attribute, similar to the "transpose" attribute
        advocated in the third part of this proposal.  The expression
        "a.I" would be a delayed handle on the inverse of the matrix
        "a", which would be evaluated in context as required.  For
        example, "a.I * b" and "b * a.I" would solve sets of linear
        equations, without actually calculating the inverse.

    The main drawback of this proposal it is reliance on lazy
    evaluation, and even more on "smart" lazy evaluation (i.e. the
    operation performed depends on the context in which the evaluation
    is done).  The BDFL has so far resisted introducing LE into
    Python.


Related Proposals

    0203 :  Augmented Assignments

            If new operators for linear algebra are introduced, it may
            make sense to introduce augmented assignment forms for
            them.

    0207 :  Rich Comparisons

            It may become possible to overload comparison operators
            such as '<' so that an expression such as 'A < B' returns
            an array, rather than a scalar value.

    0209 :  Adding Multidimensional Arrays

            Multidimensional arrays are currently an extension to
            Python, rather than a built-in type.


Acknowledgments:

    I am grateful to Huaiyu Zhu [8] for initiating this discussion,
    and for some of the ideas and terminology included below.


References:

    [1] http://www.acm.org/sigapl/whyapl.htm
    [2] http://numpy.sourceforge.net
    [3] PEP-0203.txt "Augmented Assignments".
    [4] http://bevo.che.wisc.edu/octave/doc/octave_9.html#SEC69
    [5] http://www.python.org/pipermail/python-dev/2000-July/013139.html
    [6] PEP-0201.txt "Lockstep Iteration"
    [7] Moshe Zadka is 'moshez@math.huji.ac.il'.
    [8] Huaiyu Zhu is 'hzhu@users.sourceforge.net'
    [9] http://www.python.org/pipermail/python-list/2000-August/112529.html


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: