PEP: 201
Title: Lockstep Iteration
Version: $Revision$
Last-Modified: $Date$
Author: barry@python.org (Barry A. Warsaw)
Status: Final
Type: Standards Track
Created: 13-Jul-2000
Python-Version: 2.0
Post-History: 27-Jul-2000


Introduction

    This PEP describes the `lockstep iteration' proposal.  This PEP
    tracks the status and ownership of this feature, slated for
    introduction in Python 2.0.  It contains a description of the
    feature and outlines changes necessary to support the feature.
    This PEP summarizes discussions held in mailing list forums, and
    provides URLs for further information, where appropriate.  The CVS
    revision history of this file contains the definitive historical
    record.


Motivation

    Standard for-loops in Python iterate over every element in a
    sequence until the sequence is exhausted[1].  However, for-loops
    iterate over only a single sequence, and it is often desirable to
    loop over more than one sequence in a lock-step fashion.  In other
    words, in a way such that the i-th iteration through the loop
    returns an object containing the i-th element from each sequence.

    The common idioms used to accomplish this are unintuitive.  This
    PEP proposes a standard way of performing such iterations by
    introducing a new builtin function called `zip'.

    While the primary motivation for zip() comes from lock-step
    iteration, by implementing zip() as a built-in function, it has
    additional utility in contexts other than for-loops.

Lockstep For-Loops

    Lockstep for-loops are non-nested iterations over two or more
    sequences, such that at each pass through the loop, one element
    from each sequence is taken to compose the target.  This behavior
    can already be accomplished in Python through the use of the map()
    built-in function:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ...
    (1, 4)
    (2, 5)
    (3, 6)
    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]

    The for-loop simply iterates over this list as normal.

    While the map() idiom is a common one in Python, it has several
    disadvantages:

    - It is non-obvious to programmers without a functional
      programming background.

    - The use of the magic `None' first argument is non-obvious.

    - It has arbitrary, often unintended, and inflexible semantics
      when the sequences are not of the same length: the shorter
      sequences are padded with `None'.

      >>> c = (4, 5, 6, 7)
      >>> map(None, a, c)
      [(1, 4), (2, 5), (3, 6), (None, 7)]

    For these reasons, several proposals were floated in the Python
    2.0 beta time frame for syntactic support of lockstep for-loops.
    Here are two suggestions:

    for x in seq1, y in seq2:
        # stuff

    for x, y in seq1, seq2:
        # stuff

    Neither of these forms would work, since they both already mean
    something in Python and changing the meanings would break existing
    code.  All other suggestions for new syntax suffered the same
    problem, or were in conflict with another proposed feature
    called `list comprehensions' (see PEP 202).

The Proposed Solution

    The proposed solution is to introduce a new built-in sequence
    generator function, available in the __builtin__ module.  This
    function is to be called `zip' and has the following signature:

    zip(seqa, [seqb, [...]])

    zip() takes one or more sequences and weaves their elements
    together, just as map(None, ...) does with sequences of equal
    length.  The weaving stops when the shortest sequence is
    exhausted.
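
    As an illustrative sketch (not part of the original proposal), the
    truncating behavior can be contrasted with padding.  The snippet
    assumes Python 3, where zip() returns an iterator, so list() is
    used to materialize it, and where itertools.zip_longest() stands
    in for the padding semantics of map(None, ...):

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# zip() stops as soon as the shortest argument is exhausted.
truncated = list(zip(a, c))

# zip_longest() pads the shorter argument with None instead, which is
# what map(None, a, c) did in Python 2.
padded = list(zip_longest(a, c))

print(truncated)
print(padded)
```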


Return Value

    zip() returns a real Python list, the same way map() does.


Examples

    Here are some examples, based on the reference implementation
    below.

    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)

    >>> zip(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]

    >>> zip(a, d)
    [(1, 12), (2, 13)]

    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

    Note that when the sequences are of the same length, zip() is
    reversible:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1

    It is not possible to reverse zip this way when the sequences are
    not all the same length.
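
    The loss of information can be seen in a small sketch.  It assumes
    Python 3, where zip() is lazy and list() materializes the result;
    the variable names simply mirror the examples above:

```python
a = (1, 2, 3, 4)
d = (12, 13)

# Zipping truncates to the length of the shorter sequence, d.
x = list(zip(a, d))

# "Unzipping" can only recover what survived the truncation:
# the elements 3 and 4 of a are gone.
y = list(zip(*x))

print(x)
print(y)
```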


Reference Implementation

    Here is a reference implementation, in Python, of the zip()
    built-in function.  This will be replaced with a C implementation
    after final approval.

    def zip(*args):
        if not args:
            raise TypeError('zip() expects one or more sequence arguments')
        ret = []
        i = 0
        try:
            while 1:
                item = []
                for s in args:
                    item.append(s[i])
                ret.append(tuple(item))
                i = i + 1
        except IndexError:
            return ret


BDFL Pronouncements

    Note: the BDFL refers to Guido van Rossum, Python's Benevolent
    Dictator For Life.

    - The function's name.  An earlier version of this PEP included an
      open issue listing 20+ proposed alternative names to zip().  In
      the face of no overwhelmingly better choice, the BDFL strongly
      prefers zip() due to its Haskell[2] heritage.  See version 1.7
      of this PEP for the list of alternatives.

    - zip() shall be a built-in function.

    - Optional padding.  An earlier version of this PEP proposed an
      optional `pad' keyword argument, which would be used when the
      argument sequences were not the same length.  This behavior is
      similar to the map(None, ...) semantics, except that the user
      would be able to specify the pad object.  This has been rejected by
      the BDFL in favor of always truncating to the shortest sequence,
      because of the KISS principle.  If there's a true need, padding
      is easy to add later.  If it were added now and turned out not
      to be needed, it would be impossible to remove in the future.

    - Lazy evaluation.  An earlier version of this PEP proposed that
      zip() return a built-in object that performed lazy evaluation
      using the __getitem__() protocol.  This has been strongly rejected
      by the BDFL in favor of returning a real Python list.  If lazy
      evaluation is desired in the future, the BDFL suggests an xzip()
      function be added.

    - zip() with no arguments.  The BDFL strongly prefers this raise a
      TypeError exception.

    - zip() with one argument.  The BDFL strongly prefers that this
      return a list of 1-tuples.

    - Inner and outer container control.  An earlier version of this
      PEP contains a rather lengthy discussion on a feature that some
      people wanted, namely the ability to control what the inner and
      outer container types were (they are tuples and a list,
      respectively, in this version of the PEP).  Given the simplified
      API and implementation, this elaboration is rejected.  For a
      more detailed analysis, see version 1.7 of this PEP.
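
    Two of the pronouncements above, the one-argument case and the
    fixed container types, can be checked with a short sketch.  It
    assumes Python 3, where the outer container is an iterator that
    list() turns into the list described here; the inner containers
    are tuples regardless of the argument types:

```python
# One argument yields a list of 1-tuples.
single = list(zip((1, 2, 3)))

# The inner containers are always tuples, even when the arguments are
# a list and a string.
pairs = list(zip([1, 2], "ab"))

print(single)
print(pairs)
```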

Subsequent Change to zip()

    In Python 2.4, zip() with no arguments was modified to return an
    empty list rather than raising a TypeError exception.  The rationale
    for the original behavior was that the absence of arguments was
    thought to indicate a programming error.  However, that thinking
    did not anticipate the use of zip() with the * operator for unpacking
    variable length argument lists.  For example, the inverse of zip
    could be defined as:  unzip = lambda s: zip(*s).  That transformation
    also defines a matrix transpose or an equivalent row/column swap for
    tables defined as lists of tuples.  The latter transformation is
    commonly used when reading data files with records as rows and fields
    as columns.  For example, the code:

        date, rain, high, low = zip(*csv.reader(file("weather.csv")))

    rearranges columnar data so that each field is collected into
    individual tuples for straightforward looping and summarization:

        print "Total rainfall", sum(map(float, rain))
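
    A self-contained sketch of the same transposition follows.  It
    assumes Python 3 and replaces the weather.csv file with a small
    in-memory stand-in whose rows and values are invented for
    illustration.  Because csv.reader() yields strings, the rainfall
    field is converted to float before summing:

```python
import csv
import io

# Hypothetical stand-in for weather.csv: date, rain, high, low per row.
data = io.StringIO("2000-07-01,0.5,80,60\n2000-07-02,1.25,75,58\n")

# Transpose the rows so that each field becomes its own tuple.
date, rain, high, low = zip(*csv.reader(data))

# The fields are strings, so convert before summarizing.
total = sum(float(r) for r in rain)
print("Total rainfall", total)
```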

    Using zip(*args) is more easily coded if zip(*[]) is handled as an
    allowable case rather than an exception.  This is especially helpful
    when data is either built up from or recursed down to a null case
    with no records.

    Seeing this possibility, the BDFL agreed (with some misgivings) to
    have the behavior changed for Py2.4.
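
    The changed behavior can be sketched as a variant of the reference
    implementation.  The names zip_py24 and unzip are hypothetical and
    chosen only for this illustration; the actual built-in is written
    in C.  The sketch runs under Python 3 as well:

```python
def zip_py24(*args):
    """Pure-Python sketch of zip() with the Python 2.4 semantics."""
    if not args:
        # Python 2.4 and later: no arguments is not an error.
        return []
    ret = []
    i = 0
    try:
        while True:
            # s[i] raises IndexError once the shortest sequence ends.
            ret.append(tuple([s[i] for s in args]))
            i += 1
    except IndexError:
        return ret


def unzip(rows):
    """Inverse of zip for equal-length rows; accepts the empty case."""
    return zip_py24(*rows)


print(unzip([(1, 4), (2, 5), (3, 6)]))
print(unzip([]))
```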

Other Changes

    - The xzip() function discussed above was implemented in Py2.3 in
      the itertools module as itertools.izip().  This function provides
      lazy behavior, consuming single elements and producing a single
      tuple on each pass.  The "just-in-time" style saves memory and
      runs faster than its list-based counterpart, zip().

    - The itertools module also added itertools.repeat() and
      itertools.chain().  These tools can be used together to pad
      sequences with None (to match the behavior of map(None, seqn)):

          zip(firstseq, chain(secondseq, repeat(None)))
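
    A runnable sketch of this padding idiom follows, assuming Python 3,
    where the built-in zip() is itself lazy in the manner of
    itertools.izip().  The sample values are invented for
    illustration.  Note that only the second sequence is padded: the
    result is as long as firstseq, and the infinite repeat(None) is
    consumed only as far as needed:

```python
from itertools import chain, repeat

firstseq = (1, 2, 3, 4)
secondseq = (5, 6)

# chain() appends an endless supply of None to secondseq; zip() stops
# when firstseq is exhausted, so the loop terminates.
padded = list(zip(firstseq, chain(secondseq, repeat(None))))

print(padded)
```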


References

    [1] http://www.python.org/doc/current/ref/for.html
    [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip

    Greg Wilson's questionnaire on proposed syntax to some CS grad students
    http://www.python.org/pipermail/python-dev/2000-July/013139.html


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: