220 lines
6.8 KiB
Plaintext
220 lines
6.8 KiB
Plaintext
PEP: 201
|
||
Title: Parallel Iteration
|
||
Version: $Revision$
|
||
Author: bwarsaw@beopen.com (Barry A. Warsaw)
|
||
Python-Version: 2.0
|
||
Status: Draft
|
||
Created: 13-Jul-2000
|
||
Post-History:
|
||
|
||
|
||
Introduction
|
||
|
||
This PEP describes the `parallel iteration' proposal for Python
|
||
2.0, previously known as `parallel for loops'. This PEP tracks
|
||
the status and ownership of this feature, slated for introduction
|
||
in Python 2.0. It contains a description of the feature and
|
||
outlines changes necessary to support the feature. This PEP
|
||
summarizes discussions held in mailing list forums, and provides
|
||
URLs for further information, where appropriate. The CVS revision
|
||
history of this file contains the definitive historical record.
|
||
|
||
|
||
Motivation
|
||
|
||
Standard for-loops in Python iterate over every element in a
|
||
sequence until the sequence is exhausted[1]. However, for-loops
|
||
iterate over only a single sequence, and it is often desirable to
|
||
loop over more than one sequence, in a lock-step, "Chinese Menu"
|
||
type of way.
|
||
|
||
The common idioms used to accomplish this are unintuitive and
|
||
inflexible. This PEP proposes a standard way of performing such
|
||
iterations by introducing a new builtin function called `zip'.
|
||
|
||
|
||
Parallel For-Loops
|
||
|
||
Parallel for-loops are non-nested iterations over two or more
|
||
sequences, such that at each pass through the loop, one element
|
||
from each sequence is taken to compose the target. This behavior
|
||
can already be accomplished in Python through the use of the map()
|
||
built-in function:
|
||
|
||
>>> a = (1, 2, 3)
|
||
>>> b = (4, 5, 6)
|
||
>>> for i in map(None, a, b): print i
|
||
...
|
||
(1, 4)
|
||
(2, 5)
|
||
(3, 6)
|
||
>>> map(None, a, b)
|
||
[(1, 4), (2, 5), (3, 6)]
|
||
|
||
The for-loop simply iterates over this list as normal.
|
||
|
||
While the map() idiom is a common one in Python, it has several
|
||
disadvantages:
|
||
|
||
- It is non-obvious to programmers without a functional
|
||
programming background.
|
||
|
||
- The use of the magic `None' first argument is non-obvious.
|
||
|
||
- It has arbitrary, often unintended, and inflexible semantics
|
||
when the lists are not of the same length: the shorter sequences
|
||
are padded with `None'.
|
||
|
||
>>> c = (4, 5, 6, 7)
|
||
>>> map(None, a, c)
|
||
[(1, 4), (2, 5), (3, 6), (None, 7)]
|
||
|
||
For these reasons, several proposals were floated in the Python
|
||
2.0 beta time frame for providing a better spelling of parallel
|
||
for-loops. The initial proposals centered around syntactic
|
||
changes to the for statement, but conflicts and problems with the
|
||
syntax were unresolvable, especially when parallel for-loops were
|
||
combined with another proposed feature called `list
|
||
comprehensions' (see pep-0202.txt).
|
||
|
||
|
||
The Proposed Solution
|
||
|
||
The proposed solution is to introduce a new built-in sequence
|
||
generator function, available in the __builtin__ module. This
|
||
function is to be called `zip' and has the following signature:
|
||
|
||
zip(seqa, [seqb, [...]])
|
||
|
||
zip() takes one or more sequences and weaves their elements
|
||
together, just as map(None, ...) does with sequences of equal
|
||
length. The weaving stops when the shortest sequence is
|
||
exhausted.
|
||
|
||
|
||
Return Value
|
||
|
||
zip() returns a real Python list, the same way map() does.
|
||
|
||
|
||
Examples
|
||
|
||
Here are some examples, based on the reference implementation
|
||
below.
|
||
|
||
>>> a = (1, 2, 3, 4)
|
||
>>> b = (5, 6, 7, 8)
|
||
>>> c = (9, 10, 11)
|
||
>>> d = (12, 13)
|
||
|
||
>>> zip(a, b)
|
||
[(1, 5), (2, 6), (3, 7), (4, 8)]
|
||
|
||
>>> zip(a, d)
|
||
[(1, 12), (2, 13)]
|
||
|
||
>>> zip(a, b, c, d)
|
||
[(1, 5, 9, 12), (2, 6, 10, 13)]
|
||
|
||
Note that when the sequences are of the same length, zip() is
|
||
reversible:
|
||
|
||
>>> a = (1, 2, 3)
|
||
>>> b = (4, 5, 6)
|
||
>>> x = zip(a, b)
|
||
>>> y = zip(*x) # alternatively, apply(zip, x)
|
||
>>> z = zip(*y) # alternatively, apply(zip, y)
|
||
>>> x
|
||
[(1, 4), (2, 5), (3, 6)]
|
||
>>> y
|
||
[(1, 2, 3), (4, 5, 6)]
|
||
>>> z
|
||
[(1, 4), (2, 5), (3, 6)]
|
||
>>> x == z
|
||
1
|
||
|
||
It is not possible to reverse zip this way when the sequences are
|
||
not all the same length.
|
||
|
||
|
||
Reference Implementation
|
||
|
||
Here is a reference implementation, in Python of the zip()
|
||
built-in function and helper class. These would ultimately be
|
||
replaced by equivalent C code.
|
||
|
||
def zip(*args):
|
||
if not args:
|
||
raise TypeError('zip() expects one or more sequence arguments')
|
||
ret = []
|
||
# find the length of the shortest sequence
|
||
shortest = min(*map(len, args))
|
||
for i in range(shortest):
|
||
item = []
|
||
for s in args:
|
||
item.append(s[i])
|
||
ret.append(tuple(item))
|
||
return ret
|
||
|
||
|
||
BDFL Pronouncements
|
||
|
||
Note: the BDFL refers to Guido van Rossum, Python's Benevolent
|
||
Dictator For Life.
|
||
|
||
- The function's name. An earlier version of this PEP included an
|
||
open issue listing 20+ proposed alternative names to zip(). In
|
||
the face of no overwhelmingly better choice, the BDFL strongly
|
||
prefers zip() due to it's Haskell[2] heritage. See version 1.7
|
||
of this PEP for the list of alteratives.
|
||
|
||
- zip() shall be a built-in function.
|
||
|
||
- Optional padding. An earlier version of this PEP proposed an
|
||
optional `pad' keyword argument, which would be used when the
|
||
argument sequences were not the same length. This is similar
|
||
behavior to the map(None, ...) semantics except that the user
|
||
would be able to specify pad object. This has been rejected by
|
||
the BDFL in favor of always truncating to the shortest sequence.
|
||
|
||
- Lazy evaluation. An earlier version of this PEP proposed that
|
||
zip() return a built-in object that performed lazy evaluation
|
||
using __getitem__() protocol. This has been strongly rejected
|
||
by the BDFL in favor of returning a real Python list. If lazy
|
||
evaluation is desired in the future, the BDFL suggests an xzip()
|
||
function be added.
|
||
|
||
- zip() with no arguments. the BDFL strongly prefers this raise a
|
||
TypeError exception.
|
||
|
||
- zip() with one argument. the BDFL strongly prefers that this
|
||
return a list of 1-tuples.
|
||
|
||
- Inner and outer container control. An earlier version of this
|
||
PEP contains a rather lengthy discussion on a feature that some
|
||
people wanted, namely the ability to control what the inner and
|
||
outer container types were (they are tuples and list
|
||
respectively in this version of the PEP). Given the simplified
|
||
API and implementation, this elaboration is rejected. For a
|
||
more detailed analysis, see version 1.7 of this PEP.
|
||
|
||
|
||
References
|
||
|
||
[1] http://www.python.org/doc/current/ref/for.html
|
||
[2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
|
||
|
||
TBD: URL to python-dev archives
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|