PEP: 201
Title: Parallel Iteration
Version: $Revision$
Owner: bwarsaw@beopen.com (Barry A. Warsaw)
Python-Version: 2.0
Status: Draft



Introduction

This PEP describes the `parallel iteration' proposal for Python
2.0, previously known as `parallel for loops'. This PEP tracks
the status and ownership of this feature, slated for introduction
in Python 2.0. It contains a description of the feature and
outlines changes necessary to support the feature. This PEP
summarizes discussions held in mailing list forums, and provides
URLs for further information, where appropriate. The CVS revision
history of this file contains the definitive historical record.



Standard For-Loops

Motivation for this feature has its roots in a concept described
as `parallel for loops'. A standard for-loop in Python iterates
over every element in the sequence until the sequence is
exhausted. The for-loop can also be explicitly exited with a
`break' statement, and for-loops can have else: clauses, but these
have no bearing on this PEP.

For-loops can iterate over built-in types such as lists and
tuples, but they can also iterate over instance types that conform
to an informal sequence protocol. This protocol states that the
instance should implement the __getitem__() method, expecting a
monotonically increasing index starting at 0, and this method
should raise an IndexError when the sequence is exhausted. This
protocol is currently undocumented -- a defect in Python's
documentation hopefully soon corrected.

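The informal protocol described above can be sketched with a small
class. This is only an illustration: the class name Countdown and
its behavior are hypothetical and not part of this proposal. The
code is written so that it should also run on later Python versions,
which still fall back to __getitem__() when a class defines no other
iteration method.

```python
class Countdown:
    """Hypothetical class that supports for-loops via __getitem__() only."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # The for-loop calls this with index 0, 1, 2, ... in order.
        if index >= self.start:
            # Raising IndexError signals exhaustion and ends the loop.
            raise IndexError(index)
        return self.start - index


seen = []
for value in Countdown(3):
    seen.append(value)
# seen is now [3, 2, 1]
```

Note that the class defines neither a length nor any other method;
the IndexError alone tells the for-loop when to stop.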
For-loops are described in the language reference manual:
http://www.python.org/doc/devel/ref/for.html

An example for-loop:

    >>> for i in (1, 2, 3): print i
    ...
    1
    2
    3

In this example, the variable `i' is called the `target', and is
assigned the next element of the sequence each time through the
loop.



Parallel For-Loops

Parallel for-loops are non-nested iterations over two or more
sequences, such that at each pass through the loop, one element
from each sequence is taken to compose the target. This behavior
can already be accomplished in Python through the use of the map()
built-in function:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ...
    (1, 4)
    (2, 5)
    (3, 6)

Here, map() returns a list of N-tuples, where N is the number of
sequences in map()'s argument list (after the initial `None').
Each tuple is built from the i-th elements of the argument
sequences. In this example:

    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]

The for-loop simply iterates over this list as normal.

While the map() idiom is a common one in Python, it has several
disadvantages:

- It is non-obvious to programmers without a functional
  programming background.

- The use of the magic `None' first argument is non-obvious.

- It has arbitrary, often unintended, and inflexible semantics
  when the sequences are not of the same length: the shorter
  sequences are padded with `None'.

    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]

For these reasons, several proposals were floated in the Python
2.0 beta time frame for providing a better spelling of parallel
for-loops. The initial proposals centered around syntactic
changes to the for statement, but conflicts and problems with the
syntax were unresolvable, especially when parallel for-loops were
combined with another proposed feature called `list
comprehensions' (see pep-0202.txt).



The Proposed Solution

The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
function is to be called `marry' and has the following signature:

    marry(seqa, [seqb, [...]], [pad=<value>])

marry() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The optional keyword argument `pad', if supplied, is a
value used to pad all shorter sequences to the length of the
longest sequence. If `pad' is omitted, then weaving stops when
the shortest sequence is exhausted.

It is not possible to pad short sequences with different pad
values, nor will marry() ever raise an exception when the sequences
have different lengths. To accomplish either of these, the
sequences must be checked and processed before the call to marry().

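As a rough illustration of the pre-processing mentioned above, the
following sketch shows two helpers that a caller might run before
calling marry(). Both names, check_same_length() and pad_each(), are
hypothetical and are not part of this proposal; the choice of
ValueError is likewise an assumption.

```python
def check_same_length(*sequences):
    """Raise ValueError unless all sequences have the same length."""
    lengths = [len(s) for s in sequences]
    for n in lengths:
        if n != lengths[0]:
            raise ValueError('sequences differ in length: %r' % (lengths,))


def pad_each(sequences, pads):
    """Pad every sequence to the longest length with its own pad value."""
    longest = max([len(s) for s in sequences])
    padded = []
    for k in range(len(sequences)):
        seq = tuple(sequences[k])
        # Append as many copies of this sequence's pad value as needed.
        padded.append(seq + (pads[k],) * (longest - len(seq)))
    return padded


a = (1, 2, 3, 4)
c = (9, 10, 11)
d = (12, 13)
padded = pad_each([a, c, d], [0, -1, -2])
# padded is [(1, 2, 3, 4), (9, 10, 11, -1), (12, 13, -2, -2)]
```

The padded tuples all have the same length, so they could then be
handed to marry() without the `pad' argument.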


Lazy Execution

For performance purposes, marry() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
for-loop protocol. This method constructs the individual tuples
on demand.

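The effect of building tuples on demand can be observed with a small
experiment. The sketch below is an assumption-laden illustration, not
the proposed implementation: LazyPairs is a simplified stand-in for
the object marry() would return (it has no `pad' support), and
RecordingSequence is a hypothetical helper that notes which indices
are requested.

```python
class RecordingSequence:
    """Hypothetical wrapper that records which indices are requested."""

    def __init__(self, items):
        self.items = items
        self.requested = []

    def __getitem__(self, index):
        self.requested.append(index)
        return self.items[index]


class LazyPairs:
    """Simplified stand-in for the object marry() returns (no padding)."""

    def __init__(self, *sequences):
        self.sequences = sequences

    def __getitem__(self, index):
        # Build a single tuple on demand. An IndexError raised by any
        # underlying sequence propagates and ends the for-loop.
        return tuple([s[index] for s in self.sequences])


left = RecordingSequence((1, 2, 3, 4))
right = RecordingSequence((5, 6, 7, 8))
pairs = []
for pair in LazyPairs(left, right):
    pairs.append(pair)
    if len(pairs) == 2:
        break
# pairs is [(1, 5), (2, 6)]; only indices 0 and 1 were ever requested
```

Because the loop exits early, the elements at indices 2 and 3 are
never fetched, which is the saving that lazy execution is meant to
provide.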


Examples

Here are some examples, based on the reference implementation
below.

    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)

    >>> marry(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]

    >>> marry(a, d)
    [(1, 12), (2, 13)]

    >>> marry(a, d, pad=0)
    [(1, 12), (2, 13), (3, 0), (4, 0)]

    >>> marry(a, d, pid=0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/tmp/python-iKAOxR", line 11, in marry
    TypeError: unexpected keyword arguments

    >>> marry(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

    >>> marry(a, b, c, d, pad=None)
    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
    >>> map(None, a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]



Reference Implementation

Here is a reference implementation, in Python, of the marry()
built-in function and helper class. These would ultimately be
replaced by equivalent C code.

    class _Marriage:
        def __init__(self, args, kws):
            self.__padgiven = 0
            if kws.has_key('pad'):
                self.__padgiven = 1
                self.__pad = kws['pad']
                del kws['pad']
            if kws:
                raise TypeError('unexpected keyword arguments')
            self.__sequences = args
            self.__seqlen = len(args)

        def __getitem__(self, i):
            ret = []
            exhausted = 0
            for s in self.__sequences:
                try:
                    ret.append(s[i])
                except IndexError:
                    if not self.__padgiven:
                        raise
                    exhausted = exhausted + 1
                    if exhausted == self.__seqlen:
                        raise
                    ret.append(self.__pad)
            return tuple(ret)

        def __str__(self):
            ret = []
            i = 0
            while 1:
                try:
                    ret.append(self[i])
                except IndexError:
                    break
                i = i + 1
            return str(ret)
        __repr__ = __str__


    def marry(*args, **kws):
        return _Marriage(args, kws)



Open Issues

What should "marry(a)" do?

Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
should it return [1, 2, 3]? The former is more consistent with the
description given above, while the latter is what map(None, a)
does, and may be more consistent with user expectation.

The latter interpretation requires special casing, which is not
present in the reference implementation. It returns:

    >>> marry(a)
    [(1,), (2,), (3,)]

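To make the trade-off concrete, here is a minimal sketch of what the
special casing for a single sequence might look like. It is not part
of the reference implementation: the class name SingleAwareMarriage
is hypothetical, and `pad' handling is left out to keep the example
short.

```python
class SingleAwareMarriage:
    """Hypothetical variant that yields bare elements for one sequence."""

    def __init__(self, *sequences):
        self.sequences = sequences

    def __getitem__(self, index):
        if len(self.sequences) == 1:
            # Special case: mirror map(None, a) and skip the 1-tuple.
            return self.sequences[0][index]
        # General case: one element from each sequence, as a tuple.
        return tuple([s[index] for s in self.sequences])


a = (1, 2, 3)
unwrapped = list(SingleAwareMarriage(a))
paired = list(SingleAwareMarriage(a, a))
# unwrapped is [1, 2, 3]; paired is [(1, 1), (2, 2), (3, 3)]
```

The extra branch is small, but it means the type of each item now
depends on how many sequences were passed, which is the inconsistency
this open issue is weighing.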


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: