PEP: 201
Title: Parallel Iteration
Version: $Revision$
Owner: bwarsaw@beopen.com (Barry A. Warsaw)
Python-Version: 2.0
Status: Draft
Introduction
This PEP describes the `parallel iteration' proposal for Python
2.0, previously known as `parallel for loops'. This PEP tracks
the status and ownership of this feature, slated for introduction
in Python 2.0. It contains a description of the feature and
outlines changes necessary to support the feature. This PEP
summarizes discussions held in mailing list forums, and provides
URLs for further information, where appropriate. The CVS revision
history of this file contains the definitive historical record.
Standard For-Loops
Motivation for this feature has its roots in a concept described
as `parallel for loops'. A standard for-loop in Python iterates
over every element in the sequence until the sequence is
exhausted. The for-loop can also be explicitly exited with a
`break' statement, and for-loops can have else: clauses, but these
have no bearing on this PEP.
For-loops can iterate over built-in types such as lists and
tuples, but they can also iterate over instance types that conform
to an informal sequence protocol. This protocol states that the
instance should implement the __getitem__() method, expecting a
monotonically increasing index starting at 0, and this method
should raise an IndexError when the sequence is exhausted. This
protocol is currently undocumented -- a defect in Python's
documentation hopefully soon corrected.
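
The informal protocol described above can be illustrated with a small class. This is only a sketch, written for Python 3 (which still falls back to __getitem__() when a class defines no __iter__()); the class name Countdown is hypothetical and not part of this PEP.

```python
class Countdown:
    """Hypothetical example class: yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        # The for-loop calls this with i = 0, 1, 2, ... in order.
        if i >= self.n:
            # Raising IndexError signals that the sequence is exhausted.
            raise IndexError(i)
        return self.n - i


collected = []
for item in Countdown(3):
    collected.append(item)
```

The loop ends without any explicit length check: the IndexError raised for the first out-of-range index is what terminates it.
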
For-loops are described in the language reference manual here:
http://www.python.org/doc/devel/ref/for.html
An example for-loop:
>>> for i in (1, 2, 3): print i
...
1
2
3
In this example, the variable `i' is called the `target', and is
assigned the next element of the sequence each time through the loop.
Parallel For-Loops
Parallel for-loops are non-nested iterations over two or more
sequences, such that at each pass through the loop, one element
from each sequence is taken to compose the target. This behavior
can already be accomplished in Python through the use of the map()
built-in function:
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
Here, map() returns a list of N-tuples, where N is the number of
sequences in map()'s argument list (after the initial `None').
Each tuple is constructed of the i-th elements from each of the
argument lists, specifically in this example:
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]
The for-loop simply iterates over this list as normal.
While the map() idiom is a common one in Python, it has several
disadvantages:
- It is non-obvious to programmers without a functional
  programming background.
- The use of the magic `None' first argument is non-obvious.
- It has arbitrary, often unintended, and inflexible semantics
  when the lists are not of the same length: the shorter sequences
  are padded with `None'.
>>> c = (4, 5, 6, 7)
>>> map(None, a, c)
[(1, 4), (2, 5), (3, 6), (None, 7)]
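
The map(None, ...) form shown above was removed in Python 3, so that session cannot be replayed on a current interpreter. As a rough stand-in, and assuming itertools.zip_longest() is an acceptable substitute (it is not something this PEP discusses), the same padding behaviour can be reproduced as follows:

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# zip_longest() pads the shorter inputs with its fillvalue, which is
# None by default, mirroring what map(None, a, c) did in Python 2.
padded = list(zip_longest(a, c))
```
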
For these reasons, several proposals were floated in the Python
2.0 beta time frame for providing a better spelling of parallel
for-loops. The initial proposals centered around syntactic
changes to the for statement, but conflicts and problems with the
syntax were unresolvable, especially when parallel for-loops were
combined with another proposed feature called `list
comprehensions' (see pep-0202.txt).
The Proposed Solution
The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
function is to be called `marry' and has the following signature:
marry(seqa, [seqb, [...]], [pad=<value>])
marry() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The optional keyword argument `pad', if supplied, is a
value used to pad all shorter sequences to the length of the
longest sequence. If `pad' is omitted, then weaving stops when
the shortest sequence is exhausted.
It is not possible to pad short lists with different pad values,
nor will marry() ever raise an exception with lists of different
lengths. To accomplish either of these, the sequences must be
checked and processed before the call to marry().
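
The pre-processing mentioned above might look like the following sketch. It is written for Python 3, where marry() does not exist, so the built-in zip() stands in for marry() called without `pad' (both stop at the shortest sequence). The helper names require_same_length and pad_each are hypothetical.

```python
def require_same_length(*seqs):
    """Raise ValueError unless all sequences have the same length."""
    lengths = [len(s) for s in seqs]
    if len(set(lengths)) > 1:
        raise ValueError("sequences differ in length: %r" % (lengths,))
    return seqs


def pad_each(seqs, pads):
    """Pad every sequence to the longest length with its own pad value."""
    longest = max(len(s) for s in seqs)
    return [tuple(s) + (p,) * (longest - len(s))
            for s, p in zip(seqs, pads)]


a = (1, 2, 3, 4)
d = (12, 13)

# Different pad values per sequence, applied before weaving.
# zip() is used here only as a stand-in for marry().
woven = list(zip(*pad_each([a, d], pads=[0, -1])))

# Strict checking: unequal lengths are rejected up front.
try:
    require_same_length(a, d)
    rejected = False
except ValueError:
    rejected = True
```
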
Lazy Execution
For performance purposes, marry() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
for-loop protocol. This method constructs the individual tuples
on demand.
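
To make the on-demand behaviour concrete, here is a minimal sketch in Python 3. Both class names are hypothetical: LoggingSeq records which indices it is asked for, and LazyWeave is a simplified stand-in for the object marry() would return (no `pad' support).

```python
class LoggingSeq:
    """Hypothetical sequence that records every index requested."""

    def __init__(self, items):
        self.items = items
        self.requested = []

    def __getitem__(self, i):
        self.requested.append(i)
        return self.items[i]  # raises IndexError past the end


class LazyWeave:
    """Hypothetical, simplified stand-in for marry()'s return value."""

    def __init__(self, *seqs):
        self.seqs = seqs

    def __getitem__(self, i):
        # Build exactly one tuple, and only when index i is requested.
        return tuple(s[i] for s in self.seqs)


a = LoggingSeq([1, 2, 3])
b = LoggingSeq([4, 5, 6])
woven = LazyWeave(a, b)

before = list(a.requested)  # nothing has been read yet
first = woven[0]            # reads index 0 of each input, nothing more
after = list(a.requested)
```

Constructing the object touches none of the inputs; each tuple costs one __getitem__() call per input sequence at the moment it is requested.
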
Examples
Here are some examples, based on the reference implementation
below.
>>> a = (1, 2, 3, 4)
>>> b = (5, 6, 7, 8)
>>> c = (9, 10, 11)
>>> d = (12, 13)
>>> marry(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
>>> marry(a, d)
[(1, 12), (2, 13)]
>>> marry(a, d, pad=0)
[(1, 12), (2, 13), (3, 0), (4, 0)]
>>> marry(a, d, pid=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-iKAOxR", line 11, in marry
TypeError: unexpected keyword arguments
>>> marry(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
>>> marry(a, b, c, d, pad=None)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
>>> map(None, a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
Reference Implementation
Here is a reference implementation, in Python, of the marry()
built-in function and helper class. These would ultimately be
replaced by equivalent C code.
class _Marriage:
    def __init__(self, args, kws):
        self.__padgiven = 0
        if kws.has_key('pad'):
            self.__padgiven = 1
            self.__pad = kws['pad']
            del kws['pad']
        if kws:
            raise TypeError('unexpected keyword arguments')
        self.__sequences = args
        self.__seqlen = len(args)

    def __getitem__(self, i):
        ret = []
        exhausted = 0
        for s in self.__sequences:
            try:
                ret.append(s[i])
            except IndexError:
                if not self.__padgiven:
                    raise
                exhausted = exhausted + 1
                if exhausted == self.__seqlen:
                    raise
                ret.append(self.__pad)
        return tuple(ret)

    def __str__(self):
        ret = []
        i = 0
        while 1:
            try:
                ret.append(self[i])
            except IndexError:
                break
            i = i + 1
        return str(ret)
    __repr__ = __str__

def marry(*args, **kws):
    return _Marriage(args, kws)
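
The listing above targets Python 2.0 (it relies on the dictionary method has_key()). For readers who want to try the examples on a current interpreter, the following is an unofficial Python 3 port. It assumes the same behaviour is intended and changes only the has_key() call, which becomes the `in' operator.

```python
class _Marriage:
    def __init__(self, args, kws):
        self.__padgiven = 0
        if 'pad' in kws:  # Python 3 replacement for kws.has_key('pad')
            self.__padgiven = 1
            self.__pad = kws['pad']
            del kws['pad']
        if kws:
            raise TypeError('unexpected keyword arguments')
        self.__sequences = args
        self.__seqlen = len(args)

    def __getitem__(self, i):
        ret = []
        exhausted = 0
        for s in self.__sequences:
            try:
                ret.append(s[i])
            except IndexError:
                # Without pad, the shortest sequence ends the weave.
                if not self.__padgiven:
                    raise
                # With pad, stop only once every sequence is exhausted.
                exhausted = exhausted + 1
                if exhausted == self.__seqlen:
                    raise
                ret.append(self.__pad)
        return tuple(ret)

    def __str__(self):
        ret = []
        i = 0
        while 1:
            try:
                ret.append(self[i])
            except IndexError:
                break
            i = i + 1
        return str(ret)
    __repr__ = __str__


def marry(*args, **kws):
    return _Marriage(args, kws)
```

With this port, list(marry(a, d, pad=0)) materializes the padded tuples, while printing the object shows the same list-like text as in the Examples section.
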
Open Issues
What should "marry(a)" do?
Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
should it return [1, 2, 3]? The first is more consistent with the
description given above, while the latter is what map(None, a)
does, and may be more consistent with user expectation.
The latter interpretation requires special casing, which is not
present in the reference implementation. It returns:
>>> marry(a)
[(1,), (2,), (3,)]
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: