PEP: 201
Title: Lockstep Iteration
Author: Barry Warsaw <barry@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Jul-2000
Python-Version: 2.0
Post-History: 27-Jul-2000

Introduction
============

This PEP describes the 'lockstep iteration' proposal. This PEP tracks
the status and ownership of this feature, slated for introduction in
Python 2.0. It contains a description of the feature and outlines
changes necessary to support the feature. This PEP summarizes
discussions held in mailing list forums, and provides URLs for further
information, where appropriate. The CVS revision history of this file
contains the definitive historical record.

Motivation
==========

Standard for-loops in Python iterate over every element in a sequence
until the sequence is exhausted [1]_. However, for-loops iterate over
only a single sequence, and it is often desirable to loop over more
than one sequence in lockstep, that is, in such a way that the i-th
iteration through the loop returns an object containing the i-th
element from each sequence.

The common idioms used to accomplish this are unintuitive. This PEP
proposes a standard way of performing such iterations by introducing a
new built-in function called ``zip``.

While the primary motivation for ``zip()`` comes from lockstep
iteration, implementing ``zip()`` as a built-in function gives it
additional utility in contexts other than for-loops.

Lockstep For-Loops
==================

Lockstep for-loops are non-nested iterations over two or more
sequences, such that at each pass through the loop, one element from
each sequence is taken to compose the target. This behavior can
already be accomplished in Python through the use of the map() built-in
function::

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ...
    (1, 4)
    (2, 5)
    (3, 6)
    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]

The for-loop simply iterates over this list as normal.

While the map() idiom is a common one in Python, it has several
disadvantages:

* It is non-obvious to programmers without a functional programming
  background.

* The use of the magic ``None`` first argument is non-obvious.

* It has arbitrary, often unintended, and inflexible semantics when
  the sequences are not of the same length: the shorter sequences are
  padded with ``None``::

    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]
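``map(None, ...)`` no longer exists in Python 3, so the padding behavior described above cannot be run there directly. As a hedged sketch, assuming Python 3 and two or more input sequences (Python 2's ``map(None, seq)`` with a single sequence returned bare elements rather than tuples), ``itertools.zip_longest``, whose ``fillvalue`` defaults to ``None``, produces the same padded result. The helper name ``map_none`` is hypothetical.

```python
from itertools import zip_longest


def map_none(*seqs):
    """Hypothetical helper approximating Python 2's map(None, *seqs).

    Assumes two or more sequences: shorter ones are padded with None
    so that the result is as long as the longest input.
    """
    return list(zip_longest(*seqs))


a = (1, 2, 3)
c = (4, 5, 6, 7)
print(map_none(a, c))  # [(1, 4), (2, 5), (3, 6), (None, 7)]
```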

For these reasons, several proposals were floated in the Python 2.0
beta time frame for syntactic support of lockstep for-loops. Here are
two suggestions::

    for x in seq1, y in seq2:
        # stuff

::

    for x, y in seq1, seq2:
        # stuff

Neither of these forms would work, since they both already mean
something in Python and changing the meanings would break existing
code. All other suggestions for new syntax suffered the same problem,
or were in conflict with another proposed feature called 'list
comprehensions' (see :pep:`202`).
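To make the conflict concrete, the following sketch (run under Python 3; the variable names are illustrative) shows what the two proposed spellings already mean. The first loops over the two-element tuple ``(seq1, y in seq2)``; the second loops over ``(seq1, seq2)`` and unpacks each whole sequence into ``x, y``. Neither walks the sequences in lockstep.

```python
seq1 = (1, 2)
seq2 = (3, 4)

# "for x in seq1, y in seq2:" parses as "for x in (seq1, (y in seq2)):",
# so it needs an existing y and yields seq1 itself, then a boolean.
y = 3
first_form = []
for x in seq1, y in seq2:
    first_form.append(x)
print(first_form)  # [(1, 2), True]

# "for x, y in seq1, seq2:" iterates over the tuple (seq1, seq2) and
# unpacks each two-element sequence into x and y.
second_form = []
for x, y in seq1, seq2:
    second_form.append((x, y))
print(second_form)  # [(1, 2), (3, 4)], not the lockstep [(1, 3), (2, 4)]
```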

The Proposed Solution
=====================

The proposed solution is to introduce a new built-in sequence
generator function, available in the ``__builtin__`` module. This
function is to be called ``zip`` and has the following signature::

    zip(seqa, [seqb, [...]])

``zip()`` takes one or more sequences and weaves their elements
together, just as ``map(None, ...)`` does with sequences of equal
length. The weaving stops when the shortest sequence is exhausted.

Return Value
============

``zip()`` returns a real Python list, the same way ``map()`` does.
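This list-returning behavior belongs to the ``zip()`` specified here; in Python 3 the built-in ``zip()`` returns a lazy iterator instead. A small sketch, assuming Python 3, of recovering the list semantics with ``list()`` and of why the distinction matters: a list can be traversed repeatedly, while a zip iterator is exhausted after one pass.

```python
a = (1, 2, 3)
b = (4, 5, 6)

# list() materializes Python 3's lazy zip into the kind of
# "real Python list" this PEP specifies.
pairs = list(zip(a, b))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# A bare zip iterator yields its tuples only once.
it = zip(a, b)
first_pass = list(it)
second_pass = list(it)
print(first_pass == pairs, second_pass)  # True []
```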

Examples
========

Here are some examples, based on the reference implementation below::

    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)

    >>> zip(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]

    >>> zip(a, d)
    [(1, 12), (2, 13)]

    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

Note that when the sequences are of the same length, ``zip()`` is
reversible::

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1

It is not possible to reverse zip this way when the sequences are not
all the same length.
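A short sketch (Python 3, wrapping results in ``list()`` to get the list semantics described in this PEP) of why unequal lengths break the round trip: truncation discards the unmatched element, so unzipping the result cannot restore it.

```python
a = (1, 2, 3)
b = (4, 5)  # one element shorter than a

x = list(zip(a, b))   # the trailing 3 in a is dropped here
y = list(zip(*x))     # "unzip" the truncated result
print(x)  # [(1, 4), (2, 5)]
print(y)  # [(1, 2), (4, 5)]
print(y == [a, b])  # False: the 3 cannot be recovered
```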

Reference Implementation
========================

Here is a reference implementation, in Python, of the ``zip()``
built-in function. This will be replaced with a C implementation after
final approval::

    def zip(*args):
        if not args:
            # zip() with no arguments is treated as a programming error.
            raise TypeError('zip() expects one or more sequence arguments')
        ret = []
        i = 0
        try:
            while 1:
                # Build the i-th tuple from the i-th element of each sequence.
                item = []
                for s in args:
                    item.append(s[i])
                ret.append(tuple(item))
                i = i + 1
        except IndexError:
            # The shortest sequence is exhausted; the partial tuple is dropped.
            return ret
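The reference implementation indexes each argument with ``s[i]``, so it accepts only objects that support integer indexing and signal their end with ``IndexError``. As a hedged sketch of an alternative (not the implementation adopted by this PEP), the same list-returning, truncating semantics, including the ``TypeError`` for no arguments, can be written against the iterator protocol so that any iterable is accepted. The name ``zip_list`` is hypothetical.

```python
def zip_list(*args):
    """Hypothetical iterator-based variant of the reference zip().

    Returns a list of tuples, stopping at the shortest argument, and
    raises TypeError when called with no arguments, as this PEP specifies.
    """
    if not args:
        raise TypeError('zip() expects one or more sequence arguments')
    iterators = [iter(arg) for arg in args]
    result = []
    while True:
        item = []
        for it in iterators:
            try:
                item.append(next(it))
            except StopIteration:
                # The shortest input is exhausted; drop the partial tuple.
                return result
        result.append(tuple(item))


print(zip_list((1, 2, 3, 4), (12, 13)))  # [(1, 12), (2, 13)]
```

One design consequence of the iterator-based approach: when a later iterator runs out, any elements already pulled from earlier iterators for the incomplete tuple are consumed and discarded, which matters if the caller keeps using those iterators afterwards.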

BDFL Pronouncements
===================

Note: the BDFL refers to Guido van Rossum, Python's Benevolent
Dictator For Life.

* The function's name. An earlier version of this PEP included an
  open issue listing 20+ proposed alternative names to ``zip()``. In
  the face of no overwhelmingly better choice, the BDFL strongly
  prefers ``zip()`` due to its Haskell [2]_ heritage. See version 1.7
  of this PEP for the list of alternatives.

* ``zip()`` shall be a built-in function.

* Optional padding. An earlier version of this PEP proposed an
  optional ``pad`` keyword argument, which would be used when the
  argument sequences were not the same length. This behavior is
  similar to the ``map(None, ...)`` semantics, except that the user
  would be able to specify the pad object. This has been rejected by
  the BDFL in favor of always truncating to the shortest sequence,
  because of the KISS principle. If there is a true need, padding is
  easy to add later; if it were added now and turned out not to be
  needed, it would be impossible to remove in the future.

* Lazy evaluation. An earlier version of this PEP proposed that
  ``zip()`` return a built-in object that performed lazy evaluation
  using the ``__getitem__()`` protocol. This has been strongly
  rejected by the BDFL in favor of returning a real Python list. If
  lazy evaluation is desired in the future, the BDFL suggests that an
  ``xzip()`` function be added.

* ``zip()`` with no arguments. The BDFL strongly prefers that this
  raise a TypeError exception.

* ``zip()`` with one argument. The BDFL strongly prefers that this
  return a list of 1-tuples.

* Inner and outer container control. An earlier version of this PEP
  contained a rather lengthy discussion of a feature that some people
  wanted, namely the ability to control what the inner and outer
  container types were (they are tuples and lists, respectively, in
  this version of the PEP). Given the simplified API and
  implementation, this elaboration is rejected. For a more detailed
  analysis, see version 1.7 of this PEP.
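The one-argument pronouncement still describes the built-in's behavior in modern Python. A quick check, assuming Python 3 (where ``list()`` is needed to obtain a list from ``zip()``):

```python
# A single argument produces a list of 1-tuples, not a list of bare items.
singles = list(zip((1, 2, 3)))
print(singles)  # [(1,), (2,), (3,)]

# Unpacking each 1-tuple in the loop target recovers the original elements.
items = [item for (item,) in singles]
print(items)  # [1, 2, 3]
```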

Subsequent Change to ``zip()``
==============================

In Python 2.4, ``zip()`` with no arguments was modified to return an
empty list rather than raising a TypeError exception. The rationale for
the original behavior was that the absence of arguments was thought to
indicate a programming error. However, that thinking did not
anticipate the use of ``zip()`` with the ``*`` operator for unpacking
variable-length argument lists. For example, the inverse of zip could
be defined as: ``unzip = lambda s: zip(*s)``. That transformation
also defines a matrix transpose or an equivalent row/column swap for
tables defined as lists of tuples. The latter transformation is
commonly used when reading data files with records as rows and fields
as columns. For example, the code::

    date, rain, high, low = zip(*csv.reader(file("weather.csv")))

rearranges columnar data so that each field is collected into
individual tuples for straightforward looping and summarization::

    print "Total rainfall", sum(map(float, rain))  # csv fields are strings
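The weather example above is Python 2 code that reads a file on disk. Below is a self-contained Python 3 sketch of the same column-splitting idea; the inline CSV data, its four-column layout without a header row, and the numeric rainfall values are assumptions made for illustration.

```python
import csv
import io

# Hypothetical stand-in for weather.csv: date, rain, high, low (no header).
data = io.StringIO(
    "2000-07-01,0.5,88,70\n"
    "2000-07-02,1.25,85,68\n"
    "2000-07-03,0.0,90,72\n"
)

# zip(*rows) turns the row records into one tuple per column.
date, rain, high, low = zip(*csv.reader(data))

# csv yields strings, so convert before summing.
total_rain = sum(float(r) for r in rain)
print("Total rainfall", total_rain)  # Total rainfall 1.75
```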

Using ``zip(*args)`` is more easily coded if ``zip(*[])`` is handled
as an allowable case rather than an exception. This is especially
helpful when data is either built up from or recursed down to a null
case with no records.

Seeing this possibility, the BDFL agreed (with some misgivings) to
have the behavior changed for Py2.4.
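A minimal sketch, assuming Python 3 (whose ``zip()`` with no arguments returns an empty iterator, in line with the Python 2.4 change described in this section), of why the null case matters: a hypothetical ``transpose`` helper needs no special case for an empty table.

```python
def transpose(rows):
    """Hypothetical helper: swap rows and columns of a list of tuples."""
    # With no rows, zip(*rows) is zip(), which yields nothing
    # instead of raising TypeError.
    return list(zip(*rows))


table = [(1, 'a'), (2, 'b'), (3, 'c')]
print(transpose(table))  # [(1, 2, 3), ('a', 'b', 'c')]
print(transpose([]))     # []
```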

Other Changes
=============

* The ``xzip()`` function discussed above was implemented in Py2.3 in
  the ``itertools`` module as ``itertools.izip()``. This function
  provides lazy behavior, consuming single elements and producing a
  single tuple on each pass. The "just-in-time" style saves memory
  and runs faster than its list-based counterpart, ``zip()``.

* The ``itertools`` module also added ``itertools.repeat()`` and
  ``itertools.chain()``. These tools can be used together to pad
  sequences with ``None`` (to match the behavior of ``map(None, seqn)``)::

    from itertools import chain, repeat
    zip(firstseq, chain(secondseq, repeat(None)))
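A runnable Python 3 version of the padding idiom above. Python 3's built-in ``zip()`` is already lazy, so it plays the role of ``itertools.izip()`` here; the helper name ``pad_second`` and the sample sequences are hypothetical. Because ``repeat(None)`` is infinite, the result is exactly as long as ``firstseq``: a shorter ``secondseq`` is padded, but a longer one is truncated, which differs from ``map(None, ...)``.

```python
from itertools import chain, repeat


def pad_second(firstseq, secondseq):
    """Hypothetical helper: pair firstseq with secondseq padded by None."""
    # chain(...) never runs out, so zip() stops only when firstseq does.
    return list(zip(firstseq, chain(secondseq, repeat(None))))


print(pad_second([1, 2, 3], [4]))  # [(1, 4), (2, None), (3, None)]
print(pad_second([1], [4, 5, 6]))  # [(1, 4)]  (extra items are dropped)
```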

References
==========

.. [1] http://docs.python.org/reference/compound_stmts.html#for

.. [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip

Greg Wilson's questionnaire on proposed syntax to some CS grad students
http://www.python.org/pipermail/python-dev/2000-July/013139.html


Copyright
=========

This document has been placed in the public domain.