Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

Added a __len__() method to the reference implementation.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.
Barry Warsaw 2000-07-17 18:49:21 +00:00
parent cf9d818d03
commit be3c33389e
1 changed files with 160 additions and 35 deletions


@ -25,9 +25,10 @@ Standard For-Loops
Motivation for this feature has its roots in a concept described
as `parallel for loops'. A standard for-loop in Python iterates
over every element in the sequence until the sequence is
exhausted. The for-loop can also be explicitly exited with a
`break' statement, and for-loops can have else: clauses, but these
is has no bearing on this PEP.
exhausted. A `break' statement inside the loop suite causes an
explicit loop exit. For-loops also have else: clauses which get
executed when the loop exits normally (i.e. not by execution of a
break).
For-loops can iterate over built-in types such as lists and
tuples, but they can also iterate over instance types that conform
@ -35,13 +36,13 @@ Standard For-Loops
instance should implement the __getitem__() method, expecting a
monotonically increasing index starting at 0, and this method
should raise an IndexError when the sequence is exhausted. This
protocol is current undocumented -- a defect in Python's
protocol is currently undocumented -- a defect in Python's
documentation hopefully soon corrected.
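As a rough illustration of the informal protocol just described, the
sketch below defines a class whose only sequence-like method is
__getitem__(). The class name Countdown is hypothetical and is not
part of this PEP; it is one possible reading of the protocol, not a
definitive statement of it.

```python
class Countdown:
    """Hypothetical example class; not part of the PEP."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # The for-loop calls this with 0, 1, 2, ... in increasing order.
        if index >= self.start:
            # Raising IndexError signals that the sequence is exhausted.
            raise IndexError(index)
        return self.start - index


collected = []
for value in Countdown(3):
    collected.append(value)
# collected is now [3, 2, 1]
```

The loop never asks for the length of the object; it simply keeps
indexing until IndexError is raised.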
For loops are described in the language reference manual here
http://www.python.org/doc/devel/ref/for.html
For-loops are described in the Python language reference
manual[1].
An example for-loop
An example for-loop:
>>> for i in (1, 2, 3): print i
...
@ -88,7 +89,7 @@ Parallel For-Loops
- The use of the magic `None' first argument is non-obvious.
- Its has arbitrary, often unintended, and inflexible semantics
- It has arbitrary, often unintended, and inflexible semantics
when the lists are not of the same length: the shorter sequences
are padded with `None'.
@ -110,11 +111,11 @@ The Proposed Solution
The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
function is to be called `marry' and has the following signature:
function is to be called `zip' and has the following signature:
marry(seqa, [seqb, [...]], [pad=<value>])
zip(seqa, [seqb, [...]], [pad=<value>])
marry() takes one or more sequences and weaves their elements
zip() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The optional keyword argument `pad', if supplied, is a
value used to pad all shorter sequences to the length of the
@ -122,15 +123,15 @@ The Proposed Solution
the shortest sequence is exhausted.
It is not possible to pad short lists with different pad values,
nor will marry() ever raise an exception with lists of different
lengths. To accomplish both of these, the sequences must be
checked and processed before the call to marry().
nor will zip() ever raise an exception with lists of different
lengths. To accomplish either behavior, the sequences must be
checked and processed before the call to zip().
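The pre-processing mentioned above might look something like the
following sketch. Both helpers, check_equal_lengths() and pad_to(),
are hypothetical names invented for this illustration; neither is
proposed as part of the built-in. Because the sequences have equal
length by the time they are zipped, the `pad' argument is not needed
at all.

```python
def check_equal_lengths(*sequences):
    """Hypothetical helper: raise ValueError unless all lengths match."""
    lengths = [len(s) for s in sequences]
    for n in lengths:
        if n != lengths[0]:
            raise ValueError('sequences differ in length: %r' % (lengths,))
    return sequences


def pad_to(sequence, length, fill):
    """Hypothetical helper: return a list padded with `fill` up to `length`."""
    items = list(sequence)
    return items + [fill] * (length - len(items))


a = (1, 2, 3, 4)
d = (12, 13)
longest = max(len(a), len(d))

# Each sequence gets its own pad value before being zipped.
padded_a = pad_to(a, longest, 0)
padded_d = pad_to(d, longest, -1)

# Both inputs now have the same length, so no padding or truncation occurs.
pairs = list(zip(padded_a, padded_d))
# pairs is [(1, 12), (2, 13), (3, -1), (4, -1)]
```

Calling check_equal_lengths(a, d) on the original tuples would raise
ValueError, which is the "raise an exception" behavior that zip()
itself declines to provide.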
Lazy Execution
For performance purposes, marry() does not construct the list of
For performance purposes, zip() does not construct the list of
tuples immediately. Instead it instantiates an object that
implements a __getitem__() method and conforms to the informal
for-loop protocol. This method constructs the individual tuples
@ -148,25 +149,25 @@ Examples
>>> c = (9, 10, 11)
>>> d = (12, 13)
>>> marry(a, b)
>>> zip(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
>>> marry(a, d)
>>> zip(a, d)
[(1, 12), (2, 13)]
>>> marry(a, d, pad=0)
>>> zip(a, d, pad=0)
[(1, 12), (2, 13), (3, 0), (4, 0)]
>>> marry(a, d, pid=0)
>>> zip(a, d, pid=0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/tmp/python-iKAOxR", line 11, in marry
File "/usr/tmp/python-iKAOxR", line 11, in zip
TypeError: unexpected keyword arguments
>>> marry(a, b, c, d)
>>> zip(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
>>> marry(a, b, c, d, pad=None)
>>> zip(a, b, c, d, pad=None)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
>>> map(None, a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
@ -175,17 +176,19 @@ Examples
Reference Implementation
Here is a reference implementation, in Python of the marry()
Here is a reference implementation, in Python, of the zip()
built-in function and helper class. These would ultimately be
replaced by equivalent C code.
class _Marriage:
class _Zipper:
def __init__(self, args, kws):
# Defaults
self.__padgiven = 0
if kws.has_key('pad'):
self.__padgiven = 1
self.__pad = kws['pad']
del kws['pad']
# Assert no unknown arguments are left
if kws:
raise TypeError('unexpected keyword arguments')
self.__sequences = args
@ -206,6 +209,23 @@ Reference Implementation
ret.append(self.__pad)
return tuple(ret)
def __len__(self):
# If we're padding, then len is the length of the longest sequence,
# otherwise it's the length of the shortest sequence.
if not self.__padgiven:
shortest = -1
for s in self.__sequences:
slen = len(s)
if shortest < 0 or slen < shortest:
shortest = slen
return shortest
longest = 0
for s in self.__sequences:
slen = len(s)
if slen > longest:
longest = slen
return longest
def __str__(self):
ret = []
i = 0
@ -219,25 +239,130 @@ Reference Implementation
__repr__ = __str__
def marry(*args, **kws):
return _Marriage(args, kws)
def zip(*args, **kws):
return _Zipper(args, kws)
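The rule implemented by __len__() above can be stated compactly. The
following is only a sketch of that rule, written as a standalone
function; the name zipped_length() is hypothetical, and the function
assumes at least one sequence is given (the reference implementation
handles the empty case differently).

```python
def zipped_length(sequences, padded):
    """Hypothetical helper mirroring the __len__() rule.

    Assumes `sequences` contains at least one sequence.
    """
    lengths = [len(s) for s in sequences]
    if padded:
        # With padding, shorter sequences are extended to the longest.
        return max(lengths)
    # Without padding, zipping stops at the shortest sequence.
    return min(lengths)


a = (1, 2, 3, 4)
d = (12, 13)
unpadded = zipped_length((a, d), padded=False)   # 2
with_pad = zipped_length((a, d), padded=True)    # 4
```

These values agree with the zip(a, d) and zip(a, d, pad=0) results
shown in the Examples section.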
Rejected Elaborations
Some people have suggested that the user be able to specify the
type of the inner and outer containers for the zipped sequence.
This would be specified by additional keyword arguments to zip(),
named `inner' and `outer'.
This elaboration is rejected for several reasons. First, there
really is no outer container, even though there appears to be an
outer list container in the example above. This is simply an
artifact of the repr() of the zipped object. User code can do its
own looping over the zipped object via __getitem__(), and build
any type of outer container for the fully evaluated, concrete
sequence. For example, to build a zipped object with lists as an
outer container, use
>>> list(zip(sequence_a, sequence_b, sequence_c))
for a tuple outer container, use
>>> tuple(zip(sequence_a, sequence_b, sequence_c))
This type of construction will usually not be necessary though,
since it is expected that zipped objects will most often appear in
for-loops.
Second, allowing the user to specify the inner container
introduces needless complexity and arbitrary decisions. You might
imagine that instead of the default tuple inner container, the
user could prefer a list, or a dictionary, or instances of some
sequence-like class.
One problem is the API. Should the argument to `inner' be a type
or a template object? For flexibility, the argument should
probably be a type object (e.g. TupleType, ListType, DictType), or
a class. For classes, the implementation could just pass the zip
element to the constructor. But what about built-in types that
don't have constructors? They would have to be special-cased in
the implementation (e.g. what is the constructor for TupleType?
The tuple() built-in).
Another problem arises for zips of more than two sequences.
Say you had three sequences and you wanted the inner type to be a
dictionary. What would the semantics of the following be?
>>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)
Would the key be (element_a, element_b) and the value be
element_c, or would the key be element_a and the value be
(element_b, element_c)? Or should an exception be thrown?
This suggests that the specification of the inner container type
is needless complexity. It isn't likely that the inner container
will need to be specified very often, and it is easy to roll your
own should you need it. Tuples are chosen for the inner container
type due to their (slight) memory footprint and performance
advantages.
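To suggest how little code "rolling your own" takes, here is a
minimal sketch that produces lists as the inner container. The name
zip_as_lists() is hypothetical and is not proposed for inclusion;
the sketch assumes only that zip() yields tuples as described above,
and it uses equal-length inputs so that padding does not come into
play.

```python
def zip_as_lists(*sequences):
    """Hypothetical helper: like zip(), but each inner item is a list."""
    # Convert every inner tuple produced by zip() into a list.
    return [list(item) for item in zip(*sequences)]


a = (1, 2, 3, 4)
b = (5, 6, 7, 8)
rows = zip_as_lists(a, b)
# rows is [[1, 5], [2, 6], [3, 7], [4, 8]]
```

Any other inner type can be substituted for list() in the same way,
which leaves decisions such as the dictionary semantics questioned
above to the caller.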
Open Issues
What should "marry(a)" do?
- What should "zip(a)" do? Given
Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
should it return [1, 2, 3]? The first is more consistent with the
description given above, while the latter is what map(None, a)
does, and may be more consistent with user expectation.
a = (1, 2, 3); zip(a)
The latter interpretation requires special casing, which is not
present in the reference implementation. It returns
three outcomes are possible.
>>> marry(a)
[(1,), (2,), (3,), (4,)]
1) Returns [(1,), (2,), (3,)]
Pros: no special casing in the implementation or in user
code, and is more consistent with the description of its
semantics. Cons: this isn't what map(None, a) would return,
and may be counter to user expectations.
2) Returns [1, 2, 3]
Pros: consistency with map(None, a), and simpler code for
for-loops, e.g.
for i in zip(a):
instead of
for (i,) in zip(a):
Cons: too much complexity and special casing for what should
be a relatively rare usage pattern.
3) Raises TypeError
Pros: None
Cons: needless restriction
Current scoring seems to generally favor outcome 1.
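The practical difference between outcomes 1 and 2 can be seen in
how user code would loop over the result. The two functions below
are hypothetical stand-ins that model each outcome for a single
sequence; they are not the reference implementation.

```python
def zip_outcome_1(seq):
    """Hypothetical model of outcome 1: wrap each element in a 1-tuple."""
    return [(x,) for x in seq]


def zip_outcome_2(seq):
    """Hypothetical model of outcome 2: return the bare elements."""
    return list(seq)


a = (1, 2, 3)

seen_1 = []
for (i,) in zip_outcome_1(a):   # the 1-tuple must be unpacked
    seen_1.append(i)

seen_2 = []
for i in zip_outcome_2(a):      # no unpacking needed
    seen_2.append(i)
# Both loops visit the elements 1, 2, 3 in order.
```

Under outcome 1 the single-sequence case behaves exactly like the
general case; under outcome 2 it needs the special casing described
above.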
- The name of the built-in `zip' may cause some initial confusion
with the zip compression algorithm. Other suggestions include
(but are not limited to!): marry, weave, parallel, lace, braid,
interlace, permute, furl, tuples, lists, stitch, collate, knit,
plait, and with. All have disadvantages, and there is no clear
unanimous choice; therefore the decision was made to go with
`zip' because the same functionality is available in other
languages (e.g. Haskell) under the name `zip'[2].
References
[1] http://www.python.org/doc/devel/ref/for.html
[2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
TBD: URL to python-dev archives
Copyright
This document has been placed in the public domain.