Updated based on Guido's recent pronouncements on the Open Issues.
There are now no more open issues.
parent
c7410156bc
commit
15c9185e18
302
pep-0201.txt
|
@ -84,31 +84,17 @@ The Proposed Solution
|
|||
generator function, available in the __builtin__ module. This
|
||||
function is to be called `zip' and has the following signature:
|
||||
|
||||
zip(seqa, [seqb, [...]], [pad=<value>])
|
||||
zip(seqa, [seqb, [...]])
|
||||
|
||||
zip() takes one or more sequences and weaves their elements
|
||||
together, just as map(None, ...) does with sequences of equal
|
||||
length. The optional keyword argument `pad', if supplied, is a
|
||||
value used to pad all shorter sequences to the length of the
|
||||
longest sequence. If `pad' is omitted, then weaving stops when
|
||||
the shortest sequence is exhausted.
|
||||
|
||||
It is not possible to pad short lists with different pad values,
|
||||
nor will zip() ever raise an exception with lists of different
|
||||
lengths. To accomplish either behavior, the sequences must be
|
||||
checked and processed before the call to zip() -- but see the Open
|
||||
Issues below for more discussion.
|
||||
length. The weaving stops when the shortest sequence is
|
||||
exhausted.
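As a rough illustration of the truncation rule, the sketch below uses present-day Python 3, where the built-in zip() returns an iterator and is wrapped in list() to obtain the list described here. The values of a and d are assumed sample data shaped like the examples in this PEP. itertools.zip_longest() is shown only for contrast, as the closest standard-library counterpart to the padded form.

```python
from itertools import zip_longest

# Assumed sample data, shaped like the examples in this PEP.
a = (1, 2, 3, 4)
d = (12, 13)

# Weaving stops when the shortest sequence is exhausted.
truncated = list(zip(a, d))

# For contrast only: padding the shorter sequence with a fill value.
padded = list(zip_longest(a, d, fillvalue=0))

print(truncated)
print(padded)
```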
|
||||
|
||||
|
||||
Lazy Execution
|
||||
Return Value
|
||||
|
||||
For performance purposes, zip() does not construct the list of
|
||||
tuples immediately. Instead it instantiates an object that
|
||||
implements a __getitem__() method and conforms to the informal
|
||||
for-loop protocol. This method constructs the individual tuples
|
||||
on demand.
|
||||
|
||||
Guido is strongly opposed to lazy execution. See Open Issues.
|
||||
zip() returns a real Python list, the same way map() does.
|
||||
|
||||
|
||||
Examples
|
||||
|
@ -127,23 +113,9 @@ Examples
|
|||
>>> zip(a, d)
|
||||
[(1, 12), (2, 13)]
|
||||
|
||||
>>> zip(a, d, pad=0)
|
||||
[(1, 12), (2, 13), (3, 0), (4, 0)]
|
||||
|
||||
>>> zip(a, d, pid=0)
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
File "/usr/tmp/python-iKAOxR", line 11, in zip
|
||||
TypeError: unexpected keyword arguments
|
||||
|
||||
>>> zip(a, b, c, d)
|
||||
[(1, 5, 9, 12), (2, 6, 10, 13)]
|
||||
|
||||
>>> zip(a, b, c, d, pad=None)
|
||||
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
|
||||
>>> map(None, a, b, c, d)
|
||||
[(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
|
||||
|
||||
Note that when the sequences are of the same length, zip() is
|
||||
reversible:
|
||||
|
||||
|
@ -171,235 +143,60 @@ Reference Implementation
|
|||
built-in function and helper class. These would ultimately be
|
||||
replaced by equivalent C code.
|
||||
|
||||
class _Zipper:
    def __init__(self, args, kws):
        # Defaults
        self.__padgiven = 0
        if kws.has_key('pad'):
            self.__padgiven = 1
            self.__pad = kws['pad']
            del kws['pad']
        # Assert no unknown arguments are left
        if kws:
            raise TypeError('unexpected keyword arguments')
        self.__sequences = args
        self.__seqlen = len(args)

    def __getitem__(self, i):
        if not self.__sequences:
            raise IndexError
        ret = []
        exhausted = 0
        for s in self.__sequences:
            try:
                ret.append(s[i])
            except IndexError:
                if not self.__padgiven:
                    raise
                exhausted = exhausted + 1
                if exhausted == self.__seqlen:
                    raise
                ret.append(self.__pad)
        return tuple(ret)

    def __len__(self):
        # If we're padding, then len is the length of the longest sequence,
        # otherwise it's the length of the shortest sequence.
        if not self.__padgiven:
            shortest = -1
            for s in self.__sequences:
                slen = len(s)
                if shortest < 0 or slen < shortest:
                    shortest = slen
            if shortest < 0:
                return 0
            return shortest
        longest = 0
        for s in self.__sequences:
            slen = len(s)
            if slen > longest:
                longest = slen
        return longest

    def __cmp__(self, other):
        i = 0
        smore = 1
        omore = 1
        while 1:
            try:
                si = self[i]
            except IndexError:
                smore = 0
            try:
                oi = other[i]
            except IndexError:
                omore = 0
            if not smore and not omore:
                return 0
            elif not smore:
                return -1
            elif not omore:
                return 1
            test = cmp(si, oi)
            if test:
                return test
            i = i + 1

    def __str__(self):
        ret = []
        i = 0
        while 1:
            try:
                ret.append(self[i])
            except IndexError:
                break
            i = i + 1
        return str(ret)
    __repr__ = __str__
|
||||
def zip(*args):
    if not args:
        raise TypeError('zip() expects one or more sequence arguments')
    ret = []
    # find the length of the shortest sequence; min() is given the list
    # itself so that a single sequence argument also works
    shortest = min(map(len, args))
    for i in range(shortest):
        item = []
        for s in args:
            item.append(s[i])
        ret.append(tuple(item))
    return ret
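For readers on a current interpreter, the list-returning behavior can be approximated in Python 3 as follows. This is only a sketch: the name zip_list is hypothetical, chosen to avoid shadowing the built-in, and it assumes every argument supports len() and integer indexing.

```python
def zip_list(*args):
    """Eager, list-returning zip (hypothetical name, illustration only)."""
    if not args:
        raise TypeError('zip_list() expects one or more sequence arguments')
    # Length of the shortest sequence; min() over an iterable also
    # handles the single-argument case.
    shortest = min(len(s) for s in args)
    # Build one tuple per index, up to the shortest length.
    return [tuple(s[i] for s in args) for i in range(shortest)]


print(zip_list((1, 2, 3, 4), (12, 13)))
print(zip_list((1, 2, 3)))
```

With a single argument the result is a list of 1-tuples, and calling it with no arguments raises TypeError.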
|
||||
|
||||
|
||||
def zip(*args, **kws):
    return _Zipper(args, kws)
|
||||
Rejected Elaborations

Some people have suggested that the user be able to specify the
type of the inner and outer containers for the zipped sequence.
This would be specified by additional keyword arguments to zip(),
named `inner' and `outer'.

This elaboration is rejected for several reasons. First, there
really is no outer container, even though there appears to be an
outer list container in the example above. This is simply an
artifact of the repr() of the zipped object. User code can do its
own looping over the zipped object via __getitem__(), and build
any type of outer container for the fully evaluated, concrete
sequence. For example, to build a zipped object with lists as an
outer container, use

>>> list(zip(sequence_a, sequence_b, sequence_c))

for tuple outer container, use

>>> tuple(zip(sequence_a, sequence_b, sequence_c))

This type of construction will usually not be necessary though,
since it is expected that zipped objects will most often appear in
for-loops.

Second, allowing the user to specify the inner container
introduces needless complexity and arbitrary decisions. You might
imagine that instead of the default tuple inner container, the
user could prefer a list, or a dictionary, or instances of some
sequence-like class.

One problem is the API. Should the argument to `inner' be a type
or a template object? For flexibility, the argument should
probably be a type object (i.e. TupleType, ListType, DictType), or
a class. For classes, the implementation could just pass the zip
element to the constructor. But what about built-in types that
don't have constructors? They would have to be special-cased in
the implementation (i.e. what is the constructor for TupleType?
The tuple() built-in).

Another problem that arises is for zips greater than length two.
Say you had three sequences and you wanted the inner type to be a
dictionary. What would the semantics of the following be?

>>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)

Would the key be (element_a, element_b) and the value be
element_c, or would the key be element_a and the value be
(element_b, element_c)? Or should an exception be thrown?

This suggests that the specification of the inner container type
is needless complexity. It isn't likely that the inner container
will need to be specified very often, and it is easy to roll your
own should you need it. Tuples are chosen for the inner container
type due to their (slight) memory footprint and performance
advantages.


Open Issues

- Guido opposes lazy evaluation for zip(). He believes zip()
should return a real list, with an xzip() lazy evaluator added
later if necessary.

- What should "zip(a)" do? Given

a = (1, 2, 3); zip(a)

three outcomes are possible.

1) Returns [(1,), (2,), (3,)]

Pros: no special casing in the implementation or in user
code, and is more consistent with the description of its
semantics. Cons: this isn't what map(None, a) would return,
and may be counter to user expectations.

2) Returns [1, 2, 3]

Pros: consistency with map(None, a), and simpler code for
for-loops, e.g.

for i in zip(a):

instead of

for (i,) in zip(a):

Cons: too much complexity and special casing for what should
be a relatively rare usage pattern.

3) Raises TypeError

Pros: zip(a) doesn't make much sense and could be confusing
to explain.

Cons: needless restriction

Current scoring seems to generally favor outcome 1.

- What should "zip()" do?

Along similar lines, zip() with no arguments (or zip() with just
a pad argument) can have ambiguous semantics. Should this
return no elements or an infinite number? For these reasons,
raising a TypeError exception in this case makes the most
sense.

- The name of the built-in `zip' may cause some initial confusion
with the zip compression algorithm. Other suggestions include
(but are not limited to!): marry, weave, parallel, lace, braid,
interlace, permute, furl, tuples, lists, stitch, collate, knit,
plait, fold, with, mktuples, maketuples, totuples, gentuples,
tupleorama.

All have disadvantages, and there is no clear unanimous choice,
therefore the decision was made to go with `zip' because the
same functionality is available in other languages
(e.g. Haskell) under the name `zip'[2].

- Should zip() be included in the builtins module or should it be
in a separate generators module (possibly with other candidate
functions like irange())?

- Padding short sequences with different values. A suggestion has
been made to allow a `padtuple' (probably better called `pads'
or `padseq') argument similar to `pad'. This sequence must have
a length equal to the number of sequences given. It is a
sequence of the individual pad values to use for each sequence,
should it be shorter than the maximum length.

One problem is what to do if `padtuple' itself isn't of the
right length? A TypeError seems to be the only choice here.

How do `pad' and `padtuple' interact? Perhaps if padtuple
were too short, it could use pad as a fallback. padtuple would
always override pad if both were given.


BDFL Pronouncements

Note: the BDFL refers to Guido van Rossum, Python's Benevolent
Dictator For Life.

- The function's name. An earlier version of this PEP included an
open issue listing 20+ proposed alternative names to zip(). In
the face of no overwhelmingly better choice, the BDFL strongly
prefers zip() due to its Haskell[2] heritage. See version 1.7
of this PEP for the list of alternatives.

- zip() shall be a built-in function.

- Optional padding. An earlier version of this PEP proposed an
optional `pad' keyword argument, which would be used when the
argument sequences were not the same length. This is similar
behavior to the map(None, ...) semantics except that the user
would be able to specify the pad object. This has been rejected by
the BDFL in favor of always truncating to the shortest sequence.

- Lazy evaluation. An earlier version of this PEP proposed that
zip() return a built-in object that performed lazy evaluation
using the __getitem__() protocol. This has been strongly rejected
by the BDFL in favor of returning a real Python list. If lazy
evaluation is desired in the future, the BDFL suggests an xzip()
function be added.

- zip() with no arguments. The BDFL strongly prefers that this
raise a TypeError exception.

- zip() with one argument. The BDFL strongly prefers that this
return a list of 1-tuples.

- Inner and outer container control. An earlier version of this
PEP contains a rather lengthy discussion on a feature that some
people wanted, namely the ability to control what the inner and
outer container types were (they are tuples and a list,
respectively, in this version of the PEP). Given the simplified
API and implementation, this elaboration is rejected. For a
more detailed analysis, see version 1.7 of this PEP.
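The lazy xzip() mentioned as a possible later addition is not specified anywhere in this PEP. One way it might look is sketched below in Python 3 using a generator, a feature added to the language after this PEP was written. The body is an assumption made for illustration, not a proposed implementation. The last lines show how a caller can choose a different inner container without any help from the function.

```python
from itertools import count


def xzip(*args):
    """Lazy zip sketch (hypothetical; illustration only)."""
    if not args:
        # Raised on first iteration, because a generator body
        # does not run until it is iterated.
        raise TypeError('xzip() expects one or more iterable arguments')
    iterators = [iter(s) for s in args]
    while True:
        item = []
        for it in iterators:
            try:
                item.append(next(it))
            except StopIteration:
                # The shortest input is exhausted: stop weaving.
                return
        yield tuple(item)


# Tuples are produced on demand, so an unbounded input is fine
# as long as some other input is finite.
pairs = list(xzip(count(1), "ab"))

# Caller-side control of the inner container type.
as_lists = [list(t) for t in xzip((1, 2), (5, 6))]

print(pairs)
print(as_lists)
```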
|
||||
|
||||
|
||||
References
|
||||
|
@ -409,6 +206,7 @@ References
|
|||
|
||||
TBD: URL to python-dev archives
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
|