Updated based on Guido's recent pronouncements on the Open Issues.

There are now no more open issues.
2000-07-27 19:15:20 +00:00 · 15c9185e18
parent c7410156bc
commit 15c9185e18
1 changed files with 50 additions and 252 deletions
--- a/pep-0201.txt
+++ b/pep-0201.txt
@@ -84,31 +84,17 @@ The Proposed Solution
    generator function, available in the __builtin__ module.  This
    function is to be called `zip' and has the following signature:

-    zip(seqa, [seqb, [...]], [pad=<value>])
+    zip(seqa, [seqb, [...]])

    zip() takes one or more sequences and weaves their elements
    together, just as map(None, ...) does with sequences of equal
-    length.  The optional keyword argument `pad', if supplied, is a
-    value used to pad all shorter sequences to the length of the
-    longest sequence.  If `pad' is omitted, then weaving stops when
-    the shortest sequence is exhausted.
-
-    It is not possible to pad short lists with different pad values,
-    nor will zip() ever raise an exception with lists of different
-    lengths.  To accomplish either behavior, the sequences must be
-    checked and processed before the call to zip() -- but see the Open
-    Issues below for more discussion.
+    length.  The weaving stops when the shortest sequence is
+    exhausted.


-Lazy Execution
+Return Value

-    For performance purposes, zip() does not construct the list of
-    tuples immediately.  Instead it instantiates an object that
-    implements a __getitem__() method and conforms to the informal
-    for-loop protocol.  This method constructs the individual tuples
-    on demand.
-
-    Guido is strongly opposed to lazy execution.  See Open Issues.
+    zip() returns a real Python list, the same way map() does.


 Examples
@@ -127,23 +113,9 @@ Examples
    >>> zip(a, d)
    [(1, 12), (2, 13)]

-    >>> zip(a, d, pad=0)
-    [(1, 12), (2, 13), (3, 0), (4, 0)]
-    
-    >>> zip(a, d, pid=0)
-    Traceback (most recent call last):
-      File "<stdin>", line 1, in ?
-      File "/usr/tmp/python-iKAOxR", line 11, in zip
-    TypeError: unexpected keyword arguments
-    
    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

-    >>> zip(a, b, c, d, pad=None)
-    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
-    >>> map(None, a, b, c, d)
-    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
-
    Note that when the sequences are of the same length, zip() is
    reversible:

@@ -171,235 +143,60 @@ Reference Implementation
    built-in function and helper class.  These would ultimately be
    replaced by equivalent C code.

-    class _Zipper:
-        def __init__(self, args, kws):
-            # Defaults
-            self.__padgiven = 0
-            if kws.has_key('pad'):
-                self.__padgiven = 1
-                self.__pad = kws['pad']
-                del kws['pad']
-            # Assert no unknown arguments are left
-            if kws:
-                raise TypeError('unexpected keyword arguments')
-            self.__sequences = args
-            self.__seqlen = len(args)
-
-        def __getitem__(self, i):
-            if not self.__sequences:
-                raise IndexError
-            ret = []
-            exhausted = 0
-            for s in self.__sequences:
-                try:
-                    ret.append(s[i])
-                except IndexError:
-                    if not self.__padgiven:
-                        raise
-                    exhausted = exhausted + 1
-                    if exhausted == self.__seqlen:
-                        raise
-                    ret.append(self.__pad)
-            return tuple(ret)
-
-        def __len__(self):
-            # If we're padding, then len is the length of the longest sequence,
-            # otherwise it's the length of the shortest sequence.
-            if not self.__padgiven:
-                shortest = -1
-                for s in self.__sequences:
-                    slen = len(s)
-                    if shortest < 0 or slen < shortest:
-                        shortest = slen
-                if shortest < 0:
-                    return 0
-                return shortest
-            longest = 0
-            for s in self.__sequences:
-                slen = len(s)
-                if slen > longest:
-                    longest = slen
-            return longest
-
-        def __cmp__(self, other):
-            i = 0
-            smore = 1
-            omore = 1
-            while 1:
-                try:
-                    si = self[i]
-                except IndexError:
-                    smore = 0
-                try:
-                    oi = other[i]
-                except IndexError:
-                    omore = 0
-                if not smore and not omore:
-                    return 0
-                elif not smore:
-                    return -1
-                elif not omore:
-                    return 1
-                test = cmp(si, oi)
-                if test:
-                    return test
-                i = i + 1
-
-        def __str__(self):
-            ret = []
-            i = 0
-            while 1:
-                try:
-                    ret.append(self[i])
-                except IndexError:
-                    break
-                i = i + 1
-            return str(ret)
-        __repr__ = __str__
+    def zip(*args):
+        if not args:
+            raise TypeError('zip() expects one or more sequence arguments')
+        ret = []
+        # find the length of the shortest sequence
+        shortest = min(map(len, args))
+        for i in range(shortest):
+            item = []
+            for s in args:
+                item.append(s[i])
+            ret.append(tuple(item))
+        return ret


-    def zip(*args, **kws):
-        return _Zipper(args, kws)
+BDFL Pronouncements

+    Note: the BDFL refers to Guido van Rossum, Python's Benevolent
+    Dictator For Life.

-Rejected Elaborations
+    - The function's name.  An earlier version of this PEP included an
+      open issue listing 20+ proposed alternative names to zip().  In
+      the face of no overwhelmingly better choice, the BDFL strongly
+      prefers zip() due to its Haskell[2] heritage.  See version 1.7
+      of this PEP for the list of alternatives.

-    Some people have suggested that the user be able to specify the
-    type of the inner and outer containers for the zipped sequence.
-    This would be specified by additional keyword arguments to zip(),
-    named `inner' and `outer'.
+    - zip() shall be a built-in function.

-    This elaboration is rejected for several reasons.  First, there
-    really is no outer container, even though there appears to be an
-    outer list container the example above.  This is simply an
-    artifact of the repr() of the zipped object.  User code can do its
-    own looping over the zipped object via __getitem__(), and build
-    any type of outer container for the fully evaluated, concrete
-    sequence.  For example, to build a zipped object with lists as an
-    outer container, use
+    - Optional padding.  An earlier version of this PEP proposed an
+      optional `pad' keyword argument, which would be used when the
+      argument sequences were not the same length.  This is similar
+      behavior to the map(None, ...) semantics except that the user
+      would be able to specify the pad object.  This has been rejected
+      by the BDFL in favor of always truncating to the shortest
+      sequence.

-        >>> list(zip(sequence_a, sequence_b, sequence_c))
+    - Lazy evaluation.  An earlier version of this PEP proposed that
+      zip() return a built-in object that performed lazy evaluation
+      using the __getitem__() protocol.  This has been strongly
+      rejected by the BDFL in favor of returning a real Python list.
+      If lazy evaluation is desired in the future, the BDFL suggests
+      an xzip() function be added.

-    for tuple outer container, use
-    
-        >>> tuple(zip(sequence_a, sequence_b, sequence_c))
+    - zip() with no arguments.  The BDFL strongly prefers that this
+      raise a TypeError exception.

-    This type of construction will usually not be necessary though,
-    since it is expected that zipped objects will most often appear in
-    for-loops.
+    - zip() with one argument.  The BDFL strongly prefers that this
+      return a list of 1-tuples.

-    Second, allowing the user to specify the inner container
-    introduces needless complexity and arbitrary decisions.  You might
-    imagine that instead of the default tuple inner container, the
-    user could prefer a list, or a dictionary, or instances of some
-    sequence-like class.
-
-    One problem is the API.  Should the argument to `inner' be a type
-    or a template object?  For flexibility, the argument should
-    probably be a type object (i.e. TupleType, ListType, DictType), or
-    a class.  For classes, the implementation could just pass the zip
-    element to the constructor.  But what about built-in types that
-    don't have constructors?  They would have to be special-cased in
-    the implementation (i.e. what is the constructor for TupleType?
-    The tuple() built-in).
-
-    Another problem that arises is for zips greater than length two.
-    Say you had three sequences and you wanted the inner type to be a
-    dictionary.  What would the semantics of the following be?
-
-        >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)
-
-    Would the key be (element_a, element_b) and the value be
-    element_c, or would the key be element_a and the value be
-    (element_b, element_c)?  Or should an exception be thrown?
-
-    This suggests that the specification of the inner container type
-    is needless complexity.  It isn't likely that the inner container
-    will need to be specified very often, and it is easy to roll your
-    own should you need it.  Tuples are chosen for the inner container
-    type due to their (slight) memory footprint and performance
-    advantages.
-
-
-Open Issues
-
-    - Guido opposes lazy evaluation for zip().  He believes zip()
-      should return a real list, with an xzip() lazy evaluator added
-      later if necessary.
-
-    - What should "zip(a)" do?  Given
-
-      a = (1, 2, 3); zip(a)
-
-      three outcomes are possible.
-
-      1) Returns [(1,), (2,), (3,)]
-
-         Pros: no special casing in the implementation or in user
-         code, and is more consistent with the description of it's
-         semantics.  Cons: this isn't what map(None, a) would return,
-         and may be counter to user expectations.
-
-      2) Returns [1, 2, 3]
-
-         Pros: consistency with map(None, a), and simpler code for
-         for-loops, e.g.
-
-         for i in zip(a):
-
-         instead of
-
-         for (i,) in zip(a):
-
-         Cons: too much complexity and special casing for what should
-         be a relatively rare usage pattern.
-
-      3) Raises TypeError
-
-         Pros: zip(a) doesn't make much sense and could be confusing
-         to explain.
-
-         Cons: needless restriction
-
-      Current scoring seems to generally favor outcome 1.
-
-    - What should "zip()" do?
-
-      Along similar lines, zip() with no arguments (or zip() with just
-      a pad argument) can have ambiguous semantics.  Should this
-      return no elements or an infinite number?  For these reaons,
-      raising a TypeError exception in this case makes the most
-      sense.
-
-    - The name of the built-in `zip' may cause some initial confusion
-      with the zip compression algorithm.  Other suggestions include
-      (but are not limited to!): marry, weave, parallel, lace, braid,
-      interlace, permute, furl, tuples, lists, stitch, collate, knit,
-      plait, fold, with, mktuples, maketuples, totuples, gentuples,
-      tupleorama.
-
-      All have disadvantages, and there is no clear unanimous choice,
-      therefore the decision was made to go with `zip' because the
-      same functionality is available in other languages
-      (e.g. Haskell) under the name `zip'[2].
-
-    - Should zip() be including in the builtins module or should it be
-      in a separate generators module (possibly with other candidate
-      functions like irange())?
-
-    - Padding short sequences with different values.  A suggestion has
-      been made to allow a `padtuple' (probably better called `pads'
-      or `padseq') argument similar to `pad'.  This sequence must have
-      a length equal to the number of sequences given.  It is a
-      sequence of the individual pad values to use for each sequence,
-      should it be shorter than the maximum length.
-
-      One problem is what to do if `padtuple' itself isn't of the
-      right length?  A TypeError seems to be the only choice here.
-
-      How does `pad' and `padtuple' interact?  Perhaps if padtuple
-      were too short, it could use pad as a fallback.  padtuple would
-      always override pad if both were given.
+    - Inner and outer container control.  An earlier version of this
+      PEP contained a rather lengthy discussion of a feature that some
+      people wanted, namely the ability to control what the inner and
+      outer container types were (they are tuples and a list,
+      respectively, in this version of the PEP).  Given the simplified
+      API and implementation, this elaboration is rejected.  For a
+      more detailed analysis, see version 1.7 of this PEP.


 References
@@ -409,6 +206,7 @@ References

    TBD: URL to python-dev archives

+
 Copyright

    This document has been placed in the public domain.
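
Review note: the semantics this commit settles on (truncate to the
shortest sequence, return a real list, raise TypeError with no
arguments, return 1-tuples for a single argument) can be sketched in
present-day Python as below.  This is a minimal sketch and not the
PEP's reference implementation.  The name zip_list is hypothetical,
chosen so the built-in zip() is not shadowed, and the values of a and
d are assumed to match the sequences used in the PEP's examples.

```python
def zip_list(*args):
    """Hypothetical list-returning zip following the pronouncements."""
    if not args:
        raise TypeError('zip_list() expects one or more sequence arguments')
    # Pass the lengths as a single iterable so that one argument works too.
    shortest = min(map(len, args))
    result = []
    for i in range(shortest):
        # Each item is a tuple holding the i-th element of every sequence.
        result.append(tuple(s[i] for s in args))
    return result


a = (1, 2, 3, 4)   # assumed to match the PEP's example sequence `a`
d = (12, 13)       # assumed to match the PEP's example sequence `d`

pairs = zip_list(a, d)    # weaving stops at the shorter sequence
singles = zip_list(a)     # one argument yields a list of 1-tuples
```

Passing the lengths to min() as one iterable, rather than unpacking
them into separate arguments, is what lets the single-argument case
work: min() called with a lone integer raises TypeError.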