PEP: 201
Title: Parallel Iteration
Version: $Revision$
Author: bwarsaw@beopen.com (Barry A. Warsaw)
Python-Version: 2.0
Status: Draft
Created: 13-Jul-2000
Post-History:


Introduction

    This PEP describes the `parallel iteration' proposal for Python
    2.0, previously known as `parallel for loops'.  This PEP tracks
    the status and ownership of this feature, slated for introduction
    in Python 2.0.  It contains a description of the feature and
    outlines changes necessary to support the feature.  This PEP
    summarizes discussions held in mailing list forums, and provides
    URLs for further information, where appropriate.  The CVS revision
    history of this file contains the definitive historical record.


Motivation

    Standard for-loops in Python iterate over every element in a
    sequence until the sequence is exhausted[1].  However, for-loops
    iterate over only a single sequence, and it is often desirable to
    loop over more than one sequence in lock-step, "Chinese Menu"
    fashion.

    The common idioms used to accomplish this are unintuitive and
    inflexible.  This PEP proposes a standard way of performing such
    iterations by introducing a new built-in function called `zip'.


Parallel For-Loops

    Parallel for-loops are non-nested iterations over two or more
    sequences, such that at each pass through the loop, one element
    from each sequence is taken to compose the target.  This behavior
    can already be accomplished in Python through the use of the map()
    built-in function:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ... 
    (1, 4)
    (2, 5)
    (3, 6)
    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]

    The for-loop simply iterates over this list as normal.

    While the map() idiom is a common one in Python, it has several
    disadvantages:

    - It is non-obvious to programmers without a functional
      programming background.

    - The use of the magic `None' first argument is non-obvious.

    - It has arbitrary, often unintended, and inflexible semantics
      when the sequences are not of the same length: the shorter
      sequences are padded with `None'.

      >>> c = (4, 5, 6, 7)
      >>> map(None, a, c)
      [(1, 4), (2, 5), (3, 6), (None, 7)]

    For these reasons, several proposals were floated in the Python
    2.0 beta time frame for providing a better spelling of parallel
    for-loops.  The initial proposals centered around syntactic
    changes to the for statement, but conflicts and problems with the
    syntax were unresolvable, especially when parallel for-loops were
    combined with another proposed feature called `list
    comprehensions' (see pep-0202.txt).


The Proposed Solution

    The proposed solution is to introduce a new built-in sequence
    generator function, available in the __builtin__ module.  This
    function is to be called `zip' and has the following signature:

    zip(seqa, [seqb, [...]], [pad=<value>])

    zip() takes one or more sequences and weaves their elements
    together, just as map(None, ...) does with sequences of equal
    length.  The optional keyword argument `pad', if supplied, is a
    value used to pad all shorter sequences to the length of the
    longest sequence.  If `pad' is omitted, then weaving stops when
    the shortest sequence is exhausted.

    It is not possible to pad short sequences with different pad values,
    nor will zip() ever raise an exception with sequences of different
    lengths.  To accomplish either behavior, the sequences must be
    checked and processed before the call to zip() -- but see the Open
    Issues below for more discussion.
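    As an illustration of that pre-processing, here is a minimal
    sketch.  It assumes a zip() that stops at the shortest sequence,
    as proposed above when `pad' is omitted, and it is written so that
    it also runs on a current Python interpreter.  The helper names
    strict_zip() and zip_with_pads() are hypothetical and not part of
    this proposal: the first raises ValueError when the lengths
    differ, and the second extends each sequence with its own pad
    value before weaving.

```python
def strict_zip(*seqs):
    """Hypothetical helper: refuse to zip sequences of unequal length."""
    lengths = [len(s) for s in seqs]
    if lengths and min(lengths) != max(lengths):
        raise ValueError("sequences differ in length: %r" % (lengths,))
    return list(zip(*seqs))


def zip_with_pads(seqs, pads):
    """Hypothetical helper: pad each sequence with its own value, then zip.

    `seqs` and `pads` are parallel; pads[i] is used to extend seqs[i]
    up to the length of the longest sequence.
    """
    if len(pads) != len(seqs):
        raise TypeError("need exactly one pad value per sequence")
    longest = max([len(s) for s in seqs] or [0])
    padded = []
    for s, p in zip(seqs, pads):
        # Convert to a list so tuples, lists and strings extend uniformly.
        padded.append(list(s) + [p] * (longest - len(s)))
    return list(zip(*padded))
```

    For example, zip_with_pads(((1, 2, 3, 4), (9, 10, 11), (12, 13)),
    (-1, None, 0)) returns [(1, 9, 12), (2, 10, 13), (3, 11, 0),
    (4, None, 0)]; the pad value -1 is never used because the first
    sequence is already the longest.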


Lazy Execution

    For performance purposes, zip() does not construct the list of
    tuples immediately.  Instead it instantiates an object that
    implements a __getitem__() method and conforms to the informal
    for-loop protocol.  This method constructs the individual tuples
    on demand.
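    The informal for-loop protocol referred to here is simple: the
    loop calls __getitem__() with 0, 1, 2 and so on, and stops at the
    first IndexError.  The following minimal sketch uses a
    hypothetical LazyZip class (no padding, and separate from the
    reference implementation below) that counts how many tuples it
    has built, to show that nothing is computed until the loop asks
    for it.  Current Python interpreters still fall back to this
    protocol when a class defines __getitem__() but not __iter__().

```python
class LazyZip:
    """Hypothetical minimal lazy zipper (no padding), for illustration only."""

    def __init__(self, *seqs):
        self.seqs = seqs
        self.built = 0  # number of tuples constructed so far

    def __getitem__(self, i):
        if not self.seqs:
            raise IndexError(i)  # nothing to weave: end the loop at once
        # s[i] raises IndexError as soon as the shortest sequence runs out;
        # the for-loop treats that as the end of iteration.
        item = tuple([s[i] for s in self.seqs])
        self.built = self.built + 1
        return item


# Construction does no work; each tuple is built only when requested.
pairs = LazyZip((1, 2, 3), (4, 5, 6, 7))
for pair in pairs:
    if pair[0] == 2:
        break  # stop early: the third tuple is never constructed
```

    After the loop, pairs.built is 2: only the tuples the loop actually
    consumed were constructed.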

    Guido is strongly opposed to lazy execution.  See Open Issues.


Examples

    Here are some examples, based on the reference implementation
    below.

    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)

    >>> zip(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]

    >>> zip(a, d)
    [(1, 12), (2, 13)]

    >>> zip(a, d, pad=0)
    [(1, 12), (2, 13), (3, 0), (4, 0)]
    
    >>> zip(a, d, pid=0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/tmp/python-iKAOxR", line 11, in zip
    TypeError: unexpected keyword arguments
    
    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

    >>> zip(a, b, c, d, pad=None)
    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
    >>> map(None, a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]

    Note that when the sequences are of the same length, zip() is
    reversible:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1

    It is not possible to reverse zip this way when the sequences are
    not all the same length.
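    To make this concrete, the following sketch assumes the no-pad
    behavior, where weaving stops at the shortest sequence.  It zips
    two sequences of different lengths and then "unzips" the result
    with zip(*x) as above.  The trailing element of the longer
    sequence is lost in the first step, so the round trip cannot
    recover it.

```python
# Assumes a zip() that stops at the shortest sequence (the no-pad case);
# list() makes the result a concrete list on interpreters where zip() is lazy.
a = (1, 2, 3)
b = (4, 5)

x = list(zip(a, b))  # weaving stops after two pairs: (1, 4), (2, 5)
y = list(zip(*x))    # unzipping gives (1, 2) and (4, 5); the 3 from a is gone

# With equal lengths the round trip does recover the inputs.
same = list(zip(*zip((1, 2, 3), (4, 5, 6))))  # [(1, 2, 3), (4, 5, 6)]
```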


Reference Implementation

    Here is a reference implementation, in Python, of the zip()
    built-in function and helper class.  These would ultimately be
    replaced by equivalent C code.

    class _Zipper:
        def __init__(self, args, kws):
            # Defaults
            self.__padgiven = 0
            if kws.has_key('pad'):
                self.__padgiven = 1
                self.__pad = kws['pad']
                del kws['pad']
            # Assert no unknown arguments are left
            if kws:
                raise TypeError('unexpected keyword arguments')
            self.__sequences = args
            self.__seqlen = len(args)

        def __getitem__(self, i):
            if not self.__sequences:
                raise IndexError
            ret = []
            exhausted = 0
            for s in self.__sequences:
                try:
                    ret.append(s[i])
                except IndexError:
                    if not self.__padgiven:
                        raise
                    exhausted = exhausted + 1
                    if exhausted == self.__seqlen:
                        raise
                    ret.append(self.__pad)
            return tuple(ret)

        def __len__(self):
            # If we're padding, then len is the length of the longest sequence,
            # otherwise it's the length of the shortest sequence.
            if not self.__padgiven:
                shortest = -1
                for s in self.__sequences:
                    slen = len(s)
                    if shortest < 0 or slen < shortest:
                        shortest = slen
                if shortest < 0:
                    return 0
                return shortest
            longest = 0
            for s in self.__sequences:
                slen = len(s)
                if slen > longest:
                    longest = slen
            return longest

        def __cmp__(self, other):
            i = 0
            smore = 1
            omore = 1
            while 1:
                try:
                    si = self[i]
                except IndexError:
                    smore = 0
                try:
                    oi = other[i]
                except IndexError:
                    omore = 0
                if not smore and not omore:
                    return 0
                elif not smore:
                    return -1
                elif not omore:
                    return 1
                test = cmp(si, oi)
                if test:
                    return test
                i = i + 1

        def __str__(self):
            ret = []
            i = 0
            while 1:
                try:
                    ret.append(self[i])
                except IndexError:
                    break
                i = i + 1
            return str(ret)
        __repr__ = __str__


    def zip(*args, **kws):
        return _Zipper(args, kws)


Rejected Elaborations

    Some people have suggested that the user be able to specify the
    type of the inner and outer containers for the zipped sequence.
    This would be specified by additional keyword arguments to zip(),
    named `inner' and `outer'.

    This elaboration is rejected for several reasons.  First, there
    really is no outer container, even though there appears to be an
    outer list container in the examples above.  This is simply an
    artifact of the repr() of the zipped object.  User code can do its
    own looping over the zipped object via __getitem__(), and build
    any type of outer container for the fully evaluated, concrete
    sequence.  For example, to build a zipped object with lists as an
    outer container, use

        >>> list(zip(sequence_a, sequence_b, sequence_c))

    and for a tuple outer container, use
    
        >>> tuple(zip(sequence_a, sequence_b, sequence_c))

    This type of construction will usually not be necessary though,
    since it is expected that zipped objects will most often appear in
    for-loops.

    Second, allowing the user to specify the inner container
    introduces needless complexity and arbitrary decisions.  You might
    imagine that instead of the default tuple inner container, the
    user could prefer a list, or a dictionary, or instances of some
    sequence-like class.

    One problem is the API.  Should the argument to `inner' be a type
    or a template object?  For flexibility, the argument should
    probably be a type object (e.g. TupleType, ListType, DictType), or
    a class.  For classes, the implementation could just pass the zip
    element to the constructor.  But what about built-in types that
    don't have constructors?  They would have to be special-cased in
    the implementation (e.g. what is the constructor for TupleType?
    The tuple() built-in.)

    Another problem arises when zipping more than two sequences.
    Say you had three sequences and you wanted the inner type to be a
    dictionary.  What would the semantics of the following be?

        >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)

    Would the key be (element_a, element_b) and the value be
    element_c, or would the key be element_a and the value be
    (element_b, element_c)?  Or should an exception be raised?

    This suggests that the specification of the inner container type
    is needless complexity.  It isn't likely that the inner container
    will need to be specified very often, and it is easy to roll your
    own should you need it.  Tuples are chosen for the inner container
    type due to their (slight) memory footprint and performance
    advantages.
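    Rolling your own inner container usually takes one line on top of
    the default tuples.  Here is a minimal sketch, assuming a zip()
    that returns tuples and stops at the shortest sequence, and a
    dict() constructor that accepts a sequence of key/value pairs (as
    current Python's does); the variable names are only illustrative.
    For three sequences it is the caller, not zip(), who decides how
    to split keys from values, which sidesteps the ambiguity described
    above.

```python
# Assumes a zip() that returns tuples and stops at the shortest sequence.
keys = ("a", "b", "c")
values = (1, 2, 3)
flags = (0, 1, 0)

# Lists instead of tuples as the inner container.
as_lists = [list(t) for t in zip(keys, values)]

# Two sequences into a dictionary: keys from the first, values from the second.
as_dict = dict(zip(keys, values))

# Three sequences: the caller chooses the key/value split explicitly,
# here key = element of keys, value = (element of values, element of flags).
by_key = dict([(k, (v, f)) for k, v, f in zip(keys, values, flags)])
```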


Open Issues

    - Guido opposes lazy evaluation for zip().  He believes zip()
      should return a real list, with an xzip() lazy evaluator added
      later if necessary.

    - What should "zip(a)" do?  Given

      a = (1, 2, 3); zip(a)

      three outcomes are possible.

      1) Returns [(1,), (2,), (3,)]

         Pros: no special casing in the implementation or in user
         code, and is more consistent with the description of its
         semantics.  Cons: this isn't what map(None, a) would return,
         and may be counter to user expectations.

      2) Returns [1, 2, 3]

         Pros: consistency with map(None, a), and simpler code for
         for-loops, e.g.

         for i in zip(a):

         instead of

         for (i,) in zip(a):

         Cons: too much complexity and special casing for what should
         be a relatively rare usage pattern.

      3) Raises TypeError

         Pros: zip(a) doesn't make much sense and could be confusing
         to explain.

         Cons: needless restriction.

      Current scoring seems to generally favor outcome 1.

    - What should "zip()" do?

      Along similar lines, zip() with no arguments (or zip() with just
      a pad argument) can have ambiguous semantics.  Should this
      return no elements or an infinite number?  For these reasons,
      raising a TypeError exception in this case makes the most
      sense.

    - The name of the built-in `zip' may cause some initial confusion
      with the zip compression algorithm.  Other suggestions include
      (but are not limited to!): marry, weave, parallel, lace, braid,
      interlace, permute, furl, tuples, lists, stitch, collate, knit,
      plait, fold, with, mktuples, maketuples, totuples, gentuples,
      tupleorama.

      All have disadvantages, and there is no clear unanimous choice;
      therefore the decision was made to go with `zip' because the
      same functionality is available in other languages
      (e.g. Haskell) under the name `zip'[2].

    - Should zip() be included in the builtins module or should it be
      in a separate generators module (possibly with other candidate
      functions like irange())?

    - Padding short sequences with different values.  A suggestion has
      been made to allow a `padtuple' (probably better called `pads'
      or `padseq') argument similar to `pad'.  This sequence must have
      a length equal to the number of sequences given.  It is a
      sequence of the individual pad values to use for each sequence,
      should it be shorter than the maximum length.

      One problem: what should happen if `padtuple' itself isn't of
      the right length?  A TypeError seems to be the only choice here.

      How do `pad' and `padtuple' interact?  Perhaps if padtuple
      were too short, it could use pad as a fallback.  padtuple would
      always override pad if both were given.
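    The following minimal sketch shows one possible resolution of the
    `padtuple' question, together with the TypeError recommended above
    for zip() called without any sequences.  zip_pads() is a
    hypothetical, eager stand-in (it returns a real list, not a lazy
    object), and its rules are assumptions chosen from the options
    above: `padtuple' overrides `pad', a `padtuple' that is too short
    falls back to `pad' when one is given, and every other length
    mismatch raises TypeError.

```python
_MISSING = object()  # sentinel: distinguishes "no pad given" from pad=None


def zip_pads(*seqs, **kws):
    """Hypothetical eager zip() accepting `pad' and `padtuple' keywords."""
    pad = kws.pop("pad", _MISSING)
    padtuple = kws.pop("padtuple", None)
    if kws:
        raise TypeError("unexpected keyword arguments")
    if not seqs:
        # Covers both zip() and zip(pad=...): there is nothing to weave.
        raise TypeError("zip_pads() requires at least one sequence")

    if padtuple is not None:
        pads = list(padtuple)
        if len(pads) > len(seqs):
            raise TypeError("padtuple has more values than there are sequences")
        if len(pads) < len(seqs):
            if pad is _MISSING:
                raise TypeError("padtuple is too short and no pad was given")
            # Fall back to the single pad value for the remaining sequences.
            pads = pads + [pad] * (len(seqs) - len(pads))
    elif pad is not _MISSING:
        pads = [pad] * len(seqs)  # one shared pad value, as in the proposal
    else:
        pads = None  # no padding: stop at the shortest sequence

    lengths = [len(s) for s in seqs]
    if pads is None:
        count = min(lengths)
    else:
        count = max(lengths)

    result = []
    for i in range(count):
        row = []
        for j in range(len(seqs)):
            if i < lengths[j]:
                row.append(seqs[j][i])
            else:
                row.append(pads[j])  # only reached when padding is enabled
        result.append(tuple(row))
    return result
```

    A sentinel object, rather than None, marks a missing `pad', so that
    pad=None remains a valid explicit pad value, as in
    zip(a, b, c, d, pad=None) in the Examples; the reference
    implementation uses its __padgiven flag for the same purpose.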


References

    [1] http://www.python.org/doc/current/ref/for.html
    [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip

    TBD: URL to python-dev archives

Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								PEP: 201
 								Title: Parallel Iteration
 								Version: $Revision$
-												In a compromise with JHy, and to be more consistent with the style now
documented in PEP1, remove the Emacs page breaks.

Also, Owner: -> Author:, added Created: and Post-History: headers

Changed "Standard For-Loops" section to "Motivation" and shortened
considerably; readers already know how for-loops work in Python.

Added notes about Guido's veto of lazy evaluation.

											
										
										
											2000-07-25 17:51:55 -04:00
+								Author: bwarsaw@beopen.com (Barry A. Warsaw)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								Python-Version: 2.0
 								Status: Draft
-												In a compromise with JHy, and to be more consistent with the style now
documented in PEP1, remove the Emacs page breaks.

Also, Owner: -> Author:, added Created: and Post-History: headers

Changed "Standard For-Loops" section to "Motivation" and shortened
considerably; readers already know how for-loops work in Python.

Added notes about Guido's veto of lazy evaluation.

											
										
										
											2000-07-25 17:51:55 -04:00
+								Created: 13-Jul-2000
 								Post-History:
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
 								Introduction
 								    This PEP describes the `parallel iteration' proposal for Python
 .0, previously known as `parallel for loops'.  This PEP tracks
 								    the status and ownership of this feature, slated for introduction
 								    in Python 2.0.  It contains a description of the feature and
 								    outlines changes necessary to support the feature.  This PEP
 								    summarizes discussions held in mailing list forums, and provides
 								    URLs for further information, where appropriate.  The CVS revision
 								    history of this file contains the definitive historical record.
-												In a compromise with JHy, and to be more consistent with the style now
documented in PEP1, remove the Emacs page breaks.

Also, Owner: -> Author:, added Created: and Post-History: headers

Changed "Standard For-Loops" section to "Motivation" and shortened
considerably; readers already know how for-loops work in Python.

Added notes about Guido's veto of lazy evaluation.

											
										
										
											2000-07-25 17:51:55 -04:00
+								Motivation
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
-												In a compromise with JHy, and to be more consistent with the style now
documented in PEP1, remove the Emacs page breaks.

Also, Owner: -> Author:, added Created: and Post-History: headers

Changed "Standard For-Loops" section to "Motivation" and shortened
considerably; readers already know how for-loops work in Python.

Added notes about Guido's veto of lazy evaluation.

											
										
										
											2000-07-25 17:51:55 -04:00
+								    Standard for-loops in Python iterate over every element in a
 								    sequence until the sequence is exhausted[1].  However, for-loops
 								    iterate over only a single sequence, and it is often desirable to
 								    loop over more than one sequence, in a lock-step, "Chinese Menu"
 								    type of way.
 								    The common idioms used to accomplish this are unintuitive and
 								    inflexible.  This PEP proposes a standard way of performing such
 								    iterations by introducing a new builtin function called `zip'.
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
 								Parallel For-Loops
 								    Parallel for-loops are non-nested iterations over two or more
 								    sequences, such that at each pass through the loop, one element
 								    from each sequence is taken to compose the target.  This behavior
 								    can already be accomplished in Python through the use of the map()
 								    built-in function:
 								    >>> a = (1, 2, 3)
 								    >>> b = (4, 5, 6)
 								    >>> for i in map(None, a, b): print i
 								    ...
 								    (1, 4)
 								    (2, 5)
 								    (3, 6)
 								    >>> map(None, a, b)
 								    [(1, 4), (2, 5), (3, 6)]
 								    The for-loop simply iterates over this list as normal.
 								    While the map() idiom is a common one in Python, it has several
 								    disadvantages:
 								    - It is non-obvious to programmers without a functional
 								      programming background.
 								    - The use of the magic `None' first argument is non-obvious.
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    - It has arbitrary, often unintended, and inflexible semantics
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								      when the lists are not of the same length: the shorter sequences
 								      are padded with `None'.
 								      >>> c = (4, 5, 6, 7)
 								      >>> map(None, a, c)
 								      [(1, 4), (2, 5), (3, 6), (None, 7)]
 								    For these reasons, several proposals were floated in the Python
 .0 beta time frame for providing a better spelling of parallel
 								    for-loops.  The initial proposals centered around syntactic
 								    changes to the for statement, but conflicts and problems with the
 								    syntax were unresolvable, especially when parallel for-loops were
 								    combined with another proposed feature called `list
 								    comprehensions' (see pep-0202.txt).
 								The Proposed Solution
 								    The proposed solution is to introduce a new built-in sequence
 								    generator function, available in the __builtin__ module.  This
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    function is to be called `zip' and has the following signature:
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    zip(seqa, [seqb, [...]], [pad=<value>])
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    zip() takes one or more sequences and weaves their elements
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    together, just as map(None, ...) does with sequences of equal
 								    length.  The optional keyword argument `pad', if supplied, is a
 								    value used to pad all shorter sequences to the length of the
 								    longest sequence.  If `pad' is omitted, then weaving stops when
 								    the shortest sequence is exhausted.
 								    It is not possible to pad short lists with different pad values,
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    nor will zip() ever raise an exception with lists of different
 								    lengths.  To accomplish either behavior, the sequences must be
-												Added a few more open issues:

- what should "zip()" do (i.e. zip with no arguments).

- should zip() be included in the builtins?

- the padtuple proposal

											
										
										
											2000-07-24 13:40:00 -04:00
+								    checked and processed before the call to zip() -- but see the Open
 								    Issues below for more discussion.
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
 								Lazy Execution
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    For performance purposes, zip() does not construct the list of
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    tuples immediately.  Instead it instantiates an object that
 								    implements a __getitem__() method and conforms to the informal
 								    for-loop protocol.  This method constructs the individual tuples
 								    on demand.
-												In a compromise with JHy, and to be more consistent with the style now
documented in PEP1, remove the Emacs page breaks.

Also, Owner: -> Author:, added Created: and Post-History: headers

Changed "Standard For-Loops" section to "Motivation" and shortened
considerably; readers already know how for-loops work in Python.

Added notes about Guido's veto of lazy evaluation.

											
										
										
											2000-07-25 17:51:55 -04:00
+								    Guido is strongly opposed to lazy execution.  See Open Issues.
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
 								Examples
 								    Here are some examples, based on the reference implementation
 								    below.
 								    >>> a = (1, 2, 3, 4)
 								    >>> b = (5, 6, 7, 8)
 								    >>> c = (9, 10, 11)
 								    >>> d = (12, 13)
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, b)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    [(1, 5), (2, 6), (3, 7), (4, 8)]
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, d)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    [(1, 12), (2, 13)]
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, d, pad=0)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    [(1, 12), (2, 13), (3, 0), (4, 0)]
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, d, pid=0)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    Traceback (most recent call last):
 								      File "<stdin>", line 1, in ?
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								      File "/usr/tmp/python-iKAOxR", line 11, in zip
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    TypeError: unexpected keyword arguments
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, b, c, d)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    [(1, 5, 9, 12), (2, 6, 10, 13)]
-												Latest update.

After consultation with Guido, zip() is chosen as the name of this
built-in.

In reference implementation added an __len__() method.

Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).

Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.

											
										
										
											2000-07-17 14:49:21 -04:00
+								    >>> zip(a, b, c, d, pad=None)
-												Initial set of Python Enhancement Proposals

											
										
										
											2000-07-13 02:33:08 -04:00
+								    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
 								    >>> map(None, a, b, c, d)
 								    [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
-												In the examples section, show how zip() is reversible.

Patches to the reference implementation:

    __getitem__() raises IndexError immediately if no sequences were
    given.

    __len__() returns 0 if no sequences were given.

    __cmp__() new method

Added a little more explanation to raise-a-TypeError-for-zip(a)

Added `fold' as one of the alternative names proposed.

											
										
										
											2000-07-19 00:19:54 -04:00

    Note that when the sequences are of the same length, zip() is
    reversible:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1

    It is not possible to reverse zip this way when the sequences are
    not all the same length.
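
    The round trip above can be checked with a short Python 3 sketch
    (an assumption: modern zip() returns an iterator, hence the list()
    calls).  The second half shows why unequal lengths break
    reversibility: zip() drops the unmatched element, so unzipping
    cannot bring it back.  The extra sequence `c' is purely
    illustrative.

```python
a = (1, 2, 3)
b = (4, 5, 6)

x = list(zip(a, b))    # [(1, 4), (2, 5), (3, 6)]
y = list(zip(*x))      # [(1, 2, 3), (4, 5, 6)]: the original sequences
z = list(zip(*y))      # zipping again restores x
assert x == z

# With unequal lengths, zip() truncates, so the discarded element
# cannot be recovered by unzipping.
c = (7, 8)
w = list(zip(*zip(a, c)))
print(w)               # [(1, 2), (7, 8)]: the 3 from a is lost
```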


Reference Implementation

    Here is a reference implementation, in Python, of the zip()
    built-in function and helper class.  These would ultimately be
    replaced by equivalent C code.

    class _Zipper:
        def __init__(self, args, kws):
            # Defaults
            self.__padgiven = 0
            if kws.has_key('pad'):
                self.__padgiven = 1
                self.__pad = kws['pad']
                del kws['pad']
            # Assert no unknown arguments are left
            if kws:
                raise TypeError('unexpected keyword arguments')
            self.__sequences = args
            self.__seqlen = len(args)

        def __getitem__(self, i):
            # Build the i-th tuple.  Without a pad, the first exhausted
            # sequence ends the zip; with a pad, the zip ends only when
            # every sequence is exhausted.
            if not self.__sequences:
                raise IndexError
            ret = []
            exhausted = 0
            for s in self.__sequences:
                try:
                    ret.append(s[i])
                except IndexError:
                    if not self.__padgiven:
                        raise
                    exhausted = exhausted + 1
                    if exhausted == self.__seqlen:
                        raise
                    ret.append(self.__pad)
            return tuple(ret)

        def __len__(self):
            # If we're padding, then len is the length of the longest sequence,
            # otherwise it's the length of the shortest sequence.
            if not self.__padgiven:
                shortest = -1
                for s in self.__sequences:
                    slen = len(s)
                    if shortest < 0 or slen < shortest:
                        shortest = slen
                if shortest < 0:
                    return 0
                return shortest
            longest = 0
            for s in self.__sequences:
                slen = len(s)
                if slen > longest:
                    longest = slen
            return longest

        def __cmp__(self, other):
            # Compare element by element, like built-in sequence
            # comparison; if one side runs out first, it compares smaller.
            i = 0
            smore = 1
            omore = 1
            while 1:
                try:
                    si = self[i]
                except IndexError:
                    smore = 0
                try:
                    oi = other[i]
                except IndexError:
                    omore = 0
                if not smore and not omore:
                    return 0
                elif not smore:
                    return -1
                elif not omore:
                    return 1
                test = cmp(si, oi)
                if test:
                    return test
                i = i + 1

        def __str__(self):
            # Render the fully evaluated zip as a list of tuples.
            ret = []
            i = 0
            while 1:
                try:
                    ret.append(self[i])
                except IndexError:
                    break
                i = i + 1
            return str(ret)
        __repr__ = __str__

    def zip(*args, **kws):
        return _Zipper(args, kws)
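
    The reference implementation targets Python 2.0; it relies on
    has_key(), cmp() and __cmp__(), none of which exist in Python 3.
    As a sketch only, assuming a Python 3 interpreter, the same `pad'
    length and indexing rules can be expressed with a keyword-only
    argument and a sentinel object.  The names _MISSING, ZipperSketch
    and zip_padded are hypothetical and not part of this proposal.
    Unlike _Zipper, the sketch checks bounds with len() up front
    instead of catching IndexError from each sequence, and it accepts
    only non-negative indices.

```python
_MISSING = object()  # hypothetical sentinel: "no pad keyword was given"


class ZipperSketch:
    """Hypothetical Python 3 port of the indexing rules of _Zipper."""

    def __init__(self, sequences, pad=_MISSING):
        self._sequences = sequences
        self._pad = pad

    def __len__(self):
        # No sequences: nothing to zip.
        if not self._sequences:
            return 0
        lengths = [len(s) for s in self._sequences]
        # Padding runs to the longest sequence; otherwise the shortest wins.
        if self._pad is _MISSING:
            return min(lengths)
        return max(lengths)

    def __getitem__(self, i):
        # Non-negative indices only; bounds come from __len__ above.
        if not 0 <= i < len(self):
            raise IndexError(i)
        # The pad value is only reached when padding was requested,
        # because without a pad len() is the shortest length.
        return tuple(s[i] if i < len(s) else self._pad
                     for s in self._sequences)

    def __eq__(self, other):
        # Python 3 has no __cmp__; compare element-wise as lists.
        try:
            return list(self) == list(other)
        except TypeError:
            return NotImplemented

    def __repr__(self):
        return repr(list(self))


def zip_padded(*sequences, pad=_MISSING):
    # `pad' is keyword-only; any other keyword raises TypeError automatically.
    return ZipperSketch(sequences, pad)


print(zip_padded((1, 2, 3), (4, 5)))          # [(1, 4), (2, 5)]
print(zip_padded((1, 2, 3), (4, 5), pad=0))   # [(1, 4), (2, 5), (3, 0)]
```

    Using a private sentinel rather than None as the default keeps
    pad=None distinguishable from "no pad given", which is the role
    the __padgiven flag plays in _Zipper.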


Rejected Elaborations

    Some people have suggested that the user be able to specify the
    type of the inner and outer containers for the zipped sequence.
    This would be specified by additional keyword arguments to zip(),
    named `inner' and `outer'.

    This elaboration is rejected for several reasons.  First, there
    really is no outer container, even though there appears to be an
    outer list container in the example above.  This is simply an
    artifact of the repr() of the zipped object.  User code can do its
    own looping over the zipped object via __getitem__(), and build
    any type of outer container for the fully evaluated, concrete
    sequence.  For example, to build a zipped object with lists as an
    outer container, use

        >>> list(zip(sequence_a, sequence_b, sequence_c))

    and for a tuple outer container, use

        >>> tuple(zip(sequence_a, sequence_b, sequence_c))

    This type of construction will usually not be necessary though,
    since it is expected that zipped objects will most often appear in
    for-loops.

    Second, allowing the user to specify the inner container
    introduces needless complexity and arbitrary decisions.  You might
    imagine that instead of the default tuple inner container, the
    user could prefer a list, or a dictionary, or instances of some
    sequence-like class.

    One problem is the API.  Should the argument to `inner' be a type
    or a template object?  For flexibility, the argument should
    probably be a type object (e.g. TupleType, ListType, DictType), or
    a class.  For classes, the implementation could just pass the zip
    element to the constructor.  But what about built-in types that
    don't have constructors?  They would have to be special-cased in
    the implementation (e.g. what is the constructor for TupleType?
    The tuple() built-in).

    Another problem arises when zipping more than two sequences.  Say
    you had three sequences and you wanted the inner type to be a
    dictionary.  What would the semantics of the following be?

        >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)

    Would the key be (element_a, element_b) and the value be
    element_c, or would the key be element_a and the value be
    (element_b, element_c)?  Or should an exception be raised?

    This suggests that the specification of the inner container type
    is needless complexity.  It isn't likely that the inner container
    will need to be specified very often, and it is easy to roll your
    own should you need it.  Tuples are chosen for the inner container
    type due to their (slight) memory footprint and performance
    advantages.
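
    As an illustration of "rolling your own", here is a small Python 3
    sketch (not part of the proposal; the sequence names are made up).
    A list inner container is a one-line comprehension over the zipped
    tuples, and two sequences map naturally onto dict(zip(keys,
    values)).  With three sequences the caller has to pick the
    key/value split explicitly, which is exactly the ambiguity
    described above.

```python
keys = ("a", "b", "c")
values = (1, 2, 3)
extra = (True, False, True)

# Lists instead of tuples as the inner container.
as_lists = [list(t) for t in zip(keys, values)]

# Two sequences map naturally onto key/value pairs.
as_dict = dict(zip(keys, values))

# Three sequences: the caller decides the split.  Here the first
# element is the key and the remaining two form the value.
as_dict3 = {k: (v, e) for k, v, e in zip(keys, values, extra)}

print(as_lists)  # [['a', 1], ['b', 2], ['c', 3]]
print(as_dict)   # {'a': 1, 'b': 2, 'c': 3}
print(as_dict3)  # {'a': (1, True), 'b': (2, False), 'c': (3, True)}
```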


Open Issues

    - Guido opposes lazy evaluation for zip().  He believes zip()
      should return a real list, with an xzip() lazy evaluator added
      later if necessary.

    - What should "zip(a)" do?  Given

      a = (1, 2, 3); zip(a)

      three outcomes are possible.

      1) Returns [(1,), (2,), (3,)]

         Pros: no special casing in the implementation or in user
         code, and is more consistent with the description of its
         semantics.  Cons: this isn't what map(None, a) would return,
         and may be counter to user expectations.

      2) Returns [1, 2, 3]

         Pros: consistency with map(None, a), and simpler code for
         for-loops, e.g.

             for i in zip(a):

         instead of

             for (i,) in zip(a):

         Cons: too much complexity and special casing for what should
         be a relatively rare usage pattern.

      3) Raises TypeError

         Pros: zip(a) doesn't make much sense and could be confusing
         to explain.

         Cons: needless restriction.

      Current scoring seems to generally favor outcome 1.

    - What should "zip()" do?

      Along similar lines, zip() with no arguments (or zip() with just
      a pad argument) can have ambiguous semantics.  Should this
      return no elements or an infinite number?  For these reasons,
      raising a TypeError exception in this case makes the most
      sense.

    - The name of the built-in `zip' may cause some initial confusion
      with the zip compression algorithm.  Other suggestions include
      (but are not limited to!): marry, weave, parallel, lace, braid,
      interlace, permute, furl, tuples, lists, stitch, collate, knit,
      plait, fold, with, mktuples, maketuples, totuples, gentuples,
      tupleorama.

      All have disadvantages, and there is no clear unanimous choice,
      so the decision was made to go with `zip' because the same
      functionality is available in other languages (e.g. Haskell)
      under the name `zip'[2].

    - Should zip() be included in the builtins module or should it be
      in a separate generators module (possibly with other candidate
      functions like irange())?

    - Padding short sequences with different values.  A suggestion has
      been made to allow a `padtuple' (probably better called `pads'
      or `padseq') argument similar to `pad'.  This sequence must have
      a length equal to the number of sequences given.  It is a
      sequence of the individual pad values to use for each sequence
      that is shorter than the longest one.

      One question is what to do if `padtuple' itself isn't of the
      right length.  A TypeError seems to be the only choice here.

      How do `pad' and `padtuple' interact?  Perhaps if padtuple
      were too short, it could use pad as a fallback.  padtuple would
      always override pad if both were given.
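
    Since the `padtuple' semantics are still open, the following
    Python 3 sketch shows only one possible reading.  The function
    name zip_pads and its `pads' and `pad' keywords are hypothetical.
    It assumes that `pads' supplies one pad value per sequence, that a
    too-short `pads' is completed from `pad' when `pad' is given (so
    `pads' always overrides `pad' where both apply), and that any
    remaining length mismatch raises TypeError, as suggested above.
    It returns a real list, in line with Guido's preference noted
    earlier.

```python
_MISSING = object()  # hypothetical sentinel: "argument not given"


def zip_pads(*sequences, pads=_MISSING, pad=_MISSING):
    """Hypothetical zip() variant with per-sequence pad values."""
    n = len(sequences)
    if pads is _MISSING and pad is _MISSING:
        # No padding requested: truncate at the shortest sequence.
        return list(zip(*sequences))
    pads = [] if pads is _MISSING else list(pads)
    if len(pads) > n:
        raise TypeError("pads is longer than the number of sequences")
    if len(pads) < n:
        if pad is _MISSING:
            raise TypeError("pads is too short and no pad fallback was given")
        # pads overrides pad; pad only fills the missing trailing entries.
        pads = pads + [pad] * (n - len(pads))
    longest = max((len(s) for s in sequences), default=0)
    # Each sequence shorter than the longest is padded with its own value.
    return [
        tuple(s[i] if i < len(s) else p for s, p in zip(sequences, pads))
        for i in range(longest)
    ]


print(zip_pads((1, 2, 3), (4,), pads=(0, -1)))  # [(1, 4), (2, -1), (3, -1)]
```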


References

    [1] http://www.python.org/doc/current/ref/for.html

    [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip

    TBD: URL to python-dev archives



Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: