python-peps/pep-0323.txt

PEP: 323
Title: Copyable Iterators
Version: $Revision$
Last-Modified: $Date$
Author: Alex Martelli <aleaxit@yahoo.com>
Status: Deferred
Type: Standards Track
Content-Type: text/plain
Created: 25-Oct-2003
Python-Version: 2.5
Post-History: 29-Oct-2003


Deferral

  This PEP has been deferred. Copyable iterators are a nice idea, but after
  four years, no implementation or widespread interest has emerged.


Abstract

    This PEP suggests that some iterator types should support shallow
    copies of their instances by exposing a __copy__ method which meets
    some specific requirements, and indicates how code using an iterator
    might exploit such a __copy__ method when present.


Update and Comments

    Support for __copy__ was included in Py2.4's itertools.tee().

    Adding __copy__ methods to existing iterators will change the
    behavior under tee().  Currently, the copied iterators remain
    tied to the original iterator.  If the original advances, then
    so do all of the copies.  Good practice is to overwrite the
    original so that anamolies don't result:  a,b=tee(a).
    Code that doesn't follow that practice may observe a semantic
    change if a __copy__ method is added to an iterator.

Motivation

    In Python up to 2.3, most built-in iterator types don't let the user
    copy their instances.  User-coded iterators that do let their clients
    call copy.copy on their instances may, or may not, happen to return,
    as a result of the copy, a separate iterator object that may be
    iterated upon independently from the original.

    Currently, "support" for copy.copy in a user-coded iterator type is
    almost invariably "accidental" -- i.e., the standard machinery of the
    copy method in Python's standard library's copy module does build and
    return a copy.  However, the copy will be independently iterable with
    respect to the original only if calling .next() on an instance of that
    class happens to change instance state solely by rebinding some
    attributes to new values, and not by mutating some attributes'
    existing values.

    For example, an iterator whose "index" state is held as an integer
    attribute will probably give usable copies, since (integers being
    immutable) .next() presumably just rebinds that attribute.  On the
    other hand, another iterator whose "index" state is held as a list
    attribute will probably mutate the same list object when .next()
    executes, and therefore copies of such an iterator will not be
    iterable separately and independently from the original.

    Given this existing situation, copy.copy(it) on some iterator object
    isn't very useful, nor, therefore, is it at all widely used.  However,
    there are many cases in which being able to get a "snapshot" of an
    iterator, as a "bookmark", so as to be able to keep iterating along
    the sequence but later iterate again on the same sequence from the
    bookmark onwards, is useful.  To support such "bookmarking", module
    itertools, in 2.4, has grown a 'tee' function, to be used as:

        it, bookmark = itertools.tee(it)

    The previous value of 'it' must not be used again, which is why this
    typical usage idiom rebinds the name.  After this call, 'it' and
    'bookmark' are independently-iterable iterators on the same underlying
    sequence as the original value of 'it': this satisfies application
    needs for "iterator copying".

    However, when itertools.tee can make no hypotheses about the nature of
    the iterator it is passed as an argument, it must save in memory all
    items through which one of the two 'teed' iterators, but not yet both,
    have stepped.  This can be quite costly in terms of memory, if the two
    iterators get very far from each other in their stepping; indeed, in
    some cases it may be preferable to make a list from the iterator so as
    to be able to step repeatedly through the subsequence, or, if that is
    too costy in terms of memory, save items to disk, again in order to be
    able to iterate through them repeatedly.

    This PEP proposes another idea that will, in some important cases,
    allow itertools.tee to do its job with minimal cost in terms of
    memory; user code may also occasionally be able to exploit the idea in
    order to decide whether to copy an iterator, make a list from it, or
    use an auxiliary disk file.

    The key consideration is that some important iterators, such as those
    which built-in function iter builds over sequences, would be
    intrinsically easy to copy: just get another reference to the same
    sequence, and a copy of the integer index.  However, in Python 2.3,
    those iterators don't expose the state, and don't support copy.copy.

    The purpose of this PEP, therefore, is to have those iterator types
    expose a suitable __copy__ method.  Similarly, user-coded iterator
    types that can provide copies of their instances, suitable for
    separate and independent iteration, with limited costs in time and
    space, should also expose a suitable __copy__ method.  While
    copy.copy also supports other ways to let a type control the way
    its instances are copied, it is suggested, for simplicity, that
    iterator types that support copying always do so by exposing a
    __copy__ method, and not in the other ways copy.copy supports.

    Having iterators expose a suitable __copy__ when feasible will afford
    easy optimization of itertools.tee and similar user code, as in:

        def tee(it):
            it = iter(it)
            try: copier = it.__copy__
            except AttributeError:
                # non-copyable iterator, do all the needed hard work
                # [snipped!]
            else:
                return it, copier()

    Note that this function does NOT call "copy.copy(it)", which (even
    after this PEP is implemented) might well still "just happen to
    succeed". for some iterator type that is implemented as a user-coded
    class. without really supplying an adequate "independently iterable"
    copy object as its result.


Specification

    Any iterator type X may expose a method __copy__ that is callable
    without arguments on any instance x of X.  The method should be
    exposed if and only if the iterator type can provide copyability with
    reasonably little computational and memory effort.  Furthermore, the
    new object y returned by method __copy__ should be a new instance
    of X that is iterable independently and separately from x, stepping
    along the same "underlying sequence" of items.

    For example, suppose a class Iter essentially duplicated the
    functionality of the iter builtin for iterating on a sequence:

        class Iter(object):

            def __init__(self, sequence):
                self.sequence = sequence
                self.index = 0

            def __iter__(self):
                return self

            def next(self):
                try: result = self.sequence[self.index]
                except IndexError: raise StopIteration
                self.index += 1
                return result

    To make this Iter class compliant with this PEP, the following
    addition to the body of class Iter would suffice:

            def __copy__(self):
                result = self.__class__(self.sequence)
                result.index = self.index
                return result

    Note that __copy__, in this case, does not even try to copy the
    sequence; if the sequence is altered while either or both of the
    original and copied iterators are still stepping on it, the iteration
    behavior is quite likely to go awry anyway -- it is not __copy__'s
    responsibility to change this normal Python behavior for iterators
    which iterate on mutable sequences (that might, perhaps, be the
    specification for a __deepcopy__ method of iterators, which, however,
    this PEP does not deal with).

    Consider also a "random iterator", which provides a nonterminating
    sequence of results from some method of a random instance, called
    with given arguments:

        class RandomIterator(object):

            def __init__(self, bound_method, *args):
                self.call = bound_method
                self.args = args

            def __iter__(self):
                return self

            def next(self):
                return self.call(*self.args)

            def __copy__(self):
                import copy, new
                im_self = copy.copy(self.call.im_self)
                method = new.instancemethod(self.call.im_func, im_self)
                return self.__class__(method, *self.args)

    This iterator type is slightly more general than its name implies, as
    it supports calls to any bound method (or other callable, but if the
    callable is not a bound method, then method __copy__ will fail).  But
    the use case is for the purpose of generating random streams, as in:

            import random

            def show5(it):
                for i, result in enumerate(it):
                    print '%6.3f'%result,
                    if i==4: break
                print

            normit = RandomIterator(random.Random().gauss, 0, 1)
            show5(normit)
            copit = normit.__copy__()
            show5(normit)
            show5(copit)

    which will display some output such as:

            -0.536  1.936 -1.182 -1.690 -1.184
             0.666 -0.701  1.214  0.348  1.373
             0.666 -0.701  1.214  0.348  1.373

    the key point being that the second and third lines are equal, because
    the normit and copit iterators will step along the same "underlying
    sequence".  (As an aside, note that to get a copy of self.call.im_self
    we must use copy.copy, NOT try getting at a __copy__ method directly,
    because for example instances of random.Random support copying via
    __getstate__ and __setstate__, NOT via __copy__; indeed, using
    copy.copy is the normal way to get a shallow copy of any object --
    copyable iterators are different because of the already-mentioned
    uncertainty about the result of copy.copy supporting these "copyable
    iterator" specs).


Details

    Besides adding to the Python docs a recommendation that user-coded
    iterator types support a __copy__ method (if and only if it can be
    implemented with small costs in memory and runtime, and produce an
    independently-iterable copy of an iterator object), this PEP's
    implementation will specifically include the addition of copyability
    to the iterators over sequences that built-in iter returns, and also
    to the iterators over a dictionary returned by the methods __iter__,
    iterkeys, itervalues, and iteritems of built-in type dict.

    Iterators produced by generator functions will not be copyable.
    However, iterators produced by the new "generator expressions" of
    Python 2.4 (PEP 289 [3]) should be copyable if their underlying
    iterator[s] are; the strict limitations on what is possible in a
    generator expression, compared to the much vaster generality of a
    generator, should make that feasible.  Similarly, the iterators
    produced by the built-in function enumerate, and certain functions
    suppiled by module itertools, should be copyable if the underlying
    iterators are.

    The implementation of this PEP will also include the optimization of
    the new itertools.tee function mentioned in the Motivation section.


Rationale

    The main use case for (shallow) copying of an iterator is the same as
    for the function itertools.tee (new in 2.4).  User code will not
    directly attempt to copy an iterator, because it would have to deal
    separately with uncopyable cases; calling itertools.tee will
    internally perform the copy when appropriate, and implicitly fallback
    to a maximally efficient non-copying strategy for iterators that are
    not copyable.  (Occasionally, user code may want more direct control,
    specifically in order to deal with non-copyable iterators by other
    strategies, such as making a list or saving the sequence to disk).

    A tee'd iterator may serve as a "reference point", allowing processing
    of a sequence to continue or resume from a known point, while the
    other independent iterator can be freely advanced to "explore" a
    further part of the sequence as needed.  A simple example: a generator
    function which, given an iterator of numbers (assumed to be positive),
    returns a corresponding iterator, each of whose items is the fraction
    of the total corresponding to each corresponding item of the input
    iterator.  The caller may pass the total as a value, if known in
    advance; otherwise, the iterator returned by calling this generator
    function will first compute the total.

        def fractions(numbers, total=None):
            if total is None:
                numbers, aux = itertools.tee(numbers)
                total = sum(aux)
            total = float(total)
            for item in numbers:
                yield item / total

    The ability to tee the numbers iterator allows this generator to
    precompute the total, if needed, without necessarily requiring
    O(N) auxiliary memory if the numbers iterator is copyable.

    As another example of "iterator bookmarking", consider a stream of
    numbers with an occasional string as a "postfix operator" now and
    then.  By far most frequent such operator is a '+', whereupon we must
    sum all previous numbers (since the last previous operator if any, or
    else since the start) and yield the result.  Sometimes we find a '*'
    instead, which is the same except that the previous numbers must
    instead be multiplied, not summed.

        def filter_weird_stream(stream):
            it = iter(stream)
            while True:
                it, bookmark = itertools.tee(it)
                total = 0
                for item in it:
                    if item=='+':
                        yield total
                        break
                    elif item=='*':
                        product = 1
                        for item in bookmark:
                            if item=='*':
                                yield product
                                break
                            else:
                                product *= item
                   else:
                       total += item

    Similar use cases of itertools.tee can support such tasks as
    "undo" on a stream of commands represented by an iterator,
    "backtracking" on the parse of a stream of tokens, and so on.
    (Of course, in each case, one should also consider simpler
    possibilities such as saving relevant portions of the sequence
    into lists while stepping on the sequence with just one iterator,
    depending on the details of one's task).


    Here is an example, in pure Python, of how the 'enumerate'
    built-in could be extended to support __copy__ if its underlying
    iterator also supported __copy__:

        class enumerate(object):

            def __init__(self, it):
                self.it = iter(it)
                self.i = -1

            def __iter__(self):
                return self

            def next(self):
                self.i += 1
                return self.i, self.it.next()

            def __copy__(self):
                result = self.__class__.__new__()
                result.it = self.it.__copy__()
                result.i = self.i
                return result


    Here is an example of the kind of "fragility" produced by "accidental
    copyability" of an iterator -- the reason why one must NOT use
    copy.copy expecting, if it succeeds, to receive as a result an
    iterator which is iterable-on independently from the original.  Here
    is an iterator class that iterates (in preorder) on "trees" which, for
    simplicity, are just nested lists -- any item that's a list is treated
    as a subtree, any other item as a leaf.

    class ListreeIter(object):

        def __init__(self, tree):
            self.tree = [tree]
            self.indx = [-1]

        def __iter__(self):
            return self

        def next(self):
            if not self.indx:
                raise StopIteration
            self.indx[-1] += 1
            try:
                result = self.tree[-1][self.indx[-1]]
            except IndexError:
                self.tree.pop()
                self.indx.pop()
                return self.next()
            if type(result) is not list:
                return result
            self.tree.append(result)
            self.indx.append(-1)
            return self.next()

    Now, for example, the following code:

        import copy
        x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]

        print 'showing all items:',
        it = ListreeIter(x)
        for i in it:
            print i,
            if i==6: cop = copy.copy(it)
        print

        print 'showing items >6 again:'
        for i in cop: print i,
        print

    does NOT work as intended -- the "cop" iterator gets consumed, and
    exhausted, step by step as the original "it" iterator is, because
    the accidental (rather than deliberate) copying performed by
    copy.copy shares, rather than duplicating the "index" list, which
    is the mutable attribute it.indx (a list of numerical indices).
    Thus, this "client code" of the iterator, which attemps to iterate
    twice over a portion of the sequence via a copy.copy on the
    iterator, is NOT correct.

    Some correct solutions include using itertools.tee, i.e., changing
    the first for loop into:

        for i in it:
            print i,
            if i==6:
                it, cop = itertools.tee(it)
                break
        for i in it: print i,

    (note that we MUST break the loop in two, otherwise we'd still
    be looping on the ORIGINAL value of it, which must NOT be used
    further after the call to tee!!!); or making a list, i.e.:

        for i in it:
            print i,
            if i==6:
                cop = lit = list(it)
                break
        for i in lit: print i,
    
    (again, the loop must be broken in two, since iterator 'it'
    gets exhausted by the call list(it)).

    Finally, all of these solutions would work if Listiter supplied
    a suitable __copy__ method, as this PEP recommends:

            def __copy__(self):
                result = self.__class__.new()
                result.tree = copy.copy(self.tree)
                result.indx = copy.copy(self.indx)
                return result

    There is no need to get any "deeper" in the copy, but the two
    mutable "index state" attributes must indeed be copied in order
    to achieve a "proper" (independently iterable) iterator-copy.

    The recommended solution is to have class Listiter supply this
    __copy__ method AND have client code use itertools.tee (with
    the split-in-two-parts loop as shown above).  This will make
    client code maximally tolerant of different iterator types it
    might be using AND achieve good performance for tee'ing of this
    specific iterator type at the same time.


References

    [1] Discussion on python-dev starting at post:
        http://mail.python.org/pipermail/python-dev/2003-October/038969.html

    [2] Online documentation for the copy module of the standard library:
        http://www.python.org/doc/current/lib/module-copy.html

    [3] PEP 289, Generator Expressions, Hettinger
        http://www.python.org/peps/pep-0289.html

Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								PEP: 323
 								Title: Copyable Iterators
 								Version: $Revision$
 								Last-Modified: $Date$
 								Author: Alex Martelli <aleaxit@yahoo.com>
-												Change the status of a whole lotta PEPs.
Some are marked final years after the work was completed and released;
others are rejected after sitting idly for years.

											
										
										
											2007-05-18 13:41:31 -04:00
+								Status: Deferred
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								Type: Standards Track
 								Content-Type: text/plain
 								Created: 25-Oct-2003
-												Add comments to 323.

											
										
										
											2005-06-28 04:46:39 -04:00
+								Python-Version: 2.5
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								Post-History: 29-Oct-2003
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Change the status of a whole lotta PEPs.
Some are marked final years after the work was completed and released;
others are rejected after sitting idly for years.

											
										
										
											2007-05-18 13:41:31 -04:00
+								Deferral
 								  This PEP has been deferred. Copyable iterators are a nice idea, but after
 								  four years, no implementation or widespread interest has emerged.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								Abstract
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    This PEP suggests that some iterator types should support shallow
 								    copies of their instances by exposing a __copy__ method which meets
 								    some specific requirements, and indicates how code using an iterator
 								    might exploit such a __copy__ method when present.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Add comments to 323.

											
										
										
											2005-06-28 04:46:39 -04:00
+								Update and Comments
 								    Support for __copy__ was included in Py2.4's itertools.tee().
 								    Adding __copy__ methods to existing iterators will change the
 								    behavior under tee().  Currently, the copied iterators remain
 								    tied to the original iterator.  If the original advances, then
 								    so do all of the copies.  Good practice is to overwrite the
 								    original so that anamolies don't result:  a,b=tee(a).
 								    Code that doesn't follow that practice may observe a semantic
 								    change if a __copy__ method is added to an iterator.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								Motivation
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    In Python up to 2.3, most built-in iterator types don't let the user
 								    copy their instances.  User-coded iterators that do let their clients
 								    call copy.copy on their instances may, or may not, happen to return,
 								    as a result of the copy, a separate iterator object that may be
 								    iterated upon independently from the original.
 								    Currently, "support" for copy.copy in a user-coded iterator type is
 								    almost invariably "accidental" -- i.e., the standard machinery of the
 								    copy method in Python's standard library's copy module does build and
 								    return a copy.  However, the copy will be independently iterable with
 								    respect to the original only if calling .next() on an instance of that
 								    class happens to change instance state solely by rebinding some
 								    attributes to new values, and not by mutating some attributes'
 								    existing values.
 								    For example, an iterator whose "index" state is held as an integer
 								    attribute will probably give usable copies, since (integers being
 								    immutable) .next() presumably just rebinds that attribute.  On the
 								    other hand, another iterator whose "index" state is held as a list
 								    attribute will probably mutate the same list object when .next()
 								    executes, and therefore copies of such an iterator will not be
 								    iterable separately and independently from the original.
 								    Given this existing situation, copy.copy(it) on some iterator object
 								    isn't very useful, nor, therefore, is it at all widely used.  However,
 								    there are many cases in which being able to get a "snapshot" of an
 								    iterator, as a "bookmark", so as to be able to keep iterating along
 								    the sequence but later iterate again on the same sequence from the
 								    bookmark onwards, is useful.  To support such "bookmarking", module
 								    itertools, in 2.4, has grown a 'tee' function, to be used as:
 								        it, bookmark = itertools.tee(it)
 								    The previous value of 'it' must not be used again, which is why this
 								    typical usage idiom rebinds the name.  After this call, 'it' and
 								    'bookmark' are independently-iterable iterators on the same underlying
 								    sequence as the original value of 'it': this satisfies application
 								    needs for "iterator copying".
 								    However, when itertools.tee can make no hypotheses about the nature of
 								    the iterator it is passed as an argument, it must save in memory all
 								    items through which one of the two 'teed' iterators, but not yet both,
 								    have stepped.  This can be quite costly in terms of memory, if the two
 								    iterators get very far from each other in their stepping; indeed, in
 								    some cases it may be preferable to make a list from the iterator so as
 								    to be able to step repeatedly through the subsequence, or, if that is
 								    too costy in terms of memory, save items to disk, again in order to be
 								    able to iterate through them repeatedly.
 								    This PEP proposes another idea that will, in some important cases,
 								    allow itertools.tee to do its job with minimal cost in terms of
 								    memory; user code may also occasionally be able to exploit the idea in
 								    order to decide whether to copy an iterator, make a list from it, or
 								    use an auxiliary disk file.
 								    The key consideration is that some important iterators, such as those
 								    which built-in function iter builds over sequences, would be
 								    intrinsically easy to copy: just get another reference to the same
 								    sequence, and a copy of the integer index.  However, in Python 2.3,
 								    those iterators don't expose the state, and don't support copy.copy.
 								    The purpose of this PEP, therefore, is to have those iterator types
 								    expose a suitable __copy__ method.  Similarly, user-coded iterator
 								    types that can provide copies of their instances, suitable for
 								    separate and independent iteration, with limited costs in time and
 								    space, should also expose a suitable __copy__ method.  While
 								    copy.copy also supports other ways to let a type control the way
 								    its instances are copied, it is suggested, for simplicity, that
 								    iterator types that support copying always do so by exposing a
 								    __copy__ method, and not in the other ways copy.copy supports.
 								    Having iterators expose a suitable __copy__ when feasible will afford
 								    easy optimization of itertools.tee and similar user code, as in:
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								        def tee(it):
 								            it = iter(it)
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								            try: copier = it.__copy__
 								            except AttributeError:
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								                # non-copyable iterator, do all the needed hard work
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								                # [snipped!]
 								            else:
 								                return it, copier()
 								    Note that this function does NOT call "copy.copy(it)", which (even
 								    after this PEP is implemented) might well still "just happen to
 								    succeed". for some iterator type that is implemented as a user-coded
 								    class. without really supplying an adequate "independently iterable"
 								    copy object as its result.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								Specification
 								    Any iterator type X may expose a method __copy__ that is callable
 								    without arguments on any instance x of X.  The method should be
 								    exposed if and only if the iterator type can provide copyability with
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    reasonably little computational and memory effort.  Furthermore, the
 								    new object y returned by method __copy__ should be a new instance
 								    of X that is iterable independently and separately from x, stepping
 								    along the same "underlying sequence" of items.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								    For example, suppose a class Iter essentially duplicated the
 								    functionality of the iter builtin for iterating on a sequence:
 								        class Iter(object):
 								            def __init__(self, sequence):
 								                self.sequence = sequence
 								                self.index = 0
 								            def __iter__(self):
 								                return self
 								            def next(self):
 								                try: result = self.sequence[self.index]
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								                except IndexError: raise StopIteration
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								                self.index += 1
 								                return result
 								    To make this Iter class compliant with this PEP, the following
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    addition to the body of class Iter would suffice:
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								            def __copy__(self):
-												fixed 2 typos in example code kindly pointed out by Tom Jenkins.

											
										
										
											2004-01-02 04:15:55 -05:00
+								                result = self.__class__(self.sequence)
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
+								                result.index = self.index
 								                return result
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    Note that __copy__, in this case, does not even try to copy the
 								    sequence; if the sequence is altered while either or both of the
 								    original and copied iterators are still stepping on it, the iteration
 								    behavior is quite likely to go awry anyway -- it is not __copy__'s
 								    responsibility to change this normal Python behavior for iterators
 								    which iterate on mutable sequences (that might, perhaps, be the
 								    specification for a __deepcopy__ method of iterators, which, however,
 								    this PEP does not deal with).
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    Consider also a "random iterator", which provides a nonterminating
 								    sequence of results from some method of a random instance, called
 								    with given arguments:
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								        class RandomIterator(object):
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								            def __init__(self, bound_method, *args):
 								                self.call = bound_method
 								                self.args = args
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								            def __iter__(self):
 								                return self
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								            def next(self):
 								                return self.call(*self.args)
 								            def __copy__(self):
 								                import copy, new
 								                im_self = copy.copy(self.call.im_self)
 								                method = new.instancemethod(self.call.im_func, im_self)
 								                return self.__class__(method, *self.args)
 								    This iterator type is slightly more general than its name implies, as
 								    it supports calls to any bound method (or other callable, but if the
 								    callable is not a bound method, then method __copy__ will fail).  But
 								    the use case is for the purpose of generating random streams, as in:
 								            import random
 								            def show5(it):
 								                for i, result in enumerate(it):
 								                    print '%6.3f'%result,
 								                    if i==4: break
 								                print
 								            normit = RandomIterator(random.Random().gauss, 0, 1)
 								            show5(normit)
 								            copit = normit.__copy__()
 								            show5(normit)
 								            show5(copit)
 								    which will display some output such as:
 								            -0.536  1.936 -1.182 -1.690 -1.184
 .666 -0.701  1.214  0.348  1.373
 .666 -0.701  1.214  0.348  1.373
 								    the key point being that the second and third lines are equal, because
 								    the normit and copit iterators will step along the same "underlying
 								    sequence".  (As an aside, note that to get a copy of self.call.im_self
 								    we must use copy.copy, NOT try getting at a __copy__ method directly,
 								    because for example instances of random.Random support copying via
 								    __getstate__ and __setstate__, NOT via __copy__; indeed, using
 								    copy.copy is the normal way to get a shallow copy of any object --
 								    copyable iterators are different because of the already-mentioned
 								    uncertainty about the result of copy.copy supporting these "copyable
 								    iterator" specs).
 								Details
 								    Besides adding to the Python docs a recommendation that user-coded
 								    iterator types support a __copy__ method (if and only if it can be
 								    implemented with small costs in memory and runtime, and produce an
 								    independently-iterable copy of an iterator object), this PEP's
 								    implementation will specifically include the addition of copyability
 								    to the iterators over sequences that built-in iter returns, and also
 								    to the iterators over a dictionary returned by the methods __iter__,
 								    iterkeys, itervalues, and iteritems of built-in type dict.
 								    Iterators produced by generator functions will not be copyable.
 								    However, iterators produced by the new "generator expressions" of
 								    Python 2.4 (PEP 289 [3]) should be copyable if their underlying
 								    iterator[s] are; the strict limitations on what is possible in a
 								    generator expression, compared to the much vaster generality of a
 								    generator, should make that feasible.  Similarly, the iterators
 								    produced by the built-in function enumerate, and certain functions
 								    suppiled by module itertools, should be copyable if the underlying
 								    iterators are.
 								    The implementation of this PEP will also include the optimization of
 								    the new itertools.tee function mentioned in the Motivation section.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								Rationale
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    The main use case for (shallow) copying of an iterator is the same as
 								    for the function itertools.tee (new in 2.4).  User code will not
 								    directly attempt to copy an iterator, because it would have to deal
 								    separately with uncopyable cases; calling itertools.tee will
 								    internally perform the copy when appropriate, and implicitly fallback
 								    to a maximally efficient non-copying strategy for iterators that are
 								    not copyable.  (Occasionally, user code may want more direct control,
 								    specifically in order to deal with non-copyable iterators by other
 								    strategies, such as making a list or saving the sequence to disk).
 								    A tee'd iterator may serve as a "reference point", allowing processing
 								    of a sequence to continue or resume from a known point, while the
 								    other independent iterator can be freely advanced to "explore" a
 								    further part of the sequence as needed.  A simple example: a generator
 								    function which, given an iterator of numbers (assumed to be positive),
 								    returns a corresponding iterator, each of whose items is the fraction
 								    of the total corresponding to each corresponding item of the input
 								    iterator.  The caller may pass the total as a value, if known in
 								    advance; otherwise, the iterator returned by calling this generator
 								    function will first compute the total.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								        def fractions(numbers, total=None):
 								            if total is None:
 								                numbers, aux = itertools.tee(numbers)
 								                total = sum(aux)
 								            total = float(total)
 								            for item in numbers:
 								                yield item / total
 								    The ability to tee the numbers iterator allows this generator to
 								    precompute the total, if needed, without necessarily requiring
 								    O(N) auxiliary memory if the numbers iterator is copyable.
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
 								    As another example of "iterator bookmarking", consider a stream of
 								    numbers with an occasional string as a "postfix operator" now and
 								    then.  By far most frequent such operator is a '+', whereupon we must
 								    sum all previous numbers (since the last previous operator if any, or
 								    else since the start) and yield the result.  Sometimes we find a '*'
 								    instead, which is the same except that the previous numbers must
 								    instead be multiplied, not summed.
 								        def filter_weird_stream(stream):
 								            it = iter(stream)
 								            while True:
 								                it, bookmark = itertools.tee(it)
 								                total = 0
 								                for item in it:
 								                    if item=='+':
 								                        yield total
 								                        break
 								                    elif item=='*':
 								                        product = 1
 								                        for item in bookmark:
 								                            if item=='*':
 								                                yield product
 								                                break
 								                            else:
 								                                product *= item
 								                   else:
 								                       total += item
 								    Similar use cases of itertools.tee can support such tasks as
 								    "undo" on a stream of commands represented by an iterator,
 								    "backtracking" on the parse of a stream of tokens, and so on.
 								    (Of course, in each case, one should also consider simpler
 								    possibilities such as saving relevant portions of the sequence
 								    into lists while stepping on the sequence with just one iterator,
 								    depending on the details of one's task).
 								    Here is an example, in pure Python, of how the 'enumerate'
 								    built-in could be extended to support __copy__ if its underlying
 								    iterator also supported __copy__:
 								        class enumerate(object):
 								            def __init__(self, it):
 								                self.it = iter(it)
 								                self.i = -1
 								            def __iter__(self):
 								                return self
 								            def next(self):
 								                self.i += 1
 								                return self.i, self.it.next()
 								            def __copy__(self):
-												fixed 2 typos in example code kindly pointed out by Tom Jenkins.

											
										
										
											2004-01-02 04:15:55 -05:00
+								                result = self.__class__.__new__()
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								                result.it = self.it.__copy__()
 								                result.i = self.i
 								                return result
 								    Here is an example of the kind of "fragility" produced by "accidental
 								    copyability" of an iterator -- the reason why one must NOT use
 								    copy.copy expecting, if it succeeds, to receive as a result an
 								    iterator which is iterable-on independently from the original.  Here
 								    is an iterator class that iterates (in preorder) on "trees" which, for
 								    simplicity, are just nested lists -- any item that's a list is treated
 								    as a subtree, any other item as a leaf.
 								    class ListreeIter(object):
 								        def __init__(self, tree):
 								            self.tree = [tree]
 								            self.indx = [-1]
 								        def __iter__(self):
 								            return self
 								        def next(self):
 								            if not self.indx:
 								                raise StopIteration
 								            self.indx[-1] += 1
 								            try:
 								                result = self.tree[-1][self.indx[-1]]
 								            except IndexError:
 								                self.tree.pop()
 								                self.indx.pop()
 								                return self.next()
 								            if type(result) is not list:
 								                return result
 								            self.tree.append(result)
 								            self.indx.append(-1)
 								            return self.next()
 								    Now, for example, the following code:
 								        import copy
 								        x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
 								        print 'showing all items:',
 								        it = ListreeIter(x)
 								        for i in it:
 								            print i,
 								            if i==6: cop = copy.copy(it)
 								        print
 								        print 'showing items >6 again:'
 								        for i in cop: print i,
 								        print
 								    does NOT work as intended -- the "cop" iterator gets consumed, and
 								    exhausted, step by step as the original "it" iterator is, because
 								    the accidental (rather than deliberate) copying performed by
 								    copy.copy shares, rather than duplicating the "index" list, which
 								    is the mutable attribute it.indx (a list of numerical indices).
 								    Thus, this "client code" of the iterator, which attemps to iterate
 								    twice over a portion of the sequence via a copy.copy on the
 								    iterator, is NOT correct.
 								    Some correct solutions include using itertools.tee, i.e., changing
 								    the first for loop into:
 								        for i in it:
 								            print i,
 								            if i==6:
 								                it, cop = itertools.tee(it)
 								                break
 								        for i in it: print i,
 								    (note that we MUST break the loop in two, otherwise we'd still
 								    be looping on the ORIGINAL value of it, which must NOT be used
 								    further after the call to tee!!!); or making a list, i.e.:
 								        for i in it:
 								            print i,
 								            if i==6:
 								                cop = lit = list(it)
 								                break
 								        for i in lit: print i,
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
-												Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.

											
										
										
											2003-10-29 08:39:51 -05:00
+								    (again, the loop must be broken in two, since iterator 'it'
 								    gets exhausted by the call list(it)).
 								    Finally, all of these solutions would work if Listiter supplied
 								    a suitable __copy__ method, as this PEP recommends:
 								            def __copy__(self):
 								                result = self.__class__.new()
 								                result.tree = copy.copy(self.tree)
 								                result.indx = copy.copy(self.indx)
 								                return result
 								    There is no need to get any "deeper" in the copy, but the two
 								    mutable "index state" attributes must indeed be copied in order
 								    to achieve a "proper" (independently iterable) iterator-copy.
 								    The recommended solution is to have class Listiter supply this
 								    __copy__ method AND have client code use itertools.tee (with
 								    the split-in-two-parts loop as shown above).  This will make
 								    client code maximally tolerant of different iterator types it
 								    might be using AND achieve good performance for tee'ing of this
 								    specific iterator type at the same time.
-												Added PEP 323, "Copyable Iterators".

											
										
										
											2003-10-27 10:01:08 -05:00
 								References
 								    [1] Discussion on python-dev starting at post:
 								        http://mail.python.org/pipermail/python-dev/2003-October/038969.html
 								    [2] Online documentation for the copy module of the standard library:
 								        http://www.python.org/doc/current/lib/module-copy.html
 								    [3] PEP 289, Generator Expressions, Hettinger
 								        http://www.python.org/peps/pep-0289.html
 								Copyright
 								    This document has been placed in the public domain.
 								Local Variables:
 								mode: indented-text
 								indent-tabs-mode: nil
 								sentence-end-double-space: t
 								fill-column: 70
 								End: