reSTify PEP 323 (#351)
This commit is contained in:
parent
86a462a7b4
commit
5aea3b9792
705
pep-0323.txt
705
pep-0323.txt
|
@ -5,362 +5,375 @@ Last-Modified: $Date$
|
|||
Author: Alex Martelli <aleaxit@gmail.com>
|
||||
Status: Deferred
|
||||
Type: Standards Track
|
||||
Content-Type: text/plain
|
||||
Content-Type: text/x-rst
|
||||
Created: 25-Oct-2003
|
||||
Python-Version: 2.5
|
||||
Post-History: 29-Oct-2003
|
||||
|
||||
|
||||
Deferral
|
||||
========
|
||||
|
||||
This PEP has been deferred. Copyable iterators are a nice idea, but after
|
||||
four years, no implementation or widespread interest has emerged.
|
||||
This PEP has been deferred. Copyable iterators are a nice idea, but after
|
||||
four years, no implementation or widespread interest has emerged.
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP suggests that some iterator types should support shallow
|
||||
copies of their instances by exposing a __copy__ method which meets
|
||||
some specific requirements, and indicates how code using an iterator
|
||||
might exploit such a __copy__ method when present.
|
||||
This PEP suggests that some iterator types should support shallow
|
||||
copies of their instances by exposing a ``__copy__`` method which meets
|
||||
some specific requirements, and indicates how code using an iterator
|
||||
might exploit such a ``__copy__`` method when present.
|
||||
|
||||
|
||||
Update and Comments
|
||||
===================
|
||||
|
||||
Support for __copy__ was included in Py2.4's itertools.tee().
|
||||
Support for ``__copy__`` was included in Py2.4's ``itertools.tee()``.
|
||||
|
||||
Adding __copy__ methods to existing iterators will change the
|
||||
behavior under tee(). Currently, the copied iterators remain
|
||||
tied to the original iterator. If the original advances, then
|
||||
so do all of the copies. Good practice is to overwrite the
|
||||
original so that anamolies don't result: a,b=tee(a).
|
||||
Code that doesn't follow that practice may observe a semantic
|
||||
change if a __copy__ method is added to an iterator.
|
||||
Adding ``__copy__`` methods to existing iterators will change the
|
||||
behavior under ``tee()``. Currently, the copied iterators remain
|
||||
tied to the original iterator. If the original advances, then
|
||||
so do all of the copies. Good practice is to overwrite the
|
||||
original so that anamolies don't result: ``a,b=tee(a)``.
|
||||
Code that doesn't follow that practice may observe a semantic
|
||||
change if a ``__copy__`` method is added to an iterator.
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
In Python up to 2.3, most built-in iterator types don't let the user
|
||||
copy their instances. User-coded iterators that do let their clients
|
||||
call copy.copy on their instances may, or may not, happen to return,
|
||||
as a result of the copy, a separate iterator object that may be
|
||||
iterated upon independently from the original.
|
||||
In Python up to 2.3, most built-in iterator types don't let the user
|
||||
copy their instances. User-coded iterators that do let their clients
|
||||
call copy.copy on their instances may, or may not, happen to return,
|
||||
as a result of the copy, a separate iterator object that may be
|
||||
iterated upon independently from the original.
|
||||
|
||||
Currently, "support" for copy.copy in a user-coded iterator type is
|
||||
almost invariably "accidental" -- i.e., the standard machinery of the
|
||||
copy method in Python's standard library's copy module does build and
|
||||
return a copy. However, the copy will be independently iterable with
|
||||
respect to the original only if calling .next() on an instance of that
|
||||
class happens to change instance state solely by rebinding some
|
||||
attributes to new values, and not by mutating some attributes'
|
||||
existing values.
|
||||
Currently, "support" for copy.copy in a user-coded iterator type is
|
||||
almost invariably "accidental" -- i.e., the standard machinery of the
|
||||
copy method in Python's standard library's copy module does build and
|
||||
return a copy. However, the copy will be independently iterable with
|
||||
respect to the original only if calling ``.next()`` on an instance of that
|
||||
class happens to change instance state solely by rebinding some
|
||||
attributes to new values, and not by mutating some attributes'
|
||||
existing values.
|
||||
|
||||
For example, an iterator whose "index" state is held as an integer
|
||||
attribute will probably give usable copies, since (integers being
|
||||
immutable) .next() presumably just rebinds that attribute. On the
|
||||
other hand, another iterator whose "index" state is held as a list
|
||||
attribute will probably mutate the same list object when .next()
|
||||
executes, and therefore copies of such an iterator will not be
|
||||
iterable separately and independently from the original.
|
||||
For example, an iterator whose "index" state is held as an integer
|
||||
attribute will probably give usable copies, since (integers being
|
||||
immutable) ``.next()`` presumably just rebinds that attribute. On the
|
||||
other hand, another iterator whose "index" state is held as a list
|
||||
attribute will probably mutate the same list object when ``.next()``
|
||||
executes, and therefore copies of such an iterator will not be
|
||||
iterable separately and independently from the original.
|
||||
|
||||
Given this existing situation, copy.copy(it) on some iterator object
|
||||
isn't very useful, nor, therefore, is it at all widely used. However,
|
||||
there are many cases in which being able to get a "snapshot" of an
|
||||
iterator, as a "bookmark", so as to be able to keep iterating along
|
||||
the sequence but later iterate again on the same sequence from the
|
||||
bookmark onwards, is useful. To support such "bookmarking", module
|
||||
itertools, in 2.4, has grown a 'tee' function, to be used as:
|
||||
Given this existing situation, ``copy.copy(it)`` on some iterator object
|
||||
isn't very useful, nor, therefore, is it at all widely used. However,
|
||||
there are many cases in which being able to get a "snapshot" of an
|
||||
iterator, as a "bookmark", so as to be able to keep iterating along
|
||||
the sequence but later iterate again on the same sequence from the
|
||||
bookmark onwards, is useful. To support such "bookmarking", module
|
||||
itertools, in 2.4, has grown a 'tee' function, to be used as::
|
||||
|
||||
it, bookmark = itertools.tee(it)
|
||||
it, bookmark = itertools.tee(it)
|
||||
|
||||
The previous value of 'it' must not be used again, which is why this
|
||||
typical usage idiom rebinds the name. After this call, 'it' and
|
||||
'bookmark' are independently-iterable iterators on the same underlying
|
||||
sequence as the original value of 'it': this satisfies application
|
||||
needs for "iterator copying".
|
||||
The previous value of 'it' must not be used again, which is why this
|
||||
typical usage idiom rebinds the name. After this call, 'it' and
|
||||
'bookmark' are independently-iterable iterators on the same underlying
|
||||
sequence as the original value of 'it': this satisfies application
|
||||
needs for "iterator copying".
|
||||
|
||||
However, when itertools.tee can make no hypotheses about the nature of
|
||||
the iterator it is passed as an argument, it must save in memory all
|
||||
items through which one of the two 'teed' iterators, but not yet both,
|
||||
have stepped. This can be quite costly in terms of memory, if the two
|
||||
iterators get very far from each other in their stepping; indeed, in
|
||||
some cases it may be preferable to make a list from the iterator so as
|
||||
to be able to step repeatedly through the subsequence, or, if that is
|
||||
too costy in terms of memory, save items to disk, again in order to be
|
||||
able to iterate through them repeatedly.
|
||||
However, when itertools.tee can make no hypotheses about the nature of
|
||||
the iterator it is passed as an argument, it must save in memory all
|
||||
items through which one of the two 'teed' iterators, but not yet both,
|
||||
have stepped. This can be quite costly in terms of memory, if the two
|
||||
iterators get very far from each other in their stepping; indeed, in
|
||||
some cases it may be preferable to make a list from the iterator so as
|
||||
to be able to step repeatedly through the subsequence, or, if that is
|
||||
too costy in terms of memory, save items to disk, again in order to be
|
||||
able to iterate through them repeatedly.
|
||||
|
||||
This PEP proposes another idea that will, in some important cases,
|
||||
allow itertools.tee to do its job with minimal cost in terms of
|
||||
memory; user code may also occasionally be able to exploit the idea in
|
||||
order to decide whether to copy an iterator, make a list from it, or
|
||||
use an auxiliary disk file.
|
||||
This PEP proposes another idea that will, in some important cases,
|
||||
allow ``itertools.tee`` to do its job with minimal cost in terms of
|
||||
memory; user code may also occasionally be able to exploit the idea in
|
||||
order to decide whether to copy an iterator, make a list from it, or
|
||||
use an auxiliary disk file.
|
||||
|
||||
The key consideration is that some important iterators, such as those
|
||||
which built-in function iter builds over sequences, would be
|
||||
intrinsically easy to copy: just get another reference to the same
|
||||
sequence, and a copy of the integer index. However, in Python 2.3,
|
||||
those iterators don't expose the state, and don't support copy.copy.
|
||||
The key consideration is that some important iterators, such as those
|
||||
which built-in function iter builds over sequences, would be
|
||||
intrinsically easy to copy: just get another reference to the same
|
||||
sequence, and a copy of the integer index. However, in Python 2.3,
|
||||
those iterators don't expose the state, and don't support ``copy.copy``.
|
||||
|
||||
The purpose of this PEP, therefore, is to have those iterator types
|
||||
expose a suitable __copy__ method. Similarly, user-coded iterator
|
||||
types that can provide copies of their instances, suitable for
|
||||
separate and independent iteration, with limited costs in time and
|
||||
space, should also expose a suitable __copy__ method. While
|
||||
copy.copy also supports other ways to let a type control the way
|
||||
its instances are copied, it is suggested, for simplicity, that
|
||||
iterator types that support copying always do so by exposing a
|
||||
__copy__ method, and not in the other ways copy.copy supports.
|
||||
The purpose of this PEP, therefore, is to have those iterator types
|
||||
expose a suitable ``__copy__`` method. Similarly, user-coded iterator
|
||||
types that can provide copies of their instances, suitable for
|
||||
separate and independent iteration, with limited costs in time and
|
||||
space, should also expose a suitable ``__copy__`` method. While
|
||||
copy.copy also supports other ways to let a type control the way
|
||||
its instances are copied, it is suggested, for simplicity, that
|
||||
iterator types that support copying always do so by exposing a
|
||||
``__copy__`` method, and not in the other ways ``copy.copy`` supports.
|
||||
|
||||
Having iterators expose a suitable __copy__ when feasible will afford
|
||||
easy optimization of itertools.tee and similar user code, as in:
|
||||
Having iterators expose a suitable ``__copy__`` when feasible will afford
|
||||
easy optimization of itertools.tee and similar user code, as in::
|
||||
|
||||
def tee(it):
|
||||
it = iter(it)
|
||||
try: copier = it.__copy__
|
||||
except AttributeError:
|
||||
# non-copyable iterator, do all the needed hard work
|
||||
# [snipped!]
|
||||
else:
|
||||
return it, copier()
|
||||
def tee(it):
|
||||
it = iter(it)
|
||||
try: copier = it.__copy__
|
||||
except AttributeError:
|
||||
# non-copyable iterator, do all the needed hard work
|
||||
# [snipped!]
|
||||
else:
|
||||
return it, copier()
|
||||
|
||||
Note that this function does NOT call "copy.copy(it)", which (even
|
||||
after this PEP is implemented) might well still "just happen to
|
||||
succeed". for some iterator type that is implemented as a user-coded
|
||||
class. without really supplying an adequate "independently iterable"
|
||||
copy object as its result.
|
||||
Note that this function does NOT call "copy.copy(it)", which (even
|
||||
after this PEP is implemented) might well still "just happen to
|
||||
succeed". for some iterator type that is implemented as a user-coded
|
||||
class. without really supplying an adequate "independently iterable"
|
||||
copy object as its result.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
Any iterator type X may expose a method __copy__ that is callable
|
||||
without arguments on any instance x of X. The method should be
|
||||
exposed if and only if the iterator type can provide copyability with
|
||||
reasonably little computational and memory effort. Furthermore, the
|
||||
new object y returned by method __copy__ should be a new instance
|
||||
of X that is iterable independently and separately from x, stepping
|
||||
along the same "underlying sequence" of items.
|
||||
Any iterator type X may expose a method ``__copy__`` that is callable
|
||||
without arguments on any instance x of X. The method should be
|
||||
exposed if and only if the iterator type can provide copyability with
|
||||
reasonably little computational and memory effort. Furthermore, the
|
||||
new object y returned by method ``__copy__`` should be a new instance
|
||||
of X that is iterable independently and separately from x, stepping
|
||||
along the same "underlying sequence" of items.
|
||||
|
||||
For example, suppose a class Iter essentially duplicated the
|
||||
functionality of the iter builtin for iterating on a sequence:
|
||||
For example, suppose a class Iter essentially duplicated the
|
||||
functionality of the iter builtin for iterating on a sequence::
|
||||
|
||||
class Iter(object):
|
||||
class Iter(object):
|
||||
|
||||
def __init__(self, sequence):
|
||||
self.sequence = sequence
|
||||
self.index = 0
|
||||
def __init__(self, sequence):
|
||||
self.sequence = sequence
|
||||
self.index = 0
|
||||
|
||||
def __iter__(self):
|
||||
return self
|
||||
def __iter__(self):
|
||||
return self
|
||||
|
||||
def next(self):
|
||||
try: result = self.sequence[self.index]
|
||||
except IndexError: raise StopIteration
|
||||
self.index += 1
|
||||
return result
|
||||
def next(self):
|
||||
try: result = self.sequence[self.index]
|
||||
except IndexError: raise StopIteration
|
||||
self.index += 1
|
||||
return result
|
||||
|
||||
To make this Iter class compliant with this PEP, the following
|
||||
addition to the body of class Iter would suffice:
|
||||
To make this Iter class compliant with this PEP, the following
|
||||
addition to the body of class Iter would suffice::
|
||||
|
||||
def __copy__(self):
|
||||
result = self.__class__(self.sequence)
|
||||
result.index = self.index
|
||||
return result
|
||||
def __copy__(self):
|
||||
result = self.__class__(self.sequence)
|
||||
result.index = self.index
|
||||
return result
|
||||
|
||||
Note that __copy__, in this case, does not even try to copy the
|
||||
sequence; if the sequence is altered while either or both of the
|
||||
original and copied iterators are still stepping on it, the iteration
|
||||
behavior is quite likely to go awry anyway -- it is not __copy__'s
|
||||
responsibility to change this normal Python behavior for iterators
|
||||
which iterate on mutable sequences (that might, perhaps, be the
|
||||
specification for a __deepcopy__ method of iterators, which, however,
|
||||
this PEP does not deal with).
|
||||
Note that ``__copy__``, in this case, does not even try to copy the
|
||||
sequence; if the sequence is altered while either or both of the
|
||||
original and copied iterators are still stepping on it, the iteration
|
||||
behavior is quite likely to go awry anyway -- it is not ``__copy__``'s
|
||||
responsibility to change this normal Python behavior for iterators
|
||||
which iterate on mutable sequences (that might, perhaps, be the
|
||||
specification for a ``__deepcopy__`` method of iterators, which, however,
|
||||
this PEP does not deal with).
|
||||
|
||||
Consider also a "random iterator", which provides a nonterminating
|
||||
sequence of results from some method of a random instance, called
|
||||
with given arguments:
|
||||
Consider also a "random iterator", which provides a nonterminating
|
||||
sequence of results from some method of a random instance, called
|
||||
with given arguments::
|
||||
|
||||
class RandomIterator(object):
|
||||
class RandomIterator(object):
|
||||
|
||||
def __init__(self, bound_method, *args):
|
||||
self.call = bound_method
|
||||
self.args = args
|
||||
def __init__(self, bound_method, *args):
|
||||
self.call = bound_method
|
||||
self.args = args
|
||||
|
||||
def __iter__(self):
|
||||
return self
|
||||
def __iter__(self):
|
||||
return self
|
||||
|
||||
def next(self):
|
||||
return self.call(*self.args)
|
||||
def next(self):
|
||||
return self.call(*self.args)
|
||||
|
||||
def __copy__(self):
|
||||
import copy, new
|
||||
im_self = copy.copy(self.call.im_self)
|
||||
method = new.instancemethod(self.call.im_func, im_self)
|
||||
return self.__class__(method, *self.args)
|
||||
def __copy__(self):
|
||||
import copy, new
|
||||
im_self = copy.copy(self.call.im_self)
|
||||
method = new.instancemethod(self.call.im_func, im_self)
|
||||
return self.__class__(method, *self.args)
|
||||
|
||||
This iterator type is slightly more general than its name implies, as
|
||||
it supports calls to any bound method (or other callable, but if the
|
||||
callable is not a bound method, then method __copy__ will fail). But
|
||||
the use case is for the purpose of generating random streams, as in:
|
||||
This iterator type is slightly more general than its name implies, as
|
||||
it supports calls to any bound method (or other callable, but if the
|
||||
callable is not a bound method, then method ``__copy__`` will fail). But
|
||||
the use case is for the purpose of generating random streams, as in::
|
||||
|
||||
import random
|
||||
import random
|
||||
|
||||
def show5(it):
|
||||
for i, result in enumerate(it):
|
||||
print '%6.3f'%result,
|
||||
if i==4: break
|
||||
print
|
||||
def show5(it):
|
||||
for i, result in enumerate(it):
|
||||
print '%6.3f'%result,
|
||||
if i==4: break
|
||||
print
|
||||
|
||||
normit = RandomIterator(random.Random().gauss, 0, 1)
|
||||
show5(normit)
|
||||
copit = normit.__copy__()
|
||||
show5(normit)
|
||||
show5(copit)
|
||||
normit = RandomIterator(random.Random().gauss, 0, 1)
|
||||
show5(normit)
|
||||
copit = normit.__copy__()
|
||||
show5(normit)
|
||||
show5(copit)
|
||||
|
||||
which will display some output such as:
|
||||
which will display some output such as::
|
||||
|
||||
-0.536 1.936 -1.182 -1.690 -1.184
|
||||
0.666 -0.701 1.214 0.348 1.373
|
||||
0.666 -0.701 1.214 0.348 1.373
|
||||
-0.536 1.936 -1.182 -1.690 -1.184
|
||||
0.666 -0.701 1.214 0.348 1.373
|
||||
0.666 -0.701 1.214 0.348 1.373
|
||||
|
||||
the key point being that the second and third lines are equal, because
|
||||
the normit and copit iterators will step along the same "underlying
|
||||
sequence". (As an aside, note that to get a copy of self.call.im_self
|
||||
we must use copy.copy, NOT try getting at a __copy__ method directly,
|
||||
because for example instances of random.Random support copying via
|
||||
__getstate__ and __setstate__, NOT via __copy__; indeed, using
|
||||
copy.copy is the normal way to get a shallow copy of any object --
|
||||
copyable iterators are different because of the already-mentioned
|
||||
uncertainty about the result of copy.copy supporting these "copyable
|
||||
iterator" specs).
|
||||
the key point being that the second and third lines are equal, because
|
||||
the normit and copit iterators will step along the same "underlying
|
||||
sequence". (As an aside, note that to get a copy of ``self.call.im_self``
|
||||
we must use ``copy.copy``, NOT try getting at a ``__copy__`` method directly,
|
||||
because for example instances of ``random.Random`` support copying via
|
||||
``__getstate__`` and ``__setstate__``, NOT via ``__copy__``; indeed, using
|
||||
copy.copy is the normal way to get a shallow copy of any object --
|
||||
copyable iterators are different because of the already-mentioned
|
||||
uncertainty about the result of ``copy.copy`` supporting these "copyable
|
||||
iterator" specs).
|
||||
|
||||
|
||||
Details
|
||||
=======
|
||||
|
||||
Besides adding to the Python docs a recommendation that user-coded
|
||||
iterator types support a __copy__ method (if and only if it can be
|
||||
implemented with small costs in memory and runtime, and produce an
|
||||
independently-iterable copy of an iterator object), this PEP's
|
||||
implementation will specifically include the addition of copyability
|
||||
to the iterators over sequences that built-in iter returns, and also
|
||||
to the iterators over a dictionary returned by the methods __iter__,
|
||||
iterkeys, itervalues, and iteritems of built-in type dict.
|
||||
Besides adding to the Python docs a recommendation that user-coded
|
||||
iterator types support a ``__copy__`` method (if and only if it can be
|
||||
implemented with small costs in memory and runtime, and produce an
|
||||
independently-iterable copy of an iterator object), this PEP's
|
||||
implementation will specifically include the addition of copyability
|
||||
to the iterators over sequences that built-in iter returns, and also
|
||||
to the iterators over a dictionary returned by the methods ``__iter__``,
|
||||
iterkeys, itervalues, and iteritems of built-in type dict.
|
||||
|
||||
Iterators produced by generator functions will not be copyable.
|
||||
However, iterators produced by the new "generator expressions" of
|
||||
Python 2.4 (PEP 289 [3]) should be copyable if their underlying
|
||||
iterator[s] are; the strict limitations on what is possible in a
|
||||
generator expression, compared to the much vaster generality of a
|
||||
generator, should make that feasible. Similarly, the iterators
|
||||
produced by the built-in function enumerate, and certain functions
|
||||
suppiled by module itertools, should be copyable if the underlying
|
||||
iterators are.
|
||||
Iterators produced by generator functions will not be copyable.
|
||||
However, iterators produced by the new "generator expressions" of
|
||||
Python 2.4 (PEP 289 [3]_) should be copyable if their underlying
|
||||
``iterator[s]`` are; the strict limitations on what is possible in a
|
||||
generator expression, compared to the much vaster generality of a
|
||||
generator, should make that feasible. Similarly, the iterators
|
||||
produced by the built-in function ``enumerate``, and certain functions
|
||||
suppiled by module itertools, should be copyable if the underlying
|
||||
iterators are.
|
||||
|
||||
The implementation of this PEP will also include the optimization of
|
||||
the new itertools.tee function mentioned in the Motivation section.
|
||||
The implementation of this PEP will also include the optimization of
|
||||
the new itertools.tee function mentioned in the Motivation section.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
The main use case for (shallow) copying of an iterator is the same as
|
||||
for the function itertools.tee (new in 2.4). User code will not
|
||||
directly attempt to copy an iterator, because it would have to deal
|
||||
separately with uncopyable cases; calling itertools.tee will
|
||||
internally perform the copy when appropriate, and implicitly fallback
|
||||
to a maximally efficient non-copying strategy for iterators that are
|
||||
not copyable. (Occasionally, user code may want more direct control,
|
||||
specifically in order to deal with non-copyable iterators by other
|
||||
strategies, such as making a list or saving the sequence to disk).
|
||||
The main use case for (shallow) copying of an iterator is the same as
|
||||
for the function ``itertools.tee`` (new in 2.4). User code will not
|
||||
directly attempt to copy an iterator, because it would have to deal
|
||||
separately with uncopyable cases; calling ``itertools.tee`` will
|
||||
internally perform the copy when appropriate, and implicitly fallback
|
||||
to a maximally efficient non-copying strategy for iterators that are
|
||||
not copyable. (Occasionally, user code may want more direct control,
|
||||
specifically in order to deal with non-copyable iterators by other
|
||||
strategies, such as making a list or saving the sequence to disk).
|
||||
|
||||
A tee'd iterator may serve as a "reference point", allowing processing
|
||||
of a sequence to continue or resume from a known point, while the
|
||||
other independent iterator can be freely advanced to "explore" a
|
||||
further part of the sequence as needed. A simple example: a generator
|
||||
function which, given an iterator of numbers (assumed to be positive),
|
||||
returns a corresponding iterator, each of whose items is the fraction
|
||||
of the total corresponding to each corresponding item of the input
|
||||
iterator. The caller may pass the total as a value, if known in
|
||||
advance; otherwise, the iterator returned by calling this generator
|
||||
function will first compute the total.
|
||||
A tee'd iterator may serve as a "reference point", allowing processing
|
||||
of a sequence to continue or resume from a known point, while the
|
||||
other independent iterator can be freely advanced to "explore" a
|
||||
further part of the sequence as needed. A simple example: a generator
|
||||
function which, given an iterator of numbers (assumed to be positive),
|
||||
returns a corresponding iterator, each of whose items is the fraction
|
||||
of the total corresponding to each corresponding item of the input
|
||||
iterator. The caller may pass the total as a value, if known in
|
||||
advance; otherwise, the iterator returned by calling this generator
|
||||
function will first compute the total.
|
||||
|
||||
def fractions(numbers, total=None):
|
||||
if total is None:
|
||||
numbers, aux = itertools.tee(numbers)
|
||||
total = sum(aux)
|
||||
total = float(total)
|
||||
for item in numbers:
|
||||
yield item / total
|
||||
::
|
||||
|
||||
The ability to tee the numbers iterator allows this generator to
|
||||
precompute the total, if needed, without necessarily requiring
|
||||
O(N) auxiliary memory if the numbers iterator is copyable.
|
||||
def fractions(numbers, total=None):
|
||||
if total is None:
|
||||
numbers, aux = itertools.tee(numbers)
|
||||
total = sum(aux)
|
||||
total = float(total)
|
||||
for item in numbers:
|
||||
yield item / total
|
||||
|
||||
As another example of "iterator bookmarking", consider a stream of
|
||||
numbers with an occasional string as a "postfix operator" now and
|
||||
then. By far most frequent such operator is a '+', whereupon we must
|
||||
sum all previous numbers (since the last previous operator if any, or
|
||||
else since the start) and yield the result. Sometimes we find a '*'
|
||||
instead, which is the same except that the previous numbers must
|
||||
instead be multiplied, not summed.
|
||||
The ability to tee the numbers iterator allows this generator to
|
||||
precompute the total, if needed, without necessarily requiring
|
||||
O(N) auxiliary memory if the numbers iterator is copyable.
|
||||
|
||||
def filter_weird_stream(stream):
|
||||
it = iter(stream)
|
||||
while True:
|
||||
it, bookmark = itertools.tee(it)
|
||||
total = 0
|
||||
for item in it:
|
||||
if item=='+':
|
||||
yield total
|
||||
break
|
||||
elif item=='*':
|
||||
product = 1
|
||||
for item in bookmark:
|
||||
if item=='*':
|
||||
yield product
|
||||
break
|
||||
else:
|
||||
product *= item
|
||||
else:
|
||||
total += item
|
||||
As another example of "iterator bookmarking", consider a stream of
|
||||
numbers with an occasional string as a "postfix operator" now and
|
||||
then. By far most frequent such operator is a '+', whereupon we must
|
||||
sum all previous numbers (since the last previous operator if any, or
|
||||
else since the start) and yield the result. Sometimes we find a '*'
|
||||
instead, which is the same except that the previous numbers must
|
||||
instead be multiplied, not summed.
|
||||
|
||||
Similar use cases of itertools.tee can support such tasks as
|
||||
"undo" on a stream of commands represented by an iterator,
|
||||
"backtracking" on the parse of a stream of tokens, and so on.
|
||||
(Of course, in each case, one should also consider simpler
|
||||
possibilities such as saving relevant portions of the sequence
|
||||
into lists while stepping on the sequence with just one iterator,
|
||||
depending on the details of one's task).
|
||||
::
|
||||
|
||||
def filter_weird_stream(stream):
|
||||
it = iter(stream)
|
||||
while True:
|
||||
it, bookmark = itertools.tee(it)
|
||||
total = 0
|
||||
for item in it:
|
||||
if item=='+':
|
||||
yield total
|
||||
break
|
||||
elif item=='*':
|
||||
product = 1
|
||||
for item in bookmark:
|
||||
if item=='*':
|
||||
yield product
|
||||
break
|
||||
else:
|
||||
product *= item
|
||||
else:
|
||||
total += item
|
||||
|
||||
Similar use cases of itertools.tee can support such tasks as
|
||||
"undo" on a stream of commands represented by an iterator,
|
||||
"backtracking" on the parse of a stream of tokens, and so on.
|
||||
(Of course, in each case, one should also consider simpler
|
||||
possibilities such as saving relevant portions of the sequence
|
||||
into lists while stepping on the sequence with just one iterator,
|
||||
depending on the details of one's task).
|
||||
|
||||
|
||||
Here is an example, in pure Python, of how the 'enumerate'
|
||||
built-in could be extended to support __copy__ if its underlying
|
||||
iterator also supported __copy__:
|
||||
Here is an example, in pure Python, of how the 'enumerate'
|
||||
built-in could be extended to support ``__copy__`` if its underlying
|
||||
iterator also supported ``__copy__``::
|
||||
|
||||
class enumerate(object):
|
||||
class enumerate(object):
|
||||
|
||||
def __init__(self, it):
|
||||
self.it = iter(it)
|
||||
self.i = -1
|
||||
def __init__(self, it):
|
||||
self.it = iter(it)
|
||||
self.i = -1
|
||||
|
||||
def __iter__(self):
|
||||
return self
|
||||
def __iter__(self):
|
||||
return self
|
||||
|
||||
def next(self):
|
||||
self.i += 1
|
||||
return self.i, self.it.next()
|
||||
def next(self):
|
||||
self.i += 1
|
||||
return self.i, self.it.next()
|
||||
|
||||
def __copy__(self):
|
||||
result = self.__class__.__new__()
|
||||
result.it = self.it.__copy__()
|
||||
result.i = self.i
|
||||
return result
|
||||
def __copy__(self):
|
||||
result = self.__class__.__new__()
|
||||
result.it = self.it.__copy__()
|
||||
result.i = self.i
|
||||
return result
|
||||
|
||||
|
||||
Here is an example of the kind of "fragility" produced by "accidental
|
||||
copyability" of an iterator -- the reason why one must NOT use
|
||||
copy.copy expecting, if it succeeds, to receive as a result an
|
||||
iterator which is iterable-on independently from the original. Here
|
||||
is an iterator class that iterates (in preorder) on "trees" which, for
|
||||
simplicity, are just nested lists -- any item that's a list is treated
|
||||
as a subtree, any other item as a leaf.
|
||||
Here is an example of the kind of "fragility" produced by "accidental
|
||||
copyability" of an iterator -- the reason why one must NOT use
|
||||
copy.copy expecting, if it succeeds, to receive as a result an
|
||||
iterator which is iterable-on independently from the original. Here
|
||||
is an iterator class that iterates (in preorder) on "trees" which, for
|
||||
simplicity, are just nested lists -- any item that's a list is treated
|
||||
as a subtree, any other item as a leaf.
|
||||
|
||||
::
|
||||
|
||||
class ListreeIter(object):
|
||||
|
||||
|
@ -387,96 +400,100 @@ Rationale
|
|||
self.indx.append(-1)
|
||||
return self.next()
|
||||
|
||||
Now, for example, the following code:
|
||||
Now, for example, the following code::
|
||||
|
||||
import copy
|
||||
x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
|
||||
import copy
|
||||
x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
|
||||
|
||||
print 'showing all items:',
|
||||
it = ListreeIter(x)
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6: cop = copy.copy(it)
|
||||
print
|
||||
print 'showing all items:',
|
||||
it = ListreeIter(x)
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6: cop = copy.copy(it)
|
||||
print
|
||||
|
||||
print 'showing items >6 again:'
|
||||
for i in cop: print i,
|
||||
print
|
||||
print 'showing items >6 again:'
|
||||
for i in cop: print i,
|
||||
print
|
||||
|
||||
does NOT work as intended -- the "cop" iterator gets consumed, and
|
||||
exhausted, step by step as the original "it" iterator is, because
|
||||
the accidental (rather than deliberate) copying performed by
|
||||
copy.copy shares, rather than duplicating the "index" list, which
|
||||
is the mutable attribute it.indx (a list of numerical indices).
|
||||
Thus, this "client code" of the iterator, which attempts to iterate
|
||||
twice over a portion of the sequence via a copy.copy on the
|
||||
iterator, is NOT correct.
|
||||
does NOT work as intended -- the "cop" iterator gets consumed, and
|
||||
exhausted, step by step as the original "it" iterator is, because
|
||||
the accidental (rather than deliberate) copying performed by
|
||||
``copy.copy`` shares, rather than duplicating the "index" list, which
|
||||
is the mutable attribute ``it.indx`` (a list of numerical indices).
|
||||
Thus, this "client code" of the iterator, which attempts to iterate
|
||||
twice over a portion of the sequence via a ``copy.copy`` on the
|
||||
iterator, is NOT correct.
|
||||
|
||||
Some correct solutions include using itertools.tee, i.e., changing
|
||||
the first for loop into:
|
||||
Some correct solutions include using ``itertools.tee``, i.e., changing
|
||||
the first for loop into::
|
||||
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6:
|
||||
it, cop = itertools.tee(it)
|
||||
break
|
||||
for i in it: print i,
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6:
|
||||
it, cop = itertools.tee(it)
|
||||
break
|
||||
for i in it: print i,
|
||||
|
||||
(note that we MUST break the loop in two, otherwise we'd still
|
||||
be looping on the ORIGINAL value of it, which must NOT be used
|
||||
further after the call to tee!!!); or making a list, i.e.:
|
||||
(note that we MUST break the loop in two, otherwise we'd still
|
||||
be looping on the ORIGINAL value of it, which must NOT be used
|
||||
further after the call to tee!!!); or making a list, i.e.
|
||||
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6:
|
||||
cop = lit = list(it)
|
||||
break
|
||||
for i in lit: print i,
|
||||
::
|
||||
|
||||
(again, the loop must be broken in two, since iterator 'it'
|
||||
gets exhausted by the call list(it)).
|
||||
for i in it:
|
||||
print i,
|
||||
if i==6:
|
||||
cop = lit = list(it)
|
||||
break
|
||||
for i in lit: print i,
|
||||
|
||||
Finally, all of these solutions would work if Listiter supplied
|
||||
a suitable __copy__ method, as this PEP recommends:
|
||||
(again, the loop must be broken in two, since iterator 'it'
|
||||
gets exhausted by the call ``list(it)``).
|
||||
|
||||
def __copy__(self):
|
||||
result = self.__class__.new()
|
||||
result.tree = copy.copy(self.tree)
|
||||
result.indx = copy.copy(self.indx)
|
||||
return result
|
||||
Finally, all of these solutions would work if Listiter supplied
|
||||
a suitable ``__copy__`` method, as this PEP recommends::
|
||||
|
||||
There is no need to get any "deeper" in the copy, but the two
|
||||
mutable "index state" attributes must indeed be copied in order
|
||||
to achieve a "proper" (independently iterable) iterator-copy.
|
||||
def __copy__(self):
|
||||
result = self.__class__.new()
|
||||
result.tree = copy.copy(self.tree)
|
||||
result.indx = copy.copy(self.indx)
|
||||
return result
|
||||
|
||||
The recommended solution is to have class Listiter supply this
|
||||
__copy__ method AND have client code use itertools.tee (with
|
||||
the split-in-two-parts loop as shown above). This will make
|
||||
client code maximally tolerant of different iterator types it
|
||||
might be using AND achieve good performance for tee'ing of this
|
||||
specific iterator type at the same time.
|
||||
There is no need to get any "deeper" in the copy, but the two
|
||||
mutable "index state" attributes must indeed be copied in order
|
||||
to achieve a "proper" (independently iterable) iterator-copy.
|
||||
|
||||
The recommended solution is to have class Listiter supply this
|
||||
``__copy__`` method AND have client code use ``itertools.tee`` (with
|
||||
the split-in-two-parts loop as shown above). This will make
|
||||
client code maximally tolerant of different iterator types it
|
||||
might be using AND achieve good performance for tee'ing of this
|
||||
specific iterator type at the same time.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
[1] Discussion on python-dev starting at post:
|
||||
https://mail.python.org/pipermail/python-dev/2003-October/038969.html
|
||||
.. [1] Discussion on python-dev starting at post:
|
||||
https://mail.python.org/pipermail/python-dev/2003-October/038969.html
|
||||
|
||||
[2] Online documentation for the copy module of the standard library:
|
||||
http://docs.python.org/library/copy.html
|
||||
.. [2] Online documentation for the copy module of the standard library:
|
||||
http://docs.python.org/library/copy.html
|
||||
|
||||
[3] PEP 289, Generator Expressions, Hettinger
|
||||
http://www.python.org/dev/peps/pep-0289/
|
||||
.. [3] PEP 289, Generator Expressions, Hettinger
|
||||
http://www.python.org/dev/peps/pep-0289/
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
||||
|
|
Loading…
Reference in New Issue