Added PEP 323, "Copyable Iterators".
parent 068cdefc0d
commit eefc4df37c
@@ -119,6 +119,7 @@ Index by Category
 S   319  Python Synchronize/Asynchronize Block       Pelletier
 S   321  Date/Time Parsing and Formatting            Kuchling
 S   322  Reverse Iteration Methods                   Hettinger
 S   323  Copyable Iterators                          Martelli
 S   754  IEEE 754 Floating Point Special Values      Warnes

 Finished PEPs (done, implemented in CVS)
@@ -0,0 +1,192 @@
PEP: 323
Title: Copyable Iterators
Version: $Revision$
Last-Modified: $Date$
Author: Alex Martelli <aleaxit@yahoo.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Oct-2003
Python-Version: 2.4
Post-History:


Abstract

    This PEP suggests that some iterator types should support being
    copied (shallowly or deeply), by exposing suitable methods such as
    __copy__ or __deepcopy__.


Motivation

    Some iterators, such as those that the built-in function iter
    builds over sequences, should be intrinsically easy to copy (just
    get another reference to the same sequence and a copy of the
    index) or to deepcopy (ditto, but with a deepcopy of the
    underlying sequence).

    However, in Python up to 2.3, those iterators do not expose their
    state: user code that is using such an iterator X cannot simply
    "save" a copy of X (for future iteration from the current point)
    with "saved_X = copy.copy(X)".

    To "tee" a non-copyable iterator into two independently iterable
    ones, subtle code is needed, such as that which will go into
    itertools.tee in 2.4 (a Python version thereof is currently in
    the itertools docs).

    In any case, such code will inevitably have to consume memory to
    keep a copy of the subsequence that has been iterated on by one
    but not both of the tee'd iterators.  This is a waste of memory in
    those cases in which the iterator being tee'd already _has_ an
    underlying sequence (or other iterable) and index.

    Having iterators expose a suitable __copy__ when feasible allows
    easy optimization of itertools.tee and similar user code, as in:

        import copy

        def tee(it):
            it = iter(it)
            try:
                return it, copy.copy(it)
            except TypeError:
                # non-copyable iterator, do all the needed hard work
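
    The fallback branch that the snippet above leaves as a comment
    could be filled in along the following lines.  This is only a
    minimal sketch, written for Python 3 rather than 2.4: the names
    tee_sketch and CountDown are hypothetical, and delegating the
    slow path to itertools.tee is an assumption made for
    illustration, not the implementation this PEP specifies.

```python
import copy
import itertools


class CountDown:
    """Hypothetical copyable iterator, used only to exercise the fast path."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    def __copy__(self):
        # Copying is cheap: the whole state is one integer.
        return CountDown(self.current)


def tee_sketch(iterable):
    """Return two independent iterators over the same items."""
    it = iter(iterable)
    try:
        # Fast path: the iterator knows how to copy itself.
        return it, copy.copy(it)
    except TypeError:
        # Slow path (assumption): let itertools.tee buffer the items
        # seen by one branch but not yet by the other.
        first, second = itertools.tee(it)
        return first, second


fast_a, fast_b = tee_sketch(CountDown(3))             # copied, no buffering
slow_a, slow_b = tee_sketch(x * x for x in range(4))  # generator: falls back
```

    Note that copy.copy can also succeed on objects whose shallow
    copy still shares consumable state with the original; the sketch
    does not guard against that case.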


Specification

    Any iterator type X may expose a method __copy__ that is callable
    without arguments on any instance x of X.  The method should be
    exposed if and only if the iterator type can provide copyability
    with reasonably little computational and memory effort.
    Similarly, X may expose a method __deepcopy__ that is callable
    with one argument (a memo dictionary) on instances of X.  The
    (very concise) specs for these methods are at [2]; for more
    details, see also the file copy.py in the Python standard library.

    For example, suppose a class Iter essentially duplicated the
    functionality of the iter builtin for iterating on a sequence:

        class Iter(object):

            def __init__(self, sequence):
                self.sequence = sequence
                self.index = 0

            def __iter__(self):
                return self

            def next(self):
                try: result = self.sequence[self.index]
                except IndexError: raise StopIteration
                self.index += 1
                return result

    To make this Iter class compliant with this PEP, the following
    additions to the body of class Iter would suffice (they assume
    "import copy" at module level):

            def __copy__(self):
                # share the underlying sequence, duplicate only the index
                result = self.__class__(self.sequence)
                result.index = self.index
                return result

            def __deepcopy__(self, memo):
                # as __copy__, but over a deep copy of the sequence
                result = self.__copy__()
                result.sequence = copy.deepcopy(self.sequence, memo)
                return result
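
    The difference between the two hooks can be seen by mutating the
    underlying sequence after copying.  The following is an
    illustrative sketch only: it restates the Iter class in Python 3
    spelling (__next__ instead of next), which is an assumption made
    so that the example runs on a current interpreter, and the
    variable names are hypothetical.

```python
import copy


class Iter:
    """Sequence iterator with the copy hooks described above (Python 3 spelling)."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            result = self.sequence[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return result

    def __copy__(self):
        # Shallow copy: same sequence object, independent index.
        result = self.__class__(self.sequence)
        result.index = self.index
        return result

    def __deepcopy__(self, memo):
        # Deep copy: independent index and an independent sequence.
        result = self.__copy__()
        result.sequence = copy.deepcopy(self.sequence, memo)
        return result


data = [10, 20, 30, 40]
original = Iter(data)
next(original)                      # consume 10

shallow = copy.copy(original)       # resumes at 20, shares `data`
deep = copy.deepcopy(original)      # resumes at 20, owns a copy of `data`

data.append(50)                     # mutate the underlying list

shallow_items = list(shallow)       # sees the appended item
deep_items = list(deep)             # decoupled, does not see it
original_items = list(original)     # not advanced by draining the copies
```

    Calling self.__class__(self.sequence) in __copy__ assumes that
    any subclass keeps the same constructor signature.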


Details

    Besides adding to the Python docs a recommendation that user-coded
    iterators be made copyable when feasible, this PEP's implementation
    will specifically include the addition of copyability to the
    iterators over sequences that built-in iter returns, and also to
    the iterators over a dictionary returned by the methods __iter__,
    iterkeys, itervalues, and iteritems of built-in type dict.

    Iterators produced by generators will not be copyable (the BDFL
    deems shallow copy impossible, and deep copy too much trouble).
    However, iterators produced by the new "generator expressions" of
    Python 2.4 (PEP 289 [3]) should be copyable if their underlying
    iterator[s] are; the strict limitations on what is possible in a
    generator expression, compared to the much vaster generality of a
    generator, should make that feasible.  Similarly, the iterators
    produced by the built-in function enumerate should be copyable if
    the underlying iterator is.

    The implementation of this PEP will also include the optimization
    of the new itertools.tee function mentioned in the Motivation
    section.


Rationale

    Being able to copy iterators will allow copying of user-coded
    classes that have copyable iterators as attributes.  This applies
    to both shallow and deep copying.

    Deep copyability of suitable iterators will allow "decoupling"
    from an underlying mutable sequence, and, according to the BDFL,
    may in the future allow certain iterators to become picklable
    (however, this PEP, in itself, does not include picklability of
    iterators).

    The main use case for (shallow) copying of an iterator is the same
    as for the function itertools.tee (new in 2.4).  Indeed, we assume
    that user code will typically not attempt to call copy.copy on an
    iterator directly, because it would have to deal with uncopyable
    cases; just calling itertools.tee will enable copying when
    feasible, with an implicit fallback to a maximally efficient
    non-copying strategy for iterators that are not copyable.

    A tee'd iterator may serve as a "reference point", allowing
    processing of a sequence to continue or resume from a known point,
    while the other independent iterator can be freely advanced to
    "explore" a further part of the sequence as needed.  To fully
    exploit this pattern, it should ideally also be possible, given
    two iterators originally produced by itertools.tee, to check if
    they are "equal" (have been stepped the same number of times);
    however, this PEP, in itself, does not include comparability of
    such iterators.  Therefore, code using this pattern may also need
    to count the number of forward steps taken after tee by the
    iterators, so as to be able to tell which one is "behind", and by
    how much.  Built-in function enumerate is one way to make such
    counting very easy in many situations.  However, such needs do not
    always arise.
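
    One way to do such counting with enumerate is sketched below.
    This is an illustration under stated assumptions, not part of the
    proposal: the function name steps_ahead and its parameters are
    hypothetical, the code is written for Python 3, and it assumes
    the input yields at least one item.

```python
import itertools


def steps_ahead(items, probe_steps):
    """Advance one tee'd branch and report how far it leads the other."""
    # Enumerate first, then tee: each branch yields (position, item) pairs,
    # so every item carries the number of steps taken to reach it.
    reference, explorer = itertools.tee(enumerate(items))

    explorer_pos = -1                     # nothing consumed yet
    for explorer_pos, _item in itertools.islice(explorer, probe_steps):
        pass                              # "explore" ahead of the reference

    reference_pos, _item = next(reference)
    return explorer_pos - reference_pos


lead = steps_ahead("abcdef", 4)           # explorer at position 3, reference at 0
```

    Because the positions travel with the items, neither branch needs
    a separate counter, and the difference between the two positions
    tells which iterator is behind and by how much.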

    Here is a simpler example: a generator that, given an iterator of
    numbers (assumed to be positive), returns a corresponding
    iterator, each of whose items is the fraction of the total
    contributed by the matching item of the input iterator.  The
    caller may pass the total as a value, if known in advance;
    otherwise, the iterator returned by calling the generator will
    first compute the total.

        import itertools

        def fractions(numbers, total=None):
            if total is None:
                # keep one branch for iteration, sum the other
                numbers, aux = itertools.tee(numbers)
                total = sum(aux)
            total = float(total)
            for item in numbers:
                yield item / total

    The ability to tee the numbers iterator allows this generator to
    precompute the total, if needed, without necessarily requiring
    O(N) auxiliary memory if the numbers iterator is copyable.


References

    [1] Discussion on python-dev starting at post:
        http://mail.python.org/pipermail/python-dev/2003-October/038969.html

    [2] Online documentation for the copy module of the standard library:
        http://www.python.org/doc/current/lib/module-copy.html

    [3] PEP 289, Generator Expressions, Hettinger
        http://www.python.org/peps/pep-0289.html


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: