Added PEP 323, "Copyable Iterators".

This commit is contained in:
Alex Martelli 2003-10-27 15:01:08 +00:00
parent 068cdefc0d
commit eefc4df37c
2 changed files with 193 additions and 0 deletions

View File

@ -119,6 +119,7 @@ Index by Category
S 319 Python Synchronize/Asynchronize Block Pelletier
S 321 Date/Time Parsing and Formatting Kuchling
S 322 Reverse Iteration Methods Hettinger
S 323 Copyable Iterators Martelli
S 754 IEEE 754 Floating Point Special Values Warnes
Finished PEPs (done, implemented in CVS)

192
pep-0323.txt Normal file
View File

@ -0,0 +1,192 @@
PEP: 323
Title: Copyable Iterators
Version: $Revision$
Last-Modified: $Date$
Author: Alex Martelli <aleaxit@yahoo.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Oct-2003
Python-Version: 2.4
Post-History:
Abstract
This PEP suggests that some iterator types should support being copied
(shallowly or deeply), by exposing suitable methods such as __copy__ or
__deepcopy__.
Motivation
Some iterators, such as those which built-in function iter builds over
sequences, should be intrinsically easy to copy (just get another
reference to the same sequence and a copy of the index) or deepcopy
(ditto but with a deepcopy of the underlying sequence).
However, in Python up to 2.3, those iterators don't expose their state:
user code which is using such an iterator X is not able to just "save" a
copy of X (for future iteration from the current point) with a simple
"saved_X = copy.copy(X)".
To "tee" a non-copyable iterator into two independently iterable ones,
subtle code [such as that which will go into itertools.tee in 2.4] is
needed (a Python version thereof is currently in the itertools docs).
In any case, such code will inevitably have to consume memory to keep
a copy of the subsequence which has been iterated on by one but not
both tee'd iterators. This is a waste of memory in those cases in
which the iterator being tee'd already _has_ an underlying sequence
(or other iterable) and index.
Having iterators expose a suitable __copy__ when feasible allows easy
optimization of itertools.tee and similar user code, as in:
def tee(it):
it = iter(it)
try: return it, copy.copy(it)
except TypeError:
# non-copyable iterator, do all the needed hard work
Specification
Any iterator type X may expose a method __copy__ that is callable
without arguments on any instance x of X. The method should be
exposed if and only if the iterator type can provide copyability with
reasonably little computational and memory effort. Similarly, X may
expose a method __deepcopy__ that is callable with one argument (a memo
dictionary) on instances of X. The (very concise...) specs for
these methods are at [2] -- for more details, see also file copy.py
in the Python standard library.
For example, suppose a class Iter essentially duplicated the
functionality of the iter builtin for iterating on a sequence:
class Iter(object):
def __init__(self, sequence):
self.sequence = sequence
self.index = 0
def __iter__(self):
return self
def next(self):
try: result = self.sequence[self.index]
except IndexError: raise StopIteration
self.index += 1
return result
To make this Iter class compliant with this PEP, the following
additions to the body of class Iter would suffice:
def __copy__(self):
result = self.__class__(sequence)
result.index = self.index
return result
def __deepcopy__(self, memo):
result = self.__copy__()
result.sequence = deepcopy(self.sequence, memo)
return result
Details
Besides adding to the Python docs a recommendation that user-coded
iterators be made copyable when feasible, this PEP's implementation
will specifically include the addition of copyability to the iterators
over sequences that built-in iter returns, and also to the iterators
over a dictionary returned by the methods __iter__, iterkeys, itervalues,
and iteritems of built-in type dict.
Iterators produced by generators will not be copyable (the BDFL deems
shallow copy impossible, and deep copy too much trouble). However,
iterators produced by the new "generator expressions" of Python 2.4
(PEP 289 [3]) should be copyable if their underlying iterator[s] are;
the strict limitations on what is possible in a generator expression,
compared to the much vaster generality of a generator, should make
that feasible. Similarly, the iterators produced by the built-in
function enumerate should be copyable if the underlying iterator is.
The implementation of this PEP will also include the optimization
of the new itertools.tee function mentioned in the Motivation section.
Rationale
Being able to copy iterators will allow copying of user-coded classes
that have copyable iterators as attributes. This applies to both
shallow and deep copying.
Deep copyability of suitable iterators will allow "decoupling" from
an underlying mutable sequence, and, according to the BDFL, may in
the future allow certain iterators to become picklable (however, this
PEP, in itself, does not include picklability of iterators).
The main use case for (shallow) copying of an iterator is the same
as for the function itertools.tee (new in 2.4). Indeed, we assume
that user code will typically not directly attempt to copy.copy an
iterator, because it would have to deal with uncopyable cases; just
calling itertools.tee will enable copying when feasible, with an
implicit fallback to a maximally efficient non-copying strategy for
iterators that are not copyable.
A tee'd iterator may serve as a "reference point", allowing processing
of a sequence to continue or resume from a known point, while the other
independent iterator can be freely advanced to "explore" a further part
of the sequence as needed. To fully exploit this pattern, it should
ideally also be possible, given two iterators originally produced by
iterator.tee, to check if they are "equal" (have been stepped the same
number of times); however, this PEP, in itself, does not include
comparability of such iterators. Therefore, code using this pattern
may also need to count the number of forward steps taken after tee by
the iterators, so as to be able to tell which one is "behind", and by
how much. Built-in function enumerate is one way to make such counting
very easy in many situations. However, such needs do not always arise.
Here is a simpler example: a generator which, given an iterator of
numbers (assumed to be positive), returns a corresponding iterator
each of whose items is the fraction of the total corresponding to each
corresponding item of the input iterator. The caller may pass the
total as a value, if known in advance; otherwise, the iterator
returned by calling the generator will first compute the total.
def fractions(numbers, total=None):
if total is None:
numbers, aux = itertools.tee(numbers)
total = sum(aux)
total = float(total)
for item in numbers:
yield item / total
The ability to tee the numbers iterator allows this generator to
precompute the total, if needed, without necessarily requiring
O(N) auxiliary memory if the numbers iterator is copyable.
References
[1] Discussion on python-dev starting at post:
http://mail.python.org/pipermail/python-dev/2003-October/038969.html
[2] Online documentation for the copy module of the standard library:
http://www.python.org/doc/current/lib/module-copy.html
[3] PEP 289, Generator Expressions, Hettinger
http://www.python.org/peps/pep-0289.html
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: