reSTify PEP 323 (#351)

Huang Huang 2017-08-19 02:59:08 +08:00 committed by Brett Cannon
parent 86a462a7b4
commit 5aea3b9792
1 changed files with 361 additions and 344 deletions


@ -5,39 +5,43 @@ Last-Modified: $Date$
Author: Alex Martelli <aleaxit@gmail.com>
Status: Deferred
Type: Standards Track
Content-Type: text/plain
Content-Type: text/x-rst
Created: 25-Oct-2003
Python-Version: 2.5
Post-History: 29-Oct-2003
Deferral
========
This PEP has been deferred. Copyable iterators are a nice idea, but after
four years, no implementation or widespread interest has emerged.
Abstract
========
This PEP suggests that some iterator types should support shallow
copies of their instances by exposing a __copy__ method which meets
copies of their instances by exposing a ``__copy__`` method which meets
some specific requirements, and indicates how code using an iterator
might exploit such a __copy__ method when present.
might exploit such a ``__copy__`` method when present.
Update and Comments
===================
Support for __copy__ was included in Py2.4's itertools.tee().
Support for ``__copy__`` was included in Py2.4's ``itertools.tee()``.
Adding __copy__ methods to existing iterators will change the
behavior under tee(). Currently, the copied iterators remain
Adding ``__copy__`` methods to existing iterators will change the
behavior under ``tee()``. Currently, the copied iterators remain
tied to the original iterator. If the original advances, then
so do all of the copies. Good practice is to overwrite the
original so that anomalies don't result: a,b=tee(a).
original so that anomalies don't result: ``a,b=tee(a)``.
Code that doesn't follow that practice may observe a semantic
change if a __copy__ method is added to an iterator.
change if a ``__copy__`` method is added to an iterator.
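The tie between the copies and the original can be seen in a minimal sketch. It assumes Python 3 (``next(it)`` instead of ``.next()``) and a plain list iterator, which does not expose ``__copy__``; the variable names are illustrative only.

```python
import itertools

source = iter([1, 2, 3, 4])
a, b = itertools.tee(source)

# Advancing the original behind tee's back consumes an item
# that neither copy will ever see.
skipped = next(source)

seen_by_a = list(a)
seen_by_b = list(b)
print(skipped, seen_by_a, seen_by_b)
```

Rebinding the name, as in ``a, b = tee(a)``, avoids this by leaving no reference to the original iterator in user code.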
Motivation
==========
In Python up to 2.3, most built-in iterator types don't let the user
copy their instances. User-coded iterators that do let their clients
@ -49,26 +53,26 @@ Motivation
almost invariably "accidental" -- i.e., the standard machinery of the
copy function in the copy module of Python's standard library does build and
return a copy. However, the copy will be independently iterable with
respect to the original only if calling .next() on an instance of that
respect to the original only if calling ``.next()`` on an instance of that
class happens to change instance state solely by rebinding some
attributes to new values, and not by mutating some attributes'
existing values.
For example, an iterator whose "index" state is held as an integer
attribute will probably give usable copies, since (integers being
immutable) .next() presumably just rebinds that attribute. On the
immutable) ``.next()`` presumably just rebinds that attribute. On the
other hand, another iterator whose "index" state is held as a list
attribute will probably mutate the same list object when .next()
attribute will probably mutate the same list object when ``.next()``
executes, and therefore copies of such an iterator will not be
iterable separately and independently from the original.
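The difference between the two cases can be sketched as follows. This is an illustration only, written for Python 3 (``__next__`` rather than ``.next()``); the class names ``IntIndexIter`` and ``ListIndexIter`` are hypothetical and do not appear in the PEP.

```python
import copy


class IntIndexIter:
    """Index held as an immutable int; __next__ rebinds the attribute."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.sequence):
            raise StopIteration
        item = self.sequence[self.index]
        self.index += 1  # rebinds self.index to a new int
        return item


class ListIndexIter:
    """Index held in a one-element list; __next__ mutates it in place."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = [0]

    def __iter__(self):
        return self

    def __next__(self):
        if self.index[0] >= len(self.sequence):
            raise StopIteration
        item = self.sequence[self.index[0]]
        self.index[0] += 1  # mutates the list shared with any shallow copy
        return item


int_it = IntIndexIter("abcd")
next(int_it)
int_copy = copy.copy(int_it)
int_rest, int_copy_rest = list(int_it), list(int_copy)

list_it = ListIndexIter("abcd")
next(list_it)
list_copy = copy.copy(list_it)
list_rest, list_copy_rest = list(list_it), list(list_copy)
```

The shallow copy of ``IntIndexIter`` iterates independently, while the shallow copy of ``ListIndexIter`` is already exhausted once the original has been consumed, because both objects share the same index list.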
Given this existing situation, copy.copy(it) on some iterator object
Given this existing situation, ``copy.copy(it)`` on some iterator object
isn't very useful, nor, therefore, is it at all widely used. However,
there are many cases in which being able to get a "snapshot" of an
iterator, as a "bookmark", so as to be able to keep iterating along
the sequence but later iterate again on the same sequence from the
bookmark onwards, is useful. To support such "bookmarking", module
itertools, in 2.4, has grown a 'tee' function, to be used as:
itertools, in 2.4, has grown a 'tee' function, to be used as::
it, bookmark = itertools.tee(it)
@ -89,7 +93,7 @@ Motivation
able to iterate through them repeatedly.
This PEP proposes another idea that will, in some important cases,
allow itertools.tee to do its job with minimal cost in terms of
allow ``itertools.tee`` to do its job with minimal cost in terms of
memory; user code may also occasionally be able to exploit the idea in
order to decide whether to copy an iterator, make a list from it, or
use an auxiliary disk file.
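As a rough sketch of how user code might exploit the idea, the helper below prefers ``__copy__`` when the iterator exposes it and otherwise falls back to ``itertools.tee``. The names ``snapshot`` and ``CountUp`` are hypothetical, and the code assumes Python 3; it is not the implementation of ``itertools.tee``.

```python
import itertools


def snapshot(iterator):
    """Return (iterator_to_keep_using, bookmark) for the given iterator."""
    iterator = iter(iterator)
    copier = getattr(iterator, "__copy__", None)
    if copier is not None:
        # Cheap path: the iterator knows how to copy itself.
        return iterator, copier()
    # General path: let tee buffer items as needed.
    return itertools.tee(iterator)


class CountUp:
    """Tiny copyable iterator yielding current, current + 1, ..., limit - 1."""

    def __init__(self, limit, current=0):
        self.limit = limit
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

    def __copy__(self):
        return self.__class__(self.limit, self.current)


counter = CountUp(5)
next(counter)
counter, mark = snapshot(counter)
counter_rest, mark_rest = list(counter), list(mark)

gen = (x * x for x in range(3))  # generators expose no __copy__
gen, gen_mark = snapshot(gen)
gen_rest, gen_mark_rest = list(gen), list(gen_mark)
```

In both cases the caller must continue with the first element of the returned pair, never with the iterator originally passed in.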
@ -98,20 +102,20 @@ Motivation
which built-in function iter builds over sequences, would be
intrinsically easy to copy: just get another reference to the same
sequence, and a copy of the integer index. However, in Python 2.3,
those iterators don't expose the state, and don't support copy.copy.
those iterators don't expose the state, and don't support ``copy.copy``.
The purpose of this PEP, therefore, is to have those iterator types
expose a suitable __copy__ method. Similarly, user-coded iterator
expose a suitable ``__copy__`` method. Similarly, user-coded iterator
types that can provide copies of their instances, suitable for
separate and independent iteration, with limited costs in time and
space, should also expose a suitable __copy__ method. While
space, should also expose a suitable ``__copy__`` method. While
copy.copy also supports other ways to let a type control the way
its instances are copied, it is suggested, for simplicity, that
iterator types that support copying always do so by exposing a
__copy__ method, and not in the other ways copy.copy supports.
``__copy__`` method, and not in the other ways ``copy.copy`` supports.
Having iterators expose a suitable __copy__ when feasible will afford
easy optimization of itertools.tee and similar user code, as in:
Having iterators expose a suitable ``__copy__`` when feasible will afford
easy optimization of itertools.tee and similar user code, as in::
def tee(it):
it = iter(it)
@ -130,17 +134,18 @@ Motivation
Specification
=============
Any iterator type X may expose a method __copy__ that is callable
Any iterator type X may expose a method ``__copy__`` that is callable
without arguments on any instance x of X. The method should be
exposed if and only if the iterator type can provide copyability with
reasonably little computational and memory effort. Furthermore, the
new object y returned by method __copy__ should be a new instance
new object y returned by method ``__copy__`` should be a new instance
of X that is iterable independently and separately from x, stepping
along the same "underlying sequence" of items.
For example, suppose a class Iter essentially duplicated the
functionality of the iter builtin for iterating on a sequence:
functionality of the iter builtin for iterating on a sequence::
class Iter(object):
@ -158,25 +163,25 @@ Specification
return result
To make this Iter class compliant with this PEP, the following
addition to the body of class Iter would suffice:
addition to the body of class Iter would suffice::
def __copy__(self):
result = self.__class__(self.sequence)
result.index = self.index
return result
Note that __copy__, in this case, does not even try to copy the
Note that ``__copy__``, in this case, does not even try to copy the
sequence; if the sequence is altered while either or both of the
original and copied iterators are still stepping on it, the iteration
behavior is quite likely to go awry anyway -- it is not __copy__'s
behavior is quite likely to go awry anyway -- it is not ``__copy__``'s
responsibility to change this normal Python behavior for iterators
which iterate on mutable sequences (that might, perhaps, be the
specification for a __deepcopy__ method of iterators, which, however,
specification for a ``__deepcopy__`` method of iterators, which, however,
this PEP does not deal with).
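For readers on Python 3, a self-contained sketch of such a class and its ``__copy__`` method might look like the following. The class body is an assumption that mirrors the description above (a sequence plus an integer index), with ``__next__`` replacing ``.next()``.

```python
import copy


class Iter:
    """Sequence iterator with a cheap, independent shallow copy."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            result = self.sequence[self.index]
        except IndexError:
            raise StopIteration from None
        self.index += 1
        return result

    def __copy__(self):
        # Share the sequence, duplicate only the position.
        result = self.__class__(self.sequence)
        result.index = self.index
        return result


items = [10, 20, 30]
it = Iter(items)
first = next(it)
cp = copy.copy(it)  # dispatches to Iter.__copy__

# The sequence is shared, not copied, so a later mutation
# is visible through both iterators.
items.append(40)
from_original = list(it)
from_copy = list(cp)
```

Both iterators resume from the same position and both observe the appended item, which matches the point that ``__copy__`` does not shield iteration from changes to a mutable sequence.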
Consider also a "random iterator", which provides a nonterminating
sequence of results from some method of a random instance, called
with given arguments:
with given arguments::
class RandomIterator(object):
@ -198,8 +203,8 @@ Specification
This iterator type is slightly more general than its name implies, as
it supports calls to any bound method (or other callable, but if the
callable is not a bound method, then method __copy__ will fail). But
the use case is for the purpose of generating random streams, as in:
callable is not a bound method, then method ``__copy__`` will fail). But
the use case is for the purpose of generating random streams, as in::
import random
@ -215,7 +220,7 @@ Specification
show5(normit)
show5(copit)
which will display some output such as:
which will display some output such as::
-0.536 1.936 -1.182 -1.690 -1.184
0.666 -0.701 1.214 0.348 1.373
@ -223,34 +228,35 @@ Specification
the key point being that the second and third lines are equal, because
the normit and copit iterators will step along the same "underlying
sequence". (As an aside, note that to get a copy of self.call.im_self
we must use copy.copy, NOT try getting at a __copy__ method directly,
because for example instances of random.Random support copying via
__getstate__ and __setstate__, NOT via __copy__; indeed, using
sequence". (As an aside, note that to get a copy of ``self.call.im_self``
we must use ``copy.copy``, NOT try getting at a ``__copy__`` method directly,
because for example instances of ``random.Random`` support copying via
``__getstate__`` and ``__setstate__``, NOT via ``__copy__``; indeed, using
copy.copy is the normal way to get a shallow copy of any object --
copyable iterators are different because of the already-mentioned
uncertainty about the result of copy.copy supporting these "copyable
uncertainty about the result of ``copy.copy`` supporting these "copyable
iterator" specs).
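A Python 3 rendering of this idea is sketched below. It assumes that the bound method's instance is reached through ``__self__`` (the Python 3 spelling of ``im_self``) and that the method can be looked up again by ``__name__`` on the copied instance; the seed value is arbitrary.

```python
import copy
import random


class RandomIterator:
    """Endless stream of results from a bound method of a random.Random."""

    def __init__(self, bound_method, *args):
        self.call = bound_method
        self.args = args

    def __iter__(self):
        return self

    def __next__(self):
        return self.call(*self.args)

    def __copy__(self):
        # Use copy.copy on the generator instance rather than calling a
        # __copy__ method on it: random.Random is copied via its state.
        rng_copy = copy.copy(self.call.__self__)
        method = getattr(rng_copy, self.call.__name__)
        return self.__class__(method, *self.args)


normit = RandomIterator(random.Random(323).normalvariate, 0, 1)
next(normit)  # advance a little before copying
copit = copy.copy(normit)

from_original = [next(normit) for _ in range(5)]
from_copy = [next(copit) for _ in range(5)]
```

The two lists are equal because the copy carries its own generator with identical state, so each iterator steps along the same underlying sequence without disturbing the other.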
Details
=======
Besides adding to the Python docs a recommendation that user-coded
iterator types support a __copy__ method (if and only if it can be
iterator types support a ``__copy__`` method (if and only if it can be
implemented with small costs in memory and runtime, and produce an
independently-iterable copy of an iterator object), this PEP's
implementation will specifically include the addition of copyability
to the iterators over sequences that built-in iter returns, and also
to the iterators over a dictionary returned by the methods __iter__,
to the iterators over a dictionary returned by the methods ``__iter__``,
iterkeys, itervalues, and iteritems of built-in type dict.
Iterators produced by generator functions will not be copyable.
However, iterators produced by the new "generator expressions" of
Python 2.4 (PEP 289 [3]) should be copyable if their underlying
iterator[s] are; the strict limitations on what is possible in a
Python 2.4 (PEP 289 [3]_) should be copyable if their underlying
``iterator[s]`` are; the strict limitations on what is possible in a
generator expression, compared to the much vaster generality of a
generator, should make that feasible. Similarly, the iterators
produced by the built-in function enumerate, and certain functions
produced by the built-in function ``enumerate``, and certain functions
supplied by module itertools, should be copyable if the underlying
iterators are.
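The "copyable if the underlying iterator is" rule can be sketched with a wrapper that delegates to the wrapped iterator's ``__copy__``. The class name ``CopyableEnumerate`` is hypothetical and the code assumes Python 3. The objects returned by ``itertools.tee`` serve as the copyable underlying iterator here, since they expose ``__copy__``.

```python
import itertools


class CopyableEnumerate:
    """enumerate-like wrapper that is copyable when its source is."""

    def __init__(self, iterable, start=0):
        self.it = iter(iterable)
        self.count = start

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.it)  # StopIteration propagates to the caller
        index = self.count
        self.count += 1
        return index, item

    def __copy__(self):
        # Raises AttributeError if the underlying iterator has no __copy__.
        underlying_copy = self.it.__copy__()
        result = self.__class__.__new__(self.__class__)
        result.it = underlying_copy
        result.count = self.count
        return result


letters, = itertools.tee("abcd", 1)  # a tee object, which has __copy__
en = CopyableEnumerate(letters)
next(en)
en_copy = en.__copy__()
rest = list(en)
rest_copy = list(en_copy)

try:
    CopyableEnumerate(c for c in "ab").__copy__()
    generator_copy_failed = False
except AttributeError:
    generator_copy_failed = True
```

Wrapping a generator, which has no ``__copy__``, makes the copy attempt fail with ``AttributeError``, so callers can detect the uncopyable case.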
@ -259,11 +265,12 @@ Details
Rationale
=========
The main use case for (shallow) copying of an iterator is the same as
for the function itertools.tee (new in 2.4). User code will not
for the function ``itertools.tee`` (new in 2.4). User code will not
directly attempt to copy an iterator, because it would have to deal
separately with uncopyable cases; calling itertools.tee will
separately with uncopyable cases; calling ``itertools.tee`` will
internally perform the copy when appropriate, and implicitly fall back
to a maximally efficient non-copying strategy for iterators that are
not copyable. (Occasionally, user code may want more direct control,
@ -281,6 +288,8 @@ Rationale
advance; otherwise, the iterator returned by calling this generator
function will first compute the total.
::
def fractions(numbers, total=None):
if total is None:
numbers, aux = itertools.tee(numbers)
@ -301,6 +310,8 @@ Rationale
instead, which is the same except that the previous numbers must
instead be multiplied, not summed.
::
def filter_weird_stream(stream):
it = iter(stream)
while True:
@ -331,8 +342,8 @@ Rationale
Here is an example, in pure Python, of how the 'enumerate'
built-in could be extended to support __copy__ if its underlying
iterator also supported __copy__:
built-in could be extended to support ``__copy__`` if its underlying
iterator also supported ``__copy__``::
class enumerate(object):
@ -362,6 +373,8 @@ Rationale
simplicity, are just nested lists -- any item that's a list is treated
as a subtree, any other item as a leaf.
::
class ListreeIter(object):
def __init__(self, tree):
@ -387,7 +400,7 @@ Rationale
self.indx.append(-1)
return self.next()
Now, for example, the following code:
Now, for example, the following code::
import copy
x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
@ -406,14 +419,14 @@ Rationale
does NOT work as intended -- the "cop" iterator gets consumed, and
exhausted, step by step as the original "it" iterator is, because
the accidental (rather than deliberate) copying performed by
copy.copy shares, rather than duplicating the "index" list, which
is the mutable attribute it.indx (a list of numerical indices).
``copy.copy`` shares, rather than duplicating the "index" list, which
is the mutable attribute ``it.indx`` (a list of numerical indices).
Thus, this "client code" of the iterator, which attempts to iterate
twice over a portion of the sequence via a copy.copy on the
twice over a portion of the sequence via a ``copy.copy`` on the
iterator, is NOT correct.
Some correct solutions include using itertools.tee, i.e., changing
the first for loop into:
Some correct solutions include using ``itertools.tee``, i.e., changing
the first for loop into::
for i in it:
print i,
@ -424,7 +437,9 @@ Rationale
(note that we MUST break the loop in two, otherwise we'd still
be looping on the ORIGINAL value of it, which must NOT be used
further after the call to tee!!!); or making a list, i.e.:
further after the call to tee!!!); or making a list, i.e.
::
for i in it:
print i,
@ -434,10 +449,10 @@ Rationale
for i in lit: print i,
(again, the loop must be broken in two, since iterator 'it'
gets exhausted by the call list(it)).
gets exhausted by the call ``list(it)``).
Finally, all of these solutions would work if ListreeIter supplied
a suitable __copy__ method, as this PEP recommends:
a suitable ``__copy__`` method, as this PEP recommends::
def __copy__(self):
result = self.__class__.__new__(self.__class__)
@ -450,7 +465,7 @@ Rationale
to achieve a "proper" (independently iterable) iterator-copy.
The recommended solution is to have class ListreeIter supply this
__copy__ method AND have client code use itertools.tee (with
``__copy__`` method AND have client code use ``itertools.tee`` (with
the split-in-two-parts loop as shown above). This will make
client code maximally tolerant of different iterator types it
might be using AND achieve good performance for tee'ing of this
@ -458,22 +473,24 @@ Rationale
References
==========
[1] Discussion on python-dev starting at post:
.. [1] Discussion on python-dev starting at post:
https://mail.python.org/pipermail/python-dev/2003-October/038969.html
[2] Online documentation for the copy module of the standard library:
.. [2] Online documentation for the copy module of the standard library:
http://docs.python.org/library/copy.html
[3] PEP 289, Generator Expressions, Hettinger
.. [3] PEP 289, Generator Expressions, Hettinger
http://www.python.org/dev/peps/pep-0289/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil