python-peps/pep-0323/index.html

571 lines
50 KiB
HTML
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 323 Copyable Iterators | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0323/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 323 Copyable Iterators | peps.python.org'>
<meta property="og:description" content="This PEP suggests that some iterator types should support shallow copies of their instances by exposing a __copy__ method which meets some specific requirements, and indicates how code using an iterator might exploit such a __copy__ method when present.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0323/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="This PEP suggests that some iterator types should support shallow copies of their instances by exposing a __copy__ method which meets some specific requirements, and indicates how code using an iterator might exploit such a __copy__ method when present.">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 323</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 323 Copyable Iterators</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Alex Martelli &lt;aleaxit&#32;&#97;t&#32;gmail.com&gt;</dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Inactive draft that may be taken up again at a later time">Deferred</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">25-Oct-2003</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">2.5</dd>
<dt class="field-even">Post-History<span class="colon">:</span></dt>
<dd class="field-even">29-Oct-2003</dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#deferral">Deferral</a></li>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#update-and-comments">Update and Comments</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#specification">Specification</a></li>
<li><a class="reference internal" href="#details">Details</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="deferral">
<h2><a class="toc-backref" href="#deferral" role="doc-backlink">Deferral</a></h2>
<p>This PEP has been deferred. Copyable iterators are a nice idea, but after
four years, no implementation or widespread interest has emerged.</p>
</section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>This PEP suggests that some iterator types should support shallow
copies of their instances by exposing a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method which meets
some specific requirements, and indicates how code using an iterator
might exploit such a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method when present.</p>
</section>
<section id="update-and-comments">
<h2><a class="toc-backref" href="#update-and-comments" role="doc-backlink">Update and Comments</a></h2>
<p>Support for <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> was included in Py2.4s <code class="docutils literal notranslate"><span class="pre">itertools.tee()</span></code>.</p>
<p>Adding <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> methods to existing iterators will change the
behavior under <code class="docutils literal notranslate"><span class="pre">tee()</span></code>. Currently, the copied iterators remain
tied to the original iterator. If the original advances, then
so do all of the copies. Good practice is to overwrite the
original so that anomalies dont result: <code class="docutils literal notranslate"><span class="pre">a,b=tee(a)</span></code>.
Code that doesnt follow that practice may observe a semantic
change if a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method is added to an iterator.</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<p>In Python up to 2.3, most built-in iterator types dont let the user
copy their instances. User-coded iterators that do let their clients
call copy.copy on their instances may, or may not, happen to return,
as a result of the copy, a separate iterator object that may be
iterated upon independently from the original.</p>
<p>Currently, “support” for copy.copy in a user-coded iterator type is
almost invariably “accidental” i.e., the standard machinery of the
copy method in Pythons standard librarys copy module does build and
return a copy. However, the copy will be independently iterable with
respect to the original only if calling <code class="docutils literal notranslate"><span class="pre">.next()</span></code> on an instance of that
class happens to change instance state solely by rebinding some
attributes to new values, and not by mutating some attributes
existing values.</p>
<p>For example, an iterator whose “index” state is held as an integer
attribute will probably give usable copies, since (integers being
immutable) <code class="docutils literal notranslate"><span class="pre">.next()</span></code> presumably just rebinds that attribute. On the
other hand, another iterator whose “index” state is held as a list
attribute will probably mutate the same list object when <code class="docutils literal notranslate"><span class="pre">.next()</span></code>
executes, and therefore copies of such an iterator will not be
iterable separately and independently from the original.</p>
<p>Given this existing situation, <code class="docutils literal notranslate"><span class="pre">copy.copy(it)</span></code> on some iterator object
isnt very useful, nor, therefore, is it at all widely used. However,
there are many cases in which being able to get a “snapshot” of an
iterator, as a “bookmark”, so as to be able to keep iterating along
the sequence but later iterate again on the same sequence from the
bookmark onwards, is useful. To support such “bookmarking”, module
itertools, in 2.4, has grown a tee function, to be used as:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">it</span><span class="p">,</span> <span class="n">bookmark</span> <span class="o">=</span> <span class="n">itertools</span><span class="o">.</span><span class="n">tee</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
</pre></div>
</div>
<p>The previous value of it must not be used again, which is why this
typical usage idiom rebinds the name. After this call, it and
bookmark are independently-iterable iterators on the same underlying
sequence as the original value of it: this satisfies application
needs for “iterator copying”.</p>
<p>However, when itertools.tee can make no hypotheses about the nature of
the iterator it is passed as an argument, it must save in memory all
items through which one of the two teed iterators, but not yet both,
have stepped. This can be quite costly in terms of memory, if the two
iterators get very far from each other in their stepping; indeed, in
some cases it may be preferable to make a list from the iterator so as
to be able to step repeatedly through the subsequence, or, if that is
too costy in terms of memory, save items to disk, again in order to be
able to iterate through them repeatedly.</p>
<p>This PEP proposes another idea that will, in some important cases,
allow <code class="docutils literal notranslate"><span class="pre">itertools.tee</span></code> to do its job with minimal cost in terms of
memory; user code may also occasionally be able to exploit the idea in
order to decide whether to copy an iterator, make a list from it, or
use an auxiliary disk file.</p>
<p>The key consideration is that some important iterators, such as those
which built-in function iter builds over sequences, would be
intrinsically easy to copy: just get another reference to the same
sequence, and a copy of the integer index. However, in Python 2.3,
those iterators dont expose the state, and dont support <code class="docutils literal notranslate"><span class="pre">copy.copy</span></code>.</p>
<p>The purpose of this PEP, therefore, is to have those iterator types
expose a suitable <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method. Similarly, user-coded iterator
types that can provide copies of their instances, suitable for
separate and independent iteration, with limited costs in time and
space, should also expose a suitable <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method. While
copy.copy also supports other ways to let a type control the way
its instances are copied, it is suggested, for simplicity, that
iterator types that support copying always do so by exposing a
<code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method, and not in the other ways <code class="docutils literal notranslate"><span class="pre">copy.copy</span></code> supports.</p>
<p>Having iterators expose a suitable <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> when feasible will afford
easy optimization of itertools.tee and similar user code, as in:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">tee</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
<span class="n">it</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span> <span class="n">copier</span> <span class="o">=</span> <span class="n">it</span><span class="o">.</span><span class="n">__copy__</span>
<span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
<span class="c1"># non-copyable iterator, do all the needed hard work</span>
<span class="c1"># [snipped!]</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">it</span><span class="p">,</span> <span class="n">copier</span><span class="p">()</span>
</pre></div>
</div>
<p>Note that this function does NOT call “copy.copy(it)”, which (even
after this PEP is implemented) might well still “just happen to
succeed”. for some iterator type that is implemented as a user-coded
class. without really supplying an adequate “independently iterable”
copy object as its result.</p>
</section>
<section id="specification">
<h2><a class="toc-backref" href="#specification" role="doc-backlink">Specification</a></h2>
<p>Any iterator type X may expose a method <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> that is callable
without arguments on any instance x of X. The method should be
exposed if and only if the iterator type can provide copyability with
reasonably little computational and memory effort. Furthermore, the
new object y returned by method <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> should be a new instance
of X that is iterable independently and separately from x, stepping
along the same “underlying sequence” of items.</p>
<p>For example, suppose a class Iter essentially duplicated the
functionality of the iter builtin for iterating on a sequence:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Iter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sequence</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">sequence</span> <span class="o">=</span> <span class="n">sequence</span>
<span class="bp">self</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">def</span> <span class="fm">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="k">def</span> <span class="nf">next</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">sequence</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">index</span><span class="p">]</span>
<span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">StopIteration</span>
<span class="bp">self</span><span class="o">.</span><span class="n">index</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">result</span>
</pre></div>
</div>
<p>To make this Iter class compliant with this PEP, the following
addition to the body of class Iter would suffice:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">__copy__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">sequence</span><span class="p">)</span>
<span class="n">result</span><span class="o">.</span><span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">index</span>
<span class="k">return</span> <span class="n">result</span>
</pre></div>
</div>
<p>Note that <code class="docutils literal notranslate"><span class="pre">__copy__</span></code>, in this case, does not even try to copy the
sequence; if the sequence is altered while either or both of the
original and copied iterators are still stepping on it, the iteration
behavior is quite likely to go awry anyway it is not <code class="docutils literal notranslate"><span class="pre">__copy__</span></code>s
responsibility to change this normal Python behavior for iterators
which iterate on mutable sequences (that might, perhaps, be the
specification for a <code class="docutils literal notranslate"><span class="pre">__deepcopy__</span></code> method of iterators, which, however,
this PEP does not deal with).</p>
<p>Consider also a “random iterator”, which provides a nonterminating
sequence of results from some method of a random instance, called
with given arguments:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">RandomIterator</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">bound_method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">call</span> <span class="o">=</span> <span class="n">bound_method</span>
<span class="bp">self</span><span class="o">.</span><span class="n">args</span> <span class="o">=</span> <span class="n">args</span>
<span class="k">def</span> <span class="fm">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="k">def</span> <span class="nf">next</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">args</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">__copy__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="kn">import</span> <span class="nn">copy</span><span class="o">,</span> <span class="nn">new</span>
<span class="n">im_self</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call</span><span class="o">.</span><span class="n">im_self</span><span class="p">)</span>
<span class="n">method</span> <span class="o">=</span> <span class="n">new</span><span class="o">.</span><span class="n">instancemethod</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call</span><span class="o">.</span><span class="n">im_func</span><span class="p">,</span> <span class="n">im_self</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">(</span><span class="n">method</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">args</span><span class="p">)</span>
</pre></div>
</div>
<p>This iterator type is slightly more general than its name implies, as
it supports calls to any bound method (or other callable, but if the
callable is not a bound method, then method <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> will fail). But
the use case is for the purpose of generating random streams, as in:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">show5</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">result</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
<span class="nb">print</span> <span class="s1">&#39;</span><span class="si">%6.3f</span><span class="s1">&#39;</span><span class="o">%</span><span class="n">result</span><span class="p">,</span>
<span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">4</span><span class="p">:</span> <span class="k">break</span>
<span class="nb">print</span>
<span class="n">normit</span> <span class="o">=</span> <span class="n">RandomIterator</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">Random</span><span class="p">()</span><span class="o">.</span><span class="n">gauss</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">show5</span><span class="p">(</span><span class="n">normit</span><span class="p">)</span>
<span class="n">copit</span> <span class="o">=</span> <span class="n">normit</span><span class="o">.</span><span class="n">__copy__</span><span class="p">()</span>
<span class="n">show5</span><span class="p">(</span><span class="n">normit</span><span class="p">)</span>
<span class="n">show5</span><span class="p">(</span><span class="n">copit</span><span class="p">)</span>
</pre></div>
</div>
<p>which will display some output such as:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">-</span><span class="mf">0.536</span> <span class="mf">1.936</span> <span class="o">-</span><span class="mf">1.182</span> <span class="o">-</span><span class="mf">1.690</span> <span class="o">-</span><span class="mf">1.184</span>
<span class="mf">0.666</span> <span class="o">-</span><span class="mf">0.701</span> <span class="mf">1.214</span> <span class="mf">0.348</span> <span class="mf">1.373</span>
<span class="mf">0.666</span> <span class="o">-</span><span class="mf">0.701</span> <span class="mf">1.214</span> <span class="mf">0.348</span> <span class="mf">1.373</span>
</pre></div>
</div>
<p>the key point being that the second and third lines are equal, because
the normit and copit iterators will step along the same “underlying
sequence”. (As an aside, note that to get a copy of <code class="docutils literal notranslate"><span class="pre">self.call.im_self</span></code>
we must use <code class="docutils literal notranslate"><span class="pre">copy.copy</span></code>, NOT try getting at a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method directly,
because for example instances of <code class="docutils literal notranslate"><span class="pre">random.Random</span></code> support copying via
<code class="docutils literal notranslate"><span class="pre">__getstate__</span></code> and <code class="docutils literal notranslate"><span class="pre">__setstate__</span></code>, NOT via <code class="docutils literal notranslate"><span class="pre">__copy__</span></code>; indeed, using
copy.copy is the normal way to get a shallow copy of any object
copyable iterators are different because of the already-mentioned
uncertainty about the result of <code class="docutils literal notranslate"><span class="pre">copy.copy</span></code> supporting these “copyable
iterator” specs).</p>
</section>
<section id="details">
<h2><a class="toc-backref" href="#details" role="doc-backlink">Details</a></h2>
<p>Besides adding to the Python docs a recommendation that user-coded
iterator types support a <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method (if and only if it can be
implemented with small costs in memory and runtime, and produce an
independently-iterable copy of an iterator object), this PEPs
implementation will specifically include the addition of copyability
to the iterators over sequences that built-in iter returns, and also
to the iterators over a dictionary returned by the methods <code class="docutils literal notranslate"><span class="pre">__iter__</span></code>,
iterkeys, itervalues, and iteritems of built-in type dict.</p>
<p>Iterators produced by generator functions will not be copyable.
However, iterators produced by the new “generator expressions” of
Python 2.4 (<a class="pep reference internal" href="../pep-0289/" title="PEP 289 Generator Expressions">PEP 289</a>) should be copyable if their underlying
<code class="docutils literal notranslate"><span class="pre">iterator[s]</span></code> are; the strict limitations on what is possible in a
generator expression, compared to the much vaster generality of a
generator, should make that feasible. Similarly, the iterators
produced by the built-in function <code class="docutils literal notranslate"><span class="pre">enumerate</span></code>, and certain functions
suppiled by module itertools, should be copyable if the underlying
iterators are.</p>
<p>The implementation of this PEP will also include the optimization of
the new itertools.tee function mentioned in the Motivation section.</p>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<p>The main use case for (shallow) copying of an iterator is the same as
for the function <code class="docutils literal notranslate"><span class="pre">itertools.tee</span></code> (new in 2.4). User code will not
directly attempt to copy an iterator, because it would have to deal
separately with uncopyable cases; calling <code class="docutils literal notranslate"><span class="pre">itertools.tee</span></code> will
internally perform the copy when appropriate, and implicitly fallback
to a maximally efficient non-copying strategy for iterators that are
not copyable. (Occasionally, user code may want more direct control,
specifically in order to deal with non-copyable iterators by other
strategies, such as making a list or saving the sequence to disk).</p>
<p>A teed iterator may serve as a “reference point”, allowing processing
of a sequence to continue or resume from a known point, while the
other independent iterator can be freely advanced to “explore” a
further part of the sequence as needed. A simple example: a generator
function which, given an iterator of numbers (assumed to be positive),
returns a corresponding iterator, each of whose items is the fraction
of the total corresponding to each corresponding item of the input
iterator. The caller may pass the total as a value, if known in
advance; otherwise, the iterator returned by calling this generator
function will first compute the total.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">fractions</span><span class="p">(</span><span class="n">numbers</span><span class="p">,</span> <span class="n">total</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="k">if</span> <span class="n">total</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">numbers</span><span class="p">,</span> <span class="n">aux</span> <span class="o">=</span> <span class="n">itertools</span><span class="o">.</span><span class="n">tee</span><span class="p">(</span><span class="n">numbers</span><span class="p">)</span>
<span class="n">total</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">aux</span><span class="p">)</span>
<span class="n">total</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">total</span><span class="p">)</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">numbers</span><span class="p">:</span>
<span class="k">yield</span> <span class="n">item</span> <span class="o">/</span> <span class="n">total</span>
</pre></div>
</div>
<p>The ability to tee the numbers iterator allows this generator to
precompute the total, if needed, without necessarily requiring
O(N) auxiliary memory if the numbers iterator is copyable.</p>
<p>As another example of “iterator bookmarking”, consider a stream of
numbers with an occasional string as a “postfix operator” now and
then. By far most frequent such operator is a +, whereupon we must
sum all previous numbers (since the last previous operator if any, or
else since the start) and yield the result. Sometimes we find a *
instead, which is the same except that the previous numbers must
instead be multiplied, not summed.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">filter_weird_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">):</span>
<span class="n">it</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">stream</span><span class="p">)</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">it</span><span class="p">,</span> <span class="n">bookmark</span> <span class="o">=</span> <span class="n">itertools</span><span class="o">.</span><span class="n">tee</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
<span class="k">if</span> <span class="n">item</span><span class="o">==</span><span class="s1">&#39;+&#39;</span><span class="p">:</span>
<span class="k">yield</span> <span class="n">total</span>
<span class="k">break</span>
<span class="k">elif</span> <span class="n">item</span><span class="o">==</span><span class="s1">&#39;*&#39;</span><span class="p">:</span>
<span class="n">product</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">bookmark</span><span class="p">:</span>
<span class="k">if</span> <span class="n">item</span><span class="o">==</span><span class="s1">&#39;*&#39;</span><span class="p">:</span>
<span class="k">yield</span> <span class="n">product</span>
<span class="k">break</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">product</span> <span class="o">*=</span> <span class="n">item</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">total</span> <span class="o">+=</span> <span class="n">item</span>
</pre></div>
</div>
<p>Similar use cases of itertools.tee can support such tasks as
“undo” on a stream of commands represented by an iterator,
“backtracking” on the parse of a stream of tokens, and so on.
(Of course, in each case, one should also consider simpler
possibilities such as saving relevant portions of the sequence
into lists while stepping on the sequence with just one iterator,
depending on the details of ones task).</p>
<p>Here is an example, in pure Python, of how the enumerate
built-in could be extended to support <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> if its underlying
iterator also supported <code class="docutils literal notranslate"><span class="pre">__copy__</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">enumerate</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">it</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
<span class="k">def</span> <span class="fm">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="k">def</span> <span class="nf">next</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">i</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">it</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">__copy__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="fm">__new__</span><span class="p">()</span>
<span class="n">result</span><span class="o">.</span><span class="n">it</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">it</span><span class="o">.</span><span class="n">__copy__</span><span class="p">()</span>
<span class="n">result</span><span class="o">.</span><span class="n">i</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">i</span>
<span class="k">return</span> <span class="n">result</span>
</pre></div>
</div>
<p>Here is an example of the kind of “fragility” produced by “accidental
copyability” of an iterator the reason why one must NOT use
copy.copy expecting, if it succeeds, to receive as a result an
iterator which is iterable-on independently from the original. Here
is an iterator class that iterates (in preorder) on “trees” which, for
simplicity, are just nested lists any item thats a list is treated
as a subtree, any other item as a leaf.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ListreeIter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tree</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tree</span> <span class="o">=</span> <span class="p">[</span><span class="n">tree</span><span class="p">]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">indx</span> <span class="o">=</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">def</span> <span class="fm">__iter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="k">def</span> <span class="nf">next</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">StopIteration</span>
<span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">tree</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]]</span>
<span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tree</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="ow">is</span> <span class="ow">not</span> <span class="nb">list</span><span class="p">:</span>
<span class="k">return</span> <span class="n">result</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tree</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
</pre></div>
</div>
<p>Now, for example, the following code:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">copy</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">[</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="p">[</span><span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="mi">9</span><span class="p">],</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="p">]</span>
<span class="nb">print</span> <span class="s1">&#39;showing all items:&#39;</span><span class="p">,</span>
<span class="n">it</span> <span class="o">=</span> <span class="n">ListreeIter</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
<span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
<span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">6</span><span class="p">:</span> <span class="n">cop</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="nb">print</span>
<span class="nb">print</span> <span class="s1">&#39;showing items &gt;6 again:&#39;</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">cop</span><span class="p">:</span> <span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
<span class="nb">print</span>
</pre></div>
</div>
<p>does NOT work as intended the “cop” iterator gets consumed, and
exhausted, step by step as the original “it” iterator is, because
the accidental (rather than deliberate) copying performed by
<code class="docutils literal notranslate"><span class="pre">copy.copy</span></code> shares, rather than duplicating the “index” list, which
is the mutable attribute <code class="docutils literal notranslate"><span class="pre">it.indx</span></code> (a list of numerical indices).
Thus, this “client code” of the iterator, which attempts to iterate
twice over a portion of the sequence via a <code class="docutils literal notranslate"><span class="pre">copy.copy</span></code> on the
iterator, is NOT correct.</p>
<p>Some correct solutions include using <code class="docutils literal notranslate"><span class="pre">itertools.tee</span></code>, i.e., changing
the first for loop into:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
<span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
<span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">6</span><span class="p">:</span>
<span class="n">it</span><span class="p">,</span> <span class="n">cop</span> <span class="o">=</span> <span class="n">itertools</span><span class="o">.</span><span class="n">tee</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="k">break</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span> <span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
</pre></div>
</div>
<p>(note that we MUST break the loop in two, otherwise wed still
be looping on the ORIGINAL value of it, which must NOT be used
further after the call to tee!!!); or making a list, i.e.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
<span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
<span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">6</span><span class="p">:</span>
<span class="n">cop</span> <span class="o">=</span> <span class="n">lit</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
<span class="k">break</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lit</span><span class="p">:</span> <span class="nb">print</span> <span class="n">i</span><span class="p">,</span>
</pre></div>
</div>
<p>(again, the loop must be broken in two, since iterator it
gets exhausted by the call <code class="docutils literal notranslate"><span class="pre">list(it)</span></code>).</p>
<p>Finally, all of these solutions would work if Listiter supplied
a suitable <code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method, as this PEP recommends:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">__copy__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="n">new</span><span class="p">()</span>
<span class="n">result</span><span class="o">.</span><span class="n">tree</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tree</span><span class="p">)</span>
<span class="n">result</span><span class="o">.</span><span class="n">indx</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">indx</span><span class="p">)</span>
<span class="k">return</span> <span class="n">result</span>
</pre></div>
</div>
<p>There is no need to get any “deeper” in the copy, but the two
mutable “index state” attributes must indeed be copied in order
to achieve a “proper” (independently iterable) iterator-copy.</p>
<p>The recommended solution is to have class Listiter supply this
<code class="docutils literal notranslate"><span class="pre">__copy__</span></code> method AND have client code use <code class="docutils literal notranslate"><span class="pre">itertools.tee</span></code> (with
the split-in-two-parts loop as shown above). This will make
client code maximally tolerant of different iterator types it
might be using AND achieve good performance for teeing of this
specific iterator type at the same time.</p>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<p>[1] Discussion on python-dev starting at post:
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2003-October/038969.html">https://mail.python.org/pipermail/python-dev/2003-October/038969.html</a></p>
<p>[2] Online documentation for the copy module of the standard library:
<a class="reference external" href="https://docs.python.org/release/2.6/library/copy.html">https://docs.python.org/release/2.6/library/copy.html</a></p>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0323.rst">https://github.com/python/peps/blob/main/peps/pep-0323.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0323.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#deferral">Deferral</a></li>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#update-and-comments">Update and Comments</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#specification">Specification</a></li>
<li><a class="reference internal" href="#details">Details</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
<br>
<a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0323.rst">Page Source (GitHub)</a>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>