PEP: 255
Title: Simple Generators
Author: Neil Schemenauer <nas@arctrix.com>,
        Tim Peters <tim.peters@gmail.com>,
        Magnus Lie Hetland <magnus@hetland.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Requires: 234
Created: 18-May-2001
Python-Version: 2.2
Post-History: 14-Jun-2001, 23-Jun-2001


Abstract
========

This PEP introduces the concept of generators to Python, as well as a new
statement used in conjunction with them, the ``yield`` statement.


Motivation
==========

When a producer function has a hard enough job that it requires maintaining
state between values produced, most programming languages offer no pleasant and
efficient solution beyond adding a callback function to the producer's argument
list, to be called with each value produced.

For example, ``tokenize.py`` in the standard library takes this approach: the
caller must pass a *tokeneater* function to ``tokenize()``, called whenever
``tokenize()`` finds the next token. This allows tokenize to be coded in a
natural way, but programs calling tokenize are typically convoluted by the need
to remember between callbacks which token(s) were seen last. The *tokeneater*
function in ``tabnanny.py`` is a good example of that, maintaining a state
machine in global variables, to remember across callbacks what it has already
seen and what it hopes to see next. This was difficult to get working
correctly, and is still difficult for people to understand. Unfortunately,
that's typical of this approach.

An alternative would have been for tokenize to produce an entire parse of the
Python program at once, in a large list. Then tokenize clients could be
written in a natural way, using local variables and local control flow (such as
loops and nested if statements) to keep track of their state. But this isn't
practical: programs can be very large, so no a priori bound can be placed on
the memory needed to materialize the whole parse; and some tokenize clients
only want to see whether something specific appears early in the program (e.g.,
a future statement, or, as is done in IDLE, just the first indented statement),
and then parsing the whole program first is a severe waste of time.

Another alternative would be to make tokenize an :pep:`iterator <234>`,
delivering the next token whenever its ``.next()`` method is invoked. This is
pleasant for the caller in the same way a large list of results would be, but
without the memory and "what if I want to get out early?" drawbacks. However,
this shifts the burden onto tokenize to remember *its* state between
``.next()`` invocations, and the reader need only glance at
``tokenize.tokenize_loop()`` to realize what a horrid chore that would be. Or
picture a recursive algorithm for producing the nodes of a general tree
structure: to cast that into an iterator framework requires removing the
recursion manually and maintaining the state of the traversal by hand.

A fourth option is to run the producer and consumer in separate threads. This
allows both to maintain their states in natural ways, and so is pleasant for
both. Indeed, Demo/threads/Generator.py in the Python source distribution
provides a usable synchronized-communication class for doing that in a general
way. This doesn't work on platforms without threads, though, and is very slow
on platforms that do (compared to what is achievable without threads).

A final option is to use the Stackless [1]_ (:pep:`219`) variant implementation
of Python instead, which supports lightweight coroutines. This has much the
same programmatic benefits as the thread option, but is much more efficient.
However, Stackless is a controversial rethinking of the Python core, and it may
not be possible for Jython to implement the same semantics. This PEP isn't the
place to debate that, so suffice it to say here that generators provide a
useful subset of Stackless functionality in a way that fits easily into the
current CPython implementation, and is believed to be relatively
straightforward for other Python implementations.

That exhausts the current alternatives. Some other high-level languages
provide pleasant solutions, notably iterators in Sather [2]_, which were
inspired by iterators in CLU; and generators in Icon [3]_, a novel language
where every expression *is a generator*. There are differences among these,
but the basic idea is the same: provide a kind of function that can return an
intermediate result ("the next value") to its caller, while maintaining the
function's local state so that the function can be resumed again right where it
left off. A very simple example::

    def fib():
        a, b = 0, 1
        while 1:
            yield b
            a, b = b, a+b

When ``fib()`` is first invoked, it sets *a* to 0 and *b* to 1, then yields *b*
back to its caller. The caller sees 1. When ``fib`` is resumed, from its
point of view the ``yield`` statement is really the same as, say, a ``print``
statement: ``fib`` continues after the yield with all local state intact. *a*
and *b* then become 1 and 1, and ``fib`` loops back to the ``yield``, yielding
1 to its invoker. And so on. From ``fib``'s point of view it's just
delivering a sequence of results, as if via callback. But from its caller's
point of view, the ``fib`` invocation is an iterable object that can be resumed
at will. As in the thread approach, this allows both sides to be coded in the
most natural ways; but unlike the thread approach, this can be done efficiently
and on all platforms. Indeed, resuming a generator should be no more expensive
than a function call.
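
For instance, the caller can pull values from ``fib`` on demand, one at a
time, via the generator-iterator's ``.next()`` method described in the
specification below (the session here is purely illustrative)::

    >>> g = fib()
    >>> [g.next() for i in range(8)]    # resume fib eight times
    [1, 1, 2, 3, 5, 8, 13, 21]
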

The same kind of approach applies to many producer/consumer functions. For
example, ``tokenize.py`` could yield the next token instead of invoking a
callback function with it as argument, and tokenize clients could iterate over
the tokens in a natural way: a Python generator is a kind of Python
:pep:`iterator <234>`, but of an especially powerful kind.
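
As a rough, hypothetical sketch (deliberately simplified, and not the real
``tokenize`` interface), such a producer and its client might look like this;
note that the module would also need the ``future`` statement described in the
next section::

    # Hypothetical producer: yields each whitespace-separated word as a "token".
    def tokens(lines):
        for line in lines:
            for word in line.split():
                yield word              # hand the next token to the caller

    # The client keeps its state in ordinary local variables.
    count = 0
    for tok in tokens(["x = 1", "print x"]):
        count = count + 1
    print count                         # prints 5
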


Specification: Yield
====================

A new statement is introduced::

    yield_stmt: "yield" expression_list

``yield`` is a new keyword, so a ``future`` statement (:pep:`236`) is needed to
phase this in: in the initial release, a module desiring to use generators must
include the line::

    from __future__ import generators

near the top (see :pep:`236` for details). Modules using the identifier
``yield`` without a ``future`` statement will trigger warnings. In the
following release, ``yield`` will be a language keyword and the ``future``
statement will no longer be needed.

The ``yield`` statement may only be used inside functions. A function that
contains a ``yield`` statement is called a generator function. A generator
function is an ordinary function object in all respects, but has the new
``CO_GENERATOR`` flag set in the code object's co_flags member.

When a generator function is called, the actual arguments are bound to
function-local formal argument names in the usual way, but no code in the body
of the function is executed. Instead a generator-iterator object is returned;
this conforms to the :pep:`iterator protocol <234>`, so in particular it can be
used in for-loops in a natural way. Note that when the intent is clear from
context, the unqualified name "generator" may be used to refer either to a
generator-function or a generator-iterator.
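
A minimal illustrative session (the generator and names here are invented for
the example) shows both points: calling the generator function runs none of
its body, and the returned generator-iterator works directly in a for-loop::

    >>> def squares(n):
    ...     print "computing"      # body code: not run at call time
    ...     for i in range(n):
    ...         yield i*i
    >>> gen = squares(3)           # binds n, returns a generator-iterator
    >>> for value in gen:          # the for-loop calls .next() repeatedly
    ...     print value
    computing
    0
    1
    4
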

Each time the ``.next()`` method of a generator-iterator is invoked, the code
in the body of the generator-function is executed until a ``yield`` or
``return`` statement (see below) is encountered, or until the end of the body
is reached.

If a ``yield`` statement is encountered, the state of the function is frozen,
and the value of *expression_list* is returned to ``.next()``'s caller. By
"frozen" we mean that all local state is retained, including the current
bindings of local variables, the instruction pointer, and the internal
evaluation stack: enough information is saved so that the next time
``.next()`` is invoked, the function can proceed exactly as if the ``yield``
statement were just another external call.
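
For instance (a made-up generator, purely for illustration), the local
variable ``n`` below keeps its value between successive ``.next()`` calls,
and execution resumes on the line after the ``yield``::

    >>> def countdown(n):
    ...     while n > 0:
    ...         yield n        # execution freezes here...
    ...         n = n - 1      # ...and resumes here on the next .next()
    >>> it = countdown(3)
    >>> it.next()
    3
    >>> it.next()              # n was remembered across the calls
    2
    >>> it.next()
    1
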

Restriction: A ``yield`` statement is not allowed in the ``try`` clause of a
``try/finally`` construct. The difficulty is that there's no guarantee the
generator will ever be resumed, hence no guarantee that the finally block will
ever get executed; that's too much a violation of finally's purpose to bear.
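
For example (a hypothetical fragment), the first function below is rejected
because its ``yield`` sits in the ``try`` clause of a ``try/finally``, while
the second is fine because the ``yield`` appears in the ``finally`` clause
instead (see "Specification: Try/Except/Finally" below)::

    def broken():
        try:
            yield 1         # disallowed: 'yield' in the 'try' of try/finally
        finally:
            print "cleanup"

    def ok():
        try:
            pass
        finally:
            yield 1         # allowed: 'yield' in a 'finally' clause
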

Restriction: A generator cannot be resumed while it is actively running::

    >>> def g():
    ...     i = me.next()
    ...     yield i
    >>> me = g()
    >>> me.next()
    Traceback (most recent call last):
     ...
     File "<string>", line 2, in g
    ValueError: generator already executing


Specification: Return
=====================

A generator function can also contain return statements of the form::

    return

Note that an *expression_list* is not allowed on return statements in the body
of a generator (although, of course, they may appear in the bodies of
non-generator functions nested within the generator).

When a return statement is encountered, control proceeds as in any function
return, executing the appropriate ``finally`` clauses (if any exist). Then a
``StopIteration`` exception is raised, signalling that the iterator is
exhausted. A ``StopIteration`` exception is also raised if control flows off
the end of the generator without an explicit return.
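
To illustrate (the generator here is invented for the example), a ``return``
simply ends the iteration early, and a subsequent ``.next()`` call raises
``StopIteration``::

    >>> def upto(limit, values):
    ...     for v in values:
    ...         if v > limit:
    ...             return          # done: the iterator is now exhausted
    ...         yield v
    >>> list(upto(2, [1, 2, 3, 4]))
    [1, 2]
    >>> it = upto(0, [5])
    >>> it.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
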

Note that return means "I'm done, and have nothing interesting to return", for
both generator functions and non-generator functions.

Note that return isn't always equivalent to raising ``StopIteration``: the
difference lies in how enclosing ``try/except`` constructs are treated. For
example::

    >>> def f1():
    ...     try:
    ...         return
    ...     except:
    ...         yield 1
    >>> print list(f1())
    []

because, as in any function, ``return`` simply exits, but::

    >>> def f2():
    ...     try:
    ...         raise StopIteration
    ...     except:
    ...         yield 42
    >>> print list(f2())
    [42]

because ``StopIteration`` is captured by a bare ``except``, as is any
exception.


Specification: Generators and Exception Propagation
====================================================

If an unhandled exception -- including, but not limited to, ``StopIteration``
-- is raised by, or passes through, a generator function, then the exception is
passed on to the caller in the usual way, and subsequent attempts to resume the
generator function raise ``StopIteration``. In other words, an unhandled
exception terminates a generator's useful life.

Example (not idiomatic but to illustrate the point)::

    >>> def f():
    ...     return 1/0
    >>> def g():
    ...     yield f()  # the zero division exception propagates
    ...     yield 42   # and we'll never get here
    >>> k = g()
    >>> k.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in g
      File "<stdin>", line 2, in f
    ZeroDivisionError: integer division or modulo by zero
    >>> k.next()  # and the generator cannot be resumed
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>>


Specification: Try/Except/Finally
=================================

As noted earlier, ``yield`` is not allowed in the ``try`` clause of a
``try/finally`` construct. A consequence is that generators should allocate
critical resources with great care. There is no restriction on ``yield``
otherwise appearing in ``finally`` clauses, ``except`` clauses, or in the
``try`` clause of a ``try/except`` construct::

    >>> def f():
    ...     try:
    ...         yield 1
    ...         try:
    ...             yield 2
    ...             1/0
    ...             yield 3  # never get here
    ...         except ZeroDivisionError:
    ...             yield 4
    ...             yield 5
    ...             raise
    ...         except:
    ...             yield 6
    ...         yield 7      # the "raise" above stops this
    ...     except:
    ...         yield 8
    ...     yield 9
    ...     try:
    ...         x = 12
    ...     finally:
    ...         yield 10
    ...     yield 11
    >>> print list(f())
    [1, 2, 4, 5, 8, 9, 10, 11]
    >>>


Example
=======

::

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree labels in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Both output blocks display::

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


Q & A
=====

Why not a new keyword instead of reusing ``def``?
-------------------------------------------------

See BDFL Pronouncements section below.

Why a new keyword for ``yield``? Why not a builtin function instead?
---------------------------------------------------------------------

Control flow is much better expressed via keyword in Python, and yield is a
control construct. It's also believed that efficient implementation in Jython
requires that the compiler be able to determine potential suspension points at
compile-time, and a new keyword makes that easy. The CPython reference
implementation also exploits it heavily, to detect which functions *are*
generator-functions (although a new keyword in place of ``def`` would solve
that for CPython -- but people asking the "why a new keyword?" question don't
want any new keyword).

Then why not some other special syntax without a new keyword?
--------------------------------------------------------------

For example, one of these instead of ``yield 3``::

    return 3 and continue
    return and continue 3
    return generating 3
    continue return 3
    return >> , 3
    from generator return 3
    return >> 3
    return << 3
    >> 3
    << 3
    * 3

Did I miss one <wink>? Out of hundreds of messages, I counted three
suggesting such an alternative, and extracted the above from them. It would be
nice not to need a new keyword, but nicer to make ``yield`` very clear -- I
don't want to have to *deduce* that a yield is occurring from making sense of a
previously senseless sequence of keywords or operators. Still, if this
attracts enough interest, proponents should settle on a single consensus
suggestion, and Guido will Pronounce on it.

Why allow ``return`` at all? Why not force termination to be spelled ``raise StopIteration``?
-----------------------------------------------------------------------------------------------

The mechanics of ``StopIteration`` are low-level details, much like the
mechanics of ``IndexError`` in Python 2.1: the implementation needs to do
*something* well-defined under the covers, and Python exposes these mechanisms
for advanced users. That's not an argument for forcing everyone to work at
that level, though. ``return`` means "I'm done" in any kind of function, and
that's easy to explain and to use. Note that ``return`` isn't always equivalent
to ``raise StopIteration`` within a try/except construct, either (see the
"Specification: Return" section).

Then why not allow an expression on ``return`` too?
---------------------------------------------------

Perhaps we will someday. In Icon, ``return expr`` means both "I'm done", and
"but I have one final useful value to return too, and this is it". At the
start, and in the absence of compelling uses for ``return expr``, it's simply
cleaner to use ``yield`` exclusively for delivering values.


BDFL Pronouncements
===================

Issue
-----

Introduce another new keyword (say, ``gen`` or ``generator``) in place
of ``def``, or otherwise alter the syntax, to distinguish generator-functions
from non-generator functions.

Con
---

In practice (how you think about them), generators *are* functions, but
with the twist that they're resumable. The mechanics of how they're set up
is a comparatively minor technical issue, and introducing a new keyword would
unhelpfully overemphasize the mechanics of how generators get started (a vital
but tiny part of a generator's life).

Pro
---

In reality (how you think about them), generator-functions are actually
factory functions that produce generator-iterators as if by magic. In this
respect they're radically different from non-generator functions, acting more
like a constructor than a function, so reusing ``def`` is at best confusing.
A ``yield`` statement buried in the body is not enough warning that the
semantics are so different.

BDFL
----

``def`` it stays. No argument on either side is totally convincing, so I
have consulted my language designer's intuition. It tells me that the syntax
proposed in the PEP is exactly right - not too hot, not too cold. But, like
the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't
have a rebuttal for the arguments against the PEP syntax. The best I can come
up with (apart from agreeing with the rebuttals ... already made) is "FUD".
If this had been part of the language from day one, I very much doubt it would
have made Andrew Kuchling's "Python Warts" page.


Reference Implementation
========================

The current implementation, in a preliminary state (no docs, but well tested
and solid), is part of Python's CVS development tree [5]_. Using this requires
that you build Python from source.

This was derived from an earlier patch by Neil Schemenauer [4]_.


Footnotes and References
========================

.. [1] http://www.stackless.com/

.. [2] "Iteration Abstraction in Sather"
       Murer, Omohundro, Stoutamire and Szyperski
       http://www.icsi.berkeley.edu/~sather/Publications/toplas.html

.. [3] http://www.cs.arizona.edu/icon/

.. [4] http://python.ca/nas/python/generator.diff

.. [5] To experiment with this implementation, check out Python from CVS
       according to the instructions at http://sf.net/cvs/?group_id=5470
       Note that the std test ``Lib/test/test_generators.py`` contains many
       examples, including all those in this PEP.


Copyright
=========

This document has been placed in the public domain.