PEP: 255
Title: Simple Generators
Version: $Revision$
Author: nas@python.ca (Neil Schemenauer),
        tim.one@home.com (Tim Peters),
        magnus@hetland.org (Magnus Lie Hetland)
Discussion-To: python-iterators@lists.sourceforge.net
Status: Draft
Type: Standards Track
Requires: 234
Created: 18-May-2001
Python-Version: 2.2
Post-History: 14-Jun-2001

Abstract

This PEP introduces the concept of generators to Python, as well
as a new statement used in conjunction with them, the "yield"
statement.

Motivation

When a producer function has a hard enough job that it requires
maintaining state between values produced, most programming languages
offer no pleasant and efficient solution beyond adding a callback
function to the producer's argument list, to be called with each value
produced.

For example, tokenize.py in the standard library takes this approach:
the caller must pass a "tokeneater" function to tokenize(), called
whenever tokenize() finds the next token. This allows tokenize to be
coded in a natural way, but programs calling tokenize are typically
convoluted by the need to remember between callbacks which token(s)
were seen last. The tokeneater function in tabnanny.py is a good
example of that, maintaining a state machine in global variables, to
remember across callbacks what it has already seen and what it hopes to
see next. This was difficult to get working correctly, and is still
difficult for people to understand. Unfortunately, that's typical of
this approach.

An alternative would have been for tokenize to produce an entire parse
of the Python program at once, in a large list. Then tokenize clients
could be written in a natural way, using local variables and local
control flow (such as loops and nested if statements) to keep track of
their state. But this isn't practical: programs can be very large, so
no a priori bound can be placed on the memory needed to materialize the
whole parse; and some tokenize clients only want to see whether
something specific appears early in the program (e.g., a future
statement, or, as is done in IDLE, just the first indented statement),
and then parsing the whole program first is a severe waste of time.

Another alternative would be to make tokenize an iterator[1],
delivering the next token whenever its .next() method is invoked. This
is pleasant for the caller in the same way a large list of results
would be, but without the memory and "what if I want to get out early?"
drawbacks. However, this shifts the burden on tokenize to remember
*its* state between .next() invocations, and the reader need only
glance at tokenize.tokenize_loop() to realize what a horrid chore that
would be. Or picture a recursive algorithm for producing the nodes of
a general tree structure: to cast that into an iterator framework
requires removing the recursion manually and maintaining the state of
the traversal by hand.

A fourth option is to run the producer and consumer in separate
threads. This allows both to maintain their states in natural ways,
and so is pleasant for both. Indeed, Demo/threads/Generator.py in the
Python source distribution provides a usable synchronized-communication
class for doing that in a general way. This doesn't work on platforms
without threads, though, and is very slow on platforms that do
(compared to what is achievable without threads).

A final option is to use the Stackless[2][3] variant implementation of
Python instead, which supports lightweight coroutines. This has much
the same programmatic benefits as the thread option, but is much more
efficient. However, Stackless is a controversial rethinking of the
Python core, and it may not be possible for Jython to implement the
same semantics. This PEP isn't the place to debate that, so suffice it
to say here that generators provide a useful subset of Stackless
functionality in a way that fits easily into the current CPython
implementation, and is believed to be relatively straightforward for
other Python implementations.

That exhausts the current alternatives. Some other high-level
languages provide pleasant solutions, notably iterators in Sather[4],
which were inspired by iterators in CLU; and generators in Icon[5], a
novel language where every expression "is a generator". There are
differences among these, but the basic idea is the same: provide a
kind of function that can return an intermediate result ("the next
value") to its caller, but maintaining the function's local state so
that the function can be resumed again right where it left off. A
very simple example:

    def fib():
        a, b = 0, 1
        while 1:
            yield b
            a, b = b, a+b

When fib() is first invoked, it sets a to 0 and b to 1, then yields b
back to its caller. The caller sees 1. When fib is resumed, from its
point of view the yield statement is really the same as, say, a print
statement: fib continues after the yield with all local state intact.
a and b then become 1 and 1, and fib loops back to the yield, yielding
1 to its invoker. And so on. From fib's point of view it's just
delivering a sequence of results, as if via callback. But from its
caller's point of view, the fib invocation is an iterable object that
can be resumed at will. As in the thread approach, this allows both
sides to be coded in the most natural ways; but unlike the thread
approach, this can be done efficiently and on all platforms. Indeed,
resuming a generator should be no more expensive than a function call.
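
To make the caller's side concrete, here is a sketch of driving fib() by
hand in an interactive session (the values shown follow directly from the
code above):

    >>> g = fib()    # fib's body has not run yet; g is an iterator
    >>> g.next()     # runs fib up to the first "yield b"
    1
    >>> g.next()     # resumes just after the yield, with a and b intact
    1
    >>> g.next()
    2
    >>> g.next()
    3

A for-loop can drive the same object, breaking out of the loop whenever
enough values have been produced.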

The same kind of approach applies to many producer/consumer functions.
For example, tokenize.py could yield the next token instead of invoking
a callback function with it as argument, and tokenize clients could
iterate over the tokens in a natural way: a Python generator is a kind
of Python iterator[1], but of an especially powerful kind.
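
As a sketch of the shape such an interface could take (using a toy
whitespace scanner as a stand-in, not the real tokenize module), the
producer simply yields each value, and the consumer tracks whatever it
needs to remember in ordinary local variables:

    # A toy stand-in for a token producer: yield whitespace-separated
    # words one at a time.
    def scan(text):
        for word in text.split():
            yield word

    # A consumer that must remember the previous word can do so with a
    # plain local variable and local control flow.
    def words_following(text, target):
        previous, hits = None, []
        for word in scan(text):
            if previous == target:
                hits.append(word)
            previous = word
        return hits

    # words_following("to be or not to be", "to") returns ['be', 'be'].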

Specification

A new statement is introduced:

    yield_stmt:    "yield" expression_list

"yield" is a new keyword, so a future statement[8] is needed to phase
this in. [XXX spell this out]

The yield statement may only be used inside functions. A function that
contains a yield statement is called a generator function.

When a generator function is called, the actual arguments are bound to
function-local formal argument names in the usual way, but no code in
the body of the function is executed. Instead a generator-iterator
object is returned; this conforms to the iterator protocol[6], so in
particular can be used in for-loops in a natural way. Note that when
the intent is clear from context, the unqualified name "generator" may
be used to refer either to a generator-function or a generator-
iterator.

Each time the .next() method of a generator-iterator is invoked, the
code in the body of the generator-function is executed until a yield
or return statement (see below) is encountered, or until the end of
the body is reached.
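
For instance (an illustrative sketch; "counter" is a throwaway generator
function invented for this example):

    >>> def counter(start):
    ...     print "counting from", start   # not executed at call time
    ...     yield start
    ...     yield start + 1

    >>> it = counter(5)    # binds start to 5, but runs no body code
    >>> it.next()          # now the body runs, up to the first yield
    counting from 5
    5
    >>> it.next()          # resumes just after that yield
    6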

If a yield statement is encountered, the state of the function is
frozen, and the value of expression_list is returned to .next()'s
caller. By "frozen" we mean that all local state is retained,
including the current bindings of local variables, the instruction
pointer, and the internal evaluation stack: enough information is
saved so that the next time .next() is invoked, the function can
proceed exactly as if the yield statement were just another external
call.
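
A small sketch of that state retention (again with a throwaway
generator): the local variable total and the position inside the
for-loop both survive from one .next() call to the next.

    >>> def running_sum(seq):
    ...     total = 0
    ...     for x in seq:
    ...         total = total + x
    ...         yield total    # frozen here until the next .next() call

    >>> r = running_sum([1, 2, 3])
    >>> r.next(), r.next(), r.next()
    (1, 3, 6)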

A generator function can also contain return statements of the form:

    "return"

Note that an expression_list is not allowed on return statements
in the body of a generator (although, of course, they may appear in
the bodies of non-generator functions nested within the generator).

When a return statement is encountered, nothing is returned, but a
StopIteration exception is raised, signalling that the iterator is
exhausted. The same is true if control flows off the end of the
function. Note that return means "I'm done, and have nothing
interesting to return", for both generator functions and non-generator
functions.
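
A sketch of that (with another throwaway generator): a bare return ends
the iteration by raising StopIteration in the caller, and the for-loop
machinery treats that as normal termination.

    >>> def upto(limit):
    ...     i = 0
    ...     while 1:
    ...         if i >= limit:
    ...             return     # "I'm done": the caller gets StopIteration
    ...         yield i
    ...         i = i + 1

    >>> it = upto(2)
    >>> it.next()
    0
    >>> it.next()
    1
    >>> it.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>> [i for i in upto(4)]   # the for machinery handles StopIteration
    [0, 1, 2, 3]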

Generators and Exception Propagation

If an unhandled exception-- including, but not limited to,
StopIteration --is raised by, or passes through, a generator function,
then the exception is passed on to the caller in the usual way, and
subsequent attempts to resume the generator function raise
StopIteration. In other words, an unhandled exception terminates a
generator's useful life.

Example (not idiomatic but to illustrate the point):

    >>> def f():
    ...     return 1/0
    >>> def g():
    ...     yield f()  # the zero division exception propagates
    ...     yield 42   # and we'll never get here

    >>> k = g()
    >>> k.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 2, in g
      File "<stdin>", line 2, in f
    ZeroDivisionError: integer division or modulo by zero
    >>> k.next()  # and the generator function cannot be resumed
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>>

Yield and Try/Except/Finally

While "yield" is a control-flow statement, and in most respects acts
like a "return" statement from the caller's point of view, within a
generator it acts more like a callback function. In particular, it has
no special semantics with respect to try/except/finally. This is best
illustrated by a contrived example; the primary lesson to take from
this is that using yield in a finally block is a dubious idea!

    >>> def g():
    ...     try:
    ...         yield 1
    ...         1/0          # raises exception
    ...         yield 2      # we never get here
    ...     finally:
    ...         yield 3      # yields, and we raise the exception *next* time
    ...         yield 4      # we never get here

    >>> k = g()
    >>> k.next()
    1
    >>> k.next()
    3
    >>> k.next()  # as if "yield 3" were a callback, exception raised now
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "<stdin>", line 4, in g
    ZeroDivisionError: integer division or modulo by zero
    >>> k.next()  # unhandled exception terminated the generator
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>>

Example

    # A binary tree class.
    class Tree:

        def __init__(self, label, left=None, right=None):
            self.label = label
            self.left = left
            self.right = right

        def __repr__(self, level=0, indent="    "):
            s = level*indent + `self.label`
            if self.left:
                s = s + "\n" + self.left.__repr__(level+1, indent)
            if self.right:
                s = s + "\n" + self.right.__repr__(level+1, indent)
            return s

        def __iter__(self):
            return inorder(self)

    # Create a Tree from a list.
    def tree(list):
        n = len(list)
        if n == 0:
            return []
        i = n / 2
        return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x
            yield t.label
            for x in inorder(t.right):
                yield x

    # Show it off: create a tree.
    t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

    # Print the nodes of the tree in in-order.
    for x in t:
        print x,
    print

    # A non-recursive generator.
    def inorder(node):
        stack = []
        while node:
            while node.left:
                stack.append(node)
                node = node.left
            yield node.label
            while not node.right:
                try:
                    node = stack.pop()
                except IndexError:
                    return
                yield node.label
            node = node.right

    # Exercise the non-recursive generator.
    for x in t:
        print x,
    print

Q & A

Q. Why not a new keyword instead of reusing "def"?

A. See BDFL Pronouncements section below.

Q. Why a new keyword for "yield"? Why not a builtin function instead?

A. Control flow is much better expressed via keyword in Python, and
yield is a control construct. It's also believed that efficient
implementation in Jython requires that the compiler be able to
determine potential suspension points at compile-time, and a new
keyword makes that easy.

Q. Why allow "return" at all? Why not force termination to be spelled
"raise StopIteration"?

A. The mechanics of StopIteration are low-level details, much like the
mechanics of IndexError in Python 2.1: the implementation needs to
do *something* well-defined under the covers, and Python exposes
these mechanisms for advanced users. That's not an argument for
forcing everyone to work at that level, though. "return" means "I'm
done" in any kind of function, and that's easy to explain and to use.

Q. Then why not allow an expression on "return" too?

A. Perhaps we will someday. In Icon, "return expr" means both "I'm
done", and "but I have one final useful value to return too, and
this is it". At the start, and in the absence of compelling uses
for "return expr", it's simply cleaner to use "yield" exclusively
for delivering values.

BDFL Pronouncements

Issue: Introduce another new keyword (say, "gen" or "generator") in
place of "def", or otherwise alter the syntax, to distinguish
generator-functions from non-generator functions.

Con: In practice (how you think about them), generators *are*
functions, but with the twist that they're resumable. The mechanics of
how they're set up is a comparatively minor technical issue, and
introducing a new keyword would unhelpfully overemphasize the
mechanics of how generators get started (a vital but tiny part of a
generator's life).

Pro: In reality (how you think about them), generator-functions are
actually factory functions that produce generator-iterators as if by
magic. In this respect they're radically different from non-generator
functions, acting more like a constructor than a function, so reusing
"def" is at best confusing. A "yield" statement buried in the body is
not enough warning that the semantics are so different.

BDFL: "def" it stays. No argument on either side is totally
convincing, so I have consulted my language designer's intuition. It
tells me that the syntax proposed in the PEP is exactly right - not too
hot, not too cold. But, like the Oracle at Delphi in Greek mythology,
it doesn't tell me why, so I don't have a rebuttal for the arguments
against the PEP syntax. The best I can come up with (apart from
agreeing with the rebuttals ... already made) is "FUD". If this had
been part of the language from day one, I very much doubt it would have
made Andrew Kuchling's "Python Warts" page.

Reference Implementation

The current implementation, in a preliminary state (no docs and no
focused tests), is part of Python's CVS development tree[9].
Using this requires that you build Python from source.

This was derived from an earlier patch by Neil Schemenauer[7].

Footnotes and References

    [1] PEP 234, http://python.sf.net/peps/pep-0234.html

    [2] http://www.stackless.com/

    [3] PEP 219, http://python.sf.net/peps/pep-0219.html

    [4] "Iteration Abstraction in Sather"
        Murer, Omohundro, Stoutamire and Szyperski
        http://www.icsi.berkeley.edu/~sather/Publications/toplas.html

    [5] http://www.cs.arizona.edu/icon/

    [6] The concept of iterators is described in PEP 234,
        http://python.sf.net/peps/pep-0234.html

    [7] http://python.ca/nas/python/generator.diff

    [8] PEP 236, http://python.sf.net/peps/pep-0236.html

    [9] To experiment with this implementation, check out Python from
        CVS according to the instructions at
        http://sf.net/cvs/?group_id=5470

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: