Magnus Lie Hetland's first version of this pep.
This commit is contained in:
parent
6426c51912
commit
80e6bc7e2e
|
@ -0,0 +1,238 @@
|
|||
PEP: 255
|
||||
Title: Simple Generators
|
||||
Version: $Revision$
|
||||
Author: nas@python.ca (Neil Schemenauer),
|
||||
tim.one@home.com (Tim Peters),
|
||||
magnus@hetland.org (Magnus Lie Hetland)
|
||||
Discussion-To: python-iterators@lists.sourceforge.net
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Requires: 234
|
||||
Created: 18-May-2001
|
||||
Python-Version: 2.2
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP introduces the concept of generators to Python, as well
|
||||
as a new statement used in conjunction with them, the "yield"
|
||||
statement.
|
||||
|
||||
|
||||
Motivation
|
||||
|
||||
When a producer function has a hard enough job that it requires
|
||||
maintaining state between values produced, most programming languages
|
||||
offer no pleasant and efficient solution beyond adding a callback
|
||||
function to the producer's argument list, to be called with each value
|
||||
produced.
|
||||
|
||||
For example, tokenize.py in the standard library takes this approach:
|
||||
the caller must pass a "tokeneater" function to tokenize(), called
|
||||
whenever tokenize() finds the next token. This allows tokenize to be
|
||||
coded in a natural way, but programs calling tokenize are typically
|
||||
convoluted by the need to remember between callbacks which token(s)
|
||||
were seen last. The tokeneater function in tabnanny.py is a good
|
||||
example of that, maintaining a state machine in global variables, to
|
||||
remember across callbacks what it has already seen and what it hopes to
|
||||
see next. This was difficult to get working correctly, and is still
|
||||
difficult for people to understand. Unfortunately, that's typical of
|
||||
this approach.
|
||||
|
||||
An alternative would have been for tokenize to produce an entire parse
|
||||
of the Python program at once, in a large list. Then tokenize clients
|
||||
could be written in a natural way, using local variables and local
|
||||
control flow (such as loops and nested if statements) to keep track of
|
||||
their state. But this isn't practical: programs can be very large, so
|
||||
no a priori bound can be placed on the memory needed to materialize the
|
||||
whole parse; and some tokenize clients only want to see whether
|
||||
something specific appears early in the program (e.g., a future
|
||||
statement, or, as is done in IDLE, just the first indented statement),
|
||||
and then parsing the whole program first is a severe waste of time.
|
||||
|
||||
Another alternative would be to make tokenize an iterator[1],
|
||||
delivering the next token whenever its .next() method is invoked. This
|
||||
is pleasant for the caller in the same way a large list of results
|
||||
would be, but without the memory and "what if I want to get out early?"
|
||||
drawbacks. However, this shifts the burden on tokenize to remember
|
||||
*its* state between .next() invocations, and the reader need only
|
||||
glance at tokenize.tokenize_loop() to realize what a horrid chore that
|
||||
would be. Or picture a recursive algorithm for producing the nodes of
|
||||
a general tree structure: to cast that into an iterator framework
|
||||
requires removing the recursion manually and maintaining the state of
|
||||
the traversal by hand.
|
||||
|
||||
A fourth option is to run the producer and consumer in separate
|
||||
threads. This allows both to maintain their states in natural ways,
|
||||
and so is pleasant for both. Indeed, Demo/threads/Generator.py in the
|
||||
Python source distribution provides a usable synchronized-communication
|
||||
class for doing that in a general way. This doesn't work on platforms
|
||||
without threads, though, and is very slow on platforms that do
|
||||
(compared to what is achievable without threads).
|
||||
|
||||
A final option is to use the Stackless[2][3] variant implementation of
|
||||
Python instead, which supports lightweight coroutines. This has much
|
||||
the same programmatic benefits as the thread option, but is much more
|
||||
efficient. However, Stackless is a radical and controversial
|
||||
rethinking of the Python core, and it may not be possible for Jython to
|
||||
implement the same semantics. This PEP isn't the place to debate that,
|
||||
so suffice it to say here that generators provide a useful subset of
|
||||
Stackless functionality in a way that fits easily into the current
|
||||
Python implementation.
|
||||
|
||||
That exhausts the current alternatives. Some other high-level
|
||||
languages provide pleasant solutions, notably iterators in Sather[4],
|
||||
which were inspired by iterators in CLU; and generators in Icon[5], a
|
||||
novel language where every expression "is a generator". There are
|
||||
differences among these, but the basic idea is the same: provide a
|
||||
kind of function that can return an intermediate result ("the next
|
||||
value") to its caller, but maintaining the function's local state so
|
||||
that the function can be resumed again right where it left off. A
|
||||
very simple example:
|
||||
|
||||
def fib(): a, b = 0, 1
|
||||
while 1:
|
||||
yield b
|
||||
a, b = b, a+b
|
||||
|
||||
When fib() is first invoked, it sets a to 0 and b to 1, then yields b
|
||||
back to its caller. The caller sees 1. When fib is resumed, from its
|
||||
point of view the yield statement is really the same as, say, a print
|
||||
statement: fib continues after the yield with all local state intact.
|
||||
a and b then become 1 and 1, and fib loops back to the yield, yielding
|
||||
1 to its invoker. And so on. From fib's point of view it's just
|
||||
delivering a sequence of results, as if via callback. But from its
|
||||
caller's point of view, the fib invocation is an iterable object that
|
||||
can be resumed at will. As in the thread approach, this allows both
|
||||
sides to be coded in the most natural ways; but unlike the thread
|
||||
approach, this can be done efficiently and on all platforms. Indeed,
|
||||
resuming a generator should be no more expensive than a function call.
|
||||
|
||||
The same kind of approach applies to many producer/consumer functions.
|
||||
For example, tokenize.py could yield the next token instead of invoking
|
||||
a callback function with it as argument, and tokenize clients could
|
||||
iterate over the tokens in a natural way: a Python generator is a kind
|
||||
of Python iterator[1], but of an especially powerful kind.
|
||||
|
||||
|
||||
Specification
|
||||
|
||||
A new statement, the "yield" statement, is introduced:
|
||||
|
||||
yield_stmt: "yield" [expression_list]
|
||||
|
||||
This statement may only be used inside functions. A function which
|
||||
contains a yield statement is a so-called "generator function". A
|
||||
generator function may not contain return statements of the form:
|
||||
|
||||
"return" expression_list
|
||||
|
||||
It may, however, contain return statements of the form:
|
||||
|
||||
"return"
|
||||
|
||||
When a generator function is called, an iterator[6] is returned.
|
||||
Each time the .next() method of this iterator is called, the code
|
||||
in the body of the generator function is executed until a yield
|
||||
statement or a return statement is encountered, or until the end
|
||||
of the body is reached.
|
||||
|
||||
If a yield statement is encountered during this execution, the
|
||||
state of the function is frozen, and a value is returned to the
|
||||
object calling .next(). If an empty yield statement was
|
||||
encountered, None is returned; otherwise, the given expression(s)
|
||||
is (are) returned.
|
||||
|
||||
If an empty return statement is encountered, nothing is returned;
|
||||
however, a StopIteration exception is raised, signalling that the
|
||||
iterator is exhausted.
|
||||
|
||||
An example of how generators may be used is given below:
|
||||
|
||||
# A binary tree class
|
||||
class Tree:
|
||||
def __init__(self, label, left=None, right=None):
|
||||
self.label = label
|
||||
self.left = left
|
||||
self.right = right
|
||||
def __repr__(self, level=0, indent=" "):
|
||||
s = level*indent + `self.label`
|
||||
if self.left:
|
||||
s = s + "\n" + self.left.__repr__(level+1, indent)
|
||||
if self.right:
|
||||
s = s + "\n" + self.right.__repr__(level+1, indent)
|
||||
return s
|
||||
def __iter__(self):
|
||||
return inorder(self)
|
||||
|
||||
# A function that creates a tree from a list
|
||||
def tree(list):
|
||||
if not len(list):
|
||||
return []
|
||||
i = len(list)/2
|
||||
return Tree(list[i], tree(list[:i]), tree(list[i+1:]))
|
||||
|
||||
# A recursive generator that generates the tree leaves in in-order
|
||||
def inorder(t):
|
||||
if t:
|
||||
for x in inorder(t.left): yield x
|
||||
yield t.label
|
||||
for x in inorder(t.right): yield x
|
||||
|
||||
# Show it off: create a tree
|
||||
t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
|
||||
# Print the nodes of the tree in in-order
|
||||
for x in t: print x,
|
||||
print
|
||||
|
||||
# A non-recursive generator.
|
||||
def inorder(node):
|
||||
stack = []
|
||||
while node:
|
||||
while node.left:
|
||||
stack.append(node)
|
||||
node = node.left
|
||||
yield node.label
|
||||
while not node.right:
|
||||
try:
|
||||
node = stack.pop()
|
||||
except IndexError:
|
||||
return
|
||||
yield node.label
|
||||
node = node.right
|
||||
|
||||
# Exercise the non-recursive generator
|
||||
for x in t: print x,
|
||||
print
|
||||
|
||||
|
||||
Reference Implementation
|
||||
|
||||
A preliminary patch against the CVS Python source is available[7].
|
||||
|
||||
|
||||
Footnotes and References
|
||||
|
||||
[1] PEP 234, http://python.sourceforge.net/peps/pep-0234.html
|
||||
[2] http://www.stackless.com/
|
||||
[3] PEP 219, http://python.sourceforge.net/peps/pep-0219.html
|
||||
[4] "Iteration Abstraction in Sather"
|
||||
Murer , Omohundro, Stoutamire and Szyperski
|
||||
http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
|
||||
[5] http://www.cs.arizona.edu/icon/
|
||||
[6] The concept of iterators is described in PEP 234
|
||||
http://python.sourceforge.net/peps/pep-0234.html
|
||||
[7] http://python.ca/nas/python/generator.diff
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
End:
|
Loading…
Reference in New Issue