PEP: 342
Title: Coroutines via Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Phillip J. Eby
Status: Final
Type: Standards Track
Content-Type: text/plain
Created: 10-May-2005
Post-History:
Introduction
This PEP proposes some enhancements to the API and syntax of
generators, to make them usable as simple coroutines. It is
basically a combination of ideas from these two PEPs, which
may be considered redundant if this PEP is accepted:
- PEP 288, Generators Attributes and Exceptions. The current PEP
covers its second half, generator exceptions (in fact the
throw() method name was taken from PEP 288). PEP 342 replaces
generator attributes, however, with a concept from an earlier
revision of PEP 288, the "yield expression".
- PEP 325, Resource-Release Support for Generators. PEP 342
ties up a few loose ends in the PEP 325 spec, to make it suitable
for actual implementation.
Motivation
Coroutines are a natural way of expressing many algorithms, such as
simulations, games, asynchronous I/O, and other forms of event-
driven programming or co-operative multitasking. Python's generator
functions are almost coroutines -- but not quite -- in that they
allow pausing execution to produce a value, but do not provide for
values or exceptions to be passed in when execution resumes. They
also do not allow execution to be paused within the "try" portion of
try/finally blocks, and therefore make it difficult for an aborted
coroutine to clean up after itself.
Also, generators cannot yield control while other functions are
executing, unless those functions are themselves expressed as
generators, and the outer generator is written to yield in response
to values yielded by the inner generator. This complicates the
implementation of even relatively simple use cases like asynchronous
communications, because calling any functions either requires the
generator to "block" (i.e. be unable to yield control), or else a
lot of boilerplate looping code must be added around every needed
function call.
However, if it were possible to pass values or exceptions *into* a
generator at the point where it was suspended, a simple coroutine
scheduler or "trampoline function" would let coroutines "call" each
other without blocking -- a tremendous boon for asynchronous
applications. Such applications could then write coroutines to
do non-blocking socket I/O by yielding control to an I/O scheduler
until data has been sent or becomes available. Meanwhile, code that
performs the I/O would simply do something like this:

    data = (yield nonblocking_read(my_socket, nbytes))

in order to pause execution until the nonblocking_read() coroutine
produced a value.
In other words, with a few relatively minor enhancements to the
language and to the implementation of the generator-iterator type,
Python will be able to support performing asynchronous operations
without needing to write the entire application as a series of
callbacks, and without requiring the use of resource-intensive threads
for programs that need hundreds or even thousands of co-operatively
multitasking pseudothreads. Thus, these enhancements will give
standard Python many of the benefits of the Stackless Python fork,
without requiring any significant modification to the CPython core
or its APIs. In addition, these enhancements should be readily
implementable by any Python implementation (such as Jython) that
already supports generators.
Specification Summary
By adding a few simple methods to the generator-iterator type, and
with two minor syntax adjustments, Python developers will be able
to use generator functions to implement coroutines and other forms
of co-operative multitasking. These methods and adjustments are:
1. Redefine "yield" to be an expression, rather than a statement.
The current yield statement would become a yield expression
whose value is thrown away. A yield expression's value is
None whenever the generator is resumed by a normal next() call.
2. Add a new send() method for generator-iterators, which resumes
the generator and "sends" a value that becomes the result of the
current yield-expression. The send() method returns the next
value yielded by the generator, or raises StopIteration if the
generator exits without yielding another value.
3. Add a new throw() method for generator-iterators, which raises
an exception at the point where the generator was paused, and
which returns the next value yielded by the generator, raising
StopIteration if the generator exits without yielding another
value. (If the generator does not catch the passed-in exception,
or raises a different exception, then that exception propagates
to the caller.)
4. Add a close() method for generator-iterators, which raises
GeneratorExit at the point where the generator was paused. If
the generator then raises StopIteration (by exiting normally, or
due to already being closed) or GeneratorExit (by not catching
the exception), close() returns to its caller. If the generator
yields a value, a RuntimeError is raised. If the generator
raises any other exception, it is propagated to the caller.
close() does nothing if the generator has already exited due to
an exception or normal exit.
5. Add support to ensure that close() is called when a generator
iterator is garbage-collected.
6. Allow "yield" to be used in try/finally blocks, since garbage
collection or an explicit close() call would now allow the
finally clause to execute.
A prototype patch implementing all of these changes against the
current Python CVS HEAD is available as SourceForge patch #1223381
(http://python.org/sf/1223381).
Specification: Sending Values into Generators
New generator method: send(value)
A new method for generator-iterators is proposed, called send(). It
takes exactly one argument, which is the value that should be "sent
in" to the generator. Calling send(None) is exactly equivalent to
calling a generator's next() method. Calling send() with any other
value is the same, except that the value produced by the generator's
current yield expression will be different.
Because generator-iterators begin execution at the top of the
generator's function body, there is no yield expression to receive
a value when the generator has just been created. Therefore,
calling send() with a non-None argument is prohibited when the
generator iterator has just started, and a TypeError is raised if
this occurs (presumably due to a logic error of some kind). Thus,
before you can communicate with a coroutine you must first call
next() or send(None) to advance its execution to the first yield
expression.
As with the next() method, the send() method returns the next value
yielded by the generator-iterator, or raises StopIteration if the
generator exits normally, or has already exited. If the generator
raises an uncaught exception, it is propagated to send()'s caller.
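
The rules above can be illustrated with a minimal sketch. It is
written for present-day Python 3, where the next() method described
in this PEP is spelled as the built-in next(g); the generator name
"echo" and the variable names are purely illustrative.

```python
def echo():
    """Yield back whatever was most recently sent in."""
    received = None
    while True:
        # The value of the yield expression is whatever send() passed in
        # (or None when the generator is resumed by next()).
        received = yield received

g = echo()

# A just-started generator has no yield expression waiting for a value,
# so sending a non-None value is rejected with TypeError.
try:
    g.send("too early")
except TypeError:
    early_send_rejected = True
else:
    early_send_rejected = False

first = g.send(None)      # same as next(g): runs to the first yield
second = g.send("hello")  # the paused yield evaluates to "hello"
third = next(g)           # the paused yield evaluates to None
```

Note that the rejected send() does not damage the generator; it can
still be started normally afterwards.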
New syntax: Yield Expressions
The yield-statement will be allowed to be used on the right-hand
side of an assignment; in that case it is referred to as a
yield-expression. The value of this yield-expression is None
unless send() was called with a non-None argument; see below.
A yield-expression must always be parenthesized except when it
is the top-level expression on the right-hand side of an
assignment. So

    x = yield 42
    x = yield
    x = 12 + (yield 42)
    x = 12 + (yield)
    foo(yield 42)
    foo(yield)

are all legal, but

    x = 12 + yield 42
    x = 12 + yield
    foo(yield 42, 12)
    foo(yield, 12)

are all illegal. (Some of the edge cases are motivated by the
current legality of "yield 12, 42".)
Note that a yield-statement or yield-expression without an
expression is now legal. This makes sense: when the information
flow in the next() call is reversed, it should be possible to
yield without passing an explicit value ("yield" is of course
equivalent to "yield None").
When send(value) is called, the yield-expression that it resumes
will return the passed-in value. When next() is called, the resumed
yield-expression will return None. If the yield-expression is a
yield-statement, this returned value is ignored, similar to ignoring
the value returned by a function call used as a statement.
In effect, a yield-expression is like an inverted function call; the
argument to yield is in fact returned (yielded) from the currently
executing function, and the "return value" of yield is the argument
passed in via send().
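
As a hedged illustration of the "inverted function call" view, the
following sketch (Python 3 spelling; the name "averager" is invented
for this example) yields a running average outward and receives the
next number inward through the same yield expression.

```python
def averager():
    """Keep a running average of the numbers sent in."""
    total = 0.0
    count = 0
    average = None
    while True:
        # "yield average" hands the current average out to the caller;
        # the value of the expression is whatever the caller sends back.
        number = yield average
        if number is None:
            # Resumed by plain next(): nothing was sent, so skip it.
            continue
        total += number
        count += 1
        average = total / count

avg = averager()
next(avg)                  # advance to the first yield
results = [avg.send(10), avg.send(20), avg.send(60)]
unchanged = next(avg)      # yield evaluates to None; average is kept
```

Here results holds the averages after each of the three send() calls,
and the final next() call leaves the average untouched because the
yield expression evaluated to None.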
Note: the syntactic extensions to yield make its use very similar
to that in Ruby. This is intentional. Do note that in Python the
block passes a value to the generator using "send(EXPR)" rather
than "return EXPR", and the underlying mechanism whereby control
is passed between the generator and the block is completely
different. Blocks in Python are not compiled into thunks; rather,
yield suspends execution of the generator's frame. Some edge
cases work differently; in Python, you cannot save the block for
later use, and you cannot test whether there is a block or not.
(XXX - this stuff about blocks seems out of place now, perhaps
Guido can edit to clarify.)
Specification: Exceptions and Cleanup
Let a generator object be the iterator produced by calling a
generator function. Below, 'g' always refers to a generator
object.
New syntax: yield allowed inside try-finally
The syntax for generator functions is extended to allow a
yield-statement inside a try-finally statement.
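
A small sketch of what this permits, in Python 3 spelling. The names
"reader" and "log" are illustrative only; the list stands in for a
real resource such as an open file.

```python
log = []

def reader(lines):
    """Yield lines one at a time, recording cleanup when finished."""
    log.append("opened")
    try:
        for line in lines:
            yield line      # pausing inside "try" is now permitted
    finally:
        # Runs on normal exhaustion, on close(), or on an exception.
        log.append("closed")

r = reader(["a", "b", "c"])
first = next(r)
r.close()     # abandons the loop early; the finally clause still runs
```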
New generator method: throw(type, value=None, traceback=None)
g.throw(type, value, traceback) causes the specified exception to
be thrown at the point where the generator g is currently
suspended (i.e. at a yield-statement, or at the start of its
function body if next() has not been called yet). If the
generator catches the exception and yields another value, that is
the return value of g.throw(). If it doesn't catch the exception,
the throw() appears to raise the same exception passed to it (it
"falls through"). If the generator raises another exception (this
includes the StopIteration produced when it returns) that
exception is raised by the throw() call. In summary, throw()
behaves like next() or send(), except it raises an exception at the
suspension point. If the generator is already in the closed
state, throw() just raises the exception it was passed without
executing any of the generator's code.
The effect of raising the exception is exactly as if the
statement:

    raise type, value, traceback

was executed at the suspension point. The type argument must
not be None, and the type and value must be compatible. If the
value is not an instance of the type, a new exception instance
is created using the value, following the same rules that the raise
statement uses to create an exception instance. The traceback, if
supplied, must be a valid Python traceback object, or a TypeError
occurs.
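
The three outcomes described above (caught and resumed, falling
through, and throwing into a closed generator) can be sketched as
follows. This is an illustration for present-day Python 3, which
expects a single exception instance to be passed; the three-argument
form specified here is deprecated as of Python 3.12. The generator
name "worker" is made up for the example.

```python
def worker():
    """Yield status strings; recover from ValueError thrown in."""
    while True:
        try:
            yield "working"
        except ValueError as exc:
            # The exception appears to be raised by the paused yield.
            yield "recovered from %s" % exc

w = worker()
started = next(w)

# Caught inside the generator: throw() returns the next yielded value.
caught = w.throw(ValueError("bad input"))
resumed = next(w)

# Not caught inside the generator: the same exception "falls through"
# to the caller, and the generator is finished afterwards.
try:
    w.throw(KeyError("missing"))
except KeyError:
    fell_through = True
else:
    fell_through = False

# Once the generator is closed, throw() raises the exception it was
# given without running any of the generator's code.
try:
    w.throw(ValueError("late"))
except ValueError:
    raised_when_closed = True
else:
    raised_when_closed = False
```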
Note: The name of the throw() method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
name. Unlike raise (which immediately raises an exception from the
current execution point), throw() first resumes the generator, and
only then raises the exception. The word throw is suggestive of
putting the exception in another location, and is already associated
with exceptions in other languages.
Alternative method names were considered: resolve(), signal(),
genraise(), raiseinto(), and flush(). None of these seem to fit
as well as throw().
New standard exception: GeneratorExit
A new standard exception is defined, GeneratorExit, inheriting
from Exception. A generator should handle this by re-raising it
(or just not catching it) or by raising StopIteration.
New generator method: close()
g.close() is defined by the following pseudo-code:

    def close(self):
        try:
            self.throw(GeneratorExit)
        except (GeneratorExit, StopIteration):
            pass
        else:
            raise RuntimeError("generator ignored GeneratorExit")
        # Other exceptions are not caught

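
The behavior of close() on a well-behaved and on a misbehaving
generator can be sketched like this (Python 3 spelling; "polite",
"stubborn", and "events" are illustrative names, not part of the
specification).

```python
events = []

def polite():
    """Let GeneratorExit propagate; cleanup happens in finally."""
    try:
        yield 1
        yield 2
    finally:
        events.append("polite cleanup")

def stubborn():
    """Misbehave by yielding again after being asked to exit."""
    try:
        yield "first"
    except GeneratorExit:
        yield "ignoring the request to exit"

p = polite()
next(p)
p.close()     # GeneratorExit is raised at the paused yield
p.close()     # already exited, so this does nothing

s = stubborn()
next(s)
try:
    s.close()
except RuntimeError:
    # close() reports the generator that yielded instead of exiting.
    events.append("stubborn rejected")
```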
New generator method: __del__()
g.__del__() is a wrapper for g.close(). This will be called when
the generator object is garbage-collected (in CPython, this is
when its reference count goes to zero). If close() raises an
exception, a traceback for the exception is printed to sys.stderr
and further ignored; it is not propagated back to the place that
triggered the garbage collection. This is consistent with the
handling of exceptions in __del__() methods on class instances.
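
A sketch of the effect, assuming CPython's reference counting: once
the last reference to a suspended generator is dropped, its finally
clause runs. The explicit gc.collect() call is only there for
implementations that do not finalize objects immediately; the names
"resource_user" and "cleanup_log" are illustrative.

```python
import gc

cleanup_log = []

def resource_user():
    """Hold a pretend resource until the generator is finalized."""
    try:
        yield "in use"
    finally:
        cleanup_log.append("released")

g = resource_user()
next(g)         # now suspended inside the try block
del g           # drop the last reference to the generator
gc.collect()    # not needed under CPython's reference counting
```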
If the generator object participates in a cycle, g.__del__() may
not be called. This is the behavior of CPython's current garbage
collector. The reason for the restriction is that the GC code
needs to "break" a cycle at an arbitrary point in order to collect
it, and from then on no Python code should be allowed to see the
objects that formed the cycle, as they may be in an invalid state.
Objects "hanging off" a cycle are not subject to this restriction.
Note that it is unlikely to see a generator object participate in
a cycle in practice. However, storing a generator object in a
global variable creates a cycle via the generator frame's
f_globals pointer. Another way to create a cycle would be to
store a reference to the generator object in a data structure that
is passed to the generator as an argument (e.g., if an object has
a method that's a generator, and keeps a reference to a running
iterator created by that method). Neither of these cases
is very likely given the typical patterns of generator use.
Also, in the CPython implementation of this PEP, the frame object
used by the generator should be released whenever its execution is
terminated due to an error or normal exit. This will ensure that
generators that cannot be resumed do not remain part of an
uncollectable reference cycle. This allows other code to
potentially use close() in a try/finally or "with" block (per PEP
343) to ensure that a given generator is properly finalized.
Optional Extensions
The Extended 'continue' Statement
An earlier draft of this PEP proposed a new "continue EXPR"
syntax for use in for-loops (carried over from PEP 340), that
would pass the value of EXPR into the iterator being looped over.
This feature has been withdrawn for the time being, because the
scope of this PEP has been narrowed to focus only on passing values
into generator-iterators, and not other kinds of iterators. It
was also felt by some on the Python-Dev list that adding new syntax
for this particular feature would be premature at best.
Open Issues
Discussion on python-dev has revealed some open issues. I list
them here, with my preferred resolution and its motivation. The
PEP as currently written reflects this preferred resolution.
1. What exception should be raised by close() when the generator
yields another value as a response to the GeneratorExit
exception?
I originally chose TypeError because it represents gross
misbehavior of the generator function, which should be fixed by
changing the code. But the with_template decorator class in
PEP 343 uses RuntimeError for similar offenses. Arguably they
should all use the same exception. I'd rather not introduce a
new exception class just for this purpose, since it's not an
exception that I want people to catch: I want it to turn into a
traceback which is seen by the programmer who then fixes the
code. So now I believe they should both raise RuntimeError.
There are some precedents for that: it's raised by the core
Python code in situations where endless recursion is detected,
and for uninitialized objects (and for a variety of
miscellaneous conditions).
2. Oren Tirosh has proposed renaming the send() method to feed(),
for compatibility with the "consumer interface" (see
http://effbot.org/zone/consumer.htm for the specification.)
However, looking more closely at the consumer interface, it seems
that the desired semantics for feed() are different than for
send(), because send() can't be meaningfully called on a just-
started generator. Also, the consumer interface as currently
defined doesn't include handling for StopIteration.
Therefore, it seems like it would probably be more useful to
create a simple decorator that wraps a generator function to make
it conform to the consumer interface. For example, it could
"warm up" the generator with an initial next() call, trap
StopIteration, and perhaps even provide reset() by re-invoking
the generator function.
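
One possible shape for such a wrapper is sketched below in Python 3.
It assumes that the consumer interface consists of feed(data) and
close() methods; the class name "ConsumerAdapter", the reset()
method, and the choice to silently ignore data fed after the
generator has finished are all assumptions made for this example.

```python
class ConsumerAdapter:
    """Wrap a generator function behind feed()/close()/reset()."""

    def __init__(self, genfunc, *args, **kw):
        # Remember how to build the generator so reset() can rebuild it.
        self._factory = lambda: genfunc(*args, **kw)
        self.reset()

    def reset(self):
        """Start over with a fresh, already warmed-up generator."""
        self._gen = self._factory()
        self._finished = False
        try:
            next(self._gen)          # advance to the first yield
        except StopIteration:
            self._finished = True    # generator ended before yielding

    def feed(self, data):
        """Pass data in; ignore it once the generator is done."""
        if self._finished:
            return
        try:
            self._gen.send(data)
        except StopIteration:
            self._finished = True

    def close(self):
        self._gen.close()


def take_two(out):
    """Accept exactly two values, then finish."""
    out.append((yield))
    out.append((yield))

seen = []
c = ConsumerAdapter(take_two, seen)
c.feed(1)
c.feed(2)      # the generator finishes here; StopIteration is trapped
c.feed(3)      # ignored, because the generator has already finished
c.reset()      # re-invoke the generator function
c.feed(4)
c.close()
```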
Examples
1. A simple "consumer" decorator that makes a generator function
automatically advance to its first yield point when initially
called:

    def consumer(func):
        def wrapper(*args,**kw):
            gen = func(*args, **kw)
            gen.next()
            return gen
        wrapper.__name__ = func.__name__
        wrapper.__dict__ = func.__dict__
        wrapper.__doc__ = func.__doc__
        return wrapper

2. An example of using the "consumer" decorator to create a
"reverse generator" that receives images and creates thumbnail
pages, sending them on to another consumer. Functions like
this can be chained together to form efficient processing
pipelines of "consumers" that each can have complex internal
state:

    import os

    @consumer
    def thumbnail_pager(pagesize, thumbsize, destination):
        while True:
            page = new_image(pagesize)
            rows, columns = pagesize / thumbsize
            pending = False
            try:
                for row in xrange(rows):
                    for column in xrange(columns):
                        thumb = create_thumbnail((yield), thumbsize)
                        page.write(
                            thumb, column*thumbsize.x, row*thumbsize.y
                        )
                        pending = True
            except GeneratorExit:
                # close() was called, so flush any pending output
                if pending:
                    destination.send(page)

                # then close the downstream consumer, and exit
                destination.close()
                return
            else:
                # we finished a page full of thumbnails, so send it
                # downstream and keep on looping
                destination.send(page)

    @consumer
    def jpeg_writer(dirname):
        fileno = 1
        while True:
            filename = os.path.join(dirname,"page%04d.jpg" % fileno)
            write_jpeg((yield), filename)
            fileno += 1

    # Put them together to make a function that makes thumbnail
    # pages from a list of images and other parameters.
    #
    def write_thumbnails(pagesize, thumbsize, images, output_dir):
        pipeline = thumbnail_pager(
            pagesize, thumbsize, jpeg_writer(output_dir)
        )

        for image in images:
            pipeline.send(image)

        pipeline.close()

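
The same pipeline pattern can be tried without any imaging helpers.
The following is a self-contained sketch in Python 3 spelling, using
functools.wraps in place of the manual attribute copying; the names
"batcher", "sink", and "pages" are invented stand-ins for the
thumbnail pager, the JPEG writer, and the output directory.

```python
import functools

def consumer(func):
    """Advance a new generator to its first yield automatically."""
    @functools.wraps(func)
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)
        return gen
    return wrapper

@consumer
def batcher(size, destination):
    """Group incoming items into lists of `size` and pass them on."""
    while True:
        batch = []
        try:
            while len(batch) < size:
                batch.append((yield))
        except GeneratorExit:
            # close() was called: flush a partial batch, then shut
            # down the downstream consumer and exit.
            if batch:
                destination.send(batch)
            destination.close()
            return
        else:
            # a full batch is ready, so send it downstream
            destination.send(batch)

@consumer
def sink(output):
    """Record every batch received in the list `output`."""
    while True:
        output.append((yield))

pages = []
pipeline = batcher(2, sink(pages))
for item in "abcde":
    pipeline.send(item)
pipeline.close()      # flushes the final, partially filled batch
```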
3. A simple coroutine scheduler or "trampoline" that lets
coroutines "call" other coroutines by yielding the coroutine
they wish to invoke. Any non-generator value yielded by
a coroutine is returned to the coroutine that "called" the
one yielding the value. Similarly, if a coroutine raises an
exception, the exception is propagated to its "caller". In
effect, this example emulates simple tasklets as are used
in Stackless Python, as long as you use a yield expression to
invoke routines that would otherwise "block". This is only
a very simple example, and far more sophisticated schedulers
are possible. (For example, the existing GTasklet framework
for Python (http://www.gnome.org/~gjc/gtasklet/gtasklets.html)
and the peak.events framework (http://peak.telecommunity.com/)
already implement similar scheduling capabilities, but must
currently use awkward workarounds for the inability to pass
values or exceptions into generators.)

    import collections
    import sys
    import types

    class Trampoline:
        """Manage communications between coroutines"""

        running = False

        def __init__(self):
            self.queue = collections.deque()

        def add(self, coroutine):
            """Request that a coroutine be executed"""
            self.schedule(coroutine)

        def run(self):
            result = None
            self.running = True
            try:
                while self.running and self.queue:
                    func = self.queue.popleft()
                    result = func()
                return result
            finally:
                self.running = False

        def stop(self):
            self.running = False

        def schedule(self, coroutine, stack=(), val=None, *exc):
            def resume():
                # copy the argument into a local name, because
                # assigning to it below would otherwise make it an
                # unbound local variable
                value = val
                try:
                    if exc:
                        value = coroutine.throw(value,*exc)
                    else:
                        value = coroutine.send(value)
                except:
                    if stack:
                        # send the error back to the "caller"
                        self.schedule(
                            stack[0], stack[1], *sys.exc_info()
                        )
                        # the caller has been rescheduled, so do not
                        # fall through and schedule it a second time
                        return
                    else:
                        # Nothing left in this pseudothread to
                        # handle it, let it propagate to the
                        # run loop
                        raise

                if isinstance(value, types.GeneratorType):
                    # Yielded to a specific coroutine, push the
                    # current one on the stack, and call the new
                    # one with no args
                    self.schedule(value, (coroutine,stack))

                elif stack:
                    # Yielded a result, pop the stack and send the
                    # value to the caller
                    self.schedule(stack[0], stack[1], value)

                # else: this pseudothread has ended

            self.queue.append(resume)

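
To make the "calling" convention concrete, here is a reduced
trampoline together with a short usage example, written as a sketch
for present-day Python 3 rather than as a copy of the class above.
It differs in two labeled ways: exceptions are passed to the caller
as single instances, and a coroutine that finishes without yielding
a result resumes its caller with None instead of propagating
StopIteration. The coroutine names "add", "fail", and "main" are
invented for the example.

```python
import collections
import types

class Trampoline:
    """A minimal scheduler letting coroutines "call" each other."""

    def __init__(self):
        self.queue = collections.deque()

    def add(self, coroutine):
        self.schedule(coroutine)

    def run(self):
        while self.queue:
            self.queue.popleft()()

    def schedule(self, coroutine, stack=(), value=None, exc=None):
        def resume():
            try:
                if exc is not None:
                    result = coroutine.throw(exc)
                else:
                    result = coroutine.send(value)
            except StopIteration:
                # Finished without a result: resume the caller, if any.
                if stack:
                    self.schedule(stack[0], stack[1])
                return
            except Exception as error:
                if not stack:
                    raise          # nobody left to handle it
                # Send the error back to the "caller".
                self.schedule(stack[0], stack[1], exc=error)
                return
            if isinstance(result, types.GeneratorType):
                # Yielded a coroutine: push the current one and call it.
                self.schedule(result, (coroutine, stack))
            elif stack:
                # Yielded a plain value: return it to the caller.
                self.schedule(stack[0], stack[1], result)
        self.queue.append(resume)

trace = []

def add(a, b):
    yield a + b              # a non-generator value is the "result"

def fail():
    raise ValueError("boom")
    yield                    # unreachable; makes this a generator

def main():
    total = yield add(1, 2)  # "call" another coroutine
    trace.append(total)
    try:
        yield fail()
    except ValueError as error:
        trace.append(str(error))

t = Trampoline()
t.add(main())
t.run()
```

After run() returns, trace shows that main() received the result of
add() and then caught the exception raised inside fail().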
4. A simple "echo" server, and code to run it using a trampoline
(presumes the existence of "nonblocking_read",
"nonblocking_write", and other I/O coroutines, that e.g. raise
ConnectionLost if the connection is closed):

    # coroutine function that echoes data back on a connected
    # socket
    #
    def echo_handler(sock):
        while True:
            try:
                data = yield nonblocking_read(sock)
                yield nonblocking_write(sock, data)
            except ConnectionLost:
                return  # exit normally if connection lost

    # coroutine function that listens for connections on a
    # socket, and then launches a service "handler" coroutine
    # to service the connection
    #
    def listen_on(trampoline, sock, handler):
        while True:
            # get the next incoming connection
            connected_socket = yield nonblocking_accept(sock)

            # start another coroutine to handle the connection
            trampoline.add( handler(connected_socket) )

    # Create a scheduler to manage all our coroutines
    t = Trampoline()

    # Create a coroutine instance to run the echo_handler on
    # incoming connections
    #
    server = listen_on(
        t, listening_socket("localhost","echo"), echo_handler
    )

    # Add the coroutine to the scheduler
    t.add(server)

    # loop forever, accepting connections and servicing them
    # "in parallel"
    #
    t.run()

Reference Implementation
A prototype patch implementing all of the features described in this
PEP is available as SourceForge patch #1223381
(http://python.org/sf/1223381).
This patch was committed to CVS 01-02 August 2005.
Acknowledgements
Raymond Hettinger (PEP 288) and Samuele Pedroni (PEP 325) first
formally proposed the ideas of communicating values or exceptions
into generators, and the ability to "close" generators. Timothy
Delaney suggested the title of this PEP, and Steven Bethard helped
edit a previous version. See also the Acknowledgements section
of PEP 340.
References
TBD.
Copyright
This document has been placed in the public domain.