PEP: 342
Title: Coroutines via Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Phillip J. Eby
Status: Final
Type: Standards Track
Content-Type: text/plain
Created: 10-May-2005
Python-Version: 2.5
Post-History:

Introduction

    This PEP proposes some enhancements to the API and syntax of
    generators, to make them usable as simple coroutines.  It is
    basically a combination of ideas from these two PEPs, which
    may be considered redundant if this PEP is accepted:

    - PEP 288, Generators Attributes and Exceptions.  The current PEP
      covers its second half, generator exceptions (in fact the
      throw() method name was taken from PEP 288).  PEP 342 replaces
      generator attributes, however, with a concept from an earlier
      revision of PEP 288, the "yield expression".

    - PEP 325, Resource-Release Support for Generators.  PEP 342
      ties up a few loose ends in the PEP 325 spec, to make it suitable
      for actual implementation.

Motivation

    Coroutines are a natural way of expressing many algorithms, such as
    simulations, games, asynchronous I/O, and other forms of event-
    driven programming or co-operative multitasking.  Python's generator
    functions are almost coroutines -- but not quite -- in that they
    allow pausing execution to produce a value, but do not provide for
    values or exceptions to be passed in when execution resumes.  They
    also do not allow execution to be paused within the "try" portion of
    try/finally blocks, and therefore make it difficult for an aborted
    coroutine to clean up after itself.

    Also, generators cannot yield control while other functions are
    executing, unless those functions are themselves expressed as
    generators, and the outer generator is written to yield in response
    to values yielded by the inner generator.  This complicates the
    implementation of even relatively simple use cases like asynchronous
    communications, because calling any functions either requires the
    generator to "block" (i.e. be unable to yield control), or else a
    lot of boilerplate looping code must be added around every needed
    function call.

    However, if it were possible to pass values or exceptions *into* a
    generator at the point where it was suspended, a simple co-routine
    scheduler or "trampoline function" would let coroutines "call" each
    other without blocking -- a tremendous boon for asynchronous
    applications.  Such applications could then write co-routines to
    do non-blocking socket I/O by yielding control to an I/O scheduler
    until data has been sent or becomes available.  Meanwhile, code that
    performs the I/O would simply do something like this:

         data = (yield nonblocking_read(my_socket, nbytes))

    in order to pause execution until the nonblocking_read() coroutine
    produced a value.

    In other words, with a few relatively minor enhancements to the
    language and to the implementation of the generator-iterator type,
    Python will be able to support performing asynchronous operations
    without needing to write the entire application as a series of
    callbacks, and without requiring the use of resource-intensive threads
    for programs that need hundreds or even thousands of co-operatively
    multitasking pseudothreads.  Thus, these enhancements will give
    standard Python many of the benefits of the Stackless Python fork,
    without requiring any significant modification to the CPython core
    or its APIs.  In addition, these enhancements should be readily
    implementable by any Python implementation (such as Jython) that
    already supports generators.

Specification Summary

    By adding a few simple methods to the generator-iterator type, and
    with two minor syntax adjustments, Python developers will be able
    to use generator functions to implement co-routines and other forms
    of co-operative multitasking.  These methods and adjustments are:

    1. Redefine "yield" to be an expression, rather than a statement.
       The current yield statement would become a yield expression
       whose value is thrown away.  A yield expression's value is
       None whenever the generator is resumed by a normal next() call.

    2. Add a new send() method for generator-iterators, which resumes
       the generator and "sends" a value that becomes the result of the
       current yield-expression.  The send() method returns the next
       value yielded by the generator, or raises StopIteration if the
       generator exits without yielding another value.

    3. Add a new throw() method for generator-iterators, which raises
       an exception at the point where the generator was paused, and
       which returns the next value yielded by the generator, raising
       StopIteration if the generator exits without yielding another
       value.  (If the generator does not catch the passed-in exception,
       or raises a different exception, then that exception propagates
       to the caller.)

    4. Add a close() method for generator-iterators, which raises
       GeneratorExit at the point where the generator was paused.  If
       the generator then raises StopIteration (by exiting normally, or
       due to already being closed) or GeneratorExit (by not catching
       the exception), close() returns to its caller.  If the generator
       yields a value, a RuntimeError is raised.  If the generator
       raises any other exception, it is propagated to the caller.
       close() does nothing if the generator has already exited due to
       an exception or normal exit.

    5. Add support to ensure that close() is called when a generator
       iterator is garbage-collected.

    6. Allow "yield" to be used in try/finally blocks, since garbage
       collection or an explicit close() call would now allow the
       finally clause to execute.

    A prototype patch implementing all of these changes against the
    current Python CVS HEAD is available as SourceForge patch #1223381
    (http://python.org/sf/1223381).


Specification: Sending Values into Generators

  New generator method: send(value)

    A new method for generator-iterators is proposed, called send().  It
    takes exactly one argument, which is the value that should be "sent
    in" to the generator.  Calling send(None) is exactly equivalent to
    calling a generator's next() method.  Calling send() with any other
    value is the same, except that the value produced by the generator's
    current yield expression will be different.

    Because generator-iterators begin execution at the top of the
    generator's function body, there is no yield expression to receive
    a value when the generator has just been created.  Therefore,
    calling send() with a non-None argument is prohibited when the
    generator iterator has just started, and a TypeError is raised if
    this occurs (presumably due to a logic error of some kind).  Thus,
    before you can communicate with a coroutine you must first call
    next() or send(None) to advance its execution to the first yield
    expression.

    As with the next() method, the send() method returns the next value
    yielded by the generator-iterator, or raises StopIteration if the
    generator exits normally, or has already exited.  If the generator
    raises an uncaught exception, it is propagated to send()'s caller.

  New syntax: Yield Expressions

    The yield-statement will be allowed to be used on the right-hand
    side of an assignment; in that case it is referred to as
    yield-expression.  The value of this yield-expression is None
    unless send() was called with a non-None argument; see below.

    A yield-expression must always be parenthesized except when it
    occurs at the top-level expression on the right-hand side of an
    assignment.  So

        x = yield 42
        x = yield
        x = 12 + (yield 42)
        x = 12 + (yield)
        foo(yield 42)
        foo(yield)

    are all legal, but

        x = 12 + yield 42
        x = 12 + yield
        foo(yield 42, 12)
        foo(yield, 12)

    are all illegal.  (Some of the edge cases are motivated by the
    current legality of "yield 12, 42".)

    Note that a yield-statement or yield-expression without an
    expression is now legal.  This makes sense: when the information
    flow in the next() call is reversed, it should be possible to
    yield without passing an explicit value ("yield" is of course
    equivalent to "yield None").

    When send(value) is called, the yield-expression that it resumes
    will return the passed-in value.  When next() is called, the resumed
    yield-expression will return None.  If the yield-expression is a
    yield-statement, this returned value is ignored, similar to ignoring
    the value returned by a function call used as a statement.

    In effect, a yield-expression is like an inverted function call; the
    argument to yield is in fact returned (yielded) from the currently
    executing function, and the "return value" of yield is the argument
    passed in via send().

    Note: the syntactic extensions to yield make its use very similar
    to that in Ruby.  This is intentional.  Do note that in Python the
    block passes a value to the generator using "send(EXPR)" rather
    than "return EXPR", and the underlying mechanism whereby control
    is passed between the generator and the block is completely
    different.  Blocks in Python are not compiled into thunks; rather,
    yield suspends execution of the generator's frame.  Some edge
    cases work differently; in Python, you cannot save the block for
    later use, and you cannot test whether there is a block or not.
    (XXX - this stuff about blocks seems out of place now, perhaps
    Guido can edit to clarify.)

Specification: Exceptions and Cleanup

    Let a generator object be the iterator produced by calling a
    generator function.  Below, 'g' always refers to a generator
    object.

  New syntax: yield allowed inside try-finally

    The syntax for generator functions is extended to allow a
    yield-statement inside a try-finally statement.

  New generator method: throw(type, value=None, traceback=None)

    g.throw(type, value, traceback) causes the specified exception to
    be thrown at the point where the generator g is currently
    suspended (i.e. at a yield-statement, or at the start of its
    function body if next() has not been called yet).  If the
    generator catches the exception and yields another value, that is
    the return value of g.throw().  If it doesn't catch the exception,
    the throw() appears to raise the same exception passed it (it
    "falls through").  If the generator raises another exception (this
    includes the StopIteration produced when it returns) that
    exception is raised by the throw() call.  In summary, throw()
    behaves like next() or send(), except it raises an exception at the
    suspension point.  If the generator is already in the closed
    state, throw() just raises the exception it was passed without
    executing any of the generator's code.

    The effect of raising the exception is exactly as if the
    statement:

        raise type, value, traceback

    was executed at the suspension point.  The type argument must
    not be None, and the type and value must be compatible.  If the
    value is not an instance of the type, a new exception instance
    is created using the value, following the same rules that the raise
    statement uses to create an exception instance.  The traceback, if
    supplied, must be a valid Python traceback object, or a TypeError
    occurs.

    Note: The name of the throw() method was selected for several
    reasons.  Raise is a keyword and so cannot be used as a method
    name.  Unlike raise (which immediately raises an exception from the
    current execution point), throw() first resumes the generator, and
    only then raises the exception.  The word throw is suggestive of
    putting the exception in another location, and is already associated
    with exceptions in other languages.

    Alternative method names were considered: resolve(), signal(),
    genraise(), raiseinto(), and flush().  None of these seem to fit
    as well as throw().

  New standard exception: GeneratorExit

    A new standard exception is defined, GeneratorExit, inheriting
    from Exception.  A generator should handle this by re-raising it
    (or just not catching it) or by raising StopIteration.

  New generator method: close()

    g.close() is defined by the following pseudo-code:

        def close(self):
            try:
                self.throw(GeneratorExit)
            except (GeneratorExit, StopIteration):
                pass
            else:
                raise RuntimeError("generator ignored GeneratorExit")
            # Other exceptions are not caught

  New generator method: __del__()

    g.__del__() is a wrapper for g.close().  This will be called when
    the generator object is garbage-collected (in CPython, this is
    when its reference count goes to zero).  If close() raises an
    exception, a traceback for the exception is printed to sys.stderr
    and further ignored; it is not propagated back to the place that
    triggered the garbage collection.  This is consistent with the
    handling of exceptions in __del__() methods on class instances.

    If the generator object participates in a cycle, g.__del__() may
    not be called.  This is the behavior of CPython's current garbage
    collector.  The reason for the restriction is that the GC code
    needs to "break" a cycle at an arbitrary point in order to collect
    it, and from then on no Python code should be allowed to see the
    objects that formed the cycle, as they may be in an invalid state.
    Objects "hanging off" a cycle are not subject to this restriction.

    Note that it is unlikely to see a generator object participate in
    a cycle in practice.  However, storing a generator object in a
    global variable creates a cycle via the generator frame's
    f_globals pointer.  Another way to create a cycle would be to
    store a reference to the generator object in a data structure that
    is passed to the generator as an argument (e.g., if an object has
    a method that's a generator, and keeps a reference to a running
    iterator created by that method).  Neither of these cases
    are very likely given the typical patterns of generator use.

    Also, in the CPython implementation of this PEP, the frame object
    used by the generator should be released whenever its execution is
    terminated due to an error or normal exit.  This will ensure that
    generators that cannot be resumed do not remain part of an
    uncollectable reference cycle.  This allows other code to
    potentially use close() in a try/finally or "with" block (per PEP
    343) to ensure that a given generator is properly finalized.

Optional Extensions

  The Extended 'continue' Statement

     An earlier draft of this PEP proposed a new "continue EXPR"
     syntax for use in for-loops (carried over from PEP 340), that
     would pass the value of EXPR into the iterator being looped over.
     This feature has been withdrawn for the time being, because the
     scope of this PEP has been narrowed to focus only on passing values
     into generator-iterators, and not other kinds of iterators.  It
     was also felt by some on the Python-Dev list that adding new syntax
     for this particular feature would be premature at best.

Open Issues

    Discussion on python-dev has revealed some open issues.  I list
    them here, with my preferred resolution and its motivation.  The
    PEP as currently written reflects this preferred resolution.

    1. What exception should be raised by close() when the generator
       yields another value as a response to the GeneratorExit
       exception?

       I originally chose TypeError because it represents gross
       misbehavior of the generator function, which should be fixed by
       changing the code.  But the with_template decorator class in
       PEP 343 uses RuntimeError for similar offenses.  Arguably they
       should all use the same exception.  I'd rather not introduce a
       new exception class just for this purpose, since it's not an
       exception that I want people to catch: I want it to turn into a
       traceback which is seen by the programmer who then fixes the
       code.  So now I believe they should both raise RuntimeError.
       There are some precedents for that: it's raised by the core
       Python code in situations where endless recursion is detected,
       and for uninitialized objects (and for a variety of
       miscellaneous conditions).

    2. Oren Tirosh has proposed renaming the send() method to feed(),
       for compatibility with the "consumer interface" (see
       http://effbot.org/zone/consumer.htm for the specification.)

       However, looking more closely at the consumer interface, it seems
       that the desired semantics for feed() are different than for
       send(), because send() can't be meaningfully called on a just-
       started generator.  Also, the consumer interface as currently
       defined doesn't include handling for StopIteration.

       Therefore, it seems like it would probably be more useful to
       create a simple decorator that wraps a generator function to make
       it conform to the consumer interface.  For example, it could
       "warm up" the generator with an initial next() call, trap
       StopIteration, and perhaps even provide reset() by re-invoking
       the generator function.

Examples

    1. A simple "consumer" decorator that makes a generator function
       automatically advance to its first yield point when initially
       called:

        def consumer(func):
            def wrapper(*args,**kw):
                gen = func(*args, **kw)
                gen.next()
                return gen
            wrapper.__name__ = func.__name__
            wrapper.__dict__ = func.__dict__
            wrapper.__doc__  = func.__doc__
            return wrapper

    2. An example of using the "consumer" decorator to create a
       "reverse generator" that receives images and creates thumbnail
       pages, sending them on to another consumer.  Functions like
       this can be chained together to form efficient processing
       pipelines of "consumers" that each can have complex internal
       state:

        @consumer
        def thumbnail_pager(pagesize, thumbsize, destination):
            while True:
                page = new_image(pagesize)
                rows, columns = pagesize / thumbsize
                pending = False
                try:
                    for row in xrange(rows):
                        for column in xrange(columns):
                            thumb = create_thumbnail((yield), thumbsize)
                            page.write(
                                thumb, col*thumbsize.x, row*thumbsize.y
                            )
                            pending = True
                except GeneratorExit:
                    # close() was called, so flush any pending output
                    if pending:
                        destination.send(page)

                    # then close the downstream consumer, and exit
                    destination.close()
                    return
                else:
                    # we finished a page full of thumbnails, so send it
                    # downstream and keep on looping
                    destination.send(page)

        @consumer
        def jpeg_writer(dirname):
            fileno = 1
            while True:
                filename = os.path.join(dirname,"page%04d.jpg" % fileno)
                write_jpeg((yield), filename)
                fileno += 1


        # Put them together to make a function that makes thumbnail
        # pages from a list of images and other parameters.      
        #
        def write_thumbnails(pagesize, thumbsize, images, output_dir):
            pipeline = thumbnail_pager(
                pagesize, thumbsize, jpeg_writer(output_dir)
            )

            for image in images:
                pipeline.send(image)

            pipeline.close()

    3. A simple co-routine scheduler or "trampoline" that lets
       coroutines "call" other coroutines by yielding the coroutine
       they wish to invoke.  Any non-generator value yielded by
       a coroutine is returned to the coroutine that "called" the
       one yielding the value.  Similarly, if a coroutine raises an
       exception, the exception is propagated to its "caller".  In
       effect, this example emulates simple tasklets as are used
       in Stackless Python, as long as you use a yield expression to
       invoke routines that would otherwise "block".  This is only
       a very simple example, and far more sophisticated schedulers
       are possible.  (For example, the existing GTasklet framework
       for Python (http://www.gnome.org/~gjc/gtasklet/gtasklets.html)
       and the peak.events framework (http://peak.telecommunity.com/)
       already implement similar scheduling capabilities, but must
       currently use awkward workarounds for the inability to pass
       values or exceptions into generators.)

        import collections

        class Trampoline:
            """Manage communications between coroutines"""

            running = False

            def __init__(self):
                self.queue = collections.deque()

            def add(self, coroutine):
                """Request that a coroutine be executed"""
                self.schedule(coroutine)

            def run(self):
                result = None
                self.running = True
                try:
                    while self.running and self.queue:
                        func = self.queue.popleft()
                        result = func()
                    return result
                finally:
                    self.running = False

            def stop(self):
                self.running = False

            def schedule(self, coroutine, stack=(), value=None, *exc):
                def resume():
                    try:
                        if exc:
                            value = coroutine.throw(value,*exc)
                        else:
                            value = coroutine.send(value)
                    except:
                        if stack:
                            # send the error back to the "caller"
                            self.schedule(
                                stack[0], stack[1], *sys.exc_info()
                            )
                        else:
                            # Nothing left in this pseudothread to
                            # handle it, let it propagate to the
                            # run loop
                            raise

                    if isinstance(value, types.GeneratorType):
                        # Yielded to a specific coroutine, push the
                        # current one on the stack, and call the new
                        # one with no args
                        self.schedule(value, (coroutine,stack))

                    elif stack:
                        # Yielded a result, pop the stack and send the
                        # value to the caller
                        self.schedule(stack[0], stack[1], value)

                    # else: this pseudothread has ended

                self.queue.append(resume)

    4. A simple "echo" server, and code to run it using a trampoline
       (presumes the existence of "nonblocking_read",
       "nonblocking_write", and other I/O coroutines, that e.g. raise
       ConnectionLost if the connection is closed):

           # coroutine function that echos data back on a connected
           # socket
           #
           def echo_handler(sock):
               while True:
                   try:
                       data = yield nonblocking_read(sock)
                       yield nonblocking_write(sock, data)
                   except ConnectionLost:
                       pass  # exit normally if connection lost

           # coroutine function that listens for connections on a
           # socket, and then launches a service "handler" coroutine
           # to service the connection
           #
           def listen_on(trampoline, sock, handler):
               while True:
                   # get the next incoming connection
                   connected_socket = yield nonblocking_accept(sock)

                   # start another coroutine to handle the connection
                   trampoline.add( handler(connected_socket) )

           # Create a scheduler to manage all our coroutines
           t = Trampoline()

           # Create a coroutine instance to run the echo_handler on
           # incoming connections
           #
           server = listen_on(
               t, listening_socket("localhost","echo"), echo_handler
           )

           # Add the coroutine to the scheduler
           t.add(server)

           # loop forever, accepting connections and servicing them
           # "in parallel"
           #
           t.run()


Reference Implementation

    A prototype patch implementing all of the features described in this
    PEP is available as SourceForge patch #1223381
    (http://python.org/sf/1223381).

    This patch was committed to CVS 01-02 August 2005.


Acknowledgements

    Raymond Hettinger (PEP 288) and Samuele Pedroni (PEP 325) first
    formally proposed the ideas of communicating values or exceptions
    into generators, and the ability to "close" generators.  Timothy
    Delaney suggested the title of this PEP, and Steven Bethard helped
    edit a previous version.  See also the Acknowledgements section
    of PEP 340.

References

    TBD.

Copyright

    This document has been placed in the public domain.