python-peps/pep-0380.txt

348 lines
11 KiB
Plaintext
Raw Normal View History

PEP: 380
Title: Syntax for Delegating to a Subgenerator
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Feb-2009
Python-Version: 2.7
Post-History:
Abstract
========
A syntax is proposed for a generator to delegate part of its
operations to another generator. This allows a section of code
containing 'yield' to be factored out and placed in another
generator. Additionally, the subgenerator is allowed to return with a
value, and the value is made available to the delegating generator.
The new syntax also opens up some opportunities for optimisation when
one generator re-yields values produced by another.
Proposal
========
The following new expression syntax will be allowed in the body of a
generator:
::
yield from <expr>
where <expr> is an expression evaluating to an iterable, from which an
iterator is extracted. The iterator is run to exhaustion, during which
time it yields and receives values directly to or from the caller of
the generator containing the ``yield from`` expression (the
"delegating generator").
When the iterator is another generator, the effect is the same as if
the body of the subgenerator were inlined at the point of the ``yield
from`` expression. Furthermore, the subgenerator is allowed to execute
a ``return`` statement with a value, and that value becomes the value of
the ``yield from`` expression.
In general, the semantics can be understood in terms of the iterator
protocol as follows:
* Any values that the iterator yields are passed directly to the
caller.
* Any values sent to the delegating generator using ``send()``
are passed directly to the iterator. If the sent value is None,
the iterator's ``next()`` method is called. If the sent value is
not None, the iterator's ``send()`` method is called. Any exception
resulting from attempting to call ``next`` or ``send`` is raised
in the delegating generator.
* Calls to the ``throw()`` method of the delegating generator are
forwarded to the iterator. If the iterator does not have a
``throw()`` method, the thrown-in exception is raised in the
delegating generator.
* If the delegating generator's ``close()`` method is called, the
``close() method of the iterator is called first if it has one,
then the delegating generator is finalised.
* The value of the ``yield from`` expression is the first argument
to the ``StopIteration`` exception raised by the iterator when it
terminates.
* ``return expr`` in a generator causes ``StopIteration(expr)`` to
be raised.
For convenience, the ``StopIteration`` exception will be given a
``value`` attribute that holds its first argument, or None if there
are no arguments.
Formal Semantics
----------------
1. The statement
::
result = yield from expr
is semantically equivalent to
::
_i = iter(expr)
try:
_u = _i.next()
while 1:
try:
_v = yield _u
except Exception, _e:
_m = getattr(_i, 'throw', None)
if _m is not None:
_u = _m(_e)
else:
raise
else:
if _v is None:
_u = _i.next()
else:
_u = _i.send(_v)
except StopIteration, _e:
result = _e.value
finally:
_m = getattr(_i, 'close', None)
if _m is not None:
_m()
except that implementations are free to cache bound methods for the 'next',
'send', 'throw' and 'close' methods of the iterator.
2. In a generator, the statement
::
return value
is semantically equivalent to
::
raise StopIteration(value)
except that, as currently, the exception cannot be caught by ``except``
clauses within the returning generator.
3. The StopIteration exception behaves as though defined thusly:
::
class StopIteration(Exception):
def __init__(self, *args):
if len(args) > 0:
self.value = args[0]
else:
self.value = None
Exception.__init__(self, *args)
Rationale
=========
A Python generator is a form of coroutine, but has the limitation that
it can only yield to its immediate caller. This means that a piece of
code containing a ``yield`` cannot be factored out and put into a
separate function in the same way as other code. Performing such a
factoring causes the called function to itself become a generator, and
it is necessary to explicitly iterate over this second generator and
re-yield any values that it produces.
If yielding of values is the only concern, this is not very arduous
and can be performed with a loop such as
::
for v in g:
yield v
However, if the subgenerator is to interact properly with the caller
in the case of calls to ``send()``, ``throw()`` and ``close()``, things
become considerably more complicated. As the formal expansion presented
above illustrates, the necessary code is very longwinded, and it is tricky
to handle all the corner cases correctly. In this situation, the advantages
of a specialised syntax should be clear.
Generators as Threads
---------------------
A motivating use case for generators being able to return values
concerns the use of generators to implement lightweight threads. When
using generators in that way, it is reasonable to want to spread the
computation performed by the lightweight thread over many functions.
One would like to be able to call a subgenerator as though it were
an ordinary function, passing it parameters and receiving a returned
value.
Using the proposed syntax, a statement such as
::
y = f(x)
where f is an ordinary function, can be transformed into a delegation
call
::
y = yield from g(x)
where g is a generator. One can reason about the behaviour of the
resulting code by thinking of g as an ordinary function that can be
suspended using a ``yield`` statement.
When using generators as threads in this way, typically one is not
interested in the values being passed in or out of the yields.
However, there are use cases for this as well, where the thread is
seen as a producer or consumer of items. The ``yield from``
expression allows the logic of the thread to be spread over as
many functions as desired, with the production or consumption of
items occuring in any subfunction, and the items are automatically
routed to or from their ultimate source or destination.
Concerning ``throw()`` and ``close()``, it is reasonable to expect
that if an exception is thrown into the thread from outside, it should
first be raised in the innermost generator where the thread is suspended,
and propagate outwards from there; and that if the thread is terminated
from outside by calling ``close()``, the chain of active generators
should be finalised from the innermost outwards.
Syntax
------
The particular syntax proposed has been chosen as suggestive of its
meaning, while not introducing any new keywords and clearly standing
out as being different from a plain ``yield``.
Optimisations
-------------
Using a specialised syntax opens up possibilities for optimisation
when there is a long chain of generators. Such chains can arise, for
instance, when recursively traversing a tree structure. The overhead
of passing ``next()`` calls and yielded values down and up the chain
can cause what ought to be an O(n) operation to become O(n\*\*2).
A possible strategy is to add a slot to generator objects to hold a
generator being delegated to. When a ``next()`` or ``send()`` call is
made on the generator, this slot is checked first, and if it is
nonempty, the generator that it references is resumed instead. If it
raises StopIteration, the slot is cleared and the main generator is
resumed.
This would reduce the delegation overhead to a chain of C function
calls involving no Python code execution. A possible enhancement would
be to traverse the whole chain of generators in a loop and directly
resume the one at the end, although the handling of StopIteration is
more complicated then.
Use of StopIteration to return values
-------------------------------------
There are a variety of ways that the return value from the generator
could be passed back. Some alternatives include storing it as an
attribute of the generator-iterator object, or returning it as the
value of the ``close()`` call to the subgenerator. However, the proposed
mechanism is attractive for a couple of reasons:
* Using the StopIteration exception makes it easy for other kinds
of iterators to participate in the protocol without having to
grow extra attributes or a close() method.
* It simplifies the implementation, because the point at which the
return value from the subgenerator becomes available is the same
point at which StopIteration is raised. Delaying until any later
time would require storing the return value somewhere.
Criticisms
==========
Under this proposal, the value of a ``yield from`` expression would
be derived in a very different way from that of an ordinary ``yield``
expression. This suggests that some other syntax not containing the
word ``yield`` might be more appropriate, but no acceptable alternative
has so far been proposed.
It has been suggested that some mechanism other than ``return`` in
the subgenerator should be used to establish the value returned by
the ``yield from`` expression. However, this would interfere with
the goal of being able to think of the subgenerator as a suspendable
function, since it would not be able to return values in the same way
as other functions.
The use of an argument to StopIteration to pass the return value
has been criticised as an "abuse of exceptions", without any
concrete justification of this claim. In any case, this is only
one suggested implementation; another mechanism could be used
without losing any essential features of the proposal.
It has been suggested that a different exception, such as
GeneratorReturn, should be used instead of StopIteration to return a
value. However, no convincing practical reason for this has been put
forward, and the addition of a ``value`` attribute to StopIteration
mitigates any difficulties in extracting a return value from a
StopIteration exception that may or may not have one. Also, using a
different exception would mean that, unlike ordinary functions,
'return' without a value in a generator would not be equivalent to
'return None'.
Alternative Proposals
=====================
Proposals along similar lines have been made before, some using the
syntax ``yield *`` instead of ``yield from``. While ``yield *`` is
more concise, it could be argued that it looks too similar to an
ordinary ``yield`` and the difference might be overlooked when reading
code.
To the author's knowledge, previous proposals have focused only on
yielding values, and thereby suffered from the criticism that the
two-line for-loop they replace is not sufficiently tiresome to write
to justify a new syntax. By also dealing with calls to ``send()``,
``throw()`` and ``close()``, this proposal provides considerably more
benefit.
Implementation
==============
A `prototype implementation`_ is available, based on the first
optimisation outlined above.
.. _prototype implementation: http://www.cosc.canterbury.ac.nz/greg.ewing/python/yield-from/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: