PEP: 479
Title: Change StopIteration handling inside generators
Version: $Revision$
Last-Modified: $Date$
Author: Chris Angelico <rosuav@gmail.com>, Guido van Rossum <guido@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Nov-2014
Python-Version: 3.5
Post-History: 15-Nov-2014, 19-Nov-2014
Abstract
========
This PEP proposes a semantic change to ``StopIteration`` when raised
inside a generator. This would unify the behaviour of list
comprehensions and generator expressions, reducing surprises such as
the one that started this discussion [1]_. This is also the main
backwards incompatibility of the proposal -- any generator that
depends on raising ``StopIteration`` to terminate it will
have to be rewritten to either catch that exception or use a for-loop.
Rationale
=========
The interaction of generators and ``StopIteration`` is currently
somewhat surprising, and can conceal obscure bugs. An unexpected
exception should not result in subtly altered behaviour, but should
cause a noisy and easily-debugged traceback. Currently,
``StopIteration`` can be absorbed by the generator construct.
Background information
======================
When a generator frame is (re)started as a result of a ``__next__()``
(or ``send()`` or ``throw()``) call, one of three outcomes can occur:

* A yield point is reached, and the yielded value is returned.
* The frame is returned from; ``StopIteration`` is raised.
* An exception is raised, which bubbles out.

In the latter two cases the frame is abandoned (and the generator
object's ``gi_frame`` attribute is set to None).
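
The three outcomes can be observed directly. The following is a minimal sketch; the generator names ``counter`` and ``broken`` are illustrative and do not come from the PEP:

```python
def counter():
    yield 1                    # outcome 1: a yield point is reached
    return                     # outcome 2: the frame is returned from


def broken():
    yield 1
    raise ValueError("boom")   # outcome 3: an exception bubbles out


g = counter()
first = next(g)                # the yielded value is returned
try:
    next(g)
    returned_normally = False
except StopIteration:
    returned_normally = True   # returning from the frame raises StopIteration

b = broken()
next(b)
try:
    next(b)
    bubbled = None
except ValueError as exc:
    bubbled = exc              # the exception propagates to the caller

# In the latter two cases the frame is abandoned.
frames_abandoned = g.gi_frame is None and b.gi_frame is None
```
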
Proposal
========
If a ``StopIteration`` is about to bubble out of a generator frame, it
is replaced with ``RuntimeError``, which causes the ``next()`` call
(which invoked the generator) to fail, passing that exception out.
From then on it's just like any old exception. [3]_

This affects the third outcome listed above, without altering any
other effects.  Furthermore, it only affects this outcome when the
exception raised is ``StopIteration`` (or a subclass thereof).

Note that the proposed replacement happens at the point where the
exception is about to bubble out of the frame, i.e. after any
``except`` or ``finally`` blocks that could affect it have been
exited.  The ``StopIteration`` raised by returning from the frame is
not affected (the point being that ``StopIteration`` means that the
generator terminated "normally", i.e. it did not raise an exception).
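
As a rough illustration of the proposed semantics, the sketch below contrasts a generator that lets a stray ``StopIteration`` escape with one that simply returns. It assumes an interpreter where the proposed behaviour is active (through the ``__future__`` import described later in this document, or by default on Python 3.7 and later). The names ``leaky``, ``clean`` and ``drain`` are hypothetical:

```python
def leaky():
    yield 1
    # next() on an exhausted iterator raises StopIteration inside this frame.
    next(iter(()))
    yield 2  # never reached


def clean():
    yield 1
    return  # normal termination; still surfaces as StopIteration


def drain(gen):
    """Collect values from gen; return (values, type of terminating exception)."""
    values = []
    while True:
        try:
            values.append(next(gen))
        except Exception as exc:
            return values, type(exc)


leaky_result = drain(leaky())   # the escaping StopIteration is replaced
clean_result = drain(clean())   # the return-triggered StopIteration is untouched
```

Only the exception that escapes the frame is replaced; the ``StopIteration`` produced by ``return`` reaches the caller unchanged.
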
Consequences for existing code
==============================
This change will affect existing code that depends on
``StopIteration`` bubbling up. The pure Python reference
implementation of ``groupby`` [2]_ currently has comments "Exit on
``StopIteration``" where it is expected that the exception will
propagate and then be handled. This will be unusual, but not unknown,
and such constructs will fail. Other examples abound, e.g. [5]_, [6]_.
(Nick Coghlan comments: """If you wanted to factor out a helper
function that terminated the generator you'd have to do "return
yield from helper()" rather than just "helper()".""")
There are also examples of generator expressions floating around that
rely on a ``StopIteration`` raised by the expression, the target or the
predicate (rather than by the ``__next__()`` call implied in the ``for``
loop proper).
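
The difference between the two constructs can be sketched as follows, again assuming an interpreter with the proposed behaviour active. The helper names ``stop`` and ``outcome`` are hypothetical. Without the proposal, the generator expression would silently produce an empty list, while the list comprehension would propagate the exception:

```python
def stop():
    # Hypothetical helper standing in for a stray StopIteration.
    raise StopIteration


def outcome(thunk):
    """Return the name of the exception type raised by thunk(), or None."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


# The exception is raised by the expression, not by the implied __next__() call.
from_listcomp = outcome(lambda: [stop() for _ in range(3)])
from_genexp = outcome(lambda: list(stop() for _ in range(3)))
```

Under the proposal both forms fail noisily: the list comprehension with the original ``StopIteration``, the generator expression with ``RuntimeError``.
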
As this can break code, it is proposed to utilize the ``__future__``
mechanism to introduce this in Python 3.5, finally making it standard
in Python 3.6 or 3.7.  The proposed syntax is::

    from __future__ import generator_stop

Any generator function constructed under the influence of this
directive will have the REPLACE_STOPITERATION flag set on its code
object, and generators with the flag set will behave according to this
proposal. Once the feature becomes standard, the flag may be dropped;
code should not inspect generators for it.
Examples
--------
Generators which explicitly raise ``StopIteration`` can generally be
changed to simply return instead.  This will be compatible with all
existing Python versions, and will not be affected by ``__future__``.

Lib/ipaddress.py::

    if other == self:
        raise StopIteration

Becomes::

    if other == self:
        return

In some cases, this can be combined with ``yield from`` to simplify
the code, such as Lib/difflib.py::

    if context is None:
        while True:
            yield next(line_pair_iterator)

Becomes::

    if context is None:
        yield from line_pair_iterator
        return

(The ``return`` is necessary for a strictly-equivalent translation,
though in this particular file, there is no further code, and the
``return`` can be elided.)  For compatibility with pre-3.3 versions
of Python, this could be written with an explicit ``for`` loop::

    if context is None:
        for line in line_pair_iterator:
            yield line
        return

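
The effect of this translation can be checked with a small self-contained sketch. The function names ``old_style``, ``new_style`` and ``run`` are hypothetical, and the sketch assumes an interpreter with the proposed behaviour active:

```python
def old_style(iterator):
    while True:
        yield next(iterator)   # relies on StopIteration escaping the frame


def new_style(iterator):
    yield from iterator        # ends normally when the iterator is exhausted
    return


def run(genfunc):
    """Exhaust genfunc over a two-item iterator, reporting a RuntimeError if any."""
    try:
        return list(genfunc(iter("ab")))
    except RuntimeError:
        return "RuntimeError"


old_result = run(old_style)
new_result = run(new_style)
```
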
More complicated iteration patterns will need explicit ``try/except``
constructs.  For example, a parser construct like this::

    def parser(f):
        while True:
            data = next(f)
            while True:
                line = next(f)
                if line == "- end -": break
                data += line
            yield data

would need to be rewritten as::

    def parser(f):
        while True:
            try:
                data = next(f)
                while True:
                    line = next(f)
                    if line == "- end -": break
                    data += line
                yield data
            except StopIteration:
                return

or possibly::

    def parser(f):
        for data in f:
            while True:
                line = next(f)
                if line == "- end -": break
                data += line
            yield data

The latter form obscures the iteration by purporting to iterate over
the file with a ``for`` loop, but then also fetches more data from
the same iterator during the loop body. It does, however, clearly
differentiate between a "normal" termination (``StopIteration``
instead of the initial line) and an "abnormal" termination (failing
to find the end marker in the inner loop, which will now raise
``RuntimeError``).
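
The distinction between the two kinds of termination can be seen by feeding the ``for``-loop form a complete and a truncated input. This is a sketch only: the sample data is invented, and the interpreter is assumed to have the proposed behaviour active:

```python
def parser(f):
    for data in f:
        while True:
            line = next(f)
            if line == "- end -":
                break
            data += line
        yield data


# Every record is closed by an end marker: the generator terminates normally.
complete = iter(["a", "b", "- end -", "c", "- end -"])
records = list(parser(complete))

# The final end marker is missing: the inner next() raises StopIteration,
# which is replaced by RuntimeError as it leaves the generator frame.
truncated = iter(["a", "b", "- end -", "c"])
try:
    list(parser(truncated))
    error = None
except RuntimeError as exc:
    error = exc
```
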
Explanation of generators, iterators, and StopIteration
=======================================================
Under this proposal, generators and iterators would be distinct, but
related, concepts. Like the mixing of text and bytes in Python 2,
the mixing of generators and iterators has resulted in certain
perceived conveniences, but proper separation will make bugs more
visible.
An iterator is an object with a ``__next__`` method. Like many other
dunder methods, it may either return a value, or raise a specific
exception - in this case, ``StopIteration`` - to signal that it has
no value to return. In this, it is similar to ``__getattr__`` (can
raise ``AttributeError``), ``__getitem__`` (can raise ``KeyError``),
and so on. A helper function for an iterator can be written to
follow the same protocol; for example::

    def helper(x, y):
        if x > y: return 1 / (x - y)
        raise StopIteration

    def __next__(self):
        if self.a: return helper(self.b, self.c)
        return helper(self.d, self.e)

Both forms of signalling are carried through: a returned value is
returned, an exception bubbles up. The helper is written to match
the protocol of the calling function.
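
A runnable version of the iterator fragment might look like the sketch below. The class name ``Ratio`` and its constructor are hypothetical scaffolding added so the fragment can execute; only ``helper`` and ``__next__`` follow the text above. Since no generator frame is involved, the proposal does not change this code:

```python
def helper(x, y):
    if x > y:
        return 1 / (x - y)
    raise StopIteration      # follows the __next__ protocol on behalf of its caller


class Ratio:
    """Hypothetical iterator; the attribute names mirror the fragment above."""

    def __init__(self, a, b, c, d, e):
        self.a, self.b, self.c, self.d, self.e = a, b, c, d, e

    def __iter__(self):
        return self

    def __next__(self):
        if self.a:
            return helper(self.b, self.c)
        return helper(self.d, self.e)


value = next(Ratio(True, 3, 1, 0, 0))        # helper returns 1 / (3 - 1)
exhausted = list(Ratio(False, 0, 0, 1, 2))   # helper raises StopIteration at once
```
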
A generator function is one which contains a ``yield`` expression.
Each time it is (re)started, it may either yield a value, or return
(including "falling off the end"). A helper function for a generator
can also be written, but it must also follow generator protocol::

    def helper(x, y):
        if x > y: yield 1 / (x - y)

    def gen(self):
        if self.a: return (yield from helper(self.b, self.c))
        return (yield from helper(self.d, self.e))

In both cases, any unexpected exception will bubble up. Due to the
nature of generators and iterators, an unexpected ``StopIteration``
inside a generator will be converted into ``RuntimeError``, but
beyond that, all exceptions will propagate normally.
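
The generator counterpart can be exercised in the same way. As before, the class name ``Source`` and its constructor are hypothetical scaffolding; ``helper`` and ``gen`` follow the fragment above:

```python
def helper(x, y):
    if x > y:
        yield 1 / (x - y)
    # Falling off the end terminates the sub-generator normally.


class Source:
    """Hypothetical container; the attribute names mirror the fragment above."""

    def __init__(self, a, b, c, d, e):
        self.a, self.b, self.c, self.d, self.e = a, b, c, d, e

    def gen(self):
        if self.a:
            return (yield from helper(self.b, self.c))
        return (yield from helper(self.d, self.e))


produced = list(Source(True, 3, 1, 0, 0).gen())   # helper yields 1 / (3 - 1)
empty = list(Source(False, 0, 0, 1, 2).gen())     # helper yields nothing
```

Here the helper signals "no value" by returning rather than by raising, so no ``StopIteration`` ever needs to cross a generator frame explicitly.
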
Alternate proposals
===================
Raising something other than RuntimeError
-----------------------------------------
Rather than the generic ``RuntimeError``, it might make sense to raise
a new exception type ``UnexpectedStopIteration``. This has the
downside of implicitly encouraging that it be caught; the correct
action is to catch the original ``StopIteration``, not the chained
exception.
Supplying a specific exception to raise on return
-------------------------------------------------
Nick Coghlan suggested a means of providing a specific
``StopIteration`` instance to the generator; if any other instance of
``StopIteration`` is raised, it is an error, but if that particular
one is raised, the generator has properly completed. This subproposal
has been withdrawn in favour of better options, but is retained for
reference.
Making return-triggered StopIterations obvious
----------------------------------------------
For certain situations, a simpler and fully backward-compatible
solution may be sufficient: when a generator returns, instead of
raising ``StopIteration``, it raises a specific subclass of
``StopIteration`` (``GeneratorReturn``) which can then be detected.
If it is not that subclass, it is an escaping exception rather than a
return statement.
The inspiration for this alternative proposal was Nick's observation
[7]_ that if an ``asyncio`` coroutine [8]_ accidentally raises
``StopIteration``, it currently terminates silently, which may present
a hard-to-debug mystery to the developer. The main proposal turns
such accidents into clearly distinguishable ``RuntimeError`` exceptions,
but if that is rejected, this alternate proposal would enable
``asyncio`` to distinguish between a ``return`` statement and an
accidentally-raised ``StopIteration`` exception.
Of the three outcomes listed above, two change:

* If a yield point is reached, the value, obviously, would still be
  returned.
* If the frame is returned from, ``GeneratorReturn`` (rather than
  ``StopIteration``) is raised.
* If an instance of ``GeneratorReturn`` would be raised, instead an
  instance of ``StopIteration`` would be raised.  Any other exception
  bubbles up normally.

In the third case, the ``StopIteration`` would have the ``value`` of
the original ``GeneratorReturn``, and would reference the original
exception in its ``__cause__``. If uncaught, this would clearly show
the chaining of exceptions.
This alternative does *not* affect the discrepancy between generator
expressions and list comprehensions, but allows generator-aware code
(such as the ``contextlib`` and ``asyncio`` modules) to reliably
differentiate between the second and third outcomes listed above.
However, once code exists that depends on this distinction between
``GeneratorReturn`` and ``StopIteration``, a generator that invokes
another generator and relies on the latter's ``StopIteration`` to
bubble out would still be potentially wrong, depending on the use made
of the distinction between the two exception types.
Criticism
=========
Unofficial and apocryphal statistics suggest that this is seldom, if
ever, a problem. [4]_ Code does exist which relies on the current
behaviour (e.g. [2]_, [5]_, [6]_), and there is the concern that this
would be unnecessary code churn to achieve little or no gain.
Steven D'Aprano started an informal survey on comp.lang.python [9]_;
at the time of writing only two responses have been received: one was
in favor of changing list comprehensions to match generator
expressions (!), the other was in favor of this PEP's main proposal.
The existing model has been compared to the perfectly-acceptable
issues inherent to every other case where an exception has special
meaning. For instance, an unexpected ``KeyError`` inside a
``__getitem__`` method will be interpreted as failure, rather than
permitted to bubble up. However, there is a difference. Dunder
methods use ``return`` to indicate normality, and ``raise`` to signal
abnormality; generators ``yield`` to indicate data, and ``return`` to
signal the abnormal state. This makes explicitly raising
``StopIteration`` entirely redundant, and potentially surprising. If
other dunder methods had dedicated keywords to distinguish between
their return paths, they too could turn unexpected exceptions into
``RuntimeError``; the fact that they cannot should not preclude
generators from doing so.
References
==========
.. [1] Initial mailing list comment
(https://mail.python.org/pipermail/python-ideas/2014-November/029906.html)
.. [2] Pure Python implementation of groupby
(https://docs.python.org/3/library/itertools.html#itertools.groupby)
.. [3] Proposal by GvR
(https://mail.python.org/pipermail/python-ideas/2014-November/029953.html)
.. [4] Response by Steven D'Aprano
(https://mail.python.org/pipermail/python-ideas/2014-November/029994.html)
.. [5] Split a sequence or generator using a predicate
(http://code.activestate.com/recipes/578416-split-a-sequence-or-generator-using-a-predicate/)
.. [6] wrap unbounded generator to restrict its output
(http://code.activestate.com/recipes/66427-wrap-unbounded-generator-to-restrict-its-output/)
.. [7] Post from Nick Coghlan mentioning asyncio
(https://mail.python.org/pipermail/python-ideas/2014-November/029961.html)
.. [8] Coroutines in asyncio
(https://docs.python.org/3/library/asyncio-task.html#coroutines)
.. [9] Thread on comp.lang.python started by Steven D'Aprano
(https://mail.python.org/pipermail/python-list/2014-November/680757.html)
.. [10] Tracker issue with Proof-of-Concept patch
(http://bugs.python.org/issue22906)
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: