PEP: 479 Title: Change StopIteration handling inside generators Version: $Revision$ Last-Modified: $Date$ Author: Chris Angelico , Guido van Rossum Status: Accepted Type: Standards Track Content-Type: text/x-rst Created: 15-Nov-2014 Python-Version: 3.5 Post-History: 15-Nov-2014, 19-Nov-2014, 5-Dec-2014 Abstract ======== This PEP proposes a change to generators: when ``StopIteration`` is raised inside a generator, it is replaced it with ``RuntimeError``. (More precisely, this happens when the exception is about to bubble out of the generator's stack frame.) Because the change is backwards incompatible, the feature is initially introduced using a ``__future__`` statement. Acceptance ========== This PEP was accepted by the BDFL on November 22. Because of the exceptionally short period from first draft to acceptance, the main objections brought up after acceptance were carefully considered and have been reflected in the "Alternate proposals" section below. However, none of the discussion changed the BDFL's mind and the PEP's acceptance is now final. (Suggestions for clarifying edits are still welcome -- unlike IETF RFCs, the text of a PEP is not cast in stone after its acceptance, although the core design/plan/specification should not change after acceptance.) Rationale ========= The interaction of generators and ``StopIteration`` is currently somewhat surprising, and can conceal obscure bugs. An unexpected exception should not result in subtly altered behaviour, but should cause a noisy and easily-debugged traceback. Currently, ``StopIteration`` can be absorbed by the generator construct. The main goal of the proposal is to ease debugging in the situation where an unguarded ``next()`` call (perhaps several stack frames deep) raises ``StopIteration`` and causes the iteration controlled by the generator to terminate silently. (When another exception is raised, a traceback is printed pinpointing the cause of the problem.) This is particularly pernicious in combination with the ``yield from`` construct of PEP 380 [1]_, as it breaks the abstraction that a subgenerator may be factored out of a generator. That PEP notes this limitation, but notes that "use cases for these [are] rare to non- existent". Unfortunately while intentional use is rare, it is easy to stumble on these cases by accident:: import contextlib @contextlib.contextmanager def transaction(): print('begin') try: yield from do_it() except: print('rollback') raise else: print('commit') def do_it(): print('Refactored initial setup') yield # Body of with-statement is executed here print('Refactored finalization of successful transaction') def gene(): for i in range(2): with transaction(): yield i # return raise StopIteration # This is wrong print('Should not be reached') for i in gene(): print('main: i =', i) Here factoring out ``do_it`` into a subgenerator has introduced a subtle bug: if the wrapped block raises ``StopIteration``, under the current behavior this exception will be swallowed by the context manager; and, worse, the finalization is silently skipped! Similarly problematic behavior occurs when an ``asyncio`` coroutine raises ``StopIteration``, causing it to terminate silently, or when ``next`` is used to take the first result from an iterator that unexpectedly turns out to be empty, for example:: # using the same context manager as above import pathlib with transaction(): print('commit file {}'.format( # I can never remember what the README extension is next(pathlib.Path('/some/dir').glob('README*')))) In both cases, the refactoring abstraction of ``yield from`` breaks in the presence of bugs in client code. Additionally, the proposal reduces the difference between list comprehensions and generator expressions, preventing surprises such as the one that started this discussion [2]_. Henceforth, the following statements will produce the same result if either produces a result at all:: a = list(F(x) for x in xs if P(x)) a = [F(x) for x in xs if P(x)] With the current state of affairs, it is possible to write a function ``F(x)`` or a predicate ``P(x)`` that causes the first form to produce a (truncated) result, while the second form raises an exception (namely, ``StopIteration``). With the proposed change, both forms will raise an exception at this point (albeit ``RuntimeError`` in the first case and ``StopIteration`` in the second). Finally, the proposal also clears up the confusion about how to terminate a generator: the proper way is ``return``, not ``raise StopIteration``. As an added bonus, the above changes bring generator functions much more in line with regular functions. If you wish to take a piece of code presented as a generator and turn it into something else, you can usually do this fairly simply, by replacing every ``yield`` with a call to ``print()`` or ``list.append()``; however, if there are any bare ``next()`` calls in the code, you have to be aware of them. If the code was originally written without relying on ``StopIteration`` terminating the function, the transformation would be that much easier. Background information ====================== When a generator frame is (re)started as a result of a ``__next__()`` (or ``send()`` or ``throw()``) call, one of three outcomes can occur: * A yield point is reached, and the yielded value is returned. * The frame is returned from; ``StopIteration`` is raised. * An exception is raised, which bubbles out. In the latter two cases the frame is abandoned (and the generator object's ``gi_frame`` attribute is set to None). Proposal ======== If a ``StopIteration`` is about to bubble out of a generator frame, it is replaced with ``RuntimeError``, which causes the ``next()`` call (which invoked the generator) to fail, passing that exception out. From then on it's just like any old exception. [3]_ This affects the third outcome listed above, without altering any other effects. Furthermore, it only affects this outcome when the exception raised is ``StopIteration`` (or a subclass thereof). Note that the proposed replacement happens at the point where the exception is about to bubble out of the frame, i.e. after any ``except`` or ``finally`` blocks that could affect it have been exited. The ``StopIteration`` raised by returning from the frame is not affected (the point being that ``StopIteration`` means that the generator terminated "normally", i.e. it did not raise an exception). A subtle issue is what will happen if the caller, having caught the ``RuntimeError``, calls the generator object's ``__next__()`` method again. The answer is that from this point on it will raise ``StopIteration`` -- the behavior is the same as when any other exception was raised by the generator. Another logical consequence of the proposal: if someone uses ``g.throw(StopIteration)`` to throw a ``StopIteration`` exception into a generator, if the generator doesn't catch it (which it could do using a ``try/except`` around the ``yield``), it will be transformed into ``RuntimeError``. During the transition phase, the new feature must be enabled per-module using:: from __future__ import generator_stop Any generator function constructed under the influence of this directive will have the ``REPLACE_STOPITERATION`` flag set on its code object, and generators with the flag set will behave according to this proposal. Once the feature becomes standard, the flag may be dropped; code should not inspect generators for it. A proof-of-concept patch has been created to facilitate testing. [4]_ Consequences for existing code ============================== This change will affect existing code that depends on ``StopIteration`` bubbling up. The pure Python reference implementation of ``groupby`` [5]_ currently has comments "Exit on ``StopIteration``" where it is expected that the exception will propagate and then be handled. This will be unusual, but not unknown, and such constructs will fail. Other examples abound, e.g. [6]_, [7]_. (Nick Coghlan comments: """If you wanted to factor out a helper function that terminated the generator you'd have to do "return yield from helper()" rather than just "helper()".""") There are also examples of generator expressions floating around that rely on a ``StopIteration`` raised by the expression, the target or the predicate (rather than by the ``__next__()`` call implied in the ``for`` loop proper). Writing backwards and forwards compatible code ---------------------------------------------- With the exception of hacks that raise ``StopIteration`` to exit a generator expression, it is easy to write code that works equally well under older Python versions as under the new semantics. This is done by enclosing those places in the generator body where a ``StopIteration`` is expected (e.g. bare ``next()`` calls or in some cases helper functions that are expected to raise ``StopIteration``) in a ``try/except`` construct that returns when ``StopIteration`` is raised. The ``try/except`` construct should appear directly in the generator function; doing this in a helper function that is not itself a generator does not work. If ``raise StopIteration`` occurs directly in a generator, simply replace it with ``return``. Examples of breakage -------------------- Generators which explicitly raise ``StopIteration`` can generally be changed to simply return instead. This will be compatible with all existing Python versions, and will not be affected by ``__future__``. Here are some illustrations from the standard library. Lib/ipaddress.py:: if other == self: raise StopIteration Becomes:: if other == self: return In some cases, this can be combined with ``yield from`` to simplify the code, such as Lib/difflib.py:: if context is None: while True: yield next(line_pair_iterator) Becomes:: if context is None: yield from line_pair_iterator return (The ``return`` is necessary for a strictly-equivalent translation, though in this particular file, there is no further code, and the ``return`` can be omitted.) For compatibility with pre-3.3 versions of Python, this could be written with an explicit ``for`` loop:: if context is None: for line in line_pair_iterator: yield line return More complicated iteration patterns will need explicit ``try/except`` constructs. For example, a hypothetical parser like this:: def parser(f): while True: data = next(f) while True: line = next(f) if line == "- end -": break data += line yield data would need to be rewritten as:: def parser(f): while True: try: data = next(f) while True: line = next(f) if line == "- end -": break data += line yield data except StopIteration: return or possibly:: def parser(f): for data in f: while True: line = next(f) if line == "- end -": break data += line yield data The latter form obscures the iteration by purporting to iterate over the file with a ``for`` loop, but then also fetches more data from the same iterator during the loop body. It does, however, clearly differentiate between a "normal" termination (``StopIteration`` instead of the initial line) and an "abnormal" termination (failing to find the end marker in the inner loop, which will now raise ``RuntimeError``). This effect of ``StopIteration`` has been used to cut a generator expression short, creating a form of ``takewhile``:: def stop(): raise StopIteration print(list(x for x in range(10) if x < 5 or stop())) # prints [0, 1, 2, 3, 4] Under the current proposal, this form of non-local flow control is not supported, and would have to be rewritten in statement form:: def gen(): for x in range(10): if x >= 5: return yield x print(list(gen())) # prints [0, 1, 2, 3, 4] While this is a small loss of functionality, it is functionality that often comes at the cost of readability, and just as ``lambda`` has restrictions compared to ``def``, so does a generator expression have restrictions compared to a generator function. In many cases, the transformation to full generator function will be trivially easy, and may improve structural clarity. Explanation of generators, iterators, and StopIteration ======================================================= Under this proposal, generators and iterators would be distinct, but related, concepts. Like the mixing of text and bytes in Python 2, the mixing of generators and iterators has resulted in certain perceived conveniences, but proper separation will make bugs more visible. An iterator is an object with a ``__next__`` method. Like many other special methods, it may either return a value, or raise a specific exception - in this case, ``StopIteration`` - to signal that it has no value to return. In this, it is similar to ``__getattr__`` (can raise ``AttributeError``), ``__getitem__`` (can raise ``KeyError``), and so on. A helper function for an iterator can be written to follow the same protocol; for example:: def helper(x, y): if x > y: return 1 / (x - y) raise StopIteration def __next__(self): if self.a: return helper(self.b, self.c) return helper(self.d, self.e) Both forms of signalling are carried through: a returned value is returned, an exception bubbles up. The helper is written to match the protocol of the calling function. A generator function is one which contains a ``yield`` expression. Each time it is (re)started, it may either yield a value, or return (including "falling off the end"). A helper function for a generator can also be written, but it must also follow generator protocol:: def helper(x, y): if x > y: yield 1 / (x - y) def gen(self): if self.a: return (yield from helper(self.b, self.c)) return (yield from helper(self.d, self.e)) In both cases, any unexpected exception will bubble up. Due to the nature of generators and iterators, an unexpected ``StopIteration`` inside a generator will be converted into ``RuntimeError``, but beyond that, all exceptions will propagate normally. Transition plan =============== - Python 3.5: Enable new semantics under ``__future__`` import; silent deprecation warning if ``StopIteration`` bubbles out of a generator not under ``__future__`` import. - Python 3.6: Non-silent deprecation warning. - Python 3.7: Enable new semantics everywhere. Alternate proposals =================== Raising something other than RuntimeError ----------------------------------------- Rather than the generic ``RuntimeError``, it might make sense to raise a new exception type ``UnexpectedStopIteration``. This has the downside of implicitly encouraging that it be caught; the correct action is to catch the original ``StopIteration``, not the chained exception. Supplying a specific exception to raise on return ------------------------------------------------- Nick Coghlan suggested a means of providing a specific ``StopIteration`` instance to the generator; if any other instance of ``StopIteration`` is raised, it is an error, but if that particular one is raised, the generator has properly completed. This subproposal has been withdrawn in favour of better options, but is retained for reference. Making return-triggered StopIterations obvious ---------------------------------------------- For certain situations, a simpler and fully backward-compatible solution may be sufficient: when a generator returns, instead of raising ``StopIteration``, it raises a specific subclass of ``StopIteration`` (``GeneratorReturn``) which can then be detected. If it is not that subclass, it is an escaping exception rather than a return statement. The inspiration for this alternative proposal was Nick's observation [8]_ that if an ``asyncio`` coroutine [9]_ accidentally raises ``StopIteration``, it currently terminates silently, which may present a hard-to-debug mystery to the developer. The main proposal turns such accidents into clearly distinguishable ``RuntimeError`` exceptions, but if that is rejected, this alternate proposal would enable ``asyncio`` to distinguish between a ``return`` statement and an accidentally-raised ``StopIteration`` exception. Of the three outcomes listed above, two change: * If a yield point is reached, the value, obviously, would still be returned. * If the frame is returned from, ``GeneratorReturn`` (rather than ``StopIteration``) is raised. * If an instance of ``GeneratorReturn`` would be raised, instead an instance of ``StopIteration`` would be raised. Any other exception bubbles up normally. In the third case, the ``StopIteration`` would have the ``value`` of the original ``GeneratorReturn``, and would reference the original exception in its ``__cause__``. If uncaught, this would clearly show the chaining of exceptions. This alternative does *not* affect the discrepancy between generator expressions and list comprehensions, but allows generator-aware code (such as the ``contextlib`` and ``asyncio`` modules) to reliably differentiate between the second and third outcomes listed above. However, once code exists that depends on this distinction between ``GeneratorReturn`` and ``StopIteration``, a generator that invokes another generator and relies on the latter's ``StopIteration`` to bubble out would still be potentially wrong, depending on the use made of the distinction between the two exception types. Converting the exception inside next() -------------------------------------- Mark Shannon suggested [10]_ that the problem could be solved in ``next()`` rather than at the boundary of generator functions. By having ``next()`` catch ``StopIteration`` and raise instead ``ValueError``, all unexpected ``StopIteration`` bubbling would be prevented; however, the backward-incompatibility concerns are far more serious than for the current proposal, as every ``next()`` call now needs to be rewritten to guard against ``ValueError`` instead of ``StopIteration`` - not to mention that there is no way to write one block of code which reliably works on multiple versions of Python. (Using a dedicated exception type, perhaps subclassing ``ValueError``, would help this; however, all code would still need to be rewritten.) Sub-proposal: decorator to explicitly request current behaviour --------------------------------------------------------------- Nick Coghlan suggested [11]_ that the situations where the current behaviour is desired could be supported by means of a decorator:: from itertools import allow_implicit_stop @allow_implicit_stop def my_generator(): ... yield next(it) ... Which would be semantically equivalent to:: def my_generator(): try: ... yield next(it) ... except StopIteration return but be faster, as it could be implemented by simply permitting the ``StopIteration`` to bubble up directly. Single-source Python 2/3 code would also benefit in a 3.7+ world, since libraries like six and python-future could just define their own version of "allow_implicit_stop" that referred to the new builtin in 3.5+, and was implemented as an identity function in other versions. However, due to the implementation complexities required, the ongoing compatibility issues created, the subtlety of the decorator's effect, and the fact that it would encourage the "quick-fix" solution of just slapping the decorator onto all generators instead of properly fixing the code in question, this sub-proposal has been rejected. [12]_ Criticism ========= Unofficial and apocryphal statistics suggest that this is seldom, if ever, a problem. [13]_ Code does exist which relies on the current behaviour (e.g. [3]_, [6]_, [7]_), and there is the concern that this would be unnecessary code churn to achieve little or no gain. Steven D'Aprano started an informal survey on comp.lang.python [14]_; at the time of writing only two responses have been received: one was in favor of changing list comprehensions to match generator expressions (!), the other was in favor of this PEP's main proposal. The existing model has been compared to the perfectly-acceptable issues inherent to every other case where an exception has special meaning. For instance, an unexpected ``KeyError`` inside a ``__getitem__`` method will be interpreted as failure, rather than permitted to bubble up. However, there is a difference. Special methods use ``return`` to indicate normality, and ``raise`` to signal abnormality; generators ``yield`` to indicate data, and ``return`` to signal the abnormal state. This makes explicitly raising ``StopIteration`` entirely redundant, and potentially surprising. If other special methods had dedicated keywords to distinguish between their return paths, they too could turn unexpected exceptions into ``RuntimeError``; the fact that they cannot should not preclude generators from doing so. References ========== .. [1] PEP 380 - Syntax for Delegating to a Subgenerator (https://www.python.org/dev/peps/pep-0380) .. [2] Initial mailing list comment (https://mail.python.org/pipermail/python-ideas/2014-November/029906.html) .. [3] Proposal by GvR (https://mail.python.org/pipermail/python-ideas/2014-November/029953.html) .. [4] Tracker issue with Proof-of-Concept patch (http://bugs.python.org/issue22906) .. [5] Pure Python implementation of groupby (https://docs.python.org/3/library/itertools.html#itertools.groupby) .. [6] Split a sequence or generator using a predicate (http://code.activestate.com/recipes/578416-split-a-sequence-or-generator-using-a-predicate/) .. [7] wrap unbounded generator to restrict its output (http://code.activestate.com/recipes/66427-wrap-unbounded-generator-to-restrict-its-output/) .. [8] Post from Nick Coghlan mentioning asyncio (https://mail.python.org/pipermail/python-ideas/2014-November/029961.html) .. [9] Coroutines in asyncio (https://docs.python.org/3/library/asyncio-task.html#coroutines) .. [10] Post from Mark Shannon with alternate proposal (https://mail.python.org/pipermail/python-dev/2014-November/137129.html) .. [11] Idea from Nick Coghlan (https://mail.python.org/pipermail/python-dev/2014-November/137201.html) .. [12] Rejection of above idea by GvR (https://mail.python.org/pipermail/python-dev/2014-November/137243.html) .. [13] Response by Steven D'Aprano (https://mail.python.org/pipermail/python-ideas/2014-November/029994.html) .. [14] Thread on comp.lang.python started by Steven D'Aprano (https://mail.python.org/pipermail/python-list/2014-November/680757.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: