PEP: 340
Title: Anonymous Block Statements
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 27-Apr-2005
Post-History:

Introduction

    This PEP proposes a new type of compound statement which can be
    used for resource management purposes, and a new iterator API to
    go with it. The new statement type is provisionally called the
    block-statement because the keyword to be used has not yet been
    chosen.

    This PEP competes with several other PEPs: PEP 288 (Generators
    Attributes and Exceptions; only the second part), PEP 310
    (Reliable Acquisition/Release Pairs), and PEP 325
    (Resource-Release Support for Generators).

    This proposal is just a strawman; we've had a heated debate about
    this on python-dev recently [1], and I figured it would be time to
    write up a precise spec in PEP form.

Motivation and Summary

    (Thanks to Shane Hathaway -- Hi Shane!)

    Good programmers move commonly used code into reusable functions.
    Sometimes, however, patterns arise in the structure of the
    functions rather than the actual sequence of statements. For
    example, many functions acquire a lock, execute some code specific
    to that function, and unconditionally release the lock. Repeating
    the locking code in every function that uses it is error prone and
    makes refactoring difficult.

    Block statements provide a mechanism for encapsulating patterns of
    structure. Code inside the block statement runs under the control
    of an object called a block iterator. Simple block iterators
    execute code before and after the code inside the block statement.
    Block iterators also have the opportunity to execute the
    controlled code more than once (or not at all), catch exceptions,
    or receive data from the body of the block statement.

    A convenient way to write block iterators is to write a generator
    (PEP 255).
    A generator looks a lot like a Python function, but instead of
    returning a value immediately, generators pause their execution at
    "yield" statements. When a generator is used as a block iterator,
    the yield statement tells the Python interpreter to suspend the
    block iterator, execute the block statement body, and resume the
    block iterator when the body has executed.

    The Python interpreter behaves as follows when it encounters a
    block statement based on a generator. First, the interpreter
    instantiates the generator and begins executing it. The generator
    does setup work appropriate to the pattern it encapsulates, such
    as acquiring a lock, opening a file, starting a database
    transaction, or starting a loop. Then the generator yields
    execution to the body of the block statement using a yield
    statement. When the block statement body completes, raises an
    uncaught exception, or sends data back to the generator using a
    continue statement, the generator resumes. At this point, the
    generator can either clean up and stop or yield again, causing the
    block statement body to execute again. When the generator
    finishes, the interpreter leaves the block statement.

Use Cases

    TBD. For now, see the Examples section near the end.

Specification: the __next__() Method

    A new method for iterators is proposed, called __next__(). It
    takes one optional argument, which defaults to None. Calling the
    __next__() method without argument or with None is equivalent to
    using the old iterator API, next(). For backwards compatibility,
    it is recommended that iterators also implement a next() method as
    an alias for calling the __next__() method without an argument.

    The argument to the __next__() method may be used by the iterator
    as a hint on what to do next.

Specification: the __exit__() Method

    An optional new method for iterators is proposed, called
    __exit__(). It takes up to three arguments which correspond to
    the three "arguments" to the raise-statement: type, value, and
    traceback.
    If all three arguments are None, sys.exc_info() may be consulted
    to provide suitable default values.

Specification: the next() Built-in Function

    This is a built-in function defined as follows:

        def next(itr, arg=None):
            nxt = getattr(itr, "__next__", None)
            if nxt is not None:
                return nxt(arg)
            if arg is None:
                return itr.next()
            raise TypeError("next() with arg for old-style iterator")

    This function is proposed because there is often a need to call
    the next() method outside a for-loop; the new API, and the
    backwards compatibility code, is too ugly to have to repeat in
    user code.

    Note that I'm not proposing a built-in function to call the
    __exit__() method of an iterator. I don't expect that this will
    be called much outside the block-statement.

Specification: a Change to the 'for' Loop

    A small change in the translation of the for-loop is proposed.
    The statement

        for VAR1 in EXPR1:
            BLOCK1
        else:
            BLOCK2

    will be translated as follows:

        itr = iter(EXPR1)
        arg = None
        brk = False
        while True:
            try:
                VAR1 = next(itr, arg)
            except StopIteration:
                brk = True
                break
            arg = None
            BLOCK1
        if brk:
            BLOCK2

    (However, the variables 'itr' etc. are not user-visible and the
    built-in names used cannot be overridden by the user.)

Specification: the Extended 'continue' Statement

    In the translation of the for-loop, inside BLOCK1, the new syntax

        continue EXPR2

    is legal and is translated into

        arg = EXPR2
        continue

    (Where 'arg' references the corresponding hidden variable from the
    previous section.)

    This is also the case in the body of the block-statement proposed
    below.

Specification: the Anonymous Block Statement

    A new statement is proposed with the syntax

        block EXPR1 as VAR1:
            BLOCK1
        else:
            BLOCK2

    Here, 'block' and 'as' are new keywords; EXPR1 is an arbitrary
    expression (but not an expression-list) and VAR1 is an arbitrary
    assignment target (which may be a comma-separated list).
    The "as VAR1" part is optional; if omitted, the assignments to
    VAR1 in the translation below are omitted (but the expressions
    assigned are still evaluated!).

    The choice of the 'block' keyword is contentious; many
    alternatives have been proposed, including not to use a keyword at
    all (which I actually like). PEP 310 uses 'with' for similar
    semantics, but I would like to reserve that for a with-statement
    similar to the one found in Pascal and VB. (Though I just found
    that the C# designers don't like 'with' [2], and I have to agree
    with their reasoning.) To sidestep this issue momentarily I'm
    using 'block' until we can agree on the right keyword, if any.

    Note that the 'as' keyword is not contentious (it will finally be
    elevated to proper keyword status).

    Note that it is left open whether a block-statement represents a
    loop or not; this is up to the iterator, but in the most common
    case BLOCK1 is executed exactly once.

    The translation is subtly different from the translation of a
    for-loop: iter() is not called, so EXPR1 should already be an
    iterator (not just an iterable); and the iterator is guaranteed to
    be notified when the block-statement is left, regardless of
    whether this is due to a break, return or exception:

        itr = EXPR1  # The iterator
        ret = False  # True if a return statement is active
        val = None   # Return value, if ret == True
        arg = None   # Argument to __next__() (value from continue)
        exc = None   # sys.exc_info() tuple if an exception is active
        while True:
            try:
                if exc:
                    ext = getattr(itr, "__exit__", None)
                    if ext is not None:
                        VAR1 = ext(*exc)   # May re-raise *exc
                    else:
                        raise *exc   # Well, the moral equivalent :-)
                else:
                    VAR1 = next(itr, arg)  # May raise StopIteration
            except StopIteration:
                if ret:
                    return val
                break
            try:
                ret = False
                val = arg = exc = None
                BLOCK1
            except:
                exc = sys.exc_info()

    (Again, the variables and built-ins are hidden from the user.)
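
    The translation above can be approximated in present-day Python.
    What follows is only a rough sketch under stated assumptions, not
    part of the proposal: run_block() and OneShot are hypothetical
    names, a callable stands in for BLOCK1, and because a callable
    cannot break out of or return from its caller, only normal
    completion and exceptions are modelled here.

```python
import sys


class OneShot:
    """Hypothetical hand-written block iterator: runs the body once."""

    def __init__(self, log):
        self.log = log
        self.state = "new"

    def __next__(self, arg=None):
        # First call: do the setup and hand a value to the body.
        if self.state == "new":
            self.state = "active"
            self.log.append("setup")
            return "resource"
        # Later call: the body finished normally, so clean up and stop.
        self._cleanup()
        raise StopIteration

    def __exit__(self, typ, value=None, tb=None):
        # The body raised: clean up, then re-raise the same exception.
        self._cleanup()
        raise value if value is not None else typ()

    def _cleanup(self):
        if self.state == "active":
            self.state = "done"
            self.log.append("cleanup")


def run_block(itr, body):
    """Sketch of the translation; body(VAR1) stands in for BLOCK1."""
    arg = None  # would carry the value of "continue EXPR2"
    exc = None  # sys.exc_info() tuple if an exception is active
    while True:
        try:
            if exc:
                ext = getattr(itr, "__exit__", None)
                if ext is not None:
                    var1 = ext(*exc)       # may re-raise the exception
                else:
                    raise exc[1]
            else:
                var1 = itr.__next__(arg)   # may raise StopIteration
        except StopIteration:
            break
        try:
            arg = exc = None
            body(var1)
        except Exception:  # the PEP uses a bare except here
            exc = sys.exc_info()


def failing_body(resource):
    raise ValueError("boom")


ok_log = []
run_block(OneShot(ok_log), lambda res: ok_log.append("body got " + res))

err_log = []
try:
    run_block(OneShot(err_log), failing_body)
except ValueError:
    err_log.append("ValueError propagated")
```

    In both runs the iterator is notified before the block is left:
    ok_log ends up as setup, body, cleanup, and err_log shows the
    cleanup happening before the ValueError reaches the caller.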

    Inside BLOCK1, the following special translations apply:

    - "continue" and "continue EXPR2" are always legal; the latter is
      translated as shown earlier:

        arg = EXPR2
        continue

    - "break" is always legal; it is translated into:

        exc = (StopIteration,)
        continue

    - "return EXPR3" is only legal when the block-statement is
      contained in a function definition; it is translated into:

        exc = (StopIteration,)
        ret = True
        val = EXPR3
        continue

    The net effect is that break, continue and return behave much the
    same as if the block-statement were a for-loop, except that the
    iterator gets a chance at resource cleanup before the
    block-statement is left, through the optional __exit__() method.
    The iterator also gets a chance if the block-statement is left
    through raising an exception. If the iterator doesn't have an
    __exit__() method, there is no difference from a for-loop (except
    that a for-loop calls iter() on EXPR1).

    Note that a yield-statement (or a yield-expression, see below) in
    a block-statement is not treated differently. It suspends the
    function containing the block *without* notifying the block's
    iterator. The block's iterator is entirely unaware of this yield,
    since the local control flow doesn't actually leave the block. In
    other words, it is *not* like a break, continue or return
    statement. When the loop that was resumed by the yield calls
    next(), the block is resumed right after the yield. The generator
    finalization semantics described below guarantee (within the
    limitations of all finalization semantics) that the block will be
    resumed eventually.

    I haven't decided yet whether the block-statement should also
    allow an optional else-clause, like the for-loop, but I'm leaning
    against it. I think it would be confusing, and emphasize the
    "loopiness" of the block-statement, while I want to emphasize its
    *difference* from a for-loop. In addition, there are several
    possible semantics for an else-clause.
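
    The special translations of continue, break and return described
    in this section can be tried out in present-day Python with a
    rough emulation. This is a sketch under stated assumptions, not
    the proposed implementation: GeneratorBlock, run_block(),
    counted() and find_two() are hypothetical names, the body is a
    callable that reports "continue EXPR2", "break" or "return EXPR3"
    as a (kind, value) tuple, and generator.close() is swapped in for
    raising StopIteration at the yield, since current generators turn
    a StopIteration that escapes their body into a RuntimeError.

```python
import sys


class GeneratorBlock:
    """Hypothetical adapter giving a generator the proposed API."""

    def __init__(self, gen):
        self.gen = gen

    def __next__(self, arg=None):
        return self.gen.send(arg)          # may raise StopIteration

    def __exit__(self, typ, value=None, tb=None):
        if typ is StopIteration:
            # Assumption: close() replaces raising StopIteration at
            # the yield; finally-clauses in the generator still run.
            self.gen.close()
            raise StopIteration
        return self.gen.throw(value if value is not None else typ())


def run_block(itr, body):
    """Sketch of the translation, including break and return."""
    ret, val, arg, exc = False, None, None, None
    while True:
        try:
            if exc:
                var1 = itr.__exit__(*exc)
            else:
                var1 = itr.__next__(arg)
        except StopIteration:
            if ret:
                return val
            break
        try:
            ret = False
            val = arg = exc = None
            kind, value = body(var1) or (None, None)
            if kind == "continue":         # continue EXPR2
                arg = value
            elif kind == "break":          # break
                exc = (StopIteration,)
            elif kind == "return":         # return EXPR3
                exc = (StopIteration,)
                ret = True
                val = value
        except Exception:  # the PEP uses a bare except here
            exc = sys.exc_info()
    return None


def counted(n, log):
    """Yield 0..n-1, record any value sent back, always clean up."""
    try:
        for i in range(n):
            hint = yield i
            if hint is not None:
                log.append("hint " + hint)
    finally:
        log.append("cleanup")


def find_two(i):
    if i == 0:
        return ("continue", "skip")        # like: continue "skip"
    if i == 2:
        return ("return", "found 2")       # like: return "found 2"
    return None                            # fell off the end of BLOCK1


log = []
result = run_block(GeneratorBlock(counted(5, log)), find_two)
```

    Here result is "found 2" and log is ["hint skip", "cleanup"]: the
    value given to continue reaches the generator, and the emulated
    return still lets the generator's finally-clause run before the
    block is left.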

Specification: Generator Exception Handling

    Generators will implement the new __next__() method API, as well
    as the old argument-less next() method which becomes an alias for
    calling __next__() without an argument. They will also implement
    the new __exit__() method API.

    Generators will be allowed to have a yield statement inside a
    try-finally statement.

    The expression argument to the yield-statement will become
    optional (defaulting to None).

    The yield-statement will be allowed to be used on the right-hand
    side of an assignment; in that case it is referred to as
    yield-expression. The value of this yield-expression is None
    unless __next__() was called with an argument that is not None;
    see below.

    A yield-expression must always be parenthesized except when it
    occurs at the top-level expression on the right-hand side of an
    assignment. So

        x = yield 42
        x = yield
        x = 12 + (yield 42)
        x = 12 + (yield)
        foo(yield 42)
        foo(yield)

    are all legal, but

        x = 12 + yield 42
        x = 12 + yield
        foo(yield 42, 12)
        foo(yield, 12)

    are all illegal. (Some of the edge cases are motivated by the
    current legality of "yield 12, 42".)

    When __exit__() is called, the generator is resumed but at the
    point of the yield-statement or -expression the exception
    represented by the __exit__ argument(s) is raised. The generator
    may re-raise this exception, raise another exception, or yield
    another value, except that if the exception passed in to
    __exit__() was StopIteration, it ought to raise StopIteration
    (otherwise the effect would be that a break is turned into
    continue, which is unexpected at least). When the *initial* call
    resuming the generator is an __exit__() call instead of a
    __next__() call, the generator's execution is aborted and the
    exception is re-raised without passing control to the generator's
    body.

    When __next__() is called with an argument that is not None, the
    yield-expression that it resumes will return that argument.
    If it resumes a yield-statement, the value is ignored (or should
    this be considered an error?). When the *initial* call to
    __next__() receives an argument that is not None, the generator's
    execution is started normally; the argument is ignored (or should
    this be considered an error?).

    When __next__() is called without an argument or with None as
    argument, and a yield-expression is resumed, the yield-expression
    returns None.

    When a generator that has not yet terminated is garbage-collected
    (either through reference counting or by the cyclical garbage
    collector), its __exit__() method is called once with
    StopIteration as its first argument. Together with the
    requirement that a generator ought to raise StopIteration when
    __exit__() is called with StopIteration, this guarantees the
    eventual activation of any finally-clauses that were active when
    the generator was last suspended. Of course, under certain
    circumstances the generator may never be garbage-collected. This
    is no different from the guarantees that are made about finalizers
    (__del__() methods) of other objects.

    Note: the syntactic extensions to yield make its use very similar
    to that in Ruby. This is intentional. Do note that in Python the
    block passes a value to the generator using "continue EXPR" rather
    than "return EXPR", and the underlying mechanism whereby control
    is passed between the generator and the block is completely
    different. Blocks in Python are not compiled into thunks; rather,
    yield suspends execution of the generator's frame. Some edge
    cases work differently; in Python, you cannot save the block for
    later use, and you cannot test whether there is a block or not.

Loose Ends

    These are things that need to be resolved before accepting the
    PEP.

    - Fill in the remaining TBD sections.

    - Address Phillip Eby's proposal to have the block-statement use
      an entirely different API than the for-loop, to differentiate
      between the two (a generator would have to be wrapped in a
      decorator to make it support the block API).

    - Decide on the keyword ('block', 'with', '@', nothing, or
      something else?).

    - Whether a block-statement should allow an else-clause.

Comparison to Thunks

    Alternative semantics proposed for the block-statement turn the
    block into a thunk (an anonymous function that blends into the
    containing scope).

    The main advantage of thunks that I can see is that you can save
    the thunk for later, like a callback for a button widget (the
    thunk then becomes a closure). You can't use a yield-based block
    for that (except in Ruby, which uses yield syntax with a
    thunk-based implementation). But I have to say that I almost see
    this as an advantage: I think I'd be slightly uncomfortable seeing
    a block and not knowing whether it will be executed in the normal
    control flow or later. Defining an explicit nested function for
    that purpose doesn't have this problem for me, because I already
    know that the 'def' keyword means its body is executed later.

    The other problem with thunks is that once we think of them as the
    anonymous functions they are, we're pretty much forced to say that
    a return statement in a thunk returns from the thunk rather than
    from the containing function. Doing it any other way would cause
    major weirdness if the thunk were to survive its containing
    function as a closure (perhaps continuations would help, but I'm
    not about to go there :-).

    But then an IMO important use case for the resource cleanup
    template pattern is lost.
    I routinely write code like this:

        def findSomething(self, key, default=None):
            self.lock.acquire()
            try:
                for item in self.elements:
                    if item.matches(key):
                        return item
                return default
            finally:
                self.lock.release()

    and I'd be bummed if I couldn't write this as:

        def findSomething(self, key, default=None):
            block synchronized(self.lock):
                for item in self.elements:
                    if item.matches(key):
                        return item
                return default

    This particular example can be rewritten using a break:

        def findSomething(self, key, default=None):
            block synchronized(self.lock):
                for item in self.elements:
                    if item.matches(key):
                        break
                else:
                    item = default
            return item

    but it looks forced and the transformation isn't always that easy;
    you'd be forced to rewrite your code in a single-return style
    which feels too restrictive.

    Also note the semantic conundrum of a yield in a thunk -- the only
    reasonable interpretation is that this turns the thunk into a
    generator!

    Greg Ewing believes that thunks "would be a lot simpler, doing
    just what is required without any jiggery pokery with exceptions
    and break/continue/return statements. It would be easy to explain
    what it does and why it's useful."

    But in order to obtain the required local variable sharing between
    the thunk and the containing function, every local variable used
    or set in the thunk would have to become a 'cell' (our mechanism
    for sharing variables between nested scopes). Cells slow down
    access compared to regular local variables: access involves an
    extra C function call (PyCell_Get() or PyCell_Set()).

    Perhaps not entirely coincidentally, the last example above
    (findSomething() rewritten to avoid a return inside the block)
    shows that, unlike for regular nested functions, we'll want
    variables *assigned to* by the thunk also to be shared with the
    containing function, even if they are not assigned to outside the
    thunk.

    Greg Ewing again: "generators have turned out to be more powerful,
    because you can have more than one of them on the go at once.
    Is there a use for that capability here?" I believe there are
    definitely uses for this; several people have already shown how to
    do asynchronous light-weight threads using generators (e.g. David
    Mertz quoted in PEP 288, and Fredrik Lundh [3]).

    And finally, Greg says: "a thunk implementation has the potential
    to easily handle multiple block arguments, if a suitable syntax
    could ever be devised. It's hard to see how that could be done in
    a general way with the generator implementation."

    However, the use cases for multiple blocks seem elusive.

Alternatives Considered

    TBD.

Examples

    1. A template for ensuring that a lock, acquired at the start of a
       block, is released when the block is left:

        def synchronized(lock):
            lock.acquire()
            try:
                yield
            finally:
                lock.release()

       Used as follows:

        block synchronized(myLock):
            # Code here executes with myLock held. The lock is
            # guaranteed to be released when the block is left (even
            # if by an uncaught exception).

    2. A template for opening a file that ensures the file is closed
       when the block is left:

        def opening(filename, mode="r"):
            f = open(filename, mode)
            try:
                yield f
            finally:
                f.close()

       Used as follows:

        block opening("/etc/passwd") as f:
            for line in f:
                print line.rstrip()

    3. A template for committing or rolling back a database
       transaction:

        def transactional(db):
            try:
                yield
            except:
                db.rollback()
                raise
            else:
                db.commit()

    4. A template that tries something up to n times:

        def auto_retry(n=3, exc=Exception):
            for i in range(n):
                try:
                    yield
                    return
                except exc, err:
                    # perhaps log exception here
                    continue
            raise # re-raise the exception we caught earlier

       Used as follows:

        block auto_retry(3, IOError):
            f = urllib.urlopen("http://python.org/peps/pep-0340.html")
            print f.read()

    5.
       It is possible to nest blocks and combine templates:

        def synchronized_opening(lock, filename, mode="r"):
            block synchronized(lock):
                block opening(filename, mode) as f:
                    yield f

       Used as follows:

        block synchronized_opening(myLock, "/etc/passwd") as f:
            for line in f:
                print line.rstrip()

    6. Coroutine example TBD.

Acknowledgements

    In no useful order: Alex Martelli, Barry Warsaw, Bob Ippolito,
    Brett Cannon, Brian Sabbey, Doug Landauer, Duncan Booth, Fredrik
    Lundh, Greg Ewing, Holger Krekel, Jason Diamond, Jim Jewett,
    Josiah Carlson, Ka-Ping Yee, Michael Chermside, Michael Hudson,
    Neil Schemenauer, Nick Coghlan, Paul Moore, Phillip Eby, Raymond
    Hettinger, Samuele Pedroni, Shannon Behrens, Skip Montanaro,
    Steven Bethard, Terry Reedy, Tim Delaney, Aahz, and others.
    Thanks all for the valuable contributions!

References

    [1] http://mail.python.org/pipermail/python-dev/2005-April/052821.html

    [2] http://msdn.microsoft.com/vcsharp/programming/language/ask/withstatement/

    [3] http://effbot.org/zone/asyncore-generators.htm

Copyright

    This document has been placed in the public domain.