PEP: 340
Title: Anonymous Block Statements
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 27-Apr-2005
Post-History:
Introduction
This PEP proposes a new type of compound statement which can be
used for resource management purposes, and a new iterator API to
go with it. The new statement type is provisionally called the
block-statement because the keyword to be used has not yet been
chosen.
This PEP competes with several other PEPs: PEP 288 (Generators
Attributes and Exceptions; only the second part), PEP 310
(Reliable Acquisition/Release Pairs), and PEP 325
(Resource-Release Support for Generators).
This proposal is just a strawman; we've had a heated debate about
this on python-dev recently [1], and I figured it would be time to
write up a precise spec in PEP form.
Proposal Evolution
The discussion on python-dev has changed my mind slightly on how
exceptions should be handled, but I don't have the time to do a
full update of the PEP right now. Basically, I'm now in favor of
a variation on the exception handling proposed in the section
"Alternative __next__() and Generator Exception Handling" below.
The added twist is that instead of adding a flag argument to
next() and __next__() to indicate whether the previous argument is
a value or an exception, we use a separate API (an __error__()
method taking an exception and perhaps a traceback) for the
exception. If an iterator doesn't implement __error__(), the
exception is just re-raised. It is expected that, apart from
generators, very few iterators will implement __error__(); one use
case would be a fast implementation of synchronized() written in
C.
The built-in next() function only interfaces to the next() and
__next__() methods; there is no user-friendly API to call
__error__().
Perhaps __error__() should be named __exit__().
Motivation and Summary
(Thanks to Shane Hathaway -- Hi Shane!)
Good programmers move commonly used code into reusable functions.
Sometimes, however, patterns arise in the structure of the
functions rather than the actual sequence of statements. For
example, many functions acquire a lock, execute some code specific
to that function, and unconditionally release the lock. Repeating
the locking code in every function that uses it is error prone and
makes refactoring difficult.
Block statements provide a mechanism for encapsulating patterns of
structure. Code inside the block statement runs under the control
of an object called a block iterator. Simple block iterators
execute code before and after the code inside the block statement.
Block iterators also have the opportunity to execute the
controlled code more than once (or not at all), catch exceptions,
or receive data from the body of the block statement.
A convenient way to write block iterators is to write a generator
(PEP 255). A generator looks a lot like a Python function, but
instead of returning a value immediately, generators pause their
execution at "yield" statements. When a generator is used as a
block iterator, the yield statement tells the Python interpreter
to suspend the block iterator, execute the block statement body,
and resume the block iterator when the body has executed.
The Python interpreter behaves as follows when it encounters a
block statement based on a generator. First, the interpreter
instantiates the generator and begins executing it. The generator
does setup work appropriate to the pattern it encapsulates, such
as acquiring a lock, opening a file, starting a database
transaction, or starting a loop. Then the generator yields
execution to the body of the block statement using a yield
statement. When the block statement body completes, raises an
uncaught exception, or sends data back to the generator using a
continue statement, the generator resumes. At this point, the
generator can either clean up and stop or yield again, causing the
block statement body to execute again. When the generator
finishes, the interpreter leaves the block statement.
Use Cases
TBD. For now, see the Examples section near the end.
Specification: the Iteration Exception Hierarchy
Two new built-in exceptions are defined, and StopIteration is
moved in the exception hierarchy:

    class Iteration(Exception):
        pass

    class StopIteration(Iteration):
        pass

    class ContinueIteration(Iteration):
        def __init__(self, value=None):
            self.value = value

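A minimal runnable sketch of this hierarchy follows. Because
StopIteration is already a built-in, the sketch uses the names
PepIteration, PepStopIteration and PepContinueIteration; these
names are assumptions made for illustration and are not part of
the proposal.

```python
# Hypothetical stand-ins for the proposed built-ins.  The Pep* names are
# assumptions, chosen so that the real StopIteration is not shadowed.
class PepIteration(Exception):
    pass


class PepStopIteration(PepIteration):
    pass


class PepContinueIteration(PepIteration):
    def __init__(self, value=None):
        super().__init__(value)
        self.value = value  # the payload of "continue EXPR"


signal = PepContinueIteration(42)
```

The 'value' attribute is the only state carried by the exception;
a caller that passes no value gets None.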
Specification: the __next__() Method
A new method for iterators is proposed, called __next__(). It
takes one optional argument, which defaults to None. If not None,
the argument must be an Iteration instance. Calling the
__next__() method without argument or with None is equivalent to
using the old iterator API, next(). For backwards compatibility,
it is recommended that iterators also implement a next() method as
an alias for calling the __next__() method without an argument.
Calling the __next__() method with a StopIteration instance
signals the iterator that the caller wants to abort the iteration
sequence; the iterator should respond by doing any necessary
cleanup and raising StopIteration. Calling it with a
ContinueIteration instance signals the iterator that the caller
wants to continue the iteration; the ContinueIteration exception
has a 'value' attribute which may be used by the iterator as a
hint on what to do next. Calling it with a (base class) Iteration
instance is the same as calling it with None.
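As an illustration only, a hand-written iterator following this
protocol might look like the sketch below. The Countdown class is
hypothetical, ContinueIteration is a local stand-in for the
proposed exception, and the interpretation of the 'value' hint
(resetting the counter) is an assumption of the example.

```python
class Iteration(Exception):
    """Hypothetical stand-in for the proposed base class."""


class ContinueIteration(Iteration):
    """Hypothetical stand-in; carries the value of "continue EXPR"."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


class Countdown:
    """Counts down from start; a ContinueIteration value resets the counter."""

    def __init__(self, start):
        self.current = start
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self, arg=None):
        if isinstance(arg, StopIteration):
            # The caller wants to abort: clean up, then raise StopIteration.
            self.closed = True
            raise arg
        if isinstance(arg, ContinueIteration) and arg.value is not None:
            # Treat the value as a hint about where to continue from.
            self.current = arg.value
        if self.closed or self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # old-style alias, as recommended above
```

Calling __next__() without an argument behaves like the old API,
so the class still works in an ordinary for-loop.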
Specification: the next() Built-in Function
This is a built-in function defined as follows:

    def next(itr, arg=None):
        nxt = getattr(itr, "__next__", None)
        if nxt is not None:
            return nxt(arg)
        if arg is None:
            return itr.next()
        raise TypeError("next() with arg for old-style iterator")

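The dispatch rule can be tried out with the sketch below. The
function is named pep340_next to avoid shadowing the real
built-in, and the OldStyle and NewStyle classes are hypothetical
iterators written only for this demonstration.

```python
def pep340_next(itr, arg=None):
    """Sketch of the proposed built-in; the name pep340_next is hypothetical."""
    nxt = getattr(itr, "__next__", None)
    if nxt is not None:
        return nxt(arg)
    if arg is None:
        return itr.next()
    raise TypeError("next() with arg for old-style iterator")


class OldStyle:
    """Old-style iterator: only the argument-less next() method."""

    def __init__(self):
        self.n = 0

    def next(self):
        self.n += 1
        return self.n


class NewStyle:
    """New-style iterator: __next__() accepts the optional argument."""

    def __next__(self, arg=None):
        return arg  # echo the argument so the dispatch is visible
```

An old-style iterator keeps working as long as no argument is
passed; passing one raises TypeError because there is nowhere to
deliver it.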
Specification: the 'for' Loop
A small change in the translation of the for-loop is proposed.
The statement

    for VAR1 in EXPR1:
        BLOCK1

will be translated as follows:

    itr = iter(EXPR1)
    arg = None
    while True:
        try:
            VAR1 = next(itr, arg)
        except StopIteration:
            break
        arg = None
        BLOCK1

(However, 'itr' and 'arg' are hidden from the user, their scope
ends when the while-loop is exited, and they are not shared with
nested or outer for-loops, and the user cannot override the
built-ins referenced.)
I'm leaving the translation of an else-clause up to the reader;
note that you can't simply affix the else-clause to the while-loop
since the while-loop is always exited via break.
Specification: the Extended 'continue' Statement
In the translation of the for-loop, inside BLOCK1, the new syntax

    continue EXPR2

is legal and is translated into

    arg = ContinueIteration(EXPR2)
    continue

(Where 'arg' references the corresponding hidden variable from the
previous section.)
This is also the case in the body of the block-statement proposed
below.
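The interplay between the translated for-loop and "continue EXPR2"
can be emulated roughly as follows. In this sketch the loop body
is a function whose non-None return value stands in for
"continue EXPR2"; run_for, Skipper and the local ContinueIteration
class are all hypothetical names introduced for the example.

```python
class ContinueIteration(Exception):
    """Hypothetical stand-in for the proposed exception."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def run_for(iterable, body):
    """Emulate the translated for-loop.

    body(value) plays the role of BLOCK1; returning a non-None result
    stands in for "continue EXPR2" (an assumption of this sketch).
    """
    itr = iter(iterable)
    arg = None
    while True:
        try:
            # Ordinary iterators take no argument, so arg is only passed
            # along when the body actually supplied a value.
            value = itr.__next__() if arg is None else itr.__next__(arg)
        except StopIteration:
            break
        arg = None
        result = body(value)
        if result is not None:
            arg = ContinueIteration(result)


class Skipper:
    """Yields 0, 1, 2, ...; a ContinueIteration value jumps ahead."""

    def __init__(self, stop):
        self.i, self.stop = 0, stop

    def __iter__(self):
        return self

    def __next__(self, arg=None):
        if isinstance(arg, ContinueIteration):
            self.i = arg.value
        if self.i >= self.stop:
            raise StopIteration
        self.i += 1
        return self.i - 1


seen = []


def body(v):
    seen.append(v)
    if v == 2:
        return 7  # plays the role of "continue 7"


run_for(Skipper(10), body)
```

After the body asks to continue with 7, the iterator skips the
values 3 through 6 and resumes at 7.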
Specification: the Anonymous Block Statement
A new statement is proposed with the syntax

    block EXPR1 as VAR1:
        BLOCK1

Here, 'block' and 'as' are new keywords; EXPR1 is an arbitrary
expression (but not an expression-list) and VAR1 is an arbitrary
assignment target (which may be a comma-separated list).
The "as VAR1" part is optional; if omitted, the assignment to VAR1
in the translation below is omitted (but the next() call is not!).
The choice of the 'block' keyword is contentious; it has even been
proposed not to use a keyword at all. PEP 310 uses 'with' for
similar semantics, but I would like to reserve that for a
with-statement similar to the one found in Pascal and VB. To
sidestep this issue momentarily I'm using 'block' until we can
agree on a keyword. (I just found that the C# designers don't
like 'with' [2].)
Note that it is left open whether a block-statement
represents a loop or not; this is up to the iterator, but in the
most common case BLOCK1 is executed exactly once.
The translation is subtly different from the translation of a
for-loop: iter() is not called, so EXPR1 should already be an
iterator (not just an iterable); and the iterator is guaranteed to
be exhausted when the block-statement is left:

    itr = EXPR1
    val = arg = None
    ret = False
    while True:
        try:
            VAR1 = next(itr, arg)
        except StopIteration:
            if ret:
                return val
            if val is not None:
                raise val
            break
        try:
            val = arg = None
            ret = False
            BLOCK1
        except Exception, val:
            arg = StopIteration()

(Again, 'itr' etc. are hidden, and the user cannot override the
built-ins.)
The "raise val" translation is inexact; this is supposed to
re-raise the exact exception that was raised inside BLOCK1, with
the same traceback. We can't use a bare raise-statement because
we've just caught StopIteration.
Inside BLOCK1, the following special translations apply:

- "continue" and "continue EXPR2" are always legal; the latter is
  translated as shown earlier:

      arg = ContinueIteration(EXPR2)
      continue

- "break" is always legal; it is translated into:

      arg = StopIteration()
      continue

- "return EXPR3" is only legal when the block-statement is
  contained in a function definition; it is translated into:

      val = EXPR3
      ret = True
      arg = StopIteration()
      continue

The net effect is that break, continue and return behave much the
same as if the block-statement were a for-loop, except that the
iterator gets a chance at resource cleanup before the
block-statement is left. The iterator also gets a chance if the
block-statement is left through raising an exception.
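The cleanup guarantee can be approximated in released Python
versions with the sketch below. It is not the proposed
translation: generator.close() is used as the nearest available
analogue of next(itr, StopIteration()), the block body is passed
as a function, and "return EXPR3" is not emulated. The names
run_block and FakeLock are hypothetical.

```python
def run_block(itr, body):
    """Run body for each value produced by the generator itr.

    body(value) plays the role of BLOCK1.  However the loop is left,
    the generator is closed so that its finally-clauses run.
    """
    try:
        for value in itr:
            body(value)
    finally:
        itr.close()  # stand-in for next(itr, StopIteration())


class FakeLock:
    """Hypothetical lock that records the calls made on it."""

    def __init__(self):
        self.events = []

    def acquire(self):
        self.events.append("acquire")

    def release(self):
        self.events.append("release")


def synchronized(lock):
    lock.acquire()
    try:
        yield
    finally:
        lock.release()


def failing_body(_):
    raise ValueError("raised inside the block")


lock = FakeLock()
try:
    run_block(synchronized(lock), failing_body)
except ValueError:
    lock.events.append("propagated")
```

The lock is released before the exception reaches the caller,
which is the ordering the translation above is meant to ensure.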
Note that a yield-statement (or a yield-expression, see below) in
a block-statement is not treated differently. It suspends the
function containing the block *without* notifying the block's
iterator. The block's iterator is entirely unaware of this
yield, since the local control flow doesn't actually leave the
block. In other words, it is *not* like a break, continue or
return statement. When the loop that was resumed by the yield
calls next(), the block is resumed right after the yield. The
generator finalization semantics described below guarantee (within
the limitations of all finalization semantics) that the block will
be resumed eventually.
I haven't decided yet whether the block-statement should also
allow an optional else-clause, like the for-loop. I think it
would be confusing, and emphasize the "loopiness" of the
block-statement, while I want to emphasize its *difference* from a
for-loop.
Specification: Generator Exception Handling
Generators will implement the new __next__() method API, as well
as the old argument-less next() method.
Generators will be allowed to have a yield statement inside a
try-finally statement.
The expression argument to the yield-statement will become
optional (defaulting to None).
The yield-statement will be allowed to be used on the right-hand
side of an assignment; in that case it is referred to as
yield-expression. The value of this yield-expression is None
unless __next__() was called with a ContinueIteration argument;
see below.
A yield-expression must always be parenthesized except when it
occurs at the top-level expression on the right-hand side of an
assignment. So

    x = yield 42
    x = yield
    x = 12 + (yield 42)
    x = 12 + (yield)
    foo(yield 42)
    foo(yield)

are all legal, but

    x = 12 + yield 42
    x = 12 + yield
    foo(yield 42, 12)
    foo(yield, 12)

are all illegal. (Some of the edge cases are motivated by the
current legality of "yield 12, 42".)
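Yield-expressions of this shape can be tried in released Python
versions, where generator.send() delivers the value. Using send()
in place of the "continue EXPR" mechanism proposed here is an
assumption of this sketch, as are the function names.

```python
def running_total():
    """Accumulate the values sent in; each yield reports the total so far."""
    total = 0
    while True:
        received = yield total  # yield-expression on the right-hand side
        if received is not None:
            total += received


def offset():
    x = 12 + (yield 42)  # parentheses required inside a larger expression
    yield x


gen = running_total()
first = next(gen)  # run to the first yield
second = gen.send(5)
third = gen.send(15)

off = offset()
start = next(off)
shifted = off.send(8)
```

Resuming with a plain next() call makes the yield-expression
evaluate to None, matching the default described above.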
When __next__() is called with a StopIteration instance argument,
the yield statement that is resumed by the __next__() call will
raise this StopIteration exception. The generator should re-raise
this exception; it should not yield another value. When the
*initial* call to __next__() receives a StopIteration instance
argument, the generator's execution is aborted and the exception
is re-raised without passing control to the generator's body.
When __next__() is called with a ContinueIteration instance
argument, the yield-expression that it resumes will return the
value attribute of the argument. If it resumes a yield-statement,
the value is ignored. When the *initial* call to __next__()
receives a ContinueIteration instance argument, the generator's
execution is started normally; the argument's value attribute is
ignored.
When a generator that has not yet terminated is garbage-collected
(either through reference counting or by the cyclical garbage
collector), its __next__() method is called once with a
StopIteration instance argument. Together with the requirement
that __next__() should always re-raise a StopIteration argument,
this guarantees the eventual activation of any finally-clauses
that were active when the generator was last suspended. Of
course, under certain circumstances the generator may never be
garbage-collected. This is no different than the guarantees that
are made about finalizers (__del__() methods) of other objects.
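The effect on finally-clauses can be observed with the following
sketch. It calls generator.close() explicitly as a deterministic
stand-in for the __next__(StopIteration()) call that finalization
would make under this proposal; that substitution is an
assumption of the example.

```python
def guarded(log):
    log.append("setup")
    try:
        yield 1
        yield 2
    finally:
        # Runs when the suspended generator is told to stop.
        log.append("cleanup")


log = []
gen = guarded(log)
first = next(gen)  # suspend inside the try block
gen.close()        # explicit stand-in for what finalization would do
```

The generator never reaches its second yield, yet the
finally-clause still runs.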
Note: the syntactic extensions to yield make its use very similar
to that in Ruby. This is intentional. Do note that in Python the
block passes a value to the generator using "continue EXPR" rather
than "return EXPR", and the underlying mechanism whereby control
is passed between the generator and the block is completely
different. Blocks in Python are not compiled into thunks; rather,
yield suspends execution of the generator's frame. Some edge
cases work differently; in Python, you cannot save the block for
later use, and you cannot test whether there is a block or not.
Specification: Alternative __next__() and Generator Exception Handling
The above specification doesn't let the generator handle general
exceptions. If we want that, we could modify the __next__() API
to take either a value or an exception argument, with an
additional flag argument to distinguish between the two. When the
second argument is True, the first must be an Exception instance,
which is raised at the point of the resuming yield; otherwise the
first argument is the value that is returned from the
yield-expression (or ignored by a yield-statement). Wrapping a
regular value in a ContinueIteration is then no longer necessary.
The next() built-in would be modified likewise:

    def next(itr, arg=None, exc=False):
        nxt = getattr(itr, "__next__", None)
        if nxt is not None:
            return nxt(arg, exc)
        if arg is None and not exc:
            return itr.next()
        raise TypeError("next() with args for old-style iterator")

The translation of a block-statement would become:

    itr = EXPR1
    arg = val = None
    ret = exc = False
    while True:
        try:
            VAR1 = next(itr, arg, exc)
        except StopIteration:
            if ret:
                return val
            break
        try:
            arg = val = None
            ret = exc = False
            BLOCK1
        except Exception, arg:
            exc = True

The translation of "continue EXPR2" would become:

    arg = EXPR2
    continue

The translation of "break" inside a block-statement would become:

    arg = StopIteration()
    exc = True
    continue

The translation of "return EXPR3" inside a block-statement would
become:

    val = EXPR3
    arg = StopIteration()
    ret = exc = True
    continue

The translation of a for-loop would be the same as indicated
earlier (inside a for-loop only the translation of "continue
EXPR2" is changed; break and return translate to themselves in
that case).
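With this alternative, a generator can catch an exception raised
in the block body and decide to run the body again. The sketch
below emulates that behaviour in released Python versions, using
generator.throw() in place of next(itr, arg, True); this
substitution, the run_block_alt driver, and passing the body as a
function are all assumptions of the example.

```python
def run_block_alt(gen, body):
    """Emulate the alternative translation for a generator gen.

    body(value) plays the role of BLOCK1.  An exception raised by the
    body is thrown into the generator, which may handle it and yield
    again (re-running the body), finish (swallowing it), or re-raise.
    """
    try:
        value = next(gen)
    except StopIteration:
        return
    while True:
        try:
            body(value)
        except Exception as err:
            try:
                value = gen.throw(err)  # generator may handle or re-raise
            except StopIteration:
                return  # generator finished, swallowing the exception
        else:
            try:
                value = next(gen)
            except StopIteration:
                return


def auto_retry(n=3, exc=Exception):
    last_error = None
    for attempt in range(n):
        try:
            yield attempt
            return
        except exc as err:
            last_error = err  # perhaps log the exception here
    if last_error is not None:
        raise last_error


attempts = []


def flaky(attempt):
    attempts.append(attempt)
    if attempt < 2:
        raise IOError("transient failure")


run_block_alt(auto_retry(3, IOError), flaky)
```

The body runs three times: the first two failures are absorbed by
the generator, and the third attempt succeeds.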
Comparison to Thunks
Alternative semantics proposed for the block-statement turn the
block into a thunk (an anonymous function that blends into the
containing scope).
The main advantage of thunks that I can see is that you can save
the thunk for later, like a callback for a button widget (the
thunk then becomes a closure). You can't use a yield-based block
for that (except in Ruby, which uses yield syntax with a
thunk-based implementation). But I have to say that I almost see
this as an advantage: I think I'd be slightly uncomfortable seeing
a block and not knowing whether it will be executed in the normal
control flow or later. Defining an explicit nested function for
that purpose doesn't have this problem for me, because I already
know that the 'def' keyword means its body is executed later.
The other problem with thunks is that once we think of them as the
anonymous functions they are, we're pretty much forced to say that
a return statement in a thunk returns from the thunk rather than
from the containing function. Doing it any other way would cause
major weirdness if the thunk were to survive its containing
function as a closure (perhaps continuations would help, but I'm
not about to go there :-).
But then an IMO important use case for the resource cleanup
template pattern is lost. I routinely write code like this:

    def findSomething(self, key, default=None):
        self.lock.acquire()
        try:
            for item in self.elements:
                if item.matches(key):
                    return item
            return default
        finally:
            self.lock.release()

and I'd be bummed if I couldn't write this as:

    def findSomething(self, key, default=None):
        block synchronized(self.lock):
            for item in self.elements:
                if item.matches(key):
                    return item
            return default

This particular example can be rewritten using a break:

    def findSomething(self, key, default=None):
        block synchronized(self.lock):
            for item in self.elements:
                if item.matches(key):
                    break
            else:
                item = default
        return item

but it looks forced and the transformation isn't always that easy;
you'd be forced to rewrite your code in a single-return style
which feels too restrictive.
Also note the semantic conundrum of a yield in a thunk -- the only
reasonable interpretation is that this turns the thunk into a
generator!
Greg Ewing believes that thunks "would be a lot simpler, doing
just what is required without any jiggery pokery with exceptions
and break/continue/return statements. It would be easy to explain
what it does and why it's useful."
But in order to obtain the required local variable sharing between
the thunk and the containing function, every local variable used
or set in the thunk would have to become a 'cell' (our mechanism
for sharing variables between nested scopes). Cells slow down
access compared to regular local variables: access involves an
extra C function call (PyCell_Get() or PyCell_Set()).
Perhaps not entirely coincidentally, the last example above
(findSomething() rewritten to avoid a return inside the block)
shows that, unlike for regular nested functions, we'll want
variables *assigned to* by the thunk also to be shared with the
containing function, even if they are not assigned to outside the
thunk.
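The variable sharing a thunk would need can be imitated with an
explicit nested function. The sketch below relies on the
"nonlocal" statement, which later Python versions provide and
which is not part of this proposal; find_something and thunk are
hypothetical names.

```python
def find_something(elements, key, default=None):
    item = default

    def thunk():
        # "nonlocal" makes the assignment visible to the containing
        # function, so "item" lives in a cell rather than a fast local.
        nonlocal item
        for candidate in elements:
            if candidate == key:
                item = candidate
                break

    thunk()
    # co_freevars lists the names the nested function reaches via cells.
    return item, thunk.__code__.co_freevars


found, shared_names = find_something([1, 2, 3], 2)
missing, _ = find_something([1, 2, 3], 9, default=-1)
```

Every name the nested function touches from the enclosing scope,
whether read or assigned, ends up in a cell.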
Greg Ewing again: "generators have turned out to be more powerful,
because you can have more than one of them on the go at once. Is
there a use for that capability here?"
I believe there are definitely uses for this; several people have
already shown how to do asynchronous light-weight threads using
generators (e.g. David Mertz quoted in PEP 288, and Fredrik
Lundh [3]).
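As a rough illustration of the idea (not taken from the cited
sources), several generators can be interleaved by a small
round-robin scheduler. The worker and run_round_robin names are
hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    for step in range(steps):
        log.append((name, step))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue        # task finished; drop it
        queue.append(task)  # otherwise reschedule it


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

Each generator keeps its own suspended frame, which is what allows
more than one of them to be "on the go at once".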
And finally, Greg says: "a thunk implementation has the potential
to easily handle multiple block arguments, if a suitable syntax
could ever be devised. It's hard to see how that could be done in
a general way with the generator implementation."
However, the use cases for multiple blocks seem elusive.
Alternatives Considered
TBD.
Examples
1. A template for ensuring that a lock, acquired at the start of a
   block, is released when the block is left:

       def synchronized(lock):
           lock.acquire()
           try:
               yield
           finally:
               lock.release()

   Used as follows:

       block synchronized(myLock):
           # Code here executes with myLock held.  The lock is
           # guaranteed to be released when the block is left (even
           # if by an uncaught exception).

2. A template for opening a file that ensures the file is closed
   when the block is left:

       def opening(filename, mode="r"):
           f = open(filename, mode)
           try:
               yield f
           finally:
               f.close()

   Used as follows:

       block opening("/etc/passwd") as f:
           for line in f:
               print line.rstrip()

3. A template for committing or rolling back a database
   transaction:

       def transactional(db):
           try:
               yield
           except:
               db.rollback()
               raise
           else:
               db.commit()

4. A template that tries something up to n times:

       def auto_retry(n=3, exc=Exception):
           for i in range(n):
               try:
                   yield
                   return
               except exc, err:
                   # perhaps log exception here
                   continue
           raise # re-raise the exception we caught earlier

   Used as follows:

       block auto_retry(3, IOError):
           f = urllib.urlopen("http://python.org/peps/pep-0340.html")
           print f.read()

5. It is possible to nest blocks and combine templates:

       def synchronized_opening(lock, filename, mode="r"):
           block synchronized(lock):
               block opening(filename, mode) as f:
                   yield f

   Used as follows:

       block synchronized_opening(myLock, "/etc/passwd") as f:
           for line in f:
               print line.rstrip()

6. Coroutine example TBD.
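The templates above cannot be run as written, since they depend on
the proposed block-statement. As a rough analogue only, example 3
can be tried in later Python versions with
contextlib.contextmanager and the with-statement. That
substitution and the FakeDB stub are assumptions of this sketch
and are not part of this proposal.

```python
from contextlib import contextmanager


class FakeDB:
    """Hypothetical database stub that records what was called."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


@contextmanager
def transactional(db):
    try:
        yield
    except BaseException:  # plays the role of the bare except in example 3
        db.rollback()
        raise
    else:
        db.commit()


ok_db, bad_db = FakeDB(), FakeDB()

with transactional(ok_db):
    pass  # body succeeds, so the transaction is committed

try:
    with transactional(bad_db):
        raise RuntimeError("body failed")
except RuntimeError:
    pass  # rollback already ran before the exception propagated
```

The generator body is unchanged from example 3 apart from the
decorator and the explicit exception class.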
Acknowledgements
In no useful order: Alex Martelli, Barry Warsaw, Bob Ippolito,
Brett Cannon, Brian Sabbey, Doug Landauer, Duncan Booth, Fredrik
Lundh, Greg Ewing, Holger Krekel, Jason Diamond, Jim Jewett,
Josiah Carlson, Ka-Ping Yee, Michael Chermside, Michael Hudson,
Neil Schemenauer, Nick Coghlan, Paul Moore, Phillip Eby, Raymond
Hettinger, Samuele Pedroni, Shannon Behrens, Skip Montanaro,
Steven Bethard, Terry Reedy, Tim Delaney, Aahz, and others.
Thanks all for the valuable contributions!
References
[1] http://mail.python.org/pipermail/python-dev/2005-April/052821.html
[2] http://msdn.microsoft.com/vcsharp/programming/language/ask/withstatement/
[3] http://effbot.org/zone/asyncore-generators.htm
Copyright
This document has been placed in the public domain.