PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power
of generators.


Rationale

Starting with xrange() and xreadlines(), Python has been evolving
toward a model that provides lazy evaluation as an alternative
when complete evaluation is not desired because of memory
restrictions or availability of data.

Starting with Python 2.2, a second evolutionary direction came in
the form of iterators and generators. The iter() factory function
and generators were provided as convenient means of creating
iterators. Deep changes were made to use iterators as a unifying
theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects. In the case of mappings and file objects, lazy
evaluation became the norm.

The next steps in the evolution of generators are:

1. Add built-in functions which provide lazy alternatives to their
   complete evaluation counterparts and one other convenience
   function which was made possible once iterators and generators
   became available. The new functions are xzip, xmap, xfilter,
   and indexed.

2. Provide a generator alternative to list comprehensions [3]
   making generator creation as convenient as list creation.

3. Extend the syntax of the 'yield' keyword to enable generator
   parameter passing. The resulting increase in power simplifies
   the creation of consumer streams which have a complex execution
   state and/or variable state.

4. Add a generator method to enable exceptions to be passed to a
   generator. Currently, there is no clean method for triggering
   exceptions from outside the generator. Also, generator exception
   passing helps mitigate the try/finally prohibition for generators.

All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords. These generator tools are intended for Python 2.3, when
generators become final and no longer need to be imported from
__future__.


Reference Implementation

There is not currently a CPython implementation; however, a simulation
module written in pure Python is available on SourceForge [8]. The
simulation covers every feature proposed in this PEP and is meant
to allow direct experimentation with the proposals.

There is also a module [9] with working source code for all of the
examples used in this PEP. It serves as a test suite for the simulator
and it documents how each of the new features works in practice.


Specification for new built-ins:

    def xfilter(pred, gen):
        '''
        xfilter(...)
        xfilter(function, sequence) -> iterator

        Return an iterator containing those items of sequence for
        which function is true. If function is None, return an
        iterator of items that are true.
        '''
        if pred is None:
            for i in gen:
                if i:
                    yield i
        else:
            for i in gen:
                if pred(i):
                    yield i

    def xmap(fun, *collections):    ### Code from Python Cookbook [6]
        '''
        xmap(...)
        xmap(function, sequence[, sequence, ...]) -> iterator

        Return an iterator applying the function to the items of the
        argument collection(s). If more than one collection is given,
        the function is called with an argument list consisting of the
        corresponding item of each collection, substituting None for
        missing values when not all collections have the same length.
        If the function is None, return an iterator of the items of the
        collection (or an iterator of tuples if more than one collection).
        '''
        gens = map(iter, collections)
        values_left = [1]
        def values():
            # Emulate map behavior by padding sequences with None
            # when they run out of values.
            values_left[0] = 0
            for i in range(len(gens)):
                iterator = gens[i]
                if iterator is None:
                    yield None
                else:
                    try:
                        yield iterator.next()
                        values_left[0] = 1
                    except StopIteration:
                        gens[i] = None
                        yield None
        while 1:
            args = tuple(values())
            if not values_left[0]:
                raise StopIteration
            if fun is None:
                # Mirror map(None, ...) as the docstring promises:
                # bare items for one collection, tuples otherwise.
                if len(args) == 1:
                    yield args[0]
                else:
                    yield args
            else:
                yield fun(*args)

    def xzip(*collections):
        '''
        xzip(...)
        xzip(seq1 [, seq2 [...]]) -> iterator of (seq1[0], seq2[0] ...), (...)

        Return an iterator of tuples, where each tuple contains the
        i-th element from each of the argument sequences or iterables.
        The returned iterator is truncated in length to the length of
        the shortest argument collection.
        '''
        gens = map(iter, collections)
        while 1:
            yield tuple([g.next() for g in gens])

    def indexed(collection, cnt=0, limit=None):
        'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt<limit:
            yield (cnt, gen.next())
            cnt += 1

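To experiment with the behavior specified above on a current
interpreter, the following is a minimal sketch that ports xfilter and
indexed to Python 3 syntax. It is an illustration under that
assumption, not the reference implementation. The helper naturals()
is hypothetical and exists only to supply an endless input.

```python
from itertools import islice

def xfilter(pred, iterable):
    """Lazily yield the items of iterable for which pred is true."""
    if pred is None:
        pred = bool
    for item in iterable:
        if pred(item):
            yield item

def indexed(collection, cnt=0, limit=None):
    """Lazily yield (index, item) pairs, stopping at limit if given."""
    gen = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(gen)
        except StopIteration:
            return
        yield (cnt, item)
        cnt += 1

def naturals():
    """Hypothetical endless source: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

# Only as many inputs are consumed as the caller asks for, so an
# endless source is safe to filter and index.
evens = xfilter(lambda n: n % 2 == 0, naturals())
first_three = list(islice(indexed(evens), 3))
```

Because every stage is a generator, abandoning the loop early costs
nothing beyond the items already requested.
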
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of the proposals only work
for lists, unlike the above function, which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2, which did
not include generators. As a result, the generator-less version in
PEP 212 had the disadvantage of consuming memory with a giant list
of tuples. The generator version presented here is fast and light,
works with all iterables, and allows users to abandon the sequence
in mid-stream.


Note B: An alternate, simplified definition of indexed is:

    import sys

    def indexed(collection, cnt=0, limit=sys.maxint):
        'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
        return xzip(xrange(cnt, limit), collection)


Note C: As it stands, the Python code for xmap is slow. The actual
implementation of the functions should be written in C for speed.
The pure Python code listed above is meant only to specify how the
functions would behave, in particular that they should emulate their
non-lazy counterparts as closely as possible.


Note D: Almost all of the PEP reviewers welcomed these functions but were
divided as to whether they should be built-ins or in a separate module.
The main argument for a separate module was to slow the rate of language
inflation. The main argument for built-ins was that these functions are
destined to be part of a core programming style, applicable to any object
with an iterable interface. Just as zip() solves the problem of looping
over multiple sequences, the indexed() function solves the loop counter
problem. Likewise, the x-functions solve the problem of applying
functional constructs without forcing the evaluation of an entire sequence.

If only one built-in were allowed, then indexed() is the most important
general purpose tool, solving the broadest class of problems while
improving program brevity, clarity and reliability.


Specification for Generator Comprehensions:

If a list comprehension starts with a 'yield' keyword, then
express the comprehension with a generator. For example:

    g = [yield (len(line),line) for line in file if len(line)>5]
    print g.next()

This would be implemented as if it had been written:

    def __temp():
        for line in file:
            if len(line) > 5:
                yield (len(line), line)
    g = __temp()
    print g.next()

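The expansion can be checked by hand on a current interpreter. The
sketch below writes out the equivalent generator function in Python 3
syntax; the list named lines is a hypothetical stand-in for the file
object, and _temp is simply the hand-written helper, not anything the
interpreter generates.

```python
lines = ["hi\n", "a longer line\n", "no\n", "another long line\n"]

def _temp():
    # Hand-written expansion of the proposed
    # [yield (len(line), line) for line in lines if len(line) > 5]
    for line in lines:
        if len(line) > 5:
            yield (len(line), line)

g = _temp()
first = next(g)      # nothing is computed until a value is requested
rest = list(g)       # drain whatever remains
```
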
Note A: There is some discussion about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
syntax (taking maximum advantage of what people already know).
More importantly, it sets off the generator comprehension from the
rest of the function so as to not suggest that the enclosing
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. Most of the feedback received to date
indicates that brackets are helpful and not misleading.

Note B: List comprehensions expose their looping variable and
leave that variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
list comprehension. Further, the variable 'i' is in a defined and
potentially useful state on the line immediately following the
list comprehension.

In contrast, generator comprehensions do not expose the looping
variable to the enclosing scope. The code, [yield str(i) for i in
range(8)] leaves 'i' untouched in the scope where the
comprehension appears. This is also by design and reflects an
intent to duplicate the result of coding a generator directly
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
generator comprehension. It does not come into existence until
iteration starts (possibly never).

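The scoping rule for generator comprehensions follows directly from
the expansion into a generator function. A minimal sketch in Python 3
syntax, with the expansion written out by hand (the name _temp and
the marker value "untouched" are illustrative assumptions):

```python
i = "untouched"

def _temp():
    # Hand-written expansion of the proposed
    # [yield str(i) for i in range(8)]
    for i in range(8):
        yield str(i)

g = _temp()
before = i            # creating the generator runs none of its body
values = list(g)      # the loop variable lives only inside _temp
after = i             # the enclosing 'i' was never rebound
```

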
Specification for Generator Parameter Passing:

1. Allow 'yield' to assign a value as in:

    def mygen():
        while 1:
            x = yield None
            print x

2. Let the .next() method take a value to pass to the generator as in:

    g = mygen()
    g.next()       # runs the generator until the first 'yield'
    g.next(1)      # '1' is bound to 'x' in mygen(), then printed
    g.next(2)      # '2' is bound to 'x' in mygen(), then printed

The control flow of 'yield' and 'next' is unchanged by this proposal.
The only change is that a value can be sent into the generator.
By analogy, consider the quality improvement from GOSUB (which had
no argument passing mechanism) to modern procedure calls (which can
pass in arguments and return values).

Most of the underlying machinery is already in place; only the
communication needs to be added by modifying the parse syntax to
accept the new 'x = yield expr' syntax and by allowing the .next()
method to accept an optional argument.

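The same interaction can be tried on a current interpreter, where the
value-passing call is spelled send(value) rather than next(arg). The
sketch below uses that spelling as a stand-in for the proposed
.next(arg); the list named received is a hypothetical replacement for
the print statement so the result can be inspected.

```python
received = []

def mygen():
    while True:
        x = yield None
        received.append(x)

g = mygen()
next(g)       # runs the generator until the first 'yield'
g.send(1)     # 1 is bound to x in mygen(), then recorded
g.send(2)     # 2 is bound to x in mygen(), then recorded
```
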
Yield is more than just a simple iterator creator. It does
something else truly wonderful -- it suspends execution and saves
state. It is good for a lot more than writing iterators. This
proposal further expands its capability by making it easier to
share data with the generator.

The .next(arg) mechanism is especially useful for:
1. Sending data to any generator
2. Writing lazy consumers with complex execution states
3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])

The proposal is a clear improvement over the existing alternative
of passing data via global variables. It is also much simpler,
more readable and easier to debug than an approach involving the
threading module with its attendant mutexes, semaphores, and data
queues. A class-based approach competes well when there are no
complex execution states or variable states. However, when the
complexity increases, generators with parameter passing are much simpler
because they automatically save state (unlike classes which must
explicitly save the variable and execution state in instance variables).

Note A: This proposal changes 'yield' from a statement to an
expression with binding and precedence similar to lambda.


Example of a Complex Consumer

The encoder for arithmetic compression sends a series of
fractional values to a complex, lazy consumer. That consumer
makes computations based on previous inputs and only writes out
when certain conditions have been met. After the last fraction is
received, it has a procedure for flushing any unwritten data.


Example of a Consumer Stream

    def filelike(packagename, appendOrOverwrite):
        cum = []
        if appendOrOverwrite == 'w+':
            cum.extend(packages[packagename])
        try:
            while 1:
                dat = yield None
                cum.append(dat)
        except FlushStream:
            packages[packagename] = cum

    ostream = filelike('mydest','w')   # Analogous to open(name,flag)
    ostream.next()                     # Advance to the first yield
    ostream.next(firstdat)             # Analogous to file.write(dat)
    ostream.next(seconddat)
    ostream.throw(FlushStream)         # This feature proposed below

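A runnable approximation of the consumer stream, sketched for a
current interpreter: send() stands in for the proposed .next(arg),
and throw() ends by raising StopIteration once the generator returns.
The FlushStream class and the packages dictionary are not defined in
the text above, so the definitions here are assumptions made only to
keep the sketch self-contained.

```python
class FlushStream(Exception):
    """Assumed definition: signals that no more data will be sent."""

packages = {}   # assumed in-memory store keyed by package name

def filelike(packagename, appendOrOverwrite):
    cum = []
    if appendOrOverwrite == 'w+':
        cum.extend(packages[packagename])
    try:
        while True:
            dat = yield None
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum      # commit everything on flush

ostream = filelike('mydest', 'w')
next(ostream)                 # advance to the first yield
ostream.send('firstdat')      # analogous to file.write(dat)
ostream.send('seconddat')
try:
    ostream.throw(FlushStream)
except StopIteration:
    pass                      # the generator finished after flushing
```
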
Example of a Complex Consumer

Loop over the picture files in a directory, shrink them
one at a time to thumbnail size using PIL [7], and send them to a
lazy consumer. That consumer is responsible for creating a large
blank image, accepting thumbnails one at a time and placing them
in a 5 by 3 grid format onto the blank image. Whenever the grid is
full, it writes out the large image as an index print. A
FlushStream exception indicates that no more thumbnails are
available and that the partial index print should be written out
if there are one or more thumbnails on it.

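The control structure of that consumer can be sketched without any
imaging library. In the sketch below, plain strings stand in for
thumbnails and a list stands in for each index print; every name
(index_print_consumer, index_prints, COLS, ROWS) is hypothetical, and
send() is the current spelling of the proposed .next(arg).

```python
class FlushStream(Exception):
    """Assumed definition: no more thumbnails are available."""

COLS, ROWS = 5, 3
index_prints = []     # each entry is one written-out grid

def index_print_consumer():
    grid = []
    try:
        while True:
            thumb = yield None
            grid.append(thumb)
            if len(grid) == COLS * ROWS:      # grid full: write it out
                index_prints.append(grid)
                grid = []
    except FlushStream:
        if grid:                              # write a partial print, if any
            index_prints.append(grid)

consumer = index_print_consumer()
next(consumer)                                # advance to the first yield
for n in range(17):                           # 17 stand-in thumbnails
    consumer.send("thumb%02d" % n)
try:
    consumer.throw(FlushStream)
except StopIteration:
    pass                                      # consumer flushed and finished
```

The grid contents and the "how full is it" state survive between
calls without any instance variables, which is the point of the
example.
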
Example of a Producer and Consumer Used Together in a Pipe-like Fashion

    'Analogy to Linux style pipes: source | upper | sink'
    sink = sinkgen()
    sink.next()
    for word in source():
        sink.next(word.upper())

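The pipe example leaves source() and sinkgen() undefined. A minimal
sketch with hypothetical definitions of both, using send() as the
current spelling of the proposed .next(arg):

```python
collected = []        # hypothetical destination written by the sink

def source():
    # Hypothetical producer: yields a few fixed words.
    for word in ["alpha", "beta", "gamma"]:
        yield word

def sinkgen():
    # Hypothetical consumer: records every word it is sent.
    while True:
        word = yield None
        collected.append(word)

sink = sinkgen()
next(sink)                        # advance to the first yield
for word in source():
    sink.send(word.upper())       # the 'upper' stage of the pipe
```

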
Specification for Generator Exception Passing:

Add a .throw(exception) method to the generator interface:

    def mygen():
        try:
            while 1:
                x = yield None
                print x
        except FlushStream:
            print 'Done'

    g = mygen()
    g.next()                # runs the generator until the first 'yield'
    g.next(5)
    g.throw(FlushStream)

There is no existing work-around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where an exception cannot be raised into or through
active code. Even if the .next(arg) proposal is not adopted, we
should add the .throw() method.

Generator exception passing also helps address an intrinsic limitation
on generators, the prohibition against their using try/finally to
trigger clean-up code [1]. Without .throw(), the current work-around
forces the resolution or clean-up code to be moved outside the generator.

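Both halves of that claim, raising an exception into a generator and
letting one pass through it, can be sketched on a current
interpreter. FlushStream is given an assumed definition here, and the
list named log is a hypothetical replacement for the print
statements.

```python
class FlushStream(Exception):
    """Assumed definition; the text above does not define it."""

log = []   # stands in for the print statements

def mygen():
    try:
        while True:
            x = yield None
            log.append(x)
    except FlushStream:
        log.append('Done')

g = mygen()
next(g)                      # advance to the first yield
g.send(5)
try:
    g.throw(FlushStream)     # raised inside mygen at the paused yield
except StopIteration:
    pass                     # mygen handled the exception and finished

# An exception the generator does not handle passes through to the caller.
h = mygen()
next(h)
try:
    h.throw(KeyError)
    propagated = False
except KeyError:
    propagated = True
```

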
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
name. Unlike raise which immediately raises an exception from the
current execution point, throw will first return to the generator
and then raise the exception. The word throw is suggestive of
putting the exception in another location. The word throw is
already associated with exceptions in other languages.

Alternative method names were considered: resolve(), signal(),
genraise(), raiseinto(), and flush(). None of these seem to fit
as well as throw().


Note B: The throw syntax should exactly match raise's syntax:

    throw([expression, [expression, [expression]]])

Accordingly, it should be implemented to handle all of the following:

    raise string                    g.throw(string)
    raise string, data              g.throw(string,data)
    raise class, instance           g.throw(class,instance)
    raise instance                  g.throw(instance)
    raise                           g.throw()


Discussion of Restartability:

Inside for-loops, generators are not substitutable for lists unless they
are accessed only once. A second access only works for restartable
objects like lists, dicts, objects defined with __getitem__, and
xrange objects. Generators are not the only objects which are not
restartable. Other examples of non-restartable sequences include file
objects, xreadlines objects, and the result of iter(callable,sentinel).

Since the proposed built-in functions return generators, they are also
non-restartable. As a result, 'xmap' is not substitutable for 'map' in
the following example:

    alphabet = map(chr, xrange(ord('a'), ord('z')+1))
    twoletterwords = [a+b for a in alphabet for b in alphabet]

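The failure mode is easy to observe on a current interpreter, because
the built-in map in Python 3 is lazy and non-restartable, much like
the proposed xmap. A minimal sketch under that assumption (the names
lazy_alphabet and lazy_words are illustrative):

```python
# A lazy, one-shot iterator: the inner loop drains it during the
# first pass of the outer loop, so the outer loop then stops early.
lazy_alphabet = map(chr, range(ord('a'), ord('z') + 1))
lazy_words = [a + b for a in lazy_alphabet for b in lazy_alphabet]

# A list can be restarted, so the inner loop sees all 26 letters
# on every pass of the outer loop.
alphabet = list(map(chr, range(ord('a'), ord('z') + 1)))
twoletterwords = [a + b for a in alphabet for b in alphabet]
```

The lazy version silently produces a short result rather than raising
an error, which is why substitutability matters.
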
Since generator comprehensions also return generators, they are not
restartable. Consequently, they are not substitutable for list
comprehensions in the following example:

    digits = [str(i) for i in xrange(10)]
    alphadig = [a+d for a in 'abcdefg' for d in digits]

To achieve substitutability, generator comprehensions and x-functions
can be implemented in a way that supports restarts. PEP 234 [4]
explicitly states that restarts are to be supported through repeated
calls to iter(). With that guidance, it is easy to add restartability
to generator comprehensions using a simple wrapper class around the
generator function and modifying the implementation above to return:

    g = Restartable(__temp)    # instead of g = __temp()

Restartable is a simple (12 line) class which calls the generator function
to create a new, re-wound generator whenever iter() requests a restart.
Calls to .next() are simply forwarded to the generator. The Python source
code for the Restartable class can be found in the PEP 279 simulator [8].
An actual implementation in C can achieve re-startability directly and
would not need the slow class wrapper used in the pure Python simulation.

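For readers without access to the simulator, the wrapper described
above can be approximated as follows. This is a hypothetical
reconstruction from the prose description, written in Python 3
syntax, and is not the class from the simulator [8]; the digits()
generator function is likewise illustrative.

```python
class Restartable:
    """Wrap a generator function so each iter() call restarts it."""

    def __init__(self, genfunc, *args, **kwds):
        self._genfunc = genfunc
        self._args = args
        self._kwds = kwds
        self._gen = genfunc(*args, **kwds)

    def __iter__(self):
        # A new, re-wound generator is created on every restart request.
        self._gen = self._genfunc(*self._args, **self._kwds)
        return self

    def __next__(self):
        # Calls to next() are simply forwarded to the current generator.
        return next(self._gen)

def digits():
    for i in range(10):
        yield str(i)

g = Restartable(digits)
# The inner loop calls iter(g) once per letter, restarting the digits.
alphadig = [a + d for a in 'abcdefg' for d in g]
```

One design limitation of this sketch: because __iter__ returns the
wrapper itself, two loops over the same wrapper at the same time
would share one underlying generator.
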
The XLazy library [10] shows how restarts can be implemented for xmap,
xfilter, and xzip.

The upside of adding restart capability is that more list comprehensions
can be made lazy and save memory by adding 'yield'. Likewise,
more expressions that use map, filter, and zip can be made lazy just by
adding 'x'.

A possible downside is that x-functions have no control over whether their
inputs are themselves restartable. With non-restartable inputs like
generators or files, an x-function restart will not produce a meaningful
result.


References

[1] PEP 255 Simple Generators
    http://python.sourceforge.net/peps/pep-0255.html

[2] PEP 212 Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html

[3] PEP 202 List Comprehensions
    http://python.sourceforge.net/peps/pep-0202.html

[4] PEP 234 Iterators
    http://python.sourceforge.net/peps/pep-0234.html

[5] Dr. David Mertz's draft column for Charming Python.
    http://gnosis.cx/publish/programming/charming_python_b5.txt

[6] The code fragment for xmap() was found at:
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66448

[7] PIL, the Python Imaging Library can be found at:
    http://www.pythonware.com/products/pil/

[8] A pure Python simulation of every feature in this PEP is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

[9] The full, working source code for each of the examples in this PEP
    along with other examples and tests is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756

[10] Oren Tirosh's XLazy library with re-startable x-functions is at:
    http://www.tothink.com/python/dataflow/


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: