PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: othello@javanet.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators introduced in Python version 2.2
[1]. The goal is to increase the convenience, utility, and power
of generators.


Rationale

Starting with xrange() and xreadlines(), Python has been evolving
toward a model that provides lazy evaluation as an alternative
when complete evaluation is not desired because of memory
restrictions or availability of data.

Starting with Python 2.2, a second evolutionary direction came in
the form of iterators and generators. The iter() factory function
and generators were provided as convenient means of creating
iterators. Deep changes were made to use iterators as a unifying
theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects. In the case of mappings and file objects, lazy
evaluation was made available.

The next steps in the evolution of generators are:

1. Add built-in functions which provide lazy alternatives to their
   complete evaluation counterparts and one other convenience
   function which was made possible once iterators and generators
   became available. The new functions are xzip, xmap, xfilter,
   and indexed.

2. Provide a generator alternative to list comprehensions [3]
   making generator creation as convenient as list creation.

3. Extend the syntax of the 'yield' keyword to enable generator
   parameter passing. The resulting increase in power simplifies
   the creation of consumer streams which have a complex execution
   state and/or variable state.

4. Add a generator method to enable exceptions to be passed to a
   generator. Currently, there is no clean method for triggering
   exceptions from outside the generator.

All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords. These generator tools go into Python 2.3 when
generators become final and no longer need to be imported from
__future__.

SourceForge contains a working, pure Python simulation of every
feature proposed in this PEP [8]. SourceForge also has a separate
file with a simulation test suite and working source code for the
examples used in this PEP [9].


Specification for new built-ins:

    def xfilter( pred, gen ):
        '''
        xfilter(...)
        xfilter(function, sequence) -> iterator

        Return an iterator containing those items of sequence for
        which function is true. If function is None, return an
        iterator over the items that are true.
        '''
        if pred is None:
            for i in gen:
                if i:
                    yield i
        else:
            for i in gen:
                if pred(i):
                    yield i

    def xmap( fun, *collections ):
        '''
        xmap(...)
        xmap(function, sequence[, sequence, ...]) -> iterator

        Return an iterator applying the function to the items of the
        argument collection(s). If more than one collection is given,
        the function is called with an argument list consisting of the
        corresponding item of each collection, substituting None for
        missing values when not all collections have the same length.
        If the function is None, return an iterator over the items of
        the collection (or over tuples if more than one collection).
        '''
        gens = map(iter, collections)
        values_left = [1]
        def values():
            # Emulate map behaviour, i.e. shorter
            # sequences are padded with None when
            # they run out of values.
            values_left[0] = 0
            for i in range(len(gens)):
                iterator = gens[i]
                if iterator is None:
                    yield None
                else:
                    try:
                        yield iterator.next()
                        values_left[0] = 1
                    except StopIteration:
                        gens[i] = None
                        yield None
        while 1:
            args = tuple(values())
            if not values_left[0]:
                raise StopIteration
            if fun is None:
                # As documented above: bare items for a single
                # collection, tuples for several.
                if len(args) == 1:
                    yield args[0]
                else:
                    yield args
            else:
                yield fun(*args)

    def xzip( *collections ):     ### Code from Python Cookbook [6]
        '''
        xzip(...)
        xzip(seq1 [, seq2 [...]]) -> iterator over (seq1[0], seq2[0] ...), (...)

        Return an iterator of tuples, where each tuple contains the
        i-th element from each of the argument sequences or iterables.
        The returned iterator is truncated in length to the length of
        the shortest argument collection.
        '''
        gens = map(iter, collections)
        while 1:
            yield tuple( [g.next() for g in gens] )

    def indexed( collection, cnt=0, limit=None ):
        'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt<limit:
            yield (cnt, gen.next())
            cnt += 1

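The end-of-input rules of xmap (pad with None) and xzip (truncate to
the shortest input) are easy to confuse, so a small runnable sketch
may help. It is a port of the two definitions above to current
Python 3 (next(g) instead of g.next(), and return instead of raising
StopIteration inside a generator). That port is an assumption made
only so the example can be executed today, not the proposed
implementation, and the sample inputs are invented for illustration.

```python
import itertools


def xzip(*collections):
    """Lazy zip: stop as soon as the shortest input is exhausted."""
    gens = [iter(c) for c in collections]
    while gens:  # guard added in this sketch: no inputs, nothing to yield
        row = []
        for g in gens:
            try:
                row.append(next(g))
            except StopIteration:
                return
        yield tuple(row)


def xmap(fun, *collections):
    """Lazy map: pad exhausted inputs with None until all are exhausted."""
    gens = [iter(c) for c in collections]
    while True:
        args = []
        values_left = False
        for i, g in enumerate(gens):
            if g is None:
                args.append(None)
                continue
            try:
                args.append(next(g))
                values_left = True
            except StopIteration:
                gens[i] = None  # remember that this input ran dry
                args.append(None)
        if not values_left:
            return
        yield fun(*args)


truncated = list(xzip("abc", [1, 2]))
padded = list(xmap(lambda a, b: (a, b), "abc", [1, 2]))

# Laziness: an unbounded input is fine because only the needed
# values are ever requested from it.
numbered = list(xzip(itertools.count(), "ab"))
```

Here truncated ends after two tuples, while padded has a third entry
whose second member is None.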
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of those proposals only work
for lists, unlike indexed(), which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2, which did
not include generators.

Note B: An alternate, simplified definition of indexed was proposed:

    import sys

    def indexed( collection, cnt=0, limit=sys.maxint ):
        'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
        return xzip( xrange(cnt,limit), collection )


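As a rough check that the two definitions of indexed agree, the
following sketch ports both to current Python 3 and runs them on a
generator, which has no len() and cannot be sliced. The names
indexed_loop, indexed_zip and words are hypothetical, chosen here
only to keep the two variants apart; zip and range stand in for the
proposed xzip and xrange, and sys.maxsize for sys.maxint.

```python
import sys


def indexed_loop(collection, cnt=0, limit=None):
    """Loop-based definition: pair a running index with each item."""
    gen = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(gen)
        except StopIteration:
            return  # input exhausted before the limit was reached
        yield (cnt, item)
        cnt += 1


def indexed_zip(collection, cnt=0, limit=sys.maxsize):
    """Simplified definition from Note B, built on lazy zip/range."""
    return zip(range(cnt, limit), collection)


def words():
    """A generator input, to show that any iterable is acceptable."""
    yield "spam"
    yield "eggs"
    yield "ham"


from_loop = list(indexed_loop(words(), cnt=1))
from_zip = list(indexed_zip(words(), cnt=1))
limited = list(indexed_loop("abcdef", cnt=0, limit=2))
```

Under either definition, limit bounds the index value rather than the
number of items, so cnt=5 with limit=7 produces at most two pairs.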
Specification for Generator Comprehensions:

If a list comprehension starts with a 'yield' keyword, then
express the comprehension with a generator. For example:

    g = [yield (len(line),line) for line in file if len(line)>5]

This would be implemented as if it had been written:

    class __Temp:
        def __iter__(self):
            for line in file:
                if len(line) > 5:
                    yield (len(line), line)
    g = __Temp()

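The expansion above can be tried out directly, since it needs no new
syntax. The sketch below is written for current Python 3 and makes
two assumptions for the sake of a self-contained example: the class
is given the hypothetical name LongLines, and the lines come from an
in-memory list passed to the constructor rather than from a variable
named file in the enclosing scope.

```python
class LongLines:
    """Stand-in for the proposed
    [yield (len(line), line) for line in lines if len(line) > 5]."""

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # Because __iter__ is itself a generator function, every call
        # to iter() starts a fresh pass over the source.
        for line in self.lines:
            if len(line) > 5:
                yield (len(line), line)


lines = ["short", "a longer line", "tiny", "another one"]
g = LongLines(lines)   # nothing has been scanned yet

it = iter(g)
first = next(it)       # scans only as far as the first match
everything = list(g)   # an independent, complete pass
```

Creating g does no work at all; values are computed one at a time as
the consumer asks for them.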
Note A: There is some debate about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
syntax (taking maximum advantage of what people already know).
More importantly, it sets off the generator comprehension from the
rest of the function so as not to suggest that the enclosing
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. Most of the feedback received to date
indicates that brackets do not make a false suggestion and are
in fact helpful.

Note B: An iterable instance is returned by the above code. The
purpose is to allow the object to be restarted and looped over
multiple times. This accurately mimics the behavior of list
comprehensions. As a result, the following code (provided by Oren
Tirosh) works equally well with or without 'yield':

    letters = [yield chr(i) for i in xrange(ord('a'),ord('z')+1)]
    digits = [yield str(i) for i in xrange(10)]
    letdig = [yield l+d for l in letters for d in digits]

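Why restartability matters for the nested loop in Note B can be seen
in a short Python 3 sketch. Restartable is a hypothetical helper
introduced here to imitate the proposed semantics (each iter() call
begins a new pass); the alphabet is cut down to three letters and
three digits to keep the result small.

```python
class Restartable:
    """Hypothetical helper: every iter() call starts a new generator."""

    def __init__(self, genfunc):
        self.genfunc = genfunc

    def __iter__(self):
        return self.genfunc()


letters = Restartable(lambda: (chr(i) for i in range(ord("a"), ord("c") + 1)))
digits = Restartable(lambda: (str(i) for i in range(3)))
letdig = Restartable(lambda: (l + d for l in letters for d in digits))

combos = list(letdig)

# Contrast: a plain one-shot generator is exhausted after the first
# letter, so the inner loop silently yields nothing afterwards.
one_shot_digits = (str(i) for i in range(3))
cut_short = [l + d for l in letters for d in one_shot_digits]
```

With restartable operands all nine combinations appear; with the
one-shot inner generator only the combinations for the first letter
do.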
Note C: List comprehensions expose their looping variable and
leave the variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
list comprehension. Further, the variable 'i' is in a defined and
potentially useful state on the line immediately following the
list comprehension.

In contrast, generator comprehensions do not expose the looping
variable to the enclosing scope. The code, [yield str(i) for i in
range(8)] leaves 'i' untouched in the scope where the
comprehension appears. This is also by design and reflects an
intent to duplicate the result of coding a generator directly
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
generator comprehension. It does not come into existence until
iteration starts. Since several generators may be running at
once, there are potentially multiple, unequal instances of 'i' at
any one time.


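The scoping behavior described in Note C can be observed with an
ordinary generator function, which is what the proposed syntax is
meant to duplicate. This is a sketch for current Python 3 under two
assumptions: a plain for-loop stands in for the list comprehension
(Python 3 list comprehensions no longer leak their variable), and
the generator's private 'i' is inspected through the gi_frame
attribute, an introspection hook that is used here only for
demonstration.

```python
def digits_as_text():
    """Generator equivalent of the proposed [yield str(i) for i in range(8)]."""
    for i in range(8):
        yield str(i)


# A for-loop leaves its variable behind in the enclosing scope.
for j in range(8):
    pass
leaked = j

i = "untouched"
first = digits_as_text()
second = digits_as_text()

# Before iteration starts, the generator has no 'i' at all.
before_start = dict(first.gi_frame.f_locals)

next(first)
next(first)
next(first)
next(second)

# Each running generator carries its own, independent 'i'.
first_i = first.gi_frame.f_locals["i"]
second_i = second.gi_frame.f_locals["i"]
```

The outer name i is never rebound, and the two generators hold
different values of their own 'i' at the same moment.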
Specification for Generator Parameter Passing:

1. Allow 'yield' to assign a value as in:

    def mygen():
        while 1:
            x = yield None
            print x

2. Let the .next() method take a value to pass to the generator as in:

    g = mygen()
    g.next()       # runs the generator until the first 'yield'
    g.next(1)      # '1' is bound to 'x' in mygen(), then printed
    g.next(2)      # '2' is bound to 'x' in mygen(), then printed

The control flow is unchanged by this proposal. The only change
is that a value can be sent into the generator. By analogy,
consider the quality improvement from GOSUB (which had no argument
passing mechanism) to modern procedure calls (which can pass in
arguments and return values).

Most of the underlying machinery is already in place; only the
communication needs to be added, by modifying the parse syntax to
accept the new 'x = yield expr' syntax and by allowing the .next()
method to accept an optional argument.

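The 'x = yield expr' form can be exercised in current Python 3, with
one difference in spelling that this sketch assumes throughout: the
value is passed with the generator's send() method rather than with
an argument to next(). The seen list is a device added here so the
received values can be inspected instead of printed.

```python
def mygen(seen):
    while True:
        x = yield None   # suspends here; resumes with the value sent in
        seen.append(x)


seen = []
g = mygen(seen)
next(g)      # runs the generator until the first 'yield'
g.send(1)    # 1 is bound to 'x', then recorded
g.send(2)    # 2 is bound to 'x', then recorded

# The priming call matters: a generator that has not yet reached a
# 'yield' has nowhere to receive a value.
unprimed = mygen([])
try:
    unprimed.send(1)
    priming_needed = False
except TypeError:
    priming_needed = True
```

Control still alternates between caller and generator exactly as it
does with a plain next(); only the direction of data flow is new.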
Yield is more than just a simple iterator creator. It does
something else truly wonderful -- it suspends execution and saves
state. It is good for a lot more than writing iterators. This
proposal further expands its capability by making it easier to
share data with the generator.

The .next(arg) mechanism is especially useful for:
    1. Sending data to any generator
    2. Writing lazy consumers with complex execution states
    3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])

The proposal is a clear improvement over the existing alternative
of passing data via global variables. It is also much simpler,
more readable and easier to debug than an approach involving the
threading module with its attendant mutexes, semaphores, and data
queues. A class-based approach competes well when there are no
complex execution states or variable states. When the complexity
increases, generators with parameter passing are much simpler
because they automatically save state (unlike classes which must
explicitly save the variable and execution state in instance
variables).


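To make the comparison with a class-based approach concrete, here is
a small consumer written both ways in current Python 3. The task is
invented for illustration: the stream carries a record length
followed by that many numbers, and the consumer stores the sum of
each record. Both names (record_consumer, RecordConsumer) are
hypothetical, and send() is assumed as the spelling of the proposed
next(arg).

```python
def record_consumer(results):
    """Generator version: the position in the code is the state."""
    while True:
        count = yield            # waiting for a record length
        total = 0
        for _ in range(count):
            item = yield         # waiting for a payload item
            total += item
        results.append(total)


class RecordConsumer:
    """Class version: the same states kept by hand in attributes."""

    def __init__(self, results):
        self.results = results
        self.remaining = None    # None means a length is expected next
        self.total = 0

    def send(self, value):
        if self.remaining is None:
            self.remaining = value
            self.total = 0
        else:
            self.total += value
            self.remaining -= 1
        if self.remaining == 0:
            self.results.append(self.total)
            self.remaining = None


stream = [2, 10, 20, 3, 1, 1, 1]

gen_results = []
gen = record_consumer(gen_results)
next(gen)                        # advance to the first yield
for value in stream:
    gen.send(value)

cls_results = []
obj = RecordConsumer(cls_results)
for value in stream:
    obj.send(value)
```

The two produce the same sums, but the generator needs no flag to
remember whether it is between records or inside one.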
Example of a Complex Consumer

The encoder for arithmetic compression sends a series of
fractional values to a complex, lazy consumer. That consumer
makes computations based on previous inputs and only writes out
when certain conditions have been met. After the last fraction is
received, it has a procedure for flushing any unwritten data.


Example of a Consumer Stream

    def filelike(packagename, appendOrOverwrite):
        cum = []
        if appendOrOverwrite == 'w+':
            cum.extend( packages[packagename] )
        try:
            while 1:
                dat = yield None
                cum.append(dat)
        except FlushStream:
            packages[packagename] = cum

    ostream = filelike('mydest','w')   # Analogous to open(name,flag)
    ostream.next()                     # Advance to the first yield
    ostream.next(firstdat)             # Analogous to file.write(dat)
    ostream.next(seconddat)
    ostream.throw( FlushStream )       # This feature proposed below


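A version of the consumer stream that runs on current Python 3 might
look like the sketch below. Several pieces are assumptions filled in
for the example: FlushStream is defined as a plain exception class,
packages is an ordinary dictionary, send() replaces the proposed
next(arg), and the flush helper absorbs the StopIteration that
Python 3 raises in the caller once the generator finishes.

```python
class FlushStream(Exception):
    """Assumed signal telling the consumer to store what it has."""


packages = {}   # assumed storage for finished streams


def filelike(packagename, append_or_overwrite):
    cum = []
    if append_or_overwrite == "w+":
        cum.extend(packages[packagename])   # start from existing data
    try:
        while True:
            dat = yield None
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum


def flush(stream):
    """Hypothetical helper: deliver FlushStream and let the stream end."""
    try:
        stream.throw(FlushStream)
    except StopIteration:
        pass


ostream = filelike("mydest", "w")
next(ostream)                  # advance to the first yield
ostream.send("first chunk")
ostream.send("second chunk")
flush(ostream)
after_first = list(packages["mydest"])

more = filelike("mydest", "w+")
next(more)
more.send("third chunk")
flush(more)
```

Nothing reaches packages until the flush, which is what makes the
consumer lazy.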
Example of a Complex Consumer

Loop over the picture files in a directory, shrink them
one at a time to thumbnail size using PIL [7], and send them to a
lazy consumer. That consumer is responsible for creating a large
blank image, accepting thumbnails one at a time and placing them
in a 5x3 grid format onto the blank image. Whenever the grid is
full, it writes out the large image as an index print. A
FlushStream exception indicates that no more thumbnails are
available and that the partial index print should be written out
if there are one or more thumbnails on it.


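The control structure of the index print consumer can be sketched
without any imaging library. In the Python 3 sketch below, strings
stand in for thumbnails and a list of lists stands in for the
written index prints; those substitutions, the FlushStream class,
the function name and the use of send() for the proposed next(arg)
are all assumptions made for illustration.

```python
class FlushStream(Exception):
    """Assumed signal: no more thumbnails will arrive."""


def index_print_consumer(sheets, columns=5, rows=3):
    capacity = columns * rows
    cells = []                       # thumbnails placed on the current sheet
    try:
        while True:
            thumbnail = yield
            cells.append(thumbnail)
            if len(cells) == capacity:
                sheets.append(cells)  # grid is full: write the sheet out
                cells = []
    except FlushStream:
        if cells:                     # write a partial sheet, never an empty one
            sheets.append(cells)


sheets = []
consumer = index_print_consumer(sheets)
next(consumer)                        # advance to the first yield
for n in range(17):
    consumer.send("thumb%02d" % n)
try:
    consumer.throw(FlushStream)
except StopIteration:
    pass                              # the consumer has finished

sheet_sizes = [len(sheet) for sheet in sheets]
```

Seventeen thumbnails fill one complete 5x3 sheet and leave two on a
partial sheet that is written only when the flush arrives.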
Example of a Producer and Consumer Used Together in a Pipelike Fashion

    'Analogy to: source | upper | sink'
    sink = sinkgen()
    sink.next()
    for word in source():
        sink.next( word.upper() )


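The pipeline fragment leaves source and sinkgen undefined. A minimal
completion for current Python 3 could look as follows; the bodies of
both functions, the sample sentence and the received list are
assumptions supplied for this sketch, and send() again stands in for
the proposed next(arg).

```python
def source():
    """Producer: an ordinary generator."""
    for word in "lazy pipes are fun".split():
        yield word


def sinkgen(received):
    """Consumer: collects whatever is sent to it."""
    while True:
        item = yield
        received.append(item)


# Analogy to: source | upper | sink
received = []
sink = sinkgen(received)
next(sink)                      # advance to the first yield
for word in source():
    sink.send(word.upper())
```

Each word travels through the whole pipe before the next one is
produced, so no intermediate list is ever built.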
Specification for Generator Exception Passing:

Add a .throw(exception) method to the generator object as in:

    def mygen():
        try:
            while 1:
                x = yield None
                print x
        except FlushStream:
            print 'Done'

    g = mygen()
    g.next()               # runs the generator until the first 'yield'
    g.next(5)              # '5' is bound to 'x' in mygen(), then printed
    g.throw(FlushStream)   # raised at the suspended 'yield'; prints 'Done'

There is no existing workaround for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where an exception cannot be raised into or
through active code. Even if the .next(arg) proposal is not
adopted, we should add the .throw() method.

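Both halves of that claim, raising an exception into a generator and
raising one through it, can be demonstrated in current Python 3. The
sketch assumes a FlushStream exception class and a log list in place
of print; it also relies on the Python 3 behavior that the caller of
throw() sees StopIteration when the generator handles the exception
and then ends.

```python
class FlushStream(Exception):
    """Assumed signal handled inside the generator."""


def mygen(log):
    try:
        while True:
            x = yield None
            log.append(x)
    except FlushStream:
        log.append("Done")


# Raised into the generator: the handler at the suspended yield runs.
log = []
g = mygen(log)
next(g)
g.send(5)
try:
    g.throw(FlushStream)
except StopIteration:
    pass                         # generator finished after handling it

# Raised through the generator: an unhandled exception comes back out.
h = mygen([])
next(h)
try:
    h.throw(KeyError("unhandled"))
    passed_through = None
except KeyError as exc:
    passed_through = exc
```

In the first case the exception is consumed at the point where the
generator was suspended; in the second it propagates back to the
caller as if raised by the call itself.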
Note A: The name of the throw method was selected for several
reasons. 'raise' is a keyword and so cannot be used as a method
name. Unlike raise, which immediately raises an exception from the
current execution point, throw will first return to the generator
and then raise the exception. The word throw is suggestive of
putting the exception in another location. The word throw is
already associated with exceptions in other languages.

Note B: The throw syntax should exactly match raise's syntax including:

    raise string                    g.throw(string)
    raise string, data              g.throw(string,data)
    raise class, instance           g.throw(class,instance)
    raise instance                  g.throw(instance)
    raise                           g.throw()


References

[1] PEP 255 Simple Generators
    http://python.sourceforge.net/peps/pep-0255.html

[2] PEP 212 Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html

[3] PEP 202 List Comprehensions
    http://python.sourceforge.net/peps/pep-0202.html

[4] There have been several discussions on comp.lang.python which
    helped tease out these proposals:

    Indexed Function
    http://groups.google.com/groups?hl=en&th=33f778d92dd5720a

    Xmap, Xfilter, Xzip and Two-way Generator Communication
    http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1

    Two-way Generator Communication -- Revised Version
    http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1

    Generator Comprehensions
    http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2

    Discussion Draft of this PEP
    http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7

[5] Dr. David Mertz's draft column for Charming Python.
    http://gnosis.cx/publish/programming/charming_python_b5.txt

[6] The code fragment for xmap() was found at:
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66448

[7] PIL, the Python Imaging Library, can be found at:
    http://www.pythonware.com/products/pil/

[8] A pure Python simulation of every feature in this PEP is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

[9] The full, working source code for each of the examples in this PEP
    along with other examples and tests is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: