PEP 279, Enhanced Generators, Raymond D. Hettinger.
(Minor spell checking and formatting by BAW)
parent 398c395d47
commit 1947249ff7
@@ -0,0 +1,422 @@
PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: othello@javanet.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

    This PEP introduces four orthogonal (not mutually exclusive) ideas
    for enhancing the generators introduced in Python version 2.2
    [1].  The goal is to increase the convenience, utility, and power
    of generators.


Rationale

    Starting with xrange() and xreadlines(), Python has been evolving
    toward a model that provides lazy evaluation as an alternative
    when complete evaluation is not desired because of memory
    restrictions or availability of data.

    Starting with Python 2.2, a second evolutionary direction came in
    the form of iterators and generators.  The iter() factory function
    and generators were provided as convenient means of creating
    iterators.  Deep changes were made to use iterators as a unifying
    theme throughout Python.  The unification came in the form of
    establishing a common iterable interface for mappings, sequences,
    and file objects.  In the case of mappings and file objects, lazy
    evaluation was made available.

    The next steps in the evolution of generators are:

    1. Add built-in functions which provide lazy alternatives to their
       complete evaluation counterparts and one other convenience
       function which was made possible once iterators and generators
       became available.  The new functions are xzip, xmap, xfilter,
       and indexed.

    2. Provide a generator alternative to list comprehensions [3]
       making generator creation as convenient as list creation.

    3. Extend the syntax of the 'yield' keyword to enable two-way
       parameter passing.  The resulting increase in power simplifies
       the creation of consumer streams which have a complex execution
       state and/or variable state.

    4. Add a generator method to enable exceptions to be passed to a
       generator.  Currently, there is no clean method for triggering
       exceptions from outside the generator.

    All of the suggestions are designed to take advantage of the
    existing implementation and require little additional effort to
    incorporate.  Each is backward compatible and requires no new
    keywords.


Specification for new built-ins:

    def xfilter( pred, gen ):
        '''
        xfilter(...)
        xfilter(function, sequence) -> iterator

        Return an iterator containing those items of sequence for
        which function is true.  If function is None, return an
        iterator over the items that are true.
        '''
        if pred is None:
            for i in gen:
                if i:
                    yield i
        else:
            for i in gen:
                if pred(i):
                    yield i

    def xmap( func, *collections ):
        '''
        xmap(...)
        xmap(function, sequence[, sequence, ...]) -> iterator

        Return an iterator applying the function to the items of the
        argument collection(s).  If more than one collection is given,
        the function is called with an argument list consisting of the
        corresponding item of each collection, substituting None for
        missing values when not all collections have the same length.
        If the function is None, return an iterator over the items of
        the collection (or over tuples if more than one collection).
        '''
        gens = map(iter, collections)
        values_left = [1]
        def values():
            # Emulate map behaviour, i.e. shorter
            # sequences are padded with None when
            # they run out of values.
            values_left[0] = 0
            for i in range(len(gens)):
                iterator = gens[i]
                if iterator is None:
                    yield None
                else:
                    try:
                        yield iterator.next()
                        values_left[0] = 1
                    except StopIteration:
                        gens[i] = None
                        yield None
        while 1:
            args = tuple(values())
            if not values_left[0]:
                return
            if func is None:
                # Identity: bare items for one collection, else tuples.
                if len(args) == 1:
                    yield args[0]
                else:
                    yield args
            else:
                yield func(*args)

    def xzip( *collections ):
        '''
        xzip(...)
        xzip(seq1 [, seq2 [...]]) -> iterator of (seq1[0], seq2[0] ...), (...)

        Return an iterator of tuples, where each tuple contains the
        i-th element from each of the argument sequences or iterables.
        The returned iterator is truncated in length to the length of
        the shortest argument collection.
        '''
        gens = map(iter, collections)
        while 1:
            # StopIteration from the shortest collection ends the loop.
            yield tuple( [g.next() for g in gens] )

    def indexed( collection, cnt=0, limit=None ):
        'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt<limit:
            yield (cnt, gen.next())
            cnt += 1

    Note A:  PEP 212 Loop Counter Iteration [2] discussed several
    proposals for achieving indexing.  Some of the proposals only work
    for lists, unlike the above function, which works for any
    generator, xrange, sequence, or iterable object.  Also, those
    proposals were presented and evaluated in the world prior to
    Python 2.2, which did not include generators.
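
    As a rough illustration of the laziness these functions are meant
    to provide, here is a minimal sketch of xfilter() and indexed() in
    present-day Python 3 syntax (next(it) rather than it.next()).
    That dialect is an assumption of this example, not the Python 2.2
    dialect used in the specification.  Because each value is produced
    only on demand, both functions work on an unbounded source such as
    itertools.count():

```python
import itertools

def xfilter(pred, iterable):
    """Lazily yield the items for which pred is true (truthy items if pred is None)."""
    for item in iterable:
        if (item if pred is None else pred(item)):
            yield item

def indexed(collection, cnt=0, limit=None):
    """Lazily yield (index, item) pairs, starting at cnt and stopping before limit."""
    gen = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(gen)
        except StopIteration:
            return  # source exhausted before the limit was reached
        yield (cnt, item)
        cnt += 1

# An infinite source is fine because nothing is computed until requested.
multiples_of_seven = xfilter(lambda n: n % 7 == 0, itertools.count(1))
first_three = list(itertools.islice(indexed(multiples_of_seven), 3))
print(first_three)
```

    An eager version of either function would never return when given
    itertools.count(); the lazy versions do only as much work as the
    consumer asks for.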


Specification for Generator Comprehensions:

    If a list comprehension starts with a 'yield' keyword, then
    express the remainder of the statement as a generator.  For
    example:

        g = [yield (len(line),line)  for line in file  if len(line)>5]
        print g.next()
        print g.next()

    This would be implemented as if it had been written:

        def __temp_gen():
            for line in file:
                if len(line) > 5:
                    yield (len(line), line)
        g = __temp_gen()
        print g.next()
        print g.next()

    Note A:  There is a difference in the above implementation as
    compared to list comprehensions.  For a generator comprehension,
    the variables are created in a separate scope while list
    comprehensions use the enclosing scope.  If this PEP is accepted,
    the parser should generate byte code that eliminates this
    difference by passing the line variable in the enclosing scope and
    using that same variable passed by reference inside the generator.
    This will make the behavior of generator comprehensions identical
    to that of list comprehensions.

    Note B:  There is some debate about whether the enclosing brackets
    should be part of the syntax for generator comprehensions.  On the
    plus side, it neatly parallels list comprehensions and would be
    immediately recognizable as a similar form with similar internal
    syntax (taking maximum advantage of what people already know).
    More importantly, it sets off the generator comprehension from the
    rest of the function so as to not suggest that the enclosing
    function is a generator (currently the only cue that a function is
    really a generator is the presence of the yield keyword).  On the
    minus side, the brackets may falsely suggest that the whole
    expression returns a list.  All of the feedback received to date
    indicates that brackets do not make a false suggestion and are
    in fact helpful.
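
    To make the equivalence described in this section concrete, the
    following sketch compares a hand-written generator with the
    parenthesised generator-expression form that later Python versions
    accept.  The bracketed [yield ...] spelling proposed here is not
    valid in any released Python, so the parenthesised form is only a
    stand-in; the sample text and the names long_lines_gen and source
    are assumptions made for the example.

```python
import io

SAMPLE = "hi\na longer line\nok\nanother long line\n"

def long_lines_gen(stream):
    """Explicit generator: what the comprehension is described as expanding to."""
    for line in stream:
        if len(line) > 5:
            yield (len(line), line)

# Hand-written generator over one copy of the data.
explicit = long_lines_gen(io.StringIO(SAMPLE))

# Generator expression over a second copy; nothing is read until next() is called.
source = io.StringIO(SAMPLE)
g = ((len(line), line) for line in source if len(line) > 5)

first = next(g)
print(first)
assert first == next(explicit)          # both forms produce the same first item
assert list(g) == list(explicit)        # and the same remaining items
```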


Specification for two-way Generator Parameter Passing:

    1. Allow 'yield' to assign a value as in:

        def mygen():
            while 1:
                x = yield None
                print x

    2. Let the .next() method take a value to pass to the generator
       as in:

        g = mygen()
        g.next()       # runs the generator until the first 'yield'
        g.next(1)      # the '1' gets bound to 'x' in mygen()
        g.next(2)      # the '2' gets bound to 'x' in mygen()
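
    For readers who want to experiment: Python later gained this
    facility in a slightly different spelling, with the value
    delivered by a separate send() method rather than by an argument
    to next().  A minimal sketch of the mygen() example in that
    spelling follows.  It collects values in a list (the received
    parameter is an assumption of this example) instead of printing
    them.

```python
def mygen(received):
    """Consumer generator: each value passed in becomes the result of 'yield'."""
    while True:
        x = yield None
        received.append(x)

received = []
g = mygen(received)
next(g)        # runs the generator up to the first 'yield'
g.send(1)      # 1 is bound to x inside mygen()
g.send(2)      # 2 is bound to x inside mygen()
print(received)
```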

    Note A:  An early question arose, when would you need this?  The
    answer is that existing generators make it easy to write lazy
    producers which may have a complex execution state and/or complex
    variable state.  This proposal makes it equally easy to write lazy
    consumers which may also have a complex execution or variable
    state.

    For instance, when writing an encoder for arithmetic compression,
    a series of fractional values are sent to a function which has
    periodic output and a complex state which depends on previous
    inputs.  Also, that encoder requires a flush() function when no
    additional fractions are to be output.  It is helpful to think of
    the following parallel with file output streams:

        ostream = file('mydest.txt','w')
        ostream.write(firstdat)
        ostream.write(seconddat)
        ostream.flush()

    With the proposed extensions, a comparable consumer could be
    written like this:

        def filelike(packagename, appendOrOverwrite):
            cum = []
            if appendOrOverwrite == 'a':
                # append mode: start from the existing contents
                cum.extend( packages[packagename] )
            try:
                while 1:
                    dat = yield None
                    cum.append(dat)
            except FlushStream:
                packages[packagename] = cum
        ostream = filelike('mydest','w')
        ostream.next()
        ostream.next(firstdat)
        ostream.next(seconddat)
        ostream.throw( FlushStream )   # this feature discussed below
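
    The filelike() consumer can be tried out with the send() and
    throw() generator methods found in later Python versions, which
    stand in here for the proposed next(value) and throw().  In this
    sketch the packages dictionary and the FlushStream exception
    class, which the example leaves undefined, are assumptions
    supplied so that the code is self-contained, and 'a' is assumed
    to be the mode string that selects appending.

```python
packages = {}                      # hypothetical in-memory "file system"

class FlushStream(Exception):
    """Signal sent into the consumer to make it store what it has collected."""

def filelike(packagename, mode):
    cum = []
    if mode == 'a':                # append: start from the existing contents
        cum.extend(packages.get(packagename, []))
    try:
        while True:
            dat = yield None       # wait for the next piece of data
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum

ostream = filelike('mydest', 'w')
next(ostream)                      # advance to the first yield
ostream.send('first')
ostream.send('second')
try:
    ostream.throw(FlushStream())   # the generator stores its data and finishes
except StopIteration:
    pass                           # raised because the generator ran to completion
print(packages)
```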

    Note B:  Almost all of the machinery necessary to implement this
    extension is already in place.  The parse syntax needs to be
    modified to accept the new x = yield None syntax and the .next()
    method needs to allow an argument.

    Note C:  Some care must be used when writing values to the
    generator because execution starts at the top of the generator,
    not at the first yield.

    Consider the usual flow using .next() without an argument.

        g = mygen(p1) will bind p1 to a local variable and then return a
                generator to be bound to g and NOT run any code in mygen().
        y = g.next() runs the generator from the first line until it
                encounters a yield, when it suspends execution and returns
                a value to be bound to y.

    Since the same flow applies when you are submitting values, the
    first call to .next() should have no argument since there is no
    place to put it.

        g = mygen(p1) will bind p1 to a local variable and then return a
                generator to be bound to g and NOT run any code in mygen().
        g.next() will START execution in mygen() from the first line.  Note
                that there is nowhere to bind any potential arguments that
                might have been supplied to next().  Execution continues
                until the first yield is encountered and control is returned
                to the caller.
        g.next(val) resumes execution at the yield and binds val to the
                left hand side of the yield assignment and continues running
                until another yield is encountered.  This makes sense because
                you submit values expecting them to be processed right away.
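
    The same constraint can be observed in later Python versions,
    where values are delivered with send(): sending anything other
    than None to a generator that has not yet reached its first yield
    is rejected with a TypeError.  A small sketch (the echo()
    generator is a made-up example):

```python
def echo():
    """Yield back whatever value was most recently sent in."""
    value = None
    while True:
        value = yield value

g = echo()
try:
    g.send('too early')    # no yield is waiting to receive the value
except TypeError:
    started_early = False
else:
    started_early = True

next(g)                    # first call: run from the top to the first yield
reply = g.send('hello')    # resumes at the yield, binds 'hello', runs to the next yield
print(started_early, reply)
```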


    Q.  Two-way generator parameter passing seems awfully bold.  To
        my mind, one of the great things about generators is that they
        meet the (very simple) definition of an iterator.  With this,
        they no longer do.  I like lazy consumers -- really I do --
        but I'd rather be conservative about putting something like
        this in the language.

    A.  If you don't use x = yield expr, then nothing changes and you
        haven't lost anything.  So, it isn't really bold.  It simply
        adds an option to pass in data as well as take it out.  Other
        generator implementations (like the thread based generator.py)
        already have provisions for two-way parameter passing so that
        consumers are put on an equal footing with producers.  Two-way
        is the norm, not the exception.

        Yield is not just a simple iterator creator.  It does
        something else truly wonderful -- it suspends execution and
        saves state.  It is good for a lot more than its original
        purpose.  Dr. Mertz's article [5] shows how generators can be
        used to create general purpose co-routines.

        Besides, 98% of the mechanism is already in place.  Only the
        communication needs to be added.  Remember GOSUB, which
        neither took nor returned data.  Routines which accepted
        parameters and returned values were a major step forward.

        When you first need to pass information into a generator, the
        existing alternative is clumsy.  It involves setting a global
        variable, calling .next(), and assigning the local from the
        global.
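
        For comparison, the global-variable workaround looks roughly
        like this.  The sketch uses present-day next(g) syntax, and
        the names _mailbox, consumer and feed are invented for the
        illustration.

```python
_mailbox = None                # global used to smuggle a value into the generator

def consumer(received):
    while True:
        yield None             # suspend until the caller has filled the mailbox
        x = _mailbox           # assign the local from the global
        received.append(x)

def feed(gen, value):
    global _mailbox
    _mailbox = value           # step 1: set the global
    next(gen)                  # step 2: resume the generator so it picks the value up

received = []
g = consumer(received)
next(g)                        # run to the first yield
feed(g, 1)
feed(g, 2)
print(received)
```

        Every value takes two coordinated steps on the caller's side
        and a shared global, which is the clumsiness the answer
        refers to.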


    Q.  Why not introduce another keyword 'accept' for lazy consumers?

    A.  To avoid conflicts with 'yield', to avoid creating a new
        keyword, and to take advantage of the explicit clarity of the
        '=' operator.


    Q.  How often does one need to write a lazy consumer or a
        co-routine?

    A.  Not often.  But, when you DO have to write one, this approach
        is the easiest to implement, read, and debug.

        It clearly beats using existing generators and passing data
        through global variables.  It is much clearer and easier to
        debug than an equivalent approach using threading, mutexes,
        semaphores, and data queues.  A class based approach competes
        well when there are no complex execution states or variable
        states.  When the complexity increases, generators with
        two-way communication are much simpler because they
        automatically save state, unlike classes which must explicitly
        store variable and execution state in instance variables.
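
        A small sketch may help weigh the comparison with classes.
        The consumer below reads a count n, then n values, then
        records their sum, and repeats.  It is written once as a
        generator (using the send() method of later Python versions
        in place of the proposed next(value)) and once as a class.
        The task and the names summer, Summer and feed are
        assumptions chosen for the illustration.

```python
def summer(results):
    """Generator consumer: reads a count n, then n values, then records their sum."""
    while True:
        n = yield                  # state 1: waiting for a count
        total = 0
        for _ in range(n):
            value = yield          # state 2: waiting for one of the n values
            total += value
        results.append(total)

class Summer:
    """Class-based equivalent: the same states kept by hand in attributes."""
    def __init__(self, results):
        self.results = results
        self.remaining = None      # None means "waiting for a count"
        self.total = 0

    def feed(self, value):
        if self.remaining is None:
            self.remaining, self.total = value, 0
        else:
            self.total += value
            self.remaining -= 1
        if self.remaining == 0:
            self.results.append(self.total)
            self.remaining = None

stream = [2, 10, 20, 1, 5]         # two values (10, 20), then one value (5)

gen_results = []
g = summer(gen_results)
next(g)                            # advance to the first yield
for item in stream:
    g.send(item)

class_results = []
s = Summer(class_results)
for item in stream:
    s.feed(item)

print(gen_results, class_results)
```

        In the generator, the position in the code records which
        input is expected next; the class has to encode the same
        information in self.remaining and test it on every call.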


    Q.  Why does yield require an argument?  Isn't yield None too
        wordy?

    A.  It doesn't matter for the purposes of this PEP.  For
        information purposes, here is the reasoning as I understand
        it.  Though return allows an implicit None, some now consider
        this to be weak design.  There is some spirit of "Explicit is
        better than Implicit".  More importantly, in most uses of
        yield, a missing argument is more likely to be a bug than an
        intended yield None.


Specification for Generator Exception Passing:

    Add a .throw(exception) method to the resulting generator as in:

        def mygen():
            try:
                while 1:
                    x = yield None
                    print x
            except FlushStream:
                print 'Done'

        g = mygen()
        g.next()       # runs the generator until the first 'yield'
        g.next(5)
        g.throw(FlushStream)

    There is no existing workaround for triggering an exception
    inside a generator.  This is a true deficiency.  It is the only
    case in Python where an exception cannot be delivered to or
    through active code.  Even if .next(arg) is not adopted, we should
    add the .throw() method.
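
    Later Python versions provide a throw() method on generators, so
    the behaviour can be sketched in runnable form.  The example
    assumes Python 3 syntax, defines FlushStream itself, records
    output in a list named log rather than printing, and uses send()
    where this PEP proposes next(value).  It also shows what happens
    when the generator does not handle the exception.

```python
class FlushStream(Exception):
    pass

def mygen(log):
    try:
        while True:
            x = yield None
            log.append(x)
    except FlushStream:
        log.append('Done')

log = []
g = mygen(log)
next(g)                        # run to the first yield
g.send(5)
try:
    g.throw(FlushStream())     # raised inside mygen() at the suspended yield
except StopIteration:
    pass                       # mygen() handled the exception and then returned

# An exception the generator does not handle propagates back to the caller.
h = mygen([])
next(h)
try:
    h.throw(KeyError('unhandled'))
    propagated = False
except KeyError:
    propagated = True
print(log, propagated)
```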

    Note A:  The name of the throw method was selected for several
    reasons.  Raise is a keyword and so cannot be used as a method
    name.  Unlike raise, which immediately raises an exception from
    the current execution point, throw will first return to the
    generator and then raise the exception.  The word throw is
    suggestive of putting the exception in another location.  The word
    throw is already associated with exceptions in other languages.


References

    [1] PEP 255 Simple Generators
        http://python.sourceforge.net/peps/pep-0255.html

    [2] PEP 212 Loop Counter Iteration
        http://python.sourceforge.net/peps/pep-0212.html

    [3] PEP 202 List Comprehensions
        http://python.sourceforge.net/peps/pep-0202.html

    [4] There have been several discussions on comp.lang.python which
        helped tease out these proposals:

        Indexed Function
        http://groups.google.com/groups?hl=en&th=33f778d92dd5720a

        Xmap, Xfilter, Xzip and Two-way Generator Communication
        http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1

        Two-way Generator Communication -- Revised Version
        http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1

        Generator Comprehensions
        http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2

        http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7

    [5] Dr. David Mertz's draft column for Charming Python.
        http://gnosis.cx/publish/programming/charming_python_b5.txt


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: