PEP 279, Enhanced Generators, Raymond D. Hettinger.
(Minor spell checking and formatting by BAW)
This commit is contained in:
parent
398c395d47
commit
1947249ff7
|
@ -0,0 +1,422 @@
|
|||
PEP: 279
|
||||
Title: Enhanced Generators
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: othello@javanet.com (Raymond D. Hettinger)
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 30-Jan-2002
|
||||
Python-Version: 2.3
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP introduces four orthogonal (not mutually exclusive) ideas
|
||||
for enhancing the generators as introduced in Python version 2.2
|
||||
[1]. The goal is increase the convenience, utility, and power of
|
||||
generators.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
Starting with xrange() and xreadlines(), Python has been evolving
|
||||
toward a model that provides lazy evaluation as an alternative
|
||||
when complete evaluation is not desired because of memory
|
||||
restrictions or availability of data.
|
||||
|
||||
Starting with Python 2.2, a second evolutionary direction came in
|
||||
the form of iterators and generators. The iter() factory function
|
||||
and generators were provided as convenient means of creating
|
||||
iterators. Deep changes were made to use iterators as a unifying
|
||||
theme throughout Python. The unification came in the form of
|
||||
establishing a common iterable interface for mappings, sequences,
|
||||
and file objects. In the case of mappings and file objects, lazy
|
||||
evaluation was made available.
|
||||
|
||||
The next steps in the evolution of generators are:
|
||||
|
||||
1. Add built-in functions which provide lazy alternatives to their
|
||||
complete evaluation counterparts and one other convenience
|
||||
function which was made possible once iterators and generators
|
||||
became available. The new functions are xzip, xmap, xfilter,
|
||||
and indexed.
|
||||
|
||||
2. Provide a generator alternative to list comprehensions [3]
|
||||
making generator creation as convenient as list creation.
|
||||
|
||||
3. Extend the syntax of the 'yield' keyword to enable two way
|
||||
parameter passing. The resulting increase in power simplifies
|
||||
the creation of consumer streams which have a complex execution
|
||||
state and/or variable state.
|
||||
|
||||
4. Add a generator method to enable exceptions to be passed to a
|
||||
generator. Currently, there is no clean method for triggering
|
||||
exceptions from outside the generator.
|
||||
|
||||
All of the suggestions are designed to take advantage of the
|
||||
existing implementation and require little additional effort to
|
||||
incorporate. Each is backward compatible and requires no new
|
||||
keywords.
|
||||
|
||||
|
||||
Specification for new built-ins:
|
||||
|
||||
def xfilter( pred, gen ):
|
||||
'''
|
||||
xfilter(...)
|
||||
xfilter(function, sequence) -> list
|
||||
|
||||
Return an iterator containing those items of sequence for
|
||||
which function is true. If function is None, return a list of
|
||||
items that are true.
|
||||
'''
|
||||
if pred is None:
|
||||
for i in gen:
|
||||
if i:
|
||||
yield i
|
||||
else:
|
||||
for i in gen:
|
||||
if pred(i):
|
||||
yield i
|
||||
|
||||
def xmap( fun, *collections ):
|
||||
'''
|
||||
xmap(...)
|
||||
xmap(function, sequence[, sequence, ...]) -> list
|
||||
|
||||
Return an iterator applying the function to the items of the
|
||||
argument collection(s). If more than one collection is given,
|
||||
the function is called with an argument list consisting of the
|
||||
corresponding item of each collection, substituting None for
|
||||
missing values when not all collections have the same length.
|
||||
If the function is None, return a list of the items of the
|
||||
collection (or a list of tuples if more than one collection).
|
||||
'''
|
||||
gens = map(iter, collections)
|
||||
values_left = [1]
|
||||
def values():
|
||||
# Emulate map behaviour, i.e. shorter
|
||||
# sequences are padded with None when
|
||||
# they run out of values.
|
||||
values_left[0] = 0
|
||||
for i in range(len(gens)):
|
||||
iterator = gens[i]
|
||||
if iterator is None:
|
||||
yield None
|
||||
else:
|
||||
try:
|
||||
yield iterator.next()
|
||||
values_left[0] = 1
|
||||
except StopIteration:
|
||||
gens[i] = None
|
||||
yield None
|
||||
while 1:
|
||||
args = tuple(values())
|
||||
if not values_left[0]:
|
||||
raise StopIteration
|
||||
yield func(*args)
|
||||
|
||||
def xzip( *collections ):
|
||||
'''
|
||||
xzip(...)
|
||||
xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
|
||||
|
||||
Return a iterator of tuples, where each tuple contains the
|
||||
i-th element from each of the argument sequences or iterable.
|
||||
The returned iterator is truncated in length to the length of
|
||||
the shortest argument collection.
|
||||
'''
|
||||
gens = map(iter, collections)
|
||||
while 1:
|
||||
yield tuple( [g.next() for g in gens] )
|
||||
|
||||
def indexed( collection, cnt=0, limit=None ):
|
||||
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
|
||||
gen = iter(collection)
|
||||
while limit is None or cnt<limit:
|
||||
yield (cnt, collection.next())
|
||||
cnt += 1
|
||||
|
||||
Note A: PEP 212 Loop Counter Iteration [2] discussed several
|
||||
proposals for achieving indexing. Some of the proposals only work
|
||||
for lists unlike the above function which works for any generator,
|
||||
xrange, sequence, or iterable object. Also, those proposals were
|
||||
presented and evaluated in the world prior to Python 2.2 which did
|
||||
not include generators.
|
||||
|
||||
|
||||
Specification for Generator Comprehensions:
|
||||
|
||||
If a list comprehension starts with a 'yield' keyword, then
|
||||
express the remainder of the statement as generator. For example:
|
||||
|
||||
g = [yield (len(line),line) for line in file.readline() if len(line)>5]
|
||||
print g.next()
|
||||
print g.next()
|
||||
|
||||
This would be implemented as if it had been written:
|
||||
|
||||
def __temp_gen():
|
||||
for line in file.readline():
|
||||
if len(line) > 5:
|
||||
yield (len(line), line)
|
||||
g = __temp_gen()
|
||||
print g.next()
|
||||
print g.next()
|
||||
|
||||
Note A: There is a difference in the above implementation as
|
||||
compared to list comprehensions. For a generator comprehension,
|
||||
the variables are created in a separate scope while list
|
||||
comprehensions use the enclosing scope. If this PEP is accepted,
|
||||
the parser should generate byte code that eliminates this
|
||||
difference by passing the line variable in the enclosing scope and
|
||||
using that same variable passed by reference inside the generator.
|
||||
This will make the behavior of generator comprehension identical
|
||||
to that of list comprehensions.
|
||||
|
||||
Note B: There is some debate about whether the enclosing brackets
|
||||
should be part of the syntax for generator comprehensions. On the
|
||||
plus side, it neatly parallels list comprehensions and would be
|
||||
immediately recognizable as a similar form with similar internal
|
||||
syntax (taking maximum advantage of what people already know).
|
||||
More importantly, it sets off the generator comprehension from the
|
||||
rest of the function so as to not suggest that the enclosing
|
||||
function is a generator (currently the only cue that a function is
|
||||
really a generator is the presence of the yield keyword). On the
|
||||
minus side, the brackets may falsely suggest that the whole
|
||||
expression returns a list. All of the feedback received to date
|
||||
indicates that brackets do not make a false suggestion and are
|
||||
in fact helpful.
|
||||
|
||||
|
||||
Specification for two-way Generator Parameter Passing:
|
||||
|
||||
1. Allow 'yield' to assign a value as in:
|
||||
|
||||
def mygen():
|
||||
while 1:
|
||||
x = yield None
|
||||
print x
|
||||
|
||||
2. Let the .next() method take a value to pass to generator as in:
|
||||
|
||||
g = mygen()
|
||||
g.next() # runs the generators until the first 'yield'
|
||||
g.next(1) # the '1' gets bound to 'x' in mygen()
|
||||
g.next(2) # the '2' gets bound to 'x' in mygen()
|
||||
|
||||
Note A: An early question arose, when would you need this? The
|
||||
answer is that existing generators make it easy to write lazy
|
||||
producers which may have a complex execution state and/or complex
|
||||
variable state. This proposal makes it equally easy to write lazy
|
||||
consumers which may also have a complex execution or variable
|
||||
state.
|
||||
|
||||
For instance, when writing an encoder for arithmetic compression,
|
||||
a series of fractional values are sent to a function which has
|
||||
periodic output and a complex state which depends on previous
|
||||
inputs. Also, that encoder requires a flush() function when no
|
||||
additional fractions are to be output. It is helpful to think of
|
||||
the following parallel with file output streams:
|
||||
|
||||
ostream = file('mydest.txt','w')
|
||||
ostream.write(firstdat)
|
||||
ostream.write(seconddat)
|
||||
ostream.flush()
|
||||
|
||||
With the proposed extensions, it could be written like this:
|
||||
|
||||
def filelike(packagename, appendOrOverwrite):
|
||||
cum = []
|
||||
if appendOrOverwrite == 'w+':
|
||||
cum.extend( packages[packagename] )
|
||||
try:
|
||||
while 1:
|
||||
dat = yield None
|
||||
cum.append(dat)
|
||||
except FlushStream:
|
||||
packages[packagename] = cum
|
||||
ostream = filelike('mydest','w')
|
||||
ostream.next()
|
||||
ostream.next(firstdat)
|
||||
ostream.next(seconddat)
|
||||
ostream.throw( FlushStream ) # this feature discussed below
|
||||
|
||||
Note C: Almost all of the machinery necessary to implement this
|
||||
extension is already in place. The parse syntax needs to be
|
||||
modified to accept the new x = yield None syntax and the .next()
|
||||
method needs to allow an argument.
|
||||
|
||||
Note D: Some care must be used when writing a values to the
|
||||
generator because execution starts at the top of the generator not
|
||||
at the first yield.
|
||||
|
||||
Consider the usual flow using .next() without an argument.
|
||||
|
||||
g = mygen(p1) will bind p1 to a local variable and then return a
|
||||
generator to be bound to g and NOT run any code in mygen().
|
||||
y = g.next() runs the generator from the first line until it
|
||||
encounters a yield when it suspends execution and a returns
|
||||
a value to be bound to y
|
||||
|
||||
Since the same flow applies when you are submitting values, the
|
||||
first call to .next() should have no argument since there is no
|
||||
place to put it.
|
||||
|
||||
g = mygen(p1) will bind p1 to a local variable and then return a
|
||||
generator to be bound to g and NOT run any code in mygen()
|
||||
g.next() will START execution in mygen() from the first line. Note,
|
||||
that there is nowhere to bind any potential arguments that
|
||||
might have been supplied to next(). Execution continues
|
||||
until the first yield is encountered and control is returned
|
||||
to the caller.
|
||||
g.next(val) resumes execution at the yield and binds val to the
|
||||
left hand side of the yield assignment and continues running
|
||||
until another yield is encountered. This makes sense because
|
||||
you submit values expecting them to be processed right away.
|
||||
|
||||
|
||||
Q. Two-way generator parameter passing seems awfully bold. To
|
||||
my mind, one of the great things about generators is that they
|
||||
meet the (very simple) definition of an iterator. With this,
|
||||
they no longer do. I like lazy consumers -- really I do --
|
||||
but I'd rather be conservative about putting something like
|
||||
this in the language.
|
||||
|
||||
A. If you don't use x = yield expr, then nothing changes and you
|
||||
haven't lost anything. So, it isn't really bold. It simply
|
||||
adds an option to pass in data as well as take it out. Other
|
||||
generator implementations (like the thread based generator.py)
|
||||
already have provisions for two-way parameter passing so that
|
||||
consumers are put on an equal footing with producers. Two-way
|
||||
is the norm, not the exception.
|
||||
|
||||
Yield is not just a simple iterator creator. It does
|
||||
something else truly wonderful -- it suspends execution and
|
||||
saves state. It is good for a lot more than its original
|
||||
purpose. Dr. Mertz's article [5] shows how they can be used
|
||||
to create general purpose co-routines.
|
||||
|
||||
Besides, 98% of the mechanism is already in place. Only the
|
||||
communication needs to be added. Remember GOSUB which neither
|
||||
took nor returned data. Routines which accepted parameters
|
||||
and returned values were a major step forward.
|
||||
|
||||
When you first need to pass information into a generator, the
|
||||
existing alternative is clumsy. It involves setting a global
|
||||
variable, calling .next(), and assigning the local from the
|
||||
global.
|
||||
|
||||
|
||||
Q. Why not introduce another keyword 'accept' for lazy consumers?
|
||||
|
||||
A. To avoid conflicts with 'yield', to avoid creating a new
|
||||
keyword, and to take advantage of the explicit clarity of the
|
||||
'=' operator.
|
||||
|
||||
|
||||
Q. How often does one need to write a lazy consumer or a co-routine?
|
||||
|
||||
A. Not often. But, when you DO have to write one, this approach
|
||||
is the easiest to implement, read, and debug.
|
||||
|
||||
It clearly beats using existing generators and passing data
|
||||
through global variables. It is much clearer and easier to
|
||||
debug than an equivalent approach using threading, mutexes,
|
||||
semaphores, and data queues. A class based approach competes
|
||||
well when there are no complex execution states or variable
|
||||
states. When the complexity increases, generators with
|
||||
two-way communication are much simpler because they
|
||||
automatically save state unlike classes which must explicitly
|
||||
store variable and execution state in instance variables.
|
||||
|
||||
|
||||
Q. Why does yield require an argument? Isn't yield None too wordy?
|
||||
|
||||
A. It doesn't matter for the purposes of this PEP. For
|
||||
information purposes, here is the reasoning as I understand
|
||||
it. Though return allows an implicit None, some now consider
|
||||
this to be weak design. There is some spirit of "Explicit is
|
||||
better than Implicit". More importantly, in most uses of
|
||||
yield, a missing argument is more likely to be a bug than an
|
||||
intended yield None.
|
||||
|
||||
|
||||
Specification for Generator Exception Passing:
|
||||
|
||||
Add a .throw(exception) method to the resulting generator as in:
|
||||
|
||||
def mygen():
|
||||
try:
|
||||
while 1:
|
||||
x = yield None
|
||||
print x
|
||||
except FlushStream:
|
||||
print 'Done'
|
||||
|
||||
g = mygen()
|
||||
g.next(5)
|
||||
g.throw(FlushStream)
|
||||
|
||||
There is no existing work around for triggering an exception
|
||||
inside a generator. This is a true deficiency. It is the only
|
||||
case in Python where active code cannot be excepted to or through.
|
||||
Even if .next(arg) is not adopted, we should add the .throw()
|
||||
method.
|
||||
|
||||
Note A: The name of the throw method was selected for several
|
||||
reasons. Raise is a keyword and so cannot be used as a method
|
||||
name. Unlike raise which immediately raises an exception from the
|
||||
current execution point, throw will first return to the generator
|
||||
and then raise the exception. The word throw is suggestive of
|
||||
putting the exception in another location. The word throw is
|
||||
already associated with exceptions in other languages.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] PEP 255 Simple Generators
|
||||
http://python.sourceforge.net/peps/pep-0255.html
|
||||
|
||||
[2] PEP 212 Loop Counter Iteration
|
||||
http://python.sourceforge.net/peps/pep-0212.html
|
||||
|
||||
[3] PEP 202 List Comprehensions
|
||||
http://python.sourceforge.net/peps/pep-0202.html
|
||||
|
||||
[4] There have been several discussion on comp.lang.python which helped
|
||||
tease out these proposals:
|
||||
|
||||
Indexed Function
|
||||
http://groups.google.com/groups?hl=en&th=33f778d92dd5720a
|
||||
|
||||
Xmap, Xfilter, Xzip and Two-way Generator Communication
|
||||
http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1
|
||||
|
||||
Two-way Generator Communication -- Revised Version
|
||||
http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1
|
||||
|
||||
Generator Comprehensions
|
||||
http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
|
||||
|
||||
|
||||
http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
|
||||
|
||||
[5] Dr. David Mertz's draft column for Charming Python.
|
||||
href="http://gnosis.cx/publish/programming/charming_python_b5.txt
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
fill-column: 70
|
||||
End:
|
||||
|
||||
|
Loading…
Reference in New Issue