PEP 279, Enhanced Generators, Raymond D. Hettinger.

(Minor spell checking and formatting by BAW)
This commit is contained in:
Barry Warsaw 2002-02-01 05:59:14 +00:00
parent 398c395d47
commit 1947249ff7
1 changed files with 422 additions and 0 deletions

422
pep-0279.txt Normal file
View File

@ -0,0 +1,422 @@
PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: othello@javanet.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:
Abstract
This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators as introduced in Python version 2.2
[1]. The goal is increase the convenience, utility, and power of
generators.
Rationale
Starting with xrange() and xreadlines(), Python has been evolving
toward a model that provides lazy evaluation as an alternative
when complete evaluation is not desired because of memory
restrictions or availability of data.
Starting with Python 2.2, a second evolutionary direction came in
the form of iterators and generators. The iter() factory function
and generators were provided as convenient means of creating
iterators. Deep changes were made to use iterators as a unifying
theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects. In the case of mappings and file objects, lazy
evaluation was made available.
The next steps in the evolution of generators are:
1. Add built-in functions which provide lazy alternatives to their
complete evaluation counterparts and one other convenience
function which was made possible once iterators and generators
became available. The new functions are xzip, xmap, xfilter,
and indexed.
2. Provide a generator alternative to list comprehensions [3]
making generator creation as convenient as list creation.
3. Extend the syntax of the 'yield' keyword to enable two way
parameter passing. The resulting increase in power simplifies
the creation of consumer streams which have a complex execution
state and/or variable state.
4. Add a generator method to enable exceptions to be passed to a
generator. Currently, there is no clean method for triggering
exceptions from outside the generator.
All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords.
Specification for new built-ins:
def xfilter( pred, gen ):
'''
xfilter(...)
xfilter(function, sequence) -> list
Return an iterator containing those items of sequence for
which function is true. If function is None, return a list of
items that are true.
'''
if pred is None:
for i in gen:
if i:
yield i
else:
for i in gen:
if pred(i):
yield i
def xmap( fun, *collections ):
'''
xmap(...)
xmap(function, sequence[, sequence, ...]) -> list
Return an iterator applying the function to the items of the
argument collection(s). If more than one collection is given,
the function is called with an argument list consisting of the
corresponding item of each collection, substituting None for
missing values when not all collections have the same length.
If the function is None, return a list of the items of the
collection (or a list of tuples if more than one collection).
'''
gens = map(iter, collections)
values_left = [1]
def values():
# Emulate map behaviour, i.e. shorter
# sequences are padded with None when
# they run out of values.
values_left[0] = 0
for i in range(len(gens)):
iterator = gens[i]
if iterator is None:
yield None
else:
try:
yield iterator.next()
values_left[0] = 1
except StopIteration:
gens[i] = None
yield None
while 1:
args = tuple(values())
if not values_left[0]:
raise StopIteration
yield func(*args)
def xzip( *collections ):
'''
xzip(...)
xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
Return a iterator of tuples, where each tuple contains the
i-th element from each of the argument sequences or iterable.
The returned iterator is truncated in length to the length of
the shortest argument collection.
'''
gens = map(iter, collections)
while 1:
yield tuple( [g.next() for g in gens] )
def indexed( collection, cnt=0, limit=None ):
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
gen = iter(collection)
while limit is None or cnt<limit:
yield (cnt, collection.next())
cnt += 1
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of the proposals only work
for lists unlike the above function which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2 which did
not include generators.
Specification for Generator Comprehensions:
If a list comprehension starts with a 'yield' keyword, then
express the remainder of the statement as generator. For example:
g = [yield (len(line),line) for line in file.readline() if len(line)>5]
print g.next()
print g.next()
This would be implemented as if it had been written:
def __temp_gen():
for line in file.readline():
if len(line) > 5:
yield (len(line), line)
g = __temp_gen()
print g.next()
print g.next()
Note A: There is a difference in the above implementation as
compared to list comprehensions. For a generator comprehension,
the variables are created in a separate scope while list
comprehensions use the enclosing scope. If this PEP is accepted,
the parser should generate byte code that eliminates this
difference by passing the line variable in the enclosing scope and
using that same variable passed by reference inside the generator.
This will make the behavior of generator comprehension identical
to that of list comprehensions.
Note B: There is some debate about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
syntax (taking maximum advantage of what people already know).
More importantly, it sets off the generator comprehension from the
rest of the function so as to not suggest that the enclosing
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. All of the feedback received to date
indicates that brackets do not make a false suggestion and are
in fact helpful.
Specification for two-way Generator Parameter Passing:
1. Allow 'yield' to assign a value as in:
def mygen():
while 1:
x = yield None
print x
2. Let the .next() method take a value to pass to generator as in:
g = mygen()
g.next() # runs the generators until the first 'yield'
g.next(1) # the '1' gets bound to 'x' in mygen()
g.next(2) # the '2' gets bound to 'x' in mygen()
Note A: An early question arose, when would you need this? The
answer is that existing generators make it easy to write lazy
producers which may have a complex execution state and/or complex
variable state. This proposal makes it equally easy to write lazy
consumers which may also have a complex execution or variable
state.
For instance, when writing an encoder for arithmetic compression,
a series of fractional values are sent to a function which has
periodic output and a complex state which depends on previous
inputs. Also, that encoder requires a flush() function when no
additional fractions are to be output. It is helpful to think of
the following parallel with file output streams:
ostream = file('mydest.txt','w')
ostream.write(firstdat)
ostream.write(seconddat)
ostream.flush()
With the proposed extensions, it could be written like this:
def filelike(packagename, appendOrOverwrite):
cum = []
if appendOrOverwrite == 'w+':
cum.extend( packages[packagename] )
try:
while 1:
dat = yield None
cum.append(dat)
except FlushStream:
packages[packagename] = cum
ostream = filelike('mydest','w')
ostream.next()
ostream.next(firstdat)
ostream.next(seconddat)
ostream.throw( FlushStream ) # this feature discussed below
Note C: Almost all of the machinery necessary to implement this
extension is already in place. The parse syntax needs to be
modified to accept the new x = yield None syntax and the .next()
method needs to allow an argument.
Note D: Some care must be used when writing a values to the
generator because execution starts at the top of the generator not
at the first yield.
Consider the usual flow using .next() without an argument.
g = mygen(p1) will bind p1 to a local variable and then return a
generator to be bound to g and NOT run any code in mygen().
y = g.next() runs the generator from the first line until it
encounters a yield when it suspends execution and a returns
a value to be bound to y
Since the same flow applies when you are submitting values, the
first call to .next() should have no argument since there is no
place to put it.
g = mygen(p1) will bind p1 to a local variable and then return a
generator to be bound to g and NOT run any code in mygen()
g.next() will START execution in mygen() from the first line. Note,
that there is nowhere to bind any potential arguments that
might have been supplied to next(). Execution continues
until the first yield is encountered and control is returned
to the caller.
g.next(val) resumes execution at the yield and binds val to the
left hand side of the yield assignment and continues running
until another yield is encountered. This makes sense because
you submit values expecting them to be processed right away.
Q. Two-way generator parameter passing seems awfully bold. To
my mind, one of the great things about generators is that they
meet the (very simple) definition of an iterator. With this,
they no longer do. I like lazy consumers -- really I do --
but I'd rather be conservative about putting something like
this in the language.
A. If you don't use x = yield expr, then nothing changes and you
haven't lost anything. So, it isn't really bold. It simply
adds an option to pass in data as well as take it out. Other
generator implementations (like the thread based generator.py)
already have provisions for two-way parameter passing so that
consumers are put on an equal footing with producers. Two-way
is the norm, not the exception.
Yield is not just a simple iterator creator. It does
something else truly wonderful -- it suspends execution and
saves state. It is good for a lot more than its original
purpose. Dr. Mertz's article [5] shows how they can be used
to create general purpose co-routines.
Besides, 98% of the mechanism is already in place. Only the
communication needs to be added. Remember GOSUB which neither
took nor returned data. Routines which accepted parameters
and returned values were a major step forward.
When you first need to pass information into a generator, the
existing alternative is clumsy. It involves setting a global
variable, calling .next(), and assigning the local from the
global.
Q. Why not introduce another keyword 'accept' for lazy consumers?
A. To avoid conflicts with 'yield', to avoid creating a new
keyword, and to take advantage of the explicit clarity of the
'=' operator.
Q. How often does one need to write a lazy consumer or a co-routine?
A. Not often. But, when you DO have to write one, this approach
is the easiest to implement, read, and debug.
It clearly beats using existing generators and passing data
through global variables. It is much clearer and easier to
debug than an equivalent approach using threading, mutexes,
semaphores, and data queues. A class based approach competes
well when there are no complex execution states or variable
states. When the complexity increases, generators with
two-way communication are much simpler because they
automatically save state unlike classes which must explicitly
store variable and execution state in instance variables.
Q. Why does yield require an argument? Isn't yield None too wordy?
A. It doesn't matter for the purposes of this PEP. For
information purposes, here is the reasoning as I understand
it. Though return allows an implicit None, some now consider
this to be weak design. There is some spirit of "Explicit is
better than Implicit". More importantly, in most uses of
yield, a missing argument is more likely to be a bug than an
intended yield None.
Specification for Generator Exception Passing:
Add a .throw(exception) method to the resulting generator as in:
def mygen():
try:
while 1:
x = yield None
print x
except FlushStream:
print 'Done'
g = mygen()
g.next(5)
g.throw(FlushStream)
There is no existing work around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through.
Even if .next(arg) is not adopted, we should add the .throw()
method.
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
name. Unlike raise which immediately raises an exception from the
current execution point, throw will first return to the generator
and then raise the exception. The word throw is suggestive of
putting the exception in another location. The word throw is
already associated with exceptions in other languages.
References
[1] PEP 255 Simple Generators
http://python.sourceforge.net/peps/pep-0255.html
[2] PEP 212 Loop Counter Iteration
http://python.sourceforge.net/peps/pep-0212.html
[3] PEP 202 List Comprehensions
http://python.sourceforge.net/peps/pep-0202.html
[4] There have been several discussion on comp.lang.python which helped
tease out these proposals:
Indexed Function
http://groups.google.com/groups?hl=en&th=33f778d92dd5720a
Xmap, Xfilter, Xzip and Two-way Generator Communication
http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1
Two-way Generator Communication -- Revised Version
http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1
Generator Comprehensions
http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
[5] Dr. David Mertz's draft column for Charming Python.
href="http://gnosis.cx/publish/programming/charming_python_b5.txt
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: