Latest update from Raymond D. Hettinger. Spell checking applied.

Made code samples self-consistent, and also consistent with Python
style guide w.r.t. spaces around function arguments.
Barry Warsaw 2002-03-04 13:20:02 +00:00
parent e7818687b7
commit 6d5f340df7
1 changed file with 181 additions and 92 deletions

@ -2,7 +2,7 @@ PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: othello@javanet.com (Raymond D. Hettinger)
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
@ -13,8 +13,8 @@ Post-History:
Abstract
This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators as introduced in Python version 2.2
[1]. The goal is to increase the convenience, utility, and power
for enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power
of generators.
@ -32,7 +32,7 @@ Rationale
theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects. In the case of mappings and file objects, lazy
evaluation was made available.
evaluation became the norm.
The next steps in the evolution of generators are:
@ -52,7 +52,8 @@ Rationale
4. Add a generator method to enable exceptions to be passed to a
generator. Currently, there is no clean method for triggering
exceptions from outside the generator.
exceptions from outside the generator. Also, generator exception
passing helps mitigate the try/finally prohibition for generators.
All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
@ -60,15 +61,24 @@ Rationale
keywords. These generator tools go into Python 2.3 when
generators become final and are not imported from __future__.
SourceForge contains a working, pure Python simulation of every
feature proposed in this PEP [8]. SourceForge also has a separate
file with a simulation test suite and working source code for the
examples listed used in this PEP [9].
Reference Implementation
There is not currently a CPython implementation; however, a simulation
module written in pure Python is available on SourceForge [8]. The
simulation covers every feature proposed in this PEP and is meant
to allow direct experimentation with the proposals.
There is also a module [9] with working source code for all of the
examples used in this PEP. It serves as a test suite for the simulator
and it documents how each of the new features works in practice.
Specification for new built-ins:
def xfilter( pred, gen ):
def xfilter(pred, gen):
'''
xfilter(...)
xfilter(function, sequence) -> list
@ -86,7 +96,7 @@ Specification for new built-ins:
if pred(i):
yield i
def xmap( fun, *collections ):
def xmap(fun, *collections): ### Code from Python Cookbook [6]
'''
xmap(...)
xmap(function, sequence[, sequence, ...]) -> list
@ -96,15 +106,14 @@ Specification for new built-ins:
the function is called with an argument list consisting of the
corresponding item of each collection, substituting None for
missing values when not all collections have the same length.
If the function is None, return a list of the items of the
collection (or a list of tuples if more than one collection).
If the function is None, return an iterator of the items of the
collection (or an iterator of tuples if more than one collection).
'''
gens = map(iter, collections)
values_left = [1]
def values():
# Emulate map behaviour, i.e. shorter
# sequences are padded with None when
# they run out of values.
# Emulate map behavior by padding sequences with None
# when they run out of values.
values_left[0] = 0
for i in range(len(gens)):
iterator = gens[i]
@ -123,7 +132,7 @@ Specification for new built-ins:
raise StopIteration
yield fun(*args)
def xzip( *collections ): ### Code from Python Cookbook [6]
def xzip(*collections):
'''
xzip(...)
xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
@ -135,28 +144,57 @@ Specification for new built-ins:
'''
gens = map(iter, collections)
while 1:
yield tuple( [g.next() for g in gens] )
yield tuple([g.next() for g in gens])
def indexed( collection, cnt=0, limit=None ):
def indexed(collection, cnt=0, limit=None):
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
gen = iter(collection)
while limit is None or cnt<limit:
yield (cnt, gen.next())
cnt += 1
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of the proposals only work
for lists unlike the above function which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2 which did
not include generators.
not include generators. As a result, the generator-less version in
PEP 212 had the disadvantage of consuming memory with a giant list
of tuples. The generator version presented here is fast and light,
works with all iterables, and allows users to abandon the sequence
in mid-stream.
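The behavior described in Note A can be sketched in present-day Python 3. That choice is an assumption of this example, since the PEP's own samples target Python 2.2. The explicit StopIteration handling is also an addition: since PEP 479, a StopIteration that escapes a generator body is turned into a RuntimeError. The helper first_multiple_of_seven is hypothetical and only illustrates abandoning an unbounded sequence in mid-stream.

```python
from itertools import count


def indexed(collection, cnt=0, limit=None):
    """Lazily yield (index, item) pairs for any iterable."""
    gen = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(gen)
        except StopIteration:
            return  # input exhausted: end the generator cleanly
        yield (cnt, item)
        cnt += 1


def first_multiple_of_seven():
    # The input is endless, so this only works because indexed() is lazy
    # and the caller stops consuming as soon as a match is found.
    for i, square in indexed(n * n for n in count(1)):
        if square % 7 == 0:
            return i, square


pairs = list(indexed("abc"))
window = list(indexed("abcdef", 2, 4))
```

No list of tuples is ever built unless the caller asks for one, which is the memory advantage the note claims over the PEP 212 proposals.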
Note B: An alternate, simplified definition of indexed was proposed:
def indexed( collection, cnt=0, limit=sys.maxint ):
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
return xzip( xrange(cnt,limit), collection )
Note B: An alternate, simplified definition of indexed is:
def indexed(collection, cnt=0, limit=sys.maxint):
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
return xzip(xrange(cnt, limit), collection)
Note C: As it stands, the Python code for xmap is slow. The actual
implementation of the functions should be written in C for speed.
The pure Python code listed above is meant only to specify how the
functions would behave, in particular that they should as closely as
possible emulate their non-lazy counterparts.
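As a rough illustration of the behavior being specified, the None-padding rule for xmap can be written compactly in Python 3 with itertools.zip_longest. This is a sketch under that assumption, not the Cookbook code quoted in the PEP, and it is not the proposed C implementation.

```python
from itertools import zip_longest


def xmap(fun, *collections):
    """Lazy map that pads shorter inputs with None, as Python 2's map did."""
    for args in zip_longest(*collections, fillvalue=None):
        if fun is None:
            # Mirror map(None, ...): bare items for one input, tuples otherwise
            yield args[0] if len(args) == 1 else args
        else:
            yield fun(*args)


padded = list(xmap(lambda a, b: (a, b), [1, 2, 3], "ab"))
identity = list(xmap(None, [1, 2]))
```

Because xmap is a generator function, nothing is computed until the result is iterated.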
Note D: Almost all of the PEP reviewers welcomed these functions but were
divided as to whether they should be built-ins or in a separate module.
The main argument for a separate module was to slow the rate of language
inflation. The main argument for built-ins was that these functions are
destined to be part of a core programming style, applicable to any object
with an iterable interface. Just as zip() solves the problem of looping
over multiple sequences, the indexed() function solves the loop counter
problem. Likewise, the x-functions solve the problem of applying
functional constructs without forcing the evaluation of an entire sequence.
If only one built-in were allowed, then indexed() is the most important
general purpose tool, solving the broadest class of problems while
improving program brevity, clarity and reliability.
Specification for Generator Comprehensions:
@ -164,17 +202,18 @@ Specification for Generator Comprehensions:
express the comprehension with a generator. For example:
g = [yield (len(line),line) for line in file if len(line)>5]
print g.next()
This would be implemented as if it had been written:
class __Temp:
def __iter__(self):
for line in file:
if len(line) > 5:
yield (len(line), line)
g = __Temp()
def __temp():
for line in file:
if len(line) > 5:
yield (len(line), line)
g = __temp()
print g.next()
Note A: There is some debate about whether the enclosing brackets
Note A: There is some discussion about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
@ -185,21 +224,10 @@ Specification for Generator Comprehensions:
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. Most of the feedback received to date
indicates that brackets do not make a false suggestion and are
in fact helpful.
indicates that brackets are helpful and not misleading.
Note B: An iterable instance is returned by the above code. The
purpose is to allow the object to be re-started and looped-over
multiple times. This accurately mimics the behavior of list
comprehensions. As a result, the following code (provided by Oren
Tirosh) works equally well with or without 'yield':
letters = [yield chr(i) for i in xrange(ord('a'),ord('z')+1)]
digits = [yield str(i) for i in xrange(10)]
letdig = [yield l+d for l in letters for d in digits]
Note C: List comprehensions expose their looping variable and
leave the variable in the enclosing scope. The code, [str(i) for
Note B: List comprehensions expose their looping variable and
leave that variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
@ -215,9 +243,8 @@ Specification for Generator Comprehensions:
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
list comprehension. It does not come into existence until
iteration starts. Since several generators may be running at
once, there are potentially multiple, unequal instances of 'i' at
any one time.
iteration starts (possibly never).
Specification for Generator Parameter Passing:
@ -229,18 +256,18 @@ Specification for Generator Parameter Passing:
x = yield None
print x
2. Let the .next() method take a value to pass to generator as in:
2. Let the .next() method take a value to pass to the generator as in:
g = mygen()
g.next() # runs the generator until the first 'yield'
g.next(1) # '1' is bound to 'x' in mygen(), then printed
g.next(2) # '2' is bound to 'x' in mygen(), then printed
The control flow is unchanged by this proposal. The only change
is that a value can be sent into the generator. By analogy,
consider the quality improvement from GOSUB (which had no argument
passing mechanism) to modern procedure calls (which can pass in
arguments and return values).
The control flow of 'yield' and 'next' is unchanged by this proposal.
The only change is that a value can be sent into the generator.
By analogy, consider the quality improvement from GOSUB (which had
no argument passing mechanism) to modern procedure calls (which can
pass in arguments and return values).
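The proposed g.next(value) call is not available in current Python. The closest existing mechanism is the generator method send(), added later by PEP 342, so the sketch below uses it as a stand-in. The log list is a hypothetical addition that records the received values instead of printing them.

```python
def mygen(log):
    while True:
        x = yield None   # the value passed in becomes the result of 'yield'
        log.append(x)


log = []
g = mygen(log)
next(g)      # runs the generator until the first 'yield'
g.send(1)    # 1 is bound to x inside mygen()
g.send(2)    # 2 is bound to x inside mygen()
```

As in the proposal, the generator must first be advanced to its initial yield before a value can be passed in.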
Most of the underlying machinery is already in place, only the
communication needs to be added by modifying the parse syntax to
@ -263,12 +290,14 @@ Specification for Generator Parameter Passing:
more readable and easier to debug than an approach involving the
threading module with its attendant mutexes, semaphores, and data
queues. A class-based approach competes well when there are no
complex execution states or variable states. When the complexity
increases, generators with parameter passing are much simpler
complex execution states or variable states. However, when the
complexity increases, generators with parameter passing are much simpler
because they automatically save state (unlike classes which must
explicitly save the variable and execution state in instance
variables).
explicitly save the variable and execution state in instance variables).
Note A: This proposal changes 'yield' from a statement to an
expression with binding and precedence similar to lambda.
Example of a Complex Consumer
@ -284,45 +313,47 @@ Specification for Generator Parameter Passing:
def filelike(packagename, appendOrOverwrite):
cum = []
if appendOrOverwrite == 'w+':
cum.extend( packages[packagename] )
cum.extend(packages[packagename])
try:
while 1:
dat = yield None
cum.append(dat)
except FlushStream:
packages[packagename] = cum
ostream = filelike('mydest','w') # Analogous to file.open(name,flag)
ostream.next() # Advance to the first yield
ostream.next(firstdat) # Analogous to file.write(dat)
ostream.next(seconddat)
ostream.throw( FlushStream ) # This feature proposed below
ostream.throw(FlushStream) # This feature proposed below
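A runnable Python 3 approximation of this consumer is sketched below. It assumes send() in place of the proposed next(value), and a FlushStream exception class and packages dictionary defined only for the example. One detail the PEP text does not spell out: when the generator returns after handling the exception, throw() itself raises StopIteration, which the caller has to absorb.

```python
class FlushStream(Exception):
    """Signals that no more data will be written."""


packages = {}  # hypothetical in-memory store keyed by package name


def filelike(packagename, append_or_overwrite):
    cum = []
    if append_or_overwrite == 'w+':
        cum.extend(packages[packagename])
    try:
        while True:
            dat = yield None
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum


ostream = filelike('mydest', 'w')
next(ostream)                     # advance to the first yield
ostream.send('first')
ostream.send('second')
try:
    ostream.throw(FlushStream())  # triggers the except clause in filelike()
except StopIteration:
    pass                          # the generator finished after flushing
```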
Example of a Complex Consumer
Loop over the picture files in a directory, shrink them
one-at-a-time to thumbnail size using PIL [7], and send them to a
one at a time to thumbnail size using PIL [7], and send them to a
lazy consumer. That consumer is responsible for creating a large
blank image, accepting thumbnails one-at-a-time and placing them
in a 5x3 grid format onto the blank image. Whenever the grid is
blank image, accepting thumbnails one at a time and placing them
in a 5 by 3 grid format onto the blank image. Whenever the grid is
full, it writes out the large image as an index print. A
FlushStream exception indicates that no more thumbnails are
available and that the partial index print should be written out
if there are one or more thumbnails on it.
Example of a Producer and Consumer Used Together in a Pipelike Fashion
Example of a Producer and Consumer Used Together in a Pipe-like Fashion
'Analogy to: source | upper | sink'
'Analogy to Linux style pipes: source | upper | sink'
sink = sinkgen()
sink.next()
for word in source():
sink.next( word.upper() )
sink.next(word.upper())
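The pipe analogy can be made concrete with a small Python 3 sketch. The bodies of source() and sinkgen() are not given in the PEP, so the ones below are hypothetical, and send() again stands in for the proposed next(value).

```python
def source():
    # Hypothetical producer: yields words one at a time
    for word in "the quick brown fox".split():
        yield word


def sinkgen(collected):
    # Hypothetical consumer: stores every value sent to it
    while True:
        word = yield
        collected.append(word)


collected = []
sink = sinkgen(collected)
next(sink)                    # advance the consumer to its first yield
for word in source():
    sink.send(word.upper())   # the 'upper' stage of the pipe
```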
Specification for Generator Exception Passing:
Add a .throw(exception) method to the resulting generator as in:
Add a .throw(exception) method to the generator interface:
def mygen():
try:
@ -336,12 +367,18 @@ Specification for Generator Exception Passing:
g.next(5)
g.throw(FlushStream)
There is no existing work around for triggering an exception
There is no existing work-around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through.
case in Python where an exception cannot be raised into or passed through active code.
Even if the .next(arg) proposal is not adopted, we should add the
.throw() method.
Generator exception passing also helps address an intrinsic limitation
on generators, the prohibition against their using try/finally to
trigger clean-up code [1]. Without .throw(), the current work-around
forces the resolution or clean-up code to be moved outside the generator.
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
name. Unlike raise which immediately raises an exception from the
@ -350,12 +387,76 @@ Specification for Generator Exception Passing:
putting the exception in another location. The word throw is
already associated with exceptions in other languages.
Note B: The throw syntax should exactly match raise's syntax including:
raise string                g.throw(string)
raise string, data          g.throw(string,data)
raise class, instance       g.throw(class,instance)
raise instance              g.throw(instance)
raise                       g.throw()
Alternative method names were considered: resolve(), signal(),
genraise(), raiseinto(), and flush(). None of these seem to fit
as well as throw().
Note B: The throw syntax should exactly match raise's syntax:
throw([expression, [expression, [expression]]])
Accordingly, it should be implemented to handle all of the following:
raise string                g.throw(string)
raise string, data          g.throw(string,data)
raise class, instance       g.throw(class,instance)
raise instance              g.throw(instance)
raise                       g.throw()
Discussion of Restartability:
Inside for-loops, generators are not substitutable for lists unless they
are accessed only once. A second access only works for restartable
objects like lists, dicts, objects defined with __getitem__, and
xrange objects. Generators are not the only objects which are not
restartable. Other examples of non-restartable sequences include file
objects, xreadlines objects, and the result of iter(callable,sentinel).
Since the proposed built-in functions return generators, they are also
non-restartable. As a result, 'xmap' is not substitutable for 'map' in
the following example:
alphabet = map(chr, xrange(ord('a'), ord('z')+1))
twoletterwords = [a+b for a in alphabet for b in alphabet]
Since generator comprehensions also return generators, they are not
restartable. Consequently, they are not substitutable for list
comprehensions in the following example:
digits = [str(i) for i in xrange(10)]
alphadig = [a+d for a in 'abcdefg' for d in digits]
To achieve substitutability, generator comprehensions and x-functions
can be implemented in a way that supports restarts. PEP 234 [4]
explicitly states that restarts are to be supported through repeated
calls to iter(). With that guidance, it is easy to add restartability
to generator comprehensions using a simple wrapper class around the
generator function and modifying the implementation above to return:
g = Restartable(__temp) # instead of g = __temp()
Restartable is a simple (12 line) class which calls the generator function
to create a new, re-wound generator whenever iter() requests a restart.
Calls to .next() are simply forwarded to the generator. The Python source
code for the Restartable class can be found in the PEP 279 simulator [8].
An actual implementation in C can achieve re-startability directly and
would not need the slow class wrapper used in the pure Python simulation.
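A wrapper along these lines might look as follows in Python 3. This is a sketch written for illustration, not the class from the simulator [8], and digit_gen is a hypothetical generator function. It contrasts a plain generator, which is exhausted after one pass, with the wrapped version.

```python
class Restartable:
    """Re-creates the underlying generator each time iter() is called."""

    def __init__(self, genfunc):
        self.genfunc = genfunc
        self.gen = genfunc()

    def __iter__(self):
        self.gen = self.genfunc()   # rewind by building a fresh generator
        return self

    def __next__(self):
        return next(self.gen)       # forward to the current generator


def digit_gen():
    for i in range(3):
        yield str(i)


# A plain generator is used up by the first pass of the inner loop
plain = digit_gen()
once_only = [a + d for a in 'ab' for d in plain]

# The wrapper restarts on every new pass of the inner loop
digits = Restartable(digit_gen)
restarted = [a + d for a in 'ab' for d in digits]
```

Because __iter__ returns the wrapper itself, this sketch supports repeated sequential passes but not two independent iterations at the same time.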
The XLazy library [10] shows how restarts can be implemented for xmap,
xfilter, and xzip.
The upside of adding restart capability is that more list comprehensions
can be made lazy and save memory by adding 'yield'. Likewise,
more expressions that use map, filter, and zip can be made lazy just by
adding 'x'.
A possible downside is that x-functions have no control over whether their
inputs are themselves restartable. With non-restartable inputs like
generators or files, an x-function restart will not produce a meaningful
result.
References
@ -369,23 +470,8 @@ References
[3] PEP 202 List Comprehensions
http://python.sourceforge.net/peps/pep-0202.html
[4] There have been several discussion on comp.lang.python which helped
tease out these proposals:
Indexed Function
http://groups.google.com/groups?hl=en&th=33f778d92dd5720a
Xmap, Xfilter, Xzip and Two-way Generator Communication
http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1
Two-way Generator Communication -- Revised Version
http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1
Generator Comprehensions
http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
Discussion Draft of this PEP
http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
[4] PEP 234 Iterators
http://python.sourceforge.net/peps/pep-0234.html
[5] Dr. David Mertz's draft column for Charming Python.
http://gnosis.cx/publish/programming/charming_python_b5.txt
@ -403,6 +489,9 @@ References
along with other examples and tests is at:
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
[10] Oren Tirosh's XLazy library with re-startable x-functions is at:
http://www.tothink.com/python/dataflow/
Copyright