Latest update from Raymond D. Hettinger. Spell checking applied.
Made code samples self-consistent, and also consistent with Python style guide w.r.t. spaces around function arguments.
parent e7818687b7
commit 6d5f340df7

273 pep-0279.txt
@@ -2,7 +2,7 @@ PEP: 279
 Title: Enhanced Generators
 Version: $Revision$
 Last-Modified: $Date$
-Author: othello@javanet.com (Raymond D. Hettinger)
+Author: python@rcn.com (Raymond D. Hettinger)
 Status: Draft
 Type: Standards Track
 Created: 30-Jan-2002
@@ -13,8 +13,8 @@ Post-History:
 Abstract
 
     This PEP introduces four orthogonal (not mutually exclusive) ideas
-    for enhancing the generators as introduced in Python version 2.2
-    [1]. The goal is to increase the convenience, utility, and power
+    for enhancing the generators introduced in Python version 2.2 [1].
+    The goal is to increase the convenience, utility, and power
     of generators.
 
 
@@ -32,7 +32,7 @@ Rationale
     theme throughout Python. The unification came in the form of
     establishing a common iterable interface for mappings, sequences,
     and file objects. In the case of mappings and file objects, lazy
-    evaluation was made available.
+    evaluation became the norm.
 
     The next steps in the evolution of generators are:
 
@@ -52,7 +52,8 @@ Rationale
 
     4. Add a generator method to enable exceptions to be passed to a
        generator. Currently, there is no clean method for triggering
-       exceptions from outside the generator.
+       exceptions from outside the generator. Also, generator exception
+       passing helps mitigate the try/finally prohibition for generators.
 
     All of the suggestions are designed to take advantage of the
     existing implementation and require little additional effort to
@@ -60,15 +61,24 @@ Rationale
     keywords. These generator tools go into Python 2.3 when
     generators become final and are not imported from __future__.
 
-    SourceForge contains a working, pure Python simulation of every
-    feature proposed in this PEP [8]. SourceForge also has a separate
-    file with a simulation test suite and working source code for the
-    examples listed used in this PEP [9].
+
+
+Reference Implementation
+
+    There is not currently a CPython implementation; however, a simulation
+    module written in pure Python is available on SourceForge [8]. The
+    simulation covers every feature proposed in this PEP and is meant
+    to allow direct experimentation with the proposals.
+
+    There is also a module [9] with working source code for all of the
+    examples used in this PEP. It serves as a test suite for the simulator
+    and it documents how each of the new features works in practice.
+
 
 
 Specification for new built-ins:
 
-    def xfilter( pred, gen ):
+    def xfilter(pred, gen):
         '''
         xfilter(...)
         xfilter(function, sequence) -> list
@@ -86,7 +96,7 @@ Specification for new built-ins:
                 if pred(i):
                     yield i
 
-    def xmap( fun, *collections ):
+    def xmap(fun, *collections):    ### Code from Python Cookbook [6]
         '''
         xmap(...)
         xmap(function, sequence[, sequence, ...]) -> list
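The `xfilter()` specification in the hunks above is Python 2.2 code. As a modern reference point, here is a minimal Python 3 sketch of the same lazy filtering. It assumes, like the built-in `filter()`, that a `None` predicate keeps the truthy items; that branch is not visible in this diff, so treat it as an assumption.

```python
def xfilter(pred, iterable):
    """Lazily yield the items of iterable for which pred(item) is true.

    Assumption: pred=None keeps the truthy items, as built-in filter() does.
    """
    if pred is None:
        for item in iterable:
            if item:
                yield item
    else:
        for item in iterable:
            if pred(item):
                yield item


evens = xfilter(lambda n: n % 2 == 0, range(10))
print(next(evens), next(evens))  # prints: 0 2
```

Only as many source items are pulled as the caller requests, which is the point of the x-functions.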
@@ -96,15 +106,14 @@ Specification for new built-ins:
         the function is called with an argument list consisting of the
         corresponding item of each collection, substituting None for
         missing values when not all collections have the same length.
-        If the function is None, return a list of the items of the
-        collection (or a list of tuples if more than one collection).
+        If the function is None, return an iterator of the items of the
+        collection (or an iterator of tuples if more than one collection).
         '''
         gens = map(iter, collections)
         values_left = [1]
         def values():
-            # Emulate map behaviour, i.e. shorter
-            # sequences are padded with None when
-            # they run out of values.
+            # Emulate map behavior by padding sequences with None
+            # when they run out of values.
             values_left[0] = 0
             for i in range(len(gens)):
                 iterator = gens[i]
@@ -123,7 +132,7 @@ Specification for new built-ins:
                 raise StopIteration
             yield fun(*args)
 
-    def xzip( *collections ):    ### Code from Python Cookbook [6]
+    def xzip(*collections):
         '''
         xzip(...)
         xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
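The xmap() docstring above asks for Python 2 `map()` semantics: shorter inputs are padded with None, and a None function yields the items (or tuples of items) unchanged. A minimal Python 3 sketch of that behavior follows. It swaps the hand-rolled `values()` helper and `values_left` flag for `itertools.zip_longest`, whose default fill value is None; it is an illustration, not the PEP's reference code.

```python
from itertools import zip_longest


def xmap(fun, *collections):
    """Lazy counterpart of Python 2's map().

    Shorter collections are padded with None until the longest is exhausted.
    If fun is None, yield the items (one collection) or tuples of
    corresponding items (several collections).
    """
    single = len(collections) == 1
    for args in zip_longest(*collections):  # fillvalue defaults to None
        if fun is None:
            yield args[0] if single else args
        else:
            yield fun(*args)


print(list(xmap(lambda a, b: (a, b), [1, 2, 3], 'ab')))
# prints: [(1, 'a'), (2, 'b'), (3, None)]
```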
@@ -135,28 +144,57 @@ Specification for new built-ins:
         '''
         gens = map(iter, collections)
         while 1:
-            yield tuple( [g.next() for g in gens] )
+            yield tuple([g.next() for g in gens])
 
-    def indexed( collection, cnt=0, limit=None ):
+    def indexed(collection, cnt=0, limit=None):
         'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
         gen = iter(collection)
         while limit is None or cnt<limit:
             yield (cnt, gen.next())
             cnt += 1
 
 
     Note A: PEP 212 Loop Counter Iteration [2] discussed several
     proposals for achieving indexing. Some of the proposals only work
     for lists unlike the above function which works for any generator,
     xrange, sequence, or iterable object. Also, those proposals were
     presented and evaluated in the world prior to Python 2.2 which did
-    not include generators.
+    not include generators. As a result, the generator-less version in
+    PEP 212 had the disadvantage of consuming memory with a giant list
+    of tuples. The generator version presented here is fast and light,
+    works with all iterables, and allows users to abandon the sequence
+    in mid-stream.
 
-    Note B: An alternate, simplified definition of indexed was proposed:
-
-        def indexed( collection, cnt=0, limit=sys.maxint ):
-            'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
-            return xzip( xrange(cnt,limit), collection )
+    Note B: An alternate, simplified definition of indexed is:
+
+        def indexed(collection, cnt=0, limit=sys.maxint):
+            'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
+            return xzip(xrange(cnt, limit), collection)
+
+
+    Note C: As it stands, the Python code for xmap is slow. The actual
+    implementation of the functions should be written in C for speed.
+    The pure Python code listed above is meant only to specify how the
+    functions would behave, in particular that they should as closely as
+    possible emulate their non-lazy counterparts.
+
+
+    Note D: Almost all of the PEP reviewers welcomed these functions but were
+    divided as to whether they should be built-ins or in a separate module.
+    The main argument for a separate module was to slow the rate of language
+    inflation. The main argument for built-ins was that these functions are
+    destined to be part of a core programming style, applicable to any object
+    with an iterable interface. Just as zip() solves the problem of looping
+    over multiple sequences, the indexed() function solves the loop counter
+    problem. Likewise, the x-functions solve the problem of applying
+    functional constructs without forcing the evaluation of an entire sequence.
+
+    If only one built-in were allowed, then indexed() is the most important
+    general purpose tool, solving the broadest class of problems while
+    improving program brevity, clarity and reliability.
+
+
 
 Specification for Generator Comprehensions:
 
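A Python 3 sketch of `xzip()` and `indexed()` from the hunk above, plus Note B's alternate definition (named `indexed_alt` here purely for illustration). Two adjustments are assumptions made for modern Python: `next()` replaces `.next()`, and `StopIteration` from the underlying iterator is caught explicitly, because since PEP 479 (the default from Python 3.7) a StopIteration escaping a generator body becomes a RuntimeError. `itertools.count` stands in for `xrange(cnt, sys.maxint)` when no limit is given.

```python
from itertools import count


def xzip(*collections):
    """Yield tuples of corresponding items, stopping at the shortest input."""
    iterators = [iter(c) for c in collections]
    if not iterators:
        return
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                return  # end quietly instead of leaking StopIteration (PEP 479)
        yield tuple(row)


def indexed(collection, cnt=0, limit=None):
    """Generate an indexed series: (cnt, seqn[0]), (cnt + 1, seqn[1]), ..."""
    it = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(it)
        except StopIteration:
            return
        yield (cnt, item)
        cnt += 1


def indexed_alt(collection, cnt=0, limit=None):
    """Note B's variant: pair a counter with the collection via xzip()."""
    counter = count(cnt) if limit is None else range(cnt, limit)
    return xzip(counter, collection)


print(list(indexed('abc')))  # prints: [(0, 'a'), (1, 'b'), (2, 'c')]
```

With the default arguments, `indexed()` produces the same pairs as the built-in `enumerate()`.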
@@ -164,17 +202,18 @@ Specification for Generator Comprehensions:
     express the comprehension with a generator. For example:
 
         g = [yield (len(line),line) for line in file if len(line)>5]
         print g.next()
 
     This would be implemented as if it had been written:
 
-        class __Temp:
-            def __iter__(self):
-                for line in file:
-                    if len(line) > 5:
-                        yield (len(line), line)
-        g = __Temp()
+        def __temp():
+            for line in file:
+                if len(line) > 5:
+                    yield (len(line), line)
+        g = __temp()
         print g.next()
 
-    Note A: There is some debate about whether the enclosing brackets
+    Note A: There is some discussion about whether the enclosing brackets
     should be part of the syntax for generator comprehensions. On the
     plus side, it neatly parallels list comprehensions and would be
     immediately recognizable as a similar form with similar internal
@@ -185,21 +224,10 @@ Specification for Generator Comprehensions:
     really a generator is the presence of the yield keyword). On the
     minus side, the brackets may falsely suggest that the whole
     expression returns a list. Most of the feedback received to date
-    indicates that brackets do not make a false suggestion and are
-    in fact helpful.
+    indicates that brackets are helpful and not misleading.
 
-    Note B: An iterable instance is returned by the above code. The
-    purpose is to allow the object to be re-started and looped-over
-    multiple times. This accurately mimics the behavior of list
-    comprehensions. As a result, the following code (provided by Oren
-    Tirosh) works equally well with or without 'yield':
-
-        letters = [yield chr(i) for i in xrange(ord('a'),ord('z')+1)]
-        digits = [yield str(i) for i in xrange(10)]
-        letdig = [yield l+d for l in letters for d in digits]
-
-    Note C: List comprehensions expose their looping variable and
-    leave the variable in the enclosing scope. The code, [str(i) for
+    Note B: List comprehensions expose their looping variable and
+    leave that variable in the enclosing scope. The code, [str(i) for
     i in range(8)] leaves 'i' set to 7 in the scope where the
     comprehension appears. This behavior is by design and reflects an
     intent to duplicate the result of coding a for-loop instead of a
@@ -215,9 +243,8 @@ Specification for Generator Comprehensions:
     instead of a generator comprehension. Further, the variable 'i'
     is not in a defined state on the line immediately following the
     list comprehension. It does not come into existence until
-    iteration starts. Since several generators may be running at
-    once, there are potentially multiple, unequal instances of 'i' at
-    any one time.
+    iteration starts (possibly never).
+
 
 
 Specification for Generator Parameter Passing:
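To make the point about 'i' concrete: a generator's loop variable lives in the generator's own frame, that frame starts running only on the first `next()` call, and every generator object keeps its own copy of the variable. A minimal Python 3 sketch; `squares` and `trace` are illustrative names, not part of the PEP.

```python
trace = []


def squares(n):
    trace.append("body started")  # runs on the first next(), not at creation
    for i in range(n):            # 'i' lives in this generator's own frame
        yield i * i


g = squares(3)
print(trace)           # prints: []
print(next(g), trace)  # prints: 0 ['body started']

# Two generator objects keep separate, unequal copies of 'i'.
a, b = squares(5), squares(5)
for _ in range(3):
    next(a)
print(next(a), next(b))  # prints: 9 0
```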
@@ -229,18 +256,18 @@ Specification for Generator Parameter Passing:
                 x = yield None
                 print x
 
-    2. Let the .next() method take a value to pass to generator as in:
+    2. Let the .next() method take a value to pass to the generator as in:
 
         g = mygen()
         g.next()       # runs the generator until the first 'yield'
         g.next(1)      # '1' is bound to 'x' in mygen(), then printed
         g.next(2)      # '2' is bound to 'x' in mygen(), then printed
 
-    The control flow is unchanged by this proposal. The only change
-    is that a value can be sent into the generator. By analogy,
-    consider the quality improvement from GOSUB (which had no argument
-    passing mechanism) to modern procedure calls (which can pass in
-    arguments and return values).
+    The control flow of 'yield' and 'next' is unchanged by this proposal.
+    The only change is that a value can be sent into the generator.
+    By analogy, consider the quality improvement from GOSUB (which had
+    no argument passing mechanism) to modern procedure calls (which can
+    pass in arguments and return values).
 
     Most of the underlying machinery is already in place, only the
     communication needs to be added by modifying the parse syntax to
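Python later adopted this idea with a different spelling: PEP 342 (Python 2.5) made `yield` an expression and added a `send()` method instead of giving `.next()` an argument. The `mygen()` example rewritten under that spelling is sketched below; the `received` list is an addition so the values can be checked.

```python
received = []


def mygen():
    while True:
        x = yield None      # 'yield' as an expression: receives the sent value
        received.append(x)
        print(x)


g = mygen()
next(g)     # runs the generator until the first 'yield' (g.next() above)
g.send(1)   # 1 is bound to 'x' in mygen(), then printed
g.send(2)   # 2 is bound to 'x' in mygen(), then printed
```

As with `g.next()` in the PEP, the generator must first be advanced to its first `yield`; calling `send()` with a non-None value on a just-started generator raises TypeError.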
@@ -263,12 +290,14 @@ Specification for Generator Parameter Passing:
     more readable and easier to debug than an approach involving the
     threading module with its attendant mutexes, semaphores, and data
     queues. A class-based approach competes well when there are no
-    complex execution states or variable states. When the complexity
-    increases, generators with parameter passing are much simpler
+    complex execution states or variable states. However, when the
+    complexity increases, generators with parameter passing are much simpler
     because they automatically save state (unlike classes which must
-    explicitly save the variable and execution state in instance
-    variables).
+    explicitly save the variable and execution state in instance variables).
+
+    Note A: This proposal changes 'yield' from a statement to an
+    expression with binding and precedence similar to lambda.
 
 
 Example of a Complex Consumer
 
@@ -284,45 +313,47 @@ Specification for Generator Parameter Passing:
     def filelike(packagename, appendOrOverwrite):
         cum = []
         if appendOrOverwrite == 'w+':
-            cum.extend( packages[packagename] )
+            cum.extend(packages[packagename])
         try:
             while 1:
                 dat = yield None
                 cum.append(dat)
         except FlushStream:
             packages[packagename] = cum
 
     ostream = filelike('mydest','w')   # Analogous to file.open(name,flag)
     ostream.next()                     # Advance to the first yield
     ostream.next(firstdat)             # Analogous to file.write(dat)
     ostream.next(seconddat)
-    ostream.throw( FlushStream )       # This feature proposed below
+    ostream.throw(FlushStream)         # This feature proposed below
 
 
 Example of a Complex Consumer
 
     Loop over the picture files in a directory, shrink them
-    one-at-a-time to thumbnail size using PIL [7], and send them to a
+    one at a time to thumbnail size using PIL [7], and send them to a
     lazy consumer. That consumer is responsible for creating a large
-    blank image, accepting thumbnails one-at-a-time and placing them
-    in a 5x3 grid format onto the blank image. Whenever the grid is
+    blank image, accepting thumbnails one at a time and placing them
+    in a 5 by 3 grid format onto the blank image. Whenever the grid is
     full, it writes-out the large image as an index print. A
     FlushStream exception indicates that no more thumbnails are
     available and that the partial index print should be written out
     if there are one or more thumbnails on it.
 
 
-Example of a Producer and Consumer Used Together in a Pipelike Fashion
+Example of a Producer and Consumer Used Together in a Pipe-like Fashion
 
-    'Analogy to: source | upper | sink'
+    'Analogy to Linux style pipes: source | upper | sink'
     sink = sinkgen()
     sink.next()
     for word in source():
-        sink.next( word.upper() )
+        sink.next(word.upper())
 
 
 
 Specification for Generator Exception Passing:
 
-    Add a .throw(exception) method to the resulting generator as in:
+    Add a .throw(exception) method to the generator interface:
 
         def mygen():
             try:
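A runnable Python 3 sketch of the pipe-like example above, using `send()` in place of `.next(value)`. The bodies of `source()` and `sinkgen()` are not given in the PEP, so the versions here are hypothetical: the producer yields three fixed words and the consumer appends whatever it receives to a list.

```python
collected = []


def source():
    # Hypothetical producer standing in for source() above.
    for word in ["alpha", "beta", "gamma"]:
        yield word


def sinkgen():
    # Hypothetical consumer standing in for sinkgen() above.
    while True:
        word = yield
        collected.append(word)


sink = sinkgen()
next(sink)                   # advance to the first yield, like sink.next()
for word in source():        # source | upper | sink
    sink.send(word.upper())

print(collected)             # prints: ['ALPHA', 'BETA', 'GAMMA']
```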
@@ -336,12 +367,18 @@ Specification for Generator Exception Passing:
         g.next(5)
         g.throw(FlushStream)
 
-    There is no existing work around for triggering an exception
+    There is no existing work-around for triggering an exception
     inside a generator. This is a true deficiency. It is the only
-    case in Python where active code cannot be excepted to or through.
+    case in Python where active code cannot be excepted to or through.
     Even if the .next(arg) proposal is not adopted, we should add the
     .throw() method.
 
+    Generator exception passing also helps address an intrinsic limitation
+    on generators, the prohibition against their using try/finally to
+    trigger clean-up code [1]. Without .throw(), the current work-around
+    forces the resolution or clean-up code to be moved outside the generator.
+
+
     Note A: The name of the throw method was selected for several
     reasons. Raise is a keyword and so cannot be used as a method
     name. Unlike raise which immediately raises an exception from the
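Generators did eventually gain a `.throw()` method (PEP 342, Python 2.5). Below is a Python 3 sketch of the `filelike()` consumer from the earlier example, flushed with `throw()`; `FlushStream` and the `packages` dictionary are defined here as hypothetical stand-ins. One detail the PEP leaves open: when the generator handles the thrown exception and then returns, `throw()` raises StopIteration in the caller, so the sketch catches it.

```python
class FlushStream(Exception):
    """Hypothetical signal meaning 'no more data; save what you have'."""


packages = {}  # hypothetical storage keyed by package name


def filelike(packagename, appendOrOverwrite):
    cum = []
    if appendOrOverwrite == 'w+':
        cum.extend(packages[packagename])  # keep the existing contents
    try:
        while True:
            dat = yield None
            cum.append(dat)
    except FlushStream:
        # Clean-up runs inside the generator when the caller throws in.
        packages[packagename] = cum


ostream = filelike('mydest', 'w')  # analogous to opening a file
next(ostream)                      # advance to the first yield
ostream.send('first')              # analogous to file.write(dat)
ostream.send('second')
try:
    ostream.throw(FlushStream)     # generator saves its data, then returns
except StopIteration:
    pass
print(packages['mydest'])          # prints: ['first', 'second']
```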
@@ -350,12 +387,76 @@ Specification for Generator Exception Passing:
     putting the exception in another location. The word throw is
     already associated with exceptions in other languages.
 
-    Note B: The throw syntax should exactly match raise's syntax including:
-        raise string            g.throw(string)
-        raise string, data      g.throw(string,data)
-        raise class, instance   g.throw(class,instance)
-        raise instance          g.throw(instance)
-        raise                   g.throw()
+    Alternative method names were considered: resolve(), signal(),
+    genraise(), raiseinto(), and flush(). None of these seem to fit
+    as well as throw().
+
+
+    Note B: The throw syntax should exactly match raise's syntax:
+
+        throw([expression, [expression, [expression]]])
+
+    Accordingly, it should be implemented to handle all of the following:
+
+        raise string            g.throw(string)
+        raise string, data      g.throw(string,data)
+        raise class, instance   g.throw(class,instance)
+        raise instance          g.throw(instance)
+        raise                   g.throw()
+
+
+Discussion of Restartability:
+
+    Inside for-loops, generators are not substitutable for lists unless they
+    are accessed only once. A second access only works for restartable
+    objects like lists, dicts, objects defined with __getitem__, and
+    xrange objects. Generators are not the only objects which are not
+    restartable. Other examples of non-restartable sequences include file
+    objects, xreadlines objects, and the result of iter(callable,sentinel).
+
+    Since the proposed built-in functions return generators, they are also
+    non-restartable. As a result, 'xmap' is not substitutable for 'map' in
+    the following example:
+
+        alphabet = map(chr, xrange(ord('a'), ord('z')+1))
+        twoletterwords = [a+b for a in alphabet for b in alphabet]
+
+    Since generator comprehensions also return generators, they are not
+    restartable. Consequently, they are not substitutable for list
+    comprehensions in the following example:
+
+        digits = [str(i) for i in xrange(10)]
+        alphadig = [a+d for a in 'abcdefg' for d in digits]
+
+    To achieve substitutability, generator comprehensions and x-functions
+    can be implemented in a way that supports restarts. PEP 234 [4]
+    explicitly states that restarts are to be supported through repeated
+    calls to iter(). With that guidance, it is easy to add restartability
+    to generator comprehensions using a simple wrapper class around the
+    generator function and modifying the implementation above to return:
+
+        g = Restartable(__temp)     # instead of g = __temp()
+
+    Restartable is a simple (12 line) class which calls the generator function
+    to create a new, re-wound generator whenever iter() requests a restart.
+    Calls to .next() are simply forwarded to the generator. The Python source
+    code for the Restartable class can be found in the PEP 279 simulator [8].
+    An actual implementation in C can achieve re-startability directly and
+    would not need the slow class wrapper used in the pure Python simulation.
+
+    The XLazy library [10] shows how restarts can be implemented for xmap,
+    xfilter, and xzip.
+
+    The upside of adding restart capability is that more list comprehensions
+    can be made lazy and save memory by adding 'yield'. Likewise,
+    more expressions that use map, filter, and zip can be made lazy just by
+    adding 'x'.
+
+    A possible downside is that x-functions have no control over whether their
+    inputs are themselves restartable. With non-restartable inputs like
+    generators or files, an x-function restart will not produce a meaningful
+    result.
+
 
 
 References
@@ -369,23 +470,8 @@ References
     [3] PEP 202 List Comprehensions
         http://python.sourceforge.net/peps/pep-0202.html
 
-    [4] There have been several discussion on comp.lang.python which helped
-        tease out these proposals:
-
-        Indexed Function
-        http://groups.google.com/groups?hl=en&th=33f778d92dd5720a
-
-        Xmap, Xfilter, Xzip and Two-way Generator Communication
-        http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1
-
-        Two-way Generator Communication -- Revised Version
-        http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1
-
-        Generator Comprehensions
-        http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
-
-        Discussion Draft of this PEP
-        http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
+    [4] PEP 234 Iterators
+        http://python.sourceforge.net/peps/pep-0234.html
 
     [5] Dr. David Mertz's draft column for Charming Python.
         http://gnosis.cx/publish/programming/charming_python_b5.txt
@@ -403,6 +489,9 @@ References
         along with other examples and tests is at:
         http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
 
+    [10] Oren Tirosh's XLazy library with re-startable x-functions is at:
+        http://www.tothink.com/python/dataflow/
+
 
 Copyright
 