Raymond Hettinger's latest update.

This commit is contained in:
Barry Warsaw 2002-03-21 05:54:14 +00:00
parent 19aefb3500
commit 8e48462a3c
1 changed files with 93 additions and 236 deletions


@ -12,7 +12,7 @@ Post-History:
Abstract
This PEP introduces three orthogonal (not mutually exclusive) ideas
for enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power
of generators.
@ -28,13 +28,13 @@ Rationale
and file objects.
Generators, as proposed in PEP 255 [1], were introduced as a means for
making it easier to create iterators, especially ones with complex
internal execution or variable states. When I created new programs,
generators were often the tool of choice for creating an iterator.
However, when updating existing programs, I found that the tool had
another use, one that improved program function as well as structure.
Some programs exhibited a pattern of creating large lists and then
looping over them. As data sizes increased, the programs encountered
scalability limitations owing to excessive memory consumption (and
malloc time) for the intermediate lists. Generators were found to be
@ -53,41 +53,32 @@ Rationale
once iterators and generators became available. It provides
all iterables with the same advantage that iteritems() affords
to dictionaries -- a compact, readable, reliable index notation.
2. Establish a generator alternative to list comprehensions [3]
that provides a simple way to convert a list comprehension into
a generator whenever memory issues arise.
3. Add a generator method to enable exceptions to be passed to a
generator. Currently, there is no clean method for triggering
exceptions from outside the generator. Also, generator exception
passing helps mitigate the try/finally prohibition for generators.
4. [Proposal 4 is now deferred until Python 2.4]
Extend the syntax of the 'yield' keyword to enable generator
parameter passing. The resulting increase in power simplifies
the creation of consumer streams which have a complex execution
state and/or variable state.
All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords. The three generator tools go into Python 2.3 when
generators become final and are not imported from __future__.
The fourth proposal should be considered deferred and will be
proposed for Python 2.4 after the Python community has more
experience with generators.
Reference Implementation
There is not currently a CPython implementation; however, a simulation
module written in pure Python is available on SourceForge [7]. The
simulation covers every feature proposed in this PEP and is meant
to allow direct experimentation with the proposals.
There is also a module [8] with working source code for all of the
examples used in this PEP. It serves as a test suite for the simulator
and it documents how each of the new features works in practice.
@ -98,14 +89,15 @@ Reference Implementation
Ka-Ping thought so also. GvR said he could believe that it was true.
Tim did not have an opportunity to give an assessment.
Specification for a new builtin:
def indexed(collection, start=0, stop=None):
    'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
    gen = iter(collection)
    cnt = start
    while stop is None or cnt<stop:
        yield (cnt, gen.next())
        cnt += 1
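For experimentation today, the generator above can be written in modern Python, where gen.next() became the next() builtin and, under PEP 479, a StopIteration leaking out of a generator is an error and must be converted to a plain return. This is a sketch of this PEP's proposal (the enumerate() builtin that eventually shipped kept start but has no stop parameter):

```python
def indexed(collection, start=0, stop=None):
    'Generates an indexed series: (start, seq[0]), (start+1, seq[1]), ...'
    gen = iter(collection)
    cnt = start
    while stop is None or cnt < stop:
        try:
            item = next(gen)       # modern spelling of gen.next()
        except StopIteration:
            return                 # PEP 479: end the generator cleanly
        yield (cnt, item)
        cnt += 1

pairs = list(indexed('abc'))       # [(0, 'a'), (1, 'b'), (2, 'c')]
```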
@ -119,7 +111,7 @@ Specification for a new builtin:
PEP 212 had the disadvantage of consuming memory with a giant list
of tuples. The generator version presented here is fast and light,
works with all iterables, and allows users to abandon the sequence
in mid-stream with no loss of computation effort.
There are other PEPs which touch on related issues: integer iterators,
integer for-loops, and one for modifying the arguments to range and
@ -139,39 +131,45 @@ Specification for a new builtin:
iterable interface. Just as zip() solves the problem of looping
over multiple sequences, the indexed() function solves the loop
counter problem.
If only one builtin is allowed, then indexed() is the most important
general purpose tool, solving the broadest class of problems while
improving program brevity, clarity and reliability.
Comments from GvR: filter and map should die and be subsumed into list
comprehensions, not grow more variants. I'd rather introduce builtins
that do iterator algebra (e.g. the iterzip that I've often used as
an example).
Comments from Ka-Ping Yee: I'm also quite happy with everything you
proposed ... and the extra builtins (really 'indexed' in particular)
are things I have wanted for a long time.
Comments from Neil Schemenauer: The new builtins sound okay. Guido
may be concerned with increasing the number of builtins too much. You
might be better off selling them as part of a module. If you use a
module then you can add lots of useful functions (Haskell has lots of
them that we could steal).
Comments from Magnus Lie Hetland: I think indexed would be a useful and
natural built-in function. I would certainly use it a lot.
natural built-in function. I would certainly use it a lot.
I like indexed() a lot; +1. I'm quite happy to have it make PEP 281
obsolete. Adding a separate module for iterator utilities seems like
a good idea.
Comments from the Community: The response to the indexed() proposal has
been close to 100% favorable. Almost everyone loves the idea.
Author response: Prior to these comments, four builtins were proposed.
After the comments, xmap, xfilter and xzip were withdrawn. The one
that remains is vital for the language and is proposed by itself.
Indexed() is trivially easy to implement and can be documented in
minutes. More importantly, it is useful in everyday programming
which does not otherwise involve explicit use of generators.
Though withdrawn from the proposal, I still secretly covet xzip()
a.k.a. iterzip() but think that it will happen on its own someday.
@ -181,7 +179,6 @@ Specification for Generator Comprehensions:
express the comprehension with a generator. For example:
g = [yield (len(line),line) for line in file if len(line)>5]
This would be implemented as if it had been written:
@ -190,7 +187,7 @@ Specification for Generator Comprehensions:
if len(line) > 5:
yield (len(line), line)
g = __temp()
print g.next()
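The bracketed yield syntax was never adopted; PEP 289 later covered the same ground with generator expressions. A sketch of both spellings of the __temp() expansion above, using an illustrative list of lines in place of a file:

```python
lines = ['short', 'a longer line', 'tiny', 'another long line']

# The expansion spelled as a named generator:
def _temp(seq):
    for line in seq:
        if len(line) > 5:
            yield (len(line), line)

g = _temp(lines)
assert next(g) == (13, 'a longer line')

# The same filtering, as a PEP 289 generator expression:
g2 = ((len(line), line) for line in lines if len(line) > 5)
assert list(g2) == [(13, 'a longer line'), (17, 'another long line')]
```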
Note A: There is some discussion about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
@ -210,7 +207,7 @@ Specification for Generator Comprehensions:
makes it trivially easy to transform existing list comprehension
code to a generator by adding yield. Likewise, it can be converted
back to a list by deleting yield. This makes it easy to scale-up
programs from small datasets to ones large enough to warrant
just in time evaluation.
@ -234,32 +231,50 @@ Specification for Generator Comprehensions:
iteration starts (possibly never).
Comments from GvR: Cute hack, but I think the use of the [] syntax
strongly suggests that it would return a list, not an iterator. I
also think that this is trying to turn Python into a functional
language, where most algorithms use lazy infinite sequences, and I
just don't think that's where its future lies.
Comments from Ka-Ping Yee: I am very happy with the things you have
proposed in this PEP. I feel quite positive about generator
comprehensions and have no reservations. So a +1 on that.
Comments from Neil Schemenauer: I'm -0 on the generator list
comprehensions. They don't seem to add much. You could easily use
a nested generator to do the same thing. They smell like lambda.
Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
useful, but I vote +0. Defining a separate, named generator would
useful, but I vote +0. Defining a separate, named generator would
probably be my preference. On the other hand, I do see the advantage
of "scaling up" from list comprehensions.
Comments from the Community: The response to the generator comprehension
proposal has been mostly favorable. There were some 0 votes from
people who didn't see a real need or who were not energized by the
idea. Some of the 0 votes were tempered by comments that the reviewer
did not even like list comprehensions or did not have any use for
generators in any form. The +1 votes outnumbered the 0 votes by about
two to one.
Author response: I've studied several syntactical variations and
concluded that the brackets are essential for:
- teachability (it's like a list comprehension)
- set-off (yield applies to the comprehension not the enclosing
function)
- substitutability (list comprehensions can be made lazy just by
adding yield)
What I like best about generator comprehensions is that I can design
using list comprehensions and then easily switch to a generator (by
adding yield) in response to scalability requirements (when the list
comprehension produces too large of an intermediate result). Had
generators already been in-place when list comprehensions were
accepted, the yield option might have been incorporated from the
start. For certain, the mathematical style notation is explicit and
readable as compared to a separate function definition with an
embedded yield.
@ -268,10 +283,10 @@ Specification for Generator Exception Passing:
Add a .throw(exception) method to the generator interface:
def logger():
    start = time.time()
    log = []
    try:
        while 1:
            log.append( time.time() - start )
            yield log[-1]
    except WriteLog:
@ -279,19 +294,19 @@ Specification for Generator Exception Passing:
g = logger()
for i in [10,20,40,80,160]:
    testsuite(i)
    g.next()
g.throw(WriteLog)
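The proposed .throw() method did later arrive with PEP 342 in Python 2.5. A runnable Python 3 sketch of the logger example, returning the log through a final yield instead of writing a file (the WriteLog class and the flush-by-yield handling are illustrative):

```python
import time

class WriteLog(Exception):
    'Signal the logger to hand back its accumulated samples.'

def logger():
    start = time.time()
    log = []
    try:
        while True:
            log.append(time.time() - start)
            yield log[-1]
    except WriteLog:
        yield log                  # throw() returns this value

g = logger()
for i in [10, 20, 40]:
    next(g)                        # one timing sample per test run
samples = g.throw(WriteLog)        # raises WriteLog at the paused yield
assert len(samples) == 3
```

The call g.throw(WriteLog) raises the exception at the yield where the generator is suspended; the except clause catches it and the value of the next yield becomes throw()'s return value.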
There is no existing work-around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through.
Generator exception passing also helps address an intrinsic limitation
on generators, the prohibition against their using try/finally to
trigger clean-up code [1]. Without .throw(), the current work-around
forces the resolution or clean-up code to be moved outside the generator.
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
@ -307,11 +322,11 @@ Specification for Generator Exception Passing:
Note B: The throw syntax should exactly match raise's syntax:
throw([expression, [expression, [expression]]])
Accordingly, it should be implemented to handle all of the following:
raise string g.throw(string)
raise string, data g.throw(string,data)
raise class, instance g.throw(class,instance)
@ -319,28 +334,37 @@ Specification for Generator Exception Passing:
raise g.throw()
Comments from GvR: I'm not convinced that the cleanup problem that
this is trying to solve exists in practice. I've never felt the need
to put yield inside a try/except. I think the PEP doesn't make enough
of a case that this is useful.
Comments from Ka-Ping Yee: I agree that the exception issue needs to
be resolved and [that] you have suggested a fine solution.
Comments from Neil Schemenauer: The exception passing idea is one I
hadn't thought of before and looks interesting. If we enable the
passing of values back, then we should add this feature too.
Comments from Magnus Lie Hetland: Even though I cannot speak for the
ease of implementation, I vote +1 for the exception passing mechanism.
ease of implementation, I vote +1 for the exception passing mechanism.
Comments from the Community: The response has been mostly favorable. One
negative comment from GvR is shown above. The other was from Martin von
Loewis who was concerned that it could be difficult to implement and
is withholding his support until a working patch is available. To probe
Martin's comment, I checked with the implementers of the original
generator PEP for an opinion on the ease of implementation. They felt that
implementation would be straight-forward and could be grafted onto the
existing implementation without disturbing its internals.
Author response: When the sole use of generators is to simplify writing
iterators for lazy producers, then the odds of needing generator
exception passing are slim. If, on the other hand, generators
are used to write lazy consumers, create coroutines, generate output
streams, or simply for their marvelous capability for restarting a
previously frozen state, THEN the need to raise exceptions will
come up frequently.
I'm no judge of what is truly Pythonic, but am still astonished
that there can exist blocks of code that can't be excepted to or
@ -350,167 +374,6 @@ Specification for Generator Exception Passing:
Specification for Generator Parameter Passing [Deferred Proposal]
1. Allow 'yield' to assign a value as in:
def mygen():
    while 1:
        x = yield None
        print x
2. Let the .next() method take a value to pass to the generator as in:
g = mygen()
g.next()     # runs the generator until the first 'yield'
g.next(1)    # '1' is bound to 'x' in mygen(), then printed
g.next(2)    # '2' is bound to 'x' in mygen(), then printed
The control flow of 'yield' and 'next' is unchanged by this proposal.
The only change is that a value can be sent into the generator.
By analogy, consider the quality improvement from GOSUB (which had
no argument passing mechanism) to modern procedure calls (which can
pass in arguments and return values).
Most of the underlying machinery is already in place, only the
communication needs to be added by modifying the parse syntax to
accept the new 'x = yield expr' syntax and by allowing the .next()
method to accept an optional argument.
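This deferred proposal is essentially what PEP 342 delivered in Python 2.5, except that the value travels through a new .send() method rather than an argument to .next(), and yield became an expression. A sketch, collecting into a list instead of printing so the effect is visible:

```python
def mygen(seen):
    while True:
        x = yield          # PEP 342 made yield an expression
        seen.append(x)

seen = []
g = mygen(seen)
next(g)        # prime: run to the first yield; there is no value to receive yet
g.send(1)      # binds 1 to x, records it, runs to the next yield
g.send(2)
assert seen == [1, 2]
```

The mandatory priming next(g) is exactly the wrinkle GvR raises below: the first advance has no yield expression waiting to receive a value.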
Yield is more than just a simple iterator creator. It does
something else truly wonderful -- it suspends execution and saves
state. It is good for a lot more than writing iterators. This
proposal further expands its capability by making it easier to
share data with the generator.
The .next(arg) mechanism is especially useful for:
1. Sending data to any generator
2. Writing lazy consumers with complex execution states
3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])
The proposal is a clear improvement over the existing alternative
of passing data via global variables. It is also much simpler,
more readable and easier to debug than an approach involving the
threading module with its attendant mutexes, semaphores, and data
queues. A class-based approach competes well when there are no
complex execution states or variable states. However, when the
complexity increases, generators with parameter passing are much simpler
because they automatically save state (unlike classes which must
explicitly save the variable and execution state in instance variables).
Note A: This proposal changes 'yield' from a statement to an
expression with binding and precedence similar to lambda.
Example of a Complex Consumer
The encoder for arithmetic compression sends a series of
fractional values to a complex, lazy consumer. That consumer
makes computations based on previous inputs and only writes out
when certain conditions have been met. After the last fraction is
received, it has a procedure for flushing any unwritten data.
Example of a Consumer Stream
def filelike(packagename, appendOrOverwrite):
    cum = []
    if appendOrOverwrite == 'w+':
        cum.extend(packages[packagename])
    try:
        while 1:
            dat = yield None
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum

ostream = filelike('mydest','w')    # Analogous to file.open(name,flag)
ostream.next()                      # Advance to the first yield
ostream.next(firstdat)              # Analogous to file.write(dat)
ostream.next(seconddat)
ostream.throw(FlushStream)          # This feature proposed above
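Under PEP 342 semantics the same consumer stream runs today with .send() in place of .next(arg); the packages mapping, FlushStream, and the written values are this PEP's own illustrative names:

```python
class FlushStream(Exception):
    'Ask the consumer to commit its buffered data.'

packages = {'mydest': ['old']}

def filelike(packagename, appendOrOverwrite):
    cum = []
    if appendOrOverwrite == 'w+':      # append mode keeps old contents
        cum.extend(packages[packagename])
    try:
        while True:
            dat = yield                # modern spelling of 'dat = yield None'
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum

ostream = filelike('mydest', 'w')
next(ostream)                          # advance to the first yield
ostream.send('firstdat')               # analogous to file.write(dat)
ostream.send('seconddat')
try:
    ostream.throw(FlushStream)         # flush; the generator then finishes
except StopIteration:
    pass                               # normal end of the consumer
assert packages['mydest'] == ['firstdat', 'seconddat']
```

Because the except clause runs off the end of the generator, the throw() call surfaces StopIteration once the flush completes; a caller has to expect that.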
Example of a Complex Consumer
Loop over the picture files in a directory, shrink them
one at a time to thumbnail size using PIL [6], and send them to a
lazy consumer. That consumer is responsible for creating a large
blank image, accepting thumbnails one at a time and placing them
in a 5 by 3 grid format onto the blank image. Whenever the grid is
full, it writes out the large image as an index print. A
FlushStream exception indicates that no more thumbnails are
available and that the partial index print should be written out
if there are one or more thumbnails on it.
Example of a Producer and Consumer Used Together in a Pipe-like Fashion
'Analogy to Linux style pipes: source | upper | sink'
sink = sinkgen()
sink.next()
for word in source():
    sink.next(word.upper())
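With .send(), the pipe analogy is directly runnable; source() and sinkgen() are filled in here with illustrative bodies:

```python
def source():
    # stand-in word source for the pipe's left end
    for word in ['hello', 'world']:
        yield word

def sinkgen(out):
    # consumer: append each received word to out
    while True:
        word = yield
        out.append(word)

out = []
sink = sinkgen(out)
next(sink)                         # prime the consumer
for word in source():
    sink.send(word.upper())        # source | upper | sink
assert out == ['HELLO', 'WORLD']
```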
Commentary from GvR: We discussed this at length when we were hashing
out generators and coroutines, and found that there's always a problem
with this: the argument to the first next() call has to be thrown away,
because it doesn't correspond to a yield statement. This looks ugly
(note that the example code has a dummy call to next() to get the
generator going). But there may be useful examples that can only be
programmed (elegantly) with this feature, so I'm reserving judgment.
I can believe that it's easy to implement.
Commentary from Ka-Ping Yee: I also think there is a lot of power to be
gained from generator argument passing.
Commentary from Neil Schemenauer: I like the idea of being able to pass
values back into a generator. I originally pitched this idea to Guido
but in the end we decided against it (at least for the initial
implementation). There were a few issues to work out but I can't seem
to remember what they were. My feeling is that we need to wait until
the Python community has more experience with generators before adding
this feature. Maybe for 2.4 but not for 2.3. In the mean time you
can work around this limitation by making your generator a method.
Values can be passed back by mutating the instance.
Commentary from Magnus Lie Hetland: I like the generator parameter
passing mechanism. Although I see no need to defer it, deferral seems
to be the most likely scenario, and in the meantime I guess the
functionality can be emulated either by implementing the generator
as a method, or by passing a parameter with the exception
passing mechanism.
Author response: Okay, consider this part of the proposal deferred
until 2.4.
Restartability
[Discussion of restartability deleted]
Commentary from GvR: The PEP then goes on to discuss restartable
iterators. I think this is an evil idea obtained from reading too
much about C++ STL iterators. It should definitely be a separate
PEP if the author wants me to take this seriously.
Commentary from Ka-Ping Yee: I have less of an opinion on restartability
since i have not yet had to really run into that issue. It seems
reasonable that it might be a good idea, though perhaps YAGNI will apply
here until I experience the need for it first-hand.
Commentary from Magnus Lie Hetland: I guess there is no real need to comment
on restartability, but I can't see that I have any need for it.
Author response: Over thirty reviewers responded, only one was interested
in restartability on the theory that it made life easier for beginners
and that it made lazy evaluation more substitutable for full
evaluation. I was never sold on it myself. Consider it retracted.
References
[1] PEP 255 Simple Generators
@ -525,16 +388,10 @@ References
[4] PEP 234 Iterators
http://python.sourceforge.net/peps/pep-0234.html
[5] Dr. David Mertz's draft column for Charming Python.
http://gnosis.cx/publish/programming/charming_python_b5.txt
[6] PIL, the Python Imaging Library can be found at:
http://www.pythonware.com/products/pil/
[7] A pure Python simulation of every feature in this PEP is at:
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752
[8] The full, working source code for each of the examples in this PEP
along with other examples and tests is at:
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756