PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP introduces three orthogonal (not mutually exclusive) ideas
for enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power
of generators.


Rationale

Python 2.2 introduced the concept of an iterable interface as proposed
in PEP 234 [4]. The iter() factory function was provided as a common
calling convention and deep changes were made to use iterators as a
unifying theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects.

Generators, as proposed in PEP 255 [1], were introduced as a means for
making it easier to create iterators, especially ones with complex
internal execution or variable states. When I created new programs,
generators were often the tool of choice for creating an iterator.

However, when updating existing programs, I found that the tool had
another use, one that improved program function as well as structure.
Some programs exhibited a pattern of creating large lists and then
looping over them. As data sizes increased, the programs encountered
scalability limitations owing to excessive memory consumption (and
malloc time) for the intermediate lists. Generators were found to be
directly substitutable for the lists while eliminating the memory
issues through lazy evaluation, a.k.a. just-in-time manufacturing.

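As an illustrative sketch (not taken from any of the programs
mentioned above), the substitution can look like the following. The
function names and the squaring workload are invented for the example,
and the code is written for current Python 3 rather than the 2.2
dialect discussed in this PEP.

```python
def squares_list(n):
    # Eager version: builds the whole intermediate list in memory.
    result = []
    for k in range(n):
        result.append(k * k)
    return result


def squares_gen(n):
    # Lazy version: produces one value at a time, on demand.
    for k in range(n):
        yield k * k


# The consuming loop (here, sum) does not need to change.
total_list = sum(squares_list(1000))
total_gen = sum(squares_gen(1000))
```

Both calls compute the same total; only the generator version avoids
holding all of the intermediate values at once.
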
Python itself encountered similar issues. As a result, xrange() and
xreadlines() were introduced. And, in the case of file objects and
mappings, lazy evaluation became the norm. Generators provide a tool
to program memory-conserving for-loops whenever complete evaluation is
not desired because of memory restrictions or availability of data.

The next steps in the evolution of generators are:

1. Add a new builtin function, indexed(), which was made possible
   once iterators and generators became available. It provides
   all iterables with the same advantage that iteritems() affords
   to dictionaries -- a compact, readable, reliable index notation.

2. Establish a generator alternative to list comprehensions [3]
   that provides a simple way to convert a list comprehension into
   a generator whenever memory issues arise.

3. Add a generator method to enable exceptions to be passed to a
   generator. Currently, there is no clean method for triggering
   exceptions from outside the generator. Also, generator exception
   passing helps mitigate the try/finally prohibition for generators.

All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords. The three generator tools would go into Python 2.3, when
generators become final and no longer need to be imported from
__future__.


Reference Implementation

There is not currently a CPython implementation; however, a simulation
module written in pure Python is available on SourceForge [5]. The
simulation covers every feature proposed in this PEP and is meant
to allow direct experimentation with the proposals.

There is also a module [6] with working source code for all of the
examples used in this PEP. It serves as a test suite for the simulator
and it documents how each of the new features works in practice.

The authors and implementers of PEP 255 [1] were contacted to provide
their assessment of whether these enhancements were going to be
straightforward to implement and require only minor modification
of the existing generator code. Neil felt the assertion was correct.
Ka-Ping thought so also. GvR said he could believe that it was true.
Tim did not have an opportunity to give an assessment.


Specification for a new builtin:

    def indexed(collection, start=0, stop=None):
        'Generates an indexed series: (0,collection[0]), (1,collection[1]) ...'
        gen = iter(collection)
        cnt = start
        while stop is None or cnt<stop:
            yield (cnt, gen.next())
            cnt += 1

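The specification above is written for Python 2.2. A minimal sketch of
the same function for current Python 3 follows; it is an adaptation
for experimentation, not part of the proposal. In Python 3 the
iterator protocol is spelled next(gen), and a StopIteration that
escapes a generator body is turned into a RuntimeError, so exhaustion
has to be caught explicitly.

```python
def indexed(collection, start=0, stop=None):
    """Yield (count, item) pairs, counting from start up to (not including) stop."""
    gen = iter(collection)
    cnt = start
    while stop is None or cnt < stop:
        try:
            item = next(gen)
        except StopIteration:
            # The underlying iterable is exhausted: end the series cleanly.
            return
        yield (cnt, item)
        cnt += 1


# Works on a plain sequence ...
pairs = list(indexed("abc"))

# ... and on an arbitrary iterator, with a custom starting count and a stop.
labelled = list(indexed(iter("abcdef"), start=10, stop=12))
```

Note that start only sets the first count; it does not skip items, and
stop bounds the count rather than the position in the input.
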
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of the proposals only work
for lists, unlike the above function, which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2, which did
not include generators. As a result, the non-generator version in
PEP 212 had the disadvantage of consuming memory with a giant list
of tuples. The generator version presented here is fast and light,
works with all iterables, and allows users to abandon the sequence
in mid-stream with no loss of computation effort.

There are other PEPs which touch on related issues: integer iterators,
integer for-loops, and one for modifying the arguments to range and
xrange. The indexed() proposal does not preclude the other proposals
and it still meets an important need even if those are adopted -- the
need to count items in any iterable. The other proposals give a means
of producing an index but not the corresponding value. This is
especially problematic if a sequence is given which doesn't support
random access, such as a file object, generator, or sequence defined
with __getitem__.

Note B: Almost all of the PEP reviewers welcomed the function but were
divided as to whether there should be any builtins. The main argument
for a separate module was to slow the rate of language inflation. The
main argument for a builtin was that the function is destined to be
part of a core programming style, applicable to any object with an
iterable interface. Just as zip() solves the problem of looping
over multiple sequences, the indexed() function solves the loop
counter problem.

If only one builtin is allowed, then indexed() is the most important
general purpose tool, solving the broadest class of problems while
improving program brevity, clarity and reliability.

Comments from GvR: filter and map should die and be subsumed into list
comprehensions, not grow more variants. I'd rather introduce builtins
that do iterator algebra (e.g. the iterzip that I've often used as
an example).

Comments from Ka-Ping Yee: I'm also quite happy with everything you
proposed ... and the extra builtins (really 'indexed' in particular)
are things I have wanted for a long time.

Comments from Neil Schemenauer: The new builtins sound okay. Guido
may be concerned with increasing the number of builtins too much. You
might be better off selling them as part of a module. If you use a
module then you can add lots of useful functions (Haskell has lots of
them that we could steal).

Comments from Magnus Lie Hetland: I think indexed would be a useful and
natural built-in function. I would certainly use it a lot.
I like indexed() a lot; +1. I'm quite happy to have it make PEP 281
obsolete. Adding a separate module for iterator utilities seems like
a good idea.

Comments from the Community: The response to the indexed() proposal has
been close to 100% favorable. Almost everyone loves the idea.

Author response: Prior to these comments, four builtins were proposed.
After the comments, xmap, xfilter, and xzip were withdrawn. The one
that remains is vital for the language and is proposed by itself.
The indexed() function is trivially easy to implement and can be
documented in minutes. More importantly, it is useful in everyday
programming which does not otherwise involve explicit use of
generators.

Though it was withdrawn from the proposal, I still secretly covet
xzip() a.k.a. iterzip() but think that it will happen on its own
someday.


Specification for Generator Comprehensions:

If a list comprehension starts with a 'yield' keyword, then
express the comprehension with a generator. For example:

    g = [yield (len(line),line) for line in file if len(line)>5]

This would be implemented as if it had been written:

    def __temp():
        for line in file:
            if len(line) > 5:
                yield (len(line), line)
    g = __temp()

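The proposed bracketed-yield syntax cannot be run as written, but the
expansion it stands for can. The following is a minimal sketch in
current Python 3; the name long_lines, the sample text, and the use of
io.StringIO as a stand-in for an open file are all assumptions made
for the example.

```python
import io

# Made-up sample input standing in for the contents of a file.
text = "hi\na much longer line\nanother long line\n"
source = io.StringIO(text)


def long_lines(lines):
    # Explicit generator equivalent to the proposed
    #   [yield (len(line), line) for line in lines if len(line) > 5]
    for line in lines:
        if len(line) > 5:
            yield (len(line), line)


g = long_lines(source)
first = next(g)   # nothing is read from source until this call
rest = list(g)    # the remaining qualifying lines, produced on demand

# The eager list comprehension that the generator form would replace.
eager = [(len(line), line) for line in text.splitlines(True) if len(line) > 5]
```

The lazy and eager forms yield the same pairs; the difference is only
in when the work is done and how much is held in memory at once.
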
Note A: There is some discussion about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
syntax (taking maximum advantage of what people already know).
More importantly, it sets off the generator comprehension from the
rest of the function so as to not suggest that the enclosing
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. Most of the feedback received to date
indicates that brackets are helpful and not misleading. Unfortunately,
the one dissent is from GvR.

A key advantage of the generator comprehension syntax is that it
makes it trivially easy to transform existing list comprehension
code to a generator by adding yield. Likewise, it can be converted
back to a list by deleting yield. This makes it easy to scale up
programs from small datasets to ones large enough to warrant
just-in-time evaluation.

Note B: List comprehensions expose their looping variable and
leave that variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
list comprehension. Further, the variable 'i' is in a defined and
potentially useful state on the line immediately following the
list comprehension.

In contrast, generator comprehensions do not expose the looping
variable to the enclosing scope. The code, [yield str(i) for i in
range(8)] leaves 'i' untouched in the scope where the
comprehension appears. This is also by design and reflects an
intent to duplicate the result of coding a generator directly
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
generator comprehension. It does not come into existence until
iteration starts (possibly never).

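The generator half of this contrast can be checked with a small
sketch, written here for current Python 3. The function names are
invented for the example. (The list comprehension half describes the
Python 2 behavior of this PEP's era; in Python 3, list comprehensions
no longer leak their loop variable either.)

```python
def enclosing():
    def strings():
        # Hand-written form of the proposed [yield str(i) for i in range(8)]
        for i in range(8):
            yield str(i)

    g = strings()
    items = list(g)
    # 'i' lives only inside the generator's own frame, so it never
    # appears among the local names of the enclosing function.
    leaked = "i" in locals()
    return items, leaked


items, leaked = enclosing()
```
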
Comments from GvR: Cute hack, but I think the use of the [] syntax
strongly suggests that it would return a list, not an iterator. I
also think that this is trying to turn Python into a functional
language, where most algorithms use lazy infinite sequences, and I
just don't think that's where its future lies.

Comments from Ka-Ping Yee: I am very happy with the things you have
proposed in this PEP. I feel quite positive about generator
comprehensions and have no reservations. So a +1 on that.

Comments from Neil Schemenauer: I'm -0 on the generator list
comprehensions. They don't seem to add much. You could easily use
a nested generator to do the same thing. They smell like lambda.

Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
useful, but I vote +0. Defining a separate, named generator would
probably be my preference. On the other hand, I do see the advantage
of "scaling up" from list comprehensions.

Comments from the Community: The response to the generator comprehension
proposal has been mostly favorable. There were some 0 votes from
people who didn't see a real need or who were not energized by the
idea. Some of the 0 votes were tempered by comments that the reviewer
did not even like list comprehensions or did not have any use for
generators in any form. The +1 votes outnumbered the 0 votes by about
two to one.

Author response: I've studied several syntactical variations and
concluded that the brackets are essential for:
- teachability (it's like a list comprehension)
- set-off (yield applies to the comprehension not the enclosing
  function)
- substitutability (list comprehensions can be made lazy just by
  adding yield)

What I like best about generator comprehensions is that I can design
using list comprehensions and then easily switch to a generator (by
adding yield) in response to scalability requirements (when the list
comprehension produces too large an intermediate result). Had
generators already been in place when list comprehensions were
accepted, the yield option might have been incorporated from the
start. For certain, the mathematical style notation is explicit and
readable as compared to a separate function definition with an
embedded yield.


Specification for Generator Exception Passing:

Add a .throw(exception) method to the generator interface:

    import time

    # WriteLog (an exception) and testsuite() are assumed to be
    # defined elsewhere.
    def logger():
        start = time.time()
        log = []
        try:
            while 1:
                log.append( time.time() - start )
                yield log[-1]
        except WriteLog:
            return log

    g = logger()
    for i in [10,20,40,80,160]:
        testsuite(i)
        g.next()
    g.throw(WriteLog)

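For experimentation, here is a minimal runnable sketch of the same
idea in current Python 3, where generators do have a throw() method.
It is an adaptation, not the proposal's reference code: the WriteLog
class is a hypothetical exception defined just for the example, the
test-suite call is left out, and the returned log is recovered from
the StopIteration that Python 3 raises when a generator returns a
value.

```python
import time


class WriteLog(Exception):
    """Hypothetical signal asking the logger to hand back its log."""


def logger():
    start = time.time()
    log = []
    try:
        while True:
            log.append(time.time() - start)
            yield log[-1]
    except WriteLog:
        # In Python 3 the returned value travels on StopIteration.value.
        return log


g = logger()
for _ in range(3):
    next(g)            # each call records one elapsed-time entry

try:
    g.throw(WriteLog)  # raised inside the generator at the paused yield
except StopIteration as stop:
    result = stop.value
```

After the loop, result holds the three recorded timings.
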
There is no existing work-around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through.

Generator exception passing also helps address an intrinsic limitation
on generators, the prohibition against their using try/finally to
trigger clean-up code [1]. Without .throw(), the current work-around
forces the resolution or clean-up code to be moved outside the generator.

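A minimal sketch of keeping clean-up code inside the generator by way
of exception passing, written for current Python 3. The Shutdown
exception, the reader function, and the events list used to record
what happened are all hypothetical names introduced for the example.

```python
class Shutdown(Exception):
    """Hypothetical signal telling the generator to clean up and stop."""


def reader(items, events):
    events.append("opened")          # stands in for acquiring a resource
    try:
        for item in items:
            yield item
    except Shutdown:
        events.append("closed")      # clean-up stays inside the generator


events = []
g = reader(["a", "b", "c"], events)
first = next(g)

try:
    g.throw(Shutdown)                # abandon the sequence mid-stream
except StopIteration:
    pass                             # the generator finished after cleaning up
```

The caller abandons the sequence after one item, yet the generator
still gets the chance to run its own clean-up branch.
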
Note A: The name of the throw method was selected for several
reasons. The word raise is a keyword and so cannot be used as a
method name. Unlike raise, which immediately raises an exception from
the current execution point, throw will first return to the generator
and then raise the exception. The word throw is suggestive of
putting the exception in another location. The word throw is
already associated with exceptions in other languages.

Alternative method names were considered: resolve(), signal(),
genraise(), raiseinto(), and flush(). None of these seem to fit
as well as throw().

Note B: The throw syntax should exactly match raise's syntax:

    throw([expression, [expression, [expression]]])

Accordingly, it should be implemented to handle all of the following:

    raise string                    g.throw(string)
    raise string, data              g.throw(string,data)
    raise class, instance           g.throw(class,instance)
    raise instance                  g.throw(instance)
    raise                           g.throw()

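As a hedged illustration for current Python 3, the sketch below
exercises only the class and instance forms, since string exceptions
no longer exist there. The listener generator is a made-up example. It
also shows the behavior described in Note A: the exception is raised
at the point where the generator is paused, not at the call site.

```python
def listener():
    while True:
        try:
            yield "ready"
        except ValueError as exc:
            # The exception surfaced at the paused yield above.
            yield ("caught", exc.args)


g = listener()
next(g)                                       # run to the first yield

from_instance = g.throw(ValueError("bad input"))
next(g)                                       # back to the "ready" yield
from_class = g.throw(ValueError)              # a class is instantiated first
```

Each throw() call returns the next value the generator yields after
handling the exception.
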
Comments from GvR: I'm not convinced that the cleanup problem that
this is trying to solve exists in practice. I've never felt the need
to put yield inside a try/except. I think the PEP doesn't make enough
of a case that this is useful.

Comments from Ka-Ping Yee: I agree that the exception issue needs to
be resolved and [that] you have suggested a fine solution.

Comments from Neil Schemenauer: The exception passing idea is one I
hadn't thought of before and looks interesting. If we enable the
passing of values back, then we should add this feature too.

Comments from Magnus Lie Hetland: Even though I cannot speak for the
ease of implementation, I vote +1 for the exception passing mechanism.

Comments from the Community: The response has been mostly favorable. One
negative comment from GvR is shown above. The other was from Martin von
Loewis who was concerned that it could be difficult to implement and
is withholding his support until a working patch is available. To probe
Martin's comment, I checked with the implementers of the original
generator PEP for an opinion on the ease of implementation. They felt that
implementation would be straightforward and could be grafted onto the
existing implementation without disturbing its internals.

Author response: When the sole use of generators is to simplify writing
iterators for lazy producers, then the odds of needing generator
exception passing are slim. If, on the other hand, generators
are used to write lazy consumers, create coroutines, generate output
streams, or simply for their marvelous capability for restarting a
previously frozen state, THEN the need to raise exceptions will
come up frequently.

I'm no judge of what is truly Pythonic, but am still astonished
that there can exist blocks of code that can't be excepted to or
through, that the try/finally combination is blocked, and that the
only work-around is to rewrite as a class and move the exception
code out of the function or method being excepted.


References

[1] PEP 255 Simple Generators
    http://python.sourceforge.net/peps/pep-0255.html

[2] PEP 212 Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html

[3] PEP 202 List Comprehensions
    http://python.sourceforge.net/peps/pep-0202.html

[4] PEP 234 Iterators
    http://python.sourceforge.net/peps/pep-0234.html

[5] A pure Python simulation of every feature in this PEP is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

[6] The full, working source code for each of the examples in this PEP
    along with other examples and tests is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: