PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:
Abstract
This PEP introduces three orthogonal (not mutually exclusive) ideas
for enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power
of generators.
Rationale
Python 2.2 introduced the concept of an iterable interface as proposed
in PEP 234 [4]. The iter() factory function was provided as a common
calling convention and deep changes were made to use iterators as a
unifying theme throughout Python. The unification came in the form of
establishing a common iterable interface for mappings, sequences,
and file objects.
Generators, as proposed in PEP 255 [1], were introduced as a means for
making it easier to create iterators, especially ones with complex
internal execution or variable states. When I created new programs,
generators were often the tool of choice for creating an iterator.
However, when updating existing programs, I found that the tool had
another use, one that improved program function as well as structure.
Some programs exhibited a pattern of creating large lists and then
looping over them. As data sizes increased, the programs encountered
scalability limitations owing to excessive memory consumption (and
malloc time) for the intermediate lists. Generators were found to be
directly substitutable for the lists while eliminating the memory
issues through lazy evaluation, a.k.a. just-in-time manufacturing.
Python itself encountered similar issues. As a result, xrange() and
xreadlines() were introduced. And, in the case of file objects and
mappings, lazy evaluation became the norm. Generators provide a tool
to program memory-conserving for-loops whenever complete evaluation is
not desired because of memory restrictions or availability of data.
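
The list-to-generator substitution described above can be sketched as
follows. This is only an illustration written in present-day Python 3
syntax, not code from the programs mentioned; the names squares_list
and squares_gen are hypothetical.

```python
def squares_list(n):
    """Eager version: builds the whole intermediate list in memory."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    """Lazy version: produces one value at a time, on demand."""
    for i in range(n):
        yield i * i


# The consuming loop is identical in both cases; only the producer changes.
total_eager = 0
for value in squares_list(1000):
    total_eager += value

total_lazy = 0
for value in squares_gen(1000):
    total_lazy += value
```

Both loops compute the same total, but the second never holds more
than one intermediate value at a time.
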
The next steps in the evolution of generators are:

1. Add a new builtin function, indexed(), which was made possible
   once iterators and generators became available. It provides
   all iterables with the same advantage that iteritems() affords
   to dictionaries -- a compact, readable, reliable index notation.

2. Establish a generator alternative to list comprehensions [3]
   that provides a simple way to convert a list comprehension into
   a generator whenever memory issues arise.

3. Add a generator method to enable exceptions to be passed to a
   generator. Currently, there is no clean method for triggering
   exceptions from outside the generator. Also, generator exception
   passing helps mitigate the try/finally prohibition for generators.

All of the suggestions are designed to take advantage of the
existing implementation and require little additional effort to
incorporate. Each is backward compatible and requires no new
keywords. The three generator tools are intended to go into
Python 2.3, when generators become final and no longer need to be
imported from __future__.
Reference Implementation
There is not currently a CPython implementation; however, a simulation
module written in pure Python is available on SourceForge [5]. The
simulation covers every feature proposed in this PEP and is meant
to allow direct experimentation with the proposals.
There is also a module [6] with working source code for all of the
examples used in this PEP. It serves as a test suite for the simulator
and it documents how each of the new features works in practice.
The authors and implementers of PEP 255 [1] were contacted to provide
their assessment of whether these enhancements were going to be
straightforward to implement and require only minor modification
of the existing generator code. Neil felt the assertion was correct.
Ka-Ping thought so also. GvR said he could believe that it was true.
Tim did not have an opportunity to give an assessment.
Specification for a new builtin:

    def indexed(collection, start=0, stop=None):
        'Generates an indexed series: (0,collection[0]), (1,collection[1]) ...'
        gen = iter(collection)
        cnt = start
        while stop is None or cnt<stop:
            yield (cnt, gen.next())
            cnt += 1

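
For experimentation on a present-day interpreter, the specification
above might be ported roughly as follows. This is a sketch, not the
reference implementation: it assumes Python 3, where the gen.next()
method is spelled next(gen) and where a StopIteration escaping from
inside a generator body is an error, so the exhaustion case is caught
explicitly.

```python
def indexed(collection, start=0, stop=None):
    """Yield (count, item) pairs; count begins at start and ends before stop."""
    gen = iter(collection)
    cnt = start
    while stop is None or cnt < stop:
        try:
            item = next(gen)
        except StopIteration:
            return  # the underlying iterable is exhausted
        yield (cnt, item)
        cnt += 1


pairs = list(indexed("abc"))
bounded = list(indexed("abcdef", start=2, stop=4))
```

Note that, as the specification is written, start sets the first
count rather than skipping items: the bounded call above pairs the
counts 2 and 3 with the first two letters.
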
Note A: PEP 212 Loop Counter Iteration [2] discussed several
proposals for achieving indexing. Some of the proposals only work
for lists, unlike the above function, which works for any generator,
xrange, sequence, or iterable object. Also, those proposals were
presented and evaluated in the world prior to Python 2.2 which did
not include generators. As a result, the non-generator version in
PEP 212 had the disadvantage of consuming memory with a giant list
of tuples. The generator version presented here is fast and light,
works with all iterables, and allows users to abandon the sequence
in mid-stream with no loss of computation effort.
There are other PEPs which touch on related issues: integer iterators,
integer for-loops, and one for modifying the arguments to range and
xrange. The indexed() proposal does not preclude the other proposals
and it still meets an important need even if those are adopted -- the need
to count items in any iterable. The other proposals give a means of
producing an index but not the corresponding value. This is especially
problematic if a sequence is given which doesn't support random access
such as a file object, generator, or sequence defined with __getitem__.
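
The point about iterables without random access can be illustrated
with a short sketch. It assumes Python 3 and uses a simplified
stand-in for indexed() (no stop argument); io.StringIO plays the role
of a file object, and countdown is a hypothetical generator.

```python
import io


def indexed(iterable, start=0):
    """Simplified stand-in for the proposed builtin (no stop argument)."""
    count = start
    for item in iterable:
        yield (count, item)
        count += 1


def countdown(n):
    """A generator: it has no len() and cannot be subscripted."""
    while n > 0:
        yield n
        n -= 1


# A file-like object is read line by line; the index comes along for free.
stream = io.StringIO("alpha\nbeta\ngamma\n")
numbered = [(n, line.rstrip("\n")) for n, line in indexed(stream, start=1)]

# range(len(countdown(3))) would fail, because generators have no length.
pairs = list(indexed(countdown(3)))
```
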
Note B: Almost all of the PEP reviewers welcomed the function but were
divided as to whether there should be any builtins. The main argument
for a separate module was to slow the rate of language inflation. The
main argument for a builtin was that the function is destined to be
part of a core programming style, applicable to any object with an
iterable interface. Just as zip() solves the problem of looping
over multiple sequences, the indexed() function solves the loop
counter problem.
If only one builtin is allowed, then indexed() is the most important
general purpose tool, solving the broadest class of problems while
improving program brevity, clarity and reliability.
Comments from GvR: filter and map should die and be subsumed into list
comprehensions, not grow more variants. I'd rather introduce builtins
that do iterator algebra (e.g. the iterzip that I've often used as
an example).
Comments from Ka-Ping Yee: I'm also quite happy with everything you
proposed ... and the extra builtins (really 'indexed' in particular)
are things I have wanted for a long time.
Comments from Neil Schemenauer: The new builtins sound okay. Guido
may be concerned with increasing the number of builtins too much. You
might be better off selling them as part of a module. If you use a
module then you can add lots of useful functions (Haskell has lots of
them that we could steal).
Comments from Magnus Lie Hetland: I think indexed would be a useful and
natural built-in function. I would certainly use it a lot.
I like indexed() a lot; +1. I'm quite happy to have it make PEP 281
obsolete. Adding a separate module for iterator utilities seems like
a good idea.
Comments from the Community: The response to the indexed() proposal has
been close to 100% favorable. Almost everyone loves the idea.
Author response: Prior to these comments, four builtins were proposed.
After the comments, xmap, xfilter, and xzip were withdrawn. The one
that remains is vital for the language and is proposed by itself.
The indexed() function is trivially easy to implement and can be
documented in minutes. More importantly, it is useful in everyday
programming which does not otherwise involve explicit use of
generators.
Though withdrawn from the proposal, I still secretly covet xzip()
a.k.a. iterzip() but think that it will happen on its own someday.
Specification for Generator Comprehensions:
If a list comprehension starts with a 'yield' keyword, then
express the comprehension with a generator. For example:

    g = [yield (len(line),line) for line in file if len(line)>5]

This would be implemented as if it had been written:

    def __temp():
        for line in file:
            if len(line) > 5:
                yield (len(line), line)
    g = __temp()

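
Because the bracketed yield form is only a proposal, it cannot be run
directly. The expansion above can be tried out, though. The sketch
below assumes Python 3, replaces the file with a small list of strings
so that it is self-contained, and uses the hypothetical name
long_lines in place of the compiler-generated __temp.

```python
def long_lines(lines):
    """Lazily yield (length, line) for every line longer than 5 characters."""
    for line in lines:
        if len(line) > 5:
            yield (len(line), line)


source = ["short", "longer line", "tiny", "another long one"]
g = long_lines(source)

first = next(g)  # nothing is computed until a value is requested
rest = list(g)   # drain whatever remains
```
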
Note A: There is some discussion about whether the enclosing brackets
should be part of the syntax for generator comprehensions. On the
plus side, it neatly parallels list comprehensions and would be
immediately recognizable as a similar form with similar internal
syntax (taking maximum advantage of what people already know).
More importantly, it sets off the generator comprehension from the
rest of the function so as to not suggest that the enclosing
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
expression returns a list. Most of the feedback received to date
indicates that brackets are helpful and not misleading. Unfortunately,
the one dissent is from GvR.
A key advantage of the generator comprehension syntax is that it
makes it trivially easy to transform existing list comprehension
code to a generator by adding yield. Likewise, it can be converted
back to a list by deleting yield. This makes it easy to scale up
programs from small datasets to ones large enough to warrant
just-in-time evaluation.
Note B: List comprehensions expose their looping variable and
leave that variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
list comprehension. Further, the variable 'i' is in a defined and
potentially useful state on the line immediately following the
list comprehension.
In contrast, generator comprehensions do not expose the looping
variable to the enclosing scope. The code, [yield str(i) for i in
range(8)] leaves 'i' untouched in the scope where the
comprehension appears. This is also by design and reflects an
intent to duplicate the result of coding a generator directly
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
generator comprehension. It does not come into existence until
iteration starts (possibly never).
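
The scoping difference in Note B can be observed with the expanded
forms. This sketch assumes present-day Python 3, where list
comprehensions no longer leak their loop variable, so a plain for-loop
stands in for the Python 2.2 list comprehension; str_gen is a
hypothetical name for the expanded generator comprehension.

```python
def str_gen():
    """Expanded form of the proposed [yield str(i) for i in range(8)]."""
    for i in range(8):  # this 'i' is local to the generator
        yield str(i)


i = "untouched"
values = list(str_gen())
after_generator = i  # the enclosing 'i' was never rebound

for i in range(8):   # the for-loop equivalent of the list comprehension
    pass
after_loop = i       # the loop variable is left at its final value
```
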
Comments from GvR: Cute hack, but I think the use of the [] syntax
strongly suggests that it would return a list, not an iterator. I
also think that this is trying to turn Python into a functional
language, where most algorithms use lazy infinite sequences, and I
just don't think that's where its future lies.
Comments from Ka-Ping Yee: I am very happy with the things you have
proposed in this PEP. I feel quite positive about generator
comprehensions and have no reservations. So a +1 on that.
Comments from Neil Schemenauer: I'm -0 on the generator list
comprehensions. They don't seem to add much. You could easily use
a nested generator to do the same thing. They smell like lambda.
Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
useful, but I vote +0. Defining a separate, named generator would
probably be my preference. On the other hand, I do see the advantage
of "scaling up" from list comprehensions.
Comments from the Community: The response to the generator comprehension
proposal has been mostly favorable. There were some 0 votes from
people who didn't see a real need or who were not energized by the
idea. Some of the 0 votes were tempered by comments that the reviewer
did not even like list comprehensions or did not have any use for
generators in any form. The +1 votes outnumbered the 0 votes by about
two to one.
Author response: I've studied several syntactical variations and
concluded that the brackets are essential for:

- teachability (it's like a list comprehension)
- set-off (yield applies to the comprehension not the enclosing
  function)
- substitutability (list comprehensions can be made lazy just by
  adding yield)

What I like best about generator comprehensions is that I can design
using list comprehensions and then easily switch to a generator (by
adding yield) in response to scalability requirements (when the list
comprehension produces too large an intermediate result). Had
generators already been in-place when list comprehensions were
accepted, the yield option might have been incorporated from the
start. For certain, the mathematical style notation is explicit and
readable as compared to a separate function definition with an
embedded yield.
Specification for Generator Exception Passing:
Add a .throw(exception) method to the generator interface:

    def logger():
        start = time.time()
        log = []
        try:
            while 1:
                log.append( time.time() - start )
                yield log[-1]
        except WriteLog:
            return log

    g = logger()
    for i in [10,20,40,80,160]:
        testsuite(i)
        g.next()
    g.throw(WriteLog)

There is no existing work-around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where exceptions cannot be raised into or through active code.
Generator exception passing also helps address an intrinsic limitation
on generators, the prohibition against their using try/finally to
trigger clean-up code [1]. Without .throw(), the current work-around
forces the resolution or clean-up code to be moved outside the generator.
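
A runnable variant of the logger example might look like the sketch
below. It assumes a present-day Python 3 interpreter, whose generator
objects provide a throw() method. WriteLog is defined here as a
hypothetical exception class, the testsuite() call is omitted, and the
list named sink is an assumed stand-in for wherever the log would be
written.

```python
import time


class WriteLog(Exception):
    """Hypothetical signal telling the logger to flush its records."""


def logger(sink):
    start = time.time()
    log = []
    try:
        while True:
            log.append(time.time() - start)
            yield log[-1]
    except WriteLog:
        sink.extend(log)  # the clean-up step runs inside the generator


sink = []
g = logger(sink)
for i in [10, 20, 40]:
    next(g)  # record one timestamp per iteration

finished = False
try:
    g.throw(WriteLog)  # raised at the suspended yield, caught by the except
except StopIteration:
    finished = True    # the generator returned after handling the exception
```
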
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
name. Unlike raise which immediately raises an exception from the
current execution point, throw will first return to the generator
and then raise the exception. The word throw is suggestive of
putting the exception in another location. The word throw is
already associated with exceptions in other languages.
Alternative method names were considered: resolve(), signal(),
genraise(), raiseinto(), and flush(). None of these seem to fit
as well as throw().
Note B: The throw syntax should exactly match raise's syntax:

    throw([expression, [expression, [expression]]])

Accordingly, it should be implemented to handle all of the following:

    raise string                    g.throw(string)
    raise string, data              g.throw(string,data)
    raise class, instance           g.throw(class,instance)
    raise instance                  g.throw(instance)
    raise                           g.throw()

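
Two of the forms listed above can be tried on a present-day Python 3
interpreter, as sketched below. This is an illustration under that
assumption only: string exceptions no longer exist in Python 3, so
just the instance and class forms are shown, and the generator named
collector is hypothetical.

```python
def collector():
    """Record every exception thrown in, then resume yielding."""
    seen = []
    while True:
        try:
            yield seen
        except (ValueError, KeyError) as exc:
            seen.append(type(exc).__name__ + ":" + str(exc))


g = collector()
next(g)  # advance to the first yield

after_instance = list(g.throw(ValueError("bad")))  # like: raise instance
after_class = list(g.throw(KeyError))              # like: raise class
```

In both calls the exception is raised at the point where the generator
is suspended, and throw() returns the next value the generator yields.
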
Comments from GvR: I'm not convinced that the cleanup problem that
this is trying to solve exists in practice. I've never felt the need
to put yield inside a try/except. I think the PEP doesn't make enough
of a case that this is useful.
Comments from Ka-Ping Yee: I agree that the exception issue needs to
be resolved and [that] you have suggested a fine solution.
Comments from Neil Schemenauer: The exception passing idea is one I
hadn't thought of before and looks interesting. If we enable the
passing of values back, then we should add this feature too.
Comments from Magnus Lie Hetland: Even though I cannot speak for the
ease of implementation, I vote +1 for the exception passing mechanism.
Comments from the Community: The response has been mostly favorable. One
negative comment from GvR is shown above. The other was from Martin von
Loewis who was concerned that it could be difficult to implement and
is withholding his support until a working patch is available. To probe
Martin's comment, I checked with the implementers of the original
generator PEP for an opinion on the ease of implementation. They felt that
implementation would be straightforward and could be grafted onto the
existing implementation without disturbing its internals.
Author response: When the sole use of generators is to simplify writing
iterators for lazy producers, then the odds of needing generator
exception passing are slim. If, on the other hand, generators
are used to write lazy consumers, create coroutines, generate output
streams, or simply for their marvelous capability for restarting a
previously frozen state, THEN the need to raise exceptions will
come up frequently.
I'm no judge of what is truly Pythonic, but am still astonished
that there can exist blocks of code that can't be excepted to or
through, that the try/finally combination is blocked, and that the
only work-around is to rewrite as a class and move the exception
code out of the function or method being excepted.
References

[1] PEP 255 Simple Generators
    http://python.sourceforge.net/peps/pep-0255.html

[2] PEP 212 Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html

[3] PEP 202 List Comprehensions
    http://python.sourceforge.net/peps/pep-0202.html

[4] PEP 234 Iterators
    http://python.sourceforge.net/peps/pep-0234.html

[5] A pure Python simulation of every feature in this PEP is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

[6] The full, working source code for each of the examples in this PEP
    along with other examples and tests is at:
    http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756

Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: