PEP: 289
Title: Generator Comprehensions
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Rejected
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:
Abstract
This PEP introduces generator comprehensions as an idea for
enhancing the generators introduced in Python version 2.2 [1].
The goal is to increase the convenience, utility, and power of
generators by making it easy to convert a list comprehension into
a generator.
Rationale
Python 2.2 introduced the concept of an iterable interface as
proposed in PEP 234 [4]. The iter() factory function was provided
as a common calling convention and deep changes were made to use
iterators as a unifying theme throughout Python. The unification
came in the form of establishing a common iterable interface for
mappings, sequences, and file objects.
Generators, as proposed in PEP 255 [1], were introduced as a means
for making it easier to create iterators, especially ones with
complex internal execution or variable states. When I created new
programs, generators were often the tool of choice for creating an
iterator.
However, when updating existing programs, I found that the tool
had another use, one that improved program function as well as
structure. Some programs exhibited a pattern of creating large
lists and then looping over them. As data sizes increased, the
programs encountered scalability limitations owing to excessive
memory consumption (and malloc time) for the intermediate lists.
Generators proved to be directly substitutable for the lists
while eliminating the memory issues through lazy evaluation,
a.k.a. just-in-time manufacturing.
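
As a minimal sketch of that substitution (the names squares_list and
squares_gen are hypothetical and not taken from any program discussed
here), a producer that builds an intermediate list can be recoded as a
generator while the consuming loop stays the same:

```python
# Illustrative only: squares_list and squares_gen are hypothetical names.

def squares_list(n):
    """Build the whole intermediate list before returning it."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    """Yield the same values one at a time, with no intermediate list."""
    for i in range(n):
        yield i * i


# The consuming loop is written the same way for either producer.
total_from_list = 0
for value in squares_list(1000):
    total_from_list += value

total_from_gen = 0
for value in squares_gen(1000):
    total_from_gen += value
```

Both loops compute the same total; only the list version holds all
1000 values in memory at once.
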
Python itself encountered similar issues. As a result, xrange()
and xreadlines() were introduced. And, in the case of file
objects and mappings, just-in-time evaluation became the norm.
Generators provide a tool for writing memory-conserving for-loops
whenever complete evaluation is not desired because of memory
restrictions or availability of data.
The next step in the evolution of generators is to establish a
generator alternative to list comprehensions [3]. This
alternative provides a simple way to convert a list comprehension
into a generator whenever memory issues arise.
This suggestion is designed to take advantage of the existing
implementation and require little additional effort to
incorporate. It is backward compatible and requires no new
keywords.
BDFL Pronouncements
Generator comprehensions are REJECTED. The rationale is that the
benefits are marginal since generators can already be coded
directly and the costs are high because implementation and
maintenance require major efforts with the parser.
Reference Implementation
There is not currently a CPython implementation; however, a
simulation module written in pure Python is available on
SourceForge [5]. The simulation is meant to allow direct
experimentation with the proposal.
There is also a module [6] with working source code for all of the
examples used in this PEP. It serves as a test suite for the
simulator and it documents how each new feature works in
practice.
The authors and implementers of PEP 255 [1] were contacted to
provide their assessment of whether these enhancements were going
to be straightforward to implement and require only minor
modification of the existing generator code. Neil felt the
assertion was correct. Ka-Ping thought so also. GvR said he
could believe that it was true. Later GvR re-assessed and thought
that it would be difficult to tweak the code generator to produce
a separate object. Tim did not have an opportunity to give an
assessment.
Specification for Generator Comprehensions
If a list comprehension starts with a 'yield' keyword, then
express the comprehension with a generator. For example:

    g = [yield (len(line),line) for line in file if len(line)>5]

This would be implemented as if it had been written:

    def __temp():
        for line in file:
            if len(line) > 5:
                yield (len(line), line)
    g = __temp()

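
Because the proposed bracketed syntax is not part of Python, only the
expanded form can be run. The following is a sketch of that expansion
under two assumptions: the list `lines` stands in for the open file
object, and `_temp` is a hypothetical helper name.

```python
# Sketch of the expansion above. `lines` is an assumed stand-in for the
# open file object; `_temp` is a hypothetical helper name.
lines = ["abc\n", "a longer line\n", "hi\n", "another long line\n"]


def _temp():
    for line in lines:
        if len(line) > 5:
            yield (len(line), line)


g = _temp()

# Nothing has been evaluated yet; values are produced on demand.
first = next(g)   # runs the loop only as far as the first match
rest = list(g)    # drains whatever remains
```

Lines of five characters or fewer (counting the newline) are skipped,
so only the two longer lines are ever yielded.
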
Note A: There is some discussion about whether the enclosing
brackets should be part of the syntax for generator
comprehensions. On the plus side, it neatly parallels list
comprehensions and would be immediately recognizable as a similar
form with similar internal syntax (taking maximum advantage of
what people already know). More importantly, it sets off the
generator comprehension from the rest of the function so as to not
suggest that the enclosing function is a generator (currently the
only cue that a function is really a generator is the presence of
the yield keyword). On the minus side, the brackets may falsely
suggest that the whole expression returns a list. Most of the
feedback received to date indicates that brackets are helpful and
not misleading. Unfortunately, the one dissent is from GvR.
A key advantage of the generator comprehension syntax is that it
makes it trivially easy to transform existing list comprehension
code to a generator by adding yield. Likewise, it can be
converted back to a list by deleting yield. This makes it easy to
scale up programs from small datasets to ones large enough to
warrant just-in-time evaluation.
Note B: List comprehensions expose their looping variable and
leave that variable in the enclosing scope. The code, [str(i) for
i in range(8)] leaves 'i' set to 7 in the scope where the
comprehension appears. This behavior is by design and reflects an
intent to duplicate the result of coding a for-loop instead of a
list comprehension. Further, the variable 'i' is in a defined and
potentially useful state on the line immediately following the
list comprehension.
In contrast, generator comprehensions do not expose the looping
variable to the enclosing scope. The code, [yield str(i) for i in
range(8)] leaves 'i' untouched in the scope where the
comprehension appears. This is also by design and reflects an
intent to duplicate the result of coding a generator directly
instead of a generator comprehension. Further, the variable 'i'
is not in a defined state on the line immediately following the
generator comprehension. It does not come into existence until
iteration starts (possibly never).
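
The scoping contrast can be illustrated with the constructs each form
is meant to duplicate: a directly coded generator and an explicit
for-loop. This is only a sketch; `_gen` and the sentinel value
"untouched" are hypothetical choices made for the illustration.

```python
# Illustrative only: `_gen` and the sentinel string are assumptions.
i = "untouched"


def _gen():
    # 'i' here is local to the generator's own frame.
    for i in range(8):
        yield str(i)


g = _gen()
before_iteration = i   # the generator body has not started running
result = list(g)       # iteration happens here
after_iteration = i    # the enclosing 'i' is still unchanged

# By contrast, an explicit for-loop in the same scope rebinds 'i'.
for i in range(8):
    pass
after_for_loop = i
```

The generator's loop variable lives in its own frame and exists only
once iteration starts, while the plain for-loop leaves 'i' set to 7 in
the enclosing scope.
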
Comments from GvR: Cute hack, but I think the use of the [] syntax
strongly suggests that it would return a list, not an
iterator. I also think that this is trying to turn Python into
a functional language, where most algorithms use lazy infinite
sequences, and I just don't think that's where its future
lies.
I don't think it's worth the trouble. I expect it will take a
lot of work to hack it into the code generator: it has to
create a separate code object in order to be a generator.
List comprehensions are inlined, so I expect that the
generator comprehension code generator can't share much with
the list comprehension code generator. And this for something
that's not that common and easily done by writing a 2-line
helper function. IOW the ROI isn't high enough.
Comments from Ka-Ping Yee: I am very happy with the things you have
proposed in this PEP. I feel quite positive about generator
comprehensions and have no reservations. So a +1 on that.
Comments from Neil Schemenauer: I'm -0 on the generator list
comprehensions. They don't seem to add much. You could
easily use a nested generator to do the same thing. They
smell like lambda.
Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
useful, but I vote +0. Defining a separate, named generator
would probably be my preference. On the other hand, I do see
the advantage of "scaling up" from list comprehensions.
Comments from the Community: The response to the generator comprehension
proposal has been mostly favorable. There were some 0 votes
from people who didn't see a real need or who were not
energized by the idea. Some of the 0 votes were tempered by
comments that the reviewer did not even like list
comprehensions or did not have any use for generators in any
form. The +1 votes outnumbered the 0 votes by about two to
one.
Author response: I've studied several syntactical variations and
concluded that the brackets are essential for:
- teachability (it's like a list comprehension)
- set-off (yield applies to the comprehension not the enclosing
function)
- substitutability (list comprehensions can be made lazy just by
adding yield)
What I like best about generator comprehensions is that I can
design using list comprehensions and then easily switch to a
generator (by adding yield) in response to scalability
requirements (when the list comprehension produces too large
of an intermediate result). Had generators already been
in place when list comprehensions were accepted, the yield
option might have been incorporated from the start. For
certain, the mathematical style notation is explicit and
readable as compared to a separate function definition with an
embedded yield.
References
[1] PEP 255 Simple Generators
http://python.sourceforge.net/peps/pep-0255.html
[2] PEP 212 Loop Counter Iteration
http://python.sourceforge.net/peps/pep-0212.html
[3] PEP 202 List Comprehensions
http://python.sourceforge.net/peps/pep-0202.html
[4] PEP 234 Iterators
http://python.sourceforge.net/peps/pep-0234.html
[5] A pure Python simulation of every feature in this PEP is at:
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752
[6] The full, working source code for each of the examples in this PEP
along with other examples and tests is at:
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
[7] Another partial implementation is at:
http://www.python.org/sf/795947
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: