Resurrect the PEP on generator expressions

Raymond Hettinger 2003-10-22 18:09:36 +00:00
parent ea6410f7bd
commit a08117dee0
2 changed files with 218 additions and 187 deletions


@@ -97,6 +97,7 @@ Index by Category
 S 286 Enhanced Argument Tuples von Loewis
 I 287 reStructuredText Docstring Format Goodger
 S 288 Generators Attributes and Exceptions Hettinger
+S 289 Generator Expressions Hettinger
 S 292 Simpler String Substitutions Warsaw
 S 294 Type Names in the types Module Tirosh
 S 297 Support for System Upgrades Lemburg
@@ -182,7 +183,6 @@ Index by Category
 SR 259 Omit printing newline after newline GvR
 SR 270 uniq method for list objects Petrone
 SR 271 Prefixing sys.path by command line option Giacometti
-SR 289 Generator Comprehensions Hettinger
 SR 295 Interpretation of multiline string constants Koltsov
 SR 296 Adding a bytes Object Type Gilbert
 SR 308 If-then-else expression GvR, Hettinger
@@ -304,7 +304,7 @@ Numerical Index
 S 286 Enhanced Argument Tuples von Loewis
 I 287 reStructuredText Docstring Format Goodger
 S 288 Generators Attributes and Exceptions Hettinger
-SR 289 Generator Comprehensions Hettinger
+S 289 Generator Expressions Hettinger
 I 290 Code Migration and Modernization Hettinger
 I 291 Backward Compatibility for Standard Library Norwitz
 S 292 Simpler String Substitutions Warsaw


@@ -1,244 +1,275 @@
 PEP: 289
-Title: Generator Comprehensions
+Title: Generator Expressions
 Version: $Revision$
 Last-Modified: $Date$
 Author: python@rcn.com (Raymond D. Hettinger)
-Status: Rejected
+Status: Active
 Type: Standards Track
+Content-Type: text/x-rst
 Created: 30-Jan-2002
 Python-Version: 2.3
-Post-History:
+Post-History: 22-Oct-2003

 Abstract
+========

-    This PEP introduces generator comprehensions as an idea for
-    enhancing the generators introduced in Python version 2.2 [1].
-    The goal is to increase the convenience, utility, and power of
-    generators by making it easy to convert a list comprehension into
-    a generator.
+This PEP introduces generator expressions as a high-performance,
+memory-efficient generalization of list comprehensions [1]_ and
+generators [2]_.

 Rationale
+=========

-    Python 2.2 introduced the concept of an iterable interface as
-    proposed in PEP 234 [4]. The iter() factory function was provided
-    as a common calling convention and deep changes were made to use
-    iterators as a unifying theme throughout Python. The unification
-    came in the form of establishing a common iterable interface for
-    mappings, sequences, and file objects.
-
-    Generators, as proposed in PEP 255 [1], were introduced as a means
-    for making it easier to create iterators, especially ones with
-    complex internal execution or variable states. When I created new
-    programs, generators were often the tool of choice for creating an
-    iterator.
-
-    However, when updating existing programs, I found that the tool
-    had another use, one that improved program function as well as
-    structure. Some programs exhibited a pattern of creating large
-    lists and then looping over them. As data sizes increased, the
-    programs encountered scalability limitations owing to excessive
-    memory consumption (and malloc time) for the intermediate lists.
-    Generators were found to be directly substitutable for the lists
-    while eliminating the memory issues through lazy evaluation
-    a.k.a. just in time manufacturing.
-
-    Python itself encountered similar issues. As a result, xrange()
-    and xreadlines() were introduced. And, in the case of file
-    objects and mappings, just-in-time evaluation became the norm.
-    Generators provide a tool to program memory conserving for-loops
-    whenever complete evaluation is not desired because of memory
-    restrictions or availability of data.
-
-    The next step in the evolution of generators is to establish a
-    generator alternative to list comprehensions [3]. This
-    alternative provides a simple way to convert a list comprehension
-    into a generator whenever memory issues arise.
-
-    This suggestion is designed to take advantage of the existing
-    implementation and require little additional effort to
-    incorporate. It is backward compatible and requires no new
-    keywords.
+Experience with list comprehensions has shown their widespread
+utility throughout Python. However, many of the use cases do
+not need to have a full list created in memory. Instead, they
+only need to iterate over the elements one at a time.
+
+For instance, the following summation code will build a full list of
+squares in memory, iterate over those values, and, when the reference
+is no longer needed, delete the list::
+
+    sum([x*x for x in range(10)])
+
+Time, clarity, and memory are conserved by using a generator
+expression instead::
+
+    sum(x*x for x in range(10))
+
+Similar benefits are conferred on constructors for container objects::
+
+    s = Set(word for line in page for word in line.split())
+    d = dict( (k, func(k)) for k in keylist)
+
+Generator expressions are especially useful in functions that reduce
+an iterable input to a single value::
+
+    sum(len(line) for line in file if line.strip())
+
+Accordingly, generator expressions are expected to partially eliminate
+the need for reduce(), which is notorious for its lack of clarity. And,
+there are additional speed and clarity benefits from writing expressions
+directly instead of using lambda.
+
+List comprehensions greatly reduced the need for filter() and map().
+Likewise, generator expressions are expected to minimize the need
+for itertools.ifilter() and itertools.imap(). In contrast, the
+utility of other itertools will be enhanced by generator expressions::
+
+    dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
+
+Having a syntax similar to list comprehensions also makes it easy to
+convert existing code into a generator expression when scaling up
+an application.

 BDFL Pronouncements
+===================

-    Generator comprehensions are REJECTED. The rationale is that the
-    benefits are marginal since generators can already be coded
-    directly and the costs are high because implementation and
-    maintenance require major efforts with the parser.
+The previous version of this PEP was REJECTED. The bracketed yield
+syntax left something to be desired; the performance gains had not been
+demonstrated; and the range of use cases had not been shown. After
+much discussion on the python-dev list, the PEP has been resurrected
+in its present form. The impetus for the discussion was an innovative
+proposal from Peter Norvig [3]_.

-Reference Implementation
+The Gory Details
+================

-    There is not currently a CPython implementation; however, a
-    simulation module written in pure Python is available on
-    SourceForge [5]. The simulation is meant to allow direct
-    experimentation with the proposal.
-
-    There is also a module [6] with working source code for all of the
-    examples used in this PEP. It serves as a test suite for the
-    simulator and it documents how each new feature works in
-    practice.
-
-    The authors and implementers of PEP 255 [1] were contacted to
-    provide their assessment of whether these enhancements were going
-    to be straightforward to implement and require only minor
-    modification of the existing generator code. Neil felt the
-    assertion was correct. Ka-Ping thought so also. GvR said he
-    could believe that it was true. Later GvR re-assessed and thought
-    that it would be difficult to tweak the code generator to produce
-    a separate object. Tim did not have an opportunity to give an
-    assessment.
+1. The semantics of a generator expression are equivalent to creating
+   an anonymous generator function and calling it. There's still discussion
+   about whether that generator function should copy the current value of all
+   free variables into default arguments.
+
+2. The syntax requires that a generator expression always be inside
+   a set of parentheses and never have a comma on either side. Unfortunately,
+   this is different from list comprehensions. While [1, x for x in R] is
+   illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1,2,3)].
+   With reference to the file Grammar/Grammar in CVS, two rules change:
+
+   a) The rule::
+
+          atom: '(' [testlist] ')'
+
+      changes to::
+
+          atom: '(' [listmaker1] ')'
+
+      where listmaker1 is almost the same as listmaker, but only allows
+      a single test after 'for' ... 'in'.
+
+   b) The rule for arglist needs similar changes.

-Specification for Generator Comprehensions:
-
-    If a list comprehension starts with a 'yield' keyword, then
-    express the comprehension with a generator. For example:
-
-        g = [yield (len(line),line) for line in file if len(line)>5]
-
-    This would be implemented as if it had been written:
-
-        def __temp():
-            for line in file:
-                if len(line) > 5:
-                    yield (len(line), line)
-        g = __temp()
+3. The loop variable is not exposed to the surrounding function. This
+   facilitates the implementation and makes typical use cases more reliable.
+   In some future version of Python, list comprehensions will also hide the
+   induction variable from the surrounding code (and, in Py2.4, warnings
+   will be issued for code accessing the induction variable).
+
+4. There is still discussion about whether variables referenced in generator
+   expressions will exhibit late binding just like other Python code. In the
+   following example, the iterator runs *after* the value of y is set to one::
+
+       def h():
+           y = 0
+           l = [1,2]
+           def gen(S):
+               for x in S:
+                   yield x+y
+           it = gen(l)
+           y = 1
+           for v in it:
+               print v
+
+5. List comprehensions will remain unchanged::
+
+       [x for x in S]    # This is a list comprehension.
+       [(x for x in S)]  # This is a list containing one generator expression.
-
-    Note A: There is some discussion about whether the enclosing
-    brackets should be part of the syntax for generator
-    comprehensions. On the plus side, it neatly parallels list
-    comprehensions and would be immediately recognizable as a similar
-    form with similar internal syntax (taking maximum advantage of
-    what people already know). More importantly, it sets off the
-    generator comprehension from the rest of the function so as to not
-    suggest that the enclosing function is a generator (currently the
-    only cue that a function is really a generator is the presence of
-    the yield keyword). On the minus side, the brackets may falsely
-    suggest that the whole expression returns a list. Most of the
-    feedback received to date indicates that brackets are helpful and
-    not misleading. Unfortunately, the one dissent is from GvR.
-
-    A key advantage of the generator comprehension syntax is that it
-    makes it trivially easy to transform existing list comprehension
-    code to a generator by adding yield. Likewise, it can be
-    converted back to a list by deleting yield. This makes it easy to
-    scale up programs from small datasets to ones large enough to
-    warrant just-in-time evaluation.
-
-    Note B: List comprehensions expose their looping variable and
-    leave that variable in the enclosing scope. The code, [str(i) for
-    i in range(8)] leaves 'i' set to 7 in the scope where the
-    comprehension appears. This behavior is by design and reflects an
-    intent to duplicate the result of coding a for-loop instead of a
-    list comprehension. Further, the variable 'i' is in a defined and
-    potentially useful state on the line immediately following the
-    list comprehension.
+
+Reduction Functions
+===================
+
+The utility of generator expressions is greatly enhanced when combined
+with appropriate reduction functions like sum(), min(), and max(). I
+propose creating a set of high-speed reduction functions designed to tap the
+power of generator expressions and replace the most common uses of reduce()::
-
-    In contrast, generator comprehensions do not expose the looping
-    variable to the enclosing scope. The code, [yield str(i) for i in
-    range(8)] leaves 'i' untouched in the scope where the
-    comprehension appears. This is also by design and reflects an
-    intent to duplicate the result of coding a generator directly
-    instead of a generator comprehension. Further, the variable 'i'
-    is not in a defined state on the line immediately following the
-    list comprehension. It does not come into existence until
-    iteration starts (possibly never).
-
-    Comments from GvR: Cute hack, but I think the use of the [] syntax
-        strongly suggests that it would return a list, not an
-        iterator. I also think that this is trying to turn Python into
-        a functional language, where most algorithms use lazy infinite
-        sequences, and I just don't think that's where its future
-        lies.
-
-        I don't think it's worth the trouble. I expect it will take a
-        lot of work to hack it into the code generator: it has to
-        create a separate code object in order to be a generator.
-        List comprehensions are inlined, so I expect that the
-        generator comprehension code generator can't share much with
-        the list comprehension code generator. And this for something
-        that's not that common and easily done by writing a 2-line
-        helper function. IOW the ROI isn't high enough.
-
-    Comments from Ka-Ping Yee: I am very happy with the things you have
-        proposed in this PEP. I feel quite positive about generator
-        comprehensions and have no reservations. So a +1 on that.
-
-    Comments from Neil Schemenauer: I'm -0 on the generator list
-        comprehensions. They don't seem to add much. You could
-        easily use a nested generator to do the same thing. They
-        smell like lambda.
-
-    Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
-        useful, but I vote +0. Defining a separate, named generator
-        would probably be my preference. On the other hand, I do see
-        the advantage of "scaling up" from list comprehensions.
-
-    Comments from the Community: The response to the generator comprehension
-        proposal has been mostly favorable. There were some 0 votes
-        from people who didn't see a real need or who were not
-        energized by the idea. Some of the 0 votes were tempered by
-        comments that the reviewer did not even like list
-        comprehensions or did not have any use for generators in any
-        form. The +1 votes outnumbered the 0 votes by about two to
-        one.
-
-    Author response: I've studied several syntactical variations and
-        concluded that the brackets are essential for:
-        - teachability (it's like a list comprehension)
-        - set-off (yield applies to the comprehension not the enclosing
-          function)
-        - substitutability (list comprehensions can be made lazy just by
-          adding yield)
+
+    import bisect
+    import operator
+
+    def xorsum(it):
+        return reduce(operator.xor, it, 0)
+
+    def product(it):
+        return reduce(operator.mul, it, 1)
+
+    def anytrue(it):
+        for elem in it:
+            if elem:
+                return True
+        return False
+
+    def alltrue(it):
+        for elem in it:
+            if not elem:
+                return False
+        return True
+
+    def horner(it, x):
+        'horner([6,3,4], 5) evaluates 6*x**2 + 3*x + 4 at x=5'
+        cum = 0.0
+        for c in it:
+            cum = cum*x + c
+        return cum
+
+    def mean(it):
+        data = list(it)
+        return sum(data) / len(data)
+
+    def smallest(it, siz=1):
+        result = []
+        for elem in it:
+            if len(result) < siz:
+                bisect.insort_left(result, elem)
+            elif elem < result[-1]:
+                result.pop()
+                bisect.insort_left(result, elem)
+        return result
+
+    def largest(it, siz=1):
+        result = []
+        for elem in it:
+            if len(result) < siz:
+                bisect.insort_left(result, elem)
+            elif elem > result[0]:
+                result.pop(0)
+                bisect.insort_left(result, elem)
+        result.reverse()
+        return result
-
-    What I like best about generator comprehensions is that I can
-    design using list comprehensions and then easily switch to a
-    generator (by adding yield) in response to scalability
-    requirements (when the list comprehension produces too large
-    of an intermediate result). Had generators already been
-    in place when list comprehensions were accepted, the yield
-    option might have been incorporated from the start. For
-    certain, the mathematical style notation is explicit and
-    readable as compared to a separate function definition with an
-    embedded yield.
+
+Notes on reduce()
+=================
+
+Reduce typically has three types of use cases:
+
+1) Common reduction functions applied directly to elements in a sequence.
+   This use case is addressed by sum(), min(), max(), and the additional
+   functions listed above.
+
+2) Reduce is often used with lambda when the data needs to be extracted from
+   complex sequence elements. For example::
+
+       reduce(lambda sum, x: sum + x.myattr, data, 0)
+       reduce(lambda prod, x: prod * x[3], data, 1)
+
+   In concert with reduction functions, generator expressions completely
+   fulfill these use cases::
+
+       sum(x.myattr for x in data)
+       product(x[3] for x in data)
+
+3) On rare occasions, the reduction function is non-standard and requires
+   custom coding::
+
+       reduce(lambda cum, c: (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)], data, -1)
+
+   Because a complex lambda is required, this use case becomes clearer and
+   faster when coded directly as a for-loop::
+
+       cum = -1
+       for c in data:
+           cum = (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)]
+
+In conclusion, after adding generator expressions and a set of common reduction
+functions, few, if any, cases remain for reduce().
+
+Acknowledgements
+================
+
+* Raymond Hettinger first proposed the idea of "generator comprehensions"
+  in January 2002.
+
+* Peter Norvig resurrected the discussion in his proposal for
+  Accumulation Displays [3]_.
+
+* Alex Martelli provided critical measurements that proved the performance
+  benefits of generator expressions. He also provided strong arguments
+  that they were a desirable thing to have.
+
+* Samuele Pedroni provided the example of late binding.
+  Various contributors have made arguments for and against late binding.
+
+* Phillip Eby suggested "iterator expressions" as the name.
+
+* Subsequently, Tim Peters suggested the name "generator expressions".

 References
+==========

-    [1] PEP 255 Simple Generators
-        http://python.sourceforge.net/peps/pep-0255.html
-
-    [2] PEP 212 Loop Counter Iteration
-        http://python.sourceforge.net/peps/pep-0212.html
-
-    [3] PEP 202 List Comprehensions
-        http://python.sourceforge.net/peps/pep-0202.html
-
-    [4] PEP 234 Iterators
-        http://python.sourceforge.net/peps/pep-0234.html
-
-    [5] A pure Python simulation of every feature in this PEP is at:
-        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752
-
-    [6] The full, working source code for each of the examples in this PEP
-        along with other examples and tests is at:
-        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
-
-    [7] Another partial implementation is at:
-        http://www.python.org/sf/795947
+.. [1] PEP 202 List Comprehensions
+   http://python.sourceforge.net/peps/pep-0202.html
+
+.. [2] PEP 255 Simple Generators
+   http://python.sourceforge.net/peps/pep-0255.html
+
+.. [3] Peter Norvig's Accumulation Display Proposal
+   http://www.norvig.com/pyacc.html

 Copyright
+=========

 This document has been placed in the public domain.
+
+..
 Local Variables:
 mode: indented-text
 indent-tabs-mode: nil
+sentence-end-double-space: t
 fill-column: 70
 End:
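The Rationale and item 1 of "The Gory Details" say that a generator expression yields the same values as the matching list comprehension, behaves like an anonymous generator function that is created and called on the spot, and avoids building the full list. The sketch below checks those claims in present-day Python 3 rather than the Python 2.3 the PEP targets: the builtin `set` stands in for `sets.Set`, and `page` and `_anonymous` are hypothetical names used only for illustration. The size comparison relies on CPython's `sys.getsizeof`, so exact byte counts vary by build.

```python
import sys

# Rationale: same result, but the list comprehension builds all ten squares first.
eager_total = sum([x * x for x in range(10)])
lazy_total = sum(x * x for x in range(10))

# Gory Details, item 1: roughly equivalent to defining an anonymous
# generator function and calling it immediately (hypothetical helper name).
def _anonymous(iterable):
    for x in iterable:
        yield x * x

same_values = list(x * x for x in range(10)) == list(_anonymous(range(10)))

# Container constructors accept a generator expression directly.
page = ["the quick brown fox", "jumps over the lazy dog"]  # hypothetical input
words = set(word for line in page for word in line.split())

# Laziness: the generator object stays small however many items it will
# eventually yield, while the list's size grows with n.
n = 100_000
list_bytes = sys.getsizeof([x * x for x in range(n)])
gen_bytes = sys.getsizeof(x * x for x in range(n))

print(eager_total, lazy_total, same_values)  # 285 285 True
print(len(words))                            # 8
print(list_bytes > gen_bytes)                # True
```

The memory saving is the whole point of the proposal: the list version allocates storage for every element before `sum()` sees the first one, whereas the generator hands them over one at a time.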
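Items 3 and 4 of "The Gory Details" promise that the loop variable stays private and leave open how free variables bind. The sketch below shows how those questions were settled in the generator expressions that later shipped, written in Python 3 (where `print` is a function): free variables in the element expression are looked up when the generator runs, exactly as in the PEP's `h()` example; the outermost iterable is evaluated as soon as the expression is created; and the loop variable never appears in the enclosing scope. The three function names are hypothetical.

```python
def late_binding_demo():
    """Python 3 rendering of the PEP's h() example, plus the genexp form."""
    y = 0
    values = [1, 2]

    def gen(S):
        for x in S:
            yield x + y        # y is read each time the generator resumes

    it = gen(values)
    g = (x + y for x in values)  # the element expression also reads y lazily
    y = 1                        # rebind y before either iterator has run
    return list(it), list(g)


def outermost_iterable_is_eager():
    data = [1, 2]
    g = (x for x in data)      # iter(data) is taken here, at creation time
    data = [10, 20]            # rebinding the name afterwards does not affect g
    return list(g)


def loop_variable_is_private():
    total = sum(i for i in range(8))
    return total, "i" in locals()   # i lives only inside the generator's scope


print(late_binding_demo())            # ([2, 3], [2, 3])
print(outermost_iterable_is_eager())  # [1, 2]
print(loop_variable_is_private())     # (28, False)
```

Evaluating the outermost iterable eagerly means errors such as iterating over a non-iterable surface at the line that builds the expression, not later at first use.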
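The "Reduction Functions" listing is close to runnable, but as originally extracted `anytrue()` and `alltrue()` tested `it` (the iterator object, which is always truthy) instead of `elem`, so they answered after looking at only the first element. The sketch below is a Python 3 rendering of most of those helpers with that corrected (`reduce` now lives in `functools`), checked against standard-library functions that cover the same ground in current Python: `any()`, `all()`, `math.prod()`, and `heapq.nsmallest()`/`heapq.nlargest()`. It is an illustration only, not the implementation the PEP proposes; `mean()` is omitted.

```python
import bisect
import heapq
import math
import operator
from functools import reduce


def xorsum(it):
    return reduce(operator.xor, it, 0)


def product(it):
    return reduce(operator.mul, it, 1)


def anytrue(it):
    for elem in it:
        if elem:               # test the element, not the iterator
            return True
    return False


def alltrue(it):
    for elem in it:
        if not elem:           # a single false element decides the answer
            return False
    return True


def horner(it, x):
    """horner([6, 3, 4], 5) evaluates 6*x**2 + 3*x + 4 at x=5."""
    cum = 0.0
    for c in it:
        cum = cum * x + c
    return cum


def smallest(it, siz=1):
    result = []                # kept sorted ascending, at most siz items
    for elem in it:
        if len(result) < siz:
            bisect.insort_left(result, elem)
        elif elem < result[-1]:
            result.pop()       # evict the largest item kept so far
            bisect.insort_left(result, elem)
    return result


def largest(it, siz=1):
    result = []                # kept sorted ascending, at most siz items
    for elem in it:
        if len(result) < siz:
            bisect.insort_left(result, elem)
        elif elem > result[0]:
            result.pop(0)      # evict the smallest item kept so far
            bisect.insort_left(result, elem)
    result.reverse()           # report the largest item first
    return result


data = [5, 1, 4, 2, 3]
print(xorsum(x for x in data), product(x for x in data))           # 1 120
print(anytrue(x > 4 for x in data), alltrue(x > 1 for x in data))  # True False
print(horner([6, 3, 4], 5))                                        # 169.0
print(smallest(data, 2), largest(data, 2))                         # [1, 2] [5, 4]
print(math.prod(data), heapq.nsmallest(2, data), heapq.nlargest(2, data))  # 120 [1, 2] [5, 4]
```

Keeping a sorted list of at most `siz` items makes `smallest()` and `largest()` cheap when `siz` is small; `heapq` reaches the same result with a bounded heap.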
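Use case 2 in "Notes on reduce()" claims that a lambda-based `reduce()` over record fields is fully replaced by a reduction function fed a generator expression. A quick Python 3 check of that claim follows. `Item`, `data` and `rows` are hypothetical stand-ins for the PEP's unspecified "complex sequence elements", `functools.reduce` replaces the Python 2 builtin, and `math.prod()` (Python 3.8+) plays the role of the PEP's proposed `product()`. The lambda parameter is renamed from `sum` to `acc` so it does not shadow the builtin.

```python
import math
from collections import namedtuple
from functools import reduce

# Hypothetical record type standing in for "complex sequence elements".
Item = namedtuple("Item", "name myattr")
data = [Item("a", 2), Item("b", 3), Item("c", 4)]

# Field access: reduce() with a lambda versus sum() over a generator expression.
old_sum = reduce(lambda acc, x: acc + x.myattr, data, 0)
new_sum = sum(x.myattr for x in data)

# Index access: math.prod() approximates the PEP's product() here.
rows = [("r1", 0, 0, 2), ("r2", 0, 0, 3), ("r3", 0, 0, 4)]
old_prod = reduce(lambda prod, x: prod * x[3], rows, 1)
new_prod = math.prod(x[3] for x in rows)

print(old_sum, new_sum)    # 9 9
print(old_prod, new_prod)  # 24 24
```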
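Use case 3 in "Notes on reduce()" relies on a `crc32` lookup table that the PEP never defines. The sketch below assumes it is the standard 256-entry table for the reflected CRC-32 polynomial 0xEDB88320 and writes the update both as the `reduce()` call and as the plain for-loop the PEP recommends. Two deliberate departures, both assumptions about intent: the running value starts at 0xFFFFFFFF rather than -1, because Python's `>>` on a negative integer is an arithmetic shift that keeps the high bits set, and the result is inverted at the end so it can be checked against `zlib.crc32()`. Iterating over `bytes` in Python 3 yields integers, so no `ord()` is needed. `make_crc32_table`, `crc32_reduce` and `crc32_loop` are hypothetical names.

```python
import zlib
from functools import reduce


def make_crc32_table():
    """Assumed definition of the PEP's `crc32` table (reflected poly 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 bit falls off.
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32_reduce(data: bytes) -> int:
    # The reduce() form: correct, but the lambda hides what is going on.
    cum = reduce(
        lambda acc, byte: (acc >> 8) ^ CRC32_TABLE[byte ^ (acc & 0xFF)],
        data,
        0xFFFFFFFF,
    )
    return cum ^ 0xFFFFFFFF


def crc32_loop(data: bytes) -> int:
    # The same update written as a plain for-loop.
    cum = 0xFFFFFFFF          # not -1: >> on a negative int keeps the sign bits
    for c in data:            # iterating bytes yields ints, so no ord() is needed
        cum = (cum >> 8) ^ CRC32_TABLE[c ^ (cum & 0xFF)]
    return cum ^ 0xFFFFFFFF   # final inversion, to match the standard CRC-32


check = b"123456789"
print(hex(crc32_loop(check)))                                         # 0xcbf43926
print(crc32_loop(check) == crc32_reduce(check) == zlib.crc32(check))  # True
```

Both versions do identical work; the loop simply names the running value and keeps the update on one readable line, which is the argument the PEP makes for dropping `reduce()` in this case.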