Resurrect the PEP on generator expressions
This commit is contained in:
parent
ea6410f7bd
commit
a08117dee0
|
@ -97,6 +97,7 @@ Index by Category
|
|||
S 286 Enhanced Argument Tuples von Loewis
|
||||
I 287 reStructuredText Docstring Format Goodger
|
||||
S 288 Generators Attributes and Exceptions Hettinger
|
||||
S 289 Generator Expressions Hettinger
|
||||
S 292 Simpler String Substitutions Warsaw
|
||||
S 294 Type Names in the types Module Tirosh
|
||||
S 297 Support for System Upgrades Lemburg
|
||||
|
@ -182,7 +183,6 @@ Index by Category
|
|||
SR 259 Omit printing newline after newline GvR
|
||||
SR 270 uniq method for list objects Petrone
|
||||
SR 271 Prefixing sys.path by command line option Giacometti
|
||||
SR 289 Generator Comprehensions Hettinger
|
||||
SR 295 Interpretation of multiline string constants Koltsov
|
||||
SR 296 Adding a bytes Object Type Gilbert
|
||||
SR 308 If-then-else expression GvR, Hettinger
|
||||
|
@ -304,7 +304,7 @@ Numerical Index
|
|||
S 286 Enhanced Argument Tuples von Loewis
|
||||
I 287 reStructuredText Docstring Format Goodger
|
||||
S 288 Generators Attributes and Exceptions Hettinger
|
||||
SR 289 Generator Comprehensions Hettinger
|
||||
S 289 Generator Expressions Hettinger
|
||||
I 290 Code Migration and Modernization Hettinger
|
||||
I 291 Backward Compatibility for Standard Library Norwitz
|
||||
S 292 Simpler String Substitutions Warsaw
|
||||
|
|
401
pep-0289.txt
401
pep-0289.txt
|
@ -1,244 +1,275 @@
|
|||
PEP: 289
|
||||
Title: Generator Comprehensions
|
||||
Title: Generator Expressions
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: python@rcn.com (Raymond D. Hettinger)
|
||||
Status: Rejected
|
||||
Status: Active
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 30-Jan-2002
|
||||
Python-Version: 2.3
|
||||
Post-History:
|
||||
Post-History: 22-Oct-2003
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP introduces generator comprehensions as an idea for
|
||||
enhancing the generators introduced in Python version 2.2 [1].
|
||||
The goal is to increase the convenience, utility, and power of
|
||||
generators by making it easy to convert a list comprehension into
|
||||
a generator.
|
||||
This PEP introduces generator expressions as a high performance,
|
||||
memory efficient generalization of list comprehensions [1]_ and
|
||||
generators [2]_.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Python 2.2 introduced the concept of an iterable interface as
|
||||
proposed in PEP 234 [4]. The iter() factory function was provided
|
||||
as common calling convention and deep changes were made to use
|
||||
iterators as a unifying theme throughout Python. The unification
|
||||
came in the form of establishing a common iterable interface for
|
||||
mappings, sequences, and file objects.
|
||||
Experience with list comprehensions has shown their wide-spread
|
||||
utility throughout Python. However, many of the use cases do
|
||||
not need to have a full list created in memory. Instead, they
|
||||
only need to iterate over the elements one at a time.
|
||||
|
||||
Generators, as proposed in PEP 255 [1], were introduced as a means
|
||||
for making it easier to create iterators, especially ones with
|
||||
complex internal execution or variable states. When I created new
|
||||
programs, generators were often the tool of choice for creating an
|
||||
iterator.
|
||||
For instance, the following summation code will build a full list of
|
||||
squares in memory, iterate over those values, and, when the reference
|
||||
is no longer needed, delete the list::
|
||||
|
||||
However, when updating existing programs, I found that the tool
|
||||
had another use, one that improved program function as well as
|
||||
structure. Some programs exhibited a pattern of creating large
|
||||
lists and then looping over them. As data sizes increased, the
|
||||
programs encountered scalability limitations owing to excessive
|
||||
memory consumption (and malloc time) for the intermediate lists.
|
||||
Generators were found to be directly substitutable for the lists
|
||||
while eliminating the memory issues through lazy evaluation
|
||||
a.k.a. just in time manufacturing.
|
||||
sum([x*x for x in range(10)])
|
||||
|
||||
Python itself encountered similar issues. As a result, xrange()
|
||||
and xreadlines() were introduced. And, in the case of file
|
||||
objects and mappings, just-in-time evaluation became the norm.
|
||||
Generators provide a tool to program memory conserving for-loops
|
||||
whenever complete evaluation is not desired because of memory
|
||||
restrictions or availability of data.
|
||||
Time, clarity, and memory are conserved by using an generator
|
||||
expession instead::
|
||||
|
||||
The next step in the evolution of generators is to establish a
|
||||
generator alternative to list comprehensions [3]. This
|
||||
alternative provides a simple way to convert a list comprehension
|
||||
into a generator whenever memory issues arise.
|
||||
sum(x*x for x in range(10))
|
||||
|
||||
This suggestion is designed to take advantage of the existing
|
||||
implementation and require little additional effort to
|
||||
incorporate. It is backward compatible and requires no new
|
||||
keywords.
|
||||
Similar benefits are conferred on constructors for container objects::
|
||||
|
||||
s = Set(word for line in page for word in line.split())
|
||||
d = dict( (k, func(v)) for k in keylist)
|
||||
|
||||
Generator expressions are especially useful in functions that reduce
|
||||
an iterable input to a single value::
|
||||
|
||||
sum(len(line) for line in file if line.strip())
|
||||
|
||||
Accordingly, generator expressions are expected to partially eliminate
|
||||
the need for reduce() which is notorious for its lack of clarity. And,
|
||||
there are additional speed and clarity benefits from writing expressions
|
||||
directly instead of using lambda.
|
||||
|
||||
List comprehensions greatly reduced the need for filter() and map().
|
||||
Likewise, generator expressions are expected to minimize the need
|
||||
for itertools.ifilter() and itertools.imap(). In contrast, the
|
||||
utility of other itertools will be enhanced by generator expressions::
|
||||
|
||||
dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
|
||||
|
||||
Having a syntax similar to list comprehensions also makes it easy to
|
||||
convert existing code into an generator expression when scaling up
|
||||
application.
|
||||
|
||||
|
||||
BDFL Pronouncements
|
||||
===================
|
||||
|
||||
Generator comprehensions are REJECTED. The rationale is that the
|
||||
benefits are marginal since generators can already be coded
|
||||
directly and the costs are high because implementation and
|
||||
maintenance require major efforts with the parser.
|
||||
The previous version of this PEP was REJECTED. The bracketed yield
|
||||
syntax left something to be desired; the performance gains had not been
|
||||
demonstrated; and the range of use cases had not been shown. After,
|
||||
much discussion on the python-dev list, the PEP has been resurrected
|
||||
its present form. The impetus for the discussion was an innovative
|
||||
proposal from Peter Norvig [3]_.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
The Gory Details
|
||||
================
|
||||
|
||||
There is not currently a CPython implementation; however, a
|
||||
simulation module written in pure Python is available on
|
||||
SourceForge [5]. The simulation is meant to allow direct
|
||||
experimentation with the proposal.
|
||||
1. The semantics of a generator expression are equivalent to creating
|
||||
an anonymous generator function and calling it. There's still discussion
|
||||
about whether that generator function should copy the current value of all
|
||||
free variables into default arguments.
|
||||
|
||||
There is also a module [6] with working source code for all of the
|
||||
examples used in this PEP. It serves as a test suite for the
|
||||
simulator and it documents how each the new feature works in
|
||||
practice.
|
||||
2. The syntax requires that a generator expression always needs to be inside
|
||||
a set of parentheses and cannot have a comma on either side. Unfortunately,
|
||||
this is different from list comprehensions. While [1, x for x in R] is
|
||||
illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1,2,3)].
|
||||
With reference to the file Grammar/Grammar in CVS, two rules change:
|
||||
|
||||
The authors and implementers of PEP 255 [1] were contacted to
|
||||
provide their assessment of whether these enhancements were going
|
||||
to be straight-forward to implement and require only minor
|
||||
modification of the existing generator code. Neil felt the
|
||||
assertion was correct. Ka-Ping thought so also. GvR said he
|
||||
could believe that it was true. Later GvR re-assessed and thought
|
||||
that it would be difficult to tweak the code generator to produce
|
||||
a separate object. Tim did not have an opportunity to give an
|
||||
assessment.
|
||||
a) The rule::
|
||||
|
||||
atom: '(' [testlist] ')'
|
||||
|
||||
changes to::
|
||||
|
||||
atom: '(' [listmaker1] ')'
|
||||
|
||||
where listmaker1 is almost the same as listmaker, but only allows
|
||||
a single test after 'for' ... 'in'.
|
||||
|
||||
b) The rule for arglist needs similar changes.
|
||||
|
||||
|
||||
Specification for Generator Comprehensions :
|
||||
2. The loop variable is not exposed to the surrounding function. This
|
||||
facilates the implementation and makes typical use cases more reliable.
|
||||
In some future version of Python, list comprehensions will also hide the
|
||||
induction variable from the surrounding code (and, in Py2.4, warnings
|
||||
will be issued for code accessing the induction variable).
|
||||
|
||||
If a list comprehension starts with a 'yield' keyword, then
|
||||
express the comprehension with a generator. For example:
|
||||
3. There is still discussion about whether variable referenced in generator
|
||||
expressions will exhibit late binding just like other Python code. In the
|
||||
following example, the iterator runs *after* the value of y is set to one::
|
||||
|
||||
g = [yield (len(line),line) for line in file if len(line)>5]
|
||||
def h():
|
||||
y = 0
|
||||
l = [1,2]
|
||||
def gen(S):
|
||||
for x in S:
|
||||
yield x+y
|
||||
it = gen(l)
|
||||
y = 1
|
||||
for v in it:
|
||||
print v
|
||||
|
||||
This would be implemented as if it had been written:
|
||||
4. List comprehensions will remain unchanged::
|
||||
|
||||
def __temp(self):
|
||||
for line in file:
|
||||
if len(line) > 5:
|
||||
yield (len(line), line)
|
||||
g = __temp()
|
||||
[x for x in S] # This is a list comprehension.
|
||||
[(x for x in S)] # This is a list containing one generator expression.
|
||||
|
||||
Note A: There is some discussion about whether the enclosing
|
||||
brackets should be part of the syntax for generator
|
||||
comprehensions. On the plus side, it neatly parallels list
|
||||
comprehensions and would be immediately recognizable as a similar
|
||||
form with similar internal syntax (taking maximum advantage of
|
||||
what people already know). More importantly, it sets off the
|
||||
generator comprehension from the rest of the function so as to not
|
||||
suggest that the enclosing function is a generator (currently the
|
||||
only cue that a function is really a generator is the presence of
|
||||
the yield keyword). On the minus side, the brackets may falsely
|
||||
suggest that the whole expression returns a list. Most of the
|
||||
feedback received to date indicates that brackets are helpful and
|
||||
not misleading. Unfortunately, the one dissent is from GvR.
|
||||
|
||||
A key advantage of the generator comprehension syntax is that it
|
||||
makes it trivially easy to transform existing list comprehension
|
||||
code to a generator by adding yield. Likewise, it can be
|
||||
converted back to a list by deleting yield. This makes it easy to
|
||||
scale-up programs from small datasets to ones large enough to
|
||||
warrant just in time evaluation.
|
||||
Reduction Functions
|
||||
===================
|
||||
|
||||
Note B: List comprehensions expose their looping variable and
|
||||
leave that variable in the enclosing scope. The code, [str(i) for
|
||||
i in range(8)] leaves 'i' set to 7 in the scope where the
|
||||
comprehension appears. This behavior is by design and reflects an
|
||||
intent to duplicate the result of coding a for-loop instead of a
|
||||
list comprehension. Further, the variable 'i' is in a defined and
|
||||
potentially useful state on the line immediately following the
|
||||
list comprehension.
|
||||
The utility of generator expressions is greatly enhanced when combined
|
||||
with appropriate reduction functions like sum(), min(), and max(). I
|
||||
propose creating a set of high speed reduction functions designed to tap the
|
||||
power of generator expressions and replace the most common uses of reduce()::
|
||||
|
||||
In contrast, generator comprehensions do not expose the looping
|
||||
variable to the enclosing scope. The code, [yield str(i) for i in
|
||||
range(8)] leaves 'i' untouched in the scope where the
|
||||
comprehension appears. This is also by design and reflects an
|
||||
intent to duplicate the result of coding a generator directly
|
||||
instead of a generator comprehension. Further, the variable 'i'
|
||||
is not in a defined state on the line immediately following the
|
||||
list comprehension. It does not come into existence until
|
||||
iteration starts (possibly never).
|
||||
def xorsum(it):
|
||||
return reduce(operator.xor, it, 0)
|
||||
|
||||
Comments from GvR: Cute hack, but I think the use of the [] syntax
|
||||
strongly suggests that it would return a list, not an
|
||||
iterator. I also think that this is trying to turn Python into
|
||||
a functional language, where most algorithms use lazy infinite
|
||||
sequences, and I just don't think that's where its future
|
||||
lies.
|
||||
def product(it):
|
||||
return reduce(operator.mul, it, 1)
|
||||
|
||||
I don't think it's worth the trouble. I expect it will take a
|
||||
lot of work to hack it into the code generator: it has to
|
||||
create a separate code object in order to be a generator.
|
||||
List comprehensions are inlined, so I expect that the
|
||||
generator comprehension code generator can't share much with
|
||||
the list comprehension code generator. And this for something
|
||||
that's not that common and easily done by writing a 2-line
|
||||
helper function. IOW the ROI isn't high enough.
|
||||
def anytrue(it):
|
||||
for elem in it:
|
||||
if it:
|
||||
return True
|
||||
return False
|
||||
|
||||
Comments from Ka-Ping Yee: I am very happy with the things you have
|
||||
proposed in this PEP. I feel quite positive about generator
|
||||
comprehensions and have no reservations. So a +1 on that.
|
||||
def alltrue(it):
|
||||
for elem in it:
|
||||
if it:
|
||||
return False
|
||||
return True
|
||||
|
||||
Comments from Neil Schemenauer: I'm -0 on the generator list
|
||||
comprehensions. They don't seem to add much. You could
|
||||
easily use a nested generator to do the same thing. They
|
||||
smell like lambda.
|
||||
def horner(it, x):
|
||||
'horner([6,3,4], 5) evaluates 6*x**2 + 3*x + 4 at x=5'
|
||||
cum = 0.0
|
||||
for c in it:
|
||||
cum = cum*x + c
|
||||
return cum
|
||||
|
||||
Comments from Magnus Lie Hetland: Generator comprehensions seem mildly
|
||||
useful, but I vote +0. Defining a separate, named generator
|
||||
would probably be my preference. On the other hand, I do see
|
||||
the advantage of "scaling up" from list comprehensions.
|
||||
def mean(it):
|
||||
data = list(it)
|
||||
return sum(data) / len(data)
|
||||
|
||||
Comments from the Community: The response to the generator comprehension
|
||||
proposal has been mostly favorable. There were some 0 votes
|
||||
from people who didn't see a real need or who were not
|
||||
energized by the idea. Some of the 0 votes were tempered by
|
||||
comments that the reviewer did not even like list
|
||||
comprehensions or did not have any use for generators in any
|
||||
form. The +1 votes outnumbered the 0 votes by about two to
|
||||
one.
|
||||
def smallest(it, siz=1):
|
||||
result = []
|
||||
for elem in it:
|
||||
if len(result) < siz:
|
||||
bisect.insort_left(result, elem)
|
||||
elif elem < result[-1]:
|
||||
result.pop()
|
||||
bisect.insort_left(result, elem)
|
||||
return result
|
||||
|
||||
Author response: I've studied several syntactical variations and
|
||||
concluded that the brackets are essential for:
|
||||
- teachability (it's like a list comprehension)
|
||||
- set-off (yield applies to the comprehension not the enclosing
|
||||
function)
|
||||
- substitutability (list comprehensions can be made lazy just by
|
||||
adding yield)
|
||||
def largest(it, siz=1):
|
||||
result = []
|
||||
for elem in it:
|
||||
if len(result) < siz:
|
||||
bisect.insort_left(result, elem)
|
||||
elif elem > result[0]:
|
||||
result.pop(0)
|
||||
bisect.insort_left(result, elem)
|
||||
result.reverse()
|
||||
return result
|
||||
|
||||
What I like best about generator comprehensions is that I can
|
||||
design using list comprehensions and then easily switch to a
|
||||
generator (by adding yield) in response to scalability
|
||||
requirements (when the list comprehension produces too large
|
||||
of an intermediate result). Had generators already been
|
||||
in-place when list comprehensions were accepted, the yield
|
||||
option might have been incorporated from the start. For
|
||||
certain, the mathematical style notation is explicit and
|
||||
readable as compared to a separate function definition with an
|
||||
embedded yield.
|
||||
Notes on reduce()
|
||||
=================
|
||||
|
||||
Reduce typically has three types of use cases:
|
||||
|
||||
1) Common reduction functions applied directly to elements in a sequence.
|
||||
This use case is addressed by the sum(), min(), max(), and the additional
|
||||
functions listed above.
|
||||
|
||||
2) Reduce is often used with lambda when the data needs be extracted from
|
||||
complex sequence elements. For example::
|
||||
|
||||
reduce(lambda sum, x: sum + x.myattr, data, 0)
|
||||
reduce(lambda prod, x: prod * x[3], data, 1)
|
||||
|
||||
In concert with reduction functions, generator expressions completely
|
||||
fulfill these use cases::
|
||||
|
||||
sum(x.myattr for x in data)
|
||||
product(x[3] for x in data)
|
||||
|
||||
3) On rare occasions, the reduction function is non-standard and requires
|
||||
custom coding::
|
||||
|
||||
reduce(lambda cum, c: (cum >> 8) ^ crc32[ord(a) ^ (cum & 0x00ff)], data, -1)
|
||||
|
||||
Because a complex lambda is required, this use case becomes clearer and
|
||||
faster when coded directly as a for-loop::
|
||||
|
||||
cum = -1
|
||||
for c in data:
|
||||
cum = (cum >> 8) ^ crc32[ord(a) ^ (cum & 0x00ff)]
|
||||
|
||||
In conclusion, after adding generator expressions and a set of common reduction
|
||||
functions, few, if any cases remain for reduce().
|
||||
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
* Raymond Hettinger first proposed the idea of "generator comprehensions"
|
||||
in January 2002.
|
||||
|
||||
* Peter Norvig resurrected the discussion in his proposal for
|
||||
Accumulation Displays [3]_.
|
||||
|
||||
* Alex Martelli provided critical measurements that proved the performance
|
||||
benefits of generator expressions. He also provided strong arguments
|
||||
that they were a desirable thing to have.
|
||||
|
||||
* Samuele Pedroni provided the example of late binding.
|
||||
Various contributors have made arguments for and against late binding.
|
||||
|
||||
* Phillip Eby suggested "iterator expressions" as the name.
|
||||
|
||||
* Subsequently, Tim Peters suggested the name "generator expressions".
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
[1] PEP 255 Simple Generators
|
||||
http://python.sourceforge.net/peps/pep-0255.html
|
||||
.. [1] PEP 202 List Comprehensions
|
||||
http://python.sourceforge.net/peps/pep-0202.html
|
||||
|
||||
[2] PEP 212 Loop Counter Iteration
|
||||
http://python.sourceforge.net/peps/pep-0212.html
|
||||
.. [2] PEP 255 Simple Generators
|
||||
http://python.sourceforge.net/peps/pep-0255.html
|
||||
|
||||
[3] PEP 202 List Comprehensions
|
||||
http://python.sourceforge.net/peps/pep-0202.html
|
||||
.. [3] Peter Norvig's Accumulation Display Proposal
|
||||
http:///www.norvig.com/pyacc.html
|
||||
|
||||
[4] PEP 234 Iterators
|
||||
http://python.sourceforge.net/peps/pep-0234.html
|
||||
|
||||
[5] A pure Python simulation of every feature in this PEP is at:
|
||||
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752
|
||||
|
||||
[6] The full, working source code for each of the examples in this PEP
|
||||
along with other examples and tests is at:
|
||||
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
|
||||
|
||||
[7] Another partial implementation is at:
|
||||
http://www.python.org/sf/795947
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
fill-column: 70
|
||||
End:
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
||||
|
|
Loading…
Reference in New Issue