310 lines
10 KiB
Plaintext
310 lines
10 KiB
Plaintext
PEP: 289
|
|
Title: Generator Expressions
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: python@rcn.com (Raymond Hettinger)
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 30-Jan-2002
|
|
Python-Version: 2.4
|
|
Post-History: 22-Oct-2003
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP introduces generator expressions as a high performance,
|
|
memory efficient generalization of list comprehensions :pep:`202` and
|
|
generators :pep:`255`.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Experience with list comprehensions has shown their widespread
|
|
utility throughout Python. However, many of the use cases do
|
|
not need to have a full list created in memory. Instead, they
|
|
only need to iterate over the elements one at a time.
|
|
|
|
For instance, the following summation code will build a full list of
|
|
squares in memory, iterate over those values, and, when the reference
|
|
is no longer needed, delete the list::
|
|
|
|
sum([x*x for x in range(10)])
|
|
|
|
Memory is conserved by using a generator expression instead::
|
|
|
|
sum(x*x for x in range(10))
|
|
|
|
Similar benefits are conferred on constructors for container objects::
|
|
|
|
s = set(word for line in page for word in line.split())
|
|
d = dict( (k, func(k)) for k in keylist)
|
|
|
|
Generator expressions are especially useful with functions like sum(),
|
|
min(), and max() that reduce an iterable input to a single value::
|
|
|
|
max(len(line) for line in file if line.strip())
|
|
|
|
Generator expressions also address some examples of functionals coded
|
|
with lambda::
|
|
|
|
reduce(lambda s, a: s + a.myattr, data, 0)
|
|
reduce(lambda s, a: s + a[3], data, 0)
|
|
|
|
These simplify to::
|
|
|
|
sum(a.myattr for a in data)
|
|
sum(a[3] for a in data)
|
|
|
|
List comprehensions greatly reduced the need for filter() and map().
|
|
Likewise, generator expressions are expected to minimize the need
|
|
for itertools.ifilter() and itertools.imap(). In contrast, the
|
|
utility of other itertools will be enhanced by generator expressions::
|
|
|
|
dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
|
|
|
|
Having a syntax similar to list comprehensions also makes it easy to
|
|
convert existing code into a generator expression when scaling up
|
|
application.
|
|
|
|
Early timings showed that generators had a significant performance
|
|
advantage over list comprehensions. However, the latter were highly
|
|
optimized for Py2.4 and now the performance is roughly comparable
|
|
for small to mid-sized data sets. As the data volumes grow larger,
|
|
generator expressions tend to perform better because they do not
|
|
exhaust cache memory and they allow Python to re-use objects between
|
|
iterations.
|
|
|
|
BDFL Pronouncements
|
|
===================
|
|
|
|
This PEP is ACCEPTED for Py2.4.
|
|
|
|
|
|
The Details
|
|
===========
|
|
|
|
(None of this is exact enough in the eye of a reader from Mars, but I
|
|
hope the examples convey the intention well enough for a discussion in
|
|
c.l.py. The Python Reference Manual should contain a 100% exact
|
|
semantic and syntactic specification.)
|
|
|
|
1. The semantics of a generator expression are equivalent to creating
|
|
an anonymous generator function and calling it. For example::
|
|
|
|
g = (x**2 for x in range(10))
|
|
print g.next()
|
|
|
|
is equivalent to::
|
|
|
|
def __gen(exp):
|
|
for x in exp:
|
|
yield x**2
|
|
g = __gen(iter(range(10)))
|
|
print g.next()
|
|
|
|
Only the outermost for-expression is evaluated immediately, the other
|
|
expressions are deferred until the generator is run::
|
|
|
|
|
|
g = (tgtexp for var1 in exp1 if exp2 for var2 in exp3 if exp4)
|
|
|
|
is equivalent to::
|
|
|
|
def __gen(bound_exp):
|
|
for var1 in bound_exp:
|
|
if exp2:
|
|
for var2 in exp3:
|
|
if exp4:
|
|
yield tgtexp
|
|
g = __gen(iter(exp1))
|
|
del __gen
|
|
|
|
2. The syntax requires that a generator expression always needs to be
|
|
directly inside a set of parentheses and cannot have a comma on
|
|
either side. With reference to the file Grammar/Grammar in CVS,
|
|
two rules change:
|
|
|
|
a) The rule::
|
|
|
|
atom: '(' [testlist] ')'
|
|
|
|
changes to::
|
|
|
|
atom: '(' [testlist_gexp] ')'
|
|
|
|
where testlist_gexp is almost the same as listmaker, but only
|
|
allows a single test after 'for' ... 'in'::
|
|
|
|
testlist_gexp: test ( gen_for | (',' test)* [','] )
|
|
|
|
b) The rule for arglist needs similar changes.
|
|
|
|
This means that you can write::
|
|
|
|
sum(x**2 for x in range(10))
|
|
|
|
but you would have to write::
|
|
|
|
reduce(operator.add, (x**2 for x in range(10)))
|
|
|
|
and also::
|
|
|
|
g = (x**2 for x in range(10))
|
|
|
|
i.e. if a function call has a single positional argument, it can be
|
|
a generator expression without extra parentheses, but in all other
|
|
cases you have to parenthesize it.
|
|
|
|
The exact details were checked in to Grammar/Grammar version 1.49.
|
|
|
|
3. The loop variable (if it is a simple variable or a tuple of simple
|
|
variables) is not exposed to the surrounding function. This
|
|
facilitates the implementation and makes typical use cases more
|
|
reliable. In some future version of Python, list comprehensions
|
|
will also hide the induction variable from the surrounding code
|
|
(and, in Py2.4, warnings will be issued for code accessing the
|
|
induction variable).
|
|
|
|
For example::
|
|
|
|
x = "hello"
|
|
y = list(x for x in "abc")
|
|
print x # prints "hello", not "c"
|
|
|
|
4. List comprehensions will remain unchanged. For example::
|
|
|
|
[x for x in S] # This is a list comprehension.
|
|
[(x for x in S)] # This is a list containing one generator
|
|
# expression.
|
|
|
|
Unfortunately, there is currently a slight syntactic difference.
|
|
The expression::
|
|
|
|
[x for x in 1, 2, 3]
|
|
|
|
is legal, meaning::
|
|
|
|
[x for x in (1, 2, 3)]
|
|
|
|
But generator expressions will not allow the former version::
|
|
|
|
(x for x in 1, 2, 3)
|
|
|
|
is illegal.
|
|
|
|
The former list comprehension syntax will become illegal in Python
|
|
3.0, and should be deprecated in Python 2.4 and beyond.
|
|
|
|
List comprehensions also "leak" their loop variable into the
|
|
surrounding scope. This will also change in Python 3.0, so that
|
|
the semantic definition of a list comprehension in Python 3.0 will
|
|
be equivalent to list(<generator expression>). Python 2.4 and
|
|
beyond should issue a deprecation warning if a list comprehension's
|
|
loop variable has the same name as a variable used in the
|
|
immediately surrounding scope.
|
|
|
|
Early Binding versus Late Binding
|
|
=================================
|
|
|
|
After much discussion, it was decided that the first (outermost)
|
|
for-expression should be evaluated immediately and that the remaining
|
|
expressions be evaluated when the generator is executed.
|
|
|
|
Asked to summarize the reasoning for binding the first expression,
|
|
Guido offered [1]_::
|
|
|
|
Consider sum(x for x in foo()). Now suppose there's a bug in foo()
|
|
that raises an exception, and a bug in sum() that raises an
|
|
exception before it starts iterating over its argument. Which
|
|
exception would you expect to see? I'd be surprised if the one in
|
|
sum() was raised rather the one in foo(), since the call to foo()
|
|
is part of the argument to sum(), and I expect arguments to be
|
|
processed before the function is called.
|
|
|
|
OTOH, in sum(bar(x) for x in foo()), where sum() and foo()
|
|
are bugfree, but bar() raises an exception, we have no choice but
|
|
to delay the call to bar() until sum() starts iterating -- that's
|
|
part of the contract of generators. (They do nothing until their
|
|
next() method is first called.)
|
|
|
|
Various use cases were proposed for binding all free variables when
|
|
the generator is defined. And some proponents felt that the resulting
|
|
expressions would be easier to understand and debug if bound immediately.
|
|
|
|
However, Python takes a late binding approach to lambda expressions and
|
|
has no precedent for automatic, early binding. It was felt that
|
|
introducing a new paradigm would unnecessarily introduce complexity.
|
|
|
|
After exploring many possibilities, a consensus emerged that binding
|
|
issues were hard to understand and that users should be strongly
|
|
encouraged to use generator expressions inside functions that consume
|
|
their arguments immediately. For more complex applications, full
|
|
generator definitions are always superior in terms of being obvious
|
|
about scope, lifetime, and binding [2]_.
|
|
|
|
|
|
Reduction Functions
|
|
===================
|
|
|
|
The utility of generator expressions is greatly enhanced when combined
|
|
with reduction functions like sum(), min(), and max(). The heapq
|
|
module in Python 2.4 includes two new reduction functions: nlargest()
|
|
and nsmallest(). Both work well with generator expressions and keep
|
|
no more than n items in memory at one time.
|
|
|
|
|
|
Acknowledgements
|
|
================
|
|
|
|
* Raymond Hettinger first proposed the idea of "generator
|
|
comprehensions" in January 2002.
|
|
|
|
* Peter Norvig resurrected the discussion in his proposal for
|
|
Accumulation Displays.
|
|
|
|
* Alex Martelli provided critical measurements that proved the
|
|
performance benefits of generator expressions. He also provided
|
|
strong arguments that they were a desirable thing to have.
|
|
|
|
* Phillip Eby suggested "iterator expressions" as the name.
|
|
|
|
* Subsequently, Tim Peters suggested the name "generator expressions".
|
|
|
|
* Armin Rigo, Tim Peters, Guido van Rossum, Samuele Pedroni,
|
|
Hye-Shik Chang and Raymond Hettinger teased out the issues surrounding
|
|
early versus late binding [1]_.
|
|
|
|
* Jiwon Seo single-handedly implemented various versions of the proposal
|
|
including the final version loaded into CVS. Along the way, there
|
|
were periodic code reviews by Hye-Shik Chang and Raymond Hettinger.
|
|
Guido van Rossum made the key design decisions after comments from
|
|
Armin Rigo and newsgroup discussions. Raymond Hettinger provided
|
|
the test suite, documentation, tutorial, and examples [2]_.
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Discussion over the relative merits of early versus late binding
|
|
https://mail.python.org/pipermail/python-dev/2004-April/044555.html
|
|
|
|
.. [2] Patch discussion and alternative patches on Source Forge
|
|
https://bugs.python.org/issue872326
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
End:
|