PEP: 289
Title: Generator Expressions
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 22-Oct-2003


Abstract
========

This PEP introduces generator expressions as a high-performance,
memory-efficient generalization of list comprehensions [1]_ and
generators [2]_.


Rationale
=========

Experience with list comprehensions has shown their widespread
utility throughout Python. However, many of the use cases do
not need to have a full list created in memory. Instead, they
only need to iterate over the elements one at a time.

For instance, the following summation code will build a full list of
squares in memory, iterate over those values, and, when the reference
is no longer needed, delete the list::

    sum([x*x for x in range(10)])

Time, clarity, and memory are conserved by using a generator
expression instead::

    sum(x*x for x in range(10))
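
The memory difference can be made visible with ``sys.getsizeof``, which
reports the size of the container object itself. The sketch below
assumes a current Python 3 interpreter rather than the 2.x line this
PEP targets, and the size ``n`` is an arbitrary illustrative value.

```python
import sys

n = 100000  # arbitrary illustrative size, not taken from the PEP

squares_list = [x * x for x in range(n)]   # materializes every element
squares_gen = (x * x for x in range(n))    # produces elements on demand

# The list's footprint grows with n; the generator object's does not.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

# Both forms produce the same total.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

The exact byte counts depend on the interpreter build, so only the
relative sizes are meaningful here.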

Similar benefits are conferred on constructors for container objects::

    s = Set(word for line in page for word in line.split())
    d = dict((k, func(k)) for k in keylist)

Generator expressions are especially useful with functions like sum(),
min(), and max() that reduce an iterable input to a single value::

    max(len(line) for line in file if line.strip())

Generator expressions also address some examples of functionals coded
with lambda::

    reduce(lambda s, a: s + a.myattr, data, 0)
    reduce(lambda s, a: s + a[3], data, 0)

These simplify to::

    sum(a.myattr for a in data)
    sum(a[3] for a in data)
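
That the lambda-based and generator-based forms agree can be checked
with a small, self-contained sketch. It assumes a Python 3 interpreter
(where ``reduce`` lives in ``functools``); the ``Item`` record type and
its sample values are hypothetical stand-ins for ``data``.

```python
from collections import namedtuple
from functools import reduce  # reduce() is a builtin in Python 2

# Hypothetical record type standing in for the objects in `data`.
Item = namedtuple("Item", ["name", "myattr"])
data = [Item("a", 3), Item("b", 4), Item("c", 5)]

# Lambda-based accumulation, as in the first pair of examples.
total_reduce = reduce(lambda s, a: s + a.myattr, data, 0)

# Generator-expression form, as in the simplified pair.
total_sum = sum(a.myattr for a in data)

# Index-based variant: a[1] is the second field of each record.
total_indexed = sum(a[1] for a in data)
```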

List comprehensions greatly reduced the need for filter() and map().
Likewise, generator expressions are expected to minimize the need
for itertools.ifilter() and itertools.imap(). In contrast, the
utility of other itertools will be enhanced by generator expressions::

    dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
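
Both halves of that claim can be sketched briefly. The example assumes
a Python 3 interpreter, where the builtin ``zip()``, ``map()``, and
``filter()`` are already lazy and so stand in for ``itertools.izip()``,
``imap()``, and ``ifilter()``; the vectors are illustrative values.

```python
x_vector = [1, 2, 3]
y_vector = [4, 5, 6]

# A pairing tool combined with a generator expression.
dotproduct = sum(x * y for x, y in zip(x_vector, y_vector))

# A lazy map/filter pipeline ...
even_squares_functional = list(
    map(lambda v: v * v, filter(lambda v: v % 2 == 0, range(10)))
)

# ... and the same computation written as one generator expression.
even_squares_genexp = list(v * v for v in range(10) if v % 2 == 0)
```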

Having a syntax similar to list comprehensions also makes it easy to
convert existing code into a generator expression when scaling up
an application.


BDFL Pronouncements
===================

The previous version of this PEP was REJECTED. The bracketed yield
syntax left something to be desired; the performance gains had not been
demonstrated; and the range of use cases had not been shown. After
much discussion on the python-dev list, the PEP has been resurrected
in its present form. The impetus for the discussion was an innovative
proposal from Peter Norvig [3]_.


The Details
===========

(None of this is exact enough in the eye of a reader from Mars, but I
hope the examples convey the intention well enough for a discussion in
c.l.py. The Python Reference Manual should contain a 100% exact
semantic and syntactic specification.)

1. The semantics of a generator expression are equivalent to creating
   an anonymous generator function and calling it. For example::

       g = (x**2 for x in range(10))
       print g.next()

   is equivalent to::

       def __gen():
           for x in range(10):
               yield x**2
       g = __gen()
       print g.next()

2. The syntax requires that a generator expression always be directly
   inside a set of parentheses and not have a comma on either side.
   With reference to the file Grammar/Grammar in CVS, two rules
   change:

   a) The rule::

          atom: '(' [testlist] ')'

      changes to::

          atom: '(' [listmaker1] ')'

      where listmaker1 is almost the same as listmaker, but only
      allows a single test after 'for' ... 'in'.

   b) The rule for arglist needs similar changes.

   This means that you can write::

       sum(x**2 for x in range(10))

   but you would have to write::

       reduce(operator.add, (x**2 for x in range(10)))

   and also::

       g = (x**2 for x in range(10))

   i.e. if a function call has a single positional argument, it can be
   a generator expression without extra parentheses, but in all other
   cases you have to parenthesize it.

3. The loop variable (if it is a simple variable or a tuple of simple
   variables) is not exposed to the surrounding function. This
   facilitates the implementation and makes typical use cases more
   reliable. In some future version of Python, list comprehensions
   will also hide the induction variable from the surrounding code
   (and, in Py2.4, warnings will be issued for code accessing the
   induction variable).

   For example::

       x = "hello"
       y = list(x for x in "abc")
       print x    # prints "hello", not "c"

   (Loop variables may also use constructs like x[i] or x.a; this form
   may be deprecated.)

4. All free variable bindings are captured at the time the anonymous
   generator function is defined, and passed into it using default
   argument values. For example::

       x = 0
       g = (x for c in "abc")       # x is not the loop variable!
       x = 1
       print g.next()               # prints 0 (captured x), not 1 (current x)

   This behavior of free variables is almost always what you want when
   the generator expression is evaluated at a later point than its
   definition. In fact, to date, no examples have been found of code
   where it would be better to use the execution-time instead of the
   definition-time value of a free variable.

   Note that free variables aren't copied; only their binding is
   captured. They may still change if they are mutable, for example::

       x = []
       g = (x for c in "abc")
       x.append(1)
       print g.next()               # prints [1], not []

5. List comprehensions will remain unchanged. For example::

       [x for x in S]    # This is a list comprehension.
       [(x for x in S)]  # This is a list containing one generator
                         # expression.

   Unfortunately, there is currently a slight syntactic difference.
   The expression::

       [x for x in 1, 2, 3]

   is legal, meaning::

       [x for x in (1, 2, 3)]

   But generator expressions will not allow the former version::

       (x for x in 1, 2, 3)

   is illegal.

   The former list comprehension syntax will become illegal in Python
   3.0, and should be deprecated in Python 2.4 and beyond.

   List comprehensions also "leak" their loop variable into the
   surrounding scope. This will also change in Python 3.0, so that
   the semantic definition of a list comprehension in Python 3.0 will
   be equivalent to list(<generator expression>). Python 2.4 and
   beyond should issue a deprecation warning if a list comprehension's
   loop variable has the same name as a variable used in the
   immediately surrounding scope.
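
The five points above can be sketched together in one short script.
It assumes a Python 3 interpreter (so ``next(g)`` replaces
``g.next()``), and the helper names ``_gen``, ``_captured``, and
``compiles`` are hypothetical. Point 4 is emulated with an explicit
generator function and a default argument, mirroring the mechanism
described there; it makes no claim about how any given interpreter
evaluates a built-in generator expression.

```python
import operator
import types
from functools import reduce

# 1. A generator expression behaves like an anonymous generator function.
def _gen():  # hypothetical name for the anonymous function
    for x in range(10):
        yield x ** 2

same_values = list(x ** 2 for x in range(10)) == list(_gen())

# 2. A sole argument needs no extra parentheses; other positions do.
total = sum(x ** 2 for x in range(10))
total_reduce = reduce(operator.add, (x ** 2 for x in range(10)))

def compiles(source):
    """Return True if `source` parses as an expression."""
    try:
        compile(source, "<sketch>", "eval")
        return True
    except SyntaxError:
        return False

bare_second_arg_ok = compiles("reduce(operator.add, x**2 for x in range(10))")
bare_tuple_iterable_ok = compiles("(x for x in 1, 2, 3)")

# 3. The loop variable is not exposed to the surrounding scope.
x = "hello"
y = list(x for x in "abc")
x_after = x

# 4. Definition-time capture through a default argument value.
n = 0
def _captured(n=n):  # the default is evaluated here, while n == 0
    for c in "abc":
        yield n
g = _captured()
n = 1
first = next(g)  # the captured binding, not the current one

# 5. Brackets around a parenthesized generator expression build a
#    one-element list, not a list comprehension.
S = [1, 2, 3]
comprehension = [v for v in S]
wrapped = [(v for v in S)]
```

The ``compiles`` helper only checks whether the source parses, so the
undefined names inside the test strings are never looked up.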


Reduction Functions
===================

The utility of generator expressions is greatly enhanced when combined
with reduction functions like sum(), min(), and max(). Separate
proposals are forthcoming that recommend several new accumulation
functions possibly including: product(), average(), alltrue(),
anytrue(), nlargest(), nsmallest().
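
To illustrate how such functions might pair with generator
expressions, here is a minimal sketch. The definitions of
``product()``, ``alltrue()``, and ``anytrue()`` below are hypothetical
stand-ins written for this example, not the forthcoming proposals
themselves, and the code assumes a Python 3 interpreter.

```python
import operator
from functools import reduce

def product(iterable):
    """Hypothetical helper: multiply all items, starting from 1."""
    return reduce(operator.mul, iterable, 1)

def alltrue(iterable):
    """Hypothetical helper: True if no item is false."""
    for item in iterable:
        if not item:
            return False
    return True

def anytrue(iterable):
    """Hypothetical helper: True if at least one item is true."""
    for item in iterable:
        if item:
            return True
    return False

words = ["alpha", "beta", "gamma"]  # illustrative input

length_product = product(len(w) for w in words)
all_lowercase = alltrue(w.islower() for w in words)
any_long = anytrue(len(w) > 5 for w in words)
```

Because each helper consumes its argument one item at a time,
``alltrue()`` and ``anytrue()`` can stop early without the generator
ever producing the remaining values.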


Acknowledgements
================

* Raymond Hettinger first proposed the idea of "generator
  comprehensions" in January 2002.

* Peter Norvig resurrected the discussion in his proposal for
  Accumulation Displays.

* Alex Martelli provided critical measurements that proved the
  performance benefits of generator expressions. He also provided
  strong arguments that they were a desirable thing to have.

* Samuele Pedroni provided the example of late binding. Various
  contributors have made arguments for and against late binding.

* Phillip Eby suggested "iterator expressions" as the name.

* Subsequently, Tim Peters suggested the name "generator expressions".


References
==========

.. [1] PEP 202 List Comprehensions
   http://python.sourceforge.net/peps/pep-0202.html

.. [2] PEP 255 Simple Generators
   http://python.sourceforge.net/peps/pep-0255.html

.. [3] Peter Norvig's Accumulation Display Proposal
   http://www.norvig.com/pyacc.html

.. [4] Jeff Epler had worked up a patch demonstrating
   the previously proposed bracket and yield syntax
   http://python.org/sf/795947


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: