PEP: 289
Title: Generator Expressions
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Active
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 22-Oct-2003


Abstract
========

This PEP introduces generator expressions as a high performance,
memory efficient generalization of list comprehensions [1]_ and
generators [2]_.


Rationale
=========

Experience with list comprehensions has shown their widespread
utility throughout Python.  However, many of the use cases do
not need to have a full list created in memory.  Instead, they
only need to iterate over the elements one at a time.

For instance, the following summation code will build a full list of
squares in memory, iterate over those values, and, when the reference
is no longer needed, delete the list::

    sum([x*x for x in range(10)])

Time, clarity, and memory are conserved by using a generator
expression instead::

    sum(x*x for x in range(10))

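As a rough illustration of the two spellings above, the following
sketch assumes a current Python interpreter in which generator
expressions are available.  The name ``squares`` is hypothetical and
exists only for this example.  It checks that both forms give the same
total and that a generator expression hands out its values on demand
instead of building a list first.

```python
# Both spellings compute the same total.
total_from_list = sum([x*x for x in range(10)])   # builds a 10-element list first
total_from_genexp = sum(x*x for x in range(10))   # feeds values to sum() one at a time

# A generator expression is an iterator: nothing is computed until asked for.
squares = (x*x for x in range(10))   # hypothetical name, used only in this sketch
first = next(squares)                # first square
second = next(squares)               # second square
remaining = list(squares)            # the other eight squares
```

The list form is still the better choice when the values are needed
more than once, since a generator expression can be consumed only once.
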
Similar benefits are conferred on constructors for container objects::

    s = Set(word for line in page for word in line.split())
    d = dict((k, func(k)) for k in keylist)

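The container constructors above can be tried out with a small sketch.
Everything concrete here is an assumption made for illustration: the
sample ``page`` and ``keylist`` data are invented, ``func`` is a
hypothetical stand-in, and the built-in ``set`` type is used in place
of ``Set``.

```python
# Sample data, invented for illustration.
page = ["the quick brown fox", "jumps over the lazy dog"]
keylist = ["a", "bb", "ccc"]

def func(k):
    """Hypothetical stand-in for the func() used in the text."""
    return len(k)

# The built-in set type is assumed here in place of Set.
s = set(word for line in page for word in line.split())

# Each (key, value) pair is produced lazily and consumed by dict().
d = dict((k, func(k)) for k in keylist)
```

In both cases no intermediate list of words or pairs is built before
the container is filled.
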
Generator expressions are especially useful in functions that reduce
an iterable input to a single value::

    sum(len(line) for line in file if line.strip())

Accordingly, generator expressions are expected to partially eliminate
the need for reduce(), which is notorious for its lack of clarity.  In
addition, there are speed and clarity benefits from writing expressions
directly instead of using lambda.

List comprehensions greatly reduced the need for filter() and map().
Likewise, generator expressions are expected to minimize the need
for itertools.ifilter() and itertools.imap().  In contrast, the
utility of other itertools will be enhanced by generator expressions::

    dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))

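The comparison with the lazy map and filter tools can be sketched as
follows.  This is only an illustration under stated assumptions: it
targets a Python version in which the built-in ``zip()``, ``map()``,
and ``filter()`` are themselves lazy, so they stand in for
``itertools.izip()``, ``itertools.imap()``, and ``itertools.ifilter()``;
the vectors are invented sample data.

```python
# Invented sample vectors.
x_vector = [1, 2, 3]
y_vector = [4, 5, 6]

# Lazy pairing combined with a generator expression, as in the text
# (built-in zip() is assumed to behave like itertools.izip()).
dotproduct = sum(x*y for x, y in zip(x_vector, y_vector))

# The same filter-then-map pipeline written two ways.
with_lambdas = list(map(lambda x: x*x, filter(lambda x: x % 2 == 0, range(10))))
with_genexp = list(x*x for x in range(10) if x % 2 == 0)
```

The generator expression states the transformation and the condition
directly, without the two lambdas.
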
Having a syntax similar to list comprehensions also makes it easy to
convert existing code into a generator expression when scaling up
an application.


BDFL Pronouncements
===================

The previous version of this PEP was REJECTED.  The bracketed yield
syntax left something to be desired; the performance gains had not been
demonstrated; and the range of use cases had not been shown.  After
much discussion on the python-dev list, the PEP has been resurrected
in its present form.  The impetus for the discussion was an innovative
proposal from Peter Norvig [3]_.


The Gory Details
================

1. The semantics of a generator expression are equivalent to creating
   an anonymous generator function and calling it.  There's still discussion
   about whether that generator function should copy the current value of all
   free variables into default arguments.

2. The syntax requires that a generator expression always be inside
   a set of parentheses and have no comma on either side.  Unfortunately,
   this is different from list comprehensions.  While [1, x for x in R] is
   illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1,2,3)].
   With reference to the file Grammar/Grammar in CVS, two rules change:

   a) The rule::

         atom: '(' [testlist] ')'

      changes to::

         atom: '(' [listmaker1] ')'

      where listmaker1 is almost the same as listmaker, but only allows
      a single test after 'for' ... 'in'.

   b) The rule for arglist needs similar changes.

3. The loop variable is not exposed to the surrounding function.  This
   facilitates the implementation and makes typical use cases more reliable.
   In some future version of Python, list comprehensions will also hide the
   induction variable from the surrounding code (and, in Py2.4, warnings
   will be issued for code accessing the induction variable).

4. There is still discussion about whether variables referenced in generator
   expressions will exhibit late binding just like other Python code.  In the
   following example, the iterator runs *after* the value of y is set to one::

       def h():
           y = 0
           l = [1,2]
           def gen(S):
               for x in S:
                   yield x+y
           it = gen(l)
           y = 1
           for v in it:
               print v    # prints 2, then 3: y is looked up when gen runs

5. List comprehensions will remain unchanged::

       [x for x in S]    # This is a list comprehension.
       [(x for x in S)]  # This is a list containing one generator expression.

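Several of the points in the list above can be checked with a short
sketch.  It assumes a current Python interpreter in which generator
expressions exist; the helper names ``_anonymous``,
``loop_variable_is_hidden``, ``is_syntax_error``, and the reworked
``h`` are hypothetical and exist only for this illustration.  It does
not settle the open questions noted in the list; it only shows the
generator-function equivalence, the hidden loop variable, the
parenthesization rule, and the late-binding example rewritten to return
its values.

```python
def _anonymous(iterable):
    """Hand-written generator function standing in for the anonymous one."""
    for x in iterable:
        yield x * x

S = [1, 2, 3]
from_genexp = list(x * x for x in S)
from_genfunc = list(_anonymous(S))       # same values as the expression form

def loop_variable_is_hidden():
    """Return True if the loop variable n does not leak into this function."""
    total = sum(n for n in S)
    return total == 6 and "n" not in locals()

def is_syntax_error(source):
    """Return True if the expression in source fails to compile."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return True
    return False

# A bare generator expression next to a comma is rejected by the compiler;
# wrapping it in its own parentheses is accepted.
bare_with_comma = is_syntax_error("f(x for x in S, 1)")
parenthesized = is_syntax_error("f((x for x in S), 1)")

def h():
    """The late-binding example from the list, returning instead of printing."""
    y = 0
    def gen(seq):
        for x in seq:
            yield x + y
    it = gen([1, 2])
    y = 1              # rebinding happens before the generator body runs
    return list(it)
```
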

Reduction Functions
===================

The utility of generator expressions is greatly enhanced when combined
with appropriate reduction functions like sum(), min(), and max().  I
propose creating a set of high speed reduction functions designed to tap the
power of generator expressions and replace the most common uses of reduce()::

    import bisect
    import operator

    def xorsum(it):
        return reduce(operator.xor, it, 0)

    def product(it):
        return reduce(operator.mul, it, 1)

    def anytrue(it):
        for elem in it:
            if elem:              # test the element, not the iterable
                return True
        return False

    def alltrue(it):
        for elem in it:
            if not elem:          # stop at the first false element
                return False
        return True

    def horner(it, x):
        'horner([6,3,4], 5) evaluates 6*x**2 + 3*x + 4 at x=5'
        cum = 0.0
        for c in it:
            cum = cum*x + c
        return cum

    def mean(it):
        data = list(it)
        return sum(data) / len(data)

    def smallest(it, siz=1):
        result = []               # kept sorted in ascending order
        for elem in it:
            if len(result) < siz:
                bisect.insort_left(result, elem)
            elif elem < result[-1]:
                result.pop()
                bisect.insort_left(result, elem)
        return result

    def largest(it, siz=1):
        result = []               # kept sorted in ascending order
        for elem in it:
            if len(result) < siz:
                bisect.insort_left(result, elem)
            elif elem > result[0]:
                result.pop(0)
                bisect.insort_left(result, elem)
        result.reverse()
        return result

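A brief usage sketch may help show how such reduction functions pair
with generator expressions.  It is written under assumptions: it
targets a Python version where ``reduce()`` is imported from
``functools`` instead of being a builtin, it restates three of the
proposed functions so that it is self-contained, and the ``words``
data is invented.

```python
import operator
from functools import reduce   # reduce() is a builtin in the Python of the text

def product(it):
    return reduce(operator.mul, it, 1)

def anytrue(it):
    for elem in it:
        if elem:
            return True
    return False

def alltrue(it):
    for elem in it:
        if not elem:
            return False
    return True

words = ["alpha", "beta", "gamma"]   # invented sample data

length_product = product(len(w) for w in words)         # 5 * 4 * 5
has_short_word = anytrue(len(w) < 5 for w in words)     # "beta" qualifies
all_lowercase = alltrue(w.islower() for w in words)
all_long = alltrue(len(w) > 4 for w in words)           # "beta" fails the test
```

Because ``anytrue()`` and ``alltrue()`` return as soon as the answer is
known, the generator expression feeding them is evaluated only as far
as needed.
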

Notes on reduce()
=================

Reduce typically has three types of use cases:

1) Common reduction functions applied directly to elements in a sequence.
   This use case is addressed by sum(), min(), max(), and the additional
   functions listed above.

2) Reduce is often used with lambda when the data needs to be extracted from
   complex sequence elements.  For example::

       reduce(lambda sum, x: sum + x.myattr, data, 0)
       reduce(lambda prod, x: prod * x[3], data, 1)

   In concert with reduction functions, generator expressions completely
   fulfill these use cases::

       sum(x.myattr for x in data)
       product(x[3] for x in data)

3) On rare occasions, the reduction function is non-standard and requires
   custom coding::

       reduce(lambda cum, c: (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)], data, -1)

   Because a complex lambda is required, this use case becomes clearer and
   faster when coded directly as a for-loop::

       cum = -1
       for c in data:
           cum = (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)]

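The second use case above can be checked side by side with a small
sketch.  The ``Record`` class and the sample ``records`` and ``rows``
data are hypothetical, and ``reduce()`` is imported from ``functools``
on the assumption of a Python version where it is no longer a builtin.

```python
import operator
from functools import reduce   # reduce() is a builtin in the Python of the text

class Record(object):
    """Hypothetical element type carrying a numeric myattr."""
    def __init__(self, myattr):
        self.myattr = myattr

def product(it):
    return reduce(operator.mul, it, 1)

records = [Record(2), Record(3), Record(5)]   # invented sample data
rows = [(0, 0, 0, 2), (0, 0, 0, 7)]           # invented sample data

# reduce() with a lambda that digs the value out of each element ...
sum_via_reduce = reduce(lambda total, x: total + x.myattr, records, 0)
prod_via_reduce = reduce(lambda prod, x: prod * x[3], rows, 1)

# ... versus a reduction function fed by a generator expression.
sum_via_genexp = sum(x.myattr for x in records)
prod_via_genexp = product(x[3] for x in rows)
```
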
In conclusion, after adding generator expressions and a set of common reduction
functions, few, if any, cases remain for reduce().


Acknowledgements
================

* Raymond Hettinger first proposed the idea of "generator comprehensions"
  in January 2002.

* Peter Norvig resurrected the discussion in his proposal for
  Accumulation Displays [3]_.

* Alex Martelli provided critical measurements that proved the performance
  benefits of generator expressions.  He also provided strong arguments
  that they were a desirable thing to have.

* Samuele Pedroni provided the example of late binding.
  Various contributors have made arguments for and against late binding.

* Phillip Eby suggested "iterator expressions" as the name.

* Subsequently, Tim Peters suggested the name "generator expressions".


References
==========

.. [1] PEP 202 List Comprehensions
   http://python.sourceforge.net/peps/pep-0202.html

.. [2] PEP 255 Simple Generators
   http://python.sourceforge.net/peps/pep-0255.html

.. [3] Peter Norvig's Accumulation Display Proposal
   http://www.norvig.com/pyacc.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: