PEP: 289
Title: Generator Expressions
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 22-Oct-2003


Abstract
========

This PEP introduces generator expressions as a high performance,
memory efficient generalization of list comprehensions [1]_ and
generators [2]_.


Rationale
=========

Experience with list comprehensions has shown their widespread
utility throughout Python. However, many of the use cases do not
need to have a full list created in memory. Instead, they only need
to iterate over the elements one at a time.

For instance, the following summation code will build a full list of
squares in memory, iterate over those values, and, when the reference
is no longer needed, delete the list::

    sum([x*x for x in range(10)])

Time, clarity, and memory are conserved by using a generator
expression instead::

    sum(x*x for x in range(10))

Similar benefits are conferred on constructors for container objects::

    s = Set(word for line in page for word in line.split())
    d = dict((k, func(k)) for k in keylist)

Generator expressions are especially useful with functions like sum(),
min(), and max() that reduce an iterable input to a single value::

    max(len(line) for line in file if line.strip())

Generator expressions also address some examples of functionals coded
with lambda::

    reduce(lambda s, a: s + a.myattr, data, 0)
    reduce(lambda s, a: s + a[3], data, 0)

These simplify to::

    sum(a.myattr for a in data)
    sum(a[3] for a in data)

List comprehensions greatly reduced the need for filter() and map().
Likewise, generator expressions are expected to minimize the need for
itertools.ifilter() and itertools.imap().
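
The Rationale examples above can be checked with a short sketch. It is
written in modern Python 3, not the Python 2.3/2.4 syntax used
elsewhere in this PEP: in Python 3, reduce() lives in functools, the
built-in set() replaces the sets.Set class, and itertools.imap() and
itertools.ifilter() no longer exist because the built-in map() and
filter() became lazy. The sample data, the Item record type, and
func() are hypothetical stand-ins for the names the examples leave
undefined. Each generator-expression result should match its
list-comprehension, reduce(), or map()/filter() counterpart.

```python
# Sketch: the Rationale examples restated in Python 3 (not the PEP's
# Python 2.3/2.4 syntax). Sample data and helper names are hypothetical.
from collections import namedtuple
from functools import reduce  # reduce() is no longer a builtin in Python 3

# Summing squares: the list comprehension materializes all ten values,
# while the generator expression hands them to sum() one at a time.
total_from_list = sum([x*x for x in range(10)])
total_from_gen = sum(x*x for x in range(10))

# Container constructors accept a generator expression directly.
page = ["the cat", "the dog"]  # hypothetical page of text
words = set(word for line in page for word in line.split())


def func(k):
    """Hypothetical helper standing in for the PEP's func()."""
    return k * 10


keylist = [1, 2, 3]
d = dict((k, func(k)) for k in keylist)

# Reduction with a filter clause: skip blank lines, keep the longest length.
lines = ["abc\n", "   \n", "hello\n"]  # hypothetical file contents
longest = max(len(line) for line in lines if line.strip())

# Replacing reduce()/lambda with sum() over a generator expression.
Item = namedtuple("Item", "myattr")  # hypothetical record type
data = [Item(1), Item(2), Item(3)]
total_reduce = reduce(lambda s, a: s + a.myattr, data, 0)
total_sum = sum(a.myattr for a in data)

# Python 3's lazy map()/filter() play the role of itertools.imap()/ifilter();
# a generator expression with an "if" clause expresses the same pipeline.
squares_of_evens_map = list(map(lambda x: x*x,
                                filter(lambda x: x % 2 == 0, range(10))))
squares_of_evens_gen = list(x*x for x in range(10) if x % 2 == 0)

assert total_from_list == total_from_gen
assert total_reduce == total_sum
assert squares_of_evens_map == squares_of_evens_gen
```

In the last pair, the generator expression states the filter condition
and the transformation in one place, doing the job that otherwise takes
a nested map()/filter() (in Python 2.x, imap()/ifilter()) call.
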
In contrast, the utility of other itertools will be enhanced by
generator expressions::

    dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))

Having a syntax similar to list comprehensions also makes it easy to
convert existing code into a generator expression when scaling up an
application.


BDFL Pronouncements
===================

This PEP is ACCEPTED for Py2.4.


The Details
===========

(None of this is exact enough in the eye of a reader from Mars, but I
hope the examples convey the intention well enough for a discussion in
c.l.py. The Python Reference Manual should contain a 100% exact
semantic and syntactic specification.)

1. The semantics of a generator expression are equivalent to creating
   an anonymous generator function and calling it. For example::

       g = (x**2 for x in range(10))
       print g.next()

   is equivalent to::

       def __gen():
           for x in range(10):
               yield x**2
       g = __gen()
       print g.next()

2. The syntax requires that a generator expression always be directly
   inside a set of parentheses and not have a comma on either side.
   With reference to the file Grammar/Grammar in CVS, two rules
   change:

   a) The rule::

          atom: '(' [testlist] ')'

      changes to::

          atom: '(' [listmaker1] ')'

      where listmaker1 is almost the same as listmaker, but only
      allows a single test after 'for' ... 'in'.

   b) The rule for arglist needs similar changes.

   This means that you can write::

       sum(x**2 for x in range(10))

   but you would have to write::

       reduce(operator.add, (x**2 for x in range(10)))

   and also::

       g = (x**2 for x in range(10))

   i.e. if a function call has a single positional argument, it can be
   a generator expression without extra parentheses, but in all other
   cases you have to parenthesize it.

3. The loop variable (if it is a simple variable or a tuple of simple
   variables) is not exposed to the surrounding function. This
   facilitates the implementation and makes typical use cases more
   reliable.
   In some future version of Python, list comprehensions will also
   hide the induction variable from the surrounding code (and, in
   Py2.4, warnings will be issued for code accessing the induction
   variable). For example::

       x = "hello"
       y = list(x for x in "abc")
       print x    # prints "hello", not "c"

4. All free variable bindings are captured at the time the anonymous
   generator function is defined, and passed into it using default
   argument values. For example::

       x = 0
       g = (x for c in "abc")    # x is not the loop variable!
       x = 1
       print g.next()    # prints 0 (captured x), not 1 (current x)

   This behavior of free variables is almost always what you want when
   the generator expression is evaluated at a later point than its
   definition. In fact, to date, no examples have been found of code
   where it would be better to use the execution-time instead of the
   definition-time value of a free variable.

   Note that free variables aren't copied; only their binding is
   captured. They may still change if they are mutable, for example::

       x = []
       g = (x for c in "abc")
       x.append(1)
       print g.next()    # prints [1], not []

5. List comprehensions will remain unchanged. For example::

       [x for x in S]    # This is a list comprehension.
       [(x for x in S)]  # This is a list containing one generator
                         # expression.

   Unfortunately, there is currently a slight syntactic difference.
   The expression::

       [x for x in 1, 2, 3]

   is legal, meaning::

       [x for x in (1, 2, 3)]

   But generator expressions will not allow the former version, so::

       (x for x in 1, 2, 3)

   is illegal.

   The former list comprehension syntax will become illegal in Python
   3.0, and should be deprecated in Python 2.4 and beyond.

   List comprehensions also "leak" their loop variable into the
   surrounding scope. This will also change in Python 3.0, so that the
   semantic definition of a list comprehension in Python 3.0 will be
   equivalent to list(<generator expression>).
   Python 2.4 and beyond should issue a deprecation warning if a list
   comprehension's loop variable has the same name as a variable used
   in the immediately surrounding scope.


Reduction Functions
===================

The utility of generator expressions is greatly enhanced when combined
with reduction functions like sum(), min(), and max(). Separate
proposals are forthcoming that recommend several new accumulation
functions, possibly including: product(), average(), alltrue(),
anytrue(), nlargest(), nsmallest().


Acknowledgements
================

* Raymond Hettinger first proposed the idea of "generator
  comprehensions" in January 2002.

* Peter Norvig resurrected the discussion in his proposal for
  Accumulation Displays.

* Alex Martelli provided critical measurements that proved the
  performance benefits of generator expressions. He also provided
  strong arguments that they were a desirable thing to have.

* Samuele Pedroni provided the example of late binding. Various
  contributors have made arguments for and against late binding.

* Phillip Eby suggested "iterator expressions" as the name.

* Subsequently, Tim Peters suggested the name "generator expressions".


References
==========

.. [1] PEP 202 List Comprehensions
   http://python.sourceforge.net/peps/pep-0202.html

.. [2] PEP 255 Simple Generators
   http://python.sourceforge.net/peps/pep-0255.html

.. [3] Peter Norvig's Accumulation Display Proposal
   http://www.norvig.com/pyacc.html

.. [4] Jeff Epler had worked up a patch demonstrating
   the previously proposed bracket and yield syntax
   http://python.org/sf/795947


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: