PEP: 289
Title: Generator Expressions
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Active
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 22-Oct-2003


Abstract
========

This PEP introduces generator expressions as a high performance,
memory efficient generalization of list comprehensions [1]_ and
generators [2]_.


Rationale
=========

Experience with list comprehensions has shown their widespread
utility throughout Python.  However, many of the use cases do
not need to have a full list created in memory.  Instead, they
only need to iterate over the elements one at a time.

For instance, the following summation code will build a full list of
squares in memory, iterate over those values, and, when the reference
is no longer needed, delete the list::

    sum([x*x for x in range(10)])

Time, clarity, and memory are conserved by using a generator
expression instead::

    sum(x*x for x in range(10))
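
As a rough illustration of the difference (a sketch that assumes a current
Python 3 interpreter; the variable names are illustrative and not part of
this PEP), both forms give the same total, but only the list comprehension
holds every square in memory at once:

```python
# List comprehension: all ten squares exist in memory at the same time.
squares_list = [x * x for x in range(10)]

# Generator expression: squares are produced one at a time, on demand.
squares_gen = (x * x for x in range(10))

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)

# The generator has been consumed by sum(); a second pass yields nothing,
# whereas the list could still be iterated again.
leftover = list(squares_gen)
```

The exhausted generator is the price of not storing the values: code that
needs a second pass over the data still wants a list.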

Similar benefits are conferred on constructors for container objects::

    s = Set(word  for line in page  for word in line.split())
    d = dict( (k, func(v)) for k in keylist)

Generator expressions are especially useful in functions that reduce
an iterable input to a single value::

    sum(len(line)  for line in file  if line.strip())

Accordingly, generator expressions are expected to partially eliminate
the need for reduce(), which is notorious for its lack of clarity.  There
are additional speed and clarity benefits from writing expressions
directly instead of using lambda.
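
The clarity argument can be sketched by writing the same reduction both
ways.  This is only an illustration under assumptions: it targets Python 3,
where reduce() lives in functools, and the ``lines`` list is a hypothetical
stand-in for an open file:

```python
from functools import reduce  # reduce() is a builtin in Python 2

# Hypothetical stand-in for the lines of a file.
lines = ["alpha\n", "\n", "beta gamma\n", "   \n"]

# reduce() with a lambda: filter and accumulation are tangled together.
total_reduce = reduce(
    lambda acc, line: acc + (len(line) if line.strip() else 0), lines, 0
)

# Generator expression: the measured quantity and the filter read in order.
total_genexp = sum(len(line) for line in lines if line.strip())
```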

List comprehensions greatly reduced the need for filter() and map().
Likewise, generator expressions are expected to minimize the need
for itertools.ifilter() and itertools.imap().  In contrast, the
utility of other itertools will be enhanced by generator expressions::

    dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
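
A small sketch of both halves of that claim, assuming Python 3, where the
builtin zip(), map(), and filter() are already lazy and take the place of
itertools.izip(), imap(), and ifilter().  The sample data is hypothetical:

```python
data = [-3, -2, -1, 0, 1, 2, 3]

# map()/filter() style: needs a lambda for the predicate.
odd_abs_sum_functional = sum(map(abs, filter(lambda x: x % 2, data)))

# Generator expression: the same pipeline written as a single expression.
odd_abs_sum_genexp = sum(abs(x) for x in data if x % 2)

# A pairing tool such as zip() is still useful and combines naturally
# with a generator expression.
x_vector = [1, 2, 3]
y_vector = [4, 5, 6]
dotproduct = sum(x * y for x, y in zip(x_vector, y_vector))
```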
    
Having a syntax similar to list comprehensions also makes it easy to
convert existing code into a generator expression when scaling up
an application.


BDFL Pronouncements
===================

The previous version of this PEP was REJECTED.  The bracketed yield
syntax left something to be desired; the performance gains had not been
demonstrated; and the range of use cases had not been shown.  After
much discussion on the python-dev list, the PEP has been resurrected
in its present form.  The impetus for the discussion was an innovative
proposal from Peter Norvig [3]_.


The Gory Details
================

1.  The semantics of a generator expression are equivalent to creating
an anonymous generator function and calling it.  There's still discussion
about whether that generator function should copy the current value of all
free variables into default arguments.

2. The syntax requires that a generator expression always needs to be inside
a set of parentheses and cannot have a comma on either side.  Unfortunately,
this is different from list comprehensions.  While [1, x for x in R] is
illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1,2,3)].
With reference to the file Grammar/Grammar in CVS, two rules change:

    a) The rule::

          atom: '(' [testlist] ')'

       changes to::

          atom: '(' [listmaker1] ')'

       where listmaker1 is almost the same as listmaker, but only allows
       a single test after 'for' ... 'in'.

    b)  The rule for arglist needs similar changes.


3. The loop variable is not exposed to the surrounding function.  This
facilitates the implementation and makes typical use cases more reliable.
In some future version of Python, list comprehensions will also hide the
induction variable from the surrounding code (and, in Py2.4, warnings
will be issued for code accessing the induction variable).

4. There is still discussion about whether variables referenced in generator
expressions will exhibit late binding just like other Python code.  In the
following example, the iterator runs *after* the value of y is set to one::

    def h():
        y = 0
        l = [1,2]
        def gen(S):
            for x in S:
                yield x+y
        it = gen(l)
        y = 1
        for v in it:
            print v

5. List comprehensions will remain unchanged::

    [x for x in S]    # This is a list comprehension.
    [(x for x in S)]  # This is a list containing one generator expression.
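
The points above about generator-function equivalence, the hidden loop
variable, and binding time can be sketched as follows.  This is an
illustration under assumptions, not a specification: it describes the
behaviour observed in current Python 3 interpreters, which postdate this
draft, and every function name in it is hypothetical:

```python
def squares_genfunc(iterable):
    # Explicit generator function playing the role of the anonymous one
    # behind (x * x for x in iterable).
    for x in iterable:
        yield x * x

data = [1, 2, 3]
from_genexp = list(x * x for x in data)
from_genfunc = list(squares_genfunc(data))

def loop_variable_is_hidden():
    # The loop variable i belongs to the generator expression only.
    total = sum(i for i in range(3))
    return total == 3 and "i" not in locals()

def late_binding():
    y = 0
    gen = (x + y for x in [1, 2])  # y is looked up when the generator runs
    y = 1
    return list(gen)

def early_binding_via_default():
    y = 0
    def gen(S, y=y):  # default argument captures the value of y right now
        for x in S:
            yield x + y
    it = gen([1, 2])
    y = 1  # has no effect on the generator's private copy
    return list(it)
```

Here late_binding() sees the rebinding of y, while the default-argument
variant keeps the value that y had when the generator function was defined.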


Reduction Functions
===================

The utility of generator expressions is greatly enhanced when combined
with appropriate reduction functions like sum(), min(), and max(). I
propose creating a set of high speed reduction functions designed to tap the
power of generator expressions and replace the most common uses of reduce()::

    import bisect
    import operator

    def xorsum(it):
        return reduce(operator.xor, it, 0)

    def product(it):
        return reduce(operator.mul, it, 1)

    def anytrue(it):
        for elem in it:
            if elem:
                return True
        return False

    def alltrue(it):
        for elem in it:
            if not elem:
                return False
        return True

    def horner(it, x):
        'horner([6,3,4], 5) evaluates 6*x**2 + 3*x + 4 at x=5'
        cum = 0.0
        for c in it:
            cum = cum*x + c
        return cum

    def mean(it):
        data = list(it)
        return sum(data) / len(data)

    def smallest(it, siz=1):
        result = []
        for elem in it:
            if len(result) < siz:
                bisect.insort_left(result, elem)
            elif elem < result[-1]:
                result.pop()
                bisect.insort_left(result, elem)
        return result

    def largest(it, siz=1):
        result = []
        for elem in it:
            if len(result) < siz:
                bisect.insort_left(result, elem)
            elif elem > result[0]:
                result.pop(0)
                bisect.insort_left(result, elem)
        result.reverse()            
        return result
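
A usage sketch for a few of these functions, fed directly with generator
expressions.  It assumes Python 3 (so reduce() is imported from functools),
has anytrue() and alltrue() test the truth of each element, and uses a
hypothetical ``words`` list as input:

```python
from functools import reduce
import operator

def product(it):
    return reduce(operator.mul, it, 1)

def anytrue(it):
    # True as soon as one element is true; stops consuming the iterator.
    for elem in it:
        if elem:
            return True
    return False

def alltrue(it):
    # False as soon as one element is false.
    for elem in it:
        if not elem:
            return False
    return True

def horner(it, x):
    # Evaluate a polynomial given its coefficients, highest power first.
    cum = 0.0
    for c in it:
        cum = cum * x + c
    return cum

words = ["generator", "expression", "pep"]  # hypothetical sample input

length_product = product(len(w) for w in words)
has_short_word = anytrue(len(w) < 4 for w in words)
all_lowercase = alltrue(w.islower() for w in words)
poly_value = horner((c for c in [6, 3, 4]), 5)
```

No intermediate list is built in any of the four calls; each function pulls
values from the generator expression as it needs them.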

Notes on reduce()
=================

Reduce typically has three types of use cases:

1) Common reduction functions applied directly to elements in a sequence.
This use case is addressed by the sum(), min(), max(), and the additional
functions listed above.

2) Reduce is often used with lambda when the data needs to be extracted from
complex sequence elements.  For example::

    reduce(lambda sum, x: sum + x.myattr, data, 0)
    reduce(lambda prod, x:  prod * x[3], data, 1)

In concert with reduction functions, generator expressions completely
fulfill these use cases::

    sum(x.myattr for x in data)
    product(x[3] for x in data)

3) On rare occasions, the reduction function is non-standard and requires
custom coding::

    reduce(lambda cum, c: (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)], data, -1)

Because a complex lambda is required, this use case becomes clearer and
faster when coded directly as a for-loop::

    cum = -1
    for c in data:
        cum = (cum >> 8) ^ crc32[ord(c) ^ (cum & 0x00ff)]

In conclusion, after adding generator expressions and a set of common reduction
functions, few, if any, cases remain for reduce().
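
The third use case can be made concrete with a runnable sketch.  Several
details here are assumptions rather than anything stated above: the
``crc32`` table is taken to be the standard reflected CRC-32 table, the
code targets Python 3 (where iterating over bytes yields integers, so ord()
is not needed), and the start value is written as 0xFFFFFFFF instead of -1
because Python integers are unbounded:

```python
from functools import reduce
import zlib

def make_crc32_table():
    # Standard reflected CRC-32 table (polynomial 0xEDB88320).  This is an
    # assumption; the text does not say how its crc32 table is built.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

crc32 = make_crc32_table()
data = b"123456789"  # hypothetical input

# reduce() with a lambda: compact but hard to read.
crc_reduce = reduce(
    lambda cum, c: (cum >> 8) ^ crc32[c ^ (cum & 0x00FF)], data, 0xFFFFFFFF
)

# The same computation as a plain for-loop.
cum = 0xFFFFFFFF
for c in data:
    cum = (cum >> 8) ^ crc32[c ^ (cum & 0x00FF)]
crc_loop = cum

# With the conventional final inversion, both agree with zlib.crc32().
matches_zlib = (crc_loop ^ 0xFFFFFFFF) == zlib.crc32(data)
```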


Acknowledgements
================

* Raymond Hettinger first proposed the idea of "generator comprehensions"
  in January 2002.
  
* Peter Norvig resurrected the discussion in his proposal for
  Accumulation Displays [3]_.

* Alex Martelli provided critical measurements that proved the performance
  benefits of generator expressions.  He also provided strong arguments
  that they were a desirable thing to have.

* Samuele Pedroni provided the example of late binding.
  Various contributors have made arguments for and against late binding.

* Phillip Eby suggested "iterator expressions" as the name.

* Subsequently, Tim Peters suggested the name "generator expressions".


References
==========

.. [1] PEP 202 List Comprehensions
       http://python.sourceforge.net/peps/pep-0202.html

.. [2] PEP 255 Simple Generators
       http://python.sourceforge.net/peps/pep-0255.html

.. [3] Peter Norvig's Accumulation Display Proposal
       http://www.norvig.com/pyacc.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: