Raymond Hettinger's latest revision. Now marked Accepted.
parent 529701291f
commit 6913aef3e1
383 pep-0279.txt
@@ -3,7 +3,7 @@ Title: Enhanced Generators
 Version: $Revision$
 Last-Modified: $Date$
 Author: python@rcn.com (Raymond D. Hettinger)
-Status: Draft
+Status: Accepted
 Type: Standards Track
 Created: 30-Jan-2002
 Python-Version: 2.3
|
@ -12,103 +12,60 @@ Post-History:
|
|||
|
||||
Abstract
|
||||
|
||||
This PEP introduces two orthogonal (not mutually exclusive) ideas
|
||||
for enhancing the generators introduced in Python version 2.2 [1].
|
||||
The goal is to increase the convenience, utility, and power
|
||||
of generators.
|
||||
This PEP introduces a new built-in function, enumerate() to
|
||||
simplify a commonly used looping idiom. It provides all iterable
|
||||
collections with the same advantage that iteritems() affords to
|
||||
dictionaries -- a compact, readable, reliable index notation.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
Python 2.2 introduced the concept of an iterable interface as proposed
|
||||
in PEP 234 [4]. The iter() factory function was provided as common
|
||||
calling convention and deep changes were made to use iterators as a
|
||||
unifying theme throughout Python. The unification came in the form of
|
||||
establishing a common iterable interface for mappings, sequences,
|
||||
and file objects.
|
||||
Python 2.2 introduced the concept of an iterable interface as
|
||||
proposed in PEP 234 [3]. The iter() factory function was provided
|
||||
as common calling convention and deep changes were made to use
|
||||
iterators as a unifying theme throughout Python. The unification
|
||||
came in the form of establishing a common iterable interface for
|
||||
mappings, sequences, and file objects.
|
||||
|
||||
Generators, as proposed in PEP 255 [1], were introduced as a means for
|
||||
making it easier to create iterators, especially ones with complex
|
||||
internal execution or variable states. When I created new programs,
|
||||
generators were often the tool of choice for creating an iterator.
|
||||
Generators, as proposed in PEP 255 [1], were introduced as a means
|
||||
for making it easier to create iterators, especially ones with
|
||||
complex internal execution or variable states. The availability
|
||||
of generators makes it possible to improve on the loop counter
|
||||
ideas in PEP 212 [2]. Those ideas provided a clean syntax for
|
||||
iteration with indices and values, but did not apply to all
|
||||
iterable objects. Also, that approach did not have the memory
|
||||
friendly benefit provided by generators which do not evaluate the
|
||||
entire sequence all at once.
|
||||
|
||||
However, when updating existing programs, I found that the tool had
|
||||
another use, one that improved program function as well as structure.
|
||||
Some programs exhibited a pattern of creating large lists and then
|
||||
looping over them. As data sizes increased, the programs encountered
|
||||
scalability limitations owing to excessive memory consumption (and
|
||||
malloc time) for the intermediate lists. Generators were found to be
|
||||
directly substitutable for the lists while eliminating the memory
|
||||
issues through lazy evaluation a.k.a. just in time manufacturing.
|
||||
|
||||
Python itself encountered similar issues. As a result, xrange() and
|
||||
xreadlines() were introduced. And, in the case of file objects and
|
||||
mappings, lazy evaluation became the norm. Generators provide a tool
|
||||
to program memory conserving for-loops whenever complete evaluation is
|
||||
not desired because of memory restrictions or availability of data.
|
||||
|
||||
The next steps in the evolution of generators are:
|
||||
|
||||
1. Add a new builtin function, iterindexed() which was made possible
|
||||
once iterators and generators became available. It provides
|
||||
all iterables with the same advantage that iteritems() affords
|
||||
to dictionaries -- a compact, readable, reliable index notation.
|
||||
|
||||
2. Establish a generator alternative to list comprehensions [3]
|
||||
that provides a simple way to convert a list comprehension into
|
||||
a generator whenever memory issues arise.
|
||||
|
||||
All of the suggestions are designed to take advantage of the
|
||||
existing implementation and require little additional effort to
|
||||
incorporate. Each is backward compatible and requires no new
|
||||
keywords. The two generator tools go into Python 2.3 when
|
||||
generators become final and are not imported from __future__.
|
||||
The new proposal is to add a built-in function, enumerate() which
|
||||
was made possible once iterators and generators became available.
|
||||
It provides all iterables with the same advantage that iteritems()
|
||||
affords to dictionaries -- a compact, readable, reliable index
|
||||
notation. Like zip(), it is expected to become a commonly used
|
||||
looping idiom.
|
||||
|
||||
This suggestion is designed to take advantage of the existing
|
||||
implementation and require little additional effort to
|
||||
incorporate. It is backwards compatible and requires no new
|
||||
keywords. The proposal will go into Python 2.3 when generators
|
||||
become final and are not imported from __future__.
|
||||
|
||||
|
||||
BDFL Pronouncements
|
||||
|
||||
1. The new built-in function is ACCEPTED. There needs to be further
|
||||
discussion on the best name for the function.
|
||||
|
||||
2. Generator comprehensions are REJECTED. The rationale is that
|
||||
the benefits are marginal since generators can already be coded directly
|
||||
and the costs are high because implementation and maintenance require
|
||||
major efforts with the parser.
|
||||
The new built-in function is ACCEPTED.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
Specification for a new built-in:
|
||||
|
||||
There is not currently a CPython implementation; however, a simulation
|
||||
module written in pure Python is available on SourceForge [5]. The
|
||||
simulation covers every feature proposed in this PEP and is meant
|
||||
to allow direct experimentation with the proposals.
|
||||
|
||||
There is also a module [6] with working source code for all of the
|
||||
examples used in this PEP. It serves as a test suite for the simulator
|
||||
and it documents how each of the new features works in practice.
|
||||
|
||||
The authors and implementers of PEP 255 [1] were contacted to provide
|
||||
their assessment of whether these enhancements were going to be
|
||||
straight-forward to implement and require only minor modification
|
||||
of the existing generator code. Neil felt the assertion was correct.
|
||||
Ka-Ping thought so also. GvR said he could believe that it was true.
|
||||
Tim did not have an opportunity to give an assessment.
|
||||
|
||||
|
||||
|
||||
Specification for a new builtin [ACCEPTED PROPOSAL]:
|
||||
|
||||
|
||||
-    def iterindexed(collection):
-        'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
+    def enumerate(collection):
+        'Generates an indexed series:  (0,coll[0]), (1,coll[1]) ...'
         i = 0
         it = iter(collection)
         while 1:
             yield (i, it.next())
             i += 1
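
The specification above can be sketched in current Python syntax as follows. This is an illustration only, not the reference implementation: the name enumerate_sketch is hypothetical, and the for-loop stands in for the explicit it.next() call so that the sketch also runs on Python 3.

```python
def enumerate_sketch(collection):
    """Yield (index, item) pairs for any iterable, counting from zero."""
    i = 0
    for item in collection:  # works for lists, strings, files, generators
        yield (i, item)
        i += 1


pairs = list(enumerate_sketch("abc"))
```

Because the function is a generator, the pairs are produced one at a time; the call to list() here only collects them for inspection.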
|
||||
|
||||
|
||||
Note A: PEP 212 Loop Counter Iteration [2] discussed several
|
||||
proposals for achieving indexing. Some of the proposals only work
|
||||
for lists unlike the above function which works for any generator,
|
||||
|
@ -116,231 +73,119 @@ Specification for a new builtin [ACCEPTED PROPOSAL]:
|
|||
presented and evaluated in the world prior to Python 2.2 which did
|
||||
not include generators. As a result, the non-generator version in
|
||||
PEP 212 had the disadvantage of consuming memory with a giant list
|
||||
of tuples. The generator version presented here is fast and light,
|
||||
works with all iterables, and allows users to abandon the sequence
|
||||
in mid-stream with no loss of computation effort.
|
||||
of tuples. The generator version presented here is fast and
|
||||
light, works with all iterables, and allows users to abandon the
|
||||
sequence in mid-stream with no loss of computation effort.
|
||||
|
||||
There are other PEPs which touch on related issues: integer iterators,
|
||||
integer for-loops, and one for modifying the arguments to range and
|
||||
xrange. The iterindexed() proposal does not preclude the other proposals
|
||||
and it still meets an important need even if those are adopted -- the need
|
||||
to count items in any iterable. The other proposals give a means of
|
||||
producing an index but not the corresponding value. This is especially
|
||||
problematic if a sequence is given which doesn't support random access
|
||||
such as a file object, generator, or sequence defined with __getitem__.
|
||||
There are other PEPs which touch on related issues: integer
|
||||
iterators, integer for-loops, and one for modifying the arguments
|
||||
to range and xrange. The enumerate() proposal does not preclude
|
||||
the other proposals and it still meets an important need even if
|
||||
those are adopted -- the need to count items in any iterable. The
|
||||
other proposals give a means of producing an index but not the
|
||||
corresponding value. This is especially problematic if a sequence
|
||||
is given which doesn't support random access such as a file
|
||||
object, generator, or sequence defined with __getitem__.
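
A small sketch of the point made in Note A, assuming a hypothetical endless generator named squares(). It has no length and no random access, yet enumerate() still pairs each value with its position, and the loop can stop early without computing the remaining items.

```python
import itertools


def squares():
    """Hypothetical endless generator: 0, 1, 4, 9, ..."""
    for n in itertools.count():
        yield n * n


found = None
for index, value in enumerate(squares()):
    if value > 50:
        found = (index, value)
        break  # abandon the stream; later squares are never computed
```

No intermediate list of tuples is built, which is the memory benefit the note contrasts with the non-generator version in PEP 212.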
|
||||
|
||||
Note B: Almost all of the PEP reviewers welcomed the function but
|
||||
were divided as to whether there should be any built-ins. The
|
||||
main argument for a separate module was to slow the rate of
|
||||
language inflation. The main argument for a built-in was that the
|
||||
function is destined to be part of a core programming style,
|
||||
applicable to any object with an iterable interface. Just as
|
||||
zip() solves the problem of looping over multiple sequences, the
|
||||
enumerate() function solves the loop counter problem.
|
||||
|
||||
Note B: Almost all of the PEP reviewers welcomed the function but were
|
||||
divided as to whether there should be any builtins. The main argument
|
||||
for a separate module was to slow the rate of language inflation. The
|
||||
main argument for a builtin was that the function is destined to be
|
||||
part of a core programming style, applicable to any object with an
|
||||
iterable interface. Just as zip() solves the problem of looping
|
||||
over multiple sequences, the iterindexed() function solves the loop
|
||||
counter problem.
|
||||
If only one built-in is allowed, then enumerate() is the most
|
||||
important general purpose tool, solving the broadest class of
|
||||
problems while improving program brevity, clarity and reliability.
|
||||
|
||||
If only one builtin is allowed, then iterindexed() is the most important
|
||||
general purpose tool, solving the broadest class of problems while
|
||||
improving program brevity, clarity and reliability.
|
||||
Note C: Various alternative names were discussed:
|
||||
|
||||
|
||||
Note C: Various alternative names have been proposed:
|
||||
|
||||
iterindexed()-- five syllables is a mouthfull
|
||||
iterindexed()-- five syllables is a mouthful
|
||||
index() -- nice verb but could be confused with the .index() method
|
||||
indexed() -- widely liked however adjectives should be avoided
|
||||
indexer() -- noun did not read well in a for-loop
|
||||
count() -- direct and explicit but often used in other contexts
|
||||
itercount() -- direct, explicit and hated by more than one person
|
||||
enumerate() -- a contender but doesn't mention iteration or indices
|
||||
iteritems() -- conflicts with key:value concept for dictionaries
|
||||
itemize() -- confusing because amap.items() != list(itemize(amap))
|
||||
enum() -- pithy; less clear than enumerate; too similar to enum
|
||||
in other languages where it has a different meaning
|
||||
|
||||
All of the names involving 'count' had the further disadvantage of
|
||||
implying that the count would begin from one instead of zero.
|
||||
|
||||
Note D: This function was originally proposed with optional start and
|
||||
stop arguments. GvR pointed out that the function call
|
||||
iterindexed(seqn,4,6) had an alternate, plausible interpretation as a
|
||||
slice that would return the fourth and fifth elements of the sequence.
|
||||
To avoid the ambiguity, the optional arguments were dropped eventhough
|
||||
it meant losing flexibity as a loop counter. That flexiblity was most
|
||||
important for the common case of counting from one, as in:
|
||||
for linenum, line in iterindexed(source): print linenum, line
|
||||
All of the names involving 'index' clashed with usage in database
|
||||
languages where indexing implies a sorting operation rather than
|
||||
linear sequencing.
|
||||
|
||||
Note D: This function was originally proposed with optional start
|
||||
and stop arguments. GvR pointed out that the function call
|
||||
enumerate(seqn,4,6) had an alternate, plausible interpretation as
|
||||
a slice that would return the fourth and fifth elements of the
|
||||
sequence. To avoid the ambiguity, the optional arguments were
|
||||
dropped even though it meant losing flexibility as a loop counter.
|
||||
That flexibility was most important for the common case of
|
||||
counting from one, as in:
|
||||
|
||||
for linenum, line in enumerate(source,1): print linenum, line
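
With the one-argument form that was accepted, counting from one can still be had by adjusting the index inside the loop. The sketch below assumes a hypothetical in-memory stand-in for the source file; later Python releases also accept a start argument to enumerate(), so the second form is shown only for comparison.

```python
import io

# Hypothetical stand-in for an open text file.
source = io.StringIO("alpha\nbeta\ngamma\n")

numbered = []
for i, line in enumerate(source):
    linenum = i + 1  # shift the zero-based index to a one-based line number
    numbered.append((linenum, line.rstrip("\n")))

# Equivalent result using the start argument available in later releases.
with_start = list(enumerate(["alpha", "beta", "gamma"], 1))
```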
|
||||
|
||||
Comments from GvR: filter and map should die and be subsumed into list
|
||||
comprehensions, not grow more variants. I'd rather introduce builtins
|
||||
that do iterator algebra (e.g. the iterzip that I've often used as
|
||||
an example).
|
||||
comprehensions, not grow more variants. I'd rather introduce
|
||||
built-ins that do iterator algebra (e.g. the iterzip that I've
|
||||
often used as an example).
|
||||
|
||||
I like the idea of having some way to iterate over a sequence and
|
||||
its index set in parallel. It's fine for this to be a builtin.
|
||||
I like the idea of having some way to iterate over a sequence
|
||||
and its index set in parallel. It's fine for this to be a
|
||||
built-in.
|
||||
|
||||
I don't like the name "indexed"; adjectives do not make good
|
||||
function names. Maybe iterindexed()?
|
||||
|
||||
Comments from Ka-Ping Yee: I'm also quite happy with everything you
|
||||
proposed ... and the extra builtins (really 'indexed' in particular)
|
||||
are things I have wanted for a long time.
|
||||
proposed ... and the extra built-ins (really 'indexed' in
|
||||
particular) are things I have wanted for a long time.
|
||||
|
||||
Comments from Neil Schemenauer: The new builtins sound okay. Guido
|
||||
may be concerned with increasing the number of builtins too much. You
|
||||
might be better off selling them as part of a module. If you use a
|
||||
module then you can add lots of useful functions (Haskell has lots of
|
||||
them that we could steal).
|
||||
Comments from Neil Schemenauer: The new built-ins sound okay. Guido
|
||||
may be concerned with increasing the number of built-ins too
|
||||
much. You might be better off selling them as part of a
|
||||
module. If you use a module then you can add lots of useful
|
||||
functions (Haskell has lots of them that we could steal).
|
||||
|
||||
Comments from Magnus Lie Hetland: I think indexed would be a useful and
|
||||
natural built-in function. I would certainly use it a lot.
|
||||
I like indexed() a lot; +1. I'm quite happy to have it make PEP 281
|
||||
obsolete. Adding a separate module for iterator utilities seems like
|
||||
a good idea.
|
||||
natural built-in function. I would certainly use it a lot. I
|
||||
like indexed() a lot; +1. I'm quite happy to have it make PEP
|
||||
281 obsolete. Adding a separate module for iterator utilities
|
||||
seems like a good idea.
|
||||
|
||||
Comments from the Community: The response to the iterindexed() proposal
|
||||
has been close to 100% favorable. Almost everyone loves the idea.
|
||||
Comments from the Community: The response to the enumerate() proposal
|
||||
has been close to 100% favorable. Almost everyone loves the
|
||||
idea.
|
||||
|
||||
Author response: Prior to these comments, four builtins were proposed.
|
||||
After the comments, xmap xfilter and xzip were withdrawn. The one
|
||||
that remains is vital for the language and is proposed by itself.
|
||||
Indexed() is trivially easy to implement and can be documented in
|
||||
minutes. More importantly, it is useful in everyday programming
|
||||
which does not otherwise involve explicit use of generators.
|
||||
Author response: Prior to these comments, four built-ins were proposed.
|
||||
After the comments, xmap xfilter and xzip were withdrawn. The
|
||||
one that remains is vital for the language and is proposed by
|
||||
itself. Indexed() is trivially easy to implement and can be
|
||||
documented in minutes. More importantly, it is useful in
|
||||
everyday programming which does not otherwise involve explicit
|
||||
use of generators.
|
||||
|
||||
Though withdrawn from the proposal, I still secretly covet xzip()
|
||||
a.k.a. iterzip() but think that it will happen on its own someday.
|
||||
|
||||
|
||||
|
||||
Specification for Generator Comprehensions [REJECTED PROPOSAL]:
|
||||
|
||||
If a list comprehension starts with a 'yield' keyword, then
|
||||
express the comprehension with a generator. For example:
|
||||
|
||||
g = [yield (len(line),line) for line in file if len(line)>5]
|
||||
|
||||
This would be implemented as if it had been written:
|
||||
|
||||
def __temp(self):
|
||||
for line in file:
|
||||
if len(line) > 5:
|
||||
yield (len(line), line)
|
||||
g = __temp()
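
The expansion above can be written today as an ordinary generator function, which is the "helper function" alternative mentioned in the reviewer comments. This is a hedged sketch: the name long_lines and its minimum parameter are hypothetical, and a list of strings stands in for the file.

```python
def long_lines(lines, minimum=5):
    """Lazily yield (length, line) for each line longer than minimum."""
    for line in lines:
        if len(line) > minimum:
            yield (len(line), line)


g = long_lines(["short", "a longer line", "tiny", "another long one"])
first = next(g)    # only the first qualifying line has been examined so far
rest = list(g)     # consume whatever remains
```

Nothing is computed until the generator is advanced, which is the lazy behavior the rejected syntax was meant to provide.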
|
||||
|
||||
|
||||
Note A: There is some discussion about whether the enclosing brackets
|
||||
should be part of the syntax for generator comprehensions. On the
|
||||
plus side, it neatly parallels list comprehensions and would be
|
||||
immediately recognizable as a similar form with similar internal
|
||||
syntax (taking maximum advantage of what people already know).
|
||||
More importantly, it sets off the generator comprehension from the
|
||||
rest of the function so as to not suggest that the enclosing
|
||||
function is a generator (currently the only cue that a function is
|
||||
really a generator is the presence of the yield keyword). On the
|
||||
minus side, the brackets may falsely suggest that the whole
|
||||
expression returns a list. Most of the feedback received to date
|
||||
indicates that brackets are helpful and not misleading. Unfortunately,
|
||||
the one dissent is from GvR.
|
||||
|
||||
A key advantage of the generator comprehension syntax is that it
|
||||
makes it trivially easy to transform existing list comprehension
|
||||
code to a generator by adding yield. Likewise, it can be converted
|
||||
back to a list by deleting yield. This makes it easy to scale-up
|
||||
programs from small datasets to ones large enough to warrant
|
||||
just in time evaluation.
|
||||
|
||||
|
||||
Note B: List comprehensions expose their looping variable and
|
||||
leave that variable in the enclosing scope. The code, [str(i) for
|
||||
i in range(8)] leaves 'i' set to 7 in the scope where the
|
||||
comprehension appears. This behavior is by design and reflects an
|
||||
intent to duplicate the result of coding a for-loop instead of a
|
||||
list comprehension. Further, the variable 'i' is in a defined and
|
||||
potentially useful state on the line immediately following the
|
||||
list comprehension.
|
||||
|
||||
In contrast, generator comprehensions do not expose the looping
|
||||
variable to the enclosing scope. The code, [yield str(i) for i in
|
||||
range(8)] leaves 'i' untouched in the scope where the
|
||||
comprehension appears. This is also by design and reflects an
|
||||
intent to duplicate the result of coding a generator directly
|
||||
instead of a generator comprehension. Further, the variable 'i'
|
||||
is not in a defined state on the line immediately following the
|
||||
list comprehension. It does not come into existence until
|
||||
iteration starts (possibly never).
|
||||
|
||||
|
||||
Comments from GvR: Cute hack, but I think the use of the [] syntax
|
||||
strongly suggests that it would return a list, not an iterator. I
|
||||
also think that this is trying to turn Python into a functional
|
||||
language, where most algorithms use lazy infinite sequences, and I
|
||||
just don't think that's where its future lies.
|
||||
|
||||
I don't think it's worth the trouble. I expect it will take a lot
|
||||
of work to hack it into the code generator: it has to create a
|
||||
separate code object in order to be a generator. List
|
||||
comprehensions are inlined, so I expect that the generator
|
||||
comprehension code generator can't share much with the list
|
||||
comprehension code generator. And this for something that's not
|
||||
that common and easily done by writing a 2-line helper function.
|
||||
IOW the ROI isn't high enough.
|
||||
|
||||
Comments from Ka-Ping Yee: I am very happy with the things you have
|
||||
proposed in this PEP. I feel quite positive about generator
|
||||
comprehensions and have no reservations. So a +1 on that.
|
||||
|
||||
Comments from Neil Schemenauer: I'm -0 on the generator list
|
||||
comprehensions. They don't seem to add much. You could easily use
|
||||
a nested generator to do the same thing. They smell like lambda.
|
||||
|
||||
Comments for Magnus Lie Hetland: Generator comprehensions seem mildly
|
||||
useful, but I vote +0. Defining a separate, named generator would
|
||||
probably be my preference. On the other hand, I do see the advantage
|
||||
of "scaling up" from list comprehensions.
|
||||
|
||||
Comments from the Community: The response to the generator comprehension
|
||||
proposal has been mostly favorable. There were some 0 votes from
|
||||
people who didn't see a real need or who were not energized by the
|
||||
idea. Some of the 0 votes were tempered by comments that the reviewer
|
||||
did not even like list comprehensions or did not have any use for
|
||||
generators in any form. The +1 votes outnumbered the 0 votes by about
|
||||
two to one.
|
||||
|
||||
Author response: I've studied several syntactical variations and
|
||||
concluded that the brackets are essential for:
|
||||
- teachability (it's like a list comprehension)
|
||||
- set-off (yield applies to the comprehension not the enclosing
|
||||
function)
|
||||
- substitutability (list comprehensions can be made lazy just by
|
||||
adding yield)
|
||||
|
||||
What I like best about generator comprehensions is that I can design
|
||||
using list comprehensions and then easily switch to a generator (by
|
||||
adding yield) in response to scalability requirements (when the list
|
||||
comprehension produces too large of an intermediate result). Had
|
||||
generators already been in-place when list comprehensions were
|
||||
accepted, the yield option might have been incorporated from the
|
||||
start. For certain, the mathematical style notation is explicit and
|
||||
readable as compared to a separate function definition with an
|
||||
embedded yield.
|
||||
Though withdrawn from the proposal, I still secretly covet
|
||||
xzip() a.k.a. iterzip() but think that it will happen on its
|
||||
own someday.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] PEP 255 Simple Generators
|
||||
http://www.python.org/peps/pep-0255.html
|
||||
http://python.sourceforge.net/peps/pep-0255.html
|
||||
|
||||
[2] PEP 212 Loop Counter Iteration
|
||||
http://www.python.org/peps/pep-0212.html
|
||||
|
||||
[3] PEP 202 List Comprehensions
|
||||
http://www.python.org/peps/pep-0202.html
|
||||
|
||||
[4] PEP 234 Iterators
|
||||
http://www.python.org/peps/pep-0234.html
|
||||
|
||||
[5] A pure Python simulation of every feature in this PEP is at:
|
||||
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752
|
||||
|
||||
[6] The full, working source code for each of the examples in this PEP
|
||||
along with other examples and tests is at:
|
||||
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756
|
||||
http://python.sourceforge.net/peps/pep-0212.html
|
||||
|
||||
[3] PEP 234 Iterators
|
||||
http://python.sourceforge.net/peps/pep-0234.html
|
||||
|
||||
|
||||
Copyright
|
||||
|
@ -354,3 +199,5 @@ mode: indented-text
|
|||
indent-tabs-mode: nil
|
||||
fill-column: 70
|
||||
End:
|
||||
|
||||
|
||||
|
|