PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: python@rcn.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP introduces four orthogonal (not mutually exclusive) ideas for enhancing the generators introduced in Python version 2.2 [1]. The goal is to increase the convenience, utility, and power of generators.


Rationale

Python 2.2 introduced the concept of an iterable interface as proposed in PEP 234 [4]. The iter() factory function was provided as a common calling convention, and deep changes were made to use iterators as a unifying theme throughout Python. The unification came in the form of establishing a common iterable interface for mappings, sequences, and file objects.

Generators, as proposed in PEP 255 [1], were introduced as a means of making it easier to create iterators, especially ones with complex internal execution or variable state. When I created new programs, generators were often the tool of choice for creating an iterator.

However, when updating existing programs, I found that the tool had another use, one that improved program function as well as structure. Those programs exhibited a pattern of creating large lists and then looping over them. As data sizes increased, the programs encountered scalability limitations owing to excessive memory consumption (and malloc time) for the intermediate lists. Generators were found to be directly substitutable for the lists while eliminating the memory issues through lazy evaluation, a.k.a. just-in-time manufacturing.

Python itself encountered similar issues. As a result, xrange() and xreadlines() were introduced. And, in the case of file objects and mappings, lazy evaluation became the norm. Generators provide a tool for writing memory-conserving for-loops whenever complete evaluation is not desired because of memory restrictions or the availability of data.
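The memory contrast described above can be sketched directly. This is an illustrative comparison only (the names and sizes are arbitrary), written in modern Python spelling so it runs as-is:

```python
import sys

# Full evaluation: every intermediate value is held in memory at once.
squares_list = [n * n for n in range(10000)]

# Lazy evaluation: the generator object holds only its suspended frame,
# regardless of how many values it will eventually produce.
def squares_gen():
    for n in range(10000):
        yield n * n

g = squares_gen()
assert sys.getsizeof(squares_list) > sys.getsizeof(g)
assert next(g) == 0 and next(g) == 1
```

The generator is directly substitutable for the list in a for-loop, which is the substitution pattern the paragraph above describes.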
The next steps in the evolution of generators are:

1. Add a new builtin function, indexed(), which was made possible once iterators and generators became available. It provides all iterables with the same advantage that iteritems() affords to dictionaries -- a compact, readable, reliable index notation.

2. Establish a generator alternative to list comprehensions [3] to provide a simple way to convert a list comprehension into a generator whenever memory issues arise.

3. Add a generator method to enable exceptions to be passed to a generator. Currently, there is no clean way to trigger an exception from outside the generator. Generator exception passing also helps mitigate the try/finally prohibition for generators.

4. [Proposal 4 is now deferred until Python 2.4] Extend the syntax of the 'yield' keyword to enable generator parameter passing. The resulting increase in power simplifies the creation of consumer streams, which have a complex execution state and/or variable state.

All of the suggestions are designed to take advantage of the existing implementation and require little additional effort to incorporate. Each is backward compatible and requires no new keywords. The first three generator tools go into Python 2.3 when generators become final and are no longer imported from __future__. The fourth proposal should be considered deferred and will be proposed for Python 2.4 after the Python community has more experience with generators.


Reference Implementation

There is not currently a CPython implementation; however, a simulation module written in pure Python is available on SourceForge [7]. The simulation covers every feature proposed in this PEP and is meant to allow direct experimentation with the proposals.

There is also a module [8] with working source code for all of the examples used in this PEP. It serves as a test suite for the simulator and documents how each of the new features works in practice.
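The indexed() builtin from item 1 can be sketched in pure Python. This is a stand-in, not the proposed C implementation; it is spelled with the modern next() function (and a StopIteration guard that PEP 479 later made necessary -- the 2002-era code simply let the exception pass through):

```python
def indexed(collection, cnt=0, limit=None):
    'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
    gen = iter(collection)
    while limit is None or cnt < limit:
        try:
            item = next(gen)      # modern spelling of gen.next()
        except StopIteration:     # guard required under PEP 479
            return
        yield (cnt, item)
        cnt += 1

# The compact, readable index notation it affords to any iterable:
assert list(indexed('abc')) == [(0, 'a'), (1, 'b'), (2, 'c')]
assert list(indexed('abcdef', cnt=1, limit=3)) == [(1, 'a'), (2, 'b')]
```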
The authors and implementers of PEP 255 [1] were contacted to assess whether these enhancements would be straightforward to implement and require only minor modification of the existing generator code. Neil felt the assertion was correct. Ka-Ping thought so also. GvR said he could believe that it was true. Tim did not have an opportunity to give an assessment.


Specification for a new builtin:

    def indexed(collection, cnt=0, limit=None):
        'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt < limit:
            yield (cnt, gen.next())
            cnt += 1


Specification for Generator Comprehensions:

    g = [yield (len(line), line) for line in file if len(line) > 5]

    print g.next()

This would be implemented as if it had been written:

    def __temp():
        for line in file:
            if len(line) > 5:
                yield (len(line), line)

    g = __temp()
    print g.next()

Note A: There is some discussion about whether the enclosing brackets should be part of the syntax for generator comprehensions. On the plus side, it neatly parallels list comprehensions and would be immediately recognizable as a similar form with similar internal syntax (taking maximum advantage of what people already know). More importantly, it sets off the generator comprehension from the rest of the function so as not to suggest that the enclosing function is a generator (currently the only cue that a function is really a generator is the presence of the yield keyword). On the minus side, the brackets may falsely suggest that the whole expression returns a list. Most of the feedback received to date indicates that brackets are helpful and not misleading. Unfortunately, the one dissent is from GvR.

A key advantage of the generator comprehension syntax is that it makes it trivially easy to transform existing list comprehension code to a generator by adding yield. Likewise, it can be converted back to a list by deleting yield. This makes it easy to scale up programs from small datasets to ones large enough to warrant just-in-time evaluation.
Note B: List comprehensions expose their looping variable and leave that variable in the enclosing scope. The code

    [str(i) for i in range(8)]

leaves 'i' set to 7 in the scope where the comprehension appears. This behavior is by design and reflects an intent to duplicate the result of coding a for-loop instead of a list comprehension. Further, the variable 'i' is in a defined and potentially useful state on the line immediately following the list comprehension.

In contrast, generator comprehensions do not expose the looping variable to the enclosing scope. The code

    [yield str(i) for i in range(8)]

leaves 'i' untouched in the scope where the comprehension appears. This is also by design and reflects an intent to duplicate the result of coding a generator directly instead of a generator comprehension. Further, the variable 'i' is not in a defined state on the line immediately following the generator comprehension. It does not come into existence until iteration starts (possibly never).

Commentary from GvR: Cute hack, but I think the use of the [] syntax strongly suggests that it would return a list, not an iterator. I also think that this is trying to turn Python into a functional language, where most algorithms use lazy infinite sequences, and I just don't think that's where its future lies.

Commentary from Ka-Ping Yee: I am very happy with the things you have proposed in this PEP. I feel quite positive about generator comprehensions and have no reservations. So a +1 on that.

Commentary from Neil Schemenauer: I'm -0 on the generator list comprehensions. They don't seem to add much. You could easily use a nested generator to do the same thing. They smell like lambda.

Author response: This may be before its time in that some people still don't like list comprehensions, and half of this PEP's reviewers did not have any use for generators in any form.
What I like best about generator comprehensions is that I can design using list comprehensions and then easily switch to a generator (by adding yield) in response to scalability requirements (when the list comprehension produces too large an intermediate result).


Specification for Generator Exception Passing:

Add a .throw(exception) method to the generator interface:

    def logger():
        start = time.time()
        log = []
        try:
            while 1:
                log.append(time.time() - start)
                yield log[-1]
        except WriteLog:
            return log

    g = logger()
    for i in [10, 20, 40, 80, 160]:
        testsuite(i)
        g.next()
    g.throw(WriteLog)

There is no existing work-around for triggering an exception inside a generator. This is a true deficiency. It is the only case in Python where active code cannot be excepted to or through.

Generator exception passing also helps address an intrinsic limitation on generators, the prohibition against their using try/finally to trigger clean-up code [1]. Without .throw(), the current work-around forces the resolution or clean-up code to be moved outside the generator.

Note A: The name of the throw method was selected for several reasons. 'raise' is a keyword and so cannot be used as a method name. Unlike raise, which immediately raises an exception from the current execution point, throw will first return to the generator and then raise the exception. The word throw is suggestive of putting the exception in another location. The word throw is already associated with exceptions in other languages.

Alternative method names were considered: resolve(), signal(), genraise(), raiseinto(), and flush(). None of these seems to fit as well as throw().
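The logger example can be exercised in modern Python, where generators did eventually grow a .throw() method. The sketch below retrieves the final log via StopIteration.value -- a later-era detail used here only to make the example self-contained and runnable, not part of this PEP's proposal:

```python
import time

class WriteLog(Exception):
    pass

def logger():
    start = time.time()
    log = []
    try:
        while True:
            log.append(time.time() - start)
            yield log[-1]
    except WriteLog:
        return log      # surfaces as StopIteration.value in modern Python

g = logger()
for _ in range(3):
    next(g)             # take three timing samples
try:
    g.throw(WriteLog)   # raise WriteLog at the generator's paused yield
except StopIteration as stop:
    log = stop.value

assert len(log) == 3
assert all(t >= 0 for t in log)
```

The key behavior is that the exception is delivered at the yield where the generator is suspended, letting the generator's own except clause run its clean-up code.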
Note B: The throw syntax should exactly match raise's syntax:

    throw([expression, [expression, [expression]]])

Accordingly, it should be implemented to handle all of the following:

    raise string                    g.throw(string)
    raise string, data              g.throw(string, data)
    raise class, instance           g.throw(class, instance)
    raise instance                  g.throw(instance)
    raise                           g.throw()

Commentary from GvR: I'm not convinced that the cleanup problem that this is trying to solve exists in practice. I've never felt the need to put yield inside a try/except. I think the PEP doesn't make enough of a case that this is useful.

Commentary from Ka-Ping Yee: I agree that the exception issue needs to be resolved and [that] you have suggested a fine solution.

Commentary from Neil Schemenauer: The exception passing idea is one I hadn't thought of before and looks interesting. If we enable the passing of values back, then we should add this feature too.

Author response: If the sole use of generators is to simplify writing iterators for lazy producers, then the odds of needing generator exception passing are very slim. If, on the other hand, generators are used to write lazy consumers, create coroutines, generate output streams, or simply for their marvelous capability of restarting a previously frozen state, THEN the need to raise exceptions will come up almost every time.

I'm no judge of what is truly Pythonic, but am still astonished that there can exist blocks of code that can't be excepted to or through, that the try/finally combination is blocked, and that the only work-around is to rewrite as a class and move the exception code out of the function or method being excepted.


Specification for Generator Parameter Passing [Deferred Proposal]

1. Allow 'yield' to assign a value, as in:

    def mygen():
        while 1:
            x = yield None
            print x

2. Let the .next() method take a value to pass to the generator, as in:

    g = mygen()
    g.next()    # runs the generator until the first 'yield'
    g.next(1)   # '1' is bound to 'x' in mygen(), then printed
    g.next(2)   # '2' is bound to 'x' in mygen(), then printed

The control flow of 'yield' and 'next' is unchanged by this proposal. The only change is that a value can be sent into the generator. By analogy, consider the quality improvement from GOSUB (which had no argument passing mechanism) to modern procedure calls (which can pass in arguments and return values).

Most of the underlying machinery is already in place; only the communication needs to be added, by modifying the parser to accept the new 'x = yield expr' syntax and by allowing the .next() method to accept an optional argument.

Yield is more than just a simple iterator creator. It does something else truly wonderful -- it suspends execution and saves state. It is good for a lot more than writing iterators. This proposal further expands its capability by making it easier to share data with the generator.

The .next(arg) mechanism is especially useful for:

1. Sending data to any generator
2. Writing lazy consumers with complex execution states
3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])

The proposal is a clear improvement over the existing alternative of passing data via global variables. It is also much simpler, more readable, and easier to debug than an approach involving the threading module with its attendant mutexes, semaphores, and data queues. A class-based approach competes well when there are no complex execution states or variable states. However, when the complexity increases, generators with parameter passing are much simpler because they automatically save state (unlike classes, which must explicitly save the variable and execution state in instance variables).

Note A: This proposal changes 'yield' from a statement to an expression with binding and precedence similar to lambda.
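The proposed .next(arg) protocol can be demonstrated with the send() method of modern Python generators, which binds its argument to the paused yield expression exactly as described. A minimal runnable sketch (the generator here echoes back what it receives so the binding is observable):

```python
def mygen():
    x = None
    while True:
        x = yield x          # receives the value passed by the caller

g = mygen()
assert next(g) is None       # priming call: run to the first 'yield'
assert g.send(1) == 1        # '1' is bound to 'x' in mygen(), then yielded back
assert g.send(2) == 2        # '2' is bound to 'x' in mygen(), then yielded back
```

Note the priming next() before the first send() -- the same "dummy call to get the generator going" that GvR's commentary objects to.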
Example of a Complex Consumer

The encoder for arithmetic compression sends a series of fractional values to a complex, lazy consumer. That consumer makes computations based on previous inputs and only writes out when certain conditions have been met. After the last fraction is received, it has a procedure for flushing any unwritten data.

Example of a Consumer Stream

    def filelike(packagename, appendOrOverwrite):
        cum = []
        if appendOrOverwrite == 'w+':
            cum.extend(packages[packagename])
        try:
            while 1:
                dat = yield None
                cum.append(dat)
        except FlushStream:
            packages[packagename] = cum

    ostream = filelike('mydest', 'w')   # Analogous to file.open(name, flag)
    ostream.next()                      # Advance to the first yield
    ostream.next(firstdat)              # Analogous to file.write(dat)
    ostream.next(seconddat)
    ostream.throw(FlushStream)          # This feature is proposed above

Example of a Complex Consumer

Loop over the picture files in a directory, shrink them one at a time to thumbnail size using PIL [6], and send them to a lazy consumer. That consumer is responsible for creating a large blank image, accepting thumbnails one at a time, and placing them in a 5-by-3 grid onto the blank image. Whenever the grid is full, it writes out the large image as an index print. A FlushStream exception indicates that no more thumbnails are available and that the partial index print should be written out if it holds one or more thumbnails.

Example of a Producer and Consumer Used Together in a Pipe-like Fashion

    'Analogy to Linux style pipes:  source | upper | sink'
    sink = sinkgen()
    sink.next()
    for word in source():
        sink.next(word.upper())

Commentary from GvR: We discussed this at length when we were hashing out generators and coroutines, and found that there's always a problem with this: the argument to the first next() call has to be thrown away, because it doesn't correspond to a yield statement. This looks ugly (note that the example code has a dummy call to next() to get the generator going).
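The pipe-like example can be made runnable with modern send() in place of the proposed next(arg). The source() producer and the sink's output list below are hypothetical stand-ins chosen only to make the sketch self-contained:

```python
def source():
    # Hypothetical producer end of the pipe.
    for word in ['alpha', 'beta', 'gamma']:
        yield word

def sinkgen(out):
    # Lazy consumer end of the pipe: receives one word at a time.
    while True:
        word = yield
        out.append(word)

received = []
sink = sinkgen(received)
next(sink)                    # dummy call: advance the consumer to its first yield
for word in source():
    sink.send(word.upper())   # modern spelling of sink.next(word.upper())

assert received == ['ALPHA', 'BETA', 'GAMMA']
```

The producer drives the loop while the consumer keeps its own suspended state between values -- the "source | upper | sink" analogy in code.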
But there may be useful examples that can only be programmed (elegantly) with this feature, so I'm reserving judgment. I can believe that it's easy to implement.

Commentary from Ka-Ping Yee: I also think there is a lot of power to be gained from generator argument passing.

Commentary from Neil Schemenauer: I like the idea of being able to pass values back into a generator. I originally pitched this idea to Guido, but in the end we decided against it (at least for the initial implementation). There were a few issues to work out, but I can't seem to remember what they were. My feeling is that we need to wait until the Python community has more experience with generators before adding this feature. Maybe for 2.4 but not for 2.3. In the meantime, you can work around this limitation by making your generator a method. Values can be passed back by mutating the instance.

Author response: Okay, consider this part of the proposal deferred until 2.4.


Restartability

[Discussion of restartability deleted]

Commentary from GvR: The PEP then goes on to discuss restartable iterators. I think this is an evil idea obtained from reading too much about C++ STL iterators. It should definitely be a separate PEP if the author wants me to take this seriously.

Commentary from Ka-Ping Yee: I have less of an opinion on restartability since I have not yet really run into that issue. It seems reasonable that it might be a good idea, though perhaps YAGNI will apply here until I experience the need for it first-hand.

Author response: Over thirty reviewers responded; only one was interested in restartability, on the theory that it made life easier for beginners and that it made lazy evaluation more substitutable for full evaluation. I was never sold on it myself. Consider it retracted.
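Neil Schemenauer's suggested workaround -- make the generator a method and pass values back by mutating the instance -- can be sketched as follows (the class and attribute names here are illustrative, not from this PEP):

```python
class Averager:
    'Workaround: pass values into a generator by mutating the instance.'
    def __init__(self):
        self.value = None            # caller sets this before each next()
    def run(self):
        total, count = 0, 0
        while True:
            total += self.value
            count += 1
            yield total / count      # running average of the values seen

avg = Averager()
g = avg.run()
avg.value = 10
assert next(g) == 10.0
avg.value = 20
assert next(g) == 15.0
```

This works today without any new syntax, at the cost of splitting the communication channel (the instance attribute) from the control flow (the next() calls).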
References

    [1] PEP 255 Simple Generators
        http://python.sourceforge.net/peps/pep-0255.html

    [2] PEP 212 Loop Counter Iteration
        http://python.sourceforge.net/peps/pep-0212.html

    [3] PEP 202 List Comprehensions
        http://python.sourceforge.net/peps/pep-0202.html

    [4] PEP 234 Iterators
        http://python.sourceforge.net/peps/pep-0234.html

    [5] Dr. David Mertz's draft column for Charming Python
        http://gnosis.cx/publish/programming/charming_python_b5.txt

    [6] PIL, the Python Imaging Library
        http://www.pythonware.com/products/pil/

    [7] A pure Python simulation of every feature in this PEP
        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

    [8] The full, working source code for each of the examples in this PEP, along with other examples and tests
        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: