PEP: 279
Title: Enhanced Generators
Version: $Revision$
Last-Modified: $Date$
Author: othello@javanet.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

    This PEP introduces four orthogonal (not mutually exclusive) ideas
    for enhancing the generators as introduced in Python version 2.2
    [1].  The goal is to increase the convenience, utility, and power
    of generators.


Rationale

    Starting with xrange() and xreadlines(), Python has been evolving
    toward a model that provides lazy evaluation as an alternative
    when complete evaluation is not desired because of memory
    restrictions or availability of data.

    Starting with Python 2.2, a second evolutionary direction came in
    the form of iterators and generators.  The iter() factory function
    and generators were provided as convenient means of creating
    iterators.  Deep changes were made to use iterators as a unifying
    theme throughout Python.  The unification came in the form of
    establishing a common iterable interface for mappings, sequences,
    and file objects.  In the case of mappings and file objects, lazy
    evaluation was made available.

    The next steps in the evolution of generators are:

    1. Add built-in functions which provide lazy alternatives to their
       complete evaluation counterparts and one other convenience
       function which was made possible once iterators and generators
       became available.  The new functions are xzip, xmap, xfilter,
       and indexed.

    2. Provide a generator alternative to list comprehensions [3]
       making generator creation as convenient as list creation.

    3. Extend the syntax of the 'yield' keyword to enable generator
       parameter passing.  The resulting increase in power simplifies
       the creation of consumer streams which have a complex execution
       state and/or variable state.

    4. Add a generator method to enable exceptions to be passed to a
       generator.  Currently, there is no clean method for triggering
       exceptions from outside the generator.

    All of the suggestions are designed to take advantage of the
    existing implementation and require little additional effort to
    incorporate.  Each is backward compatible and requires no new
    keywords.  These generator tools go into Python 2.3 when
    generators become final and are not imported from __future__.

    SourceForge contains a working, pure Python simulation of every
    feature proposed in this PEP [8].  SourceForge also has a separate
    file with a simulation test suite and working source code for the
    examples used in this PEP [9].


Specification for new built-ins:

    def xfilter( pred, gen ):
        '''
        xfilter(...)
        xfilter(function, sequence) -> iterator

        Return an iterator containing those items of sequence for
        which function is true.  If function is None, return an
        iterator of the items that are true.
        '''
        if pred is None:
            for i in gen:
                if i:
                    yield i
        else:
            for i in gen:
                if pred(i):
                    yield i

    def xmap( fun, *collections ):
        '''
        xmap(...)
        xmap(function, sequence[, sequence, ...]) -> iterator

        Return an iterator applying the function to the items of the
        argument collection(s).  If more than one collection is given,
        the function is called with an argument list consisting of the
        corresponding item of each collection, substituting None for
        missing values when not all collections have the same length.
        If the function is None, return an iterator of the items of
        the collection (or of tuples if more than one collection).
        '''
        gens = map(iter, collections)
        values_left = [1]
        def values():
            # Emulate map behaviour, i.e. shorter
            # sequences are padded with None when
            # they run out of values.
            values_left[0] = 0
            for i in range(len(gens)):
                iterator = gens[i]
                if iterator is None:
                    yield None
                else:
                    try:
                        yield iterator.next()
                        values_left[0] = 1
                    except StopIteration:
                        gens[i] = None
                        yield None
        while 1:
            args = tuple(values())
            if not values_left[0]:
                raise StopIteration
            yield fun(*args)

    def xzip( *collections ):        ### Code from Python Cookbook [6]
        '''
        xzip(...)
        xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

        Return an iterator of tuples, where each tuple contains the
        i-th element from each of the argument sequences or iterables.
        The returned iterator is truncated in length to the length of
        the shortest argument collection.
        '''
        gens = map(iter, collections)
        while 1:
            yield tuple( [g.next() for g in gens] )

    def indexed( collection, cnt=0, limit=None ):
        'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt<limit:
            yield (cnt, gen.next())
            cnt += 1


Specification for Generator Comprehensions:

    If a list comprehension starts with a 'yield' keyword, then
    express the comprehension with a generator.  For example:

        g = [yield (len(line),line)  for line in file  if len(line)>5]

    This would be implemented as if it had been written:

        class __Temp:
            def __iter__(self):
                for line in file:
                    if len(line) > 5:
                        yield (len(line), line)
        g = __Temp()

    Note A:  There is some debate about whether the enclosing brackets
    should be part of the syntax for generator comprehensions.  On the
    plus side, it neatly parallels list comprehensions and would be
    immediately recognizable as a similar form with similar internal
    syntax (taking maximum advantage of what people already know).
    More importantly, it sets off the generator comprehension from the
    rest of the function so as to not suggest that the enclosing
    function is a generator (currently the only cue that a function is
    really a generator is the presence of the yield keyword).  On the
    minus side, the brackets may falsely suggest that the whole
    expression returns a list.  Most of the feedback received to date
    indicates that brackets do not make a false suggestion and are in
    fact helpful.

    Note B:  An iterable instance is returned by the above code.  The
    purpose is to allow the object to be re-started and looped-over
    multiple times.  This accurately mimics the behavior of list
    comprehensions.
    As a result, the following code (provided by Oren Tirosh) works
    equally well with or without 'yield':

        letters = [yield chr(i) for i in xrange(ord('a'),ord('z')+1)]
        digits = [yield str(i) for i in xrange(10)]
        letdig = [yield l+d for l in letters for d in digits]

    Note C:  List comprehensions expose their looping variable and
    leave the variable in the enclosing scope.  The code, [str(i) for
    i in range(8)] leaves 'i' set to 7 in the scope where the
    comprehension appears.  This behavior is by design and reflects an
    intent to duplicate the result of coding a for-loop instead of a
    list comprehension.  Further, the variable 'i' is in a defined and
    potentially useful state on the line immediately following the
    list comprehension.

    In contrast, generator comprehensions do not expose the looping
    variable to the enclosing scope.  The code, [yield str(i) for i in
    range(8)] leaves 'i' untouched in the scope where the
    comprehension appears.  This is also by design and reflects an
    intent to duplicate the result of coding a generator directly
    instead of a generator comprehension.  Further, the variable 'i'
    is not in a defined state on the line immediately following the
    generator comprehension.  It does not come into existence until
    iteration starts.  Since several generators may be running at
    once, there are potentially multiple, unequal instances of 'i' at
    any one time.


Specification for Generator Parameter Passing:

    1. Allow 'yield' to assign a value as in:

        def mygen():
            while 1:
                x = yield None
                print x

    2. Let the .next() method take a value to pass to the generator
       as in:

        g = mygen()
        g.next()       # runs the generator until the first 'yield'
        g.next(1)      # '1' is bound to 'x' in mygen(), then printed
        g.next(2)      # '2' is bound to 'x' in mygen(), then printed

    The control flow is unchanged by this proposal.  The only change
    is that a value can be sent into the generator.
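
    The 'x = yield None' and .next(arg) spellings above are proposals
    and do not run as written.  As a rough illustration of the same
    data flow, the sketch below assumes a later Python in which
    generators provide a send() method; send(value) plays the role of
    the proposed .next(value).  The 'received' list is a hypothetical
    stand-in for the print statement so that the result can be
    inspected.

```python
def mygen(received):
    # 'received' is a hypothetical list used only to record what arrives.
    while True:
        x = yield None      # the value passed in becomes the result of 'yield'
        received.append(x)

received = []
g = mygen(received)
next(g)        # run the generator up to the first 'yield'
g.send(1)      # 1 is bound to 'x' inside mygen()
g.send(2)      # 2 is bound to 'x' inside mygen()
```

    As in the proposal, the first call only advances to the first
    'yield'; no value can be delivered until the generator is
    suspended there.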

    By analogy, consider the quality improvement from GOSUB (which had
    no argument passing mechanism) to modern procedure calls (which
    can pass in arguments and return values).

    Most of the underlying machinery is already in place; only the
    communication needs to be added by modifying the parse syntax to
    accept the new 'x = yield expr' syntax and by allowing the .next()
    method to accept an optional argument.

    Yield is more than just a simple iterator creator.  It does
    something else truly wonderful -- it suspends execution and saves
    state.  It is good for a lot more than writing iterators.  This
    proposal further expands its capability by making it easier to
    share data with the generator.

    The .next(arg) mechanism is especially useful for:
        1. Sending data to any generator
        2. Writing lazy consumers with complex execution states
        3. Writing co-routines (as demonstrated in Dr. Mertz's
           article [5])

    The proposal is a clear improvement over the existing alternative
    of passing data via global variables.  It is also much simpler,
    more readable and easier to debug than an approach involving the
    threading module with its attendant mutexes, semaphores, and data
    queues.  A class-based approach competes well when there are no
    complex execution states or variable states.  When the complexity
    increases, generators with parameter passing are much simpler
    because they automatically save state (unlike classes which must
    explicitly save the variable and execution state in instance
    variables).

    Example of a Complex Consumer

        The encoder for arithmetic compression sends a series of
        fractional values to a complex, lazy consumer.  That consumer
        makes computations based on previous inputs and only writes
        out when certain conditions have been met.  After the last
        fraction is received, it has a procedure for flushing any
        unwritten data.
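
        The shape of such a consumer (state carried between inputs,
        output only when a condition is met, a final flush) can be
        sketched with a far simpler computation than arithmetic
        coding.  This is an illustration only, not the encoder
        itself.  It assumes a later Python in which generators have
        send() and throw() methods; batching_consumer, batch_size,
        and the 'output' list are hypothetical names, and FlushStream
        is defined here as an ordinary exception class.

```python
class FlushStream(Exception):
    """Hypothetical signal that no more input will arrive."""

def batching_consumer(output, batch_size=3):
    # 'pending' is saved automatically between inputs by the generator.
    pending = []
    try:
        while True:
            item = yield None
            pending.append(item)
            # Write out only once a full batch has accumulated.
            if len(pending) == batch_size:
                output.append(sum(pending))
                pending = []
    except FlushStream:
        # Flush whatever is left after the last input.
        if pending:
            output.append(sum(pending))

output = []
consumer = batching_consumer(output)
next(consumer)                      # advance to the first 'yield'
for value in [0.5, 0.25, 0.125, 0.0625]:
    consumer.send(value)
try:
    consumer.throw(FlushStream)     # generator flushes, then finishes
except StopIteration:
    pass
```

        The consumer needs no instance variables: the partially
        filled batch lives in an ordinary local variable across
        calls.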

    Example of a Consumer Stream

        def filelike(packagename, appendOrOverwrite):
            cum = []
            if appendOrOverwrite == 'w+':
                cum.extend( packages[packagename] )
            try:
                while 1:
                    dat = yield None
                    cum.append(dat)
            except FlushStream:
                packages[packagename] = cum
        ostream = filelike('mydest','w')   # Analogous to file.open(name,flag)
        ostream.next()                     # Advance to the first yield
        ostream.next(firstdat)             # Analogous to file.write(dat)
        ostream.next(seconddat)
        ostream.throw( FlushStream )       # This feature proposed below

    Example of a Complex Consumer

        Loop over the picture files in a directory, shrink them
        one-at-a-time to thumbnail size using PIL [7], and send them
        to a lazy consumer.  That consumer is responsible for creating
        a large blank image, accepting thumbnails one-at-a-time and
        placing them in a 5x3 grid format onto the blank image.
        Whenever the grid is full, it writes-out the large image as an
        index print.  A FlushStream exception indicates that no more
        thumbnails are available and that the partial index print
        should be written out if there are one or more thumbnails on
        it.

    Example of a Producer and Consumer Used Together in a Pipelike Fashion

        'Analogy to:  source | upper | sink'
        sink = sinkgen()
        sink.next()
        for word in source():
            sink.next( word.upper() )


Specification for Generator Exception Passing:

    Add a .throw(exception) method to the resulting generator as in:

        def mygen():
            try:
                while 1:
                    x = yield None
                    print x
            except FlushStream:
                print 'Done'

        g = mygen()
        g.next()                # advance to the first 'yield'
        g.next(5)
        g.throw(FlushStream)

    There is no existing workaround for triggering an exception
    inside a generator.  This is a true deficiency.  It is the only
    case in Python where active code cannot be excepted to or through.
    Even if the .next(arg) proposal is not adopted, we should add the
    .throw() method.

    Note A:  The name of the throw method was selected for several
    reasons.  Raise is a keyword and so cannot be used as a method
    name.
    Unlike raise which immediately raises an exception from the
    current execution point, throw will first return to the generator
    and then raise the exception.  The word throw is suggestive of
    putting the exception in another location.  The word throw is
    already associated with exceptions in other languages.

    Note B:  The throw syntax should exactly match raise's syntax
    including:

        raise string                    g.throw(string)
        raise string, data              g.throw(string,data)
        raise class, instance           g.throw(class,instance)
        raise instance                  g.throw(instance)
        raise                           g.throw()


References

    [1] PEP 255 Simple Generators
        http://python.sourceforge.net/peps/pep-0255.html

    [2] PEP 212 Loop Counter Iteration
        http://python.sourceforge.net/peps/pep-0212.html

    [3] PEP 202 List Comprehensions
        http://python.sourceforge.net/peps/pep-0202.html

    [4] There have been several discussions on comp.lang.python which
        helped tease out these proposals:

        Indexed Function
        http://groups.google.com/groups?hl=en&th=33f778d92dd5720a

        Xmap, Xfilter, Xzip and Two-way Generator Communication
        http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1

        Two-way Generator Communication -- Revised Version
        http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1

        Generator Comprehensions
        http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2

        Discussion Draft of this PEP
        http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7

    [5] Dr. David Mertz's draft column for Charming Python.
        http://gnosis.cx/publish/programming/charming_python_b5.txt

    [6] The code fragment for xmap() was found at:
        http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66448

    [7] PIL, the Python Imaging Library can be found at:
        http://www.pythonware.com/products/pil/

    [8] A pure Python simulation of every feature in this PEP is at:
        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17348&aid=513752

    [9] The full, working source code for each of the examples in this
        PEP along with other examples and tests is at:
        http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=17412&aid=513756


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: