From 1947249ff7df90ba79ef42eba0aaafc20d67578d Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Fri, 1 Feb 2002 05:59:14 +0000 Subject: [PATCH] PEP 279, Enhanced Generators, Raymond D. Hettinger. (Minor spell checking and formatting by BAW) --- pep-0279.txt | 422 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 422 insertions(+) create mode 100644 pep-0279.txt diff --git a/pep-0279.txt b/pep-0279.txt new file mode 100644 index 000000000..a1a9f9cd9 --- /dev/null +++ b/pep-0279.txt @@ -0,0 +1,422 @@ +PEP: 279 +Title: Enhanced Generators +Version: $Revision$ +Last-Modified: $Date$ +Author: othello@javanet.com (Raymond D. Hettinger) +Status: Draft +Type: Standards Track +Created: 30-Jan-2002 +Python-Version: 2.3 +Post-History: + + +Abstract + + This PEP introduces four orthogonal (not mutually exclusive) ideas + for enhancing the generators as introduced in Python version 2.2 + [1]. The goal is increase the convenience, utility, and power of + generators. + + +Rationale + + Starting with xrange() and xreadlines(), Python has been evolving + toward a model that provides lazy evaluation as an alternative + when complete evaluation is not desired because of memory + restrictions or availability of data. + + Starting with Python 2.2, a second evolutionary direction came in + the form of iterators and generators. The iter() factory function + and generators were provided as convenient means of creating + iterators. Deep changes were made to use iterators as a unifying + theme throughout Python. The unification came in the form of + establishing a common iterable interface for mappings, sequences, + and file objects. In the case of mappings and file objects, lazy + evaluation was made available. + + The next steps in the evolution of generators are: + + 1. Add built-in functions which provide lazy alternatives to their + complete evaluation counterparts and one other convenience + function which was made possible once iterators and generators + became available. The new functions are xzip, xmap, xfilter, + and indexed. + + 2. Provide a generator alternative to list comprehensions [3] + making generator creation as convenient as list creation. + + 3. Extend the syntax of the 'yield' keyword to enable two way + parameter passing. The resulting increase in power simplifies + the creation of consumer streams which have a complex execution + state and/or variable state. + + 4. Add a generator method to enable exceptions to be passed to a + generator. Currently, there is no clean method for triggering + exceptions from outside the generator. + + All of the suggestions are designed to take advantage of the + existing implementation and require little additional effort to + incorporate. Each is backward compatible and requires no new + keywords. + + +Specification for new built-ins: + + def xfilter( pred, gen ): + ''' + xfilter(...) + xfilter(function, sequence) -> list + + Return an iterator containing those items of sequence for + which function is true. If function is None, return a list of + items that are true. + ''' + if pred is None: + for i in gen: + if i: + yield i + else: + for i in gen: + if pred(i): + yield i + + def xmap( fun, *collections ): + ''' + xmap(...) + xmap(function, sequence[, sequence, ...]) -> list + + Return an iterator applying the function to the items of the + argument collection(s). If more than one collection is given, + the function is called with an argument list consisting of the + corresponding item of each collection, substituting None for + missing values when not all collections have the same length. + If the function is None, return a list of the items of the + collection (or a list of tuples if more than one collection). + ''' + gens = map(iter, collections) + values_left = [1] + def values(): + # Emulate map behaviour, i.e. shorter + # sequences are padded with None when + # they run out of values. + values_left[0] = 0 + for i in range(len(gens)): + iterator = gens[i] + if iterator is None: + yield None + else: + try: + yield iterator.next() + values_left[0] = 1 + except StopIteration: + gens[i] = None + yield None + while 1: + args = tuple(values()) + if not values_left[0]: + raise StopIteration + yield func(*args) + + def xzip( *collections ): + ''' + xzip(...) + xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)] + + Return a iterator of tuples, where each tuple contains the + i-th element from each of the argument sequences or iterable. + The returned iterator is truncated in length to the length of + the shortest argument collection. + ''' + gens = map(iter, collections) + while 1: + yield tuple( [g.next() for g in gens] ) + + def indexed( collection, cnt=0, limit=None ): + 'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...' + gen = iter(collection) + while limit is None or cnt5] + print g.next() + print g.next() + + This would be implemented as if it had been written: + + def __temp_gen(): + for line in file.readline(): + if len(line) > 5: + yield (len(line), line) + g = __temp_gen() + print g.next() + print g.next() + + Note A: There is a difference in the above implementation as + compared to list comprehensions. For a generator comprehension, + the variables are created in a separate scope while list + comprehensions use the enclosing scope. If this PEP is accepted, + the parser should generate byte code that eliminates this + difference by passing the line variable in the enclosing scope and + using that same variable passed by reference inside the generator. + This will make the behavior of generator comprehension identical + to that of list comprehensions. + + Note B: There is some debate about whether the enclosing brackets + should be part of the syntax for generator comprehensions. On the + plus side, it neatly parallels list comprehensions and would be + immediately recognizable as a similar form with similar internal + syntax (taking maximum advantage of what people already know). + More importantly, it sets off the generator comprehension from the + rest of the function so as to not suggest that the enclosing + function is a generator (currently the only cue that a function is + really a generator is the presence of the yield keyword). On the + minus side, the brackets may falsely suggest that the whole + expression returns a list. All of the feedback received to date + indicates that brackets do not make a false suggestion and are + in fact helpful. + + +Specification for two-way Generator Parameter Passing: + + 1. Allow 'yield' to assign a value as in: + + def mygen(): + while 1: + x = yield None + print x + + 2. Let the .next() method take a value to pass to generator as in: + + g = mygen() + g.next() # runs the generators until the first 'yield' + g.next(1) # the '1' gets bound to 'x' in mygen() + g.next(2) # the '2' gets bound to 'x' in mygen() + + Note A: An early question arose, when would you need this? The + answer is that existing generators make it easy to write lazy + producers which may have a complex execution state and/or complex + variable state. This proposal makes it equally easy to write lazy + consumers which may also have a complex execution or variable + state. + + For instance, when writing an encoder for arithmetic compression, + a series of fractional values are sent to a function which has + periodic output and a complex state which depends on previous + inputs. Also, that encoder requires a flush() function when no + additional fractions are to be output. It is helpful to think of + the following parallel with file output streams: + + ostream = file('mydest.txt','w') + ostream.write(firstdat) + ostream.write(seconddat) + ostream.flush() + + With the proposed extensions, it could be written like this: + + def filelike(packagename, appendOrOverwrite): + cum = [] + if appendOrOverwrite == 'w+': + cum.extend( packages[packagename] ) + try: + while 1: + dat = yield None + cum.append(dat) + except FlushStream: + packages[packagename] = cum + ostream = filelike('mydest','w') + ostream.next() + ostream.next(firstdat) + ostream.next(seconddat) + ostream.throw( FlushStream ) # this feature discussed below + + Note C: Almost all of the machinery necessary to implement this + extension is already in place. The parse syntax needs to be + modified to accept the new x = yield None syntax and the .next() + method needs to allow an argument. + + Note D: Some care must be used when writing a values to the + generator because execution starts at the top of the generator not + at the first yield. + + Consider the usual flow using .next() without an argument. + + g = mygen(p1) will bind p1 to a local variable and then return a + generator to be bound to g and NOT run any code in mygen(). + y = g.next() runs the generator from the first line until it + encounters a yield when it suspends execution and a returns + a value to be bound to y + + Since the same flow applies when you are submitting values, the + first call to .next() should have no argument since there is no + place to put it. + + g = mygen(p1) will bind p1 to a local variable and then return a + generator to be bound to g and NOT run any code in mygen() + g.next() will START execution in mygen() from the first line. Note, + that there is nowhere to bind any potential arguments that + might have been supplied to next(). Execution continues + until the first yield is encountered and control is returned + to the caller. + g.next(val) resumes execution at the yield and binds val to the + left hand side of the yield assignment and continues running + until another yield is encountered. This makes sense because + you submit values expecting them to be processed right away. + + + Q. Two-way generator parameter passing seems awfully bold. To + my mind, one of the great things about generators is that they + meet the (very simple) definition of an iterator. With this, + they no longer do. I like lazy consumers -- really I do -- + but I'd rather be conservative about putting something like + this in the language. + + A. If you don't use x = yield expr, then nothing changes and you + haven't lost anything. So, it isn't really bold. It simply + adds an option to pass in data as well as take it out. Other + generator implementations (like the thread based generator.py) + already have provisions for two-way parameter passing so that + consumers are put on an equal footing with producers. Two-way + is the norm, not the exception. + + Yield is not just a simple iterator creator. It does + something else truly wonderful -- it suspends execution and + saves state. It is good for a lot more than its original + purpose. Dr. Mertz's article [5] shows how they can be used + to create general purpose co-routines. + + Besides, 98% of the mechanism is already in place. Only the + communication needs to be added. Remember GOSUB which neither + took nor returned data. Routines which accepted parameters + and returned values were a major step forward. + + When you first need to pass information into a generator, the + existing alternative is clumsy. It involves setting a global + variable, calling .next(), and assigning the local from the + global. + + + Q. Why not introduce another keyword 'accept' for lazy consumers? + + A. To avoid conflicts with 'yield', to avoid creating a new + keyword, and to take advantage of the explicit clarity of the + '=' operator. + + + Q. How often does one need to write a lazy consumer or a co-routine? + + A. Not often. But, when you DO have to write one, this approach + is the easiest to implement, read, and debug. + + It clearly beats using existing generators and passing data + through global variables. It is much clearer and easier to + debug than an equivalent approach using threading, mutexes, + semaphores, and data queues. A class based approach competes + well when there are no complex execution states or variable + states. When the complexity increases, generators with + two-way communication are much simpler because they + automatically save state unlike classes which must explicitly + store variable and execution state in instance variables. + + + Q. Why does yield require an argument? Isn't yield None too wordy? + + A. It doesn't matter for the purposes of this PEP. For + information purposes, here is the reasoning as I understand + it. Though return allows an implicit None, some now consider + this to be weak design. There is some spirit of "Explicit is + better than Implicit". More importantly, in most uses of + yield, a missing argument is more likely to be a bug than an + intended yield None. + + +Specification for Generator Exception Passing: + + Add a .throw(exception) method to the resulting generator as in: + + def mygen(): + try: + while 1: + x = yield None + print x + except FlushStream: + print 'Done' + + g = mygen() + g.next(5) + g.throw(FlushStream) + + There is no existing work around for triggering an exception + inside a generator. This is a true deficiency. It is the only + case in Python where active code cannot be excepted to or through. + Even if .next(arg) is not adopted, we should add the .throw() + method. + + Note A: The name of the throw method was selected for several + reasons. Raise is a keyword and so cannot be used as a method + name. Unlike raise which immediately raises an exception from the + current execution point, throw will first return to the generator + and then raise the exception. The word throw is suggestive of + putting the exception in another location. The word throw is + already associated with exceptions in other languages. + + +References + + [1] PEP 255 Simple Generators + http://python.sourceforge.net/peps/pep-0255.html + + [2] PEP 212 Loop Counter Iteration + http://python.sourceforge.net/peps/pep-0212.html + + [3] PEP 202 List Comprehensions + http://python.sourceforge.net/peps/pep-0202.html + + [4] There have been several discussion on comp.lang.python which helped + tease out these proposals: + + Indexed Function + http://groups.google.com/groups?hl=en&th=33f778d92dd5720a + + Xmap, Xfilter, Xzip and Two-way Generator Communication + http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1 + + Two-way Generator Communication -- Revised Version + http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1 + + Generator Comprehensions + http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2 + + + http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7 + + [5] Dr. David Mertz's draft column for Charming Python. + href="http://gnosis.cx/publish/programming/charming_python_b5.txt + + +Copyright + + This document has been placed in the public domain. + + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +fill-column: 70 +End: + +