Integrated Raymond Hettinger's latest update

Barry Warsaw 2002-02-01 14:55:46 +00:00
parent 080582a8fd
commit f2115cc852
1 changed files with 66 additions and 130 deletions

@@ -14,8 +14,8 @@ Abstract
This PEP introduces four orthogonal (not mutually exclusive) ideas This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators as introduced in Python version 2.2 for enhancing the generators as introduced in Python version 2.2
[1]. The goal is increase the convenience, utility, and power of [1]. The goal is to increase the convenience, utility, and power
generators. of generators.
Rationale Rationale
@@ -115,7 +115,7 @@ Specification for new built-ins:
args = tuple(values()) args = tuple(values())
if not values_left[0]: if not values_left[0]:
raise StopIteration raise StopIteration
yield func(*args) yield fun(*args)
def xzip( *collections ): def xzip( *collections ):
''' '''
@@ -135,7 +135,7 @@ Specification for new built-ins:
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...' 'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
gen = iter(collection) gen = iter(collection)
while limit is None or cnt<limit: while limit is None or cnt<limit:
yield (cnt, collection.next()) yield (cnt, gen.next())
cnt += 1 cnt += 1
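
The fix in this hunk (calling .next() on the iterator `gen` instead of on `collection`) matters because an arbitrary iterable, such as a list or string, has no .next() method of its own. A minimal sketch in present-day Python 3, where `gen.next()` is spelled `next(gen)`. The function name `indexed_sketch`, its signature, and the explicit StopIteration handling are assumptions of this illustration, not text taken from the PEP:

```python
def indexed_sketch(collection, cnt=0, limit=None):
    """Generate (index, item) pairs: (cnt, first), (cnt + 1, second), ..."""
    gen = iter(collection)  # works for any iterable, not only iterators
    while limit is None or cnt < limit:
        try:
            item = next(gen)  # Python 3 spelling of gen.next()
        except StopIteration:
            return  # end the generator cleanly when the input runs out
        yield (cnt, item)
        cnt += 1


pairs = list(indexed_sketch("abc"))
bounded = list(indexed_sketch("abc", 5, 7))
```

With the pre-fix spelling, passing a plain list would fail because lists do not implement the iterator protocol directly; wrapping with iter() first avoids that.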
Note A: PEP 212 Loop Counter Iteration [2] discussed several Note A: PEP 212 Loop Counter Iteration [2] discussed several
@@ -185,7 +185,7 @@ Specification for Generator Comprehensions:
function is a generator (currently the only cue that a function is function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole minus side, the brackets may falsely suggest that the whole
expression returns a list. All of the feedback received to date expression returns a list. Most of the feedback received to date
indicates that brackets do not make a false suggestion and are indicates that brackets do not make a false suggestion and are
in fact helpful. in fact helpful.
@@ -202,30 +202,54 @@ Specification for two-way Generator Parameter Passing:
2. Let the .next() method take a value to pass to generator as in: 2. Let the .next() method take a value to pass to generator as in:
g = mygen() g = mygen()
g.next() # runs the generators until the first 'yield' g.next() # runs the generator until the first 'yield'
g.next(1) # the '1' gets bound to 'x' in mygen() g.next(1) # '1' is bound to 'x' in mygen(), then printed
g.next(2) # the '2' gets bound to 'x' in mygen() g.next(2) # '2' is bound to 'x' in mygen(), then printed
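
The g.next(arg) calls above are a proposal in this PEP. In current Python the same two-way flow is spelled generator.send(value), with the built-in next() used for the first, argument-less step. A minimal sketch under that assumption; the list `seen` is a hypothetical stand-in for the print statement so that the effect can be checked:

```python
seen = []  # hypothetical sink, used here instead of printing


def mygen():
    while True:
        x = yield       # suspend; the value passed to send() is bound to x
        seen.append(x)  # stands in for "then printed"


g = mygen()
next(g)    # runs the generator until the first 'yield'
g.send(1)  # 1 is bound to x, recorded, and the generator suspends again
g.send(2)  # 2 is bound to x, recorded, and the generator suspends again
```

The first step takes no argument because execution starts at the top of the generator, where there is no pending yield to receive a value; Python raises TypeError if a non-None value is sent to a generator that has not yet started.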
Note A: An early question arose, when would you need this? The The control flow is unchanged by this proposal. The only change
answer is that existing generators make it easy to write lazy is that a value can be sent into the generator. By analogy,
producers which may have a complex execution state and/or complex consider the quality improvement from GOSUB (which had no argument
variable state. This proposal makes it equally easy to write lazy passing mechanism) to modern procedure calls (which pass in
consumers which may also have a complex execution or variable arguments and return values).
state.
For instance, when writing an encoder for arithmetic compression, Most of the underlying machinery is already in place, only the
a series of fractional values are sent to a function which has communication needs to be added by modifying the parse syntax to
periodic output and a complex state which depends on previous accept the new 'x = yield expr' syntax and by allowing the .next()
inputs. Also, that encoder requires a flush() function when no method to accept an optional argument.
additional fractions are to be output. It is helpful to think of
the following parallel with file output streams:
ostream = file('mydest.txt','w') Yield is more than just a simple iterator creator. It does
ostream.write(firstdat) something else truly wonderful -- it suspends execution and saves
ostream.write(seconddat) state. It is good for a lot more than writing iterators. This
ostream.flush() proposal further expands its capability by making it easier to
share data with the generator.
With the proposed extensions, it could be written like this: The .next(arg) mechanism is especially useful for:
1. Sending data to any generator
2. Writing lazy consumers with complex execution states
3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])
The proposal is a clear improvement over the existing alternative
of passing data via global variables. It is also much simpler,
more readable and easier to debug than an approach involving the
threading module with its attendant mutexes, semaphores, and data
queues. A class-based approach competes well when there are no
complex execution states or variable states. When the complexity
increases, generators with two-way communication are much simpler
because they automatically save state (unlike classes which must
explicitly save the variable and execution state in instance
variables).
Example of a Complex Consumer
The encoder for arithmetic compression sends a series of
fractional values to a complex, lazy consumer. That consumer
makes computations based on previous inputs and only writes out
when certain conditions have been met. After the last fraction is
received, it has a procedure for flushing any unwritten data.
Example of a Consumer Stream
def filelike(packagename, appendOrOverwrite): def filelike(packagename, appendOrOverwrite):
cum = [] cum = []
@@ -237,110 +261,24 @@ Specification for two-way Generator Parameter Passing:
cum.append(dat) cum.append(dat)
except FlushStream: except FlushStream:
packages[packagename] = cum packages[packagename] = cum
ostream = filelike('mydest','w') ostream = filelike('mydest','w') # Analogous to file.open(name,flag)
ostream.next() ostream.next() # Advance to the first yield
ostream.next(firstdat) ostream.next(firstdat) # Analogous to file.write(dat)
ostream.next(seconddat) ostream.next(seconddat)
ostream.throw( FlushStream ) # this feature discussed below ostream.throw( FlushStream ) # This feature proposed below
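
For readers who want to run the consumer-stream example, here is a hedged, self-contained sketch in present-day Python, where the proposed .next(arg) corresponds to send(). The `packages` dictionary, the definition of `FlushStream`, and the meaning given to the 'a' flag are assumptions of this illustration; the diff does not show how the PEP defines them:

```python
class FlushStream(Exception):
    """Signals that no more data will be sent (name taken from the example)."""


packages = {}  # hypothetical in-memory store keyed by package name


def filelike(packagename, append_or_overwrite):
    cum = []
    if append_or_overwrite == 'a':  # assumption: 'a' keeps existing data
        cum.extend(packages.get(packagename, []))
    try:
        while True:
            dat = yield      # wait for the next piece of data
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum  # commit everything received so far


ostream = filelike('mydest', 'w')  # analogous to opening a file
next(ostream)                      # advance to the first yield
ostream.send('firstdat')           # analogous to file.write(dat)
ostream.send('seconddat')
try:
    ostream.throw(FlushStream())   # analogous to flushing the stream
except StopIteration:
    pass  # the generator returned after handling the flush
```

Because the generator finishes once it has handled FlushStream, throw() surfaces a StopIteration to the caller, which the sketch catches explicitly.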
Note C: Almost all of the machinery necessary to implement this
extension is already in place. The parse syntax needs to be
modified to accept the new x = yield None syntax and the .next()
method needs to allow an argument.
Note D: Some care must be used when writing a values to the
generator because execution starts at the top of the generator not
at the first yield.
Consider the usual flow using .next() without an argument.
g = mygen(p1) will bind p1 to a local variable and then return a
generator to be bound to g and NOT run any code in mygen().
y = g.next() runs the generator from the first line until it
encounters a yield when it suspends execution and a returns
a value to be bound to y
Since the same flow applies when you are submitting values, the
first call to .next() should have no argument since there is no
place to put it.
g = mygen(p1) will bind p1 to a local variable and then return a
generator to be bound to g and NOT run any code in mygen()
g.next() will START execution in mygen() from the first line. Note,
that there is nowhere to bind any potential arguments that
might have been supplied to next(). Execution continues
until the first yield is encountered and control is returned
to the caller.
g.next(val) resumes execution at the yield and binds val to the
left hand side of the yield assignment and continues running
until another yield is encountered. This makes sense because
you submit values expecting them to be processed right away.
Q. Two-way generator parameter passing seems awfully bold. To Example of a Complex Consumer
my mind, one of the great things about generators is that they
meet the (very simple) definition of an iterator. With this,
they no longer do. I like lazy consumers -- really I do --
but I'd rather be conservative about putting something like
this in the language.
A. If you don't use x = yield expr, then nothing changes and you Loop over the picture files in a directory, shrink them
haven't lost anything. So, it isn't really bold. It simply one-at-a-time to thumbnail size using PIL, and send them to a lazy
adds an option to pass in data as well as take it out. Other consumer. That consumer is responsible for creating a large blank
generator implementations (like the thread based generator.py) image, accepting thumbnails one-at-a-time and placing them in a
already have provisions for two-way parameter passing so that 5x3 grid format onto the blank image. Whenever the grid is full,
consumers are put on an equal footing with producers. Two-way it writes-out the large image as an index print. A FlushStream
is the norm, not the exception. exception indicates that no more thumbnails are available and that
the partial index print should be written out if there are one or
Yield is not just a simple iterator creator. It does more thumbnails on it.
something else truly wonderful -- it suspends execution and
saves state. It is good for a lot more than its original
purpose. Dr. Mertz's article [5] shows how they can be used
to create general purpose co-routines.
Besides, 98% of the mechanism is already in place. Only the
communication needs to be added. Remember GOSUB which neither
took nor returned data. Routines which accepted parameters
and returned values were a major step forward.
When you first need to pass information into a generator, the
existing alternative is clumsy. It involves setting a global
variable, calling .next(), and assigning the local from the
global.
Q. Why not introduce another keyword 'accept' for lazy consumers?
A. To avoid conflicts with 'yield', to avoid creating a new
keyword, and to take advantage of the explicit clarity of the
'=' operator.
Q. How often does one need to write a lazy consumer or a co-routine?
A. Not often. But, when you DO have to write one, this approach
is the easiest to implement, read, and debug.
It clearly beats using existing generators and passing data
through global variables. It is much clearer and easier to
debug than an equivalent approach using threading, mutexes,
semaphores, and data queues. A class based approach competes
well when there are no complex execution states or variable
states. When the complexity increases, generators with
two-way communication are much simpler because they
automatically save state unlike classes which must explicitly
store variable and execution state in instance variables.
Q. Why does yield require an argument? Isn't yield None too wordy?
A. It doesn't matter for the purposes of this PEP. For
information purposes, here is the reasoning as I understand
it. Though return allows an implicit None, some now consider
this to be weak design. There is some spirit of "Explicit is
better than Implicit". More importantly, in most uses of
yield, a missing argument is more likely to be a bug than an
intended yield None.
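
The index-print consumer described in the new "Example of a Complex Consumer" text can be sketched without PIL by letting plain strings stand in for thumbnails and a list stand in for written image files. Everything named here (`index_print_consumer`, `written`, the string thumbnails) is hypothetical; only the 5x3 grid and the FlushStream signal come from the description, and send() is the present-day spelling of the proposed .next(arg):

```python
class FlushStream(Exception):
    """Raised into the consumer when no more thumbnails are available."""


def index_print_consumer(written, cols=5, rows=3):
    """Collect thumbnails into pages of cols * rows and append full pages to written."""
    page = []
    try:
        while True:
            thumb = yield               # accept one thumbnail at a time
            page.append(thumb)
            if len(page) == cols * rows:
                written.append(page)    # grid is full: write out the index print
                page = []
    except FlushStream:
        if page:                        # write the partial print only if non-empty
            written.append(page)


prints = []
consumer = index_print_consumer(prints)
next(consumer)                          # advance to the first yield
for n in range(17):
    consumer.send("thumb%02d" % n)
try:
    consumer.throw(FlushStream())
except StopIteration:
    pass  # consumer finished after flushing
```

The partially filled page lives in an ordinary local variable between calls, which is the automatic state saving that the text contrasts with a class-based approach.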
Specification for Generator Exception Passing: Specification for Generator Exception Passing:
@@ -362,8 +300,8 @@ Specification for Generator Exception Passing:
There is no existing work around for triggering an exception There is no existing work around for triggering an exception
inside a generator. This is a true deficiency. It is the only inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through. case in Python where active code cannot be excepted to or through.
Even if .next(arg) is not adopted, we should add the .throw() Even if the .next(arg) proposal is not adopted, we should add the
method. .throw() method.
Note A: The name of the throw method was selected for several Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method reasons. Raise is a keyword and so cannot be used as a method
@@ -400,7 +338,7 @@ References
Generator Comprehensions Generator Comprehensions
http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2 http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
Discussion Draft of this PEP
http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7 http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
[5] Dr. David Mertz's draft column for Charming Python. [5] Dr. David Mertz's draft column for Charming Python.
@@ -418,5 +356,3 @@ mode: indented-text
indent-tabs-mode: nil indent-tabs-mode: nil
fill-column: 70 fill-column: 70
End: End: