Integrated Raymond Hettinger's latest update

Barry Warsaw 2002-02-01 14:55:46 +00:00
parent 080582a8fd
commit f2115cc852
1 changed file with 66 additions and 130 deletions


@@ -14,8 +14,8 @@ Abstract
This PEP introduces four orthogonal (not mutually exclusive) ideas
for enhancing the generators as introduced in Python version 2.2
-[1]. The goal is increase the convenience, utility, and power of
-generators.
+[1]. The goal is to increase the convenience, utility, and power
+of generators.
Rationale
@@ -115,7 +115,7 @@ Specification for new built-ins:
args = tuple(values())
if not values_left[0]:
raise StopIteration
-yield func(*args)
+yield fun(*args)
def xzip( *collections ):
'''
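As a hedged sketch of what `xmap()` computes, here is a modern Python 3 version. It assumes `xmap()` mirrors Python 2's `map()`, which padded exhausted inputs with `None` (the `values_left` bookkeeping in the hunk points that way). The signature `xmap(fun, *collections)` is an assumption modeled on the `xzip( *collections )` line. The sketch uses `itertools.zip_longest` in place of the hand-written `values()` helper. It also never raises `StopIteration` itself, because since Python 3.7 (PEP 479) that would surface as a `RuntimeError` inside a generator.

```python
from itertools import zip_longest

def xmap(fun, *collections):
    """Lazily apply fun across the inputs, one result per step.

    Sketch only (assumed semantics): like Python 2's map(), shorter
    inputs are padded with None until every input is exhausted.
    """
    for args in zip_longest(*collections):  # fillvalue defaults to None
        yield fun(*args)

pairs = list(xmap(lambda a, b: (a, b), [1, 2, 3], "ab"))
# pairs == [(1, 'a'), (2, 'b'), (3, None)]
```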
@@ -135,7 +135,7 @@ Specification for new built-ins:
'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
gen = iter(collection)
while limit is None or cnt<limit:
-yield (cnt, collection.next())
+yield (cnt, gen.next())
cnt += 1
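The corrected line reads from the iterator `gen` instead of calling `.next()` on the original collection. That collection need not be an iterator at all, since a list has no `.next()` method. Below is a Python 3 sketch. The signature `indexed(collection, cnt=0, limit=None)` is an assumption, because the hunk shows only the loop body. The explicit `try`/`except` is needed for a similar reason: under PEP 479, a `StopIteration` escaping from `next(gen)` inside a generator becomes `RuntimeError`, so exhaustion is caught and turned into a `return`.

```python
def indexed(collection, cnt=0, limit=None):
    'Generates an indexed series: (0,seqn[0]), (1,seqn[1]) ...'
    # Signature is assumed; only the loop body appears in the diff.
    gen = iter(collection)           # works for any iterable, not only iterators
    while limit is None or cnt < limit:
        try:
            item = next(gen)         # Python 3 spelling of gen.next()
        except StopIteration:
            return                   # input exhausted: end the series cleanly
        yield (cnt, item)
        cnt += 1
```

With the default arguments this behaves like the built-in `enumerate()`, which Python 2.3 added for this job (PEP 279).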
Note A: PEP 212 Loop Counter Iteration [2] discussed several
@@ -185,7 +185,7 @@ Specification for Generator Comprehensions:
function is a generator (currently the only cue that a function is
really a generator is the presence of the yield keyword). On the
minus side, the brackets may falsely suggest that the whole
-expression returns a list. All of the feedback received to date
+expression returns a list. Most of the feedback received to date
indicates that brackets do not make a false suggestion and are
in fact helpful.
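The concern about brackets is that a reader may expect a list to be built eagerly. For reference, Python 2.4 eventually adopted generator expressions (PEP 289), written with parentheses rather than brackets. The sketch below contrasts the two evaluation strategies. `noisy()` is a hypothetical helper that records each element as it is pulled from its source.

```python
def noisy(seq, log):
    """Hypothetical helper: yield items from seq, recording each pull in log."""
    for x in seq:
        log.append(x)
        yield x

lazy_log = []
lazy = (x * x for x in noisy([1, 2, 3], lazy_log))    # generator expression
pulled_before = list(lazy_log)    # nothing consumed yet: []
first = next(lazy)                # consumes exactly one element

eager_log = []
eager = [x * x for x in noisy([1, 2, 3], eager_log)]  # list comprehension
```

After `next(lazy)`, only one element has been pulled. The list comprehension pulls all three before anything else can run.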
@@ -202,30 +202,54 @@ Specification for two-way Generator Parameter Passing:
2. Let the .next() method take a value to pass to generator as in:
g = mygen()
-g.next() # runs the generators until the first 'yield'
-g.next(1) # the '1' gets bound to 'x' in mygen()
-g.next(2) # the '2' gets bound to 'x' in mygen()
+g.next() # runs the generator until the first 'yield'
+g.next(1) # '1' is bound to 'x' in mygen(), then printed
+g.next(2) # '2' is bound to 'x' in mygen(), then printed
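This calling sequence can be tried in later Pythons. PEP 342 (Python 2.5) added the capability, spelled `g.send(value)` rather than `g.next(value)`. In the sketch below, the body of `mygen()` is hypothetical because the hunk shows only the calls. It records values in a list instead of printing them, so the result can be checked. The sketch also shows why the first call must carry no value: a just-started generator has no `yield` expression waiting to receive one, and sending a non-`None` value at that point raises `TypeError`.

```python
def mygen(log):
    """Hypothetical consumer: each received value is bound to x, then recorded."""
    while True:
        x = yield            # suspends here; send(v) resumes with x = v
        log.append(x)        # stands in for the PEP's "then printed"

log = []
g = mygen(log)
next(g)       # like g.next(): run to the first yield; nothing to bind yet
g.send(1)     # like g.next(1): 1 is bound to x, then recorded
g.send(2)     # like g.next(2)

fresh = mygen([])
try:
    fresh.send(1)        # no yield is waiting yet, so this is rejected
    priming_required = False
except TypeError:
    priming_required = True
```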
Note A: An early question arose, when would you need this? The
answer is that existing generators make it easy to write lazy
producers which may have a complex execution state and/or complex
variable state. This proposal makes it equally easy to write lazy
consumers which may also have a complex execution or variable
state.
+The control flow is unchanged by this proposal. The only change
+is that a value can be sent into the generator. By analogy,
+consider the quality improvement from GOSUB (which had no argument
+passing mechanism) to modern procedure calls (which pass in
+arguments and return values).
-For instance, when writing an encoder for arithmetic compression,
-a series of fractional values are sent to a function which has
-periodic output and a complex state which depends on previous
-inputs. Also, that encoder requires a flush() function when no
-additional fractions are to be output. It is helpful to think of
-the following parallel with file output streams:
+Most of the underlying machinery is already in place, only the
+communication needs to be added by modifying the parse syntax to
+accept the new 'x = yield expr' syntax and by allowing the .next()
+method to accept an optional argument.
-ostream = file('mydest.txt','w')
-ostream.write(firstdat)
-ostream.write(seconddat)
-ostream.flush()
+Yield is more than just a simple iterator creator. It does
+something else truly wonderful -- it suspends execution and saves
+state. It is good for a lot more than writing iterators. This
+proposal further expands its capability by making it easier to
+share data with the generator.
-With the proposed extensions, it could be written like this:
+The .next(arg) mechanism is especially useful for:
+1. Sending data to any generator
+2. Writing lazy consumers with complex execution states
+3. Writing co-routines (as demonstrated in Dr. Mertz's article [5])
+The proposal is a clear improvement over the existing alternative
+of passing data via global variables. It is also much simpler,
+more readable and easier to debug than an approach involving the
+threading module with its attendant mutexes, semaphores, and data
+queues. A class-based approach competes well when there are no
+complex execution states or variable states. When the complexity
+increases, generators with two-way communication are much simpler
+because they automatically save state (unlike classes which must
+explicitly save the variable and execution state in instance
+variables).
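To make the class-versus-generator comparison concrete, here is a hedged sketch of one small consumer written both ways, in modern Python (`send()` in place of the proposed `.next(arg)`). The protocol is hypothetical: each group starts with a count n, the next n values are summed, and the total is appended to a results list. The generator keeps its execution state ("expecting a count" versus "inside a group") implicitly, in where it is suspended. The class has to spell that state out as attributes.

```python
def grouped_summer(results):
    """Generator consumer (hypothetical protocol): state lives in the suspension point."""
    while True:
        n = yield                 # waiting for a count
        total = 0
        for _ in range(n):
            total += (yield)      # waiting for the next value in this group
        results.append(total)

class GroupedSummer:
    """Equivalent class: execution state must be stored explicitly."""
    def __init__(self, results):
        self.results = results
        self.remaining = None     # None means "expecting a count"
        self.total = 0

    def send(self, value):
        if self.remaining is None:    # this value is a count
            self.remaining = value
            self.total = 0
        else:                         # this value belongs to the current group
            self.total += value
            self.remaining -= 1
        if self.remaining == 0:       # group complete
            self.results.append(self.total)
            self.remaining = None

stream = [2, 10, 20, 3, 1, 2, 3]

out_gen = []
g = grouped_summer(out_gen)
next(g)                           # prime the generator
for v in stream:
    g.send(v)

out_cls = []
c = GroupedSummer(out_cls)
for v in stream:
    c.send(v)
```

Both consumers produce `[30, 6]`. Only the class needs the `remaining is None` sentinel to remember where it is.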
+Example of a Complex Consumer
+The encoder for arithmetic compression sends a series of
+fractional values to a complex, lazy consumer. That consumer
+makes computations based on previous inputs and only writes out
+when certain conditions have been met. After the last fraction is
+received, it has a procedure for flushing any unwritten data.
+Example of a Consumer Stream
def filelike(packagename, appendOrOverwrite):
cum = []
@@ -237,110 +261,24 @@ Specification for two-way Generator Parameter Passing:
cum.append(dat)
except FlushStream:
packages[packagename] = cum
-ostream = filelike('mydest','w')
-ostream.next()
-ostream.next(firstdat)
+ostream = filelike('mydest','w') # Analogous to file.open(name,flag)
+ostream.next() # Advance to the first yield
+ostream.next(firstdat) # Analogous to file.write(dat)
ostream.next(seconddat)
-ostream.throw( FlushStream ) # this feature discussed below
-Note C: Almost all of the machinery necessary to implement this
-extension is already in place. The parse syntax needs to be
-modified to accept the new x = yield None syntax and the .next()
-method needs to allow an argument.
-Note D: Some care must be used when writing a values to the
-generator because execution starts at the top of the generator not
-at the first yield.
-Consider the usual flow using .next() without an argument.
-g = mygen(p1) will bind p1 to a local variable and then return a
-generator to be bound to g and NOT run any code in mygen().
-y = g.next() runs the generator from the first line until it
-encounters a yield when it suspends execution and a returns
-a value to be bound to y
-Since the same flow applies when you are submitting values, the
-first call to .next() should have no argument since there is no
-place to put it.
-g = mygen(p1) will bind p1 to a local variable and then return a
-generator to be bound to g and NOT run any code in mygen()
-g.next() will START execution in mygen() from the first line. Note,
-that there is nowhere to bind any potential arguments that
-might have been supplied to next(). Execution continues
-until the first yield is encountered and control is returned
-to the caller.
-g.next(val) resumes execution at the yield and binds val to the
-left hand side of the yield assignment and continues running
-until another yield is encountered. This makes sense because
-you submit values expecting them to be processed right away.
+ostream.throw( FlushStream ) # This feature proposed below
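Here is a runnable Python 3 version of the consumer stream, with one caveat. `FlushStream` and the `packages` dictionary come from the hunk, but the lines that use `appendOrOverwrite` are elided from this diff. The sketch therefore accepts that argument, ignores it, and always overwrites. Because the generator catches `FlushStream` and then finishes, `throw()` signals the generator's normal end by raising `StopIteration`, which the caller absorbs.

```python
class FlushStream(Exception):
    """Raised inside the consumer to say that no more data is coming."""

packages = {}   # destination store written by the hunk's except clause

def filelike(packagename, appendOrOverwrite):
    # appendOrOverwrite is accepted but unused: its handling is elided in the diff.
    cum = []
    try:
        while True:
            dat = yield       # each send() delivers one chunk
            cum.append(dat)
    except FlushStream:
        packages[packagename] = cum

ostream = filelike('mydest', 'w')   # analogous to opening a file
next(ostream)                       # advance to the first yield
ostream.send('firstdat')            # analogous to file.write(dat)
ostream.send('seconddat')
try:
    ostream.throw(FlushStream())    # analogous to flushing the stream
except StopIteration:
    pass                            # the generator finished after storing cum
```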
-Q. Two-way generator parameter passing seems awfully bold. To
-my mind, one of the great things about generators is that they
-meet the (very simple) definition of an iterator. With this,
-they no longer do. I like lazy consumers -- really I do --
-but I'd rather be conservative about putting something like
-this in the language.
+Example of a Complex Consumer
-A. If you don't use x = yield expr, then nothing changes and you
-haven't lost anything. So, it isn't really bold. It simply
-adds an option to pass in data as well as take it out. Other
-generator implementations (like the thread based generator.py)
-already have provisions for two-way parameter passing so that
-consumers are put on an equal footing with producers. Two-way
-is the norm, not the exception.
-Yield is not just a simple iterator creator. It does
-something else truly wonderful -- it suspends execution and
-saves state. It is good for a lot more than its original
-purpose. Dr. Mertz's article [5] shows how they can be used
-to create general purpose co-routines.
-Besides, 98% of the mechanism is already in place. Only the
-communication needs to be added. Remember GOSUB which neither
-took nor returned data. Routines which accepted parameters
-and returned values were a major step forward.
-When you first need to pass information into a generator, the
-existing alternative is clumsy. It involves setting a global
-variable, calling .next(), and assigning the local from the
-global.
-Q. Why not introduce another keyword 'accept' for lazy consumers?
-A. To avoid conflicts with 'yield', to avoid creating a new
-keyword, and to take advantage of the explicit clarity of the
-'=' operator.
-Q. How often does one need to write a lazy consumer or a co-routine?
-A. Not often. But, when you DO have to write one, this approach
-is the easiest to implement, read, and debug.
-It clearly beats using existing generators and passing data
-through global variables. It is much clearer and easier to
-debug than an equivalent approach using threading, mutexes,
-semaphores, and data queues. A class based approach competes
-well when there are no complex execution states or variable
-states. When the complexity increases, generators with
-two-way communication are much simpler because they
-automatically save state unlike classes which must explicitly
-store variable and execution state in instance variables.
-Q. Why does yield require an argument? Isn't yield None too wordy?
-A. It doesn't matter for the purposes of this PEP. For
-information purposes, here is the reasoning as I understand
-it. Though return allows an implicit None, some now consider
-this to be weak design. There is some spirit of "Explicit is
-better than Implicit". More importantly, in most uses of
-yield, a missing argument is more likely to be a bug than an
-intended yield None.
+Loop over the picture files in a directory, shrink them
+one-at-a-time to thumbnail size using PIL, and send them to a lazy
+consumer. That consumer is responsible for creating a large blank
+image, accepting thumbnails one-at-a-time and placing them in a
+5x3 grid format onto the blank image. Whenever the grid is full,
+it writes-out the large image as an index print. A FlushStream
+exception indicates that no more thumbnails are available and that
+the partial index print should be written out if there are one or
+more thumbnails on it.
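The thumbnail consumer can be sketched without PIL, under stated assumptions. Each thumbnail is treated as an opaque object, a "print" is just the list of thumbnails placed on it, and writing a print out means appending that list to `prints`. The grid size of 15 (5x3) comes from the text. `index_prints`, `per_print`, and the thumbnail strings are hypothetical. Python 3 `send()`/`throw()` stand in for the proposed `.next(arg)`/`.throw()`.

```python
class FlushStream(Exception):
    """No more thumbnails are coming."""

def index_prints(prints, per_print=15):
    """Hypothetical lazy consumer that fills 5x3 grids one thumbnail at a time."""
    sheet = []                              # thumbnails on the current blank image
    try:
        while True:
            thumb = yield
            sheet.append(thumb)
            if len(sheet) == per_print:     # grid full: write out the index print
                prints.append(sheet)
                sheet = []                  # start a new blank image
    except FlushStream:
        if sheet:                           # partial print with at least one thumbnail
            prints.append(sheet)

prints = []
consumer = index_prints(prints)
next(consumer)                              # advance to the first yield
for i in range(32):                         # stands in for 32 shrunken picture files
    consumer.send(f"thumb{i}")
try:
    consumer.throw(FlushStream())
except StopIteration:
    pass                                    # consumer finished after the partial print
```

Thirty-two thumbnails yield two full prints and one partial print holding the last two.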
Specification for Generator Exception Passing:
@@ -362,8 +300,8 @@ Specification for Generator Exception Passing:
There is no existing work around for triggering an exception
inside a generator. This is a true deficiency. It is the only
case in Python where active code cannot be excepted to or through.
-Even if .next(arg) is not adopted, we should add the .throw()
-method.
+Even if the .next(arg) proposal is not adopted, we should add the
+.throw() method.
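The paragraph asks for two behaviours: raising an exception *to* code suspended in a generator, and *through* it. Both are what `generator.throw()` provides in Python 2.5 and later (PEP 342). In the sketch below, both generators are hypothetical. `resilient()` catches the thrown exception at its suspension point and keeps running. `worker()` does not catch it, so its `finally` clause runs and the exception propagates back to the caller.

```python
def resilient(log):
    """Hypothetical: handles ValueError at its suspension point and keeps running."""
    while True:
        try:
            yield
        except ValueError as exc:
            log.append(f"handled {exc}")

def worker(log):
    """Hypothetical: does not handle the exception, so it propagates through."""
    try:
        while True:
            yield
    finally:
        log.append("cleanup ran")

log = []
r = resilient(log)
next(r)
r.throw(ValueError("bad input"))    # excepted *to*: caught inside, generator survives
still_alive = next(r) is None

w = worker(log)
next(w)
try:
    w.throw(KeyError("missing"))    # excepted *through*: re-raised to the caller
    propagated = False
except KeyError:
    propagated = True
```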
Note A: The name of the throw method was selected for several
reasons. Raise is a keyword and so cannot be used as a method
@@ -400,7 +338,7 @@ References
Generator Comprehensions
http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2
Discussion Draft of this PEP
http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7
[5] Dr. David Mertz's draft column for Charming Python.
@@ -418,5 +356,3 @@ mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: