minor style conformance changes
parent 414241bf39
commit 741090d1c4
250
pep-0219.txt
|
@@ -4,166 +4,184 @@ Version: $Revision$
|
|||
Author: gmcm@hypernet.com (Gordon McMillan)
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Python-Version: 2.1 Created: 14-Aug-2000
|
||||
Created: 14-Aug-2000
|
||||
Python-Version: 2.1
|
||||
Post-History:
|
||||
|
||||
|
||||
Introduction
|
||||
|
||||
This PEP discusses changes required to core Python in order to
|
||||
efficiently support generators, microthreads and coroutines. It
|
||||
is related to PEP 220, which describes how Python should be extended
|
||||
efficiently support generators, microthreads and coroutines. It is
|
||||
related to PEP 220, which describes how Python should be extended
|
||||
to support these facilities. The focus of this PEP is strictly on
|
||||
the changes required to allow these extensions to work.
|
||||
|
||||
While these PEPs are based on Christian Tismer's Stackless[1]
|
||||
implementation, they do not regard Stackless as a reference
|
||||
implementation. Stackless (with an extension module) implements
|
||||
continuations, and from continuations one can implement coroutines,
|
||||
microthreads (as has been done by Will Ware[2]) and generators. But
|
||||
in more than a year, no one has found any other productive use of
|
||||
continuations, so there seems to be no demand for their support.
|
||||
While these PEPs are based on Christian Tismer's Stackless[1]
|
||||
implementation, they do not regard Stackless as a reference
|
||||
implementation. Stackless (with an extension module) implements
|
||||
continuations, and from continuations one can implement
|
||||
coroutines, microthreads (as has been done by Will Ware[2]) and
|
||||
generators. But in more than a year, no one has found any other
|
||||
productive use of continuations, so there seems to be no demand
|
||||
for their support.
|
||||
|
||||
However, Stackless support for continuations is a relatively minor
|
||||
piece of the implementation, so one might regard it as "a" reference
|
||||
implementation (rather than "the" reference implementation).
|
||||
piece of the implementation, so one might regard it as "a"
|
||||
reference implementation (rather than "the" reference
|
||||
implementation).
|
||||
|
||||
|
||||
Background
|
||||
|
||||
Generators and coroutines have been implmented in a number of languages in
|
||||
a number of ways. Indeed, Tim Peters has done pure Python implementations
|
||||
of generators[3] and coroutines[4] using threads (and a thread-based
|
||||
coroutine implementation exists for Java). However, the horrendous
|
||||
overhead of a thread-based implementation severely limits the usefulness
|
||||
of this approach.
|
||||
Generators and coroutines have been implemented in a number of
|
||||
languages in a number of ways. Indeed, Tim Peters has done pure
|
||||
Python implementations of generators[3] and coroutines[4] using
|
||||
threads (and a thread-based coroutine implementation exists for
|
||||
Java). However, the horrendous overhead of a thread-based
|
||||
implementation severely limits the usefulness of this approach.
|
||||
|
||||
Microthreads (a.k.a. "green" or "user" threads) and coroutines involve
|
||||
transfers of control that are difficult to accomodate in a language
|
||||
implementation based on a single stack. (Generators can be done on a
|
||||
single stack, but they can also be regarded as a very simple case of
|
||||
coroutines.)
|
||||
Microthreads (a.k.a. "green" or "user" threads) and coroutines
|
||||
involve transfers of control that are difficult to accommodate in
|
||||
a language implementation based on a single stack. (Generators can
|
||||
be done on a single stack, but they can also be regarded as a very
|
||||
simple case of coroutines.)
|
||||
|
||||
Real threads allocate a full-sized stack for each thread of control, and
|
||||
this is the major source of overhead. However, coroutines and microthreads
|
||||
can be implemented in Python in a way that involves almost no overhead.
|
||||
This PEP, therefore, offers a way for making Python able to realistically
|
||||
manage thousands of separate "threads" of activity (vs. today's limit of
|
||||
perhaps dozens of separate threads of activity).
|
||||
Real threads allocate a full-sized stack for each thread of
|
||||
control, and this is the major source of overhead. However,
|
||||
coroutines and microthreads can be implemented in Python in a way
|
||||
that involves almost no overhead. This PEP, therefore, offers a
|
||||
way for making Python able to realistically manage thousands of
|
||||
separate "threads" of activity (vs. today's limit of perhaps dozens
|
||||
of separate threads of activity).
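
As a rough illustration of why microthreads can be so cheap, the sketch below is not taken from Stackless or from this PEP. It assumes generators from later Python versions as stand-ins for microthreads, and the names worker and run_all are hypothetical. Each task holds only its own small saved state, so thousands can be scheduled in turn on a single stack.

```python
from collections import deque


def worker(ident, steps, log):
    """A tiny task that hands control back to the scheduler after each step."""
    for step in range(steps):
        log.append((ident, step))
        yield                      # voluntary switch point


def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # run the task up to its next switch point
        except StopIteration:
            continue               # task finished; do not reschedule it
        ready.append(task)


if __name__ == "__main__":
    log = []
    run_all(worker(i, 3, log) for i in range(10000))
    print(len(log))
```

A real microthread extension would switch between frames inside the interpreter rather than relying on explicit yield statements; the sketch only shows the scheduling shape.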
|
||||
|
||||
Another justification for this PEP (explored in PEP 220) is that
|
||||
coroutines and generators often allow a more direct expression of
|
||||
an algorithm than is possible in today's Python.
|
||||
|
||||
Another justification for this PEP (explored in PEP 220) is that
|
||||
coroutines and generators often allow a more direct expression of an
|
||||
algorithm than is possible in today's Python.
|
||||
|
||||
Discussion
|
||||
|
||||
The first thing to note is that while Python mingles interpreter data
|
||||
(normal C stack usage) with Python data (the state of the interpreted
|
||||
program) on the stack, the two are logically separate. They just happen to
|
||||
use the same stack.
|
||||
The first thing to note is that while Python mingles
|
||||
interpreter data (normal C stack usage) with Python data (the
|
||||
state of the interpreted program) on the stack, the two are
|
||||
logically separate. They just happen to use the same stack.
|
||||
|
||||
A real thread gets something approaching a process-sized stack because the
|
||||
implementation has no way of knowing how much stack space the thread will
|
||||
require. The stack space required for an individual frame is likely to be
|
||||
reasonable, but stack switching is an arcane and non-portable process,
|
||||
not supported by C.
|
||||
A real thread gets something approaching a process-sized stack
|
||||
because the implementation has no way of knowing how much stack
|
||||
space the thread will require. The stack space required for an
|
||||
individual frame is likely to be reasonable, but stack switching
|
||||
is an arcane and non-portable process, not supported by C.
|
||||
|
||||
Once Python stops putting Python data on the C stack, however, stack
|
||||
switching becomes easy.
|
||||
Once Python stops putting Python data on the C stack, however,
|
||||
stack switching becomes easy.
|
||||
|
||||
The fundamental approach of the PEP is based on these two ideas. First,
|
||||
separate C's stack usage from Python's stack usage. Secondly, associate
|
||||
with each frame enough stack space to handle that frame's execution.
|
||||
The fundamental approach of the PEP is based on these two
|
||||
ideas. First, separate C's stack usage from Python's stack
|
||||
usage. Secondly, associate with each frame enough stack space to
|
||||
handle that frame's execution.
|
||||
|
||||
In the normal usage, Stackless Python has a normal stack
|
||||
structure, except that it is broken into chunks. But in the
|
||||
presence of a coroutine / microthread extension, this same
|
||||
mechanism supports a stack with a tree structure. That is, an
|
||||
extension can support transfers of control between frames outside
|
||||
the normal "call / return" path.
|
||||
|
||||
In the normal usage, Stackless Python has a normal stack structure,
|
||||
except that it is broken into chunks. But in the presence of a
|
||||
coroutine / microthread extension, this same mechanism supports a stack
|
||||
with a tree structure. That is, an extension can support transfers of
|
||||
control between frames outside the normal "call / return" path.
|
||||
|
||||
Problems
|
||||
|
||||
The major difficulty with this approach is C calling Python. The problem
|
||||
is that the C stack now holds a nested execution of the byte-code
|
||||
interpreter. In that situation, a coroutine / microthread extension cannot
|
||||
be permitted to transfer control to a frame in a different invocation of the
|
||||
byte-code interpreter. If a frame were to complete and exit back to C from
|
||||
the wrong interpreter, the C stack could be trashed.
|
||||
The major difficulty with this approach is C calling Python. The
|
||||
problem is that the C stack now holds a nested execution of the
|
||||
byte-code interpreter. In that situation, a coroutine /
|
||||
microthread extension cannot be permitted to transfer control to a
|
||||
frame in a different invocation of the byte-code interpreter. If a
|
||||
frame were to complete and exit back to C from the wrong
|
||||
interpreter, the C stack could be trashed.
|
||||
|
||||
The ideal solution is to create a mechanism where nested executions of the
|
||||
byte code interpreter are never needed. The easy solution is for the
|
||||
coroutine / microthread extension(s) to recognize the situation and refuse
|
||||
to allow transfers outside the current invocation.
|
||||
The ideal solution is to create a mechanism where nested
|
||||
executions of the byte code interpreter are never needed. The easy
|
||||
solution is for the coroutine / microthread extension(s) to
|
||||
recognize the situation and refuse to allow transfers outside the
|
||||
current invocation.
|
||||
|
||||
We can categorize code that involves C calling Python into two camps:
|
||||
Python's implementation, and C extensions. And hopefully we can offer a
|
||||
compromise: Python's internal usage (and C extension writers who want to
|
||||
go to the effort) will no longer use a nested invocation of the
|
||||
interpreter. Extensions which do not go to the effort will still be
|
||||
safe, but will not play well with coroutines / microthreads.
|
||||
We can categorize code that involves C calling Python into two
|
||||
camps: Python's implementation, and C extensions. And hopefully we
|
||||
can offer a compromise: Python's internal usage (and C extension
|
||||
writers who want to go to the effort) will no longer use a nested
|
||||
invocation of the interpreter. Extensions which do not go to the
|
||||
effort will still be safe, but will not play well with coroutines
|
||||
/ microthreads.
|
||||
|
||||
Generally, when a recursive call is transformed into a loop, a bit of
|
||||
extra bookkeeping is required. The loop will need to keep its own
|
||||
"stack" of arguments and results since the real stack can now only hold
|
||||
the most recent. The code will be more verbose, because it's not quite
|
||||
as obvious when we're done. While Stackless is not implemented this way,
|
||||
it has to deal with the same issues.
|
||||
Generally, when a recursive call is transformed into a loop, a bit
|
||||
of extra bookkeeping is required. The loop will need to keep its
|
||||
own "stack" of arguments and results since the real stack can now
|
||||
only hold the most recent. The code will be more verbose, because
|
||||
it's not quite as obvious when we're done. While Stackless is not
|
||||
implemented this way, it has to deal with the same issues.
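
The bookkeeping described above can be sketched in a few lines of Python. This is an illustrative example only, not code from Stackless; the function names are hypothetical. It sums a nested list first by recursion and then with a loop that keeps its own stack of partially processed levels.

```python
def total_recursive(tree):
    """Sum a nested list of numbers using ordinary recursion."""
    result = 0
    for item in tree:
        if isinstance(item, list):
            result += total_recursive(item)
        else:
            result += item
    return result


def total_loop(tree):
    """Same computation as a loop that keeps its own stack of pending work."""
    result = 0
    stack = [iter(tree)]               # each entry is a partially consumed level
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()                # this level is done; resume the outer level
            continue
        if isinstance(item, list):
            stack.append(iter(item))   # "call": descend without a recursive call
        else:
            result += item
    return result
```

The loop version is longer and its termination condition is less obvious, but its nesting depth is limited by available memory rather than by the depth of the call stack.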
|
||||
|
||||
In normal Python, PyEval_EvalCode is used to build a frame and execute
|
||||
it. Stackless Python introduces the concept of a FrameDispatcher. Like
|
||||
PyEval_EvalCode, it executes one frame. But the interpreter may signal
|
||||
the FrameDispatcher that a new frame has been swapped in, and the new
|
||||
frame should be executed. When a frame completes, the FrameDispatcher
|
||||
follows the back pointer to resume the "calling" frame.
|
||||
In normal Python, PyEval_EvalCode is used to build a frame and
|
||||
execute it. Stackless Python introduces the concept of a
|
||||
FrameDispatcher. Like PyEval_EvalCode, it executes one frame. But
|
||||
the interpreter may signal the FrameDispatcher that a new frame
|
||||
has been swapped in, and the new frame should be executed. When a
|
||||
frame completes, the FrameDispatcher follows the back pointer to
|
||||
resume the "calling" frame.
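
A toy model of this loop, written in Python, may make the control flow easier to follow. It is a sketch under stated assumptions, not the Stackless implementation: the Frame class and dispatch function are hypothetical, generators stand in for frame code, and for brevity this toy dispatcher builds the new frame itself.

```python
class Frame:
    """Toy stand-in for an interpreter frame (hypothetical, not CPython's)."""

    def __init__(self, code, back=None):
        self.code = code      # a generator; each yield requests a new frame
        self.back = back      # the "calling" frame, or None at the top
        self.inbox = None     # value delivered when a callee completes


def dispatch(frame):
    """Run frames in one flat loop instead of one nested call per frame."""
    result = None
    while frame is not None:
        try:
            request = frame.code.send(frame.inbox)
        except StopIteration as done:
            # Frame completed: follow the back pointer to resume the caller.
            result = done.value
            frame = frame.back
            if frame is not None:
                frame.inbox = result
            continue
        # The running frame asked for a new frame to be swapped in.
        frame = Frame(request, back=frame)
    return result


def countdown(n):
    """Example frame code that 'calls' itself by yielding a new generator."""
    if n == 0:
        return 0
    deeper = yield countdown(n - 1)
    return deeper + 1


if __name__ == "__main__":
    print(dispatch(Frame(countdown(10000))))
```

Because every "call" returns to the dispatch loop, the depth of the chain of frames does not add to the depth of the host stack.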
|
||||
|
||||
So Stackless transforms recursions into a loop, but it is not the
|
||||
FrameDispatcher that manages the frames. This is done by the interpreter
|
||||
(or an extension that knows what it's doing).
|
||||
So Stackless transforms recursions into a loop, but it is not the
|
||||
FrameDispatcher that manages the frames. This is done by the
|
||||
interpreter (or an extension that knows what it's doing).
|
||||
|
||||
The general idea is that where C code needs to execute Python code, it
|
||||
creates a frame for the Python code, setting its back pointer to the
|
||||
current frame. Then it swaps in the frame, signals the FrameDispatcher
|
||||
and gets out of the way. The C stack is now clean - the Python code can
|
||||
transfer control to any other frame (if an extension gives it the means
|
||||
to do so).
|
||||
The general idea is that where C code needs to execute Python
|
||||
code, it creates a frame for the Python code, setting its back
|
||||
pointer to the current frame. Then it swaps in the frame, signals
|
||||
the FrameDispatcher and gets out of the way. The C stack is now
|
||||
clean - the Python code can transfer control to any other frame
|
||||
(if an extension gives it the means to do so).
|
||||
|
||||
In the vanilla case, this magic can be hidden from the programmer (even,
|
||||
in most cases, from the Python-internals programmer). Many situations
|
||||
present another level of difficulty, however.
|
||||
In the vanilla case, this magic can be hidden from the programmer
|
||||
(even, in most cases, from the Python-internals programmer). Many
|
||||
situations present another level of difficulty, however.
|
||||
|
||||
The map builtin function involves two obstacles to this approach. It
|
||||
cannot simply construct a frame and get out of the way, not just because
|
||||
there's a loop involved, but each pass through the loop requires some
|
||||
"post" processing. In order to play well with others, Stackless
|
||||
constructs a frame object for map itself.
|
||||
The map builtin function involves two obstacles to this
|
||||
approach. It cannot simply construct a frame and get out of the
|
||||
way, not just because there's a loop involved, but each pass
|
||||
through the loop requires some "post" processing. In order to play
|
||||
well with others, Stackless constructs a frame object for map
|
||||
itself.
|
||||
|
||||
Most recursions of the interpreter are not this complex, but fairly
|
||||
frequently, some "post" operations are required. Stackless does not
|
||||
fix these situations because of the amount of code changes required. Instead,
|
||||
Stackless prohibits transfers out of a nested interpreter. While not
|
||||
ideal (and sometimes puzzling), this limitation is hardly crippling.
|
||||
Most recursions of the interpreter are not this complex, but
|
||||
fairly frequently, some "post" operations are required. Stackless
|
||||
does not fix these situations because of the amount of code changes
|
||||
required. Instead, Stackless prohibits transfers out of a nested
|
||||
interpreter. While not ideal (and sometimes puzzling), this
|
||||
limitation is hardly crippling.
|
||||
|
||||
|
||||
Advantages
|
||||
|
||||
For normal Python, the advantage to this approach is that C stack usage
|
||||
becomes much smaller and more predictable. Unbounded recursion in Python
|
||||
code becomes a memory error, instead of a stack error (and thus, in
|
||||
non-Cupertino operating systems, something that can be recovered from).
|
||||
The price, of course, is the added complexity that comes from transforming
|
||||
recursions of the byte-code interpreter loop into a higher order loop
|
||||
(and the attendant bookkeeping involved).
|
||||
For normal Python, the advantage to this approach is that C stack
|
||||
usage becomes much smaller and more predictable. Unbounded
|
||||
recursion in Python code becomes a memory error, instead of a
|
||||
stack error (and thus, in non-Cupertino operating systems,
|
||||
something that can be recovered from). The price, of course, is
|
||||
the added complexity that comes from transforming recursions of
|
||||
the byte-code interpreter loop into a higher order loop (and the
|
||||
attendant bookkeeping involved).
|
||||
|
||||
The big advantage comes from realizing that the Python stack is
|
||||
really a tree, and the frame dispatcher can transfer control
|
||||
freely between leaf nodes of the tree, thus allowing things like
|
||||
microthreads and coroutines.
|
||||
|
||||
The big advantage comes from realizing that the Python stack is really
|
||||
a tree, and the frame dispatcher can transfer control freely between
|
||||
leaf nodes of the tree, thus allowing things like microthreads and
|
||||
coroutines.
|
||||
|
||||
References
|
||||
|
||||
[1] www.stackless.com
|
||||
[2] http://world.std.com/~wware/uthread.html
|
||||
[3] Demo/threads/Generator.py in the source distribution
|
||||
[4] http://www.stackless.com/coroutines.tim.peters.html
|
||||
[1] www.stackless.com
|
||||
[2] http://world.std.com/~wware/uthread.html
|
||||
[3] Demo/threads/Generator.py in the source distribution
|
||||
[4] http://www.stackless.com/coroutines.tim.peters.html
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
|
|