190 lines
7.9 KiB
ReStructuredText
190 lines
7.9 KiB
ReStructuredText
PEP: 219
|
|
Title: Stackless Python
|
|
Author: Gordon McMillan <gmcm@hypernet.com>
|
|
Status: Deferred
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 14-Aug-2000
|
|
Python-Version: 2.1
|
|
Post-History:
|
|
|
|
|
|
Introduction
|
|
============
|
|
|
|
This PEP discusses changes required to core Python in order to
|
|
efficiently support generators, microthreads and coroutines. It is
|
|
related to :pep:`220`, which describes how Python should be extended
|
|
to support these facilities. The focus of this PEP is strictly on
|
|
the changes required to allow these extensions to work.
|
|
|
|
While these PEPs are based on Christian Tismer's Stackless [1]_
|
|
implementation, they do not regard Stackless as a reference
|
|
implementation. Stackless (with an extension module) implements
|
|
continuations, and from continuations one can implement
|
|
coroutines, microthreads (as has been done by Will Ware [2]_) and
|
|
generators. But in more than a year, no one has found any other
|
|
productive use of continuations, so there seems to be no demand
|
|
for their support.
|
|
|
|
However, Stackless support for continuations is a relatively minor
|
|
piece of the implementation, so one might regard it as "a"
|
|
reference implementation (rather than "the" reference
|
|
implementation).
|
|
|
|
|
|
Background
|
|
==========
|
|
|
|
Generators and coroutines have been implemented in a number of
|
|
languages in a number of ways. Indeed, Tim Peters has done pure
|
|
Python implementations of generators [3]_ and coroutines [4]_ using
|
|
threads (and a thread-based coroutine implementation exists for
|
|
Java). However, the horrendous overhead of a thread-based
|
|
implementation severely limits the usefulness of this approach.
|
|
|
|
Microthreads (a.k.a "green" or "user" threads) and coroutines
|
|
involve transfers of control that are difficult to accommodate in
|
|
a language implementation based on a single stack. (Generators can
|
|
be done on a single stack, but they can also be regarded as a very
|
|
simple case of coroutines.)
|
|
|
|
Real threads allocate a full-sized stack for each thread of
|
|
control, and this is the major source of overhead. However,
|
|
coroutines and microthreads can be implemented in Python in a way
|
|
that involves almost no overhead. This PEP, therefore, offers a
|
|
way for making Python able to realistically manage thousands of
|
|
separate "threads" of activity (vs. today's limit of perhaps dozens
|
|
of separate threads of activity).
|
|
|
|
Another justification for this PEP (explored in :pep:`220`) is that
|
|
coroutines and generators often allow a more direct expression of
|
|
an algorithm than is possible in today's Python.
|
|
|
|
|
|
Discussion
|
|
==========
|
|
|
|
The first thing to note is that Python, while it mingles
|
|
interpreter data (normal C stack usage) with Python data (the
|
|
state of the interpreted program) on the stack, the two are
|
|
logically separate. They just happen to use the same stack.
|
|
|
|
A real thread gets something approaching a process-sized stack
|
|
because the implementation has no way of knowing how much stack
|
|
space the thread will require. The stack space required for an
|
|
individual frame is likely to be reasonable, but stack switching
|
|
is an arcane and non-portable process, not supported by C.
|
|
|
|
Once Python stops putting Python data on the C stack, however,
|
|
stack switching becomes easy.
|
|
|
|
The fundamental approach of the PEP is based on these two
|
|
ideas. First, separate C's stack usage from Python's stack
|
|
usage. Secondly, associate with each frame enough stack space to
|
|
handle that frame's execution.
|
|
|
|
In the normal usage, Stackless Python has a normal stack
|
|
structure, except that it is broken into chunks. But in the
|
|
presence of a coroutine / microthread extension, this same
|
|
mechanism supports a stack with a tree structure. That is, an
|
|
extension can support transfers of control between frames outside
|
|
the normal "call / return" path.
|
|
|
|
|
|
Problems
|
|
========
|
|
|
|
The major difficulty with this approach is C calling Python. The
|
|
problem is that the C stack now holds a nested execution of the
|
|
byte-code interpreter. In that situation, a coroutine /
|
|
microthread extension cannot be permitted to transfer control to a
|
|
frame in a different invocation of the byte-code interpreter. If a
|
|
frame were to complete and exit back to C from the wrong
|
|
interpreter, the C stack could be trashed.
|
|
|
|
The ideal solution is to create a mechanism where nested
|
|
executions of the byte code interpreter are never needed. The easy
|
|
solution is for the coroutine / microthread extension(s) to
|
|
recognize the situation and refuse to allow transfers outside the
|
|
current invocation.
|
|
|
|
We can categorize code that involves C calling Python into two
|
|
camps: Python's implementation, and C extensions. And hopefully we
|
|
can offer a compromise: Python's internal usage (and C extension
|
|
writers who want to go to the effort) will no longer use a nested
|
|
invocation of the interpreter. Extensions which do not go to the
|
|
effort will still be safe, but will not play well with coroutines
|
|
/ microthreads.
|
|
|
|
Generally, when a recursive call is transformed into a loop, a bit
|
|
of extra bookkeeping is required. The loop will need to keep its
|
|
own "stack" of arguments and results since the real stack can now
|
|
only hold the most recent. The code will be more verbose, because
|
|
it's not quite as obvious when we're done. While Stackless is not
|
|
implemented this way, it has to deal with the same issues.
|
|
|
|
In normal Python, ``PyEval_EvalCode`` is used to build a frame and
|
|
execute it. Stackless Python introduces the concept of a
|
|
``FrameDispatcher``. Like ``PyEval_EvalCode``, it executes one frame. But
|
|
the interpreter may signal the ``FrameDispatcher`` that a new frame
|
|
has been swapped in, and the new frame should be executed. When a
|
|
frame completes, the ``FrameDispatcher`` follows the back pointer to
|
|
resume the "calling" frame.
|
|
|
|
So Stackless transforms recursions into a loop, but it is not the
|
|
``FrameDispatcher`` that manages the frames. This is done by the
|
|
interpreter (or an extension that knows what it's doing).
|
|
|
|
The general idea is that where C code needs to execute Python
|
|
code, it creates a frame for the Python code, setting its back
|
|
pointer to the current frame. Then it swaps in the frame, signals
|
|
the ``FrameDispatcher`` and gets out of the way. The C stack is now
|
|
clean - the Python code can transfer control to any other frame
|
|
(if an extension gives it the means to do so).
|
|
|
|
In the vanilla case, this magic can be hidden from the programmer
|
|
(even, in most cases, from the Python-internals programmer). Many
|
|
situations present another level of difficulty, however.
|
|
|
|
The map builtin function involves two obstacles to this
|
|
approach. It cannot simply construct a frame and get out of the
|
|
way, not just because there's a loop involved, but each pass
|
|
through the loop requires some "post" processing. In order to play
|
|
well with others, Stackless constructs a frame object for map
|
|
itself.
|
|
|
|
Most recursions of the interpreter are not this complex, but
|
|
fairly frequently, some "post" operations are required. Stackless
|
|
does not fix these situations because of the amount of code changes
|
|
required. Instead, Stackless prohibits transfers out of a nested
|
|
interpreter. While not ideal (and sometimes puzzling), this
|
|
limitation is hardly crippling.
|
|
|
|
|
|
Advantages
|
|
==========
|
|
|
|
For normal Python, the advantage to this approach is that C stack
|
|
usage becomes much smaller and more predictable. Unbounded
|
|
recursion in Python code becomes a memory error, instead of a
|
|
stack error (and thus, in non-Cupertino operating systems,
|
|
something that can be recovered from). The price, of course, is
|
|
the added complexity that comes from transforming recursions of
|
|
the byte-code interpreter loop into a higher order loop (and the
|
|
attendant bookkeeping involved).
|
|
|
|
The big advantage comes from realizing that the Python stack is
|
|
really a tree, and the frame dispatcher can transfer control
|
|
freely between leaf nodes of the tree, thus allowing things like
|
|
microthreads and coroutines.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] http://www.stackless.com
|
|
.. [2] http://web.archive.org/web/20000815070602/http://world.std.com/~wware/uthread.html
|
|
.. [3] Demo/threads/Generator.py in the source distribution
|
|
.. [4] http://www.stackless.com/coroutines.tim.peters.html
|