python-peps/pep-0219.txt

PEP: 219
Title: Stackless Python
Version: $Revision$
Author: gmcm@hypernet.com (Gordon McMillan)
Status: Draft
Type: Standards Track
Python-Version: 2.1 Created: 14-Aug-2000
Post-History:


Introduction

    This PEP discusses changes required to core Python in order to
    efficiently support generators, microthreads and coroutines. It
    is related to PEP 220, which describes how Python should be extended
    to support these facilities. The focus of this PEP is strictly on
    the changes required to allow these extensions to work.

    While these PEPs are based on Christian Tismer's Stackless[1]
    implementation, they do not regard Stackless as a reference
    implementation.  Stackless (with an extension module) implements
    continuations, and from continuations one can implement coroutines,
    microthreads (as has been done by Will Ware[2]) and generators. But
    in more that a year, no one has found any other productive use of
    continuations, so there seems to be no demand for their support.

    However, Stackless support for continuations is a relatively minor
    piece of the implementation, so one might regard it as "a" reference
    implementation (rather than "the" reference implementation).

Background

    Generators and coroutines have been implmented in a number of languages in
    a number of ways. Indeed, Tim Peters has done pure Python implementations
    of generators[3] and coroutines[4] using threads (and a thread-based
    coroutine implementation exists for Java). However, the horrendous
    overhead of a thread-based implementation severely limits the usefulness
    of this approach.

    Microthreads (a.k.a "green" or "user" threads) and coroutines involve
    transfers of control that are difficult to accomodate in a language
    implementation based on a single stack. (Generators can be done on a
    single stack, but they can also be regarded as a very simple case of
    coroutines.)

    Real threads allocate a full-sized stack for each thread of control, and
    this is the major source of overhead. However, coroutines and microthreads
    can be implemented in Python in a way that involves almost no overhead.
    This PEP, therefor, offers a way for making Python able to realistically
    manage thousands of separate "threads" of activity (vs. todays limit of
    perhaps dozens of separate threads of activity).

    Another justification for this PEP (explored in PEP 220) is that
    coroutines and generators often allow a more direct expression of an
    algorithm than is possible in today's Python.

Discussion

    The first thing to note is that Python, while it mingles interpreter data
    (normal C stack usage) with Python data (the state of the interpreted
    program) on the stack, the two are logically separate. They just happen to
    use the same stack.

    A real thread gets something approaching a process-sized stack because the
    implementation has no way of knowing how much stack space the thread will
    require. The stack space required for an individual frame is likely to be
    reasonable, but stack switching is an arcane and non-portable process,
    not supported by C.

    Once Python stops putting Python data on the C stack, however, stack
    switching becomes easy.

    The fundamental approach of the PEP is based on these two ideas. First,
    separate C's stack usage from Python's stack usage. Secondly, associate
    with each frame enough stack space to handle that frame's execution.

    In the normal usage, Stackless Python has a normal stack structure,
    except that it is broken into chunks. But in the presence of a
    coroutine / microthread extension, this same mechanism supports a stack
    with a tree structure.  That is, an extension can support transfers of
    control between frames outside the normal "call / return" path.

Problems

    The major difficulty with this approach is C calling Python. The problem
    is that the C stack now holds a nested execution of the byte-code
    interpreter. In that situation, a coroutine / microthread extension cannot
    be permitted to transfer control to a frame in a different invocation of the
    byte-code interpreter. If a frame were to complete and exit back to C from
    the wrong interpreter, the C stack could be trashed.

    The ideal solution is to create a mechanism where nested executions of the
    byte code interpreter are never needed. The easy solution is for the
    coroutine / microthread extension(s) to recognize the situation and refuse
    to allow transfers outside the current invocation.

    We can categorize code that involves C calling Python into two camps:
    Python's implementation, and C extensions. And hopefully we can offer a
    compromise: Python's internal usage (and C extension writers who want to
    go to the effort) will no longer use a nested invocation of the
    interpreter. Extensions which do not go to the effort will still be
    safe, but will not play well with coroutines / microthreads.

    Generally, when a recursive call is transformed into a loop, a bit of
    extra bookkeeping is required. The loop will need to keep it's own
    "stack" of arguments and results since the real stack can now only hold
    the most recent. The code will be more verbose, because it's not quite
    as obvious when we're done. While Stackless is not implemented this way,
    it has to deal with the same issues.

    In normal Python, PyEval_EvalCode is used to build a frame and execute
    it. Stackless Python introduces the concept of a FrameDispatcher. Like
    PyEval_EvalCode, it executes one frame. But the interpreter may signal
    the FrameDispatcher that a new frame has been swapped in, and the new
    frame should be executed. When a frame completes, the FrameDispatcher
    follows the back pointer to resume the "calling" frame.

    So Stackless transforms recursions into a loop, but it is not the
    FrameDispatcher that manages the frames. This is done by the interpreter
    (or an extension that knows what it's doing).

    The general idea is that where C code needs to execute Python code, it
    creates a frame for the Python code, setting its back pointer to the
    current frame. Then it swaps in the frame, signals the FrameDispatcher
    and gets out of the way. The C stack is now clean - the Python code can
    transfer control to any other frame (if an extension gives it the means
    to do so).

    In the vanilla case, this magic can be hidden from the programmer (even,
    in most cases, from the Python-internals programmer). Many situations
    present another level of difficulty, however.

    The map builtin function involves two obstacles to this approach. It
    cannot simply construct a frame and get out of the way, not just because
    there's a loop involved, but each pass through the loop requires some
    "post" processing. In order to play well with others, Stackless
    constructs a frame object for map itself.

    Most recursions of the interpreter are not this complex, but fairly
    frequently, some "post" operations are required. Stackless does not
    fix these situations because of amount of code changes required. Instead,
    Stackless prohibits transfers out of a nested interpreter. While not
    ideal (and sometimes puzzling), this limitation is hardly crippling.


Advantages

    For normal Python, the advantage to this approach is that C stack usage
    becomes much smaller and more predictable. Unbounded recursion in Python
    code becomes a memory error, instead of a stack error (and thus, in
    non-Cupertino operating systems, something that can be recovered from).
    The price, of course, is the added complexity that comes from transforming
    recursions of the byte-code interpreter loop into a higher order loop
    (and the attendant bookkeeping involved).

    The big advantage comes from realizing that the Python stack is really
    a tree, and the frame dispatcher can transfer control freely between
    leaf nodes of the tree, thus allowing things like microthreads and
    coroutines.

References

[1] www.stackless.com
[2] http://world.std.com/~wware/uthread.html
[3] Demo/threads/Generator.py in the source distribution
[4] http://www.stackless.com/coroutines.tim.peters.html


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: