diff --git a/pep-0219.txt b/pep-0219.txt index 5df23eb9e..57daa7851 100644 --- a/pep-0219.txt +++ b/pep-0219.txt @@ -4,166 +4,184 @@ Version: $Revision$ Author: gmcm@hypernet.com (Gordon McMillan) Status: Draft Type: Standards Track -Python-Version: 2.1 Created: 14-Aug-2000 +Created: 14-Aug-2000 +Python-Version: 2.1 Post-History: Introduction This PEP discusses changes required to core Python in order to - efficiently support generators, microthreads and coroutines. It - is related to PEP 220, which describes how Python should be extended + efficiently support generators, microthreads and coroutines. It is + related to PEP 220, which describes how Python should be extended to support these facilities. The focus of this PEP is strictly on the changes required to allow these extensions to work. - While these PEPs are based on Christian Tismer's Stackless[1] - implementation, they do not regard Stackless as a reference - implementation. Stackless (with an extension module) implements - continuations, and from continuations one can implement coroutines, - microthreads (as has been done by Will Ware[2]) and generators. But - in more that a year, no one has found any other productive use of - continuations, so there seems to be no demand for their support. + While these PEPs are based on Christian Tismer's Stackless[1] + implementation, they do not regard Stackless as a reference + implementation. Stackless (with an extension module) implements + continuations, and from continuations one can implement + coroutines, microthreads (as has been done by Will Ware[2]) and + generators. But in more that a year, no one has found any other + productive use of continuations, so there seems to be no demand + for their support. However, Stackless support for continuations is a relatively minor - piece of the implementation, so one might regard it as "a" reference - implementation (rather than "the" reference implementation). + piece of the implementation, so one might regard it as "a" + reference implementation (rather than "the" reference + implementation). + Background - Generators and coroutines have been implmented in a number of languages in - a number of ways. Indeed, Tim Peters has done pure Python implementations - of generators[3] and coroutines[4] using threads (and a thread-based - coroutine implementation exists for Java). However, the horrendous - overhead of a thread-based implementation severely limits the usefulness - of this approach. + Generators and coroutines have been implemented in a number of + languages in a number of ways. Indeed, Tim Peters has done pure + Python implementations of generators[3] and coroutines[4] using + threads (and a thread-based coroutine implementation exists for + Java). However, the horrendous overhead of a thread-based + implementation severely limits the usefulness of this approach. - Microthreads (a.k.a "green" or "user" threads) and coroutines involve - transfers of control that are difficult to accomodate in a language - implementation based on a single stack. (Generators can be done on a - single stack, but they can also be regarded as a very simple case of - coroutines.) + Microthreads (a.k.a "green" or "user" threads) and coroutines + involve transfers of control that are difficult to accommodate in + a language implementation based on a single stack. (Generators can + be done on a single stack, but they can also be regarded as a very + simple case of coroutines.) - Real threads allocate a full-sized stack for each thread of control, and - this is the major source of overhead. However, coroutines and microthreads - can be implemented in Python in a way that involves almost no overhead. - This PEP, therefor, offers a way for making Python able to realistically - manage thousands of separate "threads" of activity (vs. todays limit of - perhaps dozens of separate threads of activity). + Real threads allocate a full-sized stack for each thread of + control, and this is the major source of overhead. However, + coroutines and microthreads can be implemented in Python in a way + that involves almost no overhead. This PEP, therefor, offers a + way for making Python able to realistically manage thousands of + separate "threads" of activity (vs. todays limit of perhaps dozens + of separate threads of activity). + + Another justification for this PEP (explored in PEP 220) is that + coroutines and generators often allow a more direct expression of + an algorithm than is possible in today's Python. - Another justification for this PEP (explored in PEP 220) is that - coroutines and generators often allow a more direct expression of an - algorithm than is possible in today's Python. Discussion - The first thing to note is that Python, while it mingles interpreter data - (normal C stack usage) with Python data (the state of the interpreted - program) on the stack, the two are logically separate. They just happen to - use the same stack. + The first thing to note is that Python, while it mingles + interpreter data (normal C stack usage) with Python data (the + state of the interpreted program) on the stack, the two are + logically separate. They just happen to use the same stack. - A real thread gets something approaching a process-sized stack because the - implementation has no way of knowing how much stack space the thread will - require. The stack space required for an individual frame is likely to be - reasonable, but stack switching is an arcane and non-portable process, - not supported by C. + A real thread gets something approaching a process-sized stack + because the implementation has no way of knowing how much stack + space the thread will require. The stack space required for an + individual frame is likely to be reasonable, but stack switching + is an arcane and non-portable process, not supported by C. - Once Python stops putting Python data on the C stack, however, stack - switching becomes easy. + Once Python stops putting Python data on the C stack, however, + stack switching becomes easy. - The fundamental approach of the PEP is based on these two ideas. First, - separate C's stack usage from Python's stack usage. Secondly, associate - with each frame enough stack space to handle that frame's execution. + The fundamental approach of the PEP is based on these two + ideas. First, separate C's stack usage from Python's stack + usage. Secondly, associate with each frame enough stack space to + handle that frame's execution. + + In the normal usage, Stackless Python has a normal stack + structure, except that it is broken into chunks. But in the + presence of a coroutine / microthread extension, this same + mechanism supports a stack with a tree structure. That is, an + extension can support transfers of control between frames outside + the normal "call / return" path. - In the normal usage, Stackless Python has a normal stack structure, - except that it is broken into chunks. But in the presence of a - coroutine / microthread extension, this same mechanism supports a stack - with a tree structure. That is, an extension can support transfers of - control between frames outside the normal "call / return" path. Problems - The major difficulty with this approach is C calling Python. The problem - is that the C stack now holds a nested execution of the byte-code - interpreter. In that situation, a coroutine / microthread extension cannot - be permitted to transfer control to a frame in a different invocation of the - byte-code interpreter. If a frame were to complete and exit back to C from - the wrong interpreter, the C stack could be trashed. + The major difficulty with this approach is C calling Python. The + problem is that the C stack now holds a nested execution of the + byte-code interpreter. In that situation, a coroutine / + microthread extension cannot be permitted to transfer control to a + frame in a different invocation of the byte-code interpreter. If a + frame were to complete and exit back to C from the wrong + interpreter, the C stack could be trashed. - The ideal solution is to create a mechanism where nested executions of the - byte code interpreter are never needed. The easy solution is for the - coroutine / microthread extension(s) to recognize the situation and refuse - to allow transfers outside the current invocation. + The ideal solution is to create a mechanism where nested + executions of the byte code interpreter are never needed. The easy + solution is for the coroutine / microthread extension(s) to + recognize the situation and refuse to allow transfers outside the + current invocation. - We can categorize code that involves C calling Python into two camps: - Python's implementation, and C extensions. And hopefully we can offer a - compromise: Python's internal usage (and C extension writers who want to - go to the effort) will no longer use a nested invocation of the - interpreter. Extensions which do not go to the effort will still be - safe, but will not play well with coroutines / microthreads. + We can categorize code that involves C calling Python into two + camps: Python's implementation, and C extensions. And hopefully we + can offer a compromise: Python's internal usage (and C extension + writers who want to go to the effort) will no longer use a nested + invocation of the interpreter. Extensions which do not go to the + effort will still be safe, but will not play well with coroutines + / microthreads. - Generally, when a recursive call is transformed into a loop, a bit of - extra bookkeeping is required. The loop will need to keep it's own - "stack" of arguments and results since the real stack can now only hold - the most recent. The code will be more verbose, because it's not quite - as obvious when we're done. While Stackless is not implemented this way, - it has to deal with the same issues. + Generally, when a recursive call is transformed into a loop, a bit + of extra bookkeeping is required. The loop will need to keep it's + own "stack" of arguments and results since the real stack can now + only hold the most recent. The code will be more verbose, because + it's not quite as obvious when we're done. While Stackless is not + implemented this way, it has to deal with the same issues. - In normal Python, PyEval_EvalCode is used to build a frame and execute - it. Stackless Python introduces the concept of a FrameDispatcher. Like - PyEval_EvalCode, it executes one frame. But the interpreter may signal - the FrameDispatcher that a new frame has been swapped in, and the new - frame should be executed. When a frame completes, the FrameDispatcher - follows the back pointer to resume the "calling" frame. + In normal Python, PyEval_EvalCode is used to build a frame and + execute it. Stackless Python introduces the concept of a + FrameDispatcher. Like PyEval_EvalCode, it executes one frame. But + the interpreter may signal the FrameDispatcher that a new frame + has been swapped in, and the new frame should be executed. When a + frame completes, the FrameDispatcher follows the back pointer to + resume the "calling" frame. - So Stackless transforms recursions into a loop, but it is not the - FrameDispatcher that manages the frames. This is done by the interpreter - (or an extension that knows what it's doing). + So Stackless transforms recursions into a loop, but it is not the + FrameDispatcher that manages the frames. This is done by the + interpreter (or an extension that knows what it's doing). - The general idea is that where C code needs to execute Python code, it - creates a frame for the Python code, setting its back pointer to the - current frame. Then it swaps in the frame, signals the FrameDispatcher - and gets out of the way. The C stack is now clean - the Python code can - transfer control to any other frame (if an extension gives it the means - to do so). + The general idea is that where C code needs to execute Python + code, it creates a frame for the Python code, setting its back + pointer to the current frame. Then it swaps in the frame, signals + the FrameDispatcher and gets out of the way. The C stack is now + clean - the Python code can transfer control to any other frame + (if an extension gives it the means to do so). - In the vanilla case, this magic can be hidden from the programmer (even, - in most cases, from the Python-internals programmer). Many situations - present another level of difficulty, however. + In the vanilla case, this magic can be hidden from the programmer + (even, in most cases, from the Python-internals programmer). Many + situations present another level of difficulty, however. - The map builtin function involves two obstacles to this approach. It - cannot simply construct a frame and get out of the way, not just because - there's a loop involved, but each pass through the loop requires some - "post" processing. In order to play well with others, Stackless - constructs a frame object for map itself. + The map builtin function involves two obstacles to this + approach. It cannot simply construct a frame and get out of the + way, not just because there's a loop involved, but each pass + through the loop requires some "post" processing. In order to play + well with others, Stackless constructs a frame object for map + itself. - Most recursions of the interpreter are not this complex, but fairly - frequently, some "post" operations are required. Stackless does not - fix these situations because of amount of code changes required. Instead, - Stackless prohibits transfers out of a nested interpreter. While not - ideal (and sometimes puzzling), this limitation is hardly crippling. + Most recursions of the interpreter are not this complex, but + fairly frequently, some "post" operations are required. Stackless + does not fix these situations because of amount of code changes + required. Instead, Stackless prohibits transfers out of a nested + interpreter. While not ideal (and sometimes puzzling), this + limitation is hardly crippling. Advantages - For normal Python, the advantage to this approach is that C stack usage - becomes much smaller and more predictable. Unbounded recursion in Python - code becomes a memory error, instead of a stack error (and thus, in - non-Cupertino operating systems, something that can be recovered from). - The price, of course, is the added complexity that comes from transforming - recursions of the byte-code interpreter loop into a higher order loop - (and the attendant bookkeeping involved). + For normal Python, the advantage to this approach is that C stack + usage becomes much smaller and more predictable. Unbounded + recursion in Python code becomes a memory error, instead of a + stack error (and thus, in non-Cupertino operating systems, + something that can be recovered from). The price, of course, is + the added complexity that comes from transforming recursions of + the byte-code interpreter loop into a higher order loop (and the + attendant bookkeeping involved). + + The big advantage comes from realizing that the Python stack is + really a tree, and the frame dispatcher can transfer control + freely between leaf nodes of the tree, thus allowing things like + microthreads and coroutines. - The big advantage comes from realizing that the Python stack is really - a tree, and the frame dispatcher can transfer control freely between - leaf nodes of the tree, thus allowing things like microthreads and - coroutines. References -[1] www.stackless.com -[2] http://world.std.com/~wware/uthread.html -[3] Demo/threads/Generator.py in the source distribution -[4] http://www.stackless.com/coroutines.tim.peters.html + [1] www.stackless.com + [2] http://world.std.com/~wware/uthread.html + [3] Demo/threads/Generator.py in the source distribution + [4] http://www.stackless.com/coroutines.tim.peters.html + Local Variables: