PEP: 554
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: Antoine Pitrou <antoine@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Sep-2017
Python-Version: 3.10
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
              09-May-2018, 20-Apr-2020, 04-May-2020


Abstract
========

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ Subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
facilitates novel alternative approaches to
`concurrency <Concurrency_>`_.

This proposal introduces the stdlib ``interpreters`` module. The module
will be `provisional <Provisional Status_>`_. It exposes the basic
functionality of subinterpreters already provided by the C-API, along
with new (basic) functionality for sharing data between interpreters.


A Disclaimer about the GIL
==========================

To avoid any confusion up front: This PEP is unrelated to any efforts
to stop sharing the GIL between subinterpreters. At most this proposal
will allow users to take advantage of any results of work on the GIL.
The position here is that exposing subinterpreters to Python code is
worth doing, even if they still share the GIL.


Proposal
========

The ``interpreters`` module will be added to the stdlib. To help
authors of extension modules, a new page will be added to the
`Extending Python <extension-docs_>`_ docs. More information on both
is found in the immediately following sections.


The "interpreters" Module
-------------------------

The ``interpreters`` module will provide a high-level interface to
subinterpreters and wrap a new low-level ``_interpreters`` module (in
the same way as the ``threading`` module wraps ``_thread``). See the
`Examples`_ section for concrete usage and use cases.

Along with exposing the existing (in CPython) subinterpreter support,
the module will also provide a mechanism for sharing data between
interpreters. This mechanism centers around "channels", which are
similar to queues and pipes.

Note that *objects* are not shared between interpreters since they are
tied to the interpreter in which they were created. Instead, the
objects' *data* is passed between interpreters. See the `Shared data`_
section for more details about sharing between interpreters.

At first only the following types will be supported for sharing:

* None
* bytes
* str
* int
* PEP 554 channels

Support for other basic types (e.g. bool, float, Ellipsis) will be added later.


API summary for interpreters module
-----------------------------------

Here is a summary of the API for the ``interpreters`` module. For a
more in-depth explanation of the proposed classes and functions, see
the `"interpreters" Module API`_ section below.

For creating and using interpreters:

.. list-table::
   :header-rows: 1

   * - signature
     - description
   * - ``list_all() -> [Interpreter]``
     - Get all existing interpreters.
   * - ``get_current() -> Interpreter``
     - Get the currently running interpreter.
   * - ``get_main() -> Interpreter``
     - Get the main interpreter.
   * - ``create(*, isolated=True) -> Interpreter``
     - Initialize a new (idle) Python interpreter.

|

.. list-table::
   :header-rows: 1

   * - signature
     - description
   * - ``class Interpreter(id)``
     - A single interpreter.
   * - ``.id``
     - The interpreter's ID (read-only).
   * - ``.isolated``
     - The interpreter's mode (read-only).
   * - ``.is_running() -> bool``
     - Is the interpreter currently executing code?
   * - ``.close()``
     - Finalize and destroy the interpreter.
   * - ``.run(src_str, /, *, channels=None)``
     - | Run the given source code in the interpreter.
       | (This blocks the current thread until done.)

|

.. list-table::
   :header-rows: 1

   * - exception
     - base
     - description
   * - ``RunFailedError``
     - ``RuntimeError``
     - Interpreter.run() resulted in an uncaught exception.

For sharing data between interpreters:

.. list-table::
   :header-rows: 1

   * - signature
     - description
   * - ``is_shareable(obj) -> bool``
     - Can the object's data be shared between interpreters?
   * - ``create_channel() -> (RecvChannel, SendChannel)``
     - Create a new channel for passing data between interpreters.
   * - ``list_all_channels() -> [(RecvChannel, SendChannel)]``
     - Get all open channels.

|

.. list-table::
   :header-rows: 1

   * - signature
     - description
   * - ``class RecvChannel(id)``
     - The receiving end of a channel.
   * - ``.id``
     - The channel's unique ID.
   * - ``.recv() -> object``
     - Get the next object from the channel, waiting until one has
       been sent.
   * - ``.recv_nowait(default=None) -> object``
     - Like recv(), but return the default instead of waiting.

|

.. list-table::
   :header-rows: 1

   * - signature
     - description
   * - ``class SendChannel(id)``
     - The sending end of a channel.
   * - ``.id``
     - The channel's unique ID.
   * - ``.send(obj)``
     - Send the object (i.e. its data) to the receiving end of the
       channel and wait until it is received.
   * - ``.send_nowait(obj)``
     - Like send(), but return False if not received.

|

.. list-table::
   :header-rows: 1

   * - exception
     - base
     - description
   * - ``ChannelError``
     - ``Exception``
     - The base class for channel-related exceptions.
   * - ``ChannelNotFoundError``
     - ``ChannelError``
     - The identified channel was not found.
   * - ``ChannelEmptyError``
     - ``ChannelError``
     - The channel was unexpectedly empty.
   * - ``ChannelNotEmptyError``
     - ``ChannelError``
     - The channel was unexpectedly not empty.
   * - ``NotReceivedError``
     - ``ChannelError``
     - Nothing was waiting to receive a sent object.

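
The blocking rules in the channel tables above (``recv()`` waits until something has been sent, ``recv_nowait()`` returns a default instead, and ``send()`` waits) can be pictured as a rendezvous between the two ends. Below is a minimal sketch of that reading, not the proposed implementation: it runs inside one interpreter, ``RendezvousChannel`` is a hypothetical class that merges both ends into a single object, ``threading.Condition`` is an assumed choice of primitive, and it assumes ``send()`` waits specifically until its own object has been received.

```python
import threading
from collections import deque


class RendezvousChannel:
    """Hypothetical single-interpreter model of a channel's blocking rules.

    Both ends live in one object here; the proposal splits them into
    RecvChannel and SendChannel and copies data between interpreters.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._items = deque()
        self._sent = 0      # number of send() calls so far
        self._received = 0  # number of objects taken by a receiver

    def send(self, obj):
        """Queue obj, then block until a receiver has taken it."""
        with self._cond:
            self._items.append(obj)
            self._sent += 1
            ticket = self._sent  # FIFO: our object is the ticket-th taken
            self._cond.notify_all()  # wake any waiting receiver
            self._cond.wait_for(lambda: self._received >= ticket)

    def _take(self):
        # Caller must hold the lock and have checked that items exist.
        obj = self._items.popleft()
        self._received += 1
        self._cond.notify_all()  # wake the sender waiting on this item
        return obj

    def recv(self):
        """Block until an object has been sent, then return it."""
        with self._cond:
            self._cond.wait_for(lambda: self._items)
            return self._take()

    def recv_nowait(self, default=None):
        """Return the next object, or default if nothing is waiting."""
        with self._cond:
            if not self._items:
                return default
            return self._take()


ch = RendezvousChannel()
print(ch.recv_nowait('nothing yet'))  # nothing yet

sender = threading.Thread(target=ch.send, args=(b'spam',))
sender.start()
print(ch.recv())  # b'spam' (this also releases the blocked sender)
sender.join()
```

A FIFO queue plus a count of received items is enough here: because objects are taken in the order they were sent, the n-th ``send()`` call is released by the n-th successful receive.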

Help for Extension Module Maintainers
-------------------------------------

Many extension modules do not support use in subinterpreters yet. The
maintainers and users of such extension modules will both benefit when
they are updated to support subinterpreters. In the meantime users may
become confused by failures when using subinterpreters, which could
negatively impact extension maintainers. See `Concerns`_ below.

To mitigate that impact and accelerate compatibility, we will do the
following:

* be clear that extension modules are *not* required to support use in
  subinterpreters
* raise ``ImportError`` when an incompatible (no PEP 489 support) module
  is imported in a subinterpreter
* provide resources (e.g. docs) to help maintainers reach compatibility
* reach out to the maintainers of Cython and of the most used extension
  modules (on PyPI) to get feedback and possibly provide assistance


Examples
========

Run isolated code
-----------------

::

    import interpreters

    interp = interpreters.create()
    print('before')
    interp.run('print("during")')
    print('after')


Run in a thread
---------------

::

    import threading

    import interpreters

    interp = interpreters.create()

    def run():
        interp.run('print("during")')

    t = threading.Thread(target=run)
    print('before')
    t.start()
    print('after')
    t.join()


Pre-populate an interpreter
---------------------------

::

    import textwrap as tw

    import interpreters

    interp = interpreters.create()
    interp.run(tw.dedent("""
        import some_lib
        import an_expensive_module
        some_lib.set_up()
        """))
    wait_for_request()  # placeholder: block until there is work to do
    interp.run(tw.dedent("""
        some_lib.handle_request()
        """))


Handling an exception
---------------------

::

    import textwrap as tw

    import interpreters

    interp = interpreters.create()
    try:
        interp.run(tw.dedent("""
            raise KeyError
            """))
    except interpreters.RunFailedError as exc:
        print(f"got the error from the subinterpreter: {exc}")


Re-raising an exception
-----------------------

::

    import textwrap as tw

    import interpreters

    interp = interpreters.create()
    try:
        try:
            interp.run(tw.dedent("""
                raise KeyError
                """))
        except interpreters.RunFailedError as exc:
            # The subinterpreter's exception is exposed as the cause.
            raise exc.__cause__
    except KeyError:
        print("got a KeyError from the subinterpreter")

Note that this pattern is a candidate for later improvement.

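
Both exception examples rely on how ``run()`` reports failures: an uncaught exception surfaces in the caller as ``RunFailedError``, with the original error reachable through ``__cause__``. The sketch below mimics only that calling convention, plus the persistence of ``__main__`` state between calls, so the pattern can be tried without the proposed module. ``ToyInterpreter`` and its ``RunFailedError`` are hypothetical stand-ins: they run code with ``exec()`` in the *current* interpreter and provide none of the isolation discussed in this PEP. Chaining the real exception object via ``raise ... from`` is also an assumption of the toy, since objects are not shared between real interpreters.

```python
class RunFailedError(RuntimeError):
    """Hypothetical stand-in for interpreters.RunFailedError."""


class ToyInterpreter:
    """Mimics the calling convention of Interpreter.run() only.

    There is no isolation: the code runs in the current interpreter
    and shares its sys.modules, builtins, and GIL.
    """

    def __init__(self):
        # Stands in for the target interpreter's __main__ namespace,
        # which is never reset between run() calls.
        self.namespace = {'__name__': '__main__'}

    def run(self, source, /):
        try:
            exec(source, self.namespace)
        except Exception as exc:
            # Report the failure to the caller; keep the original error
            # reachable as __cause__, as the re-raising example expects.
            raise RunFailedError(f'{type(exc).__name__}: {exc}') from exc
        # Like the proposed run(), nothing is returned.


interp = ToyInterpreter()
interp.run('counter = 1')
interp.run('counter += 1')  # state from the first call is still there
print(interp.namespace['counter'])  # 2

try:
    interp.run('raise KeyError("spam")')
except RunFailedError as exc:
    print(repr(exc.__cause__))  # KeyError('spam')
```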

Synchronize using a channel
---------------------------

::

    import textwrap as tw
    import threading

    import interpreters

    interp = interpreters.create()
    r, s = interpreters.create_channel()

    def run():
        interp.run(tw.dedent("""
            reader.recv()
            print("during")
            """),
            channels=dict(
                reader=r,
            ),
        )

    t = threading.Thread(target=run)
    print('before')
    t.start()
    print('after')
    s.send(b'')  # unblocks reader.recv() in the subinterpreter
    t.join()


Sharing a file descriptor
-------------------------

::

    import textwrap as tw
    import threading

    import interpreters

    interp = interpreters.create()
    r1, s1 = interpreters.create_channel()
    r2, s2 = interpreters.create_channel()

    def run():
        interp.run(tw.dedent("""
            import os
            fd = reader.recv()  # an int, which is shareable
            # closefd=False: the main interpreter still owns the fd.
            for line in os.fdopen(fd, closefd=False):
                print(line, end='')
            writer.send(b'')
            """),
            channels=dict(
                reader=r1,
                writer=s2,
            ),
        )

    t = threading.Thread(target=run)
    t.start()
    with open('spamspamspam') as infile:
        s1.send(infile.fileno())
        r2.recv()  # wait until the subinterpreter is done reading
    t.join()

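
When one piece of code hands a file descriptor to another, as in the example above, only one side should close it, since file descriptors are process-global (see `Interpreter Isolation`_). A minimal single-interpreter sketch of the borrowing side follows; the ``read_borrowed()`` helper is hypothetical and ``os.pipe()`` stands in for the shared file. Passing ``closefd=False`` to ``os.fdopen()`` (which forwards its arguments to ``open()``) lets the wrapper be closed without closing the descriptor itself.

```python
import os


def read_borrowed(fd):
    """Read everything from a descriptor this code does not own.

    With closefd=False, closing the file object leaves fd open, so the
    owner can keep using it and close it exactly once.
    """
    with os.fdopen(fd, 'rb', closefd=False) as f:
        return f.read()


r, w = os.pipe()
os.write(w, b'spam\n')
os.close(w)               # gives the reader an end-of-file
print(read_borrowed(r))   # b'spam\n'
os.fstat(r)               # still valid: the wrapper did not close it
os.close(r)               # the owner closes it
```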

Passing objects via marshal
---------------------------

::

    import marshal
    import textwrap as tw
    import threading

    import interpreters

    interp = interpreters.create()
    r, s = interpreters.create_channel()
    interp.run(tw.dedent("""
        import marshal
        """),
        channels=dict(
            reader=r,
        ),
    )

    def run():
        # "reader" and "marshal" persist in the subinterpreter's __main__.
        interp.run(tw.dedent("""
            data = reader.recv()
            while data:
                obj = marshal.loads(data)
                do_something(obj)  # placeholder handler
                data = reader.recv()
            """))

    t = threading.Thread(target=run)
    t.start()
    for obj in items:  # "items" is a placeholder iterable of objects
        data = marshal.dumps(obj)
        s.send(data)
    s.send(None)  # None ends the loop in the subinterpreter
    t.join()


Passing objects via pickle
--------------------------

::

    import pickle
    import textwrap as tw
    import threading

    import interpreters

    interp = interpreters.create()
    r, s = interpreters.create_channel()
    interp.run(tw.dedent("""
        import pickle
        """),
        channels=dict(
            reader=r,
        ),
    )

    def run():
        # "reader" and "pickle" persist in the subinterpreter's __main__.
        interp.run(tw.dedent("""
            data = reader.recv()
            while data:
                obj = pickle.loads(data)
                do_something(obj)  # placeholder handler
                data = reader.recv()
            """))

    t = threading.Thread(target=run)
    t.start()
    for obj in items:  # "items" is a placeholder iterable of objects
        data = pickle.dumps(obj)
        s.send(data)
    s.send(None)  # None ends the loop in the subinterpreter
    t.join()

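
The two examples above differ only in the serializer. Either way, what crosses the channel is ``bytes``, one of the initially shareable types, and the receiver rebuilds an equal but separate object. The practical difference is coverage: ``marshal`` handles only core built-in types, while ``pickle`` also handles instances of other classes. A small single-interpreter check of that difference, using ``fractions.Fraction`` as an arbitrary non-core type and a hypothetical ``roundtrip()`` helper:

```python
import marshal
import pickle
from fractions import Fraction


def roundtrip(codec, obj):
    """Serialize obj to bytes with codec, then rebuild it."""
    data = codec.dumps(obj)
    assert type(data) is bytes  # bytes can pass through a channel
    return codec.loads(data)


core = {'name': 'spam', 'sizes': [1, 2, 3], 'raw': b'\x00\x01'}
print(roundtrip(marshal, core) == core)  # True
print(roundtrip(pickle, core) == core)   # True

fancy = [Fraction(1, 3)]
print(roundtrip(pickle, fancy) == fancy)  # True
try:
    roundtrip(marshal, fancy)
except ValueError as exc:
    print('marshal:', exc)  # marshal cannot encode a Fraction
```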

Running a module
----------------

::

    import interpreters

    interp = interpreters.create()
    main_module = mod_name  # placeholder: name of the module to run
    interp.run(f'import runpy; runpy.run_module({main_module!r})')


Running as script (including zip archives & directories)
--------------------------------------------------------

::

    import interpreters

    interp = interpreters.create()
    main_script = path_name  # placeholder: script, zip archive, or directory
    interp.run(f"import runpy; runpy.run_path({main_script!r})")


Running in a thread pool executor
---------------------------------

::

    import concurrent.futures

    import interpreters

    interps = [interpreters.create() for i in range(5)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
        print('before')
        for interp in interps:
            pool.submit(interp.run, 'print("starting"); print("stopping")')
        print('after')


Rationale
=========

Running code in multiple interpreters provides a useful level of
isolation within the same process. This can be leveraged in a number
of ways. Furthermore, subinterpreters provide a well-defined framework
in which such isolation may be extended.

Nick Coghlan explained some of the benefits through a comparison with
multi-processing [benefits]_::

    [I] expect that communicating between subinterpreters is going
    to end up looking an awful lot like communicating between
    subprocesses via shared memory.

    The trade-off between the two models will then be that one still
    just looks like a single process from the point of view of the
    outside world, and hence doesn't place any extra demands on the
    underlying OS beyond those required to run CPython with a single
    interpreter, while the other gives much stricter isolation
    (including isolating C globals in extension modules), but also
    demands much more from the OS when it comes to its IPC
    capabilities.

    The security risk profiles of the two approaches will also be quite
    different, since using subinterpreters won't require deliberately
    poking holes in the process isolation that operating systems give
    you by default.

CPython has supported subinterpreters, with increasing levels of
support, since version 1.5. While the feature has the potential
to be a powerful tool, subinterpreters have suffered from neglect
because they are not available directly from Python. Exposing the
existing functionality in the stdlib will help reverse the situation.

This proposal is focused on enabling the fundamental capability of
multiple isolated interpreters in the same Python process. This is a
new area for Python, so there is relative uncertainty about the best
tools to provide as companions to subinterpreters. Thus we minimize
the functionality we add in the proposal as much as possible.


Concerns
--------

* "subinterpreters are not worth the trouble"

Some have argued that subinterpreters do not add sufficient benefit
to justify making them an official part of Python. Adding features
to the language (or stdlib) has a cost in increasing the size of
the language. So an addition must pay for itself. In this case,
subinterpreters provide a novel concurrency model focused on isolated
threads of execution. Furthermore, they provide an opportunity for
changes in CPython that will allow simultaneous use of multiple CPU
cores (currently prevented by the GIL).

Alternatives to subinterpreters include threading, async, and
multiprocessing. Threading is limited by the GIL and async isn't
the right solution for every problem (nor for every person).
Multiprocessing is likewise valuable in some but not all situations.
Direct IPC (rather than via the multiprocessing module) provides
similar benefits but with the same caveat.

Notably, subinterpreters are not intended as a replacement for any of
the above. Certainly they overlap in some areas, but the benefits of
subinterpreters include isolation and (potentially) performance. In
particular, subinterpreters provide a direct route to an alternate
concurrency model (e.g. CSP) which has found success elsewhere and
will appeal to some Python users. That is the core value that the
``interpreters`` module will provide.


* "stdlib support for subinterpreters adds extra burden
  on C extension authors"

In the `Interpreter Isolation`_ section below we identify ways in
which isolation in CPython's subinterpreters is incomplete. Most
notable are extension modules that use C globals to store internal
state. PEP 3121 and PEP 489 provide a solution for most of the
problem, but one still remains. [petr-c-ext]_ Until that is resolved
(see PEP 573), C extension authors will face extra difficulty
in supporting subinterpreters.

Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using subinterpreters,
where their modules may break. This situation is limited to modules
that use C globals (or use libraries that use C globals) to store
internal state. For numpy, the reported-bug rate is one every 6
months. [bug-rate]_

Ultimately this comes down to a question of how often it will be a
problem in practice: how many projects would be affected, how often
their users will be affected, what the additional maintenance burden
will be for projects, and what the overall benefit of subinterpreters
is to offset those costs. The position of this PEP is that the actual
extra maintenance burden will be small and well below the level at
which it would outweigh the benefits of subinterpreters.


* "creating a new concurrency API deserves much more thought and
  experimentation, so the new module shouldn't go into the stdlib
  right away, if ever"

Introducing an API for a new concurrency model, as happened with
asyncio, is an extremely large project that requires a lot of careful
consideration. It is not something that can be done as simply as this
PEP proposes and likely deserves significant time on PyPI to mature.
(See `Nathaniel's post <nathaniel-asyncio_>`_ on python-dev.)

However, this PEP does not propose any new concurrency API. At most
it exposes minimal tools (e.g. subinterpreters, channels) which may
be used to write code that follows patterns associated with (relatively)
new-to-Python `concurrency models <Concurrency_>`_. Those tools could
also be used as the basis for APIs for such concurrency models.
Again, this PEP does not propose any such API.


* "there is no point to exposing subinterpreters if they still share
  the GIL"
* "the effort to make the GIL per-interpreter is disruptive and risky"

A common misconception is that this PEP also includes a promise that
subinterpreters will no longer share the GIL. When that is clarified,
the next question is "what is the point?". This is already answered
at length in this PEP. Just to be clear, the value lies in:

* increased exposure of the existing feature, which helps improve
  the code health of the entire CPython runtime
* exposing the (mostly) isolated execution of subinterpreters
* preparing for a per-interpreter GIL
* encouraging experimentation


* "data sharing can have a negative impact on cache performance
  in multi-core scenarios"

(See [cache-line-ping-pong]_.)

This shouldn't be a problem for now as we have no immediate plans
to actually share data between interpreters, instead focusing
on copying.


About Subinterpreters
=====================

Concurrency
-----------

Concurrency is a challenging area of software development. Decades of
research and practice have led to a wide variety of concurrency models,
each with different goals. Most center on correctness and usability.

One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme. A
notable example is `Communicating Sequential Processes`_ (CSP) (upon
which Go's concurrency is roughly based). The isolation inherent to
subinterpreters makes them well-suited to this approach.

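
As a rough illustration of that style, here is a two-stage pipeline in which each stage talks to the others only through the channel ends it is handed. It is a sketch under stated assumptions: ordinary threads stand in for subinterpreters and ``queue.Queue`` stands in for channels, so the isolation here is by convention only, which is exactly what subinterpreters would enforce instead. The function names are hypothetical.

```python
import queue
import threading

DONE = None  # sentinel that ends a stream, as in the examples above


def produce(out, items):
    for item in items:
        out.put(item)
    out.put(DONE)


def square(inp, out):
    # Touches nothing but its own input and output "channels".
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def run_pipeline(items):
    a, b = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=produce, args=(a, items)),
        threading.Thread(target=square, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while (item := b.get()) is not DONE:
        results.append(item)
    for w in workers:
        w.join()
    return results


print(run_pipeline(range(5)))  # [0, 1, 4, 9, 16]
```

Swapping the threads for subinterpreters and the queues for channels would keep the same shape; the difference is that the interpreter boundary, rather than programmer discipline, would keep the stages from reaching into each other's objects.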

Shared data
-----------

Subinterpreters are inherently isolated (with caveats explained below),
in contrast to threads. So the same communicate-via-shared-memory
approach doesn't work. Without an alternative, effective use of
concurrency via subinterpreters is significantly limited.

The key challenge here is that sharing objects between interpreters
faces complexity due to various constraints on object ownership,
visibility, and mutability. At a conceptual level it's easier to
reason about concurrency when objects only exist in one interpreter
at a time. At a technical level, CPython's current memory model
limits how Python *objects* may be shared safely between interpreters;
effectively, objects are bound to the interpreter in which they were
created. Furthermore, the complexity of *object* sharing increases as
subinterpreters become more isolated, e.g. after GIL removal.

Consequently, the mechanism for sharing needs to be carefully considered.
There are a number of valid solutions, several of which may be
appropriate to support in Python. This proposal provides a single basic
solution: "channels". Ultimately, any other solution will look similar
to the proposed one, which will set the precedent. Note that the
implementation of ``Interpreter.run()`` will be done in a way that
allows for multiple solutions to coexist, but doing so is not
technically a part of the proposal here.

The proposed solution, "channels", is a basic, opt-in data sharing
mechanism that draws inspiration from pipes, queues, and CSP's
channels. [fifo]_

As described in the API summary above, channels have two operations:
send and receive. A key characteristic of those operations is that
channels transmit data derived from Python objects rather than the
objects themselves. When objects are sent, their data is extracted.
When the "object" is received in the other interpreter, the data is
converted back into an object owned by that interpreter.

To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters. Initially we will
support only one type of object for shared state: the channels provided
by ``create_channel()``. Channels, in turn, will carefully manage
passing objects between interpreters.

This approach, including keeping the API minimal, helps us avoid further
exposing any underlying complexity to Python users. Along those same
lines, we will initially restrict the types that may be passed through
channels to the following:

* None
* bytes
* str
* int
* channels

Limiting the initial shareable types is a practical matter, reducing
the potential complexity of the initial implementation. There are a
number of strategies we may pursue in the future to expand supported
objects and object sharing strategies.

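
To make the "data, not objects" idea concrete, here is a sketch of a channel that only ever stores a type tag plus raw bytes, restricted to the initial types listed above (apart from channels themselves). It is a single-interpreter model with hypothetical names: ``is_shareable()`` mirrors the name in the API summary, but its exact-type rule (which, for example, rejects ``bool``) is an assumption, storage is a ``queue.Queue``, and this ``send()`` does not wait for a receiver.

```python
import queue

# Exact types only: bool is an int subclass but is not on the list yet.
SHAREABLE_TYPES = (type(None), bytes, str, int)


def is_shareable(obj):
    return type(obj) in SHAREABLE_TYPES


def _extract(obj):
    """Turn a shareable object into (type tag, raw bytes)."""
    if not is_shareable(obj):
        raise ValueError(f'{type(obj).__name__} objects are not shareable')
    if obj is None:
        return 'none', b''
    if type(obj) is bytes:
        return 'bytes', bytes(memoryview(obj))  # copy of the buffer
    if type(obj) is str:
        return 'str', obj.encode('utf-8', 'surrogatepass')
    size = (obj.bit_length() + 8) // 8  # leave room for the sign bit
    return 'int', obj.to_bytes(size, 'big', signed=True)


def _rebuild(tag, raw):
    """Rebuild an object from the raw data on the receiving side."""
    if tag == 'none':
        return None
    if tag == 'bytes':
        return bytes(raw)
    if tag == 'str':
        return raw.decode('utf-8', 'surrogatepass')
    return int.from_bytes(raw, 'big', signed=True)


class DataChannel:
    def __init__(self):
        self._queue = queue.Queue()  # holds only (tag, bytes) pairs

    def send(self, obj):
        self._queue.put(_extract(obj))

    def recv(self):
        return _rebuild(*self._queue.get())


ch = DataChannel()
for value in (None, b'spam', 'eggs', -2**70):
    ch.send(value)
    print(repr(ch.recv()))
print(is_shareable(True), is_shareable(3.5))  # False False
```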

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other. Each interpreter has its own copy of all modules, classes,
functions, and variables. The same applies to state in C, including in
extension modules. The CPython C-API docs explain more. [caveats]_

However, there are ways in which interpreters share some state. First
of all, some process-global state remains shared:

* file descriptors
* builtin types (e.g. dict, bytes)
* singletons (e.g. None)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

There are no plans to change this.

Second, some isolation is faulty due to bugs or implementations that did
not take subinterpreters into account. This includes things like
extension modules that rely on C globals. [cryptography]_ In these
cases bugs should be opened (some are already):

* readline module hook functions (http://bugs.python.org/issue4202)
* memory leaks on re-init (http://bugs.python.org/issue21387)

Finally, some potential isolation is missing due to the current design
of CPython. Work is currently under way to address gaps in this
area:

* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
* interpreters share memory management (e.g. allocators, gc)
* interpreters share the GIL


Existing Usage
--------------

Subinterpreters are not a widely used feature. In fact, the only
documented cases of widespread usage are
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_,
`OpenStack Ceph <https://github.com/ceph/ceph/pull/14971>`_, and
`JEP <https://github.com/ninia/jep>`_. On the one hand, these cases
provide confidence that existing subinterpreter support is relatively
stable. On the other hand, there isn't much of a sample size from which
to judge the utility of the feature.


Provisional Status
==================

The new ``interpreters`` module will be added with "provisional" status
(see PEP 411). This allows Python users to experiment with the feature
and provide feedback while still allowing us to adjust to that feedback.
The module will be provisional in Python 3.9 and we will make a decision
before the 3.10 release whether to keep it provisional, graduate it, or
remove it. This PEP will be updated accordingly.

While the module is provisional, any changes to the API (or to behavior)
do not need to be reflected here, nor get approval by the BDFL-delegate.
However, such changes will still need to go through the normal processes
(BPO for smaller changes and python-dev/PEP for substantial ones).


Alternate Python Implementations
================================

I've solicited feedback from various Python implementors about support
for subinterpreters. Each has indicated that they would be able to
support subinterpreters (if they choose to) without a lot of
trouble. Here are the projects I contacted:

* jython ([jython]_)
* ironpython (personal correspondence)
* pypy (personal correspondence)
* micropython (personal correspondence)

.. _interpreters-list-all:
|
||
.. _interpreters-get-current:
|
||
.. _interpreters-create:
|
||
.. _interpreters-Interpreter:
|
||
|
||
"interpreters" Module API
|
||
=========================
|
||
|
||
The module provides the following functions::
|
||
|
||
list_all() -> [Interpreter]
|
||
|
||
Return a list of all existing interpreters.
|
||
|
||
get_current() => Interpreter
|
||
|
||
Return the currently running interpreter.
|
||
|
||
get_main() => Interpreter
|
||
|
||
Return the main interpreter. If the Python implementation
|
||
has no concept of a main interpreter then return None.
|
||
|
||
create(*, isolated=True) -> Interpreter
|
||
|
||
Initialize a new Python interpreter and return it. The
|
||
interpreter will be created in the current thread and will remain
|
||
idle until something is run in it. The interpreter may be used
|
||
in any thread and will run in whichever thread calls
|
||
``interp.run()``. See "Interpreter Isolated Mode" below for
|
||
an explanation of the "isolated" parameter.
|
||
|
||
|
||
The module also provides the following class::

    class Interpreter(id):

        id -> int:

            The interpreter's ID. (read-only)

        isolated -> bool:

            Whether or not the interpreter is operating in "isolated" mode.
            (read-only)

        is_running() -> bool:

            Return whether or not the interpreter is currently executing
            code. Calling this on the current interpreter will always
            return True.

        close():

            Finalize and destroy the interpreter.

            This may not be called on an already running interpreter.
            Doing so results in a RuntimeError.

        run(source_str, /, *, channels=None):

            Run the provided Python source code in the interpreter. If
            the "channels" keyword argument is provided (and is a mapping
            of attribute names to channels) then its items are added to
            the interpreter's execution namespace (the interpreter's
            "__main__" module). If any of the values are not RecvChannel
            or SendChannel instances then ValueError gets raised.

            This may not be called on an already running interpreter.
            Doing so results in a RuntimeError.

            A "run()" call is similar to a function call. Once it
            completes, the code that called "run()" continues executing
            (in the original interpreter). Likewise, if there is any
            uncaught exception then it effectively (see below) propagates
            into the code where "run()" was called. However, unlike
            function calls (but like threads), there is no return value.
            If any value is needed, pass it out via a channel.

            The big difference from functions is that "run()" executes
            the code in an entirely different interpreter, with entirely
            separate state. The state of the current interpreter in the
            current OS thread is swapped out with the state of the target
            interpreter (the one that will execute the code). When the
            target finishes executing, the original interpreter gets
            swapped back in and its execution resumes.

            So calling "run()" will effectively cause the current Python
            thread to pause. Sometimes you won't want that pause, in
            which case you should make the "run()" call in another thread.
            To do so, add a function that calls "run()" and then run that
            function in a normal "threading.Thread".

            Note that the interpreter's state is never reset, neither
            before "run()" executes the code nor after. Thus the
            interpreter state is preserved between calls to "run()".
            This includes "sys.modules", the "builtins" module, and the
            internal state of C extension modules.

            Also note that "run()" executes in the namespace of the
            "__main__" module, just like scripts, the REPL, "-m", and
            "-c". Just as the interpreter's state is never reset, the
            "__main__" module is never reset. You can imagine
            concatenating the code from each "run()" call into one long
            script. This is the same as how the REPL operates.

            Supported code: source text.

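The persistence of ``__main__`` across ``run()`` calls can be illustrated with a pure-Python stand-in. The ``ModelInterpreter`` class below is hypothetical and is *not* a subinterpreter: it runs code in the calling interpreter, uses ``exec()`` on a private module's namespace to mimic ``__main__``, and uses a non-blocking lock to model the ``RuntimeError`` raised when ``run()`` is called on an interpreter that is already running. It sketches the described semantics only, not an implementation.

.. code-block:: python

    import threading
    import types


    class ModelInterpreter:
        """Hypothetical stand-in that mimics run() semantics; NOT isolated.

        Code runs in the *current* interpreter. Only two behaviours are
        modelled: a persistent "__main__" namespace and the RuntimeError
        for a run() call on an interpreter that is already running.
        """

        def __init__(self):
            self._main = types.ModuleType("__main__")  # private fake __main__
            self._running = threading.Lock()

        @property
        def namespace(self):
            return vars(self._main)

        def is_running(self):
            return self._running.locked()

        def run(self, source_str, /):
            if not self._running.acquire(blocking=False):
                raise RuntimeError("interpreter is already running")
            try:
                # Same dict every time: state accumulates across calls, as
                # if all sources were concatenated into one script.
                exec(source_str, self.namespace)
            finally:
                self._running.release()


    interp = ModelInterpreter()
    interp.run("counter = 1")
    interp.run("counter += 1")  # sees the name defined by the previous call

    # Calling run() while a run() is already in progress fails.
    interp.namespace["interp"] = interp
    interp.run(
        "try:\n"
        "    interp.run('pass')\n"
        "except RuntimeError:\n"
        "    nested_failed = True\n"
    )

Because both calls share one namespace, the second ``run()`` sees ``counter`` from the first, just as if the two sources had been concatenated into a single script.
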
Uncaught Exceptions
-------------------

Regarding uncaught exceptions in ``Interpreter.run()``, we noted that
they are "effectively" propagated into the code where ``run()`` was
called. To prevent leaking exceptions (and tracebacks) between
interpreters, we create a surrogate of the exception and its traceback
(see ``traceback.TracebackException``), set it to ``__cause__`` on a
new ``RunFailedError``, and raise that.

Raising (a proxy of) the exception directly is problematic since it's
harder to distinguish between an error in the ``run()`` call and an
uncaught exception from the subinterpreter.

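A minimal sketch of that conversion follows, assuming the surrogate is an ordinary exception that carries only text rendered by ``traceback.TracebackException``. (An ``__cause__`` must itself be an exception instance, so the ``TracebackException`` object cannot be used directly.) ``RunFailedError`` here is a local stand-in, ``ExceptionSnapshot`` and ``run_source()`` are hypothetical names, and ``exec()`` in the current interpreter takes the place of a real subinterpreter.

.. code-block:: python

    import traceback


    class RunFailedError(RuntimeError):
        """Local stand-in for the proposed interpreters.RunFailedError."""


    class ExceptionSnapshot(Exception):
        """Hypothetical surrogate holding only text, never live objects."""

        def __init__(self, tb_exc):
            summary = "".join(tb_exc.format_exception_only()).strip()
            super().__init__(summary)
            self.formatted = "".join(tb_exc.format())  # full traceback text


    def run_source(source_str, namespace):
        """Hypothetical: run source, wrapping any uncaught Exception."""
        snapshot = None
        try:
            exec(source_str, namespace)
        except Exception as exc:
            tb_exc = traceback.TracebackException.from_exception(exc)
            snapshot = ExceptionSnapshot(tb_exc)
        if snapshot is not None:
            # Raised outside the except block, so the original exception
            # is not attached as __context__ either.
            raise RunFailedError("uncaught exception in run()") from snapshot


    try:
        run_source("1 / 0", {})
    except RunFailedError as err:
        caught = err

Raising ``RunFailedError`` only after the ``except`` block has finished keeps the original exception out of ``__context__`` as well, so the caller sees nothing but the text-only surrogate.
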
.. _interpreters-is-shareable:
.. _interpreters-create-channel:
.. _interpreters-list-all-channels:
.. _interpreters-RecvChannel:
.. _interpreters-SendChannel:

API for sharing data
--------------------

Subinterpreters are less useful without a mechanism for sharing data
between them. Sharing actual Python objects between interpreters,
however, has enough potential problems that we are avoiding support
for that here. Instead, only a minimal set of types will be supported.
Initially this will include ``None``, ``bytes``, ``str``, ``int``,
and channels. Further types may be supported later.

The ``interpreters`` module provides a function that users may call
to determine whether an object is shareable or not::

    is_shareable(obj) -> bool:

        Return True if the object may be shared between interpreters.
        This does not necessarily mean that the actual objects will be
        shared. Instead, it means that the objects' underlying data will
        be shared in a cross-interpreter way, whether via a proxy, a
        copy, or some other means.

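As a rough sketch of the initial rule, ``is_shareable()`` could be a type check against the listed types. This assumes an exact-type check, so subclasses such as ``bool`` are rejected, a detail this PEP does not spell out. Channel ends are omitted because this snippet has no channel types to refer to.

.. code-block:: python

    # Initial shareable types named in this PEP (channel ends omitted here).
    _SHAREABLE_TYPES = frozenset({type(None), bytes, str, int})


    def is_shareable(obj):
        """Sketch: exact-type check, so subclasses such as bool are rejected."""
        return type(obj) in _SHAREABLE_TYPES

With this choice, ``is_shareable(b"data")`` is true while ``is_shareable([1, 2])`` and ``is_shareable(1.5)`` are false.
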
This proposal provides two ways to share such objects between
interpreters.

First, channels may be passed to ``run()`` via the ``channels``
keyword argument, where they are effectively injected into the target
interpreter's ``__main__`` module. While passing arbitrary shareable
objects this way is possible, doing so is mainly intended for sharing
meta-objects (e.g. channels) between interpreters. It is less useful
to pass other objects (like ``bytes``) to ``run()`` directly.

Second, the main mechanism for sharing objects (i.e. their data) between
interpreters is through channels. A channel is a simplex FIFO similar
to a pipe. The main difference is that channels can be associated with
zero or more interpreters on either end. Like queues, which are also
many-to-many, channels are buffered (though they also offer methods
with unbuffered semantics).

Python objects are not shared between interpreters. However, in some
cases the data those objects wrap is actually shared and not just copied.
One example might be PEP 3118 buffers. In those cases the object in the
original interpreter is kept alive until the shared data in the other
interpreter is no longer used. Then object destruction can happen like
normal in the original interpreter, along with the previously shared
data.

The ``interpreters`` module provides the following functions related
to channels::

    create_channel() -> (RecvChannel, SendChannel):

        Create a new channel and return (recv, send), the RecvChannel
        and SendChannel corresponding to the ends of the channel.

        Both ends of the channel are supported "shared" objects (i.e.
        they may be safely shared by different interpreters). Thus they
        may be passed to "Interpreter.run()" via its "channels" keyword
        argument.

    list_all_channels() -> [(RecvChannel, SendChannel)]:

        Return a list of all open channel-end pairs.

The module also provides the following channel-related classes::

    class RecvChannel(id):

        The receiving end of a channel. An interpreter may use this to
        receive objects from another interpreter. At first only a few
        of the simple, immutable builtin types will be supported.

        id -> int:

            The channel's unique ID. This is shared with the "send" end.

        recv():

            Return the next object from the channel. If none have been
            sent then wait until the next send.

            At the least, the object will be equivalent to the sent object.
            That will almost always mean the same type with the same data,
            though it could also be a compatible proxy. Regardless, it may
            use a copy of that data or actually share the data.

        recv_nowait(default=None):

            Return the next object from the channel. If none have been
            sent then return the default. Otherwise, this is the same
            as the "recv()" method.


    class SendChannel(id):

        The sending end of a channel. An interpreter may use this to
        send objects to another interpreter. At first only a few of
        the simple, immutable builtin types will be supported.

        id -> int:

            The channel's unique ID. This is shared with the "recv" end.

        send(obj):

            Send the object (i.e. its data) to the "recv" end of the
            channel. Wait until the object is received. If the object
            is not shareable then ValueError is raised.

        send_nowait(obj):

            Send the object to the "recv" end of the channel. This
            behaves the same as "send()", except for the waiting part.
            If no interpreter is currently receiving (waiting on the
            other end) then queue the object and return False. Otherwise
            return True.

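To make the blocking and non-blocking semantics above concrete, here is a thread-based model in pure Python. It is a hypothetical sketch only: both ends live in one interpreter, threads stand in for interpreters, objects are passed by reference rather than copied, and ``_ChannelState`` and ``_check_shareable()`` are invented helpers. Each queued item carries a ``threading.Event`` so that ``send()`` can wait until some ``recv()`` has taken it, while ``send_nowait()`` reports whether a receiver was already blocked.

.. code-block:: python

    import itertools
    import threading
    from collections import deque

    _next_id = itertools.count(1)


    class _ChannelState:
        """State shared by both ends of one modelled channel."""

        def __init__(self):
            self.id = next(_next_id)
            self.cond = threading.Condition()
            self.items = deque()  # (obj, received_event) pairs in FIFO order
            self.receivers = 0    # threads currently blocked in recv()


    def _check_shareable(obj):
        # Initial types from this PEP, checked by exact type (an assumption).
        if type(obj) not in (type(None), bytes, str, int, RecvChannel, SendChannel):
            raise ValueError(f"{type(obj).__name__!r} object is not shareable")


    class RecvChannel:
        def __init__(self, state):
            self._state = state
            self.id = state.id  # shared with the send end

        def _pop_locked(self):
            obj, received = self._state.items.popleft()
            received.set()  # lets a sender blocked in send() return
            return obj

        def recv(self):
            st = self._state
            with st.cond:
                st.receivers += 1
                try:
                    st.cond.wait_for(lambda: st.items)
                finally:
                    st.receivers -= 1
                return self._pop_locked()

        def recv_nowait(self, default=None):
            with self._state.cond:
                if not self._state.items:
                    return default
                return self._pop_locked()


    class SendChannel:
        def __init__(self, state):
            self._state = state
            self.id = state.id  # shared with the recv end

        def _put(self, obj):
            _check_shareable(obj)
            st = self._state
            received = threading.Event()
            with st.cond:
                # A receiver is free if more are blocked than items queued.
                receiver_waiting = st.receivers > len(st.items)
                st.items.append((obj, received))
                st.cond.notify()
            return received, receiver_waiting

        def send(self, obj):
            received, _ = self._put(obj)
            received.wait()  # block until some recv() has taken the object

        def send_nowait(self, obj):
            _, receiver_waiting = self._put(obj)
            return receiver_waiting


    def create_channel():
        state = _ChannelState()
        return RecvChannel(state), SendChannel(state)


    recv, send = create_channel()
    queued = send.send_nowait(b"spam")  # nobody waiting, so it is only queued
    first = recv.recv_nowait()
    empty = recv.recv_nowait("empty")

    results = []
    t = threading.Thread(target=lambda: results.append(recv.recv()))
    t.start()
    send.send("eggs")  # returns only after the thread has received it
    t.join()

The buffered deque gives ``send_nowait()`` its queue-like behaviour, and the per-item event layers the unbuffered "wait until received" semantics of ``send()`` on top of the same buffer.
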
Channel Lifespan
----------------

A channel is automatically closed and destroyed once there are no more
Python objects (e.g. ``RecvChannel`` and ``SendChannel``) referring
to it. So its destruction is effectively triggered by garbage collection
of those objects.


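One way to model this reference-driven lifetime in pure Python, assuming both end objects hold a reference to a shared state object, is ``weakref.finalize()``: the finalizer runs once the last end is gone. The classes, ``make_model_channel()``, and the ``closed_ids`` list below are hypothetical and only record when the modelled channel would be destroyed.

.. code-block:: python

    import gc
    import weakref

    closed_ids = []  # IDs of modelled channels that have been "destroyed"


    class _ChannelState:
        def __init__(self, chan_id):
            self.id = chan_id
            # Called once nothing refers to this state object any more.
            weakref.finalize(self, closed_ids.append, chan_id)


    class _ChannelEnd:
        def __init__(self, state):
            self._state = state  # each end keeps the shared state alive


    def make_model_channel(chan_id):
        state = _ChannelState(chan_id)
        return _ChannelEnd(state), _ChannelEnd(state)


    recv_end, send_end = make_model_channel(1)
    del recv_end
    gc.collect()
    open_after_first_del = closed_ids == []  # send end still holds the state
    del send_end
    gc.collect()  # CPython frees it immediately; collect() makes it explicit

Dropping only one end leaves the channel alive; it is destroyed only after the second end goes away.
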
.. _isolated-mode:

Interpreter "Isolated" Mode
===========================

By default, every new interpreter created by ``interpreters.create()``
has specific restrictions on any code it runs. This includes the
following:

* importing an extension module fails if it does not implement the
  PEP 489 API
* new threads of any kind are not allowed
* ``os.fork()`` is not allowed (so no ``multiprocessing``)
* ``os.exec*()``, AKA "fork+exec", is not allowed (so no ``subprocess``)

This represents the full "isolated" mode of subinterpreters. It is
applied when ``interpreters.create()`` is called with the "isolated"
keyword-only argument set to ``True`` (the default). If
``interpreters.create(isolated=False)`` is called then none of those
restrictions is applied.

One advantage of this approach is that it allows extension maintainers
to check subinterpreter compatibility before they implement the PEP 489
API. Also note that ``isolated=False`` represents the historical
behavior when using the existing subinterpreters C-API, thus providing
backward compatibility. For the existing C-API itself, the default
remains ``isolated=False``. The same is true for the main interpreter,
so existing use of Python will not change.

We may later choose to loosen some of the above restrictions or provide
a way to enable/disable granular restrictions individually. Regardless,
requiring PEP 489 support from extension modules will always be a
default restriction.


Documentation
=============

The new stdlib docs page for the ``interpreters`` module will include
the following:

* (at the top) a clear note that subinterpreter support in extension
  modules is not required
* some explanation about what subinterpreters are
* brief examples of how to use subinterpreters and channels
* a summary of the limitations of subinterpreters
* (for extension maintainers) a link to the resources for ensuring
  subinterpreter compatibility
* much of the API information in this PEP

A separate page will be added to the docs for resources to help
extension maintainers ensure their modules can be used safely in
subinterpreters, under `Extending Python <extension-docs_>`_. The page
will include the following information:

* a summary about subinterpreters (similar to the same in the new
  ``interpreters`` module page and in the C-API docs)
* an explanation of how extension modules can be impacted
* how to implement PEP 489 support
* how to move from global module state to per-interpreter state
* how to take advantage of PEP 384 (heap types), PEP 3121
  (module state), and PEP 573
* strategies for dealing with 3rd party C libraries that keep their
  own subinterpreter-incompatible global state

Note that the documentation will play a large part in mitigating any
negative impact that the new ``interpreters`` module might have on
extension module maintainers.

Also, the ``ImportError`` for incompatible extension modules will have
a message that clearly says it is due to missing subinterpreter
compatibility and that extensions are not required to provide it. This
will help set user expectations properly.


Deferred Functionality
======================

In the interest of keeping this proposal minimal, the following
functionality has been left out for future consideration. Note that
this is not a judgement against any of these capabilities, but rather a
deferment. That said, each is arguably valid.

Interpreter.call()
------------------

It would be convenient to run existing functions in subinterpreters
directly. ``Interpreter.run()`` could be adjusted to support this or
a ``call()`` method could be added::

    Interpreter.call(f, *args, **kwargs)

This suffers from the same problem as sharing objects between
interpreters via queues. The minimal solution (running a source string)
is sufficient for us to get the feature out where it can be explored.

timeout arg to recv() and send()
--------------------------------

Typically functions that have a ``block`` argument also have a
``timeout`` argument. It sometimes makes sense to do likewise for
functions that otherwise block, like the channel ``recv()`` and
``send()`` methods. We can add it later if needed.

Interpreter.run_in_thread()
---------------------------

This method would make a ``run()`` call for you in a thread. Doing this
using only ``threading.Thread`` and ``run()`` is relatively trivial so
we've left it out.

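For illustration, the "relatively trivial" helper might look like this. ``run_in_thread()`` is a hypothetical module-level function rather than a method, and ``StandInInterpreter`` is a placeholder whose ``run()`` simply calls ``exec()`` in the current interpreter; with the proposed module, a real ``Interpreter`` object would take its place.

.. code-block:: python

    import threading


    class StandInInterpreter:
        """Placeholder with a run() method; NOT a real subinterpreter."""

        def __init__(self):
            self.namespace = {"__name__": "__main__"}

        def run(self, source_str, /):
            exec(source_str, self.namespace)


    def run_in_thread(interp, source_str):
        """Hypothetical helper: call interp.run(source_str) in a new thread."""
        t = threading.Thread(target=interp.run, args=(source_str,))
        t.start()
        return t  # the caller decides when (or whether) to join()


    interp = StandInInterpreter()
    t = run_in_thread(
        interp,
        "import threading\n"
        "ran_in = threading.get_ident()\n"
        "total = sum(range(10))\n",
    )
    t.join()

One consequence of this design is that an exception escaping ``run()`` in the worker thread is reported through ``threading.excepthook`` rather than raised in the caller, so callers that care about failures need to capture them inside the thread function.
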
Synchronization Primitives
--------------------------

The ``threading`` module provides a number of synchronization primitives
for coordinating concurrent operations. This is especially necessary
due to the shared-state nature of threading. In contrast,
subinterpreters do not share state. Data sharing is restricted to
channels, which do away with the need for explicit synchronization. If
any sort of opt-in shared state support is added to subinterpreters in
the future, that same effort can introduce synchronization primitives
to meet that need.

CSP Library
-----------

A ``csp`` module would not be a large step away from the functionality
provided by this PEP. However, adding such a module is outside the
minimalist goals of this proposal.

Syntactic Support
-----------------

The ``Go`` language provides a concurrency model based on CSP, so
it's similar to the concurrency model that subinterpreters support.
However, ``Go`` also provides syntactic support, as well as several
builtin concurrency primitives, to make concurrency a first-class
feature. Conceivably, similar syntactic (and builtin) support could be
added to Python using subinterpreters. However, that is *way* outside
the scope of this PEP!

Multiprocessing
---------------

The ``multiprocessing`` module could support subinterpreters in the same
way it supports threads and processes. In fact, the module's
maintainer, Davin Potts, has indicated this is a reasonable feature
request. However, it is outside the narrow scope of this PEP.

C-extension opt-in/opt-out
--------------------------

By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily
add a mechanism by which C-extension modules could opt out of support
for subinterpreters. Then the import machinery, when operating in
a subinterpreter, would need to check the module for support. It would
raise an ImportError if unsupported.

Alternately we could support opting in to subinterpreter support.
However, that would probably exclude many more modules (unnecessarily)
than the opt-out approach. Also, note that PEP 489 defined that an
extension's use of the PEP's machinery implies support for
subinterpreters.

The scope of adding the ModuleDef slot and fixing up the import
machinery is non-trivial, but could be worth it. It all depends on
how many extension modules break under subinterpreters. Given that
there are relatively few cases we know of through mod_wsgi, we can
leave this for later.

Poisoning channels
------------------

CSP has the concept of poisoning a channel. Once a channel has been
poisoned, any ``send()`` or ``recv()`` call on it would raise a special
exception, effectively ending execution in the interpreter that tried
to use the poisoned channel.

This could be accomplished by adding a ``poison()`` method to both ends
of the channel. A ``close()`` method, if one were added, could be used
in this way (mostly), but these semantics are relatively specialized
and can wait.

Resetting __main__
------------------

As proposed, every call to ``Interpreter.run()`` will execute in the
namespace of the interpreter's existing ``__main__`` module. This means
that data persists there between ``run()`` calls. Sometimes this isn't
desirable and you want to execute in a fresh ``__main__``. Also,
you don't necessarily want to leak objects there that you aren't using
any more.

Note that the following won't work right because it will clear too much
(e.g. ``__name__`` and the other ``__dunder__`` attributes)::

    interp.run('globals().clear()')

Possible solutions include:

* a ``create()`` arg to indicate resetting ``__main__`` after each
  ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out
  after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
* ``importlib.util.reset_globals()`` [reset_globals]_

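As a rough sketch of what a narrower reset could do, the hypothetical ``reset_main_namespace()`` helper below deletes every name that is not a ``__dunder__`` attribute, so ``__name__``, ``__builtins__`` and the like survive. It operates on a plain module namespace in the current interpreter and is not one of the proposed APIs; note that it would also keep any dunder names the user defined.

.. code-block:: python

    import types


    def reset_main_namespace(namespace):
        """Hypothetical: drop every non-dunder name from a module namespace."""
        for name in list(namespace):  # copy the keys; we mutate while iterating
            if not (name.startswith("__") and name.endswith("__")):
                del namespace[name]


    main = types.ModuleType("__main__")
    ns = vars(main)
    exec("x = 1\ndef f():\n    return x\n", ns)
    reset_main_namespace(ns)

After the reset, ``x`` and ``f`` are gone while ``__name__`` is still ``"__main__"``, unlike the ``globals().clear()`` approach.
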
Also note that resetting ``__main__`` does nothing about state stored
in other modules. So any solution would have to be clear about the
scope of what is being reset. Conceivably we could invent a mechanism
by which any (or every) module could be reset, unlike ``reload()``
which does not clear the module before loading into it. Regardless,
since ``__main__`` is the execution namespace of the interpreter,
resetting it has a much more direct correlation to interpreters and
their dynamic state than does resetting other modules. So a more
generic module reset mechanism may prove unnecessary.

This isn't a critical feature initially. It can wait until later
if desirable.

Resetting an interpreter's state
--------------------------------

It may be nice to re-use an existing subinterpreter instead of
spinning up a new one. Since an interpreter has substantially more
state than just the ``__main__`` module, it isn't so easy to put an
interpreter back into a pristine/fresh state. In fact, there *may*
be parts of the state that cannot be reset from Python code.

A possible solution is to add an ``Interpreter.reset()`` method. This
would put the interpreter back into the state it was in when newly
created. If called on a running interpreter it would fail (hence the
main interpreter could never be reset). This would likely be more
efficient than creating a new subinterpreter, though that depends on
what optimizations will be made later to subinterpreter creation.

While this would potentially provide functionality that is not
otherwise available from Python code, it isn't a fundamental
functionality. So in the spirit of minimalism here, this can wait.
Regardless, I doubt it would be controversial to add it post-PEP.

File descriptors and sockets in channels
----------------------------------------

Given that file descriptors and sockets are process-global resources,
support for passing them through channels is a reasonable idea. They
would be a good candidate for the first effort at expanding the types
that channels support. They aren't strictly necessary for the initial
API.

Integration with async
----------------------

Per Antoine Pitrou [async]_::

    Has any thought been given to how FIFOs could integrate with async
    code driven by an event loop (e.g. asyncio)? I think the model of
    executing several asyncio (or Tornado) applications each in their
    own subinterpreter may prove quite interesting to reconcile multi-
    core concurrency with ease of programming. That would require the
    FIFOs to be able to synchronize on something an event loop can wait
    on (probably a file descriptor?).

A possible solution is to provide async implementations of the blocking
channel methods (``recv()`` and ``send()``). However, the basic
functionality of subinterpreters does not depend on async, so such
support can be added later.

Alternately, "readiness callbacks" could be used to simplify use in
async scenarios. This would mean adding an optional ``callback``
(kw-only) parameter to the ``recv_nowait()`` and ``send_nowait()``
channel methods. The callback would be called once the object was sent
or received (respectively).

(Note that making channels buffered makes readiness callbacks less
important.)

Support for iteration
---------------------

Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
``__next__()``) may be useful. A trivial implementation would use the
``recv()`` method, similar to how files do iteration. Since this isn't
a fundamental capability and has a simple analog, adding iteration
support can wait until later.

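A minimal sketch of such an iterator follows, using a ``queue.Queue`` as a stand-in for the channel's buffer. The ``IterableRecvChannel`` class is hypothetical, and because a blocking ``recv()`` has no natural end, the sketch *assumes* that receiving ``None`` ends the iteration; the two-argument builtin ``iter(callable, sentinel)`` does the rest.

.. code-block:: python

    import queue


    class IterableRecvChannel:
        """Hypothetical RecvChannel-like wrapper around queue.Queue."""

        def __init__(self, buffer):
            self._buffer = buffer

        def recv(self):
            return self._buffer.get()  # blocks, like RecvChannel.recv()

        def __iter__(self):
            # Keep calling recv() until it returns the sentinel (assumed: None).
            return iter(self.recv, None)


    buffer = queue.Queue()
    for item in (b"a", b"b", None):
        buffer.put(item)

    received = list(IterableRecvChannel(buffer))

Choosing a different end-of-stream marker (or an explicit close) would be part of the design work this section defers.
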
Channel context managers
------------------------

Context manager support on ``RecvChannel`` and ``SendChannel`` may be
helpful. The implementation would be simple, wrapping a call to
``close()`` (or maybe ``release()``) like files do. As with iteration,
this can wait.

Pipes and Queues
----------------

With the proposed object passing mechanism of "channels", other similar
basic types aren't required to achieve the minimal useful functionality
of subinterpreters. Such types include pipes (like unbuffered channels,
but one-to-one) and queues (like channels, but more generic). See below
in `Rejected Ideas`_ for more information.

Even though these types aren't part of this proposal, they may still
be useful in the context of concurrency. Adding them later is entirely
reasonable. They could be trivially implemented as wrappers around
channels. Alternatively they could be implemented for efficiency at the
same low level as channels.

Return a lock from send()
-------------------------

When sending an object through a channel, you don't have a way of knowing
when the object gets received on the other end. One way to work around
this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
that unlocks once the object is received.

Alternately, the proposed ``SendChannel.send()`` (blocking) and
``SendChannel.send_nowait()`` provide an explicit distinction that is
less likely to confuse users.

Note that returning a lock would matter for buffered channels
(i.e. queues). For unbuffered channels it is a non-issue.

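The lock-returning variant could be sketched like this, with a ``queue.Queue`` standing in for a buffered channel. ``LockReturningSender`` and ``recv_and_release()`` are hypothetical; the sender stores a pre-acquired ``threading.Lock`` next to the object, and the receiver releases it. This relies on ``threading.Lock`` (unlike ``RLock``) allowing a release from a thread other than the one that acquired it.

.. code-block:: python

    import queue
    import threading


    class LockReturningSender:
        """Hypothetical send() that returns a lock released upon receipt."""

        def __init__(self, buffer):
            self._buffer = buffer

        def send(self, obj):
            received = threading.Lock()
            received.acquire()                 # locked until someone receives
            self._buffer.put((obj, received))  # buffered: does not block
            return received


    def recv_and_release(buffer):
        obj, received = buffer.get()
        received.release()  # wakes anyone waiting on the lock
        return obj


    buffer = queue.Queue()
    lock = LockReturningSender(buffer).send(b"payload")
    pending = lock.locked()          # True: not received yet
    data = recv_and_release(buffer)
    done = lock.acquire(timeout=1)   # True once the object was received

A sender that wants blocking behaviour just calls ``lock.acquire()``; one that does not can ignore the lock entirely.
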
Support prioritization in channels
----------------------------------

A simple example is ``queue.PriorityQueue`` in the stdlib.

Support inheriting settings (and more?)
---------------------------------------

Folks might find it useful, when creating a new subinterpreter, to be
able to indicate that they would like some things "inherited" by the
new interpreter. The mechanism could be a strict copy or it could be
copy-on-write. The motivating example is with the warnings module
(e.g. copy the filters).

The feature isn't critical, nor would it be widely useful, so it
can wait until there's interest. Notably, both suggested solutions
will require significant work, especially when it comes to complex
objects and most especially for mutable containers of mutable
complex objects.

Make exceptions shareable
-------------------------

Exceptions are propagated out of ``run()`` calls, so it isn't a big
leap to make them shareable in channels. However, as noted elsewhere,
it isn't essential (or particularly common) so we can wait on doing
that.

Make RunFailedError.__cause__ lazy
----------------------------------

An uncaught exception in a subinterpreter (from ``run()``) is copied
to the calling interpreter and set as ``__cause__`` on a
``RunFailedError`` which is then raised. That copying part involves
some sort of deserialization in the calling interpreter, which can be
expensive (e.g. due to imports) yet is not always necessary.

So it may be useful to use an ``ExceptionProxy`` type to wrap the
serialized exception and only deserialize it when needed. That could
be via ``ExceptionProxy.__getattribute__()`` or perhaps through
``RunFailedError.resolve()`` (which would raise the deserialized
exception and set ``RunFailedError.__cause__`` to the exception).

It may also make sense to have ``RunFailedError.__cause__`` be a
descriptor that does the lazy deserialization (and sets ``__cause__``)
on the ``RunFailedError`` instance.

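The ``resolve()`` variant can be sketched as follows. This is an illustration under assumptions: ``pickle`` serves as the serialization format (the text above leaves the format open), and this ``RunFailedError`` is a local stand-in that stores bytes and only unpickles them when ``resolve()`` is called.

.. code-block:: python

    import pickle


    class RunFailedError(RuntimeError):
        """Stand-in that keeps the remote exception serialized until needed."""

        def __init__(self, serialized_exc):
            super().__init__("uncaught exception in run()")
            self._serialized = serialized_exc  # bytes from pickle.dumps()

        def resolve(self):
            """Deserialize, set __cause__, then raise the original exception."""
            exc = pickle.loads(self._serialized)  # the (possibly costly) step
            self.__cause__ = exc
            raise exc


    err = RunFailedError(pickle.dumps(ValueError("boom")))
    cause_before = err.__cause__  # None: nothing deserialized yet

    try:
        err.resolve()
    except ValueError as exc:
        resolved = exc

Until ``resolve()`` runs, only the bytes are held, so callers that merely log or ignore the failure never pay for deserialization.
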
Serialize everything through channels
-------------------------------------

We could use pickle (or marshal) to serialize everything sent through
channels. Doing this is potentially inefficient, but it may be a
matter of convenience in the end. We can add it later, but trying to
remove it later would be significantly more painful.

Return a value from ``run()``
-----------------------------

Currently ``run()`` always returns None. One idea is to return the
return value from whatever the subinterpreter ran. However, for now
it doesn't make sense. The only thing folks can run is a string of
code (i.e. a script). This is equivalent to ``PyRun_StringFlags()``,
``exec()``, or a module body. None of those "return" anything. We can
revisit this once ``run()`` supports functions, etc.

Add a "tp_share" type slot
--------------------------

This would replace the current global registry for shareable types.

Expose which interpreters have actually *used* a channel end.
-------------------------------------------------------------

Currently we associate interpreters upon access to a channel. We would
keep a separate association list for "upon use" and expose that.

Add a shareable synchronization primitive
-----------------------------------------

This would be ``_threading.Lock`` (or something like it) where
interpreters would actually share the underlying mutex. This would
provide much better efficiency than blocking channel ops. The main
concern is that locks and channels don't mix well (as learned in Go).

Note that the same functionality as a lock can be achieved by passing
some sort of "token" object through a channel. ``send()`` would be
equivalent to releasing the lock and ``recv()`` to acquiring the lock.

We can add this later if it proves desirable without much trouble.

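The token-passing idea can be modelled with a one-slot ``queue.Queue`` standing in for a buffered channel: ``acquire()`` takes the token (like ``recv()``) and ``release()`` puts it back (like a non-blocking send). ``TokenLock`` is hypothetical. Note that with the proposed blocking ``send()``, which waits until the object is received, ``release()`` would stall unless another interpreter were already waiting, so a buffered, non-blocking send is assumed here.

.. code-block:: python

    import queue
    import threading


    class TokenLock:
        """Hypothetical lock built only from a one-slot, channel-like queue."""

        _TOKEN = b"token"  # any shareable object works as the token

        def __init__(self):
            self._chan = queue.Queue(maxsize=1)
            self._chan.put(self._TOKEN)  # token present == unlocked

        def acquire(self):
            self._chan.get()  # like recv(): block until the token is free

        def release(self):
            # Like send_nowait(); raises queue.Full if the lock was not held.
            self._chan.put_nowait(self._TOKEN)

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, *exc_info):
            self.release()


    lock = TokenLock()
    counter = [0]


    def work():
        for _ in range(1000):
            with lock:
                counter[0] += 1


    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Releasing an unheld ``TokenLock`` raises ``queue.Full``, which mirrors the error a real lock raises when released while unlocked.
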
Propagate SystemExit and KeyboardInterrupt Differently
------------------------------------------------------

The exception types that inherit from ``BaseException`` (aside from
``Exception``) are usually treated specially. These types are:
``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may
make sense to treat them specially when it comes to propagation from
``run()``. Here are some options:

* propagate like normal via ``RunFailedError``
* do not propagate (handle them somehow in the subinterpreter)
* propagate them directly (avoid ``RunFailedError``)
* propagate them directly (set ``RunFailedError`` as ``__cause__``)

We aren't going to worry about handling them differently. Threads
already ignore ``SystemExit``, so for now we will follow that pattern.

Add an explicit release() and close() to channel end classes
------------------------------------------------------------

It can be convenient to have an explicit way to close a channel against
further global use. Likewise it could be useful to have an explicit
way to release one of the channel ends relative to the current
interpreter. Among other reasons, such a mechanism is useful for
communicating overall state between interpreters without the extra
boilerplate that passing objects through a channel directly would
require.

The challenge is getting automatic release/close right without making
it hard to understand. This is especially true when dealing with a
non-empty channel. We should be able to get by without release/close
for now.

Add SendChannel.send_buffer()
-----------------------------

This method would allow no-copy sending of an object through a channel
if it supports the PEP 3118 buffer protocol (e.g. memoryview).

Support for this is not fundamental to channels and can be added on
later without much disruption.

Auto-run in a thread
--------------------

The PEP proposes a hard separation between subinterpreters and threads:
if you want to run in a thread you must create the thread yourself and
call ``run()`` in it. However, it might be convenient if ``run()``
could do that for you, meaning there would be less boilerplate.

Furthermore, we anticipate that users will want to run in a thread much
more often than not. So it would make sense to make this the default
behavior. We would add a kw-only param "threaded" (default ``True``)
to ``run()`` to allow the run-in-the-current-thread operation.


Rejected Ideas
==============

Explicit channel association
----------------------------

Interpreters are implicitly associated with channels upon ``recv()`` and
``send()`` calls. They are de-associated with ``release()`` calls. The
alternative would be explicit methods. It would be either
``add_channel()`` and ``remove_channel()`` methods on ``Interpreter``
objects or something similar on channel objects.

In practice, this level of management shouldn't be necessary for users.
So adding more explicit support would only add clutter to the API.

Use pipes instead of channels
-----------------------------

A pipe would be a simplex FIFO between exactly two interpreters. For
most use cases this would be sufficient. It could potentially simplify
the implementation as well. However, it isn't a big step to supporting
a many-to-many simplex FIFO via channels. Also, with pipes the API
ends up being slightly more complicated, requiring naming the pipes.

Use queues instead of channels
------------------------------

Queues and buffered channels are almost the same thing. The main
difference is that channels have a stronger relationship with context
(i.e. the associated interpreter).

The name "Channel" was used instead of "Queue" to avoid confusion with
the stdlib ``queue.Queue``.

"enumerate"
-----------

The ``list_all()`` function provides the list of all interpreters.
In the threading module, which partly inspired the proposed API, the
function is called ``enumerate()``. The name is different here to
avoid confusing Python users that are not already familiar with the
threading API. For them "enumerate" is rather unclear, whereas
"list_all" is clear.

Alternate solutions to prevent leaking exceptions across interpreters
|
||
---------------------------------------------------------------------
|
||
|
||
In function calls, uncaught exceptions propagate to the calling frame.
|
||
The same approach could be taken with ``run()``. However, this would
|
||
mean that exception objects would leak across the inter-interpreter
|
||
boundary. Likewise, the frames in the traceback would potentially leak.
|
||
|
||
While that might not be a problem currently, it would be a problem once
|
||
interpreters get better isolation relative to memory management (which
|
||
is necessary to stop sharing the GIL between interpreters). We've
|
||
resolved the semantics of how the exceptions propagate by raising a
|
||
``RunFailedError`` instead, for which ``__cause__`` wraps a safe proxy
|
||
for the original exception and traceback.
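
A rough sketch of the proxy approach, runnable without
subinterpreters. Everything here is an assumption for illustration:
``run()`` is a hypothetical stand-in for ``Interpreter.run()`` that
merely calls ``exec()`` in a fresh namespace of the *current*
interpreter, ``RunFailedError`` is a local stand-in, and
``ExceptionSnapshot`` is one possible "safe proxy" that keeps only
strings copied out of the original exception, so neither the exception
object nor its frames are retained::

    import traceback


    class RunFailedError(RuntimeError):
        """Local stand-in for the PEP's RunFailedError."""


    class ExceptionSnapshot(Exception):
        """Hypothetical "safe proxy": holds only strings copied from the
        original exception, never the exception object or its frames."""

        def __init__(self, exc):
            self.type_name = type(exc).__qualname__
            self.msg = str(exc)
            self.formatted = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__))
            super().__init__(f"{self.type_name}: {self.msg}")


    def run(source):
        """Hypothetical stand-in for Interpreter.run(); no real isolation."""
        proxy = None
        try:
            exec(source, {"__name__": "__main__"})
        except Exception as exc:
            proxy = ExceptionSnapshot(exc)
        if proxy is not None:
            # Raised outside the except block, so the original exception
            # is not attached as __context__; only the proxy is chained.
            raise RunFailedError(str(proxy)) from proxy


    try:
        run("raise ValueError('spam')")
    except RunFailedError as err:
        cause = err.__cause__  # the snapshot, not the original ValueError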

Rejected possible solutions:

* reproduce the exception and traceback in the original interpreter
  and raise that.
* raise a subclass of ``RunFailedError`` that proxies the original
  exception and traceback.
* raise ``RuntimeError`` instead of ``RunFailedError``
* convert at the boundary (a la ``subprocess.CalledProcessError``)
  (requires a cross-interpreter representation)
* support customization via ``Interpreter.excepthook``
  (requires a cross-interpreter representation)
* wrap in a proxy at the boundary (including with support for
  something like ``err.raise()`` to propagate the traceback).
* return the exception (or its proxy) from ``run()`` instead of
  raising it
* return a result object (like ``subprocess`` does) [result-object]_
  (unnecessary complexity?)
* throw the exception away and expect users to deal with unhandled
  exceptions explicitly in the script they pass to ``run()``
  (they can pass error info out via channels); with threads you have
  to do something similar
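
Two of the rejected alternatives are modeled on ``subprocess``, where
the boundary is between processes. The sketch below only shows how
``subprocess`` itself reports a failure in the child, once as a result
object and once converted into a local ``CalledProcessError``; it is
not code for the proposed module::

    import subprocess
    import sys

    # The child Python process fails with an uncaught exception.
    cmd = [sys.executable, "-c", "raise ValueError('spam')"]

    # Result object: the failure comes back as data; nothing is raised.
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = result.returncode                  # non-zero exit status
    last_line = result.stderr.splitlines()[-1]  # last traceback line

    # Convert at the boundary: check=True raises a local exception that
    # carries only plain data (exit status, command, captured output).
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as err:
        converted = (err.returncode, err.stderr.splitlines()[-1])

In both cases only plain data (an exit status and captured text)
crosses the process boundary; the interpreter equivalent would need the
cross-interpreter representation mentioned in the list above.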

Always associate each new interpreter with its own thread
---------------------------------------------------------

As implemented in the C-API, a subinterpreter is not inherently tied to
any thread. Furthermore, it will run in any existing thread, whether
created by Python or not. You only have to activate one of its thread
states (``PyThreadState``) in the thread first. This means that the
same thread may run more than one interpreter (though obviously
not at the same time).

The proposed module maintains this behavior. Subinterpreters are not
tied to threads; only calls to ``Interpreter.run()`` are. However,
one of the key objectives of this PEP is to provide a more
human-centric concurrency model. With that in mind, from a conceptual
standpoint the module *might* be easier to understand if each
subinterpreter were associated with its own thread.

That would mean ``interpreters.create()`` would create a new thread
and ``Interpreter.run()`` would only execute in that thread (and
nothing else would). The benefit is that users would not have to
wrap ``Interpreter.run()`` calls in a new ``threading.Thread``. Nor
would they be in a position to accidentally pause the current
interpreter (in the current thread) while their subinterpreter
executes.
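
Under the proposed design, callers who do not want
``Interpreter.run()`` to block the current thread start a thread
themselves. The sketch below shows that pattern; since the
``interpreters`` module is not available, the hypothetical
``FakeInterpreter`` class stands in for an ``Interpreter`` object, and
its ``run()`` merely calls ``exec()`` in a private namespace of the
current interpreter::

    import threading


    class FakeInterpreter:
        """Hypothetical stand-in for an Interpreter object. run()
        executes the source synchronously in the calling thread, in a
        private namespace of the *current* interpreter (no isolation)."""

        def __init__(self):
            self.namespace = {"__name__": "__main__"}

        def run(self, source):
            exec(source, self.namespace)


    interp = FakeInterpreter()  # stands in for interpreters.create()

    # Wrap run() in a thread so the calling thread is not paused meanwhile.
    t = threading.Thread(target=interp.run, args=("total = sum(range(10))",))
    t.start()
    # ... the calling thread can do other work here ...
    t.join()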

The idea is rejected because the benefit is small and the cost is high.
The difference from the capability in the C-API would be potentially
confusing. The implicit creation of threads is magical. The early
creation of threads is potentially wasteful. The inability to run
arbitrary interpreters in an existing thread would prevent some valid
use cases, frustrating users. Tying interpreters to threads would
require extra runtime modifications. It would also make the module's
implementation overly complicated. Finally, it might not even make
the module easier to understand.

Only associate interpreters upon use
------------------------------------

Associate interpreters with channel ends only once ``recv()``,
``send()``, etc. are called.

Doing this is potentially confusing and can also lead to unexpected
races where a channel is auto-closed before it can be used in the
original (creating) interpreter.
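
The race can be illustrated with a toy model. Its rules are
assumptions chosen for illustration, not the PEP's exact semantics: the
hypothetical ``AutoClosingChannel`` closes itself as soon as no
interpreter is associated with it, and interpreters are plain integer
IDs. If the creating interpreter (ID 0) is associated at creation time,
the channel survives another interpreter sending and releasing first;
if association only happens upon use, the channel is auto-closed before
the creator ever reads from it::

    class AutoClosingChannel:
        """Hypothetical model: the channel closes itself once no
        interpreter is associated with it. Interpreters are integer IDs."""

        def __init__(self, creator, associate_on_create):
            self.items = []
            self.closed = False
            # The creator is associated immediately only if requested.
            self.associated = {creator} if associate_on_create else set()

        def _check_open(self):
            if self.closed:
                raise RuntimeError("channel closed")

        def send(self, interp_id, obj):
            self._check_open()
            self.associated.add(interp_id)
            self.items.append(obj)

        def recv(self, interp_id):
            self._check_open()
            self.associated.add(interp_id)
            return self.items.pop(0)

        def release(self, interp_id):
            self.associated.discard(interp_id)
            if not self.associated:  # nobody left: auto-close
                self.closed = True


    def scenario(associate_on_create):
        ch = AutoClosingChannel(creator=0,
                                associate_on_create=associate_on_create)
        # Interpreter 1 sends and releases before the creator (0) uses it.
        ch.send(1, "spam")
        ch.release(1)
        try:
            return ch.recv(0)
        except RuntimeError as exc:
            return str(exc)


    eager = scenario(associate_on_create=True)   # creator associated up front
    lazy = scenario(associate_on_create=False)   # association only upon use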

Add a "reraise" method to RunFailedError
----------------------------------------

While having ``__cause__`` set on ``RunFailedError`` helps produce a
more useful traceback, it's less helpful when handling the original
error. To make that easier, we could add ``RunFailedError.reraise()``.
This method would enable the following pattern::

    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            exc.reraise()
    except MyException:
        ...

This would be even simpler if a ``__reraise__`` protocol existed.

All that said, this is completely unnecessary. Using ``__cause__``
is good enough::

    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            raise exc.__cause__
    except MyException:
        ...

Note that in edge cases (for example, when ``__cause__`` is not set)
it may require a little extra boilerplate::

    try:
        try:
            interp.run(script)
        except RunFailedError as exc:
            if exc.__cause__ is not None:
                raise exc.__cause__
            raise  # re-raise the RunFailedError itself
    except MyException:
        ...
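
A runnable version of the pattern above, using local stand-ins. One
assumption differs from the PEP: the hypothetical ``run()`` chains the
*original* exception as ``__cause__`` rather than a safe proxy, which
is what lets ``except MyException`` match after
``raise exc.__cause__``::

    class RunFailedError(RuntimeError):
        """Local stand-in for the PEP's RunFailedError."""


    class MyException(Exception):
        pass


    def run(source):
        """Hypothetical stand-in for Interpreter.run(): it chains the
        original exception (not a proxy) as __cause__."""
        try:
            exec(source, {"MyException": MyException})
        except Exception as exc:
            raise RunFailedError("script failed") from exc


    def handle(source):
        try:
            try:
                run(source)
            except RunFailedError as exc:
                if exc.__cause__ is not None:
                    raise exc.__cause__
                raise  # no cause available: re-raise RunFailedError
        except MyException as exc:
            return f"handled: {exc}"
        return "ok"

Exceptions other than ``MyException`` still propagate, now as their
original type, e.g. ``handle("1 / 0")`` raises ``ZeroDivisionError``.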


Implementation
==============

The implementation of the PEP has 4 parts:

* the high-level module described in this PEP (mostly a light wrapper
  around a low-level C extension)
* the low-level C extension module
* additions to the ("private") C-API needed by the low-level module
* secondary fixes/changes in the CPython runtime that facilitate
  the low-level module (among other benefits)

These are at various levels of completion, with more done the lower
you go:

* the high-level module has been, at best, roughly implemented.
  However, fully implementing it will be almost trivial.
* the low-level module is mostly complete. The bulk of the
  implementation was merged into master in December 2018 as the
  ``_xxsubinterpreters`` module (for the sake of testing subinterpreter
  functionality). Only 3 parts of the implementation remain:
  ``send_wait()``, ``send_buffer()``, and exception propagation. All
  three have been mostly finished, but were blocked by work related to
  ceval. That blocker is basically resolved now, and finishing the
  low-level module will not require extensive work.
* all necessary C-API work has been finished
* all anticipated work in the runtime has been finished

The implementation effort for PEP 554 is being tracked as part of
a larger project aimed at improving multi-core support in CPython.
[multi-core-project]_


References
==========

.. [c-api]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support

.. _Communicating Sequential Processes:

.. [CSP]
   https://en.wikipedia.org/wiki/Communicating_sequential_processes
   https://github.com/futurecore/python-csp

.. [fifo]
   https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe
   https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue
   https://docs.python.org/3/library/queue.html#module-queue
   http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html
   https://golang.org/doc/effective_go.html#sharing
   http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/

.. [caveats]
   https://docs.python.org/3/c-api/init.html#bugs-and-caveats

.. [petr-c-ext]
   https://mail.python.org/pipermail/import-sig/2016-June/001062.html
   https://mail.python.org/pipermail/python-ideas/2016-April/039748.html

.. [cryptography]
   https://github.com/pyca/cryptography/issues/2299

.. [global-gc]
   http://bugs.python.org/issue24554

.. [gilstate]
   https://bugs.python.org/issue10915
   http://bugs.python.org/issue15751

.. [global-atexit]
   https://bugs.python.org/issue6531

.. [mp-conn]
   https://docs.python.org/3/library/multiprocessing.html#connection-objects

.. [bug-rate]
   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html

.. [benefits]
   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html

.. [main-thread]
   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
   https://mail.python.org/pipermail/python-dev/2017-September/149566.html

.. [reset_globals]
   https://mail.python.org/pipermail/python-dev/2017-September/149545.html

.. [async]
   https://mail.python.org/pipermail/python-dev/2017-September/149420.html
   https://mail.python.org/pipermail/python-dev/2017-September/149585.html

.. [result-object]
   https://mail.python.org/pipermail/python-dev/2017-September/149562.html

.. [jython]
   https://mail.python.org/pipermail/python-ideas/2017-May/045771.html

.. [multi-core-project]
   https://github.com/ericsnowcurrently/multi-core-python

.. [cache-line-ping-pong]
   https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/

.. [nathaniel-asyncio]
   https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/

.. [extension-docs]
   https://docs.python.org/3/extending/index.html


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: