PEP: 554
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: Antoine Pitrou <antoine@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Sep-2017
Python-Version: 3.12
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017,
              09-May-2018, 20-Apr-2020, 04-May-2020


Abstract
========

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997).  The feature has been
available via the C-API. [c-api]_  Multiple interpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_, which
facilitates novel alternative approaches to
`concurrency <Concurrency_>`_.

This proposal introduces the stdlib ``interpreters`` module.  It exposes
the basic functionality of multiple interpreters already provided by the
C-API, along with a *very* basic way to communicate
(i.e. pass data between interpreters).


A Disclaimer about the GIL
==========================

To avoid any confusion up front:  This PEP is meant to be independent
of any efforts to stop sharing the GIL between interpreters (:pep:`684`).
At most this proposal will allow users to take advantage of any
GIL-related work.

The author's position here is that exposing multiple interpreters
to Python code is worth doing, even if they still share the GIL.
Conversations with past steering councils indicate they do not
necessarily agree.


Proposal
========

The "interpreters" Module
-------------------------

The ``interpreters`` module will provide a high-level interface
to the multiple interpreter functionality, and wrap a new low-level
``_interpreters`` module (in the same way that the ``threading`` module
wraps ``_thread``).
See the `Examples`_ section for concrete usage and use cases.

Along with exposing the existing (in CPython) multiple interpreter
support, the module will also support a very basic mechanism for
passing data between interpreters.  That involves setting simple objects
in the ``__main__`` module of a target subinterpreter.  If one end of
an ``os.pipe()`` is passed this way then that pipe can be used to send
bytes between the two interpreters.

Note that *objects* are not shared between interpreters since they are
tied to the interpreter in which they were created.  Instead, the
objects' *data* is passed between interpreters.  See the `Shared Data`_
and `API For Sharing Data`_ sections for more details about
sharing/communicating between interpreters.

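As a rough illustration of the pipe-based mechanism, the sketch below
uses two threads in one interpreter as stand-ins for two interpreters.
That substitution is an assumption made so the code runs with today's
stdlib; the point is that only an integer file descriptor and raw bytes
cross the boundary.  The names ``reader_side`` and ``received`` are
hypothetical.

```python
import os
import threading

def reader_side(read_fd, received):
    # Stand-in for code run in the target interpreter: it only needs
    # the integer file descriptor, which is a shareable object.
    with os.fdopen(read_fd, 'rb') as pipe_in:
        received.append(pipe_in.read())  # read until the writer closes

read_fd, write_fd = os.pipe()
received = []
worker = threading.Thread(target=reader_side, args=(read_fd, received))
worker.start()

# The "main interpreter" side sends raw bytes, then closes its end
# so that the reader sees end-of-file.
os.write(write_fd, b'spam')
os.close(write_fd)
worker.join()
```

With real interpreters the file descriptor would be handed over through
the ``shared`` argument of ``Interpreter.run()`` instead of a thread
argument.
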
API summary for interpreters module
-----------------------------------

Here is a summary of the API for the ``interpreters`` module.  For a
more in-depth explanation of the proposed classes and functions, see
the `"interpreters" Module API`_ section below.

For creating and using interpreters:

+----------------------------------+----------------------------------------------+
| signature                        | description                                  |
+==================================+==============================================+
| ``list_all() -> [Interpreter]``  | Get all existing interpreters.               |
+----------------------------------+----------------------------------------------+
| ``get_current() -> Interpreter`` | Get the currently running interpreter.       |
+----------------------------------+----------------------------------------------+
| ``get_main() -> Interpreter``    | Get the main interpreter.                    |
+----------------------------------+----------------------------------------------+
| ``create() -> Interpreter``      | Initialize a new (idle) Python interpreter.  |
+----------------------------------+----------------------------------------------+

|

+---------------------------------------------------+---------------------------------------------------+
| signature                                         | description                                       |
+===================================================+===================================================+
| ``class Interpreter``                             | A single interpreter.                             |
+---------------------------------------------------+---------------------------------------------------+
| ``.id``                                           | The interpreter's ID (read-only).                 |
+---------------------------------------------------+---------------------------------------------------+
| ``.is_running() -> bool``                         | Is the interpreter currently executing code?      |
+---------------------------------------------------+---------------------------------------------------+
| ``.close()``                                      | Finalize and destroy the interpreter.             |
+---------------------------------------------------+---------------------------------------------------+
| ``.run(src_str, /, *, shared=None) -> Status``    | | Run the given source code in the interpreter    |
|                                                   | | (in its own thread).                            |
+---------------------------------------------------+---------------------------------------------------+

.. XXX Support blocking interp.run() until the interpreter
   finishes its current work.

|

+--------------------+------------------+------------------------------------------------------+
| exception          | base             | description                                          |
+====================+==================+======================================================+
| ``RunFailedError`` | ``RuntimeError`` | Interpreter.run() resulted in an uncaught exception. |
+--------------------+------------------+------------------------------------------------------+

.. XXX Add "InterpreterAlreadyRunningError"?

Asynchronous results:

+--------------------------------------------------+---------------------------------------------------+
| signature                                        | description                                       |
+==================================================+===================================================+
| ``class Status``                                 | Tracks if a request is complete.                  |
+--------------------------------------------------+---------------------------------------------------+
| ``.wait(timeout=None)``                          | Block until the requested work is done.           |
+--------------------------------------------------+---------------------------------------------------+
| ``.done() -> bool``                              | Has the requested work completed (or failed)?     |
+--------------------------------------------------+---------------------------------------------------+
| ``.exception() -> Exception | None``             | Return any exception from the requested work.     |
+--------------------------------------------------+---------------------------------------------------+

+--------------------------+------------------------+------------------------------------------------+
| exception                | base                   | description                                    |
+==========================+========================+================================================+
| ``NotFinishedError``     | ``Exception``          | The request has not completed yet.             |
+--------------------------+------------------------+------------------------------------------------+

For sharing data between interpreters:

+---------------------------------------------------------+--------------------------------------------+
| signature                                               | description                                |
+=========================================================+============================================+
| ``is_shareable(obj) -> bool``                           | | Can the object's data be shared          |
|                                                         | | between interpreters?                    |
+---------------------------------------------------------+--------------------------------------------+

Help for Extension Module Maintainers
-------------------------------------

In practice, an extension that implements multi-phase init (:pep:`489`)
is considered isolated and thus compatible with multiple interpreters.
Otherwise it is "incompatible".

Many extension modules are still incompatible.  The maintainers and
users of such extension modules will both benefit when they are updated
to support multiple interpreters.  In the meantime, users may become
confused by failures when using multiple interpreters, which could
negatively impact extension maintainers.  See `Concerns`_ below.

To mitigate that impact and accelerate compatibility, we will do the
following:

* be clear that extension modules are *not* required to support use in
  multiple interpreters
* raise ``ImportError`` when an incompatible module is imported
  in a subinterpreter
* provide resources (e.g. docs) to help maintainers reach compatibility
* reach out to the maintainers of Cython and of the most used extension
  modules (on PyPI) to get feedback and possibly provide assistance


Examples
========

Run isolated code
-----------------

::

   import interpreters

   interp = interpreters.create()
   print('before')
   interp.run('print("during")').wait()
   print('after')

Pre-populate an interpreter
---------------------------

::

   import interpreters
   import textwrap as tw

   interp = interpreters.create()
   # Do the expensive set-up ahead of time, in the background.
   st = interp.run(tw.dedent("""
       import some_lib
       import an_expensive_module
       some_lib.set_up()
       """))
   wait_for_request()
   st.wait()
   # The modules imported above are still loaded in the interpreter.
   interp.run(tw.dedent("""
       some_lib.handle_request()
       """))

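The pre-population pattern depends on state surviving between calls to
``run()``.  The following sketch imitates that behavior inside a single
interpreter with ``exec()`` and one persistent namespace dict.  The
``run()`` helper and ``main_namespace`` are hypothetical stand-ins, not
part of the proposed API.

```python
import textwrap as tw

# Hypothetical stand-in for an interpreter's "__main__" namespace.
main_namespace = {'__name__': '__main__'}

def run(source):
    # Each call executes in the same namespace, like successive
    # Interpreter.run() calls on one interpreter.
    exec(tw.dedent(source), main_namespace)

# First call: set-up work whose results should stick around.
run("""
    import json
    cache = {}
    """)
# Second call: relies on the names created by the first call.
run("""
    cache['answer'] = json.dumps([1, 2, 3])
    """)
```

Unlike a real subinterpreter, this shares ``sys.modules`` and everything
else with the caller; it only illustrates the "one long script" model.
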
Handling an exception
---------------------

::

   import interpreters
   import textwrap as tw

   interp = interpreters.create()
   try:
       interp.run(tw.dedent("""
           raise KeyError
           """)).wait()
   except interpreters.RunFailedError as exc:
       print(f"got the error from the subinterpreter: {exc}")

Re-raising an exception
-----------------------

::

   import interpreters
   import textwrap as tw

   interp = interpreters.create()
   try:
       try:
           interp.run(tw.dedent("""
               raise KeyError
               """)).wait()
       except interpreters.RunFailedError as exc:
           raise exc.__cause__
   except KeyError:
       print("got a KeyError from the subinterpreter")

Note that this pattern is a candidate for later improvement.

Synchronize using an OS pipe
----------------------------

::

   import interpreters
   import os
   import textwrap as tw

   interp = interpreters.create()
   r, s = os.pipe()
   print('before')
   interp.run(tw.dedent("""
           import os
           os.read(reader, 1)
           print("during")
           """),
           shared=dict(
               reader=r,
               ),
           )
   print('after')
   # os.write() requires bytes; a single byte unblocks the reader.
   os.write(s, b'\x00')

Sharing a file descriptor
-------------------------

::

   import interpreters
   import os
   import textwrap as tw

   interp = interpreters.create()
   r1, s1 = os.pipe()
   r2, s2 = os.pipe()
   interp.run(tw.dedent("""
           import os
           fd = int.from_bytes(
                   os.read(reader, 10), 'big')
           for line in os.fdopen(fd):
               print(line)
           # Send one byte so the waiting os.read() below returns.
           os.write(writer, b'x')
           """),
           shared=dict(
               reader=r1,
               writer=s2,
               ),
           )
   with open('spamspamspam') as infile:
       fd = infile.fileno().to_bytes(1, 'big')
       os.write(s1, fd)
       os.read(r2, 1)

Passing objects via pickle
--------------------------

::

   import interpreters
   import os
   import pickle
   import textwrap as tw

   interp = interpreters.create()
   r, s = os.pipe()
   interp.run(tw.dedent("""
       import os
       import pickle
       """),
       shared=dict(
           reader=r,
           ),
       ).wait()
   # A raw string keeps the "\x00" escapes out of the source text itself.
   # NOTE: the delimiter assumes the pickled data has no null bytes.
   interp.run(tw.dedent(r"""
           c = os.read(reader, 1)
           while c != b'\x00':
               data = b''  # start a fresh buffer for each object
               while c != b'\x00':
                   data += c
                   c = os.read(reader, 1)
               obj = pickle.loads(data)
               do_something(obj)
               c = os.read(reader, 1)
           """))
   for obj in input:
       data = pickle.dumps(obj)
       os.write(s, data)
       os.write(s, b'\x00')
   os.write(s, b'\x00')

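Pickled data can itself contain null bytes, so a null delimiter is not
reliable in general.  One common alternative is to prefix each payload
with its length.  The sketch below shows that framing over ``os.pipe()``,
again using a thread as a stand-in for the second interpreter.  The
4-byte header and the helper names (``read_exact``, ``receive_all``,
``send``) are assumptions made for illustration, not part of this
proposal.

```python
import os
import pickle
import struct
import threading

HEADER = struct.Struct('>I')  # 4-byte big-endian payload length

def read_exact(fd, size):
    # os.read() may return fewer bytes than requested, so loop.
    chunks = []
    while size > 0:
        chunk = os.read(fd, size)
        if not chunk:
            raise EOFError('pipe closed mid-message')
        chunks.append(chunk)
        size -= len(chunk)
    return b''.join(chunks)

def receive_all(fd, results):
    # A zero-length header marks the end of the stream.
    while True:
        (size,) = HEADER.unpack(read_exact(fd, HEADER.size))
        if size == 0:
            break
        results.append(pickle.loads(read_exact(fd, size)))

def send(fd, obj):
    data = pickle.dumps(obj)
    os.write(fd, HEADER.pack(len(data)) + data)

r, s = os.pipe()
results = []
receiver = threading.Thread(target=receive_all, args=(r, results))
receiver.start()
for obj in [{'spam': 0}, b'\x00\x00', ('eggs', 1.5)]:
    send(s, obj)
os.write(s, HEADER.pack(0))  # end-of-stream marker
receiver.join()
os.close(r)
os.close(s)
```

The second object is deliberately a ``bytes`` value made of null bytes,
which a delimiter-based reader could not frame correctly.
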
Running a module
----------------

::

   import interpreters

   interp = interpreters.create()
   main_module = mod_name
   interp.run(f'import runpy; runpy.run_module({main_module!r})')

Running as script (including zip archives & directories)
--------------------------------------------------------

::

   import interpreters

   interp = interpreters.create()
   main_script = path_name
   interp.run(f"import runpy; runpy.run_path({main_script!r})")


Rationale
=========

Running code in multiple interpreters provides a useful level of
isolation within the same process.  This can be leveraged in a number
of ways.  Furthermore, subinterpreters provide a well-defined framework
in which such isolation may be extended.  (See :pep:`684`.)

Nick Coghlan explained some of the benefits through a comparison with
multi-processing [benefits]_::

   [I] expect that communicating between subinterpreters is going
   to end up looking an awful lot like communicating between
   subprocesses via shared memory.

   The trade-off between the two models will then be that one still
   just looks like a single process from the point of view of the
   outside world, and hence doesn't place any extra demands on the
   underlying OS beyond those required to run CPython with a single
   interpreter, while the other gives much stricter isolation
   (including isolating C globals in extension modules), but also
   demands much more from the OS when it comes to its IPC
   capabilities.

   The security risk profiles of the two approaches will also be quite
   different, since using subinterpreters won't require deliberately
   poking holes in the process isolation that operating systems give
   you by default.

CPython has supported multiple interpreters, with increasing levels
of support, since version 1.5.  While the feature has the potential
to be a powerful tool, it has suffered from neglect
because the multiple interpreter capabilities are not readily available
directly from Python.  Exposing the existing functionality
in the stdlib will help reverse the situation.

This proposal is focused on enabling the fundamental capability of
multiple interpreters, isolated from each other,
in the same Python process.  This is a
new area for Python so there is relative uncertainty about the best
tools to provide as companions to interpreters.  Thus we minimize
the functionality we add in the proposal as much as possible.

Concerns
--------

* "subinterpreters are not worth the trouble"

  Some have argued that subinterpreters do not add sufficient benefit
  to justify making them an official part of Python.  Adding features
  to the language (or stdlib) has a cost in increasing the size of
  the language.  So an addition must pay for itself.

  In this case, multiple interpreter support provides a novel concurrency
  model focused on isolated threads of execution.  Furthermore, it
  provides an opportunity for changes in CPython that will allow
  simultaneous use of multiple CPU cores (currently prevented
  by the GIL--see :pep:`684`).

  Alternatives to subinterpreters include threading, async, and
  multiprocessing.  Threading is limited by the GIL and async isn't
  the right solution for every problem (nor for every person).
  Multiprocessing is likewise valuable in some but not all situations.
  Direct IPC (rather than via the multiprocessing module) provides
  similar benefits but with the same caveat.

  Notably, subinterpreters are not intended as a replacement for any of
  the above.  Certainly they overlap in some areas, but the benefits of
  subinterpreters include isolation and (potentially) performance.  In
  particular, subinterpreters provide a direct route to an alternate
  concurrency model (e.g. CSP) which has found success elsewhere and
  will appeal to some Python users.  That is the core value that the
  ``interpreters`` module will provide.

* "stdlib support for multiple interpreters adds extra burden
  on C extension authors"

  In the `Interpreter Isolation`_ section below we identify ways in
  which isolation in CPython's subinterpreters is incomplete.  Most
  notable is extension modules that use C globals to store internal
  state.  :pep:`3121` and :pep:`489` provide a solution for most of the
  problem, but one still remains. [petr-c-ext]_  Until that is resolved
  (see :pep:`573`), C extension authors will face extra difficulty
  in supporting subinterpreters.

  Consequently, projects that publish extension modules may face an
  increased maintenance burden as their users start using subinterpreters,
  where their modules may break.  This situation is limited to modules
  that use C globals (or use libraries that use C globals) to store
  internal state.  For numpy, the reported-bug rate is one every 6
  months. [bug-rate]_

  Ultimately this comes down to a question of how often it will be a
  problem in practice: how many projects would be affected, how often
  their users will be affected, what the additional maintenance burden
  will be for projects, and what the overall benefit of subinterpreters
  is to offset those costs.  The position of this PEP is that the actual
  extra maintenance burden will be small and well below the threshold at
  which subinterpreters are worth it.

* "creating a new concurrency API deserves much more thought and
  experimentation, so the new module shouldn't go into the stdlib
  right away, if ever"

  Introducing an API for a new concurrency model, as happened with
  asyncio, is an extremely large project that requires a lot of careful
  consideration.  It is not something that can be done as simply as this
  PEP proposes and likely deserves significant time on PyPI to mature.
  (See `Nathaniel's post <nathaniel-asyncio_>`_ on python-dev.)

  However, this PEP does not propose any new concurrency API.
  At most it exposes minimal tools (e.g. subinterpreters, simple "sharing")
  which may be used to write code that follows patterns associated with
  (relatively) new-to-Python `concurrency models <Concurrency_>`_.
  Those tools could also be used as the basis for APIs for such
  concurrency models.  Again, this PEP does not propose any such API.

* "there is no point to exposing subinterpreters if they still share
  the GIL"
* "the effort to make the GIL per-interpreter is disruptive and risky"

  A common misconception is that this PEP also includes a promise that
  interpreters will no longer share the GIL.  When that is clarified,
  the next question is "what is the point?".  This is already answered
  at length in this PEP.  Just to be clear, the value lies in:

  * increasing exposure of the existing feature, which helps improve
    the code health of the entire CPython runtime
  * exposing the (mostly) isolated execution of interpreters
  * preparing for a per-interpreter GIL
  * encouraging experimentation

* "data sharing can have a negative impact on cache performance
  in multi-core scenarios"

  (See [cache-line-ping-pong]_.)

  This shouldn't be a problem for now as we have no immediate plans
  to actually share data between interpreters, instead focusing
  on copying.


About Subinterpreters
=====================

Concurrency
-----------

Concurrency is a challenging area of software development.  Decades of
research and practice have led to a wide variety of concurrency models,
each with different goals.  Most center on correctness and usability.

One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme.  A
notable example is Communicating Sequential Processes [CSP]_ (upon
which Go's concurrency is roughly based).  The intended isolation
inherent to CPython's interpreters makes them well-suited
to this approach.

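To make the message-passing style concrete, here is a minimal sketch
that uses a thread and ``queue.Queue`` objects in place of an interpreter
and a cross-interpreter channel.  That substitution is an assumption for
the sake of a runnable example.  Threads share memory, so the isolation
here exists only by convention, whereas interpreters would enforce it.

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker touches no shared state; it only reacts to messages.
    while True:
        message = inbox.get()
        if message is None:      # sentinel: no more work
            break
        outbox.put(message * 2)

inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()
for value in (1, 2, 3):
    inbox.put(value)
inbox.put(None)
thread.join()
results = [outbox.get() for _ in range(3)]
```
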
Shared Data
-----------

CPython's interpreters are inherently isolated (with caveats
explained below), in contrast to threads.  So the same
communicate-via-shared-memory approach doesn't work.  Without an
alternative, effective use of concurrency via multiple interpreters
is significantly limited.

The key challenge here is that sharing objects between interpreters
faces complexity due to various constraints on object ownership,
visibility, and mutability.  At a conceptual level it's easier to
reason about concurrency when objects only exist in one interpreter
at a time.  At a technical level, CPython's current memory model
limits how Python *objects* may be shared safely between interpreters;
effectively, objects are bound to the interpreter in which they were
created.  Furthermore, the complexity of *object* sharing increases as
interpreters become more isolated, e.g. after GIL removal (though this
is mitigated somewhat for some "immortal" objects; see :pep:`683`).

Consequently, the mechanism for sharing needs to be carefully considered.
There are a number of valid solutions, several of which may be
appropriate to support in Python.  Earlier versions of this proposal
included a basic capability ("channels"), though most of the options
were quite similar.

Note that the implementation of ``Interpreter.run()`` will be done
in a way that allows for many of these solutions to be implemented
independently and to coexist, but doing so is not technically
a part of the proposal here.

The fundamental enabling feature for communication is that most objects
can be converted to some encoding of underlying raw data, which is safe
to be passed between interpreters.  For example, an ``int`` object can
be turned into a C ``long`` value, sent to another interpreter, and
turned back into an ``int`` object there.

Regardless, the effort to determine the best way forward here is outside
the scope of this PEP.  In the meantime, this proposal provides a basic
interim solution, described in `API For Sharing Data`_ below.

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other.  Each interpreter has its own copy of all modules, classes,
functions, and variables.  The same applies to state in C, including in
extension modules.  The CPython C-API docs explain more. [caveats]_

However, there are ways in which interpreters share some state.  First
of all, some process-global state remains shared:

* file descriptors
* builtin types (e.g. dict, bytes)
* singletons (e.g. None)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

There are no plans to change this.

Second, some isolation is faulty due to bugs or implementations that did
not take subinterpreters into account.  This includes things like
extension modules that rely on C globals. [cryptography]_  In these
cases bugs should be opened (some are already):

* readline module hook functions (http://bugs.python.org/issue4202)
* memory leaks on re-init (http://bugs.python.org/issue21387)

Finally, some potential isolation is missing due to the current design
of CPython.  Work is under way to address gaps in this area:

* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
* interpreters share memory management (e.g. allocators, gc)
* interpreters share the GIL

Existing Usage
--------------

Multiple interpreter support is not a widely used feature.  In fact,
the only documented cases of widespread usage are
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_,
`OpenStack Ceph <https://github.com/ceph/ceph/pull/14971>`_, and
`JEP <https://github.com/ninia/jep>`_.  On the one hand, these cases
provide confidence that existing multiple interpreter support is
relatively stable.  On the other hand, there isn't much of a sample
size from which to judge the utility of the feature.


Alternate Python Implementations
================================

I've solicited feedback from various Python implementors about support
for subinterpreters.  Each has indicated that they would be able to
support multiple interpreters in the same process (if they choose to)
without a lot of trouble.  Here are the projects I contacted:

* jython  ([jython]_)
* ironpython  (personal correspondence)
* pypy  (personal correspondence)
* micropython  (personal correspondence)


.. _interpreters-list-all:
.. _interpreters-get-current:
.. _interpreters-create:
.. _interpreters-Interpreter:

"interpreters" Module API
=========================

The module provides the following functions::

   list_all() -> [Interpreter]

      Return a list of all existing interpreters.

   get_current() -> Interpreter

      Return the currently running interpreter.

   get_main() -> Interpreter

      Return the main interpreter.  If the Python implementation
      has no concept of a main interpreter then return None.

   create() -> Interpreter

      Initialize a new Python interpreter and return it.
      It will remain idle until something is run in it and always
      run in its own thread.


The module also provides the following classes::

   class Interpreter(id):

      id -> int:

         The interpreter's ID. (read-only)

      is_running() -> bool:

         Return whether or not the interpreter is currently executing
         code.  Calling this on the current interpreter will always
         return True.

      close():

         Finalize and destroy the interpreter.

         This may not be called on an already running interpreter.
         Doing so results in a RuntimeError.

      run(source_str, /, *, shared=None) -> Status:

         Run the provided Python source code in the interpreter and
         return a Status object that tracks when it finishes.

         If the "shared" keyword argument is provided (and is a mapping
         of attribute name keys) then each key-value pair is added to
         the interpreter's execution namespace (the interpreter's
         "__main__" module).  If any of the values are not a shareable
         object (see below) then ValueError gets raised.

         This may not be called on an already running interpreter.
         Doing so results in a RuntimeError.

         A "run()" call is similar to a Thread.start() call.  That code
         starts running in a background thread and "run()" returns.  At
         that point, the code that called "run()" continues executing
         (in the original interpreter).  If any "return" value is
         needed, pass it out via a pipe (os.pipe()).  If there is any
         uncaught exception then the returned Status object will expose it.

         The big difference from functions or threading.Thread is that
         "run()" executes the code in an entirely different interpreter,
         with entirely separate state.  The state of the current
         interpreter in the original OS thread does not affect that of
         the target interpreter (the one that will execute the code).

         Note that the interpreter's state is never reset, neither
         before "run()" executes the code nor after.  Thus the
         interpreter state is preserved between calls to "run()".
         This includes "sys.modules", the "builtins" module, and the
         internal state of C extension modules.

         Also note that "run()" executes in the namespace of the
         "__main__" module, just like scripts, the REPL, "-m", and
         "-c".  Just as the interpreter's state is not ever reset, the
         "__main__" module is never reset.  You can imagine
         concatenating the code from each "run()" call into one long
         script.  This is the same as how the REPL operates.

         Supported code: source text.

   class Status:

      # This is similar to concurrent.futures.Future.

      wait(timeout=None):

         Block until the requested work has finished.

      done() -> bool:

         Has the requested work completed (or failed)?

      exception() -> Exception | None:

         Return the exception raised by the requested work, if any.
         If the work has not completed yet then ``NotFinishedError``
         is raised.

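The ``Status`` semantics above can be approximated with a plain thread.
The sketch below is only an illustration under that assumption: the
class bodies are hypothetical, the work runs in the calling interpreter
rather than a separate one, and ``wait()`` here does not raise
``RunFailedError`` the way the proposed one would.

```python
import threading

class NotFinishedError(Exception):
    """Hypothetical stand-in for interpreters.NotFinishedError."""

class Status:
    """Tracks one piece of work running in a background thread."""

    def __init__(self, func):
        self._exc = None
        self._thread = threading.Thread(target=self._run, args=(func,))
        self._thread.start()

    def _run(self, func):
        try:
            func()
        except Exception as exc:
            self._exc = exc

    def wait(self, timeout=None):
        self._thread.join(timeout)

    def done(self):
        return not self._thread.is_alive()

    def exception(self):
        if not self.done():
            raise NotFinishedError('the work has not completed yet')
        return self._exc

gate = threading.Event()
status = Status(gate.wait)          # blocks until the gate opens
pending = not status.done()
gate.set()
status.wait()
failed = Status(lambda: 1 / 0)
failed.wait()
```
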
Uncaught Exceptions
-------------------

Regarding uncaught exceptions in ``Interpreter.run()``, we noted that
they are exposed via the returned ``Status`` object.  To prevent leaking
exceptions (and tracebacks) between interpreters, we create a surrogate
of the exception and its traceback
(see ``traceback.TracebackException``).  This is returned by
``Status.exception()``.  ``Status.wait()`` sets it as ``__cause__``
on a new ``RunFailedError``, and raises that.

Raising (a proxy of) the exception directly is problematic since it's
harder to distinguish between an error in the ``wait()`` call and an
uncaught exception from the subinterpreter.

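A minimal sketch of the surrogate idea follows.  It assumes the surrogate
is an ordinary exception that carries only text captured through
``traceback.TracebackException``; the ``ExceptionSurrogate`` class and the
``run_wrapped()`` helper are hypothetical, and the real implementation
may differ.

```python
import traceback

class RunFailedError(RuntimeError):
    """Hypothetical stand-in for interpreters.RunFailedError."""

class ExceptionSurrogate(Exception):
    """Holds only text describing an exception raised elsewhere."""

    def __init__(self, snapshot):
        summary = ''.join(snapshot.format_exception_only()).strip()
        super().__init__(summary)
        self.formatted = ''.join(snapshot.format())

def run_wrapped(func):
    snapshot = None
    try:
        func()
    except Exception as exc:
        # Capture a text-only snapshot; the exception object itself is
        # dropped when this block ends.
        snapshot = traceback.TracebackException.from_exception(exc)
    if snapshot is not None:
        surrogate = ExceptionSurrogate(snapshot)
        raise RunFailedError('uncaught exception') from surrogate

def failing():
    raise KeyError('spam')

try:
    run_wrapped(failing)
except RunFailedError as exc:
    cause = exc.__cause__
```

Because the caller always sees ``RunFailedError``, a failure in the
wrapped code cannot be mistaken for a failure of the waiting call itself.
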
API For Sharing Data
--------------------

As discussed in `Shared Data`_ above, multiple interpreter support
is less useful without a mechanism for sharing data (communicating)
between them.  Sharing actual Python objects between interpreters,
however, has enough potential problems that we are avoiding support
for that in this proposal.  Nor, as mentioned earlier, are we adding
anything more than the most minimal mechanism for communication.

That very basic mechanism, using pipes (see ``os.pipe()``), will allow
users to send data (bytes) from one interpreter to another.  We'll
take a closer look in a moment.  Fundamentally, it's a simple
application of the underlying sharing capability proposed here.

The various aspects of the approach, including keeping the API minimal,
help us avoid further exposing any underlying complexity
to Python users.

.. _interpreters-is-shareable:

Shareable Objects
|
||
'''''''''''''''''
|
||
|
||
A "shareable" object is one that the runtime knows how to safely "share"
|
||
between interpreters. For now this actually means that a copy of the
|
||
object is provided to the second interpreter. Legitimate sharing is
|
||
feasible but beyond the scope of this proposal.
|
||
|
||
In fact, this proposal only covers very minimal "sharing" of a handful
|
||
of simple, immutable object types. We will initially limit the types
|
||
that are shareable to the following:
|
||
|
||
* ``None``
|
||
* ``bytes``
|
||
* ``str``
|
||
* ``int``
|
||
|
||
Support for other basic types (e.g. ``bool``, ``float``, ``Ellipsis``)
|
||
will be added later, separately.
|
||
|
||
Limiting the initial shareable types is a practical matter, reducing
|
||
the potential complexity of the initial implementation. There are a
|
||
number of solutions we may pursue in the future to expand supported
|
||
objects and object sharing strategies.
|
||
|
||
However, this PEP does provide one concrete addition related to
|
||
shareable objects. The ``interpreters`` module provides a function
|
||
that users may call to determine whether an object is shareable or not::
|
||
|
||
   is_shareable(obj) -> bool:

      Return True if the object may be shared between interpreters.
      This does not necessarily mean that the actual objects will be
      shared.  Instead, it means that the objects' underlying data will
      be shared in a cross-interpreter way, whether via a proxy, a
      copy, or some other means.
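
As a rough illustration only, the initial behavior of ``is_shareable()``
can be approximated in pure Python. The name ``is_shareable_sketch`` and
the exact-type comparison are assumptions made for this sketch; the real
check is performed by the runtime.

```python
# Hypothetical approximation of interpreters.is_shareable() for the
# initial set of shareable types listed above.
INITIAL_SHAREABLE_TYPES = (type(None), bytes, str, int)


def is_shareable_sketch(obj):
    """Return True if obj is one of the initially shareable types."""
    # Assumption: exact type match, so subclasses such as bool
    # (a subclass of int) are not treated as shareable here.
    return type(obj) in INITIAL_SHAREABLE_TYPES


shareable_examples = [None, b"spam", "spam", 42]
unshareable_examples = [True, 1.5, [1, 2], {"a": 1}]
```

With this approximation every item in ``shareable_examples`` passes and
every item in ``unshareable_examples`` is rejected, which matches the
statement that ``bool`` and ``float`` support would come later.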
|
||
|
||
How Sharing Works
|
||
'''''''''''''''''
|
||
|
||
In this proposal, shareable objects are used with ``Interpreter.run()``.
|
||
The steps look something like this:
|
||
|
||
1. a "shareable" object is mapped to an identifier in some container
|
||
2. that mapping is passed as the "shared" argument in the
|
||
``Interpreter.run()`` call
|
||
3. the mapped object is converted to an object that the target
|
||
interpreter may safely use
|
||
4. that object is bound to the mapped name in the target interpreter's
|
||
``__main__`` module, where the running code has access to it
|
||
|
||
The critical part is what happens in step 3. The object must be
|
||
converted to some cross-interpreter-safe data (its raw data or even
|
||
a pointer). Then that data must be converted back into an object
|
||
for the target interpreter to use, likely a new object. For example,
|
||
an ``int`` object could be converted to the underlying C ``long`` value
|
||
and then back into a Python ``int`` object.
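
The conversion in step 3 can be modeled in pure Python. This is only a
sketch: the function names are hypothetical, the ``(kind, raw bytes)``
pair merely stands in for the runtime-managed intermediate data, and a
fixed 64-bit signed integer stands in for the C ``long``.

```python
import struct


def to_cross_interpreter_data(obj):
    """Reduce obj to a (kind, raw bytes) pair modeling the intermediate data."""
    if obj is None:
        return ("none", b"")
    if type(obj) is int:
        # struct.error is raised if the value does not fit in 64 bits.
        return ("int", struct.pack("q", obj))
    if type(obj) is bytes:
        return ("bytes", obj)
    if type(obj) is str:
        return ("str", obj.encode("utf-8"))
    raise ValueError(f"{type(obj).__name__} is not shareable in this sketch")


def from_cross_interpreter_data(data):
    """Build a new object for the target interpreter from the raw data."""
    kind, raw = data
    if kind == "none":
        return None
    if kind == "int":
        return struct.unpack("q", raw)[0]
    if kind == "bytes":
        return bytes(raw)
    if kind == "str":
        return raw.decode("utf-8")
    raise ValueError(f"unknown kind: {kind!r}")


original = 2017
rebuilt = from_cross_interpreter_data(to_cross_interpreter_data(original))
```

Here ``rebuilt`` is equal to ``original`` but was constructed only from
the raw data, which is the property the target interpreter relies on.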
|
||
|
||
To make this work, the intermediate data (and any associated mutable
|
||
shared state) will be managed by the Python runtime, not by any of the
|
||
interpreters.
|
||
|
||
The underlying runtime capability that ``Interpreter.run()`` uses is
|
||
what enables data/object "sharing", and is available for use elsewhere
|
||
in the runtime. In fact, it was used in the implementation of the
|
||
"channels" that were part of an earlier version of this PEP.
|
||
Likewise, this runtime functionality facilitates most of the possible
|
||
solutions to which `Shared Data`_ alluded. Thus any separate effort
|
||
to introduce effective means for communicating and sharing data will
|
||
be well served by the underlying functionality proposed here.
|
||
|
||
.. XXX Add Interpreter.set_on___main__() and drop the "shared" arg?
|
||
|
||
Communicating Through OS Pipes
|
||
''''''''''''''''''''''''''''''
|
||
|
||
As noted, this proposal enables a very basic mechanism for
|
||
communicating between interpreters, which makes use of
|
||
``Interpreter.run()`` and shareable objects:
|
||
|
||
1. interpreter A calls ``os.pipe()`` to get a read/write pair
|
||
of file descriptors (both shareable ``int`` objects)
|
||
2. interpreter A calls ``run()`` on interpreter B, passing
|
||
the read FD via the "shared" argument
|
||
3. interpreter A writes some bytes to the write FD
|
||
4. interpreter B reads those bytes
|
||
|
||
Several of the earlier examples demonstrate this, such as
|
||
`Synchronize using an OS pipe`_.
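
The four steps above can be approximated today without the proposed
module. In the sketch below a thread stands in for interpreter B and a
plain function argument stands in for the "shared" mapping; both are
assumptions made so the example runs on a stock Python.

```python
import os
import threading


def interpreter_b(read_fd, received):
    # Stand-in for the script interpreter B would run; read_fd plays
    # the role of the int passed via the "shared" argument.
    with os.fdopen(read_fd, "rb") as reader:
        received.append(reader.read())  # step 4: read until EOF


# Step 1: "interpreter A" creates the pipe (two plain ints).
read_fd, write_fd = os.pipe()

# Step 2: hand the read FD to the stand-in for interpreter B.
received = []
worker = threading.Thread(target=interpreter_b, args=(read_fd, received))
worker.start()

# Step 3: write some bytes, then close the write end to signal EOF.
os.write(write_fd, b"spam")
os.close(write_fd)
worker.join()
```

After ``worker.join()`` returns, ``received`` holds the bytes written by
the "A" side.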
|
||
|
||
|
||
Interpreter Restrictions
|
||
========================
|
||
|
||
Every new interpreter created by ``interpreters.create()``
|
||
now has specific restrictions on any code it runs. This includes the
|
||
following:
|
||
|
||
* importing an extension module fails if it does not implement
|
||
multi-phase init
|
||
* daemon threads may not be created
|
||
* ``os.fork()`` is not allowed (so no ``multiprocessing``)
|
||
* ``os.exec*()`` is not allowed
|
||
(but "fork+exec", a la ``subprocess`` is okay)
|
||
|
||
Note that interpreters created with the existing C-API do not have these
|
||
restrictions. The same is true for the "main" interpreter, so
|
||
existing use of Python will not change.
|
||
|
||
.. Mention the similar restrictions in PEP 684?
|
||
|
||
We may choose to later loosen some of the above restrictions or provide
|
||
a way to enable/disable granular restrictions individually. Regardless,
|
||
requiring multi-phase init from extension modules will always be a
|
||
default restriction.
|
||
|
||
|
||
Documentation
|
||
=============
|
||
|
||
The new stdlib docs page for the ``interpreters`` module will include
|
||
the following:
|
||
|
||
* (at the top) a clear note that support for multiple interpreters
|
||
is not required from extension modules
|
||
* some explanation about what subinterpreters are
|
||
* brief examples of how to use multiple interpreters
|
||
(and communicating between them)
|
||
* a summary of the limitations of using multiple interpreters
|
||
* (for extension maintainers) a link to the resources for ensuring
|
||
multiple interpreters compatibility
|
||
* much of the API information in this PEP
|
||
|
||
Docs about resources for extension maintainers already exist on the
|
||
`Isolating Extension Modules <isolation-howto_>`_ howto page. Any
|
||
extra help will be added there. For example, it may prove helpful
|
||
to discuss strategies for dealing with linked libraries that keep
|
||
their own subinterpreter-incompatible global state.
|
||
|
||
.. _isolation-howto:
|
||
https://docs.python.org/3/howto/isolating-extensions.html
|
||
|
||
Note that the documentation will play a large part in mitigating any
|
||
negative impact that the new ``interpreters`` module might have on
|
||
extension module maintainers.
|
||
|
||
Also, the ``ImportError`` for incompatible extension modules will have
|
||
a message that clearly says it is due to missing multiple interpreters
|
||
compatibility and that extensions are not required to provide it. This
|
||
will help set user expectations properly.
|
||
|
||
|
||
Deferred Functionality
|
||
======================
|
||
|
||
In the interest of keeping this proposal minimal, the following
|
||
functionality has been left out for future consideration. Note that
|
||
this is not a judgement against any of said capabilities, but rather a
|
||
deferment. That said, each is arguably valid.
|
||
|
||
Interpreter.call()
|
||
------------------
|
||
|
||
It would be convenient to run existing functions in subinterpreters
|
||
directly. ``Interpreter.run()`` could be adjusted to support this or
|
||
a ``call()`` method could be added::
|
||
|
||
   Interpreter.call(f, *args, **kwargs)
|
||
|
||
This suffers from the same problem as sharing objects between
|
||
interpreters via queues. The minimal solution (running a source string)
|
||
is sufficient for us to get the feature out where it can be explored.
|
||
|
||
Interpreter.run_in_thread()
|
||
---------------------------
|
||
|
||
This method would make a ``run()`` call for you in a thread. Doing this
|
||
using only ``threading.Thread`` and ``run()`` is relatively trivial so
|
||
we've left it out.
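
A minimal sketch of that pattern follows. ``FakeInterpreter`` is a
hypothetical stand-in that simply calls ``exec()`` in its own namespace
(it provides no isolation), and ``run_in_thread()`` is an illustrative
helper, not a proposed API.

```python
import threading


class FakeInterpreter:
    """Hypothetical stand-in for Interpreter; offers no real isolation."""

    def __init__(self):
        self.namespace = {"__name__": "__main__"}

    def run(self, source):
        # Execute the source string in this object's own namespace.
        exec(source, self.namespace)


def run_in_thread(interp, source):
    """Start interp.run(source) in a new thread and return the thread."""
    thread = threading.Thread(target=interp.run, args=(source,))
    thread.start()
    return thread


interp = FakeInterpreter()
thread = run_in_thread(interp, "answer = 6 * 7")
thread.join()
```

The helper is only a few lines long, which is the reason given above
for leaving it out of the initial API.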
|
||
|
||
Synchronization Primitives
|
||
--------------------------
|
||
|
||
The ``threading`` module provides a number of synchronization primitives
|
||
for coordinating concurrent operations. This is especially necessary
|
||
due to the shared-state nature of threading. In contrast,
|
||
interpreters do not share state. Data sharing is restricted to the
|
||
runtime's shareable objects capability, which does away with the need
|
||
for explicit synchronization. If any sort of opt-in shared state
|
||
support is added to CPython's interpreters in the future, that same
|
||
effort can introduce synchronization primitives to meet that need.
|
||
|
||
CSP Library
|
||
-----------
|
||
|
||
A ``csp`` module would not be a large step away from the functionality
|
||
provided by this PEP. However, adding such a module is outside the
|
||
minimalist goals of this proposal.
|
||
|
||
Syntactic Support
|
||
-----------------
|
||
|
||
The ``Go`` language provides a concurrency model based on CSP,
|
||
so it's similar to the concurrency model that multiple interpreters
|
||
support. However, ``Go`` also provides syntactic support, as well as
|
||
several builtin concurrency primitives, to make concurrency a
|
||
first-class feature. Conceivably, similar syntactic (and builtin)
|
||
support could be added to Python using interpreters. However,
|
||
that is *way* outside the scope of this PEP!
|
||
|
||
Multiprocessing
|
||
---------------
|
||
|
||
The ``multiprocessing`` module could support interpreters in the same
|
||
way it supports threads and processes. In fact, the module's
|
||
maintainer, Davin Potts, has indicated this is a reasonable feature
|
||
request. However, it is outside the narrow scope of this PEP.
|
||
|
||
C-extension opt-in/opt-out
|
||
--------------------------
|
||
|
||
By using the ``PyModuleDef_Slot`` introduced by :pep:`489`, we could
|
||
easily add a mechanism by which C-extension modules could opt out of
|
||
multiple interpreter support. Then the import machinery, when operating
|
||
in a subinterpreter, would need to check the module for support.
|
||
It would raise an ``ImportError`` if unsupported.
|
||
|
||
Alternately we could support opting in to multiple interpreters support.
|
||
However, that would probably exclude many more modules (unnecessarily)
|
||
than the opt-out approach. Also, note that :pep:`489` defined that an
|
||
extension's use of the PEP's machinery implies multiple interpreters
|
||
support.
|
||
|
||
The scope of adding the ModuleDef slot and fixing up the import
|
||
machinery is non-trivial, but could be worth it. It all depends on
|
||
how many extension modules break under subinterpreters. Given that
|
||
there are relatively few cases we know of through mod_wsgi, we can
|
||
leave this for later.
|
||
|
||
Resetting __main__
|
||
------------------
|
||
|
||
As proposed, every call to ``Interpreter.run()`` will execute in the
|
||
namespace of the interpreter's existing ``__main__`` module. This means
|
||
that data persists there between ``run()`` calls. Sometimes this isn't
|
||
desirable and you want to execute in a fresh ``__main__``. Also,
|
||
you don't necessarily want to leak objects there that you aren't using
|
||
any more.
|
||
|
||
Note that the following won't work right because it will clear too much
|
||
(e.g. ``__name__`` and the other "__dunder__" attributes)::
|
||
|
||
   interp.run('globals().clear()')
|
||
|
||
Possible solutions include:
|
||
|
||
* a ``create()`` arg to indicate resetting ``__main__`` after each
|
||
``run`` call
|
||
* an ``Interpreter.reset_main`` flag to support opting in or out
|
||
after the fact
|
||
* an ``Interpreter.reset_main()`` method to opt in when desired
|
||
* ``importlib.util.reset_globals()`` [reset_globals]_
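
To illustrate the difference from ``globals().clear()``, here is a
sketch of a reset that keeps the "__dunder__" names. The helper name
``reset_main_namespace`` is hypothetical and is not one of the options
listed above; note that it would also keep any dunder names the user
defined.

```python
def reset_main_namespace(namespace):
    """Remove everything except "__dunder__" names from a namespace dict."""
    # Iterate over a copy of the keys because the dict is mutated.
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


# A plain dict stands in for the __main__ namespace of an interpreter.
main_ns = {"__name__": "__main__", "__doc__": None, "leftover": [1, 2, 3]}
reset_main_namespace(main_ns)
```

After the call only ``__name__`` and ``__doc__`` remain in ``main_ns``.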
|
||
|
||
Also note that resetting ``__main__`` does nothing about state stored
|
||
in other modules. So any solution would have to be clear about the
|
||
scope of what is being reset. Conceivably we could invent a mechanism
|
||
by which any (or every) module could be reset, unlike ``reload()``
|
||
which does not clear the module before loading into it. Regardless,
|
||
since ``__main__`` is the execution namespace of the interpreter,
|
||
resetting it has a much more direct correlation to interpreters and
|
||
their dynamic state than does resetting other modules. So a more
|
||
generic module reset mechanism may prove unnecessary.
|
||
|
||
This isn't a critical feature initially. It can wait until later
|
||
if desirable.
|
||
|
||
Resetting an interpreter's state
|
||
--------------------------------
|
||
|
||
It may be nice to re-use an existing subinterpreter instead of
|
||
spinning up a new one. Since an interpreter has substantially more
|
||
state than just the ``__main__`` module, it isn't so easy to put an
|
||
interpreter back into a pristine/fresh state. In fact, there *may*
|
||
be parts of the state that cannot be reset from Python code.
|
||
|
||
A possible solution is to add an ``Interpreter.reset()`` method. This
|
||
would put the interpreter back into the state it was in when newly
|
||
created. If called on a running interpreter it would fail (hence the
|
||
main interpreter could never be reset). This would likely be more
|
||
efficient than creating a new interpreter, though that depends on
|
||
what optimizations will be made later to interpreter creation.
|
||
|
||
While this would potentially provide functionality that is not
|
||
otherwise available from Python code, it isn't a fundamental
|
||
functionality. So in the spirit of minimalism here, this can wait.
|
||
Regardless, I doubt it would be controversial to add it post-PEP.
|
||
|
||
Shareable file descriptors and sockets
|
||
--------------------------------------
|
||
|
||
Given that file descriptors and sockets are process-global resources,
|
||
making them shareable is a reasonable idea. They would be a good
|
||
candidate for the first effort at expanding the supported shareable
|
||
types. They aren't strictly necessary for the initial API.
|
||
|
||
Integration with async
|
||
----------------------
|
||
|
||
Per Antoine Pitrou [async]_::
|
||
|
||
   Has any thought been given to how FIFOs could integrate with async
   code driven by an event loop (e.g. asyncio)?  I think the model of
   executing several asyncio (or Tornado) applications each in their
   own subinterpreter may prove quite interesting to reconcile
   multi-core concurrency with ease of programming.  That would
   require the FIFOs to be able to synchronize on something an event
   loop can wait on (probably a file descriptor?).
|
||
|
||
The basic functionality of multiple interpreters support does not depend
|
||
on async and can be added later.
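
For what it is worth, the read end of an OS pipe is already a file
descriptor that an event loop can wait on. The sketch below assumes a
Unix selector-based event loop (``loop.add_reader()`` is not available
on the default Windows loop) and uses a single process rather than two
interpreters.

```python
import asyncio
import os


async def wait_for_pipe(read_fd):
    """Wait until read_fd is readable, then return the available bytes."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def on_readable():
        # Stop watching the FD and hand the data to the waiting coroutine.
        loop.remove_reader(read_fd)
        future.set_result(os.read(read_fd, 1024))

    loop.add_reader(read_fd, on_readable)
    return await future


async def main():
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"ping")
        return await wait_for_pipe(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)


received = asyncio.run(main())
```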
|
||
|
||
channels
|
||
--------
|
||
|
||
We could introduce some relatively efficient, native data types for
|
||
passing data between interpreters, to use instead of OS pipes. Earlier
|
||
versions of this PEP introduced one such mechanism, called "channels".
|
||
This can be pursued later.
|
||
|
||
Pipes and Queues
|
||
----------------
|
||
|
||
With the proposed object passing mechanism of ``os.pipe()``, other similar
|
||
basic types aren't strictly required to achieve the minimal useful
|
||
functionality of multiple interpreters. Such types include pipes
|
||
(like unbuffered channels, but one-to-one) and queues (like channels,
|
||
but more generic). See below in `Rejected Ideas`_ for more information.
|
||
|
||
Even though these types aren't part of this proposal, they may still
|
||
be useful in the context of concurrency. Adding them later is entirely
|
||
reasonable. They could be trivially implemented as wrappers around
|
||
channels. Alternatively they could be implemented for efficiency at the
|
||
same low level as channels.
|
||
|
||
Support inheriting settings (and more?)
|
||
---------------------------------------
|
||
|
||
Folks might find it useful, when creating a new interpreter, to be
|
||
able to indicate that they would like some things "inherited" by the
|
||
new interpreter. The mechanism could be a strict copy or it could be
|
||
copy-on-write. The motivating example is with the warnings module
|
||
(e.g. copy the filters).
|
||
|
||
The feature isn't critical, nor would it be widely useful, so it
|
||
can wait until there's interest. Notably, both suggested solutions
|
||
will require significant work, especially when it comes to complex
|
||
objects and most especially for mutable containers of mutable
|
||
complex objects.
|
||
|
||
Make exceptions shareable
|
||
-------------------------
|
||
|
||
Exceptions are propagated out of ``run()`` calls, so it isn't a big
|
||
leap to make them shareable. However, as noted elsewhere,
|
||
it isn't essential (or particularly common), so we can wait on doing
|
||
that.
|
||
|
||
Make RunFailedError.__cause__ lazy
|
||
----------------------------------
|
||
|
||
An uncaught exception in a subinterpreter (from ``run()``) is copied
|
||
to the calling interpreter and set as ``__cause__`` on a
|
||
``RunFailedError`` which is then raised. That copying part involves
|
||
some sort of deserialization in the calling interpreter, which can be
|
||
expensive (e.g. due to imports) yet is not always necessary.
|
||
|
||
So it may be useful to use an ``ExceptionProxy`` type to wrap the
|
||
serialized exception and only deserialize it when needed. That could
|
||
be via ``ExceptionProxy.__getattribute__()`` or perhaps through
|
||
``RunFailedError.resolve()`` (which would raise the deserialized
|
||
exception and set ``RunFailedError.__cause__`` to the exception).
|
||
|
||
It may also make sense to have ``RunFailedError.__cause__`` be a
|
||
descriptor that does the lazy deserialization (and sets ``__cause__``)
|
||
on the ``RunFailedError`` instance.
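
A sketch of the ``resolve()`` variant follows. The class name
``RunFailedErrorSketch`` is hypothetical, and using ``pickle`` as the
serialization format is an assumption made only for illustration; the
PEP does not specify how the exception is serialized.

```python
import pickle


class RunFailedErrorSketch(Exception):
    """Hypothetical stand-in that defers deserializing the original error."""

    def __init__(self, message, serialized_exc):
        super().__init__(message)
        self._serialized_exc = serialized_exc

    def resolve(self):
        # Deserialize only on demand, record the result as __cause__,
        # and raise the original exception.
        exc = pickle.loads(self._serialized_exc)
        self.__cause__ = exc
        raise exc


# The serialized bytes stand in for data copied from the subinterpreter.
err = RunFailedErrorSketch("run failed", pickle.dumps(ValueError("boom")))
cause_before = err.__cause__
try:
    err.resolve()
except ValueError as exc:
    resolved = exc
```

Until ``resolve()`` is called no deserialization happens, so callers
that only log the ``RunFailedErrorSketch`` never pay that cost.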
|
||
|
||
Make everything shareable through serialization
|
||
-----------------------------------------------
|
||
|
||
We could use pickle (or marshal) to serialize everything and thus
|
||
make it shareable. Doing this is potentially inefficient,
|
||
but it may be a matter of convenience in the end.
|
||
We can add it later, but trying to remove it later
|
||
would be significantly more painful.
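
Users can already approximate this themselves, since ``bytes`` are
shareable and can be written to a pipe. The sketch below uses
hypothetical helper names and a 4-byte length prefix chosen for this
example; it runs in a single thread and sends a small payload that fits
in the pipe buffer.

```python
import os
import pickle


def read_exact(fd, size):
    """Read exactly size bytes from fd or raise EOFError."""
    chunks = []
    while size > 0:
        chunk = os.read(fd, size)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def send_object(write_fd, obj):
    """Pickle obj and write it with a 4-byte big-endian length prefix."""
    payload = pickle.dumps(obj)
    os.write(write_fd, len(payload).to_bytes(4, "big") + payload)


def recv_object(read_fd):
    """Read one length-prefixed pickle and return the rebuilt object."""
    size = int.from_bytes(read_exact(read_fd, 4), "big")
    return pickle.loads(read_exact(read_fd, size))


read_fd, write_fd = os.pipe()
try:
    send_object(write_fd, {"spam": [1, 2, 3]})
    received = recv_object(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)
```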
|
||
|
||
Return a value from ``run()``
|
||
-----------------------------
|
||
|
||
Currently ``run()`` always returns None. One idea is to return the
|
||
return value from whatever the subinterpreter ran. However, for now
|
||
it doesn't make sense. The only thing folks can run is a string of
|
||
code (i.e. a script). This is equivalent to ``PyRun_StringFlags()``,
|
||
``exec()``, or a module body. None of those "return" anything. We can
|
||
revisit this once ``run()`` supports functions, etc.
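
The ``exec()`` case is easy to check: results are only visible through
the namespace the code ran in, never through a return value.

```python
# exec() runs the source for its side effects on the namespace.
namespace = {}
result = exec("total = 1 + 1", namespace)
total = namespace["total"]
```

Here ``result`` is ``None`` while ``total`` holds the computed value.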
|
||
|
||
Add a "tp_share" type slot
|
||
--------------------------
|
||
|
||
This would replace the current global registry for shareable types.
|
||
|
||
Add a shareable synchronization primitive
|
||
-----------------------------------------
|
||
|
||
This would be ``_threading.Lock`` (or something like it) where
|
||
interpreters would actually share the underlying mutex. The main
|
||
concern is that locks and isolated interpreters may not mix well
|
||
(as learned in Go).
|
||
|
||
We can add this later without much trouble if it proves desirable.
|
||
|
||
Propagate SystemExit and KeyboardInterrupt Differently
|
||
------------------------------------------------------
|
||
|
||
The exception types that inherit from ``BaseException`` (aside from
|
||
``Exception``) are usually treated specially. These types are:
|
||
``KeyboardInterrupt``, ``SystemExit``, and ``GeneratorExit``. It may
|
||
make sense to treat them specially when it comes to propagation from
|
||
``run()``. Here are some options::
|
||
|
||
   * propagate like normal via RunFailedError
   * do not propagate (handle them somehow in the subinterpreter)
   * propagate them directly (avoid RunFailedError)
   * propagate them directly (set RunFailedError as __cause__)
|
||
|
||
We aren't going to worry about handling them differently. Threads
|
||
already ignore ``SystemExit``, so for now we will follow that pattern.
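
The existing thread behavior referred to here can be observed
directly: a ``SystemExit`` raised in a non-main thread ends only that
thread.

```python
import threading


def worker():
    # In a non-main thread this ends only the thread, not the process.
    raise SystemExit(3)


thread = threading.Thread(target=worker)
thread.start()
thread.join()

# Reached because the SystemExit did not propagate to the main thread.
main_thread_continued = True
```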
|
||
|
||
Auto-run in a thread
|
||
--------------------
|
||
|
||
The PEP proposes a hard separation between interpreters and threads:
|
||
if you want to run in a thread you must create the thread yourself and
|
||
call ``run()`` in it. However, it might be convenient if ``run()``
|
||
could do that for you, meaning there would be less boilerplate.
|
||
|
||
Furthermore, we anticipate that users will want to run in a thread much
|
||
more often than not. So it would make sense to make this the default
|
||
behavior. We would add a kw-only param "threaded" (default ``True``)
|
||
to ``run()`` to allow the run-in-the-current-thread operation.
|
||
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Use pipes instead of channels
|
||
-----------------------------
|
||
|
||
A pipe would be a simplex FIFO between exactly two interpreters. For
|
||
most use cases this would be sufficient. It could potentially simplify
|
||
the implementation as well. However, it isn't a big step to supporting
|
||
a many-to-many simplex FIFO via channels. Also, with pipes the API
|
||
ends up being slightly more complicated, requiring naming the pipes.
|
||
|
||
Use queues instead of channels
|
||
------------------------------
|
||
|
||
Queues and buffered channels are almost the same thing. The main
|
||
difference is that channels have a stronger relationship with context
|
||
(i.e. the associated interpreter).
|
||
|
||
The name "Channel" was used instead of "Queue" to avoid confusion with
|
||
the stdlib ``queue.Queue``.
|
||
|
||
"enumerate"
|
||
-----------
|
||
|
||
The ``list_all()`` function provides the list of all interpreters.
|
||
In the threading module, which partly inspired the proposed API, the
|
||
function is called ``enumerate()``. The name is different here to
|
||
avoid confusing Python users that are not already familiar with the
|
||
threading API. For them "enumerate" is rather unclear, whereas
|
||
"list_all" is clear.
|
||
|
||
Alternate solutions to prevent leaking exceptions across interpreters
|
||
---------------------------------------------------------------------
|
||
|
||
In function calls, uncaught exceptions propagate to the calling frame.
|
||
The same approach could be taken with ``run()``. However, this would
|
||
mean that exception objects would leak across the inter-interpreter
|
||
boundary. Likewise, the frames in the traceback would potentially leak.
|
||
|
||
While that might not be a problem currently, it would be a problem once
|
||
interpreters get better isolation relative to memory management (which
|
||
is necessary to stop sharing the GIL between interpreters). We've
|
||
resolved the semantics of how the exceptions propagate by raising a
|
||
``RunFailedError`` instead, for which ``__cause__`` wraps a safe proxy
|
||
for the original exception and traceback.
|
||
|
||
Rejected possible solutions:
|
||
|
||
* reproduce the exception and traceback in the original interpreter
|
||
and raise that.
|
||
* raise a subclass of RunFailedError that proxies the original
|
||
exception and traceback.
|
||
* raise RuntimeError instead of RunFailedError
|
||
* convert at the boundary (a la ``subprocess.CalledProcessError``)
|
||
(requires a cross-interpreter representation)
|
||
* support customization via ``Interpreter.excepthook``
|
||
(requires a cross-interpreter representation)
|
||
* wrap in a proxy at the boundary (including with support for
|
||
something like ``err.raise()`` to propagate the traceback).
|
||
* return the exception (or its proxy) from ``run()`` instead of
|
||
raising it
|
||
* return a result object (like ``subprocess`` does) [result-object]_
|
||
(unnecessary complexity?)
|
||
* throw the exception away and expect users to deal with unhandled
|
||
exceptions explicitly in the script they pass to ``run()``
|
||
(they can pass error info out via ``os.pipe()``);
|
||
with threads you have to do something similar
|
||
|
||
Always associate each new interpreter with its own thread
|
||
---------------------------------------------------------
|
||
|
||
As implemented in the C-API, an interpreter is not inherently tied to
|
||
any thread. Furthermore, it will run in any existing thread, whether
|
||
created by Python or not. You only have to activate one of its thread
|
||
states (``PyThreadState``) in the thread first. This means that the
|
||
same thread may run more than one interpreter (though obviously
|
||
not at the same time).
|
||
|
||
The proposed module maintains this behavior. Interpreters are not
|
||
tied to threads. Only calls to ``Interpreter.run()`` are. However,
|
||
one of the key objectives of this PEP is to provide a more
|
||
human-centric concurrency model. With that in mind, from a conceptual
|
||
standpoint the module *might* be easier to understand if each
|
||
interpreter were associated with its own thread.
|
||
|
||
That would mean ``interpreters.create()`` would create a new thread
|
||
and ``Interpreter.run()`` would only execute in that thread (and
|
||
nothing else would). The benefit is that users would not have to
|
||
wrap ``Interpreter.run()`` calls in a new ``threading.Thread``. Nor
|
||
would they be in a position to accidentally pause the current
|
||
interpreter (in the current thread) while their interpreter
|
||
executes.
|
||
|
||
The idea is rejected because the benefit is small and the cost is high.
|
||
The difference from the capability in the C-API would be potentially
|
||
confusing. The implicit creation of threads is magical. The early
|
||
creation of threads is potentially wasteful. The inability to run
|
||
arbitrary interpreters in an existing thread would prevent some valid
|
||
use cases, frustrating users. Tying interpreters to threads would
|
||
require extra runtime modifications. It would also make the module's
|
||
implementation overly complicated. Finally, it might not even make
|
||
the module easier to understand.
|
||
|
||
Add a "reraise" method to RunFailedError
|
||
----------------------------------------
|
||
|
||
While having ``__cause__`` set on ``RunFailedError`` helps produce a
|
||
more useful traceback, it's less helpful when handling the original
|
||
error. To help facilitate this, we could add
|
||
``RunFailedError.reraise()``. This method would enable the following
|
||
pattern::
|
||
|
||
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           exc.reraise()
   except MyException:
       ...
|
||
|
||
This would be made even simpler if there existed a ``__reraise__``
|
||
protocol.
|
||
|
||
All that said, this is completely unnecessary. Using ``__cause__``
|
||
is good enough::
|
||
|
||
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           raise exc.__cause__
   except MyException:
       ...
|
||
|
||
Note that in extreme cases it may require a little extra boilerplate::
|
||
|
||
   try:
       try:
           interp.run(script)
       except RunFailedError as exc:
           if exc.__cause__ is not None:
               raise exc.__cause__
           raise  # re-raise
   except MyException:
       ...
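
The pattern can be tried out with stand-ins. In the sketch below
``RunFailedError`` is a locally defined placeholder and ``fake_run()``
is a hypothetical substitute for ``Interpreter.run()`` that uses
``exec()`` in the current interpreter, so ``__cause__`` is the original
exception object rather than a cross-interpreter proxy.

```python
class MyException(Exception):
    pass


class RunFailedError(RuntimeError):
    """Local placeholder for the proposed exception type."""


def fake_run(script):
    # Stand-in for Interpreter.run(): wrap any failure in RunFailedError
    # and chain the original exception as __cause__.
    try:
        exec(script, {"MyException": MyException})
    except Exception as exc:
        raise RunFailedError("script failed") from exc


handled = None
try:
    try:
        fake_run("raise MyException('boom')")
    except RunFailedError as exc:
        if exc.__cause__ is not None:
            raise exc.__cause__
        raise  # re-raise
except MyException as exc:
    handled = exc
```

The outer handler receives the ``MyException`` instance, so existing
``except`` clauses keep working without a dedicated ``reraise()``
method.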
|
||
|
||
|
||
Implementation
|
||
==============
|
||
|
||
The implementation of the PEP has 4 parts:
|
||
|
||
* the high-level module described in this PEP (mostly a light wrapper
|
||
around a low-level C extension)
|
||
* the low-level C extension module
|
||
* additions to the ("private") C-API needed by the low-level module
|
||
* secondary fixes/changes in the CPython runtime that facilitate
|
||
the low-level module (among other benefits)
|
||
|
||
These are at various levels of completion, with more done the lower
|
||
you go:
|
||
|
||
* the high-level module has been, at best, roughly implemented.
|
||
However, fully implementing it will be almost trivial.
|
||
* the low-level module is mostly complete. The bulk of the
|
||
implementation was merged into master in December 2018 as the
|
||
"_xxsubinterpreters" module (for the sake of testing multiple
|
||
interpreters functionality). Only 3 parts of the implementation
|
||
remain: "send_wait()", "send_buffer()", and exception propagation.
|
||
All three have been mostly finished, but were blocked by work
|
||
related to ceval. That blocker is basically resolved now and
|
||
finishing the low-level module will not require extensive work.
|
||
* all necessary C-API work has been finished
|
||
* all anticipated work in the runtime has been finished
|
||
|
||
The implementation effort for :pep:`554` is being tracked as part of
|
||
a larger project aimed at improving multi-core support in CPython.
|
||
[multi-core-project]_
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [c-api]
|
||
https://docs.python.org/3/c-api/init.html#sub-interpreter-support
|
||
|
||
.. [CSP]
|
||
https://en.wikipedia.org/wiki/Communicating_sequential_processes
|
||
https://github.com/futurecore/python-csp
|
||
|
||
.. [caveats]
|
||
https://docs.python.org/3/c-api/init.html#bugs-and-caveats
|
||
|
||
.. [petr-c-ext]
|
||
https://mail.python.org/pipermail/import-sig/2016-June/001062.html
|
||
https://mail.python.org/pipermail/python-ideas/2016-April/039748.html
|
||
|
||
.. [cryptography]
|
||
https://github.com/pyca/cryptography/issues/2299
|
||
|
||
.. [global-gc]
|
||
http://bugs.python.org/issue24554
|
||
|
||
.. [gilstate]
|
||
https://bugs.python.org/issue10915
|
||
http://bugs.python.org/issue15751
|
||
|
||
.. [global-atexit]
|
||
https://bugs.python.org/issue6531
|
||
|
||
.. [bug-rate]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
|
||
|
||
.. [benefits]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
|
||
|
||
.. [reset_globals]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149545.html
|
||
|
||
.. [async]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149420.html
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149585.html
|
||
|
||
.. [result-object]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149562.html
|
||
|
||
.. [jython]
|
||
https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
|
||
|
||
.. [multi-core-project]
|
||
https://github.com/ericsnowcurrently/multi-core-python
|
||
|
||
.. [cache-line-ping-pong]
|
||
https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
|
||
|
||
.. _nathaniel-asyncio:
|
||
https://mail.python.org/archives/list/python-dev@python.org/message/TUEAZNZHVJGGLL4OFD32OW6JJDKM6FAS/
|
||
|
||
* mp-conn
|
||
https://docs.python.org/3/library/multiprocessing.html#connection-objects
|
||
|
||
* main-thread
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149566.html
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|