1077 lines
39 KiB
ReStructuredText
1077 lines
39 KiB
ReStructuredText
PEP: 554
|
||
Title: Multiple Interpreters in the Stdlib
|
||
Author: Eric Snow <ericsnowcurrently@gmail.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 2017-09-05
|
||
Python-Version: 3.7
|
||
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
CPython has supported subinterpreters, with increasing levels of
|
||
support, since version 1.5. The feature has been available via the
|
||
C-API. [c-api]_ Subinterpreters operate in
|
||
`relative isolation from one another <Interpreter Isolation_>`_, which
|
||
provides the basis for an
|
||
`alternative concurrency model <Concurrency_>`_.
|
||
|
||
This proposal introduces the stdlib ``interpreters`` module. The module
|
||
will be `provisional <Provisional Status_>`_. It exposes the basic
|
||
functionality of subinterpreters already provided by the C-API.
|
||
|
||
|
||
Proposal
|
||
========
|
||
|
||
The ``interpreters`` module will be added to the stdlib. It will
|
||
provide a high-level interface to subinterpreters and wrap the low-level
|
||
``_interpreters`` module. See the `Examples`_ section for concrete
|
||
usage and use cases.
|
||
|
||
API for interpreters
|
||
--------------------
|
||
|
||
The module provides the following functions:
|
||
|
||
``list_all()``::
|
||
|
||
Return a list of all existing interpreters.
|
||
|
||
``get_current()``::
|
||
|
||
Return the currently running interpreter.
|
||
|
||
``create()``::
|
||
|
||
Initialize a new Python interpreter and return it. The
|
||
interpreter will be created in the current thread and will remain
|
||
idle until something is run in it. The interpreter may be used
|
||
in any thread and will run in whichever thread calls
|
||
``interp.run()``.
|
||
|
||
|
||
The module also provides the following class:
|
||
|
||
``Interpreter(id)``::
|
||
|
||
id:
|
||
|
||
The interpreter's ID (read-only).
|
||
|
||
is_running():
|
||
|
||
Return whether or not the interpreter is currently executing code.
|
||
Calling this on the current interpreter will always return True.
|
||
|
||
destroy():
|
||
|
||
Finalize and destroy the interpreter.
|
||
|
||
This may not be called on an already running interpreter. Doing
|
||
so results in a RuntimeError.
|
||
|
||
run(source_str, /, **shared):
|
||
|
||
Run the provided Python source code in the interpreter. Any
|
||
keyword arguments are added to the interpreter's execution
|
||
namespace (the interpreter's "__main__" module). If any of the
|
||
values are not supported for sharing between interpreters then
|
||
ValueError gets raised. Currently only channels (see
|
||
"create_channel()" below) are supported.
|
||
|
||
This may not be called on an already running interpreter. Doing
|
||
so results in a RuntimeError.
|
||
|
||
A "run()" call is quite similar to any other function call. Once
|
||
it completes, the code that called "run()" continues executing
|
||
(in the original interpreter). Likewise, if there is any uncaught
|
||
exception, it propagates into the code where "run()" was called.
|
||
|
||
The big difference is that "run()" executes the code in an
|
||
entirely different interpreter, with entirely separate state.
|
||
The state of the current interpreter in the current OS thread
|
||
is swapped out with the state of the target interpreter (the one
|
||
that will execute the code). When the target finishes executing,
|
||
the original interpreter gets swapped back in and its execution
|
||
resumes.
|
||
|
||
So calling "run()" will effectively cause the current Python
|
||
thread to pause. Sometimes you won't want that pause, in which
|
||
case you should make the "run()" call in another thread. To do
|
||
so, add a function that calls "run()" and then run that function
|
||
in a normal "threading.Thread".
|
||
|
||
Note that interpreter's state is never reset, neither before
|
||
"run()" executes the code nor after. Thus the interpreter
|
||
state is preserved between calls to "run()". This includes
|
||
"sys.modules", the "builtins" module, and the internal state
|
||
of C extension modules.
|
||
|
||
Also note that "run()" executes in the namespace of the "__main__"
|
||
module, just like scripts, the REPL, "-m", and "-c". Just as
|
||
the interpreter's state is not ever reset, the "__main__" module
|
||
is never reset. You can imagine concatenating the code from each
|
||
"run()" call into one long script. This is the same as how the
|
||
REPL operates.
|
||
|
||
Supported code: source text.
|
||
|
||
API for sharing data
|
||
--------------------
|
||
|
||
The mechanism for passing objects between interpreters is through
|
||
channels. A channel is a simplex FIFO similar to a pipe. The main
|
||
difference is that channels can be associated with zero or more
|
||
interpreters on either end. Unlike queues, which are also many-to-many,
|
||
channels have no buffer.
|
||
|
||
``create_channel()``::
|
||
|
||
Create a new channel and return (recv, send), the RecvChannel and
|
||
SendChannel corresponding to the ends of the channel. The channel
|
||
is not closed and destroyed (i.e. garbage-collected) until the number
|
||
of associated interpreters returns to 0.
|
||
|
||
An interpreter gets associated with a channel by calling its "send()"
|
||
or "recv()" method. That association gets dropped by calling
|
||
"close()" on the channel.
|
||
|
||
Both ends of the channel are supported "shared" objects (i.e. may be
|
||
safely shared by different interpreters. Thus they may be passed as
|
||
keyword arguments to "Interpreter.run()".
|
||
|
||
``list_all_channels()``::
|
||
|
||
Return a list of all open (RecvChannel, SendChannel) pairs.
|
||
|
||
|
||
``RecvChannel(id)``::
|
||
|
||
The receiving end of a channel. An interpreter may use this to
|
||
receive objects from another interpreter. At first only bytes will
|
||
be supported.
|
||
|
||
id:
|
||
|
||
The channel's unique ID.
|
||
|
||
interpreters:
|
||
|
||
The list of associated interpreters: those that have called
|
||
the "recv()" or "__next__()" methods and haven't called "close()".
|
||
|
||
recv():
|
||
|
||
Return the next object from the channel. If none have been sent
|
||
then wait until the next send. This associates the current
|
||
interpreter with the channel.
|
||
|
||
If the channel is already closed (see the close() method)
|
||
then raise EOFError. If the channel isn't closed, but the current
|
||
interpreter already called the "close()" method (which drops its
|
||
association with the channel) then raise ValueError.
|
||
|
||
recv_nowait(default=None):
|
||
|
||
Return the next object from the channel. If none have been sent
|
||
then return the default. Otherwise, this is the same as the
|
||
"recv()" method.
|
||
|
||
close():
|
||
|
||
No longer associate the current interpreter with the channel (on
|
||
the receiving end) and block future association (via the "recv()"
|
||
method. If the interpreter was never associated with the channel
|
||
then still block future association. Once an interpreter is no
|
||
longer associated with the channel, subsequent (or current) send()
|
||
and recv() calls from that interpreter will raise ValueError
|
||
(or EOFError if the channel is actually marked as closed).
|
||
|
||
Once the number of associated interpreters on both ends drops
|
||
to 0, the channel is actually marked as closed. The Python
|
||
runtime will garbage collect all closed channels, though it may
|
||
not be immediately. Note that "close()" is automatically called
|
||
in behalf of the current interpreter when the channel is no longer
|
||
used (i.e. has no references) in that interpreter.
|
||
|
||
This operation is idempotent. Return True if "close()" has not
|
||
been called before by the current interpreter.
|
||
|
||
|
||
``SendChannel(id)``::
|
||
|
||
The sending end of a channel. An interpreter may use this to send
|
||
objects to another interpreter. At first only bytes will be
|
||
supported.
|
||
|
||
id:
|
||
|
||
The channel's unique ID.
|
||
|
||
interpreters:
|
||
|
||
The list of associated interpreters (those that have called
|
||
the "send()" method).
|
||
|
||
send(obj):
|
||
|
||
Send the object to the receiving end of the channel. Wait until
|
||
the object is received. If the channel does not support the
|
||
object then ValueError is raised. Currently only bytes are
|
||
supported.
|
||
|
||
If the channel is already closed (see the close() method)
|
||
then raise EOFError. If the channel isn't closed, but the current
|
||
interpreter already called the "close()" method (which drops its
|
||
association with the channel) then raise ValueError.
|
||
|
||
send_nowait(obj):
|
||
|
||
Send the object to the receiving end of the channel. If the
|
||
object is received then return True. If not then return False.
|
||
Otherwise, this is the same as the "send()" method.
|
||
|
||
close():
|
||
|
||
This is the same as "RecvChannel.close(), but applied to the
|
||
sending end of the channel.
|
||
|
||
|
||
Examples
|
||
========
|
||
|
||
Run isolated code
|
||
-----------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
print('before')
|
||
interp.run('print("during")')
|
||
print('after')
|
||
|
||
Run in a thread
|
||
---------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
def run():
|
||
interp.run('print("during")')
|
||
t = threading.Thread(target=run)
|
||
print('before')
|
||
t.start()
|
||
print('after')
|
||
|
||
Pre-populate an interpreter
|
||
---------------------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
interp.run(tw.dedent("""
|
||
import some_lib
|
||
import an_expensive_module
|
||
some_lib.set_up()
|
||
"""))
|
||
wait_for_request()
|
||
interp.run(tw.dedent("""
|
||
some_lib.handle_request()
|
||
"""))
|
||
|
||
Handling an exception
|
||
---------------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
try:
|
||
interp.run(tw.dedent("""
|
||
raise KeyError
|
||
"""))
|
||
except KeyError:
|
||
print("got the error from the subinterpreter")
|
||
|
||
Synchronize using a channel
|
||
---------------------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
r, s = interpreters.create_channel()
|
||
def run():
|
||
interp.run(tw.dedent("""
|
||
reader.recv()
|
||
print("during")
|
||
reader.close()
|
||
"""),
|
||
reader=r))
|
||
t = threading.Thread(target=run)
|
||
print('before')
|
||
t.start()
|
||
print('after')
|
||
s.send(b'')
|
||
s.close()
|
||
|
||
Sharing a file descriptor
|
||
-------------------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
r1, s1 = interpreters.create_channel()
|
||
r2, s2 = interpreters.create_channel()
|
||
def run():
|
||
interp.run(tw.dedent("""
|
||
fd = int.from_bytes(
|
||
reader.recv(), 'big')
|
||
for line in os.fdopen(fd):
|
||
print(line)
|
||
writer.send(b'')
|
||
"""),
|
||
reader=r1, writer=s2)
|
||
t = threading.Thread(target=run)
|
||
t.start()
|
||
with open('spamspamspam') as infile:
|
||
fd = infile.fileno().to_bytes(1, 'big')
|
||
s.send(fd)
|
||
r.recv()
|
||
|
||
Passing objects via pickle
|
||
--------------------------
|
||
|
||
::
|
||
|
||
interp = interpreters.create()
|
||
r, s = interpreters.create_channel()
|
||
interp.run(tw.dedent("""
|
||
import pickle
|
||
"""),
|
||
reader=r)
|
||
def run():
|
||
interp.run(tw.dedent("""
|
||
data = reader.recv()
|
||
while data:
|
||
obj = pickle.loads(data)
|
||
do_something(obj)
|
||
data = reader.recv()
|
||
reader.close()
|
||
"""),
|
||
reader=r)
|
||
t = threading.Thread(target=run)
|
||
t.start()
|
||
for obj in input:
|
||
data = pickle.dumps(obj)
|
||
s.send(data)
|
||
s.send(b'')
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
Running code in multiple interpreters provides a useful level of
|
||
isolation within the same process. This can be leveraged in number
|
||
of ways. Furthermore, subinterpreters provide a well-defined framework
|
||
in which such isolation may extended.
|
||
|
||
Nick Coghlan explained some of the benefits through a comparison with
|
||
multi-processing [benefits]_::
|
||
|
||
[I] expect that communicating between subinterpreters is going
|
||
to end up looking an awful lot like communicating between
|
||
subprocesses via shared memory.
|
||
|
||
The trade-off between the two models will then be that one still
|
||
just looks like a single process from the point of view of the
|
||
outside world, and hence doesn't place any extra demands on the
|
||
underlying OS beyond those required to run CPython with a single
|
||
interpreter, while the other gives much stricter isolation
|
||
(including isolating C globals in extension modules), but also
|
||
demands much more from the OS when it comes to its IPC
|
||
capabilities.
|
||
|
||
The security risk profiles of the two approaches will also be quite
|
||
different, since using subinterpreters won't require deliberately
|
||
poking holes in the process isolation that operating systems give
|
||
you by default.
|
||
|
||
CPython has supported subinterpreters, with increasing levels of
|
||
support, since version 1.5. While the feature has the potential
|
||
to be a powerful tool, subinterpreters have suffered from neglect
|
||
because they are not available directly from Python. Exposing the
|
||
existing functionality in the stdlib will help reverse the situation.
|
||
|
||
This proposal is focused on enabling the fundamental capability of
|
||
multiple isolated interpreters in the same Python process. This is a
|
||
new area for Python so there is relative uncertainly about the best
|
||
tools to provide as companions to subinterpreters. Thus we minimize
|
||
the functionality we add in the proposal as much as possible.
|
||
|
||
Concerns
|
||
--------
|
||
|
||
* "subinterpreters are not worth the trouble"
|
||
|
||
Some have argued that subinterpreters do not add sufficient benefit
|
||
to justify making them an official part of Python. Adding features
|
||
to the language (or stdlib) has a cost in increasing the size of
|
||
the language. So it must pay for itself. In this case, subinterpreters
|
||
provide a novel concurrency model focused on isolated threads of
|
||
execution. Furthermore, they present an opportunity for changes in
|
||
CPython that will allow simulateous use of multiple CPU cores (currently
|
||
prevented by the GIL).
|
||
|
||
Alternatives to subinterpreters include threading, async, and
|
||
multiprocessing. Threading is limited by the GIL and async isn't
|
||
the right solution for every problem (nor for every person).
|
||
Multiprocessing is likewise valuable in some but not all situations.
|
||
Direct IPC (rather than via the multiprocessing module) provides
|
||
similar benefits but with the same caveat.
|
||
|
||
Notably, subinterpreters are not intended as a replacement for any of
|
||
the above. Certainly they overlap in some areas, but the benefits of
|
||
subinterpreters include isolation and (potentially) performance. In
|
||
particular, subinterpreters provide a direct route to an alternate
|
||
concurrency model (e.g. CSP) which has found success elsewhere and
|
||
will appeal to some Python users. That is the core value that the
|
||
``interpreters`` module will provide.
|
||
|
||
* "stdlib support for subinterpreters adds extra burden
|
||
on C extension authors"
|
||
|
||
In the `Interpreter Isolation`_ section below we identify ways in
|
||
which isolation in CPython's subinterpreters is incomplete. Most
|
||
notable is extension modules that use C globals to store internal
|
||
state. PEP 3121 and PEP 489 provide a solution for most of the
|
||
problem, but one still remains. [petr-c-ext]_ Until that is resolved,
|
||
C extension authors will face extra difficulty to support
|
||
subinterpreters.
|
||
|
||
Consequently, projects that publish extension modules may face an
|
||
increased maintenance burden as their users start using subinterpreters,
|
||
where their modules may break. This situation is limited to modules
|
||
that use C globals (or use libraries that use C globals) to store
|
||
internal state. For numpy, the reported-bug rate is one every 6
|
||
months. [bug-rate]_
|
||
|
||
Ultimately this comes down to a question of how often it will be a
|
||
problem in practice: how many projects would be affected, how often
|
||
their users will be affected, what the additional maintenance burden
|
||
will be for projects, and what the overall benefit of subinterpreters
|
||
is to offset those costs. The position of this PEP is that the actual
|
||
extra maintenance burden will be small and well below the threshold at
|
||
which subinterpreters are worth it.
|
||
|
||
|
||
About Subinterpreters
|
||
=====================
|
||
|
||
Shared data
|
||
-----------
|
||
|
||
Subinterpreters are inherently isolated (with caveats explained below),
|
||
in contrast to threads. This enables `a different concurrency model
|
||
<Concurrency_>`_ than is currently readily available in Python.
|
||
`Communicating Sequential Processes`_ (CSP) is the prime example.
|
||
|
||
A key component of this approach to concurrency is message passing. So
|
||
providing a message/object passing mechanism alongside ``Interpreter``
|
||
is a fundamental requirement. This proposal includes a basic mechanism
|
||
upon which more complex machinery may be built. That basic mechanism
|
||
draws inspiration from pipes, queues, and CSP's channels. [fifo]_
|
||
|
||
The key challenge here is that sharing objects between interpreters
|
||
faces complexity due in part to CPython's current memory model.
|
||
Furthermore, in this class of concurrency, the ideal is that objects
|
||
only exist in one interpreter at a time. However, this is not practical
|
||
for Python so we initially constrain supported objects to ``bytes``.
|
||
There are a number of strategies we may pursue in the future to expand
|
||
supported objects and object sharing strategies.
|
||
|
||
Note that the complexity of object sharing increases as subinterpreters
|
||
become more isolated, e.g. after GIL removal. So the mechanism for
|
||
message passing needs to be carefully considered. Keeping the API
|
||
minimal and initially restricting the supported types helps us avoid
|
||
further exposing any underlying complexity to Python users.
|
||
|
||
To make this work, the mutable shared state will be managed by the
|
||
Python runtime, not by any of the interpreters. Initially we will
|
||
support only one type of objects for shared state: the channels provided
|
||
by ``create_channel()``. Channels, in turn, will carefully manage
|
||
passing objects between interpreters.
|
||
|
||
Interpreter Isolation
|
||
---------------------
|
||
|
||
CPython's interpreters are intended to be strictly isolated from each
|
||
other. Each interpreter has its own copy of all modules, classes,
|
||
functions, and variables. The same applies to state in C, including in
|
||
extension modules. The CPython C-API docs explain more. [caveats]_
|
||
|
||
However, there are ways in which interpreters share some state. First
|
||
of all, some process-global state remains shared:
|
||
|
||
* file descriptors
|
||
* builtin types (e.g. dict, bytes)
|
||
* singletons (e.g. None)
|
||
* underlying static module data (e.g. functions) for
|
||
builtin/extension/frozen modules
|
||
|
||
There are no plans to change this.
|
||
|
||
Second, some isolation is faulty due to bugs or implementations that did
|
||
not take subinterpreters into account. This includes things like
|
||
extension modules that rely on C globals. [cryptography]_ In these
|
||
cases bugs should be opened (some are already):
|
||
|
||
* readline module hook functions (http://bugs.python.org/issue4202)
|
||
* memory leaks on re-init (http://bugs.python.org/issue21387)
|
||
|
||
Finally, some potential isolation is missing due to the current design
|
||
of CPython. Improvements are currently going on to address gaps in this
|
||
area:
|
||
|
||
* interpreters share the GIL
|
||
* interpreters share memory management (e.g. allocators, gc)
|
||
* GC is not run per-interpreter [global-gc]_
|
||
* at-exit handlers are not run per-interpreter [global-atexit]_
|
||
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
|
||
|
||
Concurrency
|
||
-----------
|
||
|
||
Concurrency is a challenging area of software development. Decades of
|
||
research and practice have led to a wide variety of concurrency models,
|
||
each with different goals. Most center on correctness and usability.
|
||
|
||
One class of concurrency models focuses on isolated threads of
|
||
execution that interoperate through some message passing scheme. A
|
||
notable example is `Communicating Sequential Processes`_ (CSP), upon
|
||
which Go's concurrency is based. The isolation inherent to
|
||
subinterpreters makes them well-suited to this approach.
|
||
|
||
|
||
Existing Usage
|
||
--------------
|
||
|
||
Subinterpreters are not a widely used feature. In fact, the only
|
||
documented cases of wide-spread usage are
|
||
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_and
|
||
`JEP <https://github.com/ninia/jep>`_. On the one hand, this case
|
||
provides confidence that existing subinterpreter support is relatively
|
||
stable. On the other hand, there isn't much of a sample size from which
|
||
to judge the utility of the feature.
|
||
|
||
|
||
Provisional Status
|
||
==================
|
||
|
||
The new ``interpreters`` module will be added with "provisional" status
|
||
(see PEP 411). This allows Python users to experiment with the feature
|
||
and provide feedback while still allowing us to adjust to that feedback.
|
||
The module will be provisional in Python 3.7 and we will make a decision
|
||
before the 3.8 release whether to keep it provisional, graduate it, or
|
||
remove it.
|
||
|
||
|
||
Alternate Python Implementations
|
||
================================
|
||
|
||
I'll be soliciting feedback from the different Python implementors about
|
||
subinterpreter support.
|
||
|
||
Multiple-interpter support in the major Python implementations:
|
||
|
||
TBD
|
||
|
||
* jython: yes [jython]_
|
||
* ironpython: yes?
|
||
* pypy: maybe not? [pypy]_
|
||
* micropython: ???
|
||
|
||
|
||
Open Questions
|
||
==============
|
||
|
||
Leaking exceptions across interpreters
|
||
--------------------------------------
|
||
|
||
As currently proposed, uncaught exceptions from ``run()`` propagate
|
||
to the frame that called it. However, this means that exception
|
||
objects are leaking across the inter-interpreter boundary. Likewise,
|
||
the frames in the traceback potentially leak.
|
||
|
||
While that might not be a problem currently, it would be a problem once
|
||
interpreters get better isolation relative to memory management (which
|
||
is necessary to stop sharing the GIL between interpreters). So the
|
||
semantics of how the exceptions propagate needs to be resolved.
|
||
|
||
Possible solutions:
|
||
|
||
* convert at the boundary (e.g. ``subprocess.CalledProcessError``)
|
||
* wrap in a proxy at the boundary (including with support for
|
||
something like ``err.raise()`` to propagate the traceback).
|
||
* return the exception (or its proxy) from ``run()`` instead of
|
||
raising it
|
||
* return a result object (like ``subprocess`` does) [result-object]_
|
||
* throw the exception away and expect users to deal with unhandled
|
||
exceptions explicitly in the script they pass to ``run()``
|
||
(they can pass error info out via channels); with threads you have
|
||
to do something similar
|
||
|
||
Initial support for buffers in channels
|
||
---------------------------------------
|
||
|
||
An alternative to support for bytes in channels in support for
|
||
read-only buffers (the PEP 3118 kind). Then ``recv()`` would return
|
||
a memoryview to expose the buffer in a zero-copy way. This is similar
|
||
to what ``multiprocessing.Connection`` supports. [mp-conn]
|
||
|
||
Switching to such an approach would help resolve questions of how
|
||
passing bytes through channels will work once we isolate memory
|
||
management in interpreters.
|
||
|
||
Does every interpreter think that their thread is the "main" thread?
|
||
--------------------------------------------------------------------
|
||
|
||
CPython's interpreter implementation identifies the OS thread in which
|
||
it was started as the "main" thread. The interpreter the has slightly
|
||
different behavior depending on if the current thread is the main one
|
||
or not. This presents a problem in cases where "main thread" is meant
|
||
to imply "main thread in the main interpreter" [main-thread]_, where
|
||
the main interpreter is the initial one.
|
||
|
||
Disallow subinterpreters in the main thread?
|
||
--------------------------------------------
|
||
|
||
This is a specific case of the above issue. Currently in CPython,
|
||
"we need a main \*thread\* in order to sensibly manage the way signal
|
||
handling works across different platforms". [main-thread]_
|
||
|
||
Since signal handlers are part of the interpreter state, running a
|
||
subinterpreter in the main thread means that the main interpreter
|
||
can no longer properly handle signals (since it's effectively paused).
|
||
|
||
Furthermore, running a subinterpreter in the main thread would
|
||
conceivably allow setting signal handlers on that interpreter, which
|
||
would likewise impact signal handling when that interpreter isn't
|
||
running or is running in a different thread.
|
||
|
||
Ultimately, running subinterpreters in the main OS thread introduces
|
||
complications to the signal handling implementation. So it may make
|
||
the most sense to disallow running subinterpreters in the main thread.
|
||
Support for it could be considered later. The downside is that folks
|
||
wanting to try out subinterpreters would be required to take the extra
|
||
step of using threads. This could slow adoption and experimentation,
|
||
whereas without the restriction there's less of an obstacle.
|
||
|
||
Pass channels explicitly to run()?
|
||
----------------------------------
|
||
|
||
Nick Coghlan suggested [explicit-channels]_ that we may want something more explicit than
|
||
the keyword args of ``run()`` (``**shared``)::
|
||
|
||
The subprocess.run() comparison does make me wonder whether this
|
||
might be a more future-proof signature for Interpreter.run() though:
|
||
|
||
def run(source_str, /, *, channels=None):
|
||
...
|
||
|
||
That way channels can be a namespace *specifically* for passing in
|
||
channels, and can be reported as such on RunResult. If we decide
|
||
to allow arbitrary shared objects in the future, or add flag options
|
||
like "reraise=True" to reraise exceptions from the subinterpreter
|
||
in the current interpreter, we'd have that ability, rather than
|
||
having the entire potential keyword namespace taken up for passing
|
||
shared objects.
|
||
|
||
and::
|
||
|
||
It does occur to me that if we wanted to align with the way the
|
||
`runpy` module spells that concept, we'd call the option
|
||
`init_globals`, but I'm thinking it will be better to only allow
|
||
channels to be passed through directly, and require that everything
|
||
else be sent through a channel.
|
||
|
||
|
||
Deferred Functionality
|
||
======================
|
||
|
||
In the interest of keeping this proposal minimal, the following
|
||
functionality has been left out for future consideration. Note that
|
||
this is not a judgement against any of said capability, but rather a
|
||
deferment. That said, each is arguably valid.
|
||
|
||
Interpreter.call()
|
||
------------------
|
||
|
||
It would be convenient to run existing functions in subinterpreters
|
||
directly. ``Interpreter.run()`` could be adjusted to support this or
|
||
a ``call()`` method could be added::
|
||
|
||
Interpreter.call(f, *args, **kwargs)
|
||
|
||
This suffers from the same problem as sharing objects between
|
||
interpreters via queues. The minimal solution (running a source string)
|
||
is sufficient for us to get the feature out where it can be explored.
|
||
|
||
timeout arg to recv() and send()
|
||
--------------------------------
|
||
|
||
Typically functions that have a ``block`` argument also have a
|
||
``timeout`` argument. It sometimes makes sense to do likewise for
|
||
functions that otherwise block, like the channel ``recv()`` and
|
||
``send()`` methods. We can add it later if needed.
|
||
|
||
get_main()
|
||
----------
|
||
|
||
CPython has a concept of a "main" interpreter. This is the initial
|
||
interpreter created during CPython's runtime initialization. It may
|
||
be useful to identify the main interpreter. For instance, the main
|
||
interpreter should not be destroyed. However, for the basic
|
||
functionality of a high-level API a ``get_main()`` function is not
|
||
necessary. Furthermore, there is no requirement that a Python
|
||
implementation have a concept of a main interpreter. So until there's
|
||
a clear need we'll leave ``get_main()`` out.
|
||
|
||
Interpreter.run_in_thread()
|
||
---------------------------
|
||
|
||
This method would make a ``run()`` call for you in a thread. Doing this
|
||
using only ``threading.Thread`` and ``run()`` is relatively trivial so
|
||
we've left it out.
|
||
|
||
Synchronization Primitives
|
||
--------------------------
|
||
|
||
The ``threading`` module provides a number of synchronization primitives
|
||
for coordinating concurrent operations. This is especially necessary
|
||
due to the shared-state nature of threading. In contrast,
|
||
subinterpreters do not share state. Data sharing is restricted to
|
||
channels, which do away with the need for explicit synchronization. If
|
||
any sort of opt-in shared state support is added to subinterpreters in
|
||
the future, that same effort can introduce synchronization primitives
|
||
to meet that need.
|
||
|
||
CSP Library
|
||
-----------
|
||
|
||
A ``csp`` module would not be a large step away from the functionality
|
||
provided by this PEP. However, adding such a module is outside the
|
||
minimalist goals of this proposal.
|
||
|
||
Syntactic Support
|
||
-----------------
|
||
|
||
The ``Go`` language provides a concurrency model based on CSP, so
|
||
it's similar to the concurrency model that subinterpreters support.
|
||
``Go`` provides syntactic support, as well several builtin concurrency
|
||
primitives, to make concurrency a first-class feature. Conceivably,
|
||
similar syntactic (and builtin) support could be added to Python using
|
||
subinterpreters. However, that is *way* outside the scope of this PEP!
|
||
|
||
Multiprocessing
|
||
---------------
|
||
|
||
The ``multiprocessing`` module could support subinterpreters in the same
|
||
way it supports threads and processes. In fact, the module's
|
||
maintainer, Davin Potts, has indicated this is a reasonable feature
|
||
request. However, it is outside the narrow scope of this PEP.
|
||
|
||
C-extension opt-in/opt-out
|
||
--------------------------
|
||
|
||
By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily
|
||
add a mechanism by which C-extension modules could opt out of support
|
||
for subinterpreters. Then the import machinery, when operating in
|
||
a subinterpreter, would need to check the module for support. It would
|
||
raise an ImportError if unsupported.
|
||
|
||
Alternately we could support opting in to subinterpreter support.
|
||
However, that would probably exclude many more modules (unnecessarily)
|
||
than the opt-out approach.
|
||
|
||
The scope of adding the ModuleDef slot and fixing up the import
|
||
machinery is non-trivial, but could be worth it. It all depends on
|
||
how many extension modules break under subinterpreters. Given the
|
||
relatively few cases we know of through mod_wsgi, we can leave this
|
||
for later.
|
||
|
||
Poisoning channels
|
||
------------------
|
||
|
||
CSP has the concept of poisoning a channel. Once a channel has been
|
||
poisoned, and ``send()`` or ``recv()`` call on it will raise a special
|
||
exception, effectively ending execution in the interpreter that tried
|
||
to use the poisoned channel.
|
||
|
||
This could be accomplished by adding a ``poison()`` method to both ends
|
||
of the channel. The ``close()`` method could work if it had a ``force``
|
||
option to force the channel closed. Regardless, these semantics are
|
||
relatively specialized and can wait.
|
||
|
||
Sending channels over channels
|
||
------------------------------
|
||
|
||
Some advanced usage of subinterpreters could take advantage of the
|
||
ability to send channels over channels, in addition to bytes. Given
|
||
that channels will already be multi-interpreter safe, supporting then
|
||
in ``RecvChannel.recv()`` wouldn't be a big change. However, this can
|
||
wait until the basic functionality has been ironed out.
|
||
|
||
Reseting __main__
|
||
-----------------
|
||
|
||
As proposed, every call to ``Interpreter.run()`` will execute in the
|
||
namespace of the interpreter's existing ``__main__`` module. This means
|
||
that data persists there between ``run()`` calls. Sometimes this isn't
|
||
desireable and you want to execute in a fresh ``__main__``. Also,
|
||
you don't necessarily want to leak objects there that you aren't using
|
||
any more.
|
||
|
||
Note that the following won't work right because it will clear too much
|
||
(e.g. ``__name__`` and the other "__dunder__" attributes::
|
||
|
||
interp.run('globals().clear()')
|
||
|
||
Possible solutions include:
|
||
|
||
* a ``create()`` arg to indicate resetting ``__main__`` after each
|
||
``run`` call
|
||
* an ``Interpreter.reset_main`` flag to support opting in or out
|
||
after the fact
|
||
* an ``Interpreter.reset_main()`` method to opt in when desired
|
||
* ``importlib.util.reset_globals()`` [reset_globals]_
|
||
|
||
Also note that reseting ``__main__`` does nothing about state stored
|
||
in other modules. So any solution would have to be clear about the
|
||
scope of what is being reset. Conceivably we could invent a mechanism
|
||
by which any (or every) module could be reset, unlike ``reload()``
|
||
which does not clear the module before loading into it. Regardless,
|
||
since ``__main__`` is the execution namespace of the interpreter,
|
||
resetting it has a much more direct correlation to interpreters and
|
||
their dynamic state than does resetting other modules. So a more
|
||
generic module reset mechanism may prove unnecessary.
|
||
|
||
This isn't a critical feature initially. It can wait until later
|
||
if desirable.
|
||
|
||
Support passing ints in channels
|
||
--------------------------------
|
||
|
||
Passing ints around should be fine and ultimately is probably
|
||
desirable. However, we can get by with serializing them as bytes
|
||
for now. The goal is a minimal API for the sake of basic
|
||
functionality at first.
|
||
|
||
File descriptors and sockets in channels
|
||
----------------------------------------
|
||
|
||
Given that file descriptors and sockets are process-global resources,
|
||
support for passing them through channels is a reasonable idea. They
|
||
would be a good candidate for the first effort at expanding the types
|
||
that channels support. They aren't strictly necessary for the initial
|
||
API.
|
||
|
||
Integration with async
|
||
----------------------
|
||
|
||
Per Antoine Pitrou [async]_::
|
||
|
||
Has any thought been given to how FIFOs could integrate with async
|
||
code driven by an event loop (e.g. asyncio)? I think the model of
|
||
executing several asyncio (or Tornado) applications each in their
|
||
own subinterpreter may prove quite interesting to reconcile multi-
|
||
core concurrency with ease of programming. That would require the
|
||
FIFOs to be able to synchronize on something an event loop can wait
|
||
on (probably a file descriptor?).
|
||
|
||
A possible solution is to provide async implementations of the blocking
|
||
channel methods (``__next__()``, ``recv()``, and ``send()``). However,
|
||
the basic functionality of subinterpreters does not depend on async and
|
||
can be added later.
|
||
|
||
Support for iteration
|
||
---------------------
|
||
|
||
Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
|
||
``_next__()``) may be useful. A trivial implementation would use the
|
||
``recv()`` method, similar to how files do iteration. Since this isn't
|
||
a fundamental capability and has a simple analog, adding iteration
|
||
support can wait until later.
|
||
|
||
Channel context managers
|
||
------------------------
|
||
|
||
Context manager support on ``RecvChannel`` and ``SendChannel`` may be
|
||
helpful. The implementation would be simple, wrapping a call to
|
||
``close()`` like files do. As with iteration, this can wait.
|
||
|
||
Pipes and Queues
|
||
----------------
|
||
|
||
With the proposed object passing machanism of "channels", other similar
|
||
basic types aren't required to achieve the minimal useful functionality
|
||
of subinterpreters. Such types include pipes (like channels, but
|
||
one-to-one) and queues (like channels, but buffered). See below in
|
||
`Rejected Ideas` for more information.
|
||
|
||
Even though these types aren't part of this proposal, they may still
|
||
be useful in the context of concurrency. Adding them later is entirely
|
||
reasonable. The could be trivially implemented as wrappers around
|
||
channels. Alternatively they could be implemented for efficiency at the
|
||
same low level as channels.
|
||
|
||
interpreters.RunFailedError
|
||
---------------------------
|
||
|
||
As currently proposed, ``Interpreter.run()`` offers you no way to
|
||
distinguish an error coming from sub-interpreter from any other
|
||
error in the current interpreter. Your only option would be to
|
||
explicitly wrap your ``run()`` call in a ``try: ... except Exception:``.
|
||
|
||
If this is a problem in practice then would could add something like
|
||
``interpreters.RunFailedError`` and raise that in ``run()``, chaining
|
||
the actual error.
|
||
|
||
Of course, this depends on how we resolve `Leaking exceptions across
|
||
interpreters`_.
|
||
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Explicit channel association
|
||
----------------------------
|
||
|
||
Interpreters are implicitly associated with channels upon ``recv()`` and
|
||
``send()`` calls. They are de-associated with ``close()`` calls. The
|
||
alternative would be explicit methods. It would be either
|
||
``add_channel()`` and ``remove_channel()`` methods on ``Interpreter``
|
||
objects or something similar on channel objects.
|
||
|
||
In practice, this level of management shouldn't be necessary for users.
|
||
So adding more explicit support would only add clutter to the API.
|
||
|
||
Use pipes instead of channels
|
||
-----------------------------
|
||
|
||
A pipe would be a simplex FIFO between exactly two interpreters. For
|
||
most use cases this would be sufficient. It could potentially simplify
|
||
the implementation as well. However, it isn't a big step to supporting
|
||
a many-to-many simplex FIFO via channels. Also, with pipes the API
|
||
ends up being slightly more complicated, requiring naming the pipes.
|
||
|
||
Use queues instead of channels
|
||
------------------------------
|
||
|
||
The main difference between queues and channels is that queues support
|
||
buffering. This would complicate the blocking semantics of ``recv()``
|
||
and ``send()``. Also, queues can be built on top of channels.
|
||
|
||
"enumerate"
|
||
-----------
|
||
|
||
The ``list_all()`` function provides the list of all interpreters.
|
||
In the threading module, which partly inspired the proposed API, the
|
||
function is called ``enumerate()``. The name is different here to
|
||
avoid confusing Python users that are not already familiar with the
|
||
threading API. For them "enumerate" is rather unclear, whereas
|
||
"list_all" is clear.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [c-api]
|
||
https://docs.python.org/3/c-api/init.html#sub-interpreter-support
|
||
|
||
.. _Communicating Sequential Processes:
|
||
|
||
.. [CSP]
|
||
https://en.wikipedia.org/wiki/Communicating_sequential_processes
|
||
https://github.com/futurecore/python-csp
|
||
|
||
.. [fifo]
|
||
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe
|
||
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue
|
||
https://docs.python.org/3/library/queue.html#module-queue
|
||
http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html
|
||
https://golang.org/doc/effective_go.html#sharing
|
||
http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
|
||
|
||
.. [caveats]
|
||
https://docs.python.org/3/c-api/init.html#bugs-and-caveats
|
||
|
||
.. [petr-c-ext]
|
||
https://mail.python.org/pipermail/import-sig/2016-June/001062.html
|
||
https://mail.python.org/pipermail/python-ideas/2016-April/039748.html
|
||
|
||
.. [cryptography]
|
||
https://github.com/pyca/cryptography/issues/2299
|
||
|
||
.. [global-gc]
|
||
http://bugs.python.org/issue24554
|
||
|
||
.. [gilstate]
|
||
https://bugs.python.org/issue10915
|
||
http://bugs.python.org/issue15751
|
||
|
||
.. [global-atexit]
|
||
https://bugs.python.org/issue6531
|
||
|
||
.. [mp-conn]
|
||
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection
|
||
|
||
.. [bug-rate]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
|
||
|
||
.. [benefits]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
|
||
|
||
.. [main-thread]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149566.html
|
||
|
||
.. [explicit-channels]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149562.html
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149565.html
|
||
|
||
.. [reset_globals]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149545.html
|
||
|
||
.. [async]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149420.html
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149585.html
|
||
|
||
.. [result-object]
|
||
https://mail.python.org/pipermail/python-dev/2017-September/149562.html
|
||
|
||
.. [jython]
|
||
https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
|
||
|
||
.. [pypy]
|
||
https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|