PEP 734: Multiple Interpreters in the Stdlib (#3523)
This commit is contained in:
parent
242239bb0c
commit
6aeaba7f23
|
@ -611,6 +611,7 @@ peps/pep-0730.rst @ned-deily
|
|||
peps/pep-0731.rst @gvanrossum @encukou @vstinner @zooba @iritkatriel
|
||||
peps/pep-0732.rst @Mariatta
|
||||
peps/pep-0733.rst @encukou @vstinner @zooba @iritkatriel
|
||||
peps/pep-0734.rst @ericsnowcurrently
|
||||
# ...
|
||||
# peps/pep-0754.rst
|
||||
# ...
|
||||
|
|
|
@ -2,7 +2,7 @@ PEP: 554
|
|||
Title: Multiple Interpreters in the Stdlib
|
||||
Author: Eric Snow <ericsnowcurrently@gmail.com>
|
||||
Discussions-To: https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855
|
||||
Status: Draft
|
||||
Status: Superseded
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 05-Sep-2017
|
||||
|
@ -13,6 +13,15 @@ Post-History: `07-Sep-2017 <https://mail.python.org/archives/list/python-ideas@p
|
|||
`05-Dec-2017 <https://mail.python.org/archives/list/python-dev@python.org/thread/BCSRGAMCYB3NGXNU42U66J56XNZVMQP2/>`__,
|
||||
`04-May-2020 <https://mail.python.org/archives/list/python-dev@python.org/thread/X2KPCSRVBD2QD5GP5IMXXZTGZ46OXD3D/>`__,
|
||||
`14-Mar-2023 <https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855/2/>`__,
|
||||
`01-Nov-2023 <https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855/26/>`__,
|
||||
Superseded-By: 734
|
||||
|
||||
|
||||
.. note::
|
||||
This PEP effectively continues in a cleaner form in :pep:`734`.
|
||||
This PEP is kept as-is for the sake of the various sections of
|
||||
background information and deferred/rejected ideas that have
|
||||
been stripped from :pep:`734`.
|
||||
|
||||
|
||||
Abstract
|
||||
|
@ -747,7 +756,7 @@ The module provides the following functions::
|
|||
|
||||
Return True if the object may be "shared" between interpreters.
|
||||
This does not necessarily mean that the actual objects will be
|
||||
shared. Insead, it means that the objects' underlying data will
|
||||
shared. Instead, it means that the objects' underlying data will
|
||||
be shared in a cross-interpreter way, whether via a proxy, a
|
||||
copy, or some other means.
|
||||
|
||||
|
@ -927,7 +936,7 @@ Shareable Types
|
|||
|
||||
An object is "shareable" if its type supports shareable instances.
|
||||
The type must implement a new internal protocol, which is used to
|
||||
convert an object to interpreter-independent data and then coverted
|
||||
convert an object to interpreter-independent data and then converted
|
||||
back to an object on the other side. Also see
|
||||
`is_shareable() <interpreters-is-shareable_>`_ above.
|
||||
|
||||
|
|
|
@ -0,0 +1,925 @@
|
|||
PEP: 734
|
||||
Title: Multiple Interpreters in the Stdlib
|
||||
Author: Eric Snow <ericsnowcurrently@gmail.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 06-Nov-2023
|
||||
Python-Version: 3.13
|
||||
Replaces: 554
|
||||
|
||||
|
||||
.. note::
|
||||
This PEP is essentially a continuation of :pep:`554`. That document
|
||||
had grown a lot of ancillary information across 7 years of discussion.
|
||||
This PEP is a reduction back to the essential information. Much of
|
||||
that extra information is still valid and useful, just not in the
|
||||
immediate context of the specific proposal here.
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes to add a new module, ``interpreters``, to support
|
||||
inspecting, creating, and running code in multiple interpreters in the
|
||||
current process. This includes ``Interpreter`` objects that represent
|
||||
the underlying interpreters. The module will also provide a basic
|
||||
``Queue`` class for communication between interpreters. Finally, we
|
||||
will add a new ``concurrent.futures.InterpreterPoolExecutor`` based
|
||||
on the ``interpreters`` module.
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Fundamentally, an "interpreter" is the collection of (essentially)
|
||||
all runtime state which Python threads must share. So, let's first
|
||||
look at threads. Then we'll circle back to interpreters.
|
||||
|
||||
Threads and Thread States
|
||||
-------------------------
|
||||
|
||||
A Python process will have one or more OS threads running Python code
|
||||
(or otherwise interacting with the C API). Each of these threads
|
||||
interacts with the CPython runtime using its own thread state
|
||||
(``PyThreadState``), which holds all the runtime state unique to that
|
||||
thread. There is also some runtime state that is shared between
|
||||
multiple OS threads.
|
||||
|
||||
Any OS thread may switch which thread state it is currently using, as
|
||||
long as it isn't one that another OS thread is already using (or has
|
||||
been using). This "current" thread state is stored by the runtime
|
||||
in a thread-local variable, and may be looked up explicitly with
|
||||
``PyThreadState_Get()``. It gets set automatically for the initial
|
||||
("main") OS thread and for ``threading.Thread`` objects. From the
|
||||
C API it is set (and cleared) by ``PyThreadState_Swap()`` and may
|
||||
be set by ``PyGILState_Ensure()``. Most of the C API requires that
|
||||
there be a current thread state, either looked up implicitly
|
||||
or passed in as an argument.
|
||||
|
||||
The relationship between OS threads and thread states is one-to-many.
|
||||
Each thread state is associated with at most a single OS thread and
|
||||
records its thread ID. A thread state is never used for more than one
|
||||
OS thread. In the other direction, however, an OS thread may have more
|
||||
than one thread state associated with it, though, again, only one
|
||||
may be current.
|
||||
|
||||
When there's more than one thread state for an OS thread,
|
||||
``PyThreadState_Swap()`` is used in that OS thread to switch
|
||||
between them, with the requested thread state becoming the current one.
|
||||
Whatever was running in the thread using the old thread state is
|
||||
effectively paused until that thread state is swapped back in.
|
||||
|
||||
Interpreter States
|
||||
------------------
|
||||
|
||||
As noted earlier, there is some runtime state that multiple OS threads
|
||||
share. Some of it is exposed by the ``sys`` module, though much is
|
||||
used internally and not exposed explicitly or only through the C API.
|
||||
|
||||
This shared state is called the interpreter state
|
||||
(``PyInterpreterState``). We'll sometimes refer to it here as just
|
||||
"interpreter", though that is also sometimes used to refer to the
|
||||
``python`` executable, to the Python implementation, and to the
|
||||
bytecode interpreter (i.e. ``exec()``/``eval()``).
|
||||
|
||||
CPython has supported multiple interpreters in the same process (AKA
|
||||
"subinterpreters") since version 1.5 (1997). The feature has been
|
||||
available via the :ref:`C API <python:sub-interpreter-support>`.
|
||||
|
||||
Interpreters and Threads
|
||||
------------------------
|
||||
|
||||
Thread states are related to interpreter states in much the same way
|
||||
that OS threads and processes are related (at a hight level). To
|
||||
begin with, the relationship is one-to-many.
|
||||
A thread state belongs to a single interpreter (and stores
|
||||
a pointer to it). That thread state is never used for a different
|
||||
interpreter. In the other direction, however, an interpreter may have
|
||||
zero or more thread states associated with it. The interpreter is only
|
||||
considered active in OS threads where one of its thread states
|
||||
is current.
|
||||
|
||||
Interpreters are created via the C API using
|
||||
``Py_NewInterpreterFromConfig()`` (or ``Py_NewInterpreter()``, which
|
||||
is a light wrapper around ``Py_NewInterpreterFromConfig()``).
|
||||
That function does the following:
|
||||
|
||||
1. create a new interpreter state
|
||||
2. create a new thread state
|
||||
3. set the thread state as current
|
||||
(a current tstate is needed for interpreter init)
|
||||
4. initialize the interpreter state using that thread state
|
||||
5. return the thread state (still current)
|
||||
|
||||
Note that the returned thread state may be immediately discarded.
|
||||
There is no requirement that an interpreter have any thread states,
|
||||
except as soon as the interpreter is meant to actually be used.
|
||||
At that point it must be made active in the current OS thread.
|
||||
|
||||
To make an existing interpreter active in the current OS thread,
|
||||
the C API user first makes sure that interpreter has a corresponding
|
||||
thread state. Then ``PyThreadState_Swap()`` is called like normal
|
||||
using that thread state. If the thread state for another interpreter
|
||||
was already current then it gets swapped out like normal and execution
|
||||
of that interpreter in the OS thread is thus effectively paused until
|
||||
it is swapped back in.
|
||||
|
||||
Once an interpreter is active in the current OS thread like that, the
|
||||
thread can call any of the C API, such as ``PyEval_EvalCode()``
|
||||
(i.e. ``exec()``). This works by using the current thread state as
|
||||
the runtime context.
|
||||
|
||||
The "Main" Interpreter
|
||||
----------------------
|
||||
|
||||
When a Python process starts, it creates a single interpreter state
|
||||
(the "main" interpreter) with a single thread state for the current
|
||||
OS thread. The Python runtime is then initialized using them.
|
||||
|
||||
After initialization, the script or module or REPL is executed using
|
||||
them. That execution happens in the interpreter's ``__main__`` module.
|
||||
|
||||
When the process finishes running the requested Python code or REPL,
|
||||
in the main OS thread, the Python runtime is finalized in that thread
|
||||
using the main interpreter.
|
||||
|
||||
Runtime finalization has only a slight, indirect effect on still-running
|
||||
Python threads, whether in the main interpreter or in subinterpreters.
|
||||
That's because right away it waits indefinitely for all non-daemon
|
||||
Python threads to finish.
|
||||
|
||||
While the C API may be queried, there is no mechanism by which any
|
||||
Python thread is directly alerted that finalization has begun,
|
||||
other than perhaps with "atexit" functions that may be been
|
||||
registered using ``threading._register_atexit()``.
|
||||
|
||||
Any remaining subinterpreters are themselves finalized later,
|
||||
but at that point they aren't current in any OS threads.
|
||||
|
||||
Interpreter Isolation
|
||||
---------------------
|
||||
|
||||
CPython's interpreters are intended to be strictly isolated from each
|
||||
other. That means interpreters never share objects (except in very
|
||||
specific cases with immortal, immutable builtin objects). Each
|
||||
interpreter has its own modules (``sys.modules``), classes, functions,
|
||||
and variables. Even where two interpreters define the same class,
|
||||
each will have its own copy. The same applies to state in C, including
|
||||
in extension modules. The CPython C API docs `explain more`_.
|
||||
|
||||
.. _explain more:
|
||||
https://docs.python.org/3/c-api/init.html#bugs-and-caveats
|
||||
|
||||
Notably, there is some process-global state that interpreters will
|
||||
always share, some mutable and some immutable. Sharing immutable
|
||||
state presents few problems, while providing some benefits (mainly
|
||||
performance). However, all shared mutable state requires special
|
||||
management, particularly for thread-safety, some of which the OS
|
||||
takes care of for us.
|
||||
|
||||
Mutable:
|
||||
|
||||
* file descriptors
|
||||
* low-level env vars
|
||||
* process memory (though allocators *are* isolated)
|
||||
* the list of interpreters
|
||||
|
||||
Immutable:
|
||||
|
||||
* builtin types (e.g. ``dict``, ``bytes``)
|
||||
* singletons (e.g. ``None``)
|
||||
* underlying static module data (e.g. functions) for
|
||||
builtin/extension/frozen modules
|
||||
|
||||
Existing Execution Components
|
||||
-----------------------------
|
||||
|
||||
There are a number of existing parts of Python that may help
|
||||
with understanding how code may be run in a subinterpreter.
|
||||
|
||||
In CPython, each component is built around one of the following
|
||||
C API functions (or variants):
|
||||
|
||||
* ``PyEval_EvalCode()``: run the bytecode interpreter with the given
|
||||
code object
|
||||
* ``PyRun_String()``: compile + ``PyEval_EvalCode()``
|
||||
* ``PyRun_File()``: read + compile + ``PyEval_EvalCode()``
|
||||
* ``PyRun_InteractiveOneObject()``: compile + ``PyEval_EvalCode()``
|
||||
* ``PyObject_Call()``: calls ``PyEval_EvalCode()``
|
||||
|
||||
builtins.exec()
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The builtin ``exec()`` may be used to execute Python code. It is
|
||||
essentially a wrapper around the C API functions ``PyRun_String()``
|
||||
and ``PyEval_EvalCode()``.
|
||||
|
||||
Here are some relevant characteristics of the builtin ``exec()``:
|
||||
|
||||
* It runs in the current OS thread and pauses whatever
|
||||
was running there, which resumes when ``exec()`` finishes.
|
||||
No other OS threads are affected.
|
||||
(To avoid pausing the current Python thread, run ``exec()``
|
||||
in a ``threading.Thread``.)
|
||||
* It may start additional threads, which don't interrupt it.
|
||||
* It executes against a "globals" namespace (and a "locals"
|
||||
namespace). At module-level, ``exec()`` defaults to using
|
||||
``__dict__`` of the current module (i.e. ``globals()``).
|
||||
``exec()`` uses that namespace as-is and does not clear it before or after.
|
||||
* It propagates any uncaught exception from the code it ran.
|
||||
The exception is raised from the ``exec()`` call in the Python
|
||||
thread that originally called ``exec()``.
|
||||
|
||||
Command-line
|
||||
^^^^^^^^^^^^
|
||||
|
||||
The ``python`` CLI provides several ways to run Python code. In each
|
||||
case it maps to a corresponding C API call:
|
||||
|
||||
* ``<no args>``, ``-i`` - run the REPL
|
||||
(``PyRun_InteractiveOneObject()``)
|
||||
* ``<filename>`` - run a script (``PyRun_File()``)
|
||||
* ``-c <code>`` - run the given Python code (``PyRun_String()``)
|
||||
* ``-m module`` - run the module as a script
|
||||
(``PyEval_EvalCode()`` via ``runpy._run_module_as_main()``)
|
||||
|
||||
In each case it is essentially a variant of running ``exec()``
|
||||
at the top-level of the ``__main__`` module of the main interpreter.
|
||||
|
||||
threading.Thread
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
When a Python thread is started, it runs the "target" function
|
||||
with ``PyObject_Call()`` using a new thread state. The globals
|
||||
namespace come from ``func.__globals__`` and any uncaught
|
||||
exception is discarded.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
The ``interpreters`` module will provide a high-level interface to the
|
||||
multiple interpreter functionality. The goal is to make the existing
|
||||
multiple-interpreters feature of CPython more easily accessible to
|
||||
Python code. This is particularly relevant now that CPython has a
|
||||
per-interpreter GIL (:pep:`684`) and people are more interested
|
||||
in using multiple interpreters.
|
||||
|
||||
Without a stdlib module, users are limited to the
|
||||
:ref:`C API <python:sub-interpreter-support>`, which restricts how much
|
||||
they can try out and take advantage of multiple interpreters.
|
||||
|
||||
The module will include a basic mechanism for communicating between
|
||||
interpreters. Without one, multiple interpreters are a much less
|
||||
useful feature.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
A Minimal API
|
||||
-------------
|
||||
|
||||
Since the core dev team has no real experience with
|
||||
how users will make use of multiple interpreters in Python code, this
|
||||
proposal purposefully keeps the initial API as lean and minimal as
|
||||
possible. The objective is to provide a well-considered foundation
|
||||
on which further (more advanced) functionality may be added later,
|
||||
as appropriate.
|
||||
|
||||
That said, the proposed design incorporates lessons learned from
|
||||
existing use of subinterpreters by the community, from existing stdlib
|
||||
modules, and from other programming languages. It also factors in
|
||||
experience from using subinterpreters in the CPython test suite and
|
||||
using them in `concurrency benchmarks`_.
|
||||
|
||||
.. _concurrency benchmarks:
|
||||
https://github.com/ericsnowcurrently/concurrency-benchmarks
|
||||
|
||||
Interpreter.prepare_main() Sets Multiple Variables
|
||||
--------------------------------------------------
|
||||
|
||||
``prepare_main()`` may be seen as a setter function of sorts.
|
||||
It supports setting multiple names at once,
|
||||
e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
|
||||
set one item at a time. The main reason is for efficiency.
|
||||
|
||||
To set a value in the interpreter's ``__main__.__dict__``, the
|
||||
implementation must first switch the OS thread to the identified
|
||||
interpreter, which involves some non-negligible overhead. After
|
||||
setting the value it must switch back.
|
||||
Furthermore, there is some additional overhead to the mechanism
|
||||
by which it passes objects between interpreters, which can be
|
||||
reduced in aggregate if multiple values are set at once.
|
||||
|
||||
Therefore, ``prepare_main()`` supports setting multiple
|
||||
values at once.
|
||||
|
||||
Propagating Exceptions
|
||||
----------------------
|
||||
|
||||
An uncaught exception from a subinterpreter,
|
||||
via ``Interpreter.exec_sync()``,
|
||||
could either be (effectively) ignored, like ``threading.Thread()`` does,
|
||||
or propagated, like the builtin ``exec()`` does. Since ``exec_sync()``
|
||||
is a synchronous operation, like the builtin ``exec()``,
|
||||
uncaught exceptions are propagated.
|
||||
|
||||
However, such exceptions are not raised directly. That's because
|
||||
interpreters are isolated from each other and must not share objects,
|
||||
including exceptions. That could be addressed by raising a surrogate
|
||||
of the exception, whether a summary, a copy, or a proxy that wraps it.
|
||||
Any of those could preserve the traceback, which is useful for
|
||||
debugging. The ``ExecFailure`` that gets raised
|
||||
is such a surrogate.
|
||||
|
||||
There's another concern to consider. If a propagated exception isn't
|
||||
immediately caught, it will bubble up through the call stack until
|
||||
caught (or not). In the case that code somewhere else may catch it,
|
||||
it is helpful to identify that the exception came from a subinterpreter
|
||||
(i.e. a "remote" source), rather than from the current interpreter.
|
||||
That's why ``Interpreter.exec_sync()`` raises ``ExecFailure`` and why
|
||||
it is a plain ``Exception``, rather than a copy or proxy with a class
|
||||
that matches the original exception. For example, an uncaught
|
||||
``ValueError`` from a subinterpreter would never get caught in a later
|
||||
``try: ... except ValueError: ...``. Instead, ``ExecFailure``
|
||||
must be handled directly.
|
||||
|
||||
Limited Object Sharing
|
||||
----------------------
|
||||
|
||||
As noted in `Interpreter Isolation`_, only a small number of builtin
|
||||
objects may be truly shared between interpreters. In all other cases
|
||||
objects can only be shared indirectly, through copies or proxies.
|
||||
|
||||
The set of objects that are shareable as copies through queues
|
||||
(and ``Interpreter.prepare_main()``) is limited for the sake of
|
||||
efficiency.
|
||||
|
||||
Supporting sharing of *all* objects is possible (via pickle)
|
||||
but not part of this proposal. For one thing, it's helpful to know
|
||||
that only an efficient implementation is being used. Furthermore,
|
||||
for mutable objects pickling would violate the guarantee that "shared"
|
||||
objects be equivalent (and stay that way).
|
||||
|
||||
Objects vs. ID Proxies
|
||||
----------------------
|
||||
|
||||
For both interpreters and queues, the low-level module makes use of
|
||||
proxy objects that expose the underlying state by their corresponding
|
||||
process-global IDs. In both cases the state is likewise process-global
|
||||
and will be used by multiple interpreters. Thus they aren't suitable
|
||||
to be implemented as ``PyObject``, which is only really an option for
|
||||
interpreter-specific data. That's why the ``interpreters`` module
|
||||
instead provides objects that are weakly associated through the ID.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
The module will:
|
||||
|
||||
* expose the existing multiple interpreter support
|
||||
* introduce a basic mechanism for communicating between interpreters
|
||||
|
||||
The module will wrap a new low-level ``_interpreters`` module
|
||||
(in the same way as the ``threading`` module).
|
||||
However, that low-level API is not intended for public use
|
||||
and thus not part of this proposal.
|
||||
|
||||
Using Interpreters
|
||||
------------------
|
||||
|
||||
The module defines the following functions:
|
||||
|
||||
* ``get_current() -> Interpreter``
|
||||
Returns the ``Interpreter`` object for the currently executing
|
||||
interpreter.
|
||||
|
||||
* ``list_all() -> list[Interpreter]``
|
||||
Returns the ``Interpreter`` object for each existing interpreter,
|
||||
whether it is currently running in any OS threads or not.
|
||||
|
||||
* ``create() -> Interpreter``
|
||||
Create a new interpreter and return the ``Interpreter`` object
|
||||
for it. The interpreter doesn't do anything on its own and is
|
||||
not inherently tied to any OS thread. That only happens when
|
||||
something is actually run in the interpreter
|
||||
(e.g. ``Interpreter.exec_sync()``), and only while running.
|
||||
The interpreter may or may not have thread states ready to use,
|
||||
but that is strictly an internal implementation detail.
|
||||
|
||||
Interpreter Objects
|
||||
-------------------
|
||||
|
||||
An ``interpreters.Interpreter`` object that represents the interpreter
|
||||
(``PyInterpreterState``) with the corresponding unique ID.
|
||||
There will only be one object for any given interpreter.
|
||||
|
||||
If the interpreter was created with ``interpreters.create()`` then
|
||||
it will be destroyed as soon as all ``Interpreter`` objects have been
|
||||
deleted.
|
||||
|
||||
Attributes and methods:
|
||||
|
||||
* ``id``
|
||||
(read-only) A non-negative ``int`` that identifies the
|
||||
interpreter that this ``Interpreter`` instance represents.
|
||||
Conceptually, this is similar to a process ID.
|
||||
|
||||
* ``__hash__()``
|
||||
Returns the hash of the interpreter's ``id``. This is the same
|
||||
as the hash of the ID's integer value.
|
||||
|
||||
* ``is_running() -> bool``
|
||||
Returns ``True`` if the interpreter is currently executing code
|
||||
in its ``__main__`` module. This excludes sub-threads.
|
||||
|
||||
It refers only to if there is an OS thread
|
||||
running a script (code) in the interpreter's ``__main__`` module.
|
||||
That basically means whether or not ``Interpreter.exec_sync()``
|
||||
is running in some OS thread. Code running in sub-threads
|
||||
is ignored.
|
||||
|
||||
* ``prepare_main(**kwargs)``
|
||||
Bind one or more objects in the interpreter's ``__main__`` module.
|
||||
|
||||
The keyword argument names will be used as the attribute names.
|
||||
The values will be bound as new objects, though exactly equivalent
|
||||
to the original. Only objects specifically supported for passing
|
||||
between interpreters are allowed. See `Shareable Objects`_.
|
||||
|
||||
``prepare_main()`` is helpful for initializing the
|
||||
globals for an interpreter before running code in it.
|
||||
|
||||
* ``exec_sync(code, /)``
|
||||
Execute the given source code in the interpreter
|
||||
(in the current OS thread), using its ``__main__`` module.
|
||||
It doesn't return anything.
|
||||
|
||||
This is essentially equivalent to switching to this interpreter
|
||||
in the current OS thread and then calling the builtin ``exec()``
|
||||
using this interpreter's ``__main__`` module's ``__dict__`` as
|
||||
the globals and locals.
|
||||
|
||||
The code running in the current OS thread (a different
|
||||
interpreter) is effectively paused until ``exec_sync()``
|
||||
finishes. To avoid pausing it, create a new ``threading.Thread``
|
||||
and call ``exec_sync()`` in it.
|
||||
|
||||
``exec_sync()`` does not reset the interpreter's state nor
|
||||
the ``__main__`` module, neither before nor after, so each
|
||||
successive call picks up where the last one left off. This can
|
||||
be useful for running some code to initialize an interpreter
|
||||
(e.g. with imports) before later performing some repeated task.
|
||||
|
||||
If there is an uncaught exception, it will be propagated into
|
||||
the calling interpreter as a ``ExecFailure``, which
|
||||
preserves enough information for a helpful error display. That
|
||||
means if the ``ExecFailure`` isn't caught then the full
|
||||
traceback of the propagated exception, including details about
|
||||
syntax errors, etc., will be displayed. Having the full
|
||||
traceback is particularly useful when debugging.
|
||||
|
||||
If exception propagation is not desired then an explicit
|
||||
try-except should be used around the *code* passed to
|
||||
``exec_sync()``. Likewise any error handling that depends
|
||||
on specific information from the exception must use an explicit
|
||||
try-except around the given *code*, since ``ExecFailure``
|
||||
will not preserve that information.
|
||||
|
||||
* ``run(code, /) -> threading.Thread``
|
||||
Create a new thread and call ``exec_sync()`` in it.
|
||||
Exceptions are not propagated.
|
||||
|
||||
This is roughly equivalent to::
|
||||
|
||||
def task():
|
||||
interp.exec_sync(code)
|
||||
t = threading.Thread(target=task)
|
||||
t.start()
|
||||
|
||||
Communicating Between Interpreters
|
||||
----------------------------------
|
||||
|
||||
The module introduces a basic communication mechanism through special
|
||||
queues.
|
||||
|
||||
There are ``interpreters.Queue`` objects, but they only proxy
|
||||
the actual data structure: an unbounded FIFO queue that exists
|
||||
outside any one interpreter. These queues have special accommodations
|
||||
for safely passing object data between interpreters, without violating
|
||||
interpreter isolation. This includes thread-safety.
|
||||
|
||||
As with other queues in Python, for each "put" the object is added to
|
||||
the back and each "get" pops the next one off the front. Every added
|
||||
object will be popped off in the order it was pushed on.
|
||||
|
||||
Only objects that are specifically supported for passing
|
||||
between interpreters may be sent through a ``Queue``.
|
||||
Note that the actual objects aren't sent, but rather their
|
||||
underlying data. However, the popped object will still be
|
||||
strictly equivalent to the original.
|
||||
See `Shareable Objects`_.
|
||||
|
||||
The module defines the following functions:
|
||||
|
||||
* ``create_queue(maxsize=0) -> Queue``
|
||||
Create a new queue. If the maxsize is zero or negative then the
|
||||
queue is unbounded.
|
||||
|
||||
Queue Objects
|
||||
-------------
|
||||
|
||||
``interpreters.Queue`` objects act as proxies for the underlying
|
||||
cross-interpreter-safe queues exposed by the ``interpreters`` module.
|
||||
Each ``Queue`` object represents the queue with the corresponding
|
||||
unique ID.
|
||||
There will only be one object for any given queue.
|
||||
|
||||
``Queue`` implements all the methods of ``queue.Queue`` except for
|
||||
``task_done()`` and ``join()``, hence it is similar to
|
||||
``asyncio.Queue`` and ``multiprocessing.Queue``.
|
||||
|
||||
Attributes and methods:
|
||||
|
||||
* ``id``
|
||||
(read-only) A non-negative ``int`` that identifies
|
||||
the corresponding cross-interpreter queue.
|
||||
Conceptually, this is similar to the file descriptor
|
||||
used for a pipe.
|
||||
|
||||
* ``maxsize``
|
||||
Number of items allowed in the queue. Zero means "unbounded".
|
||||
|
||||
* ``__hash__()``
|
||||
Return the hash of the queue's ``id``. This is the same
|
||||
as the hash of the ID's integer value.
|
||||
|
||||
* ``empty()``
|
||||
Return ``True`` if the queue is empty, ``False`` otherwise.
|
||||
|
||||
This is only a snapshot of the state at the time of the call.
|
||||
Other threads or interpreters may cause this to change.
|
||||
|
||||
* ``full()``
|
||||
Return ``True`` if there are ``maxsize`` items in the queue.
|
||||
|
||||
If the queue was initialized with ``maxsize=0`` (the default),
|
||||
then ``full()`` never returns ``True``.
|
||||
|
||||
This is only a snapshot of the state at the time of the call.
|
||||
Other threads or interpreters may cause this to change.
|
||||
|
||||
* ``qsize()``
|
||||
Return the number of items in the queue.
|
||||
|
||||
This is only a snapshot of the state at the time of the call.
|
||||
Other threads or interpreters may cause this to change.
|
||||
|
||||
* ``put(obj, timeout=None)``
|
||||
Add the object to the queue.
|
||||
|
||||
The object must be `shareable <Shareable Objects_>`_, which means
|
||||
the object's data is passed through rather than the object itself.
|
||||
|
||||
If ``maxsize > 0`` and the queue is full then this blocks until
|
||||
a free slot is available. If *timeout* is a positive number
|
||||
then it only blocks at least that many seconds and then raises
|
||||
``interpreters.QueueFull``. Otherwise is blocks forever.
|
||||
|
||||
* ``put_nowait(obj)``
|
||||
Like ``put()`` but effectively with an immediate timeout.
|
||||
Thus if the queue is full, it immediately raises
|
||||
``interpreters.QueueFull``.
|
||||
|
||||
* ``get(timeout=None) -> object``
|
||||
Pop the next object from the queue and return it. Block while
|
||||
the queue is empty. If a positive *timeout* is provided and an
|
||||
object hasn't been added to the queue in that many seconds
|
||||
then raise ``interpreters.QueueEmpty``.
|
||||
|
||||
* ``get_nowait() -> object``
|
||||
Like ``get()``, but do not block. If the queue is not empty
|
||||
then return the next item. Otherwise, raise
|
||||
``interpreters.QueueEmpty``.
|
||||
|
||||
Shareable Objects
|
||||
-----------------
|
||||
|
||||
Both ``Interpreter.prepare_main()`` and ``Queue`` work only with
|
||||
"shareable" objects.
|
||||
|
||||
A "shareable" object is one which may be passed from one interpreter
|
||||
to another. The object is not necessarily actually directly shared
|
||||
by the interpreters. However, even if it isn't, the shared object
|
||||
should be treated as though it *were* shared directly. That's a
|
||||
strong equivalence guarantee for all shareable objects.
|
||||
(See below.)
|
||||
|
||||
For some types (builtin singletons), the actual object is shared.
|
||||
For some, the object's underlying data is actually shared but each
|
||||
interpreter has a distinct object wrapping that data. For all other
|
||||
shareable types, a strict copy or proxy is made such that the
|
||||
corresponding objects continue to match exactly. In cases where
|
||||
the underlying data is complex but must be copied (e.g. ``tuple``),
|
||||
the data is serialized as efficiently as possible.
|
||||
|
||||
Shareable objects must be specifically supported internally
|
||||
by the Python runtime. However, there is no restriction against
|
||||
adding support for more types later.
|
||||
|
||||
Here's the initial list of supported objects:
|
||||
|
||||
* ``str``
|
||||
* ``bytes``
|
||||
* ``int``
|
||||
* ``float``
|
||||
* ``bool`` (``True``/``False``)
|
||||
* ``None``
|
||||
* ``tuple`` (only with shareable items)
|
||||
* ``Queue``
|
||||
* ``memoryview`` (underlying buffer actually shared)
|
||||
|
||||
Note that the last two on the list, queues and ``memoryview``, are
|
||||
technically mutable data types, whereas the rest are not. When any
|
||||
interpreters share mutable data there is always a risk of data races.
|
||||
Cross-interpreter safety, including thread-safety, is a fundamental
|
||||
feature of queues.
|
||||
|
||||
However, ``memoryview`` does not have any native accommodations.
|
||||
The user is responsible for managing thread-safety, whether passing
|
||||
a token back and forth through a queue to indicate safety
|
||||
(see `Synchronization`_), or by assigning sub-range exclusivity
|
||||
to individual interpreters.
|
||||
|
||||
Most objects will be shared through queues (``Queue``), as interpreters
|
||||
communicate information between each other. Less frequently, objects
|
||||
will be shared through ``prepare_main()`` to set up an interpreter
|
||||
prior to running code in it. However, ``prepare_main()`` is the
|
||||
primary way that queues are shared, to provide another interpreter
|
||||
with a means of further communication.
|
||||
|
||||
Finally, a reminder: for a few types the actual object is shared,
|
||||
whereas for the rest only the underlying data is shared, whether
|
||||
as a copy or through a proxy. Regardless, it always preserves
|
||||
the strong equivalence guarantee of "shareable" objects.
|
||||
|
||||
The guarantee is that a shared object in one interpreter is strictly
|
||||
equivalent to the corresponding object in the other interpreter.
|
||||
In other words, the two objects will be indistinguishable from each
|
||||
other. The shared object should be treated as though the original
|
||||
had been shared directly, whether or not it actually was.
|
||||
That's a slightly different and stronger promise than just equality.
|
||||
|
||||
The guarantee is especially important for mutable objects, like
|
||||
``Queue`` and ``memoryview``. Mutating the object in one interpreter
|
||||
will always be reflected immediately in every other interpreter
|
||||
sharing the object.
|
||||
|
||||
Synchronization
|
||||
---------------
|
||||
|
||||
There are situations where two interpreters should be synchronized.
|
||||
That may involve sharing a resource, worker management, or preserving
|
||||
sequential consistency.
|
||||
|
||||
In threaded programming the typical synchronization primitives are
|
||||
types like mutexes. The ``threading`` module exposes several.
|
||||
However, interpreters cannot share objects which means they cannot
|
||||
share ``threading.Lock`` objects.
|
||||
|
||||
The ``interpreters`` module does not provide any such dedicated
|
||||
synchronization primitives. Instead, ``Queue`` objects provide
|
||||
everything one might need.
|
||||
|
||||
For example, if there's a shared resource that needs managed
|
||||
access then a queue may be used to manage it, where the interpreters
|
||||
pass an object around to indicate who can use the resource::
|
||||
|
||||
import interpreters
|
||||
from mymodule import load_big_data, check_data
|
||||
|
||||
numworkers = 10
|
||||
control = interpreters.create_queue()
|
||||
data = memoryview(load_big_data())
|
||||
|
||||
def worker():
|
||||
interp = interpreters.create()
|
||||
interp.prepare_main(control=control, data=data)
|
||||
interp.exec_sync("""if True:
|
||||
from mymodule import edit_data
|
||||
while True:
|
||||
token = control.get()
|
||||
edit_data(data)
|
||||
control.put(token)
|
||||
""")
|
||||
threads = [threading.Thread(target=worker) for _ in range(numworkers)]
|
||||
for t in threads:
|
||||
t.start()
|
||||
|
||||
token = 'football'
|
||||
control.put(token)
|
||||
while True:
|
||||
control.get()
|
||||
if not check_data(data):
|
||||
break
|
||||
control.put(token)
|
||||
|
||||
Exceptions
|
||||
----------
|
||||
|
||||
* ``ExecFailure``
|
||||
Raised from ``Interpreter.exec_sync()`` when there's an
|
||||
uncaught exception. The error display for this exception
|
||||
includes the traceback of the uncaught exception, which gets
|
||||
shown after the normal error display, much like happens for
|
||||
``ExceptionGroup``.
|
||||
|
||||
Attributes:
|
||||
|
||||
* ``type`` - a representation of the original exception's class,
|
||||
with ``__name__``, ``__module__``, and ``__qualname__`` attrs.
|
||||
* ``msg`` - ``str(exc)`` of the original exception
|
||||
* ``snapshot`` - a ``traceback.TracebackException`` object
|
||||
for the original exception
|
||||
|
||||
This exception is a subclass of ``RuntimeError``.
|
||||
|
||||
* ``QueueEmpty``
|
||||
Raised from ``Queue.get()`` (or ``get_nowait()`` with no default)
|
||||
when the queue is empty.
|
||||
|
||||
This exception is a subclass of ``queue.Empty``.
|
||||
|
||||
* ``QueueFull``
|
||||
Raised from ``Queue.put()`` (with a timeout) or ``put_nowait()``
|
||||
when the queue is already at its max size.
|
||||
|
||||
This exception is a subclass of ``queue.Full``.
|
||||
|
||||
InterpreterPoolExecutor
|
||||
-----------------------
|
||||
|
||||
Along with the new ``interpreters`` module, there will be a new
|
||||
``concurrent.futures.InterpreterPoolExecutor``. Each worker executes
|
||||
in its own thread with its own subinterpreter. Communication may
|
||||
still be done through ``Queue`` objects, set with the initializer.
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
The following examples demonstrate practical cases where multiple
|
||||
interpreters may be useful.
|
||||
|
||||
Example 1:
|
||||
|
||||
There's a stream of requests coming in that will be handled
|
||||
via workers in sub-threads.
|
||||
|
||||
* each worker thread has its own interpreter
|
||||
* there's one queue to send tasks to workers and
|
||||
another queue to return results
|
||||
* the results are handled in a dedicated thread
|
||||
* each worker keeps going until it receives a "stop" sentinel (``None``)
|
||||
* the results handler keeps going until all workers have stopped
|
||||
|
||||
::
|
||||
|
||||
import interpreters
|
||||
from mymodule import iter_requests, handle_result
|
||||
|
||||
tasks = interpreters.create_queue()
|
||||
results = interpreters.create_queue()
|
||||
|
||||
numworkers = 20
|
||||
threads = []
|
||||
|
||||
def results_handler():
|
||||
running = numworkers
|
||||
while running:
|
||||
try:
|
||||
res = results.get(timeout=0.1)
|
||||
except interpreters.QueueEmpty:
|
||||
# No workers have finished a request since last time.
|
||||
pass
|
||||
else:
|
||||
if res is None:
|
||||
# A worker has stopped.
|
||||
running -= 1
|
||||
else:
|
||||
handle_result(res)
|
||||
empty = object()
|
||||
assert results.get_nowait(empty) is empty
|
||||
threads.append(threading.Thread(target=results_handler))
|
||||
|
||||
def worker():
|
||||
interp = interpreters.create()
|
||||
interp.prepare_main(tasks=tasks, results=results)
|
||||
interp.exec_sync("""if True:
|
||||
from mymodule import handle_request, capture_exception
|
||||
|
||||
while True:
|
||||
req = tasks.get()
|
||||
if req is None:
|
||||
# Stop!
|
||||
break
|
||||
try:
|
||||
res = handle_request(req)
|
||||
except Exception as exc:
|
||||
res = capture_exception(exc)
|
||||
results.put(res)
|
||||
# Notify the results handler.
|
||||
results.put(None)
|
||||
""")
|
||||
threads.extend(threading.Thread(target=worker) for _ in range(numworkers))
|
||||
|
||||
for t in threads:
|
||||
t.start()
|
||||
|
||||
for req in iter_requests():
|
||||
tasks.put(req)
|
||||
# Send the "stop" signal.
|
||||
for _ in range(numworkers):
|
||||
tasks.put(None)
|
||||
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
Example 2:
|
||||
|
||||
This case is similar to the last as there are a bunch of workers
|
||||
in sub-threads. However, this time the code is chunking up a big array
|
||||
of data, where each worker processes one chunk at a time. Copying
|
||||
that data to each interpreter would be exceptionally inefficient,
|
||||
so the code takes advantage of directly sharing ``memoryview`` buffers.
|
||||
|
||||
* all the interpreters share the buffer of the source array
|
||||
* each one writes its results to a second shared buffer
|
||||
* there's use a queue to send tasks to workers
|
||||
* only one worker will ever read any given index in the source array
|
||||
* only one worker will ever write to any given index in the results
|
||||
(this is how it ensures thread-safety)
|
||||
|
||||
::
|
||||
|
||||
import interpreters
|
||||
import queue
|
||||
from mymodule import read_large_data_set, use_results
|
||||
|
||||
numworkers = 3
|
||||
data, chunksize = read_large_data_set()
|
||||
buf = memoryview(data)
|
||||
numchunks = (len(buf) + 1) / chunksize
|
||||
results = memoryview(b'\0' * numchunks)
|
||||
|
||||
tasks = interpreters.create_queue()
|
||||
|
||||
def worker(id):
|
||||
interp = interpreters.create()
|
||||
interp.prepare_main(data=buf, results=results, tasks=tasks)
|
||||
interp.exec_sync("""if True:
|
||||
from mymodule import reduce_chunk
|
||||
|
||||
while True:
|
||||
req = tasks.get()
|
||||
if res is None:
|
||||
# Stop!
|
||||
break
|
||||
resindex, start, end = req
|
||||
chunk = data[start: end]
|
||||
res = reduce_chunk(chunk)
|
||||
results[resindex] = res
|
||||
""")
|
||||
threads = [threading.Thread(target=worker) for _ in range(numworkers)]
|
||||
for t in threads:
|
||||
t.start()
|
||||
|
||||
for i in range(numchunks):
|
||||
# Assume there's at least one worker running still.
|
||||
start = i * chunksize
|
||||
end = start + chunksize
|
||||
if end > len(buf):
|
||||
end = len(buf)
|
||||
tasks.put((start, end, i))
|
||||
# Send the "stop" signal.
|
||||
for _ in range(numworkers):
|
||||
tasks.put(None)
|
||||
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
use_results(results)
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
See :pep:`PEP 554 <554#rejected-ideas>`.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
Loading…
Reference in New Issue