PEP: 734
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow <ericsnowcurrently@gmail.com>
Discussions-To: https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147
Status: Deferred
Type: Standards Track
Created: 06-Nov-2023
Python-Version: 3.13
Post-History: `14-Dec-2023 <https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/>`__,
Replaces: 554
Resolution: https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/24


.. note::
   This PEP is essentially a continuation of :pep:`554`.  That document
   had grown a lot of ancillary information across 7 years of discussion.
   This PEP is a reduction back to the essential information.  Much of
   that extra information is still valid and useful, just not in the
   immediate context of the specific proposal here.


Abstract
========

This PEP proposes to add a new module, ``interpreters``, to support
inspecting, creating, and running code in multiple interpreters in the
current process.  This includes ``Interpreter`` objects that represent
the underlying interpreters.  The module will also provide a basic
``Queue`` class for communication between interpreters.
Finally, we will add a new ``concurrent.futures.InterpreterPoolExecutor``
based on the ``interpreters`` module.


Introduction
============

Fundamentally, an "interpreter" is the collection of (essentially)
all runtime state which Python threads must share.  So, let's first
look at threads.  Then we'll circle back to interpreters.

Threads and Thread States
-------------------------

A Python process will have one or more OS threads running Python code
(or otherwise interacting with the C API).  Each of these threads
interacts with the CPython runtime using its own thread state
(``PyThreadState``), which holds all the runtime state unique to that
thread.  There is also some runtime state that is shared between
multiple OS threads.

Any OS thread may switch which thread state it is currently using, as
long as it isn't one that another OS thread is already using (or has
been using).  This "current" thread state is stored by the runtime
in a thread-local variable, and may be looked up explicitly with
``PyThreadState_Get()``.  It gets set automatically for the initial
("main") OS thread and for ``threading.Thread`` objects.  From the
C API it is set (and cleared) by ``PyThreadState_Swap()`` and may
be set by ``PyGILState_Ensure()``.  Most of the C API requires that
there be a current thread state, either looked up implicitly
or passed in as an argument.

The relationship between OS threads and thread states is one-to-many.
Each thread state is associated with at most a single OS thread and
records its thread ID.  A thread state is never used for more than one
OS thread.  In the other direction, however, an OS thread may have more
than one thread state associated with it, though, again, only one
may be current.

When there's more than one thread state for an OS thread,
``PyThreadState_Swap()`` is used in that OS thread to switch
between them, with the requested thread state becoming the current one.
Whatever was running in the thread using the old thread state is
effectively paused until that thread state is swapped back in.

Interpreter States
------------------

As noted earlier, there is some runtime state that multiple OS threads
share.  Some of it is exposed by the ``sys`` module, though much is
used internally and not exposed explicitly or only through the C API.

This shared state is called the interpreter state
(``PyInterpreterState``).  We'll sometimes refer to it here as just
"interpreter", though that is also sometimes used to refer to the
``python`` executable, to the Python implementation, and to the
bytecode interpreter (i.e. ``exec()``/``eval()``).

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997).  The feature has been
available via the :ref:`C API <python:sub-interpreter-support>`.

Interpreters and Threads
------------------------

Thread states are related to interpreter states in much the same way
that OS threads and processes are related (at a high level).  To
begin with, the relationship is one-to-many.
A thread state belongs to a single interpreter (and stores
a pointer to it).  That thread state is never used for a different
interpreter.  In the other direction, however, an interpreter may have
zero or more thread states associated with it.  The interpreter is only
considered active in OS threads where one of its thread states
is current.

Interpreters are created via the C API using
``Py_NewInterpreterFromConfig()`` (or ``Py_NewInterpreter()``, which
is a light wrapper around ``Py_NewInterpreterFromConfig()``).
That function does the following:

1. create a new interpreter state
2. create a new thread state
3. set the thread state as current
   (a current tstate is needed for interpreter init)
4. initialize the interpreter state using that thread state
5. return the thread state (still current)

Note that the returned thread state may be immediately discarded.
There is no requirement that an interpreter have any thread states,
except as soon as the interpreter is meant to actually be used.
At that point it must be made active in the current OS thread.

To make an existing interpreter active in the current OS thread,
the C API user first makes sure that interpreter has a corresponding
thread state.  Then ``PyThreadState_Swap()`` is called like normal
using that thread state.  If the thread state for another interpreter
was already current then it gets swapped out like normal and execution
of that interpreter in the OS thread is thus effectively paused until
it is swapped back in.

Once an interpreter is active in the current OS thread like that, the
thread can call any of the C API, such as ``PyEval_EvalCode()``
(i.e. ``exec()``).  This works by using the current thread state as
the runtime context.

The "Main" Interpreter
----------------------

When a Python process starts, it creates a single interpreter state
(the "main" interpreter) with a single thread state for the current
OS thread.  The Python runtime is then initialized using them.

After initialization, the script or module or REPL is executed using
them.  That execution happens in the interpreter's ``__main__`` module.

When the process finishes running the requested Python code or REPL,
in the main OS thread, the Python runtime is finalized in that thread
using the main interpreter.

Runtime finalization has only a slight, indirect effect on still-running
Python threads, whether in the main interpreter or in subinterpreters.
That's because right away it waits indefinitely for all non-daemon
Python threads to finish.

While the C API may be queried, there is no mechanism by which any
Python thread is directly alerted that finalization has begun,
other than perhaps with "atexit" functions that may be been
registered using ``threading._register_atexit()``.

Any remaining subinterpreters are themselves finalized later,
but at that point they aren't current in any OS threads.

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other.  That means interpreters never share objects (except in very
specific cases with immortal, immutable builtin objects).  Each
interpreter has its own modules (``sys.modules``), classes, functions,
and variables.  Even where two interpreters define the same class,
each will have its own copy.  The same applies to state in C, including
in extension modules.  The CPython C API docs `explain more`_.

.. _explain more:
   https://docs.python.org/3/c-api/init.html#bugs-and-caveats

Notably, there is some process-global state that interpreters will
always share, some mutable and some immutable.  Sharing immutable
state presents few problems, while providing some benefits (mainly
performance).  However, all shared mutable state requires special
management, particularly for thread-safety, some of which the OS
takes care of for us.

Mutable:

* file descriptors
* low-level env vars
* process memory (though allocators *are* isolated)
* the list of interpreters

Immutable:

* builtin types (e.g. ``dict``, ``bytes``)
* singletons (e.g. ``None``)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

Existing Execution Components
-----------------------------

There are a number of existing parts of Python that may help
with understanding how code may be run in a subinterpreter.

In CPython, each component is built around one of the following
C API functions (or variants):

* ``PyEval_EvalCode()``: run the bytecode interpreter with the given
  code object
* ``PyRun_String()``: compile + ``PyEval_EvalCode()``
* ``PyRun_File()``: read + compile + ``PyEval_EvalCode()``
* ``PyRun_InteractiveOneObject()``: compile + ``PyEval_EvalCode()``
* ``PyObject_Call()``: calls ``PyEval_EvalCode()``

builtins.exec()
^^^^^^^^^^^^^^^

The builtin ``exec()`` may be used to execute Python code.  It is
essentially a wrapper around the C API functions ``PyRun_String()``
and ``PyEval_EvalCode()``.

Here are some relevant characteristics of the builtin ``exec()``:

* It runs in the current OS thread and pauses whatever
  was running there, which resumes when ``exec()`` finishes.
  No other OS threads are affected.
  (To avoid pausing the current Python thread, run ``exec()``
  in a ``threading.Thread``.)
* It may start additional threads, which don't interrupt it.
* It executes against a "globals" namespace (and a "locals"
  namespace).  At module-level, ``exec()`` defaults to using
  ``__dict__`` of the current module (i.e. ``globals()``).
  ``exec()`` uses that namespace as-is and does not clear it before or after.
* It propagates any uncaught exception from the code it ran.
  The exception is raised from the ``exec()`` call in the Python
  thread that originally called ``exec()``.

Command-line
^^^^^^^^^^^^

The ``python`` CLI provides several ways to run Python code.  In each
case it maps to a corresponding C API call:

* ``<no args>``, ``-i`` - run the REPL
  (``PyRun_InteractiveOneObject()``)
* ``<filename>`` - run a script (``PyRun_File()``)
* ``-c <code>`` - run the given Python code (``PyRun_String()``)
* ``-m module`` - run the module as a script
  (``PyEval_EvalCode()`` via ``runpy._run_module_as_main()``)

In each case it is essentially a variant of running ``exec()``
at the top-level of the ``__main__`` module of the main interpreter.

threading.Thread
^^^^^^^^^^^^^^^^

When a Python thread is started, it runs the "target" function
with ``PyObject_Call()`` using a new thread state.  The globals
namespace come from ``func.__globals__`` and any uncaught
exception is discarded.


Motivation
==========

The ``interpreters`` module will provide a high-level interface to the
multiple interpreter functionality.  The goal is to make the existing
multiple-interpreters feature of CPython more easily accessible to
Python code.  This is particularly relevant now that CPython has a
per-interpreter GIL (:pep:`684`) and people are more interested
in using multiple interpreters.

Without a stdlib module, users are limited to the
:ref:`C API <python:sub-interpreter-support>`, which restricts how much
they can try out and take advantage of multiple interpreters.

The module will include a basic mechanism for communicating between
interpreters.  Without one, multiple interpreters are a much less
useful feature.


Specification
=============

The module will:

* expose the existing multiple interpreter support
* introduce a basic mechanism for communicating between interpreters

The module will wrap a new low-level ``_interpreters`` module
(in the same way as the ``threading`` module).
However, that low-level API is not intended for public use
and thus not part of this proposal.

Using Interpreters
------------------

The module defines the following functions:

* ``get_current() -> Interpreter``
      Returns the ``Interpreter`` object for the currently executing
      interpreter.

* ``list_all() -> list[Interpreter]``
      Returns the ``Interpreter`` object for each existing interpreter,
      whether it is currently running in any OS threads or not.

* ``create() -> Interpreter``
      Create a new interpreter and return the ``Interpreter`` object
      for it.  The interpreter doesn't do anything on its own and is
      not inherently tied to any OS thread.  That only happens when
      something is actually run in the interpreter
      (e.g. ``Interpreter.exec()``), and only while running.
      The interpreter may or may not have thread states ready to use,
      but that is strictly an internal implementation detail.

Interpreter Objects
-------------------

An ``interpreters.Interpreter`` object that represents the interpreter
(``PyInterpreterState``) with the corresponding unique ID.
There will only be one object for any given interpreter.

If the interpreter was created with ``interpreters.create()`` then
it will be destroyed as soon as all ``Interpreter`` objects with its ID
(across all interpreters) have been deleted.

``Interpreter`` objects may represent other interpreters than those
created by ``interpreters.create()``.  Examples include the main
interpreter (created by Python's runtime initialization) and those
created via the C-API, using ``Py_NewInterpreter()``.  Such
``Interpreter`` objects will not be able to interact with their
corresponding interpreters, e.g. via ``Interpreter.exec()``
(though we may relax this in the future).

Attributes and methods:

* ``id``
      (read-only) A non-negative ``int`` that identifies the
      interpreter that this ``Interpreter`` instance represents.
      Conceptually, this is similar to a process ID.

* ``__hash__()``
      Returns the hash of the interpreter's ``id``.  This is the same
      as the hash of the ID's integer value.

* ``is_running() -> bool``
      Returns ``True`` if the interpreter is currently executing code
      in its ``__main__`` module.  This excludes sub-threads.

      It refers only to if there is an OS thread
      running a script (code) in the interpreter's ``__main__`` module.
      That basically means whether or not ``Interpreter.exec()``
      is running in some OS thread.  Code running in sub-threads
      is ignored.

* ``prepare_main(**kwargs)``
      Bind one or more objects in the interpreter's ``__main__`` module.

      The keyword argument names will be used as the attribute names.
      The values will be bound as new objects, though exactly equivalent
      to the original.  Only objects specifically supported for passing
      between interpreters are allowed.  See `Shareable Objects`_.

      ``prepare_main()`` is helpful for initializing the
      globals for an interpreter before running code in it.

* ``exec(code, /)``
      Execute the given source code in the interpreter
      (in the current OS thread), using its ``__main__`` module.
      It doesn't return anything.

      This is essentially equivalent to switching to this interpreter
      in the current OS thread and then calling the builtin ``exec()``
      using this interpreter's ``__main__`` module's ``__dict__`` as
      the globals and locals.

      The code running in the current OS thread (a different
      interpreter) is effectively paused until ``Interpreter.exec()``
      finishes.  To avoid pausing it, create a new ``threading.Thread``
      and call ``Interpreter.exec()`` in it
      (like ``Interpreter.call_in_thread()`` does).

      ``Interpreter.exec()`` does not reset the interpreter's state nor
      the ``__main__`` module, neither before nor after, so each
      successive call picks up where the last one left off.  This can
      be useful for running some code to initialize an interpreter
      (e.g. with imports) before later performing some repeated task.

      If there is an uncaught exception, it will be propagated into
      the calling interpreter as an ``ExecutionFailed``.  The full error
      display of the original exception, generated relative to the
      called interpreter, is preserved on the propagated ``ExecutionFailed``.
      That includes the full traceback, with all the extra info like
      syntax error details and chained exceptions.
      If the ``ExecutionFailed`` is not caught then that full error display
      will be shown, much like it would be if the propagated exception
      had been raised in the main interpreter and uncaught.  Having
      the full traceback is particularly useful when debugging.

      If exception propagation is not desired then an explicit
      try-except should be used around the *code* passed to
      ``Interpreter.exec()``.  Likewise any error handling that depends
      on specific information from the exception must use an explicit
      try-except around the given *code*, since ``ExecutionFailed``
      will not preserve that information.

* ``call(callable, /)``
      Call the callable object in the interpreter.
      The return value is discarded.  If the callable raises an exception
      then it gets propagated as an ``ExecutionFailed`` exception,
      in the same way as ``Interpreter.exec()``.

      For now only plain functions are supported and only ones that
      take no arguments and have no cell vars.  Free globals are resolved
      against the target interpreter's ``__main__`` module.

      In the future, we can add support for arguments, closures,
      and a broader variety of callables, at least partly via pickle.
      We can also consider not discarding the return value.
      The initial restrictions are in place to allow us to get the basic
      functionality of the module out to users sooner.

* ``call_in_thread(callable, /) -> threading.Thread``
      Essentially, apply ``Interpreter.call()`` in a new thread.
      Return values are discarded and exceptions are not propagated.

      ``call_in_thread()`` is roughly equivalent to::

         def task():
             interp.call(func)
         t = threading.Thread(target=task)
         t.start()

* ``close()``
      Destroy the underlying interpreter.

Communicating Between Interpreters
----------------------------------

The module introduces a basic communication mechanism through special
queues.

There are ``interpreters.Queue`` objects, but they only proxy
the actual data structure: an unbounded FIFO queue that exists
outside any one interpreter.  These queues have special accommodations
for safely passing object data between interpreters, without violating
interpreter isolation.  This includes thread-safety.

As with other queues in Python, for each "put" the object is added to
the back and each "get" pops the next one off the front.  Every added
object will be popped off in the order it was pushed on.

Only objects that are specifically supported for passing
between interpreters may be sent through an ``interpreters.Queue``.
Note that the actual objects aren't sent, but rather their
underlying data.  However, the popped object will still be
strictly equivalent to the original.
See `Shareable Objects`_.

The module defines the following functions:

* ``create_queue(maxsize=0, *, syncobj=False) -> Queue``
      Create a new queue.  If the maxsize is zero or negative then the
      queue is unbounded.

      "syncobj" is used as the default for ``put()`` and ``put_nowait()``.

Queue Objects
-------------

``interpreters.Queue`` objects act as proxies for the underlying
cross-interpreter-safe queues exposed by the ``interpreters`` module.
Each ``Queue`` object represents the queue with the corresponding
unique ID.
There will only be one object for any given queue.

``Queue`` implements all the methods of ``queue.Queue`` except for
``task_done()`` and ``join()``, hence it is similar to
``asyncio.Queue`` and ``multiprocessing.Queue``.

Attributes and methods:

* ``id``
      (read-only) A non-negative ``int`` that identifies
      the corresponding cross-interpreter queue.
      Conceptually, this is similar to the file descriptor
      used for a pipe.

* ``maxsize``
      (read-only) Number of items allowed in the queue.
      Zero means "unbounded".

* ``__hash__()``
      Return the hash of the queue's ``id``.  This is the same
      as the hash of the ID's integer value.

* ``empty()``
      Return ``True`` if the queue is empty, ``False`` otherwise.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``full()``
      Return ``True`` if there are ``maxsize`` items in the queue.

      If the queue was initialized with ``maxsize=0`` (the default),
      then ``full()`` never returns ``True``.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``qsize()``
      Return the number of items in the queue.

      This is only a snapshot of the state at the time of the call.
      Other threads or interpreters may cause this to change.

* ``put(obj, timeout=None, *, syncobj=None)``
      Add the object to the queue.

      If ``maxsize > 0`` and the queue is full then this blocks until
      a free slot is available.  If *timeout* is a positive number
      then it only blocks at least that many seconds and then raises
      ``interpreters.QueueFull``.  Otherwise is blocks forever.

      If "syncobj" is true then the object must be
      `shareable <Shareable Objects_>`_, which means the object's data
      is passed through rather than the object itself.
      If "syncobj" is false then all objects are supported.  However,
      there are some performance penalties and all objects are copies
      (e.g. via pickle).  Thus mutable objects will never be
      automatically synchronized between interpreters.
      If "syncobj" is None (the default) then the queue's default
      value is used.

      If an object is still in the queue, and the interpreter which put
      it in the queue (i.e. to which it belongs) is destroyed, then the
      object is immediately removed from the queue.  (We may later add
      an option to replace the removed object in the queue with a
      sentinel or to raise an exception for the corresponding ``get()``
      call.)

* ``put_nowait(obj, *, syncobj=None)``
      Like ``put()`` but effectively with an immediate timeout.
      Thus if the queue is full, it immediately raises
      ``interpreters.QueueFull``.

* ``get(timeout=None) -> object``
      Pop the next object from the queue and return it.  Block while
      the queue is empty.  If a positive *timeout* is provided and an
      object hasn't been added to the queue in that many seconds
      then raise ``interpreters.QueueEmpty``.

* ``get_nowait() -> object``
      Like ``get()``, but do not block.  If the queue is not empty
      then return the next item.  Otherwise, raise
      ``interpreters.QueueEmpty``.

Shareable Objects
-----------------

``Interpreter.prepare_main()`` only works with "shareable" objects.
The same goes for ``interpreters.Queue`` (optionally).

A "shareable" object is one which may be passed from one interpreter
to another.  The object is not necessarily actually directly shared
by the interpreters.  However, even if it isn't, the shared object
should be treated as though it *were* shared directly.  That's a
strong equivalence guarantee for all shareable objects.
(See below.)

For some types (builtin singletons), the actual object is shared.
For some, the object's underlying data is actually shared but each
interpreter has a distinct object wrapping that data.  For all other
shareable types, a strict copy or proxy is made such that the
corresponding objects continue to match exactly.  In cases where
the underlying data is complex but must be copied (e.g. ``tuple``),
the data is serialized as efficiently as possible.

Shareable objects must be specifically supported internally
by the Python runtime.  However, there is no restriction against
adding support for more types later.

Here's the initial list of supported objects:

* ``str``
* ``bytes``
* ``int``
* ``float``
* ``bool`` (``True``/``False``)
* ``None``
* ``tuple`` (only with shareable items)
* ``interpreters.Queue``
* ``memoryview`` (underlying buffer actually shared)

Note that the last two on the list, queues and ``memoryview``, are
technically mutable data types, whereas the rest are not.  When any
interpreters share mutable data there is always a risk of data races.
Cross-interpreter safety, including thread-safety, is a fundamental
feature of queues.

However, ``memoryview`` does not have any native accommodations.
The user is responsible for managing thread-safety, whether passing
a token back and forth through a queue to indicate safety
(see `Synchronization`_), or by assigning sub-range exclusivity
to individual interpreters.

Most objects will be shared through queues (``interpreters.Queue``),
as interpreters communicate information between each other.
Less frequently, objects will be shared through ``prepare_main()``
to set up an interpreter prior to running code in it.  However,
``prepare_main()`` is the primary way that queues are shared,
to provide another interpreter with a means
of further communication.

Finally, a reminder: for a few types the actual object is shared,
whereas for the rest only the underlying data is shared, whether
as a copy or through a proxy.  Regardless, it always preserves
the strong equivalence guarantee of "shareable" objects.

The guarantee is that a shared object in one interpreter is strictly
equivalent to the corresponding object in the other interpreter.
In other words, the two objects will be indistinguishable from each
other.  The shared object should be treated as though the original
had been shared directly, whether or not it actually was.
That's a slightly different and stronger promise than just equality.

The guarantee is especially important for mutable objects, like
``Interpreters.Queue`` and ``memoryview``.  Mutating the object
in one interpreter will always be reflected immediately in every
other interpreter sharing the object.

Synchronization
---------------

There are situations where two interpreters should be synchronized.
That may involve sharing a resource, worker management, or preserving
sequential consistency.

In threaded programming the typical synchronization primitives are
types like mutexes.  The ``threading`` module exposes several.
However, interpreters cannot share objects which means they cannot
share ``threading.Lock`` objects.

The ``interpreters`` module does not provide any such dedicated
synchronization primitives.  Instead, ``interpreters.Queue``
objects provide everything one might need.

For example, if there's a shared resource that needs managed
access then a queue may be used to manage it, where the interpreters
pass an object around to indicate who can use the resource::

   import interpreters
   from mymodule import load_big_data, check_data

   numworkers = 10
   control = interpreters.create_queue()
   data = memoryview(load_big_data())

   def worker():
       interp = interpreters.create()
       interp.prepare_main(control=control, data=data)
       interp.exec("""if True:
           from mymodule import edit_data
           while True:
               token = control.get()
               edit_data(data)
               control.put(token)
           """)
   threads = [threading.Thread(target=worker) for _ in range(numworkers)]
   for t in threads:
       t.start()

   token = 'football'
   control.put(token)
   while True:
       control.get()
       if not check_data(data):
           break
       control.put(token)

Exceptions
----------

* ``InterpreterError``
      Indicates that some interpreter-related failure occurred.

      This exception is a subclass of ``Exception``.

* ``InterpreterNotFoundError``
      Raised from ``Interpreter`` methods after the underlying
      interpreter has been destroyed, e.g. via the C-API.

      This exception is a subclass of ``InterpreterError``.

* ``ExecutionFailed``
      Raised from ``Interpreter.exec()`` and ``Interpreter.call()``
      when there's an uncaught exception.
      The error display for this exception includes the traceback
      of the uncaught exception, which gets shown after the normal
      error display, much like happens for ``ExceptionGroup``.

      Attributes:

      * ``type`` - a representation of the original exception's class,
        with ``__name__``, ``__module__``, and ``__qualname__`` attrs.
      * ``msg`` - ``str(exc)`` of the original exception
      * ``snapshot`` - a ``traceback.TracebackException`` object
        for the original exception

      This exception is a subclass of ``InterpreterError``.

* ``QueueError``
      Indicates that some queue-related failure occurred.

      This exception is a subclass of ``Exception``.

* ``QueueNotFoundError``
      Raised from ``interpreters.Queue`` methods after the underlying
      queue has been destroyed.

      This exception is a subclass of ``QueueError``.

* ``QueueEmpty``
      Raised from ``Queue.get()`` (or ``get_nowait()`` with no default)
      when the queue is empty.

      This exception is a subclass of both ``QueueError``
      and the stdlib ``queue.Empty``.

* ``QueueFull``
      Raised from ``Queue.put()`` (with a timeout) or ``put_nowait()``
      when the queue is already at its max size.

      This exception is a subclass of both ``QueueError``
      and the stdlib ``queue.Empty``.

InterpreterPoolExecutor
-----------------------

Along with the new ``interpreters`` module, there will be a new
``concurrent.futures.InterpreterPoolExecutor``.  It will be a
derivative of ``ThreadPoolExecutor``, where each worker executes
in its own thread, but each with its own subinterpreter.

Like the other executors, ``InterpreterPoolExecutor`` will support
callables for tasks, and for the initializer.  Also like the other
executors, the arguments in both cases will be mostly unrestricted.
The callables and arguments will typically be serialized when sent
to a worker's interpreter, e.g. with pickle, like how the
``ProcessPoolExecutor`` works.  This contrasts with
``Interpreter.call()``, which will (at least initially)
be much more restricted.

Communication between workers, or between the executor
(or generally its interpreter) and the workers, may still be done
through ``interpreters.Queue`` objects, set with the initializer.

sys.implementation.supports_isolated_interpreters
-------------------------------------------------

Python implementations are not required to support subinterpreters,
though most major ones do.  If an implementation does support them
then ``sys.implementation.supports_isolated_interpreters`` will be
set to ``True``.  Otherwise it will be ``False``.  If the feature
is not supported then importing the ``interpreters`` module will
raise an ``ImportError``.

Examples
--------

The following examples demonstrate practical cases where multiple
interpreters may be useful.

Example 1:

There's a stream of requests coming in that will be handled
via workers in sub-threads.

* each worker thread has its own interpreter
* there's one queue to send tasks to workers and
  another queue to return results
* the results are handled in a dedicated thread
* each worker keeps going until it receives a "stop" sentinel (``None``)
* the results handler keeps going until all workers have stopped

::

   import interpreters
   from mymodule import iter_requests, handle_result

   tasks = interpreters.create_queue()
   results = interpreters.create_queue()

   numworkers = 20
   threads = []

   def results_handler():
       running = numworkers
       while running:
           try:
               res = results.get(timeout=0.1)
           except interpreters.QueueEmpty:
               # No workers have finished a request since last time.
               pass
           else:
               if res is None:
                   # A worker has stopped.
                   running -= 1
               else:
                   handle_result(res)
       empty = object()
       assert results.get_nowait(empty) is empty
   threads.append(threading.Thread(target=results_handler))

   def worker():
       interp = interpreters.create()
       interp.prepare_main(tasks=tasks, results=results)
       interp.exec("""if True:
           from mymodule import handle_request, capture_exception

           while True:
               req = tasks.get()
               if req is None:
                   # Stop!
                   break
               try:
                   res = handle_request(req)
               except Exception as exc:
                   res = capture_exception(exc)
               results.put(res)
           # Notify the results handler.
           results.put(None)
           """)
   threads.extend(threading.Thread(target=worker) for _ in range(numworkers))

   for t in threads:
       t.start()

   for req in iter_requests():
       tasks.put(req)
   # Send the "stop" signal.
   for _ in range(numworkers):
       tasks.put(None)

   for t in threads:
       t.join()

Example 2:

This case is similar to the last as there are a bunch of workers
in sub-threads.  However, this time the code is chunking up a big array
of data, where each worker processes one chunk at a time.  Copying
that data to each interpreter would be exceptionally inefficient,
so the code takes advantage of directly sharing ``memoryview`` buffers.

* all the interpreters share the buffer of the source array
* each one writes its results to a second shared buffer
* there's use a queue to send tasks to workers
* only one worker will ever read any given index in the source array
* only one worker will ever write to any given index in the results
  (this is how it ensures thread-safety)

::

   import interpreters
   import queue
   from mymodule import read_large_data_set, use_results

   numworkers = 3
   data, chunksize = read_large_data_set()
   buf = memoryview(data)
   numchunks = (len(buf) + 1) / chunksize
   results = memoryview(b'\0' * numchunks)

   tasks = interpreters.create_queue()

   def worker(id):
       interp = interpreters.create()
       interp.prepare_main(data=buf, results=results, tasks=tasks)
       interp.exec("""if True:
           from mymodule import reduce_chunk

           while True:
               req = tasks.get()
               if res is None:
                   # Stop!
                   break
               resindex, start, end = req
               chunk = data[start: end]
               res = reduce_chunk(chunk)
               results[resindex] = res
           """)
   threads = [threading.Thread(target=worker) for _ in range(numworkers)]
   for t in threads:
       t.start()

   for i in range(numchunks):
       # Assume there's at least one worker running still.
       start = i * chunksize
       end = start + chunksize
       if end > len(buf):
           end = len(buf)
       tasks.put((start, end, i))
   # Send the "stop" signal.
   for _ in range(numworkers):
       tasks.put(None)

   for t in threads:
       t.join()

   use_results(results)


Rationale
=========

A Minimal API
-------------

Since the core dev team has no real experience with
how users will make use of multiple interpreters in Python code, this
proposal purposefully keeps the initial API as lean and minimal as
possible.  The objective is to provide a well-considered foundation
on which further (more advanced) functionality may be added later,
as appropriate.

That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages.  It also factors in
experience from using subinterpreters in the CPython test suite and
using them in `concurrency benchmarks`_.

.. _concurrency benchmarks:
   https://github.com/ericsnowcurrently/concurrency-benchmarks

create(), create_queue()
------------------------

Typically, users call a type to create instances of the type, at which
point the object's resources get provisioned.  The ``interpreters``
module takes a different approach, where users must call ``create()``
to get a new interpreter or ``create_queue()`` for a new queue.
Calling ``interpreters.Interpreter()`` directly only returns a wrapper
around an existing interpreters (likewise for
``interpreters.Queue()``).

This is because interpreters (and queues) are special resources.
They exist globally in the process and are not managed/owned by the
current interpreter.  Thus the ``interpreters`` module makes creating
an interpreter (or queue) a visibly distinct operation from creating
an instance of ``interpreters.Interpreter``
(or ``interpreters.Queue``).

Interpreter.prepare_main() Sets Multiple Variables
--------------------------------------------------

``prepare_main()`` may be seen as a setter function of sorts.
It supports setting multiple names at once,
e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters
set one item at a time.  The main reason is for efficiency.

To set a value in the interpreter's ``__main__.__dict__``, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead.  After
setting the value it must switch back.
Furthermore, there is some additional overhead to the mechanism
by which it passes objects between interpreters, which can be
reduced in aggregate if multiple values are set at once.

Therefore, ``prepare_main()`` supports setting multiple
values at once.

Propagating Exceptions
----------------------

An uncaught exception from a subinterpreter,
via ``Interpreter.exec()``,
could either be (effectively) ignored,
like ``threading.Thread()`` does,
or propagated, like the builtin ``exec()`` does.
Since ``Interpreter.exec()`` is a synchronous operation,
like the builtin ``exec()``, uncaught exceptions are propagated.

However, such exceptions are not raised directly.  That's because
interpreters are isolated from each other and must not share objects,
including exceptions.  That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it.
Any of those could preserve the traceback, which is useful for
debugging.  The ``ExecutionFailed`` that gets raised
is such a surrogate.

There's another concern to consider.  If a propagated exception isn't
immediately caught, it will bubble up through the call stack until
caught (or not).  In the case that code somewhere else may catch it,
it is helpful to identify that the exception came from a subinterpreter
(i.e. a "remote" source), rather than from the current interpreter.
That's why ``Interpreter.exec()`` raises ``ExecutionFailed`` and why
it is a plain ``Exception``, rather than a copy or proxy with a class
that matches the original exception.  For example, an uncaught
``ValueError`` from a subinterpreter would never get caught in a later
``try: ... except ValueError: ...``.  Instead, ``ExecutionFailed``
must be handled directly.

In contrast, exceptions propagated from ``Interpreter.call()`` do not
involve ``ExecutionFailed`` but are raised directly, as though originating
in the calling interpreter.  This is because ``Interpreter.call()`` is
a higher level method that uses pickle to support objects that can't
normally be passed between interpreters.

Limited Object Sharing
----------------------

As noted in `Interpreter Isolation`_, only a small number of builtin
objects may be truly shared between interpreters.  In all other cases
objects can only be shared indirectly, through copies or proxies.

The set of objects that are shareable as copies through queues
(and ``Interpreter.prepare_main()``) is limited for the sake of
efficiency.

Supporting sharing of *all* objects is possible (via pickle)
but not part of this proposal.  For one thing, it's helpful to know
in those cases that only an efficient implementation is being used.
Furthermore, in those cases supporting mutable objects via pickling
would violate the guarantee that "shared" objects be equivalent
(and stay that way).

Objects vs. ID Proxies
----------------------

For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs.  In both cases the state is likewise process-global
and will be used by multiple interpreters.  Thus they aren't suitable
to be implemented as ``PyObject``, which is only really an option for
interpreter-specific data.  That's why the ``interpreters`` module
instead provides objects that are weakly associated through the ID.


Rejected Ideas
==============

See :pep:`PEP 554 <554#rejected-ideas>`.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.