PEP: 734 Title: Multiple Interpreters in the Stdlib Author: Eric Snow Discussions-To: https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147 Status: Draft Type: Standards Track Created: 06-Nov-2023 Python-Version: 3.13 Post-History: `14-Dec-2023 `__, Replaces: 554 .. note:: This PEP is essentially a continuation of :pep:`554`. That document had grown a lot of ancillary information across 7 years of discussion. This PEP is a reduction back to the essential information. Much of that extra information is still valid and useful, just not in the immediate context of the specific proposal here. Abstract ======== This PEP proposes to add a new module, ``interpreters``, to support inspecting, creating, and running code in multiple interpreters in the current process. This includes ``Interpreter`` objects that represent the underlying interpreters. The module will also provide a basic ``Queue`` class for communication between interpreters. Finally, we will add a new ``concurrent.futures.InterpreterPoolExecutor`` based on the ``interpreters`` module. Introduction ============ Fundamentally, an "interpreter" is the collection of (essentially) all runtime state which Python threads must share. So, let's first look at threads. Then we'll circle back to interpreters. Threads and Thread States ------------------------- A Python process will have one or more OS threads running Python code (or otherwise interacting with the C API). Each of these threads interacts with the CPython runtime using its own thread state (``PyThreadState``), which holds all the runtime state unique to that thread. There is also some runtime state that is shared between multiple OS threads. Any OS thread may switch which thread state it is currently using, as long as it isn't one that another OS thread is already using (or has been using). This "current" thread state is stored by the runtime in a thread-local variable, and may be looked up explicitly with ``PyThreadState_Get()``. It gets set automatically for the initial ("main") OS thread and for ``threading.Thread`` objects. From the C API it is set (and cleared) by ``PyThreadState_Swap()`` and may be set by ``PyGILState_Ensure()``. Most of the C API requires that there be a current thread state, either looked up implicitly or passed in as an argument. The relationship between OS threads and thread states is one-to-many. Each thread state is associated with at most a single OS thread and records its thread ID. A thread state is never used for more than one OS thread. In the other direction, however, an OS thread may have more than one thread state associated with it, though, again, only one may be current. When there's more than one thread state for an OS thread, ``PyThreadState_Swap()`` is used in that OS thread to switch between them, with the requested thread state becoming the current one. Whatever was running in the thread using the old thread state is effectively paused until that thread state is swapped back in. Interpreter States ------------------ As noted earlier, there is some runtime state that multiple OS threads share. Some of it is exposed by the ``sys`` module, though much is used internally and not exposed explicitly or only through the C API. This shared state is called the interpreter state (``PyInterpreterState``). We'll sometimes refer to it here as just "interpreter", though that is also sometimes used to refer to the ``python`` executable, to the Python implementation, and to the bytecode interpreter (i.e. ``exec()``/``eval()``). CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5 (1997). The feature has been available via the :ref:`C API `. Interpreters and Threads ------------------------ Thread states are related to interpreter states in much the same way that OS threads and processes are related (at a hight level). To begin with, the relationship is one-to-many. A thread state belongs to a single interpreter (and stores a pointer to it). That thread state is never used for a different interpreter. In the other direction, however, an interpreter may have zero or more thread states associated with it. The interpreter is only considered active in OS threads where one of its thread states is current. Interpreters are created via the C API using ``Py_NewInterpreterFromConfig()`` (or ``Py_NewInterpreter()``, which is a light wrapper around ``Py_NewInterpreterFromConfig()``). That function does the following: 1. create a new interpreter state 2. create a new thread state 3. set the thread state as current (a current tstate is needed for interpreter init) 4. initialize the interpreter state using that thread state 5. return the thread state (still current) Note that the returned thread state may be immediately discarded. There is no requirement that an interpreter have any thread states, except as soon as the interpreter is meant to actually be used. At that point it must be made active in the current OS thread. To make an existing interpreter active in the current OS thread, the C API user first makes sure that interpreter has a corresponding thread state. Then ``PyThreadState_Swap()`` is called like normal using that thread state. If the thread state for another interpreter was already current then it gets swapped out like normal and execution of that interpreter in the OS thread is thus effectively paused until it is swapped back in. Once an interpreter is active in the current OS thread like that, the thread can call any of the C API, such as ``PyEval_EvalCode()`` (i.e. ``exec()``). This works by using the current thread state as the runtime context. The "Main" Interpreter ---------------------- When a Python process starts, it creates a single interpreter state (the "main" interpreter) with a single thread state for the current OS thread. The Python runtime is then initialized using them. After initialization, the script or module or REPL is executed using them. That execution happens in the interpreter's ``__main__`` module. When the process finishes running the requested Python code or REPL, in the main OS thread, the Python runtime is finalized in that thread using the main interpreter. Runtime finalization has only a slight, indirect effect on still-running Python threads, whether in the main interpreter or in subinterpreters. That's because right away it waits indefinitely for all non-daemon Python threads to finish. While the C API may be queried, there is no mechanism by which any Python thread is directly alerted that finalization has begun, other than perhaps with "atexit" functions that may be been registered using ``threading._register_atexit()``. Any remaining subinterpreters are themselves finalized later, but at that point they aren't current in any OS threads. Interpreter Isolation --------------------- CPython's interpreters are intended to be strictly isolated from each other. That means interpreters never share objects (except in very specific cases with immortal, immutable builtin objects). Each interpreter has its own modules (``sys.modules``), classes, functions, and variables. Even where two interpreters define the same class, each will have its own copy. The same applies to state in C, including in extension modules. The CPython C API docs `explain more`_. .. _explain more: https://docs.python.org/3/c-api/init.html#bugs-and-caveats Notably, there is some process-global state that interpreters will always share, some mutable and some immutable. Sharing immutable state presents few problems, while providing some benefits (mainly performance). However, all shared mutable state requires special management, particularly for thread-safety, some of which the OS takes care of for us. Mutable: * file descriptors * low-level env vars * process memory (though allocators *are* isolated) * the list of interpreters Immutable: * builtin types (e.g. ``dict``, ``bytes``) * singletons (e.g. ``None``) * underlying static module data (e.g. functions) for builtin/extension/frozen modules Existing Execution Components ----------------------------- There are a number of existing parts of Python that may help with understanding how code may be run in a subinterpreter. In CPython, each component is built around one of the following C API functions (or variants): * ``PyEval_EvalCode()``: run the bytecode interpreter with the given code object * ``PyRun_String()``: compile + ``PyEval_EvalCode()`` * ``PyRun_File()``: read + compile + ``PyEval_EvalCode()`` * ``PyRun_InteractiveOneObject()``: compile + ``PyEval_EvalCode()`` * ``PyObject_Call()``: calls ``PyEval_EvalCode()`` builtins.exec() ^^^^^^^^^^^^^^^ The builtin ``exec()`` may be used to execute Python code. It is essentially a wrapper around the C API functions ``PyRun_String()`` and ``PyEval_EvalCode()``. Here are some relevant characteristics of the builtin ``exec()``: * It runs in the current OS thread and pauses whatever was running there, which resumes when ``exec()`` finishes. No other OS threads are affected. (To avoid pausing the current Python thread, run ``exec()`` in a ``threading.Thread``.) * It may start additional threads, which don't interrupt it. * It executes against a "globals" namespace (and a "locals" namespace). At module-level, ``exec()`` defaults to using ``__dict__`` of the current module (i.e. ``globals()``). ``exec()`` uses that namespace as-is and does not clear it before or after. * It propagates any uncaught exception from the code it ran. The exception is raised from the ``exec()`` call in the Python thread that originally called ``exec()``. Command-line ^^^^^^^^^^^^ The ``python`` CLI provides several ways to run Python code. In each case it maps to a corresponding C API call: * ````, ``-i`` - run the REPL (``PyRun_InteractiveOneObject()``) * ```` - run a script (``PyRun_File()``) * ``-c `` - run the given Python code (``PyRun_String()``) * ``-m module`` - run the module as a script (``PyEval_EvalCode()`` via ``runpy._run_module_as_main()``) In each case it is essentially a variant of running ``exec()`` at the top-level of the ``__main__`` module of the main interpreter. threading.Thread ^^^^^^^^^^^^^^^^ When a Python thread is started, it runs the "target" function with ``PyObject_Call()`` using a new thread state. The globals namespace come from ``func.__globals__`` and any uncaught exception is discarded. Motivation ========== The ``interpreters`` module will provide a high-level interface to the multiple interpreter functionality. The goal is to make the existing multiple-interpreters feature of CPython more easily accessible to Python code. This is particularly relevant now that CPython has a per-interpreter GIL (:pep:`684`) and people are more interested in using multiple interpreters. Without a stdlib module, users are limited to the :ref:`C API `, which restricts how much they can try out and take advantage of multiple interpreters. The module will include a basic mechanism for communicating between interpreters. Without one, multiple interpreters are a much less useful feature. Rationale ========= A Minimal API ------------- Since the core dev team has no real experience with how users will make use of multiple interpreters in Python code, this proposal purposefully keeps the initial API as lean and minimal as possible. The objective is to provide a well-considered foundation on which further (more advanced) functionality may be added later, as appropriate. That said, the proposed design incorporates lessons learned from existing use of subinterpreters by the community, from existing stdlib modules, and from other programming languages. It also factors in experience from using subinterpreters in the CPython test suite and using them in `concurrency benchmarks`_. .. _concurrency benchmarks: https://github.com/ericsnowcurrently/concurrency-benchmarks Interpreter.prepare_main() Sets Multiple Variables -------------------------------------------------- ``prepare_main()`` may be seen as a setter function of sorts. It supports setting multiple names at once, e.g. ``interp.prepare_main(spam=1, eggs=2)``, whereas most setters set one item at a time. The main reason is for efficiency. To set a value in the interpreter's ``__main__.__dict__``, the implementation must first switch the OS thread to the identified interpreter, which involves some non-negligible overhead. After setting the value it must switch back. Furthermore, there is some additional overhead to the mechanism by which it passes objects between interpreters, which can be reduced in aggregate if multiple values are set at once. Therefore, ``prepare_main()`` supports setting multiple values at once. Propagating Exceptions ---------------------- An uncaught exception from a subinterpreter, via ``Interpreter.exec_sync()``, could either be (effectively) ignored, like ``threading.Thread()`` does, or propagated, like the builtin ``exec()`` does. Since ``exec_sync()`` is a synchronous operation, like the builtin ``exec()``, uncaught exceptions are propagated. However, such exceptions are not raised directly. That's because interpreters are isolated from each other and must not share objects, including exceptions. That could be addressed by raising a surrogate of the exception, whether a summary, a copy, or a proxy that wraps it. Any of those could preserve the traceback, which is useful for debugging. The ``ExecFailure`` that gets raised is such a surrogate. There's another concern to consider. If a propagated exception isn't immediately caught, it will bubble up through the call stack until caught (or not). In the case that code somewhere else may catch it, it is helpful to identify that the exception came from a subinterpreter (i.e. a "remote" source), rather than from the current interpreter. That's why ``Interpreter.exec_sync()`` raises ``ExecFailure`` and why it is a plain ``Exception``, rather than a copy or proxy with a class that matches the original exception. For example, an uncaught ``ValueError`` from a subinterpreter would never get caught in a later ``try: ... except ValueError: ...``. Instead, ``ExecFailure`` must be handled directly. Limited Object Sharing ---------------------- As noted in `Interpreter Isolation`_, only a small number of builtin objects may be truly shared between interpreters. In all other cases objects can only be shared indirectly, through copies or proxies. The set of objects that are shareable as copies through queues (and ``Interpreter.prepare_main()``) is limited for the sake of efficiency. Supporting sharing of *all* objects is possible (via pickle) but not part of this proposal. For one thing, it's helpful to know that only an efficient implementation is being used. Furthermore, for mutable objects pickling would violate the guarantee that "shared" objects be equivalent (and stay that way). Objects vs. ID Proxies ---------------------- For both interpreters and queues, the low-level module makes use of proxy objects that expose the underlying state by their corresponding process-global IDs. In both cases the state is likewise process-global and will be used by multiple interpreters. Thus they aren't suitable to be implemented as ``PyObject``, which is only really an option for interpreter-specific data. That's why the ``interpreters`` module instead provides objects that are weakly associated through the ID. Specification ============= The module will: * expose the existing multiple interpreter support * introduce a basic mechanism for communicating between interpreters The module will wrap a new low-level ``_interpreters`` module (in the same way as the ``threading`` module). However, that low-level API is not intended for public use and thus not part of this proposal. Using Interpreters ------------------ The module defines the following functions: * ``get_current() -> Interpreter`` Returns the ``Interpreter`` object for the currently executing interpreter. * ``list_all() -> list[Interpreter]`` Returns the ``Interpreter`` object for each existing interpreter, whether it is currently running in any OS threads or not. * ``create() -> Interpreter`` Create a new interpreter and return the ``Interpreter`` object for it. The interpreter doesn't do anything on its own and is not inherently tied to any OS thread. That only happens when something is actually run in the interpreter (e.g. ``Interpreter.exec_sync()``), and only while running. The interpreter may or may not have thread states ready to use, but that is strictly an internal implementation detail. Interpreter Objects ------------------- An ``interpreters.Interpreter`` object that represents the interpreter (``PyInterpreterState``) with the corresponding unique ID. There will only be one object for any given interpreter. If the interpreter was created with ``interpreters.create()`` then it will be destroyed as soon as all ``Interpreter`` objects have been deleted. Attributes and methods: * ``id`` (read-only) A non-negative ``int`` that identifies the interpreter that this ``Interpreter`` instance represents. Conceptually, this is similar to a process ID. * ``__hash__()`` Returns the hash of the interpreter's ``id``. This is the same as the hash of the ID's integer value. * ``is_running() -> bool`` Returns ``True`` if the interpreter is currently executing code in its ``__main__`` module. This excludes sub-threads. It refers only to if there is an OS thread running a script (code) in the interpreter's ``__main__`` module. That basically means whether or not ``Interpreter.exec_sync()`` is running in some OS thread. Code running in sub-threads is ignored. * ``prepare_main(**kwargs)`` Bind one or more objects in the interpreter's ``__main__`` module. The keyword argument names will be used as the attribute names. The values will be bound as new objects, though exactly equivalent to the original. Only objects specifically supported for passing between interpreters are allowed. See `Shareable Objects`_. ``prepare_main()`` is helpful for initializing the globals for an interpreter before running code in it. * ``exec_sync(code, /)`` Execute the given source code in the interpreter (in the current OS thread), using its ``__main__`` module. It doesn't return anything. This is essentially equivalent to switching to this interpreter in the current OS thread and then calling the builtin ``exec()`` using this interpreter's ``__main__`` module's ``__dict__`` as the globals and locals. The code running in the current OS thread (a different interpreter) is effectively paused until ``exec_sync()`` finishes. To avoid pausing it, create a new ``threading.Thread`` and call ``exec_sync()`` in it. ``exec_sync()`` does not reset the interpreter's state nor the ``__main__`` module, neither before nor after, so each successive call picks up where the last one left off. This can be useful for running some code to initialize an interpreter (e.g. with imports) before later performing some repeated task. If there is an uncaught exception, it will be propagated into the calling interpreter as a ``ExecFailure``, which preserves enough information for a helpful error display. That means if the ``ExecFailure`` isn't caught then the full traceback of the propagated exception, including details about syntax errors, etc., will be displayed. Having the full traceback is particularly useful when debugging. If exception propagation is not desired then an explicit try-except should be used around the *code* passed to ``exec_sync()``. Likewise any error handling that depends on specific information from the exception must use an explicit try-except around the given *code*, since ``ExecFailure`` will not preserve that information. * ``run(code, /) -> threading.Thread`` Create a new thread and call ``exec_sync()`` in it. Exceptions are not propagated. This is roughly equivalent to:: def task(): interp.exec_sync(code) t = threading.Thread(target=task) t.start() Communicating Between Interpreters ---------------------------------- The module introduces a basic communication mechanism through special queues. There are ``interpreters.Queue`` objects, but they only proxy the actual data structure: an unbounded FIFO queue that exists outside any one interpreter. These queues have special accommodations for safely passing object data between interpreters, without violating interpreter isolation. This includes thread-safety. As with other queues in Python, for each "put" the object is added to the back and each "get" pops the next one off the front. Every added object will be popped off in the order it was pushed on. Only objects that are specifically supported for passing between interpreters may be sent through a ``Queue``. Note that the actual objects aren't sent, but rather their underlying data. However, the popped object will still be strictly equivalent to the original. See `Shareable Objects`_. The module defines the following functions: * ``create_queue(maxsize=0) -> Queue`` Create a new queue. If the maxsize is zero or negative then the queue is unbounded. Queue Objects ------------- ``interpreters.Queue`` objects act as proxies for the underlying cross-interpreter-safe queues exposed by the ``interpreters`` module. Each ``Queue`` object represents the queue with the corresponding unique ID. There will only be one object for any given queue. ``Queue`` implements all the methods of ``queue.Queue`` except for ``task_done()`` and ``join()``, hence it is similar to ``asyncio.Queue`` and ``multiprocessing.Queue``. Attributes and methods: * ``id`` (read-only) A non-negative ``int`` that identifies the corresponding cross-interpreter queue. Conceptually, this is similar to the file descriptor used for a pipe. * ``maxsize`` Number of items allowed in the queue. Zero means "unbounded". * ``__hash__()`` Return the hash of the queue's ``id``. This is the same as the hash of the ID's integer value. * ``empty()`` Return ``True`` if the queue is empty, ``False`` otherwise. This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change. * ``full()`` Return ``True`` if there are ``maxsize`` items in the queue. If the queue was initialized with ``maxsize=0`` (the default), then ``full()`` never returns ``True``. This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change. * ``qsize()`` Return the number of items in the queue. This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change. * ``put(obj, timeout=None)`` Add the object to the queue. The object must be `shareable `_, which means the object's data is passed through rather than the object itself. If ``maxsize > 0`` and the queue is full then this blocks until a free slot is available. If *timeout* is a positive number then it only blocks at least that many seconds and then raises ``interpreters.QueueFull``. Otherwise is blocks forever. * ``put_nowait(obj)`` Like ``put()`` but effectively with an immediate timeout. Thus if the queue is full, it immediately raises ``interpreters.QueueFull``. * ``get(timeout=None) -> object`` Pop the next object from the queue and return it. Block while the queue is empty. If a positive *timeout* is provided and an object hasn't been added to the queue in that many seconds then raise ``interpreters.QueueEmpty``. * ``get_nowait() -> object`` Like ``get()``, but do not block. If the queue is not empty then return the next item. Otherwise, raise ``interpreters.QueueEmpty``. Shareable Objects ----------------- Both ``Interpreter.prepare_main()`` and ``Queue`` work only with "shareable" objects. A "shareable" object is one which may be passed from one interpreter to another. The object is not necessarily actually directly shared by the interpreters. However, even if it isn't, the shared object should be treated as though it *were* shared directly. That's a strong equivalence guarantee for all shareable objects. (See below.) For some types (builtin singletons), the actual object is shared. For some, the object's underlying data is actually shared but each interpreter has a distinct object wrapping that data. For all other shareable types, a strict copy or proxy is made such that the corresponding objects continue to match exactly. In cases where the underlying data is complex but must be copied (e.g. ``tuple``), the data is serialized as efficiently as possible. Shareable objects must be specifically supported internally by the Python runtime. However, there is no restriction against adding support for more types later. Here's the initial list of supported objects: * ``str`` * ``bytes`` * ``int`` * ``float`` * ``bool`` (``True``/``False``) * ``None`` * ``tuple`` (only with shareable items) * ``Queue`` * ``memoryview`` (underlying buffer actually shared) Note that the last two on the list, queues and ``memoryview``, are technically mutable data types, whereas the rest are not. When any interpreters share mutable data there is always a risk of data races. Cross-interpreter safety, including thread-safety, is a fundamental feature of queues. However, ``memoryview`` does not have any native accommodations. The user is responsible for managing thread-safety, whether passing a token back and forth through a queue to indicate safety (see `Synchronization`_), or by assigning sub-range exclusivity to individual interpreters. Most objects will be shared through queues (``Queue``), as interpreters communicate information between each other. Less frequently, objects will be shared through ``prepare_main()`` to set up an interpreter prior to running code in it. However, ``prepare_main()`` is the primary way that queues are shared, to provide another interpreter with a means of further communication. Finally, a reminder: for a few types the actual object is shared, whereas for the rest only the underlying data is shared, whether as a copy or through a proxy. Regardless, it always preserves the strong equivalence guarantee of "shareable" objects. The guarantee is that a shared object in one interpreter is strictly equivalent to the corresponding object in the other interpreter. In other words, the two objects will be indistinguishable from each other. The shared object should be treated as though the original had been shared directly, whether or not it actually was. That's a slightly different and stronger promise than just equality. The guarantee is especially important for mutable objects, like ``Queue`` and ``memoryview``. Mutating the object in one interpreter will always be reflected immediately in every other interpreter sharing the object. Synchronization --------------- There are situations where two interpreters should be synchronized. That may involve sharing a resource, worker management, or preserving sequential consistency. In threaded programming the typical synchronization primitives are types like mutexes. The ``threading`` module exposes several. However, interpreters cannot share objects which means they cannot share ``threading.Lock`` objects. The ``interpreters`` module does not provide any such dedicated synchronization primitives. Instead, ``Queue`` objects provide everything one might need. For example, if there's a shared resource that needs managed access then a queue may be used to manage it, where the interpreters pass an object around to indicate who can use the resource:: import interpreters from mymodule import load_big_data, check_data numworkers = 10 control = interpreters.create_queue() data = memoryview(load_big_data()) def worker(): interp = interpreters.create() interp.prepare_main(control=control, data=data) interp.exec_sync("""if True: from mymodule import edit_data while True: token = control.get() edit_data(data) control.put(token) """) threads = [threading.Thread(target=worker) for _ in range(numworkers)] for t in threads: t.start() token = 'football' control.put(token) while True: control.get() if not check_data(data): break control.put(token) Exceptions ---------- * ``ExecFailure`` Raised from ``Interpreter.exec_sync()`` when there's an uncaught exception. The error display for this exception includes the traceback of the uncaught exception, which gets shown after the normal error display, much like happens for ``ExceptionGroup``. Attributes: * ``type`` - a representation of the original exception's class, with ``__name__``, ``__module__``, and ``__qualname__`` attrs. * ``msg`` - ``str(exc)`` of the original exception * ``snapshot`` - a ``traceback.TracebackException`` object for the original exception This exception is a subclass of ``RuntimeError``. * ``QueueEmpty`` Raised from ``Queue.get()`` (or ``get_nowait()`` with no default) when the queue is empty. This exception is a subclass of ``queue.Empty``. * ``QueueFull`` Raised from ``Queue.put()`` (with a timeout) or ``put_nowait()`` when the queue is already at its max size. This exception is a subclass of ``queue.Full``. InterpreterPoolExecutor ----------------------- Along with the new ``interpreters`` module, there will be a new ``concurrent.futures.InterpreterPoolExecutor``. Each worker executes in its own thread with its own subinterpreter. Communication may still be done through ``Queue`` objects, set with the initializer. Examples -------- The following examples demonstrate practical cases where multiple interpreters may be useful. Example 1: There's a stream of requests coming in that will be handled via workers in sub-threads. * each worker thread has its own interpreter * there's one queue to send tasks to workers and another queue to return results * the results are handled in a dedicated thread * each worker keeps going until it receives a "stop" sentinel (``None``) * the results handler keeps going until all workers have stopped :: import interpreters from mymodule import iter_requests, handle_result tasks = interpreters.create_queue() results = interpreters.create_queue() numworkers = 20 threads = [] def results_handler(): running = numworkers while running: try: res = results.get(timeout=0.1) except interpreters.QueueEmpty: # No workers have finished a request since last time. pass else: if res is None: # A worker has stopped. running -= 1 else: handle_result(res) empty = object() assert results.get_nowait(empty) is empty threads.append(threading.Thread(target=results_handler)) def worker(): interp = interpreters.create() interp.prepare_main(tasks=tasks, results=results) interp.exec_sync("""if True: from mymodule import handle_request, capture_exception while True: req = tasks.get() if req is None: # Stop! break try: res = handle_request(req) except Exception as exc: res = capture_exception(exc) results.put(res) # Notify the results handler. results.put(None) """) threads.extend(threading.Thread(target=worker) for _ in range(numworkers)) for t in threads: t.start() for req in iter_requests(): tasks.put(req) # Send the "stop" signal. for _ in range(numworkers): tasks.put(None) for t in threads: t.join() Example 2: This case is similar to the last as there are a bunch of workers in sub-threads. However, this time the code is chunking up a big array of data, where each worker processes one chunk at a time. Copying that data to each interpreter would be exceptionally inefficient, so the code takes advantage of directly sharing ``memoryview`` buffers. * all the interpreters share the buffer of the source array * each one writes its results to a second shared buffer * there's use a queue to send tasks to workers * only one worker will ever read any given index in the source array * only one worker will ever write to any given index in the results (this is how it ensures thread-safety) :: import interpreters import queue from mymodule import read_large_data_set, use_results numworkers = 3 data, chunksize = read_large_data_set() buf = memoryview(data) numchunks = (len(buf) + 1) / chunksize results = memoryview(b'\0' * numchunks) tasks = interpreters.create_queue() def worker(id): interp = interpreters.create() interp.prepare_main(data=buf, results=results, tasks=tasks) interp.exec_sync("""if True: from mymodule import reduce_chunk while True: req = tasks.get() if res is None: # Stop! break resindex, start, end = req chunk = data[start: end] res = reduce_chunk(chunk) results[resindex] = res """) threads = [threading.Thread(target=worker) for _ in range(numworkers)] for t in threads: t.start() for i in range(numchunks): # Assume there's at least one worker running still. start = i * chunksize end = start + chunksize if end > len(buf): end = len(buf) tasks.put((start, end, i)) # Send the "stop" signal. for _ in range(numworkers): tasks.put(None) for t in threads: t.join() use_results(results) Rejected Ideas ============== See :pep:`PEP 554 <554#rejected-ideas>`. Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.