PEP: 684
Title: A Per-Interpreter GIL
Author: Eric Snow
Discussions-To: https://discuss.python.org/t/pep-684-a-per-interpreter-gil/19583
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 683
Created: 08-Mar-2022
Python-Version: 3.12
Post-History: `08-Mar-2022 `__,
              `29-Sep-2022 `__,
Resolution:


Abstract
========

Since Python 1.5 (1997), CPython users can run multiple interpreters
in the same process. However, interpreters in the same process
have always shared a significant amount of global state. This is
a source of bugs, with a growing impact as more and more people
use the feature. Furthermore, sufficient isolation would facilitate
true multi-core parallelism, where interpreters no longer share
the GIL. The changes outlined in this proposal will result
in that level of interpreter isolation.


High-Level Summary
==================

At a high level, this proposal changes CPython in the following ways:

* stops sharing the GIL between interpreters, given sufficient isolation
* adds several new interpreter config options for isolation settings
* adds some public C-API for fine-grained control when creating interpreters
* keeps incompatible extensions from causing problems

The GIL
-------

The GIL protects concurrent access to most of CPython's runtime state.
So all that GIL-protected global state must move to each interpreter
before the GIL can. (In a handful of cases, other mechanisms can be
used to ensure thread-safe sharing instead, such as locks or "immortal"
objects.)

CPython Runtime State
---------------------

Properly isolating interpreters requires that most of CPython's
runtime state be stored in the ``PyInterpreterState`` struct. Currently,
only a portion of it is; the rest is found either in global variables
or in ``_PyRuntimeState``. Most of that will have to be moved.

This directly coincides with an ongoing effort (of many years) to greatly
reduce internal use of C global variables and consolidate the runtime
state into ``_PyRuntimeState`` and ``PyInterpreterState``.
(See `Consolidating Runtime Global State`_ below.) That project has
`significant merit on its own <Benefits to Consolidation_>`_
and has faced little controversy. So, while a per-interpreter GIL
relies on the completion of that effort, that project should not be
considered a part of this proposal--only a dependency.

Other Isolation Considerations
------------------------------

CPython's interpreters must be strictly isolated from each other, with
few exceptions. To a large extent they already are. Each interpreter
has its own copy of all modules, classes, functions, and variables.
The CPython C-API docs `explain further <caveats_>`_.

.. _caveats: https://docs.python.org/3/c-api/init.html#bugs-and-caveats

However, aside from what has already been mentioned (e.g. the GIL),
there are a couple of ways in which interpreters still share some state.

First of all, some process-global resources (e.g. memory,
file descriptors, environment variables) are shared. There are no
plans to change this.

Second, some isolation is faulty due to bugs or implementations that
did not take multiple interpreters into account. This includes
CPython's runtime and the stdlib, as well as extension modules that
rely on global variables. Bugs should be opened in these cases,
as some already have been.

Depending on Immortal Objects
-----------------------------

:pep:`683` introduces immortal objects as a CPython-internal feature.
With immortal objects, we can share any otherwise immutable global
objects between all interpreters. Consequently, this PEP does not
need to address how to deal with the various objects
`exposed in the public C-API <capi objects_>`_.
It also simplifies the question of what to do about the builtin
static types. (See `Global Objects`_ below.)

Both issues have alternate solutions, but everything is simpler with
immortal objects. If PEP 683 is not accepted then this one will be
updated with the alternatives. This lets us reduce noise in this
proposal.


Motivation
==========

The fundamental problem we're solving here is a lack of true multi-core
parallelism (for Python code) in the CPython runtime. The GIL is the
cause. While it usually isn't a problem in practice, at the very least
it makes Python's multi-core story murky, which makes the GIL
a consistent distraction.

Isolated interpreters are also an effective mechanism to support
certain concurrency models. :pep:`554` discusses this in more detail.

Indirect Benefits
-----------------

Most of the effort needed for a per-interpreter GIL has benefits that
make those tasks worth doing anyway:

* makes multiple-interpreter behavior more reliable
* has led to fixes for long-standing runtime bugs that otherwise
  hadn't been prioritized
* has been exposing (and inspiring fixes for) previously unknown
  runtime bugs
* has driven cleaner runtime initialization (:pep:`432`, :pep:`587`)
* has driven cleaner and more complete runtime finalization
* led to structural layering of the C-API (e.g. ``Include/internal``)
* also see `Benefits to Consolidation`_ below

.. XXX Add links to example GitHub issues?

Furthermore, much of that work benefits other CPython-related projects:

* performance improvements ("`faster-cpython`_")
* pre-fork application deployment (e.g. `Instagram server`_)
* extension module isolation (see :pep:`630`, etc.)
* embedding CPython

.. _faster-cpython: https://github.com/faster-cpython/ideas

.. _Instagram server: https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf

Existing Use of Multiple Interpreters
-------------------------------------

The C-API for multiple interpreters has been used for many years.
However, until relatively recently the feature wasn't widely known,
nor extensively used (with the exception of mod_wsgi).

In the last few years use of multiple interpreters has been increasing.
Here are some of the public projects using the feature currently:

* `mod_wsgi `_
* `OpenStack Ceph `_
* `JEP `_
* `Kodi `_

Note that, with :pep:`554`, multiple interpreter usage would likely
grow significantly (via Python code rather than the C-API).

PEP 554 (Multiple Interpreters in the Stdlib)
---------------------------------------------

:pep:`554` is strictly about providing a minimal stdlib module
to give users access to multiple interpreters from Python code.
In fact, it specifically avoids proposing any changes related to
the GIL. Consider, however, that users of that module would benefit
from a per-interpreter GIL, which makes PEP 554 more appealing.


Rationale
=========

During initial investigations in 2014, a variety of possible solutions
for multi-core Python were explored, but each had its drawbacks
without simple solutions:

* the existing practice of releasing the GIL in extension modules

  * doesn't help with Python code

* other Python implementations (e.g. Jython, IronPython)

  * CPython dominates the community

* remove the GIL (e.g. gilectomy, "no-gil")

  * too much technical risk (at the time)

* Trent Nelson's "PyParallel" project

  * incomplete; Windows-only at the time

* ``multiprocessing``

  * too much work to make it effective enough;
    high penalties in some situations (at large scale, Windows)

* other parallelism tools (e.g. dask, ray, MPI)

  * not a fit for the stdlib

* give up on multi-core (e.g. async, do nothing)

  * this can only end in tears

Even in 2014, it was fairly clear that a solution using isolated
interpreters did not have a high level of technical risk and that
most of the work was worth doing anyway.
(The downside was the volume of work to be done.)


Specification
=============

As `summarized above <High-Level Summary_>`__, this proposal involves
the following changes, in the order they must happen:

1. `consolidate global runtime state <Consolidating Runtime Global State_>`_
   (including objects) into ``_PyRuntimeState``
2. move nearly all of the state down into ``PyInterpreterState``
3. finally, move the GIL down into ``PyInterpreterState``
4. everything else

   * add to the public C-API
   * implement restrictions in ``ExtensionFileLoader``
   * work with popular extension maintainers to help
     with multi-interpreter support

Per-Interpreter State
---------------------

The following runtime state will be moved to ``PyInterpreterState``:

* all global objects that are not safely shareable (fully immutable)
* the GIL
* most mutable data that's currently protected by the GIL
* mutable data that's currently protected by some other per-interpreter lock
* mutable data that may be used independently in different interpreters
  (also applies to extension modules, including those with multi-phase init)
* all other mutable data not otherwise excluded below

Furthermore, a portion of the full global state has already been
moved to the interpreter, including GC, warnings, and atexit hooks.

The following runtime state will not be moved:

* global objects that are safely shareable, if any
* immutable data, often ``const``
* effectively immutable data (treated as immutable), for example:

  * some state is initialized early and never modified again
  * hashes for strings (``PyUnicodeObject``) are idempotently calculated
    when first needed and then cached

* all data that is guaranteed to be modified exclusively in the main thread,
  including:

  * state used only in CPython's ``main()``
  * the REPL's state
  * data modified only during runtime init (effectively immutable afterward)

* mutable data that's protected by some global lock (other than the GIL)
* global state in atomic variables
* mutable global state that can be changed (sensibly) to atomic variables

Memory Allocators
'''''''''''''''''

This is the highest risk part of the work to isolate interpreters
and may require more than just moving fields down
from ``_PyRuntimeState``.

CPython provides a memory management C-API, with `three allocator domains`_:
"raw", "mem", and "object". Each provides the equivalent of ``malloc()``,
``calloc()``, ``realloc()``, and ``free()``. A custom allocator for each
domain can be set during runtime initialization and the current allocator
can be wrapped with a hook using the same API (for example, the stdlib
tracemalloc module). The allocators are currently runtime-global,
shared by all interpreters.

.. _three allocator domains: https://docs.python.org/3/c-api/memory.html#allocator-domains

The "raw" allocator is expected to be thread-safe and defaults to glibc's
allocator (``malloc()``, etc.). However, the "mem" and "object" allocators
are not expected to be thread-safe and currently may rely on the GIL for
thread-safety. This is partly because the default allocator for both,
AKA "pyobject", `is not thread-safe`_. This is due to how all state for
that allocator is stored in C global variables.
(See ``Objects/obmalloc.c``.)

.. _is not thread-safe: https://peps.python.org/pep-0445/#gil-free-pymem-malloc

Thus we come back to the question of isolating runtime state. In order
for interpreters to stop sharing the GIL, allocator thread-safety
must be addressed. If interpreters continue sharing the allocators
then we need some other way to get thread-safety. Otherwise interpreters
must stop sharing the allocators. In both cases there are a number of
possible solutions, each with potential downsides.

To keep sharing the allocators, the simplest solution is to use
a granular runtime-global lock around the calls to the "mem" and "object"
allocators in ``PyMem_Malloc()``, ``PyObject_Malloc()``, etc. This would
impact performance, but there are some ways to mitigate that (e.g. only
start locking once the first subinterpreter is created).

Another way to keep sharing the allocators is to require that the "mem"
and "object" allocators be thread-safe. This would mean we'd have to
make the pyobject allocator implementation thread-safe. That could
even involve re-implementing it using an extensible allocator like
mimalloc. The potential downside is in the cost to re-implement
the allocator and the risk of defects inherent to such an endeavor.

Regardless, a switch to requiring thread-safe allocators would impact
anyone that embeds CPython and currently sets a thread-unsafe allocator.
We'd need to consider who might be affected and how we reduce any
negative impact (e.g. add a basic C-API to help make an allocator
thread-safe).

If we did stop sharing the allocators between interpreters, we'd have
to do so only for the "mem" and "object" allocators. We might also need
to keep a full set of global allocators for certain runtime-level usage.
There would be some performance penalty due to looking up the current
interpreter and then pointer indirection to get the allocators.
Embedders would also likely have to provide a new allocator context
for each interpreter. On the plus side, allocator hooks (e.g.
tracemalloc) would not be affected.

This is an open issue for which this proposal has not settled
on a solution.

.. _proposed capi:

C-API
-----

Internally, the interpreter state will now track how the import system
should handle extension modules which do not support use with multiple
interpreters. See `Restricting Extension Modules`_ below. We'll refer
to that setting here as "PyInterpreterState.strict_extensions".

The following public API will be added:

* ``PyInterpreterConfig`` (struct)
* ``PyInterpreterConfig_LEGACY_INIT`` (macro)
* ``PyInterpreterConfig_INIT`` (macro)
* ``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)``
* ``bool PyInterpreterState_GetStrictExtensions(PyInterpreterState *)``
* ``void PyInterpreterState_SetStrictExtensions(PyInterpreterState *, bool)``

A note about the "main" interpreter:

Below, we mention the "main" interpreter several times. This refers
to the interpreter created during runtime initialization, for which
the initial ``PyThreadState`` corresponds to the process's main thread.
It has a number of unique responsibilities (e.g. handling signals),
as well as a special role during runtime initialization/finalization.
It is also usually (for now) the only interpreter.
(Also see https://docs.python.org/3/c-api/init.html#sub-interpreter-support.)

PyInterpreterConfig
'''''''''''''''''''

This is a struct with 4 bool fields, effectively::

    typedef struct {
        /* Allow forking the process. */
        unsigned int allow_fork_without_exec;
        /* Allow daemon threads. */
        unsigned int allow_daemon_threads;
        /* Use a unique "global" interpreter lock.
           Otherwise, use the main interpreter's GIL. */
        unsigned int own_gil;
        /* Only allow extension modules that support
           use in multiple interpreters. */
        unsigned int strict_extensions;
    } PyInterpreterConfig;

The first two fields are essentially derived from the existing
``PyConfig._isolated_interpreter`` field.

``PyInterpreterConfig.strict_extensions`` is basically the initial
value used for "PyInterpreterState.strict_extensions".

We may add other fields, as needed, over time (e.g. possibly
"allow_subprocess", "allow_threading", "own_initial_thread").

Note that a similar ``_PyInterpreterConfig`` may already exist internally,
with similar fields. (See `issue #91120 `__ and `PR #31771 `__.)
If it does exist then ``PyInterpreterConfig`` would replace it.

PyInterpreterConfig.own_gil
'''''''''''''''''''''''''''

If ``true`` then the new interpreter will have its own "global"
interpreter lock. This means the new interpreter can run without
getting interrupted by other interpreters. This effectively unblocks
full use of multiple cores. That is the fundamental goal of this PEP.

If ``false`` then the new interpreter will use the main interpreter's
lock. This is the legacy (pre-3.12) behavior in CPython, where all
interpreters share a single GIL. Sharing the GIL like this may be
desirable when using extension modules that still depend on the GIL
for thread safety.

PyInterpreterConfig Initializer Macros
''''''''''''''''''''''''''''''''''''''

``#define PyInterpreterConfig_LEGACY_INIT {1, 1, 0, 0}``

This initializer matches the behavior of ``Py_NewInterpreter()``.
The main interpreter uses this.

``#define PyInterpreterConfig_INIT {0, 0, 1, 1}``

This initializer would be used to get an isolated interpreter that
also avoids subinterpreter-unfriendly features. It would be the default
for interpreters created through :pep:`554`. Fork (without exec) would
be disabled by default due to the general problems of mixing threads
with fork, coupled with the role of the main interpreter in the runtime
lifecycle. Daemon threads would be disabled due to their poor
interaction with interpreter finalization.

New API Functions
'''''''''''''''''

``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)``

This is like ``Py_NewInterpreter()`` but initializes the new interpreter
using the granular config.
It will replace the "private" func ``_Py_NewInterpreter()``.

``bool PyInterpreterState_GetStrictExtensions(PyInterpreterState *)``
``void PyInterpreterState_SetStrictExtensions(PyInterpreterState *, bool)``

Respectively, these get/set "PyInterpreterState.strict_extensions".

Restricting Extension Modules
-----------------------------

Extension modules have many of the same problems as the runtime when
state is stored in global variables. :pep:`630` covers all the details
of what extensions must do to support isolation, and thus safely run in
multiple interpreters at once. This includes dealing with their globals.

If an extension implements multi-phase init (see :pep:`489`) it is
considered compatible with multiple interpreters. All other extensions
are considered incompatible. This position is based on the premise that
if a module supports use with multiple interpreters then it necessarily
will work even if interpreters do not share the GIL. This position is
still the subject of debate.

If an incompatible extension is imported and the current
"PyInterpreterState.strict_extensions" value is ``true`` then the import
system will raise ``ImportError``. (For ``false`` it simply doesn't check.)
This will be done through
``importlib._bootstrap_external.ExtensionFileLoader``.

Such imports will never fail in the main interpreter (or in interpreters
created through ``Py_NewInterpreter()``) since
"PyInterpreterState.strict_extensions" initializes to ``false`` in both
cases. Thus the legacy (pre-3.12) behavior is preserved.

We will work with popular extensions to help them support use in
multiple interpreters. This may involve adding to CPython's public C-API,
which we will address on a case-by-case basis.

Extension Module Compatibility
''''''''''''''''''''''''''''''

As noted in `Extension Modules`_, many extensions work fine in multiple
interpreters without needing any changes. The import system will still
fail if such a module doesn't explicitly indicate support.
At first, not many extension modules will, so this is a potential source
of frustration.

We will address this by adding a context manager to temporarily disable
the check on multiple interpreter support:
``importlib.util.allow_all_extensions()``. More or less, it will modify
the current "PyInterpreterState.strict_extensions" value (e.g. through
a private ``sys`` function).

Documentation
-------------

The "Sub-interpreter support" section of ``Doc/c-api/init.rst`` will be
updated with the added API.


Impact
======

Backwards Compatibility
-----------------------

No behavior or APIs are intended to change due to this proposal,
with one exception noted in `the next section <Extension Modules_>`_.
The existing C-API for managing interpreters will preserve its current
behavior, with new behavior exposed through new API. No other API
or runtime behavior is meant to change, including compatibility with
the stable ABI.

See `Objects Exposed in the C-API`_ below for related discussion.

Extension Modules
'''''''''''''''''

Currently the most common usage of Python, by far, is with the main
interpreter running by itself. This proposal has zero impact on
extension modules in that scenario. Likewise, for better or worse,
there is no change in behavior under multiple interpreters created
using the existing ``Py_NewInterpreter()``.

Keep in mind that some extensions already break when used in multiple
interpreters, due to keeping module state in global variables. They
may crash or, worse, experience inconsistent behavior. That was part
of the motivation for :pep:`630` and friends, so this is not a new
situation nor a consequence of this proposal.

In contrast, when the `proposed API <proposed capi_>`_ is used to
create multiple interpreters, the default behavior will change for
some extensions. In that case, importing an extension will fail
(outside the main interpreter) if it doesn't indicate support for
multiple interpreters. For extensions that already break in
multiple interpreters, this will be an improvement.
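
The import-time rule described above can be modeled in a few lines of
Python. This is only an illustrative sketch, not the proposed
implementation: ``InterpreterModel``, ``ExtensionModel``,
``check_extension()`` and ``allow_all_extensions_model()`` are
hypothetical names invented for this example, standing in for
"PyInterpreterState.strict_extensions", an extension's multi-phase init
status (:pep:`489`), the check in ``ExtensionFileLoader``, and the
proposed ``importlib.util.allow_all_extensions()`` respectively.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class InterpreterModel:
    """Hypothetical stand-in for the per-interpreter strict_extensions flag."""
    strict_extensions: bool


@dataclass
class ExtensionModel:
    """Hypothetical stand-in for an extension module."""
    name: str
    multi_phase_init: bool  # PEP 489 support is treated as the compatibility signal


def check_extension(interp: InterpreterModel, ext: ExtensionModel) -> str:
    """Return the module name, or raise ImportError under the strict setting."""
    if interp.strict_extensions and not ext.multi_phase_init:
        raise ImportError(
            f"extension {ext.name!r} does not support multiple interpreters"
        )
    # With strict_extensions false, nothing is checked at all.
    return ext.name


@contextmanager
def allow_all_extensions_model(interp: InterpreterModel):
    """Temporarily disable the check, then restore the previous value."""
    saved = interp.strict_extensions
    interp.strict_extensions = False
    try:
        yield
    finally:
        interp.strict_extensions = saved


# Models of the two initializers: {1, 1, 0, 0} and {0, 0, 1, 1}.
legacy = InterpreterModel(strict_extensions=False)
isolated = InterpreterModel(strict_extensions=True)
single_phase = ExtensionModel("spam", multi_phase_init=False)
multi_phase = ExtensionModel("eggs", multi_phase_init=True)
```

In this model, ``check_extension(isolated, single_phase)`` raises
``ImportError``, while the same call succeeds inside
``with allow_all_extensions_model(isolated):`` or when made against
``legacy``.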

Now we get to the break in compatibility mentioned above. Some
extensions are safe under multiple interpreters, even though they
haven't indicated that. Unfortunately, there is no reliable way for
the import system to infer that such an extension is safe, so
importing them will still fail. This case is addressed in
`Extension Module Compatibility`_ above.

Extension Module Maintainers
----------------------------

One related consideration is that a per-interpreter GIL will likely
drive increased use of multiple interpreters, particularly if :pep:`554`
is accepted. Some maintainers of large extension modules have expressed
concern about the increased burden they anticipate due to increased
use of multiple interpreters.

Specifically, enabling support for multiple interpreters will require
substantial work for some extension modules. To add that support,
the maintainer(s) of such a module (often volunteers) would have to
set aside their normal priorities and interests to focus on
compatibility (see :pep:`630`).

Of course, extension maintainers are free to not add support for use
in multiple interpreters. However, users will increasingly demand
such support, especially if the feature grows in popularity.

Either way, the situation can be stressful for maintainers of such
extensions, particularly when they are doing the work in their spare
time. The concerns they have expressed are understandable, and
`Restricting Extension Modules`_ above describes a partial solution.

Alternate Python Implementations
--------------------------------

Other Python implementations are not required to provide support for
multiple interpreters in the same process (though some do already).

Security Implications
---------------------

There is no known impact to security with this proposal.

Maintainability
---------------

On the one hand, this proposal has already motivated a number of
improvements that make CPython *more* maintainable. That is expected
to continue.

On the other hand, the underlying work has already exposed various
pre-existing defects in the runtime that have had to be fixed. That is
also expected to continue as multiple interpreters receive more use.
Otherwise, there shouldn't be a significant impact on maintainability,
so the net effect should be positive.

Performance
-----------

The work to consolidate globals has already provided a number of
improvements to CPython's performance, both speeding it up and using
less memory, and this should continue. Performance benefits to a
per-interpreter GIL have not been explored. At the very least, it is
not expected to make CPython slower (as long as interpreters are
sufficiently isolated).


How to Teach This
=================

This is an advanced feature for users of the C-API. There is no
expectation that this will be taught.

That said, if it were taught then it would boil down to the following:

    In addition to Py_NewInterpreter(), you can use Py_NewInterpreterEx()
    to create an interpreter. The config you pass it indicates how you
    want that interpreter to behave.

.. XXX We should add docs (a la PEP 630) that spell out how to make
   an extension compatible with per-interpreter GIL.


Reference Implementation
========================


Open Issues
===========

* What to do about the allocators?
* How would a per-interpreter tracemalloc module relate to global allocators?
* Would the faulthandler module be limited to the main interpreter
  (like the signal module) or would we leak that global state between
  interpreters (protected by a granular lock)?
* Split out an informational PEP with all the relevant info,
  based on the "Consolidating Runtime Global State" section?
* Does supporting multiple interpreters automatically mean an extension
  supports a per-interpreter GIL?
* What would be a better (scarier-sounding) name
  for ``allow_all_extensions``?


Deferred Functionality
======================

* ``PyInterpreterConfig`` option to always run the interpreter in a new thread
* ``PyInterpreterConfig`` option to assign a "main" thread to the interpreter
  and only run in that thread


Rejected Ideas
==============


Extra Context
=============

Sharing Global Objects
----------------------

We are sharing some global objects between interpreters.
This is an implementation detail and relates more to
`globals consolidation <Consolidating Runtime Global State_>`_
than to this proposal, but it is a significant enough detail
to explain here.

The alternative is to share no objects between interpreters, ever.
To accomplish that, we'd have to sort out the fate of all our static
types, as well as deal with compatibility issues for the many objects
`exposed in the public C-API <capi objects_>`_.

That approach introduces a meaningful amount of extra complexity
and higher risk, though prototyping has demonstrated valid solutions.
Also, it would likely result in a performance penalty.

`Immortal objects <Depending on Immortal Objects_>`_ allow us to
share the otherwise immutable global objects. That way we avoid
the extra costs.

.. _capi objects:

Objects Exposed in the C-API
''''''''''''''''''''''''''''

The C-API (including the limited API) exposes all the builtin types,
including the builtin exceptions, as well as the builtin singletons.
The exceptions are exposed as ``PyObject *`` but the rest are exposed
as the static values rather than pointers. This was one of the few
non-trivial problems we had to solve for per-interpreter GIL.

With immortal objects this is a non-issue.

Consolidating Runtime Global State
----------------------------------

As noted in `CPython Runtime State`_ above, there is an active effort
(separate from this PEP) to consolidate CPython's global state into the
``_PyRuntimeState`` struct. Nearly all the work involves moving that
state from global variables. The project is particularly relevant to
this proposal, so below is some extra detail.

Benefits to Consolidation
'''''''''''''''''''''''''

Consolidating the globals has a variety of benefits:

* greatly reduces the number of C globals (best practice for C code)
* the move draws attention to runtime state that is unstable or broken
* encourages more consistency in how runtime state is used
* makes it easier to discover/identify CPython's runtime state
* makes it easier to statically allocate runtime state in a consistent way
* better memory locality for runtime state

Furthermore all the benefits listed in `Indirect Benefits`_ above also
apply here, and the same projects listed there benefit.

Scale of Work
'''''''''''''

The number of global variables to be moved is large enough to matter,
but most are Python objects that can be dealt with in large groups
(like ``Py_IDENTIFIER``). In nearly all cases, moving these globals
to the interpreter is highly mechanical. That doesn't require
cleverness but instead requires someone to put in the time.

State To Be Moved
'''''''''''''''''

The remaining global variables can be categorized as follows:

* global objects

  * static types (incl. exception types)
  * non-static types (incl. heap types, structseq types)
  * singletons (static)
  * singletons (initialized once)
  * cached objects

* non-objects

  * will not (or unlikely to) change after init
  * only used in the main thread
  * initialized lazily
  * pre-allocated buffers
  * state

Those globals are spread between the core runtime, the builtin modules,
and the stdlib extension modules.

For a breakdown of the remaining globals, run:

.. code-block:: bash

    ./python Tools/c-analyzer/table-file.py Tools/c-analyzer/cpython/globals-to-fix.tsv

Already Completed Work
''''''''''''''''''''''

As mentioned, this work has been going on for many years.
Here are some of the things that have already been done:

* cleanup of runtime initialization (see :pep:`432` / :pep:`587`)
* extension module isolation machinery (see :pep:`384` / :pep:`3121` / :pep:`489`)
* isolation for many builtin modules
* isolation for many stdlib extension modules
* addition of ``_PyRuntimeState``
* no more ``_Py_IDENTIFIER()``
* statically allocated:

  * empty string
  * string literals
  * identifiers
  * latin-1 strings
  * length-1 bytes
  * empty tuple

Tooling
'''''''

As already indicated, there are several tools to help identify the
globals and reason about them.

* ``Tools/c-analyzer/cpython/globals-to-fix.tsv`` - the list of remaining globals
* ``Tools/c-analyzer/c-analyzer.py``

  * ``analyze`` - identify all the globals
  * ``check`` - fail if there are any unsupported globals that aren't ignored

* ``Tools/c-analyzer/table-file.py`` - summarize the known globals

Also, the check for unsupported globals is incorporated into CI so that
no new globals are accidentally added.

Global Objects
''''''''''''''

Global objects that are safe to be shared (without a GIL) between
interpreters can stay on ``_PyRuntimeState``. Not only must the object
be effectively immutable (e.g. singletons, strings), but not even the
refcount can change for it to be safe. Immortality (:pep:`683`)
provides that. (The alternative is that no objects are shared, which
adds significant complexity to the solution, particularly for the
objects `exposed in the public C-API <capi objects_>`_.)

Builtin static types are a special case of global objects that will be
shared. They are effectively immutable except for one part:
``__subclasses__`` (AKA ``tp_subclasses``). We expect that nothing
else on a builtin type will change, even the content
of ``__dict__`` (AKA ``tp_dict``).

``__subclasses__`` for the builtin types will be dealt with by making
it a getter that does a lookup on the current ``PyInterpreterState``
for that type.
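
As a rough illustration of that last point, the following Python model
keeps the shared type object free of any mutable subclass list and
stores the list on the interpreter instead. It is a sketch under stated
assumptions, not how CPython implements ``tp_subclasses``:
``InterpreterModel`` and ``SharedTypeModel`` are hypothetical classes
invented here, and the "current" interpreter is passed explicitly rather
than looked up from thread state.

```python
class InterpreterModel:
    """Hypothetical stand-in for PyInterpreterState."""

    def __init__(self, name):
        self.name = name
        # Per-interpreter registry: shared type -> names of subclasses
        # created in this interpreter.
        self.subclasses = {}


class SharedTypeModel:
    """Hypothetical stand-in for a shared builtin static type."""

    def __init__(self, name):
        # Nothing on the shared object changes after creation.
        self.name = name

    def register_subclass(self, interp, subclass_name):
        # The mutable part lives on the interpreter, not on the type.
        interp.subclasses.setdefault(self, []).append(subclass_name)

    def get_subclasses(self, interp):
        # The "getter": look the type up in the given interpreter's state.
        return list(interp.subclasses.get(self, []))


shared_int = SharedTypeModel("int")
interp_a = InterpreterModel("A")
interp_b = InterpreterModel("B")

# A subclass defined in interpreter A is visible only there.
shared_int.register_subclass(interp_a, "MyInt")
```

Here ``shared_int.get_subclasses(interp_a)`` reports ``MyInt`` while
``shared_int.get_subclasses(interp_b)`` is empty, even though both
interpreters use the same type object.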


References
==========

Related:

* :pep:`384` "Defining a Stable ABI"
* :pep:`432` "Restructuring the CPython startup sequence"
* :pep:`489` "Multi-phase extension module initialization"
* :pep:`554` "Multiple Interpreters in the Stdlib"
* :pep:`573` "Module State Access from C Extension Methods"
* :pep:`587` "Python Initialization Configuration"
* :pep:`630` "Isolating Extension Modules"
* :pep:`683` "Immortal Objects, Using a Fixed Refcount"
* :pep:`3121` "Extension Module Initialization and Finalization"


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.