diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 3bb0daa98..ccd3a4dc0 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -232,7 +232,7 @@ repos: language: pygrep entry: '(dev/peps|peps\.python\.org)/pep-\d+' files: '^pep-\d+\.(rst|txt)$' - exclude: '^pep-(0009|0287|0676|8001)\.(rst|txt)$' + exclude: '^pep-(0009|0287|0676|0684|8001)\.(rst|txt)$' types: [text] - id: check-direct-rfc-links diff --git a/pep-0684.rst b/pep-0684.rst index a9d8368fc..22ff10db2 100644 --- a/pep-0684.rst +++ b/pep-0684.rst @@ -5,14 +5,12 @@ Discussions-To: https://mail.python.org/archives/list/python-dev@python.org/thre Status: Draft Type: Standards Track Content-Type: text/x-rst +Requires: 683 Created: 08-Mar-2022 Python-Version: 3.12 Post-History: `08-Mar-2022 `__ Resolution: -.. XXX Split out an informational PEP with all the relevant info, - based on the "Consolidating Runtime Global State" section? - Abstract ======== @@ -130,13 +128,19 @@ make those tasks worth doing anyway: * led to structural layering of the C-API (e.g. ``Include/internal``) * also see `Benefits to Consolidation`_ below +.. XXX Add links to example GitHub issues? + Furthermore, much of that work benefits other CPython-related projects: -* performance improvements ("faster-cpython") -* pre-fork application deployment (e.g. Instagram) +* performance improvements ("`faster-cpython`_") +* pre-fork application deployment (e.g. `Instagram server`_) * extension module isolation (see :pep:`630`, etc.) * embedding CPython +.. _faster-cpython: https://github.com/faster-cpython/ideas + +.. _Instagram server: https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf + Existing Use of Multiple Interpreters ------------------------------------- @@ -155,8 +159,8 @@ Here are some of the public projects using the feature currently: Note that, with :pep:`554`, multiple interpreter usage would likely grow significantly (via Python code rather than the C-API). 
-PEP 554 -------- +PEP 554 (Multiple Interpreters in the Stdlib) +--------------------------------------------- :pep:`554` is strictly about providing a minimal stdlib module to give users access to multiple interpreters from Python code. @@ -173,21 +177,32 @@ for multi-core Python were explored, but each had its drawbacks without simple solutions: * the existing practice of releasing the GIL in extension modules + * doesn't help with Python code + * other Python implementations (e.g. Jython, IronPython) + * CPython dominates the community + * remove the GIL (e.g. gilectomy, "no-gil") + * too much technical risk (at the time) + * Trent Nelson's "PyParallel" project + * incomplete; Windows-only at the time + * ``multiprocessing`` * too much work to make it effective enough; high penalties in some situations (at large scale, Windows) * other parallelism tools (e.g. dask, ray, MPI) + * not a fit for the stdlib + * give up on multi-core (e.g. async, do nothing) + * this can only end in tears Even in 2014, it was fairly clear that a solution using isolated @@ -207,9 +222,9 @@ following changes, in the order they must happen: 2. move nearly all of the state down into ``PyInterpreterState`` 3. finally, move the GIL down into ``PyInterpreterState`` 4. 
everything else + * add to the public C-API * implement restrictions in ``ExtensionFileLoader`` - * work with popular extension maintainers to help with multi-interpreter support @@ -220,50 +235,207 @@ The following runtime state will be moved to ``PyInterpreterState``: * all global objects that are not safely shareable (fully immutable) * the GIL -* mutable, currently protected by the GIL -* mutable, currently protected by some other per-interpreter lock -* mutable, may be used independently in different interpreters -* all other mutable (or effectively mutable) state - not otherwise excluded below +* most mutable data that's currently protected by the GIL +* mutable data that's currently protected by some other per-interpreter lock +* mutable data that may be used independently in different interpreters + (also applies to extension modules, including those with multi-phase init) +* all other mutable data not otherwise excluded below -Furthermore, a number of parts of the global state have already been -moved to the interpreter, such as GC, warnings, and atexit hooks. +Furthermore, a portion of the full global state has already been +moved to the interpreter, including GC, warnings, and atexit hooks. -The following state will not be moved: +The following runtime state will not be moved: * global objects that are safely shareable, if any -* immutable, often ``const`` -* treated as immutable -* related to CPython's ``main()`` execution -* related to the REPL -* set during runtime init, then treated as immutable -* mutable, protected by some global lock -* mutable, atomic +* immutable data, often ``const`` +* effectively immutable data (treated as immutable), for example: -Note that currently the allocators (see ``Objects/obmalloc.c``) are shared -between all interpreters, protected by the GIL. They will need to move -to each interpreter (or a global lock will be needed). 
This is the -highest risk part of the work to isolate interpreters and may require -more than just moving fields down from ``_PyRuntimeState``. Some of -the complexity is reduced if CPython switches to a thread-safe -allocator like mimalloc. + * some state is initialized early and never modified again + * hashes for strings (``PyUnicodeObject``) are idempotently calculated + when first needed and then cached + +* all data that is guaranteed to be modified exclusively in the main thread, + including: + + * state used only in CPython's ``main()`` + * the REPL's state + * data modified only during runtime init (effectively immutable afterward) + +* mutable data that's protected by some global lock (other than the GIL) +* global state in atomic variables +* mutable global state that can be changed (sensibly) to atomic variables + +Memory Allocators +''''''''''''''''' + +This is the highest risk part of the work to isolate interpreters +and may require more than just moving fields down +from ``_PyRuntimeState``. + +CPython provides a memory management C-API, with `three allocator domains`_: +"raw", "mem", and "object". Each provides the equivalent of ``malloc()``, +``calloc()``, ``realloc()``, and ``free()``. A custom allocator for each +domain can be set during runtime initialization and the current allocator +can be wrapped with a hook using the same API (for example, the stdlib +tracemalloc module). The allocators are currently runtime-global, +shared by all interpreters. + +.. _three allocator domains: https://docs.python.org/3/c-api/memory.html#allocator-domains + +The "raw" allocator is expected to be thread-safe and defaults to glibc's +allocator (``malloc()``, etc.). However, the "mem" and "object" allocators +are not expected to be thread-safe and currently may rely on the GIL for +thread-safety. This is partly because the default allocator for both, +AKA "pyobject", `is not thread-safe`_. 
This is due to how all state for +that allocator is stored in C global variables. +(See ``Objects/obmalloc.c``.) + +.. _is not thread-safe: https://peps.python.org/pep-0445/#gil-free-pymem-malloc + +Thus we come back to the question of isolating runtime state. In order +for interpreters to stop sharing the GIL, allocator thread-safety +must be addressed. If interpreters continue sharing the allocators +then we need some other way to get thread-safety. Otherwise interpreters +must stop sharing the allocators. In both cases there are a number of +possible solutions, each with potential downsides. + +To keep sharing the allocators, the simplest solution is to use +a granular runtime-global lock around the calls to the "mem" and "object" +allocators in ``PyMem_Malloc()``, ``PyObject_Malloc()``, etc. This would +impact performance, but there are some ways to mitigate that (e.g. only +start locking once the first subinterpreter is created). + +Another way to keep sharing the allocators is to require that the "mem" +and "object" allocators be thread-safe. This would mean we'd have to +make the pyobject allocator implementation thread-safe. That could +even involve re-implementing it using an extensible allocator like +mimalloc. The potential downside is in the cost to re-implement +the allocator and the risk of defects inherent to such an endeavor. + +Regardless, a switch to requiring thread-safe allocators would impact +anyone that embeds CPython and currently sets a thread-unsafe allocator. +We'd need to consider who might be affected and how we reduce any +negative impact (e.g. add a basic C-API to help make an allocator +thread-safe). + +If we did stop sharing the allocators between interpreters, we'd have +to do so only for the "mem" and "object" allocators. We might also need +to keep a full set of global allocators for certain runtime-level usage. 
+There would be some performance penalty due to looking up the current +interpreter and then pointer indirection to get the allocators. +Embedders would also likely have to provide a new allocator context +for each interpreter. On the plus side, allocator hooks (e.g. tracemalloc) +would not be affected. + +This is an open issue for which this proposal has not settled +on a solution. .. _proposed capi: C-API ----- -The following private API will be made public: +Internally, the interpreter state will now track how the import system +should handle extension modules which do not support use with multiple +interpreters. See `Restricting Extension Modules`_ below. We'll refer +to that setting here as "PyInterpreterState.strict_extensions". -* ``_PyInterpreterConfig`` -* ``_Py_NewInterpreter()`` (as ``Py_NewInterpreterEx()``) +The following public API will be added: -The following fields will be added to ``PyInterpreterConfig``: +* ``PyInterpreterConfig`` (struct) +* ``PyInterpreterConfig_LEGACY_INIT`` (macro) +* ``PyInterpreterConfig_INIT`` (macro) +* ``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)`` +* ``bool PyInterpreterState_GetStrictExtensions(PyInterpreterState *)`` +* ``void PyInterpreterState_SetStrictExtensions(PyInterpreterState *, bool)`` -* ``own_gil`` - (bool) create a new interpreter lock - (instead of sharing with the main interpreter) -* ``strict_extensions`` - fail import in this interpreter for - incompatible extensions (see `Restricting Extension Modules`_) +A note about the "main" interpreter: + +Below, we mention the "main" interpreter several times. This refers +to the interpreter created during runtime initialization, for which +the initial ``PyThreadState`` corresponds to the process's main thread. +It has a number of unique responsibilities (e.g. handling signals), +as well as a special role during runtime initialization/finalization. +It is also usually (for now) the only interpreter. 
+(Also see https://docs.python.org/3/c-api/init.html#sub-interpreter-support.) + +PyInterpreterConfig +''''''''''''''''''' + +This is a struct with 4 bool fields, effectively:: + + typedef struct { + /* Allow forking the process. */ + unsigned int allow_fork_without_exec; + /* Allow daemon threads. */ + unsigned int allow_daemon_threads; + /* Use a unique "global" interpreter lock. + Otherwise, use the main interpreter's GIL. */ + unsigned int own_gil; + /* Only allow extension modules that support + use in multiple interpreters. */ + unsigned int strict_extensions; + } PyInterpreterConfig; + +The first two fields are essentially derived from the existing +``PyConfig._isolated_interpreter`` field. + +``PyInterpreterConfig.strict_extensions`` is basically the initial +value used for "PyInterpreterState.strict_extensions". + +We may add other fields, as needed, over time +(e.g. possibly "allow_subprocess", "allow_threading", "own_initial_thread"). + +Note that a similar ``_PyInterpreterConfig`` may already exist internally, +with similar fields. +(See `issue #91120 `__ +and `PR #31771 `__.) +If it does exist then ``PyInterpreterConfig`` would replace it. + +PyInterpreterConfig.own_gil +''''''''''''''''''''''''''' + +If ``true`` then the new interpreter will have its own "global" +interpreter lock. This means the new interpreter can run without +getting interrupted by other interpreters. This effectively unblocks +full use of multiple cores. That is the fundamental goal of this PEP. + +If ``false`` then the new interpreter will use the main interpreter's +lock. This is the legacy (pre-3.12) behavior in CPython, where all +interpreters share a single GIL. Sharing the GIL like this may be +desirable when using extension modules that still depend on +the GIL for thread safety. 
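The effect of ``own_gil`` described above can be sketched in plain C. This is only an illustration under stated assumptions: the struct is a local copy of the one shown in the text rather than anything included from ``Python.h``, and ``make_config()``, ``shares_main_gil()``, and ``describe_config()`` are hypothetical helpers invented for this sketch, not part of the proposed API.

```c
#include <assert.h>
#include <stdio.h>

/* Local copy of the struct shown above, declared here so the sketch
   compiles without Python.h (the proposed API may not be available). */
typedef struct {
    unsigned int allow_fork_without_exec;
    unsigned int allow_daemon_threads;
    unsigned int own_gil;
    unsigned int strict_extensions;
} PyInterpreterConfig;

/* Hypothetical helper: build a config that differs from an all-zero
   config only in its own_gil flag. */
static PyInterpreterConfig
make_config(unsigned int own_gil)
{
    PyInterpreterConfig config = {0, 0, 0, 0};
    config.own_gil = own_gil ? 1u : 0u;
    return config;
}

/* Hypothetical helper: nonzero if an interpreter created with this
   config would use the main interpreter's GIL (the legacy behavior). */
static int
shares_main_gil(const PyInterpreterConfig *config)
{
    assert(config != NULL);
    return config->own_gil == 0;
}

/* Hypothetical helper: print which lock the config selects. */
static void
describe_config(const PyInterpreterConfig *config)
{
    printf("own_gil=%u -> %s\n", config->own_gil,
           shares_main_gil(config) ? "shares the main interpreter's GIL"
                                   : "has its own GIL");
}
```

Calling ``describe_config()`` on ``make_config(1)`` and ``make_config(0)`` prints one line per config, naming the lock each one selects. In the proposal itself the config would instead be passed to ``Py_NewInterpreterEx()``.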
+ +PyInterpreterConfig Initializer Macros +'''''''''''''''''''''''''''''''''''''' + +``#define PyInterpreterConfig_LEGACY_INIT {1, 1, 0, 0}`` + +This initializer matches the behavior of ``Py_NewInterpreter()``. +The main interpreter uses this. + +``#define PyInterpreterConfig_INIT {0, 0, 1, 1}`` + +This initializer would be used to get an isolated interpreter that +also avoids subinterpreter-unfriendly features. It would be the default +for interpreters created through :pep:`554`. Fork (without exec) would +be disabled by default due to the general problems of mixing threads +with fork, coupled with the role of the main interpreter in the runtime +lifecycle. Daemon threads would be disabled due to their poor interaction +with interpreter finalization. + +New API Functions +''''''''''''''''' + +``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)`` + +This is like ``Py_NewInterpreter()`` but initializes using the granular +config. It will replace the "private" func ``_Py_NewInterpreter()``. + +``bool PyInterpreterState_GetStrictExtensions(PyInterpreterState *)`` +``void PyInterpreterState_SetStrictExtensions(PyInterpreterState *, bool)`` + +Respectively, these get/set "PyInterpreterState.strict_extensions". Restricting Extension Modules ----------------------------- @@ -273,11 +445,24 @@ state is stored in global variables. :pep:`630` covers all the details of what extensions must do to support isolation, and thus safely run in multiple interpreters at once. This includes dealing with their globals. -Extension modules that do not implement isolation will only run in -the main interpreter. In all other interpreters, the import will -raise ``ImportError``. This will be done through +If an extension implements multi-phase init (see :pep:`489`) it is +considered compatible with multiple interpreters. All other extensions +are considered incompatible. 
This position is based on the premise that +if a module supports use with multiple interpreters then it necessarily +will work even if interpreters do not share the GIL. This position is +still the subject of debate. + +If an incompatible extension is imported and the current +"PyInterpreterState.strict_extensions" value is ``true`` then the import +system will raise ``ImportError``. (For ``false`` it simply doesn't check.) +This will be done through ``importlib._bootstrap_external.ExtensionFileLoader``. +Such imports will never fail in the main interpreter (or in interpreters +created through ``Py_NewInterpreter()``) since +"PyInterpreterState.strict_extensions" initializes to ``false`` in both +cases. Thus the legacy (pre-3.12) behavior is preserved. + We will work with popular extensions to help them support use in multiple interpreters. This may involve adding to CPython's public C-API, which we will address on a case-by-case basis. @@ -293,7 +478,9 @@ of frustration. We will address this by adding a context manager to temporarily disable the check on multiple interpreter support: -``importlib.util.allow_all_extensions()``. +``importlib.util.allow_all_extensions()``. More or less, it will modify +the current "PyInterpreterState.strict_extensions" value (e.g. through +a private ``sys`` function). Documentation ------------- @@ -416,6 +603,9 @@ That said, if it were taught then it would boil down to the following: to create an interpreter. The config you pass it indicates how you want that interpreter to behave. +.. XXX We should add docs (a la PEP 630) that spell out how to make + an extension compatible with per-interpreter GIL. + Reference Implementation ======================== @@ -426,8 +616,17 @@ Reference Implementation Open Issues =========== -* What are the risks/hurdles involved with moving the allocators? -* Is ``allow_all_extensions`` the best name for the context manager? +* What to do about the allocators? 
+* How would a per-interpreter tracemalloc module relate to global allocators? +* Would the faulthandler module be limited to the main interpreter + (like the signal module) or would we leak that global state between + interpreters (protected by a granular lock)? +* Split out an informational PEP with all the relevant info, + based on the "Consolidating Runtime Global State" section? +* Does supporting multiple interpreters automatically mean an extension + supports a per-interpreter GIL? +* What would be a better (scarier-sounding) name + for ``allow_all_extensions``? Deferred Functionality @@ -500,22 +699,12 @@ Consolidating the globals has a variety of benefits: * greatly reduces the number of C globals (best practice for C code) * the move draws attention to runtime state that is unstable or broken * encourages more consistency in how runtime state is used -* makes multiple-interpreter behavior more reliable -* leads to fixes for long-standing runtime bugs that otherwise - haven't been prioritized -* exposes (and inspires fixes for) previously unknown runtime bugs -* facilitates cleaner runtime initialization and finalization * makes it easier to discover/identify CPython's runtime state * makes it easier to statically allocate runtime state in a consistent way * better memory locality for runtime state -* structural layering of the C-API (e.g. ``Include/internal``) -Furthermore, much of that work benefits other CPython-related projects: - -* performance improvements ("faster-cpython") -* pre-fork application deployment (e.g. Instagram) -* extension module isolation (see :pep:`630`, etc.) -* embedding CPython +Furthermore all the benefits listed in `Indirect Benefits`_ above also +apply here, and the same projects listed there benefit. Scale of Work ''''''''''''' @@ -532,12 +721,15 @@ State To Be Moved The remaining global variables can be categorized as follows: * global objects + * static types (incl. exception types) * non-static types (incl. 
heap types, structseq types) * singletons (static) * singletons (initialized once) * cached objects + * non-objects + * will not (or unlikely to) change after init * only used in the main thread * initialized lazily @@ -582,8 +774,10 @@ globals and reason about them. * ``Tools/c-analyzer/cpython/globals-to-fix.tsv`` - the list of remaining globals * ``Tools/c-analyzer/c-analyzer.py`` + * ``analyze`` - identify all the globals * ``check`` - fail if there are any unsupported globals that aren't ignored + * ``Tools/c-analyzer/table-file.py`` - summarize the known globals Also, the check for unsupported globals is incorporated into CI so that @@ -616,15 +810,15 @@ References Related: -* :pep:`384` -* :pep:`432` -* :pep:`489` -* :pep:`554` -* :pep:`573` -* :pep:`587` -* :pep:`630` -* :pep:`683` -* :pep:`3121` +* :pep:`384` "Defining a Stable ABI" +* :pep:`432` "Restructuring the CPython startup sequence" +* :pep:`489` "Multi-phase extension module initialization" +* :pep:`554` "Multiple Interpreters in the Stdlib" +* :pep:`573` "Module State Access from C Extension Methods" +* :pep:`587` "Python Initialization Configuration" +* :pep:`630` "Isolating Extension Modules" +* :pep:`683` "Immortal Objects, Using a Fixed Refcount" +* :pep:`3121` "Extension Module Initialization and Finalization" Copyright