PEP 684: Updates for Round 2 of Discussions (gh-2807)

Eric Snow 2022-10-28 19:28:38 -06:00 committed by GitHub
parent 37a5ea8fe6
commit 4a2e3e3059
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 180 additions and 129 deletions


@ -33,7 +33,6 @@ At a high level, this proposal changes CPython in the following ways:
* stops sharing the GIL between interpreters, given sufficient isolation * stops sharing the GIL between interpreters, given sufficient isolation
* adds several new interpreter config options for isolation settings * adds several new interpreter config options for isolation settings
* adds some public C-API for fine-grained control when creating interpreters
* keeps incompatible extensions from causing problems * keeps incompatible extensions from causing problems
The GIL The GIL
@ -51,11 +50,11 @@ CPython Runtime State
Properly isolating interpreters requires that most of CPython's Properly isolating interpreters requires that most of CPython's
runtime state be stored in the ``PyInterpreterState`` struct. Currently, runtime state be stored in the ``PyInterpreterState`` struct. Currently,
only a portion of it is; the rest is found either in global variables only a portion of it is; the rest is found either in C global variables
or in ``_PyRuntimeState``. Most of that will have to be moved. or in ``_PyRuntimeState``. Most of that will have to be moved.
This directly coincides with an ongoing effort (of many years) to greatly This directly coincides with an ongoing effort (of many years) to greatly
reduce internal use of C global variables and consolidate the runtime reduce internal use of global variables and consolidate the runtime
state into ``_PyRuntimeState`` and ``PyInterpreterState``. state into ``_PyRuntimeState`` and ``PyInterpreterState``.
(See `Consolidating Runtime Global State`_ below.) That project has (See `Consolidating Runtime Global State`_ below.) That project has
`significant merit on its own <Benefits to Consolidation_>`_ `significant merit on its own <Benefits to Consolidation_>`_
@ -201,7 +200,7 @@ without simple solutions:
* other parallelism tools (e.g. dask, ray, MPI) * other parallelism tools (e.g. dask, ray, MPI)
* not a fit for the stdlib * not a fit for the runtime/stdlib
* give up on multi-core (e.g. async, do nothing) * give up on multi-core (e.g. async, do nothing)
@ -225,8 +224,8 @@ following changes, in the order they must happen:
3. finally, move the GIL down into ``PyInterpreterState`` 3. finally, move the GIL down into ``PyInterpreterState``
4. everything else 4. everything else
* add to the public C-API * update the C-API
* implement restrictions in ``ExtensionFileLoader`` * implement extension module restrictions
* work with popular extension maintainers to help * work with popular extension maintainers to help
with multi-interpreter support with multi-interpreter support
@ -270,9 +269,11 @@ The following runtime state will not be moved:
Memory Allocators Memory Allocators
''''''''''''''''' '''''''''''''''''
This is the highest risk part of the work to isolate interpreters This is one of the most sensitive parts of the work to isolate interpreters.
and may require more than just moving fields down The simplest solution is to move the global state of the internal
from ``_PyRuntimeState``. "small block" allocator to ``PyInterpreterState``, as we are doing with
nearly all other runtime state. The following elaborates on the details
and rationale.
CPython provides a memory management C-API, with `three allocator domains`_: CPython provides a memory management C-API, with `three allocator domains`_:
"raw", "mem", and "object". Each provides the equivalent of ``malloc()``, "raw", "mem", and "object". Each provides the equivalent of ``malloc()``,
@ -329,8 +330,17 @@ Embedders would also likely have to provide a new allocator context
for each interpreter. On the plus side, allocator hooks (e.g. tracemalloc) for each interpreter. On the plus side, allocator hooks (e.g. tracemalloc)
would not be affected. would not be affected.
This is an open issue for which this proposal has not settled Ultimately, we will go with the simplest option:
on a solution.
* keep the allocators in the global runtime state
* require that they be thread-safe
* move the state of the default object allocator (AKA "small block"
allocator) to ``PyInterpreterState``
We experimented with `a rough implementation`_ and found it was fairly
straightforward, and the performance penalty was essentially zero.
.. _a rough implementation: https://github.com/ericsnowcurrently/cpython/tree/try-per-interpreter-alloc
.. _proposed capi: .. _proposed capi:
@ -340,16 +350,29 @@ C-API
Internally, the interpreter state will now track how the import system Internally, the interpreter state will now track how the import system
should handle extension modules which do not support use with multiple should handle extension modules which do not support use with multiple
interpreters. See `Restricting Extension Modules`_ below. We'll refer interpreters. See `Restricting Extension Modules`_ below. We'll refer
to that setting here as "PyInterpreterState.strict_extensions". to that setting here as "PyInterpreterState.strict_extension_compat".
The following public API will be added: The following APIs will be made public, if they haven't been already:
* ``PyInterpreterConfig`` (struct) * ``PyInterpreterConfig`` (struct)
* ``PyInterpreterConfig_LEGACY_INIT`` (macro)
* ``PyInterpreterConfig_INIT`` (macro) * ``PyInterpreterConfig_INIT`` (macro)
* ``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)`` * ``PyInterpreterConfig_LEGACY_INIT`` (macro)
* ``bool PyInterpreterState_GetStrictExtensions(PyInterpreterState *)`` * ``PyThreadState * Py_NewInterpreterFromConfig(PyInterpreterConfig *)``
* ``void PyInterpreterState_SetStrictExtensions(PyInterpreterState *, bool)``
We will add two new fields to ``PyInterpreterConfig``:
* ``int own_gil``
* ``int strict_extension_compat``
We may add other fields over time, as needed (e.g. "own_initial_thread").
Regarding the initializer macros, ``PyInterpreterConfig_INIT`` would
be used to get an isolated interpreter that also avoids
subinterpreter-unfriendly features. It would be the default for
interpreters created through :pep:`554`. The unrestricted (status quo) behavior
will continue to be available through ``PyInterpreterConfig_LEGACY_INIT``,
which is already used for the main interpreter and ``Py_NewInterpreter()``.
This will not change.
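
The relationship between the two initializers can be pictured with a minimal, self-contained sketch. It is not the CPython definition: the ``sketch_`` names are hypothetical stand-ins, only the two fields added here are modeled, and the value of the extension-compat field under ``PyInterpreterConfig_INIT`` is an assumption inferred from the description of an isolated interpreter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed PyInterpreterConfig.
   Only the two fields added by this change are modeled. */
typedef struct {
    int own_gil;                  /* 1: the interpreter gets its own GIL */
    int strict_extension_compat;  /* 1: reject incompatible extensions */
} sketch_interp_config;

/* Assumed values: isolated by default, permissive for legacy use. */
#define SKETCH_CONFIG_INIT        {1, 1}
#define SKETCH_CONFIG_LEGACY_INIT {0, 0}

/* True when an interpreter created from this config would keep
   using the main interpreter's GIL (the pre-3.12 behavior). */
static bool sketch_shares_main_gil(const sketch_interp_config *config)
{
    return config->own_gil == 0;
}
```

A caller would fill a config from one of the two macros and hand it to the creation function; anything created from the legacy values keeps sharing the main interpreter's GIL.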
A note about the "main" interpreter: A note about the "main" interpreter:
@ -361,83 +384,28 @@ as well as a special role during runtime initialization/finalization.
It is also usually (for now) the only interpreter. It is also usually (for now) the only interpreter.
(Also see https://docs.python.org/3/c-api/init.html#sub-interpreter-support.) (Also see https://docs.python.org/3/c-api/init.html#sub-interpreter-support.)
PyInterpreterConfig
'''''''''''''''''''
This is a struct with 4 bool fields, effectively::
typedef struct {
/* Allow forking the process. */
unsigned int allow_fork_without_exec;
/* Allow daemon threads. */
unsigned int allow_daemon_threads;
/* Use a unique "global" interpreter lock.
Otherwise, use the main interpreter's GIL. */
unsigned int own_gil;
/* Only allow extension modules that support
use in multiple interpreters. */
unsigned int strict_extensions;
} PyInterpreterConfig;
The first two fields are essentially derived from the existing
``PyConfig._isolated_interpreter`` field.
``PyInterpreterConfig.strict_extensions`` is basically the initial
value used for "PyInterpreterState.strict_extensions".
We may add other fields, as needed, over time
(e.g. possibly "allow_subprocess", "allow_threading", "own_initial_thread").
Note that a similar ``_PyInterpreterConfig`` may already exist internally,
with similar fields.
(See `issue #91120 <https://github.com/python/cpython/issues/91120>`__
and `PR #31771 <https://github.com/python/cpython/pull/31771>`__.)
If it does exist then ``PyInterpreterConfig`` would replace it.
PyInterpreterConfig.own_gil PyInterpreterConfig.own_gil
''''''''''''''''''''''''''' '''''''''''''''''''''''''''
If ``true`` then the new interpreter will have its own "global" If ``true`` (``1``) then the new interpreter will have its own "global"
interpreter lock. This means the new interpreter can run without interpreter lock. This means the new interpreter can run without
getting interrupted by other interpreters. This effectively unblocks getting interrupted by other interpreters. This effectively unblocks
full use of multiple cores. That is the fundamental goal of this PEP. full use of multiple cores. That is the fundamental goal of this PEP.
If ``false`` then the new interpreter will use the main interpreter's If ``false`` (``0``) then the new interpreter will use the main
lock. This is the legacy (pre-3.12) behavior in CPython, where all interpreter's lock. This is the legacy (pre-3.12) behavior in CPython,
interpreters share a single GIL. Sharing the GIL like this may be where all interpreters share a single GIL. Sharing the GIL like this
desirable when using extension modules that still depend on may be desirable when using extension modules that still depend
the GIL for thread safety. on the GIL for thread safety.
PyInterpreterConfig Initializer Macros In ``PyInterpreterConfig_INIT``, this will be ``true``.
'''''''''''''''''''''''''''''''''''''' In ``PyInterpreterConfig_LEGACY_INIT``, this will be ``false``.
``#define PyInterpreterConfig_LEGACY_INIT {1, 1, 0, 0}`` PyInterpreterConfig.strict_extension_compat
''''''''''''''''''''''''''''''''''''''''''''
This initializer matches the behavior of ``Py_NewInterpreter()``. ``PyInterpreterConfig.strict_extension_compat`` is basically the initial
The main interpreter uses this. value used for "PyInterpreterState.strict_extension_compat".
``#define PyInterpreterConfig_INIT {0, 0, 1, 1}``
This initializer would be used to get an isolated interpreter that
also avoids subinterpreter-unfriendly features. It would be the default
for interpreters created through :pep:`554`. Fork (without exec) would
be disabled by default due to the general problems of mixing threads
with fork, coupled with the role of the main interpreter in the runtime
lifecycle. Daemon threads would be disabled due to their poor interaction
with interpreter finalization.
New API Functions
'''''''''''''''''
``PyThreadState * Py_NewInterpreterEx(PyInterpreterConfig *)``
This is like ``Py_NewInterpreter()`` but initializes uses the granular
config. It will replace the "private" func ``_Py_NewInterpreter()``.
``bool PyInterpreter_GetStrictExtensions(PyInterpreterState *)``
``void PyInterpreter_SetStrictExtensions(PyInterpreterState *, bool)``
Respectively, these get/set "PyInterpreterState.strict_extensions".
Restricting Extension Modules Restricting Extension Modules
----------------------------- -----------------------------
@ -449,20 +417,21 @@ multiple interpreters at once. This includes dealing with their globals.
If an extension implements multi-phase init (see :pep:`489`) it is If an extension implements multi-phase init (see :pep:`489`) it is
considered compatible with multiple interpreters. All other extensions considered compatible with multiple interpreters. All other extensions
are considered incompatible. This position is based on the premise that are considered incompatible. (See `Extension Module Thread Safety`_
if a module supports use with multiple interpreters then it necessarily for more details about how a per-interpreter GIL may affect that
will work even if interpreters do not share the GIL. This position is classification.)
still the subject of debate.
If an incompatible extension is imported and the current If an incompatible extension is imported and the current
"PyInterpreterState.strict_extensions" value is ``true`` then the import "PyInterpreterState.strict_extension_compat" value is ``true`` then the import
system will raise ``ImportError``. (For ``false`` it simply doesn't check.) system will raise ``ImportError``. (For ``false`` it simply doesn't check.)
This will be done through This will be done through
``importlib._bootstrap_external.ExtensionFileLoader``. ``importlib._bootstrap_external.ExtensionFileLoader`` (really, through
``_imp.create_dynamic()``, ``_PyImport_LoadDynamicModuleWithSpec()``, and
``PyModule_FromDefAndSpec2()``).
Such imports will never fail in the main interpreter (or in interpreters Such imports will never fail in the main interpreter (or in interpreters
created through ``Py_NewInterpreter()``) since created through ``Py_NewInterpreter()``) since
"PyInterpreterState.strict_extensions" initializes to ``false`` in both "PyInterpreterState.strict_extension_compat" initializes to ``false`` in both
cases. Thus the legacy (pre-3.12) behavior is preserved. cases. Thus the legacy (pre-3.12) behavior is preserved.
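
The decision described above can be sketched as a small predicate. This is only an illustration of the rule as stated in the text, not the code that would live in the import machinery; the ``sketch_`` types and function are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of what the check looks at; not CPython code. */
typedef struct {
    bool multi_phase_init;  /* extension implements PEP 489 multi-phase init */
} sketch_extension;

typedef struct {
    bool strict_extension_compat;  /* the per-interpreter setting */
} sketch_interp_state;

/* Returns true if the import may proceed, false where the import
   system would raise ImportError. */
static bool sketch_may_import(const sketch_interp_state *interp,
                              const sketch_extension *ext)
{
    if (!interp->strict_extension_compat) {
        return true;  /* legacy behavior: no check at all */
    }
    return ext->multi_phase_init;
}
```

With the setting off (the main interpreter and ``Py_NewInterpreter()``), every extension passes; with it on, only extensions using multi-phase init do.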
We will work with popular extensions to help them support use in We will work with popular extensions to help them support use in
@ -473,22 +442,71 @@ Extension Module Compatibility
'''''''''''''''''''''''''''''' ''''''''''''''''''''''''''''''
As noted in `Extension Modules`_, many extensions work fine in multiple As noted in `Extension Modules`_, many extensions work fine in multiple
interpreters without needing any changes. The import system will still interpreters (and under a per-interpreter GIL) without needing any
fail if such a module doesn't explicitly indicate support. At first, changes. The import system will still fail if such a module doesn't
not many extension modules will, so this is a potential source explicitly indicate support. At first, not many extension modules
of frustration. will, so this is a potential source of frustration.
We will address this by adding a context manager to temporarily disable We will address this by adding a context manager to temporarily disable
the check on multiple interpreter support: the check on multiple interpreter support:
``importlib.util.allow_all_extensions()``. More or less, it will modify ``importlib.util.allow_all_extensions()``. More or less, it will modify
the current "PyInterpreterState.strict_extensions" value (e.g. through the current "PyInterpreterState.strict_extension_compat" value (e.g. through
a private ``sys`` function). a private ``sys`` function).
Extension Module Thread Safety
''''''''''''''''''''''''''''''
If a module supports use with multiple interpreters, that mostly implies
it will work even if those interpreters do not share the GIL. The one
caveat is where a module links against a library with internal global
state that isn't thread-safe. (Even something as innocuous as a static
local variable as a temporary buffer can be a problem.) With a shared
GIL, that state is protected. Without one, such modules must wrap any
use of that state (e.g. through calls) with a lock.
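
As a rough illustration of that last point, the sketch below wraps a thread-unsafe routine in a process-wide lock. ``legacy_format()`` is a made-up stand-in for a function in a linked library that returns a pointer to a static buffer; ``safe_format()`` and ``legacy_lock`` are likewise hypothetical names, and POSIX threads are assumed.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a linked library routine that formats into a static
   buffer. Concurrent callers would overwrite each other's result. */
static const char *legacy_format(int value)
{
    static char buffer[32];
    snprintf(buffer, sizeof(buffer), "value=%d", value);
    return buffer;
}

/* One lock for the whole process, because the library state is
   shared by every interpreter in the process. */
static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Calls the library and copies the result out while holding the lock.
   Returns 0 on success, -1 if the output buffer is too small. */
static int safe_format(int value, char *out, size_t out_size)
{
    int result = -1;
    pthread_mutex_lock(&legacy_lock);
    const char *text = legacy_format(value);
    if (strlen(text) < out_size) {
        strcpy(out, text);
        result = 0;
    }
    pthread_mutex_unlock(&legacy_lock);
    return result;
}
```

The lock has to be process-global rather than per-interpreter, since the state it protects belongs to the library and not to any one interpreter. Copying the result out before unlocking matters as much as the lock itself.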
Currently, it isn't clear whether or not supports-multiple-interpreters
is sufficiently equivalent to supports-per-interpreter-gil, such that
we can avoid any special accommodations. This is still a point of
meaningful discussion and investigation. The practical distinction
between the two (in the Python community, e.g. PyPI) is not yet
understood well enough to settle the matter. Likewise, it isn't clear
what we might be able to do to help extension maintainers mitigate
the problem (assuming it is one).
In the meantime, we must proceed as though the difference would be
large enough to cause problems for enough extension modules out there.
The solution we would apply is:
* add a ``PyModuleDef`` slot that indicates an extension can be imported
under a per-interpreter GIL (i.e. opt in)
* add that slot as part of the definition of a "compatible" extension,
as discussed earlier
The downside is that not a single extension module will be able to take
advantage of the per-interpreter GIL without extra effort by the module
maintainer, regardless of how minor that effort. This compounds the
problem described in `Extension Module Compatibility`_ and the same
workaround applies. Ideally, we would determine that there isn't enough
difference to matter.
If we do end up requiring an opt-in for imports under a per-interpreter
GIL, and later determine it isn't necessary, then we can switch the
default at that point, make the old opt-in slot a noop, and add a new
``PyModuleDef`` slot for explicitly opting *out*. In fact, it makes
sense to add that opt-out slot from the beginning.
Documentation Documentation
------------- -------------
The "Sub-interpreter support" section of ``Doc/c-api/init.rst`` will be * C-API: the "Sub-interpreter support" section of ``Doc/c-api/init.rst``
updated with the added API. will detail the updated API
* C-API: that section will explain the consequences of
a per-interpreter GIL
* importlib: the ``ExtensionFileLoader`` entry will note that imports
may fail in subinterpreters
* importlib: there will be a new entry about
``importlib.util.allow_all_extensions()``
Impact Impact
@ -498,7 +516,14 @@ Backwards Compatibility
----------------------- -----------------------
No behavior or APIs are intended to change due to this proposal, No behavior or APIs are intended to change due to this proposal,
with one exception noted in `the next section <Extension Modules_>`_. with two exceptions:
* some extensions will fail to import in some subinterpreters
(see `the next section <Extension Modules_>`_)
* "mem" and "object" allocators that are currently not thread-safe
may now be susceptible to data races when used in combination
with multiple interpreters
The existing C-API for managing interpreters will preserve its current The existing C-API for managing interpreters will preserve its current
behavior, with new behavior exposed through new API. No other API behavior, with new behavior exposed through new API. No other API
or runtime behavior is meant to change, including compatibility with or runtime behavior is meant to change, including compatibility with
@ -516,24 +541,35 @@ there is no change in behavior under multiple interpreters created
using the existing ``Py_NewInterpreter()``. using the existing ``Py_NewInterpreter()``.
Keep in mind that some extensions already break when used in multiple Keep in mind that some extensions already break when used in multiple
interpreters, due to keeping module state in global variables. They interpreters, due to keeping module state in global variables (or
due to the `internal state of linked libraries`_). They
may crash or, worse, experience inconsistent behavior. That was part may crash or, worse, experience inconsistent behavior. That was part
of the motivation for :pep:`630` and friends, so this is not a new of the motivation for :pep:`630` and friends, so this is not a new
situation nor a consequence of this proposal. situation nor a consequence of this proposal.
.. _internal state of linked libraries: https://github.com/pyca/cryptography/issues/2299
In contrast, when the `proposed API <proposed capi_>`_ is used to In contrast, when the `proposed API <proposed capi_>`_ is used to
create multiple interpreters, the default behavior will change for create multiple interpreters, with the appropriate settings,
some extensions. In that case, importing an extension will fail the behavior will change for incompatible extensions. In that case,
(outside the main interpreter) if it doesn't indicate support for importing such an extension will fail (outside the main interpreter),
multiple interpreters. For extensions that already break in as explained in `Restricting Extension Modules`_. For extensions that
multiple interpreters, this will be an improvement. already break in multiple interpreters, this will be an improvement.
Additionally, some extension modules link against libraries with
thread-unsafe internal global state.
(See `Extension Module Thread Safety`_.)
Such modules will have to start wrapping any direct or indirect use
of that state in a lock. This is the key difference from other modules
that also implement multi-phase init and thus indicate support for
multiple interpreters (i.e. isolation).
Now we get to the break in compatibility mentioned above. Some Now we get to the break in compatibility mentioned above. Some
extensions are safe under multiple interpreters, even though they extensions are safe under multiple interpreters (and a per-interpreter
haven't indicated that. Unfortunately, there is no reliable way for GIL), even though they haven't indicated that. Unfortunately, there is
the import system to infer that such an extension is safe, so no reliable way for the import system to infer that such an extension
importing them will still fail. This case is addressed in is safe, so importing them will still fail. This case is addressed
`Extension Module Compatibility`_ below. in `Extension Module Compatibility`_ above.
Extension Module Maintainers Extension Module Maintainers
---------------------------- ----------------------------
@ -545,20 +581,20 @@ concern about the increased burden they anticipate due to increased
use of multiple interpreters. use of multiple interpreters.
Specifically, enabling support for multiple interpreters will require Specifically, enabling support for multiple interpreters will require
substantial work for some extension modules. To add that support, substantial work for some extension modules (albeit likely not many).
the maintainer(s) of such a module (often volunteers) would have to To add that support, the maintainer(s) of such a module (often
set aside their normal priorities and interests to focus on volunteers) would have to set aside their normal priorities and
compatibility (see :pep:`630`). interests to focus on compatibility (see :pep:`630`).
Of course, extension maintainers are free to not add support for use Of course, extension maintainers are free to not add support for use
in multiple interpreters. However, users will increasingly demand in multiple interpreters. However, users will increasingly demand
such support, especially if the feature grows such support, especially if the feature grows in popularity.
in popularity.
Either way, the situation can be stressful for maintainers of such Either way, the situation can be stressful for maintainers of such
extensions, particularly when they are doing the work in their spare extensions, particularly when they are doing the work in their spare
time. The concerns they have expressed are understandable, and we address time. The concerns they have expressed are understandable, and we address
the partial solution in `Restricting Extension Modules`_ below. the partial solution in the `Restricting Extension Modules`_ and
`Extension Module Compatibility`_ sections.
Alternate Python Implementations Alternate Python Implementations
-------------------------------- --------------------------------
@ -587,23 +623,35 @@ Performance
The work to consolidate globals has already provided a number of The work to consolidate globals has already provided a number of
improvements to CPython's performance, both speeding it up and using improvements to CPython's performance, both speeding it up and using
less memory, and this should continue. Performance benefits to a less memory, and this should continue. The performance benefits of a
per-interpreter GIL have not been explored. At the very least, it is per-interpreter GIL specifically have not been explored. At the very
not expected to make CPython slower (as long as interpreters are least, it is not expected to make CPython slower
sufficiently isolated). (as long as interpreters are sufficiently isolated). And, obviously,
it enables a variety of multi-core parallelism in Python code.
How to Teach This How to Teach This
================= =================
This is an advanced feature for users of the C-API. There is no Unlike :pep:`554`, this is an advanced feature meant for a narrow set
expectation that this will be taught. of users of the C-API. There is no expectation that the specifics of
the API or its direct application will be taught.
That said, if it were taught then it would boil down to the following: That said, if it were taught then it would boil down to the following:
In addition to Py_NewInterpreter(), you can use Py_NewInterpreterEx() In addition to Py_NewInterpreter(), you can use
to create an interpreter. The config you pass it indicates how you Py_NewInterpreterFromConfig() to create an interpreter.
want that interpreter to behave. The config you pass it indicates how you want that
interpreter to behave.
Furthermore, the maintainers of any extension modules that create
isolated interpreters will likely need to explain the consequences
of a per-interpreter GIL to their users. The first thing to explain
is what :pep:`554` teaches about the concurrency model that isolated
interpreters enable. That leads into the point that Python software
written using that concurrency model can then take advantage
of multi-core parallelism, which is currently
prevented by the GIL.
.. XXX We should add docs (a la PEP 630) that spell out how to make .. XXX We should add docs (a la PEP 630) that spell out how to make
an extension compatible with per-interpreter GIL. an extension compatible with per-interpreter GIL.
@ -618,15 +666,18 @@ Reference Implementation
Open Issues Open Issues
=========== ===========
* What to do about the allocators? * Are we okay to require "mem" and "object" allocators to be thread-safe?
* How would a per-interpreter tracemalloc module relate to global allocators? * How would a per-interpreter tracemalloc module relate to global allocators?
* Would the faulthandler module be limited to the main interpreter * Would the faulthandler module be limited to the main interpreter
(like the signal module) or would we leak that global state between (like the signal module) or would we leak that global state between
interpreters (protected by a granular lock)? interpreters (protected by a granular lock)?
* Split out an informational PEP with all the relevant info, * Split out an informational PEP with all the relevant info,
based on the "Consolidating Runtime Global State" section? based on the "Consolidating Runtime Global State" section?
* Does supporting multiple interpreters automatically mean an extension * How likely is it that a module works under multiple interpreters
supports a per-interpreter GIL? (isolation) but doesn't work under a per-interpreter GIL?
(See `Extension Module Thread Safety`_.)
* If it is likely enough, what can we do to help extension maintainers
mitigate the problem and enjoy use under a per-interpreter GIL?
* What would be a better (scarier-sounding) name * What would be a better (scarier-sounding) name
for ``allow_all_extensions``? for ``allow_all_extensions``?