PEP 573: SystemError->TypeError and misc copy-edits (GH-1264)

* report TypeError rather than SystemError when bad types are passed
  to new APIs
* add PyType_FromModuleAndSpec to the list of new functions
* wrap text at 80 columns
* add C API docs links for thread-local storage and context variables
* note in abstract that only the initial state lookup from slots is
  inherently slow, with subsequent lookups being amenable to caching
* other minor wording tweaks
Nick Coghlan 2020-01-03 22:24:51 +10:00 committed by Petr Viktorin
parent 8dcf7fe49c
commit 34343da45d
1 changed file with 83 additions and 57 deletions

@@ -19,19 +19,24 @@ Post-History:
Abstract Abstract
======== ========
This PEP proposes to add a way for CPython extension methods to access context such as This PEP proposes to add a way for CPython extension methods to access context,
the state of the modules they are defined in. such as the state of the modules they are defined in.
This will allow extension methods to use direct pointer dereferences This will allow extension methods to use direct pointer dereferences
rather than PyState_FindModule for looking up module state, reducing or eliminating the rather than PyState_FindModule for looking up module state, reducing or
performance cost of using module-scoped state over process global state. eliminating the performance cost of using module-scoped state over process
global state.
This fixes one of the remaining roadblocks for adoption of PEP 3121 (Extension This fixes one of the remaining roadblocks for adoption of PEP 3121 (Extension
module initialization and finalization) and PEP 489 module initialization and finalization) and PEP 489
(Multi-phase extension module initialization). (Multi-phase extension module initialization).
While this PEP takes an additional step towards fully solving the problems that PEP 3121 and PEP 489 started While this PEP takes an additional step towards fully solving the problems that
tackling, it does not attempt to resolve *all* remaining concerns. In particular, accessing the module state from slot methods (``nb_add``, etc) remains slower than accessing that state from other extension methods. PEP 3121 and PEP 489 started tackling, it does not attempt to resolve *all*
remaining concerns. In particular, at least the first access to the module state
from slot methods (``nb_add``, etc) remains slower than accessing that state
from other extension methods. Standard caching techniques can be used to speed
up subsequent access.
Terminology Terminology
@@ -58,12 +63,14 @@ Accessed by ``PyModule_GetState()``.
Static Type Static Type
----------- -----------
A type object defined as a C-level static variable, i.e. a compiled-in type object. A type object defined as a C-level static variable, i.e. a compiled-in type
object.
A static type needs to be shared between module instances and has no A static type needs to be shared between module instances and has no
information of what module it belongs to. information of what module it belongs to.
Static types do not have ``__dict__`` (although their instances might). Static types do not have ``__dict__`` (although their instances might).
Heap Type Heap Type
--------- ---------
@@ -81,23 +88,23 @@ several advantages to extensions that implement it:
module objects, which paves the way for extension module support for module objects, which paves the way for extension module support for
``runpy`` or for systems that enable extension module reloading. ``runpy`` or for systems that enable extension module reloading.
* Loading multiple modules from the same extension is possible, which * Loading multiple modules from the same extension is possible, which
makes testing module isolation (a key feature for proper sub-interpreter makes it possible to test module isolation (a key feature for proper
support) possible from a single interpreter. sub-interpreter support) from a single interpreter.
The biggest hurdle for adoption of PEP 489 is allowing access to module state The biggest hurdle for adoption of PEP 489 is allowing access to module state
from methods of extension types. from methods of extension types.
Currently, the way to access this state from extension methods is by looking up the module via Currently, the way to access this state from extension methods is by looking up
``PyState_FindModule`` (in contrast to module level functions in extension modules, which the module via ``PyState_FindModule`` (in contrast to module level functions in
receive a module reference as an argument). extension modules, which receive a module reference as an argument).
However, ``PyState_FindModule`` queries the thread-local state, making it relatively However, ``PyState_FindModule`` queries the thread-local state, making it
costly compared to C level process global access and consequently deterring module authors from using it. relatively costly compared to C level process global access and consequently
deterring module authors from using it.
Also, ``PyState_FindModule`` relies on the assumption that in each Also, ``PyState_FindModule`` relies on the assumption that in each
subinterpreter, there is at most one module corresponding to subinterpreter, there is at most one module corresponding to
a given ``PyModuleDef``. This does not align well with Python's import a given ``PyModuleDef``. This assumption does not hold for modules that use
machinery. Since PEP 489 aimed to fix that, the assumption does PEP 489's multi-phase initialization, so ``PyState_FindModule`` is unavailable
not hold for modules that use multi-phase initialization, so for these modules.
``PyState_FindModule`` is unavailable for these modules.
A faster, safer way of accessing module-level state from extension methods A faster, safer way of accessing module-level state from extension methods
is needed. is needed.
@@ -133,12 +140,17 @@ In Python code, the Python-level equivalents may be retrieved as::
The defining class is not ``type(self)``, since ``type(self)`` might The defining class is not ``type(self)``, since ``type(self)`` might
be a subclass of ``Foo``. be a subclass of ``Foo``.
The statements marked (1) implicitly rely on name-based lookup via the function's ``__globals__``: The statements marked (1) implicitly rely on name-based lookup via the
either the ``Foo`` attribute to access the defining class and Python function object, or ``__name__`` to find the module object in ``sys.modules``. function's ``__globals__``: either the ``Foo`` attribute to access the defining
In Python code, this is feasible, as ``__globals__`` is set appropriately when the function definition is executed, and class and Python function object, or ``__name__`` to find the module object in
even if the namespace has been manipulated to return a different object, at worst an exception will be raised. ``sys.modules``.
The ``__class__`` closure, (2), is a safer way to get the defining class, but it still relies on ``__closure__`` being set appropriately. In Python code, this is feasible, as ``__globals__`` is set appropriately when
the function definition is executed, and even if the namespace has been
manipulated to return a different object, at worst an exception will be raised.
The ``__class__`` closure, (2), is a safer way to get the defining class, but it
still relies on ``__closure__`` being set appropriately.
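The difference between name-based lookup (1), the ``__class__`` closure (2), and ``type(self)`` can be sketched in plain Python. This is only an illustration: the ``Bar`` subclass, the method names, and the deliberate rebinding of the global name ``Foo`` are assumptions added here, and the snippet is meant to run at module level.

```python
class Foo:
    def by_name(self):
        # (1) resolved at call time through by_name.__globals__["Foo"]
        return Foo

    def by_closure(self):
        # (2) resolved through the implicit __class__ closure cell
        return __class__


class Bar(Foo):
    pass


obj = Bar()
defining_class = Foo  # keep a reference before the global name is rebound

# type(obj) is Bar, a subclass, so it is not the defining class.
runtime_type = type(obj)

# Manipulate the namespace: rebinding the global changes what (1) returns,
# while (2) still yields the class the method was defined in.
defining_class.by_name.__globals__["Foo"] = "something else"
try:
    rebinding_result = obj.by_name()
    closure_result = obj.by_closure()
finally:
    # Restore the original binding so later code sees the real class again.
    defining_class.by_name.__globals__["Foo"] = defining_class
```

After the rebinding, the name-based lookup returns whatever object the global name now refers to, whereas the closure-based lookup is unaffected.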
By contrast, extension methods are typically implemented as normal C functions. By contrast, extension methods are typically implemented as normal C functions.
This means that they only have access to their arguments and C level thread-local This means that they only have access to their arguments and C level thread-local
@@ -149,26 +161,34 @@ their shared state in C-level process globals, causing problems when:
* reloading modules (e.g. to test conditional imports) * reloading modules (e.g. to test conditional imports)
* loading extension modules in subinterpreters * loading extension modules in subinterpreters
PEP 3121 attempted to resolve this by offering the ``PyState_FindModule`` API, but this still has significant problems when it comes to extension methods (rather than module level functions): PEP 3121 attempted to resolve this by offering the ``PyState_FindModule`` API,
but this still has significant problems when it comes to extension methods
(rather than module level functions):
* it is markedly slower than directly accessing C-level process-global state * it is markedly slower than directly accessing C-level process-global state
* there is still some inherent reliance on process global state that means it still doesn't reliably handle module reloading * there is still some inherent reliance on process global state that means it
still doesn't reliably handle module reloading
It's also the case that when looking up a C-level struct such as module state,
supplying an unexpected object layout can crash the interpreter, so it's
significantly more important to ensure that extension methods receive the kind
of object they expect.
It's also the case that when looking up a C-level struct such as module state, supplying
an unexpected object layout can crash the interpreter, so it's significantly more important to ensure that extension
methods receive the kind of object they expect.
Proposal Proposal
======== ========
Currently, a bound extension method (``PyCFunction`` or ``PyCFunctionWithKeywords``) receives only Currently, a bound extension method (``PyCFunction`` or
``self``, and (if applicable) the supplied positional and keyword arguments. ``PyCFunctionWithKeywords``) receives only ``self``, and (if applicable) the
supplied positional and keyword arguments.
While module-level extension functions already receive access to the defining module object via their While module-level extension functions already receive access to the defining
``self`` argument, methods of extension types don't have that luxury: they receive the bound instance module object via their ``self`` argument, methods of extension types don't have
via ``self``, and hence have no direct access to the defining class or the module level state. that luxury: they receive the bound instance via ``self``, and hence have no
direct access to the defining class or the module level state.
The additional module level context described above can be made available with two changes. The additional module level context described above can be made available with
two changes.
Both additions are optional; extension authors need to opt in to start Both additions are optional; extension authors need to opt in to start
using them: using them:
@@ -176,17 +196,16 @@ using them:
* Pass the defining class to the underlying C function. * Pass the defining class to the underlying C function.
The defining class is readily available at the time built-in The defining class is readily available at the time the built-in
method object (``PyCFunctionObject``) is created, so it can be stored method object (``PyCFunctionObject``) is created, so it can be stored
in a new struct that extends ``PyCFunctionObject``. in a new struct that extends ``PyCFunctionObject``.
The module state can then be retrieved from the module object via The module state can then be retrieved from the module object via
``PyModule_GetState``. ``PyModule_GetState``.
Note that this proposal implies that any type whose method needs to access Note that this proposal implies that any type whose methods need to access
`per-module state`_ must be a heap type, rather than a static type. `per-module state`_ must be a heap type, rather than a static type. This is
necessary to support loading multiple module objects from a single
This is necessary to support loading multiple module objects from a single
extension: a static type, as a C-level global, has no information about extension: a static type, as a C-level global, has no information about
which module object it belongs to. which module object it belongs to.
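A rough Python-level analogy for why the type must be created per module: when the same code is loaded into two module objects, each gets its own class object and its own state. The ``SOURCE`` string, the ``load_instance`` helper, and the module names ``demo_a`` and ``demo_b`` are hypothetical; real extension modules create heap types in C rather than via ``exec``.

```python
import types

# Stand-in for the body of an extension module that keeps per-module state.
SOURCE = '''
state = {"calls": 0}

class Counter:
    def bump(self):
        state["calls"] += 1
        return state["calls"]
'''


def load_instance(name):
    """Create a fresh module object and execute the shared source in it."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


first = load_instance("demo_a")
second = load_instance("demo_b")

first.Counter().bump()
first.Counter().bump()
second.Counter().bump()

# Each module object owns a distinct class and distinct state.
classes_are_distinct = first.Counter is not second.Counter
calls_per_module = (first.state["calls"], second.state["calls"])
```

A single class object shared by both modules, which is the situation a static type is in, would have no way to tell which of the two ``state`` dictionaries to use.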
@@ -201,17 +220,22 @@ simply add a new argument to pass in the defining class.
Two possible solutions have been proposed to this problem: Two possible solutions have been proposed to this problem:
* Look up the class through walking the MRO. * Look up the class through walking the MRO.
This is potentially expensive, but will be useful if performance is not This is potentially expensive, but will be usable if performance is not
a problem (such as when raising a module-level exception). a problem (such as when raising a module-level exception).
* Storing a pointer to the defining class of each slot in a separate table, * Storing a pointer to the defining class of each slot in a separate table,
``__typeslots__`` [#typeslots-mail]_. This is technically feasible and fast, ``__typeslots__`` [#typeslots-mail]_. This is technically feasible and fast,
but quite invasive. but quite invasive.
Due to the invasiveness of the latter approach, this PEP proposes adding an MRO walking Due to the invasiveness of the latter approach, this PEP proposes adding an MRO
helper for use in slot method implementations, deferring the more complex alternative walking helper for use in slot method implementations, deferring the more complex
as a potential future optimisation. Modules affected by this concern also have the alternative as a potential future optimisation.
option of using thread-local state or PEP 567 context variables, or else defining their
own reload-friendly lookup caching scheme. Modules affected by this concern also have the option of using
`thread-local state`_ or `PEP 567 context variables`_ as a caching mechanism, or
else defining their own reload-friendly lookup caching scheme.
.. _thread-local state: https://docs.python.org/3/c-api/init.html#thread-local-storage-support
.. _PEP 567 context variables: https://docs.python.org/3/c-api/contextvars.html
Specification Specification
@@ -237,12 +261,12 @@ This acts the same as ``PyType_FromSpecWithBases``, and additionally sets
Additionally, an accessor, ``PyObject * PyType_GetModule(PyTypeObject *)`` Additionally, an accessor, ``PyObject * PyType_GetModule(PyTypeObject *)``
will be provided. will be provided.
It will return the ``ht_module`` if a heap type with module pointer set It will return the ``ht_module`` if a heap type with module pointer set
is passed in, otherwise it will set a SystemError and return NULL. is passed in, otherwise it will set ``TypeError`` and return NULL.
Usually, creating a class with ``ht_module`` set will create a reference Usually, creating a class with ``ht_module`` set will create a reference
cycle involving the class and the module. cycle involving the class and the module.
This is not a problem, as tearing down modules is not a performance-sensitive This is not a problem, as tearing down modules is not a performance-sensitive
operation (and module-level functions typically also create reference cycles). operation, and module-level functions typically also create reference cycles.
The existing "set all module globals to None" code that breaks function cycles The existing "set all module globals to None" code that breaks function cycles
through ``f_globals`` will also break the new cycles through ``ht_module``. through ``f_globals`` will also break the new cycles through ``ht_module``.
@@ -325,9 +349,9 @@ will be implemented::
The walker will go through bases of heap-allocated ``type`` The walker will go through bases of heap-allocated ``type``
and search for class that defines ``func`` at its ``slot``. and search for class that defines ``func`` at its ``slot``.
The ``func`` needs not to be inherited by ``type``. The only requirement The ``func`` does not need to be inherited by ``type`` (i.e. it may have been
for the walker to find the defining class is that the defining class overridden in a subclass). The only requirement for the walker to find the
must be heap-allocated. defining class is that the defining class must be heap-allocated.
On failure, exception is set and NULL is returned. On failure, exception is set and NULL is returned.
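The walk described above can be modelled in Python as a minimal sketch, not the C implementation. It assumes that a slot can be approximated by a special-method entry in a class ``__dict__`` (the C helper compares function pointers in type slots), and the helper name ``defining_type_from_slot_func`` and the choice of ``TypeError`` on failure are illustrative assumptions.

```python
# Value of Py_TPFLAGS_HEAPTYPE in CPython's object.h.
PY_TPFLAGS_HEAPTYPE = 1 << 9


def defining_type_from_slot_func(tp, slot_name, func):
    """Return the first heap type in tp's MRO that itself defines func."""
    for cls in tp.__mro__:
        if not cls.__flags__ & PY_TPFLAGS_HEAPTYPE:
            continue  # static types such as object are skipped
        if cls.__dict__.get(slot_name) is func:
            return cls
    raise TypeError(
        f"no heap type in the MRO of {tp.__name__} defines {slot_name}"
    )


class Base:
    def __add__(self, other):
        return "Base.__add__"


class Child(Base):
    pass


class Override(Child):
    def __add__(self, other):
        return "Override.__add__"


base_add = Base.__dict__["__add__"]

# Child inherits the function; Override replaces it. Both walks find Base.
found_for_child = defining_type_from_slot_func(Child, "__add__", base_add)
found_for_override = defining_type_from_slot_func(Override, "__add__", base_add)
```

The second call shows the case where ``func`` is not inherited by the starting type: ``Override`` defines its own ``__add__``, yet the walk still finds ``Base`` as the class that defines ``base_add``.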
@@ -340,12 +364,13 @@ this easier, a helper will be added::
void *PyType_GetModuleState(PyObject *type) void *PyType_GetModuleState(PyObject *type)
This function takes a heap type and on success, it returns pointer to state of the This function takes a heap type and on success, it returns pointer to the state
module that the heap type belongs to. of the module that the heap type belongs to.
On failure, two scenarios may occur. When a type without a module is passed in, On failure, two scenarios may occur. When a non-type object, or a type without a
``SystemError`` is set and ``NULL`` returned. If the module is found, pointer module is passed in, ``TypeError`` is set and ``NULL`` returned. If the module
to the state, which may be ``NULL``, is returned without setting any exception. is found, the pointer to the state, which may be ``NULL``, is returned without
setting any exception.
Modules Converted in the Initial Implementation Modules Converted in the Initial Implementation
@@ -363,9 +388,10 @@ Summary of API Changes and Additions
New functions: New functions:
* ``PyType_FromModuleAndSpec``
* ``PyType_GetModule`` * ``PyType_GetModule``
* ``PyType_DefiningTypeFromSlotFunc``
* ``PyType_GetModuleState`` * ``PyType_GetModuleState``
* ``PyType_DefiningTypeFromSlotFunc``
New macros: New macros:
@@ -388,7 +414,7 @@ Other changes:
Backwards Compatibility Backwards Compatibility
======================= =======================
Two new pointers are added to all heap types. One new pointer is added to all heap types.
All other changes are adding new functions and structures, All other changes are adding new functions and structures,
or changes to private implementation details. or changes to private implementation details.