PEP 573: SystemError->TypeError and misc copy-edits (GH-1264)
* report TypeError rather than SystemError when bad types are passed to new APIs
* add PyType_FromModuleAndSpec to the list of new functions
* wrap text at 80 columns
* add C API docs links for thread-local storage and context variables
* note in abstract that only the initial state lookup from slots is inherently slow, with subsequent lookups being amenable to caching
* other minor wording tweaks
parent 8dcf7fe49c
commit 34343da45d
140
pep-0573.rst
|
@ -19,19 +19,24 @@ Post-History:
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
This PEP proposes to add a way for CPython extension methods to access context such as
|
This PEP proposes to add a way for CPython extension methods to access context,
|
||||||
the state of the modules they are defined in.
|
such as the state of the modules they are defined in.
|
||||||
|
|
||||||
This will allow extension methods to use direct pointer dereferences
|
This will allow extension methods to use direct pointer dereferences
|
||||||
rather than PyState_FindModule for looking up module state, reducing or eliminating the
|
rather than PyState_FindModule for looking up module state, reducing or
|
||||||
performance cost of using module-scoped state over process global state.
|
eliminating the performance cost of using module-scoped state over process
|
||||||
|
global state.
|
||||||
|
|
||||||
This fixes one of the remaining roadblocks for adoption of PEP 3121 (Extension
|
This fixes one of the remaining roadblocks for adoption of PEP 3121 (Extension
|
||||||
module initialization and finalization) and PEP 489
|
module initialization and finalization) and PEP 489
|
||||||
(Multi-phase extension module initialization).
|
(Multi-phase extension module initialization).
|
||||||
|
|
||||||
While this PEP takes an additional step towards fully solving the problems that PEP 3121 and PEP 489 started
|
While this PEP takes an additional step towards fully solving the problems that
|
||||||
tackling, it does not attempt to resolve *all* remaining concerns. In particular, accessing the module state from slot methods (``nb_add``, etc) remains slower than accessing that state from other extension methods.
|
PEP 3121 and PEP 489 started tackling, it does not attempt to resolve *all*
|
||||||
|
remaining concerns. In particular, at least the first access to the module state
|
||||||
|
from slot methods (``nb_add``, etc) remains slower than accessing that state
|
||||||
|
from other extension methods. Standard caching techniques can be used to speed
|
||||||
|
up subsequent access.
|
||||||
|
|
||||||
|
|
||||||
Terminology
|
Terminology
|
||||||
|
@ -58,12 +63,14 @@ Accessed by ``PyModule_GetState()``.
|
||||||
Static Type
|
Static Type
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
A type object defined as a C-level static variable, i.e. a compiled-in type object.
|
A type object defined as a C-level static variable, i.e. a compiled-in type
|
||||||
|
object.
|
||||||
|
|
||||||
A static type needs to be shared between module instances and has no
|
A static type needs to be shared between module instances and has no
|
||||||
information of what module it belongs to.
|
information of what module it belongs to.
|
||||||
Static types do not have ``__dict__`` (although their instances might).
|
Static types do not have ``__dict__`` (although their instances might).
|
||||||
|
|
||||||
|
|
||||||
Heap Type
|
Heap Type
|
||||||
---------
|
---------
|
||||||
|
|
||||||
|
@ -81,23 +88,23 @@ several advantages to extensions that implement it:
|
||||||
module objects, which paves the way for extension module support for
|
module objects, which paves the way for extension module support for
|
||||||
``runpy`` or for systems that enable extension module reloading.
|
``runpy`` or for systems that enable extension module reloading.
|
||||||
* Loading multiple modules from the same extension is possible, which
|
* Loading multiple modules from the same extension is possible, which
|
||||||
makes testing module isolation (a key feature for proper sub-interpreter
|
makes it possible to test module isolation (a key feature for proper
|
||||||
support) possible from a single interpreter.
|
sub-interpreter support) from a single interpreter.
|
||||||
|
|
||||||
The biggest hurdle for adoption of PEP 489 is allowing access to module state
|
The biggest hurdle for adoption of PEP 489 is allowing access to module state
|
||||||
from methods of extension types.
|
from methods of extension types.
|
||||||
Currently, the way to access this state from extension methods is by looking up the module via
|
Currently, the way to access this state from extension methods is by looking up
|
||||||
``PyState_FindModule`` (in contrast to module level functions in extension modules, which
|
the module via ``PyState_FindModule`` (in contrast to module level functions in
|
||||||
receive a module reference as an argument).
|
extension modules, which receive a module reference as an argument).
|
||||||
However, ``PyState_FindModule`` queries the thread-local state, making it relatively
|
However, ``PyState_FindModule`` queries the thread-local state, making it
|
||||||
costly compared to C level process global access and consequently deterring module authors from using it.
|
relatively costly compared to C level process global access and consequently
|
||||||
|
deterring module authors from using it.
|
||||||
|
|
||||||
Also, ``PyState_FindModule`` relies on the assumption that in each
|
Also, ``PyState_FindModule`` relies on the assumption that in each
|
||||||
subinterpreter, there is at most one module corresponding to
|
subinterpreter, there is at most one module corresponding to
|
||||||
a given ``PyModuleDef``. This does not align well with Python's import
|
a given ``PyModuleDef``. This assumption does not hold for modules that use
|
||||||
machinery. Since PEP 489 aimed to fix that, the assumption does
|
PEP 489's multi-phase initialization, so ``PyState_FindModule`` is unavailable
|
||||||
not hold for modules that use multi-phase initialization, so
|
for these modules.
|
||||||
``PyState_FindModule`` is unavailable for these modules.
|
|
||||||
|
|
||||||
A faster, safer way of accessing module-level state from extension methods
|
A faster, safer way of accessing module-level state from extension methods
|
||||||
is needed.
|
is needed.
|
||||||
|
@ -133,12 +140,17 @@ In Python code, the Python-level equivalents may be retrieved as::
|
||||||
The defining class is not ``type(self)``, since ``type(self)`` might
|
The defining class is not ``type(self)``, since ``type(self)`` might
|
||||||
be a subclass of ``Foo``.
|
be a subclass of ``Foo``.
|
||||||
|
|
||||||
The statements marked (1) implicitly rely on name-based lookup via the function's ``__globals__``:
|
The statements marked (1) implicitly rely on name-based lookup via the
|
||||||
either the ``Foo`` attribute to access the defining class and Python function object, or ``__name__`` to find the module object in ``sys.modules``.
|
function's ``__globals__``: either the ``Foo`` attribute to access the defining
|
||||||
In Python code, this is feasible, as ``__globals__`` is set appropriately when the function definition is executed, and
|
class and Python function object, or ``__name__`` to find the module object in
|
||||||
even if the namespace has been manipulated to return a different object, at worst an exception will be raised.
|
``sys.modules``.
|
||||||
|
|
||||||
The ``__class__`` closure, (2), is a safer way to get the defining class, but it still relies on ``__closure__`` being set appropriately.
|
In Python code, this is feasible, as ``__globals__`` is set appropriately when
|
||||||
|
the function definition is executed, and even if the namespace has been
|
||||||
|
manipulated to return a different object, at worst an exception will be raised.
|
||||||
|
|
||||||
|
The ``__class__`` closure, (2), is a safer way to get the defining class, but it
|
||||||
|
still relies on ``__closure__`` being set appropriately.
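For illustration only (this sketch is not part of the commit): a minimal Python example of the distinction drawn above between the defining class and ``type(self)``. The names ``Foo``, ``Bar``, ``via_globals`` and ``via_closure`` are hypothetical.

```python
class Foo:
    def via_globals(self):
        # (1) name-based lookup of ``Foo`` through the function's __globals__
        return Foo

    def via_closure(self):
        # (2) the implicit ``__class__`` closure cell created by the compiler
        return __class__


class Bar(Foo):
    pass


obj = Bar()
defining_by_name = obj.via_globals()   # the class that defines the method
defining_by_cell = obj.via_closure()   # same class, found via __closure__
runtime_type = type(obj)               # the subclass, not the defining class
```

Both lookups yield the defining class ``Foo`` even though the instance is a ``Bar``. That piece of context is what a plain C extension method does not receive.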
|
||||||
|
|
||||||
By contrast, extension methods are typically implemented as normal C functions.
|
By contrast, extension methods are typically implemented as normal C functions.
|
||||||
This means that they only have access to their arguments and C level thread-local
|
This means that they only have access to their arguments and C level thread-local
|
||||||
|
@ -149,26 +161,34 @@ their shared state in C-level process globals, causing problems when:
|
||||||
* reloading modules (e.g. to test conditional imports)
|
* reloading modules (e.g. to test conditional imports)
|
||||||
* loading extension modules in subinterpreters
|
* loading extension modules in subinterpreters
|
||||||
|
|
||||||
PEP 3121 attempted to resolve this by offering the ``PyState_FindModule`` API, but this still has significant problems when it comes to extension methods (rather than module level functions):
|
PEP 3121 attempted to resolve this by offering the ``PyState_FindModule`` API,
|
||||||
|
but this still has significant problems when it comes to extension methods
|
||||||
|
(rather than module level functions):
|
||||||
|
|
||||||
* it is markedly slower than directly accessing C-level process-global state
|
* it is markedly slower than directly accessing C-level process-global state
|
||||||
* there is still some inherent reliance on process global state that means it still doesn't reliably handle module reloading
|
* there is still some inherent reliance on process global state that means it
|
||||||
|
still doesn't reliably handle module reloading
|
||||||
|
|
||||||
|
It's also the case that when looking up a C-level struct such as module state,
|
||||||
|
supplying an unexpected object layout can crash the interpreter, so it's
|
||||||
|
significantly more important to ensure that extension methods receive the kind
|
||||||
|
of object they expect.
|
||||||
|
|
||||||
It's also the case that when looking up a C-level struct such as module state, supplying
|
|
||||||
an unexpected object layout can crash the interpreter, so it's significantly more important to ensure that extension
|
|
||||||
methods receive the kind of object they expect.
|
|
||||||
|
|
||||||
Proposal
|
Proposal
|
||||||
========
|
========
|
||||||
|
|
||||||
Currently, a bound extension method (``PyCFunction`` or ``PyCFunctionWithKeywords``) receives only
|
Currently, a bound extension method (``PyCFunction`` or
|
||||||
``self``, and (if applicable) the supplied positional and keyword arguments.
|
``PyCFunctionWithKeywords``) receives only ``self``, and (if applicable) the
|
||||||
|
supplied positional and keyword arguments.
|
||||||
|
|
||||||
While module-level extension functions already receive access to the defining module object via their
|
While module-level extension functions already receive access to the defining
|
||||||
``self`` argument, methods of extension types don't have that luxury: they receive the bound instance
|
module object via their ``self`` argument, methods of extension types don't have
|
||||||
via ``self``, and hence have no direct access to the defining class or the module level state.
|
that luxury: they receive the bound instance via ``self``, and hence have no
|
||||||
|
direct access to the defining class or the module level state.
|
||||||
|
|
||||||
The additional module level context described above can be made available with two changes.
|
The additional module level context described above can be made available with
|
||||||
|
two changes.
|
||||||
Both additions are optional; extension authors need to opt in to start
|
Both additions are optional; extension authors need to opt in to start
|
||||||
using them:
|
using them:
|
||||||
|
|
||||||
|
@ -176,17 +196,16 @@ using them:
|
||||||
|
|
||||||
* Pass the defining class to the underlying C function.
|
* Pass the defining class to the underlying C function.
|
||||||
|
|
||||||
The defining class is readily available at the time built-in
|
The defining class is readily available at the time the built-in
|
||||||
method object (``PyCFunctionObject``) is created, so it can be stored
|
method object (``PyCFunctionObject``) is created, so it can be stored
|
||||||
in a new struct that extends ``PyCFunctionObject``.
|
in a new struct that extends ``PyCFunctionObject``.
|
||||||
|
|
||||||
The module state can then be retrieved from the module object via
|
The module state can then be retrieved from the module object via
|
||||||
``PyModule_GetState``.
|
``PyModule_GetState``.
|
||||||
|
|
||||||
Note that this proposal implies that any type whose method needs to access
|
Note that this proposal implies that any type whose methods need to access
|
||||||
`per-module state`_ must be a heap type, rather than a static type.
|
`per-module state`_ must be a heap type, rather than a static type. This is
|
||||||
|
necessary to support loading multiple module objects from a single
|
||||||
This is necessary to support loading multiple module objects from a single
|
|
||||||
extension: a static type, as a C-level global, has no information about
|
extension: a static type, as a C-level global, has no information about
|
||||||
which module object it belongs to.
|
which module object it belongs to.
|
||||||
|
|
||||||
|
@ -201,17 +220,22 @@ simply add a new argument to pass in the defining class.
|
||||||
Two possible solutions have been proposed to this problem:
|
Two possible solutions have been proposed to this problem:
|
||||||
|
|
||||||
* Look up the class through walking the MRO.
|
* Look up the class through walking the MRO.
|
||||||
This is potentially expensive, but will be useful if performance is not
|
This is potentially expensive, but will be usable if performance is not
|
||||||
a problem (such as when raising a module-level exception).
|
a problem (such as when raising a module-level exception).
|
||||||
* Storing a pointer to the defining class of each slot in a separate table,
|
* Storing a pointer to the defining class of each slot in a separate table,
|
||||||
``__typeslots__`` [#typeslots-mail]_. This is technically feasible and fast,
|
``__typeslots__`` [#typeslots-mail]_. This is technically feasible and fast,
|
||||||
but quite invasive.
|
but quite invasive.
|
||||||
|
|
||||||
Due to the invasiveness of the latter approach, this PEP proposes adding an MRO walking
|
Due to the invasiveness of the latter approach, this PEP proposes adding an MRO
|
||||||
helper for use in slot method implementations, deferring the more complex alternative
|
walking helper for use in slot method implementations, deferring the more complex
|
||||||
as a potential future optimisation. Modules affected by this concern also have the
|
alternative as a potential future optimisation.
|
||||||
option of using thread-local state or PEP 567 context variables, or else defining their
|
|
||||||
own reload-friendly lookup caching scheme.
|
Modules affected by this concern also have the option of using
|
||||||
|
`thread-local state`_ or `PEP 567 context variables`_ as a caching mechanism, or
|
||||||
|
else defining their own reload-friendly lookup caching scheme.
|
||||||
|
|
||||||
|
.. _thread-local state: https://docs.python.org/3/c-api/init.html#thread-local-storage-support
|
||||||
|
.. _PEP 567 context variables: https://docs.python.org/3/c-api/contextvars.html
|
||||||
|
|
||||||
|
|
||||||
Specification
|
Specification
|
||||||
|
@ -237,12 +261,12 @@ This acts the same as ``PyType_FromSpecWithBases``, and additionally sets
|
||||||
Additionally, an accessor, ``PyObject * PyType_GetModule(PyTypeObject *)``
|
Additionally, an accessor, ``PyObject * PyType_GetModule(PyTypeObject *)``
|
||||||
will be provided.
|
will be provided.
|
||||||
It will return the ``ht_module`` if a heap type with module pointer set
|
It will return the ``ht_module`` if a heap type with module pointer set
|
||||||
is passed in, otherwise it will set a SystemError and return NULL.
|
is passed in, otherwise it will set ``TypeError`` and return NULL.
|
||||||
|
|
||||||
Usually, creating a class with ``ht_module`` set will create a reference
|
Usually, creating a class with ``ht_module`` set will create a reference
|
||||||
cycle involving the class and the module.
|
cycle involving the class and the module.
|
||||||
This is not a problem, as tearing down modules is not a performance-sensitive
|
This is not a problem, as tearing down modules is not a performance-sensitive
|
||||||
operation (and module-level functions typically also create reference cycles).
|
operation, and module-level functions typically also create reference cycles.
|
||||||
The existing "set all module globals to None" code that breaks function cycles
|
The existing "set all module globals to None" code that breaks function cycles
|
||||||
through ``f_globals`` will also break the new cycles through ``ht_module``.
|
through ``f_globals`` will also break the new cycles through ``ht_module``.
|
||||||
|
|
||||||
|
@ -325,9 +349,9 @@ will be implemented::
|
||||||
The walker will go through bases of heap-allocated ``type``
|
The walker will go through bases of heap-allocated ``type``
|
||||||
and search for class that defines ``func`` at its ``slot``.
|
and search for class that defines ``func`` at its ``slot``.
|
||||||
|
|
||||||
The ``func`` needs not to be inherited by ``type``. The only requirement
|
The ``func`` does not need to be inherited by ``type`` (i.e. it may have been
|
||||||
for the walker to find the defining class is that the defining class
|
overridden in a subclass). The only requirement for the walker to find the
|
||||||
must be heap-allocated.
|
defining class is that the defining class must be heap-allocated.
|
||||||
|
|
||||||
On failure, exception is set and NULL is returned.
|
On failure, exception is set and NULL is returned.
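A rough Python-level analogue of the MRO walk described above, offered only as a sketch (it is not part of the commit). The helper name ``defining_class`` is hypothetical. The proposed C helper compares slot function pointers and only considers heap-allocated types, whereas this sketch compares entries in each class ``__dict__``. Raising ``TypeError`` is an assumption made here, since the text only says that an exception is set.

```python
def defining_class(tp, name, func):
    """Return the first class in tp's MRO whose own namespace maps name to func."""
    for klass in tp.__mro__:
        # Look only at what the class itself defines, not at inherited names.
        if klass.__dict__.get(name) is func:
            return klass
    # Assumption: the exception type is chosen for illustration only.
    raise TypeError(f"no class in the MRO of {tp.__name__!r} defines {name!r} as {func!r}")


class Base:
    def __add__(self, other):
        return NotImplemented


class Child(Base):
    pass


class Override(Base):
    def __add__(self, other):
        return NotImplemented


base_add = Base.__dict__["__add__"]
found_inherited = defining_class(Child, "__add__", base_add)
found_overridden = defining_class(Override, "__add__", base_add)
```

The second call mirrors the case noted above: ``Override`` replaces ``__add__``, so ``base_add`` is no longer what ``Override`` itself uses, yet the walk still finds ``Base`` as the defining class.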
|
||||||
|
|
||||||
|
@ -340,12 +364,13 @@ this easier, a helper will be added::
|
||||||
|
|
||||||
void *PyType_GetModuleState(PyObject *type)
|
void *PyType_GetModuleState(PyObject *type)
|
||||||
|
|
||||||
This function takes a heap type and on success, it returns pointer to state of the
|
This function takes a heap type and on success, it returns pointer to the state
|
||||||
module that the heap type belongs to.
|
of the module that the heap type belongs to.
|
||||||
|
|
||||||
On failure, two scenarios may occur. When a type without a module is passed in,
|
On failure, two scenarios may occur. When a non-type object, or a type without a
|
||||||
``SystemError`` is set and ``NULL`` returned. If the module is found, pointer
|
module is passed in, ``TypeError`` is set and ``NULL`` returned. If the module
|
||||||
to the state, which may be ``NULL``, is returned without setting any exception.
|
is found, the pointer to the state, which may be ``NULL``, is returned without
|
||||||
|
setting any exception.
|
||||||
|
|
||||||
|
|
||||||
Modules Converted in the Initial Implementation
|
Modules Converted in the Initial Implementation
|
||||||
|
@ -363,9 +388,10 @@ Summary of API Changes and Additions
|
||||||
|
|
||||||
New functions:
|
New functions:
|
||||||
|
|
||||||
|
* ``PyType_FromModuleAndSpec``
|
||||||
* ``PyType_GetModule``
|
* ``PyType_GetModule``
|
||||||
* ``PyType_DefiningTypeFromSlotFunc``
|
|
||||||
* ``PyType_GetModuleState``
|
* ``PyType_GetModuleState``
|
||||||
|
* ``PyType_DefiningTypeFromSlotFunc``
|
||||||
|
|
||||||
New macros:
|
New macros:
|
||||||
|
|
||||||
|
@ -388,7 +414,7 @@ Other changes:
|
||||||
Backwards Compatibility
|
Backwards Compatibility
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
Two new pointers are added to all heap types.
|
One new pointer is added to all heap types.
|
||||||
All other changes are adding new functions and structures,
|
All other changes are adding new functions and structures,
|
||||||
or changes to private implementation details.
|
or changes to private implementation details.
|
||||||
|
|
||||||
|
|