PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou@gmail.com>,
        Stefan Behnel <stefan_ml@behnel.de>,
        Nick Coghlan <ncoghlan@gmail.com>
Discussions-To: import-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. The mechanism was last revised for Python 3.0 in
PEP 3121, which did not solve all problems at the time. The goal is to solve
them by bringing extension modules closer to the way Python modules behave;
specifically, to hook into the ModuleSpec-based loading mechanism
introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
authors to define only the features they need, and to allow future additions
to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.


Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the module
code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and is passed to the relevant hooks.

For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Python 3, modules are also not added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to re-import
it and thus run into an infinite loop when it executes the module init function
again. Without the fully qualified module name (FQMN), it is not trivial to
correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are used at module init time.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and, while
it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and give
extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension modules export an initialization function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialized module object. The
function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully
qualified module name of the last module that it loaded, and when a module
gets created that has a matching name, this global variable is used to
determine the fully qualified name of the module object. This is not entirely
safe as it relies on the module init function creating its own module object
first, but this assumption usually holds in practice.


The proposal
============

The current extension module initialization will be deprecated in favor of
a new initialization scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.

Extension modules that support the new initialization scheme must export
the public symbol "PyModuleExport_<modulename>", where "modulename"
is the name of the module. (For modules with non-ASCII names the symbol name
is slightly different, see "Export Hook Name" below.)

If defined, this symbol must resolve to a C function with the following
signature::

    PyModuleDef* (*PyModuleExportFunction)(void)

For cross-platform compatibility, the function should be declared as::

    PyMODEXPORT_FUNC PyModuleExport_<modulename>(void)

The function must return a pointer to a PyModuleDef structure.
This structure must be available for the lifetime of the module created from
it; usually, it will be declared statically.

Alternatively, this function can return NULL, in which case it is as if the
symbol was not defined (see the "Legacy Init" section).

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module storage),
the currently unused m_reload pointer of PyModuleDef will be changed to
hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.

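The slot-array rules above can be modelled in a few lines of Python. This is
only a sketch of the described semantics, not CPython's implementation: the
numeric slot IDs, the ``scan_slots`` helper, and the error raised for a
missing terminator are assumptions made for illustration (in C, a missing
terminator would simply be undefined behavior).

```python
# Pure-Python model of scanning an m_slots array; illustrative only.
PY_MOD_CREATE = 1  # assumed numeric value, for illustration
PY_MOD_EXEC = 2    # assumed numeric value, for illustration
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def scan_slots(slots):
    """Return the (slot_id, value) pairs that precede the {0, NULL} entry."""
    if slots is None:  # models m_slots == NULL: no slots at all
        return []
    seen = []
    for slot_id, value in slots:
        if slot_id == 0:  # the terminator entry ends the array
            return seen
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:  # models a NULL value pointer
            raise SystemError(f"slot {slot_id} has a NULL value")
        seen.append((slot_id, value))
    # Assumption of this model: a list without a terminator is rejected.
    raise SystemError("slot array is missing its {0, NULL} terminator")
```

Entries after the terminator are never examined, which is why the array can
be extended with new slot IDs in later Python versions without changing the
structure layout.
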
When using the new import mechanism, m_size must not be negative.
Also, the *m_name* field of PyModuleDef will not be used during importing;
the module name will be taken from the ModuleSpec.


Module Creation
---------------

Module creation -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular
to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object by PyModule_New. The name is taken from *spec*.


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType
(or subclass), or if a Py_mod_create slot is not present, the import machinery
will do the following steps after the module is created:

* If *m_size* is specified, per-module state is allocated and made accessible
  through PyModule_GetState.
* The PyModuleDef is associated with the module, making it accessible to
  PyModule_GetDef, and enabling the m_traverse, m_clear and m_free hooks.
* The docstring is set from m_doc.
* The module's functions are initialized from m_methods.

If the Py_mod_create function does not return an instance of types.ModuleType
(or subclass), then m_size must be 0 (a negative m_size is not allowed with
the new mechanism), and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.


Module Execution
----------------

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

Execution slots may be specified multiple times, and are processed in the order
they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.


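The create/exec split that the slots plug into is the same one Python-level
loaders already follow under PEP 451. As a rough illustration, the sketch
below uses a hypothetical ``TwoStepLoader`` (the class, the module name
``spam_demo`` and the ``food`` attribute are invented for this example) to
show that import-related attributes are already in place when the execution
step runs.

```python
import importlib.abc
import importlib.util


class TwoStepLoader(importlib.abc.Loader):
    """Hypothetical loader that records what it sees in each step."""

    def __init__(self):
        self.events = []

    def create_module(self, spec):
        self.events.append(("create", spec.name))
        return None  # None asks the import machinery for a default module

    def exec_module(self, module):
        # Import-related attributes were set before this step runs.
        self.events.append(("exec", module.__name__, module.__loader__ is self))
        module.food = "spam"


loader = TwoStepLoader()
spec = importlib.util.spec_from_loader("spam_demo", loader)
module = importlib.util.module_from_spec(spec)  # creation step
loader.exec_module(module)                      # execution step
```

Driving the two steps by hand like this does not touch sys.modules; only the
full import machinery registers the module before execution.
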
The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

If a Py_mod_exec function replaces the module's entry in sys.modules,
the new object will be used and returned by importlib machinery.
(This mirrors the behavior of Python modules. Note that for extensions,
implementing Py_mod_create is usually a better solution for the use cases
this serves.)

The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.


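The sys.modules replacement behavior mentioned for Py_mod_exec can be
observed today with a Python-level loader. The following is a small sketch
under stated assumptions: ``ReplacingLoader``, ``DemoFinder`` and the module
name ``replaced_demo`` are hypothetical names, and the finder is installed
only for the duration of the demonstration.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types

NAME = "replaced_demo"  # hypothetical module name


class ReplacingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Swap the sys.modules entry while the module is being executed.
        replacement = types.SimpleNamespace(food="spam", original=module)
        sys.modules[module.__name__] = replacement


class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == NAME:
            return importlib.util.spec_from_loader(fullname, ReplacingLoader())
        return None


sys.meta_path.insert(0, DemoFinder())
try:
    imported = importlib.import_module(NAME)
finally:
    # Clean up so the demonstration leaves the import system untouched.
    sys.meta_path.pop(0)
    sys.modules.pop(NAME, None)
```

Here ``imported`` is the replacement object rather than the module that was
passed to ``exec_module``, which remains reachable as ``imported.original``.
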
Legacy Init
-----------

If the PyModuleExport function is not defined, or if it returns NULL, the
import machinery will try to initialize the module using the
"PyInit_<modulename>" hook, as described in PEP 3121.

If the PyModuleExport function is defined, the PyInit function will be ignored.
Modules requiring compatibility with previous versions of CPython may implement
the PyInit function in addition to the new hook.

Modules using the legacy init API will be initialized entirely in the
Loader.create_module step; Loader.exec_module will be a no-op.

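The lookup order described above can be summarized with a small Python model.
It is a sketch only: the ``symbols`` dictionary stands in for the shared
library's exported functions, a hook returning ``None`` models a C function
returning NULL, and raising ImportError when neither hook exists is an
assumption of this example rather than something this PEP specifies.

```python
def choose_init_path(symbols, modulename):
    """Model the hook lookup order for an ASCII module name."""
    export_hook = symbols.get("PyModuleExport_" + modulename)
    if export_hook is not None:
        module_def = export_hook()
        if module_def is not None:
            # New scheme: PyInit_<modulename> is ignored from here on.
            return ("new", module_def)
    # Export hook missing, or it returned NULL: fall back to legacy init.
    init_hook = symbols.get("PyInit_" + modulename)
    if init_hook is None:
        raise ImportError(f"no init hook found for {modulename!r}")
    return ("legacy", init_hook())
```

For example, a library exporting both hooks takes the new path, while one
whose export hook returns NULL behaves as if only ``PyInit_<modulename>``
existed.
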
A module that supports older CPython versions can be coded as::

    #define Py_LIMITED_API
    #include <Python.h>

    /* Execution slot: populate the freshly created module. */
    static int spam_exec(PyObject *module) {
        /* Returns 0 on success, or -1 with an exception set. */
        return PyModule_AddStringConstant(module, "food", "spam");
    }

    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
        spam_slots,                                 /* m_slots */
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    /* New-style hook: hands the definition to the import machinery. */
    PyMODEXPORT_FUNC PyModuleExport_spam(void) {
        return &spam_def;
    }

    /* Legacy hook: creates and initializes the module in one step. */
    PyMODINIT_FUNC
    PyInit_spam(void) {
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    }

Note that this must be *compiled* on a new CPython version, but the resulting
shared library will be backwards compatible.
(Source-level compatibility is possible with preprocessor directives.)

If a Py_mod_create slot is used, PyInit should call its function instead of
PyModule_Create. Keep in mind that the ModuleSpec object is not available in
the legacy init scheme.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.

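The rule of thumb above can be illustrated with a Python analogy. In this
sketch the ``_state`` attribute merely stands in for the storage that
PyModule_GetState would return in C, and ``create_counter_module`` is a
hypothetical factory; the point is only that two module objects built from
the same definition do not share state when the state lives on the module.

```python
import types


def create_counter_module(name):
    """Build a module whose state lives on the module object itself."""
    module = types.ModuleType(name)
    module._state = {"count": 0}  # stands in for PyModule_GetState storage

    def increment():
        # The function reaches its state through its own module object,
        # not through a process-wide (static) variable.
        module._state["count"] += 1
        return module._state["count"]

    module.increment = increment
    return module


# Two independent module objects, as two interpreters might create.
first = create_counter_module("counter")
second = create_counter_module("counter")
first.increment()
first.increment()
second.increment()
```

Had the counter been a single global, both module objects would observe each
other's increments, which is exactly the kind of leak between interpreters
that the section warns about.
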
Behavior of existing module creation functions
----------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL m_slots pointer.
The function doesn't have access to the ModuleSpec object necessary for
"new style" module creation.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will fail with SystemError.
PyState registration is disabled because multiple module objects may be
created from the same PyModuleDef.


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access
to module-level state (including functions, classes or exceptions defined at
the module level) must receive a reference to the module object (or the
particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for
the new mechanism to be useful to all modules. Proper fixes have been discussed
on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro will be added to implement module creation.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function will be added to run "execution slots" on a module::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyModuleExport hook name.

For ASCII module names, the export hook is named
PyModuleExport_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the export hook is named
PyModuleExportU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").


In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyModuleExport' + suffix

Examples:

============= ===========================
Module name   Export hook name
============= ===========================
spam          PyModuleExport_spam
lančmít       PyModuleExportU_lanmt_2sa6t
スパム        PyModuleExportU_zck5b2b
============= ===========================


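As a quick sanity check, the table rows can be reproduced by running the
function on each module name. The snippet repeats ``export_hook_name`` so
that it is self-contained; the ``EXPECTED`` mapping is simply the table
above written as a dictionary, and note that the function returns bytes
rather than str.

```python
def export_hook_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyModuleExport' + suffix


# The example table, keyed by module name.
EXPECTED = {
    'spam': b'PyModuleExport_spam',
    'lančmít': b'PyModuleExportU_lanmt_2sa6t',
    'スパム': b'PyModuleExportU_zck5b2b',
}

for module_name, hook in EXPECTED.items():
    assert export_hook_name(module_name) == hook, module_name
```

The resulting names contain only ASCII letters, digits and underscores, so
they are usable as portable C identifiers.
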
Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyModuleExport* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
not to *find* them.

Given the filesystem location of a shared library and a module name,
a module may be loaded with::

    import importlib.machinery
    import importlib.util

    def load_from_library(name, path):  # illustrative helper name
        """Load module *name* from the shared library at *path*."""
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmoduleexport`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use the old API
indefinitely (or until the old API is removed).

The ``array`` and ``xx*`` modules will be converted to the new API as
part of the initial implementation.


API Changes and Additions
-------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions

New macros:

* PyMODEXPORT_FUNC
* Py_mod_create
* Py_mod_exec

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example, _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository
[#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would then be called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization are broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At that time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.


References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html


Copyright
=========

This document has been placed in the public domain.