PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin,
        Stefan Behnel,
        Nick Coghlan
Discussions-To: import-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. This interaction was last revised for Python 3.0
in PEP 3121, but that revision did not solve all problems at the time.
The goal is to solve them by bringing extension modules closer to the way
Python modules behave; specifically to hook into the ModuleSpec-based loading
mechanism introduced in PEP 451.

Extensions that do not require custom memory layout for their module objects
may be executed in arbitrary pre-defined namespaces, paving the way for
extension modules being runnable with Python's ``-m`` switch.
Other extensions can use custom types for their module implementation.
Module types are no longer restricted to types.ModuleType.

This proposal makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.


Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the module
code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and is passed to the relevant hooks.
For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. The initialisation function is not passed ModuleSpec
information about the loaded module, such as the __file__ or fully-qualified
name.
This hinders relative imports and resource loading.

This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

The other disadvantage of the discrepancy is that existing Python programmers
learning C cannot effectively map concepts between the two domains.
As long as extension modules are fundamentally different from pure Python ones
in the way they're initialised, they are harder for people to pick up without
relying on something like cffi, SWIG or Cython to handle the actual extension
module creation.

Currently, extension modules are also not added to sys.modules until they are
fully initialized, which means that a (potentially transitive)
re-import of the module will try to import it again and thus run into an
infinite loop when it executes the module init function again.
Without the fully qualified module name, it is not trivial to correctly add
the module to sys.modules either.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or reloading, and, while it is
possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps the backwards-compatible behavior, which should reduce pressure
and give extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return either NULL
in the case of an exception, or a fully initialised module object.
The function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialise it by adding
attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully
qualified module name of the last module that it loaded, and when a module
gets created that has a matching name, this global variable is used to
determine the fully qualified name of the module object. This is not entirely
safe as it relies on the module init function creating its own module object
first, but this assumption usually holds in practice.


The proposal
============

The current extension module initialisation will be deprecated in favour of
a new initialisation scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.

Extension modules that support the new initialisation scheme must export
the public symbol "PyModuleExec_modulename", and optionally
"PyModuleCreate_modulename", where "modulename" is the
name of the module. This mimics the previous naming convention for
the "PyInit_modulename" function.

If defined, these symbols must resolve to C functions with the following
signatures, respectively::

    int (*PyModuleExecFunction)(PyObject* module)
    PyObject* (*PyModuleCreateFunction)(PyObject* module_spec)


The PyModuleExec function
-------------------------

The PyModuleExec function is used to implement "loader.exec_module"
defined in PEP 451.

This function will be called to initialize a module. (Usually, this amounts to
setting the module's initial attributes.)
This happens in two situations: when the module is first initialized for
a given (sub-)interpreter, and possibly later when the module is reloaded.

When PyModuleExec is called, the module has already been added to
sys.modules, and import-related attributes (specified in
PEP 451 [#pep-0451-attributes]_) have been set on the module.

The "module" argument receives the module object to initialize.

If PyModuleCreate is defined, "module" will generally be the object
returned by it.
It is possible for a custom loader to pass any object to
PyModuleExec, so this function should check and fail with TypeError
if the module's type is unsupported.
Any other assumptions should also be checked.

If PyModuleCreate is not defined, PyModuleExec is expected to operate
on any Python object for which attributes can be added by PyObject_SetAttr*
and retrieved by PyObject_GetAttr*.
This allows loading an extension into a pre-created module, making it possible
to run it as __main__ in the future, participate in certain lazy-loading
schemes [#lazy_import_concerns]_, or enable other creative uses.

If PyModuleExec replaces the module's entry in sys.modules,
the new object will be used and returned by importlib machinery.
(This mirrors the behavior of Python modules. Note that for extensions,
implementing PyModuleCreate is usually a better solution for the use cases
this serves.)

The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.


The PyModuleCreate function
---------------------------

The optional PyModuleCreate function is used to implement
"loader.create_module" defined in PEP 451.
By exporting it, an extension module indicates that it uses a custom
module object.
This prevents loading the extension in a pre-created module,
but gives greater flexibility in allowing a custom C-level layout
of the module object.
Most extensions will not need to implement this function.
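
As a rough Python-level analogy (not the proposed C API itself), the split
between the two hooks can be sketched with the existing PEP 451 loader
protocol. The names ``PropertyModule``, ``SketchLoader`` and ``sketch_ext``
are hypothetical and exist only for this illustration; ``create_module``
stands in for PyModuleCreate and ``exec_module`` for PyModuleExec.

```python
import importlib.util
import types
from importlib.machinery import ModuleSpec


class PropertyModule(types.ModuleType):
    """Hypothetical custom module type exposing a module-level property."""

    @property
    def call_count(self):
        # Computed on each access through the type, which a plain
        # module dict entry cannot do.
        return self._calls


class SketchLoader:
    """Hypothetical loader whose methods stand in for the C-level hooks."""

    def create_module(self, spec):
        # Analogue of PyModuleCreate: build and return the module object.
        # Kept minimal: nothing is imported and no user code runs here.
        return PropertyModule(spec.name)

    def exec_module(self, module):
        # Analogue of PyModuleExec: verify the type, then set up state.
        if not isinstance(module, PropertyModule):
            raise TypeError("unsupported module type")
        module._calls = 0


spec = ModuleSpec("sketch_ext", SketchLoader())
# module_from_spec() calls create_module(spec), then sets the
# import-related attributes such as __spec__ and __loader__.
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
```

In this sketch the import-related attributes are set between the two steps,
which matches the ordering described for PyModuleExec above. A real import
would additionally place the module in sys.modules before the exec step.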

The "module_spec" argument receives a "ModuleSpec" instance, as defined in
PEP 451.

When called, this function must create and return a module object,
or set an exception and return NULL.
There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes
specified in PEP 451 [#pep-0451-attributes]_.
This follows the current support for allowing arbitrary objects in sys.modules
and makes it easier for extension modules to define a type that exactly matches
their needs for holding module state.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep PyModuleCreate minimal, and in particular
to not call user code from it.

If PyModuleCreate is not defined, the default loader will construct
a module object as if with PyModule_New.


Initialization helper functions
-------------------------------

For two initialization tasks previously done by PyModule_Create,
two functions are introduced::

    int PyModule_SetDocString(PyObject *m, const char *doc)
    int PyModule_AddFunctions(PyObject *m, PyMethodDef *functions)

These set the module docstring, and add the module functions, respectively.
Both will work on any Python object that supports setting attributes.
They return ``0`` on success, and on failure, they set the exception
and return ``-1``.


PyCapsule convenience functions
-------------------------------

Instead of custom module objects, PyCapsule will become the preferred
mechanism for storing per-module C data.
Two new convenience functions will be added to help with this.

*
    ::

        PyObject *PyModule_AddCapsule(
            PyObject *module,
            const char *module_name,
            const char *attribute_name,
            void *pointer,
            PyCapsule_Destructor destructor)

    Add a new PyCapsule to *module* as *attribute_name*.
    The capsule name is formed by joining *module_name* and *attribute_name*
    by a dot.

    This convenience function can be used from a module initialization function
    instead of separate calls to PyCapsule_New and PyModule_AddObject.

    Returns a borrowed reference to the new capsule,
    or NULL (with exception set) on failure.

*
    ::

        void *PyModule_GetCapsulePointer(
            PyObject *module,
            const char *module_name,
            const char *attribute_name)

    Returns the pointer stored in *module* as *attribute_name*, or NULL
    (with an exception set) on failure. The capsule name is formed by joining
    *module_name* and *attribute_name* by a dot.

    This convenience function can be used instead of separate calls to
    PyObject_GetAttr and PyCapsule_GetPointer.

Extension authors are encouraged to define a macro to
call PyModule_GetCapsulePointer and cast the result to an appropriate type.


Generalizing PyModule_* functions
---------------------------------

The following functions and macros will be modified to work on any object
that supports attribute access:

* PyModule_GetNameObject
* PyModule_GetName
* PyModule_GetFilenameObject
* PyModule_GetFilename
* PyModule_AddIntConstant
* PyModule_AddStringConstant
* PyModule_AddIntMacro
* PyModule_AddStringMacro
* PyModule_AddObject

The PyModule_GetDict function will continue to only work on true module
objects. This means that it should not be used on extension modules that only
define PyModuleExec.


Legacy Init
-----------

If PyModuleExec is not defined, the import machinery will try to initialize
the module using the legacy "PyInit_modulename" hook,
as described in PEP 3121.

If PyModuleExec is defined, the legacy init hook will be ignored.
Modules requiring compatibility with previous versions of CPython may implement
the legacy init hook in addition to the new hook.
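
The precedence between the new and legacy hooks can be modelled in a few
lines of Python. This is only a sketch of the rule stated above, not loader
code: ``select_hooks`` is a hypothetical helper, ``exported_symbols`` is a
plain mapping standing in for the shared library's symbol table, the legacy
hook is assumed to be the "PyInit_modulename" function described under
"The current process", and raising ImportError when no hook is found is an
assumption rather than something this PEP specifies.

```python
def select_hooks(exported_symbols, modulename):
    """Pick the hooks a loader would use for `modulename`.

    `exported_symbols` maps symbol names to callables (hypothetical
    stand-in for looking symbols up in the shared library).
    """
    create_hook = exported_symbols.get("PyModuleCreate_" + modulename)
    exec_hook = exported_symbols.get("PyModuleExec_" + modulename)
    legacy_hook = exported_symbols.get("PyInit_" + modulename)

    if exec_hook is not None:
        # New scheme: the exec hook is required, the create hook is
        # optional, and any legacy init hook is ignored.
        return {"create": create_hook, "exec": exec_hook, "legacy_init": None}
    if legacy_hook is not None:
        # No exec hook: fall back to the legacy single-step init.
        return {"create": None, "exec": None, "legacy_init": legacy_hook}
    # Assumption: a library exporting neither hook cannot be imported.
    raise ImportError("no initialization hook found for " + modulename)
```

An extension that exports both "PyModuleExec_spam" and "PyInit_spam" would
therefore be initialized through the new hook on interpreters that support
it, while older interpreters would only look for the legacy symbol.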


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.


Module Reloading
----------------

Reloading an extension module will re-execute its PyModuleExec function.
Similar caveats apply to reloading an extension module as to reloading
a Python module. Notably, attributes or any other state of the module
are not reset before reloading.

Additionally, due to limitations in shared library loading (both dlopen on
POSIX and LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.
Therefore, reloading extension modules is of limited use.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library
must export appropriate PyModuleExec_ or PyModuleCreate_ hooks
for each exported module.
The modules are loaded using a ModuleSpec with origin set to the name of the
library file, and name set to the module name.

Note that this mechanism can currently only be used to *load* such modules,
not to *find* them.

XXX: This is an existing issue; either fix it/wait for a fix or provide
an example of how to load such modules.


Implementation
==============

XXX - not started


Open issues
===========

We should expose some kind of API in importlib.util (or a better place?) that
can be used to check that a module works with reloading and subinterpreters.


Related issues
==============

The runpy module will need to be modified to take advantage of PEP 451
and this PEP. This is out of scope for this PEP.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would then be called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization are broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed the Create and Exec hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At this time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.


References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html


Copyright
=========

This document has been placed in the public domain.