PEP: 687 Title: Isolating modules in the standard library Author: Erlend Egeberg Aasland , Petr Viktorin Discussions-To: https://discuss.python.org/t/14824 Status: Draft Type: Standards Track Content-Type: text/x-rst Requires: 489, 573, 630 Created: 04-Apr-2022 Python-Version: 3.12 Post-History: `04-Apr-2022 `__, `11-Apr-2022 `__ Abstract ======== Extensions in the standard library will be converted to multi-phase initialization (:pep:`489`) and where possible, all state will be stored on module objects rather than in process-global variables. Note on Backdating ================== Much of this proposal has already been implemented. We submit this PEP to explain the changes, seek consensus on whether they are good, propose the remaining changes, and set best practices for new modules. Motivation & Rationale ====================== The informational :pep:`630` describes the background, motivation, rationale, implications and implementation notes of the proposed changes as they apply generally to any extension module (not just the standard library). It is an integral part of this proposal. Read it first. This PEP discusses specifics of the standard library. Specification ============= The body of :pep:`630` will be converted to a HOWTO in the Python documentation, and that PEP will be retired (marked Final). All extension modules in the standard library will be converted to multi-phase initialization introduced in :pep:`489`. All stdlib extension modules will be *isolated*. That is: - Types, functions and other objects defined by the module will either be immutable, or not shared with other module instances. - State specific to the module will not be shared with other module instances, unless it represents global state. For example, ``_csv.field_size_limit`` will get/set a module-specific number. On the other hand, functions like ``readline.get_history_item`` or ``os.getpid`` will continue to work with state that is process-global (external to the module, and possibly shared across other libraries, including non-Python ones). Conversion to heap types ------------------------ Static types that do not need module state access, and have no other reason to be converted, should stay static. Types whose methods need access to their module instance will be converted to heap types following :pep:`630`, with the following considerations: - All standard library types that used to be static types should remain immutable. Heap types must be defined with the ``Py_TPFLAGS_IMMUTABLE_TYPE`` flag to retain immutability. See `bpo-43908 `__. Tests should ensure ``TypeError`` is raised when trying to create a new attribute of an immutable type. - A static type with ``tp_new = NULL`` does not have a public constructor, but heap types inherit the constructor from the base class. Make sure types that previously were impossible to instantiate retain that feature; use ``Py_TPFLAGS_DISALLOW_INSTANTIATION``. Add tests using ``test.support.check_disallow_instantiation()``. See `bpo-43916 `__. - Converted heap types may unintentionally become serializable (``pickle``-able). Test that calling ``pickle.dumps`` has the same result before and after conversion, and if the test fails, add a ``__reduce__`` method that raises ``TypeError``. See `PR-21002 `__ for an example. These issues will be added to the Devguide to help any future conversions. If another kind of issue is found, the module in question should be unchanged until a solution is found and added to the Devguide, and already converted modules are checked and fixed. Process ------- The following process should be added to the Devguide, and remain until all modules are converted. Any new findings should be documented there or in the general HOWTO. Part 1: Preparation ................... 1. Open a discussion, either on the bug tracker or on Discourse. Involve the module maintainer and/or code owner. Explain the reason and rationale for the changes. 2. Identify global state performance bottlenecks. Create a proof-of-concept implementation, and measure the performance impact. ``pyperf`` is a good tool for benchmarking. 3. Create an implementation plan. For small modules with few types, a single PR may do the job. For larger modules with lots of types, and possibly also external library callbacks, multiple PR's will be needed. Part 2: Implementation ...................... Note: this is a suggested implementation plan for a complex module, based on lessons learned with other modules. Feel free to simplify it for smaller modules. 1. Add Argument Clinic where possible; it enables you to easily use the defining class to fetch module state from type methods. 2. Prepare for module state; establish a module state ``struct``, add an instance as a static global variable, and create helper stubs for fetching the module state. 3. Add relevant global variables to the module state ``struct``, and modify code that accesses the global state to use the module state helpers instead. This step may be broken into several PR's. 4. Where necessary, convert static types to heap types. 5. Convert the global module state struct to true module state. 6. Implement multi-phase initialisation. Steps 4 through 6 should preferably land in a single alpha development phase. Backwards Compatibility ======================= Extension modules in the standard library will now be loadable more than once. For example, deleting such a module from ``sys.modules`` and re-importing it will result in a fresh module instance, isolated from any previously loaded instances. This may affect code that expected the previous behavior: globals of extension modules were shallowly copied from the first loaded module. Security Implications ===================== None known. How to Teach This ================= A large part of this proposal is a HOWTO aimed at experienced users, which will be moved to the documentation. Beginners should not be affected. Reference Implementation ======================== Most of the changes are now in the main branch, as commits for these issues: - `bpo-40077, Convert static types to heap types: use PyType_FromSpec() `_ - `bpo-46417, Clear static types in Py_Finalize() for embedded Python `_ - `bpo-1635741, Py_Finalize() doesn't clear all Python objects at exit `_ As an example, changes and fix-ups done in the ``_csv`` module are: - `GH-23224, Remove static state from the _csv module `_ - `GH-26008, Allow subclassing of csv.Error `_ - `GH-26074, Add GC support to _csv heap types `_ - `GH-26351, Make heap types converted during 3.10 alpha immutable `_ Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: