python-peps/pep-0690.rst

414 lines
18 KiB
ReStructuredText

PEP: 690
Title: Lazy Imports
Author: Germán Méndez Bravo <german.mb@gmail.com>, Carl Meyer <carl@oddbird.net>
Sponsor: Barry Warsaw <barry@python.org>
Discussions-To: https://discuss.python.org/t/pep-690-lazy-imports/15474
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Apr-2022
Python-Version: 3.12
Abstract
========
This PEP proposes a feature to transparently defer the execution of imported
modules until the moment when an imported object is used. Since Python
programs commonly import many more modules than a single invocation of the
program is likely to use in practice, lazy imports can greatly reduce the
overall number of modules loaded, improving startup time and memory usage. Lazy
imports also mostly eliminate the risk of import cycles.
Motivation
==========
Common Python code style :pep:`prefers <8#imports>` imports at module
level, so they don't have to be repeated within each scope the imported object
is used in, and to avoid the inefficiency of repeated execution of the import
system at runtime. This means that importing the main module of a program
typically results in an immediate cascade of imports of most or all of the
modules that may ever be needed by the program.
Consider the example of a Python command line program with a number of
subcommands. Each subcommand may perform different tasks, requiring the import
of different dependencies. But a given invocation of the program will only
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
info is requested). Top-level eager imports in such a program will result in
the import of many modules that will never be used at all; the time spent
(possibly compiling and) executing these modules is pure waste.
In an effort to improve startup time, some large Python CLIs tools make imports
lazy by manually placing imports inline into functions to delay imports of
expensive subsystems. This manual approach is labor-intensive and fragile; one
misplaced import or refactor can easily undo painstaking optimization work.
Existing import-hook-based solutions such as `demandimport
<https://github.com/bwesterb/py-demandimport/>`_ or `importlib.util.LazyLoader
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_
are limited in that only certain styles of import can be made truly lazy
(imports such as ``from foo import a, b`` will still eagerly import the module
``foo``) and they impose additional runtime overhead on every module attribute
access.
This PEP proposes a more comprehensive solution for lazy imports that does not
impose detectable overhead in real-world use. The implementation in this PEP
has already `demonstrated
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
startup time improvements up to 70% and memory-use reductions up to
40% on real-world Python CLIs.
Lazy imports also eliminate most import cycles. With eager imports, "false
cycles" can easily occur which are fixed by simply moving an import to the
bottom of a module or inline into a function, or switching from ``from foo
import bar`` to ``import foo``. With lazy imports, these "cycles" just work.
The only cycles which will remain are those where two modules actually each use
a name from the other at module level; these "true" cycles are only fixable by
refactoring the classes or functions involved.
Rationale
=========
The aim of this feature is to make imports transparently lazy. "Lazy" means
that the import of a module (execution of the module body and addition of the
module object to ``sys.modules``) should not occur until the module (or a name
imported from it) is actually referenced during execution. "Transparent" means
that besides the delayed import (and necessarily observable effects of that,
such as delayed import side effects and changes to ``sys.modules``), there is
no other observable change in behavior: the imported object is present in the
module namespace as normal and is transparently loaded whenever first used: its
status as a "lazy imported object" is not directly observable from Python or
from C extension code.
The requirement that the imported object be present in the module namespace as
usual, even before the import has actually occurred, means that we need some
kind of "lazy object" placeholder to represent the not-yet-imported object.
The transparency requirement dictates that this placeholder must never be
visible to Python code; any reference to it must trigger the import and replace
it with the real imported object.
Given the possibility that Python (or C extension) code may pull objects
directly out of a module ``__dict__``, the only way to reliably prevent
accidental leakage of lazy objects is to have the dictionary itself be
responsible to ensure resolution of lazy objects on lookup.
To avoid a performance penalty on the vast majority of dictionaries which never
contain any lazy objects, we install a specialized lookup function
(``lookdict_unicode_lazy``) for module namespace dictionaries when they first
gain a lazy-object value. When this lookup function finds that the key
references a lazy object, it resolves the lazy object immediately before
returning it.
Some operations on dictionaries (e.g. iterating all values) don't go through
the lookup function; in these cases we have to add a check if the lookup
function is ``lookdict_unicode_lazy`` and if so, resolve all lazy values first.
This implementation comprehensively prevents leakage of lazy objects, ensuring
they are always resolved to the real imported object before anyone can get hold
of them for any use, while avoiding any significant performance impact on
dictionaries in general.
Specification
=============
Lazy imports are opt-in, and globally enabled via a new ``-L`` flag to the
Python interpreter, or a ``PYTHONLAZYIMPORTS`` environment variable.
When enabled, the loading and execution of all (and only) top level imports is
deferred until the imported name is used. This could happen immediately (e.g.
on the very next line after the import statement) or much later (e.g. while
using the name inside a function being called by some other code at some later
time.)
For these top level imports, there are two exceptions which will make them
eager (not lazy): imports inside ``try``/``except``/``finally`` or ``with``
blocks, and star imports (``from foo import *``.) Imports inside
exception-handling blocks (this includes ``with`` blocks, since those can also
"catch" and handle exceptions) remain eager so that any exceptions arising from
the import can be handled. Star imports must remain eager since performing the
import is the only way to know which names should be added to the namespace.
Imports inside class definitions or inside functions/methods are not "top
level" and are never lazy.
Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
also never lazy.
Debuggability
-------------
The implementation will ensure that exceptions resulting from a deferred import
have metadata attached pointing the user to the original import statement, to
ease debuggability of errors from lazy imports.
Additionally, debug logging from ``python -v`` will include logging when an
import statement has been encountered but execution of the import will be
deferred.
Python's ``-X importtime`` feature for profiling import costs adapts naturally
to lazy imports; the profiled time is the time spent actually importing.
Per-module opt out
------------------
Due to the backwards compatibility issues mentioned below, it may be necessary
to force some imports to be eager.
In first-party code, since imports inside a ``try`` or ``with`` block are never
lazy, this can be easily accomplished::
try: # force these imports to be eager
import foo
import bar
finally:
pass
This PEP proposes to add a new ``importlib.eager_imports()`` context manager,
so the above technique can be less verbose and doesn't require comments to
clarify its intent::
with eager_imports():
import foo
import bar
Since imports within context managers are always eager, the ``eager_imports()``
context manager can just be an alias to a null context manager. The context
manager does not force all imports to be recursively eager: ``foo`` and ``bar``
will be imported eagerly, but imports within those modules will still follow
the usual laziness rules.
The more difficult case can occur if an import in third-party code that can't
easily be modified must be forced to be eager. For this purpose, we propose to
add an API to ``importlib`` that can be called early in the process to specify
a list of module names within which all imports will be eager::
from importlib import set_eager_imports
set_eager_imports(["one.mod", "another"])
The effect of this is also shallow: all imports within ``one.mod`` will be
eager, but not imports in all modules imported by ``one.mod``.
Backwards Compatibility
=======================
This proposal preserves full backwards compatibility when the feature is
disabled, which is the default.
Even when enabled, most code will continue to work normally without any
observable change (other than improved startup time and memory usage.)
Namespace packages are not affected: they work just as they do currently,
except lazily.
In some existing code, lazy imports could produce currently unexpected results
and behaviors. The problems that we may see when enabling lazy imports in an
existing codebase are related to:
Import Side Effects
-------------------
Import side effects that would otherwise be produced by the execution of
imported modules during the execution of import statements will be deferred at
least until the imported objects are used.
These import side effects may include:
* code executing any side-effecting logic during import;
* relying on imported submodules being set as attributes in the parent module.
A relevant and typical affected case is the `click
<https://click.palletsprojects.com/>`_ library for building Python command-line
interfaces. If e.g. ``cli = click.group()`` is defined in ``main.py``, and
``sub.py`` imports ``cli`` from ``main`` and adds subcommands to it via
decorator (``@cli.command(...)``), but the actual ``cli()`` call is in
``main.py``, then lazy imports may prevent the subcommands from being
registered, since in this case Click is depending on side effects of the import
of ``sub.py``. In this case the fix is to ensure the import of ``sub.py`` is
eager, e.g. by using the ``importlib.eager_imports()`` context manager.
Dynamic Paths
-------------
There could be issues related to dynamic Python import paths; particularly,
adding (and then removing after the import) paths from ``sys.path``::
sys.path.insert(0, "/path/to/foo/module")
import foo
del sys.path[0]
foo.Bar()
In this case, with lazy imports enabled, the import of ``foo`` will not
actually occur while the addition to ``sys.path`` is present.
Deferred Exceptions
-------------------
All exceptions arising from import (including ``ModuleNotFoundError``) are
deferred from import time to first-use time, which could complicate debugging.
Accessing an object in the middle of any code could trigger a deferred import
and produce ``ImportError`` or any other exception resulting from the
resolution of the deferred object, while loading and executing the related
imported module. The implementation will provide debugging assistance in
lazy-import-triggered tracebacks to mitigate this issue.
Security Implications
=====================
Deferred execution of code could produce security concerns if process owner,
path, ``sys.path``, or other sensitive environment or contextual states change
between the time the ``import`` statement is executed and the time where the
imported object is used.
Performance Impact
==================
The reference implementation has shown that the feature has negligible
performance impact on existing real-world codebases (Instagram Server and other
several CLI programs at Meta), while providing substantial improvements to
startup time and memory usage.
The reference implementation shows small performance regressions in a few
pyperformance benchmarks, but improvements in others. (TODO update with
detailed data from 3.11 port of implementation.)
How to Teach This
=================
Since the feature is opt-in, beginners should not encounter it by default.
Documentation of the ``-L`` flag and ``PYTHONLAZYIMPORTS`` environment variable
can clarify the behavior of lazy imports.
Some best practices to deal with some of the issues that could arise and to
better take advantage of lazy imports are:
* Avoid relying on import side effects. Perhaps the most common reliance on
import side effects is the registry pattern, where population of some
external registry happens implicitly during the importing of modules, often
via decorators. Instead, the registry should be built via an explicit call
that perhaps does a discovery process to find decorated functions or classes.
* Always import needed submodules explicitly, don't rely on some other import
to ensure a module has its submodules as attributes. That is, do ``import
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only
works (unreliably) because the attribute ``foo.bar`` is added as a side
effect of ``foo.bar`` being imported somewhere else. With lazy imports this
may not always happen on time.
* Avoid using star imports, as those are always eager.
* When possible, do not import whole submodules. Import specific names instead;
i.e.: do ``from foo.bar import Baz``, not ``import foo.bar`` and then
``foo.bar.Baz``. If you import submodules (such as ``foo.qux`` and
``foo.fred``), with lazy imports enabled, when you access the parent module's
name (``foo`` in this case), that will trigger loading all of the sibling
submodules of the parent module (``foo.bar``, ``foo.qux`` and ``foo.fred``),
not only the one being accessed, because the parent module ``foo`` is the
actual deferred object name.
Reference Implementation
========================
The current reference implementation is available as part of
`Cinder <https://github.com/facebookincubator/cinder>`_.
Reference implementation is in use within Meta Platforms and has proven to
achieve improvements in startup time (and total runtime for some applications)
in the range of 40%-70%, as well as significant reduction in memory footprint
(up to 40%), thanks to not needing to execute imports that end up being unused
in the common flow.
Rejected Ideas
==============
Explicit syntax for lazy imports
--------------------------------
If the primary objective of lazy imports were solely to work around import
cycles and forward references, an explicitly-marked syntax for particular
targeted imports to be lazy would make a lot of sense. But in practice it would
be very hard to get robust startup time or memory use benefits from this
approach, since it would require converting most imports within your code base
(and in third-party dependencies) to use the lazy import syntax.
It would be possible to aim for a "shallow" laziness where only the top-level
imports of subsystems from the main module are made explicitly lazy, but then
imports within the subsystems are all eager. This is extremely fragile, though
-- it only takes one mis-placed import to undo the carefully constructed
shallow laziness. Globally enabling lazy imports, on the other hand, provides
in-depth robust laziness where you always pay only for the imports you use.
Half-lazy imports
-----------------
It would be possible to eagerly run the import loader to the point of finding
the module source, but then defer the actual execution of the module and
creation of the module object. The advantage of this would be that certain
classes of import errors (e.g. a simple typo in the module name) would be
caught eagerly instead of being deferred to the use of an imported name.
The disadvantage would be that the startup time benefits of lazy imports would
be significantly reduced, since unused imports would still require a filesystem
``stat()`` call, at least. It would also introduce a possibly non-obvious split
between *which* import errors are raised eagerly and which are delayed, when
lazy imports are enabled.
This idea is rejected for now on the basis that in practice, confusion about
import typos has not been an observed problem with the reference
implementation. Generally delayed imports are not delayed forever, and errors
show up soon enough to be caught and fixed (unless the import is truly unused.)
Lazy dynamic imports
--------------------
It would be possible to add a ``lazy=True`` or similar option to
``__import__()`` and/or ``importlib.import_module()``, to enable them to
perform lazy imports. That idea is rejected in this PEP for lack of a clear
use case. Dynamic imports are already far outside the :pep:`8` code style
recommendations for imports, and can easily be made precisely as lazy as
desired by placing them at the desired point in the code flow. These aren't
commonly used at module top level, which is where lazy imports applies.
Deep eager-imports override
---------------------------
The proposed ``importlib.eager_imports()`` context manager and
``importlib.set_eager_imports()`` override both have shallow effects: they only
force eagerness for the location where they are applied, not transitively. It
would be possible (although not simple) to provide a deep/transitive version of
one or both. That idea is rejected in this PEP because experience with the
reference implementation has not shown it to be necessary, and because it
prevents local reasoning about laziness of imports.
A deep override can lead to confusing behavior because the
transitively-imported modules may be imported from multiple locations, some of
which use the "deep eager override" and some of which don't. Thus those modules
may still be imported lazily initially, if they are first imported from a
location that doesn't have the override.
With deep overrides it is not possible to locally reason about whether a given
import will be lazy or eager. With the behavior specified in this PEP, such
local reasoning is possible.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.