PEP: 690
Title: Lazy Imports
Author: Germán Méndez Bravo <german.mb@gmail.com>, Carl Meyer <carl@oddbird.net>
Sponsor: Barry Warsaw <barry@python.org>
Discussions-To: https://discuss.python.org/t/pep-690-lazy-imports/15474
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Apr-2022
Python-Version: 3.12
Post-History: `03-May-2022 <https://discuss.python.org/t/pep-690-lazy-imports/15474>`__,
              `03-May-2022 <https://mail.python.org/archives/list/python-dev@python.org/thread/IHOSWMIBKCXVB46FI7NGOC2F34RUYZ5Z/>`__


Abstract
========

This PEP proposes a feature to transparently defer the execution of imported
modules until the moment when an imported object is first used. Since Python
programs commonly import many more modules than a single invocation of the
program is likely to use in practice, lazy imports can greatly reduce the
overall number of modules loaded, improving startup time and memory usage. Lazy
imports also mostly eliminate the risk of import cycles.


Motivation
==========

Common Python code style :pep:`prefers <8#imports>` imports at module
level, so they don't have to be repeated within each scope the imported object
is used in, and to avoid the inefficiency of repeated execution of the import
system at runtime. This means that importing the main module of a program
typically results in an immediate cascade of imports of most or all of the
modules that may ever be needed by the program.

Consider the example of a Python command line program (CLI) with a number of
subcommands. Each subcommand may perform different tasks, requiring the import
of different dependencies. But a given invocation of the program will only
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
info is requested). Top-level eager imports in such a program will result in the
import of many modules that will never be used at all; the time spent (possibly
compiling and) executing these modules is pure waste.

To improve startup time, some large Python CLIs make imports lazy by manually
placing imports inline into functions to delay imports of expensive subsystems.
This manual approach is labor-intensive and fragile; one misplaced import or
refactor can easily undo painstaking optimization work.

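The manual approach might look like the following minimal sketch. The program
name, the subcommand names and the handler functions are hypothetical; the
point is only that each handler imports its dependency inside the function
body, so an invocation of one subcommand never pays for the other's imports.

```python
import argparse


def cmd_dump(args):
    # Imported only when the "dump" subcommand actually runs.
    import json
    return json.dumps({"value": args.value})


def cmd_hash(args):
    # Imported only when the "hash" subcommand actually runs.
    import hashlib
    return hashlib.sha256(args.value.encode()).hexdigest()


def main(argv):
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, handler in (("dump", cmd_dump), ("hash", cmd_hash)):
        p = sub.add_parser(name)
        p.add_argument("value")
        p.set_defaults(handler=handler)
    args = parser.parse_args(argv)
    return args.handler(args)
```

Moving either ``import`` back to module level silently removes the saving,
which is the fragility described above.
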
The Python standard library already includes built-in support for lazy imports,
via `importlib.util.LazyLoader
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_.
There are also third-party packages such as `demandimport
<https://github.com/bwesterb/py-demandimport/>`_. These provide a "lazy module
object" which delays its own import until first attribute access. This is not
sufficient to make all imports lazy: imports such as ``from foo import a, b``
will still eagerly import the module ``foo`` since they immediately access an
attribute from it. It also imposes noticeable runtime overhead on every module
attribute access, since it requires a Python-level ``__getattr__`` or
``__getattribute__`` implementation.

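For comparison, the existing ``LazyLoader`` approach can be sketched as
follows. The ``lazy_import`` helper follows the recipe given in the
``importlib`` documentation; the throwaway module ``lazy_demo_mod`` and the
``LAZY_DEMO_LOADED`` environment variable are made up here purely so that the
deferred execution is observable.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module; body not run yet
    return module


# Build a throwaway module whose body has a visible side effect.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "lazy_demo_mod.py"), "w") as f:
    f.write("import os\nos.environ['LAZY_DEMO_LOADED'] = '1'\nanswer = 42\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

mod = lazy_import("lazy_demo_mod")
loaded_before = "LAZY_DEMO_LOADED" in os.environ  # module body has not run
value = mod.answer                                # first access runs the body
loaded_after = "LAZY_DEMO_LOADED" in os.environ
```

Note that the module object itself is the lazy placeholder here, which is why
``from lazy_demo_mod import answer`` could not be deferred with this technique.
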
Authors of scientific Python packages have also made extensive use of lazy
imports to allow users to write e.g. ``import scipy as sp`` and then easily
access many different submodules with e.g. ``sp.linalg``, without requiring all
the many submodules to be imported up-front. `SPEC 1
<https://scientific-python.org/specs/spec-0001/>`_ codifies this practice in the
form of a ``lazy_loader`` library that can be used explicitly in a package
``__init__.py`` to provide lazily accessible submodules.

Users of static typing also have to import names for use in type annotations
that may never be used at runtime (if :pep:`563` or possibly in future
:pep:`649` are used to avoid eager runtime evaluation of annotations). Lazy
imports are very attractive in this scenario to avoid overhead of unneeded
imports.

This PEP proposes a more general and comprehensive solution for lazy imports
that can encompass all of the above use cases and does not impose detectable
overhead in real-world use. The implementation in this PEP has already
`demonstrated
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
startup time improvements up to 70% and memory-use reductions up to 40% on
real-world Python CLIs.

Lazy imports also eliminate most import cycles. With eager imports, "false
cycles" can easily occur which are fixed by simply moving an import to the
bottom of a module or inline into a function, or switching from ``from foo
import bar`` to ``import foo``. With lazy imports, these "cycles" just work.
The only cycles which will remain are those where two modules actually each use
a name from the other at module level; these "true" cycles are only fixable by
refactoring the classes or functions involved.


Rationale
=========

The aim of this feature is to make imports transparently lazy. "Lazy" means
that the import of a module (execution of the module body and addition of the
module object to ``sys.modules``) should not occur until the module (or a name
imported from it) is actually referenced during execution. "Transparent" means
that besides the delayed import (and necessarily observable effects of that,
such as delayed import side effects and changes to ``sys.modules``), there is
no other observable change in behavior: the imported object is present in the
module namespace as normal and is transparently loaded whenever first used; its
status as a "lazy imported object" is not directly observable from Python or
from C extension code.

The requirement that the imported object be present in the module namespace as
usual, even before the import has actually occurred, means that we need some
kind of "lazy object" placeholder to represent the not-yet-imported object.
The transparency requirement dictates that this placeholder must never be
visible to Python code; any reference to it must trigger the import and replace
it with the real imported object.

Given the possibility that Python (or C extension) code may pull objects
directly out of a module ``__dict__``, the only way to reliably prevent
accidental leakage of lazy objects is to have the dictionary itself be
responsible for ensuring resolution of lazy objects on lookup.

To avoid a performance penalty on the vast majority of dictionaries which never
contain any lazy objects, we set a specialized lookup kind (`DictKeysKind
<https://github.com/python/cpython/blob/3.11/Include/internal/pycore_dict.h#L80>`_)
for module namespace dictionaries when they first gain a lazy-object value. When
a lazy lookup kind is set and lookup finds that the key references a lazy
object, it resolves the lazy object immediately before returning it.

This implementation comprehensively prevents leakage of lazy objects, ensuring
they are always resolved to the real imported object before anyone can get hold
of them for any use, while avoiding any significant performance impact on
dictionaries in general.


Specification
=============

Lazy imports are opt-in, and globally enabled either via a new ``-L`` flag to
the Python interpreter or via a call to a new ``importlib.set_lazy_imports()``
function, which makes all imports after the call lazy.

When enabled, the loading and execution of all (and only) top-level imports is
deferred until the imported name is first used. This could happen immediately
(e.g. on the very next line after the import statement) or much later (e.g.
while using the name inside a function being called by some other code at some
later time).

For these top-level imports, there are two contexts which will make them eager
(not lazy): imports inside ``try`` / ``except`` / ``finally`` or ``with``
blocks, and star imports (``from foo import *``). Imports inside
exception-handling blocks (this includes ``with`` blocks, since those can also
"catch" and handle exceptions) remain eager so that any exceptions arising from
the import can be handled. Star imports must remain eager since performing the
import is the only way to know which names should be added to the namespace.

Imports inside class definitions or inside functions/methods are not "top
level" and are never lazy.

Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
also never lazy.

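The rules above can be approximated with a small static classifier built on the
standard ``ast`` module. This is only an illustrative sketch, not part of the
proposal: the function name ``classify_imports`` is made up, and it assumes
that imports nested in module-level ``if``, ``for`` or ``while`` blocks still
count as top level, which the text above does not state explicitly.

```python
import ast

# Contexts in which an import statement is always eager under the rules above.
EAGER_CONTEXTS = (
    ast.Try, getattr(ast, "TryStar", ast.Try),
    ast.With, ast.AsyncWith,
    ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef,
)


def classify_imports(source):
    """Return (lineno, statement, "lazy" | "eager") for each import found."""
    results = []

    def visit(node, eager):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.Import):
                text = "import " + ", ".join(a.name for a in child.names)
                results.append((child.lineno, text, "eager" if eager else "lazy"))
            elif isinstance(child, ast.ImportFrom):
                module = "." * child.level + (child.module or "")
                names = ", ".join(a.name for a in child.names)
                star = any(a.name == "*" for a in child.names)
                kind = "eager" if (eager or star) else "lazy"
                results.append((child.lineno, f"from {module} import {names}", kind))
            else:
                # Once inside an eager context, everything nested stays eager.
                visit(child, eager or isinstance(child, EAGER_CONTEXTS))

    visit(ast.parse(source), False)
    return results


SAMPLE = '''\
import os
from json import dumps
from math import *
try:
    import ssl
except ImportError:
    ssl = None
with open(__file__) as f:
    import csv
def helper():
    import re
class C:
    import abc
'''

report = classify_imports(SAMPLE)
```

Only the first two statements of ``SAMPLE`` are reported as lazy. Dynamic
imports via ``__import__()`` or ``importlib.import_module()`` are ordinary
calls rather than import statements, so they never appear in the report.
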
Lazy imports state (i.e. whether they have been enabled, and any excluded
modules; see below) is per-interpreter, but global within the interpreter (i.e.
all threads will be affected).


Example
-------

Say we have a module ``spam.py``::

    # simulate some work
    import time
    time.sleep(10)
    print("spam loaded")

And a module ``eggs.py`` which imports it::

    import spam
    print("imports done")

If we run ``python -L eggs.py``, the ``spam`` module will never be imported
(because it is never referenced after the import), ``"spam loaded"`` will never
be printed, and there will be no 10 second delay.

But if ``eggs.py`` simply references the name ``spam`` after importing it, that
will be enough to trigger the import of ``spam.py``::

    import spam
    print("imports done")
    spam

Now if we run ``python -L eggs.py``, we will see the output ``"imports done"``
printed first, then a 10 second delay, and then ``"spam loaded"`` printed after
that.

Of course, in real use cases (especially with lazy imports), it's not
recommended to rely on import side effects like this to trigger real work. This
example is just to clarify the behavior of lazy imports.

Another way to explain the effect of lazy imports is that it is as if each lazy
import statement had instead been written inline in the source code immediately
before each use of the imported name. So one can think of lazy imports as
similar to transforming this code::

    import foo

    def func1():
        return foo.bar()

    def func2():
        return foo.baz()

To this::

    def func1():
        import foo
        return foo.bar()

    def func2():
        import foo
        return foo.baz()

This gives a good sense of when the import of ``foo`` will occur under lazy
imports, but lazy import is not really equivalent to this code transformation.
There are several notable differences:

* Unlike in the latter code, under lazy imports the name ``foo`` still does
  exist in the module's global namespace, and can be imported or referenced by
  other modules that import this one. (Such references would also trigger the
  import.)

* The runtime overhead of lazy imports is much lower than the latter code; after
  the first reference to the name ``foo`` which triggers the import, subsequent
  references will have zero import system overhead; they are indistinguishable
  from a normal name reference.

In a sense, lazy imports turn the import statement into just a declaration of an
imported name or names, to later be fully resolved when referenced.

An import in the style ``from foo import bar`` can also be made lazy. When the
import occurs, the name ``bar`` will be added to the module namespace as a lazy
import. The first reference to ``bar`` will import ``foo`` and resolve ``bar``
to ``foo.bar``.


Intended usage
--------------

Since lazy imports are a potentially-breaking semantic change, they should be
enabled only by the author or maintainer of a Python application, who is
prepared to thoroughly test the application under the new semantics, ensure it
behaves as expected, and opt out any specific imports as needed (see below).
Lazy imports should not be enabled speculatively by the end user of a Python
application with any expectation of success.

It is the responsibility of the application developer enabling lazy imports for
their application to opt out any library imports that turn out to need to be
eager for their application to work correctly; it is not the responsibility of
library authors to ensure that their library behaves exactly the same under lazy
imports.

The documentation of the feature, the ``-L`` flag, and the new ``importlib``
APIs will be clear about the intended usage and the risks of adoption without
testing.


Implementation
--------------

Lazy imports are represented internally by a "lazy import" object. When a lazy
import occurs (say ``import foo`` or ``from foo import bar``), the key ``"foo"``
or ``"bar"`` is immediately added to the module namespace dictionary, but with
its value set to an internal-only "lazy import" object that preserves all the
necessary metadata to execute the import later. The ``DictKeysKind`` for the
module namespace dictionary is updated from e.g. ``DICT_KEYS_UNICODE`` to
``DICT_KEYS_UNICODE_LAZY`` to signal that this particular dictionary may contain
lazy import objects.

(In case someone adds a non-unicode key to a module namespace dictionary also
containing lazy import objects, e.g. via ``globals()[42] = "foo"``, there is
also ``DICT_KEYS_GENERAL_LAZY``, but in most cases this is not needed.)

Anytime a key is looked up in a dictionary with ``DICT_KEYS_UNICODE_LAZY`` or
``DICT_KEYS_GENERAL_LAZY``, the value is checked to see if it is a lazy import
object. If so, the import is immediately executed, the lazy import object is
replaced in the dictionary by the actual imported value, and the imported value
is returned from the lookup.

Because this is all handled internally by the dictionary implementation, lazy
import objects can never escape from the module namespace to become visible to
Python code; they are always resolved at their first reference.

Since only (some) module namespace dictionaries will ever have
``DICT_KEYS_*_LAZY`` set, the (minimal) extra lookup overhead to check for lazy
import objects is only paid by those dictionaries that need it; other
dictionaries have no added overhead.

No stub or dummy objects are ever visible to Python code or placed in
``sys.modules``. Other than the delayed import, the implementation is
transparent.

If a module is imported lazily, no entry for it will appear in ``sys.modules``
at all until it is actually imported on first reference.

If two different modules (``moda`` and ``modb``) both contain a lazy ``import
foo``, each module's namespace dictionary will have an independent lazy import
object under the key ``"foo"``, delaying import of the same ``foo`` module. This
is not a problem. When there is first a reference to, say, ``moda.foo``, the
module ``foo`` will be imported and placed in ``sys.modules`` as usual, and the
lazy object under the key ``moda.__dict__["foo"]`` will be replaced by the
actual module ``foo``. At this point ``modb.__dict__["foo"]`` will remain a lazy
import object. When ``modb.foo`` is later referenced, it will also try to
``import foo``. This import will find the module already present in
``sys.modules``, as is normal for subsequent imports of the same module in
Python, and at this point will replace the lazy import object at
``modb.__dict__["foo"]`` with the actual module ``foo``.

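The lookup-time resolution described above can be modelled in pure Python. The
following is a toy sketch only: the proposed implementation lives inside the C
dictionary code and is selected by the ``DictKeysKind``, whereas here a ``dict``
subclass intercepts subscript lookups. The names ``LazyImport`` and
``LazyNamespace`` are hypothetical, and the standard-library module
``colorsys`` merely stands in for ``foo``.

```python
import importlib
import sys


class LazyImport:
    """Placeholder recording what to import later (illustrative only)."""

    def __init__(self, module_name, attr=None):
        self.module_name = module_name
        self.attr = attr  # set for the ``from module import attr`` form

    def resolve(self):
        module = importlib.import_module(self.module_name)
        return module if self.attr is None else getattr(module, self.attr)


class LazyNamespace(dict):
    """Toy stand-in for a module ``__dict__`` with a lazy lookup kind."""

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, LazyImport):
            value = value.resolve()             # run the deferred import
            dict.__setitem__(self, key, value)  # replace the placeholder
        return value

    def is_lazy(self, key):
        # Peek at the stored value without triggering resolution.
        return isinstance(dict.get(self, key), LazyImport)


# Two "modules" that each hold an independent lazy import of the same target.
moda = LazyNamespace(colorsys=LazyImport("colorsys"))
modb = LazyNamespace(
    colorsys=LazyImport("colorsys"),
    rgb_to_hsv=LazyImport("colorsys", "rgb_to_hsv"),
)

first = moda["colorsys"]                       # triggers the real import
still_lazy_in_modb = modb.is_lazy("colorsys")  # modb is untouched so far
second = modb["colorsys"]                      # found in sys.modules
func = modb["rgb_to_hsv"]                      # ``from`` form resolves to attribute
```

Resolving the name in ``moda`` leaves the placeholder in ``modb`` in place
until ``modb`` is itself consulted, at which point both namespaces refer to the
same module object.
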
There is one case in which a lazy import object can escape one dictionary (but
only into another dictionary) without being resolved. To preserve the
performance of bulk-copy operations like ``dict.update()`` and ``dict.copy()``,
they do not check for or resolve lazy import objects. However, if the source
dict has a ``*_LAZY`` lookup kind set that indicates it might contain lazy
objects, that lookup kind will be passed on to the updated/copied dictionary.
This still ensures that the lazy import object can't escape into Python code
without being resolved.

Other "bulk" dictionary lookup methods (such as ``dict.items()``,
``dict.values()``, etc) will resolve all lazy import objects in the dictionary.
Since it is uncommon for any of these to be used on a module namespace
dictionary, the priority here is simplicity of implementation and minimizing the
overhead on normal non-lazy dictionaries (just one check to see if the
dictionary has a ``*_LAZY`` lookup kind).

The eagerness of imports within ``try`` / ``except`` / ``with`` blocks or within
class or function bodies is handled in the compiler via a new
``EAGER_IMPORT_NAME`` opcode that always imports eagerly. Top-level imports use
``IMPORT_NAME``, which may be lazy or eager depending on ``-L`` and/or
``importlib.set_lazy_imports()``.


Exceptions
----------

Exceptions that occur during a lazy import bubble up and erase the
partially-constructed module(s) from ``sys.modules``, just as exceptions during
normal import do.

Since errors raised during a lazy import will occur later (wherever the imported
name is first referenced) than they would if the import were eager, it is
possible that they could be accidentally caught by exception handlers that
didn't expect the import to be running within their ``try`` block, leading to
confusion. To reduce the potential for this confusion, exceptions raised in the
course of executing a lazy import will be replaced by a ``LazyImportError``
exception (a subclass of ``ImportError``), with ``__cause__`` set to the
original exception.

The ``LazyImportError`` will have source location metadata attached pointing the
user to the original import statement, to ease debuggability of errors from lazy
imports. (It won't have a full traceback to the original import location; this
is too expensive to preserve for all lazy imports, and it's not clear that it
provides significant value over simply knowing the location of the import
statement.)

Only exceptions deriving from ``Exception`` are replaced in this way, not other
``BaseException`` subclasses. Those are "system-exiting" exceptions like
``KeyboardInterrupt`` or ``SystemExit``; these are normally not caught, and if
they are caught, it is less likely to be specific to a certain bit of code that
was expected to raise them, and more likely that the goal is to catch them
whatever their origin.

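The wrapping behavior described in this section could be sketched in Python as
follows. ``LazyImportError`` is defined locally as a stand-in because it does
not exist in released Python; the helper ``resolve_lazy_import`` and the
attribute names ``import_filename`` and ``import_lineno`` are likewise
assumptions made for illustration, not the proposed API.

```python
import importlib


class LazyImportError(ImportError):
    """Hypothetical stand-in for the exception proposed above."""

    def __init__(self, message, *, filename, lineno):
        super().__init__(message)
        # Location of the original import statement (not a full traceback).
        self.import_filename = filename
        self.import_lineno = lineno


def resolve_lazy_import(name, *, filename, lineno):
    """Import ``name`` now, wrapping ordinary errors in LazyImportError."""
    try:
        return importlib.import_module(name)
    except Exception as exc:
        # KeyboardInterrupt and SystemExit derive from BaseException only,
        # so they pass through this handler unchanged.
        raise LazyImportError(
            f"error resolving lazy import of {name!r} "
            f"(imported at {filename}:{lineno})",
            filename=filename,
            lineno=lineno,
        ) from exc


try:
    resolve_lazy_import("no_such_module_for_demo", filename="app.py", lineno=3)
except ImportError as err:
    caught = err
```

Because the wrapper subclasses ``ImportError``, existing ``except ImportError``
handlers still catch it, while the original error stays reachable through
``__cause__``.
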
Debugging
---------

Debug logging from ``python -v`` will include logging whenever an import
statement has been encountered but execution of the import will be deferred.

Python's ``-X importtime`` feature for profiling import costs adapts naturally
to lazy imports; the profiled time is the time spent actually importing.

Although lazy import objects are never visible to Python code, in some debugging
cases it may be useful to check from Python code whether the value at a given
key in a given dictionary is a lazy import object, without triggering its
resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::

    from importlib import is_lazy_import

    import foo

    is_lazy_import(globals(), "foo")

    foo

    is_lazy_import(globals(), "foo")

In this example, if lazy imports have been enabled the first call to
``is_lazy_import`` will return ``True`` and the second will return ``False``.


Per-module opt out
------------------

Due to the backwards compatibility issues mentioned below, it may be necessary
for an application using lazy imports to force some imports to be eager.

In first-party code, since imports inside a ``try`` or ``with`` block are never
lazy, this can be easily accomplished::

    try:  # force these imports to be eager
        import foo
        import bar
    finally:
        pass

This PEP proposes to add a new ``importlib.eager_imports()`` context manager,
so the above technique can be less verbose and doesn't require comments to
clarify its intent::

    from importlib import eager_imports

    with eager_imports():
        import foo
        import bar

Since imports within context managers are always eager, the ``eager_imports()``
context manager can just be an alias to a null context manager. The context
manager's effect is not transitive: ``foo`` and ``bar`` will be imported
eagerly, but imports within those modules will still follow the usual laziness
rules.

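Since ``importlib.eager_imports`` is not available in released Python, code
that wants to adopt the spelling ahead of time could define its own alias. The
sketch below assumes nothing beyond the statement above that a null context
manager is sufficient, and uses two standard-library modules as placeholders
for ``foo`` and ``bar``.

```python
from contextlib import nullcontext

# Local stand-in for the proposed importlib.eager_imports: it does nothing at
# runtime. Under the proposal, eagerness would come from the ``with`` block.
eager_imports = nullcontext

with eager_imports():
    import json
    import base64

data = json.loads(base64.b64decode(base64.b64encode(b"[1, 2]")))
```

On an interpreter without lazy imports this changes nothing, so the same source
behaves identically whether or not the feature is enabled.
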
The more difficult case can occur if an import in third-party code that can't
easily be modified must be forced to be eager. For this purpose,
``importlib.set_lazy_imports()`` takes an optional keyword-only ``excluding``
argument, which can be set to a container of module names within which all
imports will be eager::

    from importlib import set_lazy_imports

    set_lazy_imports(excluding=["one.mod", "another"])

The effect of this is also shallow: all imports within ``one.mod`` will be
eager, but not imports in all modules imported by ``one.mod``.

The ``excluding`` parameter of ``set_lazy_imports()`` can also be set to a
callback which receives a module name and returns whether imports within this
module should be eager::

    import re
    from importlib import set_lazy_imports

    def eager_imports(name):
        return re.match(r"foo\.[^.]+\.logger", name)

    set_lazy_imports(excluding=eager_imports)

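One way to picture how the two accepted forms of ``excluding`` might be
consulted is the sketch below. The helper ``make_exclusion_check`` is
hypothetical and is not part of the proposed API; it also assumes that names in
a container are matched exactly, which is one reading of the "shallow"
behavior described above.

```python
import re


def make_exclusion_check(excluding=None):
    """Return a predicate: should imports inside this module stay eager?"""
    if excluding is None:
        return lambda module_name: False
    if callable(excluding):
        # Callback form: any truthy result (e.g. a regex match) means eager.
        return lambda module_name: bool(excluding(module_name))
    names = frozenset(excluding)
    # Container form: assumed here to be an exact match on the module name.
    return lambda module_name: module_name in names


by_list = make_exclusion_check(["one.mod", "another"])
by_callback = make_exclusion_check(
    lambda name: re.match(r"foo\.[^.]+\.logger", name)
)

list_results = (by_list("one.mod"), by_list("one.mod.sub"), by_list("other"))
callback_results = (by_callback("foo.bar.logger"), by_callback("foo.logger"))
```

Normalizing both forms to a single predicate keeps the decision local to the
name of the importing module.
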
If Python was executed with the ``-L`` flag, then lazy imports will already be
globally enabled, and the only effect of calling ``set_lazy_imports()`` will be
to globally set the eager module names/callback. If ``set_lazy_imports()`` is
called with no ``excluding`` argument, the exclusion list/callback will be
cleared and all eligible imports (module-level imports not in
``try/except/with``, and not ``import *``) will be lazy from that point forward.

``set_lazy_imports()`` may be called more than once, with subsequent calls
having only the effect of globally replacing or clearing the ``excluding``
list/callback. Generally there should be no reason to do this: the intended use
is a single call to ``set_lazy_imports`` in the main module, early in the
process.

This opt-out system is designed to maintain the possibility of local reasoning
about the laziness of an import. You only need to see the code of one module,
and the ``excluding`` argument to ``set_lazy_imports``, if any, to know whether
a given import will be eager or lazy.


Testing
-------

The CPython test suite will pass with lazy imports enabled (possibly with some
tests skipped). One buildbot should run the test suite with lazy imports
enabled.


C API
-----

For authors of C extension modules, the proposed
``importlib.set_lazy_imports()`` function will also be exposed in the stable C
API as ``PyImport_SetLazyImports(PyObject *names_or_callback_or_null)``, and
``importlib.is_lazy_import`` will be available as ``PyDict_IsLazyImport(PyObject
*dict, PyObject *key)``.


Backwards Compatibility
=======================

This proposal preserves full backwards compatibility when the feature is
disabled, which is the default.

Even when enabled, most code will continue to work normally without any
observable change (other than improved startup time and memory usage).
Namespace packages are not affected: they work just as they do currently,
except lazily.

In some existing code, lazy imports could produce currently unexpected results
and behaviors. The problems that we may see when enabling lazy imports in an
existing codebase are related to:


Import Side Effects
-------------------

Import side effects that would otherwise be produced by the execution of
imported modules during the execution of import statements will be deferred at
least until the imported objects are used.

These import side effects may include:

* code executing any side-effecting logic during import;
* relying on imported submodules being set as attributes in the parent module.

A relevant and typical affected case is the `click
<https://click.palletsprojects.com/>`_ library for building Python command-line
interfaces. If e.g. ``cli = click.group()`` is defined in ``main.py``, and
``sub.py`` imports ``cli`` from ``main`` and adds subcommands to it via
decorator (``@cli.command(...)``), but the actual ``cli()`` call is in
``main.py``, then lazy imports may prevent the subcommands from being
registered, since in this case Click is depending on side effects of the import
of ``sub.py``. In this case the fix is to ensure the import of ``sub.py`` is
eager, e.g. by using the ``importlib.eager_imports()`` context manager.


Dynamic Paths
-------------

There could be issues related to dynamic Python import paths; particularly,
adding (and then removing after the import) paths from ``sys.path``::

    import sys

    sys.path.insert(0, "/path/to/foo/module")
    import foo
    del sys.path[0]
    foo.Bar()

In this case, with lazy imports enabled, the import of ``foo`` will not actually
occur while the addition to ``sys.path`` is present.

An easy fix for this (which arguably also improves the code style) would be to
place the ``sys.path`` modifications in a context manager. This resolves the
issue, since imports inside a ``with`` block are always eager.

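A context manager along those lines might look like the following sketch. The
name ``extra_sys_path`` is made up, and the throwaway module ``path_demo_foo``
is written to a temporary directory only so that the example is runnable
anywhere.

```python
import contextlib
import importlib
import os
import sys
import tempfile


@contextlib.contextmanager
def extra_sys_path(path):
    """Temporarily put ``path`` at the front of ``sys.path``."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove by value in case other code also modified sys.path.
        with contextlib.suppress(ValueError):
            sys.path.remove(path)


# Throwaway module standing in for ``foo``.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "path_demo_foo.py"), "w") as f:
    f.write("def bar():\n    return 'bar called'\n")
importlib.invalidate_caches()

with extra_sys_path(demo_dir):
    import path_demo_foo  # inside ``with``: eager even under lazy imports

result = path_demo_foo.bar()
```

The import completes while the extra path entry is still present, and the
entry is removed again even if the import raises.
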
Deferred Exceptions
-------------------

All exceptions arising from import (including ``ModuleNotFoundError``) are
deferred from import time to first-use time, which could complicate debugging.
Referencing a name in the middle of any code could trigger a deferred import and
produce ``LazyImportError`` while loading and executing the related imported
module.

Ensuring all lazy import errors are raised as ``LazyImportError`` mitigates this
issue by reducing the likelihood that they will be accidentally caught and
mistaken for a different expected exception. ``LazyImportError`` will also
provide the location of the original import statement to aid in debugging, as
described above.


Drawbacks
=========

Downsides of this PEP include:

* It provides subtly incompatible semantics for the behavior of Python
  imports. This is a potential burden on library authors who may be asked by their
  users to support both semantics, and is one more possibility for Python
  users/readers to be aware of.

* Some popular Python coding patterns (notably centralized registries populated
  by a decorator) rely on import side effects and may require explicit opt-out to
  work as expected with lazy imports.

Lazy import semantics are already possible and even supported today in the
Python standard library, so these drawbacks are not newly introduced by this
PEP. So far, existing usage of lazy imports by some applications has not proven
a problem. But this PEP is likely to make the usage of lazy imports more
popular, potentially exacerbating these drawbacks.

These drawbacks must be weighed against the significant benefits offered by this
PEP's implementation of lazy imports. Ultimately these costs will be higher if
the feature is widely used; but wide usage also indicates the feature provides a
lot of value, perhaps justifying the costs.


Security Implications
=====================

Deferred execution of code could produce security concerns if process owner,
shell path, ``sys.path``, or other sensitive environment or contextual states
change between the time the ``import`` statement is executed and the time the
imported object is first referenced.


Performance Impact
==================

The reference implementation has shown that the feature has negligible
performance impact on existing real-world codebases (Instagram Server, several
CLI programs at Meta, Jupyter notebooks used by Meta researchers), while
providing substantial improvements to startup time and memory usage.

The reference implementation shows small performance regressions in a few
pyperformance benchmarks, but improvements in others. (TODO update with
detailed data from main-branch port of implementation.)


How to Teach This
=================

Since the feature is opt-in, beginners should not encounter it by default.
Documentation of the ``-L`` flag and ``importlib.set_lazy_imports()`` can
clarify the behavior of lazy imports.

The documentation should also clarify that opting into lazy imports is opting
into non-standard semantics for Python imports, which could cause Python
libraries to break in unexpected ways. The responsibility to identify these
breakages and work around them with an opt-out (or stop using lazy imports)
rests entirely with the person choosing to enable lazy imports for their
application, not with the library author. Python libraries are under no
obligation to support lazy import semantics. Politely reporting an
incompatibility may be useful to the library author, but they may choose to
simply say their library does not support use with lazy imports, and this is a
valid choice.

Some best practices to deal with some of the issues that could arise and to
better take advantage of lazy imports are:

* Avoid relying on import side effects. Perhaps the most common reliance on
  import side effects is the registry pattern, where population of some external
  registry happens implicitly during the importing of modules, often via
  decorators. Instead, the registry should be built via an explicit call that does
  a discovery process to find decorated functions or classes in explicitly
  nominated modules.

* Always import needed submodules explicitly; don't rely on some other import
  to ensure a module has its submodules as attributes. That is, unless there is an
  explicit ``from . import bar`` in ``foo/__init__.py``, always do ``import
  foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only works
  (unreliably) because the attribute ``foo.bar`` is added as a side effect of
  ``foo.bar`` being imported somewhere else. With lazy imports this may not always
  happen on time.

* Avoid using star imports, as those are always eager.

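The first recommendation, building a registry through an explicit discovery
call rather than as an import side effect, could be sketched like this. The
decorator ``command``, the function ``collect_commands`` and the marker
attribute ``_is_command`` are all hypothetical names; a ``types.ModuleType``
instance stands in for a real ``sub.py`` so the example is self-contained.

```python
import types


def command(func):
    """Mark a function as a command without touching any global registry."""
    func._is_command = True
    return func


def collect_commands(*modules):
    """Build the registry explicitly from the modules that are passed in."""
    registry = {}
    for module in modules:
        for name, obj in vars(module).items():
            if callable(obj) and getattr(obj, "_is_command", False):
                registry[name] = obj
    return registry


# Stand-in for a real ``sub.py``; in an application this would be
# ``import sub`` followed by ``collect_commands(sub)``.
sub = types.ModuleType("sub")


@command
def greet():
    return "hello"


sub.greet = greet
sub.helper = lambda: None  # not decorated, so it is not collected

registry = collect_commands(sub)
```

Because ``collect_commands`` receives the module objects as arguments, the
modules are referenced at a well-defined point, so the registry contents do not
depend on when an import statement happened to run.
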
Reference Implementation
========================

The current reference implementation is available as part of `Cinder
<https://github.com/facebookincubator/cinder>`_. This reference implementation
is in use within Meta and has proven to achieve improvements in startup time
(and total runtime for some applications) in the range of 40%-70%, as well as
significant reduction in memory footprint (up to 40%), thanks to not needing to
execute imports that end up being unused in the common flow.

An updated reference implementation based on CPython main branch is in progress
and will be linked here soon. (TODO link.)


Rejected Ideas
==============

Per-module opt-in
-----------------

A per-module opt-in using e.g. ``from __future__ import lazy_imports`` has a
couple of disadvantages:

* It is less practical to achieve robust and significant startup-time or
  memory-use wins by piecemeal application of lazy imports. Generally it would
  require blanket application of the ``__future__`` import to most of the
  codebase, as well as to third-party dependencies (which may be hard or
  impossible).

* ``__future__`` imports are not feature flags; they are for transition to
  behaviors which will become default in the future. It is not clear if lazy
  imports will ever make sense as the default behavior, so we should not
  promise this with a ``__future__`` import. Thus, a per-module opt-in would
  require a new ``from __optional_features__ import lazy_imports`` or similar
  mechanism.

Experience with the reference implementation suggests that the most practical
adoption path for lazy imports is for a specific deployed application to opt in
globally, observe whether anything breaks, and opt out specific modules as
needed.

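The adoption path described above could look roughly like the following
sketch. It assumes that the ``importlib.set_lazy_imports(excluding=...)``
function proposed in this PEP accepts a list of module names; that function
is not available in released CPython, so the sketch guards for its absence.
The module names in ``EAGER_MODULES`` are hypothetical.

.. code-block:: python

    import importlib

    # Hypothetical modules that were observed to break under lazy imports.
    EAGER_MODULES = ["legacy_plugins", "app.signals"]

    # ``set_lazy_imports`` is the API proposed by this PEP; look it up
    # defensively so the sketch also runs where it does not exist.
    set_lazy_imports = getattr(importlib, "set_lazy_imports", None)

    if set_lazy_imports is not None:
        # Opt in for the whole application, opting out the known-bad modules.
        set_lazy_imports(excluding=EAGER_MODULES)
        lazy_enabled = True
    else:
        lazy_enabled = False

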
Explicit syntax for individual lazy imports
-------------------------------------------

If the primary objective of lazy imports were solely to work around import
cycles and forward references, an explicitly marked syntax for particular
targeted imports to be lazy would make a lot of sense. But in practice it would
be very hard to get robust startup time or memory use benefits from this
approach, since it would require converting most imports within your code base
(and in third-party dependencies) to use the lazy import syntax.

It would be possible to aim for a "shallow" laziness where only the top-level
imports of subsystems from the main module are made explicitly lazy, but then
imports within the subsystems are all eager. This is extremely fragile, though:
it only takes one misplaced import to undo the carefully constructed
shallow laziness. Globally enabling lazy imports, on the other hand, provides
in-depth robust laziness where you always pay only for the imports you use.

There may be use cases (e.g. for static typing) where individually marked lazy
imports are desirable to avoid forward references, but the performance and
memory benefits of globally lazy imports are not needed. Since this is a
different set of motivating use cases and requires new syntax, we prefer not to
include it in this PEP. Another PEP could build on top of this implementation
and propose the additional syntax.


Environment variable to enable lazy imports
-------------------------------------------

Providing an environment variable opt-in lends itself too easily to abuse of the
feature. It may seem tempting for a Python user to, for instance, globally set
the environment variable in their shell in the hopes of speeding up all the
Python programs they run. This usage with untested programs is likely to lead to
spurious bug reports and maintenance burden for the authors of those tools. To
avoid this, we choose not to provide an environment variable opt-in at all.


Removing the ``-L`` flag
------------------------

We do provide the ``-L`` CLI flag, which could in theory be abused in a similar
way by an end user running an individual Python program that is run with
``python somescript.py`` or ``python -m somescript`` (rather than distributed
via Python packaging tools). But the potential scope for misuse is much smaller
with ``-L`` than with an environment variable, and ``-L`` is valuable for some
applications to maximize startup time benefits by ensuring that all imports from
the start of a process will be lazy, so we choose to keep it.

It is already the case that running arbitrary Python programs with command line
flags they weren't intended to be used with (e.g. ``-s``, ``-S``, ``-E``, or
``-I``) can have unexpected and breaking results. ``-L`` is nothing new in this
regard.


Half-lazy imports
-----------------

It would be possible to eagerly run the import loader to the point of finding
the module source, but then defer the actual execution of the module and
creation of the module object. The advantage of this would be that certain
classes of import errors (e.g. a simple typo in the module name) would be
caught eagerly instead of being deferred to the use of an imported name.

The disadvantage would be that the startup time benefits of lazy imports would
be significantly reduced, since unused imports would still require a filesystem
``stat()`` call, at least. It would also introduce a possibly non-obvious split
between *which* import errors are raised eagerly and which are delayed, when
lazy imports are enabled.

This idea is rejected for now on the basis that in practice, confusion about
import typos has not been an observed problem with the reference
implementation. Generally delayed imports are not delayed forever, and errors
show up soon enough to be caught and fixed (unless the import is truly unused).

Another possible motivation for half-lazy imports would be to allow modules
themselves to control via some flag whether they are imported lazily or eagerly.
This is rejected both on the basis that it requires half-lazy imports, giving up
some of the performance benefits of import laziness, and because in general
modules do not decide how or when they are imported; the module importing them
decides that. There is no clear rationale for this PEP to invert that control;
instead it just provides more options for the importing code to make the
decision.

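The trade-off of the half-lazy approach can be approximated today with the
existing ``importlib.util.find_spec()`` and ``importlib.util.LazyLoader``.
This is only a sketch under stated assumptions: the helper name
``half_lazy_import`` is hypothetical, and unlike the rejected idea,
``LazyLoader`` still creates the module object eagerly. It does show the
split between an eager lookup, which touches the filesystem and catches
typos, and a deferred execution of the module body.

.. code-block:: python

    import importlib.util
    import sys


    def half_lazy_import(name):
        """Locate ``name`` eagerly, but defer executing its code."""
        if name in sys.modules:
            return sys.modules[name]
        # Eager step: searching for the module happens right away, so a
        # misspelled name fails here rather than at first use.
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ModuleNotFoundError(f"No module named {name!r}", name=name)
        # Deferred step: LazyLoader postpones running the module body until
        # an attribute of the module is first accessed.
        loader = importlib.util.LazyLoader(spec.loader)
        spec.loader = loader
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        loader.exec_module(module)
        return module


    colorsys = half_lazy_import("colorsys")  # located now, executed later
    hue = colorsys.rgb_to_hsv(1, 0, 0)[0]    # first use runs the module body

    typo_caught_eagerly = False
    try:
        half_lazy_import("colorsyss")  # hypothetical misspelling
    except ModuleNotFoundError:
        typo_caught_eagerly = True

Even for a module that is never used, the lookup in ``find_spec()`` is still
paid at import time, which is the cost this section objects to.

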
Lazy dynamic imports
--------------------

It would be possible to add a ``lazy=True`` or similar option to
``__import__()`` and/or ``importlib.import_module()``, to enable them to
perform lazy imports. That idea is rejected in this PEP for lack of a clear
use case. Dynamic imports are already far outside the :pep:`8` code style
recommendations for imports, and can easily be made precisely as lazy as
desired by placing them at the desired point in the code flow. These aren't
commonly used at module top level, which is where lazy imports apply.

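For illustration, a dynamic import placed inside a function is already
deferred until that function is called. The function name
``load_serializer`` is hypothetical; ``importlib.import_module()`` is the
existing standard library call.

.. code-block:: python

    import importlib


    def load_serializer(module_name):
        """Import a serializer module only when this function is called."""
        # The dynamic import runs at this exact point in the code flow, so
        # it is as lazy as the author wants without any extra option.
        return importlib.import_module(module_name)


    serializer = load_serializer("json")
    encoded = serializer.dumps({"lazy": True})

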
Deep eager-imports override
---------------------------

The proposed ``importlib.eager_imports()`` context manager and
``importlib.set_lazy_imports(excluding=...)`` override both have shallow
effects: they only force eagerness for the location they are applied to, not
transitively. It would be possible (although not simple) to provide a
deep/transitive version of one or both. That idea is rejected in this PEP
because the implementation would be complex (taking into account threads and
async code), because experience with the reference implementation has not shown
it to be necessary, and because it prevents local reasoning about laziness of
imports.

A deep override can lead to confusing behavior because the
transitively-imported modules may be imported from multiple locations, some of
which use the "deep eager override" and some of which don't. Thus those modules
may still be imported lazily initially, if they are first imported from a
location that doesn't have the override.

With deep overrides it is not possible to locally reason about whether a given
import will be lazy or eager. With the behavior specified in this PEP, such
local reasoning is possible.

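A rough sketch of the shallow behavior follows. It assumes the
``importlib.eager_imports()`` context manager proposed in this PEP, which is
not available in released CPython; ``contextlib.nullcontext`` is substituted
there purely so that the sketch runs.

.. code-block:: python

    import contextlib
    import importlib

    # ``eager_imports`` is the API proposed by this PEP; fall back to a
    # do-nothing context manager where it does not exist.
    eager_imports = getattr(importlib, "eager_imports", contextlib.nullcontext)

    with eager_imports():
        # Only the import statement written in this block is forced to be
        # eager. Imports performed inside ``json`` itself are not affected
        # by this block and follow whatever setting applies to them.
        import json

    decoded = json.loads("[1, 2]")

Reading this block alone is enough to know that ``json`` is imported eagerly
here, which is the local reasoning a deep override would give up.

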
Making lazy imports the default behavior
----------------------------------------

Making lazy imports the default/sole behavior of Python imports, instead of
opt-in, would have some long-term benefits, in that library authors would
(eventually) no longer need to consider the possibility of both semantics.

However, the backwards incompatibilities are such that this could only be
considered over a long time frame, with a ``__future__`` import. It is not at
all clear that lazy imports should become the default import semantics for
Python.

Providing only per-module opt-in with a ``__future__`` import makes it much more
difficult for the applications that can benefit from lazy imports to do so
immediately, as discussed above.

This PEP takes the position that the Python community needs more experience with
lazy imports before considering making them the default behavior, so that is
entirely left to a possible future PEP.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.