871 lines
39 KiB
ReStructuredText
871 lines
39 KiB
ReStructuredText
PEP: 690
|
|
Title: Lazy Imports
|
|
Author: Germán Méndez Bravo <german.mb@gmail.com>, Carl Meyer <carl@oddbird.net>
|
|
Sponsor: Barry Warsaw <barry@python.org>
|
|
Discussions-To: https://discuss.python.org/t/pep-690-lazy-imports/15474
|
|
Status: Rejected
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 29-Apr-2022
|
|
Python-Version: 3.12
|
|
Post-History: `03-May-2022 <https://discuss.python.org/t/pep-690-lazy-imports/15474>`__,
|
|
`03-May-2022 <https://mail.python.org/archives/list/python-dev@python.org/thread/IHOSWMIBKCXVB46FI7NGOC2F34RUYZ5Z/>`__
|
|
Resolution: https://discuss.python.org/t/pep-690-lazy-imports-again/19661/26
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a feature to transparently defer the finding and execution of
|
|
imported modules until the moment when an imported object is first used. Since
|
|
Python programs commonly import many more modules than a single invocation of
|
|
the program is likely to use in practice, lazy imports can greatly reduce the
|
|
overall number of modules loaded, improving startup time and memory usage. Lazy
|
|
imports also mostly eliminate the risk of import cycles.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
Common Python code style :pep:`prefers <8#imports>` imports at module
|
|
level, so they don't have to be repeated within each scope the imported object
|
|
is used in, and to avoid the inefficiency of repeated execution of the import
|
|
system at runtime. This means that importing the main module of a program
|
|
typically results in an immediate cascade of imports of most or all of the
|
|
modules that may ever be needed by the program.
|
|
|
|
Consider the example of a Python command line program (CLI) with a number of
|
|
subcommands. Each subcommand may perform different tasks, requiring the import
|
|
of different dependencies. But a given invocation of the program will only
|
|
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
|
|
info is requested). Top-level eager imports in such a program will result in the
|
|
import of many modules that will never be used at all; the time spent (possibly
|
|
compiling and) executing these modules is pure waste.
|
|
|
|
To improve startup time, some large Python CLIs make imports lazy by manually
|
|
placing imports inline into functions to delay imports of expensive subsystems.
|
|
This manual approach is labor-intensive and fragile; one misplaced import or
|
|
refactor can easily undo painstaking optimization work.
|
|
|
|
The Python standard library already includes built-in support for lazy imports,
|
|
via `importlib.util.LazyLoader
|
|
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_.
|
|
There are also third-party packages such as `demandimport
|
|
<https://github.com/bwesterb/py-demandimport/>`_. These provide a "lazy module
|
|
object" which delays its own import until first attribute access. This is not
|
|
sufficient to make all imports lazy: imports such as ``from foo import a, b``
|
|
will still eagerly import the module ``foo`` since they immediately access an
|
|
attribute from it. It also imposes noticeable runtime overhead on every module
|
|
attribute access, since it requires a Python-level ``__getattr__`` or
|
|
``__getattribute__`` implementation.
|
|
|
|
Authors of scientific Python packages have also made extensive use of lazy
|
|
imports to allow users to write e.g. ``import scipy as sp`` and then easily
|
|
access many different submodules with e.g. ``sp.linalg``, without requiring all
|
|
the many submodules to be imported up-front. `SPEC 1
|
|
<https://scientific-python.org/specs/spec-0001/>`_ codifies this practice in the
|
|
form of a ``lazy_loader`` library that can be used explicitly in a package
|
|
``__init__.py`` to provide lazily accessible submodules.
|
|
|
|
Users of static typing also have to import names for use in type annotations
|
|
that may never be used at runtime (if :pep:`563` or possibly in future
|
|
:pep:`649` are used to avoid eager runtime evaluation of annotations). Lazy
|
|
imports are very attractive in this scenario to avoid overhead of unneeded
|
|
imports.
|
|
|
|
This PEP proposes a more general and comprehensive solution for lazy imports
|
|
that can encompass all of the above use cases and does not impose detectable
|
|
overhead in real-world use. The implementation in this PEP has already
|
|
`demonstrated
|
|
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
|
|
startup time improvements up to 70% and memory-use reductions up to 40% on
|
|
real-world Python CLIs.
|
|
|
|
Lazy imports also eliminate most import cycles. With eager imports, "false
|
|
cycles" can easily occur which are fixed by simply moving an import to the
|
|
bottom of a module or inline into a function, or switching from ``from foo
|
|
import bar`` to ``import foo``. With lazy imports, these "cycles" just work.
|
|
The only cycles which will remain are those where two modules actually each use
|
|
a name from the other at module level; these "true" cycles are only fixable by
|
|
refactoring the classes or functions involved.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
The aim of this feature is to make imports transparently lazy. "Lazy" means
|
|
that the import of a module (execution of the module body and addition of the
|
|
module object to ``sys.modules``) should not occur until the module (or a name
|
|
imported from it) is actually referenced during execution. "Transparent" means
|
|
that besides the delayed import (and necessarily observable effects of that,
|
|
such as delayed import side effects and changes to ``sys.modules``), there is
|
|
no other observable change in behavior: the imported object is present in the
|
|
module namespace as normal and is transparently loaded whenever first used: its
|
|
status as a "lazy imported object" is not directly observable from Python or
|
|
from C extension code.
|
|
|
|
The requirement that the imported object be present in the module namespace as
|
|
usual, even before the import has actually occurred, means that we need some
|
|
kind of "lazy object" placeholder to represent the not-yet-imported object.
|
|
The transparency requirement dictates that this placeholder must never be
|
|
visible to Python code; any reference to it must trigger the import and replace
|
|
it with the real imported object.
|
|
|
|
Given the possibility that Python (or C extension) code may pull objects
|
|
directly out of a module ``__dict__``, the only way to reliably prevent
|
|
accidental leakage of lazy objects is to have the dictionary itself be
|
|
responsible to ensure resolution of lazy objects on lookup.
|
|
|
|
When a lookup finds that the key references a lazy object, it resolves the lazy
|
|
object immediately before returning it. To avoid side effects mutating
|
|
dictionaries midway through iteration, all lazy objects in a dictionary are
|
|
resolved prior to starting an iteration; this could incur a performance penalty
|
|
when using bulk iterations (``iter(dict)``, ``reversed(dict)``,
|
|
``dict.__reversed__()``, ``dict.keys()``, ``iter(dict.keys())`` and
|
|
``reversed(dict.keys())``). To avoid this performance penalty on the vast
|
|
majority of dictionaries, which never contain any lazy objects, we steal a bit
|
|
from the ``dk_kind`` field for a new ``dk_lazy_imports`` flag to keep track of
|
|
whether a dictionary may contain lazy objects or not.
|
|
|
|
This implementation comprehensively prevents leakage of lazy objects, ensuring
|
|
they are always resolved to the real imported object before anyone can get hold
|
|
of them for any use, while avoiding any significant performance impact on
|
|
dictionaries in general.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Lazy imports are opt-in, and they can be globally enabled either via a new
|
|
``-L`` flag to the Python interpreter, or via a call to a new
|
|
``importlib.set_lazy_imports()`` function. This function takes two arguments, a
|
|
boolean ``enabled`` and an ``excluding`` container. If ``enabled`` is true, lazy
|
|
imports will be turned on from that point forward. If it is false, they will be
|
|
turned off from that point forward. (Use of the ``excluding`` keyword is
|
|
discussed below under "Per-module opt-out.")
|
|
|
|
When the flag ``-L`` is passed to the Python interpreter, a new
|
|
``sys.flags.lazy_imports`` is set to ``True``, otherwise it exists as ``False``.
|
|
This flag is used to propagate ``-L`` to new Python subprocesses.
|
|
|
|
The flag in ``sys.flags.lazy_imports`` does not necessarily reflect the current
|
|
status of lazy imports, only whether the interpreter was started with the ``-L``
|
|
option. Actual current status of whether lazy imports are enabled or not at any
|
|
moment can be retrieved using ``importlib.is_lazy_imports_enabled()``, which
|
|
will return ``True`` if lazy imports are enabled at the call point or ``False``
|
|
otherwise.
|
|
|
|
When lazy imports are enabled, the loading and execution of all (and only)
|
|
top-level imports is deferred until the imported name is first used. This could
|
|
happen immediately (e.g. on the very next line after the import statement) or
|
|
much later (e.g. while using the name inside a function being called by some
|
|
other code at some later time.)
|
|
|
|
For these top level imports, there are two contexts which will make them eager
|
|
(not lazy): imports inside ``try`` / ``except`` / ``finally`` or ``with``
|
|
blocks, and star imports (``from foo import *``.) Imports inside
|
|
exception-handling blocks (this includes ``with`` blocks, since those can also
|
|
"catch" and handle exceptions) remain eager so that any exceptions arising from
|
|
the import can be handled. Star imports must remain eager since performing the
|
|
import is the only way to know which names should be added to the namespace.
|
|
|
|
Imports inside class definitions or inside functions/methods are not "top
|
|
level" and are never lazy.
|
|
|
|
Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
|
|
also never lazy.
|
|
|
|
Lazy imports state (i.e. whether they have been enabled, and any excluded
|
|
modules; see below) is per-interpreter, but global within the interpreter (i.e.
|
|
all threads will be affected).
|
|
|
|
|
|
Example
|
|
-------
|
|
|
|
Say we have a module ``spam.py``::
|
|
|
|
# simulate some work
|
|
import time
|
|
time.sleep(10)
|
|
print("spam loaded")
|
|
|
|
And a module ``eggs.py`` which imports it::
|
|
|
|
import spam
|
|
print("imports done")
|
|
|
|
If we run ``python -L eggs.py``, the ``spam`` module will never be imported
|
|
(because it is never referenced after the import), ``"spam loaded"`` will never
|
|
be printed, and there will be no 10 second delay.
|
|
|
|
But if ``eggs.py`` simply references the name ``spam`` after importing it, that
|
|
will be enough to trigger the import of ``spam.py``::
|
|
|
|
import spam
|
|
print("imports done")
|
|
spam
|
|
|
|
Now if we run ``python -L eggs.py``, we will see the output ``"imports done"``
|
|
printed first, then a 10 second delay, and then ``"spam loaded"`` printed after
|
|
that.
|
|
|
|
Of course, in real use cases (especially with lazy imports), it's not
|
|
recommended to rely on import side effects like this to trigger real work. This
|
|
example is just to clarify the behavior of lazy imports.
|
|
|
|
Another way to explain the effect of lazy imports is that it is as if each lazy
|
|
import statement had instead been written inline in the source code immediately
|
|
before each use of the imported name. So one can think of lazy imports as
|
|
similar to transforming this code::
|
|
|
|
import foo
|
|
|
|
def func1():
|
|
return foo.bar()
|
|
|
|
def func2():
|
|
return foo.baz()
|
|
|
|
To this::
|
|
|
|
def func1():
|
|
import foo
|
|
return foo.bar()
|
|
|
|
def func2():
|
|
import foo
|
|
return foo.baz()
|
|
|
|
This gives a good sense of when the import of ``foo`` will occur under lazy
|
|
imports, but lazy import is not really equivalent to this code transformation.
|
|
There are several notable differences:
|
|
|
|
* Unlike in the latter code, under lazy imports the name ``foo`` still does
|
|
exist in the module's global namespace, and can be imported or referenced by
|
|
other modules that import this one. (Such references would also trigger the
|
|
import.)
|
|
|
|
* The runtime overhead of lazy imports is much lower than the latter code; after
|
|
the first reference to the name ``foo`` which triggers the import, subsequent
|
|
references will have zero import system overhead; they are indistinguishable
|
|
from a normal name reference.
|
|
|
|
In a sense, lazy imports turn the import statement into just a declaration of an
|
|
imported name or names, to later be fully resolved when referenced.
|
|
|
|
An import in the style ``from foo import bar`` can also be made lazy. When the
|
|
import occurs, the name ``bar`` will be added to the module namespace as a lazy
|
|
import. The first reference to ``bar`` will import ``foo`` and resolve ``bar``
|
|
to ``foo.bar``.
|
|
|
|
|
|
Intended usage
|
|
--------------
|
|
|
|
Since lazy imports are a potentially-breaking semantic change, they should be
|
|
enabled only by the author or maintainer of a Python application, who is
|
|
prepared to thoroughly test the application under the new semantics, ensure it
|
|
behaves as expected, and opt-out any specific imports as needed (see below).
|
|
Lazy imports should not be enabled speculatively by the end user of a Python
|
|
application with any expectation of success.
|
|
|
|
It is the responsibility of the application developer enabling lazy imports for
|
|
their application to opt-out any library imports that turn out to need to be
|
|
eager for their application to work correctly; it is not the responsibility of
|
|
library authors to ensure that their library behaves exactly the same under lazy
|
|
imports.
|
|
|
|
The documentation of the feature, the ``-L`` flag, and the new ``importlib``
|
|
APIs will be clear about the intended usage and the risks of adoption without
|
|
testing.
|
|
|
|
|
|
Implementation
|
|
--------------
|
|
|
|
Lazy imports are represented internally by a "lazy import" object. When a lazy
|
|
import occurs (say ``import foo`` or ``from foo import bar``), the key ``"foo"``
|
|
or ``"bar"`` is immediately added to the module namespace dictionary, but with
|
|
its value set to an internal-only "lazy import" object that preserves all the
|
|
necessary metadata to execute the import later.
|
|
|
|
A new boolean flag in ``PyDictKeysObject`` (``dk_lazy_imports``) is set to
|
|
signal that this particular dictionary may contain lazy import objects. This
|
|
flag is only used to efficiently resolve all lazy objects in "bulk" operations,
|
|
when a dictionary may contain lazy objects.
|
|
|
|
Anytime a key is looked up in a dictionary to extract its value, the
|
|
value is checked to see if it is a lazy import object. If so, the lazy object
|
|
is immediately resolved, the relevant imported modules executed, the lazy
|
|
import object is replaced in the dictionary (if possible) by the actual
|
|
imported value, and the resolved value is returned from the lookup function. A
|
|
dictionary could mutate as part of an import side effect while resolving a lazy
|
|
import object. In this case it is not possible to efficiently replace the key
|
|
value with the resolved object. In this case, the lazy import object will gain
|
|
a cached pointer to the resolved object. On next access that cached reference
|
|
will be returned and the lazy import object will be replaced in the dict with
|
|
the resolved value.
|
|
|
|
Because this is all handled internally by the dictionary implementation, lazy
|
|
import objects can never escape from the module namespace to become visible to
|
|
Python code; they are always resolved at their first reference. No stub, dummy
|
|
or thunk objects are ever visible to Python code or placed in ``sys.modules``.
|
|
If a module is imported lazily, no entry for it will appear in ``sys.modules``
|
|
at all until it is actually imported on first reference.
|
|
|
|
If two different modules (``moda`` and ``modb``) both contain a lazy ``import
|
|
foo``, each module's namespace dictionary will have an independent lazy import
|
|
object under the key ``"foo"``, delaying import of the same ``foo`` module. This
|
|
is not a problem. When there is first a reference to, say, ``moda.foo``, the
|
|
module ``foo`` will be imported and placed in ``sys.modules`` as usual, and the
|
|
lazy object under the key ``moda.__dict__["foo"]`` will be replaced by the
|
|
actual module ``foo``. At this point ``modb.__dict__["foo"]`` will remain a lazy
|
|
import object. When ``modb.foo`` is later referenced, it will also try to
|
|
``import foo``. This import will find the module already present in
|
|
``sys.modules``, as is normal for subsequent imports of the same module in
|
|
Python, and at this point will replace the lazy import object at
|
|
``modb.__dict__["foo"]`` with the actual module ``foo``.
|
|
|
|
There are two cases in which a lazy import object can escape a dictionary:
|
|
|
|
* Into another dictionary: to preserve the performance of bulk-copy operations
|
|
like ``dict.update()`` and ``dict.copy()``, they do not check for or resolve
|
|
lazy import objects. However, if the source dict has the ``dk_lazy_imports``
|
|
flag set that indicates it may contain lazy objects, that flag will be
|
|
passed on to the updated/copied dictionary. This still ensures that the lazy
|
|
import object can't escape into Python code without being resolved.
|
|
|
|
* Through the garbage collector: lazy imported objects are still Python objects
|
|
and live within the garbage collector; as such, they can be collected and seen
|
|
via e.g. ``gc.get_objects()``. If a lazy object becomes
|
|
visible to Python code in this way, it is opaque and inert; it has no useful
|
|
methods or attributes. A ``repr()`` of it would be shown as something like:
|
|
``<lazy_object 'fully.qualified.name'>``.
|
|
|
|
When a lazy object is added to a dictionary, the flag ``dk_lazy_imports`` is set.
|
|
Once set, the flag is only cleared if *all* lazy import objects in the
|
|
dictionary are resolved, e.g. prior to dictionary iteration.
|
|
|
|
All dictionary iteration methods involving values (such as ``dict.items()``,
|
|
``dict.values()``, ``PyDict_Next()`` etc.) will attempt to resolve *all* lazy
|
|
import objects in the dictionary prior to starting the iteration. Since only
|
|
(some) module namespace dictionaries will ever have ``dk_lazy_imports`` set, the
|
|
extra overhead of resolving all lazy import objects inside a dictionary is only
|
|
paid by those dictionaries that need it. Minimizing the overhead on normal
|
|
non-lazy dictionaries is the sole purpose of the ``dk_lazy_imports`` flag.
|
|
|
|
``PyDict_Next`` will attempt to resolve all lazy import objects the first time
|
|
position ``0`` is accessed, and those imports could fail with exceptions. Since
|
|
``PyDict_Next`` cannot set an exception, ``PyDict_Next`` will return ``0``
|
|
immediately in this case, and any exception will be printed to stderr as an
|
|
unraisable exception.
|
|
|
|
For this reason, this PEP introduces ``PyDict_NextWithError``, which works in
|
|
the same way as ``PyDict_Next``, but which can set an error when returning ``0``
|
|
and this should be checked via ``PyErr_Occurred()`` after the call.
|
|
|
|
The eagerness of imports within ``try`` / ``except`` / ``with`` blocks or within
|
|
class or function bodies is handled in the compiler via a new
|
|
``EAGER_IMPORT_NAME`` opcode that always imports eagerly. Top-level imports use
|
|
``IMPORT_NAME``, which may be lazy or eager depending on ``-L`` and/or
|
|
``importlib.set_lazy_imports()``.
|
|
|
|
|
|
Debugging
|
|
---------
|
|
|
|
Debug logging from ``python -v`` will include logging whenever an import
|
|
statement has been encountered but execution of the import will be deferred.
|
|
|
|
Python's ``-X importtime`` feature for profiling import costs adapts naturally
|
|
to lazy imports; the profiled time is the time spent actually importing.
|
|
|
|
Although lazy import objects are not generally visible to Python code, in some
|
|
debugging cases it may be useful to check from Python code whether the value at
|
|
a given key in a given dictionary is a lazy import object, without triggering
|
|
its resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::
|
|
|
|
from importlib import is_lazy_import
|
|
|
|
import foo
|
|
|
|
is_lazy_import(globals(), "foo")
|
|
|
|
foo
|
|
|
|
is_lazy_import(globals(), "foo")
|
|
|
|
In this example, if lazy imports have been enabled the first call to
|
|
``is_lazy_import`` will return ``True`` and the second will return ``False``.
|
|
|
|
|
|
Per-module opt-out
|
|
------------------
|
|
|
|
Due to the backwards compatibility issues mentioned below, it may be necessary
|
|
for an application using lazy imports to force some imports to be eager.
|
|
|
|
In first-party code, since imports inside a ``try`` or ``with`` block are never
|
|
lazy, this can be easily accomplished::
|
|
|
|
try: # force these imports to be eager
|
|
import foo
|
|
import bar
|
|
finally:
|
|
pass
|
|
|
|
This PEP proposes to add a new ``importlib.eager_imports()`` context manager,
|
|
so the above technique can be less verbose and doesn't require comments to
|
|
clarify its intent::
|
|
|
|
from importlib import eager_imports
|
|
|
|
with eager_imports():
|
|
import foo
|
|
import bar
|
|
|
|
Since imports within context managers are always eager, the ``eager_imports()``
|
|
context manager can just be an alias to a null context manager. The context
|
|
manager's effect is not transitive: ``foo`` and ``bar`` will be imported
|
|
eagerly, but imports within those modules will still follow the usual laziness
|
|
rules.
|
|
|
|
The more difficult case can occur if an import in third-party code that can't
|
|
easily be modified must be forced to be eager. For this purpose,
|
|
``importlib.set_lazy_imports()`` takes a second optional keyword-only
|
|
``excluding`` argument, which can be set to a container of module names within
|
|
which all imports will be eager::
|
|
|
|
from importlib import set_lazy_imports
|
|
|
|
set_lazy_imports(excluding=["one.mod", "another"])
|
|
|
|
The effect of this is also shallow: all imports within ``one.mod`` will be
|
|
eager, but not imports in all modules imported by ``one.mod``.
|
|
|
|
The ``excluding`` parameter of ``set_lazy_imports()`` can be a container of any
|
|
kind that will be checked to see whether it contains a module name. If the
|
|
module name is contained in the object, imports within it will be eager. Thus,
|
|
arbitrary opt-out logic can be encoded in a ``__contains__`` method::
|
|
|
|
import re
|
|
from importlib import set_lazy_imports
|
|
|
|
class Checker:
|
|
def __contains__(self, name):
|
|
return re.match(r"foo\.[^.]+\.logger", name)
|
|
|
|
set_lazy_imports(excluding=Checker())
|
|
|
|
If Python was executed with the ``-L`` flag, then lazy imports will already be
|
|
globally enabled, and the only effect of calling ``set_lazy_imports(True,
|
|
excluding=...)`` will be to globally set the eager module names/callback. If
|
|
``set_lazy_imports(True)`` is called with no ``excluding`` argument, the
|
|
exclusion list/callback will be cleared and all eligible imports (module-level
|
|
imports not in ``try/except/with``, and not ``import *``) will be lazy from that
|
|
point forward.
|
|
|
|
This opt-out system is designed to maintain the possibility of local reasoning
|
|
about the laziness of an import. You only need to see the code of one module,
|
|
and the ``excluding`` argument to ``set_lazy_imports``, if any, to know whether
|
|
a given import will be eager or lazy.
|
|
|
|
|
|
Testing
|
|
-------
|
|
|
|
The CPython test suite will pass with lazy imports enabled (with some tests
|
|
skipped). One buildbot should run the test suite with lazy imports enabled.
|
|
|
|
|
|
C API
|
|
-----
|
|
|
|
For authors of C extension modules, the proposed public C API is as follows:
|
|
|
|
.. list-table::
|
|
:widths: 50 50
|
|
:header-rows: 1
|
|
|
|
* - C API
|
|
- Python API
|
|
* - ``PyObject *PyImport_SetLazyImports(PyObject *enabled, PyObject *excluding)``
|
|
- ``importlib.set_lazy_imports(enabled: bool = True, *, excluding: typing.Container[str] | None = None)``
|
|
* - ``int PyDict_IsLazyImport(PyObject *dict, PyObject *name)``
|
|
- ``importlib.is_lazy_import(dict: typing.Dict[str, object], name: str) -> bool``
|
|
* - ``int PyImport_IsLazyImportsEnabled()``
|
|
- ``importlib.is_lazy_imports_enabled() -> bool``
|
|
* - ``void PyDict_ResolveLazyImports(PyObject *dict)``
|
|
-
|
|
* - ``PyDict_NextWithError()``
|
|
-
|
|
|
|
* ``void PyDict_ResolveLazyImports(PyObject *dict)`` resolves all lazy objects
|
|
in a dictionary, if any. To be used prior calling ``PyDict_NextWithError()``
|
|
or ``PyDict_Next()``.
|
|
|
|
* ``PyDict_NextWithError()``, works the same way as ``PyDict_Next()``, with
|
|
the exception it propagates any errors to the caller by returning ``0`` and
|
|
setting an exception. The caller should use ``PyErr_Occurred()`` to check for any
|
|
errors.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
This proposal preserves full backwards compatibility when the feature is
|
|
disabled, which is the default.
|
|
|
|
Even when enabled, most code will continue to work normally without any
|
|
observable change (other than improved startup time and memory usage.)
|
|
Namespace packages are not affected: they work just as they do currently,
|
|
except lazily.
|
|
|
|
In some existing code, lazy imports could produce currently unexpected results
|
|
and behaviors. The problems that we may see when enabling lazy imports in an
|
|
existing codebase are related to:
|
|
|
|
|
|
Import Side Effects
|
|
-------------------
|
|
|
|
Import side effects that would otherwise be produced by the execution of
|
|
imported modules during the execution of import statements will be deferred
|
|
until the imported objects are used.
|
|
|
|
These import side effects may include:
|
|
|
|
* code executing any side-effecting logic during import;
|
|
* relying on imported submodules being set as attributes in the parent module.
|
|
|
|
A relevant and typical affected case is the `click
|
|
<https://click.palletsprojects.com/>`_ library for building Python command-line
|
|
interfaces. If e.g. ``cli = click.group()`` is defined in ``main.py``, and
|
|
``sub.py`` imports ``cli`` from ``main`` and adds subcommands to it via
|
|
decorator (``@cli.command(...)``), but the actual ``cli()`` call is in
|
|
``main.py``, then lazy imports may prevent the subcommands from being
|
|
registered, since in this case Click is depending on side effects of the import
|
|
of ``sub.py``. In this case the fix is to ensure the import of ``sub.py`` is
|
|
eager, e.g. by using the ``importlib.eager_imports()`` context manager.
|
|
|
|
|
|
Dynamic Paths
|
|
-------------
|
|
|
|
There could be issues related to dynamic Python import paths; particularly,
|
|
adding (and then removing after the import) paths from ``sys.path``::
|
|
|
|
sys.path.insert(0, "/path/to/foo/module")
|
|
import foo
|
|
del sys.path[0]
|
|
foo.Bar()
|
|
|
|
In this case, with lazy imports enabled, the import of ``foo`` will not actually
|
|
occur while the addition to ``sys.path`` is present.
|
|
|
|
An easy fix for this (which also improves the code style and ensures cleanup)
|
|
would be to place the ``sys.path`` modifications in a context manager. This
|
|
resolves the issue, since imports inside a ``with`` block are always eager.
|
|
|
|
|
|
Deferred Exceptions
|
|
-------------------
|
|
|
|
Exceptions that occur during a lazy import bubble up and erase the
|
|
partially-constructed module(s) from ``sys.modules``, just as exceptions during
|
|
normal import do.
|
|
|
|
Since errors raised during a lazy import will occur later than they would if
|
|
the import were eager (i.e. wherever the name is first referenced), it is also
|
|
possible that they could be accidentally caught by exception handlers that did
|
|
not expect the import to be running within their ``try`` block, leading to
|
|
confusion.
|
|
|
|
|
|
Drawbacks
|
|
=========
|
|
|
|
Downsides of this PEP include:
|
|
|
|
* It provides a subtly incompatible semantics for the behavior of Python
|
|
imports. This is a potential burden on library authors who may be asked by their
|
|
users to support both semantics, and is one more possibility for Python
|
|
users/readers to be aware of.
|
|
|
|
* Some popular Python coding patterns (notably centralized registries populated
|
|
by a decorator) rely on import side effects and may require explicit opt-out to
|
|
work as expected with lazy imports.
|
|
|
|
* Exceptions can be raised at any point while accessing names representing lazy
|
|
imports, this could lead to confusion and debugging of unexpected exceptions.
|
|
|
|
Lazy import semantics are already possible and even supported today in the
|
|
Python standard library, so these drawbacks are not newly introduced by this
|
|
PEP. So far, existing usage of lazy imports by some applications has not proven
|
|
a problem. But this PEP could make the usage of lazy imports more popular,
|
|
potentially exacerbating these drawbacks.
|
|
|
|
These drawbacks must be weighed against the significant benefits offered by this
|
|
PEP's implementation of lazy imports. Ultimately these costs will be higher if
|
|
the feature is widely used; but wide usage also indicates the feature provides a
|
|
lot of value, perhaps justifying the costs.
|
|
|
|
|
|
Security Implications
|
|
=====================
|
|
|
|
Deferred execution of code could produce security concerns if process owner,
|
|
shell path, ``sys.path``, or other sensitive environment or contextual states
|
|
change between the time the ``import`` statement is executed and the time the
|
|
imported object is first referenced.
|
|
|
|
|
|
Performance Impact
|
|
==================
|
|
|
|
The reference implementation has shown that the feature has negligible
|
|
performance impact on existing real-world codebases (Instagram Server, several
|
|
CLI programs at Meta, Jupyter notebooks used by Meta researchers), while
|
|
providing substantial improvements to startup time and memory usage.
|
|
|
|
The reference implementation shows `no measurable change
|
|
<https://gist.github.com/ericsnowcurrently/d027ff4130dedec3b58ab1f55be11e8c>`_
|
|
in aggregate performance on the `pyperformance benchmark suite
|
|
<https://github.com/python/pyperformance>`_.
|
|
|
|
|
|
How to Teach This
|
|
=================
|
|
|
|
Since the feature is opt-in, beginners should not encounter it by default.
|
|
Documentation of the ``-L`` flag and ``importlib.set_lazy_imports()`` can
|
|
clarify the behavior of lazy imports.
|
|
|
|
The documentation should also clarify that opting into lazy imports is opting
|
|
into a non-standard semantics for Python imports, which could cause Python
|
|
libraries to break in unexpected ways. The responsibility to identify these
|
|
breakages and work around them with an opt-out (or stop using lazy imports)
|
|
rests entirely with the person choosing to enable lazy imports for their
|
|
application, not with the library author. Python libraries are under no
|
|
obligation to support lazy import semantics. Politely reporting an
|
|
incompatibility may be useful to the library author, but they may choose to
|
|
simply say their library does not support use with lazy imports, and this is a
|
|
valid choice.
|
|
|
|
Some best practices to deal with some of the issues that could arise and to
|
|
better take advantage of lazy imports are:
|
|
|
|
* Avoid relying on import side effects. Perhaps the most common reliance on
|
|
import side effects is the registry pattern, where population of some external
|
|
registry happens implicitly during the importing of modules, often via
|
|
decorators. Instead, the registry should be built via an explicit call that does
|
|
a discovery process to find decorated functions or classes in explicitly
|
|
nominated modules.
|
|
|
|
* Always import needed submodules explicitly, don't rely on some other import
|
|
to ensure a module has its submodules as attributes. That is, unless there is an
|
|
explicit ``from . import bar`` in ``foo/__init__.py``, always do ``import
|
|
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only works
|
|
(unreliably) because the attribute ``foo.bar`` is added as a side effect of
|
|
``foo.bar`` being imported somewhere else. With lazy imports this may not always
|
|
happen in time.
|
|
|
|
* Avoid using star imports, as those are always eager.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
The initial implementation is available as part of `Cinder
|
|
<https://github.com/facebookincubator/cinder>`_. This reference implementation
|
|
is in use within Meta and has proven to achieve improvements in startup time
|
|
(and total runtime for some applications) in the range of 40%-70%, as well as
|
|
significant reduction in memory footprint (up to 40%), thanks to not needing to
|
|
execute imports that end up being unused in the common flow.
|
|
|
|
An `updated reference implementation based on CPython main branch
|
|
<https://github.com/Kronuz/cpython/pull/17>`_ is also available.
|
|
|
|
|
|
Rejected Ideas
|
|
==============
|
|
|
|
Wrapping deferred exceptions
|
|
----------------------------
|
|
|
|
To reduce the potential for confusion, exceptions raised in the
|
|
course of executing a lazy import could be replaced by a ``LazyImportError``
|
|
exception (a subclass of ``ImportError``), with a ``__cause__`` set to the
|
|
original exception.
|
|
|
|
Ensuring that all lazy import errors are raised as ``LazyImportError`` would
|
|
reduce the likelihood that they would be accidentally caught and mistaken for a
|
|
different expected exception. However, in practice we have seen cases, e.g.
|
|
inside tests, where failing modules raise ``unittest.SkipTest`` exception and
|
|
this too would end up being wrapped in ``LazyImportError``, making such tests
|
|
fail because the true exception type is hidden. The drawbacks here seem to
|
|
outweigh the hypothetical case where unexpected deferred exceptions are caught
|
|
by mistake.
|
|
|
|
|
|
Per-module opt-in
|
|
-----------------
|
|
|
|
A per-module opt-in using future imports (i.e.
|
|
``from __future__ import lazy_imports``) does not make sense because
|
|
``__future__`` imports are not feature flags, they are for transition to
|
|
behaviors which will become default in the future. It is not clear if lazy
|
|
imports will ever make sense as the default behavior, so we should not
|
|
promise this with a ``__future__`` import.
|
|
|
|
There are other cases where a library might desire to locally opt-in to lazy
|
|
imports for a particular module; e.g. a lazy top-level ``__init__.py`` for a
|
|
large library, to make its subcomponents accessible as lazy attributes. For now,
|
|
to keep the feature simpler, this PEP chooses to focus on the "application" use
|
|
case and does not address the library use case. The underlying laziness
|
|
mechanism introduced in this PEP could be used in the future to address this use
|
|
case as well.
|
|
|
|
|
|
Explicit syntax for individual lazy imports
|
|
-------------------------------------------
|
|
|
|
If the primary objective of lazy imports were solely to work around import
|
|
cycles and forward references, an explicitly-marked syntax for particular
|
|
targeted imports to be lazy would make a lot of sense. But in practice it would
|
|
be very hard to get robust startup time or memory use benefits from this
|
|
approach, since it would require converting most imports within your code base
|
|
(and in third-party dependencies) to use the lazy import syntax.
|
|
|
|
It would be possible to aim for a "shallow" laziness where only the top-level
|
|
imports of subsystems from the main module are made explicitly lazy, but then
|
|
imports within the subsystems are all eager. This is extremely fragile, though
|
|
-- it only takes one mis-placed import to undo the carefully constructed
|
|
shallow laziness. Globally enabling lazy imports, on the other hand, provides
|
|
in-depth robust laziness where you always pay only for the imports you use.
|
|
|
|
There may be use cases (e.g. for static typing) where individually-marked lazy
|
|
imports are desirable to avoid forward references, but the perf/memory benefits
|
|
of globally lazy imports are not needed. Since this is a different set of
|
|
motivating use cases and requires new syntax, we prefer not to include it in
|
|
this PEP. Another PEP could build on top of this implementation and propose the
|
|
additional syntax.
|
|
|
|
|
|
Environment variable to enable lazy imports
|
|
-------------------------------------------
|
|
|
|
Providing an environment variable opt-in lends itself too easily to abuse of the
|
|
feature. It may seem tempting for a Python user to, for instance, globally set
|
|
the environment variable in their shell in the hopes of speeding up all the
|
|
Python programs they run. This usage with untested programs is likely to lead to
|
|
spurious bug reports and maintenance burden for the authors of those tools. To
|
|
avoid this, we choose not to provide an environment variable opt-in at all.
|
|
|
|
|
|
Removing the ``-L`` flag
|
|
------------------------
|
|
|
|
We do provide the ``-L`` CLI flag, which could in theory be abused in a similar
|
|
way by an end user running an individual Python program that is run with
|
|
``python somescript.py`` or ``python -m somescript`` (rather than distributed
|
|
via Python packaging tools). But the potential scope for misuse is much less
|
|
with ``-L`` than an environment variable, and ``-L`` is valuable for some
|
|
applications to maximize startup time benefits by ensuring that all imports from
|
|
the start of a process will be lazy, so we choose to keep it.
|
|
|
|
It is already the case that running arbitrary Python programs with command line
|
|
flags they weren't intended to be used with (e.g. ``-s``, ``-S``, ``-E``, or
|
|
``-I``) can have unexpected and breaking results. ``-L`` is nothing new in this
|
|
regard.
|
|
|
|
|
|
Half-lazy imports
|
|
-----------------
|
|
|
|
It would be possible to eagerly run the import loader to the point of finding
|
|
the module source, but then defer the actual execution of the module and
|
|
creation of the module object. The advantage of this would be that certain
|
|
classes of import errors (e.g. a simple typo in the module name) would be
|
|
caught eagerly instead of being deferred to the use of an imported name.
|
|
|
|
The disadvantage would be that the startup time benefits of lazy imports would
|
|
be significantly reduced, since unused imports would still require a filesystem
|
|
``stat()`` call, at least. It would also introduce a possibly non-obvious split
|
|
between *which* import errors are raised eagerly and which are delayed, when
|
|
lazy imports are enabled.
|
|
|
|
This idea is rejected for now on the basis that in practice, confusion about
|
|
import typos has not been an observed problem with the reference
|
|
implementation. Generally delayed imports are not delayed forever, and errors
|
|
show up soon enough to be caught and fixed (unless the import is truly unused.)
|
|
|
|
Another possible motivation for half-lazy imports would be to allow modules
|
|
themselves to control via some flag whether they are imported lazily or eagerly.
|
|
This is rejected both on the basis that it requires half-lazy imports, giving up
|
|
some of the performance benefits of import laziness, and because in general
|
|
modules do not decide how or when they are imported, the module importing them
|
|
decides that. There isn't clear rationale for this PEP to invert that control;
|
|
instead it just provides more options for the importing code to make the
|
|
decision.
|
|
|
|
|
|
Lazy dynamic imports
|
|
--------------------
|
|
|
|
It would be possible to add a ``lazy=True`` or similar option to
|
|
``__import__()`` and/or ``importlib.import_module()``, to enable them to
|
|
perform lazy imports. That idea is rejected in this PEP for lack of a clear
|
|
use case. Dynamic imports are already far outside the :pep:`8` code style
|
|
recommendations for imports, and can easily be made precisely as lazy as
|
|
desired by placing them at the desired point in the code flow. These aren't
|
|
commonly used at module top level, which is where lazy imports applies.
|
|
|
|
|
|
Deep eager-imports override
|
|
---------------------------
|
|
|
|
The proposed ``importlib.eager_imports()`` context manager and excluded modules
|
|
in the ``importlib.set_lazy_imports(excluding=...)`` override all have shallow
|
|
effects: they only force eagerness for the location they are applied to, not
|
|
transitively. It would be possible to provide a deep/transitive version of one
|
|
or both. That idea is rejected in this PEP because the implementation would be
|
|
complex (taking into account threads and async code), experience with the
|
|
reference implementation has not shown it to be necessary, and because it
|
|
prevents local reasoning about laziness of imports.
|
|
|
|
A deep override can lead to confusing behavior because the
|
|
transitively-imported modules may be imported from multiple locations, some of
|
|
which use the "deep eager override" and some of which don't. Thus those modules
|
|
may still be imported lazily initially, if they are first imported from a
|
|
location that doesn't have the override.
|
|
|
|
With deep overrides it is not possible to locally reason about whether a given
|
|
import will be lazy or eager. With the behavior specified in this PEP, such
|
|
local reasoning is possible.
|
|
|
|
|
|
Making lazy imports the default behavior
|
|
----------------------------------------
|
|
|
|
Making lazy imports the default/sole behavior of Python imports, instead of
|
|
opt-in, would have some long-term benefits, in that library authors would
|
|
(eventually) no longer need to consider the possibility of both semantics.
|
|
|
|
However, the backwards-incompatibilies are such that this could only be
|
|
considered over a long time frame, with a ``__future__`` import. It is not at
|
|
all clear that lazy imports should become the default import semantics for
|
|
Python.
|
|
|
|
This PEP takes the position that the Python community needs more experience with
|
|
lazy imports before considering making it the default behavior, so that is
|
|
entirely left to a possible future PEP.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|