PEP 690: Update draft per discussion and feedback (#2613)
parent
6a8349343c
commit
6a5ee970ff
571
pep-0690.rst
|
@@ -16,7 +16,7 @@ Abstract
|
|||
========
|
||||
|
||||
This PEP proposes a feature to transparently defer the execution of imported
|
||||
modules until the moment when an imported object is used. Since Python
|
||||
modules until the moment when an imported object is first used. Since Python
|
||||
programs commonly import many more modules than a single invocation of the
|
||||
program is likely to use in practice, lazy imports can greatly reduce the
|
||||
overall number of modules loaded, improving startup time and memory usage. Lazy
|
||||
|
@@ -33,33 +33,52 @@ system at runtime. This means that importing the main module of a program
|
|||
typically results in an immediate cascade of imports of most or all of the
|
||||
modules that may ever be needed by the program.
|
||||
|
||||
Consider the example of a Python command line program with a number of
|
||||
Consider the example of a Python command line program (CLI) with a number of
|
||||
subcommands. Each subcommand may perform different tasks, requiring the import
|
||||
of different dependencies. But a given invocation of the program will only
|
||||
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
|
||||
info is requested). Top-level eager imports in such a program will result in
|
||||
the import of many modules that will never be used at all; the time spent
|
||||
(possibly compiling and) executing these modules is pure waste.
|
||||
info is requested). Top-level eager imports in such a program will result in the
|
||||
import of many modules that will never be used at all; the time spent (possibly
|
||||
compiling and) executing these modules is pure waste.
|
||||
|
||||
In an effort to improve startup time, some large Python CLIs tools make imports
|
||||
lazy by manually placing imports inline into functions to delay imports of
|
||||
expensive subsystems. This manual approach is labor-intensive and fragile; one
|
||||
misplaced import or refactor can easily undo painstaking optimization work.
|
||||
To improve startup time, some large Python CLIs make imports lazy by manually
|
||||
placing imports inline into functions to delay imports of expensive subsystems.
|
||||
This manual approach is labor-intensive and fragile; one misplaced import or
|
||||
refactor can easily undo painstaking optimization work.
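As an illustration of the manual approach described above, here is a minimal sketch of a CLI whose subcommand defers an import by placing it inline. The program name ``demo`` and the subcommands ``dump`` and ``hello`` are hypothetical and chosen only for this example.

```python
import argparse


def cmd_dump(args):
    # Inline import: ``json`` is only imported if this subcommand runs.
    import json

    return json.dumps({"value": args.value})


def cmd_hello(args):
    # This subcommand never pays for the ``json`` import.
    return "hello"


def main(argv):
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command", required=True)

    dump = subparsers.add_parser("dump")
    dump.add_argument("value")
    dump.set_defaults(func=cmd_dump)

    hello = subparsers.add_parser("hello")
    hello.set_defaults(func=cmd_hello)

    args = parser.parse_args(argv)
    return args.func(args)
```

The fragility is visible here: moving ``import json`` back to module level, for example during a refactor, silently makes every invocation pay for it again.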
|
||||
|
||||
Existing import-hook-based solutions such as `demandimport
|
||||
<https://github.com/bwesterb/py-demandimport/>`_ or `importlib.util.LazyLoader
|
||||
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_
|
||||
are limited in that only certain styles of import can be made truly lazy
|
||||
(imports such as ``from foo import a, b`` will still eagerly import the module
|
||||
``foo``) and they impose additional runtime overhead on every module attribute
|
||||
access.
|
||||
The Python standard library already includes built-in support for lazy imports,
|
||||
via `importlib.util.LazyLoader
|
||||
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_.
|
||||
There are also third-party packages such as `demandimport
|
||||
<https://github.com/bwesterb/py-demandimport/>`_. These provide a "lazy module
|
||||
object" which delays its own import until first attribute access. This is not
|
||||
sufficient to make all imports lazy: imports such as ``from foo import a, b``
|
||||
will still eagerly import the module ``foo`` since they immediately access an
|
||||
attribute from it. It also imposes noticeable runtime overhead on every module
|
||||
attribute access, since it requires a Python-level ``__getattr__`` or
|
||||
``__getattribute__`` implementation.
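For reference, a lazy module object can be created with the standard library roughly as follows. This sketch follows the recipe in the ``importlib`` documentation; the helper name ``lazy_import`` is not part of the standard library.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose execution is delayed until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only prepares the module; its code has not run yet.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The first attribute access below is what actually executes the module.
hsv = colorsys.rgb_to_hsv(1, 0, 0)
```

Note that this only helps for ``import foo`` style usage; a ``from foo import a`` statement accesses an attribute immediately and so loads the module at once.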
|
||||
|
||||
This PEP proposes a more comprehensive solution for lazy imports that does not
|
||||
impose detectable overhead in real-world use. The implementation in this PEP
|
||||
has already `demonstrated
|
||||
Authors of scientific Python packages have also made extensive use of lazy
|
||||
imports to allow users to write e.g. ``import scipy as sp`` and then easily
|
||||
access many different submodules with e.g. ``sp.linalg``, without requiring all
|
||||
the many submodules to be imported up-front. `SPEC 1
|
||||
<https://scientific-python.org/specs/spec-0001/>`_ codifies this practice in the
|
||||
form of a ``lazy_loader`` library that can be used explicitly in a package
|
||||
``__init__.py`` to provide lazily accessible submodules.
|
||||
|
||||
Users of static typing also have to import names for use in type annotations
|
||||
that may never be used at runtime (if :pep:`563` or possibly in future
|
||||
:pep:`649` is used to avoid eager runtime evaluation of annotations). Lazy
|
||||
imports are very attractive in this scenario to avoid the overhead of unneeded
|
||||
imports.
|
||||
|
||||
This PEP proposes a more general and comprehensive solution for lazy imports
|
||||
that can encompass all of the above use cases and does not impose detectable
|
||||
overhead in real-world use. The implementation in this PEP has already
|
||||
`demonstrated
|
||||
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
|
||||
startup time improvements up to 70% and memory-use reductions up to
|
||||
40% on real-world Python CLIs.
|
||||
startup time improvements up to 70% and memory-use reductions up to 40% on
|
||||
real-world Python CLIs.
|
||||
|
||||
Lazy imports also eliminate most import cycles. With eager imports, "false
|
||||
cycles" can easily occur which are fixed by simply moving an import to the
|
||||
|
@@ -97,15 +116,11 @@ accidental leakage of lazy objects is to have the dictionary itself be
|
|||
responsible to ensure resolution of lazy objects on lookup.
|
||||
|
||||
To avoid a performance penalty on the vast majority of dictionaries which never
|
||||
contain any lazy objects, we install a specialized lookup function
|
||||
(``lookdict_unicode_lazy``) for module namespace dictionaries when they first
|
||||
gain a lazy-object value. When this lookup function finds that the key
|
||||
references a lazy object, it resolves the lazy object immediately before
|
||||
returning it.
|
||||
|
||||
Some operations on dictionaries (e.g. iterating all values) don't go through
|
||||
the lookup function; in these cases we have to add a check if the lookup
|
||||
function is ``lookdict_unicode_lazy`` and if so, resolve all lazy values first.
|
||||
contain any lazy objects, we set a specialized lookup kind (`DictKeysKind
|
||||
<https://github.com/python/cpython/blob/3.11/Include/internal/pycore_dict.h#L80>`_)
|
||||
for module namespace dictionaries when they first gain a lazy-object value. When
|
||||
a lazy lookup kind is set and lookup finds that the key references a lazy
|
||||
object, it resolves the lazy object immediately before returning it.
|
||||
|
||||
This implementation comprehensively prevents leakage of lazy objects, ensuring
|
||||
they are always resolved to the real imported object before anyone can get hold
|
||||
|
@@ -116,17 +131,18 @@ dictionaries in general.
|
|||
Specification
|
||||
=============
|
||||
|
||||
Lazy imports are opt-in, and globally enabled via a new ``-L`` flag to the
|
||||
Python interpreter, or a ``PYTHONLAZYIMPORTS`` environment variable.
|
||||
Lazy imports are opt-in, and globally enabled either via a new ``-L`` flag to
|
||||
the Python interpreter or via a call to a new ``importlib.set_lazy_imports()``
|
||||
function, which makes all imports after the call lazy.
|
||||
|
||||
When enabled, the loading and execution of all (and only) top level imports is
|
||||
deferred until the imported name is used. This could happen immediately (e.g.
|
||||
on the very next line after the import statement) or much later (e.g. while
|
||||
using the name inside a function being called by some other code at some later
|
||||
time.)
|
||||
When enabled, the loading and execution of all (and only) top-level imports is
|
||||
deferred until the imported name is first used. This could happen immediately
|
||||
(e.g. on the very next line after the import statement) or much later (e.g.
|
||||
while using the name inside a function being called by some other code at some
|
||||
later time).
|
||||
|
||||
For these top level imports, there are two exceptions which will make them
|
||||
eager (not lazy): imports inside ``try``/``except``/``finally`` or ``with``
|
||||
For these top-level imports, there are two contexts which will make them eager
|
||||
(not lazy): imports inside ``try`` / ``except`` / ``finally`` or ``with``
|
||||
blocks, and star imports (``from foo import *``). Imports inside
|
||||
exception-handling blocks (this includes ``with`` blocks, since those can also
|
||||
"catch" and handle exceptions) remain eager so that any exceptions arising from
|
||||
|
@@ -139,6 +155,10 @@ level" and are never lazy.
|
|||
Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
|
||||
also never lazy.
|
||||
|
||||
Lazy imports state (i.e. whether they have been enabled, and any excluded
|
||||
modules; see below) is per-interpreter, but global within the interpreter (i.e.
|
||||
all threads will be affected).
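The eligibility rules can be approximated with a small static check. The following is only a sketch of the rules as stated in this PEP (eager inside ``try`` / ``with`` blocks, inside function or class bodies, and for star imports); the function name ``classify_imports`` is hypothetical, and the real decision is made by the compiler and runtime, not by a tool like this.

```python
import ast

# Node types whose bodies make contained imports eager under the stated rules.
_EAGER_BLOCKS = tuple(
    getattr(ast, name)
    for name in ("Try", "TryStar", "With", "AsyncWith",
                 "FunctionDef", "AsyncFunctionDef", "ClassDef")
    if hasattr(ast, name)
)


def classify_imports(source):
    """Return a list of (lineno, statement, "lazy" | "eager"), sorted by line."""
    results = []

    def visit(node, eager):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.Import, ast.ImportFrom)):
                star = isinstance(child, ast.ImportFrom) and any(
                    alias.name == "*" for alias in child.names
                )
                kind = "eager" if (eager or star) else "lazy"
                results.append((child.lineno, ast.unparse(child), kind))
            else:
                visit(child, eager or isinstance(child, _EAGER_BLOCKS))

    visit(ast.parse(source), False)
    return sorted(results)
```

This sketch ignores the per-module exclusions described later, and it requires Python 3.9 or newer for ``ast.unparse``.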
|
||||
|
||||
|
||||
Example
|
||||
-------
|
||||
|
@@ -174,27 +194,210 @@ Of course, in real use cases (especially with lazy imports), it's not
|
|||
recommended to rely on import side effects like this to trigger real work. This
|
||||
example is just to clarify the behavior of lazy imports.
|
||||
|
||||
Another way to explain the effect of lazy imports is that it is as if each lazy
|
||||
import statement had instead been written inline in the source code immediately
|
||||
before each use of the imported name. So one can think of lazy imports as
|
||||
similar to transforming this code::
|
||||
|
||||
Debuggability
|
||||
-------------
|
||||
import foo
|
||||
|
||||
The implementation will ensure that exceptions resulting from a deferred import
|
||||
have metadata attached pointing the user to the original import statement, to
|
||||
ease debuggability of errors from lazy imports.
|
||||
def func1():
|
||||
    return foo.bar()
|
||||
|
||||
Additionally, debug logging from ``python -v`` will include logging when an
|
||||
import statement has been encountered but execution of the import will be
|
||||
deferred.
|
||||
def func2():
|
||||
    return foo.baz()
|
||||
|
||||
To this::
|
||||
|
||||
def func1():
|
||||
    import foo
|
||||
    return foo.bar()
|
||||
|
||||
def func2():
|
||||
    import foo
|
||||
    return foo.baz()
|
||||
|
||||
This gives a good sense of when the import of ``foo`` will occur under lazy
|
||||
imports, but lazy import is not really equivalent to this code transformation.
|
||||
There are several notable differences:
|
||||
|
||||
* Unlike in the latter code, under lazy imports the name ``foo`` still does
|
||||
exist in the module's global namespace, and can be imported or referenced by
|
||||
other modules that import this one. (Such references would also trigger the
|
||||
import.)
|
||||
|
||||
* The runtime overhead of lazy imports is much lower than the latter code; after
|
||||
the first reference to the name ``foo`` which triggers the import, subsequent
|
||||
references will have zero import system overhead; they are indistinguishable
|
||||
from a normal name reference.
|
||||
|
||||
In a sense, lazy imports turn the import statement into just a declaration of an
|
||||
imported name or names, to later be fully resolved when referenced.
|
||||
|
||||
An import in the style ``from foo import bar`` can also be made lazy. When the
|
||||
import occurs, the name ``bar`` will be added to the module namespace as a lazy
|
||||
import. The first reference to ``bar`` will import ``foo`` and resolve ``bar``
|
||||
to ``foo.bar``.
|
||||
|
||||
|
||||
Intended usage
|
||||
--------------
|
||||
|
||||
Since lazy imports are a potentially breaking semantic change, they should be
|
||||
enabled only by the author or maintainer of a Python application, who is
|
||||
prepared to thoroughly test the application under the new semantics, ensure it
|
||||
behaves as expected, and opt out any specific imports as needed (see below).
|
||||
Lazy imports should not be enabled speculatively by the end user of a Python
|
||||
application with any expectation of success.
|
||||
|
||||
It is the responsibility of the application developer enabling lazy imports for
|
||||
their application to opt out any library imports that turn out to need to be
|
||||
eager for their application to work correctly; it is not the responsibility of
|
||||
library authors to ensure that their library behaves exactly the same under lazy
|
||||
imports.
|
||||
|
||||
The documentation of the feature, the ``-L`` flag, and the new ``importlib``
|
||||
APIs will be clear about the intended usage and the risks of adoption without
|
||||
testing.
|
||||
|
||||
|
||||
Implementation
|
||||
--------------
|
||||
|
||||
Lazy imports are represented internally by a "lazy import" object. When a lazy
|
||||
import occurs (say ``import foo`` or ``from foo import bar``), the key ``"foo"``
|
||||
or ``"bar"`` is immediately added to the module namespace dictionary, but with
|
||||
its value set to an internal-only "lazy import" object that preserves all the
|
||||
necessary metadata to execute the import later. The ``DictKeysKind`` for the
|
||||
module namespace dictionary is updated from e.g. ``DICT_KEYS_UNICODE`` to
|
||||
``DICT_KEYS_UNICODE_LAZY`` to signal that this particular dictionary may contain
|
||||
lazy import objects.
|
||||
|
||||
(In case someone adds a non-unicode key to a module namespace dictionary also
|
||||
containing lazy import objects, e.g. via ``globals()[42] = "foo"``, there is
|
||||
also ``DICT_KEYS_GENERAL_LAZY``, but in most cases this is not needed.)
|
||||
|
||||
Anytime a key is looked up in a dictionary with ``DICT_KEYS_UNICODE_LAZY`` or
|
||||
``DICT_KEYS_GENERAL_LAZY``, the value is checked to see if it is a lazy import
|
||||
object. If so, the import is immediately executed, the lazy import object is
|
||||
replaced in the dictionary by the actual imported value, and the imported value
|
||||
is returned from the lookup.
|
||||
|
||||
Because this is all handled internally by the dictionary implementation, lazy
|
||||
import objects can never escape from the module namespace to become visible to
|
||||
Python code; they are always resolved at their first reference.
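The lookup behavior described above can be modeled in pure Python. This is only a toy model: the real implementation lives in the C dictionary code, and the names ``LazyImport`` and ``LazyNamespace`` are invented for this sketch. It also simplifies dotted imports and ``from`` imports of submodules.

```python
import importlib
import sys


class LazyImport:
    """Hypothetical stand-in for the internal "lazy import" object."""

    def __init__(self, module_name, attr=None):
        self.module_name = module_name  # module to import on first use
        self.attr = attr                # set for ``from module import attr``

    def resolve(self):
        module = importlib.import_module(self.module_name)
        return module if self.attr is None else getattr(module, self.attr)


class LazyNamespace(dict):
    """Toy model of a module namespace dict with a ``*_LAZY`` lookup kind."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyImport):
            # Execute the import now and replace the lazy object in place,
            # so later lookups of this key pay no extra cost.
            value = value.resolve()
            super().__setitem__(key, value)
        return value
```

In this model, ``ns["foo"] = LazyImport("foo")`` plays the role of a lazy ``import foo``, and the first ``ns["foo"]`` lookup performs the import. Two separate ``LazyNamespace`` instances each hold their own ``LazyImport`` for the same module and resolve independently, with the second resolution finding the module already in ``sys.modules``.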
|
||||
|
||||
Since only (some) module namespace dictionaries will ever have
|
||||
``DICT_KEYS_*_LAZY`` set, the (minimal) extra lookup overhead to check for lazy
|
||||
import objects is only paid by those dictionaries that need it; other
|
||||
dictionaries have no added overhead.
|
||||
|
||||
No stub or dummy objects are ever visible to Python code or placed in
|
||||
``sys.modules``. Other than the delayed import, the implementation is
|
||||
transparent.
|
||||
|
||||
If a module is imported lazily, no entry for it will appear in ``sys.modules``
|
||||
at all until it is actually imported on first reference.
|
||||
|
||||
If two different modules (``moda`` and ``modb``) both contain a lazy ``import
|
||||
foo``, each module's namespace dictionary will have an independent lazy import
|
||||
object under the key ``"foo"``, delaying import of the same ``foo`` module. This
|
||||
is not a problem. When there is first a reference to, say, ``moda.foo``, the
|
||||
module ``foo`` will be imported and placed in ``sys.modules`` as usual, and the
|
||||
lazy object under the key ``moda.__dict__["foo"]`` will be replaced by the
|
||||
actual module ``foo``. At this point ``modb.__dict__["foo"]`` will remain a lazy
|
||||
import object. When ``modb.foo`` is later referenced, it will also try to
|
||||
``import foo``. This import will find the module already present in
|
||||
``sys.modules``, as is normal for subsequent imports of the same module in
|
||||
Python, and at this point will replace the lazy import object at
|
||||
``modb.__dict__["foo"]`` with the actual module ``foo``.
|
||||
|
||||
There is one case in which a lazy import object can escape one dictionary (but
|
||||
only into another dictionary) without being resolved. To preserve the
|
||||
performance of bulk-copy operations like ``dict.update()`` and ``dict.copy()``,
|
||||
they do not check for or resolve lazy import objects. However, if the source
|
||||
dict has a ``*_LAZY`` lookup kind set that indicates it might contain lazy
|
||||
objects, that lookup kind will be passed on to the updated/copied dictionary.
|
||||
This still ensures that the lazy import object can't escape into Python code
|
||||
without being resolved.
|
||||
|
||||
Other "bulk" dictionary lookup methods (such as ``dict.items()``,
|
||||
``dict.values()``, etc.) will resolve all lazy import objects in the dictionary.
|
||||
Since it is uncommon for any of these to be used on a module namespace
|
||||
dictionary, the priority here is simplicity of implementation and minimizing the
|
||||
overhead on normal non-lazy dictionaries (just one check to see if the
|
||||
dictionary has a ``*_LAZY`` lookup kind).
|
||||
|
||||
The eagerness of imports within ``try`` / ``except`` / ``with`` blocks or within
|
||||
class or function bodies is handled in the compiler via a new
|
||||
``EAGER_IMPORT_NAME`` opcode that always imports eagerly. Top-level imports use
|
||||
``IMPORT_NAME``, which may be lazy or eager depending on ``-L`` and/or
|
||||
``importlib.set_lazy_imports()``.
|
||||
|
||||
|
||||
Exceptions
|
||||
----------
|
||||
|
||||
Exceptions that occur during a lazy import bubble up and erase the
|
||||
partially-constructed module(s) from ``sys.modules``, just as exceptions during
|
||||
normal import do.
|
||||
|
||||
Since errors raised during a lazy import will occur later (wherever the imported
|
||||
name is first referenced) than they would if the import were eager, it is
|
||||
possible that they could be accidentally caught by exception handlers that
|
||||
didn't expect the import to be running within their ``try`` block, leading to
|
||||
confusion. To reduce the potential for this confusion, exceptions raised in the
|
||||
course of executing a lazy import will be replaced by a ``LazyImportError``
|
||||
exception (a subclass of ``ImportError``), with ``__cause__`` set to the
|
||||
original exception.
|
||||
|
||||
The ``LazyImportError`` will have source location metadata attached pointing the
|
||||
user to the original import statement, to ease debuggability of errors from lazy
|
||||
imports. (It won't have a full traceback to the original import location; this
|
||||
is too expensive to preserve for all lazy imports, and it's not clear that it
|
||||
provides significant value over simply knowing the location of the import
|
||||
statement.)
|
||||
|
||||
Only ``Exception`` subclasses are replaced in this way, not other ``BaseException`` subclasses.
|
||||
Exceptions deriving directly from ``BaseException`` are "system-exiting" exceptions like ``KeyboardInterrupt``
|
||||
or ``SystemExit``; these are normally not caught, and if they are caught, it is
|
||||
less likely to be specific to a certain bit of code that was expected to raise
|
||||
them, and more likely that the goal is to catch them whatever their origin.
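The wrapping behavior can be sketched as follows. The ``LazyImportError`` class defined here is a local stand-in for the exception proposed in this PEP, and ``resolve`` with its ``lineno`` parameter is a hypothetical helper; how the real implementation attaches source location metadata is not specified by this sketch.

```python
import importlib


class LazyImportError(ImportError):
    """Local stand-in for the exception type proposed in this PEP."""


def resolve(name, lineno):
    """Import ``name`` now, as if resolving a lazy import written at ``lineno``."""
    try:
        return importlib.import_module(name)
    except Exception as exc:
        # Only ``Exception`` subclasses are wrapped; ``KeyboardInterrupt``,
        # ``SystemExit`` and similar propagate unchanged.
        raise LazyImportError(
            f"error resolving lazy import of {name!r} "
            f"(import statement at line {lineno})",
            name=name,
        ) from exc
```

A handler written as ``except ModuleNotFoundError`` around the first use of the name would therefore not catch the failure by accident, while the original exception stays available as ``__cause__``.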
|
||||
|
||||
|
||||
Debugging
|
||||
---------
|
||||
|
||||
Debug logging from ``python -v`` will include logging whenever an import
|
||||
statement has been encountered but execution of the import will be deferred.
|
||||
|
||||
Python's ``-X importtime`` feature for profiling import costs adapts naturally
|
||||
to lazy imports; the profiled time is the time spent actually importing.
|
||||
|
||||
Although lazy import objects are never visible to Python code, in some debugging
|
||||
cases it may be useful to check from Python code whether the value at a given
|
||||
key in a given dictionary is a lazy import object, without triggering its
|
||||
resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::
|
||||
|
||||
from importlib import is_lazy_import
|
||||
|
||||
import foo
|
||||
|
||||
is_lazy_import(globals(), "foo")
|
||||
|
||||
foo
|
||||
|
||||
is_lazy_import(globals(), "foo")
|
||||
|
||||
In this example, if lazy imports have been enabled the first call to
|
||||
``is_lazy_import`` will return ``True`` and the second will return ``False``.
|
||||
|
||||
|
||||
Per-module opt out
|
||||
------------------
|
||||
|
||||
Due to the backwards compatibility issues mentioned below, it may be necessary
|
||||
to force some imports to be eager.
|
||||
for an application using lazy imports to force some imports to be eager.
|
||||
|
||||
In first-party code, since imports inside a ``try`` or ``with`` block are never
|
||||
lazy, this can be easily accomplished::
|
||||
|
@@ -209,38 +412,78 @@ This PEP proposes to add a new ``importlib.eager_imports()`` context manager,
|
|||
so the above technique can be less verbose and doesn't require comments to
|
||||
clarify its intent::
|
||||
|
||||
from importlib import eager_imports
|
||||
|
||||
with eager_imports():
|
||||
    import foo
|
||||
    import bar
|
||||
|
||||
Since imports within context managers are always eager, the ``eager_imports()``
|
||||
context manager can just be an alias to a null context manager. The context
|
||||
manager does not force all imports to be recursively eager: ``foo`` and ``bar``
|
||||
will be imported eagerly, but imports within those modules will still follow
|
||||
the usual laziness rules.
|
||||
manager's effect is not transitive: ``foo`` and ``bar`` will be imported
|
||||
eagerly, but imports within those modules will still follow the usual laziness
|
||||
rules.
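Since the context manager does no work of its own, its behavior can be emulated today with the standard library. In this sketch the name ``eager_imports`` is bound locally to ``contextlib.nullcontext``; it is not the proposed ``importlib`` function, only an assumption about how simple that function could be.

```python
import contextlib

# Local stand-in: the laziness rules key off the ``with`` block itself,
# so a null context manager is sufficient.
eager_imports = contextlib.nullcontext

with eager_imports():
    import json
    import csv
```

Code written this way also runs unchanged on interpreters without lazy imports, where the ``with`` block is simply a no-op.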
|
||||
|
||||
The more difficult case can occur if an import in third-party code that can't
|
||||
easily be modified must be forced to be eager. For this purpose, we propose to
|
||||
add an API to ``importlib`` that can be called early in the process to specify
|
||||
a list of module names within which all imports will be eager::
|
||||
easily be modified must be forced to be eager. For this purpose,
|
||||
``importlib.set_lazy_imports()`` takes an optional keyword-only ``excluding``
|
||||
argument, which can be set to a container of module names within which all
|
||||
imports will be eager::
|
||||
|
||||
from importlib import set_eager_imports
|
||||
from importlib import set_lazy_imports
|
||||
|
||||
set_eager_imports(["one.mod", "another"])
|
||||
set_lazy_imports(excluding=["one.mod", "another"])
|
||||
|
||||
The effect of this is also shallow: all imports within ``one.mod`` will be
|
||||
eager, but not imports in all modules imported by ``one.mod``.
|
||||
|
||||
``set_eager_imports()`` can also take a callback which receives a module name and returns
|
||||
whether imports within this module should be eager::
|
||||
The ``excluding`` parameter of ``set_lazy_imports()`` can also be set to a
|
||||
callback which receives a module name and returns whether imports within this
|
||||
module should be eager::
|
||||
|
||||
import re
|
||||
from importlib import set_eager_imports
|
||||
|
||||
from importlib import set_lazy_imports
|
||||
|
||||
def eager_imports(name):
|
||||
    return re.match(r"foo\.[^.]+\.logger", name)
|
||||
|
||||
set_eager_imports(eager_imports)
|
||||
set_lazy_imports(excluding=eager_imports)
|
||||
|
||||
If Python was executed with the ``-L`` flag, then lazy imports will already be
|
||||
globally enabled, and the only effect of calling ``set_lazy_imports()`` will be
|
||||
to globally set the eager module names/callback. If ``set_lazy_imports()`` is
|
||||
called with no ``excluding`` argument, the exclusion list/callback will be
|
||||
cleared and all eligible imports (module-level imports not in
|
||||
``try/except/with``, and not ``import *``) will be lazy from that point forward.
|
||||
|
||||
``set_lazy_imports()`` may be called more than once, with subsequent calls
|
||||
having only the effect of globally replacing or clearing the ``excluding``
|
||||
list/callback. Generally there should be no reason to do this: the intended use
|
||||
is a single call to ``set_lazy_imports`` in the main module, early in the
|
||||
process.
|
||||
|
||||
This opt-out system is designed to maintain the possibility of local reasoning
|
||||
about the laziness of an import. You only need to see the code of one module,
|
||||
and the ``excluding`` argument to ``set_lazy_imports``, if any, to know whether
|
||||
a given import will be eager or lazy.
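An application that must also run on interpreters without this feature could guard the call. The sketch below assumes ``importlib.set_lazy_imports()`` exists only where this PEP is implemented and has the signature proposed here; ``enable_lazy_imports`` and ``exclude`` are hypothetical names.

```python
import importlib
import re


def exclude(name):
    """Return True for modules whose imports should stay eager."""
    return re.match(r"foo\.[^.]+\.logger", name) is not None


def enable_lazy_imports():
    """Enable lazy imports if the interpreter supports them; report the result."""
    # Assumption: the function is only present on interpreters implementing this PEP.
    setter = getattr(importlib, "set_lazy_imports", None)
    if setter is None:
        return False
    setter(excluding=exclude)
    return True
```

Called once at the top of the main module, this keeps the decision in a single place, which matches the intended single-call usage.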
|
||||
|
||||
|
||||
Testing
|
||||
-------
|
||||
|
||||
The CPython test suite will pass with lazy imports enabled (possibly with some
|
||||
tests skipped). One buildbot should run the test suite with lazy imports
|
||||
enabled.
|
||||
|
||||
|
||||
C API
|
||||
-----
|
||||
|
||||
For authors of C extension modules, the proposed
|
||||
``importlib.set_lazy_imports()`` function will also be exposed in the stable C
|
||||
API as ``PyImport_SetLazyImports(PyObject *names_or_callback_or_null)``, and
|
||||
``importlib.is_lazy_import`` will be available as ``PyDict_IsLazyImport(PyObject
|
||||
*dict, PyObject *key)``.
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
|
@@ -293,8 +536,12 @@ adding (and then removing after the import) paths from ``sys.path``::
|
|||
del sys.path[0]
|
||||
foo.Bar()
|
||||
|
||||
In this case, with lazy imports enabled, the import of ``foo`` will not
|
||||
actually occur while the addition to ``sys.path`` is present.
|
||||
In this case, with lazy imports enabled, the import of ``foo`` will not actually
|
||||
occur while the addition to ``sys.path`` is present.
|
||||
|
||||
An easy fix for this (which arguably also improves the code style) would be to
|
||||
place the ``sys.path`` modifications in a context manager. This resolves the
|
||||
issue, since imports inside a ``with`` block are always eager.
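One possible shape for such a context manager is sketched below. The name ``extra_sys_path`` is hypothetical; any context manager works, because it is the ``with`` block that makes the enclosed import eager.

```python
import contextlib
import sys


@contextlib.contextmanager
def extra_sys_path(path):
    """Temporarily put ``path`` at the front of ``sys.path``."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove the entry we added, even if the import inside the
        # block inserted other entries ahead of it.
        try:
            sys.path.remove(path)
        except ValueError:
            pass
```

With this helper, ``with extra_sys_path("/some/dir"): import foo`` imports ``foo`` while the extra path is still present, with or without lazy imports enabled.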
|
||||
|
||||
|
||||
Deferred Exceptions
|
||||
|
@@ -302,80 +549,116 @@ Deferred Exceptions
|
|||
|
||||
All exceptions arising from import (including ``ModuleNotFoundError``) are
|
||||
deferred from import time to first-use time, which could complicate debugging.
|
||||
Accessing an object in the middle of any code could trigger a deferred import
|
||||
and produce ``ImportError`` or any other exception resulting from the
|
||||
resolution of the deferred object, while loading and executing the related
|
||||
imported module. The implementation will provide debugging assistance in
|
||||
lazy-import-triggered tracebacks to mitigate this issue.
|
||||
Referencing a name in the middle of any code could trigger a deferred import and
|
||||
produce ``LazyImportError`` while loading and executing the related imported
|
||||
module.
|
||||
|
||||
Ensuring all lazy import errors are raised as ``LazyImportError`` mitigates this
|
||||
issue by reducing the likelihood that they will be accidentally caught and
|
||||
mistaken for a different expected exception. ``LazyImportError`` will also
|
||||
provide the location of the original import statement to aid in debugging, as
|
||||
described above.
|
||||
|
||||
|
||||
Drawbacks
|
||||
=========
|
||||
|
||||
Downsides of this PEP include:
|
||||
|
||||
* It provides subtly incompatible semantics for the behavior of Python
|
||||
imports. This is a potential burden on library authors who may be asked by their
|
||||
users to support both semantics, and is one more possibility for Python
|
||||
users/readers to be aware of.
|
||||
|
||||
* Some popular Python coding patterns (notably centralized registries populated
|
||||
by a decorator) rely on import side effects and may require explicit opt-out to
|
||||
work as expected with lazy imports.
|
||||
|
||||
Lazy import semantics are already possible and even supported today in the
|
||||
Python standard library, so these drawbacks are not newly introduced by this
|
||||
PEP. So far, existing usage of lazy imports by some applications has not proven
|
||||
a problem. But this PEP is likely to make the usage of lazy imports more
|
||||
popular, potentially exacerbating these drawbacks.
|
||||
|
||||
These drawbacks must be weighed against the significant benefits offered by this
|
||||
PEP's implementation of lazy imports. Ultimately these costs will be higher if
|
||||
the feature is widely used; but wide usage also indicates the feature provides a
|
||||
lot of value, perhaps justifying the costs.
|
||||
|
||||
|
||||
Security Implications
|
||||
=====================
|
||||
|
||||
Deferred execution of code could produce security concerns if process owner,
|
||||
path, ``sys.path``, or other sensitive environment or contextual states change
|
||||
between the time the ``import`` statement is executed and the time where the
|
||||
imported object is used.
|
||||
shell path, ``sys.path``, or other sensitive environment or contextual states
|
||||
change between the time the ``import`` statement is executed and the time the
|
||||
imported object is first referenced.
|
||||
|
||||
|
||||
Performance Impact
|
||||
==================
|
||||
|
||||
The reference implementation has shown that the feature has negligible
|
||||
performance impact on existing real-world codebases (Instagram Server and other
|
||||
several CLI programs at Meta), while providing substantial improvements to
|
||||
startup time and memory usage.
|
||||
performance impact on existing real-world codebases (Instagram Server, several
|
||||
CLI programs at Meta, Jupyter notebooks used by Meta researchers), while
|
||||
providing substantial improvements to startup time and memory usage.
|
||||
|
||||
The reference implementation shows small performance regressions in a few
|
||||
pyperformance benchmarks, but improvements in others. (TODO update with
|
||||
detailed data from 3.11 port of implementation.)
|
||||
detailed data from main-branch port of implementation.)
|
||||
|
||||
|
||||
How to Teach This
|
||||
=================
|
||||
|
||||
Since the feature is opt-in, beginners should not encounter it by default.
|
||||
Documentation of the ``-L`` flag and ``PYTHONLAZYIMPORTS`` environment variable
|
||||
can clarify the behavior of lazy imports.
|
||||
Documentation of the ``-L`` flag and ``importlib.set_lazy_imports()`` can
|
||||
clarify the behavior of lazy imports.
|
||||
|
||||
The documentation should also clarify that opting into lazy imports is opting
|
||||
into a non-standard semantics for Python imports, which could cause Python
|
||||
libraries to break in unexpected ways. The responsibility to identify these
|
||||
breakages and work around them with an opt-out (or stop using lazy imports)
|
||||
rests entirely with the person choosing to enable lazy imports for their
|
||||
application, not with the library author. Python libraries are under no
|
||||
obligation to support lazy import semantics. Politely reporting an
|
||||
incompatibility may be useful to the library author, but they may choose to
|
||||
simply say their library does not support use with lazy imports, and this is a
|
||||
valid choice.
|
||||
|
||||
Some best practices to deal with some of the issues that could arise and to
|
||||
better take advantage of lazy imports are:
|
||||
|
||||
* Avoid relying on import side effects. Perhaps the most common reliance on
|
||||
import side effects is the registry pattern, where population of some
|
||||
external registry happens implicitly during the importing of modules, often
|
||||
via decorators. Instead, the registry should be built via an explicit call
|
||||
that perhaps does a discovery process to find decorated functions or classes.
|
||||
import side effects is the registry pattern, where population of some external
|
||||
registry happens implicitly during the importing of modules, often via
|
||||
decorators. Instead, the registry should be built via an explicit call that does
|
||||
a discovery process to find decorated functions or classes in explicitly
|
||||
nominated modules.
|
||||
|
||||
* Always import needed submodules explicitly; don't rely on some other import
|
||||
to ensure a module has its submodules as attributes. That is, do ``import
|
||||
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only
|
||||
works (unreliably) because the attribute ``foo.bar`` is added as a side
|
||||
effect of ``foo.bar`` being imported somewhere else. With lazy imports this
|
||||
may not always happen on time.
|
||||
to ensure a module has its submodules as attributes. That is, unless there is an
|
||||
explicit ``from . import bar`` in ``foo/__init__.py``, always do ``import
|
||||
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only works
|
||||
(unreliably) because the attribute ``foo.bar`` is added as a side effect of
|
||||
``foo.bar`` being imported somewhere else. With lazy imports this may not always
|
||||
happen on time.
|
||||
|
||||
* Avoid using star imports, as those are always eager.
|
||||
|
||||
* When possible, do not import whole submodules. Import specific names instead;
|
||||
i.e., do ``from foo.bar import Baz``, not ``import foo.bar`` and then
|
||||
``foo.bar.Baz``. If you import submodules (such as ``foo.qux`` and
|
||||
``foo.fred``), with lazy imports enabled, when you access the parent module's
|
||||
name (``foo`` in this case), that will trigger loading all of the sibling
|
||||
submodules of the parent module (``foo.bar``, ``foo.qux`` and ``foo.fred``),
|
||||
not only the one being accessed, because the parent module ``foo`` is the
|
||||
actual deferred object name.
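
As a rough illustration of the first practice above, the following sketch builds a registry through an explicit call over explicitly nominated modules, rather than populating it as an import side effect. The names ``command``, ``build_registry`` and ``demo_commands`` are hypothetical and are not part of this PEP or of any library; the demo module is synthesized in memory only so that the example is self-contained.

```python
import importlib
import sys
import types


def command(func):
    """Mark a function as a command without touching any global registry."""
    func._is_command = True
    return func


def build_registry(module_names):
    """Import each nominated module and collect the functions it marked."""
    registry = {}
    for module_name in module_names:
        module = importlib.import_module(module_name)
        for attr_name, obj in vars(module).items():
            if callable(obj) and getattr(obj, "_is_command", False):
                registry[attr_name] = obj
    return registry


def _install_demo_module():
    """Create a stand-in for a real module on disk (illustration only)."""
    module = types.ModuleType("demo_commands")

    @command
    def greet():
        return "hello"

    def helper():
        return "not a command"

    module.greet = greet
    module.helper = helper
    sys.modules[module.__name__] = module


_install_demo_module()

# The registry is populated by an explicit call, so it does not matter
# whether or when the nominated modules would otherwise have been imported.
registry = build_registry(["demo_commands"])
```

Because ``build_registry`` imports and inspects the nominated modules itself, the result does not depend on the order or timing of imports elsewhere in the program.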
|
||||
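
The advice on importing submodules explicitly can be checked under today's eager import semantics. The sketch below creates a throwaway package (the name ``pep690_demo_pkg`` is hypothetical) whose ``__init__.py`` does not import its submodule, and records whether the ``bar`` attribute exists before and after the submodule is imported explicitly.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a package whose __init__.py does NOT import its submodule.
    pkg_dir = pathlib.Path(tmp) / "pep690_demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "bar.py").write_text("class Baz:\n    pass\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("pep690_demo_pkg")
        # Importing only the parent does not create the attribute.
        has_bar_before = hasattr(pkg, "bar")

        importlib.import_module("pep690_demo_pkg.bar")
        # The attribute is set as a side effect of importing the submodule.
        has_bar_after = hasattr(pkg, "bar")
    finally:
        sys.path.remove(tmp)
```

Code that accesses ``pkg.bar`` without importing the submodule itself works only if some other import has already produced that side effect, which is the dependency the guideline warns against.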
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
The current reference implementation is available as part of
|
||||
`Cinder <https://github.com/facebookincubator/cinder>`_.
|
||||
Reference implementation is in use within Meta Platforms and has proven to
|
||||
achieve improvements in startup time (and total runtime for some applications)
|
||||
in the range of 40%-70%, as well as significant reduction in memory footprint
|
||||
(up to 40%), thanks to not needing to execute imports that end up being unused
|
||||
in the common flow.
|
||||
The current reference implementation is available as part of `Cinder
|
||||
<https://github.com/facebookincubator/cinder>`_. This reference implementation
|
||||
is in use within Meta and has proven to achieve improvements in startup time
|
||||
(and total runtime for some applications) in the range of 40%-70%, as well as
|
||||
significant reduction in memory footprint (up to 40%), thanks to not needing to
|
||||
execute imports that end up being unused in the common flow.
|
||||
|
||||
An updated reference implementation based on CPython main branch is in progress
|
||||
and will be linked here soon. (TODO link.)
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
|
@ -403,11 +686,11 @@ couple of disadvantages:
|
|||
Experience with the reference implementation suggests that the most practical
|
||||
adoption path for lazy imports is for a specific deployed application to opt-in
|
||||
globally, observe whether anything breaks, and opt-out specific modules as
|
||||
needed to account for e.g. reliance on import side effects.
|
||||
needed.
|
||||
|
||||
|
||||
Explicit syntax for lazy imports
|
||||
--------------------------------
|
||||
Explicit syntax for individual lazy imports
|
||||
-------------------------------------------
|
||||
|
||||
If the primary objective of lazy imports were solely to work around import
|
||||
cycles and forward references, an explicitly-marked syntax for particular
|
||||
|
@ -423,6 +706,41 @@ imports within the subsystems are all eager. This is extremely fragile, though
|
|||
shallow laziness. Globally enabling lazy imports, on the other hand, provides
|
||||
in-depth robust laziness where you always pay only for the imports you use.
|
||||
|
||||
There may be use cases (e.g. for static typing) where individually-marked lazy
|
||||
imports are desirable to avoid forward references, but the perf/memory benefits
|
||||
of globally lazy imports are not needed. Since this is a different set of
|
||||
motivating use cases and requires new syntax, we prefer not to include it in
|
||||
this PEP. Another PEP could build on top of this implementation and propose the
|
||||
additional syntax.
|
||||
|
||||
|
||||
Environment variable to enable lazy imports
|
||||
-------------------------------------------
|
||||
|
||||
Providing an environment variable opt-in lends itself too easily to abuse of the
|
||||
feature. It may seem tempting for a Python user to, for instance, globally set
|
||||
the environment variable in their shell in the hopes of speeding up all the
|
||||
Python programs they run. This usage with untested programs is likely to lead to
|
||||
spurious bug reports and maintenance burden for the authors of those tools. To
|
||||
avoid this, we choose not to provide an environment variable opt-in at all.
|
||||
|
||||
|
||||
Removing the ``-L`` flag
|
||||
------------------------
|
||||
|
||||
We do provide the ``-L`` CLI flag, which could in theory be abused in a similar
|
||||
way by an end user running an individual Python program that is invoked with
|
||||
``python somescript.py`` or ``python -m somescript`` (rather than distributed
|
||||
via Python packaging tools). But the potential scope for misuse is much less
|
||||
with ``-L`` than with an environment variable, and ``-L`` is valuable for some
|
||||
applications to maximize startup time benefits by ensuring that all imports from
|
||||
the start of a process will be lazy, so we choose to keep it.
|
||||
|
||||
It is already the case that running arbitrary Python programs with command line
|
||||
flags they weren't intended to be used with (e.g. ``-s``, ``-S``, ``-E``, or
|
||||
``-I``) can have unexpected and breaking results. ``-L`` is nothing new in this
|
||||
regard.
|
||||
|
||||
|
||||
Half-lazy imports
|
||||
-----------------
|
||||
|
@ -444,6 +762,15 @@ import typos has not been an observed problem with the reference
|
|||
implementation. Generally delayed imports are not delayed forever, and errors
|
||||
show up soon enough to be caught and fixed (unless the import is truly unused).
|
||||
|
||||
Another possible motivation for half-lazy imports would be to allow modules
|
||||
themselves to control via some flag whether they are imported lazily or eagerly.
|
||||
This is rejected both on the basis that it requires half-lazy imports, giving up
|
||||
some of the performance benefits of import laziness, and because in general
|
||||
modules do not decide how or when they are imported; the module importing them
|
||||
decides that. There isn't a clear rationale for this PEP to invert that control;
|
||||
instead it just provides more options for the importing code to make the
|
||||
decision.
|
||||
|
||||
|
||||
Lazy dynamic imports
|
||||
--------------------
|
||||
|
@ -461,12 +788,13 @@ Deep eager-imports override
|
|||
---------------------------
|
||||
|
||||
The proposed ``importlib.eager_imports()`` context manager and
|
||||
``importlib.set_eager_imports()`` override both have shallow effects: they only
|
||||
force eagerness for the location where they are applied, not transitively. It
|
||||
would be possible (although not simple) to provide a deep/transitive version of
|
||||
one or both. That idea is rejected in this PEP because experience with the
|
||||
reference implementation has not shown it to be necessary, and because it
|
||||
prevents local reasoning about laziness of imports.
|
||||
``importlib.set_lazy_imports(excluding=...)`` override both have shallow
|
||||
effects: they only force eagerness for the location they are applied to, not
|
||||
transitively. It would be possible (although not simple) to provide a
|
||||
deep/transitive version of one or both. That idea is rejected in this PEP
|
||||
because the implementation would be complex (taking into account threads and
|
||||
async code), because experience with the reference implementation has not shown it to be
|
||||
necessary, and because it prevents local reasoning about laziness of imports.
|
||||
|
||||
A deep override can lead to confusing behavior because the
|
||||
transitively-imported modules may be imported from multiple locations, some of
|
||||
|
@ -479,6 +807,27 @@ import will be lazy or eager. With the behavior specified in this PEP, such
|
|||
local reasoning is possible.
|
||||
|
||||
|
||||
Making lazy imports the default behavior
|
||||
----------------------------------------
|
||||
|
||||
Making lazy imports the default/sole behavior of Python imports, instead of
|
||||
opt-in, would have some long-term benefits, in that library authors would
|
||||
(eventually) no longer need to consider the possibility of both semantics.
|
||||
|
||||
However, the backwards incompatibilities are such that this could only be
|
||||
considered over a long time frame, with a ``__future__`` import. It is not at
|
||||
all clear that lazy imports should become the default import semantics for
|
||||
Python.
|
||||
|
||||
Providing only per-module opt-in with a ``__future__`` import makes it much more
|
||||
difficult for the applications that can benefit from lazy imports to do so
|
||||
immediately, as discussed above.
|
||||
|
||||
This PEP takes the position that the Python community needs more experience with
|
||||
lazy imports before considering making it the default behavior, so that is
|
||||
entirely left to a possible future PEP.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
|
|