PEP 690: Update draft per discussion and feedback (#2613)

Carl Meyer 2022-06-27 17:27:49 -06:00 committed by GitHub
parent 6a8349343c
commit 6a5ee970ff
1 changed files with 460 additions and 111 deletions


@ -16,7 +16,7 @@ Abstract
========
This PEP proposes a feature to transparently defer the execution of imported
modules until the moment when an imported object is used. Since Python
modules until the moment when an imported object is first used. Since Python
programs commonly import many more modules than a single invocation of the
program is likely to use in practice, lazy imports can greatly reduce the
overall number of modules loaded, improving startup time and memory usage. Lazy
@ -33,33 +33,52 @@ system at runtime. This means that importing the main module of a program
typically results in an immediate cascade of imports of most or all of the
modules that may ever be needed by the program.
Consider the example of a Python command line program with a number of
Consider the example of a Python command line program (CLI) with a number of
subcommands. Each subcommand may perform different tasks, requiring the import
of different dependencies. But a given invocation of the program will only
execute a single subcommand, or possibly none (i.e. if just ``--help`` usage
info is requested). Top-level eager imports in such a program will result in
the import of many modules that will never be used at all; the time spent
(possibly compiling and) executing these modules is pure waste.
info is requested). Top-level eager imports in such a program will result in the
import of many modules that will never be used at all; the time spent (possibly
compiling and) executing these modules is pure waste.
In an effort to improve startup time, some large Python CLIs tools make imports
lazy by manually placing imports inline into functions to delay imports of
expensive subsystems. This manual approach is labor-intensive and fragile; one
misplaced import or refactor can easily undo painstaking optimization work.
To improve startup time, some large Python CLIs make imports lazy by manually
placing imports inline into functions to delay imports of expensive subsystems.
This manual approach is labor-intensive and fragile; one misplaced import or
refactor can easily undo painstaking optimization work.
Existing import-hook-based solutions such as `demandimport
<https://github.com/bwesterb/py-demandimport/>`_ or `importlib.util.LazyLoader
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_
are limited in that only certain styles of import can be made truly lazy
(imports such as ``from foo import a, b`` will still eagerly import the module
``foo``) and they impose additional runtime overhead on every module attribute
access.
The Python standard library already includes built-in support for lazy imports,
via `importlib.util.LazyLoader
<https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader>`_.
There are also third-party packages such as `demandimport
<https://github.com/bwesterb/py-demandimport/>`_. These provide a "lazy module
object" which delays its own import until first attribute access. This is not
sufficient to make all imports lazy: imports such as ``from foo import a, b``
will still eagerly import the module ``foo`` since they immediately access an
attribute from it. It also imposes noticeable runtime overhead on every module
attribute access, since it requires a Python-level ``__getattr__`` or
``__getattribute__`` implementation.
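As a point of comparison, the existing standard-library approach can be sketched as follows. This follows the recipe in the ``importlib.util.LazyLoader`` documentation; the helper name ``lazy_import`` and the early return for already-imported modules are illustrative choices, not part of this PEP.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring its execution until first attribute access."""
    if name in sys.modules:
        # Already imported (eagerly or lazily) elsewhere; reuse it.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module() only arranges for deferred execution.
    loader.exec_module(module)
    return module


lazy_json = lazy_import("json")
# The first attribute access is what actually executes the module.
encoded = lazy_json.dumps([1, 2])
```

Note that this only helps for the ``import json`` style: a ``from json import dumps`` would access an attribute immediately and so could not be deferred by this mechanism.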
This PEP proposes a more comprehensive solution for lazy imports that does not
impose detectable overhead in real-world use. The implementation in this PEP
has already `demonstrated
Authors of scientific Python packages have also made extensive use of lazy
imports to allow users to write e.g. ``import scipy as sp`` and then easily
access many different submodules with e.g. ``sp.linalg``, without requiring all
the many submodules to be imported up-front. `SPEC 1
<https://scientific-python.org/specs/spec-0001/>`_ codifies this practice in the
form of a ``lazy_loader`` library that can be used explicitly in a package
``__init__.py`` to provide lazily accessible submodules.
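The general idea behind lazily accessible submodules can be illustrated with a module-level ``__getattr__`` (:pep:`562`). The sketch below is a toy stand-in, not the ``lazy_loader`` API: the helper ``make_lazy_package``, the package name ``toy_sp`` and the mapping onto standard-library modules are all assumptions made so the example is self-contained.

```python
import importlib
import types


def make_lazy_package(name, submodules):
    """Build a toy package whose "submodules" are imported on first access.

    `submodules` maps an attribute name to the real module to import.
    """
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        if attr in submodules:
            module = importlib.import_module(submodules[attr])
            setattr(pkg, attr, module)  # cache so later lookups are direct
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = __getattr__
    return pkg


# Hypothetical package: "stats" and "frac" stand in for expensive submodules.
sp = make_lazy_package("toy_sp", {"stats": "statistics", "frac": "fractions"})
loaded_before = "stats" in vars(sp)
mean = sp.stats.mean([1, 2, 3])
loaded_after = "stats" in vars(sp)
```

Each package has to opt in to this pattern explicitly in its own ``__init__.py``, which is part of what a general lazy-import mechanism would make unnecessary.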
Users of static typing also have to import names for use in type annotations
that may never be used at runtime (if :pep:`563` or possibly in future
:pep:`649` are used to avoid eager runtime evaluation of annotations). Lazy
imports are very attractive in this scenario to avoid overhead of unneeded
imports.
This PEP proposes a more general and comprehensive solution for lazy imports
that can encompass all of the above use cases and does not impose detectable
overhead in real-world use. The implementation in this PEP has already
`demonstrated
<https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/lazy_imports.rst>`_
startup time improvements up to 70% and memory-use reductions up to
40% on real-world Python CLIs.
startup time improvements up to 70% and memory-use reductions up to 40% on
real-world Python CLIs.
Lazy imports also eliminate most import cycles. With eager imports, "false
cycles" can easily occur which are fixed by simply moving an import to the
@ -97,15 +116,11 @@ accidental leakage of lazy objects is to have the dictionary itself be
responsible to ensure resolution of lazy objects on lookup.
To avoid a performance penalty on the vast majority of dictionaries which never
contain any lazy objects, we install a specialized lookup function
(``lookdict_unicode_lazy``) for module namespace dictionaries when they first
gain a lazy-object value. When this lookup function finds that the key
references a lazy object, it resolves the lazy object immediately before
returning it.
Some operations on dictionaries (e.g. iterating all values) don't go through
the lookup function; in these cases we have to add a check if the lookup
function is ``lookdict_unicode_lazy`` and if so, resolve all lazy values first.
contain any lazy objects, we set a specialized lookup kind (`DictKeysKind
<https://github.com/python/cpython/blob/3.11/Include/internal/pycore_dict.h#L80>`_)
for module namespace dictionaries when they first gain a lazy-object value. When
a lazy lookup kind is set and lookup finds that the key references a lazy
object, it resolves the lazy object immediately before returning it.
This implementation comprehensively prevents leakage of lazy objects, ensuring
they are always resolved to the real imported object before anyone can get hold
@ -116,17 +131,18 @@ dictionaries in general.
Specification
=============
Lazy imports are opt-in, and globally enabled via a new ``-L`` flag to the
Python interpreter, or a ``PYTHONLAZYIMPORTS`` environment variable.
Lazy imports are opt-in, and globally enabled either via a new ``-L`` flag to
the Python interpreter or via a call to a new ``importlib.set_lazy_imports()``
function, which makes all imports after the call lazy.
When enabled, the loading and execution of all (and only) top level imports is
deferred until the imported name is used. This could happen immediately (e.g.
on the very next line after the import statement) or much later (e.g. while
using the name inside a function being called by some other code at some later
time.)
When enabled, the loading and execution of all (and only) top-level imports is
deferred until the imported name is first used. This could happen immediately
(e.g. on the very next line after the import statement) or much later (e.g.
while using the name inside a function being called by some other code at some
later time).
For these top level imports, there are two exceptions which will make them
eager (not lazy): imports inside ``try``/``except``/``finally`` or ``with``
For these top-level imports, there are two contexts which will make them eager
(not lazy): imports inside ``try`` / ``except`` / ``finally`` or ``with``
blocks, and star imports (``from foo import *``). Imports inside
exception-handling blocks (this includes ``with`` blocks, since those can also
"catch" and handle exceptions) remain eager so that any exceptions arising from
@ -139,6 +155,10 @@ level" and are never lazy.
Dynamic imports using ``__import__()`` or ``importlib.import_module()`` are
also never lazy.
Lazy imports state (i.e. whether they have been enabled, and any excluded
modules; see below) is per-interpreter, but global within the interpreter (i.e.
all threads will be affected).
Example
-------
@ -174,27 +194,210 @@ Of course, in real use cases (especially with lazy imports), it's not
recommended to rely on import side effects like this to trigger real work. This
example is just to clarify the behavior of lazy imports.
Another way to explain the effect of lazy imports is that it is as if each lazy
import statement had instead been written inline in the source code immediately
before each use of the imported name. So one can think of lazy imports as
similar to transforming this code::
Debuggability
-------------
import foo
The implementation will ensure that exceptions resulting from a deferred import
have metadata attached pointing the user to the original import statement, to
ease debuggability of errors from lazy imports.
def func1():
return foo.bar()
Additionally, debug logging from ``python -v`` will include logging when an
import statement has been encountered but execution of the import will be
deferred.
def func2():
return foo.baz()
To this::
def func1():
import foo
return foo.bar()
def func2():
import foo
return foo.baz()
This gives a good sense of when the import of ``foo`` will occur under lazy
imports, but lazy import is not really equivalent to this code transformation.
There are several notable differences:
* Unlike in the latter code, under lazy imports the name ``foo`` still does
exist in the module's global namespace, and can be imported or referenced by
other modules that import this one. (Such references would also trigger the
import.)
* The runtime overhead of lazy imports is much lower than the latter code; after
the first reference to the name ``foo`` which triggers the import, subsequent
references will have zero import system overhead; they are indistinguishable
from a normal name reference.
In a sense, lazy imports turn the import statement into just a declaration of an
imported name or names, to later be fully resolved when referenced.
An import in the style ``from foo import bar`` can also be made lazy. When the
import occurs, the name ``bar`` will be added to the module namespace as a lazy
import. The first reference to ``bar`` will import ``foo`` and resolve ``bar``
to ``foo.bar``.
Intended usage
--------------
Since lazy imports are a potentially breaking semantic change, they should be
enabled only by the author or maintainer of a Python application, who is
prepared to thoroughly test the application under the new semantics, ensure it
behaves as expected, and opt out any specific imports as needed (see below).
Lazy imports should not be enabled speculatively by the end user of a Python
application with any expectation of success.
It is the responsibility of the application developer enabling lazy imports for
their application to opt out any library imports that turn out to need to be
eager for their application to work correctly; it is not the responsibility of
library authors to ensure that their library behaves exactly the same under lazy
imports.
The documentation of the feature, the ``-L`` flag, and the new ``importlib``
APIs will be clear about the intended usage and the risks of adoption without
testing.
Implementation
--------------
Lazy imports are represented internally by a "lazy import" object. When a lazy
import occurs (say ``import foo`` or ``from foo import bar``), the key ``"foo"``
or ``"bar"`` is immediately added to the module namespace dictionary, but with
its value set to an internal-only "lazy import" object that preserves all the
necessary metadata to execute the import later. The ``DictKeysKind`` for the
module namespace dictionary is updated from e.g. ``DICT_KEYS_UNICODE`` to
``DICT_KEYS_UNICODE_LAZY`` to signal that this particular dictionary may contain
lazy import objects.
(In case someone adds a non-unicode key to a module namespace dictionary also
containing lazy import objects, e.g. via ``globals()[42] = "foo"``, there is
also ``DICT_KEYS_GENERAL_LAZY``, but in most cases this is not needed.)
Anytime a key is looked up in a dictionary with ``DICT_KEYS_UNICODE_LAZY`` or
``DICT_KEYS_GENERAL_LAZY``, the value is checked to see if it is a lazy import
object. If so, the import is immediately executed, the lazy import object is
replaced in the dictionary by the actual imported value, and the imported value
is returned from the lookup.
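The lookup behavior described above can be modeled in pure Python. This is only a toy model for intuition: the real mechanism lives inside the C dictionary implementation, and the names ``LazyImport`` and ``LazyNamespace`` are invented for this sketch.

```python
import importlib


class LazyImport:
    """Toy placeholder holding the metadata needed to run an import later."""

    def __init__(self, module_name, attr=None):
        self.module_name = module_name
        self.attr = attr  # set for the ``from module import attr`` style

    def resolve(self):
        module = importlib.import_module(self.module_name)
        return module if self.attr is None else getattr(module, self.attr)


class LazyNamespace(dict):
    """Toy namespace that resolves placeholders on subscript lookup.

    Only ``ns[key]`` is covered here; methods such as ``get()`` or
    ``values()`` bypass ``__getitem__`` in a dict subclass.
    """

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyImport):
            value = value.resolve()
            super().__setitem__(key, value)  # replace placeholder with real object
        return value


ns = LazyNamespace()
ns["fractions"] = LazyImport("fractions")             # like ``import fractions``
ns["Fraction"] = LazyImport("fractions", "Fraction")  # like ``from fractions import Fraction``

still_lazy = isinstance(dict.__getitem__(ns, "Fraction"), LazyImport)
half = ns["Fraction"](1, 2)  # first reference triggers the import
resolved = not isinstance(dict.__getitem__(ns, "Fraction"), LazyImport)
```

After the first lookup, the stored value is the real object, so later lookups of the same key pay no import-related cost beyond the placeholder check.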
Because this is all handled internally by the dictionary implementation, lazy
import objects can never escape from the module namespace to become visible to
Python code; they are always resolved at their first reference.
Since only (some) module namespace dictionaries will ever have
``DICT_KEYS_*_LAZY`` set, the (minimal) extra lookup overhead to check for lazy
import objects is only paid by those dictionaries that need it; other
dictionaries have no added overhead.
No stub or dummy objects are ever visible to Python code or placed in
``sys.modules``. Other than the delayed import, the implementation is
transparent.
If a module is imported lazily, no entry for it will appear in ``sys.modules``
at all until it is actually imported on first reference.
If two different modules (``moda`` and ``modb``) both contain a lazy ``import
foo``, each module's namespace dictionary will have an independent lazy import
object under the key ``"foo"``, delaying import of the same ``foo`` module. This
is not a problem. When there is first a reference to, say, ``moda.foo``, the
module ``foo`` will be imported and placed in ``sys.modules`` as usual, and the
lazy object under the key ``moda.__dict__["foo"]`` will be replaced by the
actual module ``foo``. At this point ``modb.__dict__["foo"]`` will remain a lazy
import object. When ``modb.foo`` is later referenced, it will also try to
``import foo``. This import will find the module already present in
``sys.modules``, as is normal for subsequent imports of the same module in
Python, and at this point will replace the lazy import object at
``modb.__dict__["foo"]`` with the actual module ``foo``.
There is one case in which a lazy import object can escape one dictionary (but
only into another dictionary) without being resolved. To preserve the
performance of bulk-copy operations like ``dict.update()`` and ``dict.copy()``,
they do not check for or resolve lazy import objects. However, if the source
dict has a ``*_LAZY`` lookup kind set that indicates it might contain lazy
objects, that lookup kind will be passed on to the updated/copied dictionary.
This still ensures that the lazy import object can't escape into Python code
without being resolved.
Other "bulk" dictionary lookup methods (such as ``dict.items()``,
``dict.values()``, etc) will resolve all lazy import objects in the dictionary.
Since it is uncommon for any of these to be used on a module namespace
dictionary, the priority here is simplicity of implementation and minimizing the
overhead on normal non-lazy dictionaries (just one check to see if the
dictionary has a ``*_LAZY`` lookup kind).
The eagerness of imports within ``try`` / ``except`` / ``with`` blocks or within
class or function bodies is handled in the compiler via a new
``EAGER_IMPORT_NAME`` opcode that always imports eagerly. Top-level imports use
``IMPORT_NAME``, which may be lazy or eager depending on ``-L`` and/or
``importlib.set_lazy_imports()``.
Exceptions
----------
Exceptions that occur during a lazy import bubble up and erase the
partially-constructed module(s) from ``sys.modules``, just as exceptions during
normal import do.
Since errors raised during a lazy import will occur later (wherever the imported
name is first referenced) than they would if the import were eager, it is
possible that they could be accidentally caught by exception handlers that
didn't expect the import to be running within their ``try`` block, leading to
confusion. To reduce the potential for this confusion, exceptions raised in the
course of executing a lazy import will be replaced by a ``LazyImportError``
exception (a subclass of ``ImportError``), with ``__cause__`` set to the
original exception.
The ``LazyImportError`` will have source location metadata attached pointing the
user to the original import statement, to ease debuggability of errors from lazy
imports. (It won't have a full traceback to the original import location; this
is too expensive to preserve for all lazy imports, and it's not clear that it
provides significant value over simply knowing the location of the import
statement.)
Only instances of ``Exception`` are replaced in this way; exceptions that derive
directly from ``BaseException`` are not. Those are "system-exiting" exceptions
like ``KeyboardInterrupt`` or ``SystemExit``; these are normally not caught, and
if they are caught, the handler is less likely to be targeting a specific bit of
code that was expected to raise them, and more likely to be catching them
whatever their origin.
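The wrapping behavior can be sketched with ordinary exception chaining. The ``LazyImportError`` class below is a local, hypothetical stand-in defined only for this example (its ``location`` attribute and the helper ``resolve_lazily`` are assumptions); the actual exception would be raised by the interpreter.

```python
import importlib


class LazyImportError(ImportError):
    """Hypothetical stand-in for the exception proposed above."""

    def __init__(self, message, *, location):
        super().__init__(message)
        self.location = location  # (filename, lineno) of the import statement


def resolve_lazily(module_name, *, filename, lineno):
    """Run a deferred import, wrapping ordinary failures."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:
        # KeyboardInterrupt, SystemExit and similar are not Exception
        # subclasses, so they pass through this handler unwrapped.
        raise LazyImportError(
            f"deferred import of {module_name!r} failed",
            location=(filename, lineno),
        ) from exc


try:
    resolve_lazily("no_such_module_for_demo", filename="app.py", lineno=3)
except LazyImportError as err:
    caught = err
```

Because the wrapper subclasses ``ImportError``, existing ``except ImportError`` handlers still catch it, while handlers for unrelated exception types (say ``KeyError``) no longer swallow a failed deferred import by accident.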
Debugging
---------
Debug logging from ``python -v`` will include logging whenever an import
statement has been encountered but execution of the import will be deferred.
Python's ``-X importtime`` feature for profiling import costs adapts naturally
to lazy imports; the profiled time is the time spent actually importing.
Although lazy import objects are never visible to Python code, in some debugging
cases it may be useful to check from Python code whether the value at a given
key in a given dictionary is a lazy import object, without triggering its
resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::
from importlib import is_lazy_import
import foo
is_lazy_import(globals(), "foo")
foo
is_lazy_import(globals(), "foo")
In this example, if lazy imports have been enabled the first call to
``is_lazy_import`` will return ``True`` and the second will return ``False``.
Per-module opt out
------------------
Due to the backwards compatibility issues mentioned below, it may be necessary
to force some imports to be eager.
for an application using lazy imports to force some imports to be eager.
In first-party code, since imports inside a ``try`` or ``with`` block are never
lazy, this can be easily accomplished::
@ -209,38 +412,78 @@ This PEP proposes to add a new ``importlib.eager_imports()`` context manager,
so the above technique can be less verbose and doesn't require comments to
clarify its intent::
from importlib import eager_imports
with eager_imports():
import foo
import bar
Since imports within context managers are always eager, the ``eager_imports()``
context manager can just be an alias to a null context manager. The context
manager does not force all imports to be recursively eager: ``foo`` and ``bar``
will be imported eagerly, but imports within those modules will still follow
the usual laziness rules.
manager's effect is not transitive: ``foo`` and ``bar`` will be imported
eagerly, but imports within those modules will still follow the usual laziness
rules.
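Since the eagerness comes from the compiler seeing the ``with`` block rather than from anything the context manager does at runtime, a stand-in can be written today with ``contextlib.nullcontext``. This is a sketch of the idea only; ``eager_imports`` here is a local alias, not an existing ``importlib`` API.

```python
import contextlib
import sys

# Local stand-in: the manager itself does nothing on entry or exit.
eager_imports = contextlib.nullcontext

with eager_imports():
    import json  # under the proposal, imports inside ``with`` stay eager

json_loaded = "json" in sys.modules
```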
The more difficult case can occur if an import in third-party code that can't
easily be modified must be forced to be eager. For this purpose, we propose to
add an API to ``importlib`` that can be called early in the process to specify
a list of module names within which all imports will be eager::
easily be modified must be forced to be eager. For this purpose,
``importlib.set_lazy_imports()`` takes an optional keyword-only ``excluding``
argument, which can be set to a container of module names within which all
imports will be eager::
from importlib import set_eager_imports
from importlib import set_lazy_imports
set_eager_imports(["one.mod", "another"])
set_lazy_imports(excluding=["one.mod", "another"])
The effect of this is also shallow: all imports within ``one.mod`` will be
eager, but not imports in all modules imported by ``one.mod``.
``set_eager_imports()`` can also take a callback which receives a module name and returns
whether imports within this module should be eager::
The ``excluding`` parameter of ``set_lazy_imports()`` can also be set to a
callback which receives a module name and returns whether imports within this
module should be eager::
import re
from importlib import set_eager_imports
from importlib import set_lazy_imports
def eager_imports(name):
return re.match(r"foo\.[^.]+\.logger", name)
set_eager_imports(eager_imports)
set_lazy_imports(excluding=eager_imports)
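The callback's behavior can be checked in isolation, without any lazy-import support. The module names below are hypothetical. One detail worth noting is that ``re.match`` anchors only at the start of the string, so the pattern above also accepts names that merely begin with a matching prefix; ``re.fullmatch`` is shown as a stricter alternative.

```python
import re

PATTERN = r"foo\.[^.]+\.logger"


def eager_imports(name):
    # Same pattern as above; anchored at the start of the name only.
    return re.match(PATTERN, name) is not None


def eager_imports_strict(name):
    # Stricter variant: the whole module name must match.
    return re.fullmatch(PATTERN, name) is not None


results = {
    name: (eager_imports(name), eager_imports_strict(name))
    for name in [
        "foo.billing.logger",        # exactly one component between foo and logger
        "foo.logger",                # no middle component
        "foo.a.b.logger",            # two middle components
        "foo.billing.logger_utils",  # prefix match only
    ]
}
```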
If Python was executed with the ``-L`` flag, then lazy imports will already be
globally enabled, and the only effect of calling ``set_lazy_imports()`` will be
to globally set the eager module names/callback. If ``set_lazy_imports()`` is
called with no ``excluding`` argument, the exclusion list/callback will be
cleared and all eligible imports (module-level imports not in
``try/except/with``, and not ``import *``) will be lazy from that point forward.
``set_lazy_imports()`` may be called more than once, with subsequent calls
having only the effect of globally replacing or clearing the ``excluding``
list/callback. Generally there should be no reason to do this: the intended use
is a single call to ``set_lazy_imports`` in the main module, early in the
process.
This opt-out system is designed to maintain the possibility of local reasoning
about the laziness of an import. You only need to see the code of one module,
and the ``excluding`` argument to ``set_lazy_imports``, if any, to know whether
a given import will be eager or lazy.
Testing
-------
The CPython test suite will pass with lazy imports enabled (possibly with some
tests skipped). One buildbot should run the test suite with lazy imports
enabled.
C API
-----
For authors of C extension modules, the proposed
``importlib.set_lazy_imports()`` function will also be exposed in the stable C
API as ``PyImport_SetLazyImports(PyObject *names_or_callback_or_null)``, and
``importlib.is_lazy_import`` will be available as ``PyDict_IsLazyImport(PyObject
*dict, PyObject *key)``.
Backwards Compatibility
@ -293,8 +536,12 @@ adding (and then removing after the import) paths from ``sys.path``::
del sys.path[0]
foo.Bar()
In this case, with lazy imports enabled, the import of ``foo`` will not
actually occur while the addition to ``sys.path`` is present.
In this case, with lazy imports enabled, the import of ``foo`` will not actually
occur while the addition to ``sys.path`` is present.
An easy fix for this (which arguably also improves the code style) would be to
place the ``sys.path`` modifications in a context manager. This resolves the
issue, since imports inside a ``with`` block are always eager.
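A minimal sketch of such a context manager follows. The name ``prepended_sys_path`` and the directory used in the demonstration are assumptions made for illustration.

```python
import contextlib
import sys


@contextlib.contextmanager
def prepended_sys_path(path):
    """Temporarily put `path` at the front of ``sys.path``."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove by value in case the body modified sys.path itself.
        try:
            sys.path.remove(path)
        except ValueError:
            pass


path_before = list(sys.path)
with prepended_sys_path("/hypothetical/plugins"):
    # An ``import foo`` placed here would be eager under the proposal,
    # so it would run while the extra path entry is still present.
    first_entry = sys.path[0]
path_after = list(sys.path)
```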
Deferred Exceptions
@ -302,80 +549,116 @@ Deferred Exceptions
All exceptions arising from import (including ``ModuleNotFoundError``) are
deferred from import time to first-use time, which could complicate debugging.
Accessing an object in the middle of any code could trigger a deferred import
and produce ``ImportError`` or any other exception resulting from the
resolution of the deferred object, while loading and executing the related
imported module. The implementation will provide debugging assistance in
lazy-import-triggered tracebacks to mitigate this issue.
Referencing a name in the middle of any code could trigger a deferred import and
produce ``LazyImportError`` while loading and executing the related imported
module.
Ensuring all lazy import errors are raised as ``LazyImportError`` mitigates this
issue by reducing the likelihood that they will be accidentally caught and
mistaken for a different expected exception. ``LazyImportError`` will also
provide the location of the original import statement to aid in debugging, as
described above.
Drawbacks
=========
Downsides of this PEP include:
* It provides a subtly incompatible semantics for the behavior of Python
imports. This is a potential burden on library authors who may be asked by their
users to support both semantics, and is one more possibility for Python
users/readers to be aware of.
* Some popular Python coding patterns (notably centralized registries populated
by a decorator) rely on import side effects and may require explicit opt-out to
work as expected with lazy imports.
Lazy import semantics are already possible and even supported today in the
Python standard library, so these drawbacks are not newly introduced by this
PEP. So far, existing usage of lazy imports by some applications has not proven
a problem. But this PEP is likely to make the usage of lazy imports more
popular, potentially exacerbating these drawbacks.
These drawbacks must be weighed against the significant benefits offered by this
PEP's implementation of lazy imports. Ultimately these costs will be higher if
the feature is widely used; but wide usage also indicates the feature provides a
lot of value, perhaps justifying the costs.
Security Implications
=====================
Deferred execution of code could produce security concerns if process owner,
path, ``sys.path``, or other sensitive environment or contextual states change
between the time the ``import`` statement is executed and the time where the
imported object is used.
shell path, ``sys.path``, or other sensitive environment or contextual states
change between the time the ``import`` statement is executed and the time the
imported object is first referenced.
Performance Impact
==================
The reference implementation has shown that the feature has negligible
performance impact on existing real-world codebases (Instagram Server and other
several CLI programs at Meta), while providing substantial improvements to
startup time and memory usage.
performance impact on existing real-world codebases (Instagram Server, several
CLI programs at Meta, Jupyter notebooks used by Meta researchers), while
providing substantial improvements to startup time and memory usage.
The reference implementation shows small performance regressions in a few
pyperformance benchmarks, but improvements in others. (TODO update with
detailed data from 3.11 port of implementation.)
detailed data from main-branch port of implementation.)
How to Teach This
=================
Since the feature is opt-in, beginners should not encounter it by default.
Documentation of the ``-L`` flag and ``PYTHONLAZYIMPORTS`` environment variable
can clarify the behavior of lazy imports.
Documentation of the ``-L`` flag and ``importlib.set_lazy_imports()`` can
clarify the behavior of lazy imports.
The documentation should also clarify that opting into lazy imports is opting
into a non-standard semantics for Python imports, which could cause Python
libraries to break in unexpected ways. The responsibility to identify these
breakages and work around them with an opt-out (or stop using lazy imports)
rests entirely with the person choosing to enable lazy imports for their
application, not with the library author. Python libraries are under no
obligation to support lazy import semantics. Politely reporting an
incompatibility may be useful to the library author, but they may choose to
simply say their library does not support use with lazy imports, and this is a
valid choice.
Some best practices to deal with some of the issues that could arise and to
better take advantage of lazy imports are:
* Avoid relying on import side effects. Perhaps the most common reliance on
import side effects is the registry pattern, where population of some
external registry happens implicitly during the importing of modules, often
via decorators. Instead, the registry should be built via an explicit call
that perhaps does a discovery process to find decorated functions or classes.
import side effects is the registry pattern, where population of some external
registry happens implicitly during the importing of modules, often via
decorators. Instead, the registry should be built via an explicit call that does
a discovery process to find decorated functions or classes in explicitly
nominated modules.
* Always import needed submodules explicitly; don't rely on some other import
to ensure a module has its submodules as attributes. That is, do ``import
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only
works (unreliably) because the attribute ``foo.bar`` is added as a side
effect of ``foo.bar`` being imported somewhere else. With lazy imports this
may not always happen on time.
to ensure a module has its submodules as attributes. That is, unless there is an
explicit ``from . import bar`` in ``foo/__init__.py``, always do ``import
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only works
(unreliably) because the attribute ``foo.bar`` is added as a side effect of
``foo.bar`` being imported somewhere else. With lazy imports this may not always
happen on time.
* Avoid using star imports, as those are always eager.
* When possible, do not import whole submodules. Import specific names instead;
i.e.: do ``from foo.bar import Baz``, not ``import foo.bar`` and then
``foo.bar.Baz``. If you import submodules (such as ``foo.qux`` and
``foo.fred``), with lazy imports enabled, when you access the parent module's
name (``foo`` in this case), that will trigger loading all of the sibling
submodules of the parent module (``foo.bar``, ``foo.qux`` and ``foo.fred``),
not only the one being accessed, because the parent module ``foo`` is the
actual deferred object name.
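The first recommendation above (building a registry through an explicit call rather than as an import side effect) might look like the sketch below. The decorator ``command``, the function ``build_registry`` and the in-memory module ``demo_commands`` are all hypothetical names chosen for the example.

```python
import importlib
import sys
import types


def command(func):
    """Mark a function as a command without touching any global registry."""
    func.is_command = True
    return func


def build_registry(module_names):
    """Import the explicitly nominated modules and collect marked functions."""
    registry = {}
    for module_name in module_names:
        module = importlib.import_module(module_name)
        for name, obj in vars(module).items():
            if callable(obj) and getattr(obj, "is_command", False):
                registry[name] = obj
    return registry


# Demo: an in-memory module standing in for a real ``demo_commands.py``.
demo = types.ModuleType("demo_commands")


def hello():
    return "hello"


demo.hello = command(hello)
demo.helper = lambda: None  # not decorated, so not registered
sys.modules["demo_commands"] = demo

registry = build_registry(["demo_commands"])
```

Because ``build_registry`` imports the nominated modules itself via ``importlib.import_module()``, which is never lazy under this proposal, the registry contents do not depend on when or whether other code happens to reference those modules.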
Reference Implementation
========================
The current reference implementation is available as part of
`Cinder <https://github.com/facebookincubator/cinder>`_.
Reference implementation is in use within Meta Platforms and has proven to
achieve improvements in startup time (and total runtime for some applications)
in the range of 40%-70%, as well as significant reduction in memory footprint
(up to 40%), thanks to not needing to execute imports that end up being unused
in the common flow.
The current reference implementation is available as part of `Cinder
<https://github.com/facebookincubator/cinder>`_. This reference implementation
is in use within Meta and has proven to achieve improvements in startup time
(and total runtime for some applications) in the range of 40%-70%, as well as
significant reduction in memory footprint (up to 40%), thanks to not needing to
execute imports that end up being unused in the common flow.
An updated reference implementation based on CPython main branch is in progress
and will be linked here soon. (TODO link.)
Rejected Ideas
@ -403,11 +686,11 @@ couple of disadvantages:
Experience with the reference implementation suggests that the most practical
adoption path for lazy imports is for a specific deployed application to opt-in
globally, observe whether anything breaks, and opt-out specific modules as
needed to account for e.g. reliance on import side effects.
needed.
Explicit syntax for lazy imports
--------------------------------
Explicit syntax for individual lazy imports
-------------------------------------------
If the primary objective of lazy imports were solely to work around import
cycles and forward references, an explicitly-marked syntax for particular
@ -423,6 +706,41 @@ imports within the subsystems are all eager. This is extremely fragile, though
shallow laziness. Globally enabling lazy imports, on the other hand, provides
in-depth robust laziness where you always pay only for the imports you use.
There may be use cases (e.g. for static typing) where individually-marked lazy
imports are desirable to avoid forward references, but the perf/memory benefits
of globally lazy imports are not needed. Since this is a different set of
motivating use cases and requires new syntax, we prefer not to include it in
this PEP. Another PEP could build on top of this implementation and propose the
additional syntax.
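For context, the following is a minimal sketch of the workaround commonly used
today for typing-only imports, which is one plausible reading of the static
typing use case mentioned above. It is not part of this PEP; the function
``halve`` and the choice of ``fractions.Fraction`` are purely illustrative
assumptions.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; this import never runs at
    # runtime, so the module is not loaded just for an annotation.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # The quoted annotations are forward references: they are stored as
    # strings and never looked up when the function is defined or called.
    return value / 2
```

An individually-marked lazy import could, in principle, remove the need for the
``TYPE_CHECKING`` guard and the quoted annotations, which is why such syntax is
left to a possible separate proposal.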
Environment variable to enable lazy imports
-------------------------------------------
Providing an environment variable opt-in lends itself too easily to abuse of the
feature. It may seem tempting for a Python user to, for instance, globally set
the environment variable in their shell in the hopes of speeding up all the
Python programs they run. This usage with untested programs is likely to lead to
spurious bug reports and maintenance burden for the authors of those tools. To
avoid this, we choose not to provide an environment variable opt-in at all.
Removing the ``-L`` flag
------------------------
We do provide the ``-L`` CLI flag, which could in theory be abused in a similar
way by an end user running an individual Python program that is run with
``python somescript.py`` or ``python -m somescript`` (rather than distributed
via Python packaging tools). But the potential scope for misuse is much less
with ``-L`` than with an environment variable, and ``-L`` is valuable for some
applications to maximize startup time benefits by ensuring that all imports from
the start of a process will be lazy, so we choose to keep it.
It is already the case that running arbitrary Python programs with command line
flags they weren't intended to be used with (e.g. ``-s``, ``-S``, ``-E``, or
``-I``) can have unexpected and breaking results. ``-L`` is nothing new in this
regard.
Half-lazy imports
-----------------
@ -444,6 +762,15 @@ import typos has not been an observed problem with the reference
implementation. Generally delayed imports are not delayed forever, and errors
show up soon enough to be caught and fixed (unless the import is truly unused).
Another possible motivation for half-lazy imports would be to allow modules
themselves to control via some flag whether they are imported lazily or eagerly.
This is rejected both on the basis that it requires half-lazy imports, giving up
some of the performance benefits of import laziness, and because in general
modules do not decide how or when they are imported; the module importing them
decides that. There isn't a clear rationale for this PEP to invert that control;
instead it just provides more options for the importing code to make the
decision.
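As a rough illustration of what a half-lazy import looks like, the existing
``importlib.util.LazyLoader`` in the standard library behaves in a similar way:
the finder runs immediately, so a missing module is reported at import time, but
the module body is executed only on first attribute access. The sketch below
follows the recipe in the ``importlib`` documentation; the helper name
``half_lazy_import`` is a hypothetical name chosen for this example, and this is
not the mechanism proposed by this PEP.

```python
import importlib.util
import sys


def half_lazy_import(name):
    """Find ``name`` eagerly, but defer executing it until first use."""
    if name in sys.modules:
        # Already imported (eagerly or lazily); reuse the cached module.
        return sys.modules[name]
    # The finder runs now, so a typo in ``name`` fails immediately.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # This only arms the lazy module; the module body has not run yet.
    loader.exec_module(module)
    return module


colorsys = half_lazy_import("colorsys")
# The module body executes here, on the first attribute access.
converter = colorsys.rgb_to_hsv
```

The cost of running the finder up front is the part of the performance benefit
that a half-lazy design gives up.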
Lazy dynamic imports
--------------------
@ -461,12 +788,13 @@ Deep eager-imports override
---------------------------
The proposed ``importlib.eager_imports()`` context manager and
``importlib.set_eager_imports()`` override both have shallow effects: they only
force eagerness for the location where they are applied, not transitively. It
would be possible (although not simple) to provide a deep/transitive version of
one or both. That idea is rejected in this PEP because experience with the
reference implementation has not shown it to be necessary, and because it
prevents local reasoning about laziness of imports.
``importlib.set_lazy_imports(excluding=...)`` override both have shallow
effects: they only force eagerness for the location they are applied to, not
transitively. It would be possible (although not simple) to provide a
deep/transitive version of one or both. That idea is rejected in this PEP
because the implementation would be complex (taking into account threads and
async code), because experience with the reference implementation has not shown
it to be necessary, and because it prevents local reasoning about laziness of
imports.
A deep override can lead to confusing behavior because the
transitively-imported modules may be imported from multiple locations, some of
@ -479,6 +807,27 @@ import will be lazy or eager. With the behavior specified in this PEP, such
local reasoning is possible.
Making lazy imports the default behavior
----------------------------------------
Making lazy imports the default/sole behavior of Python imports, instead of
opt-in, would have some long-term benefits, in that library authors would
(eventually) no longer need to consider the possibility of both semantics.
However, the backwards-incompatibilities are such that this could only be
considered over a long time frame, with a ``__future__`` import. It is not at
all clear that lazy imports should become the default import semantics for
Python.
Providing only per-module opt-in with a ``__future__`` import makes it much more
difficult for the applications that can benefit from lazy imports to do so
immediately, as discussed above.
This PEP takes the position that the Python community needs more experience with
lazy imports before considering making them the default behavior, so that is
entirely left to a possible future PEP.
Copyright
=========