PEP 690: Last(?) round of updates (#2861)

* Updates to PEP 690

* Remove obsolete reference

* Lazy imports are plural
Carl Meyer 2022-11-04 11:16:35 -06:00 committed by GitHub
parent cad6880e86
commit e5aa080e52
1 changed files with 97 additions and 154 deletions


@ -15,10 +15,10 @@ Post-History: `03-May-2022 <https://discuss.python.org/t/pep-690-lazy-imports/15
Abstract
========
This PEP proposes a feature to transparently defer the execution of imported
modules until the moment when an imported object is first used. Since Python
programs commonly import many more modules than a single invocation of the
program is likely to use in practice, lazy imports can greatly reduce the
This PEP proposes a feature to transparently defer the finding and execution of
imported modules until the moment when an imported object is first used. Since
Python programs commonly import many more modules than a single invocation of
the program is likely to use in practice, lazy imports can greatly reduce the
overall number of modules loaded, improving startup time and memory usage. Lazy
imports also mostly eliminate the risk of import cycles.
@ -123,8 +123,8 @@ when using bulk iterations (``iter(dict)``, ``reversed(dict)``,
``dict.__reversed__()``, ``dict.keys()``, ``iter(dict.keys())`` and
``reversed(dict.keys())``). To avoid this performance penalty on the vast
majority of dictionaries, which never contain any lazy objects, we steal a bit
from the ``dk_kind`` field for a new ``dk_lazy_imports`` bitfield to keep track
of whether a dictionary may contain lazy objects or not.
from the ``dk_kind`` field for a new ``dk_lazy_imports`` flag to keep track of
whether a dictionary may contain lazy objects or not.
This implementation comprehensively prevents leakage of lazy objects, ensuring
they are always resolved to the real imported object before anyone can get hold
@ -135,13 +135,13 @@ dictionaries in general.
Specification
=============
Lazy imports are opt-in, and they can be:
* Globally enabled, either via a new ``-L`` flag to the Python interpreter or
via a call to a new ``importlib.set_lazy_imports()`` function, which makes
*all* new relevant imports after the call immediately lazy.
* Enabled in a specific module, via ``importlib.enable_lazy_imports_in_module()``.
Lazy imports are opt-in, and they can be globally enabled either via a new
``-L`` flag to the Python interpreter, or via a call to a new
``importlib.set_lazy_imports()`` function. This function takes two arguments, a
boolean ``enabled`` and an ``excluding`` container. If ``enabled`` is true, lazy
imports will be turned on from that point forward. If it is false, they will be
turned off from that point forward. (Use of the ``excluding`` keyword is
discussed below under "Per-module opt-out.")
When the flag ``-L`` is passed to the Python interpreter, a new
``sys.flags.lazy_imports`` is set to ``True``, otherwise it exists as ``False``.
@ -149,16 +149,16 @@ This flag is used to propagate ``-L`` to new Python subprocesses.
The flag in ``sys.flags.lazy_imports`` does not necessarily reflect the current
status of lazy imports, only whether the interpreter was started with the ``-L``
option. Actual current status of whether lazy imports is enabled or not at any
option. Actual current status of whether lazy imports are enabled or not at any
moment can be retrieved using ``importlib.is_lazy_imports_enabled()``, which
will return ``True`` if lazy imports is enabled at the call point or ``False``
will return ``True`` if lazy imports are enabled at the call point or ``False``
otherwise.
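The distinction between the startup flag and the current status can be sketched as follows. This is only an illustration: ``sys.flags.lazy_imports`` and ``importlib.is_lazy_imports_enabled()`` are the names proposed in this PEP and are not assumed to exist on the running interpreter, so the hypothetical helper ``lazy_imports_status()`` looks them up defensively and reports ``None`` when they are missing.

```python
import importlib
import sys


def lazy_imports_status():
    """Return (started_with_L, currently_enabled) for this interpreter.

    Both lookups target names proposed in this PEP. On an interpreter that
    lacks them, the corresponding value is reported as None.
    """
    # sys.flags.lazy_imports only records whether -L was passed at startup.
    started_with_flag = getattr(sys.flags, "lazy_imports", None)

    # importlib.is_lazy_imports_enabled() reflects the state at the call point.
    query = getattr(importlib, "is_lazy_imports_enabled", None)
    currently_enabled = query() if callable(query) else None

    return started_with_flag, currently_enabled


print(lazy_imports_status())
```

The two values can legitimately differ, for example when the interpreter was started without ``-L`` but lazy imports were later turned on programmatically.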
When enabled, the loading and execution of all (and only) top-level imports is
deferred until the imported name is first used. This could happen immediately
(e.g. on the very next line after the import statement) or much later (e.g.
while using the name inside a function being called by some other code at some
later time.)
When lazy imports are enabled, the loading and execution of all (and only)
top-level imports is deferred until the imported name is first used. This could
happen immediately (e.g. on the very next line after the import statement) or
much later (e.g. while using the name inside a function being called by some
other code at some later time).
For these top level imports, there are two contexts which will make them eager
(not lazy): imports inside ``try`` / ``except`` / ``finally`` or ``with``
@ -289,10 +289,6 @@ or ``"bar"`` is immediately added to the module namespace dictionary, but with
its value set to an internal-only "lazy import" object that preserves all the
necessary metadata to execute the import later.
The lazy object is intended to be opaque and self-contained, it has no
attributes and it can not be resolved in any way. A ``repr()`` of it would be
shown as something like: ``<lazy_object 'fully.qualified.name'>``.
A new boolean flag in ``PyDictKeysObject`` (``dk_lazy_imports``) is set to
signal that this particular dictionary may contain lazy import objects. This
flag is only used to efficiently resolve all lazy objects in "bulk" operations,
@ -312,12 +308,8 @@ the resolved value.
Because this is all handled internally by the dictionary implementation, lazy
import objects can never escape from the module namespace to become visible to
Python code; they are always resolved at their first reference.
No stub, dummy or thunk objects are ever visible to Python code or placed in
``sys.modules``. Other than the delayed import, the implementation is
transparent.
Python code; they are always resolved at their first reference. No stub, dummy
or thunk objects are ever visible to Python code or placed in ``sys.modules``.
If a module is imported lazily, no entry for it will appear in ``sys.modules``
at all until it is actually imported on first reference.
@ -345,21 +337,22 @@ There are two cases in which a lazy import object can escape a dictionary:
* Through the garbage collector: lazy imported objects are still Python objects
and live within the garbage collector; as such, they can be collected and seen
by means of using ``gc.collect()`` and ``gc.get_objects()``. Lazy objects are
not useful but they are also harmless and pose no danger if extracted from the
garbage collector in this way.
via e.g. ``gc.get_objects()``. If a lazy object becomes
visible to Python code in this way, it is opaque and inert; it has no useful
methods or attributes. A ``repr()`` of it would be shown as something like:
``<lazy_object 'fully.qualified.name'>``.
When a lazy object is added to a dictionary the flag ``dk_lazy_imports`` is set
and once set, the only case the flag gets cleared is when *all* lazy import
objects get resolved, during one of the "bulk" dictionary lookup operations.
When a lazy object is added to a dictionary, the flag ``dk_lazy_imports`` is set.
Once set, the flag is only cleared if *all* lazy import objects in the
dictionary are resolved, e.g. prior to dictionary iteration.
All "bulk" dictionary lookup methods involving values (such as ``dict.items()``,
All dictionary iteration methods involving values (such as ``dict.items()``,
``dict.values()``, ``PyDict_Next()`` etc.) will attempt to resolve *all* lazy
import objects in the dictionary prior to starting the iteration. Since only (some)
module namespace dictionaries will ever have ``dk_lazy_imports`` set, the extra
overhead of resolving all lazy import objects inside a dictionary is only paid
by those dictionaries that need it. Minimizing the overhead on normal non-lazy
dictionaries is the sole purpose of the ``dk_lazy_imports`` flag.
import objects in the dictionary prior to starting the iteration. Since only
(some) module namespace dictionaries will ever have ``dk_lazy_imports`` set, the
extra overhead of resolving all lazy import objects inside a dictionary is only
paid by those dictionaries that need it. Minimizing the overhead on normal
non-lazy dictionaries is the sole purpose of the ``dk_lazy_imports`` flag.
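The role of the flag can be modelled in pure Python. The sketch below is a toy model, not the C implementation: ``LazyValue`` and ``LazyAwareDict`` are hypothetical names, and the attribute ``may_have_lazy`` merely plays the part of ``dk_lazy_imports``. It shows single lookups resolving one entry, and value iteration resolving everything first while ordinary dictionaries take the fast path.

```python
class LazyValue:
    """Stand-in for the internal lazy import object (hypothetical name)."""

    def __init__(self, loader):
        self._loader = loader

    def resolve(self):
        return self._loader()


class LazyAwareDict(dict):
    """Toy model of a dict that never hands out unresolved LazyValue objects."""

    def __init__(self):
        super().__init__()
        self.may_have_lazy = False  # plays the role of dk_lazy_imports

    def __setitem__(self, key, value):
        if isinstance(value, LazyValue):
            self.may_have_lazy = True
        super().__setitem__(key, value)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyValue):
            # Resolve on first reference and store the real object.
            value = value.resolve()
            super().__setitem__(key, value)
        return value

    def _resolve_all(self):
        if not self.may_have_lazy:
            return  # fast path for dictionaries that never held lazy values
        for key in list(super().keys()):
            self[key]  # triggers resolution through __getitem__
        self.may_have_lazy = False

    def values(self):
        self._resolve_all()
        return super().values()

    def items(self):
        self._resolve_all()
        return super().items()


namespace = LazyAwareDict()
namespace["json"] = LazyValue(lambda: __import__("json"))
namespace["answer"] = 42
print(namespace.may_have_lazy)
resolved = list(namespace.values())
print(namespace.may_have_lazy)
```

In this model the flag is only a hint: it may be set while no lazy values remain (after individual lookups), but it is never clear while one is present.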
``PyDict_Next`` will attempt to resolve all lazy import objects the first time
position ``0`` is accessed, and those imports could fail with exceptions. Since
@ -367,9 +360,9 @@ position ``0`` is accessed, and those imports could fail with exceptions. Since
immediately in this case, and any exception will be printed to stderr as an
unraisable exception.
For this reason, this PEP introduces ``PyDict_NextWithError``, that works in the
same way as ``PyDict_Next``, but which can set an error when returning ``0`` and
this should be checked via ``PyErr_Occurred()`` after the call.
For this reason, this PEP introduces ``PyDict_NextWithError``, which works in
the same way as ``PyDict_Next``, but can set an error when returning ``0``.
Callers should check for this via ``PyErr_Occurred()`` after the call.
The eagerness of imports within ``try`` / ``except`` / ``with`` blocks or within
class or function bodies is handled in the compiler via a new
@ -377,17 +370,6 @@ class or function bodies is handled in the compiler via a new
``IMPORT_NAME``, which may be lazy or eager depending on ``-L`` and/or
``importlib.set_lazy_imports()``.
The current status of lazy imports at any given place can be retrieved by using
``importlib.is_lazy_imports_enabled()`` and is determined by a combination of
the passed ``-L`` option flag; an interpreter-wide flag set by
``importlib.set_lazy_imports()`` and the container object passed in its
``excluding`` keyword argument; and a flag set by
``importlib.enable_lazy_imports_in_module()`` in nearest running module frame.
All these together are used to cache the current status of lazy imports in the
currently running frame. This cache is globally busted whenever any of these
API functions is called so that changes take effect immediately for all new
import statements.
Debugging
---------
@ -398,10 +380,10 @@ statement has been encountered but execution of the import will be deferred.
Python's ``-X importtime`` feature for profiling import costs adapts naturally
to lazy imports; the profiled time is the time spent actually importing.
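For reference, the existing ``-X importtime`` report can be collected from a script as sketched below. This uses only current CPython behavior (the report is written to stderr) and does not depend on lazy imports; the choice of ``json`` as the imported module is arbitrary.

```python
import subprocess
import sys

# Run a child interpreter with import profiling turned on.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
    check=True,
)

# The report goes to stderr, one "import time:" line per imported module.
report_lines = [
    line for line in result.stderr.splitlines()
    if line.startswith("import time:")
]
print("\n".join(report_lines[-5:]))
```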
Although lazy import objects are never visible to Python code, in some debugging
cases it may be useful to check from Python code whether the value at a given
key in a given dictionary is a lazy import object, without triggering its
resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::
Although lazy import objects are not generally visible to Python code, in some
debugging cases it may be useful to check from Python code whether the value at
a given key in a given dictionary is a lazy import object, without triggering
its resolution. For this purpose, ``importlib.is_lazy_import()`` can be used::
from importlib import is_lazy_import
@ -450,8 +432,7 @@ rules.
The more difficult case can occur if an import in third-party code that can't
easily be modified must be forced to be eager. For this purpose,
``importlib.set_lazy_imports()`` takes two optional arguments: a boolean, ``True``
by default, for enabling or disabling lazy imports, and an optional keyword-only
``importlib.set_lazy_imports()`` takes a second optional keyword-only
``excluding`` argument, which can be set to a container of module names within
which all imports will be eager::
@ -463,9 +444,9 @@ The effect of this is also shallow: all imports within ``one.mod`` will be
eager, but not imports in all modules imported by ``one.mod``.
The ``excluding`` parameter of ``set_lazy_imports()`` can be a container of any
type that will be checked to see whether it contains a module name or not. If
the module name is contained in the object, it should be eager. Thus, another
example use case for this argument could be::
kind that will be checked to see whether it contains a module name. If the
module name is contained in the object, imports within it will be eager. Thus,
arbitrary opt-out logic can be encoded in a ``__contains__`` method::
import re
from importlib import set_lazy_imports
@ -477,17 +458,12 @@ example use case for this argument could be::
set_lazy_imports(excluding=Checker())
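As a further illustration of the same idea, here is a self-contained sketch of a container whose membership test is a regular expression match. ``PatternExcluder`` and the package name ``legacy_pkg`` are hypothetical, and because ``importlib.set_lazy_imports()`` is only proposed by this PEP, the call is guarded so the snippet also runs on interpreters without it.

```python
import importlib
import re


class PatternExcluder:
    """Hypothetical container that 'contains' module names matching a regex."""

    def __init__(self, pattern):
        self._regex = re.compile(pattern)

    def __contains__(self, module_name):
        return (
            isinstance(module_name, str)
            and self._regex.fullmatch(module_name) is not None
        )


# Match the package itself and any of its submodules.
excluded = PatternExcluder(r"legacy_pkg(\..+)?")

# set_lazy_imports() is the API proposed in this PEP, so look it up
# defensively instead of assuming it exists.
set_lazy_imports = getattr(importlib, "set_lazy_imports", None)
if callable(set_lazy_imports):
    set_lazy_imports(True, excluding=excluded)
```

Only ``__contains__`` is needed here, since the container is described above as being checked for whether it contains a module name.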
If Python was executed with the ``-L`` flag, then lazy imports will already be
globally enabled, and the only effect of calling ``set_lazy_imports()`` will be
to globally set the eager module names/callback. If ``set_lazy_imports()`` is
called with no ``excluding`` argument, the exclusion list/callback will be
cleared and all eligible imports (module-level imports not in
``try/except/with``, and not ``import *``) will be lazy from that point forward.
``set_lazy_imports()`` may be called more than once, with subsequent calls
having only the effect of globally replacing or clearing the ``excluding``
list/callback. Generally there should be no reason to do this: the intended use
is a single call to ``set_lazy_imports`` in the main module, early in the
process.
globally enabled, and the only effect of calling ``set_lazy_imports(True,
excluding=...)`` will be to globally set the eager module names/callback. If
``set_lazy_imports(True)`` is called with no ``excluding`` argument, the
exclusion list/callback will be cleared and all eligible imports (module-level
imports not in ``try/except/with``, and not ``import *``) will be lazy from that
point forward.
This opt-out system is designed to maintain the possibility of local reasoning
about the laziness of an import. You only need to see the code of one module,
@ -495,43 +471,11 @@ and the ``excluding`` argument to ``set_lazy_imports``, if any, to know whether
a given import will be eager or lazy.
Per-module opt-in
-----------------
Experience with the reference implementation suggests that the most practical
adoption path for lazy imports is for a specific deployed application to opt-in
globally, observe whether anything breaks, and opt-out specific modules as
needed.
It is less practical to achieve robust and significant startup-time or
memory-use wins by piecemeal application of lazy imports. Generally it would
require blanket application of the per-module opt-in to most of the
codebase, as well as to third-party dependencies (which may be hard or
impossible.)
However, under some use cases it may be convenient to have a way to enable lazy
imports whether the application/end user requests it or not. This too can be
easily achieved::
from importlib import enable_lazy_imports_in_module
enable_lazy_imports_in_module()
After calling ``enable_lazy_imports_in_module()``, every import in the module
would be lazy. This could be very helpful for libraries importing subpackages
into their main namespace by default, as a mean of exporting them without
suffering from the penalties and slowdowns of actually doing the import. This
was one of the motivating reasons behind SPEC-1, used by Scientific Python
libraries, where exposing symbols for interactive exploration and teaching
purposes allows making all the subpackages available there from the start
without the additional cost of actually doing the imports.
Testing
-------
The CPython test suite will pass with lazy imports enabled (possibly with some
tests skipped). One buildbot should run the test suite with lazy imports
enabled.
The CPython test suite will pass with lazy imports enabled (with some tests
skipped). One buildbot should run the test suite with lazy imports enabled.
C API
@ -562,8 +506,8 @@ For authors of C extension modules, the proposed public C API is as follows:
* ``PyDict_NextWithError()``, works the same way as ``PyDict_Next()``, with
the exception it propagates any errors to the caller by returning ``0`` and
setting an exception. Caller should use the ``if (PyErr_Occurred())`` semantics
to check for any errors.
setting an exception. The caller should use ``PyErr_Occurred()`` to check for any
errors.
Backwards Compatibility
@ -586,8 +530,8 @@ Import Side Effects
-------------------
Import side effects that would otherwise be produced by the execution of
imported modules during the execution of import statements will be deferred at
least until the imported objects are used.
imported modules during the execution of import statements will be deferred
until the imported objects are used.
These import side effects may include:
@ -619,9 +563,9 @@ adding (and then removing after the import) paths from ``sys.path``::
In this case, with lazy imports enabled, the import of ``foo`` will not actually
occur while the addition to ``sys.path`` is present.
An easy fix for this (which arguably also improves the code style) would be to
place the ``sys.path`` modifications in a context manager. This resolves the
issue, since imports inside a ``with`` block are always eager.
An easy fix for this (which also improves the code style and ensures cleanup)
would be to place the ``sys.path`` modifications in a context manager. This
resolves the issue, since imports inside a ``with`` block are always eager.
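A minimal sketch of such a context manager follows. The helper name ``extra_sys_path`` and the path ``/path/to/foo/module`` are placeholders chosen for illustration, not part of this PEP.

```python
import contextlib
import sys


@contextlib.contextmanager
def extra_sys_path(path):
    """Temporarily put `path` at the front of sys.path (hypothetical helper)."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Remove the entry again, even if the body raised.
        try:
            sys.path.remove(path)
        except ValueError:
            pass


with extra_sys_path("/path/to/foo/module"):
    # Per this PEP, imports inside a `with` block are always eager, so an
    # `import foo` placed here would run while the path entry is present.
    pass
```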
Deferred Exceptions
@ -658,8 +602,8 @@ Downsides of this PEP include:
Lazy import semantics are already possible and even supported today in the
Python standard library, so these drawbacks are not newly introduced by this
PEP. So far, existing usage of lazy imports by some applications has not proven
a problem. But this PEP is likely to make the usage of lazy imports more
popular, potentially exacerbating these drawbacks.
a problem. But this PEP could make the usage of lazy imports more popular,
potentially exacerbating these drawbacks.
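For comparison, the standard library mechanism available today is ``importlib.util.LazyLoader``. The sketch below follows the recipe given in the ``importlib`` documentation, with ``colorsys`` chosen arbitrarily as the target module. Unlike the design in this PEP, this approach places a placeholder module object in ``sys.modules`` before the module body has run.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; does not run the body yet
    return module


colorsys = lazy_import("colorsys")
# The first attribute access triggers the real import.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```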
These drawbacks must be weighed against the significant benefits offered by this
PEP's implementation of lazy imports. Ultimately these costs will be higher if
@ -684,9 +628,10 @@ performance impact on existing real-world codebases (Instagram Server, several
CLI programs at Meta, Jupyter notebooks used by Meta researchers), while
providing substantial improvements to startup time and memory usage.
The reference implementation shows small performance regressions in a few
pyperformance benchmarks, but improvements in others. (TODO update with
detailed data from main-branch port of implementation.)
The reference implementation shows `no measurable change
<https://gist.github.com/ericsnowcurrently/d027ff4130dedec3b58ab1f55be11e8c>`_
in aggregate performance on the `pyperformance benchmark suite
<https://github.com/python/pyperformance>`_.
How to Teach This
@ -723,7 +668,7 @@ better take advantage of lazy imports are:
foo.bar; foo.bar.Baz``, not ``import foo; foo.bar.Baz``. The latter only works
(unreliably) because the attribute ``foo.bar`` is added as a side effect of
``foo.bar`` being imported somewhere else. With lazy imports this may not always
happen on time.
happen in time.
* Avoid using star imports, as those are always eager.
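The explicit-submodule recommendation above can be illustrated with a standard library package standing in for the generic ``foo.bar``; the choice of ``xml.etree.ElementTree`` is only an example.

```python
# Import the submodule that is actually used, rather than relying on some
# other module having imported it as a side effect.
import xml.etree.ElementTree

element = xml.etree.ElementTree.Element("root")
print(element.tag)
```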
@ -731,15 +676,15 @@ better take advantage of lazy imports are:
Reference Implementation
========================
The current reference implementation is available as part of `Cinder
The initial implementation is available as part of `Cinder
<https://github.com/facebookincubator/cinder>`_. This reference implementation
is in use within Meta and has proven to achieve improvements in startup time
(and total runtime for some applications) in the range of 40%-70%, as well as
significant reduction in memory footprint (up to 40%), thanks to not needing to
execute imports that end up being unused in the common flow.
An updated reference implementation based on CPython main branch is available
in the `GitHub Pull Request <https://github.com/Kronuz/cpython/pull/17>`_.
An `updated reference implementation based on CPython main branch
<https://github.com/Kronuz/cpython/pull/17>`_ is also available.
Rejected Ideas
@ -751,32 +696,35 @@ Wrapping deferred exceptions
To reduce the potential for confusion, exceptions raised in the
course of executing a lazy import could be replaced by a ``LazyImportError``
exception (a subclass of ``ImportError``), with a ``__cause__`` set to the
original exception. The ``LazyImportError`` would have source location metadata
attached pointing the user to the original import statement, to ease
debuggability of errors from lazy imports.
original exception.
Ensuring that all lazy import errors are raised as ``LazyImportError`` would
mitigate the potential confusion by reducing the likelihood that they would be
accidentally caught and mistaken for a different expected exception. However,
in practice we have seen cases, e.g. inside tests, where failing modules raise
``unittest.SkipTest`` exception and this would too end up being wrapped in
``LazyImportError``, making such tests fail because true exception types are
being magically hidden. The drawbacks here seem to outweigh the hypothetical
case where unexpected deferred exceptions are caught by mistake.
reduce the likelihood that they would be accidentally caught and mistaken for a
different expected exception. However, in practice we have seen cases, e.g.
inside tests, where failing modules raise a ``unittest.SkipTest`` exception, and
this too would end up being wrapped in ``LazyImportError``, making such tests
fail because the true exception type is hidden. The drawbacks here seem to
outweigh the hypothetical case where unexpected deferred exceptions are caught
by mistake.
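The problem can be reproduced with a small model of the rejected wrapping behavior. Everything here is hypothetical: ``LazyImportError`` was never added, and ``resolve_with_wrapping()`` only imitates what a wrapping resolver would have done.

```python
import unittest


class LazyImportError(ImportError):
    """Hypothetical wrapper exception from the rejected idea."""


def resolve_with_wrapping(import_func):
    """Imitate a resolver that wraps every failure in LazyImportError."""
    try:
        return import_func()
    except Exception as exc:
        raise LazyImportError("deferred import failed") from exc


def failing_import():
    # Stands in for a module body that skips itself when a dependency is absent.
    raise unittest.SkipTest("optional dependency missing")


try:
    resolve_with_wrapping(failing_import)
except unittest.SkipTest:
    outcome = "skipped"
except LazyImportError as exc:
    outcome = f"error wrapping {type(exc.__cause__).__name__}"

print(outcome)
```

The ``except unittest.SkipTest`` clause never fires because the original exception is only reachable through ``__cause__``, which is how a skip turns into a failure.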
Per-module opt-in using future imports
--------------------------------------
Per-module opt-in
-----------------
A per-module opt-in using future imports (i.e.
``from __future__ import lazy_imports``) does not make sense because
``__future__`` imports are not feature flags, they are for transition to
behaviors which will become default in the future. It is not clear if lazy
imports will ever make sense as the default behavior, so we should not
promise this with a ``__future__`` import. Thus, the proposed per-module opt-in
uses a function call rather than dedicated syntax. Dedicated syntax would
require a new ``from __optional_features__ import lazy_imports`` or similar
mechanism.
promise this with a ``__future__`` import.
There are other cases where a library might desire to locally opt in to lazy
imports for a particular module; e.g. a lazy top-level ``__init__.py`` for a
large library, to make its subcomponents accessible as lazy attributes. For now,
to keep the feature simpler, this PEP chooses to focus on the "application" use
case and does not address the library use case. The underlying laziness
mechanism introduced in this PEP could be used in the future to address this use
case as well.
Explicit syntax for individual lazy imports
@ -877,15 +825,14 @@ commonly used at module top level, which is where lazy imports applies.
Deep eager-imports override
---------------------------
The proposed ``importlib.enable_lazy_imports_in_module()``,
``importlib.eager_imports()`` context manager, and excluded modules in the
``importlib.set_lazy_imports(excluding=...)`` override all have shallow
The proposed ``importlib.eager_imports()`` context manager and excluded modules
in the ``importlib.set_lazy_imports(excluding=...)`` override both have shallow
effects: they only force eagerness for the location they are applied to, not
transitively. It would be possible (although not simple) to provide a
deep/transitive version of one or both. That idea is rejected in this PEP
because the implementation would be complex (taking into account threads and
async code), experience with the reference implementation has not shown it to be
necessary, and because it prevents local reasoning about laziness of imports.
transitively. It would be possible to provide a deep/transitive version of one
or both. That idea is rejected in this PEP because the implementation would be
complex (taking into account threads and async code), experience with the
reference implementation has not shown it to be necessary, and because it
prevents local reasoning about laziness of imports.
A deep override can lead to confusing behavior because the
transitively-imported modules may be imported from multiple locations, some of
@ -910,10 +857,6 @@ considered over a long time frame, with a ``__future__`` import. It is not at
all clear that lazy imports should become the default import semantics for
Python.
Providing only per-module opt-in with a ``__future__`` import makes it much more
difficult for the applications that can benefit from lazy imports to do so
immediately, as discussed above.
This PEP takes the position that the Python community needs more experience with
lazy imports before considering making them the default behavior, so that is
entirely left to a possible future PEP.