PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: Brett Cannon <brett@python.org>, Nick Coghlan <ncoghlan@gmail.com>
Discussions-To: import-sig@python.org
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013, 4-Oct-2013
Resolution: https://mail.python.org/pipermail/python-dev/2013-November/130104.html


Abstract
========

This PEP proposes to add a new class to importlib.machinery called
"ModuleSpec". It will provide all the import-related information used
to load a module and will be available without needing to load the
module first. Finders will directly provide a module's spec instead of
a loader (which they will continue to provide indirectly). The import
machinery will be adjusted to take advantage of module specs, including
using them to load modules.


Terms and Concepts
==================

The changes in this proposal are an opportunity to make several
existing terms and concepts clearer, whereas currently they are
(unfortunately) ambiguous. New concepts are also introduced in this
proposal. Finally, it's worth explaining a few other existing terms
with which people may not be so familiar. For the sake of context, here
is a brief summary of all three groups of terms and concepts. A more
detailed explanation of the import system is found at
[#import_system_docs]_.

name
----

In this proposal, a module's "name" refers to its fully-qualified name,
meaning the fully-qualified name of the module's parent (if any) joined
to the simple name of the module by a period.

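As a small illustration, assuming CPython 3.4 or later: the parent portion of a fully-qualified name can be recovered with ``str.rpartition()``, and ``importlib.util.resolve_name()`` builds a fully-qualified name from a relative one. The module names used here are only examples::

    import importlib.util

    # Split a fully-qualified name into the parent's name and the simple name.
    parent, _, simple = 'json.decoder'.rpartition('.')
    print(parent, simple)  # json decoder

    # Resolve a relative name against the package 'json'.
    full = importlib.util.resolve_name('.decoder', 'json')
    print(full)  # json.decoder
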
finder
------

A "finder" is an object that identifies the loader that the import
system should use to load a module. Currently this is accomplished by
calling the finder's find_module() method, which returns the loader.

Finders are strictly responsible for providing the loader, which they do
through their find_module() method. The import system then uses that
loader to load the module.

loader
------

A "loader" is an object that is used to load a module during import.
Currently this is done by calling the loader's load_module() method. A
loader may also provide APIs for getting information about the modules
it can load, as well as about data from sources associated with such a
module.

Right now loaders (via load_module()) are responsible for certain
boilerplate, import-related operations. These are:

1. Perform some (module-related) validation
2. Create the module object
3. Set import-related attributes on the module
4. "Register" the module in sys.modules
5. Exec the module
6. Clean up in the event of failure while loading the module

This all takes place during the import system's call to
Loader.load_module().

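To make the six steps concrete, here is a minimal sketch of how a hand-written legacy loader might carry them out in load_module(). It is illustrative only and not code taken from importlib: ``StringLoader`` and the ``demo_legacy`` module name are hypothetical, the loader simply executes source text held in memory, and the step 1 validation is reduced to a name check::

    import sys
    import types


    class StringLoader:
        """Hypothetical PEP 302-style loader that runs in-memory source."""

        def __init__(self, name, source):
            self.name = name
            self.source = source

        def load_module(self, fullname):
            # 1. Validation: only load the module this loader was made for.
            if fullname != self.name:
                raise ImportError('cannot load {!r}'.format(fullname), name=fullname)
            # Reuse an existing module (a reload); otherwise...
            module = sys.modules.get(fullname)
            is_reload = module is not None
            if not is_reload:
                # 2. ...create the module object.
                module = types.ModuleType(fullname)
            # 3. Set import-related attributes.
            module.__loader__ = self
            module.__package__ = fullname.rpartition('.')[0]
            # 4. Register the module before running its code.
            sys.modules[fullname] = module
            try:
                # 5. Execute the module's code in its own namespace.
                code = compile(self.source, '<' + fullname + '>', 'exec')
                exec(code, module.__dict__)
            except BaseException:
                # 6. Clean up, leaving a module that existed before untouched.
                if not is_reload:
                    sys.modules.pop(fullname, None)
                raise
            return sys.modules[fullname]


    legacy = StringLoader('demo_legacy', 'answer = 6 * 7')
    mod = legacy.load_module('demo_legacy')
    print(mod.answer)  # 42

Only step 5 is specific to this loader. The rest is the kind of boilerplate that every loader repeats in some form.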
origin
------

This is a new term and concept. The idea of it exists subtly in the
import system already, but this proposal makes the concept explicit.

"origin" in an import context means the system (or resource within a
system) from which a module originates. For the purposes of this
proposal, "origin" is also a string which identifies such a resource or
system. "origin" is applicable to all modules.

For example, the origin for built-in and frozen modules is the
interpreter itself. The import system already identifies this origin as
"built-in" and "frozen", respectively. This is demonstrated in the
following module repr: "<module 'sys' (built-in)>".

In fact, the module repr is already a relatively reliable, though
implicit, indicator of a module's origin. Other modules also indicate
their origin through other means, as described in the entry for
"location".

It is up to the loader to decide how to interpret and use a module's
origin, if at all.

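The built-in origin can already be observed from Python. This sketch assumes CPython 3.4 or later, where the spec-finding function proposed here is available as ``importlib.util.find_spec()``::

    import importlib.util
    import sys

    # The spec records where sys came from: the interpreter itself.
    spec = importlib.util.find_spec('sys')
    print(spec.origin)  # built-in
    print(repr(sys))    # <module 'sys' (built-in)>
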
location
--------

This is a new term. However, the concept already exists clearly in the
import system, as associated with the ``__file__`` and ``__path__``
attributes of modules, as well as the name/term "path" elsewhere.

A "location" is a resource or "place", rather than a system at large,
from which a module is loaded. It qualifies as an "origin". Examples
of locations include filesystem paths and URLs. A location is
identified by the name of the resource, but may not necessarily identify
the system to which the resource pertains. In such cases the loader
would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be
inferred by the loader from the module name alone. Instead, the loader
must be provided with a string to identify the location, usually by the
finder that generates the loader. The loader then uses this information
to locate the resource from which it will load the module. In theory
you could load the module at a given location under various names.

The most common examples of locations in the import system are the
files from which source and extension modules are loaded. For these
modules the location is identified by the string in the ``__file__``
attribute. Although ``__file__`` isn't particularly accurate for some
modules (e.g. zipped), it is currently the only way that the import
system indicates that a module has a location.

A module that has a location may be called "locatable".

cache
-----

The import system stores compiled modules in the ``__pycache__`` directory
as an optimization. This module cache that we use today was provided by
PEP 3147. For this proposal, the relevant API for module caching is the
``__cached__`` attribute of modules and the cache_from_source() function
in importlib.util. Loaders are responsible for putting modules into the
cache (and loading out of the cache). Currently the cache is only used
for compiled source modules. However, loaders may take advantage of
the module cache for other kinds of modules.

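For example, assuming CPython 3.4 or later, ``importlib.util.cache_from_source()`` computes where the compiled form of a source file belongs, and ``importlib.util.source_from_cache()`` maps that path back. The source path below is hypothetical, and neither function requires the file to exist::

    import importlib.util
    import os
    import tempfile

    source = os.path.join(tempfile.gettempdir(), 'spam.py')  # need not exist
    cached = importlib.util.cache_from_source(source)
    # Something like .../__pycache__/spam.cpython-312.pyc; the tag varies.
    print(cached)
    print(importlib.util.source_from_cache(cached) == source)  # True
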
package
-------

The concept does not change, nor does the term. However, the
distinction between modules and packages is mostly superficial.
Packages *are* modules. They simply have a ``__path__`` attribute and
import may add attributes bound to submodules. The typically perceived
difference is a source of confusion. This proposal explicitly
de-emphasizes the distinction between packages and modules where it
makes sense to do so.


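Here is a quick way to see that packages are ordinary modules. It assumes CPython 3.4 or later and uses the standard library's json package as an arbitrary example::

    import json
    import json.decoder

    # Both are instances of the same module type.
    print(type(json) is type(json.decoder))  # True
    # Only the package has submodule search locations (exposed as __path__).
    print(json.__spec__.submodule_search_locations is not None)  # True
    print(json.decoder.__spec__.submodule_search_locations)      # None
    print(hasattr(json, '__path__'), hasattr(json.decoder, '__path__'))  # True False
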
Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via finders and
loaders and sys.meta_path. The importlib module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better, and there are a couple we hope to take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered. The PEP 420 (namespace packages)
implementation had to work around this. The complexity surfaced again
during recent efforts on a separate proposal. [#ref_files_pep]_

The `finder`_ and `loader`_ sections above detail the current
responsibilities of each. Notably, loaders are not required to provide
any of the functionality of their load_module() method through other
methods. Thus, though the import-related information about a module is
likely available without loading the module, it is not otherwise exposed.

Furthermore, the requirements associated with load_module() are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. importlib.util provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that load_module() could easily continue to facilitate.

More importantly, while a finder *could* provide the information that
the loader's load_module() would need, it currently has no consistent
way to get it to the loader. This is a gap between finders and loaders
which this proposal aims to fill.

Finally, when the import system calls a finder's find_module(), the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or to store it on the loader.

Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
FileFinder.find_loader() because there was no good way for
find_module() to provide the namespace search locations.

The answer to this gap is a ModuleSpec object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.


Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ModuleSpec type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.

Here is a high-level summary of the changes described by this PEP. More
detail is available in later sections.

importlib.machinery.ModuleSpec (new)
------------------------------------

An encapsulation of a module's import-system-related state during import.
See the `ModuleSpec`_ section below for a more detailed description.

* ModuleSpec(name, loader, \*, origin=None, loader_state=None, is_package=None)

Attributes:

* name - a string for the fully-qualified name of the module.
* loader - the loader to use for loading.
* origin - the name of the place from which the module is loaded,
  e.g. "built-in" for built-in modules and the filename for modules
  loaded from source.
* submodule_search_locations - list of strings for where to find
  submodules, if a package (None otherwise).
* loader_state - a container of extra module-specific data for use
  during loading.
* cached (property) - a string for where the compiled module should be
  stored.
* parent (RO-property) - the fully-qualified name of the package to
  which the module belongs as a submodule (or None).
* has_location (RO-property) - a flag indicating whether or not the
  module's "origin" attribute refers to a location.

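To illustrate the attribute summary, here is a sketch that assumes the ModuleSpec class as it shipped in CPython 3.4 and later. It shows that ``parent`` and ``submodule_search_locations`` are derived from the constructor arguments. The module names and the path are hypothetical, and no loader is needed just to inspect a spec. One caveat: CPython reports the parent of a top-level, non-package module as an empty string rather than None::

    from importlib.machinery import ModuleSpec

    plain = ModuleSpec('spam.eggs', None, origin='/tmp/spam/eggs.py')
    print(plain.parent)                      # spam
    print(plain.submodule_search_locations)  # None: not a package
    print(plain.has_location)                # False: origin alone is not a location

    pkg = ModuleSpec('spam', None, is_package=True)
    print(pkg.submodule_search_locations)    # []
    print(pkg.parent)                        # spam: a package is its own parent

    print(repr(ModuleSpec('spam', None).parent))  # ''
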
importlib.util Additions
------------------------

These are ModuleSpec factory functions, meant as a convenience for
finders. See the `Factory Functions`_ section below for more detail.

* spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None)
  - build a spec from file-oriented information and loader APIs.
* spec_from_loader(name, loader, \*, origin=None, is_package=None)
  - build a spec with missing information filled in by using loader
    APIs.

Other API Additions
-------------------

* importlib.find_spec(name, path=None, target=None) will work exactly
  the same as importlib.find_loader() (which it replaces), but return a
  spec instead of a loader.

For finders:

* importlib.abc.MetaPathFinder.find_spec(name, path, target) and
  importlib.abc.PathEntryFinder.find_spec(name, target) will return a
  module spec to use during import.

For loaders:

* importlib.abc.Loader.exec_module(module) will execute a module in its
  own namespace. It replaces importlib.abc.Loader.load_module(), taking
  over its module execution functionality.
* importlib.abc.Loader.create_module(spec) (optional) will return the
  module to use for loading.

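Here is a minimal end-to-end sketch of the new finder and loader methods. It assumes CPython 3.4 or later, where the proposed factory functions live in importlib.util. ``GreetingFinder``, ``GreetingLoader``, the ``greeting_`` name prefix and the ``greeting-demo`` origin are all hypothetical::

    import importlib
    import importlib.abc
    import importlib.util
    import sys


    class GreetingLoader(importlib.abc.Loader):
        """Hypothetical loader that fills in the module in exec_module()."""

        def create_module(self, spec):
            return None  # let the import system create a default module

        def exec_module(self, module):
            # Import-related attributes such as __spec__ are already set here.
            module.greeting = 'hello from ' + module.__spec__.name


    class GreetingFinder(importlib.abc.MetaPathFinder):
        """Hypothetical meta path finder for names starting with 'greeting_'."""

        def find_spec(self, name, path, target=None):
            if not name.startswith('greeting_'):
                return None  # let the other finders try
            return importlib.util.spec_from_loader(
                name, GreetingLoader(), origin='greeting-demo')


    sys.meta_path.insert(0, GreetingFinder())
    mod = importlib.import_module('greeting_world')
    print(mod.greeting)         # hello from greeting_world
    print(mod.__spec__.origin)  # greeting-demo

Because create_module() returns None, the import system creates the module object itself and sets the import-related attributes before exec_module() runs.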
For modules:

* Module objects will have a new attribute: ``__spec__``.

API Changes
-----------

* InspectLoader.is_package() will become optional.

Deprecations
------------

* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()

Removals
--------

These were introduced prior to Python 3.4's release, so they can simply
be removed.

* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()

Other Changes
-------------

* The import system implementation in importlib will be changed to make
  use of ModuleSpec.
* importlib.reload() will make use of ModuleSpec.
* A module's import-related attributes (other than ``__spec__``) will no
  longer be used directly by the import system during that module's
  import. However, this does not impact use of those attributes
  (e.g. ``__path__``) when loading other modules (e.g. submodules).
* Import-related attributes should no longer be added to modules
  directly, except by the import system.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
  Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
  name and origin.

Backward-Compatibility
----------------------

* If a finder does not define find_spec(), a spec is derived from
  the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
  find_module().
* Loader.load_module() is used if exec_module() is not defined.

What Will Not Change?
---------------------

* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
  the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a loader defines it, will have all the
  same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.

Responsibilities
----------------

Here's a quick breakdown of where responsibilities lie after this PEP.

finders:

* create/identify a loader that can load the module.
* create the spec for the module.

loaders:

* create the module (optional).
* execute the module.

ModuleSpec:

* orchestrate module loading
* boilerplate for module loading, including managing sys.modules and
  setting import-related attributes
* create module if loader doesn't
* call loader.exec_module(), passing in the module in which to exec
* contain all the information the loader needs to exec the module
* provide the repr for modules


What Will Existing Finders and Loaders Have to Do Differently?
==============================================================

Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:

* Implement find_spec() on finders.
* Implement exec_module() on loaders, if possible.

The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. spec_from_loader() and
spec_from_file_location() are both straightforward utilities in this
regard.

For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().


ModuleSpec Users
================

ModuleSpec objects have three distinct target audiences: Python itself,
import hooks, and normal Python users.

Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ModuleSpec API will get used.

Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loader_state) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.

Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will use neither the ``ModuleSpec``
factory functions nor any of the instance methods.


How Loading Will Work
=====================

Here is an outline of what the import machinery does during loading,
adjusted to take advantage of the module's spec and the new loader API::

    import sys
    from types import ModuleType

    # ``spec`` is the module's ModuleSpec; _init_module_attrs() stands for
    # the import system's internal attribute-setting helper.
    module = None
    if spec.loader is not None and hasattr(spec.loader, 'create_module'):
        module = spec.loader.create_module(spec)
    if module is None:
        module = ModuleType(spec.name)
    # The import-related module attributes get set here:
    _init_module_attrs(spec, module)

    if spec.loader is None and spec.submodule_search_locations is not None:
        # Namespace package
        sys.modules[spec.name] = module
    elif not hasattr(spec.loader, 'exec_module'):
        spec.loader.load_module(spec.name)
        # __loader__ and __package__ would be explicitly set here for
        # backwards-compatibility.
    else:
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except BaseException:
            # Undo the registration so a failed import leaves no trace.
            try:
                del sys.modules[spec.name]
            except KeyError:
                pass
            raise
    # The module may have replaced itself in sys.modules during execution.
    module_to_return = sys.modules[spec.name]

These steps are exactly what Loader.load_module() is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().

Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.


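The outline can be approximated with public APIs. The sketch below assumes CPython 3.5 or later, where ``importlib.util.module_from_spec()`` performs the creation and attribute-setting half of the outline. It omits the namespace-package and load_module() branches. ``SwappingLoader``, the ``load()`` helper and ``swap_demo`` are hypothetical. The loader deliberately replaces its module in sys.modules, which shows why the final lookup goes through sys.modules::

    import importlib.abc
    import importlib.util
    import sys
    import types


    class SwappingLoader(importlib.abc.Loader):
        """Hypothetical loader whose module replaces its own sys.modules entry."""

        def exec_module(self, module):
            sys.modules[module.__name__] = types.SimpleNamespace(note='replacement object')


    def load(spec):
        """Hypothetical, simplified version of the outline above."""
        module = importlib.util.module_from_spec(spec)  # create + set attributes
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except BaseException:
            sys.modules.pop(spec.name, None)
            raise
        # Whatever is registered now wins, even if it is not `module`.
        return sys.modules[spec.name]


    spec = importlib.util.spec_from_loader('swap_demo', SwappingLoader())
    result = load(spec)
    print(result.note)  # replacement object
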
How Reloading Will Work
=======================

Here is the corresponding outline for reload()::

    import sys

    # find_spec() is the importlib.find_spec() proposed above, and
    # _init_module_attrs() is the same internal helper used when loading.
    _RELOADING = {}

    def reload(module):
        try:
            name = module.__spec__.name
        except AttributeError:
            name = module.__name__
        spec = find_spec(name, target=module)

        if sys.modules.get(name) is not module:
            raise ImportError('module {!r} not in sys.modules'.format(name),
                              name=name)
        if name in _RELOADING:
            # This module is already being reloaded (e.g. recursively).
            return _RELOADING[name]
        _RELOADING[name] = module
        try:
            if spec.loader is None:
                # Namespace loader
                _init_module_attrs(spec, module)
                return module
            if spec.parent and spec.parent not in sys.modules:
                raise ImportError('parent {!r} not in sys.modules'.format(spec.parent),
                                  name=name)

            _init_module_attrs(spec, module)
            # Ignoring backwards-compatibility call to load_module()
            # for simplicity.
            spec.loader.exec_module(module)
            return sys.modules[name]
        finally:
            del _RELOADING[name]

A key point here is that the switch to Loader.exec_module() means that
loaders will no longer have an easy way to know at execution time
whether a given call is a reload or not. Before this proposal, they
could simply check to see if the module was already in sys.modules. Now,
by the time exec_module() is called during load (not reload) the import
machinery would already have placed the module in sys.modules. This is
part of the reason why find_spec() has
`the "target" parameter <The "target" parameter of find_spec()_>`_.

The semantics of reload will remain essentially the same as they exist
already [#reload-semantics-fix]_. The impact of this PEP on some kinds
of lazy loading modules was a point of discussion. [#lazy_import_concerns]_


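The following sketch shows the reload path and the "target" argument together. It assumes CPython 3.4 or later, whose ``importlib.reload()`` looks the module up again through sys.meta_path and passes the module being reloaded as "target". ``MemoryFinder``, ``MemoryLoader``, ``SOURCES`` and ``reload_demo`` are hypothetical, and the "files" are strings held in a dict::

    import importlib
    import importlib.abc
    import importlib.util
    import sys

    SOURCES = {'reload_demo': 'value = 1'}  # hypothetical in-memory sources


    class MemoryLoader(importlib.abc.Loader):
        def exec_module(self, module):
            exec(SOURCES[module.__name__], module.__dict__)


    class MemoryFinder(importlib.abc.MetaPathFinder):
        def __init__(self):
            self.targets = []  # the "target" argument of each call

        def find_spec(self, name, path, target=None):
            if name not in SOURCES:
                return None
            self.targets.append(target)
            return importlib.util.spec_from_loader(name, MemoryLoader())


    finder = MemoryFinder()
    sys.meta_path.insert(0, finder)

    mod = importlib.import_module('reload_demo')
    SOURCES['reload_demo'] = 'value = 2'
    reloaded = importlib.reload(mod)

    print(reloaded is mod)            # True: same module object, re-executed
    print(mod.value)                  # 2
    print(finder.targets[0] is None)  # True: a normal import passes no target
    print(finder.targets[-1] is mod)  # True: reload passes the module as target
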
ModuleSpec
==========

Attributes
----------

Each of the following names is an attribute on ModuleSpec objects. A
value of None indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping describes how the import
machinery sets the module attributes right before calling exec_module().

========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
parent                     __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loader_state               \-
has_location               \-
========================== ==============

| \* Set on the module only if spec.has_location is true.
| \*\* Set on the module only if the spec attribute is not None.

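The mapping can be checked on a real module. This sketch assumes CPython 3.5 or later (for ``importlib.util.module_from_spec()``). It writes a throwaway module, hypothetically named ``mapping_demo``, to a temporary directory::

    import importlib.util
    import os
    import sys
    import tempfile

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, 'mapping_demo.py')
    with open(path, 'w') as f:
        f.write('x = 1\n')

    spec = importlib.util.spec_from_file_location('mapping_demo', path)
    module = importlib.util.module_from_spec(spec)  # copies spec data onto the module
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)

    # Each module attribute mirrors the corresponding spec attribute.
    print(module.__name__ == spec.name)       # True
    print(module.__loader__ is spec.loader)   # True
    print(module.__package__ == spec.parent)  # True ('' for a top-level module)
    print(module.__file__ == spec.origin)     # True, since spec.has_location
    print(module.__cached__ == spec.cached)   # True
    print(hasattr(module, '__path__'))        # False: not a package
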
While parent and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.

**origin**

"origin" is a string for the name of the place from which the module
originates. See `origin`_ above. Aside from the informational value,
it is also used in the module's repr. In the case of a spec where
"has_location" is true, ``__file__`` is set to the value of "origin".
For built-in modules "origin" would be set to "built-in".

**has_location**

As explained in the `location`_ section above, many modules are
"locatable", meaning there is a corresponding resource from which the
module will be loaded and that resource can be described by a string.
In contrast, non-locatable modules can't be loaded in this fashion, e.g.
built-in modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".

"has_location" is true if the module is locatable. In that case the
spec's origin is used as the location and ``__file__`` is set to
spec.origin. If additional location information is required (e.g.
zipimport), that information may be stored in spec.loader_state.

"has_location" may be implied from the existence of a get_data() method
on the loader.

Incidentally, not all locatable modules will be cacheable, but most
will.

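The contrast can be seen using ModuleSpec and the factory functions as they exist in CPython 3.4 and later. The module names and the file path are hypothetical, and spec_from_file_location() does not require the file to exist::

    import importlib.util
    import os
    import tempfile
    from importlib.machinery import ModuleSpec

    # A module created in code has an origin but no location.
    dynamic = ModuleSpec('dynamic_demo', None, origin='created in code')
    print(dynamic.has_location)  # False
    print(dynamic.cached)        # None: nothing to cache without a location

    # A file-backed spec treats its origin as a location.
    path = os.path.join(tempfile.gettempdir(), 'located_demo.py')
    located = importlib.util.spec_from_file_location('located_demo', path)
    print(located.has_location)        # True
    print(located.origin == path)      # True
    print(located.cached is not None)  # True: a .py source maps to a bytecode path
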
**submodule_search_locations**

The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is None.

The name of the corresponding module attribute, ``__path__``, is
relatively ambiguous. Instead of mirroring it, we use a more explicit
attribute name that makes the purpose clear.

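For example, assuming CPython 3.4 or later, spec_from_file_location() fills in submodule_search_locations when the location names a package's ``__init__.py``. In that case it uses the directory containing the file. The ``spam_pkg`` and ``spam_mod`` paths are hypothetical and are never read::

    import importlib.util
    import os
    import tempfile

    base = tempfile.gettempdir()
    pkg_init = os.path.join(base, 'spam_pkg', '__init__.py')

    pkg_spec = importlib.util.spec_from_file_location('spam_pkg', pkg_init)
    print(pkg_spec.submodule_search_locations == [os.path.dirname(pkg_init)])  # True
    print(pkg_spec.parent)  # spam_pkg

    mod_spec = importlib.util.spec_from_file_location(
        'spam_mod', os.path.join(base, 'spam_mod.py'))
    print(mod_spec.submodule_search_locations)  # None: not a package
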
**loader_state**

A finder may set loader_state to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.

For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.

loader_state is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.

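Here is a sketch of the finder-to-loader hand-off. It assumes CPython 3.5 or later. The spec is built by hand here in place of a finder, and ``StateLoader``, ``state_demo`` and the SimpleNamespace layout of loader_state are hypothetical. A single loader instance could serve any number of modules, because the per-module data travels on the spec::

    import importlib.abc
    import importlib.util
    import sys
    import types
    from importlib.machinery import ModuleSpec


    class StateLoader(importlib.abc.Loader):
        """Hypothetical loader that reads its input from spec.loader_state."""

        def exec_module(self, module):
            state = module.__spec__.loader_state
            exec(state.source, module.__dict__)


    # What a finder might return: the source rides along in loader_state.
    spec = ModuleSpec(
        'state_demo',
        StateLoader(),
        origin='<memory>',
        loader_state=types.SimpleNamespace(source='total = sum(range(5))'),
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    print(module.total)  # 10
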
Factory Functions
-----------------

**spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None)**

Build a spec from file-oriented information and loader APIs.

* "origin" will be set to the location.
* "has_location" will be set to True.
* "cached" will be set to the result of calling cache_from_source().

* "origin" can be deduced from loader.get_filename() (if "location" is
  not passed in).
* "loader" can be deduced from suffix if the location is a filename.
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.

**spec_from_loader(name, loader, \*, origin=None, is_package=None)**

Build a spec with missing information filled in by using loader APIs.

* "has_location" can be deduced from loader.get_data().
* "origin" can be deduced from loader.get_filename().
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.

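Here is a sketch of spec_from_loader() deducing package-ness from loader.is_package(). It assumes CPython 3.4 or later. ``PackageOnlyLoader`` is hypothetical. Since it has no get_filename() method and no origin is passed, the resulting spec has no origin. An explicit ``is_package`` argument takes precedence over the loader::

    import importlib.util


    class PackageOnlyLoader:
        """Hypothetical loader that reports every module as a package."""

        def is_package(self, fullname):
            return True

        def create_module(self, spec):
            return None

        def exec_module(self, module):
            pass


    deduced = importlib.util.spec_from_loader('deduced_pkg', PackageOnlyLoader())
    print(deduced.submodule_search_locations)  # []: from loader.is_package()
    print(deduced.origin)                      # None
    print(deduced.has_location)                # False

    forced = importlib.util.spec_from_loader(
        'forced_mod', PackageOnlyLoader(), is_package=False)
    print(forced.submodule_search_locations)   # None
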
Backward Compatibility
----------------------

ModuleSpec itself has no backward-compatibility concerns. This would be
a different story if Finder.find_module() were to return a module spec
instead of a loader. In that case, specs would have to act like the
loader that would have been returned instead. Doing so would be
relatively simple, but is an unnecessary complication. It was part of
earlier versions of this PEP.

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loader_state or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec(). The same points apply to duck-typing.


Existing Types
==============

Module Objects
--------------

Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.

A module's spec will not be kept in sync with the corresponding
import-related attributes. Though they may differ, in practice they will
typically be the same.

One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.

A module's spec is not guaranteed to be identical between two modules
with the same name. Likewise there is no guarantee that successive
calls to importlib.find_spec() will return the same object or even an
equivalent object, though at least the latter is likely.

Finders
-------

Finders are still responsible for identifying, and typically creating,
the loader that should be used to load a module. That loader will
now be stored in the module spec returned by find_spec() rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.

**MetaPathFinder.find_spec(name, path=None, target=None)**

**PathEntryFinder.find_spec(name, target=None)**

When find_spec() is called, finders must return a ModuleSpec object
(or None if they cannot find the module). This new method replaces
find_module() and find_loader() (in the PathEntryFinder case). If a
finder does not have find_spec(), find_module() and find_loader() are
used instead, for backward-compatibility.

Adding yet another similar method to finders is a case of practicality.
find_module() could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering PathEntryFinder.find_loader() was just
added in Python 3.3. However, the extra complexity and a
less-than-explicit method name aren't worth it.

The "target" parameter of find_spec()
-------------------------------------

A call to find_spec() may optionally include a "target" argument. This
is the module object that will be used subsequently as the target of
loading. During normal import (and by default), "target" is None,
meaning the target module has yet to be created. During reloading, the
module passed in to reload() is passed through to find_spec() as the
target. This argument allows the finder to build the module spec with
more information than is otherwise available, which is particularly
relevant when identifying the loader to use.

Through find_spec() the finder will always identify the loader it
will return in the spec (or return None). At the point the loader is
identified, the finder should also decide whether or not the loader
supports loading into the target module, in the case that "target" is
passed in. This decision may entail consulting with the loader.

If the finder determines that the loader does not support loading into
the target module, it should either find another loader or raise
ImportError (completely stopping import of the module). This
determination is especially important during reload since, as noted in
`How Reloading Will Work`_, loaders will no longer be able to trivially
identify a reload situation on their own.

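The following sketch records the "target" argument a meta path finder receives during a normal import and during a reload. All class and module names are hypothetical, and it assumes the released importlib behaviour in which importlib.reload() passes the module being reloaded through to find_spec()::

    import importlib
    import importlib.abc
    import importlib.util
    import sys


    class CounterLoader(importlib.abc.Loader):
        """Hypothetical loader whose modules count their executions."""

        def exec_module(self, module):
            # On reload the existing namespace is reused, so the count grows.
            module.exec_count = getattr(module, "exec_count", 0) + 1


    class TargetAwareFinder(importlib.abc.MetaPathFinder):
        """Records the "target" argument it receives (illustrative)."""

        def __init__(self, name):
            self.name = name
            self.loader = CounterLoader()
            self.targets_seen = []

        def find_spec(self, name, path=None, target=None):
            if name != self.name:
                return None
            self.targets_seen.append(target)
            # A real finder would decide here whether self.loader can load
            # into `target`; if not, pick another loader or raise ImportError.
            return importlib.util.spec_from_loader(name, self.loader)


    finder = TargetAwareFinder("reload_target_demo")
    sys.meta_path.insert(0, finder)
    try:
        module = importlib.import_module("reload_target_demo")
        reloaded = importlib.reload(module)
    finally:
        sys.meta_path.remove(finder)

    print(finder.targets_seen[0] is None)    # normal import: no target
    print(finder.targets_seen[1] is module)  # reload: the existing module
    print(reloaded is module, module.exec_count)
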
Two alternatives to the "target" parameter were presented:
Loader.supports_reload() and adding "target" to Loader.exec_module()
instead of find_spec(). supports_reload() was the initial approach to
the reload situation [#supports_reload]_. However, there was some
opposition to the loader-specific, reload-centric approach
[#supports_reload_considered_harmful]_.

As to "target" on exec_module(): during reload the loader may need more
information from the target module (or spec) than just "does this
loader support reloading this module", and that information is no
longer available with the move away from load_module(). One proposal
on the table was to add something like "target" to exec_module()
[#exec_module_target]_. However, putting "target" on find_spec() instead
is more in line with the goals of this PEP. Furthermore, it obviates the
need for supports_reload().

Namespace Packages
------------------

Currently, a path entry finder may return (None, portions) from
find_loader() to indicate it found part of a possible namespace
package. To achieve the same effect, find_spec() must return a spec
with "loader" set to None (a.k.a. not set) and with
submodule_search_locations set to the same portions as would have been
provided by find_loader(). It is up to PathFinder to decide how to
handle such specs.

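As a rough sketch, a spec for a namespace package portion can be built directly with importlib.machinery.ModuleSpec. The helper name namespace_portion_spec() and the directory path are hypothetical::

    import importlib.machinery


    def namespace_portion_spec(name, portion_path):
        """Hypothetical helper: what a path entry finder might return.

        The spec has no loader and lists the directory found as a
        possible portion of a namespace package.
        """
        spec = importlib.machinery.ModuleSpec(name, None, is_package=True)
        spec.submodule_search_locations = [portion_path]
        return spec


    spec = namespace_portion_spec("nsdemo", "/some/path/nsdemo")
    print(spec.loader)                      # None, i.e. not set
    print(spec.submodule_search_locations)  # the portion(s) found

Passing is_package=True is what makes ModuleSpec treat the spec as a package, so its parent is the name itself.
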
Loaders
-------

**Loader.exec_module(module)**

Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
exec_module() will be used during both loading and reloading.

exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.

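A sketch of an exec_module() implementation that only populates the namespace and refuses a second execution of the same module object, as a module type without in-place reload support might. OneShotLoader and the module name are hypothetical, and tracking executed modules in a weakref.WeakSet is just one possible design::

    import importlib.abc
    import importlib.util
    import weakref


    class OneShotLoader(importlib.abc.Loader):
        """Hypothetical loader for modules that cannot be re-executed."""

        def __init__(self, source):
            self.source = source
            self._executed = weakref.WeakSet()  # modules already executed

        def exec_module(self, module):
            # Only job: populate the namespace. The module already exists
            # and nothing is returned.
            if module in self._executed:
                raise ImportError(
                    "{} does not support in-place reloading".format(
                        module.__name__))
            exec(self.source, module.__dict__)
            self._executed.add(module)


    loader = OneShotLoader("answer = 6 * 7\n")
    spec = importlib.util.spec_from_loader("one_shot_demo", loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    print(module.answer)

    refused = False
    try:
        loader.exec_module(module)  # a second call, as a reload would make
    except ImportError as exc:
        refused = True
        print("second call refused:", exc)
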
**Loader.create_module(spec)**

Loaders may also implement create_module(), which returns a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case, though atypical, for
create_module() is to provide a module that is a subclass of the builtin
module type. Most loaders will not need to implement create_module().

create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
raising ImportError.

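A sketch of the atypical create_module() case, returning an instance of a hypothetical types.ModuleType subclass (DescribedModule and SubclassLoader are illustrative names). In the released API, importlib.util.module_from_spec() calls create_module() and then sets the import-related attributes itself::

    import importlib.abc
    import importlib.util
    import types


    class DescribedModule(types.ModuleType):
        """Hypothetical module subclass used only for this illustration."""

        def describe(self):
            return "{} created by {}".format(self.__name__,
                                             type(self).__name__)


    class SubclassLoader(importlib.abc.Loader):
        def create_module(self, spec):
            # Return a new module object; returning None would select the
            # default module creation instead.
            return DescribedModule(spec.name)

        def exec_module(self, module):
            module.value = 1


    spec = importlib.util.spec_from_loader("subclass_demo", SubclassLoader())
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(module.describe())
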
.. note::

   exec_module() and create_module() should not set any import-related
   module attributes. The fact that load_module() does is a design flaw
   that this proposal aims to correct.

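To illustrate the division of labor (a sketch with a hypothetical PlainLoader), importlib.util.module_from_spec() sets the import-related attributes from the spec, so exec_module() only needs to fill in the module's own contents::

    import importlib.abc
    import importlib.util


    class PlainLoader(importlib.abc.Loader):
        """Hypothetical loader that sets no import-related attributes."""

        def exec_module(self, module):
            module.data = [1, 2, 3]


    spec = importlib.util.spec_from_loader("attrs_demo", PlainLoader(),
                                           origin="<demo>")
    # The machinery, not the loader, sets __name__, __loader__, __spec__...
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    for attr in ("__name__", "__loader__", "__package__", "__spec__"):
        print(attr, getattr(module, attr))

Because this spec has no file location, no ``__file__`` attribute is set.
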
Other changes:

PEP 420 introduced the optional module_repr() loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ModuleSpec, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

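With a hypothetical loader that defines no module_repr() of its own, the module type's ``__repr__()`` is derived from the spec alone. In released CPython (3.4 and later), a spec that has an origin but no file location is rendered as shown in this sketch::

    import importlib.abc
    import importlib.util


    class NoReprLoader(importlib.abc.Loader):
        """Hypothetical loader without a module_repr() of its own."""

        def exec_module(self, module):
            pass


    spec = importlib.util.spec_from_loader("repr_demo", NoReprLoader(),
                                           origin="in-memory")
    module = importlib.util.module_from_spec(spec)
    print(repr(module))
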
The Loader.init_module_attr() method, added prior to Python 3.4's
release, will be removed in favor of the same method on ModuleSpec.

However, InspectLoader.is_package() will not be deprecated even
though the same information is found on ModuleSpec. ModuleSpec
can use it to populate its own is_package if that information is
not otherwise available. Still, it will be made optional.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.


Other Changes
=============

* The various finders and loaders provided by importlib will be
  updated to comply with this proposal.
* Any other implementations of or dependencies on the import-related APIs
  (particularly finders and loaders) in the stdlib will be likewise
  adjusted to this PEP. While they should continue to work, any such
  changes that get missed should be considered bugs for the Python 3.4.x
  series.
* The spec for the ``__main__`` module will reflect how the interpreter
  was started. For instance, with ``-m`` the spec's name will be that
  of the module used, while ``__main__.__name__`` will still be
  "__main__".
* We will add importlib.find_spec() to mirror importlib.find_loader()
  (which becomes deprecated).
* importlib.reload() is changed to use ModuleSpec.
* importlib.reload() will now make use of the per-module import lock.


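A short sketch of the find_spec() and reload() entry points listed above. In the released Python 3.4 implementation, find_spec() ended up in importlib.util rather than directly in importlib; colorsys is just an arbitrary stdlib module chosen for the example::

    import importlib
    import importlib.util

    # Locate a module and describe it via its spec.
    spec = importlib.util.find_spec("colorsys")
    print(spec.name, spec.origin)

    # A top-level name that cannot be found yields None, not ImportError.
    missing = importlib.util.find_spec("no_such_module_for_pep451_demo")
    print(missing)

    # reload() re-executes the module in place, driven by its __spec__.
    import colorsys
    reloaded = importlib.reload(colorsys)
    print(reloaded is colorsys)
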
Reference Implementation
========================

A reference implementation is available at
http://bugs.python.org/issue18864.

Implementation Notes
--------------------

* The implementation of this PEP needs to be cognizant of its impact on
  pkgutil (and setuptools). pkgutil has some generic function-based
  extensions to PEP 302 which may break if importlib starts wrapping
  loaders without the tools' knowledge.

* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
  inspect.

  For instance, pickle should be updated in the ``__main__`` case to look
  at ``module.__spec__.name``.


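A sketch of the ``__main__`` behaviour that the pickle note relies on: a throwaway module (hypothetical name main_spec_demo) is run with ``-m`` in a subprocess and prints both ``__name__`` and ``__spec__.name``::

    import os
    import pathlib
    import subprocess
    import sys
    import tempfile

    # Throwaway module source that reports both names when run.
    SOURCE = "print(__name__, __spec__.name)\n"

    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "main_spec_demo.py").write_text(SOURCE)
        # PYTHONPATH keeps the module importable even if the current
        # directory is not placed on sys.path (e.g. under PYTHONSAFEPATH).
        env = dict(os.environ, PYTHONPATH=tmp)
        result = subprocess.run(
            [sys.executable, "-m", "main_spec_demo"],
            cwd=tmp, env=env, capture_output=True, text=True, check=True,
        )

    print(result.stdout.strip())

The module still runs as "__main__", while its spec keeps the real module name.
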
Rejected Additions to the PEP
=============================

There were a few proposed additions to this proposal that did not fit
well enough into its scope.

There is no "PathModuleSpec" subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.

While "ModuleSpec.is_package" would be a simple additional attribute
(aliasing self.submodule_search_locations is not None), it perpetuates
the artificial (and mostly erroneous) distinction between modules and
packages.

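The check that a ModuleSpec.is_package attribute would have aliased is easy to write where it is needed. describes_package() is a hypothetical helper, sketched against the released importlib API::

    import importlib.util


    def describes_package(spec):
        """Hypothetical helper: the test is_package would have aliased."""
        return spec.submodule_search_locations is not None


    print(describes_package(importlib.util.find_spec("json")))
    print(describes_package(importlib.util.find_spec("json.decoder")))
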
The module spec `Factory Functions`_ could be classmethods on
ModuleSpec. However, that would expose them on *all* modules via
``__spec__``, which has the potential to unnecessarily confuse
non-advanced Python users. The factory functions have a specific use
case: supporting finder authors. See `ModuleSpec Users`_.

Likewise, several other methods could be added to ModuleSpec that expose
the specific uses of module specs by the import machinery:

* create() - a wrapper around Loader.create_module().
* exec(module) - a wrapper around Loader.exec_module().
* load() - an analogue to the deprecated Loader.load_module().

As with the factory functions, exposing these methods via
module.__spec__ is less than desirable. They would end up being an
attractive nuisance, even if only exposed as "private" attributes (as
they were in previous versions of this PEP). If someone finds a need
for these methods later, we can expose them via an appropriate API
(separate from ModuleSpec) at that point, perhaps relative to PEP 406
(import engine).

Conceivably, the load() method could optionally take a list of
modules with which to interact instead of sys.modules. Also, load()
could be leveraged to implement multi-version imports. Both are
interesting ideas, but definitely outside the scope of this proposal.

Others left out:

* Add ModuleSpec.submodules (RO-property) - returns possible submodules
  relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
  any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
  spec's loader.
* Also see [#cleaner_reload_support]_.


References
==========

.. [#ref_files_pep] http://mail.python.org/pipermail/import-sig/2013-August/000658.html

.. [#import_system_docs] http://docs.python.org/3/reference/import.html

.. [#cleaner_reload_support] https://mail.python.org/pipermail/import-sig/2013-September/000735.html

.. [#lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#reload-semantics-fix] http://bugs.python.org/issue19413

.. [#supports_reload] https://mail.python.org/pipermail/python-dev/2013-October/129913.html

.. [#supports_reload_considered_harmful] https://mail.python.org/pipermail/python-dev/2013-October/129971.html

.. [#exec_module_target] https://mail.python.org/pipermail/python-dev/2013-October/129933.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: