PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: Brett Cannon <brett@python.org>, Alyssa Coghlan <ncoghlan@gmail.com>
Discussions-To: import-sig@python.org
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Aug-2013
Python-Version: 3.4
Post-History: 08-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013, 04-Oct-2013
Resolution: https://mail.python.org/pipermail/python-dev/2013-November/130104.html


Abstract
========

This PEP proposes to add a new class to importlib.machinery called
"ModuleSpec".  It will provide all the import-related information used
to load a module and will be available without needing to load the
module first.  Finders will directly provide a module's spec instead of
a loader (which they will continue to provide indirectly).  The import
machinery will be adjusted to take advantage of module specs, including
using them to load modules.


Terms and Concepts
==================

The changes in this proposal are an opportunity to make several
existing terms and concepts more clear, whereas currently they are
(unfortunately) ambiguous.  New concepts are also introduced in this
proposal.  Finally, it's worth explaining a few other existing terms
with which people may not be so familiar.  For the sake of context, here
is a brief summary of all three groups of terms and concepts.  A more
detailed explanation of the import system is found at
[#import_system_docs]_.

name
----

In this proposal, a module's "name" refers to its fully-qualified name,
meaning the fully-qualified name of the module's parent (if any) joined
to the simple name of the module by a period.

finder
------

A "finder" is an object that identifies the loader that the import
system should use to load a module.  Currently this is accomplished by
calling the finder's find_module() method, which returns the loader.

Finders are strictly responsible for providing the loader, which they do
through their find_module() method. The import system then uses that
loader to load the module.

loader
------

A "loader" is an object that is used to load a module during import.
Currently this is done by calling the loader's load_module() method.  A
loader may also provide APIs for getting information about the modules
it can load, as well as about data from sources associated with such a
module.

Right now loaders (via load_module()) are responsible for certain
boilerplate, import-related operations.  These are:

1. Perform some (module-related) validation
2. Create the module object
3. Set import-related attributes on the module
4. "Register" the module to sys.modules
5. Exec the module
6. Clean up in the event of failure while loading the module

This all takes place during the import system's call to
Loader.load_module().

origin
------

This is a new term and concept.  The idea of it exists subtly in the
import system already, but this proposal makes the concept explicit.

"origin" in an import context means the system (or resource within a
system) from which a module originates.  For the purposes of this
proposal, "origin" is also a string which identifies such a resource or
system.  "origin" is applicable to all modules.

For example, the origin for built-in and frozen modules is the
interpreter itself.  The import system already identifies this origin as
"built-in" and "frozen", respectively.  This is demonstrated in the
following module repr: "<module 'sys' (built-in)>".

In fact, the module repr is already a relatively reliable, though
implicit, indicator of a module's origin.  Other modules also indicate
their origin through other means, as described in the entry for
"location".

It is up to the loader to decide on how to interpret and use a module's
origin, if at all.

location
--------

This is a new term.  However the concept already exists clearly in the
import system, as associated with the ``__file__`` and ``__path__``
attributes of modules, as well as the name/term "path" elsewhere.

A "location" is a resource or "place", rather than a system at large,
from which a module is loaded.  It qualifies as an "origin".  Examples
of locations include filesystem paths and URLs.  A location is
identified by the name of the resource, but may not necessarily identify
the system to which the resource pertains.  In such cases the loader
would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be
inferred by the loader just by the module name.  Instead, the loader
must be provided with a string to identify the location, usually by the
finder that generates the loader.  The loader then uses this information
to locate the resource from which it will load the module.  In theory
you could load the module at a given location under various names.

The most common example of locations in the import system are the
files from which source and extension modules are loaded.  For these
modules the location is identified by the string in the ``__file__``
attribute.  Although ``__file__`` isn't particularly accurate for some
modules (e.g. zipped), it is currently the only way that the import
system indicates that a module has a location.

A module that has a location may be called "locatable".

cache
-----

The import system stores compiled modules in the __pycache__ directory
as an optimization.  This module cache that we use today was provided by
:pep:`3147`.  For this proposal, the relevant API for module caching is the
``__cache__`` attribute of modules and the cache_from_source() function
in importlib.util.  Loaders are responsible for putting modules into the
cache (and loading out of the cache).   Currently the cache is only used
for compiled source modules.  However, loaders may take advantage of
the module cache for other kinds of modules.

package
-------

The concept does not change, nor does the term.  However, the
distinction between modules and packages is mostly superficial.
Packages *are* modules.  They simply have a ``__path__`` attribute and
import may add attributes bound to submodules.  The typically perceived
difference is a source of confusion.  This proposal explicitly
de-emphasizes the distinction between packages and modules where it
makes sense to do so.


Motivation
==========

The import system has evolved over the lifetime of Python.  In late 2002
:pep:`302` introduced standardized import hooks via finders and
loaders and sys.meta_path.  The importlib module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by :pep:`302`, as well as of the full import system.  It is now
much easier to understand and extend the import system.  While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful.  So
the sooner we can address any such weaknesses the import system, the
better...and there are a couple we hope to take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system.  It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system.  Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered.  The :pep:`420` (namespace packages)
implementation had to work around this.  The complexity surfaced again
during recent efforts on a separate proposal. [#ref_files_pep]_

The `finder`_ and `loader`_ sections above detail current responsibility
of both.  Notably, loaders are not required to provide any of the
functionality of their load_module() method through other methods.  Thus,
though the import-related information about a module is likely available
without loading the module, it is not otherwise exposed.

Furthermore, the requirements associated with load_module() are
common to all loaders and mostly are implemented in exactly the same
way.  This means every loader has to duplicate the same boilerplate
code.  importlib.util provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities.  The trouble is that this would limit the degree
of customization that load_module() could easily continue to facilitate.

More importantly, While a finder *could* provide the information that
the loader's load_module() would need, it currently has no consistent
way to get it to the loader.  This is a gap between finders and loaders
which this proposal aims to fill.

Finally, when the import system calls a finder's find_module(), the
finder makes use of a variety of information about the module that is
useful outside the context of the method.  Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader.  Popular options for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.

Unfortunately, loaders are not required to be module-specific.  On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details.  This is the same gap as before between finders and
loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see :pep:`420`) added
FileFinder.find_loader() because there was no good way for
find_module() to provide the namespace search locations.

The answer to this gap is a ModuleSpec object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.


Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible.  Though some
functionality and information is moved to the new ModuleSpec type,
their behavior should remain the same.  However, for the sake of clarity
the finder and loader semantics will be explicitly identified.

Here is a high-level summary of the changes described by this PEP.  More
detail is available in later sections.

importlib.machinery.ModuleSpec (new)
------------------------------------

An encapsulation of a module's import-system-related state during import.
See the `ModuleSpec`_ section below for a more detailed description.

* ModuleSpec(name, loader, \*, origin=None, loader_state=None, is_package=None)

Attributes:

* name - a string for the fully-qualified name of the module.
* loader - the loader to use for loading.
* origin - the name of the place from which the module is loaded,
  e.g. "builtin" for built-in modules and the filename for modules
  loaded from source.
* submodule_search_locations - list of strings for where to find
  submodules, if a package (None otherwise).
* loader_state - a container of extra module-specific data for use
  during loading.
* cached (property) - a string for where the compiled module should be
  stored.
* parent (RO-property) - the fully-qualified name of the package to
  which the module belongs as a submodule (or None).
* has_location (RO-property) - a flag indicating whether or not the
  module's "origin" attribute refers to a location.

importlib.util Additions
------------------------

These are ModuleSpec factory functions, meant as a convenience for
finders.  See the `Factory Functions`_ section below for more detail.

* spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None)
  - build a spec from file-oriented information and loader APIs.
* spec_from_loader(name, loader, \*, origin=None, is_package=None)
  - build a spec with missing information filled in by using loader
  APIs.

Other API Additions
-------------------

* importlib.find_spec(name, path=None, target=None) will work exactly
  the same as importlib.find_loader() (which it replaces), but return a
  spec instead of a loader.

For finders:

* importlib.abc.MetaPathFinder.find_spec(name, path, target) and
  importlib.abc.PathEntryFinder.find_spec(name, target) will return a
  module spec to use during import.

For loaders:

* importlib.abc.Loader.exec_module(module) will execute a module in its
  own namespace.  It replaces importlib.abc.Loader.load_module(), taking
  over its module execution functionality.
* importlib.abc.Loader.create_module(spec) (optional) will return the
  module to use for loading.

For modules:

* Module objects will have a new attribute: ``__spec__``.

API Changes
-----------

* InspectLoader.is_package() will become optional.

Deprecations
------------

* importlib.abc.MetaPathFinder.find_module()
* importlib.abc.PathEntryFinder.find_module()
* importlib.abc.PathEntryFinder.find_loader()
* importlib.abc.Loader.load_module()
* importlib.abc.Loader.module_repr()
* importlib.util.set_package()
* importlib.util.set_loader()
* importlib.find_loader()

Removals
--------

These were introduced prior to Python 3.4's release, so they can simply
be removed.

* importlib.abc.Loader.init_module_attrs()
* importlib.util.module_to_load()

Other Changes
-------------

* The import system implementation in importlib will be changed to make
  use of ModuleSpec.
* importlib.reload() will make use of ModuleSpec.
* A module's import-related attributes (other than ``__spec__``) will no
  longer be used directly by the import system during that module's
  import.  However, this does not impact use of those attributes
  (e.g. ``__path__``) when loading other modules (e.g. submodules).
* Import-related attributes should no longer be added to modules
  directly, except by the import system.
* The module type's ``__repr__()`` will be a thin wrapper around a pure
  Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
  name and origin.

Backward-Compatibility
----------------------

* If a finder does not define find_spec(), a spec is derived from
  the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
  find_module().
* Loader.load_module() is used if exec_module() is not defined.

What Will not Change?
---------------------

* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
  the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a module defines it, will have all the
  same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.

Responsibilities
----------------

Here's a quick breakdown of where responsibilities lie after this PEP.

finders:

* create/identify a loader that can load the module.
* create the spec for the module.

loaders:

* create the module (optional).
* execute the module.

ModuleSpec:

* orchestrate module loading
* boilerplate for module loading, including managing sys.modules and
  setting import-related attributes
* create module if loader doesn't
* call loader.exec_module(), passing in the module in which to exec
* contain all the information the loader needs to exec the module
* provide the repr for modules


What Will Existing Finders and Loaders Have to Do Differently?
==============================================================

Immediately?  Nothing.  The status quo will be deprecated, but will
continue working.  However, here are the things that the authors of
finders and loaders should change relative to this PEP:

* Implement find_spec() on finders.
* Implement exec_module() on loaders, if possible.

The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders.  spec_from_loader() and
spec_from_file_location() are both straightforward utilities in this
regard.

For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module().  In some
uncommon cases the loader should also implement create_module().


ModuleSpec Users
================

ModuleSpec objects have 3 distinct target audiences: Python itself,
import hooks, and normal Python users.

Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules.  Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc.  In all cases, the full ModuleSpec API will get used.

Import hooks (finders and loaders) will make use of the spec in specific
ways.  First of all, finders may use the spec factory functions in
importlib.util to create spec objects.  They may also directly adjust
the spec attributes after the spec is created.  Secondly, the finder may
bind additional information to the spec (in finder_extras) for the
loader to consume during module creation/execution.  Finally, loaders
will make use of the attributes on a spec when creating and/or executing
a module.

Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object.  Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any the instance methods.


How Loading Will Work
=====================

Here is an outline of what the import machinery does during loading,
adjusted to take advantage of the module's spec and the new loader API::


    module = None
    if spec.loader is not None and hasattr(spec.loader, 'create_module'):
        module = spec.loader.create_module(spec)
    if module is None:
        module = ModuleType(spec.name)
    # The import-related module attributes get set here:
    _init_module_attrs(spec, module)

    if spec.loader is None and spec.submodule_search_locations is not None:
        # Namespace package
        sys.modules[spec.name] = module
    elif not hasattr(spec.loader, 'exec_module'):
        spec.loader.load_module(spec.name)
        # __loader__ and __package__ would be explicitly set here for
        # backwards-compatibility.
    else:
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except BaseException:
            try:
                del sys.modules[spec.name]
            except KeyError:
                pass
            raise
    module_to_return = sys.modules[spec.name]

These steps are exactly what Loader.load_module() is already
expected to do.  Loaders will thus be simplified since they will only
need to implement exec_module().

Note that we must return the module from sys.modules.  During loading
the module may have replaced itself in sys.modules.  Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it.  However, in the replacement case we do not worry about setting the
import-related module attributes on the object.  The module writer is on
their own if they are doing this.


How Reloading Will Work
=======================

Here is the corresponding outline for reload()::

    _RELOADING = {}

    def reload(module):
        try:
            name = module.__spec__.name
        except AttributeError:
            name = module.__name__
        spec = find_spec(name, target=module)

        if sys.modules.get(name) is not module:
            raise ImportError
        if spec in _RELOADING:
            return _RELOADING[name]
        _RELOADING[name] = module
        try:
            if spec.loader is None:
                # Namespace loader
                _init_module_attrs(spec, module)
                return module
            if spec.parent and spec.parent not in sys.modules:
                raise ImportError

            _init_module_attrs(spec, module)
            # Ignoring backwards-compatibility call to load_module()
            # for simplicity.
            spec.loader.exec_module(module)
            return sys.modules[name]
        finally:
            del _RELOADING[name]

A key point here is the switch to Loader.exec_module() means that
loaders will no longer have an easy way to know at execution time if it
is a reload or not.  Before this proposal, they could simply check to
see if the module was already in sys.modules.  Now, by the time
exec_module() is called during load (not reload) the import machinery
would already have placed the module in sys.modules.  This is part of
the reason why find_spec() has
`the "target" parameter <The "target" parameter of find_spec()>`_.

The semantics of reload will remain essentially the same as they exist
already [#reload-semantics-fix]_.  The impact of this PEP on some kinds
of lazy loading modules was a point of discussion. [#lazy_import_concerns]_


ModuleSpec
==========

Attributes
----------

Each of the following names is an attribute on ModuleSpec objects.  A
value of None indicates "not set".  This contrasts with module
objects where the attribute simply doesn't exist.  Most of the
attributes correspond to the import-related attributes of modules.  Here
is the mapping.  The reverse of this mapping describes how the import
machinery sets the module attributes right before calling exec_module().

========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
parent                     __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loader_state                \-
has_location                \-
========================== ==============

| \* Set on the module only if spec.has_location is true.
| \*\* Set on the module only if the spec attribute is not None.

While parent and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete.  This allows for unusual cases where directly
modifying the spec is the best option.  However, typical use should not
involve changing the state of a module's spec.

**origin**

"origin" is a string for the name of the place from which the module
originates.  See `origin`_ above.  Aside from the informational value,
it is also used in the module's repr.  In the case of a spec where
"has_location" is true, ``__file__`` is set to the value of "origin".
For built-in modules "origin" would be set to "built-in".

**has_location**

As explained in the `location`_ section above, many modules are
"locatable", meaning there is a corresponding resource from which the
module will be loaded and that resource can be described by a string.
In contrast, non-locatable modules can't be loaded in this fashion, e.g.
builtin modules and modules dynamically created in code.  For these, the
name is the only way to access them, so they have an "origin" but not a
"location".

"has_location" is true if the module is locatable.  In that case the
spec's origin is used as the location and ``__file__`` is set to
spec.origin.  If additional location information is required (e.g.
zipimport), that information may be stored in spec.loader_state.

"has_location" may be implied from the existence of a load_data() method
on the loader.

Incidentally, not all locatable modules will be cache-able, but most
will.

**submodule_search_locations**

The list of location strings, typically directory paths, in which to
search for submodules.  If the module is a package this will be set to
a list (even an empty one).  Otherwise it is None.

The name of the corresponding module attribute, ``__path__``, is
relatively ambiguous.  Instead of mirroring it, we use a more explicit
attribute name that makes the purpose clear.

**loader_state**

A finder may set loader_state to any value to provide additional
data for the loader to use during loading.  A value of None is the
default and indicates that there is no additional data.  Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.

For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.

loader_state is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.

Factory Functions
-----------------

**spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None)**

Build a spec from file-oriented information and loader APIs.

* "origin" will be set to the location.
* "has_location" will be set to True.
* "cached" will be set to the result of calling cache_from_source().

* "origin" can be deduced from loader.get_filename() (if "location" is
  not passed in.
* "loader" can be deduced from suffix if the location is a filename.
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.

**spec_from_loader(name, loader, \*, origin=None, is_package=None)**

Build a spec with missing information filled in by using loader APIs.

* "has_location" can be deduced from loader.get_data.
* "origin" can be deduced from loader.get_filename().
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.

Backward Compatibility
----------------------

ModuleSpec doesn't have any.  This would be a different story if
Finder.find_module() were to return a module spec instead of loader.
In that case, specs would have to act like the loader that would have
been returned instead.  Doing so would be relatively simple, but is an
unnecessary complication.  It was part of earlier versions of this PEP.

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loader_state or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().  The same points apply to duck-typing.


Existing Types
==============

Module Objects
--------------

Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.

A module's spec will not be kept in sync with the corresponding
import-related attributes.  Though they may differ, in practice they
will typically be the same.

One notable exception is that case where a module is run as a script by
using the ``-m`` flag.  In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
``__main__``.

A module's spec is not guaranteed to be identical between two modules
with the same name.  Likewise there is no guarantee that successive
calls to importlib.find_spec() will return the same object or even an
equivalent object, though at least the latter is likely.

Finders
-------

Finders are still responsible for identifying, and typically creating,
the loader that should be used to load a module.  That loader will
now be stored in the module spec returned by find_spec() rather
than returned directly.  As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.

**MetaPathFinder.find_spec(name, path=None, target=None)**

**PathEntryFinder.find_spec(name, target=None)**

Finders must return ModuleSpec objects when find_spec() is
called.  This new method replaces find_module() and
find_loader() (in the PathEntryFinder case).  If a loader does
not have find_spec(), find_module() and find_loader() are
used instead, for backward-compatibility.

Adding yet another similar method to loaders is a case of practicality.
find_module() could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering PathEntryFinder.find_loader() was just
added in Python 3.3.  However, the extra complexity and a
less-than-explicit method name aren't worth it.

The "target" parameter of find_spec()
-------------------------------------

A call to find_spec() may optionally include a "target" argument.  This
is the module object that will be used subsequently as the target of
loading.  During normal import (and by default) "target" is None,
meaning the target module has yet to be created.  During reloading the
module passed in to reload() is passed through to find_spec() as the
target.  This argument allows the finder to build the module spec with
more information than is otherwise available.  Doing so is particularly
relevant in identifying the loader to use.

Through find_spec() the finder will always identify the loader it
will return in the spec (or return None).  At the point the loader is
identified, the finder should also decide whether or not the loader
supports loading into the target module, in the case that "target" is
passed in.  This decision may entail consulting with the loader.

If the finder determines that the loader does not support loading into
the target module, it should either find another loader or raise
ImportError (completely stopping import of the module).  This
determination is especially important during reload since, as noted in
`How Reloading Will Work`_, loaders will no longer be able to trivially
identify a reload situation on their own.

Two alternatives were presented to the "target" parameter:
Loader.supports_reload() and adding "target" to Loader.exec_module()
instead of find_spec().  supports_reload() was the initial approach to
the reload situation. [#supports_reload]_  However, there was some
opposition to the loader-specific, reload-centric approach.
[#supports_reload_considered_harmful]_

As to "target" on exec_module(), the loader may need other information
from the target module (or spec) during reload, more than just "does
this loader support reloading this module", that is no longer available
with the move away from load_module().  A proposal on the table was to
add something like "target" to exec_module().  [#exec_module_target]_
However, putting "target" on find_spec() instead is more in line with
the goals of this PEP.  Furthermore, it obviates the need for
supports_reload().

Namespace Packages
------------------

Currently a path entry finder may return (None, portions) from
find_loader() to indicate it found part of a possible namespace
package.  To achieve the same effect, find_spec() must return a spec
with "loader" set to None (a.k.a. not set) and with
submodule_search_locations set to the same portions as would have been
provided by find_loader().  It's up to PathFinder how to handle such
specs.

Loaders
-------

**Loader.exec_module(module)**

Loaders will have a new method, exec_module().  Its only job
is to "exec" the module and consequently populate the module's
namespace.  It is not responsible for creating or preparing the module
object, nor for any cleanup afterward.  It has no return value.
exec_module() will be used during both loading and reloading.

exec_module() should properly handle the case where it is called more
than once.  For some kinds of modules this may mean raising ImportError
every time after the first time the method is called.  This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.

**Loader.create_module(spec)**

Loaders may also implement create_module() that will return a
new module to exec.  It may return None to indicate that the default
module creation code should be used.  One use case, though atypical, for
create_module() is to provide a module that is a subclass of the builtin
module type.  Most loaders will not need to implement create_module(),

create_module() should properly handle the case where it is called more
than once for the same spec/module.  This may include returning None or
raising ImportError.

.. note::

   exec_module() and create_module() should not set any import-related
   module attributes.  The fact that load_module() does is a design flaw
   that this proposal aims to correct.

Other changes:

:pep:`420` introduced the optional module_repr() loader method to limit
the amount of special-casing in the module type's ``__repr__()``.  Since
this method is part of ModuleSpec, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

Loader.init_module_attr() method, added prior to Python 3.4's
release, will be removed in favor of the same method on ModuleSpec.

However, InspectLoader.is_package() will not be deprecated even
though the same information is found on ModuleSpec.  ModuleSpec
can use it to populate its own is_package if that information is
not otherwise available.  Still, it will be made optional.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.


Other Changes
=============

* The various finders and loaders provided by importlib will be
  updated to comply with this proposal.
* Any other implementations of or dependencies on the import-related APIs
  (particularly finders and loaders) in the stdlib will be likewise
  adjusted to this PEP.  While they should continue to work, any such
  changes that get missed should be considered bugs for the Python 3.4.x
  series.
* The spec for the ``__main__`` module will reflect how the interpreter
  was started.  For instance, with ``-m`` the spec's name will be that
  of the module used, while ``__main__.__name__`` will still be
  "__main__".
* We will add importlib.find_spec() to mirror importlib.find_loader()
  (which becomes deprecated).
* importlib.reload() is changed to use ModuleSpec.
* importlib.reload() will now make use of the per-module import lock.


Reference Implementation
========================

A reference implementation is available at
http://bugs.python.org/issue18864.

Implementation Notes
--------------------

\* The implementation of this PEP needs to be cognizant of its impact on
pkgutil (and setuptools).  pkgutil has some generic function-based
extensions to :pep:`302` which may break if importlib starts wrapping
loaders without the tools' knowledge.

\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
inspect.

For instance, pickle should be updated in the ``__main__`` case to look
at ``module.__spec__.name``.


Rejected Additions to the PEP
=============================

There were a few proposed additions to this proposal that did not fit
well enough into its scope.

There is no "PathModuleSpec" subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations.  While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.

While "ModuleSpec.is_package" would be a simple additional attribute
(aliasing self.submodule_search_locations is not None), it perpetuates
the artificial (and mostly erroneous) distinction between modules and
packages.

The module spec `Factory Functions`_ could be classmethods on
ModuleSpec.  However that would expose them on *all* modules via
``__spec__``, which has the potential to unnecessarily confuse
non-advanced Python users.  The factory functions have a specific use
case, to support finder authors.  See `ModuleSpec Users`_.

Likewise, several other methods could be added to ModuleSpec that expose
the specific uses of module specs by the import machinery:

* create() - a wrapper around Loader.create_module().
* exec(module) - a wrapper around Loader.exec_module().
* load() - an analogue to the deprecated Loader.load_module().

As with the factory functions, exposing these methods via
module.__spec__ is less than desirable.  They would end up being an
attractive nuisance, even if only exposed as "private" attributes (as
they were in previous versions of this PEP).  If someone finds a need
for these methods later, we can expose the via an appropriate API
(separate from ModuleSpec) at that point, perhaps relative to :pep:`406`
(import engine).

Conceivably, the load() method could optionally take a list of
modules with which to interact instead of sys.modules.  Also, load()
could be leveraged to implement multi-version imports.  Both are
interesting ideas, but definitely outside the scope of this proposal.

Others left out:

* Add ModuleSpec.submodules (RO-property) - returns possible submodules
  relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.module, if
  any.
* Add ModuleSpec.data - a descriptor that wraps the data API of the
  spec's loader.
* Also see [#cleaner_reload_support]_.


References
==========

.. [#ref_files_pep]
   https://mail.python.org/pipermail/import-sig/2013-August/000658.html

.. [#import_system_docs] http://docs.python.org/3/reference/import.html

.. [#cleaner_reload_support]
   https://mail.python.org/pipermail/import-sig/2013-September/000735.html

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#reload-semantics-fix] http://bugs.python.org/issue19413

.. [#supports_reload]
   https://mail.python.org/pipermail/python-dev/2013-October/129913.html
.. [#supports_reload_considered_harmful]
   https://mail.python.org/pipermail/python-dev/2013-October/129971.html

.. [#exec_module_target]
   https://mail.python.org/pipermail/python-dev/2013-October/129933.html

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: