* reST-ified

* Remove long dead links
* Update reference numbers
This commit is contained in:
Barry Warsaw 2012-05-03 11:47:57 -04:00
parent 7ef0ca1bd6
commit ca0189a132
1 changed files with 409 additions and 429 deletions

View File

@ -6,279 +6,267 @@ Author: Just van Rossum <just@letterror.com>,
Paul Moore <gustav@morpheus.demon.co.uk>
Status: Final
Type: Standards Track
Content-Type: text/plain
Content-Type: text/x-rst
Created: 19-Dec-2002
Python-Version: 2.3
Post-History: 19-Dec-2002
Abstract
========
This PEP proposes to add a new set of import hooks that offer better
customization of the Python import mechanism. Contrary to the
current __import__ hook, a new-style hook can be injected into the
existing scheme, allowing for a finer grained control of how modules
are found and how they are loaded.
This PEP proposes to add a new set of import hooks that offer better
customization of the Python import mechanism. Contrary to the current
``__import__`` hook, a new-style hook can be injected into the existing
scheme, allowing for a finer grained control of how modules are found and how
they are loaded.
Motivation
==========
The only way to customize the import mechanism is currently to
override the built-in __import__ function. However, overriding
__import__ has many problems. To begin with:
The only way to customize the import mechanism is currently to override the
built-in ``__import__`` function. However, overriding ``__import__`` has many
problems. To begin with:
- An __import__ replacement needs to *fully* reimplement the entire
import mechanism, or call the original __import__ before or after
the custom code.
* An ``__import__`` replacement needs to *fully* reimplement the entire
import mechanism, or call the original ``__import__`` before or after the
custom code.
- It has very complex semantics and responsibilities.
* It has very complex semantics and responsibilities.
- __import__ gets called even for modules that are already in
sys.modules, which is almost never what you want, unless you're
writing some sort of monitoring tool.
* ``__import__`` gets called even for modules that are already in
``sys.modules``, which is almost never what you want, unless you're writing
some sort of monitoring tool.
The situation gets worse when you need to extend the import
mechanism from C: it's currently impossible, apart from hacking
Python's import.c or reimplementing much of import.c from scratch.
The situation gets worse when you need to extend the import mechanism from C:
it's currently impossible, apart from hacking Python's ``import.c`` or
reimplementing much of ``import.c`` from scratch.
There is a fairly long history of tools written in Python that allow
extending the import mechanism in various way, based on the
__import__ hook. The Standard Library includes two such tools:
ihooks.py (by GvR) and imputil.py (Greg Stein), but perhaps the most
famous is iu.py by Gordon McMillan, available as part of his
Installer [1] package. Their usefulness is somewhat limited because
they are written in Python; bootstrapping issues need to worked
around as you can't load the module containing the hook with the
hook itself. So if you want the entire Standard Library to be
loadable from an import hook, the hook must be written in C.
There is a fairly long history of tools written in Python that allow extending
the import mechanism in various way, based on the ``__import__`` hook. The
Standard Library includes two such tools: ``ihooks.py`` (by GvR) and
``imputil.py`` [1]_ (Greg Stein), but perhaps the most famous is ``iu.py`` by
Gordon McMillan, available as part of his Installer package. Their usefulness
is somewhat limited because they are written in Python; bootstrapping issues
need to worked around as you can't load the module containing the hook with
the hook itself. So if you want the entire Standard Library to be loadable
from an import hook, the hook must be written in C.
Use cases
=========
This section lists several existing applications that depend on
import hooks. Among these, a lot of duplicate work was done that
could have been saved if there had been a more flexible import hook
at the time. This PEP should make life a lot easier for similar
projects in the future.
This section lists several existing applications that depend on import hooks.
Among these, a lot of duplicate work was done that could have been saved if
there had been a more flexible import hook at the time. This PEP should make
life a lot easier for similar projects in the future.
Extending the import mechanism is needed when you want to load
modules that are stored in a non-standard way. Examples include
modules that are bundled together in an archive; byte code that is
not stored in a pyc formatted file; modules that are loaded from a
database over a network.
Extending the import mechanism is needed when you want to load modules that
are stored in a non-standard way. Examples include modules that are bundled
together in an archive; byte code that is not stored in a ``pyc`` formatted
file; modules that are loaded from a database over a network.
The work on this PEP was partly triggered by the implementation of
PEP 273 [2], which adds imports from Zip archives as a built-in
feature to Python. While the PEP itself was widely accepted as a
must-have feature, the implementation left a few things to desire.
For one thing it went through great lengths to integrate itself with
import.c, adding lots of code that was either specific for Zip file
imports or *not* specific to Zip imports, yet was not generally
useful (or even desirable) either. Yet the PEP 273 implementation
can hardly be blamed for this: it is simply extremely hard to do,
given the current state of import.c.
The work on this PEP was partly triggered by the implementation of PEP 273,
which adds imports from Zip archives as a built-in feature to Python. While
the PEP itself was widely accepted as a must-have feature, the implementation
left a few things to desire. For one thing it went through great lengths to
integrate itself with ``import.c``, adding lots of code that was either
specific for Zip file imports or *not* specific to Zip imports, yet was not
generally useful (or even desirable) either. Yet the PEP 273 implementation
can hardly be blamed for this: it is simply extremely hard to do, given the
current state of ``import.c``.
Packaging applications for end users is a typical use case for
import hooks, if not *the* typical use case. Distributing lots of
source or pyc files around is not always appropriate (let alone a
separate Python installation), so there is a frequent desire to
package all needed modules in a single file. So frequent in fact
that multiple solutions have been implemented over the years.
Packaging applications for end users is a typical use case for import hooks,
if not *the* typical use case. Distributing lots of source or ``pyc`` files
around is not always appropriate (let alone a separate Python installation),
so there is a frequent desire to package all needed modules in a single file.
So frequent in fact that multiple solutions have been implemented over the
years.
The oldest one is included with the Python source code: Freeze [3].
It puts marshalled byte code into static objects in C source code.
Freeze's "import hook" is hard wired into import.c, and has a couple
of issues. Later solutions include Fredrik Lundh's Squeeze [4],
Gordon McMillan's Installer [1] and Thomas Heller's py2exe [5].
MacPython ships with a tool called BuildApplication.
The oldest one is included with the Python source code: Freeze [2]_. It puts
marshalled byte code into static objects in C source code. Freeze's "import
hook" is hard wired into ``import.c``, and has a couple of issues. Later
solutions include Fredrik Lundh's Squeeze, Gordon McMillan's Installer, and
Thomas Heller's py2exe [3]_. MacPython ships with a tool called
``BuildApplication``.
Squeeze, Installer and py2exe use an __import__ based scheme (py2exe
currently uses Installer's iu.py, Squeeze used ihooks.py), MacPython
has two Mac-specific import hooks hard wired into import.c, that are
similar to the Freeze hook. The hooks proposed in this PEP enables
us (at least in theory; it's not a short term goal) to get rid of
the hard coded hooks in import.c, and would allow the
__import__-based tools to get rid of most of their import.c
emulation code.
Squeeze, Installer and py2exe use an ``__import__`` based scheme (py2exe
currently uses Installer's ``iu.py``, Squeeze used ``ihooks.py``), MacPython
has two Mac-specific import hooks hard wired into ``import.c``, that are
similar to the Freeze hook. The hooks proposed in this PEP enables us (at
least in theory; it's not a short term goal) to get rid of the hard coded
hooks in ``import.c``, and would allow the ``__import__``-based tools to get
rid of most of their ``import.c`` emulation code.
Before work on the design and implementation of this PEP was
started, a new BuildApplication-like tool for MacOS X prompted one
of the authors of this PEP (JvR) to expose the table of frozen
modules to Python, in the imp module. The main reason was to be
able to use the freeze import hook (avoiding fancy __import__
support), yet to also be able to supply a set of modules at
runtime. This resulted in sf patch #642578 [6], which was
mysteriously accepted (mostly because nobody seemed to care either
way ;-). Yet it is completely superfluous when this PEP gets
accepted, as it offers a much nicer and general way to do the same
thing.
Before work on the design and implementation of this PEP was started, a new
``BuildApplication``-like tool for Mac OS X prompted one of the authors of
this PEP (JvR) to expose the table of frozen modules to Python, in the ``imp``
module. The main reason was to be able to use the freeze import hook
(avoiding fancy ``__import__`` support), yet to also be able to supply a set
of modules at runtime. This resulted in issue #642578 [4]_, which was
mysteriously accepted (mostly because nobody seemed to care either way ;-).
Yet it is completely superfluous when this PEP gets accepted, as it offers a
much nicer and general way to do the same thing.
Rationale
=========
While experimenting with alternative implementation ideas to get
built-in Zip import, it was discovered that achieving this is
possible with only a fairly small amount of changes to import.c.
This allowed to factor out the Zip-specific stuff into a new source
file, while at the same time creating a *general* new import hook
scheme: the one you're reading about now.
While experimenting with alternative implementation ideas to get built-in Zip
import, it was discovered that achieving this is possible with only a fairly
small amount of changes to ``import.c``. This allowed to factor out the
Zip-specific stuff into a new source file, while at the same time creating a
*general* new import hook scheme: the one you're reading about now.
An earlier design allowed non-string objects on sys.path. Such an
object would have the necessary methods to handle an import. This
has two disadvantages: 1) it breaks code that assumes all items on
sys.path are strings; 2) it is not compatible with the PYTHONPATH
environment variable. The latter is directly needed for Zip
imports. A compromise came from Jython: allow string *subclasses*
on sys.path, which would then act as importer objects. This avoids
some breakage, and seems to work well for Jython (where it is used
to load modules from .jar files), but it was perceived as an "ugly
hack".
An earlier design allowed non-string objects on ``sys.path``. Such an object
would have the necessary methods to handle an import. This has two
disadvantages: 1) it breaks code that assumes all items on ``sys.path`` are
strings; 2) it is not compatible with the ``PYTHONPATH`` environment variable.
The latter is directly needed for Zip imports. A compromise came from Jython:
allow string *subclasses* on ``sys.path``, which would then act as importer
objects. This avoids some breakage, and seems to work well for Jython (where
it is used to load modules from ``.jar`` files), but it was perceived as an
"ugly hack".
This lead to a more elaborate scheme, (mostly copied from McMillan's
iu.py) in which each in a list of candidates is asked whether it can
handle the sys.path item, until one is found that can. This list of
candidates is a new object in the sys module: sys.path_hooks.
This lead to a more elaborate scheme, (mostly copied from McMillan's
``iu.py``) in which each in a list of candidates is asked whether it can
handle the ``sys.path`` item, until one is found that can. This list of
candidates is a new object in the ``sys`` module: ``sys.path_hooks``.
Traversing sys.path_hooks for each path item for each new import can
be expensive, so the results are cached in another new object in the
sys module: sys.path_importer_cache. It maps sys.path entries to
importer objects.
Traversing ``sys.path_hooks`` for each path item for each new import can be
expensive, so the results are cached in another new object in the ``sys``
module: ``sys.path_importer_cache``. It maps ``sys.path`` entries to importer
objects.
To minimize the impact on import.c as well as to avoid adding extra
overhead, it was chosen to not add an explicit hook and importer
object for the existing file system import logic (as iu.py has), but
to simply fall back to the built-in logic if no hook on
sys.path_hooks could handle the path item. If this is the case, a
None value is stored in sys.path_importer_cache, again to avoid
repeated lookups. (Later we can go further and add a real importer
object for the built-in mechanism, for now, the None fallback scheme
should suffice.)
To minimize the impact on ``import.c`` as well as to avoid adding extra
overhead, it was chosen to not add an explicit hook and importer object for
the existing file system import logic (as ``iu.py`` has), but to simply fall
back to the built-in logic if no hook on ``sys.path_hooks`` could handle the
path item. If this is the case, a ``None`` value is stored in
``sys.path_importer_cache``, again to avoid repeated lookups. (Later we can
go further and add a real importer object for the built-in mechanism, for now,
the ``None`` fallback scheme should suffice.)
A question was raised: what about importers that don't need *any* entry on
``sys.path``? (Built-in and frozen modules fall into that category.) Again,
Gordon McMillan to the rescue: ``iu.py`` contains a thing he calls the
*metapath*. In this PEP's implementation, it's a list of importer objects
that is traversed *before* ``sys.path``. This list is yet another new object
in the ``sys`` module: ``sys.meta_path``. Currently, this list is empty by
default, and frozen and built-in module imports are done after traversing
``sys.meta_path``, but still before ``sys.path``.
A question was raised: what about importers that don't need *any*
entry on sys.path? (Built-in and frozen modules fall into that
category.) Again, Gordon McMillan to the rescue: iu.py contains a
thing he calls the "metapath". In this PEP's implementation, it's a
list of importer objects that is traversed *before* sys.path. This
list is yet another new object in the sys.module: sys.meta_path.
Currently, this list is empty by default, and frozen and built-in
module imports are done after traversing sys.meta_path, but still
before sys.path.
Specification part 1: The Importer Protocol
===========================================
This PEP introduces a new protocol: the "Importer Protocol". It is
important to understand the context in which the protocol operates,
so here is a brief overview of the outer shells of the import
mechanism.
This PEP introduces a new protocol: the "Importer Protocol". It is important
to understand the context in which the protocol operates, so here is a brief
overview of the outer shells of the import mechanism.
When an import statement is encountered, the interpreter looks up
the __import__ function in the built-in name space. __import__ is
then called with four arguments, amongst which are the name of the
module being imported (may be a dotted name) and a reference to the
current global namespace.
When an import statement is encountered, the interpreter looks up the
``__import__`` function in the built-in name space. ``__import__`` is then
called with four arguments, amongst which are the name of the module being
imported (may be a dotted name) and a reference to the current global
namespace.
The built-in __import__ function (known as PyImport_ImportModuleEx
in import.c) will then check to see whether the module doing the
import is a package or a submodule of a package. If it is indeed a
(submodule of a) package, it first tries to do the import relative
to the package (the parent package for a submodule). For example if
a package named "spam" does "import eggs", it will first look for a
module named "spam.eggs". If that fails, the import continues as an
absolute import: it will look for a module named "eggs". Dotted
name imports work pretty much the same: if package "spam" does
"import eggs.bacon" (and "spam.eggs" exists and is itself a
package), "spam.eggs.bacon" is tried. If that fails "eggs.bacon" is
tried. (There are more subtleties that are not described here, but
these are not relevant for implementers of the Importer Protocol.)
The built-in ``__import__`` function (known as ``PyImport_ImportModuleEx()``
in ``import.c``) will then check to see whether the module doing the import is
a package or a submodule of a package. If it is indeed a (submodule of a)
package, it first tries to do the import relative to the package (the parent
package for a submodule). For example if a package named "spam" does "import
eggs", it will first look for a module named "spam.eggs". If that fails, the
import continues as an absolute import: it will look for a module named
"eggs". Dotted name imports work pretty much the same: if package "spam" does
"import eggs.bacon" (and "spam.eggs" exists and is itself a package),
"spam.eggs.bacon" is tried. If that fails "eggs.bacon" is tried. (There are
more subtleties that are not described here, but these are not relevant for
implementers of the Importer Protocol.)
Deeper down in the mechanism, a dotted name import is split up by
its components. For "import spam.ham", first an "import spam" is
done, and only when that succeeds is "ham" imported as a submodule
of "spam".
Deeper down in the mechanism, a dotted name import is split up by its
components. For "import spam.ham", first an "import spam" is done, and only
when that succeeds is "ham" imported as a submodule of "spam".
The Importer Protocol operates at this level of *individual*
imports. By the time an importer gets a request for "spam.ham",
module "spam" has already been imported.
The Importer Protocol operates at this level of *individual* imports. By the
time an importer gets a request for "spam.ham", module "spam" has already been
imported.
The protocol involves two objects: a finder and a loader. A
finder object has a single method:
The protocol involves two objects: a *finder* and a *loader*. A finder object
has a single method::
finder.find_module(fullname, path=None)
finder.find_module(fullname, path=None)
This method will be called with the fully qualified name of the
module. If the finder is installed on sys.meta_path, it will
receive a second argument, which is None for a top-level module, or
package.__path__ for submodules or subpackages[7]. It should return
a loader object if the module was found, or None if it wasn't. If
find_module() raises an exception, it will be propagated to the
caller, aborting the import.
This method will be called with the fully qualified name of the module. If
the finder is installed on ``sys.meta_path``, it will receive a second
argument, which is ``None`` for a top-level module, or ``package.__path__``
for submodules or subpackages [5]_. It should return a loader object if the
module was found, or ``None`` if it wasn't. If ``find_module()`` raises an
exception, it will be propagated to the caller, aborting the import.
A loader object also has one method:
A loader object also has one method::
loader.load_module(fullname)
loader.load_module(fullname)
This method returns the loaded module or raises an exception,
preferably ImportError if an existing exception is not being
propagated. If load_module() is asked to load a module that it
cannot, ImportError is to be raised.
This method returns the loaded module or raises an exception, preferably
``ImportError`` if an existing exception is not being propagated. If
``load_module()`` is asked to load a module that it cannot, ``ImportError`` is
to be raised.
In many cases the finder and loader can be one and the same
object: finder.find_module() would just return self.
In many cases the finder and loader can be one and the same object:
``finder.find_module()`` would just return ``self``.
The 'fullname' argument of both methods is the fully qualified
module name, for example "spam.eggs.ham". As explained above, when
finder.find_module("spam.eggs.ham") is called, "spam.eggs" has
already been imported and added to sys.modules. However, the
find_module() method isn't necessarily always called during an
actual import: meta tools that analyze import dependencies (such as
freeze, Installer or py2exe) don't actually load modules, so an
finder shouldn't *depend* on the parent package being available in
sys.modules.
The ``fullname`` argument of both methods is the fully qualified module name,
for example "spam.eggs.ham". As explained above, when
``finder.find_module("spam.eggs.ham")`` is called, "spam.eggs" has already
been imported and added to ``sys.modules``. However, the ``find_module()``
method isn't necessarily always called during an actual import: meta tools
that analyze import dependencies (such as freeze, Installer or py2exe) don't
actually load modules, so a finder shouldn't *depend* on the parent package
being available in ``sys.modules``.
The load_module() method has a few responsibilities that it must
fulfill *before* it runs any code:
The ``load_module()`` method has a few responsibilities that it must fulfill
*before* it runs any code:
- If there is an existing module object named 'fullname' in
sys.modules, the loader must use that existing module.
(Otherwise, the reload() builtin will not work correctly.)
If a module named 'fullname' does not exist in sys.modules,
the loader must create a new module object and add it to
sys.modules.
* If there is an existing module object named 'fullname' in ``sys.modules``,
the loader must use that existing module. (Otherwise, the ``reload()``
builtin will not work correctly.) If a module named 'fullname' does not
exist in ``sys.modules``, the loader must create a new module object and
add it to ``sys.modules``.
Note that the module object *must* be in sys.modules before the
loader executes the module code. This is crucial because the
module code may (directly or indirectly) import itself; adding
it to sys.modules beforehand prevents unbounded recursion in the
worst case and multiple loading in the best.
Note that the module object *must* be in ``sys.modules`` before the loader
executes the module code. This is crucial because the module code may
(directly or indirectly) import itself; adding it to ``sys.modules``
beforehand prevents unbounded recursion in the worst case and multiple
loading in the best.
If the load fails, the loader needs to remove any module it may have
inserted into sys.modules. If the module was already in
sys.modules then the loader should leave it alone.
If the load fails, the loader needs to remove any module it may have
inserted into ``sys.modules``. If the module was already in ``sys.modules``
then the loader should leave it alone.
- The __file__ attribute must be set. This must be a string, but it
may be a dummy value, for example "<frozen>". The privilege of
not having a __file__ attribute at all is reserved for built-in
modules.
* The ``__file__`` attribute must be set. This must be a string, but it may
be a dummy value, for example "<frozen>". The privilege of not having a
``__file__`` attribute at all is reserved for built-in modules.
- The __name__ attribute must be set. If one uses
imp.new_module() then the attribute is set automatically.
* The ``__name__`` attribute must be set. If one uses ``imp.new_module()``
then the attribute is set automatically.
- If it's a package, the __path__ variable must be set. This must
be a list, but may be empty if __path__ has no further
significance to the importer (more on this later).
* If it's a package, the ``__path__`` variable must be set. This must be a
list, but may be empty if ``__path__`` has no further significance to the
importer (more on this later).
- The __loader__ attribute must be set to the loader object.
This is mostly for introspection and reloading, but can be used
for importer-specific extras, for example getting data associated
with an importer.
* The ``__loader__`` attribute must be set to the loader object. This is
mostly for introspection and reloading, but can be used for
importer-specific extras, for example getting data associated with an
importer.
- The __package__ attribute [10] must be set.
* The ``__package__`` attribute [8]_ must be set.
If the module is a Python module (as opposed to a built-in module or
a dynamically loaded extension), it should execute the module's code
in the module's global name space (module.__dict__).
If the module is a Python module (as opposed to a built-in module or a
dynamically loaded extension), it should execute the module's code in the
module's global name space (``module.__dict__``).
Here is a minimal pattern for a load_module() method:
Here is a minimal pattern for a ``load_module()`` method::
# Consider using importlib.util.module_for_loader() to handle
# most of these details for you.
@ -298,294 +286,286 @@ Specification part 1: The Importer Protocol
Specification part 2: Registering Hooks
=======================================
There are two types of import hooks: Meta hooks and Path hooks.
Meta hooks are called at the start of import processing, before any
other import processing (so that meta hooks can override sys.path
processing, or frozen modules, or even built-in modules). To
register a meta hook, simply add the finder object to
sys.meta_path (the list of registered meta hooks).
There are two types of import hooks: *Meta hooks* and *Path hooks*. Meta
hooks are called at the start of import processing, before any other import
processing (so that meta hooks can override ``sys.path`` processing, frozen
modules, or even built-in modules). To register a meta hook, simply add the
finder object to ``sys.meta_path`` (the list of registered meta hooks).
Path hooks are called as part of sys.path (or package.__path__)
processing, at the point where their associated path item is
encountered. A path hook is registered by adding an importer
factory to sys.path_hooks.
Path hooks are called as part of ``sys.path`` (or ``package.__path__``)
processing, at the point where their associated path item is encountered. A
path hook is registered by adding an importer factory to ``sys.path_hooks``.
sys.path_hooks is a list of callables, which will be checked in
sequence to determine if they can handle a given path item. The
callable is called with one argument, the path item. The callable
must raise ImportError if it is unable to handle the path item, and
return an importer object if it can handle the path item. Note
that if the callable returns an importer object for a specific
sys.path entry, the builtin import machinery will not be invoked
to handle that entry any longer, even if the importer object later
fails to find a specific module. The callable is typically the
class of the import hook, and hence the class __init__ method is
called. (This is also the reason why it should raise ImportError:
an __init__ method can't return anything. This would be possible
with a __new__ method in a new style class, but we don't want to
require anything about how a hook is implemented.)
``sys.path_hooks`` is a list of callables, which will be checked in sequence
to determine if they can handle a given path item. The callable is called
with one argument, the path item. The callable must raise ``ImportError`` if
it is unable to handle the path item, and return an importer object if it can
handle the path item. Note that if the callable returns an importer object
for a specific ``sys.path`` entry, the builtin import machinery will not be
invoked to handle that entry any longer, even if the importer object later
fails to find a specific module. The callable is typically the class of the
import hook, and hence the class ``__init__()`` method is called. (This is
also the reason why it should raise ``ImportError``: an ``__init__()`` method
can't return anything. This would be possible with a ``__new__()`` method in
a new style class, but we don't want to require anything about how a hook is
implemented.)
The results of path hook checks are cached in
sys.path_importer_cache, which is a dictionary mapping path entries
to importer objects. The cache is checked before sys.path_hooks is
scanned. If it is necessary to force a rescan of sys.path_hooks, it
is possible to manually clear all or part of
sys.path_importer_cache.
The results of path hook checks are cached in ``sys.path_importer_cache``,
which is a dictionary mapping path entries to importer objects. The cache is
checked before ``sys.path_hooks`` is scanned. If it is necessary to force a
rescan of ``sys.path_hooks``, it is possible to manually clear all or part of
``sys.path_importer_cache``.
Just like sys.path itself, the new sys variables must have specific
types:
Just like ``sys.path`` itself, the new ``sys`` variables must have specific
types:
sys.meta_path and sys.path_hooks must be Python lists.
sys.path_importer_cache must be a Python dict.
* ``sys.meta_path`` and ``sys.path_hooks`` must be Python lists.
* ``sys.path_importer_cache`` must be a Python dict.
Modifying these variables in place is allowed, as is replacing them
with new objects.
Modifying these variables in place is allowed, as is replacing them with new
objects.
Packages and the role of __path__
Packages and the role of ``__path__``
=====================================
If a module has a __path__ attribute, the import mechanism will
treat it as a package. The __path__ variable is used instead of
sys.path when importing submodules of the package. The rules for
sys.path therefore also apply to pkg.__path__. So sys.path_hooks is
also consulted when pkg.__path__ is traversed. Meta importers don't
necessarily use sys.path at all to do their work and may therefore
ignore the value of pkg.__path__. In this case it is still advised
to set it to list, which can be empty.
If a module has a ``__path__`` attribute, the import mechanism will treat it
as a package. The ``__path__`` variable is used instead of ``sys.path`` when
importing submodules of the package. The rules for ``sys.path`` therefore
also apply to ``pkg.__path__``. So ``sys.path_hooks`` is also consulted when
``pkg.__path__`` is traversed. Meta importers don't necessarily use
``sys.path`` at all to do their work and may therefore ignore the value of
``pkg.__path__``. In this case it is still advised to set it to list, which
can be empty.
Optional Extensions to the Importer Protocol
============================================
The Importer Protocol defines three optional extensions. One is to
retrieve data files, the second is to support module packaging tools
and/or tools that analyze module dependencies (for example Freeze
[3]), while the last is to support execution of modules as scripts.
The latter two categories of tools usually don't actually *load*
modules, they only need to know if and where they are available.
All three extensions are highly recommended for general purpose
importers, but may safely be left out if those features aren't
needed.
The Importer Protocol defines three optional extensions. One is to retrieve
data files, the second is to support module packaging tools and/or tools that
analyze module dependencies (for example Freeze), while the last is to support
execution of modules as scripts. The latter two categories of tools usually
don't actually *load* modules, they only need to know if and where they are
available. All three extensions are highly recommended for general purpose
importers, but may safely be left out if those features aren't needed.
To retrieve the data for arbitrary "files" from the underlying
storage backend, loader objects may supply a method named get_data:
To retrieve the data for arbitrary "files" from the underlying storage
backend, loader objects may supply a method named ``get_data()``::
loader.get_data(path)
loader.get_data(path)
This method returns the data as a string, or raise IOError if the
"file" wasn't found. The data is always returned as if "binary" mode
was used - there is no CRLF translation of text files, for example.
It is meant for importers that have some file-system-like properties.
The 'path' argument is a path that can be constructed by munging
module.__file__ (or pkg.__path__ items) with the os.path.* functions,
for example:
This method returns the data as a string, or raise ``IOError`` if the "file"
wasn't found. The data is always returned as if "binary" mode was used -
there is no CRLF translation of text files, for example. It is meant for
importers that have some file-system-like properties. The 'path' argument is
a path that can be constructed by munging ``module.__file__`` (or
``pkg.__path__`` items) with the ``os.path.*`` functions, for example::
d = os.path.dirname(__file__)
data = __loader__.get_data(os.path.join(d, "logo.gif"))
d = os.path.dirname(__file__)
data = __loader__.get_data(os.path.join(d, "logo.gif"))
The following set of methods may be implemented if support for (for
example) Freeze-like tools is desirable. It consists of three
additional methods which, to make it easier for the caller, each of
which should be implemented, or none at all.
The following set of methods may be implemented if support for (for example)
Freeze-like tools is desirable. It consists of three additional methods
which, to make it easier for the caller, each of which should be implemented,
or none at all::
loader.is_package(fullname)
loader.get_code(fullname)
loader.get_source(fullname)
loader.is_package(fullname)
loader.get_code(fullname)
loader.get_source(fullname)
All three methods should raise ImportError if the module wasn't
found.
All three methods should raise ``ImportError`` if the module wasn't found.
The loader.is_package(fullname) method should return True if the
module specified by 'fullname' is a package and False if it isn't.
The ``loader.is_package(fullname)`` method should return ``True`` if the
module specified by 'fullname' is a package and ``False`` if it isn't.
The loader.get_code(fullname) method should return the code object
associated with the module, or None if it's a built-in or extension
module. If the loader doesn't have the code object but it _does_
have the source code, it should return the compiled source code.
(This is so that our caller doesn't also need to check get_source()
if all it needs is the code object.)
The ``loader.get_code(fullname)`` method should return the code object
associated with the module, or ``None`` if it's a built-in or extension
module. If the loader doesn't have the code object but it *does* have the
source code, it should return the compiled source code. (This is so that our
caller doesn't also need to check ``get_source()`` if all it needs is the code
object.)
The loader.get_source(fullname) method should return the source code
for the module as a string (using newline characters for line
endings) or None if the source is not available (yet it should still
raise ImportError if the module can't be found by the importer at
all).
The ``loader.get_source(fullname)`` method should return the source code for
the module as a string (using newline characters for line endings) or ``None``
if the source is not available (yet it should still raise ``ImportError`` if
the module can't be found by the importer at all).
To support execution of modules as scripts [9], the above three
methods for finding the code associated with a module must be
implemented. In addition to those methods, the following method
may be provided in order to allow the ``runpy`` module to correctly
set the ``__file__`` attribute:
To support execution of modules as scripts [6]_, the above three methods for
finding the code associated with a module must be implemented. In addition to
those methods, the following method may be provided in order to allow the
``runpy`` module to correctly set the ``__file__`` attribute::
loader.get_filename(fullname)
loader.get_filename(fullname)
This method should return the value that ``__file__`` would be set
to if the named module was loaded. If the module is not found, then
ImportError should be raised.
This method should return the value that ``__file__`` would be set to if the
named module was loaded. If the module is not found, then ``ImportError``
should be raised.
Integration with the 'imp' module
=================================
The new import hooks are not easily integrated in the existing
imp.find_module() and imp.load_module() calls. It's questionable
whether it's possible at all without breaking code; it is better to
simply add a new function to the imp module. The meaning of the
existing imp.find_module() and imp.load_module() calls changes from:
"they expose the built-in import mechanism" to "they expose the
basic *unhooked* built-in import mechanism". They simply won't
invoke any import hooks. A new imp module function is proposed (but
not yet implemented) under the name "get_loader", which is used as
in the following pattern:
The new import hooks are not easily integrated in the existing
``imp.find_module()`` and ``imp.load_module()`` calls. It's questionable
whether it's possible at all without breaking code; it is better to simply add
a new function to the ``imp`` module. The meaning of the existing
``imp.find_module()`` and ``imp.load_module()`` calls changes from: "they
expose the built-in import mechanism" to "they expose the basic *unhooked*
built-in import mechanism". They simply won't invoke any import hooks. A new
``imp`` module function is proposed (but not yet implemented) under the name
``get_loader()``, which is used as in the following pattern::
loader = imp.get_loader(fullname, path)
if loader is not None:
loader.load_module(fullname)
loader = imp.get_loader(fullname, path)
if loader is not None:
loader.load_module(fullname)
In the case of a "basic" import, one the imp.find_module() function
would handle, the loader object would be a wrapper for the current
output of imp.find_module(), and loader.load_module() would call
imp.load_module() with that output.
In the case of a "basic" import, one the `imp.find_module()` function would
handle, the loader object would be a wrapper for the current output of
``imp.find_module()``, and ``loader.load_module()`` would call
``imp.load_module()`` with that output.
Note that this wrapper is currently not yet implemented, although a
Python prototype exists in the test_importhooks.py script (the
ImpWrapper class) included with the patch.
Note that this wrapper is currently not yet implemented, although a Python
prototype exists in the ``test_importhooks.py`` script (the ``ImpWrapper``
class) included with the patch.
Forward Compatibility
=====================
Existing __import__ hooks will not invoke new-style hooks by magic,
unless they call the original __import__ function as a fallback.
For example, ihooks.py, iu.py and imputil.py are in this sense not
forward compatible with this PEP.
Existing ``__import__`` hooks will not invoke new-style hooks by magic, unless
they call the original ``__import__`` function as a fallback. For example,
``ihooks.py``, ``iu.py`` and ``imputil.py`` are in this sense not forward
compatible with this PEP.
Open Issues
===========
Modules often need supporting data files to do their job,
particularly in the case of complex packages or full applications.
Current practice is generally to locate such files via sys.path (or
a package.__path__ attribute). This approach will not work, in
general, for modules loaded via an import hook.
Modules often need supporting data files to do their job, particularly in the
case of complex packages or full applications. Current practice is generally
to locate such files via ``sys.path`` (or a ``package.__path__`` attribute).
This approach will not work, in general, for modules loaded via an import
hook.
There are a number of possible ways to address this problem:
There are a number of possible ways to address this problem:
- "Don't do that". If a package needs to locate data files via its
__path__, it is not suitable for loading via an import hook. The
package can still be located on a directory in sys.path, as at
present, so this should not be seen as a major issue.
* "Don't do that". If a package needs to locate data files via its
``__path__``, it is not suitable for loading via an import hook. The
package can still be located on a directory in ``sys.path``, as at present,
so this should not be seen as a major issue.
- Locate data files from a standard location, rather than relative
to the module file. A relatively simple approach (which is
supported by distutils) would be to locate data files based on
sys.prefix (or sys.exec_prefix). For example, looking in
os.path.join(sys.prefix, "data", package_name).
* Locate data files from a standard location, rather than relative to the
module file. A relatively simple approach (which is supported by
distutils) would be to locate data files based on ``sys.prefix`` (or
``sys.exec_prefix``). For example, looking in
``os.path.join(sys.prefix, "data", package_name)``.
- Import hooks could offer a standard way of getting at data files
relative to the module file. The standard zipimport object
provides a method get_data(name) which returns the content of the
"file" called name, as a string. To allow modules to get at the
importer object, zipimport also adds an attribute "__loader__"
to the module, containing the zipimport object used to load the
module. If such an approach is used, it is important that client
code takes care not to break if the get_data method is not available,
so it is not clear that this approach offers a general answer to the
problem.
* Import hooks could offer a standard way of getting at data files relative
to the module file. The standard ``zipimport`` object provides a method
``get_data(name)`` which returns the content of the "file" called ``name``,
as a string. To allow modules to get at the importer object, ``zipimport``
also adds an attribute ``__loader__`` to the module, containing the
``zipimport`` object used to load the module. If such an approach is used,
it is important that client code takes care not to break if the
``get_data()`` method is not available, so it is not clear that this
approach offers a general answer to the problem.
It was suggested on python-dev that it would be useful to be able to
receive a list of available modules from an importer and/or a list
of available data files for use with the get_data() method. The
protocol could grow two additional extensions, say list_modules()
and list_files(). The latter makes sense on loader objects with a
get_data() method. However, it's a bit unclear which object should
implement list_modules(): the importer or the loader or both?
It was suggested on python-dev that it would be useful to be able to receive a
list of available modules from an importer and/or a list of available data
files for use with the ``get_data()`` method. The protocol could grow two
additional extensions, say ``list_modules()`` and ``list_files()``. The
latter makes sense on loader objects with a ``get_data()`` method. However,
it's a bit unclear which object should implement ``list_modules()``: the
importer or the loader or both?
This PEP is biased towards loading modules from alternative places:
it currently doesn't offer dedicated solutions for loading modules
from alternative file formats or with alternative compilers. In
contrast, the ihooks module from the standard library does have a
fairly straightforward way to do this. The Quixote project [8] uses
this technique to import PTL files as if they are ordinary Python
modules. To do the same with the new hooks would either mean to add
a new module implementing a subset of ihooks as a new-style
importer, or add a hookable built-in path importer object.
This PEP is biased towards loading modules from alternative places: it
currently doesn't offer dedicated solutions for loading modules from
alternative file formats or with alternative compilers. In contrast, the
``ihooks`` module from the standard library does have a fairly straightforward
way to do this. The Quixote project [7]_ uses this technique to import PTL
files as if they are ordinary Python modules. To do the same with the new
hooks would either mean to add a new module implementing a subset of
``ihooks`` as a new-style importer, or add a hookable built-in path importer
object.
There is no specific support within this PEP for "stacking" hooks.
For example, it is not obvious how to write a hook to load modules
from ..tar.gz files by combining separate hooks to load modules from
.tar and ..gz files. However, there is no support for such stacking
in the existing hook mechanisms (either the basic "replace
__import__" method, or any of the existing import hook modules) and
so this functionality is not an obvious requirement of the new
mechanism. It may be worth considering as a future enhancement,
however.
There is no specific support within this PEP for "stacking" hooks. For
example, it is not obvious how to write a hook to load modules from ``tar.gz``
files by combining separate hooks to load modules from ``.tar`` and ``.gz``
files. However, there is no support for such stacking in the existing hook
mechanisms (either the basic "replace ``__import__``" method, or any of the
existing import hook modules) and so this functionality is not an obvious
requirement of the new mechanism. It may be worth considering as a future
enhancement, however.
It is possible (via sys.meta_path) to add hooks which run before
sys.path is processed. However, there is no equivalent way of
adding hooks to run after sys.path is processed. For now, if a hook
is required after sys.path has been processed, it can be simulated
by adding an arbitrary "cookie" string at the end of sys.path, and
having the required hook associated with this cookie, via the normal
sys.path_hooks processing. In the longer term, the path handling
code will become a "real" hook on sys.meta_path, and at that stage
it will be possible to insert user-defined hooks either before or
after it.
It is possible (via ``sys.meta_path``) to add hooks which run before
``sys.path`` is processed. However, there is no equivalent way of adding
hooks to run after ``sys.path`` is processed. For now, if a hook is required
after ``sys.path`` has been processed, it can be simulated by adding an
arbitrary "cookie" string at the end of ``sys.path``, and having the required
hook associated with this cookie, via the normal ``sys.path_hooks``
processing. In the longer term, the path handling code will become a "real"
hook on ``sys.meta_path``, and at that stage it will be possible to insert
user-defined hooks either before or after it.
Implementation
==============
The PEP 302 implementation has been integrated with Python as of
2.3a1. An earlier version is available as SourceForge patch
#652586, but more interestingly, the SF item contains a fairly
detailed history of the development and design.
http://www.python.org/sf/652586
The PEP 302 implementation has been integrated with Python as of 2.3a1. An
earlier version is available as patch #652586 [9]_, but more interestingly,
the issue contains a fairly detailed history of the development and design.
PEP 273 has been implemented using PEP 302's import hooks.
PEP 273 has been implemented using PEP 302's import hooks.
References and Footnotes
========================
[1] Installer by Gordon McMillan
http://www.mcmillan-inc.com/install1.html
.. [1] imputil module
http://docs.python.org/library/imputil.html
[2] PEP 273, Import Modules from Zip Archives, Ahlstrom
http://www.python.org/dev/peps/pep-0273/
.. [2] The Freeze tool.
See also the ``Tools/freeze/`` directory in a Python source distribution
[3] The Freeze tool
Tools/freeze/ in a Python source distribution
.. [3] py2exe by Thomas Heller
http://www.py2exe.org/
[4] Squeeze
http://starship.python.net/crew/fredrik/ipa/squeeze.htm
.. [4] imp.set_frozenmodules() patch
http://bugs.python.org/issue642578
[5] py2exe by Thomas Heller
http://py2exe.sourceforge.net/
.. [5] The path argument to ``finder.find_module()`` is there because the
``pkg.__path__`` variable may be needed at this point. It may either come
from the actual parent module or be supplied by ``imp.find_module()`` or
the proposed ``imp.get_loader()`` function.
[6] imp.set_frozenmodules() patch
http://www.python.org/sf/642578
.. [6] PEP 338: Executing modules as scripts
http://www.python.org/dev/peps/pep-0338/
[7] The path argument to finder.find_module() is there because the
pkg.__path__ variable may be needed at this point. It may either
come from the actual parent module or be supplied by
imp.find_module() or the proposed imp.get_loader() function.
.. [7] Quixote, a framework for developing Web applications
http://www.mems-exchange.org/software/quixote/
[8] Quixote, a framework for developing Web applications
http://www.mems-exchange.org/software/quixote/
.. [8] PEP 366: Main module explicit relative imports
http://www.python.org/dev/peps/pep-0366/
[9] PEP 338: Executing modules as scripts
http://www.python.org/dev/peps/pep-0338/
[10] PEP 366: Main module explicit relative imports
http://www.python.org/dev/peps/pep-0366/
.. [9] New import hooks + Import from Zip files
http://bugs.python.org/issue652586
Copyright
=========
This document has been placed in the public domain.
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: