PEP: 302
Title: New Import Hooks
Version: $Revision$
Last-Modified: $Date$
Author: Just van Rossum <just@letterror.com>,
        Paul Moore <gustav@morpheus.demon.co.uk>
Status: Final
Type: Standards Track
Content-Type: text/plain
Created: 19-Dec-2002
Python-Version: 2.3
Post-History: 19-Dec-2002


Abstract

This PEP proposes to add a new set of import hooks that offer better
customization of the Python import mechanism. Contrary to the
current __import__ hook, a new-style hook can be injected into the
existing scheme, allowing for a finer grained control of how modules
are found and how they are loaded.


Motivation

The only way to customize the import mechanism is currently to
override the built-in __import__ function. However, overriding
__import__ has many problems. To begin with:

- An __import__ replacement needs to *fully* reimplement the entire
  import mechanism, or call the original __import__ before or after
  the custom code.

- It has very complex semantics and responsibilities.

- __import__ gets called even for modules that are already in
  sys.modules, which is almost never what you want, unless you're
  writing some sort of monitoring tool.

The situation gets worse when you need to extend the import
mechanism from C: it's currently impossible, apart from hacking
Python's import.c or reimplementing much of import.c from scratch.

There is a fairly long history of tools written in Python that allow
extending the import mechanism in various ways, based on the
__import__ hook. The Standard Library includes two such tools:
ihooks.py (by GvR) and imputil.py (Greg Stein), but perhaps the most
famous is iu.py by Gordon McMillan, available as part of his
Installer [1] package. Their usefulness is somewhat limited because
they are written in Python; bootstrapping issues need to be worked
around as you can't load the module containing the hook with the
hook itself. So if you want the entire Standard Library to be
loadable from an import hook, the hook must be written in C.


Use cases

This section lists several existing applications that depend on
import hooks. Among these, a lot of duplicate work was done that
could have been saved if there had been a more flexible import hook
at the time. This PEP should make life a lot easier for similar
projects in the future.

Extending the import mechanism is needed when you want to load
modules that are stored in a non-standard way. Examples include
modules that are bundled together in an archive; byte code that is
not stored in a pyc formatted file; modules that are loaded from a
database over a network.

The work on this PEP was partly triggered by the implementation of
PEP 273 [2], which adds imports from Zip archives as a built-in
feature to Python. While the PEP itself was widely accepted as a
must-have feature, the implementation left a few things to be
desired. For one thing it went to great lengths to integrate itself
with import.c, adding lots of code that was either specific to Zip
file imports or *not* specific to Zip imports, yet was not generally
useful (or even desirable) either. Yet the PEP 273 implementation
can hardly be blamed for this: it is simply extremely hard to do,
given the current state of import.c.

Packaging applications for end users is a typical use case for
import hooks, if not *the* typical use case. Distributing lots of
source or pyc files around is not always appropriate (let alone a
separate Python installation), so there is a frequent desire to
package all needed modules in a single file. So frequent in fact
that multiple solutions have been implemented over the years.

The oldest one is included with the Python source code: Freeze [3].
It puts marshalled byte code into static objects in C source code.
Freeze's "import hook" is hard wired into import.c, and has a couple
of issues. Later solutions include Fredrik Lundh's Squeeze [4],
Gordon McMillan's Installer [1] and Thomas Heller's py2exe [5].
MacPython ships with a tool called BuildApplication.

Squeeze, Installer and py2exe use an __import__ based scheme (py2exe
currently uses Installer's iu.py, Squeeze used ihooks.py); MacPython
has two Mac-specific import hooks hard wired into import.c that are
similar to the Freeze hook. The hooks proposed in this PEP enable
us (at least in theory; it's not a short term goal) to get rid of
the hard coded hooks in import.c, and would allow the
__import__-based tools to get rid of most of their import.c
emulation code.

Before work on the design and implementation of this PEP was
started, a new BuildApplication-like tool for MacOS X prompted one
of the authors of this PEP (JvR) to expose the table of frozen
modules to Python, in the imp module. The main reason was to be
able to use the freeze import hook (avoiding fancy __import__
support), yet to also be able to supply a set of modules at
runtime. This resulted in sf patch #642578 [6], which was
mysteriously accepted (mostly because nobody seemed to care either
way ;-). Yet the patch becomes completely superfluous once this PEP
is accepted, as the PEP offers a much nicer and more general way to
do the same thing.


Rationale

While experimenting with alternative implementation ideas to get
built-in Zip import, it was discovered that achieving this is
possible with only a fairly small number of changes to import.c.
This made it possible to factor out the Zip-specific stuff into a
new source file, while at the same time creating a *general* new
import hook scheme: the one you're reading about now.

An earlier design allowed non-string objects on sys.path. Such an
object would have the necessary methods to handle an import. This
has two disadvantages: 1) it breaks code that assumes all items on
sys.path are strings; 2) it is not compatible with the PYTHONPATH
environment variable. The latter is directly needed for Zip
imports. A compromise came from Jython: allow string *subclasses*
on sys.path, which would then act as importer objects. This avoids
some breakage, and seems to work well for Jython (where it is used
to load modules from .jar files), but it was perceived as an "ugly
hack".

This led to a more elaborate scheme (mostly copied from McMillan's
iu.py) in which each candidate in a list is asked in turn whether it
can handle the sys.path item, until one is found that can. This list
of candidates is a new object in the sys module: sys.path_hooks.

Traversing sys.path_hooks for each path item for each new import can
be expensive, so the results are cached in another new object in the
sys module: sys.path_importer_cache. It maps sys.path entries to
importer objects.

To minimize the impact on import.c as well as to avoid adding extra
overhead, it was decided not to add an explicit hook and importer
object for the existing file system import logic (as iu.py has), but
to simply fall back to the built-in logic if no hook on
sys.path_hooks could handle the path item. If this is the case, a
None value is stored in sys.path_importer_cache, again to avoid
repeated lookups. (Later we can go further and add a real importer
object for the built-in mechanism; for now, the None fallback scheme
should suffice.)

A question was raised: what about importers that don't need *any*
entry on sys.path? (Built-in and frozen modules fall into that
category.) Again, Gordon McMillan to the rescue: iu.py contains a
thing he calls the "metapath". In this PEP's implementation, it's a
list of importer objects that is traversed *before* sys.path. This
list is yet another new object in the sys module: sys.meta_path.
Currently, this list is empty by default, and frozen and built-in
module imports are done after traversing sys.meta_path, but still
before sys.path. (Again, later we can add real frozen, built-in and
sys.path importer objects on sys.meta_path, allowing for some extra
flexibility, but this could be done as a "phase 2" project, possibly
for Python 2.4. It would be the finishing touch as then *every*
import would go through sys.meta_path, making it the central import
dispatcher.)


Specification part 1: The Importer Protocol

This PEP introduces a new protocol: the "Importer Protocol". It is
important to understand the context in which the protocol operates,
so here is a brief overview of the outer shells of the import
mechanism.

When an import statement is encountered, the interpreter looks up
the __import__ function in the built-in name space. __import__ is
then called with four arguments, amongst which are the name of the
module being imported (may be a dotted name) and a reference to the
current global namespace.

The built-in __import__ function (known as PyImport_ImportModuleEx
in import.c) will then check to see whether the module doing the
import is a package or a submodule of a package. If it is indeed a
(submodule of a) package, it first tries to do the import relative
to the package (the parent package for a submodule). For example if
a package named "spam" does "import eggs", it will first look for a
module named "spam.eggs". If that fails, the import continues as an
absolute import: it will look for a module named "eggs". Dotted
name imports work pretty much the same: if package "spam" does
"import eggs.bacon" (and "spam.eggs" exists and is itself a
package), "spam.eggs.bacon" is tried. If that fails "eggs.bacon" is
tried. (There are more subtleties that are not described here, but
these are not relevant for implementers of the Importer Protocol.)

Deeper down in the mechanism, a dotted name import is split up by
its components. For "import spam.ham", first an "import spam" is
done, and only when that succeeds is "ham" imported as a submodule
of "spam".

The Importer Protocol operates at this level of *individual*
imports. By the time an importer gets a request for "spam.ham",
module "spam" has already been imported.

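The component-wise processing of a dotted name can be illustrated
with a small sketch. It is not the code in import.c, only a model of
the order in which individual imports are requested; the helper name
import_steps is hypothetical.

```python
def import_steps(fullname):
    """Return the fully qualified names that are imported, in order,
    for a dotted name such as "spam.eggs.ham"."""
    parts = fullname.split(".")
    # Each step extends the previous one by a single component.
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


steps = import_steps("spam.eggs.ham")
# steps is ["spam", "spam.eggs", "spam.eggs.ham"]
```

An importer only ever sees one of these names at a time.
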
The protocol involves two objects: an importer and a loader. An
importer object has a single method:

    importer.find_module(fullname, path=None)

This method will be called with the fully qualified name of the
module. If the importer is installed on sys.meta_path, it will
receive a second argument, which is None for a top-level module, or
package.__path__ for submodules or subpackages [7]. It should return
a loader object if the module was found, or None if it wasn't. If
find_module() raises an exception, it will be propagated to the
caller, aborting the import.

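As a minimal sketch of the find_module() contract, assume a
hypothetical importer, DictImporter, that knows a fixed set of module
names held in a dict. The class and its 'sources' mapping are
illustrative assumptions, and load_module() is left out of this
sketch.

```python
class DictImporter:
    """Hypothetical importer that knows a fixed set of module names."""

    def __init__(self, sources):
        # Maps fully qualified module names to source text.
        self.sources = sources

    def find_module(self, fullname, path=None):
        # 'path' is only supplied to importers on sys.meta_path:
        # None for a top-level module, package.__path__ otherwise.
        if fullname in self.sources:
            return self   # this object would also act as the loader
        return None       # not ours; the next importer gets a chance


importer = DictImporter({"spam": "x = 1", "spam.eggs": "y = 2"})
found = importer.find_module("spam.eggs", path=[])
missing = importer.find_module("bacon")
```

Returning None rather than raising keeps the search going; an
exception would abort the import.
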
A loader object also has one method:

    loader.load_module(fullname)

This method returns the loaded module or raises an exception,
preferably ImportError if an existing exception is not being
propagated. If load_module() is asked to load a module that it
cannot, ImportError is to be raised.

In many cases the importer and loader can be one and the same
object: importer.find_module() would just return self.

The 'fullname' argument of both methods is the fully qualified
module name, for example "spam.eggs.ham". As explained above, when
importer.find_module("spam.eggs.ham") is called, "spam.eggs" has
already been imported and added to sys.modules. However, the
find_module() method isn't necessarily always called during an
actual import: meta tools that analyze import dependencies (such as
freeze, Installer or py2exe) don't actually load modules, so an
importer shouldn't *depend* on the parent package being available in
sys.modules.

The load_module() method has a few responsibilities that it must
fulfill *before* it runs any code:

- If there is an existing module object named 'fullname' in
  sys.modules, the loader must use that existing module.
  (Otherwise, the reload() builtin will not work correctly.)
  If a module named 'fullname' does not exist in sys.modules,
  the loader must create a new module object and add it to
  sys.modules.

  In C code, all of these requirements can be met simply by using
  the PyImport_AddModule() function, which returns the existing
  module or creates a new one and adds it to sys.modules for you.
  In Python code, you can use something like:

      module = sys.modules.setdefault(fullname, new.module(fullname))

  to accomplish the same results.

  Note that the module object *must* be in sys.modules before the
  loader executes the module code. This is crucial because the
  module code may (directly or indirectly) import itself; adding
  it to sys.modules beforehand prevents unbounded recursion in the
  worst case and multiple loading in the best.

- The __file__ attribute must be set. This must be a string, but it
  may be a dummy value, for example "<frozen>". The privilege of
  not having a __file__ attribute at all is reserved for built-in
  modules.

- The __name__ attribute must be set. If one uses
  imp.new_module() then the attribute is set automatically.

- If it's a package, the __path__ variable must be set. This must
  be a list, but may be empty if __path__ has no further
  significance to the importer (more on this later).

- It should add an __loader__ attribute to the module, set to the
  loader object. This is mostly for introspection, but can be used
  for importer-specific extras, for example getting data associated
  with an importer.

If the module is a Python module (as opposed to a built-in module or
a dynamically loaded extension), it should execute the module's code
in the module's global name space (module.__dict__).

Here is a minimal pattern for a load_module() method:

    def load_module(self, fullname):
        ispkg, code = self._get_code(fullname)
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if ispkg:
            mod.__path__ = []
        exec code in mod.__dict__
        return mod


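The pattern for load_module() given in this section uses Python 2
syntax (the exec statement and imp.new_module()). As a hedged
adaptation that runs on current Python, the same steps can be written
with types.ModuleType and the exec() function. The class name
SourceLoader, its 'sources' mapping and the module name pep302_demo
are hypothetical. The sketch calls load_module() directly rather than
registering a hook, so it does not depend on how a given Python
version's import machinery treats this protocol.

```python
import sys
import types


class SourceLoader:
    """Hypothetical loader that runs module source held in a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps fullname -> (ispkg, source text)

    def load_module(self, fullname):
        try:
            ispkg, source = self.sources[fullname]
        except KeyError:
            raise ImportError("cannot load %r" % fullname)
        code = compile(source, "<%s>" % fullname, "exec")
        # Reuse an existing entry in sys.modules so reloading works.
        mod = sys.modules.get(fullname)
        if mod is None:
            mod = types.ModuleType(fullname)  # sets __name__
            sys.modules[fullname] = mod       # register before running code
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if ispkg:
            mod.__path__ = []
        exec(code, mod.__dict__)
        return mod


loader = SourceLoader({"pep302_demo": (False, "answer = 6 * 7")})
module = loader.load_module("pep302_demo")
```

Looking the module up with sys.modules.get() before creating one
avoids building a throwaway module object when an entry already
exists, which the setdefault() one-liner does not.
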
Specification part 2: Registering Hooks

There are two types of import hooks: Meta hooks and Path hooks.
Meta hooks are called at the start of import processing, before any
other import processing (so that meta hooks can override sys.path
processing, or frozen modules, or even built-in modules). To
register a meta hook, simply add the importer object to
sys.meta_path (the list of registered meta hooks).

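The way meta hooks are consulted can be sketched as a simple loop.
This is a model of the dispatch described above, not the actual
implementation; find_via_meta_path, OnlySpam and demo_meta_path are
hypothetical names, and a plain list stands in for sys.meta_path so
that the real interpreter state is left untouched.

```python
def find_via_meta_path(fullname, path, meta_path):
    """Ask each meta importer in turn; the first loader returned wins."""
    for importer in meta_path:
        loader = importer.find_module(fullname, path)
        if loader is not None:
            return loader
    return None  # fall through to built-in, frozen and sys.path handling


class OnlySpam:
    """Hypothetical meta importer that claims a single module name."""

    def find_module(self, fullname, path=None):
        return self if fullname == "spam" else None


demo_meta_path = []                 # stand-in for sys.meta_path
demo_meta_path.append(OnlySpam())   # registering a meta hook
hit = find_via_meta_path("spam", None, demo_meta_path)
miss = find_via_meta_path("eggs", None, demo_meta_path)
```
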
Path hooks are called as part of sys.path (or package.__path__)
processing, at the point where their associated path item is
encountered. A path hook is registered by adding an importer
factory to sys.path_hooks.

sys.path_hooks is a list of callables, which will be checked in
sequence to determine if they can handle a given path item. The
callable is called with one argument, the path item. The callable
must raise ImportError if it is unable to handle the path item, and
return an importer object if it can handle the path item. Note
that if the callable returns an importer object for a specific
sys.path entry, the builtin import machinery will not be invoked
to handle that entry any longer, even if the importer object later
fails to find a specific module. The callable is typically the
class of the import hook, and hence the class __init__ method is
called. (This is also the reason why it should raise ImportError:
an __init__ method can't return anything. This would be possible
with a __new__ method in a new style class, but we don't want to
require anything about how a hook is implemented.)

The results of path hook checks are cached in
sys.path_importer_cache, which is a dictionary mapping path entries
to importer objects. The cache is checked before sys.path_hooks is
scanned. If it is necessary to force a rescan of sys.path_hooks, it
is possible to manually clear all or part of
sys.path_importer_cache.

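The interplay between path hooks and the cache can be sketched as
follows. This is an illustrative model under stated assumptions, not
the code in import.c: lookup_importer, DemoArchiveImporter and the
".demo-archive" suffix are hypothetical, and a local list and dict
stand in for sys.path_hooks and sys.path_importer_cache.

```python
def lookup_importer(path_item, path_hooks, importer_cache):
    """Return the importer for path_item, or None for the built-in logic."""
    try:
        return importer_cache[path_item]      # the cache is checked first
    except KeyError:
        pass
    importer = None
    for hook in path_hooks:
        try:
            importer = hook(path_item)        # the hook accepts the item
            break
        except ImportError:
            continue                          # hook declined; try the next
    importer_cache[path_item] = importer      # None is cached as well
    return importer


class DemoArchiveImporter:
    """Hypothetical hook: handles path items ending in '.demo-archive'."""

    def __init__(self, path_item):
        if not path_item.endswith(".demo-archive"):
            raise ImportError("not a demo archive: %r" % path_item)
        self.archive = path_item


hooks = [DemoArchiveImporter]   # stand-in for sys.path_hooks
cache = {}                      # stand-in for sys.path_importer_cache
first = lookup_importer("lib.demo-archive", hooks, cache)
again = lookup_importer("lib.demo-archive", hooks, cache)
plain = lookup_importer("/usr/lib/python", hooks, cache)
```

Deleting an entry from the cache dict forces the hooks to be scanned
again for that path item on the next lookup.
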
Just like sys.path itself, the new sys variables must have specific
types:

    sys.meta_path and sys.path_hooks must be Python lists.
    sys.path_importer_cache must be a Python dict.

Modifying these variables in place is allowed, as is replacing them
with new objects.


Packages and the role of __path__

If a module has a __path__ attribute, the import mechanism will
treat it as a package. The __path__ variable is used instead of
sys.path when importing submodules of the package. The rules for
sys.path therefore also apply to pkg.__path__. So sys.path_hooks is
also consulted when pkg.__path__ is traversed. Meta importers don't
necessarily use sys.path at all to do their work and may therefore
ignore the value of pkg.__path__. In this case it is still advised
to set it to a list, which can be empty.


Optional Extensions to the Importer Protocol

The Importer Protocol defines three optional extensions. One is to
retrieve data files, the second is to support module packaging tools
and/or tools that analyze module dependencies (for example Freeze
[3]), while the last is to support execution of modules as scripts.
The latter two categories of tools usually don't actually *load*
modules; they only need to know if and where they are available.
All three extensions are highly recommended for general purpose
importers, but may safely be left out if those features aren't
needed.

To retrieve the data for arbitrary "files" from the underlying
storage backend, loader objects may supply a method named get_data:

    loader.get_data(path)

This method returns the data as a string, or raises IOError if the
"file" wasn't found. The data is always returned as if "binary" mode
was used - there is no CRLF translation of text files, for example.
It is meant for importers that have some file-system-like properties.
The 'path' argument is a path that can be constructed by munging
module.__file__ (or pkg.__path__ items) with the os.path.* functions,
for example:

    d = os.path.dirname(__file__)
    data = __loader__.get_data(os.path.join(d, "logo.gif"))

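On the loader side, get_data() might look like the following sketch.
The class InMemoryLoader and its dict of "files" are hypothetical
stand-ins for a real storage backend such as an archive. The data is
modelled as a bytes object, which is an assumption made so that the
sketch runs on current Python, where binary data is not a str.

```python
import os


class InMemoryLoader:
    """Hypothetical loader whose "files" live in a dict of byte strings."""

    def __init__(self, files):
        self.files = files  # maps path -> raw bytes

    def get_data(self, path):
        try:
            return self.files[path]  # returned unmodified, as in binary mode
        except KeyError:
            raise IOError("no such file: %r" % path)


base = os.path.join("archive", "pkg")
loader = InMemoryLoader({os.path.join(base, "logo.gif"): b"GIF89a"})
data = loader.get_data(os.path.join(base, "logo.gif"))
```
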
The following set of methods may be implemented if support for (for
example) Freeze-like tools is desirable. It consists of three
additional methods; to make things easier for the caller, either all
three should be implemented, or none at all.

    loader.is_package(fullname)
    loader.get_code(fullname)
    loader.get_source(fullname)

All three methods should raise ImportError if the module wasn't
found.

The loader.is_package(fullname) method should return True if the
module specified by 'fullname' is a package and False if it isn't.

The loader.get_code(fullname) method should return the code object
associated with the module, or None if it's a built-in or extension
module. If the loader doesn't have the code object but it _does_
have the source code, it should return the compiled source code.
(This is so that our caller doesn't also need to check get_source()
if all it needs is the code object.)

The loader.get_source(fullname) method should return the source code
for the module as a string (using newline characters for line
endings) or None if the source is not available (yet it should still
raise ImportError if the module can't be found by the importer at
all).

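The three methods fit together roughly as in the sketch below. The
class SourceInspector and its 'sources' mapping are hypothetical; an
entry whose source is None models a module for which neither source
nor code is available, such as an extension module.

```python
class SourceInspector:
    """Hypothetical loader exposing the three introspection methods."""

    def __init__(self, sources):
        self.sources = sources  # maps fullname -> (ispkg, source or None)

    def _lookup(self, fullname):
        try:
            return self.sources[fullname]
        except KeyError:
            raise ImportError("module not found: %r" % fullname)

    def is_package(self, fullname):
        return self._lookup(fullname)[0]

    def get_source(self, fullname):
        return self._lookup(fullname)[1]  # None if no source is available

    def get_code(self, fullname):
        source = self.get_source(fullname)
        if source is None:
            return None  # models an extension module in this sketch
        # No stored code object, so compile the source on demand.
        return compile(source, "<%s>" % fullname, "exec")


inspector = SourceInspector({
    "pkg": (True, "name = 'pkg'\n"),
    "pkg.ext": (False, None),
})
namespace = {}
exec(inspector.get_code("pkg"), namespace)
```

Routing every method through one lookup helper keeps the
"raise ImportError if the module wasn't found" rule in a single
place.
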
To support execution of modules as scripts [9], the above three
methods for finding the code associated with a module must be
implemented. In addition to those methods, the following method
may be provided in order to allow the ``runpy`` module to correctly
set the ``__file__`` attribute:

    loader.get_filename(fullname)

This method should return the value that ``__file__`` would be set
to if the named module was loaded. If the module is not found, then
ImportError should be raised.


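A get_filename() method can be as small as the following sketch.
ArchiveLoader, its 'members' mapping and the "archive/member" naming
scheme are assumptions made for illustration; the PEP does not
prescribe how the returned string is composed.

```python
class ArchiveLoader:
    """Hypothetical loader for modules stored inside an archive file."""

    def __init__(self, archive, members):
        self.archive = archive    # for example "app.zip"
        self.members = members    # maps fullname -> member path in archive

    def get_filename(self, fullname):
        try:
            member = self.members[fullname]
        except KeyError:
            raise ImportError("module not found: %r" % fullname)
        # The value load_module() would store in the module's __file__.
        return self.archive + "/" + member


loader = ArchiveLoader("app.zip", {"spam": "spam.py"})
filename = loader.get_filename("spam")
```
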
Integration with the 'imp' module

The new import hooks are not easily integrated into the existing
imp.find_module() and imp.load_module() calls. It's questionable
whether it's possible at all without breaking code; it is better to
simply add a new function to the imp module. The meaning of the
existing imp.find_module() and imp.load_module() calls changes from:
"they expose the built-in import mechanism" to "they expose the
basic *unhooked* built-in import mechanism". They simply won't
invoke any import hooks. A new imp module function is proposed (but
not yet implemented) under the name "get_loader", which is used as
in the following pattern:

    loader = imp.get_loader(fullname, path)
    if loader is not None:
        loader.load_module(fullname)

In the case of a "basic" import, one that the imp.find_module()
function would handle, the loader object would be a wrapper for the
current output of imp.find_module(), and loader.load_module() would
call imp.load_module() with that output.

Note that this wrapper is currently not yet implemented, although a
Python prototype exists in the test_importhooks.py script (the
ImpWrapper class) included with the patch.


Forward Compatibility

Existing __import__ hooks will not invoke new-style hooks by magic,
unless they call the original __import__ function as a fallback.
For example, ihooks.py, iu.py and imputil.py are in this sense not
forward compatible with this PEP.


Open Issues

Modules often need supporting data files to do their job,
particularly in the case of complex packages or full applications.
Current practice is generally to locate such files via sys.path (or
a package.__path__ attribute). This approach will not work, in
general, for modules loaded via an import hook.

There are a number of possible ways to address this problem:

- "Don't do that". If a package needs to locate data files via its
  __path__, it is not suitable for loading via an import hook. The
  package can still be located on a directory in sys.path, as at
  present, so this should not be seen as a major issue.

- Locate data files from a standard location, rather than relative
  to the module file. A relatively simple approach (which is
  supported by distutils) would be to locate data files based on
  sys.prefix (or sys.exec_prefix). For example, looking in
  os.path.join(sys.prefix, "data", package_name).

- Import hooks could offer a standard way of getting at data files
  relative to the module file. The standard zipimport object
  provides a method get_data(name) which returns the content of the
  "file" called name, as a string. To allow modules to get at the
  importer object, zipimport also adds an attribute "__loader__"
  to the module, containing the zipimport object used to load the
  module. If such an approach is used, it is important that client
  code takes care not to break if the get_data method (or the
  __loader__ attribute) is not available, so it is not clear that
  this approach offers a general answer to the problem.

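Client code that takes this care might look like the sketch below.
The helper read_resource and the FakeLoader class are hypothetical;
the point is only that a missing __loader__ attribute or a loader
without get_data() leads to a plain file system read instead of an
error.

```python
import os


def read_resource(module_globals, name):
    """Read a data file that lives next to a module.

    Prefer __loader__.get_data() when it exists; otherwise fall back
    to reading the file system directly.
    """
    path = os.path.join(os.path.dirname(module_globals["__file__"]), name)
    loader = module_globals.get("__loader__")
    get_data = getattr(loader, "get_data", None)
    if get_data is not None:
        return get_data(path)
    with open(path, "rb") as f:
        return f.read()


class FakeLoader:
    """Hypothetical loader used only to exercise read_resource()."""

    def get_data(self, path):
        return b"data for " + os.path.basename(path).encode("ascii")


fake_globals = {"__file__": os.path.join("pkg", "mod.py"),
                "__loader__": FakeLoader()}
resource = read_resource(fake_globals, "logo.gif")
```

Inside a real module the call would be read_resource(globals(),
"logo.gif").
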
Requiring loaders to set the module's __loader__ attribute means
that the loader will not get thrown away once the load is complete.
This increases memory usage, and stops loaders from being
lightweight, "throwaway" objects. As loader objects are not
required to offer any useful functionality (any such functionality,
such as the zipimport get_data() method mentioned above, is
optional) it is not clear that the __loader__ attribute will be
helpful, in practice.

On the other hand, importer objects are mostly permanent, as they
live or are kept alive on sys.meta_path or in
sys.path_importer_cache, so for a loader to keep a reference to the
importer costs us nothing extra. Whether loaders will ever need to
carry so much independent state for this to become a real issue is
questionable.

It was suggested on python-dev that it would be useful to be able to
receive a list of available modules from an importer and/or a list
of available data files for use with the get_data() method. The
protocol could grow two additional extensions, say list_modules()
and list_files(). The latter makes sense on loader objects with a
get_data() method. However, it's a bit unclear which object should
implement list_modules(): the importer or the loader or both?

This PEP is biased towards loading modules from alternative places:
it currently doesn't offer dedicated solutions for loading modules
from alternative file formats or with alternative compilers. In
contrast, the ihooks module from the standard library does have a
fairly straightforward way to do this. The Quixote project [8] uses
this technique to import PTL files as if they are ordinary Python
modules. To do the same with the new hooks would mean either adding
a new module implementing a subset of ihooks as a new-style
importer, or adding a hookable built-in path importer object.

There is no specific support within this PEP for "stacking" hooks.
For example, it is not obvious how to write a hook to load modules
from .tar.gz files by combining separate hooks to load modules from
.tar and .gz files. However, there is no support for such stacking
in the existing hook mechanisms (either the basic "replace
__import__" method, or any of the existing import hook modules) and
so this functionality is not an obvious requirement of the new
mechanism. It may be worth considering as a future enhancement,
however.

It is possible (via sys.meta_path) to add hooks which run before
sys.path is processed. However, there is no equivalent way of
adding hooks to run after sys.path is processed. For now, if a hook
is required after sys.path has been processed, it can be simulated
by adding an arbitrary "cookie" string at the end of sys.path, and
having the required hook associated with this cookie, via the normal
sys.path_hooks processing. In the longer term, the path handling
code will become a "real" hook on sys.meta_path, and at that stage
it will be possible to insert user-defined hooks either before or
after it.

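The "cookie" workaround can be sketched as follows. This is a
minimal sketch under stated assumptions, not a reference
implementation: the cookie string, the class names and the module
name pep302_demo_fallback are invented for the example, and the
finder is written against the find_spec() interface of the current
importlib module rather than the find_module()/load_module() methods
described in this PEP, so that it runs on a present-day interpreter.

```python
import importlib.abc
import importlib.util
import sys

# Arbitrary marker string; it is not a real directory or archive.
COOKIE = "<pep302-fallback-cookie>"


class FallbackLoader(importlib.abc.Loader):
    """Creates a trivial module for names the finder claims."""

    def exec_module(self, module):
        module.origin_note = "provided by the fallback hook"


class FallbackFinder:
    """Path entry finder consulted only for the cookie entry."""

    KNOWN = {"pep302_demo_fallback"}  # made-up module name

    def find_spec(self, fullname, target=None):
        if fullname in self.KNOWN:
            return importlib.util.spec_from_loader(
                fullname, FallbackLoader())
        return None


def cookie_hook(path_entry):
    # A path hook signals "not mine" by raising ImportError.
    if path_entry != COOKIE:
        raise ImportError("not the fallback cookie")
    return FallbackFinder()


# Register the hook, then put the cookie at the END of sys.path so
# the finder runs only after every real path entry has been tried.
sys.path_hooks.append(cookie_hook)
sys.path.append(COOKIE)

fallback = importlib.import_module("pep302_demo_fallback")
```

Because the cookie is the last entry on sys.path, any module that can
be found through an earlier entry is still found there first, and the
fallback finder only sees requests that nothing else could satisfy.
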
Implementation

The PEP 302 implementation has been integrated with Python as of
2.3a1. An earlier version is available as SourceForge patch
#652586, but more interestingly, the SF item contains a fairly
detailed history of the development and design.
http://www.python.org/sf/652586

PEP 273 has been implemented using PEP 302's import hooks.

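As a small illustration of the zipimport support mentioned above, the
following sketch builds a throwaway archive, puts it on sys.path and
imports a module from it. The archive name, module name and data file
name are made up for the example. It only relies on the standard
zipfile, tempfile and importlib modules.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive holding one module and one data file.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("pep302_demo_zipped.py", "ANSWER = 6 * 7\n")
    zf.writestr("data/note.txt", "stored next to the code")

# A zip archive on sys.path is handled by the zipimport path hook.
sys.path.insert(0, archive)
zipped = importlib.import_module("pep302_demo_zipped")

# The loader is recorded on the module as __loader__ and offers the
# optional get_data() method for files stored in the same archive.
loader = zipped.__loader__
note = loader.get_data(os.path.join(archive, "data", "note.txt"))
```

Here the module's __loader__ attribute is what gives the imported
code a way back to data stored alongside it in the archive.
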
References and Footnotes

[1] Installer by Gordon McMillan
    http://www.mcmillan-inc.com/install1.html

[2] PEP 273, Import Modules from Zip Archives, Ahlstrom
    http://www.python.org/dev/peps/pep-0273/

[3] The Freeze tool
    Tools/freeze/ in a Python source distribution

[4] Squeeze
    http://starship.python.net/crew/fredrik/ipa/squeeze.htm

[5] py2exe by Thomas Heller
    http://py2exe.sourceforge.net/

[6] imp.set_frozenmodules() patch
    http://www.python.org/sf/642578

[7] The path argument to importer.find_module() is there because the
    pkg.__path__ variable may be needed at this point. It may either
    come from the actual parent module or be supplied by
    imp.find_module() or the proposed imp.get_loader() function.

[8] Quixote, a framework for developing Web applications
    http://www.mems-exchange.org/software/quixote/

[9] PEP 338, Executing modules as scripts
    http://www.python.org/dev/peps/pep-0338/

Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: