- some reflow caused by David Goodger's spell checking

- added "Optional Extensions to the Importer Protocol" section
- fixed flaw in the importer protocol: i.find_module() will now receive
  the package.__path__ (or None for a plain module) as an additional
  argument, if it's installed on sys.meta_path. This is needed to
  be able to add a hook that implements the full sys.path/pkg.__path__
  semantics. See also footnote [7]. The patch on sf will be updated
  shortly to match the new version of the PEP.
Just van Rossum 2002-12-23 22:13:48 +00:00
parent fb0080cbcb
commit 74737f0d45
1 changed files with 101 additions and 20 deletions


@ -190,17 +190,17 @@ Specification part 1: The Importer Protocol
module being imported (may be a dotted name) and a reference to the
current global namespace.
The built-in __import__ function (known as PyImport_ImportModuleEx in
import.c) will then check to see whether the module doing the import
is a package by looking for a __path__ variable in the current
global namespace. If it is indeed a package, it first tries to do
the import relative to the package. For example if a package named
"spam" does "import eggs", it will first look for a module named
"spam.eggs". If that fails, the import continues as an absolute
import: it will look for a module named "eggs". Dotted name imports
work pretty much the same: if package "spam" does "import
eggs.bacon", first "spam.eggs.bacon" is tried, and only if that
fails "eggs.bacon" is tried.
The built-in __import__ function (known as PyImport_ImportModuleEx
in import.c) will then check to see whether the module doing the
import is a package by looking for a __path__ variable in the
current global namespace. If it is indeed a package, it first tries
to do the import relative to the package. For example if a package
named "spam" does "import eggs", it will first look for a module
named "spam.eggs". If that fails, the import continues as an
absolute import: it will look for a module named "eggs". Dotted
name imports work pretty much the same: if package "spam" does
"import eggs.bacon", first "spam.eggs.bacon" is tried, and only if
that fails "eggs.bacon" is tried.
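The lookup order described above can be sketched as follows. This is only an illustration of the order in which names are tried, not the code in import.c; the function name candidate_names and its arguments are hypothetical.

```python
def candidate_names(name, importing_package=None):
    """Return the fully qualified names to try, in order.

    'importing_package' is the name of the package doing the import,
    or None if the importing module is not a package (hypothetical
    argument, used here only for illustration).
    """
    candidates = []
    if importing_package is not None:
        # First attempt: import relative to the importing package.
        candidates.append(importing_package + "." + name)
    # Fallback (or only attempt): absolute import.
    candidates.append(name)
    return candidates


print(candidate_names("eggs", "spam"))
print(candidate_names("eggs.bacon", "spam"))
```

For package "spam" doing "import eggs.bacon", the relative name "spam.eggs.bacon" comes first and the absolute name "eggs.bacon" second.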
Deeper down in the mechanism, a dotted name import is split up by
its components. For "import spam.ham", first an "import spam" is
@ -214,11 +214,15 @@ Specification part 1: The Importer Protocol
The protocol involves two objects: an importer and a loader. An
importer object has a single method:
importer.find_module(fullname)
importer.find_module(fullname, path=None)
This method returns a loader object if the module was found, or None
if it wasn't. If find_module() raises an exception, it will be
propagated to the caller, aborting the import.
This method will be called with the fully qualified name of the
module. If the importer is installed on sys.meta_path, it will
receive a second argument, which is None for a top-level module, or
package.__path__ for submodules or subpackages[7]. It should return
a loader object if the module was found, or None if it wasn't. If
find_module() raises an exception, it will be propagated to the
caller, aborting the import.
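As a minimal sketch of the new signature, the importer below answers find_module() from an in-memory dictionary. The class names DictImporter and PlaceholderLoader, the 'sources' mapping and the path string are assumptions made for this example; the method is called directly here instead of being installed on sys.meta_path.

```python
class PlaceholderLoader:
    """Hypothetical stand-in for a loader object (loading not shown)."""

    def __init__(self, fullname):
        self.fullname = fullname


class DictImporter:
    """Hypothetical importer that knows about the modules in a dict."""

    def __init__(self, sources):
        # Maps fully qualified module names to source strings.
        self.sources = sources

    def find_module(self, fullname, path=None):
        # When installed on sys.meta_path, 'path' would be None for a
        # top-level module and package.__path__ for a submodule or
        # subpackage. This simple importer does not need it.
        if fullname in self.sources:
            return PlaceholderLoader(fullname)
        return None  # not found: let other importers try


importer = DictImporter({"spam": "", "spam.eggs": ""})
print(importer.find_module("spam") is not None)
print(importer.find_module("spam.eggs", ["/virtual/spam"]) is not None)
print(importer.find_module("ham") is None)
```

An importer that implements the full sys.path/pkg.__path__ semantics would search the entries of 'path' instead of ignoring the argument.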
A loader object also has one method:
@ -348,6 +352,72 @@ Packages and the role of __path__
which can be empty.
Optional Extensions to the Importer Protocol
The Importer Protocol defines two optional extensions. One is to
retrieve data files, the other is to support module packaging tools
and/or tools that analyze module dependencies (for example Freeze
[3]). Tools in the latter category usually don't actually *load*
modules; they only need to know if and where they are available.
Both extensions are highly recommended for general purpose
importers, but may safely be left out if those features aren't
needed.
To retrieve the data for arbitrary "files" from the underlying
storage backend, loader objects may supply a method named get_data:
loader.get_data(name)
This method returns the data as a string, or raises IOError if the
"file" wasn't found. The 'name' argument should be seen as a
'cookie', meaning the protocol doesn't prescribe any semantics for
it. However, for importer objects that have some file system-like
properties (for example zipimporter) it is recommended to use os.sep
as a separator character to specify a (possibly virtual) directory
hierarchy. For example if the importer allows access to a module's
source code via i.get_data(name), the 'name' argument should be
constructed like this:
name = mod.__name__.replace(".", os.sep) + ".py"
Note that this is not the recommended way to retrieve source code;
the (also optional) method loader.get_source(fullname) is more
general, as it doesn't imply *any* file-system-like characteristics.
This leads us to the next extension.
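To make the get_data() convention concrete, here is a small sketch of a loader backed by a dictionary of virtual files. The names DictDataLoader, 'files' and source_cookie are hypothetical; only the get_data(name) signature, the IOError behaviour and the os.sep convention come from the text above.

```python
import os


class DictDataLoader:
    """Hypothetical loader whose storage backend is a plain dict."""

    def __init__(self, files):
        # Maps 'name' cookies to file contents.
        self.files = files

    def get_data(self, name):
        try:
            return self.files[name]
        except KeyError:
            raise IOError("no such file: %r" % (name,))


def source_cookie(module_name):
    # Builds the cookie as recommended for file system-like importers.
    return module_name.replace(".", os.sep) + ".py"


name = source_cookie("spam.eggs")
loader = DictDataLoader({name: "x = 1\n"})
print(repr(loader.get_data(name)))
```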
The following set of methods may be implemented if support for (for
example) Freeze-like tools is desirable. It consists of three
additional methods which, to make it easier for the caller, should
either all be implemented or all be left out.
loader.get_package_path(fullname)
loader.get_code(fullname)
loader.get_source(fullname)
All three methods should raise ImportError if the module wasn't
found.
The loader.get_package_path(fullname) method should return None if
the module specified by 'fullname' is not a package, or a list to
serve as pkg.__path__ if it is. It can be used to check
package-ness for a module ("loader.get_package_path(fullname) is not
None") but its main purpose is to tell our caller what pkg.__path__
would be if the module were actually loaded.
The loader.get_code(fullname) method should return the code object
associated with the module, or None if it's a built-in or extension
module. If the loader doesn't have the code object but it _does_
have the source code, it should return the compiled source code.
(This is so that our caller doesn't also need to check get_source()
if all it needs is the code object.)
The loader.get_source(fullname) method should return the source code
for the module as a string (using newline characters for line
endings) or None if the source is not available (yet it should still
raise ImportError if the module can't be found by the importer at
all).
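The three methods could be implemented along these lines. This is a sketch under stated assumptions, not a reference implementation: the class name DictInspectLoader, the layout of the 'modules' mapping and the "<dict>/..." path entries are made up for the example, and an entry without source is simply treated as having no code object.

```python
class DictInspectLoader:
    """Hypothetical loader over {fullname: (source, is_package)}."""

    def __init__(self, modules):
        self.modules = modules

    def _lookup(self, fullname):
        # All three methods raise ImportError for unknown modules.
        try:
            return self.modules[fullname]
        except KeyError:
            raise ImportError("module not found: %s" % fullname)

    def get_package_path(self, fullname):
        source, is_package = self._lookup(fullname)
        if not is_package:
            return None
        # A list that could serve as pkg.__path__ (made-up entry).
        return ["<dict>/" + fullname.replace(".", "/")]

    def get_source(self, fullname):
        source, is_package = self._lookup(fullname)
        return source  # None if the source is not available

    def get_code(self, fullname):
        source = self.get_source(fullname)
        if source is None:
            return None
        # No stored code object, so compile the source on demand.
        return compile(source, "<dict:%s>" % fullname, "exec")


loader = DictInspectLoader({
    "spam": ("x = 1\n", True),
    "spam.eggs": ("y = 2\n", False),
})
print(loader.get_package_path("spam") is not None)  # package-ness check
```

A dependency analysis tool can call get_code() alone and inspect the resulting code object without ever executing the module.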
Integration with the 'imp' module
The new import hooks are not easily integrated in the existing
@ -355,10 +425,11 @@ Integration with the 'imp' module
whether it's possible at all without breaking code; it is better to
simply add a new function to the imp module. The meaning of the
existing imp.find_module() and imp.load_module() calls changes from:
"they expose the built-in import mechanism" to "they expose the basic
*unhooked* built-in import mechanism". They simply won't invoke any
import hooks. A new imp module function is proposed under the name
"find_module2", which is used like the following pattern:
"they expose the built-in import mechanism" to "they expose the
basic *unhooked* built-in import mechanism". They simply won't
invoke any import hooks. A new imp module function is proposed
under the name "find_module2", which is used like the following
pattern:
loader = imp.find_module2(fullname, path)
if loader is not None:
@ -438,21 +509,31 @@ Implementation
http://www.python.org/sf/652586
References
References and Footnotes
[1] Installer by Gordon McMillan
http://www.mcmillan-inc.com/install1.html
[2] PEP 273, Import Modules from Zip Archives, Ahlstrom
http://www.python.org/peps/pep-0273.html
[3] The Freeze tool
Tools/freeze/ in a Python source distribution
[4] Squeeze
http://starship.python.net/crew/fredrik/ipa/squeeze.htm
[5] py2exe by Thomas Heller
http://py2exe.sourceforge.net/
[6] imp.set_frozenmodules() patch
http://www.python.org/sf/642578
[7] The path argument to importer.find_module() is there because the
pkg.__path__ variable may be needed at this point. It may either
come from the actual parent module or be supplied by
imp.find_module() or the proposed imp.find_module2() function.
Copyright