From 74737f0d450b99a158caa9f4b500d77d5ed1e73c Mon Sep 17 00:00:00 2001 From: Just van Rossum Date: Mon, 23 Dec 2002 22:13:48 +0000 Subject: [PATCH] - some reflow caused by David Goodger's spell checking - added "Optional Extensions to the Importer Protocol" section - fixed flaw in the importer protocol: i.find_module() will now receive the package.__path__ (or None for a plain module) as an additional argument, if it's installed on sys.meta_path. This is needed to be able to add a hook that implements the full sys.path/pkg.__path__ semantics. See also footnote [7]. The patch on sf will be updated shortly to match the new version of the PEP. --- pep-0302.txt | 121 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 101 insertions(+), 20 deletions(-) diff --git a/pep-0302.txt b/pep-0302.txt index beae67bda..3f37a0c47 100644 --- a/pep-0302.txt +++ b/pep-0302.txt @@ -190,17 +190,17 @@ Specification part 1: The Importer Protocol module being imported (may be a dotted name) and a reference to the current global namespace. - The built-in __import__ function (known as PyImport_ImportModuleEx in - import.c) will then check to see whether the module doing the import - is a package by looking for a __path__ variable in the current - global namespace. If it is indeed a package, it first tries to do - the import relative to the package. For example if a package named - "spam" does "import eggs", it will first look for a module named - "spam.eggs". If that fails, the import continues as an absolute - import: it will look for a module named "eggs". Dotted name imports - work pretty much the same: if package "spam" does "import - eggs.bacon", first "spam.eggs.bacon" is tried, and only if that - fails "eggs.bacon" is tried. + The built-in __import__ function (known as PyImport_ImportModuleEx + in import.c) will then check to see whether the module doing the + import is a package by looking for a __path__ variable in the + current global namespace. If it is indeed a package, it first tries + to do the import relative to the package. For example if a package + named "spam" does "import eggs", it will first look for a module + named "spam.eggs". If that fails, the import continues as an + absolute import: it will look for a module named "eggs". Dotted + name imports work pretty much the same: if package "spam" does + "import eggs.bacon", first "spam.eggs.bacon" is tried, and only if + that fails "eggs.bacon" is tried. Deeper down in the mechanism, a dotted name import is split up by its components. For "import spam.ham", first an "import spam" is @@ -214,11 +214,15 @@ Specification part 1: The Importer Protocol The protocol involves two objects: an importer and a loader. An importer object has a single method: - importer.find_module(fullname) + importer.find_module(fullname, path=None) - This method returns a loader object if the module was found, or None - if it wasn't. If find_module() raises an exception, it will be - propagated to the caller, aborting the import. + This method will be called with the fully qualified name of the + module. If the importer is installed on sys.meta_path, it will + receive a second argument, which is None for a top-level module, or + package.__path__ for submodules or subpackages[7]. It should return + a loader object if the module was found, or None if it wasn't. If + find_module() raises an exception, it will be propagated to the + caller, aborting the import. A loader object also has one method: @@ -348,6 +352,72 @@ Packages and the role of __path__ which can be empty. +Optional Extensions to the Importer Protocol + + The Importer Protocol defines two optional extensions. One is to + retrieve data files, the other is to support module packaging tools + and/or tools that analyze module dependencies (for example Freeze + [3]). The latter category of tools usually don't actually *load* + modules, they only need to know if and where they are available. + Both extensions are highly recommended for general purpose + importers, but may safely be left out if those features aren't + needed. + + To retrieve the data for arbitrary "files" from the underlying + storage backend, loader objects may supply a method named get_data: + + loader.get_data(name) + + This method returns the data as a string, or raise IOError if the + "file" wasn't found. The 'name' argument should be seen as a + 'cookie', meaning the protocol doesn't prescribe any semantics for + it. However, for importer objects that have some file system-like + properties (for example zipimporter) it is recommended to use os.sep + as a separator character to specify a (possibly virtual) directory + hierarchy. For example if the importer allows access to a module's + source code via i.get_data(name), the 'name' argument should be + constructed like this: + + name = mod.__name__.replace(".", os.sep) + ".py" + + Note that this is not the recommended way to retrieve source code, + the (also optional) method loader.get_source(fullname) is more + general, as it doesn't imply *any* file-system-like characteristics. + This leads us to the next extension. + + The following set of methods may be implemented if support for (for + example) Freeze-like tools is desirable. It consists of three + additional methods which, to make it easier for the caller, each of + which should be implemented, or none at all. + + loader.get_package_path(fullname) + loader.get_code(fullname) + loader.get_source(fullname) + + All three methods should raise ImportError if the module wasn't + found. + + The loader.get_package_path(fullname) method should return None if + the module specified by 'fullname' is not a package, or a list to + serve as pkg.__path__ if it is. It can be used to check + package-ness for a module ("loader.get_package_path(fullname) is not + None") but its main purpose is to tell our caller what pkg.__path__ + would be if the module would actually be loaded. + + The loader.get_code(fullname) method should return the code object + associated with the module, or None if it's a built-in or extension + module. If the loader doesn't have the code object but it _does_ + have the source code, it should return the compiled the source code. + (This is so that our caller doesn't also need to check get_source() + if all it needs is the code object.) + + The loader.get_source(fullname) method should return the source code + for the module as a string (using newline characters for line + endings) or None if the source is not available (yet it should still + raise ImportError if the module can't be found by the importer at + all). + + Integration with the 'imp' module The new import hooks are not easily integrated in the existing @@ -355,10 +425,11 @@ Integration with the 'imp' module whether it's possible at all without breaking code; it is better to simply add a new function to the imp module. The meaning of the existing imp.find_module() and imp.load_module() calls changes from: - "they expose the built-in import mechanism" to "they expose the basic - *unhooked* built-in import mechanism". They simply won't invoke any - import hooks. A new imp module function is proposed under the name - "find_module2", with is used like the following pattern: + "they expose the built-in import mechanism" to "they expose the + basic *unhooked* built-in import mechanism". They simply won't + invoke any import hooks. A new imp module function is proposed + under the name "find_module2", with is used like the following + pattern: loader = imp.find_module2(fullname, path) if loader is not None: @@ -438,21 +509,31 @@ Implementation http://www.python.org/sf/652586 -References +References and Footnotes [1] Installer by Gordon McMillan http://www.mcmillan-inc.com/install1.html + [2] PEP 273, Import Modules from Zip Archives, Ahlstrom http://www.python.org/peps/pep-0273.html + [3] The Freeze tool Tools/freeze/ in a Python source distribution + [4] Squeeze http://starship.python.net/crew/fredrik/ipa/squeeze.htm + [5] py2exe by Thomas Heller http://py2exe.sourceforge.net/ + [6] imp.set_frozenmodules() patch http://www.python.org/sf/642578 + [7] The path argument to importer.find_module() is there because the + pkg.__path__ variable may be needed at this point. It may either + come from the actual parent module or be supplied by + imp.find_module() or the proposed imp.find_module2() function. + Copyright