diff --git a/pep-0302.txt b/pep-0302.txt index d9ae27e6d..0fd09a6e9 100644 --- a/pep-0302.txt +++ b/pep-0302.txt @@ -6,279 +6,267 @@ Author: Just van Rossum , Paul Moore Status: Final Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 19-Dec-2002 Python-Version: 2.3 Post-History: 19-Dec-2002 Abstract +======== - This PEP proposes to add a new set of import hooks that offer better - customization of the Python import mechanism. Contrary to the - current __import__ hook, a new-style hook can be injected into the - existing scheme, allowing for a finer grained control of how modules - are found and how they are loaded. +This PEP proposes to add a new set of import hooks that offer better +customization of the Python import mechanism. Contrary to the current +``__import__`` hook, a new-style hook can be injected into the existing +scheme, allowing for a finer grained control of how modules are found and how +they are loaded. Motivation +========== - The only way to customize the import mechanism is currently to - override the built-in __import__ function. However, overriding - __import__ has many problems. To begin with: +The only way to customize the import mechanism is currently to override the +built-in ``__import__`` function. However, overriding ``__import__`` has many +problems. To begin with: - - An __import__ replacement needs to *fully* reimplement the entire - import mechanism, or call the original __import__ before or after - the custom code. + * An ``__import__`` replacement needs to *fully* reimplement the entire + import mechanism, or call the original ``__import__`` before or after the + custom code. - - It has very complex semantics and responsibilities. + * It has very complex semantics and responsibilities. - - __import__ gets called even for modules that are already in - sys.modules, which is almost never what you want, unless you're - writing some sort of monitoring tool. 
+ * ``__import__`` gets called even for modules that are already in + ``sys.modules``, which is almost never what you want, unless you're writing + some sort of monitoring tool. - The situation gets worse when you need to extend the import - mechanism from C: it's currently impossible, apart from hacking - Python's import.c or reimplementing much of import.c from scratch. +The situation gets worse when you need to extend the import mechanism from C: +it's currently impossible, apart from hacking Python's ``import.c`` or +reimplementing much of ``import.c`` from scratch. - There is a fairly long history of tools written in Python that allow - extending the import mechanism in various way, based on the - __import__ hook. The Standard Library includes two such tools: - ihooks.py (by GvR) and imputil.py (Greg Stein), but perhaps the most - famous is iu.py by Gordon McMillan, available as part of his - Installer [1] package. Their usefulness is somewhat limited because - they are written in Python; bootstrapping issues need to worked - around as you can't load the module containing the hook with the - hook itself. So if you want the entire Standard Library to be - loadable from an import hook, the hook must be written in C. +There is a fairly long history of tools written in Python that allow extending +the import mechanism in various way, based on the ``__import__`` hook. The +Standard Library includes two such tools: ``ihooks.py`` (by GvR) and +``imputil.py`` [1]_ (Greg Stein), but perhaps the most famous is ``iu.py`` by +Gordon McMillan, available as part of his Installer package. Their usefulness +is somewhat limited because they are written in Python; bootstrapping issues +need to worked around as you can't load the module containing the hook with +the hook itself. So if you want the entire Standard Library to be loadable +from an import hook, the hook must be written in C. 
Use cases +========= - This section lists several existing applications that depend on - import hooks. Among these, a lot of duplicate work was done that - could have been saved if there had been a more flexible import hook - at the time. This PEP should make life a lot easier for similar - projects in the future. +This section lists several existing applications that depend on import hooks. +Among these, a lot of duplicate work was done that could have been saved if +there had been a more flexible import hook at the time. This PEP should make +life a lot easier for similar projects in the future. - Extending the import mechanism is needed when you want to load - modules that are stored in a non-standard way. Examples include - modules that are bundled together in an archive; byte code that is - not stored in a pyc formatted file; modules that are loaded from a - database over a network. +Extending the import mechanism is needed when you want to load modules that +are stored in a non-standard way. Examples include modules that are bundled +together in an archive; byte code that is not stored in a ``pyc`` formatted +file; modules that are loaded from a database over a network. - The work on this PEP was partly triggered by the implementation of - PEP 273 [2], which adds imports from Zip archives as a built-in - feature to Python. While the PEP itself was widely accepted as a - must-have feature, the implementation left a few things to desire. - For one thing it went through great lengths to integrate itself with - import.c, adding lots of code that was either specific for Zip file - imports or *not* specific to Zip imports, yet was not generally - useful (or even desirable) either. Yet the PEP 273 implementation - can hardly be blamed for this: it is simply extremely hard to do, - given the current state of import.c. +The work on this PEP was partly triggered by the implementation of PEP 273, +which adds imports from Zip archives as a built-in feature to Python. 
While +the PEP itself was widely accepted as a must-have feature, the implementation +left a few things to desire. For one thing it went through great lengths to +integrate itself with ``import.c``, adding lots of code that was either +specific for Zip file imports or *not* specific to Zip imports, yet was not +generally useful (or even desirable) either. Yet the PEP 273 implementation +can hardly be blamed for this: it is simply extremely hard to do, given the +current state of ``import.c``. - Packaging applications for end users is a typical use case for - import hooks, if not *the* typical use case. Distributing lots of - source or pyc files around is not always appropriate (let alone a - separate Python installation), so there is a frequent desire to - package all needed modules in a single file. So frequent in fact - that multiple solutions have been implemented over the years. +Packaging applications for end users is a typical use case for import hooks, +if not *the* typical use case. Distributing lots of source or ``pyc`` files +around is not always appropriate (let alone a separate Python installation), +so there is a frequent desire to package all needed modules in a single file. +So frequent in fact that multiple solutions have been implemented over the +years. - The oldest one is included with the Python source code: Freeze [3]. - It puts marshalled byte code into static objects in C source code. - Freeze's "import hook" is hard wired into import.c, and has a couple - of issues. Later solutions include Fredrik Lundh's Squeeze [4], - Gordon McMillan's Installer [1] and Thomas Heller's py2exe [5]. - MacPython ships with a tool called BuildApplication. +The oldest one is included with the Python source code: Freeze [2]_. It puts +marshalled byte code into static objects in C source code. Freeze's "import +hook" is hard wired into ``import.c``, and has a couple of issues. 
Later +solutions include Fredrik Lundh's Squeeze, Gordon McMillan's Installer, and +Thomas Heller's py2exe [3]_. MacPython ships with a tool called +``BuildApplication``. - Squeeze, Installer and py2exe use an __import__ based scheme (py2exe - currently uses Installer's iu.py, Squeeze used ihooks.py), MacPython - has two Mac-specific import hooks hard wired into import.c, that are - similar to the Freeze hook. The hooks proposed in this PEP enables - us (at least in theory; it's not a short term goal) to get rid of - the hard coded hooks in import.c, and would allow the - __import__-based tools to get rid of most of their import.c - emulation code. +Squeeze, Installer and py2exe use an ``__import__`` based scheme (py2exe +currently uses Installer's ``iu.py``, Squeeze used ``ihooks.py``), MacPython +has two Mac-specific import hooks hard wired into ``import.c``, that are +similar to the Freeze hook. The hooks proposed in this PEP enables us (at +least in theory; it's not a short term goal) to get rid of the hard coded +hooks in ``import.c``, and would allow the ``__import__``-based tools to get +rid of most of their ``import.c`` emulation code. - Before work on the design and implementation of this PEP was - started, a new BuildApplication-like tool for MacOS X prompted one - of the authors of this PEP (JvR) to expose the table of frozen - modules to Python, in the imp module. The main reason was to be - able to use the freeze import hook (avoiding fancy __import__ - support), yet to also be able to supply a set of modules at - runtime. This resulted in sf patch #642578 [6], which was - mysteriously accepted (mostly because nobody seemed to care either - way ;-). Yet it is completely superfluous when this PEP gets - accepted, as it offers a much nicer and general way to do the same - thing. 
+Before work on the design and implementation of this PEP was started, a new +``BuildApplication``-like tool for Mac OS X prompted one of the authors of +this PEP (JvR) to expose the table of frozen modules to Python, in the ``imp`` +module. The main reason was to be able to use the freeze import hook +(avoiding fancy ``__import__`` support), yet to also be able to supply a set +of modules at runtime. This resulted in issue #642578 [4]_, which was +mysteriously accepted (mostly because nobody seemed to care either way ;-). +Yet it is completely superfluous when this PEP gets accepted, as it offers a +much nicer and general way to do the same thing. Rationale +========= - While experimenting with alternative implementation ideas to get - built-in Zip import, it was discovered that achieving this is - possible with only a fairly small amount of changes to import.c. - This allowed to factor out the Zip-specific stuff into a new source - file, while at the same time creating a *general* new import hook - scheme: the one you're reading about now. +While experimenting with alternative implementation ideas to get built-in Zip +import, it was discovered that achieving this is possible with only a fairly +small amount of changes to ``import.c``. This allowed to factor out the +Zip-specific stuff into a new source file, while at the same time creating a +*general* new import hook scheme: the one you're reading about now. - An earlier design allowed non-string objects on sys.path. Such an - object would have the necessary methods to handle an import. This - has two disadvantages: 1) it breaks code that assumes all items on - sys.path are strings; 2) it is not compatible with the PYTHONPATH - environment variable. The latter is directly needed for Zip - imports. A compromise came from Jython: allow string *subclasses* - on sys.path, which would then act as importer objects. 
This avoids - some breakage, and seems to work well for Jython (where it is used - to load modules from .jar files), but it was perceived as an "ugly - hack". +An earlier design allowed non-string objects on ``sys.path``. Such an object +would have the necessary methods to handle an import. This has two +disadvantages: 1) it breaks code that assumes all items on ``sys.path`` are +strings; 2) it is not compatible with the ``PYTHONPATH`` environment variable. +The latter is directly needed for Zip imports. A compromise came from Jython: +allow string *subclasses* on ``sys.path``, which would then act as importer +objects. This avoids some breakage, and seems to work well for Jython (where +it is used to load modules from ``.jar`` files), but it was perceived as an +"ugly hack". - This lead to a more elaborate scheme, (mostly copied from McMillan's - iu.py) in which each in a list of candidates is asked whether it can - handle the sys.path item, until one is found that can. This list of - candidates is a new object in the sys module: sys.path_hooks. +This lead to a more elaborate scheme, (mostly copied from McMillan's +``iu.py``) in which each in a list of candidates is asked whether it can +handle the ``sys.path`` item, until one is found that can. This list of +candidates is a new object in the ``sys`` module: ``sys.path_hooks``. - Traversing sys.path_hooks for each path item for each new import can - be expensive, so the results are cached in another new object in the - sys module: sys.path_importer_cache. It maps sys.path entries to - importer objects. +Traversing ``sys.path_hooks`` for each path item for each new import can be +expensive, so the results are cached in another new object in the ``sys`` +module: ``sys.path_importer_cache``. It maps ``sys.path`` entries to importer +objects. 
- To minimize the impact on import.c as well as to avoid adding extra - overhead, it was chosen to not add an explicit hook and importer - object for the existing file system import logic (as iu.py has), but - to simply fall back to the built-in logic if no hook on - sys.path_hooks could handle the path item. If this is the case, a - None value is stored in sys.path_importer_cache, again to avoid - repeated lookups. (Later we can go further and add a real importer - object for the built-in mechanism, for now, the None fallback scheme - should suffice.) +To minimize the impact on ``import.c`` as well as to avoid adding extra +overhead, it was chosen to not add an explicit hook and importer object for +the existing file system import logic (as ``iu.py`` has), but to simply fall +back to the built-in logic if no hook on ``sys.path_hooks`` could handle the +path item. If this is the case, a ``None`` value is stored in +``sys.path_importer_cache``, again to avoid repeated lookups. (Later we can +go further and add a real importer object for the built-in mechanism, for now, +the ``None`` fallback scheme should suffice.) + +A question was raised: what about importers that don't need *any* entry on +``sys.path``? (Built-in and frozen modules fall into that category.) Again, +Gordon McMillan to the rescue: ``iu.py`` contains a thing he calls the +*metapath*. In this PEP's implementation, it's a list of importer objects +that is traversed *before* ``sys.path``. This list is yet another new object +in the ``sys`` module: ``sys.meta_path``. Currently, this list is empty by +default, and frozen and built-in module imports are done after traversing +``sys.meta_path``, but still before ``sys.path``. - A question was raised: what about importers that don't need *any* - entry on sys.path? (Built-in and frozen modules fall into that - category.) Again, Gordon McMillan to the rescue: iu.py contains a - thing he calls the "metapath". 
In this PEP's implementation, it's a - list of importer objects that is traversed *before* sys.path. This - list is yet another new object in the sys.module: sys.meta_path. - Currently, this list is empty by default, and frozen and built-in - module imports are done after traversing sys.meta_path, but still - before sys.path. Specification part 1: The Importer Protocol +=========================================== - This PEP introduces a new protocol: the "Importer Protocol". It is - important to understand the context in which the protocol operates, - so here is a brief overview of the outer shells of the import - mechanism. +This PEP introduces a new protocol: the "Importer Protocol". It is important +to understand the context in which the protocol operates, so here is a brief +overview of the outer shells of the import mechanism. - When an import statement is encountered, the interpreter looks up - the __import__ function in the built-in name space. __import__ is - then called with four arguments, amongst which are the name of the - module being imported (may be a dotted name) and a reference to the - current global namespace. +When an import statement is encountered, the interpreter looks up the +``__import__`` function in the built-in name space. ``__import__`` is then +called with four arguments, amongst which are the name of the module being +imported (may be a dotted name) and a reference to the current global +namespace. - The built-in __import__ function (known as PyImport_ImportModuleEx - in import.c) will then check to see whether the module doing the - import is a package or a submodule of a package. If it is indeed a - (submodule of a) package, it first tries to do the import relative - to the package (the parent package for a submodule). For example if - a package named "spam" does "import eggs", it will first look for a - module named "spam.eggs". If that fails, the import continues as an - absolute import: it will look for a module named "eggs". 
Dotted - name imports work pretty much the same: if package "spam" does - "import eggs.bacon" (and "spam.eggs" exists and is itself a - package), "spam.eggs.bacon" is tried. If that fails "eggs.bacon" is - tried. (There are more subtleties that are not described here, but - these are not relevant for implementers of the Importer Protocol.) +The built-in ``__import__`` function (known as ``PyImport_ImportModuleEx()`` +in ``import.c``) will then check to see whether the module doing the import is +a package or a submodule of a package. If it is indeed a (submodule of a) +package, it first tries to do the import relative to the package (the parent +package for a submodule). For example if a package named "spam" does "import +eggs", it will first look for a module named "spam.eggs". If that fails, the +import continues as an absolute import: it will look for a module named +"eggs". Dotted name imports work pretty much the same: if package "spam" does +"import eggs.bacon" (and "spam.eggs" exists and is itself a package), +"spam.eggs.bacon" is tried. If that fails "eggs.bacon" is tried. (There are +more subtleties that are not described here, but these are not relevant for +implementers of the Importer Protocol.) - Deeper down in the mechanism, a dotted name import is split up by - its components. For "import spam.ham", first an "import spam" is - done, and only when that succeeds is "ham" imported as a submodule - of "spam". +Deeper down in the mechanism, a dotted name import is split up by its +components. For "import spam.ham", first an "import spam" is done, and only +when that succeeds is "ham" imported as a submodule of "spam". - The Importer Protocol operates at this level of *individual* - imports. By the time an importer gets a request for "spam.ham", - module "spam" has already been imported. +The Importer Protocol operates at this level of *individual* imports. By the +time an importer gets a request for "spam.ham", module "spam" has already been +imported. 
- The protocol involves two objects: a finder and a loader. A - finder object has a single method: +The protocol involves two objects: a *finder* and a *loader*. A finder object +has a single method:: - finder.find_module(fullname, path=None) + finder.find_module(fullname, path=None) - This method will be called with the fully qualified name of the - module. If the finder is installed on sys.meta_path, it will - receive a second argument, which is None for a top-level module, or - package.__path__ for submodules or subpackages[7]. It should return - a loader object if the module was found, or None if it wasn't. If - find_module() raises an exception, it will be propagated to the - caller, aborting the import. +This method will be called with the fully qualified name of the module. If +the finder is installed on ``sys.meta_path``, it will receive a second +argument, which is ``None`` for a top-level module, or ``package.__path__`` +for submodules or subpackages [5]_. It should return a loader object if the +module was found, or ``None`` if it wasn't. If ``find_module()`` raises an +exception, it will be propagated to the caller, aborting the import. - A loader object also has one method: +A loader object also has one method:: - loader.load_module(fullname) + loader.load_module(fullname) - This method returns the loaded module or raises an exception, - preferably ImportError if an existing exception is not being - propagated. If load_module() is asked to load a module that it - cannot, ImportError is to be raised. +This method returns the loaded module or raises an exception, preferably +``ImportError`` if an existing exception is not being propagated. If +``load_module()`` is asked to load a module that it cannot, ``ImportError`` is +to be raised. - In many cases the finder and loader can be one and the same - object: finder.find_module() would just return self. 
+In many cases the finder and loader can be one and the same object: +``finder.find_module()`` would just return ``self``. - The 'fullname' argument of both methods is the fully qualified - module name, for example "spam.eggs.ham". As explained above, when - finder.find_module("spam.eggs.ham") is called, "spam.eggs" has - already been imported and added to sys.modules. However, the - find_module() method isn't necessarily always called during an - actual import: meta tools that analyze import dependencies (such as - freeze, Installer or py2exe) don't actually load modules, so an - finder shouldn't *depend* on the parent package being available in - sys.modules. +The ``fullname`` argument of both methods is the fully qualified module name, +for example "spam.eggs.ham". As explained above, when +``finder.find_module("spam.eggs.ham")`` is called, "spam.eggs" has already +been imported and added to ``sys.modules``. However, the ``find_module()`` +method isn't necessarily always called during an actual import: meta tools +that analyze import dependencies (such as freeze, Installer or py2exe) don't +actually load modules, so a finder shouldn't *depend* on the parent package +being available in ``sys.modules``. - The load_module() method has a few responsibilities that it must - fulfill *before* it runs any code: +The ``load_module()`` method has a few responsibilities that it must fulfill +*before* it runs any code: - - If there is an existing module object named 'fullname' in - sys.modules, the loader must use that existing module. - (Otherwise, the reload() builtin will not work correctly.) - If a module named 'fullname' does not exist in sys.modules, - the loader must create a new module object and add it to - sys.modules. + * If there is an existing module object named 'fullname' in ``sys.modules``, + the loader must use that existing module. (Otherwise, the ``reload()`` + builtin will not work correctly.) 
If a module named 'fullname' does not + exist in ``sys.modules``, the loader must create a new module object and + add it to ``sys.modules``. - Note that the module object *must* be in sys.modules before the - loader executes the module code. This is crucial because the - module code may (directly or indirectly) import itself; adding - it to sys.modules beforehand prevents unbounded recursion in the - worst case and multiple loading in the best. + Note that the module object *must* be in ``sys.modules`` before the loader + executes the module code. This is crucial because the module code may + (directly or indirectly) import itself; adding it to ``sys.modules`` + beforehand prevents unbounded recursion in the worst case and multiple + loading in the best. - If the load fails, the loader needs to remove any module it may have - inserted into sys.modules. If the module was already in - sys.modules then the loader should leave it alone. + If the load fails, the loader needs to remove any module it may have + inserted into ``sys.modules``. If the module was already in ``sys.modules`` + then the loader should leave it alone. - - The __file__ attribute must be set. This must be a string, but it - may be a dummy value, for example "<frozen>". The privilege of - not having a __file__ attribute at all is reserved for built-in - modules. + * The ``__file__`` attribute must be set. This must be a string, but it may + be a dummy value, for example "<frozen>". The privilege of not having a + ``__file__`` attribute at all is reserved for built-in modules. - - The __name__ attribute must be set. If one uses - imp.new_module() then the attribute is set automatically. + * The ``__name__`` attribute must be set. If one uses ``imp.new_module()`` + then the attribute is set automatically. - - If it's a package, the __path__ variable must be set. This must - be a list, but may be empty if __path__ has no further - significance to the importer (more on this later). 
+ * If it's a package, the ``__path__`` variable must be set. This must be a + list, but may be empty if ``__path__`` has no further significance to the + importer (more on this later). - - The __loader__ attribute must be set to the loader object. - This is mostly for introspection and reloading, but can be used - for importer-specific extras, for example getting data associated - with an importer. + * The ``__loader__`` attribute must be set to the loader object. This is + mostly for introspection and reloading, but can be used for + importer-specific extras, for example getting data associated with an + importer. - - The __package__ attribute [10] must be set. + * The ``__package__`` attribute [8]_ must be set. - If the module is a Python module (as opposed to a built-in module or - a dynamically loaded extension), it should execute the module's code - in the module's global name space (module.__dict__). + If the module is a Python module (as opposed to a built-in module or a + dynamically loaded extension), it should execute the module's code in the + module's global name space (``module.__dict__``). - Here is a minimal pattern for a load_module() method: + Here is a minimal pattern for a ``load_module()`` method:: # Consider using importlib.util.module_for_loader() to handle # most of these details for you. @@ -298,294 +286,286 @@ Specification part 1: The Importer Protocol Specification part 2: Registering Hooks +======================================= - There are two types of import hooks: Meta hooks and Path hooks. - Meta hooks are called at the start of import processing, before any - other import processing (so that meta hooks can override sys.path - processing, or frozen modules, or even built-in modules). To - register a meta hook, simply add the finder object to - sys.meta_path (the list of registered meta hooks). +There are two types of import hooks: *Meta hooks* and *Path hooks*. 
Meta +hooks are called at the start of import processing, before any other import +processing (so that meta hooks can override ``sys.path`` processing, frozen +modules, or even built-in modules). To register a meta hook, simply add the +finder object to ``sys.meta_path`` (the list of registered meta hooks). - Path hooks are called as part of sys.path (or package.__path__) - processing, at the point where their associated path item is - encountered. A path hook is registered by adding an importer - factory to sys.path_hooks. +Path hooks are called as part of ``sys.path`` (or ``package.__path__``) +processing, at the point where their associated path item is encountered. A +path hook is registered by adding an importer factory to ``sys.path_hooks``. - sys.path_hooks is a list of callables, which will be checked in - sequence to determine if they can handle a given path item. The - callable is called with one argument, the path item. The callable - must raise ImportError if it is unable to handle the path item, and - return an importer object if it can handle the path item. Note - that if the callable returns an importer object for a specific - sys.path entry, the builtin import machinery will not be invoked - to handle that entry any longer, even if the importer object later - fails to find a specific module. The callable is typically the - class of the import hook, and hence the class __init__ method is - called. (This is also the reason why it should raise ImportError: - an __init__ method can't return anything. This would be possible - with a __new__ method in a new style class, but we don't want to - require anything about how a hook is implemented.) +``sys.path_hooks`` is a list of callables, which will be checked in sequence +to determine if they can handle a given path item. The callable is called +with one argument, the path item. 
The callable must raise ``ImportError`` if +it is unable to handle the path item, and return an importer object if it can +handle the path item. Note that if the callable returns an importer object +for a specific ``sys.path`` entry, the builtin import machinery will not be +invoked to handle that entry any longer, even if the importer object later +fails to find a specific module. The callable is typically the class of the +import hook, and hence the class ``__init__()`` method is called. (This is +also the reason why it should raise ``ImportError``: an ``__init__()`` method +can't return anything. This would be possible with a ``__new__()`` method in +a new style class, but we don't want to require anything about how a hook is +implemented.) - The results of path hook checks are cached in - sys.path_importer_cache, which is a dictionary mapping path entries - to importer objects. The cache is checked before sys.path_hooks is - scanned. If it is necessary to force a rescan of sys.path_hooks, it - is possible to manually clear all or part of - sys.path_importer_cache. +The results of path hook checks are cached in ``sys.path_importer_cache``, +which is a dictionary mapping path entries to importer objects. The cache is +checked before ``sys.path_hooks`` is scanned. If it is necessary to force a +rescan of ``sys.path_hooks``, it is possible to manually clear all or part of +``sys.path_importer_cache``. - Just like sys.path itself, the new sys variables must have specific - types: +Just like ``sys.path`` itself, the new ``sys`` variables must have specific +types: - sys.meta_path and sys.path_hooks must be Python lists. - sys.path_importer_cache must be a Python dict. + * ``sys.meta_path`` and ``sys.path_hooks`` must be Python lists. + * ``sys.path_importer_cache`` must be a Python dict. - Modifying these variables in place is allowed, as is replacing them - with new objects. +Modifying these variables in place is allowed, as is replacing them with new +objects. 
-Packages and the role of __path__ +Packages and the role of ``__path__`` +===================================== - If a module has a __path__ attribute, the import mechanism will - treat it as a package. The __path__ variable is used instead of - sys.path when importing submodules of the package. The rules for - sys.path therefore also apply to pkg.__path__. So sys.path_hooks is - also consulted when pkg.__path__ is traversed. Meta importers don't - necessarily use sys.path at all to do their work and may therefore - ignore the value of pkg.__path__. In this case it is still advised - to set it to list, which can be empty. +If a module has a ``__path__`` attribute, the import mechanism will treat it +as a package. The ``__path__`` variable is used instead of ``sys.path`` when +importing submodules of the package. The rules for ``sys.path`` therefore +also apply to ``pkg.__path__``. So ``sys.path_hooks`` is also consulted when +``pkg.__path__`` is traversed. Meta importers don't necessarily use +``sys.path`` at all to do their work and may therefore ignore the value of +``pkg.__path__``. In this case it is still advised to set it to list, which +can be empty. Optional Extensions to the Importer Protocol +============================================ - The Importer Protocol defines three optional extensions. One is to - retrieve data files, the second is to support module packaging tools - and/or tools that analyze module dependencies (for example Freeze - [3]), while the last is to support execution of modules as scripts. - The latter two categories of tools usually don't actually *load* - modules, they only need to know if and where they are available. - All three extensions are highly recommended for general purpose - importers, but may safely be left out if those features aren't - needed. +The Importer Protocol defines three optional extensions. 
One is to retrieve +data files, the second is to support module packaging tools and/or tools that +analyze module dependencies (for example Freeze), while the last is to support +execution of modules as scripts. The latter two categories of tools usually +don't actually *load* modules; they only need to know if and where they are +available. All three extensions are highly recommended for general purpose +importers, but may safely be left out if those features aren't needed. - To retrieve the data for arbitrary "files" from the underlying - storage backend, loader objects may supply a method named get_data: +To retrieve the data for arbitrary "files" from the underlying storage +backend, loader objects may supply a method named ``get_data()``:: - loader.get_data(path) + loader.get_data(path) - This method returns the data as a string, or raise IOError if the - "file" wasn't found. The data is always returned as if "binary" mode - was used - there is no CRLF translation of text files, for example. - It is meant for importers that have some file-system-like properties. - The 'path' argument is a path that can be constructed by munging - module.__file__ (or pkg.__path__ items) with the os.path.* functions, - for example: +This method returns the data as a string, or raises ``IOError`` if the "file" +wasn't found. The data is always returned as if "binary" mode was used - +there is no CRLF translation of text files, for example. It is meant for +importers that have some file-system-like properties. The 'path' argument is +a path that can be constructed by munging ``module.__file__`` (or +``pkg.__path__`` items) with the ``os.path.*`` functions, for example:: - d = os.path.dirname(__file__) - data = __loader__.get_data(os.path.join(d, "logo.gif")) + d = os.path.dirname(__file__) + data = __loader__.get_data(os.path.join(d, "logo.gif")) - The following set of methods may be implemented if support for (for - example) Freeze-like tools is desirable.
It consists of three - additional methods which, to make it easier for the caller, each of - which should be implemented, or none at all. - loader.is_package(fullname) - loader.get_code(fullname) - loader.get_source(fullname) - All three methods should raise ImportError if the module wasn't - found. - The loader.is_package(fullname) method should return True if the - module specified by 'fullname' is a package and False if it isn't. - The loader.get_code(fullname) method should return the code object - associated with the module, or None if it's a built-in or extension - module. If the loader doesn't have the code object but it _does_ - have the source code, it should return the compiled source code. - (This is so that our caller doesn't also need to check get_source() - if all it needs is the code object.) +The following set of methods may be implemented if support for (for example) +Freeze-like tools is desirable. It consists of three additional methods; to +make it easier for the caller, either all three should be implemented, or +none at all:: + loader.is_package(fullname) + loader.get_code(fullname) + loader.get_source(fullname) +All three methods should raise ``ImportError`` if the module wasn't found. +The ``loader.is_package(fullname)`` method should return ``True`` if the +module specified by 'fullname' is a package and ``False`` if it isn't. +The ``loader.get_code(fullname)`` method should return the code object +associated with the module, or ``None`` if it's a built-in or extension +module. If the loader doesn't have the code object but it *does* have the +source code, it should return the compiled source code. (This is so that our +caller doesn't also need to check ``get_source()`` if all it needs is the code +object.) 
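The three-method contract listed above can be illustrated with a small in-memory loader. This is a sketch under stated assumptions rather than anything shipped with the patch: ``DictLoader`` and its constructor arguments are hypothetical, and a source value of ``None`` is used here to model a module that has no Python code available (the built-in or extension module case).

```python
class DictLoader:
    """Hypothetical loader that keeps module source text in a dictionary."""

    def __init__(self, sources, packages=()):
        self._sources = dict(sources)    # fullname -> source text, or None
        self._packages = set(packages)   # fullnames that are packages

    def _require(self, fullname):
        # All three protocol methods raise ImportError for unknown modules.
        if fullname not in self._sources:
            raise ImportError("no module named %s" % fullname)

    def is_package(self, fullname):
        self._require(fullname)
        return fullname in self._packages

    def get_source(self, fullname):
        self._require(fullname)
        return self._sources[fullname]   # None when no source is available

    def get_code(self, fullname):
        source = self.get_source(fullname)
        if source is None:
            return None                  # nothing to compile in this sketch
        # No cached code object exists, so compile the source on demand;
        # callers then never need to call get_source() themselves.
        return compile(source, "<%s>" % fullname, "exec")


loader = DictLoader({"pkg": "", "pkg.mod": "VALUE = 6 * 7\n"}, packages=["pkg"])
namespace = {}
exec(loader.get_code("pkg.mod"), namespace)
```

A dependency-analysis tool could call ``get_code()`` on such a loader without ever executing the module; the ``exec`` call here only demonstrates that the returned object is an ordinary code object.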
- The loader.get_source(fullname) method should return the source code - for the module as a string (using newline characters for line - endings) or None if the source is not available (yet it should still - raise ImportError if the module can't be found by the importer at - all). +The ``loader.get_source(fullname)`` method should return the source code for +the module as a string (using newline characters for line endings) or ``None`` +if the source is not available (yet it should still raise ``ImportError`` if +the module can't be found by the importer at all). - To support execution of modules as scripts [9], the above three - methods for finding the code associated with a module must be - implemented. In addition to those methods, the following method - may be provided in order to allow the ``runpy`` module to correctly - set the ``__file__`` attribute: +To support execution of modules as scripts [6]_, the above three methods for +finding the code associated with a module must be implemented. In addition to +those methods, the following method may be provided in order to allow the +``runpy`` module to correctly set the ``__file__`` attribute:: - loader.get_filename(fullname) + loader.get_filename(fullname) - This method should return the value that ``__file__`` would be set - to if the named module was loaded. If the module is not found, then - ImportError should be raised. +This method should return the value that ``__file__`` would be set to if the +named module was loaded. If the module is not found, then ``ImportError`` +should be raised. Integration with the 'imp' module +================================= - The new import hooks are not easily integrated in the existing - imp.find_module() and imp.load_module() calls. It's questionable - whether it's possible at all without breaking code; it is better to - simply add a new function to the imp module. 
The meaning of the - existing imp.find_module() and imp.load_module() calls changes from: - "they expose the built-in import mechanism" to "they expose the - basic *unhooked* built-in import mechanism". They simply won't - invoke any import hooks. A new imp module function is proposed (but - not yet implemented) under the name "get_loader", which is used as - in the following pattern: +The new import hooks are not easily integrated in the existing +``imp.find_module()`` and ``imp.load_module()`` calls. It's questionable +whether it's possible at all without breaking code; it is better to simply add +a new function to the ``imp`` module. The meaning of the existing +``imp.find_module()`` and ``imp.load_module()`` calls changes from: "they +expose the built-in import mechanism" to "they expose the basic *unhooked* +built-in import mechanism". They simply won't invoke any import hooks. A new +``imp`` module function is proposed (but not yet implemented) under the name +``get_loader()``, which is used as in the following pattern:: - loader = imp.get_loader(fullname, path) - if loader is not None: - loader.load_module(fullname) + loader = imp.get_loader(fullname, path) + if loader is not None: + loader.load_module(fullname) - In the case of a "basic" import, one the imp.find_module() function - would handle, the loader object would be a wrapper for the current - output of imp.find_module(), and loader.load_module() would call - imp.load_module() with that output. +In the case of a "basic" import, one that the ``imp.find_module()`` function +would handle, the loader object would be a wrapper for the current output of +``imp.find_module()``, and ``loader.load_module()`` would call +``imp.load_module()`` with that output. - Note that this wrapper is currently not yet implemented, although a - Python prototype exists in the test_importhooks.py script (the - ImpWrapper class) included with the patch.
+Note that this wrapper is currently not yet implemented, although a Python +prototype exists in the ``test_importhooks.py`` script (the ``ImpWrapper`` +class) included with the patch. Forward Compatibility +===================== - Existing __import__ hooks will not invoke new-style hooks by magic, - unless they call the original __import__ function as a fallback. - For example, ihooks.py, iu.py and imputil.py are in this sense not - forward compatible with this PEP. +Existing ``__import__`` hooks will not invoke new-style hooks by magic, unless +they call the original ``__import__`` function as a fallback. For example, +``ihooks.py``, ``iu.py`` and ``imputil.py`` are in this sense not forward +compatible with this PEP. Open Issues +=========== - Modules often need supporting data files to do their job, - particularly in the case of complex packages or full applications. - Current practice is generally to locate such files via sys.path (or - a package.__path__ attribute). This approach will not work, in - general, for modules loaded via an import hook. +Modules often need supporting data files to do their job, particularly in the +case of complex packages or full applications. Current practice is generally +to locate such files via ``sys.path`` (or a ``package.__path__`` attribute). +This approach will not work, in general, for modules loaded via an import +hook. - There are a number of possible ways to address this problem: +There are a number of possible ways to address this problem: - - "Don't do that". If a package needs to locate data files via its - __path__, it is not suitable for loading via an import hook. The - package can still be located on a directory in sys.path, as at - present, so this should not be seen as a major issue. + * "Don't do that". If a package needs to locate data files via its + ``__path__``, it is not suitable for loading via an import hook. 
The + package can still be located on a directory in ``sys.path``, as at present, + so this should not be seen as a major issue. - - Locate data files from a standard location, rather than relative - to the module file. A relatively simple approach (which is - supported by distutils) would be to locate data files based on - sys.prefix (or sys.exec_prefix). For example, looking in - os.path.join(sys.prefix, "data", package_name). + * Locate data files from a standard location, rather than relative to the + module file. A relatively simple approach (which is supported by + distutils) would be to locate data files based on ``sys.prefix`` (or + ``sys.exec_prefix``). For example, looking in + ``os.path.join(sys.prefix, "data", package_name)``. - - Import hooks could offer a standard way of getting at data files - relative to the module file. The standard zipimport object - provides a method get_data(name) which returns the content of the - "file" called name, as a string. To allow modules to get at the - importer object, zipimport also adds an attribute "__loader__" - to the module, containing the zipimport object used to load the - module. If such an approach is used, it is important that client - code takes care not to break if the get_data method is not available, - so it is not clear that this approach offers a general answer to the - problem. + * Import hooks could offer a standard way of getting at data files relative + to the module file. The standard ``zipimport`` object provides a method + ``get_data(name)`` which returns the content of the "file" called ``name``, + as a string. To allow modules to get at the importer object, ``zipimport`` + also adds an attribute ``__loader__`` to the module, containing the + ``zipimport`` object used to load the module. 
If such an approach is used, + it is important that client code takes care not to break if the + ``get_data()`` method is not available, so it is not clear that this + approach offers a general answer to the problem. - It was suggested on python-dev that it would be useful to be able to - receive a list of available modules from an importer and/or a list - of available data files for use with the get_data() method. The - protocol could grow two additional extensions, say list_modules() - and list_files(). The latter makes sense on loader objects with a - get_data() method. However, it's a bit unclear which object should - implement list_modules(): the importer or the loader or both? +It was suggested on python-dev that it would be useful to be able to receive a +list of available modules from an importer and/or a list of available data +files for use with the ``get_data()`` method. The protocol could grow two +additional extensions, say ``list_modules()`` and ``list_files()``. The +latter makes sense on loader objects with a ``get_data()`` method. However, +it's a bit unclear which object should implement ``list_modules()``: the +importer or the loader or both? - This PEP is biased towards loading modules from alternative places: - it currently doesn't offer dedicated solutions for loading modules - from alternative file formats or with alternative compilers. In - contrast, the ihooks module from the standard library does have a - fairly straightforward way to do this. The Quixote project [8] uses - this technique to import PTL files as if they are ordinary Python - modules. To do the same with the new hooks would either mean to add - a new module implementing a subset of ihooks as a new-style - importer, or add a hookable built-in path importer object. +This PEP is biased towards loading modules from alternative places: it +currently doesn't offer dedicated solutions for loading modules from +alternative file formats or with alternative compilers. 
In contrast, the +``ihooks`` module from the standard library does have a fairly straightforward +way to do this. The Quixote project [7]_ uses this technique to import PTL +files as if they are ordinary Python modules. To do the same with the new +hooks would mean either adding a new module implementing a subset of +``ihooks`` as a new-style importer, or adding a hookable built-in path importer +object. - There is no specific support within this PEP for "stacking" hooks. - For example, it is not obvious how to write a hook to load modules - from ..tar.gz files by combining separate hooks to load modules from - .tar and ..gz files. However, there is no support for such stacking - in the existing hook mechanisms (either the basic "replace - __import__" method, or any of the existing import hook modules) and - so this functionality is not an obvious requirement of the new - mechanism. It may be worth considering as a future enhancement, - however. +There is no specific support within this PEP for "stacking" hooks. For +example, it is not obvious how to write a hook to load modules from ``.tar.gz`` +files by combining separate hooks to load modules from ``.tar`` and ``.gz`` +files. However, there is no support for such stacking in the existing hook +mechanisms (either the basic "replace ``__import__``" method, or any of the +existing import hook modules) and so this functionality is not an obvious +requirement of the new mechanism. It may be worth considering as a future +enhancement, however. - It is possible (via sys.meta_path) to add hooks which run before - sys.path is processed. However, there is no equivalent way of - adding hooks to run after sys.path is processed. For now, if a hook - is required after sys.path has been processed, it can be simulated - by adding an arbitrary "cookie" string at the end of sys.path, and - having the required hook associated with this cookie, via the normal - sys.path_hooks processing.
In the longer term, the path handling - code will become a "real" hook on sys.meta_path, and at that stage - it will be possible to insert user-defined hooks either before or - after it. +It is possible (via ``sys.meta_path``) to add hooks which run before +``sys.path`` is processed. However, there is no equivalent way of adding +hooks to run after ``sys.path`` is processed. For now, if a hook is required +after ``sys.path`` has been processed, it can be simulated by adding an +arbitrary "cookie" string at the end of ``sys.path``, and having the required +hook associated with this cookie, via the normal ``sys.path_hooks`` +processing. In the longer term, the path handling code will become a "real" +hook on ``sys.meta_path``, and at that stage it will be possible to insert +user-defined hooks either before or after it. Implementation +============== - The PEP 302 implementation has been integrated with Python as of - 2.3a1. An earlier version is available as SourceForge patch - #652586, but more interestingly, the SF item contains a fairly - detailed history of the development and design. - http://www.python.org/sf/652586 +The PEP 302 implementation has been integrated with Python as of 2.3a1. An +earlier version is available as patch #652586 [9]_, but more interestingly, +the issue contains a fairly detailed history of the development and design. - PEP 273 has been implemented using PEP 302's import hooks. +PEP 273 has been implemented using PEP 302's import hooks. References and Footnotes +======================== - [1] Installer by Gordon McMillan - http://www.mcmillan-inc.com/install1.html +.. [1] imputil module + http://docs.python.org/library/imputil.html - [2] PEP 273, Import Modules from Zip Archives, Ahlstrom - http://www.python.org/dev/peps/pep-0273/ +.. [2] The Freeze tool. + See also the ``Tools/freeze/`` directory in a Python source distribution - [3] The Freeze tool - Tools/freeze/ in a Python source distribution +.. 
[3] py2exe by Thomas Heller + http://www.py2exe.org/ - [4] Squeeze - http://starship.python.net/crew/fredrik/ipa/squeeze.htm +.. [4] imp.set_frozenmodules() patch + http://bugs.python.org/issue642578 - [5] py2exe by Thomas Heller - http://py2exe.sourceforge.net/ +.. [5] The path argument to ``finder.find_module()`` is there because the + ``pkg.__path__`` variable may be needed at this point. It may either come + from the actual parent module or be supplied by ``imp.find_module()`` or + the proposed ``imp.get_loader()`` function. - [6] imp.set_frozenmodules() patch - http://www.python.org/sf/642578 +.. [6] PEP 338: Executing modules as scripts + http://www.python.org/dev/peps/pep-0338/ - [7] The path argument to finder.find_module() is there because the - pkg.__path__ variable may be needed at this point. It may either - come from the actual parent module or be supplied by - imp.find_module() or the proposed imp.get_loader() function. +.. [7] Quixote, a framework for developing Web applications + http://www.mems-exchange.org/software/quixote/ - [8] Quixote, a framework for developing Web applications - http://www.mems-exchange.org/software/quixote/ +.. [8] PEP 366: Main module explicit relative imports + http://www.python.org/dev/peps/pep-0366/ - [9] PEP 338: Executing modules as scripts - http://www.python.org/dev/peps/pep-0338/ - - [10] PEP 366: Main module explicit relative imports - http://www.python.org/dev/peps/pep-0366/ +.. [9] New import hooks + Import from Zip files + http://bugs.python.org/issue652586 Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: