665 lines
28 KiB
Plaintext
665 lines
28 KiB
Plaintext
PEP: 402
|
||
Title: Simplified Package Layout and Partitioning
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: P.J. Eby
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 12-Jul-2011
|
||
Python-Version: 3.3
|
||
Post-History: 20-Jul-2011
|
||
Replaces: 382
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes an enhancement to Python's package importing
|
||
to:
|
||
|
||
* Surprise users of other languages less,
|
||
* Make it easier to convert a module into a package, and
|
||
* Support dividing packages into separately installed components
|
||
(ala "namespace packages", as described in PEP 382)
|
||
|
||
The proposed enhancements do not change the semantics of any
|
||
currently-importable directory layouts, but make it possible for
|
||
packages to use a simplified directory layout (that is not importable
|
||
currently).
|
||
|
||
However, the proposed changes do NOT add any performance overhead to
|
||
the importing of existing modules or packages, and performance for the
|
||
new directory layout should be about the same as that of previous
|
||
"namespace package" solutions (such as ``pkgutil.extend_path()``).
|
||
|
||
|
||
The Problem
|
||
===========
|
||
|
||
.. epigraph::
|
||
|
||
"Most packages are like modules. Their contents are highly
|
||
interdependent and can't be pulled apart. [However,] some
|
||
packages exist to provide a separate namespace. ... It should
|
||
be possible to distribute sub-packages or submodules of these
|
||
[namespace packages] independently."
|
||
|
||
-- Jim Fulton, shortly before the release of Python 2.3 [1]_
|
||
|
||
|
||
When new users come to Python from other languages, they are often
|
||
confused by Python's packaging semantics. At Google, for example,
|
||
Guido received complaints from "a large crowd with pitchforks" [2]_
|
||
that the requirement for packages to contain an ``__init__`` module
|
||
was a "misfeature", and should be dropped.
|
||
|
||
In addition, users coming from languages like Java or Perl are
|
||
sometimes confused by a difference in Python's import path searching.
|
||
|
||
In most other languages that have a similar path mechanism to Python's
|
||
``sys.path``, a package is merely a namespace that contains modules
|
||
or classes, and can thus be spread across multiple directories in
|
||
the language's path. In Perl, for instance, a ``Foo::Bar`` module
|
||
will be searched for in ``Foo/`` subdirectories all along the module
|
||
include path, not just in the first such subdirectory found.
|
||
|
||
Worse, this is not just a problem for new users: it prevents *anyone*
|
||
from easily splitting a package into separately-installable
|
||
components. In Perl terms, it would be as if every possible ``Net::``
|
||
module on CPAN had to be bundled up and shipped in a single tarball!
|
||
|
||
For that reason, various workarounds for this latter limitation exist,
|
||
circulated under the term "namespace packages". The Python standard
|
||
library has provided one such workaround since Python 2.3 (via the
|
||
``pkgutil.extend_path()`` function), and the "setuptools" package
|
||
provides another (via ``pkg_resources.declare_namespace()``).
|
||
|
||
The workarounds themselves, however, fall prey to a *third* issue with
|
||
Python's way of laying out packages in the filesystem.
|
||
|
||
Because a package *must* contain an ``__init__`` module, any attempt
|
||
to distribute modules for that package must necessarily include that
|
||
``__init__`` module, if those modules are to be importable.
|
||
|
||
However, the very fact that each distribution of modules for a package
|
||
must contain this (duplicated) ``__init__`` module, means that OS
|
||
vendors who package up these module distributions must somehow handle
|
||
the conflict caused by several module distributions installing that
|
||
``__init__`` module to the same location in the filesystem.
|
||
|
||
This led to the proposing of PEP 382 ("Namespace Packages") - a way
|
||
to signal to Python's import machinery that a directory was
|
||
importable, using unique filenames per module distribution.
|
||
|
||
However, there was more than one downside to this approach.
|
||
Performance for all import operations would be affected, and the
|
||
process of designating a package became even more complex. New
|
||
terminology had to be invented to explain the solution, and so on.
|
||
|
||
As terminology discussions continued on the Import-SIG, it soon became
|
||
apparent that the main reason it was so difficult to explain the
|
||
concepts related to "namespace packages" was because Python's
|
||
current way of handling packages is somewhat underpowered, when
|
||
compared to other languages.
|
||
|
||
That is, in other popular languages with package systems, no special
|
||
term is needed to describe "namespace packages", because *all*
|
||
packages generally behave in the desired fashion.
|
||
|
||
Rather than being an isolated single directory with a special marker
|
||
module (as in Python), packages in other languages are typically just
|
||
the *union* of appropriately-named directories across the *entire*
|
||
import or inclusion path.
|
||
|
||
In Perl, for example, the module ``Foo`` is always found in a
|
||
``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a
|
||
``Foo/Bar.pm`` file. (In other words, there is One Obvious Way to
|
||
find the location of a particular module.)
|
||
|
||
This is because Perl considers a module to be *different* from a
|
||
package: the package is purely a *namespace* in which other modules
|
||
may reside, and is only *coincidentally* the name of a module as well.
|
||
|
||
In current versions of Python, however, the module and the package are
|
||
more tightly bound together. ``Foo`` is always a module -- whether it
|
||
is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly
|
||
linked to its submodules (if any), which *must* reside in the exact
|
||
same directory where the ``__init__.py`` was found.
|
||
|
||
On the positive side, this design choice means that a package is quite
|
||
self-contained, and can be installed, copied, etc. as a unit just by
|
||
performing an operation on the package's root directory.
|
||
|
||
On the negative side, however, it is non-intuitive for beginners, and
|
||
requires a more complex step to turn a module into a package. If
|
||
``Foo`` begins its life as ``Foo.py``, then it must be moved and
|
||
renamed to ``Foo/__init__.py``.
|
||
|
||
Conversely, if you intend to create a ``Foo.Bar`` module from the
|
||
start, but have no particular module contents to put in ``Foo``
|
||
itself, then you have to create an empty and seemingly-irrelevant
|
||
``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported.
|
||
|
||
(And these issues don't just confuse newcomers to the language,
|
||
either: they annoy many experienced developers as well.)
|
||
|
||
So, after some discussion on the Import-SIG, this PEP was created
|
||
as an alternative to PEP \382, in an attempt to solve *all* of the
|
||
above problems, not just the "namespace package" use cases.
|
||
|
||
And, as a delightful side effect, the solution proposed in this PEP
|
||
does not affect the import performance of ordinary modules or
|
||
self-contained (i.e. ``__init__``-based) packages.
|
||
|
||
|
||
The Solution
|
||
============
|
||
|
||
In the past, various proposals have been made to allow more intuitive
|
||
approaches to package directory layout. However, most of them failed
|
||
because of an apparent backward-compatibility problem.
|
||
|
||
That is, if the requirement for an ``__init__`` module were simply
|
||
dropped, it would open up the possibility for a directory named, say,
|
||
``string`` on ``sys.path``, to block importing of the standard library
|
||
``string`` module.
|
||
|
||
Paradoxically, however, the failure of this approach does *not* arise
|
||
from the elimination of the ``__init__`` requirement!
|
||
|
||
Rather, the failure arises because the underlying approach takes for
|
||
granted that a package is just ONE thing, instead of two.
|
||
|
||
In truth, a package comprises two separate, but related entities: a
|
||
module (with its own, optional contents), and a *namespace* where
|
||
*other* modules or packages can be found.
|
||
|
||
In current versions of Python, however, the module part (found in
|
||
``__init__``) and the namespace for submodule imports (represented
|
||
by the ``__path__`` attribute) are both initialized at the same time,
|
||
when the package is first imported.
|
||
|
||
And, if you assume this is the *only* way to initialize these two
|
||
things, then there is no way to drop the need for an ``__init__``
|
||
module, while still being backwards-compatible with existing directory
|
||
layouts.
|
||
|
||
After all, as soon as you encounter a directory on ``sys.path``
|
||
matching the desired name, that means you've "found" the package, and
|
||
must stop searching, right?
|
||
|
||
Well, not quite.
|
||
|
||
|
||
A Thought Experiment
|
||
--------------------
|
||
|
||
Let's hop into the time machine for a moment, and pretend we're back
|
||
in the early 1990s, shortly before Python packages and ``__init__.py``
|
||
have been invented. But, imagine that we *are* familiar with
|
||
Perl-like package imports, and we want to implement a similar system
|
||
in Python.
|
||
|
||
We'd still have Python's *module* imports to build on, so we could
|
||
certainly conceive of having ``Foo.py`` as a parent ``Foo`` module
|
||
for a ``Foo`` package. But how would we implement submodule and
|
||
subpackage imports?
|
||
|
||
Well, if we didn't have the idea of ``__path__`` attributes yet,
|
||
we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``.
|
||
|
||
But we'd *only* do it when someone actually tried to *import*
|
||
``Foo.Bar``.
|
||
|
||
NOT when they imported ``Foo``.
|
||
|
||
And *that* lets us get rid of the backwards-compatibility problem
|
||
of dropping the ``__init__`` requirement, back here in 2011.
|
||
|
||
How?
|
||
|
||
Well, when we ``import Foo``, we're not even *looking* for ``Foo/``
|
||
directories on ``sys.path``, because we don't *care* yet. The only
|
||
point at which we care, is the point when somebody tries to actually
|
||
import a submodule or subpackage of ``Foo``.
|
||
|
||
That means that if ``Foo`` is a standard library module (for example),
|
||
and I happen to have a ``Foo`` directory on ``sys.path`` (without
|
||
an ``__init__.py``, of course), then *nothing breaks*. The ``Foo``
|
||
module is still just a module, and it's still imported normally.
|
||
|
||
|
||
Self-Contained vs. "Virtual" Packages
|
||
-------------------------------------
|
||
|
||
Of course, in today's Python, trying to ``import Foo.Bar`` will
|
||
fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a
|
||
``__path__`` attribute).
|
||
|
||
So, this PEP proposes to *dynamically* create a ``__path__``, in the
|
||
case where one is missing.
|
||
|
||
That is, if I try to ``import Foo.Bar`` the proposed change to the
|
||
import machinery will notice that the ``Foo`` module lacks a
|
||
``__path__``, and will therefore try to *build* one before proceeding.
|
||
|
||
And it will do this by making a list of all the existing ``Foo/``
|
||
subdirectories of the directories listed in ``sys.path``.
|
||
|
||
If the list is empty, the import will fail with ``ImportError``, just
|
||
like today. But if the list is *not* empty, then it is saved in
|
||
a new ``Foo.__path__`` attribute, making the module a "virtual
|
||
package".
|
||
|
||
That is, because it now has a valid ``__path__``, we can proceed
|
||
to import submodules or subpackages in the normal way.
|
||
|
||
Now, notice that this change does not affect "classic", self-contained
|
||
packages that have an ``__init__`` module in them. Such packages
|
||
already *have* a ``__path__`` attribute (initialized at import time)
|
||
so the import machinery won't try to create another one later.
|
||
|
||
This means that (for example) the standard library ``email`` package
|
||
will not be affected in any way by you having a bunch of unrelated
|
||
directories named ``email`` on ``sys.path``. (Even if they contain
|
||
``*.py`` files.)
|
||
|
||
But it *does* mean that if you want to turn your ``Foo`` module into
|
||
a ``Foo`` package, all you have to do is add a ``Foo/`` directory
|
||
somewhere on ``sys.path``, and start adding modules to it.
|
||
|
||
But what if you only want a "namespace package"? That is, a package
|
||
that is *only* a namespace for various separately-distributed
|
||
submodules and subpackages?
|
||
|
||
For example, if you're Zope Corporation, distributing dozens of
|
||
separate tools like ``zc.buildout``, each in packages under the ``zc``
|
||
namespace, you don't want to have to make and include an empty
|
||
``zc.py`` in every tool you ship. (And, if you're a Linux or other
|
||
OS vendor, you don't want to deal with the package installation
|
||
conflicts created by trying to install ten copies of ``zc.py`` to the
|
||
same location!)
|
||
|
||
No problem. All we have to do is make one more minor tweak to the
|
||
import process: if the "classic" import process fails to find a
|
||
self-contained module or package (e.g., if ``import zc`` fails to find
|
||
a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
|
||
``__path__`` by searching for all the ``zc/`` directories on
|
||
``sys.path``, and putting them in a list.
|
||
|
||
If this list is empty, we raise ``ImportError``. But if it's
|
||
non-empty, we create an empty ``zc`` module, and put the list in
|
||
``zc.__path__``. Congratulations: ``zc`` is now a namespace-only,
|
||
"pure virtual" package! It has no module contents, but you can still
|
||
import submodules and subpackages from it, regardless of where they're
|
||
located on ``sys.path``.
|
||
|
||
(By the way, both of these additions to the import protocol (i.e. the
|
||
dynamically-added ``__path__``, and dynamically-created modules)
|
||
apply recursively to child packages, using the parent package's
|
||
``__path__`` in place of ``sys.path`` as a basis for generating a
|
||
child ``__path__``. This means that self-contained and virtual
|
||
packages can contain each other without limitation, with the caveat
|
||
that if you put a virtual package inside a self-contained one, it's
|
||
gonna have a really short ``__path__``!)
|
||
|
||
|
||
Backwards Compatibility and Performance
|
||
---------------------------------------
|
||
|
||
Notice that these two changes *only* affect import operations that
|
||
today would result in ``ImportError``. As a result, the performance
|
||
of imports that do not involve virtual packages is unaffected, and
|
||
potential backward compatibility issues are very restricted.
|
||
|
||
Today, if you try to import submodules or subpackages from a module
|
||
with no ``__path__``, it's an immediate error. And of course, if you
|
||
don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path``
|
||
today, ``import zc`` would likewise fail.
|
||
|
||
Thus, the only potential backwards-compatibility issues are:
|
||
|
||
1. Tools that expect package directories to have an ``__init__``
|
||
module, that expect directories without an ``__init__`` module
|
||
to be unimportable, or that expect ``__path__`` attributes to be
|
||
static, will not recognize virtual packages as packages.
|
||
|
||
(In practice, this just means that tools will need updating to
|
||
support virtual packages, e.g. by using ``pkgutil.walk_modules()``
|
||
instead of using hardcoded filesystem searches.)
|
||
|
||
2. Code that *expects* certain imports to fail may now do something
|
||
unexpected. This should be fairly rare in practice, as most sane,
|
||
non-test code does not import things that are expected not to
|
||
exist!
|
||
|
||
The biggest likely exception to the above would be when a piece of
|
||
code tries to check whether some package is installed by importing
|
||
it. If this is done *only* by importing a top-level module (i.e., not
|
||
checking for a ``__version__`` or some other attribute), *and* there
|
||
is a directory of the same name as the sought-for package on
|
||
``sys.path`` somewhere, *and* the package is not actually installed,
|
||
then such code could be fooled into thinking a package is installed
|
||
that really isn't.
|
||
|
||
For example, suppose someone writes a script (``datagen.py``)
|
||
containing the following code::
|
||
|
||
try:
|
||
import json
|
||
except ImportError:
|
||
import simplejson as json
|
||
|
||
And runs it in a directory laid out like this::
|
||
|
||
datagen.py
|
||
json/
|
||
foo.js
|
||
bar.js
|
||
|
||
If ``import json`` succeeded due to the mere presence of the ``json/``
|
||
subdirectory, the code would incorrectly believe that the ``json``
|
||
module was available, and proceed to fail with an error.
|
||
|
||
However, we can prevent corner cases like these from arising, simply
|
||
by making one small change to the algorithm presented so far. Instead
|
||
of allowing you to import a "pure virtual" package (like ``zc``),
|
||
we allow only importing of the *contents* of virtual packages.
|
||
|
||
That is, a statement like ``import zc`` should raise ``ImportError``
|
||
if there is no ``zc.py`` or ``zc/__init__.py`` on ``sys.path``. But,
|
||
doing ``import zc.buildout`` should still succeed, as long as there's
|
||
a ``zc/buildout.py`` or ``zc/buildout/__init__.py`` on ``sys.path``.
|
||
|
||
In other words, we don't allow pure virtual packages to be imported
|
||
directly, only modules and self-contained packages. (This is an
|
||
acceptable limitation, because there is no *functional* value to
|
||
importing such a package by itself. After all, the module object
|
||
will have no *contents* until you import at least one of its
|
||
subpackages or submodules!)
|
||
|
||
Once ``zc.buildout`` has been successfully imported, though, there
|
||
*will* be a ``zc`` module in ``sys.modules``, and trying to import it
|
||
will of course succeed. We are only preventing an *initial* import
|
||
from succeeding, in order to prevent false-positive import successes
|
||
when clashing subdirectories are present on ``sys.path``.
|
||
|
||
So, with this slight change, the ``datagen.py`` example above will
|
||
work correctly. When it does ``import json``, the mere presence of a
|
||
``json/`` directory will simply not affect the import process at all,
|
||
even if it contains ``.py`` files. The ``json/`` directory will still
|
||
only be searched in the case where an import like ``import
|
||
json.converter`` is attempted.
|
||
|
||
Meanwhile, tools that expect to locate packages and modules by
|
||
walking a directory tree can be updated to use the existing
|
||
``pkgutil.walk_modules()`` API, and tools that need to inspect
|
||
packages in memory should use the other APIs described in the
|
||
`Standard Library Changes/Additions`_ section below.
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
A change is made to the existing import process, when importing
|
||
names containing at least one ``.`` -- that is, imports of modules
|
||
that have a parent package.
|
||
|
||
Specifically, if the parent package does not exist, or exists but
|
||
lacks a ``__path__`` attribute, an attempt is first made to create a
|
||
"virtual path" for the parent package (following the algorithm
|
||
described in the section on `virtual paths`_, below).
|
||
|
||
If the computed "virtual path" is empty, an ``ImportError`` results,
|
||
just as it would today. However, if a non-empty virtual path is
|
||
obtained, the normal import of the submodule or subpackage proceeds,
|
||
using that virtual path to find the submodule or subpackage. (Just
|
||
as it would have with the parent's ``__path__``, if the parent package
|
||
had existed and had a ``__path__``.)
|
||
|
||
When a submodule or subpackage is found (but not yet loaded),
|
||
the parent package is created and added to ``sys.modules`` (if it
|
||
didn't exist before), and its ``__path__`` is set to the computed
|
||
virtual path (if it wasn't already set).
|
||
|
||
In this way, when the actual loading of the submodule or subpackage
|
||
occurs, it will see a parent package existing, and any relative
|
||
imports will work correctly. However, if no submodule or subpackage
|
||
exists, then the parent package will *not* be created, nor will a
|
||
standalone module be converted into a package (by the addition of a
|
||
spurious ``__path__`` attribute).
|
||
|
||
Note, by the way, that this change must be applied *recursively*: that
|
||
is, if ``foo`` and ``foo.bar`` are pure virtual packages, then
|
||
``import foo.bar.baz`` must wait until ``foo.bar.baz`` is found before
|
||
creating module objects for *both* ``foo`` and ``foo.bar``, and then
|
||
create both of them together, properly setting the ``foo`` module's
|
||
``.bar`` attrbute to point to the ``foo.bar`` module.
|
||
|
||
In this way, pure virtual packages are never directly importable:
|
||
an ``import foo`` or ``import foo.bar`` by itself will fail, and the
|
||
corresponding modules will not appear in ``sys.modules`` until they
|
||
are needed to point to a *successfully* imported submodule or
|
||
self-contained subpackage.
|
||
|
||
|
||
Virtual Paths
|
||
-------------
|
||
|
||
A virtual path is created by obtaining a PEP 302 "importer" object for
|
||
each of the path entries found in ``sys.path`` (for a top-level
|
||
module) or the parent ``__path__`` (for a submodule).
|
||
|
||
(Note: because ``sys.meta_path`` importers are not associated with
|
||
``sys.path`` or ``__path__`` entry strings, such importers do *not*
|
||
participate in this process.)
|
||
|
||
Each importer is checked for a ``get_subpath()`` method, and if
|
||
present, the method is called with the full name of the module/package
|
||
the path is being constructed for. The return value is either a
|
||
string representing a subdirectory for the requested package, or
|
||
``None`` if no such subdirectory exists.
|
||
|
||
The strings returned by the importers are added to the path list
|
||
being built, in the same order as they are found. (``None`` values
|
||
and missing ``get_subpath()`` methods are simply skipped.)
|
||
|
||
The resulting list (whether empty or not) is then stored in a
|
||
``sys.virtual_package_paths`` dictionary, keyed by module name.
|
||
|
||
This dictionary has two purposes. First, it serves as a cache, in
|
||
the event that more than one attempt is made to import a submodule
|
||
of a virtual package.
|
||
|
||
Second, and more importantly, the dictionary can be used by code that
|
||
extends ``sys.path`` at runtime to *update* imported packages'
|
||
``__path__`` attributes accordingly. (See `Standard Library
|
||
Changes/Additions`_ below for more details.)
|
||
|
||
In Python code, the virtual path construction algorithm would look
|
||
something like this::
|
||
|
||
def get_virtual_path(modulename, parent_path=None):
|
||
|
||
if modulename in sys.virtual_package_paths:
|
||
return sys.virtual_package_paths[modulename]
|
||
|
||
if parent_path is None:
|
||
parent_path = sys.path
|
||
|
||
path = []
|
||
|
||
for entry in parent_path:
|
||
# Obtain a PEP 302 importer object - see pkgutil module
|
||
importer = pkgutil.get_importer(entry)
|
||
|
||
if hasattr(importer, 'get_subpath'):
|
||
subpath = importer.get_subpath(modulename)
|
||
if subpath is not None:
|
||
path.append(subpath)
|
||
|
||
sys.virtual_package_paths[modulename] = path
|
||
return path
|
||
|
||
And a function like this one should be exposed in the standard
|
||
library as e.g. ``imp.get_virtual_path()``, so that people creating
|
||
``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
|
||
|
||
|
||
Standard Library Changes/Additions
|
||
----------------------------------
|
||
|
||
The ``pkgutil`` module should be updated to handle this
|
||
specification appropriately, including any necessary changes to
|
||
``extend_path()``, ``iter_modules()``, etc.
|
||
|
||
Specifically the proposed changes and additions to ``pkgutil`` are:
|
||
|
||
* A new ``extend_virtual_paths(path_entry)`` function, to extend
|
||
existing, already-imported virtual packages' ``__path__`` attributes
|
||
to include any portions found in a new ``sys.path`` entry. This
|
||
function should be called by applications extending ``sys.path``
|
||
at runtime, e.g. when adding a plugin directory or an egg to the
|
||
path.
|
||
|
||
The implementation of this function does a simple top-down traversal
|
||
of ``sys.virtual_package_paths``, and performs any necessary
|
||
``get_subpath()`` calls to identify what path entries need to be
|
||
added to the virtual path for that package, given that `path_entry`
|
||
has been added to ``sys.path``. (Or, in the case of sub-packages,
|
||
adding a derived subpath entry, based on their parent package's
|
||
virtual path.)
|
||
|
||
(Note: this function must update both the path values in
|
||
``sys.virtual_package_paths`` as well as the ``__path__`` attributes
|
||
of any corresponding modules in ``sys.modules``, even though in the
|
||
common case they will both be the same ``list`` object.)
|
||
|
||
* A new ``iter_virtual_packages(parent='')`` function to allow
|
||
top-down traversal of virtual packages from
|
||
``sys.virtual_package_paths``, by yielding the child virtual
|
||
packages of `parent`. For example, calling
|
||
``iter_virtual_packages("zope")`` might yield ``zope.app``
|
||
and ``zope.products`` (if they are virtual packages listed in
|
||
``sys.virtual_package_paths``), but **not** ``zope.foo.bar``.
|
||
(This function is needed to implement ``extend_virtual_paths()``,
|
||
but is also potentially useful for other code that needs to inspect
|
||
imported virtual packages.)
|
||
|
||
* ``ImpImporter.iter_modules()`` should be changed to also detect and
|
||
yield the names of modules found in virtual packages.
|
||
|
||
In addition to the above changes, the ``zipimport`` importer should
|
||
have its ``iter_modules()`` implementation similarly changed. (Note:
|
||
current versions of Python implement this via a shim in ``pkgutil``,
|
||
so technically this is also a change to ``pkgutil``.)
|
||
|
||
Last, but not least, the ``imp`` module (or ``importlib``, if
|
||
appropriate) should expose the algorithm described in the `virtual
|
||
paths`_ section above, as a
|
||
``get_virtual_path(modulename, parent_path=None)`` function, so that
|
||
creators of ``__import__`` replacements can use it.
|
||
|
||
|
||
Implementation Notes
|
||
--------------------
|
||
|
||
For users, developers, and distributors of virtual packages:
|
||
|
||
* While virtual packages are easy to set up and use, there is still
|
||
a time and place for using self-contained packages. While it's not
|
||
strictly necessary, adding an ``__init__`` module to your
|
||
self-contained packages lets users of the package (and Python
|
||
itself) know that *all* of the package's code will be found in
|
||
that single subdirectory. In addition, it lets you define
|
||
``__all__``, expose a public API, provide a package-level docstring,
|
||
and do other things that make more sense for a self-contained
|
||
project than for a mere "namespace" package.
|
||
|
||
* ``sys.virtual_package_paths`` is allowed to contain entries for
|
||
non-existent or not-yet-imported package names; code that uses its
|
||
contents should not assume that every key in this dictionary is also
|
||
present in ``sys.modules`` or that importing the name will
|
||
necessarily succeed.
|
||
|
||
* If you are changing a currently self-contained package into a
|
||
virtual one, it's important to note that you can no longer use its
|
||
``__file__`` attribute to locate data files stored in a package
|
||
directory. Instead, you must search ``__path__`` or use the
|
||
``__file__`` of a submodule adjacent to the desired files, or
|
||
of a self-contained subpackage that contains the desired files.
|
||
|
||
(Note: this caveat is already true for existing users of "namespace
|
||
packages" today. That is, it is an inherent result of being able
|
||
to partition a package, that you must know *which* partition the
|
||
desired data file lives in. We mention it here simply so that
|
||
*new* users converting from self-contained to virtual packages will
|
||
also be aware of it.)
|
||
|
||
* XXX what is the __file__ of a "pure virtual" package? ``None``?
|
||
Some arbitrary string? The path of the first directory with a
|
||
trailing separator? No matter what we put, *some* code is
|
||
going to break, but the last choice might allow some code to
|
||
accidentally work. Is that good or bad?
|
||
|
||
|
||
For those implementing PEP \302 importer objects:
|
||
|
||
* Importers that support the ``iter_modules()`` method (used by
|
||
``pkgutil`` to locate importable modules and packages) and want to
|
||
add virtual package support should modify their ``iter_modules()``
|
||
method so that it discovers and lists virtual packages as well as
|
||
standard modules and packages. To do this, the importer should
|
||
simply list all immediate subdirectory names in its jurisdiction
|
||
that are valid Python identifiers.
|
||
|
||
XXX This might list a lot of not-really-packages. Should we
|
||
require importable contents to exist? If so, how deep do we
|
||
search, and how do we prevent e.g. link loops, or traversing onto
|
||
different filesystems, etc.? Ick. Also, if virtual packages are
|
||
listed, they still can't be *imported*, which is a problem for the
|
||
way that ``pkgutil.walk_modules()`` is currently implemented.
|
||
|
||
* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
|
||
not need to implement ``get_subpath()``, because the method
|
||
is only called on importers corresponding to ``sys.path`` entries
|
||
and ``__path__`` entries. If a meta importer wishes to support
|
||
virtual packages, it must do so entirely within its own
|
||
``find_module()`` implementation.
|
||
|
||
Unfortunately, it is unlikely that any such implementation will be
|
||
able to merge its package subpaths with those of other meta
|
||
importers or ``sys.path`` importers, so the meaning of "supporting
|
||
virtual packages" for a meta importer is currently undefined!
|
||
|
||
(However, since the intended use case for meta importers is to
|
||
replace Python's normal import process entirely for some subset of
|
||
modules, and the number of such importers currently implemented is
|
||
quite small, this seems unlikely to be a big issue in practice.)
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] "namespace" vs "module" packages (mailing list thread)
|
||
(http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
|
||
|
||
.. [2] "Dropping __init__.py requirement for subpackages"
|
||
(http://mail.python.org/pipermail/python-dev/2006-April/064400.html)
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|