2011-07-20 04:43:40 -04:00
|
|
|
PEP: 402
|
|
|
|
Title: Simplified Package Layout and Partitioning
|
2023-09-01 10:39:22 -04:00
|
|
|
Author: Phillip J. Eby
|
2012-03-12 19:58:36 -04:00
|
|
|
Status: Rejected
|
2011-07-20 04:43:40 -04:00
|
|
|
Type: Standards Track
|
2022-06-14 17:22:20 -04:00
|
|
|
Topic: Packaging
|
2011-07-20 04:43:40 -04:00
|
|
|
Content-Type: text/x-rst
|
|
|
|
Created: 12-Jul-2011
|
|
|
|
Python-Version: 3.3
|
|
|
|
Post-History: 20-Jul-2011
|
|
|
|
Replaces: 382
|
|
|
|
|
2012-03-12 19:58:36 -04:00
|
|
|
Rejection Notice
|
|
|
|
================
|
|
|
|
|
|
|
|
On the first day of sprints at US PyCon 2012 we had a long and
|
2022-01-21 06:03:51 -05:00
|
|
|
fruitful discussion about :pep:`382` and :pep:`402`. We ended up rejecting
|
2012-03-12 19:58:36 -04:00
|
|
|
both but a new PEP will be written to carry on in the spirit of PEP
|
|
|
|
402. Martin von Löwis wrote up a summary: [3]_.
|
|
|
|
|
2011-07-20 04:43:40 -04:00
|
|
|
Abstract
|
|
|
|
========
|
|
|
|
|
|
|
|
This PEP proposes an enhancement to Python's package importing
|
|
|
|
to:
|
|
|
|
|
|
|
|
* Surprise users of other languages less,
|
|
|
|
* Make it easier to convert a module into a package, and
|
|
|
|
* Support dividing packages into separately installed components
|
2022-01-21 06:03:51 -05:00
|
|
|
(ala "namespace packages", as described in :pep:`382`)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
The proposed enhancements do not change the semantics of any
|
|
|
|
currently-importable directory layouts, but make it possible for
|
|
|
|
packages to use a simplified directory layout (that is not importable
|
|
|
|
currently).
|
|
|
|
|
|
|
|
However, the proposed changes do NOT add any performance overhead to
|
|
|
|
the importing of existing modules or packages, and performance for the
|
|
|
|
new directory layout should be about the same as that of previous
|
|
|
|
"namespace package" solutions (such as ``pkgutil.extend_path()``).
|
|
|
|
|
|
|
|
|
|
|
|
The Problem
|
|
|
|
===========
|
|
|
|
|
|
|
|
.. epigraph::
|
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
"Most packages are like modules. Their contents are highly
|
|
|
|
interdependent and can't be pulled apart. [However,] some
|
|
|
|
packages exist to provide a separate namespace. ... It should
|
|
|
|
be possible to distribute sub-packages or submodules of these
|
|
|
|
[namespace packages] independently."
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
-- Jim Fulton, shortly before the release of Python 2.3 [1]_
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
|
|
|
|
When new users come to Python from other languages, they are often
|
2011-08-11 14:29:06 -04:00
|
|
|
confused by Python's package import semantics. At Google, for example,
|
2011-07-20 04:43:40 -04:00
|
|
|
Guido received complaints from "a large crowd with pitchforks" [2]_
|
|
|
|
that the requirement for packages to contain an ``__init__`` module
|
|
|
|
was a "misfeature", and should be dropped.
|
|
|
|
|
|
|
|
In addition, users coming from languages like Java or Perl are
|
|
|
|
sometimes confused by a difference in Python's import path searching.
|
|
|
|
|
|
|
|
In most other languages that have a similar path mechanism to Python's
|
|
|
|
``sys.path``, a package is merely a namespace that contains modules
|
|
|
|
or classes, and can thus be spread across multiple directories in
|
|
|
|
the language's path. In Perl, for instance, a ``Foo::Bar`` module
|
|
|
|
will be searched for in ``Foo/`` subdirectories all along the module
|
|
|
|
include path, not just in the first such subdirectory found.
|
|
|
|
|
|
|
|
Worse, this is not just a problem for new users: it prevents *anyone*
|
|
|
|
from easily splitting a package into separately-installable
|
|
|
|
components. In Perl terms, it would be as if every possible ``Net::``
|
|
|
|
module on CPAN had to be bundled up and shipped in a single tarball!
|
|
|
|
|
|
|
|
For that reason, various workarounds for this latter limitation exist,
|
|
|
|
circulated under the term "namespace packages". The Python standard
|
|
|
|
library has provided one such workaround since Python 2.3 (via the
|
|
|
|
``pkgutil.extend_path()`` function), and the "setuptools" package
|
|
|
|
provides another (via ``pkg_resources.declare_namespace()``).
|
|
|
|
|
|
|
|
The workarounds themselves, however, fall prey to a *third* issue with
|
|
|
|
Python's way of laying out packages in the filesystem.
|
|
|
|
|
|
|
|
Because a package *must* contain an ``__init__`` module, any attempt
|
|
|
|
to distribute modules for that package must necessarily include that
|
|
|
|
``__init__`` module, if those modules are to be importable.
|
|
|
|
|
|
|
|
However, the very fact that each distribution of modules for a package
|
|
|
|
must contain this (duplicated) ``__init__`` module, means that OS
|
|
|
|
vendors who package up these module distributions must somehow handle
|
|
|
|
the conflict caused by several module distributions installing that
|
|
|
|
``__init__`` module to the same location in the filesystem.
|
|
|
|
|
2022-01-21 06:03:51 -05:00
|
|
|
This led to the proposing of :pep:`382` ("Namespace Packages") - a way
|
2011-07-20 04:43:40 -04:00
|
|
|
to signal to Python's import machinery that a directory was
|
|
|
|
importable, using unique filenames per module distribution.
|
|
|
|
|
|
|
|
However, there was more than one downside to this approach.
|
|
|
|
Performance for all import operations would be affected, and the
|
|
|
|
process of designating a package became even more complex. New
|
|
|
|
terminology had to be invented to explain the solution, and so on.
|
|
|
|
|
|
|
|
As terminology discussions continued on the Import-SIG, it soon became
|
|
|
|
apparent that the main reason it was so difficult to explain the
|
|
|
|
concepts related to "namespace packages" was because Python's
|
|
|
|
current way of handling packages is somewhat underpowered, when
|
|
|
|
compared to other languages.
|
|
|
|
|
|
|
|
That is, in other popular languages with package systems, no special
|
|
|
|
term is needed to describe "namespace packages", because *all*
|
|
|
|
packages generally behave in the desired fashion.
|
|
|
|
|
|
|
|
Rather than being an isolated single directory with a special marker
|
|
|
|
module (as in Python), packages in other languages are typically just
|
|
|
|
the *union* of appropriately-named directories across the *entire*
|
|
|
|
import or inclusion path.
|
|
|
|
|
|
|
|
In Perl, for example, the module ``Foo`` is always found in a
|
|
|
|
``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a
|
|
|
|
``Foo/Bar.pm`` file. (In other words, there is One Obvious Way to
|
|
|
|
find the location of a particular module.)
|
|
|
|
|
|
|
|
This is because Perl considers a module to be *different* from a
|
|
|
|
package: the package is purely a *namespace* in which other modules
|
|
|
|
may reside, and is only *coincidentally* the name of a module as well.
|
|
|
|
|
|
|
|
In current versions of Python, however, the module and the package are
|
|
|
|
more tightly bound together. ``Foo`` is always a module -- whether it
|
|
|
|
is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly
|
|
|
|
linked to its submodules (if any), which *must* reside in the exact
|
|
|
|
same directory where the ``__init__.py`` was found.
|
|
|
|
|
|
|
|
On the positive side, this design choice means that a package is quite
|
|
|
|
self-contained, and can be installed, copied, etc. as a unit just by
|
|
|
|
performing an operation on the package's root directory.
|
|
|
|
|
|
|
|
On the negative side, however, it is non-intuitive for beginners, and
|
|
|
|
requires a more complex step to turn a module into a package. If
|
|
|
|
``Foo`` begins its life as ``Foo.py``, then it must be moved and
|
|
|
|
renamed to ``Foo/__init__.py``.
|
|
|
|
|
|
|
|
Conversely, if you intend to create a ``Foo.Bar`` module from the
|
|
|
|
start, but have no particular module contents to put in ``Foo``
|
|
|
|
itself, then you have to create an empty and seemingly-irrelevant
|
|
|
|
``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported.
|
|
|
|
|
|
|
|
(And these issues don't just confuse newcomers to the language,
|
|
|
|
either: they annoy many experienced developers as well.)
|
|
|
|
|
|
|
|
So, after some discussion on the Import-SIG, this PEP was created
|
|
|
|
as an alternative to PEP \382, in an attempt to solve *all* of the
|
|
|
|
above problems, not just the "namespace package" use cases.
|
|
|
|
|
|
|
|
And, as a delightful side effect, the solution proposed in this PEP
|
|
|
|
does not affect the import performance of ordinary modules or
|
|
|
|
self-contained (i.e. ``__init__``-based) packages.
|
|
|
|
|
|
|
|
|
|
|
|
The Solution
|
|
|
|
============
|
|
|
|
|
|
|
|
In the past, various proposals have been made to allow more intuitive
|
|
|
|
approaches to package directory layout. However, most of them failed
|
|
|
|
because of an apparent backward-compatibility problem.
|
|
|
|
|
|
|
|
That is, if the requirement for an ``__init__`` module were simply
|
|
|
|
dropped, it would open up the possibility for a directory named, say,
|
|
|
|
``string`` on ``sys.path``, to block importing of the standard library
|
|
|
|
``string`` module.
|
|
|
|
|
|
|
|
Paradoxically, however, the failure of this approach does *not* arise
|
|
|
|
from the elimination of the ``__init__`` requirement!
|
|
|
|
|
|
|
|
Rather, the failure arises because the underlying approach takes for
|
|
|
|
granted that a package is just ONE thing, instead of two.
|
|
|
|
|
|
|
|
In truth, a package comprises two separate, but related entities: a
|
|
|
|
module (with its own, optional contents), and a *namespace* where
|
|
|
|
*other* modules or packages can be found.
|
|
|
|
|
|
|
|
In current versions of Python, however, the module part (found in
|
|
|
|
``__init__``) and the namespace for submodule imports (represented
|
|
|
|
by the ``__path__`` attribute) are both initialized at the same time,
|
|
|
|
when the package is first imported.
|
|
|
|
|
|
|
|
And, if you assume this is the *only* way to initialize these two
|
|
|
|
things, then there is no way to drop the need for an ``__init__``
|
|
|
|
module, while still being backwards-compatible with existing directory
|
|
|
|
layouts.
|
|
|
|
|
|
|
|
After all, as soon as you encounter a directory on ``sys.path``
|
|
|
|
matching the desired name, that means you've "found" the package, and
|
|
|
|
must stop searching, right?
|
|
|
|
|
|
|
|
Well, not quite.
|
|
|
|
|
|
|
|
|
|
|
|
A Thought Experiment
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
Let's hop into the time machine for a moment, and pretend we're back
|
|
|
|
in the early 1990s, shortly before Python packages and ``__init__.py``
|
|
|
|
have been invented. But, imagine that we *are* familiar with
|
|
|
|
Perl-like package imports, and we want to implement a similar system
|
|
|
|
in Python.
|
|
|
|
|
|
|
|
We'd still have Python's *module* imports to build on, so we could
|
|
|
|
certainly conceive of having ``Foo.py`` as a parent ``Foo`` module
|
|
|
|
for a ``Foo`` package. But how would we implement submodule and
|
|
|
|
subpackage imports?
|
|
|
|
|
|
|
|
Well, if we didn't have the idea of ``__path__`` attributes yet,
|
|
|
|
we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``.
|
|
|
|
|
|
|
|
But we'd *only* do it when someone actually tried to *import*
|
|
|
|
``Foo.Bar``.
|
|
|
|
|
|
|
|
NOT when they imported ``Foo``.
|
|
|
|
|
|
|
|
And *that* lets us get rid of the backwards-compatibility problem
|
|
|
|
of dropping the ``__init__`` requirement, back here in 2011.
|
|
|
|
|
|
|
|
How?
|
|
|
|
|
|
|
|
Well, when we ``import Foo``, we're not even *looking* for ``Foo/``
|
|
|
|
directories on ``sys.path``, because we don't *care* yet. The only
|
|
|
|
point at which we care, is the point when somebody tries to actually
|
|
|
|
import a submodule or subpackage of ``Foo``.
|
|
|
|
|
|
|
|
That means that if ``Foo`` is a standard library module (for example),
|
|
|
|
and I happen to have a ``Foo`` directory on ``sys.path`` (without
|
|
|
|
an ``__init__.py``, of course), then *nothing breaks*. The ``Foo``
|
|
|
|
module is still just a module, and it's still imported normally.
|
|
|
|
|
|
|
|
|
|
|
|
Self-Contained vs. "Virtual" Packages
|
|
|
|
-------------------------------------
|
|
|
|
|
|
|
|
Of course, in today's Python, trying to ``import Foo.Bar`` will
|
|
|
|
fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a
|
|
|
|
``__path__`` attribute).
|
|
|
|
|
|
|
|
So, this PEP proposes to *dynamically* create a ``__path__``, in the
|
|
|
|
case where one is missing.
|
|
|
|
|
|
|
|
That is, if I try to ``import Foo.Bar`` the proposed change to the
|
|
|
|
import machinery will notice that the ``Foo`` module lacks a
|
|
|
|
``__path__``, and will therefore try to *build* one before proceeding.
|
|
|
|
|
|
|
|
And it will do this by making a list of all the existing ``Foo/``
|
|
|
|
subdirectories of the directories listed in ``sys.path``.
|
|
|
|
|
|
|
|
If the list is empty, the import will fail with ``ImportError``, just
|
|
|
|
like today. But if the list is *not* empty, then it is saved in
|
|
|
|
a new ``Foo.__path__`` attribute, making the module a "virtual
|
|
|
|
package".
|
|
|
|
|
|
|
|
That is, because it now has a valid ``__path__``, we can proceed
|
|
|
|
to import submodules or subpackages in the normal way.
|
|
|
|
|
|
|
|
Now, notice that this change does not affect "classic", self-contained
|
|
|
|
packages that have an ``__init__`` module in them. Such packages
|
|
|
|
already *have* a ``__path__`` attribute (initialized at import time)
|
|
|
|
so the import machinery won't try to create another one later.
|
|
|
|
|
|
|
|
This means that (for example) the standard library ``email`` package
|
|
|
|
will not be affected in any way by you having a bunch of unrelated
|
|
|
|
directories named ``email`` on ``sys.path``. (Even if they contain
|
|
|
|
``*.py`` files.)
|
|
|
|
|
|
|
|
But it *does* mean that if you want to turn your ``Foo`` module into
|
|
|
|
a ``Foo`` package, all you have to do is add a ``Foo/`` directory
|
|
|
|
somewhere on ``sys.path``, and start adding modules to it.
|
|
|
|
|
|
|
|
But what if you only want a "namespace package"? That is, a package
|
|
|
|
that is *only* a namespace for various separately-distributed
|
|
|
|
submodules and subpackages?
|
|
|
|
|
|
|
|
For example, if you're Zope Corporation, distributing dozens of
|
|
|
|
separate tools like ``zc.buildout``, each in packages under the ``zc``
|
|
|
|
namespace, you don't want to have to make and include an empty
|
|
|
|
``zc.py`` in every tool you ship. (And, if you're a Linux or other
|
|
|
|
OS vendor, you don't want to deal with the package installation
|
|
|
|
conflicts created by trying to install ten copies of ``zc.py`` to the
|
|
|
|
same location!)
|
|
|
|
|
|
|
|
No problem. All we have to do is make one more minor tweak to the
|
|
|
|
import process: if the "classic" import process fails to find a
|
|
|
|
self-contained module or package (e.g., if ``import zc`` fails to find
|
|
|
|
a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
|
|
|
|
``__path__`` by searching for all the ``zc/`` directories on
|
|
|
|
``sys.path``, and putting them in a list.
|
|
|
|
|
|
|
|
If this list is empty, we raise ``ImportError``. But if it's
|
|
|
|
non-empty, we create an empty ``zc`` module, and put the list in
|
|
|
|
``zc.__path__``. Congratulations: ``zc`` is now a namespace-only,
|
|
|
|
"pure virtual" package! It has no module contents, but you can still
|
|
|
|
import submodules and subpackages from it, regardless of where they're
|
|
|
|
located on ``sys.path``.
|
|
|
|
|
|
|
|
(By the way, both of these additions to the import protocol (i.e. the
|
|
|
|
dynamically-added ``__path__``, and dynamically-created modules)
|
|
|
|
apply recursively to child packages, using the parent package's
|
|
|
|
``__path__`` in place of ``sys.path`` as a basis for generating a
|
|
|
|
child ``__path__``. This means that self-contained and virtual
|
|
|
|
packages can contain each other without limitation, with the caveat
|
|
|
|
that if you put a virtual package inside a self-contained one, it's
|
|
|
|
gonna have a really short ``__path__``!)
|
|
|
|
|
|
|
|
|
|
|
|
Backwards Compatibility and Performance
|
|
|
|
---------------------------------------
|
|
|
|
|
|
|
|
Notice that these two changes *only* affect import operations that
|
|
|
|
today would result in ``ImportError``. As a result, the performance
|
|
|
|
of imports that do not involve virtual packages is unaffected, and
|
|
|
|
potential backward compatibility issues are very restricted.
|
|
|
|
|
|
|
|
Today, if you try to import submodules or subpackages from a module
|
|
|
|
with no ``__path__``, it's an immediate error. And of course, if you
|
|
|
|
don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path``
|
|
|
|
today, ``import zc`` would likewise fail.
|
|
|
|
|
|
|
|
Thus, the only potential backwards-compatibility issues are:
|
|
|
|
|
|
|
|
1. Tools that expect package directories to have an ``__init__``
|
|
|
|
module, that expect directories without an ``__init__`` module
|
|
|
|
to be unimportable, or that expect ``__path__`` attributes to be
|
|
|
|
static, will not recognize virtual packages as packages.
|
|
|
|
|
|
|
|
(In practice, this just means that tools will need updating to
|
|
|
|
support virtual packages, e.g. by using ``pkgutil.walk_modules()``
|
|
|
|
instead of using hardcoded filesystem searches.)
|
|
|
|
|
|
|
|
2. Code that *expects* certain imports to fail may now do something
|
|
|
|
unexpected. This should be fairly rare in practice, as most sane,
|
|
|
|
non-test code does not import things that are expected not to
|
|
|
|
exist!
|
|
|
|
|
|
|
|
The biggest likely exception to the above would be when a piece of
|
|
|
|
code tries to check whether some package is installed by importing
|
|
|
|
it. If this is done *only* by importing a top-level module (i.e., not
|
|
|
|
checking for a ``__version__`` or some other attribute), *and* there
|
|
|
|
is a directory of the same name as the sought-for package on
|
|
|
|
``sys.path`` somewhere, *and* the package is not actually installed,
|
2011-07-20 14:48:00 -04:00
|
|
|
then such code could be fooled into thinking a package is installed
|
|
|
|
that really isn't.
|
|
|
|
|
|
|
|
For example, suppose someone writes a script (``datagen.py``)
|
|
|
|
containing the following code::
|
|
|
|
|
|
|
|
try:
|
|
|
|
import json
|
|
|
|
except ImportError:
|
|
|
|
import simplejson as json
|
|
|
|
|
|
|
|
And runs it in a directory laid out like this::
|
|
|
|
|
|
|
|
datagen.py
|
|
|
|
json/
|
|
|
|
foo.js
|
|
|
|
bar.js
|
|
|
|
|
|
|
|
If ``import json`` succeeded due to the mere presence of the ``json/``
|
|
|
|
subdirectory, the code would incorrectly believe that the ``json``
|
|
|
|
module was available, and proceed to fail with an error.
|
|
|
|
|
|
|
|
However, we can prevent corner cases like these from arising, simply
|
|
|
|
by making one small change to the algorithm presented so far. Instead
|
|
|
|
of allowing you to import a "pure virtual" package (like ``zc``),
|
|
|
|
we allow only importing of the *contents* of virtual packages.
|
|
|
|
|
|
|
|
That is, a statement like ``import zc`` should raise ``ImportError``
|
|
|
|
if there is no ``zc.py`` or ``zc/__init__.py`` on ``sys.path``. But,
|
|
|
|
doing ``import zc.buildout`` should still succeed, as long as there's
|
|
|
|
a ``zc/buildout.py`` or ``zc/buildout/__init__.py`` on ``sys.path``.
|
|
|
|
|
|
|
|
In other words, we don't allow pure virtual packages to be imported
|
|
|
|
directly, only modules and self-contained packages. (This is an
|
|
|
|
acceptable limitation, because there is no *functional* value to
|
|
|
|
importing such a package by itself. After all, the module object
|
|
|
|
will have no *contents* until you import at least one of its
|
|
|
|
subpackages or submodules!)
|
|
|
|
|
|
|
|
Once ``zc.buildout`` has been successfully imported, though, there
|
|
|
|
*will* be a ``zc`` module in ``sys.modules``, and trying to import it
|
|
|
|
will of course succeed. We are only preventing an *initial* import
|
|
|
|
from succeeding, in order to prevent false-positive import successes
|
|
|
|
when clashing subdirectories are present on ``sys.path``.
|
|
|
|
|
|
|
|
So, with this slight change, the ``datagen.py`` example above will
|
|
|
|
work correctly. When it does ``import json``, the mere presence of a
|
|
|
|
``json/`` directory will simply not affect the import process at all,
|
|
|
|
even if it contains ``.py`` files. The ``json/`` directory will still
|
|
|
|
only be searched in the case where an import like ``import
|
|
|
|
json.converter`` is attempted.
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
Meanwhile, tools that expect to locate packages and modules by
|
|
|
|
walking a directory tree can be updated to use the existing
|
|
|
|
``pkgutil.walk_modules()`` API, and tools that need to inspect
|
|
|
|
packages in memory should use the other APIs described in the
|
|
|
|
`Standard Library Changes/Additions`_ section below.
|
|
|
|
|
|
|
|
|
|
|
|
Specification
|
|
|
|
=============
|
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
A change is made to the existing import process, when importing
|
|
|
|
names containing at least one ``.`` -- that is, imports of modules
|
|
|
|
that have a parent package.
|
|
|
|
|
|
|
|
Specifically, if the parent package does not exist, or exists but
|
|
|
|
lacks a ``__path__`` attribute, an attempt is first made to create a
|
|
|
|
"virtual path" for the parent package (following the algorithm
|
|
|
|
described in the section on `virtual paths`_, below).
|
|
|
|
|
|
|
|
If the computed "virtual path" is empty, an ``ImportError`` results,
|
|
|
|
just as it would today. However, if a non-empty virtual path is
|
|
|
|
obtained, the normal import of the submodule or subpackage proceeds,
|
|
|
|
using that virtual path to find the submodule or subpackage. (Just
|
|
|
|
as it would have with the parent's ``__path__``, if the parent package
|
|
|
|
had existed and had a ``__path__``.)
|
|
|
|
|
|
|
|
When a submodule or subpackage is found (but not yet loaded),
|
|
|
|
the parent package is created and added to ``sys.modules`` (if it
|
|
|
|
didn't exist before), and its ``__path__`` is set to the computed
|
|
|
|
virtual path (if it wasn't already set).
|
|
|
|
|
|
|
|
In this way, when the actual loading of the submodule or subpackage
|
|
|
|
occurs, it will see a parent package existing, and any relative
|
|
|
|
imports will work correctly. However, if no submodule or subpackage
|
|
|
|
exists, then the parent package will *not* be created, nor will a
|
|
|
|
standalone module be converted into a package (by the addition of a
|
|
|
|
spurious ``__path__`` attribute).
|
|
|
|
|
|
|
|
Note, by the way, that this change must be applied *recursively*: that
|
|
|
|
is, if ``foo`` and ``foo.bar`` are pure virtual packages, then
|
|
|
|
``import foo.bar.baz`` must wait until ``foo.bar.baz`` is found before
|
|
|
|
creating module objects for *both* ``foo`` and ``foo.bar``, and then
|
|
|
|
create both of them together, properly setting the ``foo`` module's
|
2011-08-11 14:29:06 -04:00
|
|
|
``.bar`` attribute to point to the ``foo.bar`` module.
|
2011-07-20 14:48:00 -04:00
|
|
|
|
|
|
|
In this way, pure virtual packages are never directly importable:
|
|
|
|
an ``import foo`` or ``import foo.bar`` by itself will fail, and the
|
|
|
|
corresponding modules will not appear in ``sys.modules`` until they
|
|
|
|
are needed to point to a *successfully* imported submodule or
|
|
|
|
self-contained subpackage.
|
|
|
|
|
|
|
|
|
|
|
|
Virtual Paths
|
|
|
|
-------------
|
|
|
|
|
2022-01-21 06:03:51 -05:00
|
|
|
A virtual path is created by obtaining a :pep:`302` "importer" object for
|
2011-07-20 14:48:00 -04:00
|
|
|
each of the path entries found in ``sys.path`` (for a top-level
|
|
|
|
module) or the parent ``__path__`` (for a submodule).
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
(Note: because ``sys.meta_path`` importers are not associated with
|
|
|
|
``sys.path`` or ``__path__`` entry strings, such importers do *not*
|
|
|
|
participate in this process.)
|
|
|
|
|
|
|
|
Each importer is checked for a ``get_subpath()`` method, and if
|
|
|
|
present, the method is called with the full name of the module/package
|
2011-07-20 14:48:00 -04:00
|
|
|
the path is being constructed for. The return value is either a
|
|
|
|
string representing a subdirectory for the requested package, or
|
2011-07-20 04:43:40 -04:00
|
|
|
``None`` if no such subdirectory exists.
|
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
The strings returned by the importers are added to the path list
|
2011-07-20 04:43:40 -04:00
|
|
|
being built, in the same order as they are found. (``None`` values
|
|
|
|
and missing ``get_subpath()`` methods are simply skipped.)
|
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
The resulting list (whether empty or not) is then stored in a
|
|
|
|
``sys.virtual_package_paths`` dictionary, keyed by module name.
|
|
|
|
|
|
|
|
This dictionary has two purposes. First, it serves as a cache, in
|
|
|
|
the event that more than one attempt is made to import a submodule
|
|
|
|
of a virtual package.
|
|
|
|
|
|
|
|
Second, and more importantly, the dictionary can be used by code that
|
|
|
|
extends ``sys.path`` at runtime to *update* imported packages'
|
|
|
|
``__path__`` attributes accordingly. (See `Standard Library
|
|
|
|
Changes/Additions`_ below for more details.)
|
|
|
|
|
|
|
|
In Python code, the virtual path construction algorithm would look
|
|
|
|
something like this::
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
def get_virtual_path(modulename, parent_path=None):
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
if modulename in sys.virtual_package_paths:
|
|
|
|
return sys.virtual_package_paths[modulename]
|
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
if parent_path is None:
|
|
|
|
parent_path = sys.path
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
path = []
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
for entry in parent_path:
|
|
|
|
# Obtain a PEP 302 importer object - see pkgutil module
|
|
|
|
importer = pkgutil.get_importer(entry)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 09:56:48 -04:00
|
|
|
if hasattr(importer, 'get_subpath'):
|
|
|
|
subpath = importer.get_subpath(modulename)
|
|
|
|
if subpath is not None:
|
|
|
|
path.append(subpath)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
sys.virtual_package_paths[modulename] = path
|
2011-07-20 09:56:48 -04:00
|
|
|
return path
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
And a function like this one should be exposed in the standard
|
|
|
|
library as e.g. ``imp.get_virtual_path()``, so that people creating
|
|
|
|
``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
|
|
|
|
|
|
|
|
|
|
|
|
Standard Library Changes/Additions
|
|
|
|
----------------------------------
|
|
|
|
|
|
|
|
The ``pkgutil`` module should be updated to handle this
|
|
|
|
specification appropriately, including any necessary changes to
|
|
|
|
``extend_path()``, ``iter_modules()``, etc.
|
|
|
|
|
|
|
|
Specifically the proposed changes and additions to ``pkgutil`` are:
|
|
|
|
|
|
|
|
* A new ``extend_virtual_paths(path_entry)`` function, to extend
|
|
|
|
existing, already-imported virtual packages' ``__path__`` attributes
|
|
|
|
to include any portions found in a new ``sys.path`` entry. This
|
|
|
|
function should be called by applications extending ``sys.path``
|
|
|
|
at runtime, e.g. when adding a plugin directory or an egg to the
|
|
|
|
path.
|
|
|
|
|
|
|
|
The implementation of this function does a simple top-down traversal
|
2011-07-20 14:48:00 -04:00
|
|
|
of ``sys.virtual_package_paths``, and performs any necessary
|
|
|
|
``get_subpath()`` calls to identify what path entries need to be
|
2023-09-01 15:26:24 -04:00
|
|
|
added to the virtual path for that package, given that ``path_entry``
|
2011-07-20 04:43:40 -04:00
|
|
|
has been added to ``sys.path``. (Or, in the case of sub-packages,
|
2011-07-20 14:48:00 -04:00
|
|
|
adding a derived subpath entry, based on their parent package's
|
|
|
|
virtual path.)
|
|
|
|
|
|
|
|
(Note: this function must update both the path values in
|
|
|
|
``sys.virtual_package_paths`` as well as the ``__path__`` attributes
|
|
|
|
of any corresponding modules in ``sys.modules``, even though in the
|
|
|
|
common case they will both be the same ``list`` object.)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
* A new ``iter_virtual_packages(parent='')`` function to allow
|
2011-07-20 14:48:00 -04:00
|
|
|
top-down traversal of virtual packages from
|
|
|
|
``sys.virtual_package_paths``, by yielding the child virtual
|
2023-09-01 15:26:24 -04:00
|
|
|
packages of ``parent``. For example, calling
|
2011-07-20 14:48:00 -04:00
|
|
|
``iter_virtual_packages("zope")`` might yield ``zope.app``
|
|
|
|
and ``zope.products`` (if they are virtual packages listed in
|
|
|
|
``sys.virtual_package_paths``), but **not** ``zope.foo.bar``.
|
2011-07-20 04:43:40 -04:00
|
|
|
(This function is needed to implement ``extend_virtual_paths()``,
|
|
|
|
but is also potentially useful for other code that needs to inspect
|
|
|
|
imported virtual packages.)
|
|
|
|
|
|
|
|
* ``ImpImporter.iter_modules()`` should be changed to also detect and
|
|
|
|
yield the names of modules found in virtual packages.
|
|
|
|
|
|
|
|
In addition to the above changes, the ``zipimport`` importer should
|
|
|
|
have its ``iter_modules()`` implementation similarly changed. (Note:
|
|
|
|
current versions of Python implement this via a shim in ``pkgutil``,
|
|
|
|
so technically this is also a change to ``pkgutil``.)
|
|
|
|
|
|
|
|
Last, but not least, the ``imp`` module (or ``importlib``, if
|
2011-07-21 22:58:56 -04:00
|
|
|
appropriate) should expose the algorithm described in the `virtual
|
|
|
|
paths`_ section above, as a
|
2011-07-20 04:43:40 -04:00
|
|
|
``get_virtual_path(modulename, parent_path=None)`` function, so that
|
|
|
|
creators of ``__import__`` replacements can use it.
|
|
|
|
|
|
|
|
|
|
|
|
Implementation Notes
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
For users, developers, and distributors of virtual packages:
|
|
|
|
|
|
|
|
* While virtual packages are easy to set up and use, there is still
|
|
|
|
a time and place for using self-contained packages. While it's not
|
|
|
|
strictly necessary, adding an ``__init__`` module to your
|
|
|
|
self-contained packages lets users of the package (and Python
|
|
|
|
itself) know that *all* of the package's code will be found in
|
|
|
|
that single subdirectory. In addition, it lets you define
|
|
|
|
``__all__``, expose a public API, provide a package-level docstring,
|
|
|
|
and do other things that make more sense for a self-contained
|
|
|
|
project than for a mere "namespace" package.
|
|
|
|
|
2011-07-20 14:48:00 -04:00
|
|
|
* ``sys.virtual_package_paths`` is allowed to contain entries for
|
|
|
|
non-existent or not-yet-imported package names; code that uses its
|
|
|
|
contents should not assume that every key in this dictionary is also
|
|
|
|
present in ``sys.modules`` or that importing the name will
|
|
|
|
necessarily succeed.
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
* If you are changing a currently self-contained package into a
|
|
|
|
virtual one, it's important to note that you can no longer use its
|
|
|
|
``__file__`` attribute to locate data files stored in a package
|
|
|
|
directory. Instead, you must search ``__path__`` or use the
|
|
|
|
``__file__`` of a submodule adjacent to the desired files, or
|
|
|
|
of a self-contained subpackage that contains the desired files.
|
|
|
|
|
|
|
|
(Note: this caveat is already true for existing users of "namespace
|
|
|
|
packages" today. That is, it is an inherent result of being able
|
|
|
|
to partition a package, that you must know *which* partition the
|
|
|
|
desired data file lives in. We mention it here simply so that
|
|
|
|
*new* users converting from self-contained to virtual packages will
|
|
|
|
also be aware of it.)
|
|
|
|
|
|
|
|
* XXX what is the __file__ of a "pure virtual" package? ``None``?
|
|
|
|
Some arbitrary string? The path of the first directory with a
|
|
|
|
trailing separator? No matter what we put, *some* code is
|
|
|
|
going to break, but the last choice might allow some code to
|
|
|
|
accidentally work. Is that good or bad?
|
|
|
|
|
|
|
|
|
2022-01-21 06:03:51 -05:00
|
|
|
For those implementing :pep:`302` importer objects:
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
* Importers that support the ``iter_modules()`` method (used by
|
|
|
|
``pkgutil`` to locate importable modules and packages) and want to
|
|
|
|
add virtual package support should modify their ``iter_modules()``
|
|
|
|
method so that it discovers and lists virtual packages as well as
|
|
|
|
standard modules and packages. To do this, the importer should
|
|
|
|
simply list all immediate subdirectory names in its jurisdiction
|
|
|
|
that are valid Python identifiers.
|
|
|
|
|
|
|
|
XXX This might list a lot of not-really-packages. Should we
|
|
|
|
require importable contents to exist? If so, how deep do we
|
|
|
|
search, and how do we prevent e.g. link loops, or traversing onto
|
2011-07-20 14:48:00 -04:00
|
|
|
different filesystems, etc.? Ick. Also, if virtual packages are
|
|
|
|
listed, they still can't be *imported*, which is a problem for the
|
|
|
|
way that ``pkgutil.walk_modules()`` is currently implemented.
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
|
|
|
|
not need to implement ``get_subpath()``, because the method
|
|
|
|
is only called on importers corresponding to ``sys.path`` entries
|
|
|
|
and ``__path__`` entries. If a meta importer wishes to support
|
|
|
|
virtual packages, it must do so entirely within its own
|
|
|
|
``find_module()`` implementation.
|
|
|
|
|
|
|
|
Unfortunately, it is unlikely that any such implementation will be
|
|
|
|
able to merge its package subpaths with those of other meta
|
|
|
|
importers or ``sys.path`` importers, so the meaning of "supporting
|
|
|
|
virtual packages" for a meta importer is currently undefined!
|
|
|
|
|
|
|
|
(However, since the intended use case for meta importers is to
|
|
|
|
replace Python's normal import process entirely for some subset of
|
|
|
|
modules, and the number of such importers currently implemented is
|
|
|
|
quite small, this seems unlikely to be a big issue in practice.)
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
==========
|
|
|
|
|
|
|
|
.. [1] "namespace" vs "module" packages (mailing list thread)
|
2011-07-20 09:56:48 -04:00
|
|
|
(http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
.. [2] "Dropping __init__.py requirement for subpackages"
|
2017-06-11 15:02:39 -04:00
|
|
|
(https://mail.python.org/pipermail/python-dev/2006-April/064400.html)
|
2011-07-20 04:43:40 -04:00
|
|
|
|
2012-03-12 19:58:36 -04:00
|
|
|
.. [3] Namespace Packages resolution
|
2017-06-11 15:02:39 -04:00
|
|
|
(https://mail.python.org/pipermail/import-sig/2012-March/000421.html)
|
2012-03-12 19:58:36 -04:00
|
|
|
|
2011-07-20 04:43:40 -04:00
|
|
|
|
|
|
|
Copyright
|
|
|
|
=========
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|