PEP: 420
Title: Implicit Namespace Packages
Version: $Revision$
Last-Modified: $Date$
Author: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Apr-2012
Python-Version: 3.3
Post-History:

Abstract
========

Namespace packages are a mechanism for splitting a single Python package
across multiple directories on disk. In current Python versions, an algorithm
to compute the package's ``__path__`` must be formulated. With the enhancement
proposed here, the import machinery itself will construct the list of
directories that make up the package. This PEP builds upon previous work,
documented in PEP 382 and PEP 402. Those PEPs have since been rejected in
favor of this one. An implementation of this PEP is at [1]_.

Terminology
===========

Within this PEP:

* "package" refers to Python packages as defined by Python's import
  statement.
* "distribution" refers to separately installable sets of Python
  modules as stored in the Python package index, and installed by
  distutils or setuptools.
* "vendor package" refers to groups of files installed by an
  operating system's packaging mechanism (e.g. Debian or Red Hat
  packages installed on Linux systems).
* "regular package" refers to packages as they are implemented in
  Python 3.2 and earlier.
* "portion" refers to a set of files in a single directory (possibly
  stored in a zip file) that contribute to a namespace package.
* "legacy portion" refers to a portion that uses ``__path__``
  manipulation in order to implement namespace packages.

This PEP defines a new type of package, the "namespace package".

Namespace packages today
========================

Python currently provides ``pkgutil.extend_path`` to denote a package
as a namespace package. The recommended way of using it is to put::

    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that ``extend_path`` is
invoked independent of which portion of the package gets imported
first. As a consequence, the package's ``__init__.py`` cannot
practically define any names as it depends on the order of the package
fragments on ``sys.path`` to determine which portion is imported
first. As a special feature, ``extend_path`` reads files named
``<packagename>.pkg`` which allows declaration of additional portions.

setuptools provides a similar function named
``pkg_resources.declare_namespace`` that is used in the form::

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

In the portion's ``__init__.py``, no assignment to ``__path__`` is
necessary, as ``declare_namespace`` modifies the package ``__path__``
through ``sys.modules``. As a special feature, ``declare_namespace``
also supports zip files, and registers the package name internally so
that future additions to ``sys.path`` by setuptools can properly add
additional portions to each package.

setuptools allows declaring namespace packages in a distribution's
``setup.py``, so that distribution developers don't need to put the
magic ``__path__`` modification into ``__init__.py`` themselves.

See PEP 402's "The Problem" section [2]_ for more details on the
motivation for namespace packages. Note that PEP 402 has been
rejected, but the motivating use cases are still valid.

Rationale
=========

The current imperative approach to namespace packages has led to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its ``_namespace_packages`` variable,
whereas pkgutil doesn't.

Namespace packages are designed to support being split across multiple
directories (and hence found via multiple ``sys.path`` entries). In
this configuration, it doesn't matter if multiple portions all provide
an ``__init__.py`` file, so long as each portion correctly initializes
the namespace package. However, Linux distribution vendors (amongst
others) prefer to combine the separate portions and install them all
into the *same* file system directory. This creates a potential for
conflict, as the portions are now attempting to provide the *same*
file on the target system, something that is not allowed by many
package managers. Allowing implicit namespace packages means that the
requirement to provide an ``__init__.py`` file can be dropped
completely, and affected portions can be installed into a common
directory or split across multiple directories as distributions see
fit.

Specification
=============

Regular packages will continue to have an ``__init__.py`` and will
reside in a single directory.

Namespace packages cannot contain an ``__init__.py``. As a
consequence, ``pkgutil.extend_path`` and
``pkg_resources.declare_namespace`` become obsolete for purposes of
namespace package creation. There will be no marker file or directory
for specifying a namespace package.

During import processing, the import machinery will continue to
iterate over each directory in the parent path as it does in Python
3.2. While looking for a module or package named "foo", for each
directory in the parent path:

* If ``<directory>/foo/__init__.py`` is found, a regular package is
  imported and returned.

* If not, but ``<directory>/foo.{py,pyc,so,pyd}`` is found, a module
  is imported and returned. The exact list of extensions varies by
  platform and whether the -O flag is specified. The list here is
  representative.

* If not, but ``<directory>/foo`` is found and is a directory, it is
  recorded and the scan continues with the next directory in the
  parent path.

* Otherwise the scan continues with the next directory in the parent
  path.

If the scan completes without returning a module or package, and at
least one directory was recorded, then a namespace package is created.
The new namespace package:

* Has a ``__path__`` attribute set to an iterable of the path strings
  that were found and recorded during the scan.

* Does not have a ``__file__`` attribute.

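The scan rules above can be modelled in a few lines of pure Python. This is only an illustrative sketch, not the interpreter's implementation: the names ``scan_parent_path`` and ``make_tree`` and the fixed suffix tuple are assumptions made for the example, and the function merely classifies what it finds instead of importing anything.

```python
import os
import tempfile

# Representative suffixes only; the real list varies by platform.
MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def scan_parent_path(name, parent_path):
    """Classify ``name`` by applying the scan rules to ``parent_path``.

    Returns ("regular", directory), ("module", file),
    ("namespace", [directories]) or None when nothing matches.
    """
    recorded = []
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        # Rule 1: a directory containing __init__.py is a regular package.
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular", candidate)
        # Rule 2: a module file is returned immediately.
        for suffix in MODULE_SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        # Rule 3: a bare directory is recorded and the scan continues.
        if os.path.isdir(candidate):
            recorded.append(candidate)
        # Rule 4: otherwise simply move on to the next directory.
    if recorded:
        return ("namespace", recorded)
    return None


def make_tree(root, *relative_files):
    """Create empty files (and their parent directories) under ``root``."""
    for relative in relative_files:
        full = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first")
    second = os.path.join(tmp, "second")
    third = os.path.join(tmp, "third")
    make_tree(first, "foo/bar.py")        # bare directory "foo"
    make_tree(second, "foo/baz.py")       # another bare directory "foo"
    make_tree(third, "foo/__init__.py")   # regular package "foo"

    namespace_result = scan_parent_path("foo", [first, second])
    regular_result = scan_parent_path("foo", [first, third, second])
    missing_result = scan_parent_path("nothing_here", [first, second])
```

In this model the second call returns a regular package even though a bare ``foo`` directory was recorded earlier in the path, because a namespace package is only created once the whole scan has finished without finding a module or regular package.
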
Note that if "import foo" is executed and "foo" is found as a
namespace package (using the above rules), then "foo" is immediately
created as a package. The creation of the namespace package is not
deferred until a sub-level import occurs.

A namespace package is not fundamentally different from a regular
package. It is just a different way of creating packages. Once a
namespace package is created, there is no functional difference
between it and a regular package.

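As a concrete illustration, the following sketch builds two portions of one package in separate temporary directories and imports from both. It assumes an interpreter that implements this PEP; the package name ``nsdemo_split`` and the directory names are hypothetical, chosen only for the example.

```python
import importlib
import os
import shutil
import sys
import tempfile

PACKAGE = "nsdemo_split"   # hypothetical name, chosen to avoid clashes

tmp = tempfile.mkdtemp()
entries = []
for entry, module in (("site_a", "bar"), ("site_b", "baz")):
    portion = os.path.join(tmp, entry, PACKAGE)
    os.makedirs(portion)                       # note: no __init__.py
    with open(os.path.join(portion, module + ".py"), "w") as handle:
        handle.write("NAME = %r\n" % module)
    entries.append(os.path.join(tmp, entry))
    sys.path.append(entries[-1])

importlib.invalidate_caches()
package = importlib.import_module(PACKAGE)
bar = importlib.import_module(PACKAGE + ".bar")
baz = importlib.import_module(PACKAGE + ".baz")

found_portions = list(package.__path__)
# getattr is used because __file__ may be absent or None,
# depending on the interpreter version.
package_file = getattr(package, "__file__", None)

# Clean up the temporary path entries and files.
for entry in entries:
    sys.path.remove(entry)
shutil.rmtree(tmp, ignore_errors=True)
```

Both submodules import successfully even though they live under different ``sys.path`` entries, and ``found_portions`` holds one directory per portion.
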
Dynamic path computation
------------------------

A namespace package's ``__path__`` will be recomputed if the value of
the parent path changes. In order for this feature to work, the parent
path must be modified in-place, not replaced with a new object. For
example, for top-level namespace packages, this will work::

    sys.path.append('new-dir')

But this will not::

    sys.path = sys.path + ['new-dir']

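The in-place case can be observed with a small experiment. This is a sketch under assumptions: it requires an interpreter that implements this PEP, and the package name ``nsdemo_dynamic`` and directory names are hypothetical. It only exercises the supported ``sys.path.append`` form.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "nsdemo_dynamic"   # hypothetical name, chosen to avoid clashes

tmp = tempfile.mkdtemp()
entries = []
for entry, module in (("early", "bar"), ("late", "baz")):
    portion = os.path.join(tmp, entry, PACKAGE)
    os.makedirs(portion)                       # no __init__.py
    with open(os.path.join(portion, module + ".py"), "w") as handle:
        handle.write("NAME = %r\n" % module)
    entries.append(os.path.join(tmp, entry))

# Only the first portion is visible when the package is first imported.
sys.path.append(entries[0])
importlib.invalidate_caches()
package = importlib.import_module(PACKAGE)
portions_before = list(package.__path__)

# Modify sys.path in place; __path__ is recomputed when next accessed.
sys.path.append(entries[1])
portions_after = list(package.__path__)
baz = importlib.import_module(PACKAGE + ".baz")
```

After the second ``append``, the already-imported package sees the new portion without being reloaded.
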
Impact on import finders and loaders
------------------------------------

PEP 302 defines "finders" that are called to search path elements.
These finders' ``find_module`` methods return either a "loader" object
or ``None``.

For a finder to contribute to namespace packages, it must implement a
new ``find_loader(fullname)`` method. ``fullname`` has the same
meaning as for ``find_module``. ``find_loader`` always returns a
2-tuple of ``(loader, <iterable-of-path-entries>)``. ``loader`` may
be ``None``, in which case ``<iterable-of-path-entries>`` (which may
be empty) is added to the list of recorded path entries and path
searching continues. If ``loader`` is not ``None``, it is immediately
used to load a module or regular package.

Even if ``loader`` is returned and is not ``None``,
``<iterable-of-path-entries>`` must still contain the path entries for
the package. This allows code such as ``pkgutil.extend_path()`` to
compute path entries for packages that it does not load.

Note that multiple path entries per finder are allowed. This is to
support the case where a finder discovers multiple namespace portions
for a given ``fullname``. Many finders will support only a single
namespace package portion per ``find_loader`` call, in which case this
iterable will contain only a single string.

The import machinery will call ``find_loader`` if it exists, else fall
back to ``find_module``. Legacy finders which implement
``find_module`` but not ``find_loader`` will be unable to contribute
portions to a namespace package.

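The interaction between ``find_loader``, the ``find_module`` fallback, and the recorded path entries can be modelled without touching the real import system. In this sketch the classes ``DictFinder`` and ``LegacyFinder``, the function ``search``, the string "loaders", and the path strings are all hypothetical stand-ins invented for illustration.

```python
class DictFinder:
    """Hypothetical finder following the find_loader protocol.

    ``contents`` maps a full module name to a
    (loader_or_None, [path entries]) pair.
    """

    def __init__(self, contents):
        self.contents = contents

    def find_loader(self, fullname):
        loader, portions = self.contents.get(fullname, (None, []))
        return loader, list(portions)


class LegacyFinder:
    """Hypothetical finder that only implements find_module."""

    def __init__(self, loaders):
        self.loaders = loaders

    def find_module(self, fullname):
        return self.loaders.get(fullname)


def search(fullname, finders):
    """Model of the path search.

    Returns ("loader", loader), ("namespace", [path entries]) or None.
    """
    recorded = []
    for finder in finders:
        if hasattr(finder, "find_loader"):
            loader, portions = finder.find_loader(fullname)
        else:
            # Legacy finders cannot contribute namespace portions.
            loader, portions = finder.find_module(fullname), []
        if loader is not None:
            return ("loader", loader)      # used immediately
        recorded.extend(portions)          # keep searching
    if recorded:
        return ("namespace", recorded)
    return None


finders = [
    DictFinder({"foo": (None, ["/a/foo"])}),
    LegacyFinder({"spam": "spam-loader"}),
    DictFinder({
        "foo": (None, ["/b/foo", "/c/foo"]),   # two portions from one finder
        "eggs": ("eggs-loader", ["/b/eggs"]),
    }),
]

foo_result = search("foo", finders)
spam_result = search("spam", finders)
eggs_result = search("eggs", finders)
missing_result = search("ham", finders)
```

Here ``foo`` ends up as a namespace with three recorded entries, while ``spam`` and ``eggs`` are resolved as soon as a finder supplies a loader.
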
The specification expands PEP 302 loaders to include an optional method called
``module_repr()`` which, if present, is used to generate module object reprs.
See the "Module reprs" section below for further details.

Differences between namespace packages and regular packages
-----------------------------------------------------------

Namespace packages and regular packages are very similar. The
differences are:

* Portions of namespace packages need not all come from the same
  directory structure, or even from the same loader. Regular packages
  are self-contained: all parts live in the same directory hierarchy.

* Namespace packages have no ``__file__`` attribute.

* Namespace packages' ``__path__`` attribute is a read-only iterable
  of strings, which is automatically updated when the parent path is
  modified.

* Namespace packages have no ``__init__.py`` module.

* Namespace packages have a different type of object for their
  ``__loader__`` attribute.

Namespace packages in the standard library
------------------------------------------

It is possible, and this PEP explicitly allows, that parts of the
standard library be implemented as namespace packages. When and if
any standard library packages become namespace packages is outside the
scope of this PEP.

Migrating from legacy namespace packages
----------------------------------------

As described above, prior to this PEP ``pkgutil.extend_path()`` was
used by legacy portions to create namespace packages. Because it is
likely not practical for all existing portions of a namespace package
to be migrated to this PEP at once, ``extend_path()`` will be modified
to also recognize PEP 420 namespace packages. This will allow some
portions of a namespace to be legacy portions while others are
migrated to PEP 420. These hybrid namespace packages will not have
the dynamic path computation that normal namespace packages have,
since ``extend_path()`` never provided this functionality in the past.

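A hybrid layout of this kind can be tried out as follows. This is a sketch that assumes an interpreter whose ``pkgutil.extend_path()`` has been updated as described; the package name ``nsdemo_hybrid`` and the directory and module names are hypothetical.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "nsdemo_hybrid"   # hypothetical name, chosen to avoid clashes

tmp = tempfile.mkdtemp()
legacy_entry = os.path.join(tmp, "legacy")
modern_entry = os.path.join(tmp, "modern")

# Legacy portion: its __init__.py calls extend_path().
legacy_portion = os.path.join(legacy_entry, PACKAGE)
os.makedirs(legacy_portion)
with open(os.path.join(legacy_portion, "__init__.py"), "w") as handle:
    handle.write(
        "from pkgutil import extend_path\n"
        "__path__ = extend_path(__path__, __name__)\n"
    )
with open(os.path.join(legacy_portion, "old.py"), "w") as handle:
    handle.write("NAME = 'old'\n")

# Migrated portion: a bare directory with no __init__.py.
modern_portion = os.path.join(modern_entry, PACKAGE)
os.makedirs(modern_portion)
with open(os.path.join(modern_portion, "new.py"), "w") as handle:
    handle.write("NAME = 'new'\n")

sys.path.extend([legacy_entry, modern_entry])
importlib.invalidate_caches()
package = importlib.import_module(PACKAGE)
old = importlib.import_module(PACKAGE + ".old")
new = importlib.import_module(PACKAGE + ".new")
hybrid_path = list(package.__path__)
```

Because the package is created from the legacy portion's ``__init__.py``, its ``__path__`` is an ordinary list computed once by ``extend_path()``, which is why such hybrids lack dynamic path computation.
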
Packaging Implications
======================

Multiple portions of a namespace package can be installed into the
same directory, or into separate directories. For this section,
suppose there are two portions which define "foo.bar" and "foo.baz".
"foo" itself is a namespace package.

If these are installed in the same location, a single directory "foo"
would be in a directory that is on ``sys.path``. Inside "foo" would
be two directories, "bar" and "baz". If "foo.bar" is removed (perhaps
by an OS package manager), care must be taken not to remove the
"foo/baz" or "foo" directories. Note that in this case "foo" will be
a namespace package (because it lacks an ``__init__.py``), even though
all of its portions are in the same directory.

Note that "foo.bar" and "foo.baz" can be installed into the same "foo"
directory because they will not have any files in common.

If the portions are installed in different locations, two different
"foo" directories would be in directories that are on ``sys.path``.
"foo/bar" would be in one of these ``sys.path`` entries, and "foo/baz"
would be in the other. Upon removal of "foo.bar", the "foo/bar" and
corresponding "foo" directories can be completely removed. But
"foo/baz" and its corresponding "foo" directory cannot be removed.

It is also possible to have the "foo.bar" portion installed in a
directory on ``sys.path``, and have the "foo.baz" portion provided in
a zip file, also on ``sys.path``.

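The single-directory layout, including removal of one portion, can be simulated as below. This is a sketch under assumptions: it requires an interpreter that implements this PEP, it uses the hypothetical package name ``nsdemo_shared`` in place of "foo", and it stands in for a package manager with a plain ``shutil.rmtree`` call.

```python
import importlib
import os
import shutil
import sys
import tempfile

PACKAGE = "nsdemo_shared"   # hypothetical stand-in for "foo"

tmp = tempfile.mkdtemp()
site_dir = os.path.join(tmp, "site-packages")
shared = os.path.join(site_dir, PACKAGE)       # no __init__.py here

# Both portions are installed into the same "foo"-like directory.
for sub in ("bar", "baz"):
    os.makedirs(os.path.join(shared, sub))
    with open(os.path.join(shared, sub, "__init__.py"), "w") as handle:
        handle.write("NAME = %r\n" % sub)

sys.path.append(site_dir)
importlib.invalidate_caches()
package = importlib.import_module(PACKAGE)
bar = importlib.import_module(PACKAGE + ".bar")
shared_path = list(package.__path__)

# Simulate removing only the files that belong to the "bar" portion.
shutil.rmtree(os.path.join(shared, "bar"))
importlib.invalidate_caches()
baz = importlib.import_module(PACKAGE + ".baz")
```

The shared parent is still a namespace package with a single path entry, and the remaining portion stays importable as long as the shared directory itself is left in place.
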
Discussion
==========

At PyCon 2012, we had a discussion about namespace packages at which
PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3]_.

There is no intention to remove support of regular packages. If a
developer knows that her package will never be a portion of a
namespace package, then there is a performance advantage to it being a
regular package (with an ``__init__.py``). Creation and loading of a
regular package can take place immediately when it is located along
the path. With namespace packages, all entries in the path must be
scanned before the package is created.

Note that an ImportWarning will no longer be raised for a directory
lacking an ``__init__.py`` file. Such a directory will now be
imported as a namespace package, whereas in prior Python versions an
ImportWarning would be raised.

Nick Coghlan presented a list of his objections to this proposal [4]_.
They are:

1. Implicit package directories go against the Zen of Python.

2. Implicit package directories pose awkward backwards compatibility
   challenges.

3. Implicit package directories introduce ambiguity into file system
   layouts.

4. Implicit package directories will permanently entrench current
   newbie-hostile behavior in ``__main__``.

Nick later gave a detailed response to his own objections [5]_, which
is summarized here:

1. The practicality of this PEP wins over other proposals and the
   status quo.

2. Minor backward compatibility issues are okay, as long as they are
   properly documented.

3. This will be addressed in PEP 395.

4. This will also be addressed in PEP 395.

The inclusion of namespace packages in the standard library was
motivated by Martin v. Löwis, who wanted the ``encodings`` package to
become a namespace package [6]_. While this PEP allows for standard
library packages to become namespaces, it defers a decision on
``encodings``.

``find_module`` versus ``find_loader``
--------------------------------------

An early draft of this PEP specified a change to the ``find_module``
method in order to support namespace packages. It would be modified
to return a string in the case where a namespace package portion was
discovered.

However, this caused a problem with existing code outside of the
standard library which calls ``find_module``. Because this code would
not be upgraded in concert with changes required by this PEP, it would
fail when it received unexpected return values from
``find_module``. Because of this incompatibility, this PEP now
specifies that finders that want to provide namespace portions must
implement the ``find_loader`` method, described above.

The use case for supporting multiple portions per ``find_loader`` call
is given in [7]_.

Module reprs
============

Previously, module reprs were hard coded based on assumptions about a module's
``__file__`` attribute. If this attribute existed and was a string, it was
assumed to be a file system path, and the module object's repr would include
this in its value. The only exception was that PEP 302 reserved missing
``__file__`` attributes to built-in modules, and in CPython, this assumption
was baked into the module object's implementation. Because of this
restriction, some modules contained contrived ``__file__`` values that did not
reflect file system paths, and which could cause unexpected problems later
(e.g. ``os.path.join()`` on a non-path ``__file__`` would return gibberish).

This PEP relaxes this constraint, and leaves the setting of ``__file__`` to
the purview of the loader producing the module. Loaders may opt to leave
``__file__`` unset if no file system path is appropriate. Loaders may also
set additional reserved attributes on the module if useful. This means that
the definitive way to determine the origin of a module is to check its
``__loader__`` attribute.

For example, namespace packages as described in this PEP will have no
``__file__`` attribute because no corresponding file exists. In order to
provide flexibility and descriptiveness in the reprs of such modules, a new
optional protocol is added to PEP 302 loaders. Loaders can implement a
``module_repr()`` method which takes a single argument, the module object.
This method should return the string to be used verbatim as the repr of the
module. The rules for producing a module repr are now standardized as:

* If the module has a ``__loader__`` and that loader has a ``module_repr()``
  method, call it with a single argument, which is the module object. The
  value returned is used as the module's repr.
* Exceptions from ``module_repr()`` are ignored, and the following steps
  are used instead.
* If the module has a ``__file__`` attribute, this is used as part of the
  module's repr.
* If the module has no ``__file__`` but does have a ``__loader__``, then the
  loader's repr is used as part of the module's repr.
* Otherwise, just use the module's ``__name__`` in the repr.

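The ordered rules above can be expressed as a small stand-alone function. This is a sketch, not the interpreter's code: ``describe_module`` and the two loader classes are hypothetical, the output formats are assumptions modelled on the familiar ``<module ...>`` style, and a ``__loader__`` or ``__file__`` set to ``None`` is treated as absent.

```python
import types


def describe_module(module):
    """Build a repr string by following the ordered rules listed above."""
    name = getattr(module, "__name__", "?")
    loader = getattr(module, "__loader__", None)
    # Rule 1: defer to the loader's module_repr() if it provides one.
    if loader is not None and hasattr(loader, "module_repr"):
        try:
            return loader.module_repr(module)
        except Exception:
            pass  # Rule 2: ignore the failure and fall through.
    # Rule 3: use __file__ when it is available.
    filename = getattr(module, "__file__", None)
    if filename is not None:
        return "<module %r from %r>" % (name, filename)
    # Rule 4: no __file__, but a loader is known.
    if loader is not None:
        return "<module %r (%r)>" % (name, loader)
    # Rule 5: only the name is available.
    return "<module %r>" % name


class NamespaceLikeLoader:
    """Hypothetical loader that supplies its own repr text."""

    def module_repr(self, module):
        return "<module %r (namespace)>" % module.__name__


class BrokenLoader:
    """Hypothetical loader whose module_repr() always fails."""

    def module_repr(self, module):
        raise RuntimeError("cannot describe this module")

    def __repr__(self):
        return "BrokenLoader()"


custom = types.ModuleType("custom")
custom.__loader__ = NamespaceLikeLoader()

with_file = types.ModuleType("with_file")
with_file.__file__ = "/tmp/with_file.py"

broken = types.ModuleType("broken")
broken.__loader__ = BrokenLoader()

bare = types.ModuleType("bare")

custom_repr = describe_module(custom)
file_repr = describe_module(with_file)
broken_repr = describe_module(broken)
bare_repr = describe_module(bare)
```

The ``broken`` module shows the fallback path: its loader's ``module_repr()`` raises, so the loader's own repr is used instead.
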
References
==========

.. [1] PEP 420 branch (http://hg.python.org/features/pep-420)

.. [2] PEP 402's description of use cases for namespace packages
       (http://www.python.org/dev/peps/pep-0402/#the-problem)

.. [3] PyCon 2012 Namespace Package discussion outcome
       (http://mail.python.org/pipermail/import-sig/2012-March/000421.html)

.. [4] Nick Coghlan's objection to the lack of marker files or directories
       (http://mail.python.org/pipermail/import-sig/2012-March/000423.html)

.. [5] Nick Coghlan's response to his initial objections
       (http://mail.python.org/pipermail/import-sig/2012-April/000464.html)

.. [6] Martin v. Löwis's suggestion to make ``encodings`` a namespace
       package
       (http://mail.python.org/pipermail/import-sig/2012-May/000540.html)

.. [7] Use case for multiple portions per ``find_loader`` call
       (http://mail.python.org/pipermail/import-sig/2012-May/000585.html)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: