268 lines
11 KiB
Plaintext
268 lines
11 KiB
Plaintext
PEP: 420
|
||
Title: Implicit Namespace Packages
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Eric V. Smith <eric@trueblade.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 19-Apr-2012
|
||
Python-Version: 3.3
|
||
Post-History:
|
||
|
||
Abstract
|
||
========
|
||
|
||
Namespace packages are a mechanism for splitting a single Python
|
||
package across multiple directories on disk. In current Python
|
||
versions, an algorithm to compute the packages ``__path__`` must be
|
||
formulated. With the enhancement proposed here, the import machinery
|
||
itself will construct the list of directories that make up the
|
||
package. This PEP builds upon the work started in rejected PEPs 382
|
||
and 402. An implementation of this PEP is at [1]_.
|
||
|
||
Terminology
|
||
===========
|
||
|
||
Within this PEP:
|
||
|
||
* "package" refers to Python packages as defined by Python's import
|
||
statement.
|
||
* "distribution" refers to separately installable sets of Python
|
||
modules as stored in the Python package index, and installed by
|
||
distutils or setuptools.
|
||
* "vendor package" refers to groups of files installed by an
|
||
operating system's packaging mechanism (e.g. Debian or Redhat
|
||
packages install on Linux systems).
|
||
* "portion" refers to a set of files in a single directory (possibly
|
||
stored in a zip file) that contribute to a namespace package.
|
||
* "regular package" refers to packages as they are implemented in
|
||
Python 3.2 and earlier.
|
||
|
||
This PEP defines a new type of package, the "namespace package".
|
||
|
||
Namespace packages today
|
||
========================
|
||
|
||
Python currently provides ``pkgutil.extend_path`` to denote a package
|
||
as a namespace package. The recommended way of using it is to put::
|
||
|
||
from pkgutil import extend_path
|
||
__path__ = extend_path(__path__, __name__)
|
||
|
||
in the package's ``__init__.py``. Every distribution needs to provide
|
||
the same contents in its ``__init__.py``, so that ``extend_path`` is
|
||
invoked independent of which portion of the package gets imported
|
||
first. As a consequence, the package's ``__init__.py`` cannot
|
||
practically define any names as it depends on the order of the package
|
||
fragments on ``sys.path`` to determine which portion is imported
|
||
first. As a special feature, ``extend_path`` reads files named
|
||
``<packagename>.pkg`` which allows declaration of additional portions.
|
||
|
||
setuptools provides a similar function named
|
||
``pkg_resources.declare_namespace`` that is used in the form::
|
||
|
||
import pkg_resources
|
||
pkg_resources.declare_namespace(__name__)
|
||
|
||
In the portion's ``__init__.py``, no assignment to ``__path__`` is
|
||
necessary, as ``declare_namespace`` modifies the package ``__path__``
|
||
through ``sys.modules``. As a special feature, ``declare_namespace``
|
||
also supports zip files, and registers the package name internally so
|
||
that future additions to ``sys.path`` by setuptools can properly add
|
||
additional portions to each package.
|
||
|
||
setuptools allows declaring namespace packages in a distribution's
|
||
``setup.py``, so that distribution developers don't need to put the
|
||
magic ``__path__`` modification into ``__init__.py`` themselves.
|
||
|
||
Rationale
|
||
=========
|
||
|
||
The current imperative approach to namespace packages has lead to
|
||
multiple slightly-incompatible mechanisms for providing namespace
|
||
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
|
||
doesn't. Likewise, setuptools supports inspecting zip files, and
|
||
supports adding portions to its ``_namespace_packages`` variable,
|
||
whereas pkgutil doesn't.
|
||
|
||
Namespace packages are designed to support being split across multiple
|
||
directories (and hence found via multiple ``sys.path`` entries). In
|
||
this configuration, it doesn't matter if multiple portions all provide
|
||
an ``__init__.py`` file, so long as each portion correctly initializes
|
||
the namespace package. However, Linux distribution vendors (amongst
|
||
others) prefer to combine the separate portions and install them all
|
||
into the *same* filesystem directory. This creates a potential for
|
||
conflict, as the portions are now attempting to provide the *same*
|
||
file on the target system - something that is not allowed by many
|
||
package managers. Allowing implicit namespace packages means that the
|
||
requirement to provide an ``__init__.py`` file can be dropped
|
||
completely, and affected portions can be installed into a common
|
||
directory or split across multiple directories as distributions see
|
||
fit.
|
||
|
||
Specification
|
||
=============
|
||
|
||
Regular packages will continue to have an ``__init__.py`` and will
|
||
reside in a single directory.
|
||
|
||
Namespace packages cannot contain an ``__init__.py``. As a
|
||
consequence, ``pkgutil.extend_path`` and
|
||
``pkg_resources.declare_namespace`` become obsolete for purposes of
|
||
namespace package creation. There will be no marker file or directory
|
||
for specifing a namespace package.
|
||
|
||
During import processing, the import machinery will continue to
|
||
iterate over the parent path as it does in Python 3.2. While looking
|
||
for a module or package named "foo":
|
||
|
||
* If ``foo/__init__.py`` is found, a regular package is imported.
|
||
* If not, but ``foo.{py,pyc,so,pyd}`` is found, a module is imported.
|
||
* If not, but ``foo`` is found and is a directory, it is recorded.
|
||
|
||
If the scan along the parent path completes without finding a module
|
||
or package and at least one directory was recorded, then a namespace
|
||
package is created. The new namespace package:
|
||
|
||
* Has a ``__file__`` attribute set to the first directory that was
|
||
found during the scan, including the trailing path separator.
|
||
* Has a ``__path__`` attribute set to the list of directories there
|
||
were found and recorded during the scan.
|
||
|
||
There is no mechanism to automatically recompute the ``__path__`` if
|
||
``sys.path`` is altered after a namespace package has already been
|
||
created. However, existing namespace utilities (like
|
||
``pkgutil.extend_path``) can be used to update them explicitly if
|
||
desired.
|
||
|
||
Note that if "import foo" is executed and "foo" is found as a
|
||
namespace package (using the above rules), then "foo" is immediately
|
||
created as a package. The creation of the namespace package is not
|
||
deferred until a sub-level import occurs.
|
||
|
||
A namespace package is not fundamentally different from a regular
|
||
package. It is just a different way of creating packages. Once a
|
||
namespace package is created, there is no functional difference
|
||
between it and a regular package. The only observable difference is
|
||
that the namespace package's ``__file__`` attribute will end with a
|
||
path separator (typically a slash or backslash, depending on the
|
||
platform).
|
||
|
||
Impact on Import Finders and Loaders
|
||
------------------------------------
|
||
|
||
PEP 302 defines "finders" that are called to search path elements.
|
||
These finders' ``find_module`` methods currently return either a
|
||
"loader" object or None. For a finder to contribute to namespace
|
||
packages, ``find_module`` will return a third type: a string. This is
|
||
the string that will be recorded and later used as a component of the
|
||
namespace module's ``__path__``, as described above. This string must
|
||
not contain a trailing path separator.
|
||
|
||
There is no impact on PEP 302 "loaders".
|
||
|
||
If an existing finder is not updated to support returning a string
|
||
from ``find_module``, the only impact is that such a loader will be
|
||
unable to provide portions of a namespace package.
|
||
|
||
Discussion
|
||
==========
|
||
|
||
At PyCon 2012, we had a discussion about namespace packages at which
|
||
PEP 382 and PEP 402 were rejected, to be replaced by this PEP [2]_.
|
||
|
||
There is no intention to remove support of regular packages. If a
|
||
developer knows that her package will never be a portion of a
|
||
namespace package, then there is a performance advantage to it being a
|
||
regular package (with an ``__init__.py``). Creation and loading of a
|
||
regular package can take place immediately when it is located along
|
||
the path. With namespace packages, all entries in the path must be
|
||
scanned before the package is created.
|
||
|
||
Note that an ImportWarning will no longer be raised for a directory
|
||
lacking an ``__init__.py`` file. Such a directory will now be
|
||
imported as a namespace package, whereas in prior Python versions an
|
||
ImportWarning would be raised.
|
||
|
||
Nick Coglan presented a list of his objections to this proposal [3]_.
|
||
They are:
|
||
|
||
* Implicit package directories go against the Zen of Python
|
||
|
||
* Implicit package directories pose awkward backwards compatibility
|
||
challenges
|
||
|
||
* Implicit package directories introduce ambiguity into filesystem
|
||
layouts
|
||
|
||
* Implicit package directories will permanently entrench current
|
||
newbie-hostile behaviour in __main__
|
||
|
||
(These need to be addressed here.)
|
||
|
||
Phillip Eby asked about auto-updating of ``__path__``, instead of it
|
||
being a simple list [4]_. It is the intent of this PEP to get the
|
||
simplest possible solution working. It will be possible at a later
|
||
date to add such features. Several possible ways to do so were
|
||
discussed in the referenced email thread.
|
||
|
||
Packaging Implications
|
||
======================
|
||
|
||
Multiple portions of a namespace package can be installed into the
|
||
same directory, or into separate directories. For this section,
|
||
suppose there are two portions which define "foo.bar" and "foo.baz".
|
||
"foo" itself is a namespace package.
|
||
|
||
If these are installed in the same location, a single directory "foo"
|
||
would be in a directory that is on ``sys.path``. Inside "foo" would
|
||
be two directories, "bar" and "baz". If "foo.bar" is removed (perhaps
|
||
by an OS package manager), care must be taken not to remove the
|
||
"foo/baz" or "foo" directories. Note that in this case "foo" will be
|
||
a namespace package (because it lacks an ``__init__.py``), even though
|
||
all of its portions are in the same directory.
|
||
|
||
Note that "foo.bar" and "foo.baz" can be installed into the same "foo"
|
||
directory because they will not have any files in common.
|
||
|
||
If the portions are installed in different locations, two different
|
||
"foo" directories would be in directories that are on ``sys.path``.
|
||
"foo/bar" would be in one of these sys.path entries, and "foo/baz"
|
||
would be in the other. Upon removal of "foo.bar", the "foo/bar" and
|
||
corresonding "foo" directories can be completely removed. But
|
||
"foo/baz" and its corresponding "foo" directory cannot be removed.
|
||
|
||
It is also possible to have the "foo.bar" portion installed in a
|
||
directory on ``sys.path``, and have the "foo.baz" portion provided in
|
||
a zip file, also on ``sys.path``.
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] PEP 420 branch (http://hg.python.org/features/pep-420)
|
||
|
||
.. [2] PyCon 2012 Namespace Package discussion outcome
|
||
(http://mail.python.org/pipermail/import-sig/2012-March/000421.html)
|
||
|
||
.. [3] Nick Coglan's objection to the lack of marker files or directories
|
||
(http://mail.python.org/pipermail/import-sig/2012-March/000423.html)
|
||
|
||
.. [4] Phillip Eby's question about auto-updating __path__
|
||
(http://mail.python.org/pipermail/import-sig/2012-April/000468.html)
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|