python-peps/pep-0420.txt

241 lines
9.3 KiB
Plaintext
Raw Normal View History

PEP: 420
Title: Implicit Namespace Packages
Version: $Revision$
Last-Modified: $Date$
Author: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Apr-2012
Python-Version: 3.3
Post-History:
Abstract
========
Namespace packages are a mechanism for splitting a single Python
package across multiple directories on disk. In current Python
versions, an algorithm to compute the packages ``__path__`` must be
formulated. With the enhancement proposed here, the import machinery
itself will construct the list of directories that make up the
package. This PEP builds upon the work started in rejected PEPs 382
and 402. An implementation of this PEP is at [1]_.
Terminology
===========
Within this PEP:
* "package" refers to Python packages as defined by Python's import
statement.
* "distribution" refers to separately installable sets of Python
modules as stored in the Python package index, and installed by
distutils or setuptools.
* "vendor package" refers to groups of files installed by an
operating system's packaging mechanism (e.g. Debian or Redhat
packages install on Linux systems).
* "portion" refers to a set of files in a single directory (possibly
stored in a zip file) that contribute to a namespace package.
* "regular package" refers to packages as they are implemented in
Python 3.2.
This PEP describes a new type of package, the "namespace package".
Namespace packages today
========================
Python currently provides ``pkgutil.extend_path`` to denote a package
as a namespace package. The recommended way of using it is to put::
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that ``extend_path`` is
invoked independent of which portion of the package gets imported
first. As a consequence, the package's ``__init__.py`` cannot
practically define any names as it depends on the order of the package
2012-04-20 09:22:25 -04:00
fragments on ``sys.path`` to determine which portion is imported
first. As a special feature, ``extend_path`` reads files named
2012-04-20 09:22:25 -04:00
``<packagename>.pkg`` which allows declaration of additional portions.
2012-04-19 18:10:05 -04:00
setuptools provides a similar function named
``pkg_resources.declare_namespace`` that is used in the form::
import pkg_resources
pkg_resources.declare_namespace(__name__)
In the portion's ``__init__.py``, no assignment to ``__path__`` is
necessary, as ``declare_namespace`` modifies the package ``__path__``
through ``sys.modules``. As a special feature, ``declare_namespace``
also supports zip files, and registers the package name internally so
that future additions to ``sys.path`` by setuptools can properly add
additional portions to each package.
setuptools allows declaring namespace packages in a distribution's
``setup.py``, so that distribution developers don't need to put the
magic ``__path__`` modification into ``__init__.py`` themselves.
Rationale
=========
The current imperative approach to namespace packages has lead to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its ``_namespace_packages`` variable,
whereas pkgutil doesn't.
Namespace packages are designed to support being split across multiple
2012-04-20 08:35:55 -04:00
directories (and hence found via multiple ``sys.path`` entries). In
this configuration, it doesn't matter if multiple portions all provide
an ``__init__.py`` file, so long as each portion correctly initializes
the namespace package. However, Linux distribution vendors (amongst
others) prefer to combine the separate portions and install them all
into the *same* filesystem directory. This creates a potential for
conflict, as the portions are now attempting to provide the *same*
file on the target system - something that is not allowed by many
package managers. Allowing implicit namespace packages means that the
requirement to provide an ``__init__.py`` file can be dropped
completely, and affected portions can be installed into a common
directory or split across multiple directories as distributions see
fit.
Specification
=============
Regular packages will continue to have an ``__init__.py`` and will
reside in a single directory.
Namespace packages cannot contain an ``__init__.py``. As a
consequence, ``pkgutil.extend_path`` and
``pkg_resources.declare_namespace`` become obsolete for purposes of
namespace package creation. There will be no marker file or directory
for specifing a namespace package.
During import processing, the import machinery will continue to
iterate over the parent path as it does in Python 3.2. While looking
for a module or package named "foo":
* If ``foo/__init__.py`` is found, a regular package is imported.
* If not, but ``foo.{py,pyc,so,pyd}`` is found, a module is imported.
* If not, but ``foo`` is found and is a directory, it is recorded.
If the scan along the parent path completes without finding a module
or package, then a namespace package is created. The new namespace
package:
* Has a ``__file__`` attribute set to the first directory that was
found during the scan, including the trailing path separator.
* Has a ``__path__`` attribute set to the list of directories there
were found and recorded during the scan.
There is no mechanism to automatically recompute the ``__path__`` if
``sys.path`` is altered after a namespace package has already been
created. However, existing namespace utilities (like
``pkgutil.extend_path``) can be used to update them explicitly if
desired.
Note that if "import foo" is executed and "foo" is found as a
namespace package (using the above rules), then "foo" is immediately
created as a package. The creation of the namespace package is not
deferred until a sub-level import occurs.
Impact on Import Finders and Loaders
------------------------------------
PEP 302 defines "finders" that are called to search path elements.
These finders' ``find_module`` methods currently return either a
"loader" object or None. For a finder to contribute to namespace
packages, ``find_module`` will return a third type: a string. This is
the string that will be recorded and later used as a component of the
namespace module's __path__, as described above.
2012-04-24 13:05:25 -04:00
[Consider Brett's idea to pass NamespaceLoader in to PathFinder]
Discussion
==========
There is no intention to remove support of regular packages. If there
is no intention that a package is a namespace package, then there is a
performance advantage to it being a regular package. Creation and
loading of the package can take place once it is located along the
path. With namespace packages, all entries in the path must be
scanned.
2012-04-20 04:51:07 -04:00
Note that an ImportWarning will no longer be raised for a directory
lacking an ``__init__.py`` file. Such a directory will now be
imported as a namespace package, whereas in prior Python versions an
ImportError would be raised.
At PyCon 2012, we had a discussion about namespace packages at which
PEP 382 and PEP 402 were rejected, to be replaced by this PEP [2]_.
Nick Coglan presented a list of his objections to this proposal [3]_.
They are:
* Implicit package directories go against the Zen of Python
* Implicit package directories pose awkward backwards compatibility
challenges
* Implicit package directories introduce ambiguity into filesystem
layouts
* Implicit package directories will permanently entrench current
newbie-hostile behaviour in __main__
(These need to be addressed here.)
Packaging Implications
======================
Multiple portions of a namespace package can be installed into the
same location, or into separate locations. For this section, suppose
there are two portions which define "foo.bar" and "foo.baz". "foo"
itself is a namespace package.
If these are installed in the same location, a single directory "foo"
would be in a directory that is on ``sys.path``. Inside "foo" would
be two directories, "bar" and "baz". If "foo.bar" is removed (perhaps
by an automatic packager), care must be taken not to remove the
"foo/baz" or "foo" directories. Note that in this case "foo" will be
2012-04-25 13:35:07 -04:00
a namespace package (because it lacks an ``__init__.py``), even though
all of its portions are in the same directory.
If the portions are installed in different locations, two different
"foo" directories would be in directories that are on ``sys.path``.
"foo/bar" would be in one of these sys.path entries, and "foo/baz"
would be in the other. Upon removal of "foo.bar", the "foo/bar" and
corresonding "foo" directories can be removed.
Note that even if they are installed in the same directory, "foo.bar"
and "foo.baz" would not have any files in common.
References
==========
.. [1] PEP 420 branch (http://hg.python.org/features/pep-420)
.. [2] PyCon 2012 Namespace Package discussion outcome
(http://mail.python.org/pipermail/import-sig/2012-March/000421.html)
.. [3] Nick Coglan's objection to the lack of marker files or directories
(http://mail.python.org/pipermail/import-sig/2012-March/000423.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: