PEP: 420 Title: Implicit Namespace Packages Version: $Revision$ Last-Modified: $Date$ Author: Eric V. Smith Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 19-Apr-2012 Python-Version: 3.3 Post-History: Abstract ======== Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the packages ``__path__`` must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon the work started in rejected PEPs 382 and 402. Terminology =========== Within this PEP: * "package" refers to Python packages as defined by Python's import statement. * "distribution" refers to separately installable sets of Python modules as stored in the Python package index, and installed by distutils or setuptools. * "vendor package" refers to groups of files installed by an operating system's packaging mechanism (e.g. Debian or Redhat packages install on Linux systems). * "portion" refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package. * "regular package" refers to packages as they are implemented in Python 3.2. This PEP describes a new type of package, the "namespace package". Namespace packages today ======================== Python currently provides ``pkgutil.extend_path`` to denote a package as a namespace package. The recommended way of using it is to put:: from pkgutil import extend_path __path__ = extend_path(__path__, __name__) in the package's ``__init__.py``. Every distribution needs to provide the same contents in its ``__init__.py``, so that ``extend_path`` is invoked independent of which portion of the package gets imported first. As a consequence, the package's ``__init__.py`` cannot practically define any names as it depends on the order of the package fragments on ``sys.path`` to determine which portion is imported first. As a special feature, ``extend_path`` reads files named ``.pkg`` which allows declaration of additional portions. setuptools provides a similar function named ``pkg_resources.declare_namespace`` that is used in the form:: import pkg_resources pkg_resources.declare_namespace(__name__) In the portion's ``__init__.py``, no assignment to ``__path__`` is necessary, as ``declare_namespace`` modifies the package ``__path__`` through ``sys.modules``. As a special feature, ``declare_namespace`` also supports zip files, and registers the package name internally so that future additions to ``sys.path`` by setuptools can properly add additional portions to each package. setuptools allows declaring namespace packages in a distribution's ``setup.py``, so that distribution developers don't need to put the magic ``__path__`` modification into ``__init__.py`` themselves. Rationale ========= The current imperative approach to namespace packages has lead to multiple slightly-incompatible mechanisms for providing namespace packages. For example, pkgutil supports ``*.pkg`` files; setuptools doesn't. Likewise, setuptools supports inspecting zip files, and supports adding portions to its ``_namespace_packages`` variable, whereas pkgutil doesn't. Namespace packages are designed to support being split across multiple directories (and hence found via multiple ``sys.path`` entries). In this configuration, it doesn't matter if multiple portions all provide an ``__init__.py`` file, so long as each portion correctly initializes the namespace package. However, Linux distribution vendors (amongst others) prefer to combine the separate portions and install them all into the *same* filesystem directory. This creates a potential for conflict, as the portions are now attempting to provide the *same* file on the target system - something that is not allowed by many package managers. Allowing implicit namespace packages means that the requirement to provide an ``__init__.py`` file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit. Specification ============= Regular packages will continue to have an ``__init__.py`` and will reside in a single directory. Namespace packages cannot contain an ``__init__.py``. As a consequence, ``pkgutil.extend_path`` and ``pkg_resources.declare_namespace`` become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifing a namespace package. During import processing, the import machinery will continue to iterate over the parent path as it does in Python 3.2. While looking for a module or package named "foo": * If ``foo/__init__.py`` is found, a regular package is imported. * If not, but ``foo.{py,pyc,so,pyd}`` is found, a module is imported. * If not, but ``foo`` is found and is a directory, it is recorded. If the scan along the parent path completes without finding a module or package, then a namespace package is created. The new namespace package: * Has a ``__file__`` attribute set to the first directory that was found during the scan, including the trailing path separator. * Has a ``__path__`` attribute set to the list of directories there were found and recorded during the scan. There is no mechanism to automatically recompute the ``__path__`` if ``sys.path`` is altered after a namespace package has already been created. However, existing namespace utilities (like ``pkgutil.extend_path``) can be used to update them explicitly if desired. Note that if "import foo" is executed and "foo" is found as a namespace package (using the above rules), then "foo" is immediately created as a package. The creation of the namespace package is not deferred until a sub-level import occurs. Impact on Import Finders and Loaders ------------------------------------ To be determined in the sample implementation. Discussion ========== There is no intention to remove support of regular packages. If there is no intention that a package is a namespace package, then there is a performance advantage to it being a regular package. Creation and loading of the package can take place once it is located along the path. With namespace packages, all entries in the path must be scanned. Note that an ImportWarning will no longer be raised for a directory lacking an ``__init__.py`` file. Such a directory will now be imported as a namespace package, whereas in prior Python versions an ImportError would be raised. At PyCon 2012, we had a discussion about namespace packages at which PEP 382 and PEP 402 were rejected, to be replaced by this PEP [1]_. Nick Coglan presented a list of his objections to this proposal [2]_. They are: * Implicit package directories go against the Zen of Python * Implicit package directories pose awkward backwards compatibility challenges * Implicit package directories introduce ambiguity into filesystem layouts * Implicit package directories will permanently entrench current newbie-hostile behaviour in __main__ (These need to be addressed here.) References ========== .. [1] PyCon 2012 Namespace Package discussion outcome (http://mail.python.org/pipermail/import-sig/2012-March/000421.html) .. [2] Nick Coglan's objection to the lack of marker files or directories (http://mail.python.org/pipermail/import-sig/2012-March/000423.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: