PEP: 420 Title: Implicit Namespace Packages Version: $Revision$ Last-Modified: $Date$ Author: Eric V. Smith Status: Final Type: Standards Track Content-Type: text/x-rst Created: 19-Apr-2012 Python-Version: 3.3 Post-History: Resolution: https://mail.python.org/pipermail/python-dev/2012-May/119651.html Abstract ======== Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the packages ``__path__`` must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon previous work, documented in :pep:`382` and :pep:`402`. Those PEPs have since been rejected in favor of this one. An implementation of this PEP is at [1]_. Terminology =========== Within this PEP: * "package" refers to Python packages as defined by Python's import statement. * "distribution" refers to separately installable sets of Python modules as stored in the Python package index, and installed by distutils or setuptools. * "vendor package" refers to groups of files installed by an operating system's packaging mechanism (e.g. Debian or Redhat packages install on Linux systems). * "regular package" refers to packages as they are implemented in Python 3.2 and earlier. * "portion" refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package. * "legacy portion" refers to a portion that uses ``__path__`` manipulation in order to implement namespace packages. This PEP defines a new type of package, the "namespace package". Namespace packages today ======================== Python currently provides ``pkgutil.extend_path`` to denote a package as a namespace package. The recommended way of using it is to put:: from pkgutil import extend_path __path__ = extend_path(__path__, __name__) in the package's ``__init__.py``. Every distribution needs to provide the same contents in its ``__init__.py``, so that ``extend_path`` is invoked independent of which portion of the package gets imported first. As a consequence, the package's ``__init__.py`` cannot practically define any names as it depends on the order of the package fragments on ``sys.path`` to determine which portion is imported first. As a special feature, ``extend_path`` reads files named ``.pkg`` which allows declaration of additional portions. setuptools provides a similar function named ``pkg_resources.declare_namespace`` that is used in the form:: import pkg_resources pkg_resources.declare_namespace(__name__) In the portion's ``__init__.py``, no assignment to ``__path__`` is necessary, as ``declare_namespace`` modifies the package ``__path__`` through ``sys.modules``. As a special feature, ``declare_namespace`` also supports zip files, and registers the package name internally so that future additions to ``sys.path`` by setuptools can properly add additional portions to each package. setuptools allows declaring namespace packages in a distribution's ``setup.py``, so that distribution developers don't need to put the magic ``__path__`` modification into ``__init__.py`` themselves. See :pep:`402`'s :pep:`"The Problem" <402#the-problem>` section for additional motivations for namespace packages. Note that :pep:`402` has been rejected, but the motivating use cases are still valid. Rationale ========= The current imperative approach to namespace packages has led to multiple slightly-incompatible mechanisms for providing namespace packages. For example, pkgutil supports ``*.pkg`` files; setuptools doesn't. Likewise, setuptools supports inspecting zip files, and supports adding portions to its ``_namespace_packages`` variable, whereas pkgutil doesn't. Namespace packages are designed to support being split across multiple directories (and hence found via multiple ``sys.path`` entries). In this configuration, it doesn't matter if multiple portions all provide an ``__init__.py`` file, so long as each portion correctly initializes the namespace package. However, Linux distribution vendors (amongst others) prefer to combine the separate portions and install them all into the *same* file system directory. This creates a potential for conflict, as the portions are now attempting to provide the *same* file on the target system - something that is not allowed by many package managers. Allowing implicit namespace packages means that the requirement to provide an ``__init__.py`` file can be dropped completely, and affected portions can be installed into a common directory or split across multiple directories as distributions see fit. A namespace package will not be constrained by a fixed ``__path__``, computed from the parent path at namespace package creation time. Consider the standard library ``encodings`` package: 1. Suppose that ``encodings`` becomes a namespace package. 2. It sometimes gets imported during interpreter startup to initialize the standard io streams. 3. An application modifies ``sys.path`` after startup and wants to contribute additional encodings from new path entries. 4. An attempt is made to import an encoding from an ``encodings`` portion that is found on a path entry added in step 3. If the import system was restricted to only finding portions along the value of ``sys.path`` that existed at the time the ``encodings`` namespace package was created, the additional paths added in step 3 would never be searched for the additional portions imported in step 4. In addition, if step 2 were sometimes skipped (due to some runtime flag or other condition), then the path items added in step 3 would indeed be used the first time a portion was imported. Thus this PEP requires that the list of path entries be dynamically computed when each portion is loaded. It is expected that the import machinery will do this efficiently by caching ``__path__`` values and only refreshing them when it detects that the parent path has changed. In the case of a top-level package like ``encodings``, this parent path would be ``sys.path``. Specification ============= Regular packages will continue to have an ``__init__.py`` and will reside in a single directory. Namespace packages cannot contain an ``__init__.py``. As a consequence, ``pkgutil.extend_path`` and ``pkg_resources.declare_namespace`` become obsolete for purposes of namespace package creation. There will be no marker file or directory for specifying a namespace package. During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path: * If ``/foo/__init__.py`` is found, a regular package is imported and returned. * If not, but ``/foo.{py,pyc,so,pyd}`` is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative. * If not, but ``/foo`` is found and is a directory, it is recorded and the scan continues with the next directory in the parent path. * Otherwise the scan continues with the next directory in the parent path. If the scan completes without returning a module or package, and at least one directory was recorded, then a namespace package is created. The new namespace package: * Has a ``__path__`` attribute set to an iterable of the path strings that were found and recorded during the scan. * Does not have a ``__file__`` attribute. Note that if "import foo" is executed and "foo" is found as a namespace package (using the above rules), then "foo" is immediately created as a package. The creation of the namespace package is not deferred until a sub-level import occurs. A namespace package is not fundamentally different from a regular package. It is just a different way of creating packages. Once a namespace package is created, there is no functional difference between it and a regular package. Dynamic path computation ------------------------ The import machinery will behave as if a namespace package's ``__path__`` is recomputed before each portion is loaded. For performance reasons, it is expected that this will be achieved by detecting that the parent path has changed. If no change has taken place, then no ``__path__`` recomputation is required. The implementation must ensure that changes to the contents of the parent path are detected, as well as detecting the replacement of the parent path with a new path entry list object. Impact on import finders and loaders ------------------------------------ :pep:`302` defines "finders" that are called to search path elements. These finders' ``find_module`` methods return either a "loader" object or ``None``. For a finder to contribute to namespace packages, it must implement a new ``find_loader(fullname)`` method. ``fullname`` has the same meaning as for ``find_module``. ``find_loader`` always returns a 2-tuple of ``(loader, )``. ``loader`` may be ``None``, in which case ```` (which may be empty) is added to the list of recorded path entries and path searching continues. If ``loader`` is not ``None``, it is immediately used to load a module or regular package. Even if ``loader`` is returned and is not ``None``, ```` must still contain the path entries for the package. This allows code such as ``pkgutil.extend_path()`` to compute path entries for packages that it does not load. Note that multiple path entries per finder are allowed. This is to support the case where a finder discovers multiple namespace portions for a given ``fullname``. Many finders will support only a single namespace package portion per ``find_loader`` call, in which case this iterable will contain only a single string. The import machinery will call ``find_loader`` if it exists, else fall back to ``find_module``. Legacy finders which implement ``find_module`` but not ``find_loader`` will be unable to contribute portions to a namespace package. The specification expands :pep:`302` loaders to include an optional method called ``module_repr()`` which if present, is used to generate module object reprs. See the section below for further details. Differences between namespace packages and regular packages ----------------------------------------------------------- Namespace packages and regular packages are very similar. The differences are: * Portions of namespace packages need not all come from the same directory structure, or even from the same loader. Regular packages are self-contained: all parts live in the same directory hierarchy. * Namespace packages have no ``__file__`` attribute. * Namespace packages' ``__path__`` attribute is a read-only iterable of strings, which is automatically updated when the parent path is modified. * Namespace packages have no ``__init__.py`` module. * Namespace packages have a different type of object for their ``__loader__`` attribute. Namespace packages in the standard library ------------------------------------------ It is possible, and this PEP explicitly allows, that parts of the standard library be implemented as namespace packages. When and if any standard library packages become namespace packages is outside the scope of this PEP. Migrating from legacy namespace packages ---------------------------------------- As described above, prior to this PEP ``pkgutil.extend_path()`` was used by legacy portions to create namespace packages. Because it is likely not practical for all existing portions of a namespace package to be migrated to this PEP at once, ``extend_path()`` will be modified to also recognize :pep:`420` namespace packages. This will allow some portions of a namespace to be legacy portions while others are migrated to :pep:`420`. These hybrid namespace packages will not have the dynamic path computation that normal namespace packages have, since ``extend_path()`` never provided this functionality in the past. Packaging Implications ====================== Multiple portions of a namespace package can be installed into the same directory, or into separate directories. For this section, suppose there are two portions which define "foo.bar" and "foo.baz". "foo" itself is a namespace package. If these are installed in the same location, a single directory "foo" would be in a directory that is on ``sys.path``. Inside "foo" would be two directories, "bar" and "baz". If "foo.bar" is removed (perhaps by an OS package manager), care must be taken not to remove the "foo/baz" or "foo" directories. Note that in this case "foo" will be a namespace package (because it lacks an ``__init__.py``), even though all of its portions are in the same directory. Note that "foo.bar" and "foo.baz" can be installed into the same "foo" directory because they will not have any files in common. If the portions are installed in different locations, two different "foo" directories would be in directories that are on ``sys.path``. "foo/bar" would be in one of these sys.path entries, and "foo/baz" would be in the other. Upon removal of "foo.bar", the "foo/bar" and corresponding "foo" directories can be completely removed. But "foo/baz" and its corresponding "foo" directory cannot be removed. It is also possible to have the "foo.bar" portion installed in a directory on ``sys.path``, and have the "foo.baz" portion provided in a zip file, also on ``sys.path``. Examples ======== Nested namespace packages ------------------------- This example uses the following directory structure:: Lib/test/namespace_pkgs project1 parent child one.py project2 parent child two.py Here, both parent and child are namespace packages: Portions of them exist in different directories, and they do not have ``__init__.py`` files. Here we add the parent directories to ``sys.path``, and show that the portions are correctly found:: >>> import sys >>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2'] >>> import parent.child.one >>> parent.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent']) >>> parent.child.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child']) >>> import parent.child.two >>> Dynamic path computation ------------------------ This example uses a similar directory structure, but adds a third portion:: Lib/test/namespace_pkgs project1 parent child one.py project2 parent child two.py project3 parent child three.py We add ``project1`` and ``project2`` to ``sys.path``, then import ``parent.child.one`` and ``parent.child.two``. Then we add the ``project3`` to ``sys.path`` and when ``parent.child.three`` is imported, ``project3/parent`` is automatically added to ``parent.__path__``:: # add the first two parent paths to sys.path >>> import sys >>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2'] # parent.child.one can be imported, because project1 was added to sys.path: >>> import parent.child.one >>> parent.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent']) # parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child: >>> parent.child.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child']) # parent.child.two can be imported, because project2 was added to sys.path: >>> import parent.child.two # we cannot import parent.child.three, because project3 is not in the path: >>> import parent.child.three Traceback (most recent call last): File "", line 1, in File "", line 1286, in _find_and_load File "", line 1250, in _find_and_load_unlocked ImportError: No module named 'parent.child.three' # now add project3 to sys.path: >>> sys.path.append('Lib/test/namespace_pkgs/project3') # and now parent.child.three can be imported: >>> import parent.child.three # project3/parent has been added to parent.__path__: >>> parent.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent']) # and project3/parent/child has been added to parent.child.__path__ >>> parent.child.__path__ _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child']) >>> Discussion ========== At PyCon 2012, we had a discussion about namespace packages at which :pep:`382` and :pep:`402` were rejected, to be replaced by this PEP [3]_. There is no intention to remove support of regular packages. If a developer knows that her package will never be a portion of a namespace package, then there is a performance advantage to it being a regular package (with an ``__init__.py``). Creation and loading of a regular package can take place immediately when it is located along the path. With namespace packages, all entries in the path must be scanned before the package is created. Note that an ImportWarning will no longer be raised for a directory lacking an ``__init__.py`` file. Such a directory will now be imported as a namespace package, whereas in prior Python versions an ImportWarning would be raised. Nick Coghlan presented a list of his objections to this proposal [4]_. They are: 1. Implicit package directories go against the Zen of Python. 2. Implicit package directories pose awkward backwards compatibility challenges. 3. Implicit package directories introduce ambiguity into file system layouts. 4. Implicit package directories will permanently entrench current newbie-hostile behavior in ``__main__``. Nick later gave a detailed response to his own objections [5]_, which is summarized here: 1. The practicality of this PEP wins over other proposals and the status quo. 2. Minor backward compatibility issues are okay, as long as they are properly documented. 3. This will be addressed in :pep:`395`. 4. This will also be addressed in :pep:`395`. The inclusion of namespace packages in the standard library was motivated by Martin v. Löwis, who wanted the ``encodings`` package to become a namespace package [6]_. While this PEP allows for standard library packages to become namespaces, it defers a decision on ``encodings``. ``find_module`` versus ``find_loader`` -------------------------------------- An early draft of this PEP specified a change to the ``find_module`` method in order to support namespace packages. It would be modified to return a string in the case where a namespace package portion was discovered. However, this caused a problem with existing code outside of the standard library which calls ``find_module``. Because this code would not be upgraded in concert with changes required by this PEP, it would fail when it would receive unexpected return values from ``find_module``. Because of this incompatibility, this PEP now specifies that finders that want to provide namespace portions must implement the ``find_loader`` method, described above. The use case for supporting multiple portions per ``find_loader`` call is given in [7]_. Dynamic path computation ------------------------ Guido raised a concern that automatic dynamic path computation was an unnecessary feature [8]_. Later in that thread, PJ Eby and Nick Coghlan presented arguments as to why dynamic computation would minimize surprise to Python users. The conclusion of that discussion has been included in this PEP's Rationale section. An earlier version of this PEP required that dynamic path computation could only take affect if the parent path object were modified in-place. That is, this would work:: sys.path.append('new-dir') But this would not:: sys.path = sys.path + ['new-dir'] In the same thread [8]_, it was pointed out that this restriction is not required. If the parent path is looked up by name instead of by holding a reference to it, then there is no restriction on how the parent path is modified or replaced. For a top-level namespace package, the lookup would be the module named ``"sys"`` then its attribute ``"path"``. For a namespace package nested inside a package ``foo``, the lookup would be for the module named ``"foo"`` then its attribute ``"__path__"``. Module reprs ============ Previously, module reprs were hard coded based on assumptions about a module's ``__file__`` attribute. If this attribute existed and was a string, it was assumed to be a file system path, and the module object's repr would include this in its value. The only exception was that :pep:`302` reserved missing ``__file__`` attributes to built-in modules, and in CPython, this assumption was baked into the module object's implementation. Because of this restriction, some modules contained contrived ``__file__`` values that did not reflect file system paths, and which could cause unexpected problems later (e.g. ``os.path.join()`` on a non-path ``__file__`` would return gibberish). This PEP relaxes this constraint, and leaves the setting of ``__file__`` to the purview of the loader producing the module. Loaders may opt to leave ``__file__`` unset if no file system path is appropriate. Loaders may also set additional reserved attributes on the module if useful. This means that the definitive way to determine the origin of a module is to check its ``__loader__`` attribute. For example, namespace packages as described in this PEP will have no ``__file__`` attribute because no corresponding file exists. In order to provide flexibility and descriptiveness in the reprs of such modules, a new optional protocol is added to :pep:`302` loaders. Loaders can implement a ``module_repr()`` method which takes a single argument, the module object. This method should return the string to be used verbatim as the repr of the module. The rules for producing a module repr are now standardized as: * If the module has an ``__loader__`` and that loader has a ``module_repr()`` method, call it with a single argument, which is the module object. The value returned is used as the module's repr. * If an exception occurs in ``module_repr()``, the exception is caught and discarded, and the calculation of the module's repr continues as if ``module_repr()`` did not exist. * If the module has an ``__file__`` attribute, this is used as part of the module's repr. * If the module has no ``__file__`` but does have an ``__loader__``, then the loader's repr is used as part of the module's repr. * Otherwise, just use the module's ``__name__`` in the repr. Here is a snippet showing how namespace module reprs are calculated from its loader:: class NamespaceLoader: @classmethod def module_repr(cls, module): return "".format(module.__name__) Built-in module reprs would no longer need to be hard-coded, but instead would come from their loader as well:: class BuiltinImporter: @classmethod def module_repr(cls, module): return "".format(module.__name__) Here are some example reprs of different types of modules with different sets of the related attributes:: >>> import email >>> email >>> m = type(email)('foo') >>> m >>> m.__file__ = 'zippy:/de/do/dah' >>> m >>> class Loader: pass ... >>> m.__loader__ = Loader >>> del m.__file__ >>> m )> >>> class NewLoader: ... @classmethod ... def module_repr(cls, module): ... return '' ... >>> m.__loader__ = NewLoader >>> m >>> References ========== .. [1] PEP 420 branch (http://hg.python.org/features/pep-420) .. [3] PyCon 2012 Namespace Package discussion outcome (https://mail.python.org/pipermail/import-sig/2012-March/000421.html) .. [4] Nick Coghlan's objection to the lack of marker files or directories (https://mail.python.org/pipermail/import-sig/2012-March/000423.html) .. [5] Nick Coghlan's response to his initial objections (https://mail.python.org/pipermail/import-sig/2012-April/000464.html) .. [6] Martin v. Löwis's suggestion to make ``encodings`` a namespace package (https://mail.python.org/pipermail/import-sig/2012-May/000540.html) .. [7] Use case for multiple portions per ``find_loader`` call (https://mail.python.org/pipermail/import-sig/2012-May/000585.html) .. [8] Discussion about dynamic path computation (https://mail.python.org/pipermail/python-dev/2012-May/119560.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: