PEP: 395 Title: Qualified Names for Modules Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 4-Mar-2011 Python-Version: 3.3 Post-History: 5-Mar-2011, 19-Nov-2011 Abstract ======== This PEP proposes new mechanisms that eliminate some longstanding traps for the unwary when dealing with Python's import system, as well as serialisation and introspection of functions and classes. It builds on the "Qualified Name" concept defined in PEP 3155. Relationship with Other PEPs ---------------------------- This PEP builds on the "qualified name" concept introduced by PEP 3155, and also shares in that PEP's aim of fixing some ugly corner cases when dealing with serialisation of arbitrary functions and classes. It also builds on PEP 366, which took initial tentative steps towards making explicit relative imports from the main module work correctly in at least *some* circumstances. This PEP is also affected by the two competing "namespace package" PEPs (PEP 382 and PEP 402). This PEP would require some minor adjustments to accommodate PEP 382, but has some critical incompatibilities with respect to the implicit namespace package mechanism proposed in PEP 402. Finally, PEP 328 eliminated implicit relative imports from imported modules. This PEP proposes that the de facto implicit relative imports from main modules that are provided by the current initialisation behaviour for ``sys.path[0]`` also be eliminated. What's in a ``__name__``? ========================= Over time, a module's ``__name__`` attribute has come to be used to handle a number of different tasks. The key use cases identified for this module attribute are: 1. Flagging the main module in a program, using the ``if __name__ == "__main__":`` convention. 2. As the starting point for relative imports 3. To identify the location of function and class definitions within the running application 4. To identify the location of classes for serialisation into pickle objects which may be shared with other interpreter instances Traps for the Unwary ==================== The overloading of the semantics of ``__name__``, along with some historically associated behaviour in the initialisation of ``sys.path[0]``, has resulted in several traps for the unwary. These traps can be quite annoying in practice, as they are highly unobvious (especially to beginners) and can cause quite confusing behaviour. Why are my imports broken? -------------------------- There's a general principle that applies when modifying ``sys.path``: *never* put a package directory directly on ``sys.path``. The reason this is problematic is that every module in that directory is now potentially accessible under two different names: as a top level module (since the package directory is on ``sys.path``) and as a submodule of the package (if the higher level directory containing the package itself is also on ``sys.path``). As an example, Django (up to and including version 1.3) is guilty of setting up exactly this situation for site-specific applications - the application ends up being accessible as both ``app`` and ``site.app`` in the module namespace, and these are actually two *different* copies of the module. This is a recipe for confusion if there is any meaningful mutable module level state, so this behaviour is being eliminated from the default site set up in version 1.4 (site-specific apps will always be fully qualified with the site name). However, it's hard to blame Django for this, when the same part of Python responsible for setting ``__name__ = "__main__"`` in the main module commits the exact same error when determining the value for ``sys.path[0]``. The impact of this can be seen relatively frequently if you follow the "python" and "import" tags on Stack Overflow. When I had the time to follow it myself, I regularly encountered people struggling to understand the behaviour of straightforward package layouts like the following (I actually use package layouts along these lines in my own projects):: project/ setup.py example/ __init__.py foo.py tests/ __init__.py test_foo.py While I would often see it without the ``__init__.py`` files first, that's a trivial fix to explain. What's hard to explain is that all of the following ways to invoke ``test_foo.py`` *probably won't work* due to broken imports (either failing to find ``example`` for absolute imports, complaining about relative imports in a non-package or beyond the toplevel package for explicit relative imports, or issuing even more obscure errors if some other submodule happens to shadow the name of a top-level module, such as an ``example.json`` module that handled serialisation or an ``example.tests.unittest`` test runner):: # These commands will most likely *FAIL*, even if the code is correct # working directory: project/example/tests ./test_foo.py python test_foo.py python -m package.tests.test_foo python -c "from package.tests.test_foo import main; main()" # working directory: project/package tests/test_foo.py python tests/test_foo.py python -m package.tests.test_foo python -c "from package.tests.test_foo import main; main()" # working directory: project example/tests/test_foo.py python example/tests/test_foo.py # working directory: project/.. project/example/tests/test_foo.py python project/example/tests/test_foo.py # The -m and -c approaches don't work from here either, but the failure # to find 'package' correctly is easier to explain in this case That's right, that long list is of all the methods of invocation that will almost certainly *break* if you try them, and the error messages won't make any sense if you're not already intimately familiar not only with the way Python's import system works, but also with how it gets initialised. For a long time, the only way to get ``sys.path`` right with that kind of setup was to either set it manually in ``test_foo.py`` itself (hardly something a novice, or even many veteran, Python programmers are going to know how to do) or else to make sure to import the module instead of executing it directly:: # working directory: project python -c "from package.tests.test_foo import main; main()" Since the implementation of PEP 366 (which defined a mechanism that allows relative imports to work correctly when a module inside a package is executed via the ``-m`` switch), the following also works properly:: # working directory: project python -m package.tests.test_foo The fact that most methods of invoking Python code from the command line break when that code is inside a package, and the two that do work are highly sensitive to the current working directory is all thoroughly confusing for a beginner. I personally believe it is one of the key factors leading to the perception that Python packages are complicated and hard to get right. This problem isn't even limited to the command line - if ``test_foo.py`` is open in Idle and you attempt to run it by pressing F5, or if you try to run it by clicking on it in a graphical filebrowser, then it will fail in just the same way it would if run directly from the command line. There's a reason the general "no package directories on ``sys.path``" guideline exists, and the fact that the interpreter itself doesn't follow it when determining ``sys.path[0]`` is the root cause of all sorts of grief. In the past, this couldn't be fixed due to backwards compatibility concerns. However, scripts potentially affected by this problem will *already* require fixes when porting to the Python 3.x (due to the elimination of implicit relative imports when importing modules normally). This provides a convenient opportunity to implement a corresponding change in the initialisation semantics for ``sys.path[0]``. Importing the main module twice ------------------------------- Another venerable trap is the issue of importing ``__main__`` twice. This occurs when the main module is also imported under its real name, effectively creating two instances of the same module under different names. If the state stored in ``__main__`` is significant to the correct operation of the program, or if there is top-level code in the main module that has non-idempotent side effects, then this duplication can cause obscure and surprising errors. In a bit of a pickle -------------------- Something many users may not realise is that the ``pickle`` module sometimes relies on the ``__module__`` attribute when serialising instances of arbitrary classes. So instances of classes defined in ``__main__`` are pickled that way, and won't be unpickled correctly by another python instance that only imported that module instead of running it directly. This behaviour is the underlying reason for the advice from many Python veterans to do as little as possible in the ``__main__`` module in any application that involves any form of object serialisation and persistence. Similarly, when creating a pseudo-module (see next paragraph), pickles rely on the name of the module where a class is actually defined, rather than the officially documented location for that class in the module hierarchy. For the purposes of this PEP, a "pseudo-module" is a package designed like the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These packages are documented as if they were single modules, but are in fact internally implemented as a package. This is *supposed* to be an implementation detail that users and other implementations don't need to worry about, but, thanks to ``pickle`` (and serialisation in general), the details are often exposed and can effectively become part of the public API. While this PEP focuses specifically on ``pickle`` as the principal serialisation scheme in the standard library, this issue may also affect other mechanisms that support serialisation of arbitrary class instances and rely on ``__module__`` attributes to determine how to handle deserialisation. Where's the source? ------------------- Some sophisticated users of the pseudo-module technique described above recognise the problem with implementation details leaking out via the ``pickle`` module, and choose to address it by altering ``__name__`` to refer to the public location for the module before defining any functions or classes (or else by modifying the ``__module__`` attributes of those objects after they have been defined). This approach is effective at eliminating the leakage of information via pickling, but comes at the cost of breaking introspection for functions and classes (as their ``__module__`` attribute now points to the wrong place). Forkless Windows ---------------- To get around the lack of ``os.fork`` on Windows, the ``multiprocessing`` module attempts to re-execute Python with the same main module, but skipping over any code guarded by ``if __name__ == "__main__":`` checks. It does the best it can with the information it has, but is forced to make assumptions that simply aren't valid whenever the main module isn't an ordinary directly executed script or top-level module. Packages and non-top-level modules executed via the ``-m`` switch, as well as directly executed zipfiles or directories, are likely to make multiprocessing on Windows do the wrong thing (either quietly or noisily, depending on application details) when spawning a new process. While this issue currently only affects Windows directly, it also impacts any proposals to provide Windows-style "clean process" invocation via the multiprocessing module on other platforms. Qualified Names for Modules =========================== To make it feasible to fix these problems once and for all, it is proposed to add a new module level attribute: ``__qualname__``. This abbreviation of "qualified name" is taken from PEP 3155, where it is used to store the naming path to a nested class or function definition relative to the top level module. For modules, ``__qualname__`` will normally be the same as ``__name__``, just as it is for top-level functions and classes in PEP 3155. However, it will differ in some situations so that the above problems can be addressed. Specifically, whenever ``__name__`` is modified for some other purpose (such as to denote the main module), then ``__qualname__`` will remain unchanged, allowing code that needs it to access the original unmodified value. If a module loader does not initialise ``__qualname__`` itself, then the import system will add it automatically (setting it to the same value as ``__name__``). Alternative Names ----------------- Two alternative names were also considered for the new attribute: "full name" (``__fullname__``) and "implementation name" (``__implname__``). Either of those would actually be valid for the use case in this PEP. However, as a meta-issue, PEP 3155 is *also* adding a new attribute (for functions and classes) that is "like ``__name__``, but different in some cases where ``__name__`` is missing necessary information" and those terms aren't accurate for the PEP 3155 function and class use case. PEP 3155 deliberately omits the module information, so the term "full name" is simply untrue, and "implementation name" implies that it may specify an object other than that specified by ``__name__``, and that is never the case for PEP 3155 (in that PEP, ``__name__`` and ``__qualname__`` always refer to the same function or class, it's just that ``__name__`` is insufficient to accurately identify nested functions and classes). Since it seems needlessly inconsistent to add *two* new terms for attributes that only exist because backwards compatibility concerns keep us from changing the behaviour of ``__name__`` itself, this PEP instead chose to adopt the PEP 3155 terminology. If the relative inscrutability of "qualified name" and ``__qualname__`` encourages interested developers to look them up at least once rather than assuming they know what they mean just from the name and guessing wrong, that's not necessarily a bad outcome. Besides, 99% of Python developers should never need to even care these extra attributes exist - they're really an implementation detail to let us fix a few problematic behaviours exhibited by imports, pickling and introspection, not something people are going to be dealing with on a regular basis. Eliminating the Traps ===================== The following changes are interrelated and make the most sense when considered together. They collectively either completely eliminate the traps for the unwary noted above, or else provide straightforward mechanisms for dealing with them. A rough draft of some of the concepts presented here was first posted on the python-ideas list ([1]_), but they have evolved considerably since first being discussed in that thread. Further discussion has subsequently taken place on the import-sig mailing list ([2]_. [3]_). Fixing main module imports inside packages ------------------------------------------ To eliminate this trap, it is proposed that an additional filesystem check be performed when determining a suitable value for ``sys.path[0]``. This check will look for Python's explicit package directory markers and use them to find the appropriate directory to add to ``sys.path``. The current algorithm for setting ``sys.path[0]`` in relevant cases is roughly as follows:: # Interactive prompt, -m switch, -c switch sys.path.insert(0, '') :: # Valid sys.path entry execution (i.e. directory and zip execution) sys.path.insert(0, sys.argv[0]) :: # Direct script execution sys.path.insert(0, os.path.dirname(sys.argv[0])) It is proposed that this initialisation process be modified to take package details stored on the filesystem into account:: # Interactive prompt, -m switch, -c switch in_package, path_entry, _ignored = split_path_module(os.getcwd(), '') if in_package: sys.path.insert(0, path_entry) else: sys.path.insert(0, '') # Start interactive prompt or run -c command as usual # __main__.__qualname__ is set to "__main__" # The -m switches uses the same sys.path[0] calculation, but: # modname is the argument to the -m switch # modname is passed to ``runpy._run_module_as_main()`` as usual # __main__.__qualname__ is set to modname :: # Valid sys.path entry execution (i.e. directory and zip execution) modname = "__main__" path_entry, modname = split_path_module(sys.argv[0], modname) sys.path.insert(0, path_entry) # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()`` # __main__.__qualname__ is set to modname :: # Direct script execution in_package, path_entry, modname = split_path_module(sys.argv[0]) sys.path.insert(0, path_entry) if in_package: # Pass modname to ``runpy._run_module_as_main()`` else: # Run script directly # __main__.__qualname__ is set to modname The ``split_path_module()`` supporting function used in the above pseudo-code would have the following semantics:: def _splitmodname(fspath): path_entry, fname = os.path.split(fspath) modname = os.path.splitext(fname)[0] return path_entry, modname def _is_package_dir(fspath): return any(os.exists("__init__" + info[0]) for info in imp.get_suffixes()) def split_path_module(fspath, modname=None): """Given a filesystem path and a relative module name, determine an appropriate sys.path entry and a fully qualified module name. Returns a 3-tuple of (package_depth, fspath, modname). A reported package depth of 0 indicates that this would be a top level import. If no relative module name is given, it is derived from the final component in the supplied path with the extension stripped. """ if modname is None: fspath, modname = _splitmodname(fspath) package_depth = 0 while _is_package_dir(fspath): fspath, pkg = _splitmodname(fspath) modname = pkg + '.' + modname return package_depth, fspath, modname This PEP also proposes that the ``split_path_module()`` functionality be exposed directly to Python users via the ``runpy`` module. With this fix in place, and the same simple package layout described earlier, *all* of the following commands would invoke the test suite correctly:: # working directory: project/example/tests ./test_foo.py python test_foo.py python -m package.tests.test_foo python -c "from .test_foo import main; main()" python -c "from ..tests.test_foo import main; main()" python -c "from package.tests.test_foo import main; main()" # working directory: project/package tests/test_foo.py python tests/test_foo.py python -m package.tests.test_foo python -c "from .tests.test_foo import main; main()" python -c "from package.tests.test_foo import main; main()" # working directory: project example/tests/test_foo.py python example/tests/test_foo.py python -m package.tests.test_foo python -c "from package.tests.test_foo import main; main()" # working directory: project/.. project/example/tests/test_foo.py python project/example/tests/test_foo.py # The -m and -c approaches still don't work from here, but the failure # to find 'package' correctly is pretty easy to explain in this case With these changes, clicking Python modules in a graphical file browser should always execute them correctly, even if they live inside a package. Depending on the details of how it invokes the script, Idle would likely also be able to run ``test_foo.py`` correctly with F5, without needing any Idle specific fixes. Optional addition: command line relative imports ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With the above changes in place, it would be a fairly minor addition to allow explicit relative imports as arguments to the ``-m`` switch:: # working directory: project/example/tests python -m .test_foo python -m ..tests.test_foo # working directory: project/example/ python -m .tests.test_foo With this addition, system initialisation for the ``-m`` switch would change as follows:: # -m switch (permitting explicit relative imports) in_package, path_entry, pkg_name = split_path_module(os.getcwd(), '') qualname= <> if qualname.startswith('.'): modname = qualname while modname.startswith('.'): modname = modname[1:] pkg_name, sep, _ignored = pkg_name.rpartition('.') if not sep: raise ImportError("Attempted relative import beyond top level package") qualname = pkg_name + '.' modname if in_package: sys.path.insert(0, path_entry) else: sys.path.insert(0, '') # qualname is passed to ``runpy._run_module_as_main()`` # _main__.__qualname__ is set to qualname Compatibility with PEP 382 ~~~~~~~~~~~~~~~~~~~~~~~~~~ Making this proposal compatible with the PEP 382 namespace packaging PEP is trivial. The semantics of ``_is_package_dir()`` are merely changed to be:: def _is_package_dir(fspath): return (fspath.endswith(".pyp") or any(os.exists("__init__" + info[0]) for info in imp.get_suffixes())) Incompatibility with PEP 402 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PEP 402 proposes the elimination of explicit markers in the file system for Python packages. This fundamentally breaks the proposed concept of being able to take a filesystem path and a Python module name and work out an unambiguous mapping to the Python module namespace. Instead, the appropriate mapping would depend on the current values in ``sys.path``, rendering it impossible to ever fix the problems described above with the calculation of ``sys.path[0]`` when the interpreter is initialised. While some aspects of this PEP could probably be salvaged if PEP 402 were adopted, the core concept of making import semantics from main and other modules more consistent would no longer be feasible. This incompatibility is discussed in more detail in the relevant import-sig threads ([2]_, [3]_). Potential incompatibilities with scripts stored in packages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The proposed change to ``sys.path[0]`` initialisation *may* break some existing code. Specifically, it will break scripts stored in package directories that rely on the implicit relative imports from ``__main__`` in order to run correctly under Python 3. While such scripts could be imported in Python 2 (due to implicit relative imports) it is already the case that they cannot be imported in Python 3, as implicit relative imports are no longer permitted when a module is imported. By disallowing implicit relatives imports from the main module as well, such modules won't even work as scripts with this PEP. Switching them over to explicit relative imports will then get them working again as both executable scripts *and* as importable modules. To support earlier versions of Python, a script could be written to use different forms of import based on the Python version:: if __name__ == "__main__" and sys.version_info < (3, 3): import peer # Implicit relative import else: from . import peer # explicit relative import Fixing dual imports of the main module -------------------------------------- Given the above proposal to get ``__qualname__`` consistently set correctly in the main module, one simple change is proposed to eliminate the problem of dual imports of the main module: the addition of a ``sys.metapath`` hook that detects attempts to import ``__main__`` under its real name and returns the original main module instead:: class AliasImporter: def __init__(self, module, alias): self.module = module self.alias = alias def __repr__(self): fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})" return fmt.format(self) def find_module(self, fullname, path=None): if path is None and fullname == self.alias: return self return None def load_module(self, fullname): if fullname != self.alias: raise ImportError("{!r} cannot load {!r}".format(self, fullname)) return self.main_module This metapath hook would be added automatically during import system initialisation based on the following logic:: main = sys.modules["__main__"] if main.__name__ != main.__qualname__: sys.metapath.append(AliasImporter(main, main.__qualname__)) This is probably the least important proposal in the PEP - it just closes off the last mechanism that is likely to lead to module duplication after the configuration of ``sys.path[0]`` at interpreter startup is addressed. Fixing pickling without breaking introspection ---------------------------------------------- To fix this problem, it is proposed to make use of the new module level ``__qualname__`` attributes to determine the real module location when ``__name__`` has been modified for any reason. In the main module, ``__qualname__`` will automatically be set to the main module's "real" name (as described above) by the interpreter. Pseudo-modules that adjust ``__name__`` to point to the public namespace will leave ``__qualname__`` untouched, so the implementation location remains readily accessible for introspection. If ``__name__`` is adjusted at the top of a module, then this will automatically adjust the ``__module__`` attribute for all functions and classes subsequently defined in that module. Since multiple submodules may be set to use the same "public" namespace, functions and classes will be given a new ``__qualmodule__`` attribute that refers to the ``__qualname__`` of their module. This isn't strictly necessary for functions (you could find out their module's qualified name by looking in their globals dictionary), but it is needed for classes, since they don't hold a reference to the globals of their defining module. Once a new attribute is added to classes, it is more convenient to keep the API consistent and add a new attribute to functions as well. These changes mean that adjusting ``__name__`` (and, either directly or indirectly, the corresponding function and class ``__module__`` attributes) becomes the officially sanctioned way to implement a namespace as a package, while exposing the API as if it were still a single module. All serialisation code that currently uses ``__name__`` and ``__module__`` attributes will then avoid exposing implementation details by default. To correctly handle serialisation of items from the main module, the class and function definition logic will be updated to also use ``__qualname__`` for the ``__module__`` attribute in the case where ``__name__ == "__main__"``. With ``__name__`` and ``__module__`` being officially blessed as being used for the *public* names of things, the introspection tools in the standard library will be updated to use ``__qualname__`` and ``__qualmodule__`` where appropriate. For example: - ``pydoc`` will report both public and qualified names for modules - ``inspect.getsource()`` (and similar tools) will use the qualified names that point to the implementation of the code - additional ``pydoc`` and/or ``inspect`` APIs may be provided that report all modules with a given public ``__name__``. Fixing multiprocessing on Windows --------------------------------- With ``__qualname__`` now available to tell ``multiprocessing`` the real name of the main module, it will be able to simply include it in the serialised information passed to the child process, eliminating the need for the current dubious introspection of the ``__file__`` attribute. For older Python versions, ``multiprocessing`` could be improved by applying the ``split_path_module()`` algorithm described above when attempting to work out how to execute the main module based on its ``__file__`` attribute. Explicit relative imports ========================= This PEP proposes that ``__package__`` be unconditionally defined in the main module as ``__qualname__.rpartition('.')[0]``. Aside from that, it proposes that the behaviour of explicit relative imports be left alone. In particular, if ``__package__`` is not set in a module when an explicit relative import occurs, the automatically cached value will continue to be derived from ``__name__`` rather than ``__qualname__``. This minimises any backwards incompatibilities with existing code that deliberately manipulates relative imports by adjusting ``__name__`` rather than setting ``__package__`` directly. This PEP does *not* propose that ``__package__`` be deprecated. While it is technically redundant following the introduction of ``__qualname__``, it just isn't worth the hassle of deprecating it within the lifetime of Python 3.x. Reference Implementation ======================== None as yet. References ========== .. [1] Module aliases and/or "real names" (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html) .. [2] PEP 395 (Module aliasing) and the namespace PEPs (http://mail.python.org/pipermail/import-sig/2011-November/000382.html) .. [3] Updated PEP 395 (aka "Implicit Relative Imports Must Die!") (http://mail.python.org/pipermail/import-sig/2011-November/000397.html) .. [4] Elaboration of compatibility problems between this PEP and PEP 402 (http://mail.python.org/pipermail/import-sig/2011-November/000403.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: