diff --git a/pep-0395.txt b/pep-0395.txt
index decab3e60..5709e0a65 100644
--- a/pep-0395.txt
+++ b/pep-0395.txt
@@ -1,5 +1,5 @@
 PEP: 395
-Title: Module Aliasing
+Title: Qualified Names for Modules
 Version: $Revision$
 Last-Modified: $Date$
 Author: Nick Coghlan
@@ -8,19 +8,36 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 4-Mar-2011
 Python-Version: 3.3
-Post-History: 5-Mar-2011
+Post-History: 5-Mar-2011, 19-Nov-2011


 Abstract
 ========

 This PEP proposes new mechanisms that eliminate some longstanding traps for
-the unwary when dealing with Python's import system, the pickle module and
-introspection interfaces.
+the unwary when dealing with Python's import system, as well as serialisation
+and introspection of functions and classes.

 It builds on the "Qualified Name" concept defined in PEP 3155.


+Relationship with Other PEPs
+----------------------------
+
+This PEP builds on the "qualified name" concept introduced by PEP 3155, and
+also shares in that PEP's aim of fixing some ugly corner cases when dealing
+with serialisation of arbitrary functions and classes.
+
+It is also affected by the two competing "namespace package" PEPs (PEP 382
+and PEP 402). This PEP would require some minor adjustments to accommodate
+PEP 382, but has some critical incompatibilities with respect to the namespace
+package mechanism proposed in PEP 402.
+
+Finally, PEP 328 eliminated implicit relative imports from imported modules.
+This PEP proposes that implicit relative imports from main modules also be
+eliminated.
+
+
 What's in a ``__name__``?
 =========================

@@ -48,35 +65,122 @@ the time, you won't even notice them, which just makes them all the more
 surprising when they do come up.


+Why are my imports broken?
+--------------------------
+
+There's a general principle that applies when modifying ``sys.path``: *never*
+put a package directory directly on ``sys.path``. The reason this is
+problematic is that every module in that directory is now potentially
+accessible under two different names: as a top level module (since the
+package directory is on ``sys.path``) and as a submodule of the package (if
+the higher level directory containing the package itself is also on
+``sys.path``).
+
+As an example, Django (up to and including version 1.3) is guilty of setting
+up exactly this situation for site-specific applications - the application
+ends up being accessible as both ``app`` and ``site.app`` in the module
+namespace, and these are actually two *different* copies of the module. This
+is a recipe for confusion if there is any meaningful mutable module level
+state, so this behaviour is being eliminated from the default site set up in
+version 1.4 (site-specific apps will always be fully qualified with the site
+name).
+
+However, it's hard to blame Django for this, when the same part of Python
+responsible for setting ``__name__ = "__main__"`` in the main module commits
+the exact same error when determining the value for ``sys.path[0]``.
+
+The impact of this can be seen relatively frequently if you follow the
+"python" and "import" tags on Stack Overflow. When I had the time to follow
+it myself, I regularly encountered people struggling to understand the
+behaviour of straightforward package layouts like the following::
+
+    project/
+        setup.py
+        package/
+            __init__.py
+            foo.py
+            tests/
+                __init__.py
+                test_foo.py
+
+I would actually often see it without the ``__init__.py`` files first, but
+that's a trivial fix to explain. What's hard to explain is that all of the
+following ways to invoke ``test_foo.py`` *probably won't work* due to broken
+imports (either failing to find ``package`` for absolute imports, complaining
+about relative imports in a non-package for explicit relative imports, or
+issuing even more obscure errors if some other submodule happens to shadow
+the name of a top-level module, such as a ``package.json`` module that
+handled serialisation or a ``package.tests.unittest`` test runner)::
+
+    # working directory: project/package/tests
+    ./test_foo.py
+    python test_foo.py
+    python -m test_foo
+    python -c "from test_foo import main; main()"
+
+    # working directory: project/package
+    tests/test_foo.py
+    python tests/test_foo.py
+    python -m tests.test_foo
+    python -c "from tests.test_foo import main; main()"
+
+    # working directory: project
+    package/tests/test_foo.py
+    python package/tests/test_foo.py
+
+    # working directory: project/..
+    project/package/tests/test_foo.py
+    python project/package/tests/test_foo.py
+    # The -m and -c approaches don't work from here either, but the failure
+    # to find 'package' correctly is pretty easy to explain in this case
+
+That's right, that long list is of all the methods of invocation that will
+almost certainly *break* if you try them, and the error messages won't make
+any sense if you're not already intimately familiar not only with the way
+Python's import system works, but also with how it gets initialised.
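
The ``sys.path[0]`` behaviour behind these failures can be observed with a small sketch. It recreates a cut-down version of the layout above in a temporary directory and runs ``test_foo.py`` directly from the project directory. The package name ``pep395_demo`` and the helper ``run_script_directly()`` are made up for illustration, and the sketch assumes a CPython interpreter that is not started in "safe path" mode (``-P`` / ``PYTHONSAFEPATH``).

```python
import os
import subprocess
import sys
import tempfile

# Body of the hypothetical test_foo.py: report sys.path[0] and whether the
# enclosing package can be imported by its top-level name.
SCRIPT = """\
import sys
print(sys.path[0])
try:
    import pep395_demo
except ImportError:
    print("import failed")
else:
    print("import worked")
"""


def run_script_directly():
    """Run project/pep395_demo/tests/test_foo.py with cwd set to project."""
    with tempfile.TemporaryDirectory() as project:
        package = os.path.join(project, "pep395_demo")
        tests = os.path.join(package, "tests")
        os.makedirs(tests)
        for directory in (package, tests):
            open(os.path.join(directory, "__init__.py"), "w").close()
        script = os.path.join(tests, "test_foo.py")
        with open(script, "w") as f:
            f.write(SCRIPT)
        # Drop settings that would change how sys.path is initialised.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONSAFEPATH", "PYTHONPATH")}
        result = subprocess.run([sys.executable, script], cwd=project, env=env,
                                capture_output=True, text=True, check=True)
        path0, outcome = result.stdout.splitlines()
        same_dir = os.path.realpath(path0) == os.path.realpath(tests)
        return same_dir, outcome


if __name__ == "__main__":
    print(run_script_directly())
```

With direct execution the script's own directory (a package directory) ends up as ``sys.path[0]``, so the top-level package cannot be found even though the working directory is the project root.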
+
+For a long time, the only way to get ``sys.path`` right with that kind of
+setup was to either set it manually in ``test_foo.py`` itself (hardly
+something a novice, or even many veteran, Python programmers are going to
+know how to do) or else to make sure to import the module instead of
+executing it directly::
+
+    # working directory: project
+    python -c "from package.tests.test_foo import main; main()"
+
+Since the implementation of PEP 366 (which defined a mechanism that allows
+relative imports to work correctly when a module inside a package is executed
+via the ``-m`` switch), the following also works properly::
+
+    # working directory: project
+    python -m package.tests.test_foo
+
+The fact that most methods of invoking Python code from the command line
+break when that code is inside a package, and the two that do work are highly
+sensitive to the current working directory is all thoroughly confusing for a
+beginner, and I personally believe it is one of the key factors leading
+to the perception that Python packages are complicated and hard to get right.
+
+This problem isn't even limited to the command line - if ``test_foo.py`` is
+open in Idle and you attempt to run it by pressing F5, then it will fail in
+just the same way it would if run directly from the command line.
+
+There's a reason the general ``sys.path`` guideline mentioned above exists,
+and the fact that the interpreter itself doesn't follow it when determining
+``sys.path[0]`` is the root cause of all sorts of grief.
+
+
 Importing the main module twice
 -------------------------------

-The most venerable of these traps is the issue of (effectively) importing
-``__main__`` twice. This occurs when the main module is also imported under
-its real name, effectively creating two instances of the same module under
+Another venerable trap is the issue of (effectively) importing ``__main__``
+twice. This occurs when the main module is also imported under its real
+name, effectively creating two instances of the same module under
 different names.

-This problem used to be significantly worse due to implicit relative imports
-from the main module, but the switch to allowing only absolute imports and
-explicit relative imports means this issue is now restricted to affecting the
-main module itself.
-
-
-Why are my relative imports broken?
------------------------------------
-
-PEP 366 defines a mechanism that allows relative imports to work correctly
-when a module inside a package is executed via the ``-m`` switch.
-
-Unfortunately, many users still attempt to directly execute scripts inside
-packages. While this no longer silently does the wrong thing by
-creating duplicate copies of peer modules due to implicit relative imports, it
-now fails noisily at the first explicit relative import, even though the
-interpreter actually has sufficient information available on the filesystem to
-make it work properly.
-
-
+If the state stored in ``__main__`` is significant to the correct operation
+of the program, then this duplication can cause obscure and surprising
+errors.


 In a bit of a pickle
@@ -91,21 +195,23 @@ advice from many Python veterans to do as little as possible in the
 ``__main__`` module in any application that involves any form of object
 serialisation and persistence.

-Similarly, when creating a pseudo-module\*, pickles rely on the name of the
+Similarly, when creating a pseudo-module, pickles rely on the name of the
 module where a class is actually defined, rather than the officially
 documented location for that class in the module hierarchy.

-While this PEP focuses specifically on ``pickle`` as the principal
-serialisation scheme in the standard library, this issue may also affect
-other mechanisms that support serialisation of arbitrary class instances.
-
-\*For the purposes of this PEP, a "pseudo-module" is a package designed like
+For the purposes of this PEP, a "pseudo-module" is a package designed like
 the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
 packages are documented as if they were single modules, but are in fact
 internally implemented as a package. This is *supposed* to be an
-implementation detail that users and other implementations don't need to worry
-about, but, thanks to ``pickle`` (and serialisation in general), the details
-are exposed and effectively become part of the public API.
+implementation detail that users and other implementations don't need to
+worry about, but, thanks to ``pickle`` (and serialisation in general),
+the details are often exposed and can effectively become part of the public
+API.
+
+While this PEP focuses specifically on ``pickle`` as the principal
+serialisation scheme in the standard library, this issue may also affect
+other mechanisms that support serialisation of arbitrary class instances
+and rely on ``__name__`` to determine how to handle deserialisation.


 Where's the source?
@@ -141,8 +247,30 @@ any proposals to provide Windows-style "clean process" invocation via the
 multiprocessing module on other platforms.


-Proposed Changes
-================
+Qualified Names for Modules
+===========================
+
+To make it feasible to fix these problems once and for all, it is proposed
+to add a new module level attribute: ``__qualname__``. This abbreviation of
+"qualified name" is taken from PEP 3155, where it is used to store the naming
+path to a nested class or function definition relative to the top level
+module.
+
+If a module loader does not initialise ``__qualname__`` itself, then the
+import system will add it automatically (setting it to the same value as
+``__name__``).
+
+For modules, ``__qualname__`` will normally be the same as ``__name__``, just
+as it is for top-level functions and classes in PEP 3155. However, it will
+differ in some situations so that the above problems can be addressed.
+
+Specifically, whenever ``__name__`` is modified for some other purpose (such
+as to denote the main module), then ``__qualname__`` will remain unchanged,
+allowing code that needs it to access the original unmodified value.
+
+
+Eliminating the Traps
+=====================

 The following changes are interrelated and make the most sense when
 considered together. They collectively either completely eliminate the traps
@@ -150,105 +278,282 @@ for the unwary noted above, or else provide straightforward mechanisms for
 dealing with them.

 A rough draft of some of the concepts presented here was first posted on the
-python-ideas list [1], but they have evolved considerably since first being
-discussed in that thread.
+python-ideas list [1]_, but they have evolved considerably since first being
+discussed in that thread. Further discussion has subsequently taken place on
+import-sig [2]_.
+
+
+Fixing main module imports inside packages
+------------------------------------------
+
+To eliminate this trap, it is proposed that an additional filesystem check be
+performed when determining a suitable value for ``sys.path[0]``. This check
+will look for Python's explicit package directory markers and use them to find
+the appropriate directory to add to ``sys.path``.
+
+The current algorithm for setting ``sys.path[0]`` in relevant cases is roughly
+as follows::
+
+    # Interactive prompt, -m switch, -c switch
+    sys.path.insert(0, '')
+
+    # Valid sys.path entry execution (i.e. directory and zip execution)
+    sys.path.insert(0, sys.argv[0])
+
+    # Direct script execution
+    sys.path.insert(0, os.path.dirname(sys.argv[0]))
+
+It is proposed that this initialisation process be modified to take
+package details stored on the filesystem into account::
+
+    # Interactive prompt, -c switch
+    in_package, path_entry, modname = split_path_module(os.getcwd(), '')
+    if in_package:
+        sys.path.insert(0, path_entry)
+    else:
+        sys.path.insert(0, '')
+    # Start interactive prompt or run -c command as usual
+    #   __main__.__qualname__ is set to "__main__"
+
+    # -m switch
+    modname = <<argument to -m switch>>
+    in_package, path_entry, modname = split_path_module(os.getcwd(), modname)
+    if in_package:
+        sys.path.insert(0, path_entry)
+    else:
+        sys.path.insert(0, '')
+    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
+    #   __main__.__qualname__ is set to modname
+
+    # Valid sys.path entry execution (i.e. directory and zip execution)
+    modname = "__main__"
+    _, path_entry, modname = split_path_module(sys.argv[0], modname)
+    sys.path.insert(0, path_entry)
+    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
+    #   __main__.__qualname__ is set to modname
+
+    # Direct script execution
+    in_package, path_entry, modname = split_path_module(sys.argv[0])
+    sys.path.insert(0, path_entry)
+    if in_package:
+        # Pass modname to ``runpy._run_module_as_main()``
+    else:
+        # Run script directly
+    # __main__.__qualname__ is set to modname
+
+The ``split_path_module()`` supporting function used in the above pseudo-code
+would have the following semantics::
+
+    def _splitmodname(fspath):
+        path_entry, fname = os.path.split(fspath)
+        modname = os.path.splitext(fname)[0]
+        return path_entry, modname
+
+    def _is_package_dir(fspath):
+        return any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                   for info in imp.get_suffixes())
+
+    def split_path_module(fspath, modname=None):
+        """Given a filesystem path and a relative module name, determine an
+        appropriate sys.path entry and a fully qualified module name.
+
+        Returns a 3-tuple of (package_depth, fspath, modname). A reported
+        package depth of 0 indicates that this would be a top level import.
+
+        If no relative module name is given, it is derived from the final
+        component in the supplied path with the extension stripped.
+        """
+        if modname is None:
+            fspath, modname = _splitmodname(fspath)
+        package_depth = 0
+        while _is_package_dir(fspath):
+            fspath, pkg = _splitmodname(fspath)
+            modname = pkg + '.' + modname
+            package_depth += 1
+        return package_depth, fspath, modname
+
+This PEP also proposes that the ``split_path_module()`` functionality be
+exposed directly to Python users via the ``runpy`` module.
+
+
+Compatibility with PEP 382
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Making this proposal compatible with the PEP 382 namespace packaging PEP is
+trivial. The semantics of ``_is_package_dir()`` are merely changed to be::
+
+    def _is_package_dir(fspath):
+        return (fspath.endswith(".pyp") or
+                any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                    for info in imp.get_suffixes()))
+
+
+Incompatibility with PEP 402
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PEP 402 proposes the elimination of explicit markers in the file system for
+Python packages. This fundamentally breaks the proposed concept of being able
+to take a filesystem path and a Python module name and work out an unambiguous
+mapping to the Python module namespace. Instead, the appropriate mapping
+would depend on the current values in ``sys.path``, rendering it impossible
+to ever fix the problems described above with the calculation of
+``sys.path[0]`` when the interpreter is initialised.
+
+While some aspects of this PEP could probably be salvaged if PEP 402 were
+adopted, the core concept of making import semantics from main and other
+modules more consistent would no longer be feasible.
+
+This incompatibility is discussed in more detail in the relevant import-sig
+thread [2]_.
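
As a rough illustration of the marker-based lookup that PEP 402 would rule out, the following is a runnable sketch of the ``split_path_module()`` walk. It is not the PEP's reference implementation: it swaps ``imp.get_suffixes()`` for ``importlib.machinery.all_suffixes()`` (available from Python 3.3), adds a guard for the filesystem root and for an empty relative module name, and the ``demo()`` helper and its temporary layout are purely illustrative.

```python
import os
import tempfile
from importlib.machinery import all_suffixes


def is_package_dir(fspath):
    """True if fspath holds an __init__ file with any importable suffix."""
    return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
               for suffix in all_suffixes())


def split_path_module(fspath, modname=None):
    """Return (package_depth, sys.path entry, qualified module name)."""
    if modname is None:
        fspath, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
    package_depth = 0
    # Walk upwards while the current directory is marked as a package.
    while is_package_dir(fspath):
        parent, pkg = os.path.split(fspath)
        if not pkg:  # reached the filesystem root, nothing left to strip
            break
        fspath = parent
        modname = pkg + "." + modname if modname else pkg
        package_depth += 1
    return package_depth, fspath, modname


def demo():
    """Build project/package/tests/ and resolve test_foo.py inside it."""
    with tempfile.TemporaryDirectory() as project:
        tests = os.path.join(project, "package", "tests")
        os.makedirs(tests)
        for directory in (os.path.dirname(tests), tests):
            open(os.path.join(directory, "__init__.py"), "w").close()
        script = os.path.join(tests, "test_foo.py")
        depth, path_entry, modname = split_path_module(script)
        return depth, path_entry == project, modname


if __name__ == "__main__":
    print(demo())
```

The walk only consults the filesystem, never ``sys.path``, which is why it depends on explicit ``__init__`` markers being present.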
+
+
+Potential incompatibilities with scripts stored in packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The proposed change to ``sys.path[0]`` initialisation *may* break some
+existing code. Specifically, it will break scripts stored in package
+directories that rely on the implicit relative imports from ``__main__`` in
+order to run correctly under Python 3.
+
+While such scripts could be imported in Python 2 (due to implicit relative
+imports) it is already the case that they cannot be imported in Python 3,
+as implicit relative imports are no longer permitted when a module is
+imported.
+
+By disallowing implicit relative imports from the main module as well,
+such modules won't even work as scripts with this PEP. Switching them
+over to explicit relative imports will then get them working again as
+both executable scripts *and* as importable modules.
+
+To support earlier versions of Python, a script could be written to use
+different forms of import based on the Python version::
+
+    if __name__ == "__main__" and sys.version_info < (3, 3):
+        import peer # Implicit relative import
+    else:
+        from . import peer # explicit relative import


 Fixing dual imports of the main module
 --------------------------------------

-Two simple changes are proposed to fix this problem:
+Given the above proposal to get ``__qualname__`` consistently set correctly
+in the main module, one simple change is proposed to eliminate the problem
+of dual imports of the main module: the addition of a ``sys.meta_path`` hook
+that detects attempts to import ``__main__`` under its real name and returns
+the original main module instead::

-1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
-   install the specified module in ``sys.modules`` under both its real name
-   and the name ``__main__``. (Currently it is only installed as the latter)
-2. When directly executing a module, install it in ``sys.modules`` under
-   ``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
-   ``__main__``.
+    class AliasImporter:
+        def __init__(self, module, alias):
+            self.module = module
+            self.alias = alias

-With the main module also stored under its "real" name, attempts to import it
-will pick it up from the ``sys.modules`` cache rather than reimporting it
-under the new name.
+        def __repr__(self):
+            fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
+            return fmt.format(self)

+        def find_module(self, fullname, path=None):
+            if path is None and fullname == self.alias:
+                return self
+            return None

-Fixing direct execution inside packages
----------------------------------------
+        def load_module(self, fullname):
+            if fullname != self.alias:
+                raise ImportError("{!r} cannot load {!r}".format(self, fullname))
+            return self.module

-To fix this problem, it is proposed that an additional filesystem check be
-performed before proceeding with direct execution of a ``PY_SOURCE`` or
-``PY_COMPILED`` file that has been named on the command line.
+This ``sys.meta_path`` hook would be added automatically during import system
+initialisation based on the following logic::

-This additional check would look for an ``__init__`` file that is a peer to
-the specified file with a matching extension (either ``.py``, ``.pyc`` or
-``.pyo``, depending what was passed on the command line).
+    main = sys.modules["__main__"]
+    if main.__name__ != main.__qualname__:
+        sys.meta_path.append(AliasImporter(main, main.__qualname__))

-If this check fails to find anything, direct execution proceeds as usual.
-
-If, however, it finds something, execution is handed over to a
-helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
-in the same circumstances. That function will walk back up the
-directory hierarchy from the supplied path, looking for the first directory
-that doesn't contain an ``__init__`` file. Once that directory is found, it
-will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
-``runpy._run_module_as_main`` will be invoked with the appropriate module
-name (as calculated based on the original filename and the directories
-traversed while looking for a directory without an ``__init__`` file).
-
-The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
-affect this part of the proposal. For PEP 382 (with its current suggestion of
-"\*.pyp" package directories, this check would instead just walk up the
-supplied path, looking for the first non-package directory (this would not
-require any filesystem stat calls). Since PEP 402 deliberately omits explicit
-directory markers, it would need an alternative approach, based on checking
-the supplied path against the contents of ``sys.path``. In both cases, the
-direct execution behaviour can still be corrected.
+This is probably the least important proposal in the PEP - it just
+closes off the last mechanism that is likely to lead to module duplication
+after the configuration of ``sys.path[0]`` at interpreter startup is
+addressed.


 Fixing pickling without breaking introspection
 ----------------------------------------------

-To fix this problem, it is proposed to add a new optional module level
-attribute: ``__qname__``. This abbreviation of "qualified name" is taken
-from PEP 3155, where it is used to store the naming path to a nested class
-or function definition relative to the top level module. By default,
-``__qname__`` will be the same as ``__name__``, which covers the typical
-case where there is a one-to-one correspondence between the documented API
-and the actual module implementation.
+To fix this problem, it is proposed to make use of the new module level
+``__qualname__`` attributes to determine the real module location when
+``__name__`` has been modified for any reason.

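
The leak that motivates this change can be observed in current CPython. The short sketch below pickles two classes that are documented as living in ``concurrent.futures`` and ``unittest`` and checks which module name ends up in the serialised data. The helper name ``leaks_implementation_module()`` is made up for illustration, and the implementation module names are those used by CPython's standard library.

```python
import pickle
import unittest
from concurrent.futures import Future


def leaks_implementation_module(cls):
    """Return (cls.__module__, whether that name is embedded in the pickle)."""
    # Classes are pickled by reference, i.e. as a module name plus a
    # qualified class name, so the module name appears in the byte string.
    payload = pickle.dumps(cls)
    return cls.__module__, cls.__module__.encode("ascii") in payload


if __name__ == "__main__":
    for cls in (Future, unittest.TestCase):
        print(cls.__name__, leaks_implementation_module(cls))
```

In both cases the pickle records the private submodule where the class is defined rather than the documented public location, so moving the class inside the package would invalidate existing pickles.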
-Functions and classes will gain a corresponding ``__qmodule__`` attribute
-that refers to their module's ``__qname__``.
+In the main module, ``__qualname__`` will automatically be set to the main
+module's "real" name (as described above) by the interpreter.

 Pseudo-modules that adjust ``__name__`` to point to the public namespace will
-leave ``__qname__`` untouched, so the implementation location remains readily
+leave ``__qualname__`` untouched, so the implementation location remains readily
 accessible for introspection.

-In the main module, ``__qname__`` will automatically be set to the main
-module's "real" name (as described above under the fix to prevent duplicate
-imports of the main module) by the interpreter.
+If ``__name__`` is adjusted at the top of a module, then this will
+automatically adjust the ``__module__`` attribute for all functions and
+classes subsequently defined in that module.

-At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
-to ``"__main__"``.
+Since multiple submodules may be set to use the same "public" namespace,
+functions and classes will be given a new ``__qualmodule__`` attribute
+that refers to the ``__qualname__`` of their module.

-These changes on their own will fix most pickling and serialisation problems,
-but one additional change is needed to fix the problem with serialisation of
-items in ``__main__``: as a slight adjustment to the definition process for
-functions and classes, in the ``__name__ == "__main__"`` case, the module
-``__qname__`` attribute will be used to set ``__module__``.
+While this isn't strictly necessary for functions (you could find out their
+module's qualified name by looking in their globals dictionary), it is
+needed for classes, since they don't hold a reference to the globals of
+their defining module. Once a new attribute is added to classes, it is
+more convenient to keep the API consistent and add a new attribute to
+functions as well.

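
The difference between functions and classes described here can be sketched with a hand-built module. Note the assumptions: the module-level ``__qualname__`` attribute is the one *proposed* by this PEP, so the sketch sets it manually instead of relying on the import system, and the names ``demo``, ``demo._impl``, ``build_pseudo_module()`` and ``describe()`` are invented for the example.

```python
import types

# Source of a hypothetical implementation module that publishes its
# contents under the public name "demo".
IMPL_SOURCE = '''
__name__ = "demo"

def func():
    pass

class Klass:
    pass
'''


def build_pseudo_module():
    module = types.ModuleType("demo._impl")
    # Hypothetical: stands in for the import system setting __qualname__.
    module.__qualname__ = "demo._impl"
    exec(IMPL_SOURCE, module.__dict__)
    return module


def describe():
    module = build_pseudo_module()
    func, klass = module.func, module.Klass
    return {
        # Both pick up the adjusted public name at definition time.
        "func.__module__": func.__module__,
        "Klass.__module__": klass.__module__,
        # Functions can reach the implementation name via their globals...
        "via func globals": func.__globals__["__qualname__"],
        # ...but classes keep no reference to the defining module's globals.
        "class has globals": hasattr(klass, "__globals__"),
    }


if __name__ == "__main__":
    print(describe())
```

Only the function offers a route back to the implementation module, which is the gap a dedicated class-level attribute would need to fill.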
-``pydoc`` and ``inspect`` would also be updated appropriately to:
+These changes mean that adjusting ``__name__`` (and, either directly or
+indirectly, the corresponding function and class ``__module__`` attributes)
+becomes the officially sanctioned way to implement a namespace as a package,
+while exposing the API as if it were still a single module.
+
+All serialisation code that currently uses ``__name__`` and ``__module__``
+attributes will then avoid exposing implementation details by default.
+
+To correctly handle serialisation of items from the main module, the class
+and function definition logic will be updated to also use ``__qualname__``
+for the ``__module__`` attribute in the case where ``__name__ == "__main__"``.
+
+With ``__name__`` and ``__module__`` being officially blessed as being used
+for the *public* names of things, the introspection tools in the standard
+library will be updated to use ``__qualname__`` and ``__qualmodule__``
+where appropriate. For example:
+
+- ``pydoc`` will report both public and qualified names for modules
+- ``inspect.getsource()`` (and similar tools) will use the qualified names
+  that point to the implementation of the code
+- additional ``pydoc`` and/or ``inspect`` APIs may be provided that report
+  all modules with a given public ``__name__``.

-- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
-  ``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
-  the qualified variants)
-- report both the public names and the qualified names for affected objects

 Fixing multiprocessing on Windows
 ---------------------------------

-With ``__qname__`` now available to tell ``multiprocessing`` the real
-name of the main module, it should be able to simply include it in the
+With ``__qualname__`` now available to tell ``multiprocessing`` the real
+name of the main module, it will be able to simply include it in the
 serialised information passed to the child process, eliminating the
-need for dubious reverse engineering of the ``__file__`` attribute.
+need for the current dubious introspection of the ``__file__`` attribute.
+
+For older Python versions, ``multiprocessing`` could be improved by applying
+the ``split_path_module()`` algorithm described above when attempting to
+work out how to execute the main module based on its ``__file__`` attribute.
+
+
+Explicit relative imports
+=========================
+
+This PEP proposes that ``__package__`` be unconditionally defined in the
+main module as ``__qualname__.rpartition('.')[0]``. Aside from that, it
+proposes that the behaviour of explicit relative imports be left alone.
+
+In particular, if ``__package__`` is not set in a module when an explicit
+relative import occurs, the automatically cached value will continue to be
+derived from ``__name__`` rather than ``__qualname__``. This minimises any
+backwards incompatibilities with code that deliberately manipulates
+relative imports by adjusting ``__name__`` rather than setting ``__package__``
+directly.


 Reference Implementation
@@ -263,6 +567,10 @@ References
 .. [1] Module aliases and/or "real names"
    (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)

+.. [2] PEP 395 (Module aliasing) and the namespace PEPs
+   (http://mail.python.org/pipermail/import-sig/2011-November/000382.html)
+
+
 Copyright
 =========