Update PEP 395 in light of import-sig discussion (also changes qname->qualname)

Nick Coghlan 2011-11-19 22:18:45 +10:00
parent 500271eecb
commit d0145b5271
1 changed files with 415 additions and 107 deletions


@ -1,5 +1,5 @@
PEP: 395
Title: Module Aliasing
Title: Qualified Names for Modules
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
@ -8,19 +8,36 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 4-Mar-2011
Python-Version: 3.3
Post-History: 5-Mar-2011
Post-History: 5-Mar-2011, 19-Nov-2011
Abstract
========
This PEP proposes new mechanisms that eliminate some longstanding traps for
the unwary when dealing with Python's import system, the pickle module and
introspection interfaces.
the unwary when dealing with Python's import system, as well as serialisation
and introspection of functions and classes.
It builds on the "Qualified Name" concept defined in PEP 3155.
Relationship with Other PEPs
----------------------------
This PEP builds on the "qualified name" concept introduced by PEP 3155, and
also shares in that PEP's aim of fixing some ugly corner cases when dealing
with serialisation of arbitrary functions and classes.
It is also affected by the two competing "namespace package" PEPs (PEP 382
and PEP 402). This PEP would require some minor adjustments to accommodate
PEP 382, but has some critical incompatibilities with respect to the namespace
package mechanism proposed in PEP 402.
Finally, PEP 328 eliminated implicit relative imports from imported modules.
This PEP proposes that implicit relative imports from main modules also be
eliminated.
What's in a ``__name__``?
=========================
@ -48,35 +65,122 @@ the time, you won't even notice them, which just makes them all the more
surprising when they do come up.
Why are my imports broken?
--------------------------
There's a general principle that applies when modifying ``sys.path``: *never*
put a package directory directly on ``sys.path``. The reason this is
problematic is that every module in that directory is now potentially
accessible under two different names: as a top level module (since the
package directory is on ``sys.path``) and as a submodule of the package (if
the higher level directory containing the package itself is also on
``sys.path``).
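The dual-name problem is easy to reproduce. The following is a minimal sketch, assuming a throwaway package called ``demo_pkg`` containing a module ``demo_mod`` (both names are invented for illustration), with both the parent directory and the package directory placed on ``sys.path``:

```python
import importlib
import os
import sys
import tempfile


def import_under_two_names():
    """Import one source file as both ``demo_mod`` and ``demo_pkg.demo_mod``."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
    os.mkdir(pkg_dir)
    for fname in ("__init__.py", "demo_mod.py"):
        with open(os.path.join(pkg_dir, fname), "w") as f:
            f.write("state = []\n")
    # The mistake: the parent directory *and* the package directory itself
    # both end up on sys.path.
    sys.path[:0] = [root, pkg_dir]
    importlib.invalidate_caches()
    try:
        as_top_level = importlib.import_module("demo_mod")
        as_submodule = importlib.import_module("demo_pkg.demo_mod")
    finally:
        sys.path.remove(root)
        sys.path.remove(pkg_dir)
    return as_top_level, as_submodule


top, sub = import_under_two_names()
top.state.append("only visible via the top-level name")
print(top is sub)   # False: two distinct module objects
print(sub.state)    # []: the second copy never saw the change
```

Both module objects are loaded from the same file, yet each carries its own copy of the module level state.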
As an example, Django (up to and including version 1.3) is guilty of setting
up exactly this situation for site-specific applications - the application
ends up being accessible as both ``app`` and ``site.app`` in the module
namespace, and these are actually two *different* copies of the module. This
is a recipe for confusion if there is any meaningful mutable module level
state, so this behaviour is being eliminated from the default site set up in
version 1.4 (site-specific apps will always be fully qualified with the site
name).
However, it's hard to blame Django for this, when the same part of Python
responsible for setting ``__name__ = "__main__"`` in the main module commits
the exact same error when determining the value for ``sys.path[0]``.
The impact of this can be seen relatively frequently if you follow the
"python" and "import" tags on Stack Overflow. When I had the time to follow
it myself, I regularly encountered people struggling to understand the
behaviour of straightforward package layouts like the following::
project/
setup.py
package/
__init__.py
foo.py
tests/
__init__.py
test_foo.py
I would actually often see it without the ``__init__.py`` files first, but
that's a trivial fix to explain. What's hard to explain is that all of the
following ways to invoke ``test_foo.py`` *probably won't work* due to broken
imports (either failing to find ``package`` for absolute imports, complaining
about relative imports in a non-package for explicit relative imports, or
issuing even more obscure errors if some other submodule happens to shadow
the name of a top-level module, such as a ``package.json`` module that
handled serialisation or a ``package.tests.unittest`` test runner)::
# working directory: project/package/tests
./test_foo.py
python test_foo.py
python -m test_foo
python -c "from test_foo import main; main()"
# working directory: project/package
tests/test_foo.py
python tests/test_foo.py
python -m tests.test_foo
python -c "from tests.test_foo import main; main()"
# working directory: project
package/tests/test_foo.py
python package/tests/test_foo.py
# working directory: project/..
project/package/tests/test_foo.py
python project/package/tests/test_foo.py
# The -m and -c approaches don't work from here either, but the failure
# to find 'package' correctly is pretty easy to explain in this case
That's right, that long list is of all the methods of invocation that will
almost certainly *break* if you try them, and the error messages won't make
any sense if you're not already intimately familiar not only with the way Python's
import system works, but also with how it gets initialised.
For a long time, the only way to get ``sys.path`` right with that kind of
setup was to either set it manually in ``test_foo.py`` itself (hardly
something a novice, or even many veteran, Python programmers are going to
know how to do) or else to make sure to import the module instead of
executing it directly::
# working directory: project
python -c "from package.tests.test_foo import main; main()"
Since the implementation of PEP 366 (which defined a mechanism that allows
relative imports to work correctly when a module inside a package is executed
via the ``-m`` switch), the following also works properly::
# working directory: project
python -m package.tests.test_foo
Most methods of invoking Python code from the command line break when that
code is inside a package, and the two that do work are highly sensitive to
the current working directory. This is all thoroughly confusing for a
beginner, and I personally believe it is one of the key factors leading
to the perception that Python packages are complicated and hard to get right.
This problem isn't even limited to the command line - if ``test_foo.py`` is
open in Idle and you attempt to run it by pressing F5, then it will fail in
just the same way it would if run directly from the command line.
There's a reason the general ``sys.path`` guideline mentioned above exists,
and the fact that the interpreter itself doesn't follow it when determining
``sys.path[0]`` is the root cause of all sorts of grief.
Importing the main module twice
-------------------------------
The most venerable of these traps is the issue of (effectively) importing
``__main__`` twice. This occurs when the main module is also imported under
its real name, effectively creating two instances of the same module under
Another venerable trap is the issue of (effectively) importing ``__main__``
twice. This occurs when the main module is also imported under its real
name, effectively creating two instances of the same module under
different names.
This problem used to be significantly worse due to implicit relative imports
from the main module, but the switch to allowing only absolute imports and
explicit relative imports means this issue is now restricted to affecting the
main module itself.
Why are my relative imports broken?
-----------------------------------
PEP 366 defines a mechanism that allows relative imports to work correctly
when a module inside a package is executed via the ``-m`` switch.
Unfortunately, many users still attempt to directly execute scripts inside
packages. While this no longer silently does the wrong thing by
creating duplicate copies of peer modules due to implicit relative imports, it
now fails noisily at the first explicit relative import, even though the
interpreter actually has sufficient information available on the filesystem to
make it work properly.
<TODO: Anyone want to place bets on how many Stack Overflow links I could find
to put here if I really went looking?>
If the state stored in ``__main__`` is significant to the correct operation
of the program, then this duplication can cause obscure and surprising
errors.
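As a rough illustration of the duplication, the sketch below writes a small script (the name ``dup_demo.py`` is invented for this example) that imports itself under its real name, then runs it as the main module in a child interpreter:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical script that imports itself under its "real" name.
SCRIPT = textwrap.dedent("""\
    import sys
    registry = []
    if __name__ == "__main__":
        import dup_demo  # executes this file a second time
        registry.append("added via __main__")
        print(dup_demo is sys.modules["__main__"], dup_demo.registry)
""")


def run_dup_demo():
    """Run the script directly and return what it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "dup_demo.py"), "w") as f:
            f.write(SCRIPT)
        result = subprocess.run([sys.executable, "dup_demo.py"], cwd=tmp,
                                capture_output=True, text=True, check=True)
    return result.stdout.strip()


output = run_dup_demo()
print(output)  # False []
```

The entry added to ``registry`` through ``__main__`` is invisible to any code that reaches the module as ``dup_demo``.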
In a bit of a pickle
@ -91,21 +195,23 @@ advice from many Python veterans to do as little as possible in the
``__main__`` module in any application that involves any form of object
serialisation and persistence.
Similarly, when creating a pseudo-module\*, pickles rely on the name of the
Similarly, when creating a pseudo-module, pickles rely on the name of the
module where a class is actually defined, rather than the officially
documented location for that class in the module hierarchy.
While this PEP focuses specifically on ``pickle`` as the principal
serialisation scheme in the standard library, this issue may also affect
other mechanisms that support serialisation of arbitrary class instances.
\*For the purposes of this PEP, a "pseudo-module" is a package designed like
For the purposes of this PEP, a "pseudo-module" is a package designed like
the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
packages are documented as if they were single modules, but are in fact
internally implemented as a package. This is *supposed* to be an
implementation detail that users and other implementations don't need to worry
about, but, thanks to ``pickle`` (and serialisation in general), the details
are exposed and effectively become part of the public API.
implementation detail that users and other implementations don't need to
worry about, but, thanks to ``pickle`` (and serialisation in general),
the details are often exposed and can effectively become part of the public
API.
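The leak can be observed directly in current CPython, where ``unittest.TestCase`` is implemented in the ``unittest.case`` submodule. The snippet below is only an illustration of the existing behaviour, not part of the proposal:

```python
import pickle
import unittest

# Classes are pickled by reference: the payload records the module named in
# __module__, which here is the implementation submodule rather than the
# documented "unittest" namespace.
payload = pickle.dumps(unittest.TestCase, protocol=0)

print(unittest.TestCase.__module__)   # unittest.case
print(b"unittest.case" in payload)    # True
```

Any pickle created this way will only load while ``unittest.case`` continues to exist under that name.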
While this PEP focuses specifically on ``pickle`` as the principal
serialisation scheme in the standard library, this issue may also affect
other mechanisms that support serialisation of arbitrary class instances
and rely on ``__name__`` to determine how to handle deserialisation.
Where's the source?
@ -141,8 +247,30 @@ any proposals to provide Windows-style "clean process" invocation via the
multiprocessing module on other platforms.
Proposed Changes
================
Qualified Names for Modules
===========================
To make it feasible to fix these problems once and for all, it is proposed
to add a new module level attribute: ``__qualname__``. This abbreviation of
"qualified name" is taken from PEP 3155, where it is used to store the naming
path to a nested class or function definition relative to the top level
module.
If a module loader does not initialise ``__qualname__`` itself, then the
import system will add it automatically (setting it to the same value as
``__name__``).
For modules, ``__qualname__`` will normally be the same as ``__name__``, just
as it is for top-level functions and classes in PEP 3155. However, it will
differ in some situations so that the above problems can be addressed.
Specifically, whenever ``__name__`` is modified for some other purpose (such
as to denote the main module), then ``__qualname__`` will remain unchanged,
allowing code that needs it to access the original unmodified value.
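The intended relationship between the two attributes can be sketched in a few lines. The helper ``ensure_qualname()`` below is hypothetical and merely emulates the fallback the import system would apply; it is not an existing API:

```python
import types


def ensure_qualname(module):
    """Hypothetical helper: default __qualname__ to __name__ when unset."""
    if not hasattr(module, "__qualname__"):
        module.__qualname__ = module.__name__
    return module


mod = ensure_qualname(types.ModuleType("package.tests.test_foo"))
mod.__name__ = "__main__"  # e.g. the module is being run as the main module

print(mod.__name__, mod.__qualname__)  # __main__ package.tests.test_foo
```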
Eliminating the Traps
=====================
The following changes are interrelated and make the most sense when
considered together. They collectively either completely eliminate the traps
@ -150,105 +278,281 @@ for the unwary noted above, or else provide straightforward mechanisms for
dealing with them.
A rough draft of some of the concepts presented here was first posted on the
python-ideas list [1], but they have evolved considerably since first being
discussed in that thread.
python-ideas list [1]_, but they have evolved considerably since first being
discussed in that thread. Further discussion has subsequently taken place on
import-sig [2]_.
Fixing main module imports inside packages
------------------------------------------
To eliminate this trap, it is proposed that an additional filesystem check be
performed when determining a suitable value for ``sys.path[0]``. This check
will look for Python's explicit package directory markers and use them to find
the appropriate directory to add to ``sys.path``.
The current algorithm for setting ``sys.path[0]`` in relevant cases is roughly
as follows::
# Interactive prompt, -m switch, -c switch
sys.path.insert(0, '')
# Valid sys.path entry execution (i.e. directory and zip execution)
sys.path.insert(0, sys.argv[0])
# Direct script execution
sys.path.insert(0, os.path.dirname(sys.argv[0]))
It is proposed that this initialisation process be modified to take
package details stored on the filesystem into account::
    # Interactive prompt, -c switch
    in_package, path_entry, modname = split_path_module(os.getcwd(), '')
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')
    # Start interactive prompt or run -c command as usual
    #   __main__.__qualname__ is set to "__main__"

    # -m switch
    modname = <<argument to -m switch>>
    in_package, path_entry, modname = split_path_module(os.getcwd(), modname)
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')
    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    #   __main__.__qualname__ is set to modname

    # Valid sys.path entry execution (i.e. directory and zip execution)
    modname = "__main__"
    # The package depth is not needed here, so it is discarded
    _, path_entry, modname = split_path_module(sys.argv[0], modname)
    sys.path.insert(0, path_entry)
    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    #   __main__.__qualname__ is set to modname

    # Direct script execution
    in_package, path_entry, modname = split_path_module(sys.argv[0])
    sys.path.insert(0, path_entry)
    if in_package:
        # Pass modname to ``runpy._run_module_as_main()``
    else:
        # Run script directly
    # __main__.__qualname__ is set to modname
The ``split_path_module()`` supporting function used in the above pseudo-code
would have the following semantics::
    import imp
    import os

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    def _is_package_dir(fspath):
        # A package directory contains an __init__ file with a known suffix
        return any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
                   for info in imp.get_suffixes())

    def split_path_module(fspath, modname=None):
        """Given a filesystem path and a relative module name, determine an
        appropriate sys.path entry and a fully qualified module name.

        Returns a 3-tuple of (package_depth, fspath, modname). A reported
        package depth of 0 indicates that this would be a top level import.

        If no relative module name is given, it is derived from the final
        component in the supplied path with the extension stripped.
        """
        if modname is None:
            fspath, modname = _splitmodname(fspath)
        package_depth = 0
        # Walk up the hierarchy until a non-package directory is found
        while _is_package_dir(fspath):
            fspath, pkg = _splitmodname(fspath)
            modname = pkg + '.' + modname
            package_depth += 1
        return package_depth, fspath, modname
This PEP also proposes that the ``split_path_module()`` functionality be
exposed directly to Python users via the ``runpy`` module.
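To make the behaviour concrete, here is a self-contained sketch that rebuilds the ``project/package/tests`` layout from earlier in a temporary directory and applies the algorithm to ``test_foo.py``. It assumes ``importlib.machinery.all_suffixes()`` as a stand-in for ``imp.get_suffixes()`` (the ``imp`` module is not available in recent Python versions), and ``build_layout()`` is a helper invented for the demonstration:

```python
import importlib.machinery
import os
import tempfile


def is_package_dir(fspath):
    # A directory counts as a package if it holds an __init__ file with
    # any suffix the import system recognises.
    return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
               for suffix in importlib.machinery.all_suffixes())


def split_path_module(fspath, modname=None):
    if modname is None:
        fspath, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
    package_depth = 0
    while is_package_dir(fspath):
        fspath, pkg = os.path.split(fspath)
        modname = pkg + "." + modname
        package_depth += 1
    return package_depth, fspath, modname


def build_layout():
    """Hypothetical helper: create project/package/tests/test_foo.py."""
    project = os.path.join(tempfile.mkdtemp(), "project")
    tests = os.path.join(project, "package", "tests")
    os.makedirs(tests)
    for directory in (os.path.join(project, "package"), tests):
        open(os.path.join(directory, "__init__.py"), "w").close()
    script = os.path.join(tests, "test_foo.py")
    open(script, "w").close()
    return project, script


project, script = build_layout()
depth, path_entry, modname = split_path_module(script)
print(depth, modname)          # 2 package.tests.test_foo
print(path_entry == project)   # True
```

The reported path entry is the ``project`` directory, which is the value ``sys.path[0]`` needs for ``package`` to be importable regardless of where the script was launched from.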
Compatibility with PEP 382
~~~~~~~~~~~~~~~~~~~~~~~~~~
Making this proposal compatible with the PEP 382 namespace packaging PEP is
trivial. The semantics of ``_is_package_dir()`` are merely changed to be::
    def _is_package_dir(fspath):
        return (fspath.endswith(".pyp") or
                any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
                    for info in imp.get_suffixes()))
Incompatibility with PEP 402
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PEP 402 proposes the elimination of explicit markers in the file system for
Python packages. This fundamentally breaks the proposed concept of being able
to take a filesystem path and a Python module name and work out an unambiguous
mapping to the Python module namespace. Instead, the appropriate mapping
would depend on the current values in ``sys.path``, rendering it impossible
to ever fix the problems described above with the calculation of
``sys.path[0]`` when the interpreter is initialised.
While some aspects of this PEP could probably be salvaged if PEP 402 were
adopted, the core concept of making import semantics from main and other
modules more consistent would no longer be feasible.
This incompatibility is discussed in more detail in the relevant import-sig
thread [2]_.
Potential incompatibilities with scripts stored in packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The proposed change to ``sys.path[0]`` initialisation *may* break some
existing code. Specifically, it will break scripts stored in package
directories that rely on the implicit relative imports from ``__main__`` in
order to run correctly under Python 3.
While such scripts could be imported in Python 2 (due to implicit relative
imports) it is already the case that they cannot be imported in Python 3,
as implicit relative imports are no longer permitted when a module is
imported.
By disallowing implicit relative imports from the main module as well,
such modules won't even work as scripts with this PEP. Switching them
over to explicit relative imports will then get them working again as
both executable scripts *and* as importable modules.
To support earlier versions of Python, a script could be written to use
different forms of import based on the Python version::
    import sys

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer # Implicit relative import
    else:
        from . import peer # Explicit relative import
Fixing dual imports of the main module
--------------------------------------
Two simple changes are proposed to fix this problem:
Given the above proposal to get ``__qualname__`` consistently set correctly
in the main module, one simple change is proposed to eliminate the problem
of dual imports of the main module: the addition of a ``sys.meta_path`` hook
that detects attempts to import ``__main__`` under its real name and returns
the original main module instead::
1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
install the specified module in ``sys.modules`` under both its real name
and the name ``__main__``. (Currently it is only installed as the latter)
2. When directly executing a module, install it in ``sys.modules`` under
``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
``__main__``.
class AliasImporter:
def __init__(self, module, alias):
self.module = module
self.alias = alias
With the main module also stored under its "real" name, attempts to import it
will pick it up from the ``sys.modules`` cache rather than reimporting it
under the new name.
def __repr__(self):
fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
return fmt.format(self)
def find_module(self, fullname, path=None):
if path is None and fullname == self.alias:
return self
return None
Fixing direct execution inside packages
---------------------------------------
def load_module(self, fullname):
if fullname != self.alias:
raise ImportError("{!r} cannot load {!r}".format(self, fullname))
return self.module
To fix this problem, it is proposed that an additional filesystem check be
performed before proceeding with direct execution of a ``PY_SOURCE`` or
``PY_COMPILED`` file that has been named on the command line.
This ``sys.meta_path`` hook would be added automatically during import system
initialisation based on the following logic::
This additional check would look for an ``__init__`` file that is a peer to
the specified file with a matching extension (either ``.py``, ``.pyc`` or
``.pyo``, depending what was passed on the command line).
main = sys.modules["__main__"]
if main.__name__ != main.__qualname__:
sys.meta_path.append(AliasImporter(main, main.__qualname__))
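The ``find_module()``/``load_module()`` protocol used by ``AliasImporter`` has since been superseded by the spec-based protocol of PEP 451. As a rough sketch only, the same aliasing idea might look as follows on a modern interpreter. The class name ``AliasFinder`` and the stand-in module are assumptions made for this example, and the hook is inserted at the front of ``sys.meta_path`` because the standard path-based finder now lives on that list and would otherwise locate the source file first:

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class AliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical spec-based variant of the alias hook."""

    def __init__(self, module, alias):
        self.module = module
        self.alias = alias

    def find_spec(self, fullname, path=None, target=None):
        if path is None and fullname == self.alias:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Hand back the existing module instead of creating a new one.
        return self.module

    def exec_module(self, module):
        # Nothing to execute: the module has already been initialised.
        pass


# A stand-in object is used here rather than the real main module.
dummy_main = types.ModuleType("__main__")
dummy_main.__qualname__ = "alias_demo_real_name"
sys.meta_path.insert(0, AliasFinder(dummy_main, dummy_main.__qualname__))

imported = importlib.import_module("alias_demo_real_name")
print(imported is dummy_main)   # True
```

One side effect worth noting is that the import machinery replaces the aliased module's ``__spec__`` attribute with the spec created for the alias.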
If this check fails to find anything, direct execution proceeds as usual.
If, however, it finds something, execution is handed over to a
helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
in the same circumstances. That function will walk back up the
directory hierarchy from the supplied path, looking for the first directory
that doesn't contain an ``__init__`` file. Once that directory is found, it
will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
``runpy._run_module_as_main`` will be invoked with the appropriate module
name (as calculated based on the original filename and the directories
traversed while looking for a directory without an ``__init__`` file).
The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
affect this part of the proposal. For PEP 382 (with its current suggestion of
"\*.pyp" package directories), this check would instead just walk up the
supplied path, looking for the first non-package directory (this would not
require any filesystem stat calls). Since PEP 402 deliberately omits explicit
directory markers, it would need an alternative approach, based on checking
the supplied path against the contents of ``sys.path``. In both cases, the
direct execution behaviour can still be corrected.
This is probably the least important proposal in the PEP - it just
closes off the last mechanism that is likely to lead to module duplication
after the configuration of ``sys.path[0]`` at interpreter startup is
addressed.
Fixing pickling without breaking introspection
----------------------------------------------
To fix this problem, it is proposed to add a new optional module level
attribute: ``__qname__``. This abbreviation of "qualified name" is taken
from PEP 3155, where it is used to store the naming path to a nested class
or function definition relative to the top level module. By default,
``__qname__`` will be the same as ``__name__``, which covers the typical
case where there is a one-to-one correspondence between the documented API
and the actual module implementation.
To fix this problem, it is proposed to make use of the new module level
``__qualname__`` attributes to determine the real module location when
``__name__`` has been modified for any reason.
Functions and classes will gain a corresponding ``__qmodule__`` attribute
that refers to their module's ``__qname__``.
In the main module, ``__qualname__`` will automatically be set to the main
module's "real" name (as described above) by the interpreter.
Pseudo-modules that adjust ``__name__`` to point to the public namespace will
leave ``__qname__`` untouched, so the implementation location remains readily
leave ``__qualname__`` untouched, so the implementation location remains readily
accessible for introspection.
In the main module, ``__qname__`` will automatically be set to the main
module's "real" name (as described above under the fix to prevent duplicate
imports of the main module) by the interpreter.
If ``__name__`` is adjusted at the top of a module, then this will
automatically adjust the ``__module__`` attribute for all functions and
classes subsequently defined in that module.
At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
to ``"__main__"``.
Since multiple submodules may be set to use the same "public" namespace,
functions and classes will be given a new ``__qualmodule__`` attribute
that refers to the ``__qualname__`` of their module.
These changes on their own will fix most pickling and serialisation problems,
but one additional change is needed to fix the problem with serialisation of
items in ``__main__``: as a slight adjustment to the definition process for
functions and classes, in the ``__name__ == "__main__"`` case, the module
``__qname__`` attribute will be used to set ``__module__``.
While this isn't strictly necessary for functions (you could find out their
module's qualified name by looking in their globals dictionary), it is
needed for classes, since they don't hold a reference to the globals of
their defining module. Once a new attribute is added to classes, it is
more convenient to keep the API consistent and add a new attribute to
functions as well.
``pydoc`` and ``inspect`` would also be updated appropriately to:
These changes mean that adjusting ``__name__`` (and, either directly or
indirectly, the corresponding function and class ``__module__`` attributes)
becomes the officially sanctioned way to implement a namespace as a package,
while exposing the API as if it were still a single module.
All serialisation code that currently uses ``__name__`` and ``__module__``
attributes will then avoid exposing implementation details by default.
To correctly handle serialisation of items from the main module, the class
and function definition logic will be updated to also use ``__qualname__``
for the ``__module__`` attribute in the case where ``__name__ == "__main__"``.
With ``__name__`` and ``__module__`` being officially blessed as being used
for the *public* names of things, the introspection tools in the standard
library will be updated to use ``__qualname__`` and ``__qualmodule__``
where appropriate. For example:
- ``pydoc`` will report both public and qualified names for modules
- ``inspect.getsource()`` (and similar tools) will use the qualified names
that point to the implementation of the code
- additional ``pydoc`` and/or ``inspect`` APIs may be provided that report
all modules with a given public ``__name__``.
- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
``__module__`` where appropriate (e.g. ``inspect.getsource()`` would prefer
the qualified variants)
- report both the public names and the qualified names for affected objects
Fixing multiprocessing on Windows
---------------------------------
With ``__qname__`` now available to tell ``multiprocessing`` the real
name of the main module, it should be able to simply include it in the
With ``__qualname__`` now available to tell ``multiprocessing`` the real
name of the main module, it will be able to simply include it in the
serialised information passed to the child process, eliminating the
need for dubious reverse engineering of the ``__file__`` attribute.
need for the current dubious introspection of the ``__file__`` attribute.
For older Python versions, ``multiprocessing`` could be improved by applying
the ``split_path_module()`` algorithm described above when attempting to
work out how to execute the main module based on its ``__file__`` attribute.
Explicit relative imports
=========================
This PEP proposes that ``__package__`` be unconditionally defined in the
main module as ``__qualname__.rpartition('.')[0]``. Aside from that, it
proposes that the behaviour of explicit relative imports be left alone.
In particular, if ``__package__`` is not set in a module when an explicit
relative import occurs, the automatically cached value will continue to be
derived from ``__name__`` rather than ``__qualname__``. This minimises any
backwards incompatibilities with code that deliberately manipulates
relative imports by adjusting ``__name__`` rather than setting ``__package__``
directly.
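The proposed ``__package__`` calculation is plain string manipulation. The helper name ``package_for()`` below is invented purely to show the two interesting cases:

```python
def package_for(qualname):
    """Hypothetical helper mirroring ``__qualname__.rpartition('.')[0]``."""
    return qualname.rpartition(".")[0]


print(package_for("package.tests.test_foo"))  # package.tests
print(repr(package_for("__main__")))          # ''
```

A main module that lives inside a package gets its parent package name, while a top level main module gets the empty string.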
Reference Implementation
@ -263,6 +567,10 @@ References
.. [1] Module aliases and/or "real names"
(http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)
.. [2] PEP 395 (Module aliasing) and the namespace PEPs
(http://mail.python.org/pipermail/import-sig/2011-November/000382.html)
Copyright
=========