python-peps/pep-0395.txt

278 lines
11 KiB
Plaintext
Raw Normal View History

PEP: 395
Title: Module Aliasing
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-Mar-2011
Python-Version: 3.3
2011-03-04 10:38:00 -05:00
Post-History: 5-Mar-2011
Abstract
========
This PEP proposes new mechanisms that eliminate some longstanding traps for
the unwary when dealing with Python's import system, the pickle module and
introspection interfaces.
It builds on the "Qualified Name" concept defined in PEP 3155.
What's in a ``__name__``?
=========================
Over time, a module's ``__name__`` attribute has come to be used to handle a
number of different tasks.
The key use cases identified for this module attribute are:
1. Flagging the main module in a program, using the ``if __name__ ==
"__main__":`` convention.
2. As the starting point for relative imports
3. To identify the location of function and class definitions within the
running application
4. To identify the location of classes for serialisation into pickle objects
which may be shared with other interpreter instances
Traps for the Unwary
====================
The overloading of the semantics of ``__name__`` have resulted in several
traps for the unwary. These traps can be quite annoying in practice, as
they are highly unobvious and can cause quite confusing behaviour. A lot of
the time, you won't even notice them, which just makes them all the more
surprising when they do come up.
Importing the main module twice
-------------------------------
The most venerable of these traps is the issue of (effectively) importing
``__main__`` twice. This occurs when the main module is also imported under
its real name, effectively creating two instances of the same module under
different names.
This problem used to be significantly worse due to implicit relative imports
from the main module, but the switch to allowing only absolute imports and
explicit relative imports means this issue is now restricted to affecting the
main module itself.
Why are my relative imports broken?
-----------------------------------
PEP 366 defines a mechanism that allows relative imports to work correctly
when a module inside a package is executed via the ``-m`` switch.
Unfortunately, many users still attempt to directly execute scripts inside
packages. While this no longer silently does the wrong thing by
creating duplicate copies of peer modules due to implicit relative imports, it
now fails noisily at the first explicit relative import, even though the
interpreter actually has sufficient information available on the filesystem to
make it work properly.
<TODO: Anyone want to place bets on how many Stack Overflow links I could find
to put here if I really went looking?>
In a bit of a pickle
--------------------
Something many users may not realise is that the ``pickle`` module serialises
objects based on the ``__name__`` of the containing module. So objects
defined in ``__main__`` are pickled that way, and won't be unpickled
correctly by another python instance that only imported that module instead
of running it directly. This behaviour is the underlying reason for the
advice from many Python veterans to do as little as possible in the
``__main__`` module in any application that involves any form of object
serialisation and persistence.
Similarly, when creating a pseudo-module\*, pickles rely on the name of the
module where a class is actually defined, rather than the officially
documented location for that class in the module hierarchy.
While this PEP focuses specifically on ``pickle`` as the principal
serialisation scheme in the standard library, this issue may also affect
other mechanisms that support serialisation of arbitrary class instances.
\*For the purposes of this PEP, a "pseudo-module" is a package designed like
the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
packages are documented as if they were single modules, but are in fact
internally implemented as a package. This is *supposed* to be an
implementation detail that users and other implementations don't need to worry
about, but, thanks to ``pickle`` (and serialisation in general), the details
are exposed and effectively become part of the public API.
Where's the source?
-------------------
Some sophisticated users of the pseudo-module technique described
above recognise the problem with implementation details leaking out via the
``pickle`` module, and choose to address it by altering ``__name__`` to refer
to the public location for the module before defining any functions or classes
(or else by modifying the ``__module__`` attributes of those objects after
they have been defined).
This approach is effective at eliminating the leakage of information via
pickling, but comes at the cost of breaking introspection for functions and
classes (as their ``__module__`` attribute now points to the wrong place).
Forkless Windows
----------------
To get around the lack of ``os.fork`` on Windows, the ``multiprocessing``
module attempts to re-execute Python with the same main module, but skipping
over any code guarded by ``if __name__ == "__main__":`` checks. It does the
best it can with the information it has, but is forced to make assumptions
that simply aren't valid whenever the main module isn't an ordinary directly
executed script or top-level module. Packages and non-top-level modules
executed via the ``-m`` switch, as well as directly executed zipfiles or
directories, are likely to make multiprocessing on Windows do the wrong thing
(either quietly or noisily) when spawning a new process.
While this issue currently only affects Windows directly, it also impacts
any proposals to provide Windows-style "clean process" invocation via the
multiprocessing module on other platforms.
Proposed Changes
================
The following changes are interrelated and make the most sense when
considered together. They collectively either completely eliminate the traps
for the unwary noted above, or else provide straightforward mechanisms for
dealing with them.
A rough draft of some of the concepts presented here was first posted on the
python-ideas list [1], but they have evolved considerably since first being
discussed in that thread.
Fixing dual imports of the main module
--------------------------------------
Two simple changes are proposed to fix this problem:
1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
install the specified module in ``sys.modules`` under both its real name
and the name ``__main__``. (Currently it is only installed as the latter)
2. When directly executing a module, install it in ``sys.modules`` under
``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
``__main__``.
2011-03-04 10:46:44 -05:00
With the main module also stored under its "real" name, attempts to import it
will pick it up from the ``sys.modules`` cache rather than reimporting it
under the new name.
Fixing direct execution inside packages
---------------------------------------
To fix this problem, it is proposed that an additional filesystem check be
performed before proceeding with direct execution of a ``PY_SOURCE`` or
``PY_COMPILED`` file that has been named on the command line.
This additional check would look for an ``__init__`` file that is a peer to
the specified file with a matching extension (either ``.py``, ``.pyc`` or
``.pyo``, depending what was passed on the command line).
If this check fails to find anything, direct execution proceeds as usual.
If, however, it finds something, execution is handed over to a
helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
in the same circumstances. That function will walk back up the
directory hierarchy from the supplied path, looking for the first directory
that doesn't contain an ``__init__`` file. Once that directory is found, it
will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
``runpy._run_module_as_main`` will be invoked with the appropriate module
name (as calculated based on the original filename and the directories
2011-03-04 10:46:44 -05:00
traversed while looking for a directory without an ``__init__`` file).
The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
affect this part of the proposal. For PEP 382 (with its current suggestion of
"*.pyp" package directories, this check would instead just walk up the
supplied path, looking for the first non-package directory (this would not
require any filesystem stat calls). Since PEP 402 deliberately omits explicit
directory markers, it would need an alternative approach, based on checking
the supplied path against the contents of ``sys.path``. In both cases, the
direct execution behaviour can still be corrected.
Fixing pickling without breaking introspection
----------------------------------------------
To fix this problem, it is proposed to add a new optional module level
attribute: ``__qname__``. This abbreviation of "qualified name" is taken
from PEP 3155, where it is used to store the naming path to a nested class
or function definition relative to the top level module. By default,
``__qname__`` will be the same as ``__name__``, which covers the typical
case where there is a one-to-one correspondence between the documented API
and the actual module implementation.
Functions and classes will gain a corresponding ``__qmodule__`` attribute
that refers to their module's ``__qname__``.
Pseudo-modules that adjust ``__name__`` to point to the public namespace will
leave ``__qname__`` untouched, so the implementation location remains readily
accessible for introspection.
In the main module, ``__qname__`` will automatically be set to the main
2011-03-04 10:46:44 -05:00
module's "real" name (as described above under the fix to prevent duplicate
imports of the main module) by the interpreter.
At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
to ``"__main__"``.
These changes on their own will fix most pickling and serialisation problems,
but one additional change is needed to fix the problem with serialisation of
items in ``__main__``: as a slight adjustment to the definition process for
functions and classes, in the ``__name__ == "__main__"`` case, the module
``__qname__`` attribute will be used to set ``__module__``.
``pydoc`` and ``inspect`` would also be updated appropriately to:
- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
the qualified variants)
- report both the public names and the qualified names for affected objects
Fixing multiprocessing on Windows
---------------------------------
With ``__qname__`` now available to tell ``multiprocessing`` the real
name of the main module, it should be able to simply include it in the
2011-03-04 10:46:44 -05:00
serialised information passed to the child process, eliminating the
need for dubious reverse engineering of the ``__file__`` attribute.
Reference Implementation
========================
None as yet.
References
==========
.. [1] Module aliases and/or "real names"
(http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: