Update the module aliasing proposal based on Antoine's new qualified names PEP

Nick Coghlan 2011-10-30 16:00:10 +10:00
parent 34f1b5ddcf
commit d003fe8846
1 changed file with 50 additions and 31 deletions


@@ -18,8 +18,7 @@ This PEP proposes new mechanisms that eliminate some longstanding traps for
the unwary when dealing with Python's import system, the pickle module and
introspection interfaces.
-<This will be fleshed out into a better summary once the PEP has been
-discussed further>
+It builds on the "Qualified Name" concept defined in PEP 3155.
What's in a ``__name__``?
@@ -76,7 +75,7 @@ now fails noisily at the first explicit relative import, even though the
interpreter actually has sufficient information available on the filesystem to
make it work properly.
-<TODO: Anyone want to place bets on how many StackOverflow links I could find
+<TODO: Anyone want to place bets on how many Stack Overflow links I could find
to put here if I really went looking?>
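A minimal sketch of the trap described above, assuming a current Python 3 interpreter and a throwaway package created in a temporary directory (the ``pkg``, ``helper`` and ``main`` names are purely illustrative). Running ``main.py`` by path fails at its explicit relative import, while running the same file with ``-m`` from the directory above the package succeeds:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Illustrative layout: <root>/pkg/{__init__,helper,main}.py
    pkg = os.path.join(root, "pkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "main.py"), "w") as f:
        f.write("from . import helper\nprint(helper.VALUE)\n")

    # Executed by path: __name__ is "__main__" with no parent package, so the
    # explicit relative import fails despite pkg/__init__.py being right there.
    direct = subprocess.run([sys.executable, os.path.join(pkg, "main.py")],
                            cwd=root, capture_output=True, text=True)

    # Executed with -m from the directory above pkg: the import system knows
    # the package context and the same import works.
    via_m = subprocess.run([sys.executable, "-m", "pkg.main"],
                           cwd=root, capture_output=True, text=True)

print(direct.returncode != 0, via_m.stdout.strip())  # True 42
```

The exact exception text differs between Python versions, so the sketch only checks the exit status of the direct run.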
@@ -87,9 +86,10 @@ Something many users may not realise is that the ``pickle`` module serialises
objects based on the ``__name__`` of the containing module. So objects
defined in ``__main__`` are pickled that way, and won't be unpickled
correctly by another Python instance that only imported that module instead
-of running it directly. Thus the advice from many Python veterans to do as
-little as possible in the ``__main__`` module in any application that
-involves any form of object serialisation and persistence.
+of running it directly. This behaviour is the underlying reason for the
+advice from many Python veterans to do as little as possible in the
+``__main__`` module in any application that involves any form of object
+serialisation and persistence.
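A minimal sketch of why this happens: ``pickle`` stores classes by reference (module name plus qualified name), not by value. Since the real ``__main__`` module cannot sensibly be renamed inside a running interpreter, a hypothetical module called ``fake_main`` stands in for it here, and swapping in a module without the class mimics a second process that only imported the code under another name:

```python
import pickle
import sys
import types

# Hypothetical stand-in for a directly executed script ("__main__").
script = types.ModuleType("fake_main")
sys.modules["fake_main"] = script

class Record:
    pass

Record.__module__ = "fake_main"  # as if defined at the script's top level
script.Record = Record

# pickle stores a reference (module name plus qualified name), not the class.
data = pickle.dumps(Record())

# A second process that only imported the code knows it under another name,
# so the "fake_main" it sees has no Record attribute and loading fails.
sys.modules["fake_main"] = types.ModuleType("fake_main")
try:
    pickle.loads(data)
    loaded = True
except (AttributeError, pickle.UnpicklingError):
    loaded = False
finally:
    del sys.modules["fake_main"]  # tidy up the simulated module

print(b"fake_main" in data, loaded)  # True False
```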
Similarly, when creating a pseudo-module\*, pickles rely on the name of the
module where a class is actually defined, rather than the officially
@@ -104,8 +104,8 @@ the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
packages are documented as if they were single modules, but are in fact
internally implemented as a package. This is *supposed* to be an
implementation detail that users and other implementations don't need to worry
-about, but, thanks to ``pickle``, the details are exposed and effectively
-become part of the public API.
+about, but, thanks to ``pickle`` (and serialisation in general), the details
+are exposed and effectively become part of the public API.
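The effect is easy to observe with the two packages mentioned above. This sketch relies only on standard attributes: the documented classes report the private submodules that define them, and that name is what ``pickle`` writes out:

```python
import concurrent.futures
import pickle
import unittest

# Documented as unittest.TestCase, but defined in the unittest.case submodule.
print(unittest.TestCase.__module__)  # unittest.case
# Likewise for concurrent.futures.ThreadPoolExecutor.
print(concurrent.futures.ThreadPoolExecutor.__module__)  # concurrent.futures.thread

# pickle records the defining module, so the private layout ends up in the data.
data = pickle.dumps(unittest.TestCase)
print(b"unittest.case" in data)  # True
```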
Where's the source?
@@ -136,6 +136,10 @@ executed via the ``-m`` switch, as well as directly executed zipfiles or
directories, are likely to make multiprocessing on Windows do the wrong thing
(either quietly or noisily) when spawning a new process.
+While this issue currently only affects Windows directly, it also impacts
+any proposals to provide Windows-style "clean process" invocation via the
+multiprocessing module on other platforms.
Proposed Changes
================
@@ -190,42 +194,57 @@ will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
name (as calculated based on the original filename and the directories
traversed while looking for a directory without an ``__init__`` file).
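The name calculation described above can be sketched as follows, assuming plain ``__init__.py`` packages (no namespace packages) and a hypothetical helper name, ``infer_module_name``. It walks up from the script's directory while each directory contains an ``__init__.py`` file; in this sketch the first directory without one is treated as the ``sys.path`` entry, and the traversed package names give the module's real name:

```python
import os
import tempfile

def infer_module_name(script_path):
    """Hypothetical helper: return (path_entry, dotted_name) for a script.

    Walks up from the script's directory while each directory contains an
    __init__.py file. The first directory without one is treated as the
    sys.path entry; the traversed package directories form the module name.
    """
    directory, filename = os.path.split(os.path.abspath(script_path))
    parts = [os.path.splitext(filename)[0]]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent, package = os.path.split(directory)
        if not package:  # reached the filesystem root
            break
        parts.append(package)
        directory = parent
    return directory, ".".join(reversed(parts))

# Usage: <root>/pkg/sub/mod.py, where pkg and sub are both packages.
with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "pkg", "sub")
    os.makedirs(sub)
    for package_dir in (os.path.dirname(sub), sub):
        open(os.path.join(package_dir, "__init__.py"), "w").close()
    script = os.path.join(sub, "mod.py")
    open(script, "w").close()
    entry, name = infer_module_name(script)
    expected_entry = os.path.abspath(root)

print(name)  # pkg.sub.mod
```

Each step up the tree costs one ``isfile`` check, which is the filesystem work the PEP 382 variant discussed next could avoid.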
+The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
+affect this part of the proposal. For PEP 382 (with its current suggestion of
+``*.pyp`` package directories), this check would instead just walk up the
+supplied path, looking for the first non-package directory (this would not
+require any filesystem stat calls). Since PEP 402 deliberately omits explicit
+directory markers, it would need an alternative approach, based on checking
+the supplied path against the contents of ``sys.path``. In both cases, the
+direct execution behaviour can still be corrected.
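For comparison, a sketch of the PEP 382 variant, assuming its draft convention that a package directory is marked by a ``.pyp`` suffix and that the package name is the directory name without that suffix. Only string operations are applied to the supplied path, so no stat calls are needed (``infer_module_name_pyp`` and the ``proj`` directory are hypothetical):

```python
import os

def infer_module_name_pyp(script_path):
    """Hypothetical helper: pure string walk, no filesystem stat calls.

    Assumes a package directory is marked by a ".pyp" suffix and that the
    package name is the directory name with the suffix removed.
    """
    directory, filename = os.path.split(os.path.abspath(script_path))
    parts = [os.path.splitext(filename)[0]]
    while directory.endswith(".pyp"):
        directory, package = os.path.split(directory)
        parts.append(package[: -len(".pyp")])
    return directory, ".".join(reversed(parts))

# Usage: the path need not exist, since nothing is looked up on disk.
base = os.path.abspath("proj")  # hypothetical project directory
entry, name = infer_module_name_pyp(os.path.join(base, "foo.pyp", "bar.pyp", "mod.py"))
print(name)  # foo.bar.mod
```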
Fixing pickling without breaking introspection
----------------------------------------------
-To fix this problem, it is proposed to add two optional module level
-attributes: ``__source_name__`` and ``__pickle_name__``.
+To fix this problem, it is proposed to add a new optional module level
+attribute: ``__qname__``. This abbreviation of "qualified name" is taken
+from PEP 3155, where it is used to store the naming path to a nested class
+or function definition relative to the top level module. By default,
+``__qname__`` will be the same as ``__name__``, which covers the typical
+case where there is a one-to-one correspondence between the documented API
+and the actual module implementation.
-When setting the ``__module__`` attribute on a function or class, the
-interpreter will be updated to use ``__source_name__`` if defined, falling
-back to ``__name__`` otherwise.
+Functions and classes will gain a corresponding ``__qmodule__`` attribute
+that refers to their module's ``__qname__``.
-In the main module, ``__source_name__`` will automatically be set to the main
+Pseudo-modules that adjust ``__name__`` to point to the public namespace will
+leave ``__qname__`` untouched, so the implementation location remains readily
+accessible for introspection.
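Neither a module-level ``__qname__`` nor a ``__qmodule__`` on functions and classes exists in current Python releases, so the following only emulates the proposed definition-time behaviour. The helper names (``make_module``, ``emulate_definitions``) and the ``mypkg._impl`` implementation module, which rebinds its ``__name__`` to the public ``mypkg`` namespace, are all hypothetical:

```python
import types

def make_module(name):
    """Create a module with the proposed default, __qname__ == __name__."""
    mod = types.ModuleType(name)
    mod.__qname__ = name  # hypothetical attribute proposed above
    return mod

def emulate_definitions(mod, source):
    """Run source in mod, then give each new class or function a __qmodule__.

    Today's interpreter does not do this; it only mimics the proposal.
    """
    existing = set(vars(mod))
    exec(source, vars(mod))
    for name, obj in vars(mod).items():
        if name not in existing and isinstance(obj, (type, types.FunctionType)):
            obj.__qmodule__ = getattr(mod, "__qname__", mod.__name__)

# A pseudo-module: the implementation module points __name__ at the public
# namespace but leaves __qname__ alone.
impl = make_module("mypkg._impl")
impl.__name__ = "mypkg"
emulate_definitions(impl, "class Widget:\n    pass\n\ndef helper():\n    pass\n")

print(impl.Widget.__module__, impl.Widget.__qmodule__)  # mypkg mypkg._impl
print(impl.helper.__module__, impl.helper.__qmodule__)  # mypkg mypkg._impl
```

Pickling would then pick up the public ``mypkg`` name through ``__module__``, while introspection tools could follow ``__qmodule__`` back to ``mypkg._impl``.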
+In the main module, ``__qname__`` will automatically be set to the main
module's "real" name (as described above under the fix to prevent duplicate
-imports of the main module) by the interpreter. This will fix both pickling
-and introspection for the main module.
+imports of the main module) by the interpreter.
-It is also proposed that the pickling mechanism for classes and functions be
-updated to use an optional ``__pickle_module__`` attribute when deciding how
-to pickle these objects (falling back to the existing ``__module__``
-attribute if the optional attribute is not defined). When a class or function
-is defined, this optional attribute will be defined if ``__pickle_name__`` is
-defined at the module level, and left out otherwise. This will allow
-pseudo-modules to fix pickling without breaking introspection.
+At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
+to ``"__main__"``.
-Other serialisation schemes could add support for this new attribute
-relatively easily by replacing ``x.__module__`` with ``getattr(x,
-"__pickle_module__", x.__module__)``.
+These changes on their own will fix most pickling and serialisation problems,
+but one additional change is needed to fix the problem with serialisation of
+items in ``__main__``: as a slight adjustment to the definition process for
+functions and classes, in the ``__name__ == "__main__"`` case, the module
+``__qname__`` attribute will be used to set ``__module__``.
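The resulting rule for choosing ``__module__`` at definition time fits in a few lines. This is a hypothetical helper operating on a module's globals, not interpreter code, and ``myapp.cli`` and ``myapp._impl`` are made-up module names:

```python
def definition_module_name(module_globals):
    """Hypothetical sketch of the proposed rule for setting __module__.

    In the __main__ case the module's __qname__ (its "real" name, or
    "__main__" at the interactive prompt) is used; otherwise __name__ is
    used exactly as it is today.
    """
    name = module_globals["__name__"]
    if name == "__main__":
        return module_globals.get("__qname__", name)
    return name

# Script executed directly; the interpreter has worked out its real name.
script_case = definition_module_name({"__name__": "__main__", "__qname__": "myapp.cli"})
# Interactive prompt: both attributes are "__main__".
prompt_case = definition_module_name({"__name__": "__main__", "__qname__": "__main__"})
# Pseudo-module: __name__ already points at the public namespace.
pseudo_case = definition_module_name({"__name__": "myapp", "__qname__": "myapp._impl"})

print(script_case, prompt_case, pseudo_case)  # myapp.cli __main__ myapp
```

Only the ``__main__`` case changes; pseudo-modules keep pickling against their public name.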
-``pydoc`` and ``inspect`` would also be updated to make appropriate use of
-the new attributes for any cases not already covered by the above rules for
-setting ``__module__``.
+``pydoc`` and ``inspect`` would also be updated appropriately to:
+- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
+  ``__module__`` where appropriate (e.g. ``inspect.getsource()`` would prefer
+  the qualified variants)
+- report both the public names and the qualified names for affected objects
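A sketch of the lookup order such tools might use, with hypothetical helper names: prefer the proposed ``__qmodule__`` attribute, fall back to ``__module__`` for objects that lack it, and report the public and qualified locations side by side. ``__qualname__`` is the name under which PEP 3155's qualified name shipped in Python 3.3; the ``Widget`` class and the ``mypkg`` names are illustrative:

```python
import unittest

def implementation_module(obj):
    """Prefer the proposed __qmodule__, falling back to __module__."""
    return getattr(obj, "__qmodule__", obj.__module__)

def describe(obj):
    """Return the public and the implementation location of obj."""
    return {
        "public": f"{obj.__module__}.{obj.__qualname__}",
        "qualified": f"{implementation_module(obj)}.{obj.__qualname__}",
    }

class Widget:  # stand-in for a class re-exported by a pseudo-module
    pass

Widget.__module__ = "mypkg"         # hypothetical public namespace
Widget.__qmodule__ = "mypkg._impl"  # hypothetical implementation module

print(describe(Widget))
# {'public': 'mypkg.Widget', 'qualified': 'mypkg._impl.Widget'}
print(describe(unittest.TestCase))
# {'public': 'unittest.case.TestCase', 'qualified': 'unittest.case.TestCase'}
```

The fallback means objects defined without the new attribute behave exactly as they do now.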
Fixing multiprocessing on Windows
---------------------------------
-With ``__source_name__`` now available to tell ``multiprocessing`` the real
+With ``__qname__`` now available to tell ``multiprocessing`` the real
name of the main module, it should be able to simply include it in the
serialised information passed to the child process, eliminating the
need for dubious reverse engineering of the ``__file__`` attribute.
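A minimal sketch of that hand-off, under loudly stated assumptions: ``preparation_data`` and ``prepare_child`` are hypothetical functions, not the ``multiprocessing`` API, and a fake ``__main__`` module carrying the proposed ``__qname__`` stands in for a parent started with ``python -m json.tool``. The child simply imports the main module by its real name:

```python
import importlib
import types

def preparation_data(main_module):
    """Hypothetical: the data a parent would send to a freshly spawned child.

    Uses the proposed __qname__ attribute (the main module's real name),
    falling back to __name__ when it is absent.
    """
    return {"main_module_name": getattr(main_module, "__qname__",
                                        main_module.__name__)}

def prepare_child(data):
    """Hypothetical: the child imports the parent's main module by name."""
    name = data["main_module_name"]
    if name == "__main__":
        # e.g. an interactive session: there is no importable module to load
        return None
    return importlib.import_module(name)

# Stand-in for a parent started with "python -m json.tool", whose __main__
# module would carry the proposed __qname__ = "json.tool".
parent_main = types.ModuleType("__main__")
parent_main.__qname__ = "json.tool"

child_main = prepare_child(preparation_data(parent_main))
print(child_main.__name__)  # json.tool
```

Because the child imports the module under its real name, the ``if __name__ == "__main__":`` guard in ``json.tool`` keeps its command line entry point from running a second time.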
@@ -234,7 +253,7 @@ need for dubious reverse engineering of the ``__file__`` attribute.
Reference Implementation
========================
-None as yet. I'll probably be sprinting on this after Pycon.
+None as yet.
References