python-peps/pep-0395.txt

PEP: 395
Title: Module Aliasing
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-Mar-2011
Python-Version: 3.3
Post-History: 5-Mar-2011


Abstract
========

This PEP proposes new mechanisms that eliminate some longstanding traps for
the unwary when dealing with Python's import system, the pickle module and
introspection interfaces.

It builds on the "Qualified Name" concept defined in PEP 3155.


What's in a ``__name__``?
=========================

Over time, a module's ``__name__`` attribute has come to be used to handle a
number of different tasks.

The key use cases identified for this module attribute are:

1. Flagging the main module in a program, using the ``if __name__ ==
   "__main__":`` convention.
2. As the starting point for relative imports
3. To identify the location of function and class definitions within the
   running application
4. To identify the location of classes for serialisation into pickle objects
   which may be shared with other interpreter instances


Traps for the Unwary
====================

The overloading of the semantics of ``__name__`` have resulted in several
traps for the unwary. These traps can be quite annoying in practice, as
they are highly unobvious and can cause quite confusing behaviour. A lot of
the time, you won't even notice them, which just makes them all the more
surprising when they do come up.


Importing the main module twice
-------------------------------

The most venerable of these traps is the issue of (effectively) importing
``__main__`` twice. This occurs when the main module is also imported under
its real name, effectively creating two instances of the same module under
different names.

This problem used to be significantly worse due to implicit relative imports
from the main module, but the switch to allowing only absolute imports and
explicit relative imports means this issue is now restricted to affecting the
main module itself.


Why are my relative imports broken?
-----------------------------------

PEP 366 defines a mechanism that allows relative imports to work correctly
when a module inside a package is executed via the ``-m`` switch.

Unfortunately, many users still attempt to directly execute scripts inside
packages. While this no longer silently does the wrong thing by
creating duplicate copies of peer modules due to implicit relative imports, it
now fails noisily at the first explicit relative import, even though the
interpreter actually has sufficient information available on the filesystem to
make it work properly.

<TODO: Anyone want to place bets on how many Stack Overflow links I could find
to put here if I really went looking?>


In a bit of a pickle
--------------------

Something many users may not realise is that the ``pickle`` module serialises
objects based on the ``__name__`` of the containing module. So objects
defined in ``__main__`` are pickled that way, and won't be unpickled
correctly by another python instance that only imported that module instead
of running it directly. This behaviour is the underlying reason for the
advice from many Python veterans to do as little as possible in the 
``__main__`` module in any application that involves any form of object
serialisation and persistence.

Similarly, when creating a pseudo-module\*, pickles rely on the name of the
module where a class is actually defined, rather than the officially
documented location for that class in the module hierarchy.

While this PEP focuses specifically on ``pickle`` as the principal
serialisation scheme in the standard library, this issue may also affect
other mechanisms that support serialisation of arbitrary class instances.

\*For the purposes of this PEP, a "pseudo-module" is a package designed like
the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
packages are documented as if they were single modules, but are in fact
internally implemented as a package. This is *supposed* to be an
implementation detail that users and other implementations don't need to worry
about, but, thanks to ``pickle`` (and serialisation in general), the details
are exposed and effectively become part of the public API.


Where's the source?
-------------------

Some sophisticated users of the pseudo-module technique described
above recognise the problem with implementation details leaking out via the
``pickle`` module, and choose to address it by altering ``__name__`` to refer
to the public location for the module before defining any functions or classes
(or else by modifying the ``__module__`` attributes of those objects after
they have been defined).

This approach is effective at eliminating the leakage of information via
pickling, but comes at the cost of breaking introspection for functions and
classes (as their ``__module__`` attribute now points to the wrong place).


Forkless Windows
----------------

To get around the lack of ``os.fork`` on Windows, the ``multiprocessing``
module attempts to re-execute Python with the same main module, but skipping
over any code guarded by ``if __name__ == "__main__":`` checks. It does the
best it can with the information it has, but is forced to make assumptions
that simply aren't valid whenever the main module isn't an ordinary directly
executed script or top-level module. Packages and non-top-level modules
executed via the ``-m`` switch, as well as directly executed zipfiles or
directories, are likely to make multiprocessing on Windows do the wrong thing
(either quietly or noisily) when spawning a new process.

While this issue currently only affects Windows directly, it also impacts
any proposals to provide Windows-style "clean process" invocation via the
multiprocessing module on other platforms.


Proposed Changes
================

The following changes are interrelated and make the most sense when
considered together. They collectively either completely eliminate the traps
for the unwary noted above, or else provide straightforward mechanisms for
dealing with them.

A rough draft of some of the concepts presented here was first posted on the
python-ideas list [1], but they have evolved considerably since first being
discussed in that thread.


Fixing dual imports of the main module
--------------------------------------

Two simple changes are proposed to fix this problem:

1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
   install the specified module in ``sys.modules`` under both its real name
   and the name ``__main__``. (Currently it is only installed as the latter)
2. When directly executing a module, install it in ``sys.modules`` under
   ``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
   ``__main__``.

With the main module also stored under its "real" name, attempts to import it
will pick it up from the ``sys.modules`` cache rather than reimporting it
under the new name.


Fixing direct execution inside packages
---------------------------------------

To fix this problem, it is proposed that an additional filesystem check be
performed before proceeding with direct execution of a ``PY_SOURCE`` or
``PY_COMPILED`` file that has been named on the command line.

This additional check would look for an ``__init__`` file that is a peer to
the specified file with a matching extension (either ``.py``, ``.pyc`` or
``.pyo``, depending what was passed on the command line).

If this check fails to find anything, direct execution proceeds as usual.

If, however, it finds something, execution is handed over to a
helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
in the same circumstances. That function will walk back up the
directory hierarchy from the supplied path, looking for the first directory
that doesn't contain an ``__init__`` file. Once that directory is found, it
will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
``runpy._run_module_as_main`` will be invoked with the appropriate module
name (as calculated based on the original filename and the directories
traversed while looking for a directory without an ``__init__`` file).

The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
affect this part of the proposal. For PEP 382 (with its current suggestion of
"*.pyp" package directories, this check would instead just walk up the
supplied path, looking for the first non-package directory (this would not
require any filesystem stat calls). Since PEP 402 deliberately omits explicit
directory markers, it would need an alternative approach, based on checking
the supplied path against the contents of ``sys.path``. In both cases, the
direct execution behaviour can still be corrected.


Fixing pickling without breaking introspection
----------------------------------------------

To fix this problem, it is proposed to add a new optional module level
attribute: ``__qname__``. This abbreviation of "qualified name" is taken
from PEP 3155, where it is used to store the naming path to a nested class
or function definition relative to the top level module. By default,
``__qname__`` will be the same as ``__name__``, which covers the typical
case where there is a one-to-one correspondence between the documented API
and the actual module implementation.

Functions and classes will gain a corresponding ``__qmodule__`` attribute
that refers to their module's ``__qname__``.

Pseudo-modules that adjust ``__name__`` to point to the public namespace will
leave ``__qname__`` untouched, so the implementation location remains readily
accessible for introspection.

In the main module, ``__qname__`` will automatically be set to the main
module's "real" name (as described above under the fix to prevent duplicate
imports of the main module) by the interpreter.

At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
to ``"__main__"``.

These changes on their own will fix most pickling and serialisation problems,
but one additional change is needed to fix the problem with serialisation of
items in ``__main__``: as a slight adjustment to the definition process for
functions and classes, in the ``__name__ == "__main__"`` case, the module
``__qname__`` attribute will be used to set ``__module__``.

``pydoc`` and ``inspect`` would also be updated appropriately to:
- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
  ``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
  the qualified variants)
- report both the public names and the qualified names for affected objects

Fixing multiprocessing on Windows
---------------------------------

With ``__qname__`` now available to tell ``multiprocessing`` the real
name of the main module, it should be able to simply include it in the
serialised information passed to the child process, eliminating the
need for dubious reverse engineering of the ``__file__`` attribute.


Reference Implementation
========================

None as yet.


References
==========

.. [1] Module aliases and/or "real names"
   (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00			`PEP: 395`
			`Title: Module Aliasing`
			`Version: $Revision$`
			`Last-Modified: $Date$`
			`Author: Nick Coghlan <ncoghlan@gmail.com>`
			`Status: Draft`
			`Type: Standards Track`
			`Content-Type: text/x-rst`
			`Created: 4-Mar-2011`
			`Python-Version: 3.3`
Update post history for PEP 395 2011-03-04 10:38:00 -05:00			`Post-History: 5-Mar-2011`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			`Abstract`
			`========`

			`This PEP proposes new mechanisms that eliminate some longstanding traps for`
			`the unwary when dealing with Python's import system, the pickle module and`
			`introspection interfaces.`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`It builds on the "Qualified Name" concept defined in PEP 3155.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			What's in a ``__name__``?
			`=========================`

			Over time, a module's ``__name__`` attribute has come to be used to handle a
			`number of different tasks.`

			`The key use cases identified for this module attribute are:`

			1. Flagging the main module in a program, using the ``if __name__ ==
			"__main__":`` convention.
			`2. As the starting point for relative imports`
			`3. To identify the location of function and class definitions within the`
			`running application`
			`4. To identify the location of classes for serialisation into pickle objects`
			`which may be shared with other interpreter instances`


			`Traps for the Unwary`
			`====================`

			The overloading of the semantics of ``__name__`` have resulted in several
			`traps for the unwary. These traps can be quite annoying in practice, as`
			`they are highly unobvious and can cause quite confusing behaviour. A lot of`
			`the time, you won't even notice them, which just makes them all the more`
			`surprising when they do come up.`


			`Importing the main module twice`
			`-------------------------------`

			`The most venerable of these traps is the issue of (effectively) importing`
			``__main__`` twice. This occurs when the main module is also imported under
			`its real name, effectively creating two instances of the same module under`
			`different names.`

			`This problem used to be significantly worse due to implicit relative imports`
			`from the main module, but the switch to allowing only absolute imports and`
			`explicit relative imports means this issue is now restricted to affecting the`
			`main module itself.`


			`Why are my relative imports broken?`
			`-----------------------------------`

			`PEP 366 defines a mechanism that allows relative imports to work correctly`
			when a module inside a package is executed via the ``-m`` switch.

			`Unfortunately, many users still attempt to directly execute scripts inside`
			`packages. While this no longer silently does the wrong thing by`
			`creating duplicate copies of peer modules due to implicit relative imports, it`
			`now fails noisily at the first explicit relative import, even though the`
			`interpreter actually has sufficient information available on the filesystem to`
			`make it work properly.`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`<TODO: Anyone want to place bets on how many Stack Overflow links I could find`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00			`to put here if I really went looking?>`


			`In a bit of a pickle`
			`--------------------`

			Something many users may not realise is that the ``pickle`` module serialises
			objects based on the ``__name__`` of the containing module. So objects
			defined in ``__main__`` are pickled that way, and won't be unpickled
			`correctly by another python instance that only imported that module instead`
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`of running it directly. This behaviour is the underlying reason for the`
			`advice from many Python veterans to do as little as possible in the`
			``__main__`` module in any application that involves any form of object
			`serialisation and persistence.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
			`Similarly, when creating a pseudo-module\*, pickles rely on the name of the`
			`module where a class is actually defined, rather than the officially`
			`documented location for that class in the module hierarchy.`

			While this PEP focuses specifically on ``pickle`` as the principal
			`serialisation scheme in the standard library, this issue may also affect`
			`other mechanisms that support serialisation of arbitrary class instances.`

			`\*For the purposes of this PEP, a "pseudo-module" is a package designed like`
			the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
			`packages are documented as if they were single modules, but are in fact`
			`internally implemented as a package. This is supposed to be an`
			`implementation detail that users and other implementations don't need to worry`
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			about, but, thanks to ``pickle`` (and serialisation in general), the details
			`are exposed and effectively become part of the public API.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			`Where's the source?`
			`-------------------`

			`Some sophisticated users of the pseudo-module technique described`
			`above recognise the problem with implementation details leaking out via the`
			``pickle`` module, and choose to address it by altering ``__name__`` to refer
			`to the public location for the module before defining any functions or classes`
			(or else by modifying the ``__module__`` attributes of those objects after
			`they have been defined).`

			`This approach is effective at eliminating the leakage of information via`
			`pickling, but comes at the cost of breaking introspection for functions and`
			classes (as their ``__module__`` attribute now points to the wrong place).


			`Forkless Windows`
			`----------------`

			To get around the lack of ``os.fork`` on Windows, the ``multiprocessing``
			`module attempts to re-execute Python with the same main module, but skipping`
			over any code guarded by ``if __name__ == "__main__":`` checks. It does the
			`best it can with the information it has, but is forced to make assumptions`
			`that simply aren't valid whenever the main module isn't an ordinary directly`
			`executed script or top-level module. Packages and non-top-level modules`
			executed via the ``-m`` switch, as well as directly executed zipfiles or
			`directories, are likely to make multiprocessing on Windows do the wrong thing`
			`(either quietly or noisily) when spawning a new process.`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`While this issue currently only affects Windows directly, it also impacts`
			`any proposals to provide Windows-style "clean process" invocation via the`
			`multiprocessing module on other platforms.`

PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
			`Proposed Changes`
			`================`

			`The following changes are interrelated and make the most sense when`
			`considered together. They collectively either completely eliminate the traps`
			`for the unwary noted above, or else provide straightforward mechanisms for`
			`dealing with them.`

			`A rough draft of some of the concepts presented here was first posted on the`
			`python-ideas list [1], but they have evolved considerably since first being`
			`discussed in that thread.`


			`Fixing dual imports of the main module`
			`--------------------------------------`

			`Two simple changes are proposed to fix this problem:`

			1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
			install the specified module in ``sys.modules`` under both its real name
			and the name ``__main__``. (Currently it is only installed as the latter)
			2. When directly executing a module, install it in ``sys.modules`` under
			``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
			``__main__``.

Fix a few wording nits 2011-03-04 10:46:44 -05:00			`With the main module also stored under its "real" name, attempts to import it`
			will pick it up from the ``sys.modules`` cache rather than reimporting it
			`under the new name.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			`Fixing direct execution inside packages`
			`---------------------------------------`

			`To fix this problem, it is proposed that an additional filesystem check be`
			performed before proceeding with direct execution of a ``PY_SOURCE`` or
			``PY_COMPILED`` file that has been named on the command line.

			This additional check would look for an ``__init__`` file that is a peer to
			the specified file with a matching extension (either ``.py``, ``.pyc`` or
			``.pyo``, depending what was passed on the command line).

			`If this check fails to find anything, direct execution proceeds as usual.`

			`If, however, it finds something, execution is handed over to a`
			helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
			`in the same circumstances. That function will walk back up the`
			`directory hierarchy from the supplied path, looking for the first directory`
			that doesn't contain an ``__init__`` file. Once that directory is found, it
			will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
			``runpy._run_module_as_main`` will be invoked with the appropriate module
			`name (as calculated based on the original filename and the directories`
Fix a few wording nits 2011-03-04 10:46:44 -05:00			traversed while looking for a directory without an ``__init__`` file).
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`The two current PEPs for namespace packages (PEP 382 and PEP 402) would both`
			`affect this part of the proposal. For PEP 382 (with its current suggestion of`
			`"*.pyp" package directories, this check would instead just walk up the`
			`supplied path, looking for the first non-package directory (this would not`
			`require any filesystem stat calls). Since PEP 402 deliberately omits explicit`
			`directory markers, it would need an alternative approach, based on checking`
			the supplied path against the contents of ``sys.path``. In both cases, the
			`direct execution behaviour can still be corrected.`

PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
			`Fixing pickling without breaking introspection`
			`----------------------------------------------`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`To fix this problem, it is proposed to add a new optional module level`
			attribute: ``__qname__``. This abbreviation of "qualified name" is taken
			`from PEP 3155, where it is used to store the naming path to a nested class`
			`or function definition relative to the top level module. By default,`
			``__qname__`` will be the same as ``__name__``, which covers the typical
			`case where there is a one-to-one correspondence between the documented API`
			`and the actual module implementation.`

			Functions and classes will gain a corresponding ``__qmodule__`` attribute
			that refers to their module's ``__qname__``.
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			Pseudo-modules that adjust ``__name__`` to point to the public namespace will
			leave ``__qname__`` untouched, so the implementation location remains readily
			`accessible for introspection.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			In the main module, ``__qname__`` will automatically be set to the main
Fix a few wording nits 2011-03-04 10:46:44 -05:00			`module's "real" name (as described above under the fix to prevent duplicate`
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`imports of the main module) by the interpreter.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
			to ``"__main__"``.
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`These changes on their own will fix most pickling and serialisation problems,`
			`but one additional change is needed to fix the problem with serialisation of`
			items in ``__main__``: as a slight adjustment to the definition process for
			functions and classes, in the ``__name__ == "__main__"`` case, the module
			``__qname__`` attribute will be used to set ``__module__``.
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			``pydoc`` and ``inspect`` would also be updated appropriately to:
			- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
			``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
			`the qualified variants)`
			`- report both the public names and the qualified names for affected objects`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00
			`Fixing multiprocessing on Windows`
			`---------------------------------`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			With ``__qname__`` now available to tell ``multiprocessing`` the real
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00			`name of the main module, it should be able to simply include it in the`
Fix a few wording nits 2011-03-04 10:46:44 -05:00			`serialised information passed to the child process, eliminating the`
			need for dubious reverse engineering of the ``__file__`` attribute.
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			`Reference Implementation`
			`========================`

Update the module aliasing proposal based on Antoine's new qualified names PEP 2011-10-30 02:00:10 -04:00			`None as yet.`
PEP 395: Module Aliasing, aka Do What I Mean for several import and script execution corner cases 2011-03-04 10:26:35 -05:00

			`References`
			`==========`

			`.. [1] Module aliases and/or "real names"`
			`(http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)`


			`Copyright`
			`=========`

			`This document has been placed in the public domain.`

			`..`
			`Local Variables:`
			`mode: indented-text`
			`indent-tabs-mode: nil`
			`sentence-end-double-space: t`
			`fill-column: 70`
			`End:`