242 lines
8.5 KiB
Plaintext
242 lines
8.5 KiB
Plaintext
PEP: 499
|
||
Title: ``python -m foo`` should bind ``sys.modules['foo']`` in addition to ``sys.modules['__main__']``
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Cameron Simpson <cs@cskk.id.au>, Chris Angelico <rosuav@gmail.com>, Joseph Jevnik <joejev@gmail.com>
|
||
BDFL-Delegate: Nick Coghlan
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 07-Aug-2015
|
||
Python-Version: 3.8
|
||
|
||
Abstract
|
||
========
|
||
|
||
When a module is used as a main program on the Python command line,
|
||
such as by:
|
||
|
||
python -m module.name ...
|
||
|
||
it is easy to accidentally end up with two independent instances
|
||
of the module if that module is again imported within the program.
|
||
This PEP proposes a way to fix this problem.
|
||
|
||
When a module is invoked via Python's -m option the module is bound
|
||
to ``sys.modules['__main__']`` and its ``.__name__`` attribute is set to
|
||
``'__main__'``.
|
||
This enables the standard "main program" boilerplate code at the
|
||
bottom of many modules, such as::
|
||
|
||
if __name__ == '__main__':
|
||
sys.exit(main(sys.argv))
|
||
|
||
However, when the above command line invocation is used it is a
|
||
natural inference to presume that the module is actually imported
|
||
under its official name ``module.name``,
|
||
and therefore that if the program again imports that name
|
||
then it will obtain the same module instance.
|
||
|
||
That actuality is that the module was imported only as ``'__main__'``.
|
||
Another import will obtain a distinct module instance, which can
|
||
lead to confusing bugs,
|
||
all stemming from having two instances of module global objects:
|
||
one in each module.
|
||
|
||
Examples include:
|
||
|
||
module level data structures
|
||
Some modules provide features such as caches or registries
|
||
as module level global variables,
|
||
typically private.
|
||
A second instance of a module creates a second data structure.
|
||
If that structure is a cache
|
||
such as in the ``re`` module
|
||
then two caches exist leading to wasteful memory use.
|
||
If that structure is a shared registry
|
||
such as a mapping of values to handlers
|
||
then it is possible to register a handler to one registry
|
||
and to try to use it via the other registry, where it is unknown.
|
||
|
||
sentinels
|
||
The standard test for a sentinel value provided by a module
|
||
is the identity comparison using ``is``,
|
||
as this avoids unreliable "looks like" comparisons
|
||
such as equality which can both mismatch two values as "equal"
|
||
(for example being zeroish)
|
||
or raise a ``TypeError`` when the objects are incompatible.
|
||
When there are two instances of a module
|
||
there are two sentinel instances
|
||
and only one will be recognised via ``is``.
|
||
|
||
classes
|
||
With two modules
|
||
there are duplicate class definitions of any classes provided.
|
||
All operations which depend on recognising these classes
|
||
and subclasses of these are prone to failure
|
||
depending where the reference class
|
||
(from one of the modules) is obtained
|
||
and where the comparison class or instance is obtained.
|
||
This impacts ``isinstance``, ``issubclass``
|
||
and also ``try``/``except`` constructs.
|
||
|
||
Proposal
|
||
========
|
||
|
||
It is suggested that to fix this situation all that is needed is a
|
||
simple change to the way the ``-m`` option is implemented: in addition
|
||
to binding the module object to ``sys.modules['__main__']``, it is also
|
||
bound to ``sys.modules['module.name']``.
|
||
|
||
Nick Coghlan has suggested that this is as simple as modifying the
|
||
``runpy`` module's ``_run_module_as_main`` function as follows::
|
||
|
||
main_globals = sys.modules["__main__"].__dict__
|
||
|
||
to instead be::
|
||
|
||
main_module = sys.modules["__main__"]
|
||
sys.modules[mod_spec.name] = main_module
|
||
main_globals = main_module.__dict__
|
||
|
||
Joseph Jevnik has pointed out that modules which are packages already
|
||
do something very similar to this proposal:
|
||
the __init__.py file is bound to the module's canonical name
|
||
and the __main__.py file is bound to "__main__".
|
||
As such, the double import issue does not occur.
|
||
Therefore this PEP proposes to affect only simple non-package modules.
|
||
|
||
|
||
Considerations and Prerequisites
|
||
================================
|
||
|
||
Pickling Modules
|
||
----------------
|
||
|
||
Nick has mentioned `issue 19702`_ which proposes (quoted from the issue):
|
||
|
||
- runpy will ensure that when __main__ is executed via the import
|
||
system, it will also be aliased in sys.modules as __spec__.name
|
||
- if __main__.__spec__ is set, pickle will use __spec__.name rather
|
||
than __name__ to pickle classes, functions and methods defined in
|
||
__main__
|
||
- multiprocessing is updated appropriately to skip creating __mp_main__
|
||
in child processes when __main__.__spec__ is set in the parent
|
||
process
|
||
|
||
The first point above covers this PEP's specific proposal.
|
||
|
||
|
||
A Normal Module's ``__name__`` Is No Longer Canonical
|
||
-----------------------------------------------------
|
||
|
||
Chris Angelico points out that it becomes possible to import a
|
||
module whose ``__name__`` is not what you gave to "import", since
|
||
"__main__" is now present at "module.name", so a subsequent
|
||
``import module.name`` finds it already present.
|
||
Therefore ``__name__`` is no longer the canonical name for some normal imports.
|
||
|
||
Some counter arguments follow:
|
||
|
||
- As of PEP 451 a module's canonical name is stored at ``__spec__.name``.
|
||
- Very little code should actually care about ``__name__`` being the canonical name
|
||
and any that does should arguably be updated to consult ``__spec__.name``
|
||
with fallback to ``__name__`` for older Pythons, should that be relevant.
|
||
This is true even if this PEP is not approved.
|
||
- Should this PEP be approved,
|
||
it becomes possible to introspect a module by its canonical name
|
||
and ask "was this the main program?" by inferring from ``__name__``.
|
||
This was not previously possible.
|
||
|
||
The glaring counter example is the standard "am I the main program?" boilerplate,
|
||
where ``__name__`` is expected to be "__main__".
|
||
This PEP explicitly preserves that semantic.
|
||
|
||
|
||
Reference Implementation
|
||
========================
|
||
|
||
`BPO 36375 <https://bugs.python.org/issue36375>`_ is the issue tracker entry
|
||
for the PEP's reference implementation, with the current draft PR being
|
||
available `on GitHub <https://github.com/python/cpython/pull/12490>`_.
|
||
|
||
|
||
Open Questions
|
||
==============
|
||
|
||
This proposal does raise some backwards compatibility concerns, and these will
|
||
need to be well understood, and either a deprecation process designed, or clear
|
||
porting guidelines provided.
|
||
|
||
Pickle compatibility
|
||
--------------------
|
||
|
||
If no changes are made to the pickle module, then pickles that were previously
|
||
being written with the correct module name (due to a dual import) may start
|
||
being written with `__main__` as their module name instead, and hence fail to be
|
||
loaded correctly by other projects.
|
||
|
||
Scenarios to be checked:
|
||
|
||
* `python script.py` writing, `python -m script` reading
|
||
* `python -m script` writing, `python script.py` reading
|
||
* `python -m script` writing, `python some_other_app.py` reading
|
||
* `old_python -m script` writing, `new_python -m script` reading
|
||
* `new_python -m script` writing, `old_python -m script` reading
|
||
|
||
|
||
Projects that special-case `__main__`
|
||
-------------------------------------
|
||
|
||
In order to get the regression test suite to pass, the current reference
|
||
implementation had to patch `pdb` to avoid destroying its own global namespace.
|
||
|
||
This suggests there may be a broader compatibility issue where some scripts are
|
||
relying on direct execution and import giving different namespaces (just as
|
||
package execution keeps the two separate by executing the `__main__` submodule
|
||
in the `__main__` namespace, while the package name references the `__init__`
|
||
file as usual.
|
||
|
||
|
||
Background
|
||
==========
|
||
|
||
`I tripped over this issue`_ while debugging a main program via a
|
||
module which tried to monkey patch a named module, that being the
|
||
main program module. Naturally, the monkey patching was ineffective
|
||
as it imported the main module by name and thus patched the second
|
||
module instance, not the running module instance.
|
||
|
||
However, the problem has been around as long as the ``-m`` command
|
||
line option and is encountered regularly, if infrequently, by others.
|
||
|
||
In addition to `issue 19702`_, the discrepancy around `__main__`
|
||
is alluded to in PEP 451 and a similar proposal (predating PEP 451)
|
||
is described in PEP 395 under `Fixing dual imports of the main module`_.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. _issue 19702: http://bugs.python.org/issue19702
|
||
|
||
.. _I tripped over this issue: https://mail.python.org/pipermail/python-list/2015-August/694905.html
|
||
|
||
.. _Fixing dual imports of the main module: https://www.python.org/dev/peps/pep-0395/#fixing-dual-imports-of-the-main-module
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|