PEP 534: Improve title and clarify proposal (#949)

The previous title gave the impression that this PEP was about
splitting the standard library into separately versioned pieces,
when it's actually a far more modest proposal to improve the
error messages that users receive when pieces are missing, as
well as to provide a standard library API to retrieve the full
list of expected standard library module names.

It also includes a number of other improvements that came up
during the PR review:

- explicitly limit metadata to top level modules
- be clear that private stdlib modules need to be listed
- provide more details on which modules would be optional
- use tkinter and idlelib as additional examples
- provide rationale for packaging modules being mandatory
- provide example error messages for missing submodules
- add subheadings to the design discussion section
- rename Open Questions to Deferred Ideas to avoid scope
  creep in the initial proposal
This commit is contained in:
Nick Coghlan 2019-04-07 17:45:54 +10:00 committed by GitHub
parent 1dd991e900
commit ad9c99dd8d
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 264 additions and 35 deletions

View File

@ -1,5 +1,5 @@
PEP: 534
Title: Distributing a Subset of the Standard Library
Title: Improved Errors for Missing Standard Library Modules
Version: $Revision$
Last-Modified: $Date$
Author: Tomáš Orsava <tomas.n@orsava.cz>,
@ -9,7 +9,7 @@ Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-Sep-2016
Python-Version: 3.7
Python-Version: 3.8
Post-History:
@ -17,26 +17,48 @@ Abstract
========
Python is often being built or distributed without its full standard library.
However, there is as of yet no standard, user friendly way of properly informing the user about the failure to import such missing `stdlib` modules. This PEP proposes a mechanism for identifying standard library modules and informing the user appropriately.
However, there is as of yet no standard, user friendly way of properly
informing the user about the failure to import such missing standard library
modules.
This PEP proposes a mechanism for identifying expected standard library modules
and providing more informative error messages to users when attempts to import
standard library modules fail.
Motivation
==========
There are several use cases for including only a subset of Python's standard
library. However, there is so far no user-friendly mechanism for informing the user *why* a stdlib module is missing and how to remedy the situation appropriately.
library. However, there is so far no user-friendly mechanism for informing
the user *why* a stdlib module is missing and how to remedy the situation
appropriately.
CPython
-------
When one of Python standard library modules (such as ``_sqlite3``) cannot be
compiled during a Python build because of missing dependencies (e.g. SQLite
header files), the module is simply skipped. If you then install this compiled Python and use it to try to import one of the
missing modules, Python will fail with a ModuleNotFoundError_.
When one of Python's standard library modules (such as ``_sqlite3``) cannot be
compiled during a CPython build because of missing dependencies (e.g. SQLite
header files), the module is simply skipped. If you then install this compiled
Python and use it to try to import one of the missing modules, Python will fail
with a ModuleNotFoundError_.
.. _ModuleNotFoundError:
https://docs.python.org/3.7/library/exceptions.html#ModuleNotFoundError
For example, after deliberately removing ``sqlite-devel`` from the local
system::
$ ./python -c "import sqlite3"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
from sqlite3.dbapi2 import *
File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
This can confuse users who may not understand why a cleanly built Python is
missing standard library modules.
@ -47,9 +69,10 @@ Linux and other distributions
Many Linux and other distributions are already separating out parts of the
standard library to standalone packages. Among the most commonly excluded
modules are the ``tkinter`` module, since it draws in a dependency on the
graphical environment, and the ``test`` package, as it only serves to test
Python internally and is about as big as the rest of the standard library put
together.
graphical environment, ``idlelib``, since it depends on ``tkinter`` (and most
Linux desktop environments provide their own default code editor), and the
``test`` package, as it only serves to test Python internally and is about as
big as the rest of the standard library put together.
The methods of omission of these modules differ. For example, Debian patches
the file ``Lib/tkinter/__init__.py`` to envelop the line ``import _tkinter`` in
@ -58,46 +81,247 @@ the following to the error message: ``please install the python3-tk package``
[#debian-patch]_. Fedora and other distributions simply don't include the
omitted modules, potentially leaving users baffled as to where to find them.
An example from Fedora 29::
$ python3 -c "import tkinter"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'tkinter'
Specification
=============
The `sysconfig`_ module will be extended by two functions: `sysconfig.get_stdlib_modules()`, which will provide a list of the names of all Python standard library modules, and a function `sysconfig.get_optional_modules()`, that will list optional `stdlib` module names. The results of the latter function—`sysconfig.get_optional_modules()`—as well as of the existing `sys.builtin_module_names` will both be subsets of the full list provided by `sysconfig.get_stdlib_modules()`. These added lists will be generated during the Python build process and saved in `_sysconfigdata-*.py` file along with other `sysconfig`_ values.
APIs to list expected standard library modules
----------------------------------------------
.. _`sysconfig`: https://docs.python.org/3.7/library/sysconfig.html
To allow for easier identification of which module names are *expected* to be
resolved in the standard library, the `sysconfig`_ module will be extended
with two additional functions:
The default implementation of the `sys.excepthook`_ function will then be modified to dispense an appropriate message when it detects a failure to import a module identified by one of the two new `sysconfig`_ functions as belonging to the Python standard library.
* ``sysconfig.get_stdlib_modules()``, which will provide a list of the names of
all top level Python standard library modules (including private modules)
* ``sysconfig.get_optional_modules()``, which will list optional public top level
standard library module names
.. _`sys.excepthook`: https://docs.python.org/3.7/library/sys.html#sys.excepthook
The results of ``sysconfig.get_optional_modules()``and the existing
``sys.builtin_module_names`` will both be subsets of the full list provided by
the new ``sysconfig.get_stdlib_modules()`` function.
These added lists will be generated during the Python build process and saved in
the ``_sysconfigdata-*.py`` file along with other `sysconfig`_ values.
Possible reasons for modules being in the "optional" list will be:
* the module relies on an optional build dependency (e.g. ``_sqlite3``,
``tkinter``, ``idlelib``)
* the module is private for other reasons and hence may not be present on all
implementations (e.g. ``_freeze_importlib``, ``_collections_abc``)
* the module is platform specific and hence may not be present in all
installations (e.g. ``winreg``)
* the ``test`` package may also be freely omitted from Python runtime
installations, as it is intended for use in testing Python implementations,
not as a runtime library for Python projects to use (the public API offering
testing utilities is ``unittest``)
(Note: the ``ensurepip``, ``venv``, and ``distutils`` modules are all considered
mandatory modules in this PEP, even though not all redistributors currently
adhere to that practice)
.. _`sysconfig`: https://docs.python.org/3/library/sysconfig.html
Rationale
=========
Changes to the default ``sys.excepthook`` implementation
--------------------------------------------------------
The `sys.excepthook`_ function gets called when a raised exception is uncaught and the program is about to exit or—in an interactive session—the control is being returned to the prompt. This makes it a perfect place for customized error messages, as it will not influence caught errors and thus not slow down normal execution of Python scripts.
The default implementation of the `sys.excepthook`_ function will then be
modified to dispense an appropriate message when it detects a failure to
import a module identified by one of the two new `sysconfig`_ functions as
belonging to the Python standard library.
The inclusion of the functions `sysconfig.get_stdlib_modules()` and `sysconfig.get_optional_modules()` will also provide a long sought-after way of easily listing the names of Python standard library modules [#stackoverflow-stdlib]_, which will—among other benefits—make it easier for code analysis, profiling, and error reporting tools to offer runtime `--ignore-stdlib` flags.
.. _`sys.excepthook`: https://docs.python.org/3/library/sys.html#sys.excepthook
Ideas leading up to this PEP were discussed on the `python-dev mailing list`_ and subsequently on `python-ideas`_.
Revised error message for a module that relies on an optional build dependency
or is otherwise considered optional when Python is installed::
.. _`python-dev mailing list`:
https://mail.python.org/pipermail/python-dev/2016-July/145534.html
.. _`python-ideas`:
https://mail.python.org/pipermail/python-ideas/2016-December/043907.html
$ ./python -c "import sqlite3"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
from sqlite3.dbapi2 import *
File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ModuleNotFoundError: Optional standard library module '_sqlite3' was not found
Revised error message for a submodule of an optional top level package when the
entire top level package is missing::
$ ./python -c "import test.regrtest"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: Optional standard library module 'test' was not found
Revised error message for a submodule of an optional top level package when the
top level package is present::
$ ./python -c "import test.regrtest"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No submodule named 'test.regrtest' in optional standard library module 'test'
Revised error message for a module that is always expected to be available::
$ ./python -c "import ensurepip"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: Standard library module 'ensurepip' was not found
Revised error message for a missing submodule of a standard library package when
the top level package is present::
$ ./python -c "import encodings.mbcs"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No submodule named 'encodings.mbcs' in standard library module 'encodings'
These revised error messages make it clear that the missing modules are expected
to be available from the standard library, but are not available for some reason,
rather than being an indicator of a missing third party dependency in the current
environment.
Design Discussion
=================
Modifying ``sys.excepthook``
----------------------------
The `sys.excepthook`_ function gets called when a raised exception is uncaught
and the program is about to exit or (in an interactive session) the control is
being returned to the prompt. This makes it a perfect place for customized
error messages, as it will not influence caught errors and thus not slow down
normal execution of Python scripts.
Public API to query expected standard library module names
----------------------------------------------------------
The inclusion of the functions ``sysconfig.get_stdlib_modules()`` and
``sysconfig.get_optional_modules()`` will provide a long sought-after
way of easily listing the names of Python standard library modules
[#stackoverflow-stdlib]_, which will (among other benefits) make it easier for
code analysis, profiling, and error reporting tools to offer runtime
``--ignore-stdlib`` flags.
Only including top level module names
-------------------------------------
This PEP proposes that only top level module and package names be reported by
the new query APIs. This is sufficient information to generate the proposed
error messages, reduces the number of required entries by an order of magnitude,
and simplifies the process of generating the related metadata during the build
process.
If this is eventually found to be overly limiting, a new ``include_submodules``
flag could be added to the query APIs. However, this is *not* part of the initial
proposal, as the benefits of doing so aren't currently seen as justifying the
extra complexity.
There is one known consequence of this restriction, which is that the new
default ``excepthook`` implementation will report incorrect submodules names the
same way that it reports genuinely missing standard library submodules::
$ ./python -c "import unittest.muck"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No submodule named 'unittest.muck' in standard library module 'unittest'
Listing private top level module names as optional standard library modules
---------------------------------------------------------------------------
Many of the modules that have an optional external build dependency are written
as hybrid modules, where there is a shared Python wrapper around an
implementation dependent interface to the underlying external library. In other
cases, a private top level module may simply be a CPython implementation detail,
and other implementations may not provide that module at all.
To report import errors involving these modules appropriately, the new default
``excepthook`` implementation needs them to be reported by the new query APIs.
Deeming packaging related modules to be mandatory
-------------------------------------------------
Some redistributors aren't entirely keen on installing the Python specific
packaging related modules (``distutils``, ``ensurepip``, ``venv``) by default,
preferring that developers use their platform specific tooling instead.
This approach causes interoperability problems for developers working on
cross-platform projects and educators attempting to write platform independent
setup instructions, so this PEP takes the view that these modules should be
considered mandatory, and left out of the list of optional modules.
Deferred Ideas
==============
The ideas in this section are concepts that this PEP would potentially help
enable, but they're considered out of scope for the initial proposal.
Platform dependent modules
--------------------------
Some standard library modules may be missing because they're only provided on
particular platforms. For example, the ``winreg`` module is only available on
Windows::
$ python3 -c "import winreg"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'winreg'
In the current proposal, these platform dependent modules will simply be
included with all the other optional modules rather than attempting to expose
the platform dependency information in a more structured way.
However, the platform dependence is at least tracked at the level of "Windows",
"Unix", "Linux", and "FreeBSD" for the benefit of `the documentation`_, so it
seems plausible that it could potentially be exposed programmatically as well.
.. _the documentation: https://docs.python.org/3/py-modindex.html
Emitting a warning when ``__main__`` shadows a standard library module
----------------------------------------------------------------------
Given the new query APIs, the new default ``excepthook`` implementation could
potentially detect when ``__main__.__file__`` or ``__main__.__spec__.name``
match a standard library module, and emit a suitable warning.
However, actually doing anything along this lines should review more cases where
uses actually encounter this problem, and the various options for potentially
offering more information to assist in debugging the situation, rather than
needing to be incorporated right now.
Recommendation for Downstream Distributors
==========================================
By patching `site.py`_ [*]_ to provide their own implementation of the `sys.excepthook`_ function, Python distributors can display tailor-made error messages for any uncaught exceptions, including informing the user of a proper, distro-specific way to install missing standard library modules upon encountering a `ModuleNotFoundError`_. Some downstream distributors are already using this method of patching `sys.excepthook` to integrate with platform crash reporting mechanisms.
By patching `site.py`_ [*]_ to provide their own implementation of the
`sys.excepthook`_ function, Python distributors can display tailor-made
error messages for any uncaught exceptions, including informing the user of
a proper, distro-specific way to install missing standard library modules upon
encountering a `ModuleNotFoundError`_.
Some downstream distributors are already using this method of patching
`sys.excepthook` to integrate with platform crash reporting mechanisms.
.. _`site.py`: https://docs.python.org/3.7/library/site.html
.. _`sitecustomize.py`: `site.py`_
An `example implementation`_ is attached to this PEP.
.. _`example implementation`: `Reference and Example Implementation`_
Backwards Compatibility
=======================
@ -111,12 +335,8 @@ Reference and Example Implementation
====================================
TBD. The finer details will depend on what's practical given the capabilities
of the build system.
.. Reference implementation can be found on `GitHub`_ and is also accessible in the form of a `patch`_.
.. _`GitHub`: https://github.com/torsava/cpython/pull/1
.. _`patch`: https://github.com/torsava/cpython/pull/1.patch
of the CPython build system (other implementations should then be able to use
the generated CPython data, rather than having to regenerate it themselves).
Notes and References
@ -130,6 +350,15 @@ Notes and References
http://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules
Ideas leading up to this PEP were discussed on the `python-dev mailing list`_
and subsequently on `python-ideas`_.
.. _`python-dev mailing list`:
https://mail.python.org/pipermail/python-dev/2016-July/145534.html
.. _`python-ideas`:
https://mail.python.org/pipermail/python-ideas/2016-December/043907.html
Copyright
=========