PEP: 648
Title: Extensible customizations of the interpreter at startup
Author: Mario Corchero <mariocj89@gmail.com>
Sponsor: Pablo Galindo
BDFL-Delegate: XXXX
Discussions-To: https://discuss.python.org/t/pep-648-extensible-customizations-of-the-interpreter-at-startup/6403
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Dec-2020
Python-Version: 3.11
Post-History: python-ideas: 16th Dec. python-dev: 18th Dec.
Abstract
========
This PEP proposes supporting extensible customization of the interpreter by
allowing users to install files that will be executed at startup.
Motivation
==========
System administrators, tools that repackage the interpreter and some
libraries need to customize aspects of the interpreter at startup time.
This is usually achieved via ``sitecustomize.py`` for system administrators
whilst libraries rely on exploiting ``pth`` files. This PEP proposes a way of
achieving the same functionality in a more user-friendly and structured way.
Limitations of ``pth`` files
----------------------------
If a library needs to perform any customization before an import or that
relates to the general working of the interpreter, they often rely on the
fact that ``pth`` files, which are loaded at startup and implemented via the
site module [#site]_, can include Python code that will be executed when the
``pth`` file is evaluated.
Note that ``pth`` files were originally developed to just add additional
directories to ``sys.path``, but they may also contain lines which start
with "import", which will be passed to ``exec()``. Users have exploited this
feature to allow the customizations that they needed. See setuptools
[#setuptools]_ or betterexceptions [#betterexceptions]_ as examples.
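
As a rough illustration of the behaviour described above, the following
sketch classifies the lines of a ``pth`` file: lines starting with ``import``
are treated as code, and other non-empty, non-comment lines as path entries.
The helper name ``classify_pth_line`` and the sample lines are hypothetical,
and the logic is a simplification rather than the actual ``site`` module code.

.. code-block:: python

    def classify_pth_line(line):
        """Return "comment", "blank", "code" or "path" for one pth line.

        Hypothetical helper; a simplified view of how pth lines are treated.
        """
        if line.startswith("#"):
            return "comment"
        if not line.strip():
            return "blank"
        # Lines starting with "import" plus a space or tab are executed.
        if line.startswith(("import ", "import\t")):
            return "code"
        # Anything else is treated as a directory to add to sys.path.
        return "path"


    # Hypothetical pth file content, one entry per line.
    lines = [
        "# added by an installer",
        "../shared-libs",
        "import some_hook_module",
    ]
    kinds = [classify_pth_line(line) for line in lines]

Only the last line would run code, yet a reader has to scan every line of
every ``pth`` file to find that out.
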
Using ``pth`` files for this purpose is far from ideal for library developers,
as they need to inject code into a single line preceded by an import, making
it rather unreadable. Library developers following that practice will usually
create a module that performs all actions on import, as done by
betterexceptions [#betterexceptions]_, but the approach is still not really
user friendly.
Additionally, it is also non-ideal for users of the interpreter who want
to inspect what is being executed at Python startup, as they need to review
all the ``pth`` files for potential code execution, which can be spread across
all site paths. Most of those ``pth`` files will be "legitimate" ``pth``
files that just modify the path, making the question of "what is changing
my interpreter at startup" a rather complex one to answer.
Lastly, there have been multiple suggestions for removing code execution from
``pth`` files, see [#bpo-24534]_ and [#bpo-33944]_.
Limitations of ``sitecustomize.py``
-----------------------------------
Whilst sitecustomize is an acceptable solution, it assumes a single person is
in charge of the system and the interpreter. If both the system administrator
and the party responsible for provisioning the interpreter want to add
customizations at interpreter startup, they need to agree on the contents
of the file and combine all the changes. This is not a major limitation
though, and it is not the main driver of this change. Should the change
happen, it will also improve the situation for these users, as rather than
having a ``sitecustomize.py`` which performs all those actions, they can have
custom isolated files named after the features they want to enhance. As an
example, Ubuntu could change their current ``sitecustomize.py`` to just be
``ubuntu_apport_python_hook``. This not only better represents its intent but
also gives users of the interpreter a better understanding of the
modifications happening on their interpreter.
Rationale
=========
This PEP proposes supporting extensible customization of the interpreter at
startup by executing all files discovered in directories named
``__sitecustomize__`` in sitepackages [#sitepackages-api]_ or
usersitepackages [#usersitepackages-api]_ at startup time.
Why ``__sitecustomize__``
-------------------------
The name aims to follow the already existing concept of ``sitecustomize.py``.
As the directory will be within ``sys.path``, given that it is located in
site paths, we choose to use double underscore around its name, to prevent
colliding with the already existing ``sitecustomize.py``.
Discovering the new ``__sitecustomize__`` directories
-----------------------------------------------------
The Python interpreter will look at startup for a directory named
``__sitecustomize__`` within any of the standard site-packages paths.
These are commonly the Python system location and the user location, but are
ultimately defined by the site module logic.
Users can use ``site.getsitepackages`` [#sitepackages-api]_ and
``site.getusersitepackages`` [#usersitepackages-api]_ to know the paths where
the interpreter can discover ``__sitecustomize__`` directories.
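
A minimal sketch of how a user could list the candidate locations, assuming
the standard ``site.getsitepackages()`` and ``site.getusersitepackages()``
functions. The helper name ``candidate_sitecustomize_dirs`` is hypothetical
and the directories it returns may not exist.

.. code-block:: python

    import os
    import site


    def candidate_sitecustomize_dirs():
        """List the paths where a __sitecustomize__ directory could live."""
        paths = list(site.getsitepackages())
        # The user site is only considered when it is enabled.
        if site.ENABLE_USER_SITE:
            paths.append(site.getusersitepackages())
        return [os.path.join(path, "__sitecustomize__") for path in paths]


    candidates = candidate_sitecustomize_dirs()
    for directory in candidates:
        print(directory, "(exists)" if os.path.isdir(directory) else "(missing)")
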
Time of ``__sitecustomize__`` discovery
---------------------------------------
The ``__sitecustomize__`` directories will be discovered exactly after ``pth``
files are discovered in a site-packages path as part of ``site.addsitedir``
[#siteaddsitedir]_.
This is repeated for each of the site-packages paths in the exact same order
that is being followed today for ``pth`` files.
Order of execution within ``__sitecustomize__``
-----------------------------------------------
The implementation will execute the files within ``__sitecustomize__`` by
sorting them by name when discovering each of the ``__sitecustomize__``
directories. We discourage users from relying on the order of execution though.
We considered executing them in random order, but that could result in
different results depending on how the interpreter chooses to pick up those
files. So even if it won't be a good practice to rely on other files being
executed, we think that is better than having randomly different results on
interpreter startup. We chose to run the files after the ``pth`` files in
case a user needs to add items to the path before running a file.
Interaction with ``pth`` files
------------------------------
``pth`` files can be used to add paths into ``sys.path``, but this should not
affect the ``__sitecustomize__`` discovery process, as those directories are
looked up exclusively in site-packages paths.
Execution of files within ``__sitecustomize__``
-----------------------------------------------
When a ``__sitecustomize__`` directory is discovered, all of the files that
have a ``.py`` extension within it will be read with ``io.open_code`` and
executed by using ``exec`` [#exec]_.
An empty dictionary will be passed as ``globals`` to the ``exec`` function
to prevent unexpected interactions between different files.
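
The steps above can be sketched as follows. This is an illustration under
stated assumptions, not the reference implementation: the function name
``run_sitecustomize_dir`` and the demo file names are hypothetical, and error
handling is left out on purpose.

.. code-block:: python

    import io
    import os
    import tempfile


    def run_sitecustomize_dir(directory):
        """Execute every .py file in directory, sorted by name."""
        executed = []
        for name in sorted(os.listdir(directory)):
            if not name.endswith(".py"):
                continue
            # io.open_code expects an absolute path and returns a binary file.
            path = os.path.abspath(os.path.join(directory, name))
            with io.open_code(path) as f:
                source = f.read()
            # A fresh, empty globals dict per file keeps the files isolated.
            exec(compile(source, path, "exec"), {})
            executed.append(name)
        return executed


    # Demo on a temporary directory standing in for __sitecustomize__.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b_second.py", "a_first.py", "notes.txt"):
            with open(os.path.join(tmp, name), "w") as f:
                f.write("x = 1\n")
        executed = run_sitecustomize_dir(tmp)

Because each file receives its own empty dictionary, the name ``x`` defined
by one file is not visible to the next one.
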
Failure handling
----------------
The details of any error raised while executing one of the files will not be
logged unless the interpreter is run in verbose mode, and the error should not
stop the evaluation of other files. The user will receive a message in stderr
saying that the file failed to be executed and that verbose mode can be used to
get more information. This behaviour mimics the one existing for
``sitecustomize.py``.
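
A sketch of this failure handling, assuming a hypothetical helper
``run_file_safely``. The exact wording of the message is an assumption; only
the overall behaviour (short message by default, traceback in verbose mode,
no propagation of the error) follows the description above.

.. code-block:: python

    import sys
    import traceback


    def run_file_safely(path, source):
        """Execute source; report failures on stderr without raising."""
        try:
            exec(compile(source, path, "exec"), {})
        except Exception as exc:
            if sys.flags.verbose:
                # Verbose mode: show the full traceback.
                traceback.print_exc()
            else:
                # Default: a short hint only (message text is an assumption).
                print(
                    f"Error executing {path}: {type(exc).__name__}; "
                    "run in verbose mode for the traceback",
                    file=sys.stderr,
                )
            return False
        return True


    # The failing file does not prevent the next one from running.
    results = [
        run_file_safely("broken.py", "1 / 0"),
        run_file_safely("fine.py", "x = 1"),
    ]
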
Interaction with virtual environments
-------------------------------------
The customizations applied to an interpreter via the new
``__sitecustomize__`` solution will continue to work when a user creates a
virtual environment, the same way that ``sitecustomize.py``
interacts with virtual environments.
This is a difference when compared to ``pth`` files, which are not propagated
into virtual environments unless ``include-system-site-packages`` is enabled.
If library maintainers have features installed via ``__sitecustomize__`` that
they do not want to propagate into virtual environments, they should detect
whether they are running within a virtual environment by comparing
``sys.prefix`` with ``sys.base_prefix``, which differ inside a virtual
environment. This behavior is similar to packages that modify the global
``sitecustomize.py``.
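
For example, a file placed in ``__sitecustomize__`` could guard its logic as
sketched below. The helper name ``in_virtual_environment`` is hypothetical;
``sys.prefix`` and ``sys.base_prefix`` are equal only outside a virtual
environment.

.. code-block:: python

    import sys


    def in_virtual_environment():
        """Return True when running inside a virtual environment."""
        # In a virtual environment sys.prefix points at the environment,
        # while sys.base_prefix points at the base installation.
        return sys.prefix != sys.base_prefix


    # Apply the customization only for the base interpreter.
    apply_customization = not in_virtual_environment()
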
Interaction with ``sitecustomize.py`` and ``usercustomize.py``
--------------------------------------------------------------
Until removed, ``sitecustomize`` and ``usercustomize`` will be executed after
``__sitecustomize__``, similar to ``pth`` files. See the Backward compatibility
section for information on removal plans for ``sitecustomize`` and
``usercustomize``.
Identifying all installed files
-------------------------------
To facilitate debugging of the Python startup, if the site module is invoked
it will print the ``__sitecustomize__`` directories that will be discovered
on startup.
Files naming convention
-----------------------
Packages will be encouraged to include the name of the package within the
name of the file to avoid collisions between packages. The only
requirement on the filename is that it ends in ``.py`` for the interpreter to
execute it.
Disabling start files
---------------------
In some scenarios, like when the startup time is key, it might be desired to
disable this option altogether. The already existing flag ``-S`` [#s-flag]_
will disable all ``site``-related manipulation, including this new feature.
If the flag is passed in, ``__sitecustomize__`` directories will not be
discovered.
Additionally, to allow starting the interpreter with only this new feature
disabled, a new option will be added under ``-X``: ``disablesitecustomize``,
which will disable the discovery of ``__sitecustomize__`` exclusively.
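
As a sketch of how such an option could be checked from Python code: CPython
stores ``-X`` options in the implementation-specific ``sys._xoptions``
mapping. The ``disablesitecustomize`` option is only proposed here and is not
recognised by any released interpreter, and the helper name
``sitecustomize_disabled`` is hypothetical.

.. code-block:: python

    import sys


    def sitecustomize_disabled(xoptions=None):
        """Return True if the proposed -X disablesitecustomize option is set."""
        if xoptions is None:
            # sys._xoptions is CPython-specific; fall back to an empty mapping.
            xoptions = getattr(sys, "_xoptions", {})
        return "disablesitecustomize" in xoptions


    # Example with explicit mappings, mirroring "-X disablesitecustomize".
    with_flag = sitecustomize_disabled({"disablesitecustomize": True})
    without_flag = sitecustomize_disabled({})
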
Lastly, the user can disable the discovery of ``__sitecustomize__``
directories only in the user site by disabling the user site via any of the
multiple options in the ``site.py`` module.
Support in build backends
-------------------------
Whilst build backends can choose to provide an option to facilitate the
installation of these files into a ``__sitecustomize__`` directory, this
PEP does not address that directly. Similar to ``pth`` files, build backends
can choose to not provide an easy-to-configure mechanism for
``__sitecustomize__`` files and let users hook into the installation
process to include such files. We do not consider enhanced build backend
support a requirement for this PEP.
Impact on startup time
----------------------
A concern in this implementation is how Python interpreter startup time can
be affected by this addition. We expect the performance impact to be highly
coupled to the logic in the files that a user or sysadmin installs in the
Python environment being tested.
If the interpreter has any files in their ``__sitecustomize__`` directory,
the file execution time plus a call reading the code will be added to the
startup time. This is similar to how code execution is impacting startup time
through ``sitecustomize.py``, ``usercustomize.py`` and code in ``pth`` files.
We will therefore focus here on comparing this solution against those three,
as otherwise the actual time added to startup is highly dependent on the code
that is being executed in those files.
Results were gathered by running ``./python.exe -c pass`` with perf on 50
iterations, repeating the command 50 times on each iteration and taking the
geometric mean of all the results. The file used to run those benchmarks is
included in the reference implementation [#reference-implementation]_.
The benchmark was run with 3.10 alpha 7 compiled with PGO and LTO with the
following parameters and system state:
- Perf event: Max sample rate set to 1 per second
- CPU Frequency: Minimum frequency of CPU 17,35 set to the maximum frequency
- Turbo Boost (MSR): Turbo Boost disabled on CPU 17: MSR 0x1a0 set to 0x4000850089
- IRQ affinity: Set default affinity to CPU 0-16,18-34
- IRQ affinity: Set affinity of IRQ 1,3-16,21,25-31,56-59,68-85,87,89-90,92-93,95-104 to CPU 0-16,18-34
- CPU: use 2 logical CPUs: 17,35
- Perf event: Maximum sample rate: 1 per second
- ASLR: Full randomization
- Linux scheduler: Isolated CPUs (2/36): 17,35
- Linux scheduler: RCU disabled on CPUs (2/36): 17,35
- CPU Frequency: 0-16,18-34=min=1200 MHz, max=3600 MHz; 17,35=min=max=3600 MHz
- Turbo Boost (MSR): CPU 17,35: disabled
The code placed to be executed in ``pth`` files, ``sitecustomize.py``,
``usercustomize.py`` and files within ``__sitecustomize__`` is the following::

    import time; x = time.time() ** 5

The file is aimed at executing a simple operation whose cost is still expected
to be negligible. This puts the experiment in a situation where any performance
hit due to the mechanism is visible whilst keeping it relatively realistic.
Additionally, it starts with an import and is a single line so that it can be
used in ``pth`` files.

====  ====================  ====================  =======  =====================  ======  ======
Test  # of files                                                                  Time (us)
----  --------------------------------------------------------------------------  --------------
#     ``sitecustomize.py``  ``usercustomize.py``  ``pth``  ``__sitecustomize__``  Run 1   Run 2
====  ====================  ====================  =======  =====================  ======  ======
1     0                     0                     0        Dir not created        13884   13897
2     0                     0                     0        0                      13871   13818
3     0                     0                     1        0                      13964   13924
4     0                     0                     0        1                      13940   13939
5     1                     1                     0        0                      13990   13993
6     0                     0                     0        2 (system + user)      14063   14040
7     0                     0                     50       0                      16011   16014
8     0                     0                     0        50                     15456   15448
====  ====================  ====================  =======  =====================  ======  ======

Results can be reproduced with ``run-benchmark.py`` script provided in the
reference implementation [#reference-implementation]_.
We interpret the following from these results:

- Using two ``__sitecustomize__`` scripts compared to ``sitecustomize.py``
  and ``usercustomize.py`` slows down the interpreter by 0.3%. We expect this
  slowdown until ``sitecustomize.py`` and ``usercustomize.py`` are removed in
  a future release, as even if the user does not create the files, the
  interpreter will still attempt to import them.
- With the arbitrary 50 ``pth`` files with code tested, moving those to
  ``__sitecustomize__`` produces a speedup of ~3.5% in startup, which is likely
  related to the simpler logic to evaluate ``__sitecustomize__`` files compared
  to ``pth`` file execution.
- In general, all measurements show that there is a low impact on startup time
  with this addition.

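
The relative differences can be rechecked from the reported timings with a
few lines of arithmetic. This is only a sketch: the variable names are
hypothetical, the numbers are copied from tests 5 to 8 of the results table,
and the exact figure obtained depends on which run is compared.

.. code-block:: python

    # Startup times in microseconds as (run 1, run 2), from the results table.
    two_legacy_files = (13990, 13993)  # test 5: sitecustomize + usercustomize
    two_new_files = (14063, 14040)     # test 6: two __sitecustomize__ files
    fifty_pth_files = (16011, 16014)   # test 7: 50 pth files
    fifty_new_files = (15456, 15448)   # test 8: 50 __sitecustomize__ files


    def relative_change(new, old):
        """Return (new - old) / old for each pair of runs."""
        return [(n - o) / o for n, o in zip(new, old)]


    # Positive values mean the new mechanism is slower.
    slowdown = relative_change(two_new_files, two_legacy_files)
    # Negated so that positive values mean the new mechanism is faster.
    speedup = [-c for c in relative_change(fifty_new_files, fifty_pth_files)]

    print([f"{value:.2%}" for value in slowdown])
    print([f"{value:.2%}" for value in speedup])

Under this reading, the slowdown stays at a fraction of a percent in both
runs and the speedup is about 3.5%.
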
Audit Event
-----------
A new audit event will be added and triggered on ``__sitecustomize__``
execution to facilitate security inspection by calling ``sys.audit``
[#sysaudit]_ with "sitecustomize.exec_file" as name and the filename as
argument.
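
A sketch of how a security tool could observe such an event with the
existing ``sys.addaudithook`` API. The event name is spelled
``sitecustomize.exec_file`` here, which is an assumption about the intended
spelling; the hook name and the file path are hypothetical, and the
``sys.audit`` call only simulates what the interpreter would do.

.. code-block:: python

    import sys

    # Assumed spelling of the proposed event name.
    EVENT_NAME = "sitecustomize.exec_file"
    seen_files = []


    def record_startup_files(event, args):
        """Audit hook that records the files reported by the proposed event."""
        if event == EVENT_NAME:
            seen_files.append(args[0])


    sys.addaudithook(record_startup_files)

    # Simulate the interpreter reporting one executed file (hypothetical path).
    sys.audit(EVENT_NAME, "/path/to/__sitecustomize__/example.py")

Note that audit hooks cannot be removed once added, so a real tool would
install its hook as early as possible and keep it lightweight.
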
Security implications
---------------------
This PEP aims to move all code execution from ``pth`` files to files within a
``__sitecustomize__`` directory. We think this is an improvement for system
admins for the following reasons:

* It allows quickly identifying the code being executed at startup time by the
  interpreter by looking into a single directory rather than having to scan
  all ``pth`` files.
* It allows tracking usage of this feature through the newly proposed audit
  event.
* It gives finer-grained control by allowing permissions to be tuned on the
  ``__sitecustomize__`` directory, potentially allowing users to install only
  packages that do not change the interpreter startup.

In short, whilst this allows a malicious user to drop a file that will
be executed at startup, it's an improvement compared to the existing ``pth``
files.
How to teach this
=================
This can be documented and taught simply by saying that the interpreter
will look for the ``__sitecustomize__`` directory at startup in its
site paths and, if it finds any files with a ``.py`` extension, it will then
execute them one by one.
For system administrators and tools that package the interpreter, we can now
recommend placing files in ``__sitecustomize__`` as they used to place
``sitecustomize.py``. They can be more confident that their content won't be
overridden by the next person, as they can provide specific files to
handle the logic they want to customize.
Library developers should be able to specify a new argument on tools like
setuptools that will inject those new files. Something like
``sitecustomize_files=["scripts/betterexceptions.py"]``, which allows them to
add those. Should the build backend not support that, they can manually
install them as they used to do with ``pth`` files. We will recommend that
they include the name of the package as part of the file's name.
Backward compatibility
======================
This PEP adds a deprecation warning on ``sitecustomize.py``,
``usercustomize.py`` and ``pth`` code execution in 3.11, 3.12 and 3.13, with
plans to remove those features by 3.14. The migration from those solutions
to ``__sitecustomize__`` should ideally be just moving the logic into a
different file.
Whilst the existing ``sitecustomize.py`` mechanism was created targeting
System Administrators that placed it in a site path, the file could be
actually placed anywhere in the path at the time that the interpreter was
starting up. The new mechanism does not allow for users to place
``__sitecustomize__`` directories anywhere in the path, but only in site
paths. System administrators can recover a similar behavior to
``sitecustomize.py`` by adding a custom file in ``__sitecustomize__`` which
just imports ``sitecustomize`` as a migration path.
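
Such a migration file could be as small as the sketch below. The file name in
the comment is hypothetical, and the ``legacy_loaded`` flag exists only to
make the outcome visible.

.. code-block:: python

    # Hypothetical file: <site-packages>/__sitecustomize__/legacy_sitecustomize.py
    try:
        # Imported for its side effects, wherever it lives on sys.path.
        import sitecustomize  # noqa: F401
        legacy_loaded = True
    except ImportError:
        # No legacy sitecustomize module is available; nothing to migrate.
        legacy_loaded = False
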
Reference Implementation
========================
An initial implementation that passes the CPython test suite is available for
evaluation [#reference-implementation]_.
This implementation is just for the reviewer to play with and check potential
issues that this PEP could generate.
Rejected Ideas
==============
Do nothing
----------
Whilst the current status "works" it presents the issues listed in the
motivation. After analyzing the impact of this change, we believe it is worth
it, given the enhanced experience it brings.
Formalize using ``pth`` files
-----------------------------
Another option would be to just glorify and document the usage of ``pth`` files
to inject code at startup, but that is a suboptimal experience for users
as listed in the motivation.
Making ``__sitecustomize__`` a namespace package
------------------------------------------------
We considered making the directory a namespace package and just importing all
the modules within it, which would have allowed searching across all paths in
``sys.path`` at initialization time and provided a way to declare
dependencies between files by importing each other. This was rejected for
multiple reasons:

1. This was unnecessarily broadening the list of paths where arbitrary files
   are executed.
2. The logic brought additional complexity, like what to do if a package were
   to install an ``__init__.py`` file in one of the locations.
3. It's cheaper to search for ``__sitecustomize__`` as we are looking for
   ``pth`` files already in the site paths compared to performing an actual
   import of a namespace package.

Support for shutdown customization
----------------------------------
``init.d`` users might be tempted to implement this feature in a way that users
could also add code at shutdown, but extra support for that is not needed, as
Python users can already do that via ``atexit``.
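
For instance, a startup file can already register shutdown logic through the
standard ``atexit`` module, as in this minimal sketch (the function and list
names are hypothetical):

.. code-block:: python

    import atexit

    shutdown_log = []


    def on_shutdown():
        """Run when the interpreter exits normally."""
        shutdown_log.append("interpreter exiting")


    # Registered functions are called at normal interpreter termination.
    atexit.register(on_shutdown)
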
Using entry_points
------------------
We considered extending the use of entry points to allow specifying files
that should be executed at startup, but we discarded that solution for two
main reasons. The first is the impact on startup time: this approach would
require scanning the distribution information of all packages just to execute
a handful of files. This has an impact on performance even if the user is not
using the feature, and such impact grows linearly with the number of packages
installed in the environment. The second reason was that the proposed
implementation in this PEP offers a single solution for startup customization
for packages and system administrators. Additionally, if the main objective of
entry points is to make it easy for libraries to install files at startup,
that can still be added and make the build backends just install the files
within the ``__sitecustomize__`` directory.
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.
Acknowledgements
================
Thanks to Pablo Galindo for contributing to this PEP and offering his PC to run
the benchmark.
References
==========
.. [#bpo-24534]
   https://bugs.python.org/issue24534

.. [#bpo-33944]
   https://bugs.python.org/issue33944

.. [#s-flag]
   https://docs.python.org/3/using/cmdline.html#id3

.. [#setuptools]
   https://github.com/pypa/setuptools/blob/b6bbe236ed0689f50b5148f1172510b975687e62/setup.py#L100

.. [#betterexceptions]
   https://github.com/Qix-/better-exceptions/blob/7b417527757d555faedc354c86d3b6fe449200c2/better_exceptions_hook.pth#L1

.. [#reference-implementation]
   https://github.com/mariocj89/cpython/tree/pu/__sitecustomize__

.. [#site]
   https://docs.python.org/3/library/site.html

.. [#sitepackages-api]
   https://docs.python.org/3/library/site.html?highlight=site#site.getsitepackages

.. [#usersitepackages-api]
   https://docs.python.org/3/library/site.html?highlight=site#site.getusersitepackages

.. [#siteaddsitedir]
   https://github.com/python/cpython/blob/5787ba4a45492e232f5470c7d2e93763198e4b22/Lib/site.py#L207

.. [#exec]
   https://docs.python.org/3/library/functions.html#exec

.. [#sysaudit]
   https://docs.python.org/3/library/sys.html#sys.audit