PEP: 432
Title: Simplifying the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2012


Abstract
========

This PEP proposes a mechanism for simplifying the startup sequence for
CPython, making it easier to modify the initialisation behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.


Proposal Summary
================

This PEP proposes that CPython move to an explicit 2-phase initialisation
process, where a preliminary interpreter is put in place with limited OS
interaction capabilities early in the startup sequence. This essential core
remains in place while all of the configuration settings are determined,
until a final configuration call takes those settings and finishes
bootstrapping the interpreter immediately before executing the main module.

As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP also proposes the creation and
distribution of a separate system Python (``spython``) executable which, by
default, ignores user site directories and environment variables, and does
not implicitly set ``sys.path[0]`` based on the current directory or the
script being executed.


Background
==========

Over time, CPython's initialisation sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 as
well as bootstrapping a pure Python implementation of the import system).

Much of this complexity is accessible only through the ``Py_Main`` and
``Py_Initialize`` APIs, offering embedding applications little opportunity
for customisation. This creeping complexity also makes life difficult for
maintainers, as much of the configuration needs to take place prior to the
``Py_Initialize`` call, meaning much of the Python C API cannot be used
safely.

A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path`` initialisation
(easily adding additional directories on the command line in a cross-platform
fashion, as well as controlling the configuration of ``sys.path[0]``), easier
configuration of utilities like coverage tracing when launching Python
subprocesses, and easier control of the encoding used for the standard IO
streams when embedding CPython in a larger application.

Rather than attempting to bolt such behaviour onto an already complicated
system, this PEP proposes to instead simplify the status quo *first*, with
the aim of making these further feature requests easier to implement.


Key Concerns
============

There are a couple of key concerns that any change to the startup sequence
needs to take into account.


Maintainability
---------------

The current CPython startup sequence is difficult to understand, and even
more difficult to modify. It is not clear what state the interpreter is in
while much of the initialisation code executes, leading to behaviour such
as lists, dictionaries and Unicode values being created prior to the call
to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1]_.

By moving to a 2-phase startup sequence, developers should only need to
understand which features are not available in the core bootstrapping state,
as the vast majority of the configuration process will now take place in
that state.

By basing the new design on a combination of C structures and Python
dictionaries, it should also be easier to modify the system in the
future to add new configuration options.


Performance
-----------

CPython is used heavily to run short scripts where the runtime is dominated
by the interpreter initialisation time. Any changes to the startup sequence
should minimise their impact on the startup overhead. (Given that the
overhead is dominated by IO operations, this is not currently expected to
cause any significant problems.)


The Status Quo
==============

Much of the configuration of CPython is currently handled through C level
global variables::

    Py_IgnoreEnvironmentFlag
    Py_HashRandomizationFlag
    _Py_HashSecretInitialized
    _Py_HashSecret
    Py_BytesWarningFlag
    Py_DebugFlag
    Py_InspectFlag
    Py_InteractiveFlag
    Py_OptimizeFlag
    Py_DontWriteBytecodeFlag
    Py_NoUserSiteDirectory
    Py_NoSiteFlag
    Py_UnbufferedStdioFlag
    Py_VerboseFlag

For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.

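As an illustration of how these settings surface at the Python level, most
of the flags listed above are mirrored as read-only attributes of
``sys.flags``. The sketch below is not part of the proposal: the
``FLAG_TO_ATTR`` mapping and the ``current_flags`` helper are names
introduced here purely for illustration, and only flags with a
``sys.flags`` counterpart are included (``Py_UnbufferedStdioFlag``, for
example, is omitted)::

    import sys

    # Illustrative pairing of C level globals with sys.flags attributes.
    # The mapping itself is an assumption made for this example.
    FLAG_TO_ATTR = {
        "Py_IgnoreEnvironmentFlag": "ignore_environment",
        "Py_HashRandomizationFlag": "hash_randomization",
        "Py_BytesWarningFlag": "bytes_warning",
        "Py_DebugFlag": "debug",
        "Py_InspectFlag": "inspect",
        "Py_InteractiveFlag": "interactive",
        "Py_OptimizeFlag": "optimize",
        "Py_DontWriteBytecodeFlag": "dont_write_bytecode",
        "Py_NoUserSiteDirectory": "no_user_site",
        "Py_NoSiteFlag": "no_site",
        "Py_VerboseFlag": "verbose",
    }

    def current_flags():
        """Return the running interpreter's value for each mapped flag."""
        return {c_name: getattr(sys.flags, attr)
                for c_name, attr in FLAG_TO_ATTR.items()}

    for c_name, value in sorted(current_flags().items()):
        print(c_name, "=", value)

Running this under different command line options (for example ``-E`` or
``-s``) shows how ``Py_Main`` translates those options into flag values.
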
Some configuration can only be provided as OS level environment variables::

    PYTHONHASHSEED
    PYTHONSTARTUP
    PYTHONPATH
    PYTHONHOME
    PYTHONCASEOK
    PYTHONIOENCODING

Additional configuration is handled via separate API calls::

    Py_SetProgramName() (call before Py_Initialize())
    Py_SetPath() (optional, call before Py_Initialize())
    Py_SetPythonHome() (optional, call before Py_Initialize()???)
    Py_SetArgv[Ex]() (call after Py_Initialize())

The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.

See also the more detailed notes at [1]_.


Proposal
========

(Note: details here are still very much in flux, but preliminary feedback
is appreciated anyway)

Core Interpreter Initialisation
-------------------------------

The only configuration that currently absolutely needs to be in place
before even the interpreter core can be initialised is the seed for the
randomised hash algorithm. However, there are a couple of settings needed
there: whether or not hash randomisation is enabled at all, and if it's
enabled, whether or not to use a specific seed value.

The proposed API for this step in the startup sequence is::

    void Py_BeginInitialization(Py_CoreConfig *config);

Like ``Py_Initialize``, this part of the new API treats initialisation
failures as fatal errors. While that's still not particularly embedding
friendly, the operations in this step *really* shouldn't be failing, and
changing them to return error codes instead of aborting would be an even
larger task than the one already being proposed.

The new ``Py_CoreConfig`` struct holds the settings required for preliminary
configuration::

    typedef struct {
        int use_hash_seed;
        size_t hash_seed;
    } Py_CoreConfig;

To "disable" hash randomisation, set ``use_hash_seed`` and pass a hash seed
of zero. (This seems reasonable to me, but there may be security implications
I'm overlooking. If so, adding a separate flag or switching to a 3-valued
"no randomisation", "fixed hash seed" and "randomised hash" option is easy.)

The core configuration settings pointer may be NULL, in which case the
default behaviour of randomised hashes with a random seed will be used.

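The effect of a fixed seed can be observed today through the existing
``PYTHONHASHSEED`` environment variable, which is the current way of
supplying the information that ``use_hash_seed`` and ``hash_seed`` would
carry. The sketch below uses that environment variable rather than the
proposed struct; the ``string_hash`` helper is a name invented for this
example::

    import os
    import subprocess
    import sys

    def string_hash(seed, text="startup"):
        """Return hash(text) as computed by a fresh interpreter process."""
        # PYTHONHASHSEED=0 disables randomisation; any other integer
        # selects a fixed seed.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        code = "print(hash(%r))" % text
        out = subprocess.check_output([sys.executable, "-c", code], env=env)
        return int(out)

    # With a fixed seed, separate processes agree on the hash value.
    print(string_hash(0) == string_hash(0))
    print(string_hash(42) == string_hash(42))

Both comparisons hold because each child process is started with the same
explicit seed instead of a randomly chosen one.
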
A new query API will allow code to determine if the interpreter is in the
bootstrapping state between core initialisation and the completion of the
initialisation process::

    int Py_IsInitializing();

While in the initialising state, the interpreter should be fully functional
except that:

* compilation is not allowed (as the parser is not yet configured properly)
* The following attributes in the ``sys`` module are all either missing or
  ``None``:

  * ``sys.path``
  * ``sys.argv``
  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``
  * ``sys.warnoptions``
  * ``sys.flags``
  * ``sys.dont_write_bytecode``
  * ``sys.stdin``
  * ``sys.stdout``

* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
* only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
  mode
* The ``warnings`` module is not yet initialised
* The ``__main__`` module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python
datatypes, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).

In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored.

Any call to ``Py_BeginInitialization()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling
``Py_EndInitialization()`` in between (e.g. if attempting to read the
configuration settings fails).


Determining the remaining configuration settings
------------------------------------------------

The next step in the initialisation sequence is to determine the full
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core API for this step is::

    int Py_ReadConfiguration(PyObject *config);

The config argument should be a pointer to a Python dictionary. For any
supported configuration setting already in the dictionary, CPython will
sanity check the supplied value, but otherwise accept it as correct.

Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will
raise an exception and report an error return rather than exhibiting fatal
errors if a problem is found with the config data.

Any supported configuration setting which is not already set will be
populated appropriately. The default configuration can be overridden
entirely by setting the value *before* calling ``Py_ReadConfiguration``. The
provided value will then also be used in calculating any settings derived
from that value.

Alternatively, settings may be overridden *after* the
``Py_ReadConfiguration`` call (this can be useful if an embedding
application wants to adjust a setting rather than replace it completely,
such as removing ``sys.path[0]``).

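The "check what is supplied, populate what is missing" behaviour described
above can be modelled in a few lines of Python. This is only a sketch of the
intended semantics, not the proposed C implementation: ``read_configuration``
is a hypothetical stand-in for ``Py_ReadConfiguration``, only three settings
are modelled, ``sys.argv`` stands in for the OS supplied argument list, and
deriving ``argv`` by dropping the first entry is a deliberate
simplification::

    import sys

    def read_configuration(config):
        """Toy model: validate supplied settings, fill in the rest."""
        if not isinstance(config, dict):
            raise TypeError("config must be a dict")
        # Sanity check supplied values, but otherwise accept them as given.
        raw_argv = config.get("raw_argv")
        if raw_argv is not None and not all(
                isinstance(arg, str) for arg in raw_argv):
            raise ValueError("raw_argv must be a list of str")
        # Populate missing settings. Derived settings are calculated from
        # whatever value is in the dictionary at this point.
        config.setdefault("raw_argv", list(sys.argv))
        config.setdefault("argv", config["raw_argv"][1:])
        config.setdefault("no_site", False)
        return 0

    # Override *before* the call: the derived argv follows the override.
    config = {"raw_argv": ["python", "script.py", "--verbose"]}
    read_configuration(config)
    print(config["argv"])

    # Adjust *after* the call: tweak a populated value in place.
    config["no_site"] = True

Setting ``raw_argv`` up front changes the derived ``argv`` as well, whereas
the later assignment to ``no_site`` affects only that one setting.
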
Supported configuration settings
--------------------------------

At least the following configuration settings will be supported::

    raw_argv (list of str, default = retrieved from OS APIs)

    argv (list of str, default = derived from raw_argv)
    warnoptions (list of str, default = derived from raw_argv and environment)
    xoptions (list of str, default = derived from raw_argv and environment)

    program_name (str, default = retrieved from OS APIs)
    executable (str, default = derived from program_name)
    home (str, default = complicated!)
    prefix (str, default = complicated!)
    exec_prefix (str, default = complicated!)
    base_prefix (str, default = complicated!)
    base_exec_prefix (str, default = complicated!)
    path (list of str, default = complicated!)

    io_encoding (str, default = derived from environment or OS APIs)
    fs_encoding (str, default = derived from OS APIs)

    skip_signal_handlers (boolean, default = derived from environment or False)
    ignore_environment (boolean, default = derived from environment or False)
    dont_write_bytecode (boolean, default = derived from environment or False)
    no_site (boolean, default = derived from environment or False)
    no_user_site (boolean, default = derived from environment or False)
    <TBD: at least more from sys.flags need to go here>


Completing the interpreter initialisation
-----------------------------------------

The final step in the process is to actually put the configuration settings
into effect and finish bootstrapping the interpreter up to full operation::

    int Py_EndInitialization(PyObject *config);

Like ``Py_ReadConfiguration``, this call will raise an exception and report
an error return rather than exhibiting fatal errors if a problem is found
with the config data.

After a successful call, ``Py_IsInitializing()`` will be false, while
``Py_IsInitialized()`` will become true. The caveats described above for the
interpreter during the initialisation phase will no longer hold.

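The state transitions across the two phases can be summarised with a small
Python model. It is purely illustrative: the ``StartupModel`` class and its
method names are invented for this sketch, and it tracks only the two
boolean states reported by ``Py_IsInitializing()`` and
``Py_IsInitialized()``, not any real interpreter state::

    class StartupModel:
        """Toy model of the proposed 2-phase startup state machine."""

        def __init__(self):
            self.initializing = False
            self.initialized = False
            self.config = None

        def begin_initialization(self, core_config=None):
            if self.initializing or self.initialized:
                raise RuntimeError("initialisation already started")
            self.initializing = True

        def end_initialization(self, config):
            if not self.initializing:
                raise RuntimeError("begin_initialization() not called")
            self.config = dict(config)
            self.initializing = False
            self.initialized = True

        def finalize(self):
            # Valid from either phase, so end_initialization() can be
            # skipped if reading the configuration fails.
            self.initializing = False
            self.initialized = False

    model = StartupModel()
    model.begin_initialization()
    print(model.initializing, model.initialized)
    model.end_initialization({"no_site": True})
    print(model.initializing, model.initialized)
    model.finalize()

The model allows ``finalize()`` directly after ``begin_initialization()``,
mirroring the rule that the final configuration call may be skipped.
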
Stable ABI
----------

All of the APIs proposed in this PEP are excluded from the stable ABI, as
embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.


Backwards Compatibility
-----------------------

Backwards compatibility will be preserved primarily by ensuring that
``Py_ReadConfiguration()`` interrogates all the previously defined
configuration settings stored in global variables and environment variables.

One acknowledged incompatibility is that some environment variables which
are currently read lazily may instead be read once during interpreter
initialisation. As the PEP matures, these will be discussed in more detail
on a case by case basis.


A System Python Executable
==========================

When executing system utilities with administrative access to a system, many
of the default behaviours of CPython are undesirable, as they may allow
untrusted code to execute with elevated privileges. The most problematic
aspects are that user site directories are enabled, that environment
variables are trusted, and that the directory containing the executed file
is placed at the beginning of the import path.

Currently, providing a separate executable with different default behaviour
would be prohibitively hard to maintain. One of the goals of this PEP is to
make it possible to replace much of the hard to maintain bootstrapping code
with more normal CPython code, as well as making it easier for a separate
application to make use of key components of ``Py_Main``. Including this
change in the PEP is designed to help avoid acceptance of a design that
sounds good in theory but proves to be problematic in practice.

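Two of the three behaviours listed above can already be requested explicitly
through the existing ``-E`` (ignore ``PYTHON*`` environment variables) and
``-s`` (skip the user site directory) command line options; the handling of
``sys.path[0]`` is not covered by either. The sketch below only demonstrates
those existing options and is not the proposed ``spython`` executable; the
``child_flags`` helper is a name invented for this example::

    import subprocess
    import sys

    def child_flags(*options):
        """Report two sys.flags values from a freshly started interpreter."""
        code = ("import sys; "
                "print(sys.flags.ignore_environment, sys.flags.no_user_site)")
        command = [sys.executable] + list(options) + ["-c", code]
        out = subprocess.check_output(command)
        return tuple(int(value) for value in out.split())

    # Both flags are set when the options are passed explicitly.
    print(child_flags("-E", "-s"))

A system Python would make settings like these the default, instead of
relying on every system utility to pass the options itself.
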
One final aspect not addressed by the general embedding changes above is
the current inaccessibility of the core logic for deciding between the
different execution modes supported by CPython::

    * script execution
    * directory/zipfile execution
    * command execution ("-c" switch)
    * module or package execution ("-m" switch)
    * execution from stdin (non-interactive)
    * interactive stdin

<TBD: concrete proposal for better exposing the __main__ execution step>

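For comparison, the standard library's ``runpy`` module already exposes
Python level equivalents for some of these modes: ``runpy.run_path`` covers
script, directory and zipfile execution, while ``runpy.run_module`` covers
the ``-m`` switch. The sketch below shows only that existing API, not
anything proposed by this PEP; the temporary ``hello.py`` script and the
choice of ``colorsys`` as the module to run are arbitrary examples::

    import os
    import runpy
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.py")
        with open(script, "w") as f:
            f.write("answer = 6 * 7\n")
        # Script execution: run the file with __name__ set to "__main__".
        script_globals = runpy.run_path(script, run_name="__main__")

    # Module execution: locate the module via the import system and run it
    # in a fresh namespace.
    module_globals = runpy.run_module("colorsys")

    print(script_globals["answer"])
    print("rgb_to_hsv" in module_globals)

Both functions return the resulting module globals, which is what makes the
executed code's results available to the caller here.
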
Implementation
==============

None as yet. Once I have a reasonably solid plan of attack, I intend to work
on a reference implementation as a feature branch in my BitBucket sandbox
[2]_.


References
==========

.. [1] CPython interpreter initialization notes
   (http://wiki.python.org/moin/CPythonInterpreterInitialization)

.. [2] BitBucket Sandbox
   (https://bitbucket.org/ncoghlan/cpython_sandbox)


Copyright
=========

This document has been placed in the public domain.