PEP: 432
Title: Restructuring the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>,
        Victor Stinner <vstinner@python.org>,
        Eric Snow <ericsnowcurrently@gmail.com>
Discussions-To: capi-sig@python.org
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Requires: 587
Created: 28-Dec-2012
Post-History: 28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020

.. highlight:: c

PEP Withdrawal
==============

From late 2012 to mid 2020, this PEP provided general background and specific
concrete proposals for making the CPython startup sequence easier to maintain
and the CPython runtime easier to embed as part of a larger application.

For most of that time, the changes were maintained either in a separate feature
branch, or else as underscore-prefixed private APIs in the main CPython repo.

In 2019, :pep:`587` migrated a subset of those API changes to the public CPython
API for Python 3.8+ (specifically, the PEP updated the interpreter runtime to
offer an explicitly multi-stage struct-based configuration interface).

In June 2020, in response to a query from the Steering Council, the PEP authors
decided that it made sense to withdraw the original PEP, as enough has changed
since :pep:`432` was first written that we think any further changes to the
startup sequence and embedding API would be best formulated as a new PEP (or
PEPs) that take into account not only the not-yet-implemented ideas from :pep:`432`
that weren't considered sufficiently well validated to make their way into
:pep:`587`, but also any feedback on the public :pep:`587` API, and any other lessons
that have been learned while adjusting the CPython implementation to be more
embedding and subinterpreter friendly.

In particular, PEPs proposing the following changes, and any further
infrastructure changes needed to enable them, would likely still be worth
exploring:

* shipping an alternate Python executable that ignores all user level
  settings and runs in isolated mode by default, and would hence be more
  suitable for execution of system level Python applications than the default
  interpreter
* enhancing the zipapp module to support the creation of single-file executables
  from pure Python scripts (and potentially even Python extension modules, given
  the introduction of multi-phase extension module initialisation)
* migrating the complex sys.path initialisation logic from C to Python in order
  to improve test suite coverage and the general maintainability of that code


Abstract
========

This PEP proposes a mechanism for restructuring the startup sequence for
CPython, making it easier to modify the initialization behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.

When implementation of this proposal is completed, interpreter startup will
consist of three clearly distinct and independently configurable phases:

* Python core runtime preinitialization

  * setting up memory management
  * determining the encodings used for system interfaces (including settings
    passed in for later configuration phase)

* Python core runtime initialization

  * ensuring C API is ready for use
  * ensuring builtin and frozen modules are accessible

* Main interpreter configuration

  * ensuring external modules are accessible
  * (Note: the name of this phase is quite likely to change)

Changes are also proposed that impact main module execution and subinterpreter
initialization.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate
resolution for most of these should become clearer as the reference
implementation is developed.


Proposal
========

This PEP proposes that initialization of the CPython runtime be split into
three clearly distinct phases:

* core runtime preinitialization
* core runtime initialization
* main interpreter configuration

(Earlier versions proposed only two phases, but experience with attempting to
implement the PEP as an internal CPython refactoring showed that at least 3
phases are needed to get clear separation of concerns)

The proposed design also has significant implications for:

* main module execution
* subinterpreter initialization

In the new design, the interpreter will move through the following
well-defined phases during the initialization sequence:

* Uninitialized - haven't even started the pre-initialization phase yet
* Pre-Initialization - no interpreter available
* Runtime Initialized - main interpreter partially available,
  subinterpreter creation not yet available
* Initialized - main interpreter fully available, subinterpreter creation
  available

:pep:`587` is a more detailed proposal that covers separating out the
Pre-Initialization phase from the last two phases, but doesn't allow embedding
applications to run arbitrary code while in the "Runtime Initialized" state
(instead, initializing the core runtime will also always fully initialize the
main interpreter, as that's the way the native CPython CLI still works in
Python 3.8).

As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP proposes the creation and
distribution of a separate system Python (``system-python``) executable
which, by default, operates in "isolated mode" (as selected by the CPython
``-I`` switch), as well as the creation of an example stub binary that just
runs an appended zip archive (permitting single-file pure Python executables)
rather than going through the normal CPython startup sequence.

To keep the implementation complexity under control, this PEP does *not*
propose wholesale changes to the way the interpreter state is accessed at
runtime. Changing the order in which the existing initialization steps
occur in order to make the startup sequence easier to maintain is already a
substantial change, and attempting to make those other changes at the same time
would make the change significantly more invasive and much harder to review.
However, such proposals may be suitable topics for follow-on PEPs or patches:
one key benefit of this PEP and its related subproposals is decreasing the
coupling between the internal storage model and the configuration interface,
so such changes should be easier once this PEP has been implemented.


Background
==========

Over time, CPython's initialization sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 [10]_,
bootstrapping a pure Python implementation of the import system, and
implementing an isolated mode more suitable for system applications that run
with elevated privileges [6]_).

Much of this complexity is formally accessible only through the ``Py_Main``
and ``Py_Initialize`` APIs, offering embedding applications little
opportunity for customisation. This creeping complexity also makes life
difficult for maintainers, as much of the configuration needs to take
place prior to the ``Py_Initialize`` call, meaning much of the Python C
API cannot be used safely.

A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path``
initialization (e.g. easily adding additional directories on the command line
in a cross-platform fashion [7]_, controlling the configuration of
``sys.path[0]`` [8]_) and easier configuration of utilities like coverage
tracing when launching Python subprocesses [9]_.

Rather than continuing to bolt such behaviour onto an already complicated
system indefinitely, this PEP proposes to start simplifying the status quo by
introducing a more structured startup sequence, with the aim of making these
further feature requests easier to implement.

Originally the entire proposal was maintained in this one PEP, but that proved
impractical, so parts of the proposed design are now split out into their own
PEPs as they stabilise, allowing progress to be made even while the details
of the overall design are still evolving.


Key Concerns
============

There are a few key concerns that any change to the startup sequence
needs to take into account.


Maintainability
---------------

The CPython startup sequence as of Python 3.6 was difficult to understand, and
even more difficult to modify. It was not clear what state the interpreter was
in while much of the initialization code executed, leading to behaviour such
as lists, dictionaries and Unicode values being created prior to the call
to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1]_.

By moving to an explicitly multi-phase startup sequence, developers should
only need to understand:

* which APIs and features are available prior to pre-configuration (essentially
  none, except for the pre-configuration API itself)
* which APIs and features are available prior to core runtime configuration, and
  will implicitly run the pre-configuration with default settings that match the
  behaviour of Python 3.6 if the pre-configuration hasn't been run explicitly
* which APIs and features are only available after the main interpreter has been
  fully configured (which will hopefully be a relatively small subset of the
  full C API)

The first two aspects of that are covered by :pep:`587`, while the details of the
latter distinction are still being considered.

By basing the new design on a combination of C structures and Python
data types, it should also be easier to modify the system in the
future to add new configuration options.


Testability
-----------

One of the problems with the complexity of the CPython startup sequence is the
combinatorial explosion of possible interactions between different configuration
settings.

This concern impacts both the design of the new initialisation system, and
the proposed approach for getting there.


Performance
-----------

CPython is used heavily to run short scripts where the runtime is dominated
by the interpreter initialization time. Any changes to the startup sequence
should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time is
dominated by IO operations. However, to monitor the impact of any changes,
a simple benchmark can be used to check how long it takes to start and then
tear down the interpreter:

.. code-block:: bash

   python3 -m timeit -s "from subprocess import call" "call(['./python', '-Sc', 'pass'])"

Current numbers on my system for Python 3.7 (as built by the Fedora project):

.. code-block:: console

   $ python3 -m timeit -s "from subprocess import call" "call(['python3', '-Sc', 'pass'])"
   50 loops, best of 5: 6.48 msec per loop

(TODO: run this microbenchmark with perf rather than the stdlib timeit)

This PEP is not expected to have any significant effect on the startup time,
as it is aimed primarily at *reordering* the existing initialization
sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the
initialization sequence may pose a performance problem, then a more
sophisticated microbenchmark will be developed to assist in investigation.


Required Configuration Settings
===============================

See :pep:`587` for a detailed listing of CPython interpreter configuration settings
and the various means available for setting them.


Implementation Strategy
=======================

An initial attempt was made at implementing an earlier version of this PEP for
Python 3.4 [2]_, with one of the significant problems encountered being merge
conflicts after the initial structural changes were put in place to start the
refactoring process. Unlike some other previous major changes, such as the
switch to an AST-based compiler in Python 2.5, or the switch to the importlib
implementation of the import system in Python 3.3, there is no clear way to
structure a draft implementation that won't be prone to the kinds of merge
conflicts that afflicted the original attempt.

Accordingly, the implementation strategy was revised to instead first implement
this refactoring as a private API for CPython 3.7, and then review the viability
of exposing the new functions and structures as public API elements in CPython
3.8.

After the initial merge, Victor Stinner then proceeded to actually migrate
settings to the new structure in order to successfully implement the :pep:`540`
UTF-8 mode changes (which required the ability to track all settings that had
previously been decoded with the locale encoding, and decode them again using
UTF-8 instead). Eric Snow also migrated a number of internal subsystems over as
part of making the subinterpreter feature more robust.

That work showed that the detailed design originally proposed in this PEP had a
range of practical issues, so Victor designed and implemented an improved
private API (inspired by an earlier iteration of this PEP), which :pep:`587`
proposes to promote to a public API in Python 3.8.


Design Details
==============

.. note::

   The API details here are still very much in flux. The header files that show
   the current state of the private API are mainly:

   * https://github.com/python/cpython/blob/master/Include/cpython/coreconfig.h
   * https://github.com/python/cpython/blob/master/Include/cpython/pystate.h
   * https://github.com/python/cpython/blob/master/Include/cpython/pylifecycle.h

   :pep:`587` covers the aspects of the API that are considered potentially stable
   enough to make public. Where a proposed API is covered by that PEP,
   "(see PEP 587)" is added to the text below.

The main theme of this proposal is to initialize the core language runtime
and create a partially initialized interpreter state for the main interpreter
*much* earlier in the startup process. This will allow most of the CPython API
to be used during the remainder of the initialization process, potentially
simplifying a number of operations that currently need to rely on basic C
functionality rather than being able to use the richer data structures provided
by the CPython C API.

:pep:`587` covers a subset of that task, which is splitting out the components that
even the existing "May be called before ``Py_Initialize``" interfaces need (like
memory allocators and operating system interface encoding details) into a
separate pre-configuration step.

In the following, the term "embedding application" also covers the standard
CPython command line application.


Interpreter Initialization Phases
---------------------------------

The following distinct interpreter initialisation phases are proposed:

* Uninitialized:

  * Not really a phase, but the absence of a phase
  * ``Py_IsInitializing()`` returns ``0``
  * ``Py_IsRuntimeInitialized()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``0``
  * The embedding application determines which memory allocator to use, and
    which encoding to use to access operating system interfaces (or chooses
    to delegate those decisions to the Python runtime)
  * Application starts the initialization process by calling one of the
    ``Py_PreInitialize`` APIs (see :pep:`587`)

* Runtime Pre-Initialization:

  * no interpreter is available
  * ``Py_IsInitializing()`` returns ``1``
  * ``Py_IsRuntimeInitialized()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``0``
  * The embedding application determines the settings required to initialize
    the core CPython runtime and create the main interpreter and moves to the
    next phase by calling ``Py_InitializeRuntime``
  * Note: as of :pep:`587`, the embedding application instead calls ``Py_Main()``,
    ``Py_UnixMain``, or one of the ``Py_Initialize`` APIs, and hence jumps
    directly to the Initialized state.

* Main Interpreter Initialization:

  * the builtin data types and other core runtime services are available
  * the main interpreter is available, but only partially configured
  * ``Py_IsInitializing()`` returns ``1``
  * ``Py_IsRuntimeInitialized()`` returns ``1``
  * ``Py_IsInitialized()`` returns ``0``
  * The embedding application determines and applies the settings
    required to complete the initialization process by calling
    ``Py_InitializeMainInterpreter``
  * Note: as of :pep:`587`, this state is not reachable via any public API; it
    only exists as an implicit internal state while one of the ``Py_Initialize``
    functions is running

* Initialized:

  * the main interpreter is available and fully operational, but
    ``__main__`` related metadata is incomplete
  * ``Py_IsInitializing()`` returns ``0``
  * ``Py_IsRuntimeInitialized()`` returns ``1``
  * ``Py_IsInitialized()`` returns ``1``


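The query results listed for each phase can be summarised as a small state
model. The following is a minimal sketch only: the ``startup_phase`` enum and
the ``phase_*`` functions are hypothetical names invented for illustration and
are not part of any CPython header. Each predicate mirrors the value the
correspondingly named query function is listed as returning in that phase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four states; not part of any CPython header. */
typedef enum {
    PHASE_UNINITIALIZED,
    PHASE_PRE_INITIALIZATION,
    PHASE_MAIN_INTERPRETER_INITIALIZATION,
    PHASE_INITIALIZED
} startup_phase;

/* Mirrors the proposed Py_IsInitializing(): true only while startup is
 * underway, false both before it begins and after it completes. */
bool phase_is_initializing(startup_phase phase)
{
    return phase == PHASE_PRE_INITIALIZATION
        || phase == PHASE_MAIN_INTERPRETER_INITIALIZATION;
}

/* Mirrors the proposed Py_IsRuntimeInitialized(). */
bool phase_is_runtime_initialized(startup_phase phase)
{
    return phase >= PHASE_MAIN_INTERPRETER_INITIALIZATION;
}

/* Mirrors Py_IsInitialized(). */
bool phase_is_initialized(startup_phase phase)
{
    return phase == PHASE_INITIALIZED;
}

/* All three queries report 0 only in the uninitialized state. */
bool phase_is_uninitialized(startup_phase phase)
{
    return !(phase_is_initialized(phase) || phase_is_initializing(phase));
}

/* Advance one step; the final state is terminal in this model. */
startup_phase phase_next(startup_phase phase)
{
    assert(phase <= PHASE_INITIALIZED);
    return phase == PHASE_INITIALIZED ? phase : (startup_phase)(phase + 1);
}
```

Modelling the phases as an ordered enum keeps the "runtime initialized" check a
simple comparison, which matches the strictly forward progression described in
the list.
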

Invocation of Phases
--------------------

All listed phases will be used by the standard CPython interpreter and the
proposed System Python interpreter.

An embedding application may still continue to leave initialization almost
entirely under CPython's control by using the existing ``Py_Initialize``
or ``Py_Main()`` APIs - backwards compatibility will be preserved.

Alternatively, if an embedding application wants greater control
over CPython's initial state, it will be able to use the new, finer
grained API, which lets it step through the initialization phases
individually.

:pep:`587` covers an initial iteration of that API, separating out the
pre-initialization phase without attempting to separate core runtime
initialization from main interpreter initialization.


Uninitialized State
-------------------

The uninitialized state is where an embedding application determines the
settings required to correctly pass configuration settings to the embedded
Python runtime.

This covers telling Python which memory allocator to use, as well as which text
encoding to use when processing provided settings.

:pep:`587` defines the settings needed to exit this state in its ``PyPreConfig``
struct.

A new query API will allow code to determine if the interpreter hasn't even
started the initialization process::

    int Py_IsInitializing();

The query for a completely uninitialized environment would then be
``!(Py_IsInitialized() || Py_IsInitializing())``.


Runtime Pre-Initialization Phase
--------------------------------

.. note:: In :pep:`587`, the settings for this phase are not yet separated out,
   and are instead only available through the combined ``PyConfig`` struct

The pre-initialization phase is where an embedding application determines
the settings which are absolutely required before the CPython runtime can be
initialized at all. Currently, the primary configuration settings in this
category are those related to the randomised hash algorithm - the hash
algorithms must be consistent for the lifetime of the process, and so they
must be in place before the core interpreter is created.

The essential settings needed are a flag indicating whether or not to use a
specific seed value for the randomised hashes, and if so, the specific value
for the seed (a seed value of zero disables randomised hashing). In addition,
due to the possible use of ``PYTHONHASHSEED`` in configuring the hash
randomisation, the question of whether or not to consider environment
variables must also be addressed early. Finally, to support the CPython
build process, an option is offered to completely disable the import
system.

The proposed APIs for this step in the startup sequence are::

    PyInitError Py_InitializeRuntime(
        const PyRuntimeConfig *config
    );

    PyInitError Py_InitializeRuntimeFromArgs(
        const PyRuntimeConfig *config, int argc, char **argv
    );

    PyInitError Py_InitializeRuntimeFromWideArgs(
        const PyRuntimeConfig *config, int argc, wchar_t **argv
    );

If ``Py_IsInitializing()`` is false, the ``Py_InitializeRuntime`` functions will
implicitly call the corresponding ``Py_PreInitialize`` function. The
``use_environment`` setting will be passed down, while other settings will be
processed according to their defaults, as described in :pep:`587`.

The ``PyInitError`` return type is defined in :pep:`587`, and allows an embedding
application to gracefully handle Python runtime initialization failures,
rather than having the entire process abruptly terminated by ``Py_FatalError``.

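The benefit of returning a status value instead of terminating the process can
be illustrated without the real API. The sketch below is an assumption-laden
model: ``init_error``, ``init_step``, ``run_init_steps`` and the two sample
steps are hypothetical stand-ins, not the ``PyInitError`` type from :pep:`587`.
It only shows the control flow, where the first failing step is reported to the
caller, who then decides how to react.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PyInitError: a status value instead of abort(). */
typedef struct {
    bool failed;
    const char *message;   /* NULL on success */
} init_error;

typedef init_error (*init_step)(void);

init_error init_ok(void)
{
    init_error ok = {false, NULL};
    return ok;
}

init_error init_fail(const char *message)
{
    init_error err = {true, message};
    return err;
}

/* Run each step in order and stop at the first failure, leaving the
 * decision about what to do next with the caller. */
init_error run_init_steps(const init_step *steps, size_t count,
                          size_t *completed)
{
    assert(steps != NULL && completed != NULL);
    *completed = 0;
    for (size_t i = 0; i < count; i++) {
        init_error status = steps[i]();
        if (status.failed) {
            return status;
        }
        (*completed)++;
    }
    return init_ok();
}

/* Two illustrative steps: one succeeds, one reports a problem. */
init_error step_preinitialize(void) { return init_ok(); }
init_error step_bad_hash_seed(void) { return init_fail("invalid hash seed"); }
```

An embedding application using this pattern can log ``message``, fall back to
a different configuration, or shut down cleanly, none of which is possible when
the failure path calls ``Py_FatalError``.
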

The new ``PyRuntimeConfig`` struct holds the settings required for preliminary
configuration of the core runtime and creation of the main interpreter::

    /* Note: if changing anything in PyRuntimeConfig, also update
     * PyRuntimeConfig_INIT */
    typedef struct {
        bool use_environment;     /* as in PyPreConfig, PyConfig from PEP 587 */
        int use_hash_seed;        /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        unsigned long hash_seed;  /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        bool _install_importlib;  /* Needed by freeze_importlib */
    } PyRuntimeConfig;

    /* Rely on the "designated initializer" feature of C99 */
    #define PyRuntimeConfig_INIT {.use_hash_seed=-1}

The core configuration settings pointer may be ``NULL``, in which case the
default values are as specified in ``PyRuntimeConfig_INIT``.

The ``PyRuntimeConfig_INIT`` macro is designed to allow easy initialization
of a struct instance with sensible defaults::

    PyRuntimeConfig runtime_config = PyRuntimeConfig_INIT;

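As a minimal sketch of how the initializer macro might be used, the block below
repeats the proposed struct locally so that it compiles on its own, then
overrides two fields after applying the defaults. The
``make_reproducible_config`` helper is a hypothetical name introduced for this
illustration; it is not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the struct proposed above, so the sketch is self-contained. */
typedef struct {
    bool use_environment;
    int use_hash_seed;
    unsigned long hash_seed;
    bool _install_importlib;
} PyRuntimeConfig;

#define PyRuntimeConfig_INIT {.use_hash_seed=-1}

/* Hypothetical helper: request a fixed hash seed regardless of environment. */
PyRuntimeConfig make_reproducible_config(unsigned long seed)
{
    PyRuntimeConfig config = PyRuntimeConfig_INIT;
    /* C99 zero-initializes every field the designated initializer omits. */
    assert(!config.use_environment && config.use_hash_seed == -1);
    config.use_hash_seed = 1;   /* positive: use the value in hash_seed */
    config.hash_seed = seed;
    return config;
}
```

Starting from the macro and then assigning individual fields means code written
this way keeps working if later versions of the struct gain extra fields with
their own defaults.
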

``use_environment`` controls the processing of all Python related
environment variables. If the flag is true, then ``PYTHONHASHSEED`` is
processed normally. Otherwise, all Python-specific environment variables
are considered undefined (exceptions may be made for some OS specific
environment variables, such as those used on Mac OS X to communicate
between the App bundle and the main Python binary).

``use_hash_seed`` controls the configuration of the randomised hash
algorithm. If it is zero, then randomised hashes with a random seed will
be used. If it is positive, then the value in ``hash_seed`` will be used
to seed the random number generator. If the ``hash_seed`` is zero in this
case, then the randomised hashing is disabled completely.

If ``use_hash_seed`` is negative (and ``use_environment`` is true),
then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If the
environment variable is not set, is set to the empty string, or to the value
``"random"``, then randomised hashes with a random seed will be used. If the
environment variable is set to the string ``"0"`` the randomised hashing will
be disabled. Otherwise, the hash seed is expected to be a string
representation of an integer in the range ``[0; 4294967295]``.

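The interaction between the three fields can be written down as a small
decision function. This is a sketch under stated assumptions: the
``hash_seed_mode`` enum and ``classify_hash_seed`` are hypothetical names, and
the behaviour when ``use_hash_seed`` is negative while ``use_environment`` is
false is not spelled out in the text, so falling back to a random seed in that
case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the PEP only defines the three config fields. */
typedef enum {
    HASH_RANDOM_SEED,      /* randomised hashing, seed chosen at random */
    HASH_FIXED_SEED,       /* randomised hashing, caller-supplied seed */
    HASH_DISABLED,         /* randomised hashing turned off */
    HASH_FROM_ENVIRONMENT  /* defer to the PYTHONHASHSEED variable */
} hash_seed_mode;

hash_seed_mode classify_hash_seed(bool use_environment,
                                  int use_hash_seed,
                                  unsigned long hash_seed)
{
    if (use_hash_seed > 0) {
        /* An explicit seed of zero switches randomisation off. */
        return hash_seed == 0 ? HASH_DISABLED : HASH_FIXED_SEED;
    }
    if (use_hash_seed == 0) {
        return HASH_RANDOM_SEED;
    }
    /* Negative: consult the environment if allowed.  Falling back to a
     * random seed otherwise is an assumption; the text does not say. */
    return use_environment ? HASH_FROM_ENVIRONMENT : HASH_RANDOM_SEED;
}
```
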

To make it easier for embedding applications to use the ``PYTHONHASHSEED``
processing with a different data source, the following helper function
will be added to the C API::

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);

This function accepts a seed string in ``seed_text`` and converts it to
the appropriate flag and seed values. If ``seed_text`` is ``NULL``,
the empty string or the value ``"random"``, both ``use_hash_seed`` and
``hash_seed`` will be set to zero. Otherwise, ``use_hash_seed`` will be set to
``1`` and the seed text will be interpreted as an integer and reported as
``hash_seed``. On success the function will return zero. A non-zero return
value indicates an error (most likely in the conversion to an integer).

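One possible reading of that contract is sketched below using only the C
standard library. The function is deliberately given a different name
(``read_hash_seed``) because it is an illustration, not the proposed
implementation. Two details are assumptions rather than statements from the
text: only plain decimal digits are accepted, and values above ``4294967295``
are rejected to match the range documented for ``PYTHONHASHSEED``.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the described Py_ReadHashSeed() behaviour under another name.
 * Returns 0 on success and -1 if the text is not a valid seed. */
int read_hash_seed(const char *seed_text,
                   int *use_hash_seed,
                   unsigned long *hash_seed)
{
    assert(use_hash_seed != NULL && hash_seed != NULL);

    if (seed_text == NULL || seed_text[0] == '\0'
            || strcmp(seed_text, "random") == 0) {
        *use_hash_seed = 0;
        *hash_seed = 0;
        return 0;
    }

    /* Assumption: only plain decimal digits are accepted, so the signs and
     * leading whitespace that strtoul() would tolerate are rejected. */
    if (seed_text[0] < '0' || seed_text[0] > '9') {
        return -1;
    }

    char *end = NULL;
    errno = 0;
    unsigned long value = strtoul(seed_text, &end, 10);
    if (errno != 0 || *end != '\0' || value > 4294967295UL) {
        return -1;
    }

    *use_hash_seed = 1;
    *hash_seed = value;
    return 0;
}
```

Note that the output parameters are left untouched on failure, so a caller can
pre-populate them with defaults before the call.
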

The ``_install_importlib`` setting is used as part of the CPython build
process to create an interpreter with no import capability at all. It is
considered private to the CPython development team (hence the leading
underscore), as the only currently supported use case is to permit compiler
changes that invalidate the previously frozen bytecode for
``importlib._bootstrap`` without breaking the build process.

The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should appear solely in the comprehensive
``PyConfig`` struct rather than in the core runtime configuration.

A new query API will allow code to determine if the interpreter is in the
bootstrapping state that begins once the core runtime has been initialized and
the main interpreter state created, and ends when the bulk of the main
interpreter initialization process is complete::

    int Py_IsRuntimeInitialized();

Attempting to call ``Py_InitializeRuntime()`` again when
``Py_IsRuntimeInitialized()`` is already true is reported as a user
configuration error. (TBC, as existing public initialisation APIs support being
called multiple times without error, and simply ignore changes to any
write-once settings. It may make sense to keep that behaviour rather than trying
to make the new API stricter than the old one.)

As frozen bytecode may now be legitimately run in an interpreter which is not
yet fully initialized, ``sys.flags`` will gain a new ``initialized`` flag.

With the core runtime initialised, the main interpreter and most of the CPython
C API should be fully functional except that:

* compilation is not allowed (as the parser and compiler are not yet
  configured properly)
* creation of subinterpreters is not allowed
* creation of additional thread states is not allowed
* The following attributes in the ``sys`` module are all either missing or
  ``None``:

  * ``sys.path``
  * ``sys.argv``
  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``
  * ``sys.warnoptions``
  * ``sys.dont_write_bytecode``
  * ``sys.stdin``
  * ``sys.stdout``

* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
* Only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
  mode
* The ``sys.flags`` attribute exists, but the individual flags may not yet
  have their final values.
* The ``sys.flags.initialized`` attribute is set to ``0``
* The ``warnings`` module is not yet initialized
* The ``__main__`` module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python
data types, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).

In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored on the main interpreter
object rather than in C process globals.

Any call to ``Py_InitializeRuntime()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling
``Py_InitializeMainInterpreter()`` in between (e.g. if attempting to build the
main interpreter configuration settings fails).


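The pairing rule in the last paragraph can be sketched with a mock runtime that
only records which lifecycle calls were made. Every name below
(``mock_runtime``, ``mock_initialize_runtime``, ``embed_python`` and so on) is
a hypothetical stand-in for the proposed C API; the point is the control flow,
in which finalization runs on every path once the runtime has been initialized.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock runtime that only records which lifecycle calls were made.
 * All names here are hypothetical stand-ins for the proposed C API. */
typedef struct {
    bool runtime_initialized;
    bool main_interpreter_initialized;
    int finalize_calls;
} mock_runtime;

void mock_initialize_runtime(mock_runtime *rt)
{
    rt->runtime_initialized = true;
}

void mock_initialize_main_interpreter(mock_runtime *rt)
{
    /* The main interpreter can only be completed on a live runtime. */
    assert(rt->runtime_initialized);
    rt->main_interpreter_initialized = true;
}

void mock_finalize(mock_runtime *rt)
{
    rt->runtime_initialized = false;
    rt->main_interpreter_initialized = false;
    rt->finalize_calls++;
}

/* Returns true if startup completed.  Finalization runs on every path
 * once the runtime has been initialized, including the early exit taken
 * when building the main interpreter configuration fails. */
bool embed_python(mock_runtime *rt, bool config_build_succeeds)
{
    mock_initialize_runtime(rt);
    if (!config_build_succeeds) {
        mock_finalize(rt);
        return false;
    }
    mock_initialize_main_interpreter(rt);
    /* ... application work would happen here ... */
    mock_finalize(rt);
    return true;
}
```
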

Determining the remaining configuration settings
------------------------------------------------

The next step in the initialization sequence is to determine the remaining
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core APIs for this step are::

    int Py_BuildPythonConfig(
        PyConfigAsObjects *py_config, const PyConfig *c_config
    );

    int Py_BuildPythonConfigFromArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, char **argv
    );

    int Py_BuildPythonConfigFromWideArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, wchar_t **argv
    );

The ``py_config`` argument should be a pointer to a ``PyConfigAsObjects`` struct
(which may be a temporary one stored on the C stack). For any already configured
value (i.e. any non-NULL pointer), CPython will sanity check the supplied value,
but otherwise accept it as correct.

A struct is used rather than a Python dictionary as the struct is easier
to work with from C, the list of supported fields is fixed for a given
CPython version and only a read-only view needs to be exposed to Python
code (which is relatively straightforward, thanks to the infrastructure
already put in place to expose ``sys.implementation``).

Unlike ``Py_InitializeRuntime``, this call reports problems with the config
data by raising a Python exception and returning an error code, rather than by
returning a Python initialization specific C struct.

Any supported configuration setting which is not already set will be
populated appropriately in the supplied configuration struct. The default
configuration can be overridden entirely by setting the value *before*
calling ``Py_BuildPythonConfig``. The provided value will then also be
used in calculating any other settings derived from that value.

Alternatively, settings may be overridden *after* the
``Py_BuildPythonConfig`` call (this can be useful if an embedding
application wants to adjust a setting rather than replace it completely,
such as removing ``sys.path[0]``).

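The "fill in only what is unset" behaviour described above can be sketched
with plain C strings. This is an illustration under loud assumptions:
``sketch_config`` and ``sketch_build_config`` are hypothetical, the real
``PyConfigAsObjects`` struct holds Python objects rather than ``char``
pointers, and both the ``/usr/local`` default and the rule that
``exec_prefix`` falls back to ``prefix`` are chosen purely for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field config using C strings; NULL means "not set".
 * The proposed PyConfigAsObjects holds Python objects instead. */
typedef struct {
    const char *prefix;
    const char *exec_prefix;
} sketch_config;

/* Fill in only the unset fields.  exec_prefix is derived from prefix, so
 * a caller-supplied prefix also influences the derived value. */
void sketch_build_config(sketch_config *config)
{
    assert(config != NULL);
    if (config->prefix == NULL) {
        config->prefix = "/usr/local";   /* illustrative default only */
    }
    if (config->exec_prefix == NULL) {
        config->exec_prefix = config->prefix;
    }
}
```

Setting ``prefix`` before the call changes the derived ``exec_prefix`` as well,
whereas assigning to a field after the call changes that one field only, which
is the distinction the two preceding paragraphs draw.
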

The ``c_config`` argument is an optional pointer to a ``PyConfig`` structure,
as defined in :pep:`587`. If provided, it is used in preference to reading settings
directly from the environment or process global state.

Merely reading the configuration has no effect on the interpreter state: it
only modifies the passed in configuration struct. The settings are not
applied to the running interpreter until the ``Py_InitializeMainInterpreter``
call (see below).


Supported configuration settings
--------------------------------

The interpreter configuration is split into two parts: settings which are
either relevant only to the main interpreter or must be identical across the
main interpreter and all subinterpreters, and settings which may vary across
subinterpreters.

NOTE: For initial implementation purposes, only the flag indicating whether
or not the interpreter is the main interpreter will be configured on a per
interpreter basis. Other fields will be reviewed for whether or not they can
feasibly be made interpreter specific over the course of the implementation.

.. note:: The list of config fields below is currently out of sync with :pep:`587`.
   Where they differ, :pep:`587` takes precedence.

The ``PyConfigAsObjects`` struct mirrors the ``PyConfig`` struct from :pep:`587`,
but uses full Python objects to store values, rather than C level data types.
It adds ``raw_argv`` and ``argv`` list fields, so later initialisation steps
don't need to accept those separately.

Fields are always pointers to Python data types, with unset values indicated by
|
|
``NULL``::
|
|
|
|
    typedef struct {
        /* Argument processing */
        PyListObject *raw_argv;
        PyListObject *argv;
        PyListObject *warnoptions; /* -W switch, PYTHONWARNINGS */
        PyDictObject *xoptions;    /* -X switch */

        /* Filesystem locations */
        PyUnicodeObject *program_name;
        PyUnicodeObject *executable;
        PyUnicodeObject *prefix;           /* PYTHONHOME */
        PyUnicodeObject *exec_prefix;      /* PYTHONHOME */
        PyUnicodeObject *base_prefix;      /* pyvenv.cfg */
        PyUnicodeObject *base_exec_prefix; /* pyvenv.cfg */

        /* Site module */
        PyBoolObject *enable_site_config;  /* -S switch (inverted) */
        PyBoolObject *no_user_site;        /* -s switch, PYTHONNOUSERSITE */

        /* Import configuration */
        PyBoolObject *dont_write_bytecode; /* -B switch, PYTHONDONTWRITEBYTECODE */
        PyBoolObject *ignore_module_case;  /* PYTHONCASEOK */
        PyListObject *import_path;         /* PYTHONPATH (etc) */

        /* Standard streams */
        PyBoolObject    *use_unbuffered_io; /* -u switch, PYTHONUNBUFFERED */
        PyUnicodeObject *stdin_encoding;    /* PYTHONIOENCODING */
        PyUnicodeObject *stdin_errors;      /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_errors;     /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_errors;     /* PYTHONIOENCODING */

        /* Filesystem access */
        PyUnicodeObject *fs_encoding;

        /* Debugging output */
        PyBoolObject *debug_parser;    /* -d switch, PYTHONDEBUG */
        PyLongObject *verbosity;       /* -v switch */

        /* Code generation */
        PyLongObject *bytes_warnings;  /* -b switch */
        PyLongObject *optimize;        /* -O switch */

        /* Signal handling */
        PyBoolObject *install_signal_handlers;

        /* Implicit execution */
        PyUnicodeObject *startup_file; /* PYTHONSTARTUP */

        /* Main module
         *
         * If prepare_main is set, at most one of the main_* settings should
         * be set before calling PyRun_PrepareMain (Py_BuildPythonConfig
         * will set one of them based on the command line arguments if
         * prepare_main is non-zero when that API is called).
         */
        PyBoolObject    *prepare_main;
        PyUnicodeObject *main_source; /* -c switch */
        PyUnicodeObject *main_path;   /* filesystem path */
        PyUnicodeObject *main_module; /* -m switch */
        PyCodeObject    *main_code;   /* Run directly from a code object */
        PyObject        *main_stream; /* Run from stream */
        PyBoolObject    *run_implicit_code; /* Run implicit code during prep */

        /* Interactive main
         *
         * Note: Settings related to interactive mode are very much in flux.
         */
        PyObject *prompt_stream;      /* Output interactive prompt */
        PyBoolObject *show_banner;    /* -q switch (inverted) */
        PyBoolObject *inspect_main;   /* -i switch, PYTHONINSPECT */

    } PyConfigAsObjects;
|
|
|
|
The ``PyInterpreterConfig`` struct holds the settings that may vary between
|
|
the main interpreter and subinterpreters. For the main interpreter, these
|
|
settings are automatically populated by ``Py_InitializeMainInterpreter()``.
|
|
|
|
::
|
|
|
|
    typedef struct {
        PyBoolObject *is_main_interpreter;    /* Easily check for subinterpreters */
    } PyInterpreterConfig;
|
|
|
|
As these structs consist solely of object pointers, no explicit initializer
definitions are needed: C99 zero-initializes any struct members not named in
an initializer, so ``PyConfigAsObjects config = {0};`` leaves every field ``NULL``.
|
|
|
|
|
|
Completing the main interpreter initialization
|
|
----------------------------------------------
|
|
|
|
The final step in the initialization process is to actually put the
|
|
configuration settings into effect and finish bootstrapping the main
|
|
interpreter up to full operation::
|
|
|
|
int Py_InitializeMainInterpreter(const PyConfigAsObjects *config);
|
|
|
|
Like ``Py_BuildPythonConfig``, this call will raise an exception and
|
|
report an error return rather than exhibiting fatal errors if a problem is
|
|
found with the config data. (TBC, as existing public initialisation APIs support
|
|
being called multiple times without error, and simply ignore changes to any
|
|
write-once settings. It may make sense to keep that behaviour rather than trying
|
|
to make the new API stricter than the old one.)
|
|
|
|
All configuration settings are required - the configuration struct
|
|
should always be passed through ``Py_BuildPythonConfig`` to ensure it
|
|
is fully populated.
|
|
|
|
After a successful call, ``Py_IsInitialized()`` will become true and
|
|
``Py_IsInitializing()`` will become false. The caveats described above for the
|
|
interpreter during the phase where only the core runtime is initialized will
|
|
no longer hold.
|
|
|
|
Attempting to call ``Py_InitializeMainInterpreter()`` again when
|
|
``Py_IsInitialized()`` is true is an error.
|
|
|
|
However, some metadata related to the ``__main__`` module may still be
|
|
incomplete:
|
|
|
|
* ``sys.argv[0]`` may not yet have its final value
|
|
|
|
* it will be ``-m`` when executing a module or package with CPython
|
|
* it will be the same as ``sys.path[0]`` rather than the location of
|
|
the ``__main__`` module when executing a valid ``sys.path`` entry
|
|
(typically a zipfile or directory)
|
|
* otherwise, it will be accurate:
|
|
|
|
* the script name if running an ordinary script
|
|
* ``-c`` if executing a supplied string
|
|
* ``-`` or the empty string if running from stdin
|
|
|
|
* the metadata in the ``__main__`` module will still indicate it is a
|
|
builtin module
|
|
|
|
This function will normally implicitly import site as its final operation
|
|
(after ``Py_IsInitialized()`` is already set). Setting the
``enable_site_config`` flag to ``Py_False`` in the configuration settings will
|
|
disable this behaviour, as well as eliminating any side effects on global
|
|
state if ``import site`` is later explicitly executed in the process.
|
|
|
|
|
|
Preparing the main module
|
|
-------------------------
|
|
|
|
.. note:: In :pep:`587`, ``PyRun_PrepareMain`` and ``PyRun_ExecMain`` are not
|
|
exposed separately, and are instead accessed through a ``Py_RunMain`` API
|
|
that both prepares and executes main, and then finalizes the Python
|
|
interpreter.
|
|
|
|
This subphase completes the population of the ``__main__`` module
|
|
related metadata, without actually starting execution of the ``__main__``
|
|
module code.
|
|
|
|
It is handled by calling the following API::
|
|
|
|
int PyRun_PrepareMain();
|
|
|
|
This operation is only permitted for the main interpreter, and will raise
|
|
``RuntimeError`` when invoked from a thread where the current thread state
|
|
belongs to a subinterpreter.
|
|
|
|
The actual processing is driven by the main related settings stored in
|
|
the interpreter state as part of the configuration struct.
|
|
|
|
If ``prepare_main`` is zero, this call does nothing.
|
|
|
|
If all of ``main_source``, ``main_path``, ``main_module``,
|
|
``main_stream`` and ``main_code`` are NULL, this call does nothing.
|
|
|
|
If more than one of ``main_source``, ``main_path``, ``main_module``,
|
|
``main_stream`` or ``main_code`` are set, ``RuntimeError`` will be reported.
|
|
|
|
If ``main_code`` is already set, then this call does nothing.
|
|
|
|
If ``main_stream`` is set, and ``run_implicit_code`` is also set, then
|
|
the file identified in ``startup_file`` will be read, compiled and
|
|
executed in the ``__main__`` namespace.
|
|
|
|
If ``main_source``, ``main_path`` or ``main_module`` are set, then this
|
|
call will take whatever steps are needed to populate ``main_code``:
|
|
|
|
* For ``main_source``, the supplied string will be compiled and saved to
|
|
``main_code``.
|
|
|
|
* For ``main_path``:
|
|
|
|
* if the supplied path is recognised as a valid ``sys.path`` entry, it
|
|
is inserted as ``sys.path[0]``, ``main_module`` is set
|
|
to ``__main__`` and processing continues as for ``main_module`` below.
|
|
* otherwise, path is read as a CPython bytecode file
|
|
* if that fails, it is read as a Python source file and compiled
|
|
* in the latter two cases, the code object is saved to ``main_code``
|
|
and ``__main__.__file__`` is set appropriately
|
|
|
|
* For ``main_module``:
|
|
|
|
* any parent package is imported
|
|
* the loader for the module is determined
|
|
* if the loader indicates the module is a package, add ``.__main__`` to
|
|
the end of ``main_module`` and try again (if the final name segment
|
|
is already ``.__main__`` then fail immediately)
|
|
* once the module source code is located, save the compiled module code
|
|
as ``main_code`` and populate the following attributes in ``__main__``
|
|
appropriately: ``__name__``, ``__loader__``, ``__file__``,
|
|
``__cached__``, ``__package__``.
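
The decision rules above can be summarised in a small self-contained model.
This is a sketch, not the reference implementation: ``SketchMainSettings``,
``SketchPrepareAction`` and ``sketch_prepare_action`` are hypothetical names,
untyped pointers stand in for the Python objects, and the assumption that a
zero ``prepare_main`` takes precedence over the "more than one setting" error
is inferred from the order in which the rules are listed::

    #include <stddef.h>

    /* Hypothetical model of the main_* settings; NULL means "not set". */
    typedef struct {
        int prepare_main;
        const void *main_source;
        const void *main_path;
        const void *main_module;
        const void *main_stream;
        const void *main_code;
    } SketchMainSettings;

    typedef enum {
        SKETCH_NOTHING_TO_DO,   /* the call returns without doing anything */
        SKETCH_ERROR,           /* RuntimeError would be reported */
        SKETCH_STREAM,          /* startup_file runs if run_implicit_code is set */
        SKETCH_POPULATE_CODE    /* source, path or module is turned into main_code */
    } SketchPrepareAction;

    static SketchPrepareAction
    sketch_prepare_action(const SketchMainSettings *s)
    {
        /* Count how many of the mutually exclusive settings are present. */
        int count = (s->main_source != NULL) + (s->main_path != NULL)
                  + (s->main_module != NULL) + (s->main_stream != NULL)
                  + (s->main_code != NULL);

        if (!s->prepare_main || count == 0) {
            return SKETCH_NOTHING_TO_DO;
        }
        if (count > 1) {
            return SKETCH_ERROR;
        }
        if (s->main_code != NULL) {
            return SKETCH_NOTHING_TO_DO;    /* already prepared */
        }
        if (s->main_stream != NULL) {
            return SKETCH_STREAM;
        }
        return SKETCH_POPULATE_CODE;
    }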
|
|
|
|
|
|
(Note: the behaviour described in this section isn't new, it's a write-up
|
|
of the current behaviour of the CPython interpreter adjusted for the new
|
|
configuration system)
|
|
|
|
|
|
Executing the main module
|
|
-------------------------
|
|
|
|
.. note:: In :pep:`587`, ``PyRun_PrepareMain`` and ``PyRun_ExecMain`` are not
|
|
exposed separately, and are instead accessed through a ``Py_RunMain`` API
|
|
that both prepares and executes main, and then finalizes the Python
|
|
interpreter.
|
|
|
|
|
|
This subphase covers the execution of the actual ``__main__`` module code.
|
|
|
|
It is handled by calling the following API::
|
|
|
|
int PyRun_ExecMain();
|
|
|
|
This operation is only permitted for the main interpreter, and will raise
|
|
``RuntimeError`` when invoked from a thread where the current thread state
|
|
belongs to a subinterpreter.
|
|
|
|
The actual processing is driven by the main related settings stored in
|
|
the interpreter state as part of the configuration struct.
|
|
|
|
If both ``main_stream`` and ``main_code`` are NULL, this call does nothing.
|
|
|
|
If both ``main_stream`` and ``main_code`` are set, ``RuntimeError`` will
|
|
be reported.
|
|
|
|
If ``main_stream`` and ``prompt_stream`` are both set, main execution will
|
|
be delegated to a new internal API::
|
|
|
|
int _PyRun_InteractiveMain(PyObject *input, PyObject* output);
|
|
|
|
If ``main_stream`` is set and ``prompt_stream`` is NULL, main execution will
|
|
be delegated to a new internal API::
|
|
|
|
int _PyRun_StreamInMain(PyObject *input);
|
|
|
|
If ``main_code`` is set, main execution will be delegated to a new internal
|
|
API::
|
|
|
|
int _PyRun_CodeInMain(PyCodeObject *code);
|
|
|
|
After execution of main completes, if ``inspect_main`` is set, or
|
|
the ``PYTHONINSPECT`` environment variable has been set, then
|
|
``PyRun_ExecMain`` will invoke
|
|
``_PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
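
The dispatch rules for ``PyRun_ExecMain`` can be modelled in a few lines. The
following is a hedged sketch only: ``SketchExecTarget`` and
``sketch_exec_target`` are hypothetical names, and untyped pointers stand in
for the Python objects held in the configuration::

    #include <stddef.h>

    typedef enum {
        SKETCH_EXEC_NOTHING,      /* neither main_stream nor main_code is set */
        SKETCH_EXEC_ERROR,        /* both are set: RuntimeError */
        SKETCH_EXEC_INTERACTIVE,  /* would call _PyRun_InteractiveMain */
        SKETCH_EXEC_STREAM,       /* would call _PyRun_StreamInMain */
        SKETCH_EXEC_CODE          /* would call _PyRun_CodeInMain */
    } SketchExecTarget;

    /* Pick the internal API that main execution would be delegated to. */
    static SketchExecTarget
    sketch_exec_target(const void *main_stream, const void *main_code,
                       const void *prompt_stream)
    {
        if (main_stream == NULL && main_code == NULL) {
            return SKETCH_EXEC_NOTHING;
        }
        if (main_stream != NULL && main_code != NULL) {
            return SKETCH_EXEC_ERROR;
        }
        if (main_stream != NULL) {
            return prompt_stream != NULL ? SKETCH_EXEC_INTERACTIVE
                                         : SKETCH_EXEC_STREAM;
        }
        return SKETCH_EXEC_CODE;
    }

The ``inspect_main`` follow-up described above is left out of the model, as it
happens after whichever target was selected has finished running.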
|
|
|
|
|
|
Internal Storage of Configuration Data
|
|
--------------------------------------
|
|
|
|
The interpreter state will be updated to include details of the configuration
|
|
settings supplied during initialization by extending the interpreter state
|
|
object with at least an embedded copy of the ``PyConfigAsObjects`` and
|
|
``PyInterpreterConfig`` structs.
|
|
|
|
For debugging purposes, the configuration settings will be exposed as
|
|
a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
|
|
``sys.implementation``). The attributes will themselves be simple namespaces
|
|
corresponding to the two levels of configuration setting:
|
|
|
|
* ``all_interpreters``
|
|
* ``active_interpreter``
|
|
|
|
Field names will match those in the configuration structs, except for
|
|
``hash_seed``, which will be deliberately excluded.
|
|
|
|
An underscored attribute is chosen deliberately, as these configuration
|
|
settings are part of the CPython implementation, rather than part of the
|
|
Python language definition. If new settings are needed to support
|
|
cross-implementation compatibility in the standard library, then those
|
|
should be agreed with the other implementations and exposed as new required
|
|
attributes on ``sys.implementation``, as described in :pep:`421`.
|
|
|
|
These are *snapshots* of the initial configuration settings. They are not
|
|
modified by the interpreter during runtime (except as noted above).
|
|
|
|
|
|
Creating and Configuring Subinterpreters
|
|
----------------------------------------
|
|
|
|
As the new configuration settings are stored in the interpreter state, they
|
|
need to be initialised when a new subinterpreter is created. This turns out
|
|
to be trickier than one might expect due to ``PyThreadState_Swap(NULL);``
|
|
(which is fortunately exercised by CPython's own embedding tests, allowing
|
|
this problem to be detected during development).
|
|
|
|
To provide a straightforward solution for this case, the PEP proposes to
|
|
add a new API::
|
|
|
|
    PyInterpreterState *PyInterpreterState_Main();
|
|
|
|
This will be a counterpart to ``PyInterpreterState_Head()``, only reporting the
|
|
oldest currently existing interpreter rather than the newest. If
|
|
``Py_NewInterpreter()`` is called from a thread with an existing thread
|
|
state, then the interpreter configuration for that thread will be
|
|
used when initialising the new subinterpreter. If there is no current
|
|
thread state, the configuration from ``PyInterpreterState_Main()``
|
|
will be used.
|
|
|
|
While the existing ``PyInterpreterState_Head()`` API could be used instead,
|
|
that reference changes as subinterpreters are created and destroyed, while
|
|
``PyInterpreterState_Main()`` will always refer to the initial interpreter
|
|
state created in ``Py_InitializeRuntime()``.
|
|
|
|
A new constraint is also added to the embedding API: attempting to delete
|
|
the main interpreter while subinterpreters still exist will now be a fatal
|
|
error.
|
|
|
|
|
|
Stable ABI
|
|
----------
|
|
|
|
Most of the APIs proposed in this PEP are excluded from the stable ABI, as
|
|
embedding a Python interpreter involves a much higher degree of coupling
|
|
than merely writing an extension module.
|
|
|
|
The only newly exposed APIs that will be part of the stable ABI are the
|
|
``Py_IsInitializing()`` and ``Py_IsRuntimeInitialized()`` queries.
|
|
|
|
|
|
Build time configuration
|
|
------------------------
|
|
|
|
This PEP makes no changes to the handling of build time configuration
|
|
settings, and thus has no effect on the contents of ``sys.implementation``
|
|
or the result of ``sysconfig.get_config_vars()``.
|
|
|
|
|
|
Backwards Compatibility
|
|
-----------------------
|
|
|
|
Backwards compatibility will be preserved primarily by ensuring that
|
|
``Py_BuildPythonConfig()`` interrogates all the previously defined
|
|
configuration settings stored in global variables and environment variables,
|
|
and that ``Py_InitializeMainInterpreter()`` writes affected settings back to
|
|
the relevant locations.
|
|
|
|
One acknowledged incompatibility is that some environment variables which
|
|
are currently read lazily may instead be read once during interpreter
|
|
initialization. As the reference implementation matures, these will be
|
|
discussed in more detail on a case-by-case basis. The environment variables
|
|
which are currently known to be looked up dynamically are:
|
|
|
|
* ``PYTHONCASEOK``: writing to ``os.environ['PYTHONCASEOK']`` will no longer
|
|
dynamically alter the interpreter's handling of filename case differences
|
|
on import (TBC)
|
|
* ``PYTHONINSPECT``: ``os.environ['PYTHONINSPECT']`` will still be checked
|
|
after execution of the ``__main__`` module terminates
|
|
|
|
The ``Py_Initialize()`` style of initialization will continue to be
|
|
supported. It will use (at least some elements of) the new API
|
|
internally, but will continue to exhibit the same behaviour as it
|
|
does today, ensuring that ``sys.argv`` is not populated until a subsequent
|
|
``PySys_SetArgv`` call (TBC). All APIs that currently support being called
|
|
prior to ``Py_Initialize()`` will
|
|
continue to do so, and will also support being called prior to
|
|
``Py_InitializeRuntime()``.
|
|
|
|
|
|
A System Python Executable
|
|
==========================
|
|
|
|
When executing system utilities with administrative access to a system, many
|
|
of the default behaviours of CPython are undesirable, as they may allow
|
|
untrusted code to execute with elevated privileges. The most problematic
|
|
aspects are the fact that user site directories are enabled,
|
|
environment variables are trusted and that the directory containing the
|
|
executed file is placed at the beginning of the import path.
|
|
|
|
Issue 16499 [6]_ added a ``-I`` option to change the behaviour of
|
|
the normal CPython executable, but this is a hard to discover solution (and
|
|
adds yet another option to an already complex CLI). This PEP proposes to
|
|
instead add a separate ``system-python`` executable.
|
|
|
|
Currently, providing a separate executable with different default behaviour
|
|
would be prohibitively hard to maintain. One of the goals of this PEP is to
|
|
make it possible to replace much of the hard to maintain bootstrapping code
|
|
with more normal CPython code, as well as making it easier for a separate
|
|
application to make use of key components of ``Py_Main``. Including this
|
|
change in the PEP is designed to help avoid acceptance of a design that
|
|
sounds good in theory but proves to be problematic in practice.
|
|
|
|
Cleanly supporting this kind of "alternate CLI" is the main reason for the
|
|
proposed changes to better expose the core logic for deciding between the
|
|
different execution modes supported by CPython:
|
|
|
|
* script execution
|
|
* directory/zipfile execution
|
|
* command execution (``-c`` switch)
|
|
* module or package execution (``-m`` switch)
|
|
* execution from stdin (non-interactive)
|
|
* interactive stdin
|
|
|
|
Actually implementing this may also reveal the need for some better
|
|
argument parsing infrastructure for use during the initializing phase.
|
|
|
|
|
|
Open Questions
|
|
==============
|
|
|
|
* Error details for ``Py_BuildPythonConfig`` and
|
|
``Py_InitializeMainInterpreter`` (these should become clearer as the
|
|
implementation progresses)
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
The reference implementation is being developed as a private API refactoring
|
|
within the CPython reference interpreter (as attempting to maintain it as an
|
|
independent project proved impractical).
|
|
|
|
:pep:`587` extracts a subset of the proposal that is considered sufficiently stable
|
|
to be worth proposing as a public API for Python 3.8.
|
|
|
|
|
|
The Status Quo (as of Python 3.6)
|
|
=================================
|
|
|
|
The current mechanisms for configuring the interpreter have accumulated in
|
|
a fairly ad hoc fashion over the past 20+ years, leading to a rather
|
|
inconsistent interface with varying levels of documentation.
|
|
|
|
Also see :pep:`587` for further discussion of the existing settings and their
|
|
handling.
|
|
|
|
(Note: some of the info below could probably be cleaned up and added to the
|
|
C API documentation for 3.x - it's all CPython specific, so it
|
|
doesn't belong in the language reference)
|
|
|
|
|
|
Ignoring Environment Variables
|
|
------------------------------
|
|
|
|
The ``-E`` command line option allows all environment variables to be
|
|
ignored when initializing the Python interpreter. An embedding application
|
|
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
|
|
calling ``Py_Initialize()``.
|
|
|
|
In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
|
|
flag, and always produces ``NULL`` if it is set.
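
The effect of that check can be sketched with a small wrapper around the
standard ``getenv``. The names ``sketch_ignore_environment`` and
``sketch_getenv`` are hypothetical stand-ins for the real flag and macro, and
the snippet is an illustration of the behaviour described above rather than a
copy of the CPython source::

    #include <stdlib.h>

    /* Stand-in for the Py_IgnoreEnvironmentFlag global (set by -E). */
    static int sketch_ignore_environment = 0;

    /* Look up an environment variable, unless the environment is ignored. */
    static const char *sketch_getenv(const char *name)
    {
        return sketch_ignore_environment ? NULL : getenv(name);
    }

Any lookup that bypasses such a wrapper and calls ``getenv`` directly will not
honour the flag.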
|
|
|
|
<TBD: I believe PYTHONCASEOK is checked regardless of this setting >
|
|
<TBD: Does -E also ignore Windows registry keys? >
|
|
|
|
|
|
Randomised Hashing
|
|
------------------
|
|
|
|
Randomised hashing is controlled via the ``-R`` command line option (in
|
|
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
|
|
variable.
|
|
|
|
In Python 3.3, only the environment variable remains relevant. It can be
|
|
used to disable randomised hashing (by using a seed value of 0) or else
|
|
to force a specific seed value (e.g. for repeatability of testing, or
|
|
to share hash values between processes).
|
|
|
|
However, embedding applications must use the ``Py_HashRandomizationFlag``
|
|
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
|
|
rather than in ``Py_Initialize()``).
|
|
|
|
The new configuration API should make it straightforward for an
|
|
embedding application to reuse the ``PYTHONHASHSEED`` processing with
|
|
a text based configuration setting provided by other means (e.g. a
|
|
config file or separate environment variable).
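
As an illustration of what reusing that processing might involve, the sketch
below parses a ``PYTHONHASHSEED``-style text value: either the string
``random`` or a decimal integer in the documented range 0 to 4294967295, with
0 disabling randomisation. ``SketchHashConfig`` and ``sketch_parse_hash_seed``
are hypothetical names, and treating a missing value as an error is an
assumption of this sketch::

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        int use_random_seed;   /* non-zero: choose a seed at random */
        unsigned long seed;    /* fixed seed; 0 disables randomisation */
    } SketchHashConfig;

    /* Returns 0 on success, or -1 if text is neither "random" nor a
     * decimal integer in the range [0, 4294967295]. */
    static int sketch_parse_hash_seed(const char *text, SketchHashConfig *out)
    {
        char *end = NULL;
        unsigned long value;

        if (text == NULL) {
            return -1;         /* assumption: callers must supply a value */
        }
        if (strcmp(text, "random") == 0) {
            out->use_random_seed = 1;
            out->seed = 0;
            return 0;
        }
        if (text[0] < '0' || text[0] > '9') {
            return -1;         /* rejects empty strings, signs and whitespace */
        }
        errno = 0;
        value = strtoul(text, &end, 10);
        if (errno != 0 || *end != '\0' || value > 4294967295UL) {
            return -1;
        }
        out->use_random_seed = 0;
        out->seed = value;
        return 0;
    }

An embedding application could feed this function a value read from its own
config file instead of the process environment.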
|
|
|
|
|
|
Locating Python and the standard library
|
|
----------------------------------------
|
|
|
|
The location of the Python binary and the standard library is influenced
|
|
by several elements. The algorithm used to perform the calculation is
|
|
not documented anywhere other than in the source code [3]_, [4]_. Even that
|
|
description is incomplete, as it failed to be updated for the virtual
|
|
environment support added in Python 3.3 (detailed in :pep:`405`).
|
|
|
|
These calculations are affected by the following function calls (made
|
|
prior to calling ``Py_Initialize()``) and environment variables:
|
|
|
|
* ``Py_SetProgramName()``
|
|
* ``Py_SetPythonHome()``
|
|
* ``PYTHONHOME``
|
|
|
|
The filesystem is also inspected for ``pyvenv.cfg`` files (see :pep:`405`) or,
|
|
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
|
|
file.
|
|
|
|
The build time settings for ``PREFIX`` and ``EXEC_PREFIX`` are also relevant,
|
|
as are some registry settings on Windows. The hardcoded fallbacks are
|
|
based on the layout of the CPython source tree and build output when
|
|
working in a source checkout.
|
|
|
|
|
|
Configuring ``sys.path``
|
|
------------------------
|
|
|
|
An embedding application may call ``Py_SetPath()`` prior to
|
|
``Py_Initialize()`` to completely override the calculation of
|
|
``sys.path``. It is not straightforward to only allow *some* of the
|
|
calculations, as modifying ``sys.path`` after initialization is
|
|
already complete means those modifications will not be in effect
|
|
when standard library modules are imported during the startup sequence.
|
|
|
|
If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()``
|
|
(implicit in ``Py_Initialize()``), then it builds on the location data
|
|
calculations above to calculate suitable path entries, along with
|
|
the ``PYTHONPATH`` environment variable.
|
|
|
|
<TBD: On Windows, there's also a bunch of stuff to do with the registry>
|
|
|
|
The ``site`` module, which is implicitly imported at startup (unless
|
|
disabled via the ``-S`` option) adds additional paths to this initial
|
|
set of paths, as described in its documentation [5]_.
|
|
|
|
The ``-s`` command line option can be used to exclude the user site
|
|
directory from the list of directories added. Embedding applications
|
|
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.
|
|
|
|
The following commands can be used to check the default path configurations
|
|
for a given Python executable on a given system:
|
|
|
|
* ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
|
|
- standard configuration
|
|
* ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
|
|
- user site directory disabled
|
|
* ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``
|
|
- all site path modifications disabled
|
|
|
|
(Note: you can see similar information using ``-m site`` instead of ``-c``,
|
|
but this is slightly misleading as it calls ``os.path.abspath`` on all of the
|
|
path entries, making relative path entries look absolute. Using the ``site``
|
|
module also causes problems in the last case, as on Python versions prior to
|
|
3.3, explicitly importing site will carry out the path modifications ``-S``
|
|
avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails)
|
|
|
|
The calculation of ``sys.path[0]`` is comparatively straightforward:
|
|
|
|
* For an ordinary script (Python source or compiled bytecode),
|
|
``sys.path[0]`` will be the directory containing the script.
|
|
* For a valid ``sys.path`` entry (typically a zipfile or directory),
|
|
``sys.path[0]`` will be that path
|
|
* For an interactive session, running from stdin or when using the ``-c`` or
|
|
``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
|
|
system interprets as allowing imports from the current directory
|
|
|
|
|
|
Configuring ``sys.argv``
|
|
------------------------
|
|
|
|
Unlike most other settings discussed in this PEP, ``sys.argv`` is not
|
|
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
|
|
explicit call to ``PySys_SetArgv()``.
|
|
|
|
CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
|
|
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
|
|
arguments passed after the script name or the argument to the ``-c`` or
|
|
``-m`` options.
|
|
|
|
The calculation of ``sys.argv[0]`` is a little more complicated:
|
|
|
|
* For an ordinary script (source or bytecode), it will be the script name
|
|
* For a ``sys.path`` entry (typically a zipfile or directory) it will
|
|
initially be the zipfile or directory name, but will later be changed by
|
|
the ``runpy`` module to the full path to the imported ``__main__`` module.
|
|
* For a module specified with the ``-m`` switch, it will initially be the
|
|
string ``"-m"``, but will later be changed by the ``runpy`` module to the
|
|
full path to the executed module.
|
|
* For a package specified with the ``-m`` switch, it will initially be the
|
|
string ``"-m"``, but will later be changed by the ``runpy`` module to the
|
|
full path to the executed ``__main__`` submodule of the package.
|
|
* For a command executed with ``-c``, it will be the string ``"-c"``
|
|
* For explicitly requested input from stdin, it will be the string ``"-"``
|
|
* Otherwise, it will be the empty string
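
The initial values listed above (before any later adjustment by ``runpy``) can
be written as a simple lookup. This is an illustrative sketch:
``SketchRunMode`` and ``sketch_initial_argv0`` are hypothetical names that do
not exist in CPython::

    /* Hypothetical enumeration of the execution modes listed above. */
    typedef enum {
        SKETCH_MODE_SCRIPT,       /* ordinary source or bytecode file */
        SKETCH_MODE_PATH_ENTRY,   /* zipfile or directory */
        SKETCH_MODE_MODULE,       /* -m, module or package */
        SKETCH_MODE_COMMAND,      /* -c */
        SKETCH_MODE_STDIN,        /* explicitly requested with "-" */
        SKETCH_MODE_DEFAULT       /* anything else */
    } SketchRunMode;

    /* Return the value sys.argv[0] starts out with. "target" is the script
     * or path entry named on the command line and is unused otherwise. */
    static const char *
    sketch_initial_argv0(SketchRunMode mode, const char *target)
    {
        switch (mode) {
        case SKETCH_MODE_SCRIPT:
        case SKETCH_MODE_PATH_ENTRY:
            return target;
        case SKETCH_MODE_MODULE:
            return "-m";
        case SKETCH_MODE_COMMAND:
            return "-c";
        case SKETCH_MODE_STDIN:
            return "-";
        default:
            return "";
        }
    }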
|
|
|
|
Embedding applications must call ``PySys_SetArgv()`` themselves. The CPython logic
|
|
for doing so is part of ``Py_Main()`` and is not exposed separately.
|
|
However, the ``runpy`` module does provide roughly equivalent logic in
|
|
``runpy.run_module`` and ``runpy.run_path``.
|
|
|
|
|
|
|
|
Other configuration settings
|
|
----------------------------
|
|
|
|
TBD: Cover the initialization of the following in more detail:
|
|
|
|
* Completely disabling the import system
|
|
* The initial warning system state:
|
|
|
|
* ``sys.warnoptions``
|
|
* (-W option, PYTHONWARNINGS)
|
|
|
|
* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):
|
|
|
|
* ``sys._xoptions``
|
|
* (-X option)
|
|
|
|
* The filesystem encoding used by:
|
|
|
|
* ``sys.getfilesystemencoding``
|
|
* ``os.fsencode``
|
|
* ``os.fsdecode``
|
|
|
|
* The IO encoding and buffering used by:
|
|
|
|
* ``sys.stdin``
|
|
* ``sys.stdout``
|
|
* ``sys.stderr``
|
|
* (-u option, PYTHONIOENCODING, PYTHONUNBUFFERED)
|
|
|
|
* Whether or not to implicitly cache bytecode files:
|
|
|
|
* ``sys.dont_write_bytecode``
|
|
* (-B option, PYTHONDONTWRITEBYTECODE)
|
|
|
|
* Whether or not to enforce correct case in filenames on case-insensitive
|
|
platforms
|
|
|
|
* ``os.environ["PYTHONCASEOK"]``
|
|
|
|
* The other settings exposed to Python code in ``sys.flags``:
|
|
|
|
* ``debug`` (Enable debugging output in the pgen parser)
|
|
* ``inspect`` (Enter interactive interpreter after __main__ terminates)
|
|
* ``interactive`` (Treat stdin as a tty)
|
|
* ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
|
|
* ``no_user_site`` (don't add the user site directory to sys.path)
|
|
* ``no_site`` (don't implicitly import site during startup)
|
|
* ``ignore_environment`` (whether environment vars are used during config)
|
|
* ``verbose`` (enable all sorts of random output)
|
|
* ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
|
|
* ``quiet`` (disable banner output even if verbose is also enabled or
|
|
stdin is a tty and the interpreter is launched in interactive mode)
|
|
|
|
* Whether or not CPython's signal handlers should be installed
|
|
|
|
Much of the configuration of CPython is currently handled through C level
|
|
global variables::
|
|
|
|
Py_BytesWarningFlag (-b)
|
|
Py_DebugFlag (-d option)
|
|
Py_InspectFlag (-i option, PYTHONINSPECT)
|
|
Py_InteractiveFlag (property of stdin, cannot be overridden)
|
|
Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
|
|
Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
|
|
Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
|
|
Py_NoSiteFlag (-S option)
|
|
    Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFERED)
|
|
Py_VerboseFlag (-v option, PYTHONVERBOSE)
|
|
|
|
For the above variables, the conversion of command line options and
|
|
environment variables to C global variables is handled by ``Py_Main``,
|
|
so each embedding application must set those appropriately in order to
|
|
change them from their defaults.
|
|
|
|
Some configuration can only be provided as OS level environment variables::
|
|
|
|
PYTHONSTARTUP
|
|
PYTHONCASEOK
|
|
PYTHONIOENCODING
|
|
|
|
The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
|
|
whether or not CPython's signal handlers should be installed.
|
|
|
|
Finally, some interactive behaviour (such as printing the introductory
|
|
banner) is triggered only when standard input is reported as a terminal
|
|
connection by the operating system.
|
|
|
|
TBD: Document how the "-x" option is handled (skips processing of the
|
|
first comment line in the main script)
|
|
|
|
Also see detailed sequence of operations notes at [1]_.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] CPython interpreter initialization notes
|
|
(http://wiki.python.org/moin/CPythonInterpreterInitialization)
|
|
|
|
.. [2] BitBucket Sandbox
|
|
(https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits)
|
|
|
|
.. [3] \*nix getpath implementation
|
|
(http://hg.python.org/cpython/file/default/Modules/getpath.c)
|
|
|
|
.. [4] Windows getpath implementation
|
|
(http://hg.python.org/cpython/file/default/PC/getpathp.c)
|
|
|
|
.. [5] Site module documentation
|
|
(http://docs.python.org/3/library/site.html)
|
|
|
|
.. [6] Proposed CLI option for isolated mode
|
|
(http://bugs.python.org/issue16499)
|
|
|
|
.. [7] Adding to sys.path on the command line
|
|
(https://mail.python.org/pipermail/python-ideas/2010-October/008299.html)
|
|
(https://mail.python.org/pipermail/python-ideas/2012-September/016128.html)
|
|
|
|
.. [8] Control sys.path[0] initialisation
|
|
(http://bugs.python.org/issue13475)
|
|
|
|
.. [9] Enabling code coverage in subprocesses when testing
|
|
(http://bugs.python.org/issue14803)
|
|
|
|
.. [10] Problems with PYTHONIOENCODING in Blender
|
|
(http://bugs.python.org/issue16129)
|
|
|
|
|
|
|
|
Copyright
|
|
===========
|
|
This document has been placed in the public domain.
|