PEP 432: Reframe as core init vs main interpreter init

Nick Coghlan 2015-06-30 21:43:19 +10:00
parent f6830b511b
commit 6847250c7a
1 changed files with 141 additions and 139 deletions


@@ -28,50 +28,42 @@ implementation is developed.
Proposal
========
-This PEP proposes that CPython move to an explicit multi-phase initialization
-process, where a preliminary interpreter is put in place with limited OS
-interaction capabilities early in the startup sequence. This essential core
-remains in place while all of the configuration settings are determined,
-until a final configuration call takes those settings and finishes
-bootstrapping the interpreter immediately before locating and executing
-the main module.
+This PEP proposes that initialization of the CPython runtime be split into
+two clearly distinct phases:
+* core runtime initialization
+* main interpreter initialization
+The proposed design also has significant implications for:
+* main module execution
+* subinterpreter initialization
In the new design, the interpreter will move through the following
well-defined phases during the initialization sequence:
* Pre-Initialization - no interpreter available
-* Initializing - interpreter partially available
-* Initialized - interpreter available, __main__ related metadata
-incomplete
-With the interpreter itself fully initialised, main module execution will
-then proceed through two phases:
-* Main Preparation - __main__ related metadata populated
-* Main Execution - bytecode executing in the __main__ module namespace
-(Embedding applications may choose not to use the Main Preparation and
-Execution phases)
+* Core Initialized - main interpreter partially available,
+subinterpreter creation not yet available
+* Initialized - main interpreter fully available, subinterpreter creation
+available
As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP also proposes the creation and
distribution of a separate system Python (``pysystem``) executable
-which, by default, ignores user site directories and environment variables,
-and does not implicitly set ``sys.path[0]`` based on the current directory
-or the script being executed (it will, however, still support virtual
-environments).
+which, by default, operates in "isolated mode" (as selected by the CPython
+``-I`` switch).
To keep the implementation complexity under control, this PEP does *not*
propose wholesale changes to the way the interpreter state is accessed at
runtime. Changing the order in which the existing initialization steps
-occur in order to make
-the startup sequence easier to maintain is already a substantial change, and
-attempting to make those other changes at the same time will make the
-change significantly more invasive and much harder to review. However, such
-proposals may be suitable topics for follow-on PEPs or patches - one key
-benefit of this PEP is decreasing the coupling between the internal storage
-model and the configuration interface, so such changes should be easier
+occur in order to make the startup sequence easier to maintain is already a
+substantial change, and attempting to make those other changes at the same time
+will make the change significantly more invasive and much harder to review.
+However, such proposals may be suitable topics for follow-on PEPs or patches
+- one key benefit of this PEP is decreasing the coupling between the internal
+storage model and the configuration interface, so such changes should be easier
once this PEP has been implemented.
@@ -94,10 +86,10 @@ API cannot be used safely.
A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path``
-initialization (easily adding additional directories on the command line
-in a cross-platform fashion [7_], as well as controlling the configuration of
+initialization (e.g. easily adding additional directories on the command line
+in a cross-platform fashion [7_], controlling the configuration of
``sys.path[0]`` [8_]), easier configuration of utilities like coverage
-tracing when launching Python subprocesses [9_].
+tracing when launching Python subprocesses [9_]).
Rather than continuing to bolt such behaviour onto an already complicated
system, this PEP proposes to start simplifying the status quo by introducing
@@ -259,54 +251,33 @@ Three distinct interpreter initialisation phases are proposed:
* Pre-Initialization:
* no interpreter is available.
-* ``Py_IsInitializing()`` returns ``0``
+* ``Py_IsCoreInitialized()`` returns ``0``
* ``Py_IsInitialized()`` returns ``0``
* The embedding application determines the settings required to create the
main interpreter and moves to the next phase by calling
-``Py_BeginInitialization``.
+``Py_InitializeCore``.
-* Initializing:
+* Core Initialized:
* the main interpreter is available, but only partially configured.
-* ``Py_IsInitializing()`` returns ``1``
+* ``Py_IsCoreInitialized()`` returns ``1``
* ``Py_IsInitialized()`` returns ``0``
* The embedding application determines and applies the settings
required to complete the initialization process by calling
-``Py_ReadConfig`` and ``Py_EndInitialization``.
+``Py_ReadMainInterpreterConfig`` and ``Py_InitializeMainInterpreter``.
* Initialized:
* the main interpreter is available and fully operational, but
``__main__`` related metadata is incomplete
-* ``Py_IsInitializing()`` returns ``0``
+* ``Py_IsCoreInitialized()`` returns ``1``
* ``Py_IsInitialized()`` returns ``1``
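The three phases and the values reported by the two query functions together form a small state machine. The following C sketch models that machine under stated assumptions. Every ``demo_*`` name is hypothetical and only stands in for the proposed ``Py_InitializeCore()``, ``Py_InitializeMainInterpreter()``, ``Py_IsCoreInitialized()`` and ``Py_IsInitialized()``, and no CPython headers are involved. Where the text calls for a fatal error, the sketch calls ``abort()``. The sketch also rejects a main interpreter initialization attempted before core initialization, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of the initialization phases; not CPython code. */
typedef enum {
    DEMO_PRE_INITIALIZATION,  /* no interpreter available */
    DEMO_CORE_INITIALIZED,    /* main interpreter partially configured */
    DEMO_INITIALIZED          /* main interpreter fully operational */
} demo_phase;

static demo_phase current_phase = DEMO_PRE_INITIALIZATION;

/* Stand-in for Py_IsCoreInitialized(): true in both later phases. */
int demo_IsCoreInitialized(void)
{
    return current_phase != DEMO_PRE_INITIALIZATION;
}

/* Stand-in for Py_IsInitialized(): true only once fully initialized. */
int demo_IsInitialized(void)
{
    return current_phase == DEMO_INITIALIZED;
}

/* Stand-in for Py_InitializeCore(): a repeated call is a fatal error. */
void demo_InitializeCore(void)
{
    if (demo_IsCoreInitialized()) {
        fprintf(stderr, "Fatal error: core runtime already initialized\n");
        abort();
    }
    current_phase = DEMO_CORE_INITIALIZED;
}

/* Stand-in for Py_InitializeMainInterpreter(): reports an error return
 * instead of aborting. Rejecting a call made before core initialization
 * is an assumption of this sketch. */
int demo_InitializeMainInterpreter(void)
{
    if (current_phase != DEMO_CORE_INITIALIZED) {
        return -1;
    }
    current_phase = DEMO_INITIALIZED;
    return 0;
}

/* Stand-in for Py_Finalize(): returns to the pre-initialization phase. */
void demo_Finalize(void)
{
    current_phase = DEMO_PRE_INITIALIZATION;
}
```

The sketch illustrates one design point: ``Py_IsCoreInitialized()`` stays true after the main interpreter is initialized, so the pair of queries is enough to tell all three phases apart.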
-Main Execution Phases
----------------------
-After initializing the interpreter, the embedding application may continue
-on to execute code in the ``__main__`` module namespace.
-* Main Preparation:
-* subphase of Initialized (not separately identified at runtime)
-* fully populates ``__main__`` related metadata
-* may execute code in ``__main__`` namespace (e.g. ``PYTHONSTARTUP``)
-* invoked as ``PyRun_PrepareMain``
-* Main Execution:
-* subphase of Initialized (not separately identified at runtime)
-* user supplied bytecode is being executed in the ``__main__`` namespace
-* invoked as ``PyRun_ExecMain``
Invocation of Phases
--------------------
All listed phases will be used by the standard CPython interpreter and the
-proposed System Python interpreter. Other embedding applications may
-choose to skip the step of executing code in the ``__main__`` namespace.
+proposed System Python interpreter.
An embedding application may still continue to leave initialization almost
entirely under CPython's control by using the existing ``Py_Initialize``
@@ -317,21 +288,21 @@ over the initialization process::
/* Phase 1: Pre-Initialization */
PyCoreConfig core_config = PyCoreConfig_INIT;
-PyConfig config = PyConfig_INIT;
+PyMainInterpreterConfig config = PyMainInterpreterConfig_INIT;
/* Easily control the core configuration */
core_config.ignore_environment = 1; /* Ignore environment variables */
core_config.use_hash_seed = 0; /* Full hash randomisation */
-Py_BeginInitialization(&core_config);
+Py_InitializeCore(&core_config);
/* Phase 2: Initialization */
/* Optionally preconfigure some settings here - they will then be
* used to derive other settings */
-Py_ReadConfig(&config);
+Py_ReadMainInterpreterConfig(&config);
/* Can completely override derived settings here */
-Py_EndInitialization(&config);
+Py_InitializeMainInterpreter(&config);
/* Phase 3: Initialized */
/* If an embedding application has no real concept of a main module
* it can just stop the initialization process here.
-* Alternatively, it can launch __main__ via the PyRun_*Main functions.
+* Alternatively, it can launch __main__ via the relevant API functions.
*/
@@ -356,7 +327,7 @@ system.
The proposed API for this step in the startup sequence is::
-void Py_BeginInitialization(const PyCoreConfig *config);
+void Py_InitializeCore(const PyCoreConfig *config);
Like ``Py_Initialize``, this part of the new API treats initialization
failures
@@ -366,7 +337,7 @@ to return error codes instead of aborting would be an even larger task than
the one already being proposed.
The new ``PyCoreConfig`` struct holds the settings required for preliminary
-configuration::
+configuration of the core runtime and creation of the main interpreter::
/* Note: if changing anything in PyCoreConfig, also update
* PyCoreConfig_INIT */
@@ -435,18 +406,21 @@ The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should go in the configuration passed
-to ``Py_EndInitialization()`` rather than in the core configuration.
+to ``Py_InitializeMainInterpreter()`` rather than in the core configuration.
A new query API will allow code to determine if the interpreter is in the
bootstrapping state between the creation of the interpreter state and the
completion of the bulk of the initialization process::
-int Py_IsInitializing();
+int Py_IsCoreInitialized();
-Attempting to call ``Py_BeginInitialization()`` again when
-``Py_IsInitializing()`` or ``Py_IsInitialized()`` is true is a fatal error.
+Attempting to call ``Py_InitializeCore()`` again when
+``Py_IsCoreInitialized()`` is true is a fatal error.
-While in the initializing state, the interpreter should be fully functional
+As frozen bytecode may now be legitimately run in an interpreter which is not
+yet fully initialized, ``sys.flags`` will gain a new ``initialized`` flag.
+With the core runtime initialised, the interpreter should be fully functional
except that:
* compilation is not allowed (as the parser and compiler are not yet
@@ -463,23 +437,25 @@ except that:
* ``sys.exec_prefix``
* ``sys.prefix``
* ``sys.warnoptions``
-* ``sys.flags``
* ``sys.dont_write_bytecode``
* ``sys.stdin``
* ``sys.stdout``
* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
-* only builtin and frozen modules may be imported (due to above limitations)
+* Only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
mode
+* The ``sys.flags`` attribute exists, but may contain flags that do not yet
+have their final values.
+* The ``sys.flags.initialized`` attribute is set to ``0``
* The ``warnings`` module is not yet initialized
* The ``__main__`` module does not yet exist
<TBD: identify any other notable missing functionality>
The main things made available by this step will be the core Python
-datatypes, in particular dictionaries, lists and strings. This allows them
+data types, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).
@@ -487,9 +463,10 @@ In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored on the interpreter
object rather than in C process globals.
-Any call to ``Py_BeginInitialization()`` must have a matching call to
-``Py_Finalize()``. It is acceptable to skip calling Py_EndInitialization() in
-between (e.g. if attempting to read the configuration settings fails)
+Any call to ``Py_InitializeCore()`` must have a matching call to
+``Py_Finalize()``. It is acceptable to skip calling
+``Py_InitializeMainInterpreter()`` in between (e.g. if attempting to read the
+main interpreter configuration settings fails).
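An embedder can follow the pairing rule with the usual C single-exit cleanup pattern, in which every path after core initialization ends in finalization. The sketch below is minimal and makes several assumptions. The ``stub_*`` functions are hypothetical placeholders that only count calls, and the ``simulate_failure`` argument is an invented way to exercise the "reading the configuration fails" path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical placeholders that only count calls; not CPython APIs. */
static int core_init_calls = 0;
static int main_init_calls = 0;
static int finalize_calls = 0;

static void stub_InitializeCore(void) { core_init_calls++; }

/* The flag is an invented way to force the "config read failed" path. */
static int stub_ReadMainInterpreterConfig(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

static int stub_InitializeMainInterpreter(void)
{
    main_init_calls++;
    return 0;
}

static void stub_Finalize(void) { finalize_calls++; }

/* Every core initialization is matched by exactly one finalization, even
 * when main interpreter initialization is skipped because of an error. */
int embed_python(int config_read_fails)
{
    int status = -1;

    stub_InitializeCore();
    if (stub_ReadMainInterpreterConfig(config_read_fails) < 0) {
        fprintf(stderr, "could not read main interpreter config\n");
        goto finalize;  /* skipping main interpreter init is permitted */
    }
    if (stub_InitializeMainInterpreter() < 0) {
        goto finalize;
    }
    /* ... application code would run here ... */
    status = 0;

finalize:
    stub_Finalize();
    return status;
}
```

The single ``finalize`` label keeps the matching ``Py_Finalize()``-style call in one place, so adding further failure points later cannot leak an initialized core runtime.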
Determining the remaining configuration settings
@@ -499,7 +476,7 @@ The next step in the initialization sequence is to determine the full
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core API for this step is::
-int Py_ReadConfig(PyConfig *config);
+int Py_ReadMainInterpreterConfig(PyMainInterpreterConfig *config);
The config argument should be a pointer to a config struct (which may be
a temporary one stored on the C stack). For any already configured value
@@ -512,35 +489,47 @@ CPython version and only a read-only view needs to be exposed to Python
code (which is relatively straightforward, thanks to the infrastructure
already put in place to expose ``sys.implementation``).
-Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will raise
+Unlike ``Py_Initialize`` and ``Py_InitializeCore``, this call will raise
an exception and report an error return rather than exhibiting fatal errors
if a problem is found with the config data.
Any supported configuration setting which is not already set will be
populated appropriately in the supplied configuration struct. The default
configuration can be overridden entirely by setting the value *before*
-calling ``Py_ReadConfiguration``. The provided value will then also be used
-in calculating any other settings derived from that value.
+calling ``Py_ReadMainInterpreterConfig``. The provided value will then also be
+used in calculating any other settings derived from that value.
Alternatively, settings may be overridden *after* the
-``Py_ReadConfiguration`` call (this can be useful if an embedding
+``Py_ReadMainInterpreterConfig`` call (this can be useful if an embedding
application wants to adjust a setting rather than replace it completely,
such as removing ``sys.path[0]``).
Merely reading the configuration has no effect on the interpreter state: it
only modifies the passed in configuration struct. The settings are not
-applied to the running interpreter until the ``Py_EndInitialization`` call
-(see below).
+applied to the running interpreter until the ``Py_InitializeMainInterpreter``
+call (see below).
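A toy struct is enough to illustrate the read contract described above. The read step fills in only unset fields, a preset value feeds into derived settings, and the caller may still adjust the result afterwards. In this sketch, the ``demo_config`` type, its fields and the rule deriving ``use_site`` from ``ignore_environment`` are all invented for illustration. C strings stand in for the Python objects the real struct would hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy config: -1 marks an unset flag, NULL an unset pointer. */
typedef struct {
    int ignore_environment;
    int use_site;               /* derived from ignore_environment below */
    const char *program_name;   /* C string standing in for a Python object */
} demo_config;

#define DEMO_CONFIG_INIT {-1, -1, NULL}

/* Populates only the unset fields. A value supplied beforehand is kept and
 * also feeds into derived settings; nothing outside the struct changes. */
int demo_ReadConfig(demo_config *config)
{
    assert(config != NULL);
    if (config->ignore_environment == -1) {
        config->ignore_environment = 0;
    }
    if (config->use_site == -1) {
        /* Invented derivation rule, purely for illustration */
        config->use_site = config->ignore_environment ? 0 : 1;
    }
    if (config->program_name == NULL) {
        config->program_name = "python";
    }
    return 0;
}
```

``demo_ReadConfig`` modifies nothing outside the struct it is given. A caller can therefore inspect or tweak the populated values (for example, turning ``use_site`` back on) before anything is applied to an interpreter.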
Supported configuration settings
--------------------------------
-The new ``PyConfig`` struct holds the settings required to complete the
-interpreter configuration. All fields are either pointers to Python
-data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
+The interpreter configuration is split into two parts: settings which are
+either relevant only to the main interpreter or must be identical across the
+main interpreter and all subinterpreters, and settings which may vary across
+subinterpreters.
-/* Note: if changing anything in PyConfig, also update PyConfig_INIT */
+NOTE: For initial implementation purposes, only the flag indicating whether
+or not the interpreter is the main interpreter will be configured on a per
+interpreter basis. Other fields will be reviewed for whether or not they can
+feasibly be made interpreter specific over the course of the implementation.
+The ``PyMainInterpreterConfig`` struct holds the settings required to
+complete the main interpreter configuration. These settings are also all
+passed through unmodified to subinterpreters. Fields are either pointers to
+Python data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
+/* Note: if changing anything in PyMainInterpreterConfig, also update
+* PyMainInterpreterConfig_INIT */
typedef struct {
/* Argument processing */
PyListObject *raw_argv;
@@ -613,10 +602,10 @@ data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
int show_banner; /* -q switch (inverted) */
int inspect_main; /* -i switch, PYTHONINSPECT */
-} PyConfig;
+} PyMainInterpreterConfig;
-/* Struct initialization is pretty ugly in C89. Avoiding this mess would
+/* Struct initialization is pretty horrible in C89. Avoiding this mess would
* be the most attractive aspect of using a PyDictObject* instead... */
#define _PyArgConfig_INIT NULL, NULL, NULL, NULL
#define _PyLocationConfig_INIT NULL, NULL, NULL, NULL, NULL, NULL
@@ -631,13 +620,28 @@ data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
#define _PyMainConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, -1
#define _PyInteractiveConfig_INIT NULL, -1, -1
-#define PyConfig_INIT {_PyArgConfig_INIT, _PyLocationConfig_INIT,
+#define PyMainInterpreterConfig_INIT {
+_PyArgConfig_INIT, _PyLocationConfig_INIT,
_PySiteConfig_INIT, _PyImportConfig_INIT,
_PyStreamConfig_INIT, _PyFilesystemConfig_INIT,
_PyDebuggingConfig_INIT, _PyCodeGenConfig_INIT,
_PySignalConfig_INIT, _PyImplicitConfig_INIT,
_PyMainConfig_INIT, _PyInteractiveConfig_INIT}
+The ``PyInterpreterConfig`` struct holds the settings that may vary between
+the main interpreter and subinterpreters. For the main interpreter, these
+settings are automatically populated by ``Py_InitializeMainInterpreter()``.
+::
+/* Note: if changing anything in PyInterpreterConfig, also update
+* PyInterpreterConfig_INIT */
+typedef struct {
+int is_main_interpreter; /* Easily check for subinterpreters */
+} PyInterpreterConfig;
+#define PyInterpreterConfig_INIT {0}
<TBD: did I miss anything?>
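The C89 complaint in the comment above and the C99 and ``memset`` alternatives raised under Open Questions can be compared side by side on a miniature struct. The sketch below is not the real struct: the three-field ``demo_main_config`` borrows field names from ``PyMainInterpreterConfig``, but a ``void *`` stands in for the ``PyListObject *``. The tri-state helpers are a hypothetical rendering of the "negative means false, positive means true, zero means not set" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of PyMainInterpreterConfig. */
typedef struct {
    void *raw_argv;    /* stands in for PyListObject *; NULL == not set */
    int show_banner;   /* -1 == not set */
    int inspect_main;  /* -1 == not set */
} demo_main_config;

/* C89: positional values that must track the field order by hand. */
#define DEMO_MAIN_CONFIG_INIT_C89 {NULL, -1, -1}

/* C99: designated initializers name each field, so reordering the struct
 * cannot silently shift a value into the wrong member. */
#define DEMO_MAIN_CONFIG_INIT_C99 \
    {.raw_argv = NULL, .show_banner = -1, .inspect_main = -1}

/* Tri-state alternative: zero means "not set", negative means false and
 * positive means true, so zero-filling initializes every flag. */
typedef struct {
    int show_banner;
    int inspect_main;
} demo_tristate_config;

void demo_tristate_init(demo_tristate_config *config)
{
    assert(config != NULL);
    memset(config, 0, sizeof(*config));
}

/* Resolves an unset flag to a default without overriding explicit values. */
void demo_tristate_default(int *flag, int default_true)
{
    if (*flag == 0) {
        *flag = default_true ? 1 : -1;
    }
}
```

In C99, any member omitted from a designated initializer is zero-initialized. A zero "not set" value would therefore combine naturally with designated initializers as well as with ``memset``.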
@@ -645,26 +649,25 @@ Completing the interpreter initialization
-----------------------------------------
The final step in the initialization process is to actually put the
-configuration settings into effect and finish bootstrapping the interpreter
-up to full operation::
+configuration settings into effect and finish bootstrapping the main
+interpreter up to full operation::
-int Py_EndInitialization(const PyConfig *config);
+int Py_InitializeMainInterpreter(const PyMainInterpreterConfig *config);
-Like Py_ReadConfiguration, this call will raise an exception and report an
-error return rather than exhibiting fatal errors if a problem is found with
-the config data.
+Like ``Py_ReadMainInterpreterConfig``, this call will raise an exception and
+report an error return rather than exhibiting fatal errors if a problem is
+found with the config data.
All configuration settings are required - the configuration struct
-should always be passed through ``Py_ReadConfig()`` to ensure it
+should always be passed through ``Py_ReadMainInterpreterConfig`` to ensure it
is fully populated.
-After a successful call, ``Py_IsInitializing()`` will be false, while
-``Py_IsInitialized()`` will become true. The caveats described above for the
-interpreter during the initialization phase will no longer hold.
+After a successful call, ``Py_IsInitialized()`` will become true. The caveats
+described above for the interpreter during the phase where only the core
+runtime is initialized will no longer hold.
-Attempting to call ``Py_EndInitialization()`` again when
-``Py_IsInitializing()`` is false or ``Py_IsInitialized()`` is true is an
-error.
+Attempting to call ``Py_InitializeMainInterpreter()`` again when
+``Py_IsInitialized()`` is true is an error.
However, some metadata related to the ``__main__`` module may still be
incomplete:
@@ -702,6 +705,10 @@ It is handled by calling the following API::
int PyRun_PrepareMain();
+This operation is only permitted for the main interpreter, and will raise
+``RuntimeError`` when invoked from a thread where the current thread state
+belongs to a subinterpreter.
The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.
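The main-interpreter-only restriction amounts to checking a per-interpreter flag of the kind ``PyInterpreterConfig.is_main_interpreter`` provides. The following is a minimal sketch under stated assumptions. ``demo_interp_state`` and ``demo_PrepareMain`` are hypothetical, and printing a message and returning ``-1`` stands in for raising ``RuntimeError``.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the proposed per-interpreter config struct. */
typedef struct {
    int is_main_interpreter;
} demo_interp_config;

/* Hypothetical interpreter state embedding a copy of that config. */
typedef struct {
    demo_interp_config config;
} demo_interp_state;

/* Returns 0 on success. For a subinterpreter it prints a message and
 * returns -1, standing in for raising RuntimeError. */
int demo_PrepareMain(const demo_interp_state *current)
{
    assert(current != NULL);
    if (!current->config.is_main_interpreter) {
        fprintf(stderr, "RuntimeError: only permitted in the main interpreter\n");
        return -1;
    }
    /* ... populate __main__ related metadata here ... */
    return 0;
}
```

The same guard would apply unchanged to ``PyRun_ExecMain``, which carries the identical restriction.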
@@ -760,6 +767,10 @@ It is handled by calling the following API::
int PyRun_ExecMain();
+This operation is only permitted for the main interpreter, and will raise
+``RuntimeError`` when invoked from a thread where the current thread state
+belongs to a subinterpreter.
The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.
@@ -771,22 +782,22 @@ be reported.
If ``main_stream`` and ``prompt_stream`` are both set, main execution will
be delegated to a new API::
-int PyRun_InteractiveMain(PyObject *input, PyObject* output);
+int _PyRun_InteractiveMain(PyObject *input, PyObject* output);
If ``main_stream`` is set and ``prompt_stream`` is NULL, main execution will
be delegated to a new API::
-int PyRun_StreamInMain(PyObject *input);
+int _PyRun_StreamInMain(PyObject *input);
If ``main_code`` is set, main execution will be delegated to a new
API::
-int PyRun_CodeInMain(PyCodeObject *code);
+int _PyRun_CodeInMain(PyCodeObject *code);
After execution of main completes, if ``inspect_main`` is set, or
the ``PYTHONINSPECT`` environment variable has been set, then
``PyRun_ExecMain`` will invoke
-``PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
+``_PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
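The selection rules above can be written as a small dispatch function. In this sketch, ``demo_main_settings`` is a hypothetical subset of the main-related settings, with ``void *`` standing in for the object pointers. The function only reports which ``_PyRun_*`` helper would be called. The text does not rank the stream settings against ``main_code``, so checking the streams first is an assumption. Treating an empty ``PYTHONINSPECT`` value as unset is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Which of the delegated execution paths would be taken. */
typedef enum {
    RUN_NOTHING,
    RUN_INTERACTIVE,   /* _PyRun_InteractiveMain */
    RUN_STREAM,        /* _PyRun_StreamInMain */
    RUN_CODE           /* _PyRun_CodeInMain */
} demo_run_kind;

/* Hypothetical subset of the main-related settings. */
typedef struct {
    void *main_stream;
    void *prompt_stream;
    void *main_code;
    int inspect_main;      /* -i switch; 1 == requested */
} demo_main_settings;

/* Applies the rules in the order given in the text; checking the stream
 * settings before main_code is an assumption of this sketch. */
demo_run_kind demo_select_main(const demo_main_settings *s)
{
    assert(s != NULL);
    if (s->main_stream != NULL && s->prompt_stream != NULL) {
        return RUN_INTERACTIVE;
    }
    if (s->main_stream != NULL) {
        return RUN_STREAM;
    }
    if (s->main_code != NULL) {
        return RUN_CODE;
    }
    return RUN_NOTHING;
}

/* After main completes: drop into interactive mode if requested via the
 * config flag or a non-empty PYTHONINSPECT environment variable. */
int demo_should_inspect(const demo_main_settings *s)
{
    const char *env = getenv("PYTHONINSPECT");
    assert(s != NULL);
    return s->inspect_main == 1 || (env != NULL && env[0] != '\0');
}
```

Keeping the selection separate from execution makes each delegation rule testable on its own, without running any code in ``__main__``.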
Internal Storage of Configuration Data
@@ -794,8 +805,8 @@ Internal Storage of Configuration Data
The interpreter state will be updated to include details of the configuration
settings supplied during initialization by extending the interpreter state
-object with an embedded copy of the ``PyCoreConfig`` and ``PyConfig``
-structs.
+object with an embedded copy of the ``PyCoreConfig``,
+``PyMainInterpreterConfig`` and ``PyInterpreterConfig`` structs.
For debugging purposes, the configuration settings will be exposed as
a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
@@ -838,7 +849,7 @@ will be used.
While the existing ``Py_InterpreterState_Head()`` API could be used instead,
that reference changes as subinterpreters are created and destroyed, while
``PyInterpreterState_Main()`` will always refer to the initial interpreter
-state created in ``Py_BeginInitialization()``.
+state created in ``Py_InitializeCore()``.
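A toy linked list shows the difference between a list head that moves and a main-interpreter pointer that stays fixed. This sketch assumes that new interpreter states are pushed onto the front of the list, as CPython's current implementation does. ``demo_interp`` and its helpers are hypothetical. The PEP also makes deleting the main interpreter while subinterpreters exist a fatal error; the sketch returns ``-1`` in that case instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interpreter list node; not the real PyInterpreterState. */
typedef struct demo_interp {
    int id;
    struct demo_interp *next;
} demo_interp;

static demo_interp *interp_head = NULL;  /* like Py_InterpreterState_Head() */
static demo_interp *interp_main = NULL;  /* like PyInterpreterState_Main() */

demo_interp *demo_interp_new(int id)
{
    demo_interp *interp = malloc(sizeof(*interp));
    if (interp == NULL) {
        return NULL;
    }
    interp->id = id;
    /* New interpreters are pushed onto the front, so the head moves. */
    interp->next = interp_head;
    interp_head = interp;
    if (interp_main == NULL) {
        interp_main = interp;  /* the first interpreter created stays main */
    }
    return interp;
}

/* Unlinks and frees an interpreter. Deleting the main interpreter while
 * others exist returns -1 here (a fatal error in the proposal). */
int demo_interp_delete(demo_interp *interp)
{
    demo_interp **link = &interp_head;
    if (interp == interp_main &&
        (interp_head != interp_main || interp_main->next != NULL)) {
        return -1;
    }
    while (*link != NULL && *link != interp) {
        link = &(*link)->next;
    }
    if (*link == NULL) {
        return -1;  /* not in the list */
    }
    *link = interp->next;
    if (interp == interp_main) {
        interp_main = NULL;
    }
    free(interp);
    return 0;
}
```

After a subinterpreter is created, the head refers to it while the main pointer is unchanged. That is why a dedicated main-interpreter accessor is more reliable than the list head.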
A new constraint is also added to the embedding API: attempting to delete
the main interpreter while subinterpreters still exist will now be a fatal
@@ -853,7 +864,7 @@ embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.
The only newly exposed API that will be part of the stable ABI is the
-``Py_IsInitializing()`` query.
+``Py_IsCoreInitialized()`` query.
Build time configuration
@@ -868,10 +879,10 @@ Backwards Compatibility
-----------------------
Backwards compatibility will be preserved primarily by ensuring that
-``Py_ReadConfig()`` interrogates all the previously defined
+``Py_ReadMainInterpreterConfig()`` interrogates all the previously defined
configuration settings stored in global variables and environment variables,
-and that ``Py_EndInitialization()`` writes affected settings back to the
-relevant locations.
+and that ``Py_InitializeMainInterpreter()`` writes affected settings back to
+the relevant locations.
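The compatibility strategy pairs two steps. A read step consults the old process globals and environment variables, and an apply step writes the final values back. The sketch below is minimal and assumes a single hypothetical legacy global named ``legacy_dont_write_bytecode_flag``. The environment variable name is passed in, so the sketch does not depend on any particular CPython variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical legacy process global that older embedding code may set. */
int legacy_dont_write_bytecode_flag = 0;

typedef struct {
    int dont_write_bytecode;   /* -1 == not set */
} demo_compat_config;

/* Read step: an unset field is derived from the legacy global and from a
 * non-empty environment variable, so existing embedding code keeps working. */
void demo_read_legacy(demo_compat_config *config, const char *env_name)
{
    assert(config != NULL && env_name != NULL);
    if (config->dont_write_bytecode == -1) {
        const char *env = getenv(env_name);
        config->dont_write_bytecode =
            legacy_dont_write_bytecode_flag || (env != NULL && env[0] != '\0');
    }
}

/* Apply step: write the effective value back so code that still reads the
 * global sees the setting that is actually in force. */
void demo_apply_legacy(const demo_compat_config *config)
{
    assert(config != NULL);
    legacy_dont_write_bytecode_flag = config->dont_write_bytecode;
}
```

An explicitly configured value wins over the legacy sources. The write-back then keeps both views consistent for callers that have not migrated.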
One acknowledged incompatibility is that some environment variables which
are currently read lazily may instead be read once during interpreter
@@ -892,7 +903,7 @@ does today, ensuring that ``sys.argv`` is not populated until a subsequent
``PySys_SetArgv`` call. All APIs that currently support being called
prior to ``Py_Initialize()`` will
continue to do so, and will also support being called prior to
-``Py_BeginInitialization()``.
+``Py_InitializeCore()``.
To minimise unnecessary code churn, and to ensure the backwards compatibility
is well tested, the main CPython executable may continue to use some elements
@@ -909,7 +920,7 @@ aspects are the fact that user site directories are enabled,
environment variables are trusted and that the directory containing the
executed file is placed at the beginning of the import path.
-Issue 16499 [6_] proposes adding a ``-I`` option to change the behaviour of
+Issue 16499 [6_] added a ``-I`` option to change the behaviour of
the normal CPython executable, but this is a hard to discover solution (and
adds yet another option to an already complex CLI). This PEP proposes to
instead add a separate ``pysystem`` executable
@@ -940,19 +951,19 @@ argument parsing infrastructure for use during the initializing phase.
Open Questions
==============
-* Error details for Py_ReadConfiguration and Py_EndInitialization (these
-should become clear as the implementation progresses)
-* Should there be ``Py_PreparingMain()`` and ``Py_RunningMain()`` query APIs?
-* Should the answer to ``Py_IsInitialized()`` be exposed via the ``sys``
-module?
-* Is initialisation of the ``PyConfig`` struct too unwieldy to be
-maintainable? Would a Python dictionary be a better choice, despite
-being harder to work with from C code?
-* Would it be better to manage the flag variables in ``PyConfig`` as
-Python integers or as "negative means false, positive means true, zero
+* Error details for ``Py_ReadMainInterpreterConfig`` and
+``Py_InitializeMainInterpreter`` (these should become clearer as the
+implementation progresses)
+* Is initialisation of the ``PyMainInterpreterConfig`` struct too unwieldy to
+be maintainable? Would a Python dictionary be a better choice, despite
+being harder to work with from C code? Can we upgrade to requiring a C99
+compatible compiler?
+* Would it be better to manage the flag variables in ``PyMainInterpreterConfig``
+as Python integers or as "negative means false, positive means true, zero
means not set" so the struct can be initialized with a simple
``memset(&config, 0, sizeof(config))``, eliminating the need to update
-both PyConfig and PyConfig_INIT when adding new fields?
+both PyMainInterpreterConfig and PyMainInterpreterConfig_INIT when adding
+new fields?
* The name of the new system Python executable is a bikeshed waiting to be
painted. The 3 options considered so far are ``spython``, ``pysystem``
and ``python-minimal``. The PEP text reflects my current preferred choice
@@ -969,15 +980,6 @@ for other pull requests to be feasible just yet. Once the overall design
settles down and it's a matter of migrating individual settings over to
the new design, that level of collaboration should become more practical.
-As the number of application binaries created by the build process is now
-four, the reference implementation also creates a new top level "Apps"
-directory in the CPython source tree. The source files for the main
-``python`` binary and the new ``pysystem`` binary will be located in that
-directory. The source files for the ``_freeze_importlib`` binary and the
-``_testembed`` binary have been moved out of the Modules directory (which
-is intended for CPython builtin and extension modules) and into the Tools
-directory.
The Status Quo
==============