PEP 432: Flesh out a design for main execution

Nick Coghlan 2013-01-13 16:35:06 +10:00
parent 38af865f01
commit cc027218c7
1 changed files with 175 additions and 56 deletions


@ -43,9 +43,11 @@ well-defined phases during the startup sequence:
* Initializing - interpreter partially available
* Initialized - interpreter available, __main__ related metadata
incomplete
* Main Execution - __main__ related metadata populated, bytecode
executing in the __main__ module namespace (embedding applications
may choose not to use this phase)
* Main Preparation - __main__ related metadata populated
* Main Execution - bytecode executing in the __main__ module namespace
(Embedding applications may choose not to use the Main Preparation and
Execution phases)
As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
@ -78,21 +80,21 @@ complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 as
well as bootstrapping a pure Python implementation of the import system).
Much of this complexity is accessible only through the ``Py_Main`` and
``Py_Initialize`` APIs, offering embedding applications little opportunity
for customisation. This creeping complexity also makes life difficult for
maintainers, as much of the configuration needs to take place prior to the
``Py_Initialize`` call, meaning much of the Python C API cannot be used
safely.
Much of this complexity is formally accessible only through the ``Py_Main``
and ``Py_Initialize`` APIs, offering embedding applications little
opportunity for customisation. This creeping complexity also makes life
difficult for maintainers, as much of the configuration needs to take
place prior to the ``Py_Initialize`` call, meaning much of the Python C
API cannot be used safely.
A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as an isolated mode equivalent to that described in
this PEP as a "system Python" [6_], better control over ``sys.path``
initialization (easily adding additional directories on the command line
in a cross-platform fashion [7_], as well as controlling the configuration of
``sys.path[0]`` [8_]), easier configuration of utilities like coverage tracing
when launching Python subprocesses [9_], and easier control of the encoding
used for the standard IO streams when embedding CPython in a larger
``sys.path[0]`` [8_]), easier configuration of utilities like coverage
tracing when launching Python subprocesses [9_], and easier control of the
encoding used for the standard IO streams when embedding CPython in a larger
application [10_].
Rather than attempting to bolt such behaviour onto an already complicated
@ -118,8 +120,8 @@ to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_].
By moving to an explicitly multi-phase startup sequence, developers should
only need to understand which features are not available in the core
bootstrapping state, as the vast majority of the configuration process
will now take place in that state.
bootstrapping phase, as the vast majority of the configuration process
will now take place during that phase.
By basing the new design on a combination of C structures and Python
data types, it should also be easier to modify the system in the
@ -504,14 +506,13 @@ CPython command line application.
Interpreter Initialization Phases
---------------------------------
Four distinct phases are proposed:
Five distinct phases are proposed:
* Pre-Initialization:
* no interpreter is available.
* ``Py_IsInitializing()`` returns ``0``
* ``Py_IsInitialized()`` returns ``0``
* ``Py_IsRunningMain()`` returns ``0``
* The embedding application determines the settings required to create the
main interpreter and moves to the next phase by calling
``Py_BeginInitialization``.
@ -521,7 +522,6 @@ Four distinct phases are proposed:
* the main interpreter is available, but only partially configured.
* ``Py_IsInitializing()`` returns ``1``
* ``Py_IsInitialized()`` returns ``0``
* ``Py_RunningMain()`` returns ``0``
* The embedding application determines and applies the settings
required to complete the initialization process by calling
``Py_ReadConfiguration`` and ``Py_EndInitialization``.
@ -529,26 +529,28 @@ Four distinct phases are proposed:
* Initialized:
* the main interpreter is available and fully operational, but
``__main__`` related metadata is incomplete and the site module may
not have been imported.
``__main__`` related metadata is incomplete
* ``Py_IsInitializing()`` returns ``0``
* ``Py_IsInitialized()`` returns ``1``
* ``Py_IsRunningMain()`` returns ``0``
* Optionally, the embedding application may identify and begin
executing code in the ``__main__`` module namespace by calling
``Py_RunPathAsMain``, ``Py_RunModuleAsMain`` or ``Py_RunStreamAsMain``.
``PyRun_PrepareMain`` and ``PyRun_ExecMain``.
* Main Preparation:
* subphase of Initialized (not separately identified at runtime)
* fully populates ``__main__`` related metadata
* may execute code in ``__main__`` namespace (e.g. ``PYTHONSTARTUP``)
* Main Execution:
* bytecode is being executed in the ``__main__`` namespace
* ``Py_IsInitializing()`` returns ``0``
* ``Py_IsInitialized()`` returns ``1``
* ``Py_IsRunningMain()`` returns ``1``
* subphase of Initialized (not separately identified at runtime)
* user supplied bytecode is being executed in the ``__main__`` namespace
As indicated by the phase reporting functions, main module execution is
an optional subphase of Initialized rather than a completely distinct phase.
As noted above, main module preparation and execution are optional subphases
of Initialized rather than completely distinct phases.
All 4 phases will be used by the standard CPython interpreter and the
All listed phases will be used by the standard CPython interpreter and the
proposed System Python interpreter. Other embedding applications may
choose to skip the step of executing code in the ``__main__`` namespace.
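The phase reporting listed above can be modelled as a small state machine. The following is only an illustrative sketch: the ``demo_phase`` enum and the ``demo_*`` functions are hypothetical stand-ins rather than part of the proposed API, and they simply mirror the return values given for ``Py_IsInitializing()`` and ``Py_IsInitialized()`` in each phase.

```c
/* Hypothetical model of the startup phases; this is not CPython code. */
typedef enum {
    DEMO_PRE_INITIALIZATION,  /* no interpreter available */
    DEMO_INITIALIZING,        /* interpreter partially configured */
    DEMO_INITIALIZED          /* main preparation/execution are subphases */
} demo_phase;

/* Mirrors Py_IsInitializing(): 1 only while partially configured. */
int demo_is_initializing(demo_phase phase)
{
    return phase == DEMO_INITIALIZING;
}

/* Mirrors Py_IsInitialized(): 1 once initialization has completed. */
int demo_is_initialized(demo_phase phase)
{
    return phase == DEMO_INITIALIZED;
}

/* Stand-in for the transitions triggered by Py_BeginInitialization and
 * Py_EndInitialization: move forward one phase, stopping at Initialized. */
demo_phase demo_next_phase(demo_phase phase)
{
    return phase == DEMO_INITIALIZED ? phase : (demo_phase)(phase + 1);
}
```

Because Main Preparation and Main Execution are not separately identified at runtime, the model deliberately has no states for them.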
@ -817,15 +819,9 @@ data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
/* Filesystem access */
PyUnicodeObject *fs_encoding;
/* Interactive interpreter */
int stdin_is_interactive; /* Force interactive behaviour */
int inspect_main; /* -i switch, PYTHONINSPECT */
PyUnicodeObject *startup_file; /* PYTHONSTARTUP */
/* Debugging output */
int debug_parser; /* -d switch, PYTHONDEBUG */
int verbosity; /* -v switch */
int suppress_banner; /* -q switch */
/* Code generation */
int bytes_warnings; /* -b switch */
@ -833,6 +829,32 @@ data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
/* Signal handling */
int install_sig_handlers;
/* Implicit execution */
PyUnicodeObject *startup_file; /* PYTHONSTARTUP */
/* Main module
*
* If prepare_main is set, at most one of the main_* settings should
* be set before calling PyRun_PrepareMain (Py_ReadConfiguration will
* set one of them based on the command line arguments if prepare_main
* is non-zero when that API is called).
*/
int prepare_main;
PyUnicodeObject *main_source; /* -c switch */
PyUnicodeObject *main_path; /* filesystem path */
PyUnicodeObject *main_module; /* -m switch */
PyCodeObject *main_code; /* Run directly from a code object */
PyObject *main_stream; /* Run from stream */
int run_implicit_code; /* Run implicit code during prep */
/* Interactive main
*
* Note: Settings related to interactive mode are very much in flux.
*/
PyObject *prompt_stream; /* Output interactive prompt */
int show_banner; /* -q switch (inverted) */
int inspect_main; /* -i switch, PYTHONINSPECT */
} Py_Config;
@ -844,17 +866,19 @@ data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
#define _Py_ImportConfig_INIT -1, -1, NULL
#define _Py_StreamConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, NULL
#define _Py_FilesystemConfig_INIT NULL
#define _Py_InteractiveConfig_INIT -1, -1, NULL
#define _Py_DebuggingConfig_INIT -1, -1, -1
#define _Py_CodeGenConfig_INIT -1, -1
#define _Py_SignalConfig_INIT -1
#define _Py_ImplicitConfig_INIT NULL
#define _Py_MainConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, -1
#define _Py_InteractiveConfig_INIT NULL, -1, -1
#define Py_Config_INIT {_Py_ArgConfig_INIT, _Py_LocationConfig_INIT,
_Py_SiteConfig_INIT, _Py_ImportConfig_INIT,
_Py_StreamConfig_INIT, _Py_FilesystemConfig_INIT,
_Py_InteractiveConfig_INIT,
_Py_DebuggingConfig_INIT, _Py_CodeGenConfig_INIT,
_Py_SignalConfig_INIT}
_Py_SignalConfig_INIT, _Py_ImplicitConfig_INIT,
_Py_MainConfig_INIT, _Py_InteractiveConfig_INIT}
<TBD: did I miss anything?>
@ -893,6 +917,11 @@ incomplete:
* it will be the same as ``sys.path[0]`` rather than the location of
the ``__main__`` module when executing a valid ``sys.path`` entry
(typically a zipfile or directory)
* otherwise, it will be accurate:
* the script name if running an ordinary script
* ``-c`` if executing a supplied string
* ``-`` or the empty string if running from stdin
* the metadata in the ``__main__`` module will still indicate it is a
builtin module
@ -904,21 +933,103 @@ behaviour, as well as eliminating any side effects on global state if
``import site`` is later explicitly executed in the process.
Preparing the main module
-------------------------
This subphase completes the population of the ``__main__`` module
related metadata, without actually starting execution of the ``__main__``
module code.
It is handled by calling the following API::
int PyRun_PrepareMain();
The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.
If ``prepare_main`` is zero, this call does nothing.
If all of ``main_source``, ``main_path``, ``main_module``,
``main_stream`` and ``main_code`` are NULL, this call does nothing.
If more than one of ``main_source``, ``main_path``, ``main_module``,
``main_stream`` or ``main_code`` is set, ``RuntimeError`` will be reported.
If ``main_code`` is already set, then this call does nothing.
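The checks above might be sketched as follows. This is a rough illustration under stated assumptions, not the reference implementation: ``demo_main_config`` and ``demo_check_prepare_main`` are hypothetical names, plain ``void`` pointers stand in for the Python object pointers so the example compiles without ``Python.h``, and the "error" result stands in for reporting ``RuntimeError``.

```c
#include <stddef.h>

/* Hypothetical stand-in for the main related fields of Py_Config. */
typedef struct {
    int prepare_main;
    const void *main_source;
    const void *main_path;
    const void *main_module;
    const void *main_code;
    const void *main_stream;
} demo_main_config;

typedef enum {
    DEMO_PREPARE_NOTHING,  /* the call does nothing */
    DEMO_PREPARE_ERROR,    /* stands in for reporting RuntimeError */
    DEMO_PREPARE_PROCEED   /* exactly one source, path, module or stream */
} demo_prepare_action;

demo_prepare_action demo_check_prepare_main(const demo_main_config *cfg)
{
    int count;

    if (!cfg->prepare_main) {
        return DEMO_PREPARE_NOTHING;
    }
    /* Count how many of the mutually exclusive settings are set. */
    count = (cfg->main_source != NULL) + (cfg->main_path != NULL)
          + (cfg->main_module != NULL) + (cfg->main_code != NULL)
          + (cfg->main_stream != NULL);
    if (count == 0) {
        return DEMO_PREPARE_NOTHING;
    }
    if (count > 1) {
        return DEMO_PREPARE_ERROR;
    }
    /* A code object supplied directly needs no further preparation. */
    if (cfg->main_code != NULL) {
        return DEMO_PREPARE_NOTHING;
    }
    return DEMO_PREPARE_PROCEED;
}
```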
If ``main_stream`` is set, and ``run_implicit_code`` is also set, then
the file identified in ``startup_file`` will be read, compiled and
executed in the ``__main__`` namespace.
If ``main_source``, ``main_path`` or ``main_module`` is set, then this
call will take whatever steps are needed to populate ``main_code``:
* For ``main_source``, the supplied string will be compiled and saved to
``main_code``.
* For ``main_path``:
* if the supplied path is recognised as a valid ``sys.path`` entry, it
is inserted as ``sys.path[0]``, ``main_module`` is set
to ``__main__`` and processing continues as for ``main_module`` below.
* otherwise, path is read as a CPython bytecode file
* if that fails, it is read as a Python source file and compiled
* in the latter two cases, the code object is saved to ``main_code``
and ``__main__.__file__`` is set appropriately
* For ``main_module``:
* any parent package is imported
* the loader for the module is determined
* if the loader indicates the module is a package, add ``.__main__`` to
the end of ``main_module`` and try again (if the final name segment
is already ``.__main__`` then fail immediately)
* once the module source code is located, save the compiled module code
as ``main_code`` and populate the following attributes in ``__main__``
appropriately: ``__name__``, ``__loader__``, ``__file__``,
``__cached__``, ``__package__``.
(Note: the behaviour described in this section isn't new; it is a write-up
of the current behaviour of the CPython interpreter, adjusted for the new
configuration system.)
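As an illustration of the package handling step for ``main_module`` described above, the name adjustment could be sketched like this. ``demo_package_main_name`` is a hypothetical helper working on plain C strings, not a proposed API; the real processing would operate on Python string objects and consult the module's loader.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given the name of a module that turned out to be
 * a package, write "<name>.__main__" into out.
 * Returns 0 on success, or -1 if the final name segment is already
 * "__main__" (fail immediately) or if out is too small. */
int demo_package_main_name(const char *name, char *out, size_t out_size)
{
    const char *last_dot = strrchr(name, '.');
    const char *final_segment = last_dot ? last_dot + 1 : name;
    int written;

    if (strcmp(final_segment, "__main__") == 0) {
        return -1;
    }
    written = snprintf(out, out_size, "%s.__main__", name);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return 0;
}
```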
Executing the main module
-------------------------
<TBD>
This subphase covers the execution of the actual ``__main__`` module code.
Initial thought is that hiding the various options behind a single API
would make that API too complicated, so 3 separate APIs is more likely::
It is handled by calling the following API::
Py_RunPathAsMain
Py_RunModuleAsMain
Py_RunStreamAsMain
int PyRun_ExecMain();
Query API to indicate that ``sys.argv[0]`` is fully populated::
The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.
If both ``main_stream`` and ``main_code`` are NULL, this call does nothing.
If both ``main_stream`` and ``main_code`` are set, ``RuntimeError`` will
be reported.
If ``main_stream`` and ``prompt_stream`` are both set, main execution will
be delegated to a new API::
int PyRun_InteractiveMain(PyObject *input, PyObject *output);
If ``main_stream`` is set and ``prompt_stream`` is NULL, main execution will
be delegated to a new API::
int PyRun_StreamInMain(PyObject *input);
If ``main_code`` is set, main execution will be delegated to a new
API::
int PyRun_CodeInMain(PyCodeObject *code);
After execution of main completes, if ``inspect_main`` is set, or
the ``PYTHONINSPECT`` environment variable has been set, then
``PyRun_ExecMain`` will invoke
``PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
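The dispatch rules for ``PyRun_ExecMain`` described above could be summarised in code roughly as follows. This is only a sketch of the selection logic: ``demo_exec_action`` and ``demo_select_exec`` are hypothetical names, ``void`` pointers replace the Python object pointers, and the error result stands in for reporting ``RuntimeError``. The ``inspect_main`` handling after execution is not modelled.

```c
#include <stddef.h>

typedef enum {
    DEMO_EXEC_NOTHING,      /* both main_stream and main_code are NULL */
    DEMO_EXEC_ERROR,        /* both are set: RuntimeError */
    DEMO_EXEC_INTERACTIVE,  /* would delegate to PyRun_InteractiveMain */
    DEMO_EXEC_STREAM,       /* would delegate to PyRun_StreamInMain */
    DEMO_EXEC_CODE          /* would delegate to PyRun_CodeInMain */
} demo_exec_action;

/* Pick the execution path from the three relevant settings. */
demo_exec_action demo_select_exec(const void *main_stream,
                                  const void *main_code,
                                  const void *prompt_stream)
{
    if (main_stream == NULL && main_code == NULL) {
        return DEMO_EXEC_NOTHING;
    }
    if (main_stream != NULL && main_code != NULL) {
        return DEMO_EXEC_ERROR;
    }
    if (main_stream != NULL) {
        return prompt_stream != NULL ? DEMO_EXEC_INTERACTIVE
                                     : DEMO_EXEC_STREAM;
    }
    return DEMO_EXEC_CODE;
}
```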
Py_IsRunningMain()
Internal Storage of Configuration Data
--------------------------------------
@ -931,7 +1042,7 @@ structs.
For debugging purposes, the configuration settings will be exposed as
a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
``sys.implementation``). Field names will match those in the configuration
structs, exception for ``hash_seed``, which will be deliberately excluded.
structs, except for ``hash_seed``, which will be deliberately excluded.
An underscored attribute is chosen deliberately, as these configuration
settings are part of the CPython implementation, rather than part of the
@ -941,16 +1052,19 @@ should be agreed with the other implementations and exposed as new required
attributes on ``sys.implementation``, as described in PEP 421.
These are *snapshots* of the initial configuration settings. They are not
consulted by the interpreter during runtime.
modified by the interpreter during runtime (except as noted above).
Stable ABI
----------
All of the APIs proposed in this PEP are excluded from the stable ABI, as
Most of the APIs proposed in this PEP are excluded from the stable ABI, as
embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.
The only newly exposed API that will be part of the stable ABI is the
``Py_IsInitializing()`` query.
Build time configuration
------------------------
@ -1038,27 +1152,32 @@ Open Questions
* Error details for Py_ReadConfiguration and Py_EndInitialization (these
should become clear as the implementation progresses)
* Is ``Py_IsRunningMain()`` worth keeping?
* Should the answers to ``Py_IsInitialized()`` and ``Py_IsRunningMain()`` be
exposed via the ``sys`` module?
* Is the ``Py_Config`` struct too unwieldy to be practical? Would a Python
dictionary be a better choice?
* Should there be ``Py_PreparingMain()`` and ``Py_RunningMain()`` query APIs?
* Should the answer to ``Py_IsInitialized()`` be exposed via the ``sys``
module?
* Is initialisation of the ``Py_Config`` struct too unwieldy to be
maintainable? Would a Python dictionary be a better choice, despite
being harder to work with from C code?
* Would it be better to manage the flag variables in ``Py_Config`` as
Python integers or as "negative means false, positive means true, zero
means not set" so the struct can be initialized with a simple
``memset(&config, 0, sizeof(config))``, eliminating the need to update
both Py_Config and Py_Config_INIT when adding new fields?
* The name of the system Python executable is a bikeshed waiting to be
* The name of the new system Python executable is a bikeshed waiting to be
painted. The 3 options considered so far are ``spython``, ``pysystem``
and ``python-minimal``. The PEP text reflects my current preferred choice
i.e. ``pysystem``.
(``pysystem``).
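To make the flag question above more concrete, here is a minimal sketch of the "negative means false, positive means true, zero means not set" convention. The ``demo_flags`` struct and both helper functions are hypothetical and only borrow a few field names for illustration; how defaults would actually be resolved is not specified by the PEP.

```c
#include <string.h>

/* Hypothetical subset of flag fields using the tri-state convention. */
typedef struct {
    int verbosity;
    int show_banner;
    int inspect_main;
} demo_flags;

/* With zero meaning "not set", initialization needs no INIT macro. */
void demo_flags_init(demo_flags *flags)
{
    memset(flags, 0, sizeof(*flags));
}

/* Resolve a tri-state flag to 0 or 1, falling back to a default
 * when the flag has not been set. */
int demo_flag_resolve(int flag, int default_value)
{
    if (flag > 0) {
        return 1;
    }
    if (flag < 0) {
        return 0;
    }
    return default_value != 0;
}
```

The trade-off is that every consumer must resolve the flag before testing it, since an explicit "false" is no longer represented by zero.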
Implementation
==============
The reference implementation is being developed as a feature branch in my
BitBucket sandbox [2_].
BitBucket sandbox [2_]. Pull requests to fix the inevitably broken
Windows builds are welcome, but the basic design is still in too much flux
for other pull requests to be feasible just yet. Once the overall design
settles down and it's a matter of migrating individual settings over to
the new design, that level of collaboration should become more practical.
As the number of application binaries created by the build process is now
four, the reference implementation also creates a new top level "Apps"