Minor tweaks to PEP 432

Nick Coghlan 2015-04-13 19:26:10 -04:00
parent 6c3484e878
commit 874d8d0ff0
1 changed files with 47 additions and 49 deletions


@ -37,12 +37,16 @@ bootstrapping the interpreter immediately before locating and executing
the main module.
In the new design, the interpreter will move through the following
well-defined phases during the startup sequence:
well-defined phases during the initialization sequence:
* Pre-Initialization - no interpreter available
* Initializing - interpreter partially available
* Initialized - interpreter available, __main__ related metadata
incomplete
With the interpreter itself fully initialised, main module execution will
then proceed through two phases:
* Main Preparation - __main__ related metadata populated
* Main Execution - bytecode executing in the __main__ module namespace
@ -71,14 +75,15 @@ model and the configuration interface, so such changes should be easier
once this PEP has been implemented.
Background
==========
Over time, CPython's initialization sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 as
well as bootstrapping a pure Python implementation of the import system).
(such as configuring the Unicode settings for OS interfaces in Python 3 [10_],
bootstrapping a pure Python implementation of the import system, and
implementing an isolated mode more suitable for system applications that run
with elevated privileges [6_]).
Much of this complexity is formally accessible only through the ``Py_Main``
and ``Py_Initialize`` APIs, offering embedding applications little
@ -88,18 +93,16 @@ place prior to the ``Py_Initialize`` call, meaning much of the Python C
API cannot be used safely.
A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as an isolated mode equivalent to that described in
this PEP as a "system Python" [6_], better control over ``sys.path``
startup behaviour, such as better control over ``sys.path``
initialization (easily adding additional directories on the command line
in a cross-platform fashion [7_], as well as controlling the configuration of
``sys.path[0]`` [8_]), easier configuration of utilities like coverage
tracing when launching Python subprocesses [9_], and easier control of the
encoding used for the standard IO streams when embedding CPython in a larger
application [10_].
tracing when launching Python subprocesses [9_].
Rather than attempting to bolt such behaviour onto an already complicated
system, this PEP proposes to instead simplify the status quo *first*, with
the aim of making these further feature requests easier to implement.
Rather than continuing to bolt such behaviour onto an already complicated
system, this PEP proposes to start simplifying the status quo by introducing
a more structured startup sequence, with the aim of making these further
feature requests easier to implement.
Key Concerns
@ -142,25 +145,12 @@ tear down the interpreter::
python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"
Current numbers on my system for 2.7, 3.2 and 3.3 (using the 3.3
Current numbers on my system for Python 3.5 (using the 3.4
subprocess and timeit modules to execute the check, all with non-debug
builds)::
# Python 2.7
$ py33/python -m timeit -s "from subprocess import call" "call(['py27/python', '-c', 'pass'])"
100 loops, best of 3: 17.8 msec per loop
# Python 3.2
$ py33/python -m timeit -s "from subprocess import call" "call(['py32/python', '-c', 'pass'])"
10 loops, best of 3: 39 msec per loop
# Python 3.3
$ py33/python -m timeit -s "from subprocess import call" "call(['py33/python', '-c', 'pass'])"
10 loops, best of 3: 25.3 msec per loop
Improvements in the import system and the Unicode support already resulted
in a more than 30% improvement in startup time in Python 3.3 relative to
3.2. Python 3.3 is still slightly slower to start than Python 2.7 due to the
additional infrastructure that needs to be put in place to support the
Unicode based text model.
$ python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"
10 loops, best of 3: 18.2 msec per loop
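The same measurement can be scripted rather than typed at a shell prompt. The following is a minimal sketch, not part of the PEP: the helper name ``startup_time_msec`` and its defaults are assumptions made for illustration, and it only relies on the standard ``subprocess`` and ``timeit`` modules.

```python
import subprocess
import sys
import timeit


def startup_time_msec(python=sys.executable, repeat=3, number=5):
    """Return the best per-launch time, in milliseconds, for ``python -c pass``.

    ``startup_time_msec`` is a hypothetical helper for illustration only.
    """
    # Each timed call starts an interpreter, runs ``pass`` and tears it down.
    timer = timeit.Timer(lambda: subprocess.call([python, "-c", "pass"]))
    # Report the minimum over the repeats, as the timeit command line does.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1000.0
```

Calling ``startup_time_msec("./python")`` would time a freshly built interpreter in the current directory. The absolute figure depends on the machine and build options, so only comparisons made on the same system are meaningful.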
This PEP is not expected to have any significant effect on the startup time,
as it is aimed primarily at *reordering* the existing initialization
@ -264,7 +254,7 @@ CPython command line application.
Interpreter Initialization Phases
---------------------------------
Five distinct phases are proposed:
Three distinct interpreter initialisation phases are proposed:
* Pre-Initialization:
@ -290,23 +280,29 @@ Five distinct phases are proposed:
``__main__`` related metadata is incomplete
* ``Py_IsInitializing()`` returns ``0``
* ``Py_IsInitialized()`` returns ``1``
* Optionally, the embedding application may identify and begin
executing code in the ``__main__`` module namespace by calling
``PyRun_PrepareMain`` and ``PyRun_ExecMain``.
Main Execution Phases
---------------------
After initializing the interpreter, the embedding application may continue
on to execute code in the ``__main__`` module namespace.
* Main Preparation:
* subphase of Initialized (not separately identified at runtime)
* fully populates ``__main__`` related metadata
* may execute code in ``__main__`` namespace (e.g. ``PYTHONSTARTUP``)
* invoked as ``PyRun_PrepareMain``
* Main Execution:
* subphase of Initialized (not separately identified at runtime)
* user supplied bytecode is being executed in the ``__main__`` namespace
* invoked as ``PyRun_ExecMain``
As noted above, main module preparation and execution are optional subphases
of Initialized rather than completely distinct phases.
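The ordering of the phases can be illustrated with a small toy model. This is a sketch under stated assumptions, not the CPython implementation: the class and method names are hypothetical, the query results for the Initialized phase follow the text above, and the results for the two earlier phases as well as the rule that main preparation must precede main execution are assumptions of the model.

```python
from enum import Enum


class Phase(Enum):
    PRE_INITIALIZATION = "Pre-Initialization"
    INITIALIZING = "Initializing"
    INITIALIZED = "Initialized"


class InterpreterModel:
    """Toy model of the startup sequence (hypothetical, for illustration)."""

    def __init__(self):
        self.phase = Phase.PRE_INITIALIZATION
        self.main_prepared = False

    def _require(self, expected):
        if self.phase is not expected:
            raise RuntimeError("expected %s, in %s" % (expected.value, self.phase.value))

    def begin_initialization(self):
        self._require(Phase.PRE_INITIALIZATION)
        self.phase = Phase.INITIALIZING

    def end_initialization(self):
        self._require(Phase.INITIALIZING)
        self.phase = Phase.INITIALIZED

    def prepare_main(self):
        # Main Preparation is a subphase of Initialized, so the phase is unchanged.
        self._require(Phase.INITIALIZED)
        self.main_prepared = True

    def exec_main(self):
        # Assumption of this model: preparation must come before execution.
        self._require(Phase.INITIALIZED)
        if not self.main_prepared:
            raise RuntimeError("__main__ metadata has not been populated")

    def is_initializing(self):
        return self.phase is Phase.INITIALIZING

    def is_initialized(self):
        return self.phase is Phase.INITIALIZED
```

An embedding application that never runs a main module would stop after ``end_initialization()``; the two main-related steps are optional in this model, matching their description as subphases.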
Invocation of Phases
--------------------
All listed phases will be used by the standard CPython interpreter and the
proposed System Python interpreter. Other embedding applications may
@ -344,7 +340,7 @@ Pre-Initialization Phase
The pre-initialization phase is where an embedding application determines
the settings which are absolutely required before the interpreter can be
initialized at all. Currently, the only configuration settings in this
initialized at all. Currently, the primary configuration settings in this
category are those related to the randomised hash algorithm - the hash
algorithms must be consistent for the lifetime of the process, and so they
must be in place before the core interpreter is created.
@ -362,7 +358,8 @@ The proposed API for this step in the startup sequence is::
void Py_BeginInitialization(const PyCoreConfig *config);
Like Py_Initialize, this part of the new API treats initialization failures
Like ``Py_Initialize``, this part of the new API treats initialization
failures
as fatal errors. While that's still not particularly embedding friendly,
the operations in this step *really* shouldn't be failing, and changing them
to return error codes instead of aborting would be an even larger task than
@ -374,7 +371,7 @@ configuration::
/* Note: if changing anything in PyCoreConfig, also update
* PyCoreConfig_INIT */
typedef struct {
int ignore_environment; /* -E switch */
int ignore_environment; /* -E switch, -I switch */
int use_hash_seed; /* PYTHONHASHSEED */
unsigned long hash_seed; /* PYTHONHASHSEED */
int _disable_importlib; /* Needed by freeze_importlib */
@ -404,12 +401,12 @@ to seed the random number generator. If the ``hash_seed`` is zero in this
case, then the randomised hashing is disabled completely.
If ``use_hash_seed`` is negative (and ``ignore_environment`` is zero),
then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If it
is not set, is set to the empty string, or to the value ``"random"``, then
randomised hashes with a random seed will be used. If it is set to the string
``"0"`` the randomised hashing will be disabled. Otherwise, the hash seed is
expected to be a string representation of an integer in the range
``[0; 4294967295]``.
then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If the
environment variable is not set, is set to the empty string, or to the value
``"random"``, then randomised hashes with a random seed will be used. If the
environment variable is set to the string ``"0"`` the randomised hashing will
be disabled. Otherwise, the hash seed is expected to be a string
representation of an integer in the range ``[0; 4294967295]``.
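The rules for interpreting the environment variable can be sketched in a few lines of Python. This is an illustrative model only: ``parse_hash_seed`` and its return convention are assumptions made for this example and are not the helper function proposed by this PEP.

```python
import string

MAX_HASH_SEED = 4294967295  # upper bound of the documented range


def parse_hash_seed(value):
    """Interpret a PYTHONHASHSEED-style string (hypothetical helper).

    Returns ``(randomised, seed)``. ``seed`` is ``None`` when a random
    seed should be chosen by the interpreter.
    """
    # Unset, empty, or "random": randomised hashing with a random seed.
    if value is None or value == "" or value == "random":
        return True, None
    # Anything else must be a plain decimal integer in the allowed range.
    if not all(ch in string.digits for ch in value):
        raise ValueError("hash seed must be 'random' or a decimal integer")
    seed = int(value)
    if seed > MAX_HASH_SEED:
        raise ValueError("hash seed must be in range [0; 4294967295]")
    # "0" (or any spelling of zero) disables randomised hashing.
    if seed == 0:
        return False, 0
    return True, seed
```

For example, ``parse_hash_seed("random")`` asks for a random seed, ``parse_hash_seed("0")`` disables randomisation, and ``parse_hash_seed("42")`` requests the fixed seed 42.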
To make it easier for embedding applications to use the ``PYTHONHASHSEED``
processing with a different data source, the following helper function
@ -437,7 +434,7 @@ without breaking the build process.
The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should go in the config dict passed
without the setting, then the setting should go in the configuration passed
to ``Py_EndInitialization()`` rather than in the core configuration.
A new query API will allow code to determine if the interpreter is in the
@ -487,8 +484,8 @@ to be used safely for all of the remaining configuration steps (unlike the
status quo).
In addition, the current thread will possess a valid Python thread state,
allow any further configuration data to be stored on the interpreter object
rather than in C process globals.
allowing any further configuration data to be stored on the interpreter
object rather than in C process globals.
Any call to ``Py_BeginInitialization()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling ``Py_EndInitialization()`` in
@ -511,7 +508,7 @@ check the supplied value, but otherwise accept it as correct.
A struct is used rather than a Python dictionary as the struct is easier
to work with from C, the list of supported fields is fixed for a given
CPython version and only a read-only view need to be exposed to Python
CPython version and only a read-only view needs to be exposed to Python
code (which is relatively straightforward, thanks to the infrastructure
already put in place to expose ``sys.implementation``).
@ -521,8 +518,9 @@ if a problem is found with the config data.
Any supported configuration setting which is not already set will be
populated appropriately in the supplied configuration struct. The default
configuration can be overridden entirely by setting the value *before* calling ``Py_ReadConfiguration``. The provided value will then also be used in
calculating any other settings derived from that value.
configuration can be overridden entirely by setting the value *before*
calling ``Py_ReadConfiguration``. The provided value will then also be used
in calculating any other settings derived from that value.
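The "fill in whatever is still unset" behaviour can be modelled with a plain dictionary. This is a sketch only: the function ``read_configuration`` and the setting names ``base_dir`` and ``module_dir`` are hypothetical and do not correspond to fields of the proposed configuration struct.

```python
def read_configuration(config):
    """Populate unset settings in place and return the same dict.

    Hypothetical model: ``base_dir`` and ``module_dir`` are invented names.
    """
    # A value supplied by the caller is kept; otherwise the default is used.
    config.setdefault("base_dir", "/usr/local")
    # Derived setting: calculated from whichever base_dir is now in effect,
    # so an override supplied beforehand flows through to it.
    config.setdefault("module_dir", config["base_dir"] + "/lib")
    return config
```

With an empty dictionary both settings receive their defaults, while ``read_configuration({"base_dir": "/opt/app"})`` yields a ``module_dir`` of ``/opt/app/lib``. That second case is the effect described above for values set before the call.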
Alternatively, settings may be overridden *after* the
``Py_ReadConfiguration`` call (this can be useful if an embedding