PEP: 432
Title: Simplifying the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2012
Python-Version: 3.6
Post-History: 28-Dec-2012, 2-Jan-2013


Abstract
========

This PEP proposes a mechanism for simplifying the startup sequence for
CPython, making it easier to modify the initialization behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate
resolution for most of these should become clearer as the reference
implementation is developed.


Proposal
========

This PEP proposes that initialization of the CPython runtime be split into
two clearly distinct phases:

* core runtime initialization
* main interpreter initialization

The proposed design also has significant implications for:

* main module execution
* subinterpreter initialization

In the new design, the interpreter will move through the following
well-defined phases during the initialization sequence:

* Pre-Initialization - no interpreter available
* Core Initialized - main interpreter partially available,
  subinterpreter creation not yet available
* Initialized - main interpreter fully available, subinterpreter creation
  available

As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP also proposes the creation and
distribution of a separate system Python (``pysystem``) executable
which, by default, operates in "isolated mode" (as selected by the CPython
``-I`` switch).

To keep the implementation complexity under control, this PEP does *not*
propose wholesale changes to the way the interpreter state is accessed at
runtime. Changing the order in which the existing initialization steps
occur to make the startup sequence easier to maintain is already a
substantial change, and attempting to make those other changes at the same time
would make the change significantly more invasive and much harder to review.
However, such proposals may be suitable topics for follow-on PEPs or patches.
One key benefit of this PEP is decreasing the coupling between the internal
storage model and the configuration interface, so such changes should be easier
once this PEP has been implemented.


Background
==========

Over time, CPython's initialization sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 [10_],
bootstrapping a pure Python implementation of the import system, and
implementing an isolated mode more suitable for system applications that run
with elevated privileges [6_]).

Much of this complexity is formally accessible only through the ``Py_Main``
and ``Py_Initialize`` APIs, offering embedding applications little
opportunity for customisation. This creeping complexity also makes life
difficult for maintainers, as much of the configuration needs to take
place prior to the ``Py_Initialize`` call, meaning much of the Python C
API cannot be used safely.

A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path``
initialization (e.g. easily adding additional directories on the command line
in a cross-platform fashion [7_], controlling the configuration of
``sys.path[0]`` [8_]) and easier configuration of utilities like coverage
tracing when launching Python subprocesses [9_].

Rather than continuing to bolt such behaviour onto an already complicated
system, this PEP proposes to start simplifying the status quo by introducing
a more structured startup sequence, with the aim of making these further
feature requests easier to implement.


Key Concerns
============

There are a few key concerns that any change to the startup sequence
needs to take into account.


Maintainability
---------------

The current CPython startup sequence is difficult to understand, and even
more difficult to modify. It is not clear what state the interpreter is in
while much of the initialization code executes, leading to behaviour such
as lists, dictionaries and Unicode values being created prior to the call
to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_].

By moving to an explicitly multi-phase startup sequence, developers should
only need to understand which features are not available in the core
bootstrapping phase, as the vast majority of the configuration process
will now take place during that phase.

By basing the new design on a combination of C structures and Python
data types, it should also be easier to modify the system in the
future to add new configuration options.


Testability
-----------

One of the problems with the complexity of the CPython startup sequence is the
combinatorial explosion of possible interactions between different configuration
settings.

This concern impacts both the design of the new initialisation system, and
the proposed approach for getting there.


Performance
-----------

CPython is used heavily to run short scripts where the runtime is dominated
by the interpreter initialization time. Any changes to the startup sequence
should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time is
dominated by IO operations. However, to monitor the impact of any changes,
a simple benchmark can be used to check how long it takes to start and then
tear down the interpreter::

    python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"

Current numbers on my system for Python 3.5 (using the 3.4
subprocess and timeit modules to execute the check, all with non-debug
builds)::

    $ python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"
    10 loops, best of 3: 18.2 msec per loop

This PEP is not expected to have any significant effect on the startup time,
as it is aimed primarily at *reordering* the existing initialization
sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the
initialization sequence may pose a performance problem, then a more
sophisticated microbenchmark will be developed to assist in investigation.


Required Configuration Settings
===============================

A comprehensive configuration scheme requires that an embedding application
be able to control the following aspects of the final interpreter state:

* Whether or not to use randomised hashes (and if used, potentially specify
  a specific random seed)
* Whether or not to enable the import system (required by CPython's
  build process when freezing the importlib._bootstrap bytecode)
* The "Where is Python located?" elements in the ``sys`` module:

  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``

* The path searched for imports from the filesystem (and other path hooks):

  * ``sys.path``

* The command line arguments seen by the interpreter:

  * ``sys.argv``

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding (if any) and the buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``

* The initial warning system state:

  * ``sys.warnoptions``

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after ``__main__`` terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (``__debug__`` status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed
* What code (if any) should be executed as ``__main__``:

  * Nothing (just create an empty module)
  * A filesystem path referring to a Python script (source or bytecode)
  * A filesystem path referring to a valid ``sys.path`` entry (typically
    a directory or zipfile)
  * A given string (equivalent to the "-c" option)
  * A module or package (equivalent to the "-m" option)
  * Standard input as a script (i.e. a non-interactive stream)
  * Standard input as an interactive interpreter session

<TBD: Did I miss anything?>

Note that this just covers settings that are currently configurable in some
manner when using the main CPython executable. While this PEP aims to make
adding additional configuration settings easier in the future, it
deliberately avoids adding any new settings of its own (except where such
additional settings arise naturally in the course of migrating existing
settings to the new structure).


Implementation Strategy
=======================

An initial attempt was made at implementing an earlier version of this PEP for
Python 3.4 [2_], with one of the significant problems encountered being merge
conflicts after the initial structural changes were put in place to start the
refactoring process. Unlike some previous major changes, such as the
switch to an AST-based compiler in Python 2.5, or the switch to the importlib
implementation of the import system in Python 3.3, there is no clear way to
structure a draft implementation that won't be prone to the kinds of merge
conflicts that afflicted the original attempt.

Accordingly, the implementation strategy now being proposed is to implement this
refactoring as a private API for CPython 3.6, before exposing the new functions
and structures as public API elements in CPython 3.7.

The affected APIs, with the leading underscore they would have in CPython
3.6, are:

* ``_Py_IsCoreInitialized``
* ``_Py_InitializeCore``
* ``_PyCoreConfig``
* ``_PyCoreConfig_INIT``
* ``_Py_ReadHashSeed``
* ``_Py_InitializeMainInterpreter``
* ``_PyMainInterpreterConfig``
* ``_PyMainInterpreterConfig_INIT``
* ``_Py_ReadMainInterpreterConfig``
* ``_PyRun_PrepareMain``
* ``_PyRun_ExecMain``
* ``_Py_InterpreterState_Main``

New APIs described in the rest of the PEP with a leading underscore are
intended to be retained permanently as private CPython implementation details.

The principal benefit of this approach is that the refactoring to adopt the
new configuration structures can be handled on a setting-by-setting basis
over the course of the 3.6 and 3.7 development cycles, rather than having to
migrate them in a single monolithic change. It also means any new settings can
be handled using the new approach, even if some existing settings have yet to
be migrated.

If all existing settings are successfully migrated to the new initialization
model in time for the 3.6.0a4 release in August 2016, then the proposal would
be to make the APIs public for the 3.6.0b1 release in September 2016, rather
than waiting for 3.7.


Design Details
==============

(Note: details here are still very much in flux, but preliminary feedback
is appreciated anyway)

The main theme of this proposal is to create the interpreter state for
the main interpreter *much* earlier in the startup process. This will allow
most of the CPython API to be used during the remainder of the initialization
process, potentially simplifying a number of operations that currently need
to rely on basic C functionality rather than being able to use the richer
data structures provided by the CPython C API.

In the following, the term "embedding application" also covers the standard
CPython command line application.


Interpreter Initialization Phases
---------------------------------

Three distinct interpreter initialisation phases are proposed:

* Pre-Initialization:

  * no interpreter is available.
  * ``Py_IsCoreInitialized()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``0``
  * The embedding application determines the settings required to create the
    main interpreter and moves to the next phase by calling
    ``Py_InitializeCore``.

* Core Initialized:

  * the main interpreter is available, but only partially configured.
  * ``Py_IsCoreInitialized()`` returns ``1``
  * ``Py_IsInitialized()`` returns ``0``
  * The embedding application determines and applies the settings
    required to complete the initialization process by calling
    ``Py_ReadMainInterpreterConfig`` and ``Py_InitializeMainInterpreter``.

* Initialized:

  * the main interpreter is available and fully operational, but
    ``__main__`` related metadata is incomplete
  * ``Py_IsCoreInitialized()`` returns ``1``
  * ``Py_IsInitialized()`` returns ``1``

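The three phases and the two query functions can be modelled as a small state machine. The following is a minimal sketch, not CPython code: the ``StartupPhase`` enum and every ``sketch_*`` function are hypothetical stand-ins for the proposed APIs. The treatment of ``sketch_initialize_main_interpreter()`` when it is called before the core is initialized is an assumption, because this PEP does not specify that case.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of the three proposed phases; these names are
 * illustrative only and are not part of the CPython C API. */
typedef enum {
    PHASE_PRE_INITIALIZATION,
    PHASE_CORE_INITIALIZED,
    PHASE_INITIALIZED
} StartupPhase;

static StartupPhase current_phase = PHASE_PRE_INITIALIZATION;

/* Same return values as documented for Py_IsCoreInitialized() */
int sketch_is_core_initialized(void)
{
    return current_phase != PHASE_PRE_INITIALIZATION;
}

/* Same return values as documented for Py_IsInitialized() */
int sketch_is_initialized(void)
{
    return current_phase == PHASE_INITIALIZED;
}

static void sketch_fatal_error(const char *msg)
{
    fprintf(stderr, "Fatal error: %s\n", msg);
    abort();
}

/* Like the proposed Py_InitializeCore: calling it twice is fatal */
void sketch_initialize_core(void)
{
    if (sketch_is_core_initialized()) {
        sketch_fatal_error("core runtime already initialized");
    }
    current_phase = PHASE_CORE_INITIALIZED;
}

/* Like the proposed Py_InitializeMainInterpreter: misuse is reported
 * as an error return (-1 here, an assumed convention), not a fatal error */
int sketch_initialize_main_interpreter(void)
{
    if (current_phase != PHASE_CORE_INITIALIZED) {
        return -1;
    }
    current_phase = PHASE_INITIALIZED;
    return 0;
}

/* Finalization is valid from either initialized phase, including when
 * the main interpreter step was skipped, and returns to pre-initialization */
void sketch_finalize(void)
{
    current_phase = PHASE_PRE_INITIALIZATION;
}
```

Keeping a single phase variable means both query functions are derived from one source of truth, so they cannot disagree about which phase the process is in.
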
Invocation of Phases
--------------------

All listed phases will be used by the standard CPython interpreter and the
proposed System Python interpreter.

An embedding application may still continue to leave initialization almost
entirely under CPython's control by using the existing ``Py_Initialize``
API. Alternatively, if an embedding application wants greater control
over CPython's initial state, it will be able to use the new, finer
grained API::

    /* Phase 1: Pre-Initialization */
    PyCoreConfig core_config = PyCoreConfig_INIT;
    PyMainInterpreterConfig config = PyMainInterpreterConfig_INIT;
    /* Easily control the core configuration */
    core_config.ignore_environment = 1; /* Ignore environment variables */
    core_config.use_hash_seed = 0;      /* Full hash randomisation */
    Py_InitializeCore(&core_config);

    /* Phase 2: Initialization */
    /* Optionally preconfigure some settings here - they will then be
     * used to derive other settings */
    Py_ReadMainInterpreterConfig(&config);
    /* Can completely override derived settings here */
    Py_InitializeMainInterpreter(&config);

    /* Phase 3: Initialized */
    /* If an embedding application has no real concept of a main module
     * it can just stop the initialization process here.
     * Alternatively, it can launch __main__ via the relevant API functions.
     */


Pre-Initialization Phase
------------------------

The pre-initialization phase is where an embedding application determines
the settings which are absolutely required before the interpreter can be
initialized at all. Currently, the primary configuration settings in this
category are those related to the randomised hash algorithm: the hash
algorithms must be consistent for the lifetime of the process, and so they
must be in place before the core interpreter is created.

The specific settings needed are a flag indicating whether or not to use a
specific seed value for the randomised hashes, and if so, the specific value
for the seed (a seed value of zero disables randomised hashing). In addition,
due to the possible use of ``PYTHONHASHSEED`` in configuring the hash
randomisation, the question of whether or not to consider environment
variables must also be addressed early. Finally, to support the CPython
build process, an option is offered to completely disable the import
system.

The proposed API for this step in the startup sequence is::

    void Py_InitializeCore(const PyCoreConfig *config);

Like ``Py_Initialize``, this part of the new API treats initialization failures
as fatal errors. While that's still not particularly embedding friendly,
the operations in this step *really* shouldn't be failing, and changing them
to return error codes instead of aborting would be an even larger task than
the one already being proposed.


The new ``PyCoreConfig`` struct holds the settings required for preliminary
configuration of the core runtime and creation of the main interpreter::

    /* Note: if changing anything in PyCoreConfig, also update
     * PyCoreConfig_INIT */
    typedef struct {
        int ignore_environment;   /* -E switch, -I switch */
        int use_hash_seed;        /* PYTHONHASHSEED */
        unsigned long hash_seed;  /* PYTHONHASHSEED */
        int _disable_importlib;   /* Needed by freeze_importlib */
    } PyCoreConfig;

    #define PyCoreConfig_INIT {0, -1, 0, 0}

The core configuration settings pointer may be ``NULL``, in which case the
default values are ``ignore_environment = -1`` and ``use_hash_seed = -1``.

The ``PyCoreConfig_INIT`` macro is designed to allow easy initialization
of a struct instance with sensible defaults::

    PyCoreConfig core_config = PyCoreConfig_INIT;


``ignore_environment`` controls the processing of all Python related
environment variables. If the flag is zero, then environment variables are
processed normally. Otherwise, all Python-specific environment variables
are considered undefined (exceptions may be made for some OS specific
environment variables, such as those used on Mac OS X to communicate
between the App bundle and the main Python binary).

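One way to honour ``ignore_environment`` consistently is to route every lookup through a single helper. This is a minimal sketch under two stated assumptions: ``sketch_config_getenv`` is a hypothetical name, and "Python-specific" is taken to mean "name starts with ``PYTHON``". None of the OS-specific exceptions mentioned above are modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a CPython API): look up an environment
 * variable, treating every Python-specific variable as undefined when
 * ignore_environment is non-zero. Assumes "Python-specific" means the
 * name starts with "PYTHON"; no OS-specific exceptions are handled. */
const char *sketch_config_getenv(int ignore_environment, const char *name)
{
    if (ignore_environment != 0 && strncmp(name, "PYTHON", 6) == 0) {
        return NULL;  /* behave as if the variable were not set */
    }
    return getenv(name);
}
```

Because the check is written as "non-zero", the ``-1`` value used when no core configuration is supplied is treated like any other non-zero value, which follows the "Otherwise" wording above literally.
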
``use_hash_seed`` controls the configuration of the randomised hash
algorithm. If it is zero, then randomised hashes with a random seed will
be used. If it is positive, then the value in ``hash_seed`` will be used
to seed the random number generator. If ``hash_seed`` is zero in this
case, then randomised hashing is disabled completely.

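The combinations of ``use_hash_seed`` and ``hash_seed`` reduce to a handful of outcomes. The sketch below classifies them. The ``HashMode`` enum and ``sketch_hash_mode`` are hypothetical names for illustration only. Negative values are simply flagged as "consult the environment", since that case depends on ``PYTHONHASHSEED`` and ``ignore_environment``.

```c
/* Hypothetical classification of use_hash_seed / hash_seed combinations;
 * the enum and function are illustrative, not part of the proposed API. */
typedef enum {
    HASH_CONSULT_ENVIRONMENT, /* use_hash_seed < 0: PYTHONHASHSEED rules apply */
    HASH_RANDOM_SEED,         /* use_hash_seed == 0 */
    HASH_FIXED_SEED,          /* use_hash_seed > 0 and hash_seed != 0 */
    HASH_DISABLED             /* use_hash_seed > 0 and hash_seed == 0 */
} HashMode;

HashMode sketch_hash_mode(int use_hash_seed, unsigned long hash_seed)
{
    if (use_hash_seed < 0) {
        return HASH_CONSULT_ENVIRONMENT;
    }
    if (use_hash_seed == 0) {
        return HASH_RANDOM_SEED;  /* hash_seed is ignored in this case */
    }
    return hash_seed == 0 ? HASH_DISABLED : HASH_FIXED_SEED;
}
```
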
If ``use_hash_seed`` is negative (and ``ignore_environment`` is zero),
then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If the
environment variable is not set, is set to the empty string, or to the value
``"random"``, then randomised hashes with a random seed will be used. If the
environment variable is set to the string ``"0"``, randomised hashing will
be disabled. Otherwise, the hash seed is expected to be a string
representation of an integer in the range ``[0; 4294967295]``.

To make it easier for embedding applications to use the ``PYTHONHASHSEED``
processing with a different data source, the following helper function
will be added to the C API::

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);


This function accepts a seed string in ``seed_text`` and converts it to
the appropriate flag and seed values. If ``seed_text`` is ``NULL``,
the empty string or the value ``"random"``, both ``use_hash_seed`` and
``hash_seed`` will be set to zero. Otherwise, ``use_hash_seed`` will be set to
``1`` and the seed text will be interpreted as an integer and reported as
``hash_seed``. On success, the function will return zero. A non-zero return
value indicates an error (most likely in the conversion to an integer).

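The conversion rules above can be implemented with ``strtoul`` plus explicit range and format checks. This is a minimal sketch of a stand-in for the proposed ``Py_ReadHashSeed``, not CPython's implementation. The name ``sketch_read_hash_seed``, the ``const`` qualifier on ``seed_text``, the ``-1`` error value and the rejection of leading whitespace or signs are all assumptions.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed Py_ReadHashSeed(); follows the
 * behaviour described above but is not CPython's implementation. The
 * upper bound matches the documented range [0; 4294967295]. On error the
 * output parameters are left unchanged. */
int sketch_read_hash_seed(const char *seed_text,
                          int *use_hash_seed,
                          unsigned long *hash_seed)
{
    char *end;
    unsigned long value;

    if (seed_text == NULL || seed_text[0] == '\0'
        || strcmp(seed_text, "random") == 0) {
        /* Randomised hashing with a random seed */
        *use_hash_seed = 0;
        *hash_seed = 0;
        return 0;
    }
    /* Reject leading whitespace and signs, which strtoul would accept */
    if (seed_text[0] < '0' || seed_text[0] > '9') {
        return -1;
    }
    errno = 0;
    value = strtoul(seed_text, &end, 10);
    if (errno != 0 || *end != '\0' || value > 4294967295UL) {
        return -1;  /* not a valid integer in the documented range */
    }
    *use_hash_seed = 1;
    *hash_seed = value;  /* a value of zero disables randomised hashing */
    return 0;
}
```

An embedding application could feed this from its own settings store instead of the process environment. It would then copy the two outputs into a ``PyCoreConfig`` before calling ``Py_InitializeCore``.
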
The ``_disable_importlib`` setting is used as part of the CPython build
process to create an interpreter with no import capability at all. It is
considered private to the CPython development team (hence the leading
underscore), as the only known use case is to permit compiler changes
that invalidate the previously frozen bytecode for ``importlib._bootstrap``
without breaking the build process.

The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should go in the configuration passed
to ``Py_InitializeMainInterpreter()`` rather than in the core configuration.

A new query API will allow code to determine if the interpreter is in the
bootstrapping state between the creation of the interpreter state and the
completion of the bulk of the initialization process::

    int Py_IsCoreInitialized(void);

Attempting to call ``Py_InitializeCore()`` again when
``Py_IsCoreInitialized()`` is true is a fatal error.

As frozen bytecode may now be legitimately run in an interpreter which is not
yet fully initialized, ``sys.flags`` will gain a new ``initialized`` flag.

With the core runtime initialised, the interpreter should be fully functional
except that:

* compilation is not allowed (as the parser and compiler are not yet
  configured properly)
* creation of subinterpreters is not allowed
* creation of additional thread states is not allowed
* The following attributes in the ``sys`` module are all either missing or
  ``None``:

  * ``sys.path``
  * ``sys.argv``
  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``
  * ``sys.warnoptions``
  * ``sys.dont_write_bytecode``
  * ``sys.stdin``
  * ``sys.stdout``

* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
* Only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
  mode
* The ``sys.flags`` attribute exists, but the individual flags may not yet
  have their final values.
* The ``sys.flags.initialized`` attribute is set to ``0``
* The ``warnings`` module is not yet initialized
* The ``__main__`` module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python
data types, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).

In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored on the interpreter
object rather than in C process globals.

Any call to ``Py_InitializeCore()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling
``Py_InitializeMainInterpreter()`` in between (e.g. if attempting to read the
main interpreter configuration settings fails).


Determining the remaining configuration settings
------------------------------------------------

The next step in the initialization sequence is to determine the full
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core API for this step is::

    int Py_ReadMainInterpreterConfig(PyMainInterpreterConfig *config);

The config argument should be a pointer to a config struct (which may be
a temporary one stored on the C stack). For any already configured value
(i.e. non-NULL pointer or non-negative numeric value), CPython will sanity
check the supplied value, but otherwise accept it as correct.

A struct is used rather than a Python dictionary as the struct is easier
to work with from C, the list of supported fields is fixed for a given
CPython version and only a read-only view needs to be exposed to Python
code (which is relatively straightforward, thanks to the infrastructure
already put in place to expose ``sys.implementation``).

Unlike ``Py_Initialize`` and ``Py_InitializeCore``, this call will raise
an exception and report an error return rather than exhibiting fatal errors
if a problem is found with the config data.

Any supported configuration setting which is not already set will be
populated appropriately in the supplied configuration struct. The default
configuration can be overridden entirely by setting the value *before*
calling ``Py_ReadMainInterpreterConfig``. The provided value will then also be
used in calculating any other settings derived from that value.

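The "fill in only what is unset" contract, including derived settings, can be sketched with a much smaller struct. Everything here is hypothetical. ``SketchConfig`` and ``sketch_read_config`` are not CPython names, and plain C strings stand in for Python objects. The default path and the "flag must be 0 or 1" sanity check are placeholders, not CPython's real defaults or checks.

```c
#include <stddef.h>

/* Hypothetical, much-reduced config following the PEP's conventions:
 * pointers are "not set" when NULL, numeric flags when negative. */
typedef struct {
    const char *prefix;       /* not set == NULL */
    const char *base_prefix;  /* not set == NULL; derived from prefix */
    int dont_write_bytecode;  /* not set == -1 */
    int verbosity;            /* not set == -1 */
} SketchConfig;

/* Populates only the settings the caller left unset, in the way
 * Py_ReadMainInterpreterConfig is described as doing. */
int sketch_read_config(SketchConfig *config)
{
    /* Sanity check caller-supplied values first (assumed example check) */
    if (config->dont_write_bytecode > 1) {
        return -1;
    }
    if (config->prefix == NULL) {
        config->prefix = "/usr/local";  /* placeholder default */
    }
    if (config->base_prefix == NULL) {
        /* Derived setting: computed from the possibly caller-supplied
         * prefix, so an earlier override flows through automatically */
        config->base_prefix = config->prefix;
    }
    if (config->dont_write_bytecode < 0) {
        config->dont_write_bytecode = 0;
    }
    if (config->verbosity < 0) {
        config->verbosity = 0;
    }
    return 0;
}
```

Setting ``prefix`` before the call changes both ``prefix`` and the derived ``base_prefix``. Writing to ``base_prefix`` after the call adjusts only that one value.
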
Alternatively, settings may be overridden *after* the
``Py_ReadMainInterpreterConfig`` call (this can be useful if an embedding
application wants to adjust a setting rather than replace it completely,
such as removing ``sys.path[0]``).

Merely reading the configuration has no effect on the interpreter state: it
only modifies the passed-in configuration struct. The settings are not
applied to the running interpreter until the ``Py_InitializeMainInterpreter``
call (see below).


Supported configuration settings
--------------------------------

The interpreter configuration is split into two parts: settings which are
either relevant only to the main interpreter or must be identical across the
main interpreter and all subinterpreters, and settings which may vary across
subinterpreters.

NOTE: For initial implementation purposes, only the flag indicating whether
or not the interpreter is the main interpreter will be configured on a
per-interpreter basis. Other fields will be reviewed for whether or not they can
feasibly be made interpreter-specific over the course of the implementation.

The ``PyMainInterpreterConfig`` struct holds the settings required to
complete the main interpreter configuration. These settings are also all
passed through unmodified to subinterpreters. Fields are either pointers to
Python data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::

    /* Note: if changing anything in PyMainInterpreterConfig, also update
     * PyMainInterpreterConfig_INIT */
    typedef struct {
        /* Argument processing */
        PyListObject *raw_argv;
        PyListObject *argv;
        PyListObject *warnoptions; /* -W switch, PYTHONWARNINGS */
        PyDictObject *xoptions;    /* -X switch */

        /* Filesystem locations */
        PyUnicodeObject *program_name;
        PyUnicodeObject *executable;
        PyUnicodeObject *prefix;           /* PYTHONHOME */
        PyUnicodeObject *exec_prefix;      /* PYTHONHOME */
        PyUnicodeObject *base_prefix;      /* pyvenv.cfg */
        PyUnicodeObject *base_exec_prefix; /* pyvenv.cfg */

        /* Site module */
        int enable_site_config;  /* -S switch (inverted) */
        int no_user_site;        /* -s switch, PYTHONNOUSERSITE */

        /* Import configuration */
        int dont_write_bytecode;    /* -B switch, PYTHONDONTWRITEBYTECODE */
        int ignore_module_case;     /* PYTHONCASEOK */
        PyListObject *import_path;  /* PYTHONPATH (etc) */

        /* Standard streams */
        int use_unbuffered_io;             /* -u switch, PYTHONUNBUFFERED */
        PyUnicodeObject *stdin_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stdin_errors;     /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_encoding;  /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_errors;    /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_encoding;  /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_errors;    /* PYTHONIOENCODING */

        /* Filesystem access */
        PyUnicodeObject *fs_encoding;

        /* Debugging output */
        int debug_parser;  /* -d switch, PYTHONDEBUG */
        int verbosity;     /* -v switch */

        /* Code generation */
        int bytes_warnings;  /* -b switch */
        int optimize;        /* -O switch */

        /* Signal handling */
        int install_signal_handlers;

        /* Implicit execution */
        PyUnicodeObject *startup_file;  /* PYTHONSTARTUP */

        /* Main module
         *
         * If prepare_main is set, at most one of the main_* settings should
         * be set before calling PyRun_PrepareMain (Py_ReadMainInterpreterConfig
         * will set one of them based on the command line arguments if
         * prepare_main is non-zero when that API is called).
         */
        int prepare_main;
        PyUnicodeObject *main_source; /* -c switch */
        PyUnicodeObject *main_path;   /* filesystem path */
        PyUnicodeObject *main_module; /* -m switch */
        PyCodeObject *main_code;      /* Run directly from a code object */
        PyObject *main_stream;        /* Run from stream */
        int run_implicit_code;        /* Run implicit code during prep */

        /* Interactive main
         *
         * Note: Settings related to interactive mode are very much in flux.
         */
        PyObject *prompt_stream;  /* Output interactive prompt */
        int show_banner;          /* -q switch (inverted) */
        int inspect_main;         /* -i switch, PYTHONINSPECT */

    } PyMainInterpreterConfig;

    /* Struct initialization is pretty horrible in C89. Avoiding this mess would
     * be the most attractive aspect of using a PyDictObject* instead... */
    #define _PyArgConfig_INIT NULL, NULL, NULL, NULL
    #define _PyLocationConfig_INIT NULL, NULL, NULL, NULL, NULL, NULL
    #define _PySiteConfig_INIT -1, -1
    #define _PyImportConfig_INIT -1, -1, NULL
    #define _PyStreamConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, NULL
    #define _PyFilesystemConfig_INIT NULL
    #define _PyDebuggingConfig_INIT -1, -1
    #define _PyCodeGenConfig_INIT -1, -1
    #define _PySignalConfig_INIT -1
    #define _PyImplicitConfig_INIT NULL
    #define _PyMainConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, -1
    #define _PyInteractiveConfig_INIT NULL, -1, -1

    #define PyMainInterpreterConfig_INIT { \
        _PyArgConfig_INIT, _PyLocationConfig_INIT, \
        _PySiteConfig_INIT, _PyImportConfig_INIT, \
        _PyStreamConfig_INIT, _PyFilesystemConfig_INIT, \
        _PyDebuggingConfig_INIT, _PyCodeGenConfig_INIT, \
        _PySignalConfig_INIT, _PyImplicitConfig_INIT, \
        _PyMainConfig_INIT, _PyInteractiveConfig_INIT}

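The positional ``*_INIT`` macros above must list values in exactly the order the fields are declared, so adding or reordering a field can silently shift every later value. Where C99 designated initializers are available, they tie each value to a field name instead. This is offered as an alternative sketch, not as part of this proposal. The reduced ``SketchMainConfig`` struct and ``SKETCH_MAIN_CONFIG_INIT`` macro are hypothetical, and ``void *`` stands in for the Python object pointer types.

```c
#include <stddef.h>

/* Hypothetical, reduced version of the main interpreter config, with
 * void * standing in for the Python object pointer types. */
typedef struct {
    void *argv;              /* not set == NULL */
    void *prefix;            /* not set == NULL */
    int enable_site_config;  /* not set == -1 */
    int debug_parser;        /* not set == -1 */
    int verbosity;           /* not set == -1 */
} SketchMainConfig;

/* C99 designated initializers: each value is bound to a field name, so
 * inserting or reordering fields cannot shift values. Fields omitted
 * from the list are zero-initialized, so every "not set" sentinel of -1
 * still has to be spelled out explicitly. */
#define SKETCH_MAIN_CONFIG_INIT {  \
    .argv = NULL,                  \
    .prefix = NULL,                \
    .enable_site_config = -1,      \
    .debug_parser = -1,            \
    .verbosity = -1,               \
}
```

Usage mirrors the existing macro: ``SketchMainConfig config = SKETCH_MAIN_CONFIG_INIT;``.
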
The ``PyInterpreterConfig`` struct holds the settings that may vary between
the main interpreter and subinterpreters. For the main interpreter, these
settings are automatically populated by ``Py_InitializeMainInterpreter()``.

::

    /* Note: if changing anything in PyInterpreterConfig, also update
     * PyInterpreterConfig_INIT */
    typedef struct {
        int is_main_interpreter;  /* Easily check for subinterpreters */
    } PyInterpreterConfig;

    #define PyInterpreterConfig_INIT {0}

<TBD: did I miss anything?>


Completing the interpreter initialization
-----------------------------------------

The final step in the initialization process is to actually put the
configuration settings into effect and finish bootstrapping the main
interpreter up to full operation::

    int Py_InitializeMainInterpreter(const PyMainInterpreterConfig *config);

Like ``Py_ReadMainInterpreterConfig``, this call will raise an exception and
report an error return rather than exhibiting fatal errors if a problem is
found with the config data.

All configuration settings are required: the configuration struct
should always be passed through ``Py_ReadMainInterpreterConfig`` to ensure it
is fully populated.

After a successful call ``Py_IsInitialized()`` will become true. The caveats
|
|
described above for the interpreter during the phase where only the core
|
|
runtime is initialized will no longer hold.
|
|
|
|
Attempting to call ``Py_InitializeMainInterpreter()`` again when
|
|
``Py_IsInitialized()`` is true is an error.
|
|
|
|
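The ordering rules above can be modelled as a small state machine. This sketch is purely illustrative: ``sketch_phase`` and the ``sketch_*`` functions are hypothetical stand-ins for ``Py_InitializeCore()``, ``Py_InitializeMainInterpreter()`` and ``Py_IsInitialized()``; the ``config_complete`` argument stands in for "the struct was populated by ``Py_ReadMainInterpreterConfig``"; and rejecting a second core initialization is an assumption of the sketch, not something stated in this section::

    /* Hypothetical model of the initialization phases; not a real API. */
    typedef enum {
        SKETCH_PRE_INITIALIZATION,  /* no interpreter available */
        SKETCH_CORE_INITIALIZED,    /* core runtime up, main not yet */
        SKETCH_INITIALIZED          /* main interpreter fully available */
    } sketch_phase;

    static sketch_phase sketch_current_phase = SKETCH_PRE_INITIALIZATION;

    /* Stand-in for Py_InitializeCore(); a second call is rejected
     * (an assumption of this sketch). */
    static int
    sketch_initialize_core(void)
    {
        if (sketch_current_phase != SKETCH_PRE_INITIALIZATION) {
            return -1;
        }
        sketch_current_phase = SKETCH_CORE_INITIALIZED;
        return 0;
    }

    /* Stand-in for Py_InitializeMainInterpreter(): reports an error
     * (rather than aborting) if the core is missing, the main interpreter
     * is already initialized, or the config was not fully populated. */
    static int
    sketch_initialize_main(int config_complete)
    {
        if (sketch_current_phase != SKETCH_CORE_INITIALIZED) {
            return -1;
        }
        if (!config_complete) {
            return -1;
        }
        sketch_current_phase = SKETCH_INITIALIZED;
        return 0;
    }

    /* Stand-in for Py_IsInitialized() */
    static int
    sketch_is_initialized(void)
    {
        return sketch_current_phase == SKETCH_INITIALIZED;
    }

In this model, a second ``sketch_initialize_main(1)`` call after a successful one returns ``-1``, matching the rule that re-initializing an already initialized main interpreter is an error.
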
However, some metadata related to the ``__main__`` module may still be
incomplete:

* ``sys.argv[0]`` may not yet have its final value

  * it will be ``-m`` when executing a module or package with CPython
  * it will be the same as ``sys.path[0]`` rather than the location of
    the ``__main__`` module when executing a valid ``sys.path`` entry
    (typically a zipfile or directory)
  * otherwise, it will be accurate:

    * the script name if running an ordinary script
    * ``-c`` if executing a supplied string
    * ``-`` or the empty string if running from stdin

* the metadata in the ``__main__`` module will still indicate it is a
  builtin module

This function will normally implicitly import ``site`` as its final operation
(after ``Py_IsInitialized()`` is already set). Clearing the
``enable_site_config`` flag in the configuration settings will disable this
behaviour, as well as eliminating any side effects on global state if
``import site`` is later explicitly executed in the process.


Preparing the main module
-------------------------

This subphase completes the population of the ``__main__`` module
related metadata, without actually starting execution of the ``__main__``
module code.

It is handled by calling the following API::

    int PyRun_PrepareMain(void);

This operation is only permitted for the main interpreter, and will raise
``RuntimeError`` when invoked from a thread where the current thread state
belongs to a subinterpreter.

The actual processing is driven by the main-related settings stored in
the interpreter state as part of the configuration struct.

If ``prepare_main`` is zero, this call does nothing.

If all of ``main_source``, ``main_path``, ``main_module``,
``main_stream`` and ``main_code`` are NULL, this call does nothing.

If more than one of ``main_source``, ``main_path``, ``main_module``,
``main_stream`` or ``main_code`` is set, ``RuntimeError`` will be reported.

If ``main_code`` is already set, then this call does nothing.

If ``main_stream`` is set, and ``run_implicit_code`` is also set, then
the file identified in ``startup_file`` will be read, compiled and
executed in the ``__main__`` namespace.

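The precedence of the checks listed above can be sketched as a pure classification step. Everything here is hypothetical: ``sketch_main_config`` mirrors only the relevant field names, plain ``const void *`` pointers stand in for the ``PyObject *`` fields, and the function reports which branch ``PyRun_PrepareMain()`` would take rather than doing any work::

    #include <stddef.h>

    /* Hypothetical stand-in for the main-module fields of the config. */
    typedef struct {
        int prepare_main;
        const void *main_source;
        const void *main_path;
        const void *main_module;
        const void *main_stream;
        const void *main_code;
    } sketch_main_config;

    typedef enum {
        SKETCH_PREPARE_NOTHING,  /* the call is a no-op */
        SKETCH_PREPARE_ERROR,    /* RuntimeError would be reported */
        SKETCH_PREPARE_SOURCE,   /* compile main_source */
        SKETCH_PREPARE_PATH,     /* process main_path */
        SKETCH_PREPARE_MODULE,   /* locate main_module */
        SKETCH_PREPARE_STREAM    /* only the implicit startup file applies */
    } sketch_prepare_action;

    static sketch_prepare_action
    sketch_classify_main(const sketch_main_config *cfg)
    {
        int count = (cfg->main_source != NULL) + (cfg->main_path != NULL) +
                    (cfg->main_module != NULL) + (cfg->main_stream != NULL) +
                    (cfg->main_code != NULL);

        if (!cfg->prepare_main || count == 0) {
            return SKETCH_PREPARE_NOTHING;
        }
        if (count > 1) {
            return SKETCH_PREPARE_ERROR;   /* settings are mutually exclusive */
        }
        if (cfg->main_code != NULL) {
            return SKETCH_PREPARE_NOTHING; /* already prepared */
        }
        if (cfg->main_source != NULL) {
            return SKETCH_PREPARE_SOURCE;
        }
        if (cfg->main_path != NULL) {
            return SKETCH_PREPARE_PATH;
        }
        if (cfg->main_module != NULL) {
            return SKETCH_PREPARE_MODULE;
        }
        return SKETCH_PREPARE_STREAM;
    }
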
If ``main_source``, ``main_path`` or ``main_module`` are set, then this
call will take whatever steps are needed to populate ``main_code``:

* For ``main_source``, the supplied string will be compiled and saved to
  ``main_code``.

* For ``main_path``:

  * if the supplied path is recognised as a valid ``sys.path`` entry, it
    is inserted as ``sys.path[0]``, ``main_module`` is set
    to ``__main__`` and processing continues as for ``main_module`` below.
  * otherwise, the path is read as a CPython bytecode file
  * if that fails, it is read as a Python source file and compiled
  * in the latter two cases, the code object is saved to ``main_code``
    and ``__main__.__file__`` is set appropriately

* For ``main_module``:

  * any parent package is imported
  * the loader for the module is determined
  * if the loader indicates the module is a package, add ``.__main__`` to
    the end of ``main_module`` and try again (if the final name segment
    is already ``.__main__`` then fail immediately)
  * once the module source code is located, save the compiled module code
    as ``main_code`` and populate the following attributes in ``__main__``
    appropriately: ``__name__``, ``__loader__``, ``__file__``,
    ``__cached__``, ``__package__``.

(Note: the behaviour described in this section isn't new; it's a write-up
of the current behaviour of the CPython interpreter adjusted for the new
configuration system.)

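The package handling rule for ``main_module`` (append ``.__main__`` and retry, but fail if the name already ends that way) can be sketched as a short loop. The ``sketch_*`` names are hypothetical, the ``is_package`` callback stands in for asking the module's loader, and importing parent packages is deliberately left out::

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical callback: "the loader reports this module is a package" */
    typedef int (*sketch_is_package_fn)(const char *name);

    /* Writes the name of the module whose code becomes main_code into out.
     * Returns 0 on success, or -1 on failure (including a package whose
     * name already ends in ".__main__"). */
    static int
    sketch_resolve_main_module(const char *requested,
                               sketch_is_package_fn is_package,
                               char *out, size_t out_size)
    {
        static const char suffix[] = ".__main__";
        const size_t suffix_len = sizeof(suffix) - 1;
        size_t len = strlen(requested);

        if (len + 1 > out_size) {
            return -1;
        }
        memcpy(out, requested, len + 1);

        while (is_package(out)) {
            len = strlen(out);
            if (len >= suffix_len &&
                strcmp(out + len - suffix_len, suffix) == 0) {
                return -1;  /* final segment is already __main__ */
            }
            if (len + suffix_len + 1 > out_size) {
                return -1;
            }
            memcpy(out + len, suffix, suffix_len + 1);  /* try again */
        }
        return 0;
    }

    /* Example predicate: "pkg" and "odd" are packages, and so
     * (unusually) is "odd.__main__". */
    static int
    sketch_demo_is_package(const char *name)
    {
        return strcmp(name, "pkg") == 0 || strcmp(name, "odd") == 0 ||
               strcmp(name, "odd.__main__") == 0;
    }

With the example predicate, ``"tool"`` resolves to itself, ``"pkg"`` resolves to ``"pkg.__main__"``, and ``"odd"`` fails because ``"odd.__main__"`` is itself reported as a package.
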
Executing the main module
-------------------------

This subphase covers the execution of the actual ``__main__`` module code.

It is handled by calling the following API::

    int PyRun_ExecMain(void);

This operation is only permitted for the main interpreter, and will raise
``RuntimeError`` when invoked from a thread where the current thread state
belongs to a subinterpreter.

The actual processing is driven by the main-related settings stored in
the interpreter state as part of the configuration struct.

If both ``main_stream`` and ``main_code`` are NULL, this call does nothing.

If both ``main_stream`` and ``main_code`` are set, ``RuntimeError`` will
be reported.

If ``main_stream`` and ``prompt_stream`` are both set, main execution will
be delegated to a new API::

    int _PyRun_InteractiveMain(PyObject *input, PyObject *output);

If ``main_stream`` is set and ``prompt_stream`` is NULL, main execution will
be delegated to a new API::

    int _PyRun_StreamInMain(PyObject *input);

If ``main_code`` is set, main execution will be delegated to a new
API::

    int _PyRun_CodeInMain(PyCodeObject *code);

After execution of main completes, if ``inspect_main`` is set, or
the ``PYTHONINSPECT`` environment variable has been set, then
``PyRun_ExecMain`` will invoke
``_PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.

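The dispatch rules above reduce to a small decision function. In this hypothetical sketch the enum values merely name which of the proposed ``_PyRun_*`` functions would be called, ``const void *`` pointers stand in for the ``PyObject *`` and ``PyCodeObject *`` settings, and treating ``PYTHONINSPECT`` as set only when it is non-empty is an assumption modelled on current CPython behaviour::

    #include <stddef.h>

    typedef enum {
        SKETCH_EXEC_NOTHING,      /* neither stream nor code: no-op */
        SKETCH_EXEC_ERROR,        /* both set: RuntimeError */
        SKETCH_EXEC_INTERACTIVE,  /* _PyRun_InteractiveMain(input, output) */
        SKETCH_EXEC_STREAM,       /* _PyRun_StreamInMain(input) */
        SKETCH_EXEC_CODE          /* _PyRun_CodeInMain(code) */
    } sketch_exec_mode;

    static sketch_exec_mode
    sketch_choose_exec_mode(const void *main_stream, const void *main_code,
                            const void *prompt_stream)
    {
        if (main_stream == NULL && main_code == NULL) {
            return SKETCH_EXEC_NOTHING;
        }
        if (main_stream != NULL && main_code != NULL) {
            return SKETCH_EXEC_ERROR;
        }
        if (main_code != NULL) {
            return SKETCH_EXEC_CODE;
        }
        return prompt_stream != NULL ? SKETCH_EXEC_INTERACTIVE
                                     : SKETCH_EXEC_STREAM;
    }

    /* After main completes: drop into interactive mode if requested via
     * inspect_main or a non-empty PYTHONINSPECT value (an assumption). */
    static int
    sketch_should_inspect(int inspect_main, const char *pythoninspect_env)
    {
        return inspect_main ||
               (pythoninspect_env != NULL && pythoninspect_env[0] != '\0');
    }
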
Internal Storage of Configuration Data
--------------------------------------

The interpreter state will be updated to include details of the configuration
settings supplied during initialization by extending the interpreter state
object with an embedded copy of the ``PyCoreConfig``,
``PyMainInterpreterConfig`` and ``PyInterpreterConfig`` structs.

For debugging purposes, the configuration settings will be exposed as
a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
``sys.implementation``). Field names will match those in the configuration
structs, except for ``hash_seed``, which will be deliberately excluded.

An underscored attribute is chosen deliberately, as these configuration
settings are part of the CPython implementation, rather than part of the
Python language definition. If settings are needed to support
cross-implementation compatibility in the standard library, then those
should be agreed with the other implementations and exposed as new required
attributes on ``sys.implementation``, as described in PEP 421.

These are *snapshots* of the initial configuration settings. They are not
modified by the interpreter during runtime (except as noted above).


Creating and Configuring Subinterpreters
----------------------------------------

As the new configuration settings are stored in the interpreter state, they
need to be initialised when a new subinterpreter is created. This turns out
to be trickier than one might think due to ``PyThreadState_Swap(NULL);``
(which is fortunately exercised by CPython's own embedding tests, allowing
this problem to be detected during development).

To provide a straightforward solution for this case, the PEP proposes to
add a new API::

    PyInterpreterState *PyInterpreterState_Main(void);

This will be a counterpart to ``PyInterpreterState_Head()``, reporting the
oldest currently existing interpreter rather than the newest. If
``Py_NewInterpreter()`` is called from a thread with an existing thread
state, then the interpreter configuration for that thread will be
used when initialising the new subinterpreter. If there is no current
thread state, the configuration from ``PyInterpreterState_Main()``
will be used.

While the existing ``PyInterpreterState_Head()`` API could be used instead,
that reference changes as subinterpreters are created and destroyed, while
``PyInterpreterState_Main()`` will always refer to the initial interpreter
state created in ``Py_InitializeCore()``.

A new constraint is also added to the embedding API: attempting to delete
the main interpreter while subinterpreters still exist will now be a fatal
error.

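Why the head of the interpreter list is unsuitable as a stable reference can be seen in a simplified model. In this hypothetical sketch, ``sketch_interp`` is a toy record (not ``PyInterpreterState``), new interpreters are pushed onto the front of a list as the head does, and the main interpreter is simply recorded once when the first interpreter is added; this is one possible way to back ``PyInterpreterState_Main()``, not the reference implementation::

    #include <stddef.h>

    /* Hypothetical, heavily simplified interpreter record. */
    typedef struct sketch_interp {
        int id;
        struct sketch_interp *next;     /* next older interpreter */
    } sketch_interp;

    static sketch_interp *sketch_head = NULL;        /* newest */
    static sketch_interp *sketch_main_interp = NULL; /* oldest */

    /* The head changes on every addition; main is recorded only once. */
    static void
    sketch_add(sketch_interp *interp)
    {
        interp->next = sketch_head;
        sketch_head = interp;
        if (sketch_main_interp == NULL) {
            sketch_main_interp = interp;
        }
    }

    /* Config source for a new subinterpreter: the current thread's
     * interpreter if there is one, otherwise the main interpreter. */
    static sketch_interp *
    sketch_config_source(sketch_interp *current)
    {
        return current != NULL ? current : sketch_main_interp;
    }

    /* The proposed constraint: main may only be deleted once it is the
     * only interpreter left. */
    static int
    sketch_can_delete(const sketch_interp *interp)
    {
        if (interp != sketch_main_interp) {
            return 1;
        }
        return sketch_head == interp && interp->next == NULL;
    }
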
Stable ABI
----------

Most of the APIs proposed in this PEP are excluded from the stable ABI, as
embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.

The only newly exposed API that will be part of the stable ABI is the
``Py_IsCoreInitialized()`` query.


Build time configuration
------------------------

This PEP makes no changes to the handling of build time configuration
settings, and thus has no effect on the contents of ``sys.implementation``
or the result of ``sysconfig.get_config_vars()``.


Backwards Compatibility
-----------------------

Backwards compatibility will be preserved primarily by ensuring that
``Py_ReadMainInterpreterConfig()`` interrogates all the previously defined
configuration settings stored in global variables and environment variables,
and that ``Py_InitializeMainInterpreter()`` writes affected settings back to
the relevant locations.

One acknowledged incompatibility is that some environment variables which
are currently read lazily may instead be read once during interpreter
initialization. As the PEP matures, these will be discussed in more detail
on a case by case basis. The environment variables which are currently
known to be looked up dynamically are:

* ``PYTHONCASEOK``: writing to ``os.environ['PYTHONCASEOK']`` will no longer
  dynamically alter the interpreter's handling of filename case differences
  on import (TBC)
* ``PYTHONINSPECT``: ``os.environ['PYTHONINSPECT']`` will still be checked
  after execution of the ``__main__`` module terminates

The ``Py_Initialize()`` style of initialization will continue to be
supported. It will use (at least some elements of) the new API
internally, but will continue to exhibit the same behaviour as it
does today, ensuring that ``sys.argv`` is not populated until a subsequent
``PySys_SetArgv`` call. All APIs that currently support being called
prior to ``Py_Initialize()`` will continue to do so, and will also support
being called prior to ``Py_InitializeCore()``.

To minimise unnecessary code churn, and to ensure the backwards compatibility
is well tested, the main CPython executable may continue to use some elements
of the old style initialization API (very much TBC).


A System Python Executable
==========================

When executing system utilities with administrative access to a system, many
of the default behaviours of CPython are undesirable, as they may allow
untrusted code to execute with elevated privileges. The most problematic
aspects are that user site directories are enabled, that environment
variables are trusted, and that the directory containing the executed file
is placed at the beginning of the import path.

Issue 16499 [6]_ added a ``-I`` option to change the behaviour of
the normal CPython executable, but this is a hard-to-discover solution (and
adds yet another option to an already complex CLI). This PEP proposes to
instead add a separate ``system-python`` executable.

Currently, providing a separate executable with different default behaviour
would be prohibitively hard to maintain. One of the goals of this PEP is to
make it possible to replace much of the hard-to-maintain bootstrapping code
with more normal CPython code, as well as making it easier for a separate
application to make use of key components of ``Py_Main``. Including this
change in the PEP is designed to help avoid acceptance of a design that
sounds good in theory but proves to be problematic in practice.

Cleanly supporting this kind of "alternate CLI" is the main reason for the
proposed changes to better expose the core logic for deciding between the
different execution modes supported by CPython:

* script execution
* directory/zipfile execution
* command execution ("-c" switch)
* module or package execution ("-m" switch)
* execution from stdin (non-interactive)
* interactive stdin

Actually implementing this may also reveal the need for some better
argument parsing infrastructure for use during the initializing phase.


Open Questions
==============

* Error details for ``Py_ReadMainInterpreterConfig`` and
  ``Py_InitializeMainInterpreter`` (these should become clearer as the
  implementation progresses)
* Is initialisation of the ``PyMainInterpreterConfig`` struct too unwieldy to
  be maintainable? Would a Python dictionary be a better choice, despite
  being harder to work with from C code? Can we upgrade to requiring a C99
  compatible compiler?
* Would it be better to manage the flag variables in ``PyMainInterpreterConfig``
  as Python integers or as "negative means false, positive means true, zero
  means not set" so the struct can be initialized with a simple
  ``memset(&config, 0, sizeof(config))``, eliminating the need to update
  both ``PyMainInterpreterConfig`` and ``PyMainInterpreterConfig_INIT`` when
  adding new fields?
* The name of the new system Python executable is a bikeshed waiting to be
  painted. The four options considered so far are ``spython``, ``pysystem``,
  ``python-minimal`` and ``system-python``. The PEP text reflects my current
  preferred choice (``system-python``).

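The "zero means not set" encoding raised in the flag-variable question above could look roughly like the following. The ``sketch_*`` struct and helpers are hypothetical and cover only two flags; the point is that zero-filling marks every field as unset, so adding a field needs no matching ``_INIT`` macro update::

    #include <string.h>

    /* Hypothetical flags: negative means false, positive means true,
     * zero means "not set". */
    typedef struct {
        int no_site;
        int ignore_environment;
    } sketch_tristate_config;

    /* Zero-fill: every flag, including any added later, is "not set". */
    static void
    sketch_tristate_init(sketch_tristate_config *config)
    {
        memset(config, 0, sizeof(*config));
    }

    /* Collapse a tri-state flag to a boolean, falling back to
     * default_value when the flag was never set. */
    static int
    sketch_flag_value(int setting, int default_value)
    {
        if (setting > 0) {
            return 1;
        }
        if (setting < 0) {
            return 0;
        }
        return default_value;
    }

Here ``sizeof(*config)`` is correct because ``config`` is a pointer; when zero-filling a struct variable directly, the size argument is ``sizeof(config)``.
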
Implementation
==============

A reference implementation for an earlier design was developed as a feature
branch in my BitBucket sandbox [2]_.

There is not yet a reference implementation for the design currently
described in the PEP.


The Status Quo
==============

The current mechanisms for configuring the interpreter have accumulated in
a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.

(Note: some of the info below could probably be cleaned up and added to the
C API documentation for at least 3.3; it's all CPython specific, so it
doesn't belong in the language reference.)


Ignoring Environment Variables
------------------------------

The ``-E`` command line option allows all environment variables to be
ignored when initializing the Python interpreter. An embedding application
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
calling ``Py_Initialize()``.

In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
flag, and always produces ``NULL`` if it is set.

<TBD: I believe PYTHONCASEOK is checked regardless of this setting >

<TBD: Does -E also ignore Windows registry keys? >

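The ``Py_GETENV`` behaviour described above is easy to mirror in a standalone sketch. ``sketch_ignore_environment`` is a hypothetical stand-in for ``Py_IgnoreEnvironmentFlag``, and ``sketch_env_flag`` shows one common way a boolean setting might be derived from a variable; treating any non-empty value as "on" is an assumption of this sketch::

    #include <stdlib.h>

    /* Hypothetical stand-in for Py_IgnoreEnvironmentFlag (-E). */
    static int sketch_ignore_environment = 0;

    /* Same shape as the Py_GETENV behaviour described above: the process
     * environment is never consulted while the flag is set. */
    static const char *
    sketch_getenv(const char *name)
    {
        return sketch_ignore_environment ? NULL : getenv(name);
    }

    /* Boolean setting from an environment variable: a non-empty value
     * turns it on, otherwise the current value is kept (an assumption). */
    static int
    sketch_env_flag(const char *name, int current)
    {
        const char *value = sketch_getenv(name);
        if (value != NULL && value[0] != '\0') {
            return 1;
        }
        return current;
    }
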
Randomised Hashing
------------------

The randomised hashing is controlled via the ``-R`` command line option (in
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
variable.

In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific hash seed (e.g. for repeatability of testing, or
to share hash values between processes).

However, embedding applications must use the ``Py_HashRandomizationFlag``
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
rather than in ``Py_Initialize()``).

The new configuration API should make it straightforward for an
embedding application to reuse the ``PYTHONHASHSEED`` processing with
a text based configuration setting provided by other means (e.g. a
config file or separate environment variable).

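A sketch of the kind of reusable seed parsing described above, taking its text from any source. It assumes the Python 3.3 rules as commonly documented (unset, empty or ``"random"`` selects a random seed; otherwise a decimal integer from 0 to 4294967295, with 0 disabling randomisation); ``sketch_hash_config`` and ``sketch_parse_hash_seed`` are hypothetical names, not CPython functions::

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical result of parsing a PYTHONHASHSEED-style setting. */
    typedef struct {
        int use_random;       /* 1: choose a random seed at startup */
        unsigned long seed;   /* used when use_random == 0;
                                 0 disables hash randomisation */
    } sketch_hash_config;

    /* Returns 0 on success, -1 if text is not a valid setting. */
    static int
    sketch_parse_hash_seed(const char *text, sketch_hash_config *out)
    {
        char *end;
        unsigned long value;

        if (text == NULL || text[0] == '\0' || strcmp(text, "random") == 0) {
            out->use_random = 1;
            out->seed = 0;
            return 0;
        }
        /* strtoul accepts leading spaces and a sign: reject those here */
        if (text[0] < '0' || text[0] > '9') {
            return -1;
        }
        errno = 0;
        value = strtoul(text, &end, 10);
        if (errno != 0 || *end != '\0' || value > 4294967295UL) {
            return -1;
        }
        out->use_random = 0;
        out->seed = value;
        return 0;
    }
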
Locating Python and the standard library
----------------------------------------

The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code [3]_, [4]_. Even that
description is incomplete, as it was not updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made
prior to calling ``Py_Initialize()``) and environment variables:

* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``PYTHONHOME``

The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or,
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
file.

The build time settings for ``PREFIX`` and ``EXEC_PREFIX`` are also relevant,
as are some registry settings on Windows. The hardcoded fallbacks are
based on the layout of the CPython source tree and build output when
working in a source checkout.


Configuring ``sys.path``
------------------------

An embedding application may call ``Py_SetPath()`` prior to
``Py_Initialize()`` to completely override the calculation of
``sys.path``. It is not straightforward to only allow *some* of the
calculations, as modifying ``sys.path`` after initialization is
already complete means those modifications will not be in effect
when standard library modules are imported during the startup sequence.

If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()``
(implicit in ``Py_Initialize()``), then it builds on the location data
calculations above to calculate suitable path entries, along with
the ``PYTHONPATH`` environment variable.

<TBD: On Windows, there's also a bunch of stuff to do with the registry>

The ``site`` module, which is implicitly imported at startup (unless
disabled via the ``-S`` option), adds additional paths to this initial
set of paths, as described in its documentation [5]_.

The ``-s`` command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.

The following commands can be used to check the default path configurations
for a given Python executable on a given system:

* standard configuration:
  ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
* user site directory disabled:
  ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
* all site path modifications disabled:
  ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``

(Note: you can see similar information using ``-m site`` instead of ``-c``,
but this is slightly misleading as it calls ``os.path.abspath`` on all of the
path entries, making relative path entries look absolute. Using the ``site``
module also causes problems in the last case, as on Python versions prior to
3.3, explicitly importing site will carry out the path modifications ``-S``
avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails.)

The calculation of ``sys.path[0]`` is comparatively straightforward:

* For an ordinary script (Python source or compiled bytecode),
  ``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory),
  ``sys.path[0]`` will be that path.
* For an interactive session, running from stdin or when using the ``-c`` or
  ``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
  system interprets as allowing imports from the current directory.

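The three ``sys.path[0]`` cases above can be sketched as a single function. This is a hypothetical, purely lexical model: it handles ``/`` separators only and performs no symlink resolution or conversion to an absolute path (either of which the real implementation may do), and ``sketch_run_kind`` is an invented classification of how the main module was specified::

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical classification of how the main module was specified. */
    typedef enum {
        SKETCH_RUN_SCRIPT,      /* ordinary script (source or bytecode) */
        SKETCH_RUN_PATH_ENTRY,  /* valid sys.path entry (zipfile, dir) */
        SKETCH_RUN_OTHER        /* interactive, stdin, -c or -m */
    } sketch_run_kind;

    /* Writes the sys.path[0] value into out. Returns 0, or -1 if out is
     * too small. Lexical only: '/' separators, no symlinks, no abspath. */
    static int
    sketch_path0(sketch_run_kind kind, const char *target,
                 char *out, size_t out_size)
    {
        size_t len = 0;

        if (kind == SKETCH_RUN_SCRIPT) {
            const char *slash = strrchr(target, '/');
            if (slash == target) {
                len = 1;                         /* script lives in "/" */
            } else if (slash != NULL) {
                len = (size_t)(slash - target);  /* strip the file name */
            }
            /* no slash: leave "" (current directory in this model) */
        } else if (kind == SKETCH_RUN_PATH_ENTRY) {
            len = strlen(target);                /* the entry itself */
        }

        if (len + 1 > out_size) {
            return -1;
        }
        if (len > 0) {
            memcpy(out, target, len);
        }
        out[len] = '\0';
        return 0;
    }
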
Configuring ``sys.argv``
------------------------

Unlike most other settings discussed in this PEP, ``sys.argv`` is not
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
explicit call to ``PySys_SetArgv()``.

CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
arguments passed after the script name or the argument to the ``-c`` or
``-m`` options.

The calculation of ``sys.argv[0]`` is a little more complicated:

* For an ordinary script (source or bytecode), it will be the script name
* For a ``sys.path`` entry (typically a zipfile or directory) it will
  initially be the zipfile or directory name, but will later be changed by
  the ``runpy`` module to the full path to the imported ``__main__`` module.
* For a module specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed module.
* For a package specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed ``__main__`` submodule of the package.
* For a command executed with ``-c``, it will be the string ``"-c"``
* For explicitly requested input from stdin, it will be the string ``"-"``
* Otherwise, it will be the empty string

Embedding applications must call ``PySys_SetArgv`` themselves. The CPython
logic for doing so is part of ``Py_Main()`` and is not exposed separately.
However, the ``runpy`` module does provide roughly equivalent logic in
``runpy.run_module`` and ``runpy.run_path``.

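The *initial* ``sys.argv[0]`` values listed above (before ``runpy`` replaces some of them) amount to a simple lookup. ``sketch_argv_mode`` and ``sketch_initial_argv0`` are hypothetical names used only for this illustration::

    #include <stddef.h>

    /* Hypothetical classification of the execution mode. */
    typedef enum {
        SKETCH_ARGV_SCRIPT,      /* ordinary script */
        SKETCH_ARGV_PATH_ENTRY,  /* zipfile or directory */
        SKETCH_ARGV_MODULE,      /* -m module or package */
        SKETCH_ARGV_COMMAND,     /* -c command */
        SKETCH_ARGV_STDIN,       /* explicitly requested stdin */
        SKETCH_ARGV_OTHER
    } sketch_argv_mode;

    /* Initial sys.argv[0]; runpy later rewrites the path entry and -m
     * cases to the full path of the code actually executed. */
    static const char *
    sketch_initial_argv0(sketch_argv_mode mode, const char *target)
    {
        switch (mode) {
        case SKETCH_ARGV_SCRIPT:
        case SKETCH_ARGV_PATH_ENTRY:
            return target;
        case SKETCH_ARGV_MODULE:
            return "-m";
        case SKETCH_ARGV_COMMAND:
            return "-c";
        case SKETCH_ARGV_STDIN:
            return "-";
        case SKETCH_ARGV_OTHER:
        default:
            return "";
        }
    }
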
Other configuration settings
----------------------------

TBD: Cover the initialization of the following in more detail:

* Completely disabling the import system
* The initial warning system state:

  * ``sys.warnoptions``
  * (-W option, PYTHONWARNINGS)

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``
  * (-X option)

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding and buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``
  * (-u option, PYTHONIOENCODING, PYTHONUNBUFFERED)

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``
  * (-B option, PYTHONDONTWRITEBYTECODE)

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after ``__main__`` terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (``__debug__`` status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed

Much of the configuration of CPython is currently handled through C level
global variables::

    Py_BytesWarningFlag (-b)
    Py_DebugFlag (-d option)
    Py_InspectFlag (-i option, PYTHONINSPECT)
    Py_InteractiveFlag (property of stdin, cannot be overridden)
    Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
    Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
    Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
    Py_NoSiteFlag (-S option)
    Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFERED)
    Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.

Some configuration can only be provided as OS level environment variables::

    PYTHONSTARTUP
    PYTHONCASEOK
    PYTHONIOENCODING

The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.

TBD: Document how the "-x" option is handled (skips processing of the
first line in the main script)

See also the detailed notes on the sequence of operations at [1]_.


References
==========

.. [1] CPython interpreter initialization notes
   (http://wiki.python.org/moin/CPythonInterpreterInitialization)

.. [2] BitBucket Sandbox
   (https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits)

.. [3] \*nix getpath implementation
   (http://hg.python.org/cpython/file/default/Modules/getpath.c)

.. [4] Windows getpath implementation
   (http://hg.python.org/cpython/file/default/PC/getpathp.c)

.. [5] Site module documentation
   (http://docs.python.org/3/library/site.html)

.. [6] Proposed CLI option for isolated mode
   (http://bugs.python.org/issue16499)

.. [7] Adding to sys.path on the command line
   (http://mail.python.org/pipermail/python-ideas/2010-October/008299.html)
   (http://mail.python.org/pipermail/python-ideas/2012-September/016128.html)

.. [8] Control sys.path[0] initialisation
   (http://bugs.python.org/issue13475)

.. [9] Enabling code coverage in subprocesses when testing
   (http://bugs.python.org/issue14803)

.. [10] Problems with PYTHONIOENCODING in Blender
   (http://bugs.python.org/issue16129)


Copyright
=========

This document has been placed in the public domain.