PEP: 432
Title: Simplifying the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2012
Python-Version: 3.4
Post-History: 28-Dec-2012


Abstract
========

This PEP proposes a mechanism for simplifying the startup sequence for
CPython, making it easier to modify the initialisation behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate
resolution for most of these should become clearer as the reference
implementation is developed.


Proposal Summary
================

This PEP proposes that CPython move to an explicit 2-phase initialisation
process, where a preliminary interpreter is put in place with limited OS
interaction capabilities early in the startup sequence. This essential core
remains in place while all of the configuration settings are determined,
until a final configuration call takes those settings and finishes
bootstrapping the interpreter immediately before executing the main module.

As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP also proposes the creation and
distribution of a separate system Python (``spython``) executable which, by
default, ignores user site directories and environment variables, and does
not implicitly set ``sys.path[0]`` based on the current directory or the
script being executed.

To keep the implementation complexity under control, this PEP does *not*
propose wholesale changes to the way the interpreter state is accessed at
runtime, nor does it propose changes to the way subinterpreters are
created after the main interpreter has already been initialised. Changing
the order in which the existing initialisation steps occur to make the
startup sequence easier to maintain is already a substantial change, and
attempting to make those other changes at the same time will make the
change significantly more invasive and much harder to review. However, such
proposals may be suitable topics for follow-on PEPs or patches - one key
benefit of this PEP is decreasing the coupling between the internal storage
model and the configuration interface.


Background
==========

Over time, CPython's initialisation sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 as
well as bootstrapping a pure Python implementation of the import system).

Much of this complexity is accessible only through the ``Py_Main`` and
``Py_Initialize`` APIs, offering embedding applications little opportunity
for customisation. This creeping complexity also makes life difficult for
maintainers, as much of the configuration needs to take place prior to the
``Py_Initialize`` call, meaning much of the Python C API cannot be used
safely.

A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path`` initialisation
(easily adding additional directories on the command line in a cross-platform
fashion, as well as controlling the configuration of ``sys.path[0]``), easier
configuration of utilities like coverage tracing when launching Python
subprocesses, and easier control of the encoding used for the standard IO
streams when embedding CPython in a larger application.

Rather than attempting to bolt such behaviour onto an already complicated
system, this PEP proposes to instead simplify the status quo *first*, with
the aim of making these further feature requests easier to implement.


Key Concerns
============

There are a couple of key concerns that any change to the startup sequence
needs to take into account.


Maintainability
---------------

The current CPython startup sequence is difficult to understand, and even
more difficult to modify. It is not clear what state the interpreter is in
while much of the initialisation code executes, leading to behaviour such
as lists, dictionaries and Unicode values being created prior to the call
to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_].

By moving to a 2-phase startup sequence, developers should only need to
understand which features are not available in the core bootstrapping state,
as the vast majority of the configuration process will now take place in
that state.

By basing the new design on a combination of C structures and Python
dictionaries, it should also be easier to modify the system in the
future to add new configuration options.


Performance
-----------

CPython is used heavily to run short scripts where the runtime is dominated
by the interpreter initialisation time. Any changes to the startup sequence
should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time is
dominated by IO operations. However, to monitor the impact of any changes,
a simple benchmark can be used to check how long it takes to start and then
tear down the interpreter::

    python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"

Current numbers on my system for 2.7, 3.2 and 3.3 (using the 3.3
subprocess and timeit modules to execute the check, all with non-debug
builds)::

    # Python 2.7
    $ py33/python -m timeit -s "from subprocess import call" "call(['py27/python', '-c', 'pass'])"
    100 loops, best of 3: 17.8 msec per loop
    # Python 3.2
    $ py33/python -m timeit -s "from subprocess import call" "call(['py32/python', '-c', 'pass'])"
    10 loops, best of 3: 39 msec per loop
    # Python 3.3
    $ py33/python -m timeit -s "from subprocess import call" "call(['py33/python', '-c', 'pass'])"
    10 loops, best of 3: 25.3 msec per loop

Improvements in the import system and the Unicode support already resulted
in a more than 30% improvement in startup time in Python 3.3 relative to
3.2. Python 3.3 is still slightly slower to start than Python 2.7 due to the
additional infrastructure that needs to be put in place to support the Unicode
based text model.

This PEP is not expected to have any significant effect on the startup time,
as it is aimed primarily at *reordering* the existing initialisation
sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the
initialisation sequence may pose a performance problem, then a more
sophisticated microbenchmark will be developed to assist in investigation.


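
The same start-and-tear-down check can also be scripted directly. The
following is only a minimal sketch, not part of the proposal: the helper name
``startup_time`` is made up for illustration, and it times whichever
interpreter is running the script rather than a specific build::

    import subprocess
    import sys
    import time


    def startup_time(executable=sys.executable, repeats=5):
        """Return the best wall-clock time (in seconds) to run ``-c pass``."""
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # Start the interpreter, run an empty program, tear it down again
            subprocess.check_call([executable, "-c", "pass"])
            best = min(best, time.perf_counter() - start)
        return best


    if __name__ == "__main__":
        print("%.1f msec per startup" % (startup_time() * 1000))

Taking the best of several runs mirrors what ``timeit`` reports, and passing a
different ``executable`` allows two builds to be compared on the same machine.
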
Required Configuration Settings
===============================

A comprehensive configuration scheme requires that an embedding application
be able to control the following aspects of the final interpreter state:

* Whether or not to use randomised hashes (and if used, potentially specify
  a specific random seed)
* The "Where is Python located?" elements in the ``sys`` module:

  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``

* The path searched for imports from the filesystem (and other path hooks):

  * ``sys.path``

* The command line arguments seen by the interpreter:

  * ``sys.argv``

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding (if any) and the buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``

* The initial warning system state:

  * ``sys.warnoptions``

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after __main__ terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning``
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed
* What code (if any) should be executed as ``__main__``:

  * Nothing (just create an empty module)
  * A filesystem path referring to a Python script (source or bytecode)
  * A filesystem path referring to a valid ``sys.path`` entry (typically
    a directory or zipfile)
  * A given string (equivalent to the "-c" option)
  * A module or package (equivalent to the "-m" option)
  * Standard input as a script (i.e. a non-interactive stream)
  * Standard input as an interactive interpreter session

<TBD: Did I miss anything?>

Note that this just covers settings that are currently configurable in some
manner when using the main CPython executable. While this PEP aims to make
adding additional configuration settings easier in the future, it
deliberately avoids any new settings of its own.


The Status Quo
==============

The current mechanisms for configuring the interpreter have accumulated in
a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.

(Note: some of the info below could probably be cleaned up and added to the
C API documentation - it's all CPython specific, so it doesn't belong in
the language reference)


Ignoring Environment Variables
------------------------------

The ``-E`` command line option allows all environment variables to be
ignored when initialising the Python interpreter. An embedding application
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
calling ``Py_Initialize()``.

In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
flag, and always produces ``NULL`` if it is set.

<TBD: Does -E also ignore Windows registry keys? >


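
The effect of ``-E`` can be observed from Python by launching a child
interpreter with a ``PYTHON*`` variable set. This is a small sketch only; the
helper name ``dont_write_bytecode`` is illustrative, and
``PYTHONDONTWRITEBYTECODE`` was chosen simply because its effect is visible
in ``sys``::

    import os
    import subprocess
    import sys


    def dont_write_bytecode(*options):
        """Report sys.dont_write_bytecode as seen by a child interpreter."""
        # Force the environment variable on for the child process only
        env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
        cmd = [sys.executable, *options, "-c",
               "import sys; print(sys.dont_write_bytecode)"]
        return subprocess.check_output(cmd, env=env, text=True).strip()


    print(dont_write_bytecode())      # environment variable is honoured
    print(dont_write_bytecode("-E"))  # environment variable is ignored
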
Randomised Hashing
------------------

The randomised hashing is controlled via the ``-R`` command line option (in
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
variable.

In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific seed value (e.g. for repeatability of testing, or
to share hash values between processes).

However, embedding applications must use the ``Py_HashRandomizationFlag``
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
rather than in ``Py_Initialize()``).

The new configuration API should make it straightforward for an
embedding application to reuse the ``PYTHONHASHSEED`` processing with
a text based configuration setting provided by other means.


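
As a quick illustration of why a fixed seed is useful for repeatability, the
sketch below starts child interpreters with ``PYTHONHASHSEED`` set and compares
the resulting string hashes. The helper name ``child_hash`` and the string
being hashed are arbitrary choices made for this example::

    import os
    import subprocess
    import sys


    def child_hash(seed):
        """Return hash('startup') as computed by a child with the given seed."""
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        cmd = [sys.executable, "-c", "print(hash('startup'))"]
        return int(subprocess.check_output(cmd, env=env, text=True))


    # The same seed gives the same hash in separate processes
    print(child_hash(0) == child_hash(0))
    print(child_hash(42) == child_hash(42))
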
Locating Python and the standard library
----------------------------------------

The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code [3_,4_]. Even that
description is incomplete, as it was not updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made
prior to calling ``Py_Initialize()``) and environment variables:

* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``PYTHONHOME``

The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or,
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
file.

The build time settings for PREFIX and EXEC_PREFIX are also relevant,
as are some registry settings on Windows. The hardcoded fallbacks are
based on the layout of the CPython source tree and build output when
working in a source checkout.


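
The outcome of these calculations is visible in the ``sys`` module. The
following sketch (with illustrative helper names) collects the location
settings for the running interpreter, and uses the comparison of
``sys.prefix`` with ``sys.base_prefix`` to report whether a virtual
environment is active::

    import sys

    LOCATION_NAMES = ("executable", "prefix", "exec_prefix",
                      "base_prefix", "base_exec_prefix")


    def location_settings():
        """Return the calculated "Where is Python located?" values."""
        return {name: getattr(sys, name) for name in LOCATION_NAMES}


    def in_virtual_environment():
        """True if the prefix was redirected away from the base installation."""
        return sys.prefix != sys.base_prefix


    for name, value in location_settings().items():
        print("sys.%s = %r" % (name, value))
    print("virtual environment:", in_virtual_environment())
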
Configuring ``sys.path``
------------------------

An embedding application may call ``Py_SetPath()`` prior to
``Py_Initialize()`` to completely override the calculation of
``sys.path``. It is not straightforward to only allow *some* of the
calculations, as modifying ``sys.path`` after initialisation is
already complete means those modifications will not be in effect
when standard library modules are imported during the startup sequence.

If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()``
(implicit in ``Py_Initialize()``), then it builds on the location data
calculations above to calculate suitable path entries, along with
the ``PYTHONPATH`` environment variable.

<TBD: On Windows, there's also a bunch of stuff to do with the registry>

The ``site`` module, which is implicitly imported at startup (unless
disabled via the ``-S`` option) adds additional paths to this initial
set of paths, as described in its documentation [5_].

The ``-s`` command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.

The following commands can be used to check the default path configurations
for a given Python executable on a given system:

* ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
  - standard configuration
* ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
  - user site directory disabled
* ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``
  - all site path modifications disabled

(Note: you can see similar information using ``-m site`` instead of ``-c``,
but this is slightly misleading as it calls ``os.path.abspath`` on all of the
path entries (making relative path entries look absolute), and also causes
problems in the last case, as on Python versions prior to 3.3, explicitly
importing site will carry out the path modifications ``-S`` avoids, while on
3.3+ combining ``-m site`` with ``-S`` currently fails)

The calculation of ``sys.path[0]`` is comparatively straightforward:

* For an ordinary script (Python source or compiled bytecode),
  ``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory),
  ``sys.path[0]`` will be that path.
* For an interactive session, running from stdin or when using the ``-c`` or
  ``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
  system interprets as allowing imports from the current directory.


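
Two of the ``sys.path[0]`` cases can be checked with child interpreters, as in
the sketch below. The helper names are illustrative, and the
``PYTHONSAFEPATH`` handling is an assumption about the environment the sketch
runs in: that variable only exists on much later releases (3.11+), where it
suppresses the implicit entry altogether::

    import os
    import subprocess
    import sys
    import tempfile

    SHOW_PATH0 = "import sys; print(sys.path[0])"


    def _run(*args):
        """Run a child interpreter and return the first line it prints."""
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # would suppress sys.path[0] on 3.11+
        out = subprocess.check_output([sys.executable, *args], env=env,
                                      text=True)
        return out.splitlines()[0]


    def path0_for_command():
        """sys.path[0] when the code is supplied via the -c switch."""
        return _run("-c", SHOW_PATH0)


    def script_path0_is_script_dir():
        """True if sys.path[0] is the directory containing the script."""
        with tempfile.TemporaryDirectory() as tmp:
            script = os.path.join(tmp, "show_path0.py")
            with open(script, "w") as f:
                f.write(SHOW_PATH0 + "\n")
            return os.path.samefile(_run(script), tmp)


    print(repr(path0_for_command()))
    print(script_path0_is_script_dir())
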
Configuring ``sys.argv``
------------------------

Unlike most other settings discussed in this PEP, ``sys.argv`` is not
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
explicit call to ``PySys_SetArgv()``.

CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
arguments passed after the script name or the argument to the ``-c`` or
``-m`` options.

The calculation of ``sys.argv[0]`` is a little more complicated:

* For an ordinary script (source or bytecode), it will be the script name
* For a ``sys.path`` entry (typically a zipfile or directory) it will
  initially be the zipfile or directory name, but will later be changed by
  the ``runpy`` module to the full path to the imported ``__main__`` module.
* For a module specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed module.
* For a package specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed ``__main__`` submodule of the package.
* For a command executed with ``-c``, it will be the string ``"-c"``
* For explicitly requested input from stdin, it will be the string ``"-"``
* Otherwise, it will be the empty string

Embedding applications must call ``PySys_SetArgv()`` themselves. The CPython
logic for doing so is part of ``Py_Main()`` and is not exposed separately.
However, the ``runpy`` module does provide roughly equivalent logic in
``runpy.run_module`` and ``runpy.run_path``.



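
The ``-c`` and stdin cases are easy to confirm with a child interpreter. This
is a minimal sketch with illustrative helper names; the ``-m`` and
``sys.path`` entry cases are omitted because ``runpy`` rewrites
``sys.argv[0]`` before user code can observe the initial value::

    import ast
    import subprocess
    import sys

    SHOW_ARGV = "import sys; print(sys.argv)"


    def argv_for_command(*args):
        """sys.argv as seen by code passed via the -c switch."""
        cmd = [sys.executable, "-c", SHOW_ARGV, *args]
        return ast.literal_eval(subprocess.check_output(cmd, text=True).strip())


    def argv_for_stdin(*args):
        """sys.argv as seen by a script explicitly read from stdin."""
        cmd = [sys.executable, "-", *args]
        out = subprocess.check_output(cmd, input=SHOW_ARGV, text=True)
        return ast.literal_eval(out.strip())


    print(argv_for_command("a", "b"))
    print(argv_for_stdin("a"))
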
Other configuration settings
----------------------------

TBD: Cover the initialisation of the following in more detail:

* The initial warning system state:

  * ``sys.warnoptions``
  * (-W option, PYTHONWARNINGS)

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``
  * (-X option)

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding and buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``
  * (-u option, PYTHONIOENCODING, PYTHONUNBUFFERED)

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``
  * (-B option, PYTHONDONTWRITEBYTECODE)

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after __main__ terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning`` (This may be obsolete in Py3k...)
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed

Much of the configuration of CPython is currently handled through C level
global variables::

    Py_BytesWarningFlag
    Py_DebugFlag (-d option)
    Py_InspectFlag (-i option, PYTHONINSPECT)
    Py_InteractiveFlag
    Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
    Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
    Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
    Py_NoSiteFlag (-S option)
    Py_UnbufferedStdioFlag
    Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.

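
From Python code, the result of that conversion shows up in ``sys.flags``.
The sketch below starts a child interpreter with a few of the options listed
above and reads the matching flags back. The helper name ``child_flags`` and
the choice of flags are illustrative only, and ``-E`` is always passed so that
``PYTHON*`` variables in the calling environment do not affect the result::

    import subprocess
    import sys

    FLAG_NAMES = ("dont_write_bytecode", "no_user_site", "no_site", "optimize")


    def child_flags(*options):
        """Return selected sys.flags values from a child interpreter."""
        code = ("import sys; print(*(getattr(sys.flags, n) for n in %r))"
                % (FLAG_NAMES,))
        cmd = [sys.executable, "-E", *options, "-c", code]
        out = subprocess.check_output(cmd, text=True)
        return dict(zip(FLAG_NAMES, map(int, out.split())))


    print(child_flags("-B", "-s", "-S", "-O"))
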
Some configuration can only be provided as OS level environment variables::

    PYTHONSTARTUP
    PYTHONCASEOK
    PYTHONIOENCODING

The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.

TBD: Document how the "-x" option is handled (skips processing of the
first comment line in the main script)

Also see detailed sequence of operations notes at [1_].


Proposal
========

(Note: details here are still very much in flux, but preliminary feedback
is appreciated anyway)

The main theme of this proposal is to create the interpreter state for
the main interpreter *much* earlier in the startup process. This will allow
most of the CPython API to be used during the remainder of the initialisation
process, potentially simplifying a number of operations that currently need
to rely on basic C functionality rather than being able to use the richer
data structures provided by the CPython C API.


Core Interpreter Initialisation
-------------------------------

The only configuration that currently absolutely needs to be in place
before even the interpreter core can be initialised is a flag indicating
whether or not to use a specific seed value for the randomised hashes, and
if so, the specific value for the seed (a seed value of zero disables
randomised hashing).

The proposed API for this step in the startup sequence is::

    void Py_BeginInitialization(Py_CoreConfig *config);

Like ``Py_Initialize``, this part of the new API treats initialisation
failures as fatal errors. While that's still not particularly embedding
friendly, the operations in this step *really* shouldn't be failing, and
changing them to return error codes instead of aborting would be an even
larger task than the one already being proposed.

The new ``Py_CoreConfig`` struct holds the settings required for preliminary
configuration::

    typedef struct {
        int use_hash_seed;
        unsigned long hash_seed;
    } Py_CoreConfig;

To disable hash randomisation, set "use_hash_seed" and pass a hash seed of
zero. (This is the same approach already used when interpreting the
``PYTHONHASHSEED`` environment variable)

The core configuration settings pointer may be NULL, in which case the
default behaviour of randomised hashes with a random seed will be used.

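
To make the mapping from a text setting to these two fields concrete, here is
a hedged Python model of it. ``CoreConfig`` and ``core_config_from_text`` are
hypothetical names that merely mirror the proposed struct; the accepted values
(``random``, or a decimal integer up to 4294967295) follow the existing
``PYTHONHASHSEED`` rules::

    from typing import NamedTuple

    MAX_HASH_SEED = 4294967295  # largest value PYTHONHASHSEED accepts


    class CoreConfig(NamedTuple):
        """Hypothetical Python mirror of the proposed Py_CoreConfig struct."""
        use_hash_seed: int
        hash_seed: int


    def core_config_from_text(text):
        """Translate a PYTHONHASHSEED-style string into a CoreConfig."""
        if not text or text == "random":
            # Unset, empty or "random": randomised hashing, random seed
            return CoreConfig(use_hash_seed=0, hash_seed=0)
        if not (text.isascii() and text.isdigit()):
            raise ValueError("seed must be 'random' or a decimal integer")
        seed = int(text)
        if seed > MAX_HASH_SEED:
            raise ValueError("seed must be in range [0; %d]" % MAX_HASH_SEED)
        # A seed of zero disables randomised hashing
        return CoreConfig(use_hash_seed=1, hash_seed=seed)


    print(core_config_from_text("random"))
    print(core_config_from_text("0"))

An embedding application could feed such a function from its own
configuration file rather than from the process environment.
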
The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should go in the config dict passed
to ``Py_EndInitialization()`` rather than in the core configuration.

A new query API will allow code to determine if the interpreter is in the
bootstrapping state between the creation of the interpreter state and the
completion of the bulk of the initialisation process::

    int Py_IsInitializing();

Attempting to call ``Py_BeginInitialization()`` again when
``Py_IsInitializing()`` or ``Py_IsInitialized()`` is true is a fatal error.

While in the initialising state, the interpreter should be fully functional
except that:

* compilation is not allowed (as the parser and compiler are not yet
  configured properly)
* The following attributes in the ``sys`` module are all either missing or
  ``None``:

  * ``sys.path``
  * ``sys.argv``
  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``
  * ``sys.warnoptions``
  * ``sys.flags``
  * ``sys.dont_write_bytecode``
  * ``sys.stdin``
  * ``sys.stdout``

* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
* only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
  mode
* The ``warnings`` module is not yet initialised
* The ``__main__`` module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python
datatypes, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).

In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored on the interpreter
object rather than in C process globals.

Any call to ``Py_BeginInitialization()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling
``Py_EndInitialization()`` in between (e.g. if attempting to read the
configuration settings fails).


Determining the remaining configuration settings
------------------------------------------------

The next step in the initialisation sequence is to determine the full
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core API for this step is::

    int Py_ReadConfiguration(PyObject *config);

The config argument should be a pointer to a Python dictionary. For any
supported configuration setting already in the dictionary, CPython will
sanity check the supplied value, but otherwise accept it as correct.

Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will raise
an exception and report an error return rather than exhibiting fatal errors
if a problem is found with the config data.

Any supported configuration setting which is not already set will be
populated appropriately. The default configuration can be overridden
entirely by setting the value *before* calling ``Py_ReadConfiguration``. The
provided value will then also be used in calculating any settings derived
from that value.

Alternatively, settings may be overridden *after* the
``Py_ReadConfiguration`` call (this can be useful if an embedding application
wants to adjust a setting rather than replace it completely, such as removing
``sys.path[0]``).


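
The "check what is supplied, fill in what is missing" behaviour can be
modelled in a few lines of Python. This is a rough sketch under stated
assumptions, not the proposed implementation: ``read_configuration`` is a
hypothetical stand-in for the C function, only three settings are handled,
and deriving ``argv`` by dropping the first ``raw_argv`` entry is a deliberate
simplification of the real rules::

    import sys


    def read_configuration(config):
        """Populate missing settings in ``config`` in place and return it."""
        # Sanity check a supplied value, but otherwise accept it as correct
        if "raw_argv" in config:
            if not all(isinstance(arg, str) for arg in config["raw_argv"]):
                raise TypeError("raw_argv must be a list of str")
        raw_argv = config.setdefault("raw_argv", list(sys.argv))
        # Derived setting: calculated from raw_argv unless already supplied
        config.setdefault("argv", list(raw_argv[1:]))
        config.setdefault("no_site", False)
        return config


    # Overriding *before* the call: the derived value follows the override
    early = read_configuration({"raw_argv": ["python", "app.py", "-v"]})
    print(early["argv"])

    # Adjusting *after* the call: tweak a populated value rather than replace it
    late = read_configuration({"raw_argv": ["python", "app.py", "-v"]})
    late["argv"].remove("-v")
    print(late["argv"])
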
Supported configuration settings
--------------------------------

At least the following configuration settings will be supported::

    raw_argv (list of str, default = retrieved from OS APIs)

    argv (list of str, default = derived from raw_argv)
    warnoptions (list of str, default = derived from raw_argv and environment)
    xoptions (list of str, default = derived from raw_argv and environment)

    program_name (str, default = retrieved from OS APIs)
    executable (str, default = derived from program_name)
    home (str, default = complicated!)
    prefix (str, default = complicated!)
    exec_prefix (str, default = complicated!)
    base_prefix (str, default = complicated!)
    base_exec_prefix (str, default = complicated!)
    path (list of str, default = complicated!)

    io_encoding (str, default = derived from environment or OS APIs)
    fs_encoding (str, default = derived from OS APIs)

    skip_signal_handlers (boolean, default = derived from environment or False)
    ignore_environment (boolean, default = derived from environment or False)
    dont_write_bytecode (boolean, default = derived from environment or False)
    no_site (boolean, default = derived from environment or False)
    no_user_site (boolean, default = derived from environment or False)
    <TBD: at least more from sys.flags need to go here>


Completing the interpreter initialisation
-----------------------------------------

The final step in the process is to actually put the configuration settings
into effect and finish bootstrapping the interpreter up to full operation::

    int Py_EndInitialization(PyObject *config);

Like ``Py_ReadConfiguration``, this call will raise an exception and report
an error return rather than exhibiting fatal errors if a problem is found
with the config data.

All configuration settings are required - the configuration dictionary
should always be passed through ``Py_ReadConfiguration()`` to ensure it
is fully populated.

After a successful call, ``Py_IsInitializing()`` will be false, while
``Py_IsInitialized()`` will become true. The caveats described above for the
interpreter during the initialisation phase will no longer hold.


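
The resulting lifecycle can be summarised as a small state machine. The class
below is purely a hypothetical Python model of the proposed C calls, written
to make the transitions explicit; in particular, raising ``RuntimeError``
stands in for what would be a fatal error in C, and rejecting
``end_initialization()`` without a prior ``begin_initialization()`` is an
assumption of this sketch rather than something the PEP specifies::

    class StartupModel:
        """Toy model of the proposed 2-phase initialisation states."""

        def __init__(self):
            self.initializing = False   # models Py_IsInitializing()
            self.initialized = False    # models Py_IsInitialized()
            self.config = None

        def begin_initialization(self):
            if self.initializing or self.initialized:
                raise RuntimeError("already initialising or initialised")
            self.initializing = True

        def end_initialization(self, config):
            if not self.initializing:
                raise RuntimeError("begin_initialization() was not called")
            if not isinstance(config, dict):
                raise TypeError("config must be a dictionary")
            self.config = dict(config)
            self.initializing = False
            self.initialized = True

        def finalize(self):
            # Valid from either state; end_initialization() may be skipped
            self.initializing = False
            self.initialized = False


    model = StartupModel()
    model.begin_initialization()
    print(model.initializing, model.initialized)
    model.end_initialization({"no_site": True})
    print(model.initializing, model.initialized)
    model.finalize()
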
Stable ABI
----------

All of the APIs proposed in this PEP are excluded from the stable ABI, as
embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.


Backwards Compatibility
-----------------------

Backwards compatibility will be preserved primarily by ensuring that
``Py_ReadConfiguration()`` interrogates all the previously defined
configuration settings stored in global variables and environment variables,
and that ``Py_EndInitialization()`` writes affected settings back to the
relevant locations.

One acknowledged incompatibility is that some environment variables which
are currently read lazily may instead be read once during interpreter
initialisation. As the PEP matures, these will be discussed in more detail
on a case by case basis. The environment variables which are currently
known to be looked up dynamically are:

* ``PYTHONCASEOK``: writing to ``os.environ['PYTHONCASEOK']`` will no longer
  dynamically alter the interpreter's handling of filename case differences
  on import (TBC)
* ``PYTHONINSPECT``: ``os.environ['PYTHONINSPECT']`` will still be checked
  after execution of the ``__main__`` module terminates

The ``Py_Initialize()`` style of initialisation will continue to be
supported. It will use (at least some elements of) the new API
internally, but will continue to exhibit the same behaviour as it
does today, ensuring that ``sys.argv`` is not populated until a subsequent
``PySys_SetArgv`` call. All APIs that currently support being called
prior to ``Py_Initialize()`` will continue to do so, and will also support
being called prior to ``Py_BeginInitialization()``.

To minimise unnecessary code churn, and to ensure the backwards compatibility
is well tested, the main CPython executable may continue to use some elements
of the old style initialisation API. (very much TBC)


A System Python Executable
==========================

When executing system utilities with administrative access to a system, many
of the default behaviours of CPython are undesirable, as they may allow
untrusted code to execute with elevated privileges. The most problematic
aspects are the fact that user site directories are enabled,
environment variables are trusted and that the directory containing the
executed file is placed at the beginning of the import path.

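
For comparison only: CPython releases from 3.4 onwards provide an ``-I``
(isolated mode) command line switch that selects a similar set of defaults
for the regular executable. It is a separate feature and not the ``spython``
executable proposed here, but it gives a convenient way to see those defaults
in ``sys.flags``. The helper name below is illustrative::

    import subprocess
    import sys


    def isolated_flags():
        """Return (isolated, ignore_environment, no_user_site) under -I."""
        code = ("import sys; f = sys.flags; "
                "print(f.isolated, f.ignore_environment, f.no_user_site)")
        cmd = [sys.executable, "-I", "-c", code]
        out = subprocess.check_output(cmd, text=True)
        return tuple(int(value) for value in out.split())


    print(isolated_flags())
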
Currently, providing a separate executable with different default behaviour
would be prohibitively hard to maintain. One of the goals of this PEP is to
make it possible to replace much of the hard to maintain bootstrapping code
with more normal CPython code, as well as making it easier for a separate
application to make use of key components of ``Py_Main``. Including this
change in the PEP is designed to help avoid acceptance of a design that
sounds good in theory but proves to be problematic in practice.

One final aspect not addressed by the general embedding changes above is
the current inaccessibility of the core logic for deciding between the
different execution modes supported by CPython:

* script execution
* directory/zipfile execution
* command execution ("-c" switch)
* module or package execution ("-m" switch)
* execution from stdin (non-interactive)
* interactive stdin

<TBD: concrete proposal for better exposing the __main__ execution step>

Implementation
==============

None as yet. Once I have a reasonably solid plan of attack, I intend to work
on a reference implementation as a feature branch in my BitBucket sandbox [2_].


References
==========

.. [1] CPython interpreter initialization notes
   (http://wiki.python.org/moin/CPythonInterpreterInitialization)

.. [2] BitBucket Sandbox
   (https://bitbucket.org/ncoghlan/cpython_sandbox)

.. [3] \*nix getpath implementation
   (http://hg.python.org/cpython/file/default/Modules/getpath.c)

.. [4] Windows getpath implementation
   (http://hg.python.org/cpython/file/default/PC/getpathp.c)

.. [5] Site module documentation
   (http://docs.python.org/3/library/site.html)

Copyright
=========
This document has been placed in the public domain.