PEP 432: Move the description of the status quo to the end

This commit is contained in:
Nick Coghlan 2013-01-15 21:50:35 +10:00
parent 1f05458a77
commit 8d83264979
1 changed files with 245 additions and 245 deletions

View File

@ -242,250 +242,6 @@ adding additional configuration settings easier in the future, it
deliberately avoids adding any new settings of its own.
The Status Quo
==============
The current mechanisms for configuring the interpreter have accumulated in
a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.
(Note: some of the info below could probably be cleaned up and added to the
C API documentation - it's all CPython specific, so it doesn't belong in
the language reference)
Ignoring Environment Variables
------------------------------
The ``-E`` command line option allows all environment variables to be
ignored when initializing the Python interpreter. An embedding application
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
calling ``Py_Initialize()``.
In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
flag, and always produces ``NULL`` if it is set.
<TBD: I believe PYTHONCASEOK is checked regardless of this setting >
<TBD: Does -E also ignore Windows registry keys? >
Randomised Hashing
------------------
The randomised hashing is controlled via the ``-R`` command line option (in
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
variable.
In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific hash value (e.g. for repeatability of testing, or
to share hash values between processes)
However, embedding applications must use the ``Py_HashRandomizationFlag``
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
rather than in ``Py_Initialize()``).
The new configuration API should make it straightforward for an
embedding application to reuse the ``PYTHONHASHSEED`` processing with
a text based configuration setting provided by other means (e.g. a
config file or separate environment variable).
Locating Python and the standard library
----------------------------------------
The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code [3_,4_]. Even that
description is incomplete, as it failed to be updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).
These calculations are affected by the following function calls (made
prior to calling ``Py_Initialize()``) and environment variables:
* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``PYTHONHOME``
The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or,
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
file.
The build time settings for ``PREFIX`` and ``EXEC_PREFIX`` are also relevant,
as are some registry settings on Windows. The hardcoded fallbacks are
based on the layout of the CPython source tree and build output when
working in a source checkout.
Configuring ``sys.path``
------------------------
An embedding application may call ``Py_SetPath()`` prior to
``Py_Initialize()`` to completely override the calculation of
``sys.path``. It is not straightforward to only allow *some* of the
calculations, as modifying ``sys.path`` after initialization is
already complete means those modifications will not be in effect
when standard library modules are imported during the startup sequence.
If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()``
(implicit in ``Py_Initialize()``), then it builds on the location data
calculations above to calculate suitable path entries, along with
the ``PYTHONPATH`` environment variable.
<TBD: On Windows, there's also a bunch of stuff to do with the registry>
The ``site`` module, which is implicitly imported at startup (unless
disabled via the ``-S`` option) adds additional paths to this initial
set of paths, as described in its documentation [5_].
The ``-s`` command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.
The following commands can be used to check the default path configurations
for a given Python executable on a given system:
* ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
- standard configuration
* ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
- user site directory disabled
* ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``
- all site path modifications disabled
(Note: you can see similar information using ``-m site`` instead of ``-c``,
but this is slightly misleading as it calls ``os.abspath`` on all of the
path entries, making relative path entries look absolute. Using the ``site``
module also causes problems in the last case, as on Python versions prior to
3.3, explicitly importing site will carry out the path modifications ``-S``
avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails)
The calculation of ``sys.path[0]`` is comparatively straightforward:
* For an ordinary script (Python source or compiled bytecode),
``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory),
``sys.path[0]`` will be that path
* For an interactive session, running from stdin or when using the ``-c`` or
``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
system interprets as allowing imports from the current directory
Configuring ``sys.argv``
------------------------
Unlike most other settings discussed in this PEP, ``sys.argv`` is not
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
explicitly call to ``Py_SetArgv()``.
CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
arguments passed after the script name or the argument to the ``-c`` or
``-m`` options.
The calculation of ``sys.argv[0]`` is a little more complicated:
* For an ordinary script (source or bytecode), it will be the script name
* For a ``sys.path`` entry (typically a zipfile or directory) it will
initially be the zipfile or directory name, but will later be changed by
the ``runpy`` module to the full path to the imported ``__main__`` module.
* For a module specified with the ``-m`` switch, it will initially be the
string ``"-m"``, but will later be changed by the ``runpy`` module to the
full path to the executed module.
* For a package specified with the ``-m`` switch, it will initially be the
string ``"-m"``, but will later be changed by the ``runpy`` module to the
full path to the executed ``__main__`` submodule of the package.
* For a command executed with ``-c``, it will be the string ``"-c"``
* For explicitly requested input from stdin, it will be the string ``"-"``
* Otherwise, it will be the empty string
Embedding applications must call Py_SetArgv themselves. The CPython logic
for doing so is part of ``Py_Main()`` and is not exposed separately.
However, the ``runpy`` module does provide roughly equivalent logic in
``runpy.run_module`` and ``runpy.run_path``.
Other configuration settings
----------------------------
TBD: Cover the initialization of the following in more detail:
* Completely disabling the import system
* The initial warning system state:
* ``sys.warnoptions``
* (-W option, PYTHONWARNINGS)
* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):
* ``sys._xoptions``
* (-X option)
* The filesystem encoding used by:
* ``sys.getfsencoding``
* ``os.fsencode``
* ``os.fsdecode``
* The IO encoding and buffering used by:
* ``sys.stdin``
* ``sys.stdout``
* ``sys.stderr``
* (-u option, PYTHONIOENCODING, PYTHONUNBUFFEREDIO)
* Whether or not to implicitly cache bytecode files:
* ``sys.dont_write_bytecode``
* (-B option, PYTHONDONTWRITEBYTECODE)
* Whether or not to enforce correct case in filenames on case-insensitive
platforms
* ``os.environ["PYTHONCASEOK"]``
* The other settings exposed to Python code in ``sys.flags``:
* ``debug`` (Enable debugging output in the pgen parser)
* ``inspect`` (Enter interactive interpreter after __main__ terminates)
* ``interactive`` (Treat stdin as a tty)
* ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
* ``no_user_site`` (don't add the user site directory to sys.path)
* ``no_site`` (don't implicitly import site during startup)
* ``ignore_environment`` (whether environment vars are used during config)
* ``verbose`` (enable all sorts of random output)
* ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
* ``quiet`` (disable banner output even if verbose is also enabled or
stdin is a tty and the interpreter is launched in interactive mode)
* Whether or not CPython's signal handlers should be installed
Much of the configuration of CPython is currently handled through C level
global variables::
Py_BytesWarningFlag (-b)
Py_DebugFlag (-d option)
Py_InspectFlag (-i option, PYTHONINSPECT)
Py_InteractiveFlag (property of stdin, cannot be overridden)
Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
Py_NoSiteFlag (-S option)
Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFEREDIO)
Py_VerboseFlag (-v option, PYTHONVERBOSE)
For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.
Some configuration can only be provided as OS level environment variables::
PYTHONSTARTUP
PYTHONCASEOK
PYTHONIOENCODING
The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.
Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.
TBD: Document how the "-x" option is handled (skips processing of the
first comment line in the main script)
Also see detailed sequence of operations notes at [1_]
Design Details
==============
@ -623,7 +379,7 @@ configuration::
#define Py_CoreConfig_INIT {0, -1, 0, 0}
The core configuration settings pointer may be ``NULL``, in which case the
default values are ``ignore_environment = 0`` and ``use_hash_seed = -1``.
default values are ``ignore_environment = -1`` and ``use_hash_seed = -1``.
The ``Py_CoreConfig_INIT`` macro is designed to allow easy initialization
of a struct instance with sensible defaults::
@ -1189,6 +945,250 @@ is intended for CPython builtin and extension modules) and into the Tools
directory.
The Status Quo
==============
The current mechanisms for configuring the interpreter have accumulated in
a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.
(Note: some of the info below could probably be cleaned up and added to the
C API documentation for at least 3.3. - it's all CPython specific, so it
doesn't belong in the language reference)
Ignoring Environment Variables
------------------------------
The ``-E`` command line option allows all environment variables to be
ignored when initializing the Python interpreter. An embedding application
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
calling ``Py_Initialize()``.
In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
flag, and always produces ``NULL`` if it is set.
<TBD: I believe PYTHONCASEOK is checked regardless of this setting >
<TBD: Does -E also ignore Windows registry keys? >
Randomised Hashing
------------------
The randomised hashing is controlled via the ``-R`` command line option (in
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
variable.
In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific hash value (e.g. for repeatability of testing, or
to share hash values between processes)
However, embedding applications must use the ``Py_HashRandomizationFlag``
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
rather than in ``Py_Initialize()``).
The new configuration API should make it straightforward for an
embedding application to reuse the ``PYTHONHASHSEED`` processing with
a text based configuration setting provided by other means (e.g. a
config file or separate environment variable).
Locating Python and the standard library
----------------------------------------
The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code [3_,4_]. Even that
description is incomplete, as it failed to be updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).
These calculations are affected by the following function calls (made
prior to calling ``Py_Initialize()``) and environment variables:
* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``PYTHONHOME``
The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or,
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
file.
The build time settings for ``PREFIX`` and ``EXEC_PREFIX`` are also relevant,
as are some registry settings on Windows. The hardcoded fallbacks are
based on the layout of the CPython source tree and build output when
working in a source checkout.
Configuring ``sys.path``
------------------------
An embedding application may call ``Py_SetPath()`` prior to
``Py_Initialize()`` to completely override the calculation of
``sys.path``. It is not straightforward to only allow *some* of the
calculations, as modifying ``sys.path`` after initialization is
already complete means those modifications will not be in effect
when standard library modules are imported during the startup sequence.
If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()``
(implicit in ``Py_Initialize()``), then it builds on the location data
calculations above to calculate suitable path entries, along with
the ``PYTHONPATH`` environment variable.
<TBD: On Windows, there's also a bunch of stuff to do with the registry>
The ``site`` module, which is implicitly imported at startup (unless
disabled via the ``-S`` option) adds additional paths to this initial
set of paths, as described in its documentation [5_].
The ``-s`` command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.
The following commands can be used to check the default path configurations
for a given Python executable on a given system:
* ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
- standard configuration
* ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
- user site directory disabled
* ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``
- all site path modifications disabled
(Note: you can see similar information using ``-m site`` instead of ``-c``,
but this is slightly misleading as it calls ``os.abspath`` on all of the
path entries, making relative path entries look absolute. Using the ``site``
module also causes problems in the last case, as on Python versions prior to
3.3, explicitly importing site will carry out the path modifications ``-S``
avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails)
The calculation of ``sys.path[0]`` is comparatively straightforward:
* For an ordinary script (Python source or compiled bytecode),
``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory),
``sys.path[0]`` will be that path
* For an interactive session, running from stdin or when using the ``-c`` or
``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
system interprets as allowing imports from the current directory
Configuring ``sys.argv``
------------------------
Unlike most other settings discussed in this PEP, ``sys.argv`` is not
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
explicitly call to ``Py_SetArgv()``.
CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
arguments passed after the script name or the argument to the ``-c`` or
``-m`` options.
The calculation of ``sys.argv[0]`` is a little more complicated:
* For an ordinary script (source or bytecode), it will be the script name
* For a ``sys.path`` entry (typically a zipfile or directory) it will
initially be the zipfile or directory name, but will later be changed by
the ``runpy`` module to the full path to the imported ``__main__`` module.
* For a module specified with the ``-m`` switch, it will initially be the
string ``"-m"``, but will later be changed by the ``runpy`` module to the
full path to the executed module.
* For a package specified with the ``-m`` switch, it will initially be the
string ``"-m"``, but will later be changed by the ``runpy`` module to the
full path to the executed ``__main__`` submodule of the package.
* For a command executed with ``-c``, it will be the string ``"-c"``
* For explicitly requested input from stdin, it will be the string ``"-"``
* Otherwise, it will be the empty string
Embedding applications must call Py_SetArgv themselves. The CPython logic
for doing so is part of ``Py_Main()`` and is not exposed separately.
However, the ``runpy`` module does provide roughly equivalent logic in
``runpy.run_module`` and ``runpy.run_path``.
Other configuration settings
----------------------------
TBD: Cover the initialization of the following in more detail:
* Completely disabling the import system
* The initial warning system state:
* ``sys.warnoptions``
* (-W option, PYTHONWARNINGS)
* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):
* ``sys._xoptions``
* (-X option)
* The filesystem encoding used by:
* ``sys.getfsencoding``
* ``os.fsencode``
* ``os.fsdecode``
* The IO encoding and buffering used by:
* ``sys.stdin``
* ``sys.stdout``
* ``sys.stderr``
* (-u option, PYTHONIOENCODING, PYTHONUNBUFFEREDIO)
* Whether or not to implicitly cache bytecode files:
* ``sys.dont_write_bytecode``
* (-B option, PYTHONDONTWRITEBYTECODE)
* Whether or not to enforce correct case in filenames on case-insensitive
platforms
* ``os.environ["PYTHONCASEOK"]``
* The other settings exposed to Python code in ``sys.flags``:
* ``debug`` (Enable debugging output in the pgen parser)
* ``inspect`` (Enter interactive interpreter after __main__ terminates)
* ``interactive`` (Treat stdin as a tty)
* ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
* ``no_user_site`` (don't add the user site directory to sys.path)
* ``no_site`` (don't implicitly import site during startup)
* ``ignore_environment`` (whether environment vars are used during config)
* ``verbose`` (enable all sorts of random output)
* ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
* ``quiet`` (disable banner output even if verbose is also enabled or
stdin is a tty and the interpreter is launched in interactive mode)
* Whether or not CPython's signal handlers should be installed
Much of the configuration of CPython is currently handled through C level
global variables::
Py_BytesWarningFlag (-b)
Py_DebugFlag (-d option)
Py_InspectFlag (-i option, PYTHONINSPECT)
Py_InteractiveFlag (property of stdin, cannot be overridden)
Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
Py_NoSiteFlag (-S option)
Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFEREDIO)
Py_VerboseFlag (-v option, PYTHONVERBOSE)
For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.
Some configuration can only be provided as OS level environment variables::
PYTHONSTARTUP
PYTHONCASEOK
PYTHONIOENCODING
The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.
Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.
TBD: Document how the "-x" option is handled (skips processing of the
first comment line in the main script)
Also see detailed sequence of operations notes at [1_]
References
==========