PEP: 304
Title: Controlling Generation of Bytecode Files
Version: $Revision$
Last-Modified: $Date$
Author: Skip Montanaro
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History: 27-Jan-2003, 31-Jan-2003, 17-Jun-2005

Historical Note
===============

While this original PEP was withdrawn, a variant of this feature
was eventually implemented for Python 3.8 in https://bugs.python.org/issue33499.

Several of the issues and concerns originally raised in this PEP were resolved
by other changes in the intervening years:

- the introduction of isolated mode to handle potential security concerns
- the switch to ``importlib``, a fully import-hook based import system implementation
- :pep:`3147`'s change in the bytecode cache layout to use ``__pycache__``
  subdirectories, including the ``cache_from_source(path)`` and
  ``source_from_cache(path)`` APIs that allow the interpreter to automatically
  handle the redirection to a separate cache directory

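For readers comparing this proposal with what eventually shipped, the
following is a minimal sketch of the Python 3.8+ mechanism. The names
``sys.pycache_prefix`` and ``importlib.util.cache_from_source`` come from the
standard library rather than from this PEP, and both paths are hypothetical.

```python
import importlib.util
import sys

# Hypothetical paths, chosen for illustration only.
prefix = "/tmp/pycache"
source = "/usr/lib/python3/urllib_example.py"

# None unless PYTHONPYCACHEPREFIX or -X pycache_prefix was given at startup.
saved = sys.pycache_prefix
try:
    # Redirect the bytecode cache, as the environment variable would.
    sys.pycache_prefix = prefix
    cached = importlib.util.cache_from_source(source)
finally:
    # Leave the interpreter state as we found it.
    sys.pycache_prefix = saved

print(cached)
```

On a Unix system the printed path should sit beneath the prefix and mirror the
source directory, much like the augmented directory described in this PEP.
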
Abstract
========

This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally
arose as a patch request [1]_ and evolved into a discussion thread on
the python-dev mailing list [2]_. The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files should
be generated at installation time, and if so, where they should be
written. It will also allow users to control whether or not bytecode
files should be generated at application run-time, and if so, where
they should be written.


Proposal
========

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. PYTHONBYTECODEBASE is
interpreted as follows:

- If not defined, Python bytecode is generated in exactly the same way
  as is currently done. sys.bytecodebase is set to the root directory
  (either / on Unix and Mac OS X or the root directory of the startup
  (installation???) drive -- typically ``C:\`` -- on Windows).

- If defined and it refers to an existing directory to which the user
  has write permission, sys.bytecodebase is set to that directory and
  bytecode files are written into a directory structure rooted at that
  location.

- If defined but empty, sys.bytecodebase is set to None and generation
  of bytecode files is suppressed altogether.

- If defined and one of the following is true:

  * it does not refer to a directory,

  * it refers to a directory, but not one for which the user has write
    permission

  a warning is displayed, sys.bytecodebase is set to None and
  generation of bytecode files is suppressed altogether.

After startup initialization, all runtime references are to
sys.bytecodebase, not the PYTHONBYTECODEBASE environment variable.
sys.path is not modified.

From the above, we see sys.bytecodebase can only take on two valid
types of values: None or a string referring to a valid directory on
the system.

During import, this extension works as follows:

- The normal search for a module is conducted. The search order is
  roughly: dynamically loaded extension module, Python source file,
  Python bytecode file. The only time this mechanism comes into play
  is if a Python source file is found.

- Once we've found a source module, an attempt to read a byte-compiled
  file in the same directory is made. (This is the same as before.)

- If no byte-compiled file is found, an attempt to read a
  byte-compiled file from the augmented directory is made.

- If bytecode generation is required, the generated bytecode is written
  to the augmented directory if possible.

Note that this PEP is explicitly *not* about providing
module-by-module or directory-by-directory control over the
disposition of bytecode files.


Glossary
--------

- "bytecode base" refers to the current setting of
  sys.bytecodebase.

- "augmented directory" refers to the directory formed from the
  bytecode base and the directory name of the source file.

- PYTHONBYTECODEBASE refers to the environment variable when necessary
  to distinguish it from "bytecode base".


Locating bytecode files
-----------------------

When the interpreter is searching for a module, it will use sys.path
as usual. However, when a possible bytecode file is considered, an
extra probe for a bytecode file may be made. First, a check is made
for the bytecode file using the directory in sys.path which holds the
source file (the current behavior). If a valid bytecode file is not
found there (either one does not exist or exists but is out-of-date)
and the bytecode base is not None, a second probe is made using the
directory in sys.path prefixed appropriately by the bytecode base.


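The two probes can be sketched roughly as follows. This is an illustration
under stated assumptions, not the patch's implementation: ``find_bytecode`` is
a hypothetical helper, only Unix-style paths are handled, and "valid" is
simplified to mean that the bytecode file is at least as new as the source
(the interpreter actually compares data stored in the bytecode file itself).

```python
import os

def find_bytecode(source, bytecodebase):
    """Return the path of a usable .pyc for `source`, or None.

    Hypothetical helper. Simplification: a .pyc counts as valid when it
    exists and its mtime is not older than the source's mtime.
    """
    src_dir, src_name = os.path.split(os.path.abspath(source))
    pyc_name = os.path.splitext(src_name)[0] + ".pyc"

    # First probe: the directory holding the source (current behavior).
    candidates = [os.path.join(src_dir, pyc_name)]

    # Second probe: the same directory prefixed by the bytecode base.
    if bytecodebase is not None:
        augdir = os.path.join(bytecodebase, src_dir.lstrip(os.sep))
        candidates.append(os.path.join(augdir, pyc_name))

    src_mtime = os.path.getmtime(source)
    for pyc in candidates:
        if os.path.isfile(pyc) and os.path.getmtime(pyc) >= src_mtime:
            return pyc
    return None
```

Passing ``None`` as the bytecode base leaves only the first probe, which
matches the rule that the second probe is skipped in that case.
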
Writing bytecode files
----------------------

When the bytecode base is not None, a new bytecode file is written to
the appropriate augmented directory, never directly to a directory in
sys.path.


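A minimal sketch of the write step, assuming Unix-style paths and using the
standard ``py_compile`` module in place of the interpreter's internal writer.
``write_bytecode`` is a hypothetical name, and the ``.pyc`` naming follows the
flat layout this PEP describes rather than the later ``__pycache__`` layout.

```python
import os
import py_compile

def write_bytecode(source, bytecodebase):
    """Compile `source` into its augmented directory; return the .pyc path.

    Hypothetical helper. Returns None when the bytecode base is None,
    because bytecode generation is then suppressed altogether.
    """
    if bytecodebase is None:
        return None
    src_dir, src_name = os.path.split(os.path.abspath(source))
    augdir = os.path.join(bytecodebase, src_dir.lstrip(os.sep))
    # Intermediate directories are created as needed.
    os.makedirs(augdir, exist_ok=True)
    pyc = os.path.join(augdir, os.path.splitext(src_name)[0] + ".pyc")
    # doraise=True surfaces compile errors instead of printing them.
    py_compile.compile(source, cfile=pyc, doraise=True)
    return pyc
```
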
Defining augmented directories
------------------------------

Conceptually, the augmented directory for a bytecode file is the
directory in which the source file exists prefixed by the bytecode
base. In a Unix environment this would be::

    import os, sys
    pcb = os.path.abspath(sys.bytecodebase)
    # Strip the leading separator so os.path.join() does not discard pcb.
    if sourcefile.startswith(os.sep):
        sourcefile = sourcefile[1:]
    augdir = os.path.join(pcb, os.path.dirname(sourcefile))

On Windows, which does not have a single-rooted directory tree, the
drive letter of the directory containing the source file is treated as
a directory component after removing the trailing colon. The
augmented directory is thus derived as ::

    import os, sys
    pcb = os.path.abspath(sys.bytecodebase)
    drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
    # "C:" becomes the plain directory component "C".
    drive = drive[:-1]
    # Strip the leading separator so os.path.join() does not discard pcb.
    if base.startswith("\\"):
        base = base[1:]
    augdir = os.path.join(pcb, drive, base)


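Both derivations can be tried on any platform by passing the path module
explicitly. The function below is a hypothetical sketch, not part of the
proposal: it assumes absolute paths and makes no attempt to handle UNC paths
on Windows.

```python
import ntpath
import posixpath

def augmented_dir(bytecodebase, sourcefile, pathmod=posixpath):
    """Return the augmented directory for `sourcefile`.

    Hypothetical helper. `pathmod` is posixpath or ntpath; both arguments
    are assumed to be absolute paths in that module's syntax.
    """
    drive, directory = pathmod.splitdrive(pathmod.dirname(sourcefile))
    # "C:" becomes the plain component "C"; POSIX paths have no drive.
    drive = drive.rstrip(":")
    # Drop leading separators so join() keeps the bytecode base.
    directory = directory.lstrip("/\\")
    return pathmod.join(bytecodebase, drive, directory)

print(augmented_dir("/tmp", "/usr/lib/python2.3/urllib.py"))
# /tmp/usr/lib/python2.3
print(augmented_dir("C:\\TEMP", "C:\\PYTHON22\\urllib.py", ntpath))
# C:\TEMP\C\PYTHON22
```

The two results correspond to the augmented directories quoted in the
Examples section.
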
Fixing the location of the bytecode base
----------------------------------------

During program startup, the value of the PYTHONBYTECODEBASE
environment variable is made absolute, checked for validity and added
to the sys module, effectively::

    import os, sys

    pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
    probe = os.path.join(pcb, "foo")
    try:
        # Creating a scratch file proves the directory exists and is writable.
        open(probe, "w").close()
    except OSError:
        sys.bytecodebase = None
    else:
        os.unlink(probe)
        sys.bytecodebase = pcb

This allows the user to specify the bytecode base as a relative path,
but not have it subject to changes to the current working directory
during program execution. (I can't imagine you'd want it to move
around during program execution.)

There is nothing special about sys.bytecodebase. The user may change
it at runtime if desired, but normally it will not be modified.


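Combining this probe with the rules listed in the Proposal gives roughly the
following startup logic. This is a sketch under assumptions:
``resolve_bytecodebase`` is a hypothetical name, the warning text is invented,
the "not defined" case returns ``os.sep`` (which ignores Windows drives), and
a uniquely named scratch file replaces the fixed name "foo".

```python
import os
import tempfile
import warnings

def resolve_bytecodebase(environ):
    """Return the value sys.bytecodebase would take for `environ`."""
    value = environ.get("PYTHONBYTECODEBASE")
    if value is None:
        # Not defined: the root directory, so each augmented directory
        # coincides with the source directory (Unix-only simplification).
        return os.sep
    if value == "":
        # Defined but empty: suppress generation without a warning.
        return None
    pcb = os.path.abspath(value)
    try:
        # A unique scratch file avoids clobbering an existing file.
        fd, probe = tempfile.mkstemp(dir=pcb)
    except OSError:
        # Missing, not a directory, or not writable.
        warnings.warn("PYTHONBYTECODEBASE is not a writable directory: %r" % value)
        return None
    os.close(fd)
    os.unlink(probe)
    return pcb
```

Taking the environment as a parameter keeps the function easy to exercise with
a plain dictionary; at startup it would be called with ``os.environ``.
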
Rationale
=========

In many environments it is not possible for non-root users to write
into directories containing Python source files. Most of the time,
this is not a problem as Python source is generally byte compiled
during installation. However, there are situations where bytecode
files are either missing or need to be updated. If the directory
containing the source file is not writable by the current user, a
performance penalty is incurred each time a program importing the
module is run. [3]_ Warning messages may also be generated in certain
circumstances. If the directory is writable, nearly simultaneous
attempts to write the bytecode file by two separate processes
may occur, resulting in file corruption. [4]_

In environments with RAM disks available, it may be desirable for
performance reasons to write bytecode files to a directory on such a
disk. Similarly, in environments where Python source code resides on
network file systems, it may be desirable to cache bytecode files on
local disks.


Alternatives
============

The only alternative proposed so far [1]_ seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether. This proposal subsumes that. Adding a command-line
option is certainly possible, but is probably not sufficient, as the
interpreter's command line is not readily available during
installation (early during program startup???).


Issues
======

- Interpretation of a module's __file__ attribute. I believe the
  __file__ attribute of a module should reflect the true location of
  the bytecode file. If people want to locate a module's source code,
  they should use imp.find_module(module).

- Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
  present a security risk, but so can many other things the root user
  does. The root user should probably not set PYTHONBYTECODEBASE
  except possibly during installation. Still, perhaps this problem
  can be minimized. When running as root the interpreter should check
  to see if PYTHONBYTECODEBASE refers to a directory which is writable
  by anyone other than root. If so, it could raise an exception or
  warning and set sys.bytecodebase to None. Or, see the next item.

- More security - What if PYTHONBYTECODEBASE refers to a general
  directory (say, /tmp)? In this case, perhaps loading of a
  preexisting bytecode file should occur only if the file is owned by
  the current user or root. (Does this matter on Windows?)

- The interaction of this PEP with import hooks has not been
  considered yet. In fact, the best way to implement this idea might
  be as an import hook. See :pep:`302`.

- In the current (pre-:pep:`304`) environment, it is safe to delete a
  source file after the corresponding bytecode file has been created,
  since they reside in the same directory. With :pep:`304` as currently
  defined, this is not the case. A bytecode file in the augmented
  directory is only considered when the source file is present and is
  thus never considered when looking for module files ending in
  ".pyc". I think this behavior may have to change.


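The two security checks floated above might look something like this on Unix.
Both function names are hypothetical, and the sketch relies on ``os.getuid``,
so it does not address the Windows question raised in the list.

```python
import os
import stat

def base_is_safe_for_root(path):
    """True if `path` is owned by root and nobody else may write to it."""
    st = os.stat(path)
    writable_by_others = st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
    # A non-root owner can always write, so ownership is checked too.
    return st.st_uid == 0 and not writable_by_others

def pyc_is_trusted(path):
    """True if `path` is owned by the current user or by root."""
    return os.stat(path).st_uid in (os.getuid(), 0)
```

An interpreter running as root could call ``base_is_safe_for_root`` on the
candidate bytecode base and fall back to None when it returns False, while
``pyc_is_trusted`` could gate the loading of preexisting files from a shared
directory such as /tmp.
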
Examples
========

In the examples which follow, the urllib source code resides in
/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
is not writable by the current user.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
  is valid. When urllib is imported, the contents of
  /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
  not consulted. No other bytecode file is generated.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists,
  but is out-of-date. When urllib is imported, the generated bytecode
  file is written to urllib.pyc in the augmented directory which has
  the value /tmp/usr/lib/python2.3. Intermediate directories will be
  created as needed.

- The bytecode base is None. No urllib.pyc file is found. When
  urllib is imported, no bytecode file is written.

- The bytecode base is /tmp. No urllib.pyc file is found. When
  urllib is imported, the generated bytecode file is written to the
  augmented directory which has the value /tmp/usr/lib/python2.3.
  Intermediate directories will be created as needed.

- At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
  A warning is emitted, sys.bytecodebase is set to None and no
  bytecode files are written during program execution unless
  sys.bytecodebase is later changed to refer to a valid,
  writable directory.

- At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
  writable by the current user. A warning is emitted,
  sys.bytecodebase is set to None and no bytecode files are
  written during program execution unless sys.bytecodebase is
  later changed to refer to a valid, writable directory. Note that
  even though the augmented directory constructed for a particular
  bytecode file may be writable by the current user, what counts is
  that the bytecode base directory itself is writable.

- At startup, PYTHONBYTECODEBASE is set to the empty string.
  sys.bytecodebase is set to None. No warning is generated, however.
  If no urllib.pyc file is found when urllib is imported, no bytecode
  file is written.

In the Windows examples which follow, the urllib source code resides
in ``C:\PYTHON22\urllib.py``. ``C:\PYTHON22`` is in sys.path but is
not writable by the current user.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists and is valid. When urllib is imported, the contents of
  ``C:\PYTHON22\urllib.pyc`` are used. The augmented directory is not
  consulted.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists, but is out-of-date. When urllib is imported, a new bytecode
  file is written to the augmented directory which has the value
  ``C:\TEMP\C\PYTHON22``. Intermediate directories will be created as
  needed.

- At startup, PYTHONBYTECODEBASE is set to ``TEMP`` and the current
  working directory at application startup is ``H:\NET``. The
  potential bytecode base is thus ``H:\NET\TEMP``. If this directory
  exists and is writable by the current user, sys.bytecodebase will be
  set to that value. If not, a warning will be emitted and
  sys.bytecodebase will be set to None.

- The bytecode base is ``C:\TEMP``. No urllib.pyc file is found.
  When urllib is imported, the generated bytecode file is written to
  the augmented directory which has the value ``C:\TEMP\C\PYTHON22``.
  Intermediate directories will be created as needed.


Implementation
==============

See the patch on SourceForge. [6]_

References
==========

.. [1] patch 602345, Option for not writing .py[co] files, Klose
   (https://bugs.python.org/issue602345)

.. [2] python-dev thread, Disable writing .py[co], Norwitz
   (https://mail.python.org/pipermail/python-dev/2003-January/032270.html)

.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
   (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)

.. [4] python-dev thread, Parallel pyc construction, Dubois
   (https://mail.python.org/pipermail/python-dev/2003-January/032060.html)

.. [6] patch 677103, PYTHONBYTECODEBASE patch (PEP 304), Montanaro
   (https://bugs.python.org/issue677103)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: