PEP: 304
Title: Controlling generation of bytecode files
Version: $Revision$
Last-Modified: $Date$
Author: Skip Montanaro
Status: Active
Type: Draft
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History:


Abstract
========

This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally
arose as a patch request [1]_ and evolved into a discussion thread on
the python-dev mailing list [2]_. The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files
should be generated, and if so, where they should be written.


Proposal
========

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. Its interpretation
is:

- If not present, Python bytecode is generated in exactly the same way
  as is currently done. sys.pythonbytecodebase is set to the root
  directory (either / on Unix or the root directory of the startup
  drive -- typically ``C:\`` -- on Windows).

- If present and it refers to an existing, writable directory,
  sys.pythonbytecodebase is set to that directory and bytecode files
  are written into a directory structure rooted at that location.

- If present but empty, sys.pythonbytecodebase is set to None and
  generation of bytecode files is suppressed altogether.

- If present and it does not refer to an existing, writable directory,
  a warning is displayed, sys.pythonbytecodebase is set to None and
  generation of bytecode files is suppressed altogether.

After startup, all runtime references are to sys.pythonbytecodebase,
not the PYTHONBYTECODEBASE environment variable. sys.path is not
modified.
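
The four rules above can be sketched as a small Python function. This
is a hedged illustration, not the proposed C-level implementation: the
helper name ``bytecode_base_from_env`` is hypothetical, writability is
tested with ``os.access`` instead of a probe file, and the warning is
issued through the ``warnings`` module::

    import os
    import warnings


    def bytecode_base_from_env(environ=os.environ):
        """Return the value sys.pythonbytecodebase would take.

        Hypothetical helper mirroring the four rules of this PEP.
        """
        value = environ.get("PYTHONBYTECODEBASE")
        if value is None:
            # Not present: use the filesystem root (on Windows, the root
            # of the current drive), so bytecode lands next to the source.
            return os.path.abspath(os.sep)
        if value == "":
            # Present but empty: suppress bytecode files, no warning.
            return None
        base = os.path.abspath(value)
        if os.path.isdir(base) and os.access(base, os.W_OK):
            # Existing, writable directory: it becomes the bytecode base.
            return base
        # Missing or unwritable directory: warn and suppress bytecode.
        warnings.warn("PYTHONBYTECODEBASE %r is not a writable directory"
                      % value)
        return None

Passing an explicit mapping, e.g. ``bytecode_base_from_env({})``, makes
the rules easy to exercise without touching the real environment.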


Glossary
--------

- "bytecode base" refers to the current setting of
  sys.pythonbytecodebase.

- "augmented directory" refers to the directory formed from the
  bytecode base and the directory name of the source file.

- PYTHONBYTECODEBASE refers to the environment variable when necessary
  to distinguish it from "bytecode base".


Locating bytecode files
-----------------------

When the interpreter is searching for a module, it will use sys.path
as usual. However, when a possible bytecode file is considered, an
extra probe for a bytecode file may be made. First, a check is made
for the bytecode file using the directory in sys.path which holds the
source file (the current behavior). If a valid bytecode file is not
found there (either one does not exist or exists but is out-of-date)
and the bytecode base is not None, a second probe is made in the
corresponding augmented directory, that is, the directory in sys.path
prefixed appropriately by the bytecode base.
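
The two-probe lookup described above might look like the following
sketch. It assumes the Unix form of the augmented directory and
simplifies "valid" to "at least as new as the source" by comparing
modification times; real ``.pyc`` files record the source timestamp in
their header instead. The helper names are hypothetical::

    import os


    def augmented_dir(sourcefile, base):
        # Unix form: graft the source directory onto the bytecode base.
        src_dir = os.path.dirname(os.path.abspath(sourcefile))
        return os.path.join(base, src_dir.lstrip(os.sep))


    def is_fresh(bytecode, sourcefile):
        # Simplified validity test: the bytecode file exists and is not
        # older than its source.
        return (os.path.exists(bytecode) and
                os.path.getmtime(bytecode) >= os.path.getmtime(sourcefile))


    def find_bytecode(sourcefile, base):
        """Return the path of a valid bytecode file, or None."""
        src_dir = os.path.dirname(os.path.abspath(sourcefile))
        name = os.path.splitext(os.path.basename(sourcefile))[0] + ".pyc"
        # First probe: the sys.path directory holding the source file.
        candidate = os.path.join(src_dir, name)
        if is_fresh(candidate, sourcefile):
            return candidate
        # Second probe: the augmented directory, unless base is None.
        if base is not None:
            candidate = os.path.join(augmented_dir(sourcefile, base), name)
            if is_fresh(candidate, sourcefile):
                return candidate
        return None

With a bytecode base of None, only the first probe is ever made.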


Writing bytecode files
----------------------

When the bytecode base is not None, a new bytecode file is written to
the appropriate augmented directory, never directly to a directory in
sys.path.
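
A sketch of the write side follows, again in the Unix form and with a
hypothetical helper name. It relies on the standard ``py_compile``
module, whose ``compile()`` function accepts an explicit output path
(``cfile``), and it creates intermediate directories as needed. When
the bytecode base is the root directory (the default when
PYTHONBYTECODEBASE is unset), the augmented directory is the source
directory itself, which preserves the current behavior::

    import os
    import py_compile


    def write_bytecode(sourcefile, base):
        """Compile sourcefile into its augmented directory.

        Returns the path written, or None when base is None (bytecode
        generation suppressed). Hypothetical helper, Unix form.
        """
        if base is None:
            return None
        src = os.path.abspath(sourcefile)
        augdir = os.path.join(base, os.path.dirname(src).lstrip(os.sep))
        # Create intermediate directories as needed.
        if not os.path.isdir(augdir):
            os.makedirs(augdir)
        name = os.path.splitext(os.path.basename(src))[0] + ".pyc"
        target = os.path.join(augdir, name)
        # cfile sends the output to target instead of the default
        # location chosen by py_compile.
        py_compile.compile(src, cfile=target, doraise=True)
        return target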


Defining augmented directories
------------------------------

Conceptually, the augmented directory for a bytecode file is the
directory in which the source file exists prefixed by the bytecode
base. In a Unix environment, with sourcefile holding the absolute
path of the source file, this would be::

    import os
    import sys
    pcb = os.path.abspath(sys.pythonbytecodebase)
    if sourcefile.startswith(os.sep):
        sourcefile = sourcefile[1:]
    augdir = os.path.join(pcb, os.path.dirname(sourcefile))

On Windows, which does not have a single-rooted directory tree, the
drive letter of the directory containing the source file is treated as
a directory component after removing the trailing colon. The
augmented directory is thus derived as ::

    import os
    import sys
    pcb = os.path.abspath(sys.pythonbytecodebase)
    drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
    drive = drive[:-1]              # "C:" -> "C"
    if base.startswith("\\"):      # safe even when base is empty
        base = base[1:]
    augdir = os.path.join(pcb, drive, base)
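
Both derivations can be written as pure path functions. The sketch
below uses the standard ``posixpath`` and ``ntpath`` modules directly,
so either form can be checked on any platform; the function names are
hypothetical, and UNC paths such as ``\\server\share`` are not
handled::

    import ntpath
    import posixpath


    def augmented_dir_posix(base, sourcefile):
        # Drop the leading "/" of the absolute source directory and
        # graft the remainder onto the bytecode base.
        return posixpath.join(base, posixpath.dirname(sourcefile).lstrip("/"))


    def augmented_dir_windows(base, sourcefile):
        # The drive letter, minus its colon, becomes an ordinary
        # directory component under the bytecode base.
        drive, rest = ntpath.splitdrive(ntpath.dirname(sourcefile))
        return ntpath.join(base, drive.rstrip(":"), rest.lstrip("\\/"))

For the urllib examples given later, these return
``/tmp/usr/lib/python2.3`` for a bytecode base of /tmp and
``C:\TEMP\C\PYTHON22`` for a bytecode base of ``C:\TEMP``. With a
bytecode base of /, the Unix form returns the source directory
unchanged.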


Fixing the location of the bytecode base
----------------------------------------

During program startup, if PYTHONBYTECODEBASE is set to a non-empty
value, that value is made absolute, checked for validity and added to
the sys module, effectively::

    import os
    import sys
    import tempfile
    import warnings

    pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
    try:
        # Creating (and deleting) a uniquely named probe file confirms
        # that pcb exists and is writable without clobbering any file.
        fd, probe = tempfile.mkstemp(dir=pcb)
        os.close(fd)
        os.unlink(probe)
        sys.pythonbytecodebase = pcb
    except (IOError, OSError):
        warnings.warn("bytecode base %s is not writable" % pcb)
        sys.pythonbytecodebase = None

This allows the user to specify the bytecode base as a relative path,
but not have it subject to changes to the current working directory.
(I can't imagine you'd want it to move around during program
execution.)

There is nothing special about sys.pythonbytecodebase. The user may
change it at runtime if she so chooses, but normally it will not be
modified.


Rationale
=========

In many environments it is not possible for non-root users to write
into directories containing Python source files. Most of the time,
this is not a problem as Python source is generally byte compiled
during installation. However, there are situations where bytecode
files are either missing or need to be updated. If the directory
containing the source file is not writable by the current user, a
performance penalty is incurred each time a program importing the
module is run, because the module must be recompiled from source on
every import [3]_. Warning messages may also be generated in certain
circumstances. If the directory is writable, nearly simultaneous
attempts to write the bytecode file by two separate processes may
occur, resulting in file corruption [4]_.

In environments with ramdisks available, it may be desirable for
performance reasons to write bytecode files to a directory on such a
disk. Similarly, in environments where Python source code resides on
network file systems, it may be desirable to cache bytecode files on
local disks.


Alternatives
============

The only other alternative proposed so far [1]_ seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether. This proposal subsumes that. Adding a command-line
option is certainly possible, but is probably not sufficient, as the
interpreter's command line is not readily available during
installation.


Issues
======

- Interpretation of a module's __file__ attribute. I believe the
  __file__ attribute of a module should reflect the true location of
  the bytecode file. If people want to locate a module's source code,
  they should use imp.find_module(module).

- Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
  present a security risk, but so can many other things the root user
  does. The root user should probably not set PYTHONBYTECODEBASE
  except during installation. Still, perhaps this problem can be
  minimized. When running as root, the interpreter should check to see
  if PYTHONBYTECODEBASE refers to a directory which is writable by
  anyone other than root. If so, it could raise an exception or
  warning and set sys.pythonbytecodebase to None. Or, see the next
  item.

- More security - What if PYTHONBYTECODEBASE refers to a general
  directory (say, /tmp)? In this case, perhaps loading of a
  preexisting bytecode file should occur only if the file is owned by
  the current user or root. (Does this matter on Windows?)
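
For POSIX systems, the two checks suggested above could be sketched
with ``os.stat``. This is an approximation under stated assumptions:
only permission bits and ownership are examined (ACLs and actual group
membership are ignored), the helper names are hypothetical, and the
sketch does not cover Windows, where ``os.getuid`` does not exist::

    import os
    import stat


    def writable_by_non_root(path):
        """Approximate test: could a user other than root write to path?"""
        st = os.stat(path)
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            # Group- or world-writable (group membership not examined).
            return True
        # Owner-writable, with an owner other than root.
        return st.st_uid != 0 and bool(st.st_mode & stat.S_IWUSR)


    def trusted_bytecode_file(path):
        """Accept a preexisting bytecode file only if we or root own it."""
        return os.stat(path).st_uid in (os.getuid(), 0)

When running as root, the interpreter might then fall back to a
bytecode base of None whenever
``writable_by_non_root(sys.pythonbytecodebase)`` is true.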


Examples
========

In the examples which follow, the urllib source code resides in
/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
is not writable by the current user.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
  is valid. When urllib is imported, the contents of
  /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
  not consulted. No other bytecode file is generated.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists,
  but is out-of-date. When urllib is imported, the generated bytecode
  file is written to urllib.pyc in the augmented directory,
  /tmp/usr/lib/python2.3. Intermediate directories will be created as
  needed.

- The bytecode base is None. No urllib.pyc file is found. When
  urllib is imported, no bytecode file is written.

- The bytecode base is /tmp. No urllib.pyc file is found. When
  urllib is imported, the generated bytecode file is written to the
  augmented directory, creating intermediate directories as needed.

- At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
  A warning is emitted, sys.pythonbytecodebase is set to None and no
  bytecode files are written during program execution unless
  sys.pythonbytecodebase is later changed to refer to a valid,
  writable directory.

- At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
  writable by the current user. A warning is emitted,
  sys.pythonbytecodebase is set to None and no bytecode files are
  written during program execution unless sys.pythonbytecodebase is
  later changed to refer to a valid, writable directory. Note that
  even though the augmented directory constructed for a particular
  bytecode file may be writable by the current user, what counts is
  that the bytecode base directory itself is writable.

- At startup, PYTHONBYTECODEBASE is set to the empty string.
  sys.pythonbytecodebase is set to None. No warning is generated,
  however. If no urllib.pyc file is found when urllib is imported, no
  bytecode file is written.

In the Windows examples which follow, the urllib source code resides
in ``C:\PYTHON22\urllib.py``. ``C:\PYTHON22`` is in sys.path but is
not writable by the current user.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists and is valid. When urllib is imported, the contents of
  ``C:\PYTHON22\urllib.pyc`` are used. The augmented directory is not
  consulted.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists, but is out-of-date. When urllib is imported, a new bytecode
  file is written to the augmented directory, ``C:\TEMP\C\PYTHON22``.
  Intermediate directories will be created as needed.

- At startup, PYTHONBYTECODEBASE is set to ``TEMP`` and the current
  working directory at application startup is ``H:\NET``. The
  potential bytecode base is thus ``H:\NET\TEMP``. If this directory
  exists and is writable by the current user, sys.pythonbytecodebase
  will be set to that value. If not, a warning will be emitted and
  sys.pythonbytecodebase will be set to None.

- The bytecode base is ``C:\TEMP``. No urllib.pyc file is found.
  When urllib is imported, the generated bytecode file is written to
  the augmented directory, creating intermediate directories as
  needed.


References
==========

.. [1] patch 602345, Option for not writing py.[co] files, Klose
   (http://www.python.org/sf/602345)

.. [2] python-dev thread, Disable writing .py[co], Norwitz
   (http://mail.python.org/pipermail/python-dev/2003-January/032270.html)

.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
   (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)

.. [4] python-dev thread, Parallel pyc construction, Dubois
   (http://mail.python.org/pipermail/python-dev/2003-January/032060.html)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: