PEP: 304
Title: Controlling Generation of Bytecode Files
Version: $Revision$
Last-Modified: $Date$
Author: Skip Montanaro
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History: 27-Jan-2003
Abstract
========
This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally
arose as a patch request [1]_ and evolved into a discussion thread on
the python-dev mailing list [2]_. The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files should
be generated at installation time, and if so, where they should be
written. It will also allow users to control whether or not bytecode
files should be generated at application run-time, and if so, where
they should be written.
Proposal
========
Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. Its interpretation
is:

- If not present, Python bytecode is generated in exactly the same way
  as is currently done. sys.bytecodebase is set to the root
  directory (either / on Unix or the root directory of the startup
  drive -- typically ``C:\`` -- on Windows).

- If present and it refers to an existing directory,
  sys.bytecodebase is set to that directory and bytecode files
  are written into a directory structure rooted at that location.

- If present but empty, sys.bytecodebase is set to None and
  generation of bytecode files is suppressed altogether.

- If present and it does not refer to an existing directory, a warning
  is displayed, sys.bytecodebase is set to None and generation
  of bytecode files is suppressed altogether.

After startup, all runtime references are to sys.bytecodebase,
not the PYTHONBYTECODEBASE environment variable. sys.path is not
modified.
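
The four cases above can be summarized in a few lines of Python. This
is only a minimal sketch, not the interpreter's startup code: the name
``resolve_bytecode_base`` and its ``environ`` and ``root`` parameters
are hypothetical, and the sketch covers only the existence test listed
above, not the writability probe described under "Fixing the location
of the bytecode base". Using the root of the current drive on Windows
is also an approximation of "the startup drive".

```python
import os
import warnings

ENV_VAR = "PYTHONBYTECODEBASE"


def resolve_bytecode_base(environ, root=None):
    """Return the value sys.bytecodebase would take for a given environment.

    environ is any mapping (for example os.environ); root stands in for
    the filesystem root and defaults to the root of the current drive.
    """
    if root is None:
        root = os.path.abspath(os.sep)

    value = environ.get(ENV_VAR)
    if value is None:
        # Not present: behave as today, with the base set to the root.
        return root
    if value == "":
        # Present but empty: bytecode generation is suppressed, no warning.
        return None
    if os.path.isdir(value):
        # Present and refers to an existing directory.
        return os.path.abspath(value)
    # Present but not an existing directory: warn and suppress.
    warnings.warn("%s=%r is not an existing directory" % (ENV_VAR, value))
    return None
```

Passing the environment in as a mapping keeps the sketch easy to
exercise without modifying ``os.environ``.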
Note that this PEP is explicitly *not* about providing
module-by-module or directory-by-directory control over the
disposition of bytecode files.
Glossary
--------
- "bytecode base" refers to the current setting of
  sys.bytecodebase.

- "augmented directory" refers to the directory formed from the
  bytecode base and the directory name of the source file.

- PYTHONBYTECODEBASE refers to the environment variable when necessary
  to distinguish it from "bytecode base".
Locating bytecode files
-----------------------
When the interpreter is searching for a module, it will use sys.path
as usual. However, when a possible bytecode file is considered, an
extra probe for a bytecode file may be made. First, a check is made
for the bytecode file using the directory in sys.path which holds the
source file (the current behavior). If a valid bytecode file is not
found there (either one does not exist or exists but is out-of-date)
and the bytecode base is not None, a second probe is made using the
directory in sys.path prefixed appropriately by the bytecode base.
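
The probe order described above can be sketched as follows. The
function ``find_bytecode`` and its parameters are hypothetical names
chosen for illustration, and ``is_valid`` is a placeholder for the
interpreter's own check that a bytecode file exists and is up to date.

```python
import os


def find_bytecode(pyc_name, source_dir, augdir, is_valid):
    """Return the first valid bytecode path, or None if there is none.

    source_dir: the sys.path directory that holds the source file.
    augdir:     the augmented directory, or None when the bytecode
                base is None.
    is_valid:   callable standing in for the interpreter's validity
                check (file exists and is not out of date).
    """
    # First probe: alongside the source file (the current behavior).
    candidates = [os.path.join(source_dir, pyc_name)]
    # Second probe: only made when a bytecode base is in effect.
    if augdir is not None:
        candidates.append(os.path.join(augdir, pyc_name))

    for path in candidates:
        if is_valid(path):
            return path
    return None
```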
Writing bytecode files
----------------------
When the bytecode base is not None, a new bytecode file is written to
the appropriate augmented directory, never directly to a directory in
sys.path.
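
As an illustration only, writing into the augmented directory might
look like the sketch below. The helper ``write_bytecode`` is a
hypothetical name, ``data`` stands for the already-marshalled
bytecode, and ``os.makedirs(..., exist_ok=True)`` assumes a modern
Python rather than the interpreter this PEP was written against.

```python
import os


def write_bytecode(augdir, pyc_name, data):
    """Write data to pyc_name inside augdir and return the path written.

    Returns None without writing anything when augdir is None, which
    corresponds to a bytecode base of None (generation suppressed).
    """
    if augdir is None:
        return None
    # Intermediate directories are created as needed.
    os.makedirs(augdir, exist_ok=True)
    path = os.path.join(augdir, pyc_name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```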
Defining augmented directories
------------------------------
Conceptually, the augmented directory for a bytecode file is the
directory in which the source file exists prefixed by the bytecode
base. In a Unix environment this would be::

    pcb = os.path.abspath(sys.bytecodebase)
    # Strip the leading separator so os.path.join() does not discard pcb.
    if sourcefile.startswith(os.sep):
        sourcefile = sourcefile[1:]
    augdir = os.path.join(pcb, os.path.dirname(sourcefile))

On Windows, which does not have a single-rooted directory tree, the
drive letter of the directory containing the source file is treated as
a directory component after removing the trailing colon. The
augmented directory is thus derived as::

    pcb = os.path.abspath(sys.bytecodebase)
    drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
    drive = drive[:-1]    # "C:" becomes "C"
    # Strip the leading backslash so os.path.join() does not discard pcb.
    if base.startswith("\\"):
        base = base[1:]
    augdir = os.path.join(pcb, drive, base)
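
The two rules can be tried on any platform by calling ``posixpath``
and ``ntpath`` directly instead of ``os.path``. The function names
below are hypothetical, and the sketch assumes the bytecode base has
already been made absolute, as happens at startup.

```python
import ntpath
import posixpath


def augmented_dir_posix(bytecode_base, sourcefile):
    """Unix rule: prefix the source directory with the bytecode base."""
    source_dir = posixpath.dirname(sourcefile)
    # Drop the leading "/" so join() keeps the bytecode base.
    return posixpath.join(bytecode_base, source_dir.lstrip("/"))


def augmented_dir_nt(bytecode_base, sourcefile):
    """Windows rule: the drive letter becomes a directory component."""
    drive, base = ntpath.splitdrive(ntpath.dirname(sourcefile))
    # "C:" becomes "C"; the leading separator is dropped from the rest.
    return ntpath.join(bytecode_base, drive.rstrip(":"), base.lstrip("\\/"))


print(augmented_dir_posix("/tmp", "/usr/lib/python2.3/urllib.py"))
# -> /tmp/usr/lib/python2.3
print(augmented_dir_nt(r"C:\TEMP", r"C:\PYTHON22\urllib.py"))
# -> C:\TEMP\C\PYTHON22
```

The paths used here are the same ones that appear in the Examples
section of this PEP.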
Fixing the location of the bytecode base
----------------------------------------
During program startup, the value of the PYTHONBYTECODEBASE
environment variable is made absolute, checked for validity and added
to the sys module, effectively::

    import os
    import sys

    pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
    try:
        # Check writability by creating and removing a scratch file.
        probe = os.path.join(pcb, "foo")
        open(probe, "w").close()
        os.unlink(probe)
        sys.bytecodebase = pcb
    except (IOError, OSError):
        sys.bytecodebase = None

This allows the user to specify the bytecode base as a relative path,
but not have it subject to changes to the current working directory.
(I can't imagine you'd want it to move around during program
execution.)

There is nothing special about sys.bytecodebase. The user may
change it at runtime if she so chooses, but normally it will not be
modified.
Rationale
=========
In many environments it is not possible for non-root users to write
into directories containing Python source files. Most of the time,
this is not a problem as Python source is generally byte compiled
during installation. However, there are situations where bytecode
files are either missing or need to be updated. If the directory
containing the source file is not writable by the current user, a
performance penalty is incurred each time a program importing the
module is run. [3]_ Warning messages may also be generated in certain
circumstances. If the directory is writable, nearly simultaneous
attempts to write the bytecode file by two separate processes
may occur, resulting in file corruption. [4]_

In environments with RAM disks available, it may be desirable for
performance reasons to write bytecode files to a directory on such a
disk. Similarly, in environments where Python source code resides on
network file systems, it may be desirable to cache bytecode files on
local disks.
Alternatives
============
The only other alternative proposed so far [1]_ seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether. This proposal subsumes that. Adding a command-line
option is certainly possible, but is probably not sufficient, as the
interpreter's command line is not readily available during
installation.
Issues
======
- Interpretation of a module's __file__ attribute. I believe the
  __file__ attribute of a module should reflect the true location of
  the bytecode file. If people want to locate a module's source code,
  they should use imp.find_module(module).

- Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
  present a security risk, but so can many other things the root user
  does. The root user should probably not set PYTHONBYTECODEBASE
  except during installation. Still, perhaps this problem can be
  minimized. When running as root the interpreter should check to see
  if PYTHONBYTECODEBASE refers to a directory which is writable by
  anyone other than root. If so, it could raise an exception or
  warning and set sys.bytecodebase to None. Or, see the next
  item.

- More security - What if PYTHONBYTECODEBASE refers to a general
  directory (say, /tmp)? In this case, perhaps loading of a
  preexisting bytecode file should occur only if the file is owned by
  the current user or root. (Does this matter on Windows?)

- The interaction of this PEP with import hooks has not been
  considered yet. In fact, the best way to implement this idea might
  be as an import hook. See PEP 302. [5]_
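
The two security checks floated above could be approximated on POSIX
systems as in the sketch below. This is an assumption-laden
illustration, not part of the proposal: both function names are
hypothetical, only the traditional permission bits are examined (ACLs
and the sticky bit are ignored), and ``os.getuid()`` is not available
on Windows.

```python
import os
import stat


def dir_writable_by_non_root(path):
    """Conservatively report whether someone other than root can write to path."""
    st = os.stat(path)
    # A non-root owner with the owner-write bit set can write.
    if st.st_uid != 0 and st.st_mode & stat.S_IWUSR:
        return True
    # Any group or world write permission is treated as "someone else".
    return bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))


def owned_by_user_or_root(path):
    """Report whether path is owned by the current user or by root."""
    return os.stat(path).st_uid in (0, os.getuid())
```

Treating any group-write bit as unsafe is deliberately strict. A real
implementation would have to decide whether a root-only group should
count.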
Examples
========
In the examples which follow, the urllib source code resides in
/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
is not writable by the current user.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
  is valid. When urllib is imported, the contents of
  /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
  not consulted. No other bytecode file is generated.

- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists,
  but is out-of-date. When urllib is imported, the generated bytecode
  file is written to urllib.pyc in the augmented directory which has
  the value /tmp/usr/lib/python2.3. Intermediate directories will be
  created as needed.

- The bytecode base is None. No urllib.pyc file is found. When
  urllib is imported, no bytecode file is written.

- The bytecode base is /tmp. No urllib.pyc file is found. When
  urllib is imported, the generated bytecode file is written to the
  augmented directory which has the value /tmp/usr/lib/python2.3.
  Intermediate directories will be created as needed.

- At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
  A warning is emitted, sys.bytecodebase is set to None and no
  bytecode files are written during program execution unless
  sys.bytecodebase is later changed to refer to a valid,
  writable directory.

- At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
  writable by the current user. A warning is emitted,
  sys.bytecodebase is set to None and no bytecode files are
  written during program execution unless sys.bytecodebase is
  later changed to refer to a valid, writable directory. Note that
  even though the augmented directory constructed for a particular
  bytecode file may be writable by the current user, what counts is
  that the bytecode base directory itself is writable.

- At startup PYTHONBYTECODEBASE is set to the empty string.
  sys.bytecodebase is set to None. No warning is generated, however.
  If no urllib.pyc file is found when urllib is imported, no bytecode
  file is written.
In the Windows examples which follow, the urllib source code resides
in ``C:\PYTHON22\urllib.py``. ``C:\PYTHON22`` is in sys.path but is
not writable by the current user.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists and is valid. When urllib is imported, the contents of
  ``C:\PYTHON22\urllib.pyc`` are used. The augmented directory is not
  consulted.

- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
  exists, but is out-of-date. When urllib is imported, a new bytecode
  file is written to the augmented directory which has the value
  ``C:\TEMP\C\PYTHON22``. Intermediate directories will be created as
  needed.

- At startup PYTHONBYTECODEBASE is set to ``TEMP`` and the current
  working directory at application startup is ``H:\NET``. The
  potential bytecode base is thus ``H:\NET\TEMP``. If this directory
  exists and is writable by the current user, sys.bytecodebase will be
  set to that value. If not, a warning will be emitted and
  sys.bytecodebase will be set to None.

- The bytecode base is ``C:\TEMP``. No urllib.pyc file is found.
  When urllib is imported, the generated bytecode file is written to
  the augmented directory which has the value ``C:\TEMP\C\PYTHON22``.
  Intermediate directories will be created as needed.
Implementation
==============
See the patch on Sourceforge. [6]_
References
==========
.. [1] patch 602345, Option for not writing py.[co] files, Klose
   (http://www.python.org/sf/602345)

.. [2] python-dev thread, Disable writing .py[co], Norwitz
   (http://mail.python.org/pipermail/python-dev/2003-January/032270.html)

.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
   (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)

.. [4] python-dev thread, Parallel pyc construction, Dubois
   (http://mail.python.org/pipermail/python-dev/2003-January/032060.html)

.. [5] PEP 302, New Import Hooks, van Rossum and Moore
   (http://www.python.org/dev/peps/pep-0302.html)

.. [6] patch 677103, PYTHONBYTECODEBASE patch (PEP 304), Montanaro
   (http://www.python.org/sf/677103)
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: