From ae271889987b46fde8bb4e7e45479353f6dd0aa1 Mon Sep 17 00:00:00 2001
From: Skip Montanaro
Date: Thu, 23 Jan 2003 17:26:42 +0000
Subject: [PATCH] new pep - initial feedback from python-dev incorporated.

---
 pep-0304.txt | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 289 insertions(+)
 create mode 100644 pep-0304.txt

diff --git a/pep-0304.txt b/pep-0304.txt
new file mode 100644
index 000000000..19137b0bf
--- /dev/null
+++ b/pep-0304.txt
@@ -0,0 +1,289 @@
+PEP: 304
+Title: Controlling generation of bytecode files
+Version: $Revision$
+Last-Modified: $Date$
+Author: Skip Montanaro
+Status: Active
+Type: Draft
+Content-Type: text/x-rst
+Created: 22-Jan-2003
+Post-History:
+
+
+Abstract
+========
+
+This PEP outlines a mechanism for controlling the generation and
+location of compiled Python bytecode files. This idea originally
+arose as a patch request [1]_ and evolved into a discussion thread on
+the python-dev mailing list [2]_. The introduction of an environment
+variable will allow people installing Python or Python-based
+third-party packages to control whether or not bytecode files
+should be generated, and if so, where they should be written.
+
+
+Proposal
+========
+
+Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
+environment variables which Python understands. Its interpretation
+is:
+
+- If not present Python bytecode is generated in exactly the same way
+  as is currently done. sys.pythonbytecodebase is set to the root
+  directory (either / on Unix or the root directory of the startup
+  drive -- typically ``C:\`` -- on Windows).
+
+- If present and it refers to an existing directory,
+  sys.pythonbytecodebase is set to that directory and bytecode files
+  are written into a directory structure rooted at that location.
+
+- If present but empty, sys.pythonbytecodebase is set to None and
+  generation of bytecode files is suppressed altogether.
+
+- If present and it does not refer to an existing directory, a warning
+  is displayed, sys.pythonbytecodebase is set to None and generation
+  of bytecode files is suppressed altogether.
+
+After startup, all runtime references are to sys.pythonbytecodebase,
+not the PYTHONBYTECODEBASE environment variable. sys.path is not
+modified.
+
+
+Glossary
+--------
+
+- "bytecode base" refers to the current setting of
+  sys.pythonbytecodebase.
+
+- "augmented directory" refers to the directory formed from the
+  bytecode base and the directory name of the source file.
+
+- PYTHONBYTECODEBASE refers to the environment variable when necessary
+  to distinguish it from "bytecode base".
+
+Locating bytecode files
+-----------------------
+
+When the interpreter is searching for a module, it will use sys.path
+as usual. However, when a possible bytecode file is considered, an
+extra probe for a bytecode file may be made. First, a check is made
+for the bytecode file using the directory in sys.path which holds the
+source file (the current behavior). If a valid bytecode file is not
+found there (either one does not exist or exists but is out-of-date)
+and the bytecode base is not None, a second probe is made using the
+directory in sys.path prefixed appropriately by the bytecode base.
+
+Writing bytecode files
+----------------------
+
+When the bytecode base is not None, a new bytecode file is written to
+the appropriate augmented directory, never directly to a directory in
+sys.path.
+
+
+Defining augmented directories
+------------------------------
+
+Conceptually, the augmented directory for a bytecode file is the
+directory in which the source file exists prefixed by the bytecode
+base.
+In a Unix environment this would be:
+
+    pcb = os.path.abspath(sys.pythonbytecodebase)
+    if sourcefile[0] == os.sep: sourcefile = sourcefile[1:]
+    augdir = os.path.join(pcb, os.path.dirname(sourcefile))
+
+On Windows, which does not have a single-rooted directory tree, the
+drive letter of the directory containing the source file is treated as
+a directory component after removing the trailing colon. The
+augmented directory is thus derived as
+
+    pcb = os.path.abspath(sys.pythonbytecodebase)
+    drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
+    drive = drive[:-1]
+    if base[0] == "\\": base = base[1:]
+    augdir = os.path.join(pcb, drive, base)
+
+Fixing the location of the bytecode base
+----------------------------------------
+
+During program startup, the value of the PYTHONBYTECODEBASE
+environment variable is made absolute, checked for validity and added
+to the sys module, effectively:
+
+    pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
+    try:
+        probe = os.path.join(pcb, "foo")
+        # close the probe file before removing it
+        open(probe, "w").close()
+        os.unlink(probe)
+        sys.pythonbytecodebase = pcb
+    except (IOError, OSError):
+        sys.pythonbytecodebase = None
+
+This allows the user to specify the bytecode base as a relative path,
+but not have it subject to changes to the current working directory.
+(I can't imagine you'd want it to move around during program
+execution.)
+
+There is nothing special about sys.pythonbytecodebase. The user may
+change it at runtime if she so chooses, but normally it will not be
+modified.
+
+
+Rationale
+=========
+
+In many environments it is not possible for non-root users to write
+into directories containing Python source files. Most of the time,
+this is not a problem as Python source is generally byte compiled
+during installation. However, there are situations where bytecode
+files are either missing or need to be updated.
+If the directory
+containing the source file is not writable by the current user a
+performance penalty is incurred each time a program importing the
+module is run. [3]_ Warning messages may also be generated in certain
+circumstances. If the directory is writable, nearly simultaneous
+attempts to write the bytecode file by two separate processes
+may occur, resulting in file corruption. [4]_
+
+In environments with ramdisks available, it may be desirable for
+performance reasons to write bytecode files to a directory on such a
+disk. Similarly, in environments where Python source code resides on
+network file systems, it may be desirable to cache bytecode files on
+local disks.
+
+
+Alternatives
+============
+
+The only other alternative proposed so far [1]_ seems to be to add a
+-R flag to the interpreter to disable writing bytecode files
+altogether. This proposal subsumes that. Adding a command-line
+option is certainly possible, but is probably not sufficient, as the
+interpreter's command line is not readily available during
+installation.
+
+
+Issues
+======
+
+- Interpretation of a module's __file__ attribute. I believe the
+  __file__ attribute of a module should reflect the true location of
+  the bytecode file. If people want to locate a module's source code,
+  they should use imp.find_module(module).
+
+- Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
+  present a security risk, but so can many other things the root user
+  does. The root user should probably not set PYTHONBYTECODEBASE
+  except during installation. Still, perhaps this problem can be
+  minimized. When running as root the interpreter should check to see
+  if PYTHONBYTECODEBASE refers to a directory which is writable by
+  anyone other than root. If so, it could raise an exception or
+  warning and set sys.pythonbytecodebase to None. Or, see the next
+  item.
+
+- More security - What if PYTHONBYTECODEBASE refers to a general
+  directory (say, /tmp)?
+  In this case, perhaps loading of a
+  preexisting bytecode file should occur only if the file is owned by
+  the current user or root. (Does this matter on Windows?)
+
+
+Examples
+========
+
+In the examples which follow, the urllib source code resides in
+/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
+is not writable by the current user.
+
+- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
+  is valid. When urllib is imported, the contents of
+  /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
+  not consulted. No other bytecode file is generated.
+
+- The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists,
+  but is out-of-date. When urllib is imported, the generated bytecode
+  file is written to urllib.pyc in the augmented directory.
+  Intermediate directories will be created as needed.
+
+- The bytecode base is None. No urllib.pyc file is found. When
+  urllib is imported, no bytecode file is written.
+
+- The bytecode base is /tmp. No urllib.pyc file is found. When
+  urllib is imported, the generated bytecode file is written to the
+  augmented directory, creating intermediate directories as needed.
+
+- At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
+  A warning is emitted, sys.pythonbytecodebase is set to None and no
+  bytecode files are written during program execution unless
+  sys.pythonbytecodebase is later changed to refer to a valid,
+  writable directory.
+
+- At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
+  writable by the current user. A warning is emitted,
+  sys.pythonbytecodebase is set to None and no bytecode files are
+  written during program execution unless sys.pythonbytecodebase is
+  later changed to refer to a valid, writable directory. Note that
+  even though the augmented directory constructed for a particular
+  bytecode file may be writable by the current user, what counts is
+  that the bytecode base directory itself is writable.
+
+- At startup, PYTHONBYTECODEBASE is set to the empty string.
+  sys.pythonbytecodebase is set to None. No warning is generated,
+  however. If no urllib.pyc file is found when urllib is imported, no
+  bytecode file is written.
+
+In the Windows examples which follow, the urllib source code resides
+in ``C:\PYTHON22\urllib.py``. ``C:\PYTHON22`` is in sys.path but is
+not writable by the current user.
+
+- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
+  exists and is valid. When urllib is imported, the contents of
+  ``C:\PYTHON22\urllib.pyc`` are used. The augmented directory is not
+  consulted.
+
+- The bytecode base is set to ``C:\TEMP``. ``C:\PYTHON22\urllib.pyc``
+  exists, but is out-of-date. When urllib is imported, a new bytecode
+  file is written to the augmented directory. Intermediate
+  directories will be created as needed.
+
+- At startup, PYTHONBYTECODEBASE is set to ``TEMP`` and the current
+  working directory at application startup is ``H:\NET``. The
+  potential bytecode base is thus ``H:\NET\TEMP``. If this directory
+  exists and is writable by the current user, sys.pythonbytecodebase
+  will be set to that value. If not, a warning will be emitted and
+  sys.pythonbytecodebase will be set to None.
+
+- The bytecode base is ``C:\TEMP``. No urllib.pyc file is found.
+  When urllib is imported, the generated bytecode file is written to
+  the augmented directory, creating intermediate directories as
+  needed.
+
+References
+==========
+
+.. [1] patch 602345, Option for not writing py.[co] files, Klose
+       (http://www.python.org/sf/602345)
+
+.. [2] python-dev thread, Disable writing .py[co], Norwitz
+       (http://mail.python.org/pipermail/python-dev/2003-January/032270.html)
+
+.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
+       (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)
+
+.. 
[4] python-dev thread, Parallel pyc construction, Dubois
+       (http://mail.python.org/pipermail/python-dev/2003-January/032060.html)
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   End: