PEP: 3147
Title: PYC Repository Directories
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2009-12-16
Python-Version: 3.2
Post-History:



Abstract
========

This PEP describes an extension to Python's import mechanism which
improves sharing of Python source code files among multiple installed
versions of the Python interpreter. It does this by allowing many
different byte compilation files (.pyc files) to be co-located with
the Python source file (.py file). The extension described here can
also be used to support different Python compilation caches, such as
JIT output that may be produced by an Unladen Swallow [1]_ enabled
CPython.



Rationale
=========

Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
than one Python version at the same time to their users. For example,
Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
Python 2.6 being the default.

In order to ease the burden on operating system packagers for these
distributions, the distribution packages do not contain Python version
numbers [4]_; they are shared across all Python versions installed on
the system. Putting Python version numbers in the packages would be a
maintenance nightmare, since all the packages - *and their
dependencies* - would have to be updated every time a new Python
release was added to or removed from the distribution. Because of the
sheer number of packages available, this amount of work is infeasible.


For pure Python modules, sharing is possible because upstream
maintainers typically support multiple versions of Python in a source
compatible way. In practice though, it is well known that pyc files
are not compatible across Python major releases. A reading of
import.c [5]_ in the Python source code shows that within recent
memory, every new CPython major release has bumped the pyc magic
number.

Even C extensions can be source compatible across multiple versions of
Python. Compiled extension modules are usually not compatible though,
and PEP 384 [6]_ has been proposed to address this by defining a
stable ABI for extension modules.

Because the distributions cannot share pyc files, elaborate mechanisms
have been developed to put the resulting pyc files in non-shared
locations while the source code is still shared. Examples include the
symlink-based Debian regimes python-support [7]_ and python-central
[8]_. These approaches make for much more complicated, fragile,
inscrutable, and fragmented policies for delivering Python
applications to a wide range of users. Arguably more users get Python
from their operating system vendor than from upstream tarballs. Thus,
solving this pyc sharing problem for CPython is a high priority for
such vendors.

This PEP proposes a solution to this problem.



Proposal
========

Python's import machinery is extended to search for byte code cache
files in a directory that is co-located with the source file and has
the same base name, but with the extension 'pyr'. The pyr directory
contains individual files with the cached byte compilation of the
source code, identical to current pyc and pyo files. The files inside
the pyr directory retain their file extensions, but the base name is
replaced by the hexlified [10]_ magic number of the Python version the
byte code is compatible with.

The file extension pyr was chosen because 'r' is a mnemonic for
'repository', and there appear to be no prior uses of the extension
[9]_.

For example, a module `foo` with source code in `foo.py` and byte
compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Python 2.6
`-U`, and Python 3.1 would have the following file system layout::

    foo.py
    foo.pyr/
        f2b30a0d.pyc # Python 2.5
        f2d10a0d.pyc # Python 2.6
        f2d10a0d.pyo # Python 2.6 -O
        f2d20a0d.pyc # Python 2.6 -U
        0c4f0a0d.pyc # Python 3.1



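
The naming scheme can be illustrated with a minimal sketch. The
helper name `pyr_cache_path` and its parameters are hypothetical and
not part of this PEP; the sketch assumes a Python that provides
`importlib.util.MAGIC_NUMBER` and simply hexlifies those bytes in the
order they are stored.

```python
import binascii
import importlib.util
import os.path


def pyr_cache_path(source_path, magic=importlib.util.MAGIC_NUMBER,
                   optimized=False):
    """Return the proposed cache file path for a .py source file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    base, ext = os.path.splitext(source_path)
    if ext != '.py':
        raise ValueError('expected a .py file: %r' % (source_path,))
    # 'foo.py' -> 'foo.pyr', a directory next to the source file.
    directory = base + '.pyr'
    # The base name of the cache file is the hexlified magic number.
    name = binascii.hexlify(magic).decode('ascii')
    # The traditional extension is retained inside the directory.
    return os.path.join(directory, name + ('.pyo' if optimized else '.pyc'))


if __name__ == '__main__':
    print(pyr_cache_path('foo.py'))
```

Passing a different `magic` value yields the file name another
interpreter version would look for in the same directory.
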
Python behavior
===============

When Python searches for a module to import (say `foo`), it may
encounter one of several situations. As per current Python rules, the
term "matching pyc" means that the magic number matches the current
interpreter's magic number, and the source file is not newer than the
`pyc` file.

When Python finds a `foo.py` file for which no `foo.pyc` file or
`foo.pyr` directory exists, Python will by default load the `foo.py`
file and write a `foo.pyc` file next to the source file. This is
unchanged from current behavior.

When the Python executable is given a `-R` flag, or the environment
variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
directory and write a `pyc` file to that directory with the hexlified
magic number as the base name.

If during import, Python finds an existing `pyc` file but no `pyr`
directory, and the `$PYTHONPYR` environment variable is not set, then
the `pyc` file is loaded as normal and no `pyr` directory is created.

If during import, Python finds a `pyr` directory with a matching `pyc`
file, *regardless of whether `$PYTHONPYR` is set or not*, then
`foo.pyr/<magic>.pyc` is loaded and import completes successfully.
Thus a matching `pyc` file inside a `pyr` directory always takes
precedence over a sibling `pyc` file.

If during import, Python finds a `pyr` directory that does not contain
a matching `pyc` file, and no sibling `foo.pyc` file exists, Python
will load the source file and write a sibling `foo.pyc` file, unless
the `-R` flag is given in which case a `foo.pyr/<magic>.pyc` file will
be written.

Here is a flowchart illustrating the rules.

.. image:: pep-3147-1.png
   :scale: 75



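
The rules in this section can also be sketched as a small decision
function. This is only an illustration under stated assumptions: the
name `plan_import`, its arguments, and the returned tuple are
hypothetical, and one case the text leaves open (a matching sibling
`foo.pyc` while `-R` or `$PYTHONPYR` is in effect) is resolved here by
simply loading the sibling file.

```python
def plan_import(pyr_has_match, sibling, use_pyr):
    """Decide what to load and what to write for ``import foo``.

    pyr_has_match -- True if foo.pyr/<magic>.pyc exists and matches.
    sibling       -- state of the sibling foo.pyc file:
                     'absent', 'matching' or 'stale'.
    use_pyr       -- True if -R was given or $PYTHONPYR is set.

    Returns (load_from, write_to); write_to is None if nothing is
    written. Hypothetical helper, not part of the PEP.
    """
    if sibling not in ('absent', 'matching', 'stale'):
        raise ValueError('unknown sibling state: %r' % (sibling,))
    pyr_pyc = 'foo.pyr/<magic>.pyc'
    if pyr_has_match:
        # A matching pyc inside the pyr directory always wins.
        return (pyr_pyc, None)
    if sibling == 'matching':
        # Stated in the text only for the case without -R/$PYTHONPYR;
        # ASSUMPTION: the sibling is also loaded when use_pyr is True.
        return ('foo.pyc', None)
    if use_pyr:
        # Compile the source and write into the pyr directory; a stale
        # sibling pyc is neither removed nor overwritten.
        return ('foo.py', pyr_pyc)
    # Default behavior: compile and (over)write the sibling pyc.
    return ('foo.py', 'foo.pyc')


if __name__ == '__main__':
    print(plan_import(False, 'absent', True))
```

Keeping the decision separate from any file system access makes each
rule easy to check against the prose one case at a time.
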
Effects on non-conforming Python versions
=========================================

Python implementations which don't know anything about `pyr`
directories will ignore them. This means that they will read and
write `pyc` files as usual. A conforming implementation will still
prefer any existing `foo.pyr/<magic>.pyc` file over an existing
sibling `pyc` file.

The one possible conflicting state is where a sibling `pyc` file
exists, but its magic number does not match.

In the default case, when Python finds a `pyc` file with a
non-matching magic number, it simply overwrites the `pyc` file with
the new byte code and magic number. In the absence of the `-R` flag,
this remains unchanged. When the `-R` flag is given, the
non-matching sibling `pyc` file is ignored - it is neither removed nor
overwritten - and a `foo.pyr/<magic>.pyc` file is written instead.



Implementation strategy
=======================

This feature is targeted for Python 3.2, solving the problem for that
and all future versions. It may be back-ported to Python 2.7.
Vendors are free to backport the changes to earlier distributions as
they see fit.



Alternatives
============

PEP 304
-------

There is some overlap between the goals of this PEP and PEP 304 [12]_,
which has been withdrawn. However, PEP 304 would allow a user to
create a shadow file system hierarchy in which to store `pyc` files.
This concept of a shadow hierarchy for `pyc` files could be used to
satisfy the aims of this PEP. Although PEP 304 does not indicate
why it was withdrawn, shadow directories have a number of problems.
The location of the shadow `pyc` files would not be easily discovered
and would depend on the proper and consistent use of the
`$PYTHONBYTECODEBASE` environment variable both by the system and by
end users. There are also global implications, meaning that while the
system might want to shadow `pyc` files, users might not want to, but
the PEP defines only an all-or-nothing approach.


As an example of the problem, a common (though fragile) Python idiom
for locating data files is to do something like this::

    from os.path import dirname, join
    import foo.bar
    data_file = join(dirname(foo.bar.__file__), 'my.dat')


This would be problematic since `foo.bar.__file__` will give the
location of the `pyc` file in the shadow directory, and it may not be
possible to find the `my.dat` file relative to the source directory
from there.


On the other hand, this PEP keeps all byte code artifacts co-located
with the source file. Some adjustment will have to be made for the
fact that the `pyc` file lives in a subdirectory. For example, in
current Python, when you import a module, its `__file__` attribute
points to its `pyc` file. A package's `__file__` points to the `pyc`
file for its `__init__.py`. E.g.::

    >>> import foo
    >>> foo.__file__
    'foo.pyc'
    # baz is a package
    >>> import baz
    >>> baz.__file__
    'baz/__init__.pyc'


The implementation of this PEP would have to ensure that `__file__`
reports the same directory level as it does without the `pyr`
directory, so that the common idiom above continues to work::

    >>> import foo
    >>> foo.__file__
    'foo.pyr'
    # baz is a package
    >>> import baz
    >>> baz.__file__
    'baz/__init__.pyr'


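
A small sketch shows why the directory level matters for the data
file idiom. The function name `data_file_for` and the sample paths are
made up for illustration; only `os.path.dirname` and `os.path.join`
from the standard library are used.

```python
from os.path import dirname, join


def data_file_for(module_file):
    """Locate 'my.dat' next to a module, as in the common idiom."""
    return join(dirname(module_file), 'my.dat')


# __file__ as reported today and as proposed by this PEP: same directory.
today = data_file_for('pkg/foo.pyc')
proposed = data_file_for('pkg/foo.pyr')

# If __file__ instead named the file inside the pyr directory, the
# idiom would look one level too deep.
too_deep = data_file_for('pkg/foo.pyr/f2d10a0d.pyc')

if __name__ == '__main__':
    print(today == proposed, today == too_deep)
```

Reporting the `pyr` directory itself, rather than a file within it,
is what keeps `dirname()` pointing at the source directory.
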
Note that some existing Python code only checks for `.py` and `.pyc`
file extensions (and possibly `.pyo`). These would have to be
extended to also check for `.pyr` extensions.



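
As a rough illustration of the kind of change such code would need,
the following sketch maps a module's `__file__` value back to its
source file. The helper `source_path` and the `CACHE_EXTENSIONS` tuple
are hypothetical names, not an existing API.

```python
import os.path

# Extensions that denote cached byte code rather than source.
# '.pyr' is the addition this PEP would require (hypothetical list).
CACHE_EXTENSIONS = ('.pyc', '.pyo', '.pyr')


def source_path(module_file):
    """Map a module's __file__ value to its .py source path.

    Paths with any other extension (for example extension modules)
    are returned unchanged.
    """
    base, ext = os.path.splitext(module_file)
    if ext in CACHE_EXTENSIONS:
        return base + '.py'
    return module_file


if __name__ == '__main__':
    for name in ('foo.pyc', 'foo.pyr', 'baz/__init__.pyr', 'ext.so'):
        print(name, '->', source_path(name))
```

Code that hard-codes only `.pyc` and `.pyo` would return `foo.pyr`
unchanged, which is the failure mode the paragraph above warns about.
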
Fat byte compilation files
--------------------------

An earlier version of this PEP described "fat" Python byte code files.
These files would contain the equivalent of multiple `pyc` files in a
single `pyf` file, with a lookup table keyed off the appropriate magic
number. This was an extensible file format so that the first 5
parallel Python implementations could be supported fairly efficiently,
but with extension lookup tables available to scale `pyf` byte code
objects as large as necessary.

The fat byte compilation files were fairly complex, so the current
simplification of using directories was suggested.



Multiple file extensions
------------------------

The PEP author also considered an approach where multiple thin byte
compiled files lived in the same place, but used different file
extensions to designate the Python version, e.g. foo.pyc25,
foo.pyc26, foo.pyc31, etc. This was rejected because of the clutter
involved in writing so many different files. The multiple extension
approach also makes it more difficult (and an ongoing task) to update
any tools that are dependent on the file extension.



Open questions
==============

* Are there any concurrency issues added by this PEP, above those that
  already exist? For example, what if two Python processes attempt to
  write the same `<magic>.pyc` file? Is that any different than two
  Python processes trying to write to the same `foo.pyc` file?
  Current thinking is that there are none, since the exclusive open
  mechanism currently used will still be used to open `pyc` files
  inside a `pyr` directory.

* How do the imp [13]_ and importlib [14]_ modules need to be updated
  to conform to the `pyr` directories?

* What about `py` source files that are compatible with most but not
  all installed Python versions? We might need a way to say "this py
  file should be hidden from Python versions X.Y or earlier". There
  are three options:

  - Use file system tricks to only share py files that are actually
    sharable in all installed Python versions (e.g. different search
    directories for Python X.Y and Python X.Z).
  - Introduce Python syntax that is legal before __future__ imports
    and is evaluated to determine if the py file is compatible,
    raising an `ImportError('no module named foo')` if not.
  - Add an optional metadata file co-located with the py file that
    declares which Python versions it is compatible with.

  How does this requirement interact with PEP 382 namespace packages [15]_?

* Are there any opportunities for also sharing extension modules
  (.so/.dll files) in a `pyr` directory?

* Would a moratorium on byte code changes, similar to the language
  moratorium described in PEP 3003 [16]_, be a better approach to
  pursue, and would that solve the problem for vendors? At the time
  of this writing, PEP 3003 is silent on the issue.



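
For the concurrency question, the idea of an exclusive open can be
sketched in Python as follows. This is not the interpreter's actual
code: the function name `write_exclusive` is hypothetical, and the
sketch assumes the mechanism amounts to creating the file with
`O_CREAT | O_EXCL`, so that only one process succeeds in creating a
given cache file.

```python
import os


def write_exclusive(path, data):
    """Create path and write data to it, unless the file already exists.

    Returns True if this call created the file, False if another
    writer got there first. Hypothetical helper for illustration.
    """
    # O_BINARY only exists on some platforms; fall back to 0 elsewhere.
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY | getattr(os, 'O_BINARY', 0)
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        # Losing the race is harmless: the file is only a cache.
        return False
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return True
```

Under this assumption the path of the cache file is irrelevant to the
race, so `foo.pyr/<magic>.pyc` would behave the same as `foo.pyc`.
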
Reference implementation
========================

TBD



References
==========

.. [1] PEP 3146

.. [2] Ubuntu: <http://www.ubuntu.com>

.. [3] Debian: <http://www.debian.org>

.. [4] Debian Python Policy:
   http://www.debian.org/doc/packaging-manuals/python-policy/

.. [5] import.c:
   http://svn.python.org/view/python/branches/py3k/Python/import.c?view=markup

.. [6] PEP 384

.. [7] python-support:
   http://wiki.debian.org/DebianPythonFAQ#Whatispython-support.3F

.. [8] python-central:
   http://wiki.debian.org/DebianPythonFAQ#Whatispython-central.3F

.. [9] http://www.filesuffix.com/?m=search&e=pyr&submit=Search

.. [10] binascii.hexlify():
   http://www.python.org/doc/current/library/binascii.html#binascii.hexlify

.. [11] The marshal module:
   http://www.python.org/doc/current/library/marshal.html

.. [12] PEP 304

.. [13] imp: http://www.python.org/doc/current/library/imp.html

.. [14] importlib: http://docs.python.org/3.1/library/importlib.html

.. [15] PEP 382

.. [16] PEP 3003



ACKNOWLEDGMENTS
===============

Barry Warsaw's original idea was for fat Python byte code files.
Martin von Loewis reviewed an early draft of the PEP and suggested the
simplification to store traditional `pyc` and `pyo` files in a
directory. Many other people reviewed early versions of this PEP and
provided useful feedback including:

* David Malcolm
* Josselin Mouette
* Matthias Klose
* Michael Hudson
* Michael Vogt
* Piotr Ożarowski
* Scott Kitterman
* Toshio Kuratomi



Copyright
=========

This document has been placed in the public domain.



Notes from python-dev
=====================

The python-dev discussion has been very fruitful. Here are some
in-progress notes from that thread which still need to be reconciled
into the body of the PEP.

* Rarity of the use of this feature. Important for distros but
  probably much less so for individual users (who may never even see
  these things).
* Sibling vs folder-per-folder. Do performance measurements. Do stat
  calls outweigh everything else? We need to do an analysis of the
  current implementation as a baseline.
* Magic numbers in file names are magical; no one really knows the
  mappings. Maybe we should use magic strings (with a lookup table?),
  e.g. 'foo.cython-27.py'
* Modules should unambiguously name their __source__ and __cache__
  file names. __file__ is ambiguous.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: