Updated PEP 3147 with latest BDFL pronouncement.

Barry Warsaw 2010-03-03 14:11:24 +00:00
parent 1f032d497c
commit d142f19eb9
3 changed files with 60 additions and 74 deletions

(Binary files not shown. Image changed from 47 KiB to 64 KiB.)

@@ -8,7 +8,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 2009-12-16
Python-Version: 3.2
Post-History: 2010-01-30, 2010-02-25
Post-History: 2010-01-30, 2010-02-25, 2010-03-03
Abstract
@@ -75,7 +75,7 @@ Python release was added or removed from the distribution. Because of
the sheer number of packages available, this amount of work is
infeasible.
Even C extensions can be source compatible across multiple versions of
C extensions can be source compatible across multiple versions of
Python. Compiled extension modules are usually not compatible though,
and PEP 384 [7]_ has been proposed to address this by defining a
stable ABI for extension modules.
@@ -101,10 +101,9 @@ Python's import machinery is extended to write and search for byte
code cache files in a single directory inside every Python package
directory. This directory will be called `__pycache__`.
Further, pyc files will contain a magic string that
differentiates the Python version they were compiled for. This allows
multiple byte compiled cache files to co-exist for a single Python
source file.
Further, pyc files will contain a magic string that differentiates the
Python version they were compiled for. This allows multiple byte
compiled cache files to co-exist for a single Python source file.
This scheme has the added benefit of reducing the clutter in a Python
package directory.
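A minimal sketch of the naming scheme described above, assuming the version-specific tag is simply passed in as a string. The function name `cache_path` and the tag values are hypothetical; the real import machinery computes the tag itself (Python 3.2 later exposed this mapping as `imp.cache_from_source`).

```python
import os


def cache_path(source_path: str, magic_tag: str) -> str:
    """Map foo.py to __pycache__/foo.<magic_tag>.pyc in the same directory.

    Hypothetical helper for illustration; magic_tag stands in for the
    per-interpreter identifier discussed under "Magic identifiers".
    """
    directory, filename = os.path.split(source_path)
    stem, ext = os.path.splitext(filename)
    if ext != ".py":
        raise ValueError(f"expected a .py source file, got {source_path!r}")
    return os.path.join(directory, "__pycache__", f"{stem}.{magic_tag}.pyc")


# Two interpreters with different tags get separate cache files for the
# same source, so their byte code can co-exist in one directory.
src = os.path.join("alpha", "one.py")
print(cache_path(src, "tagA"))
print(cache_path(src, "tagB"))
```

Because every cache file lives in the single `__pycache__` directory, the package directory itself holds only source files.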
@@ -112,8 +111,8 @@ package directory.
What would this look like in practice?
Let's say we have a Python package named `alpha` which contains a
sub-package named `beta`. The source directory layout might look like
this::
sub-package named `beta`. The source directory layout before byte
compilation might look like this::
    alpha/
        __init__.py
@@ -144,6 +143,8 @@ following layout::
            three.py
            four.py
*Note: listing order may differ depending on the platform.*
Let's say that two new versions of Python are installed, one of which
is Python 3.3 and the other Unladen Swallow. After byte compilation,
the file system would look like this::
@@ -240,23 +241,29 @@ It's possible that the `foo.py` file somehow got removed, while
leaving the cached pyc file still on the file system. If the
`__pycache__/foo.<magic>.pyc` file exists, but the `foo.py` file used
to create it does not, Python will raise an `ImportError` when asked
to import foo. In other words, by default, Python will not support
importing a module unless the source file exists.
Python users who want to deploy sourceless imports are instructed to
create a custom importer that supports this behavior. Options include
importing pycs from a zip file, or locating pyc files where the py
source file would have existed. (See the Open Issues section for more
discussion.)
to import foo. In other words, Python will not import a pyc file from
the cache directory unless the source file exists.
Case 4: legacy pyc files
------------------------
Case 4: legacy pyc files and source-less imports
------------------------------------------------
Python will ignore all legacy pyc files when a source file exists next
to it. In other words, if a `foo.pyc` file exists next to the
`foo.py` file, the pyc file will be ignored in all cases.
In order to continue to support source-less distributions though, if
the source file is missing, Python will import a lone pyc file if it
lives where the source file would have been.
Case 5: read-only file systems
------------------------------
When the source lives on a read-only file system, or the `__pycache__`
directory or pyc file cannot otherwise be written, all the same rules
apply.
Python will ignore all legacy pyc files. In other words, if a
`foo.pyc` file exists next to the `foo.py` file, it will be ignored in
all cases, including sourceless deployments. Python users wishing to
support this use case can create a custom importer.
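The lookup rules in Cases 3 to 5, in their revised form (a lone legacy pyc is imported when the source is missing), can be modelled as a single decision function. This is a simplified sketch, not the importer itself: `find_module_file` is a hypothetical name, and it only checks whether files exist, skipping the timestamp comparison that decides whether a cached pyc is stale.

```python
import os


def find_module_file(directory: str, name: str, magic_tag: str) -> str:
    """Return the path the importer would load for module ``name``.

    Hypothetical helper modelling Cases 3-5; it checks only whether files
    exist and skips the staleness check on cached pyc files.
    """
    source = os.path.join(directory, name + ".py")
    cached = os.path.join(directory, "__pycache__", f"{name}.{magic_tag}.pyc")
    legacy = os.path.join(directory, name + ".pyc")

    if os.path.exists(source):
        # A legacy foo.pyc next to foo.py is ignored in this case.
        # Use the cached pyc when present; otherwise the source is compiled.
        # A read-only directory (Case 5) only means the result cannot be
        # cached; the choice of file is the same.
        return cached if os.path.exists(cached) else source
    if os.path.exists(legacy):
        # Source-less distribution: a lone pyc where foo.py would have been.
        return legacy
    # A __pycache__/foo.<magic>.pyc without its foo.py does not satisfy
    # the import.
    raise ImportError(f"No module named {name!r}")
```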
Flow chart
@@ -273,7 +280,7 @@ Magic identifiers
pyc files inside of the `__pycache__` directories contain a magic
identifier in their file names. These are mnemonic tags for the
actual magic numbers used by the importer. For example, for Python
actual magic numbers used by the importer. For example, in Python
3.2, we could use the hexlified [10]_ magic number as a unique
identifier::
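To make the hexlified identifier concrete, here is a minimal sketch that derives it from the running interpreter's magic number. It assumes Python 3.4 or later, where the magic number is exposed as `importlib.util.MAGIC_NUMBER`; the helper name `hexlified_magic_tag` is hypothetical, and the printed value depends on the Python version.

```python
import binascii
import importlib.util


def hexlified_magic_tag(magic: bytes = importlib.util.MAGIC_NUMBER) -> str:
    """Render a pyc magic number as a hex string usable in a file name."""
    # The magic number is 4 bytes (a 2-byte version value plus b"\r\n"),
    # so the resulting tag is 8 hex characters.
    return binascii.hexlify(magic).decode("ascii")


print(hexlified_magic_tag())  # value depends on the interpreter version
```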
@@ -402,8 +409,8 @@ possible to backport this PEP. However, in Python 3.2 (and possibly
2.7), this behavior will be turned on by default, and in fact, it will
replace the old behavior. Backports will need to support the old
layout by default. We suggest supporting PEP 3147 through the use of
an environment variable called `$PYTHONCACHEDIR` or the command line
switch `-Xcachedir` to enable the feature.
an environment variable called `$PYTHONENABLECACHEDIR` or the command
line switch `-Xenablecachedir` to enable the feature.
Alternatives
@@ -482,58 +489,40 @@ implementation remain in sync.
Open issues
===========
Byte code only packages
-----------------------
__pycache__ vs. __cachepy__
-----------------------------
Some users of Python distribute packages containing only the byte code
files (pyc). The use cases for this are to make it more difficult for
end-users to view the source code, and to reduce maintenance burdens
when end users casually edit the source files.
Minor point, but `__pycache__` sorts after `__init__.py` alphabetically,
so that might be a little jarring (see the directory layout examples
above). It seems that `ls(1)` on Linux, at least, also sorts the files
alphabetically, ignoring the leading underscores.
This PEP currently promotes no default support for bytecode-only
packages. The primary motivation for this is that we can reduce stat
calls if the importer only looks for .py files, making Python start-up
and import faster.
Should we name the cache directory something like `__cachepy__` so
that it sorts before `__init__.py`? OTOH, many graphical file system
navigators sort directories before plain files anyway, so maybe it
doesn't matter.
The question is how to balance the requirements of bytecode-only users
against the more universally beneficial faster start-up times gained by
requiring source files. Should all Python users pay the extra stat
call penalty in the general case for a minority use case by default?
Evidence shows that the extra stats can be fairly costly to start-up
time.
Here is some sample `ls -l` output. First, with `__pycache__`::
There are several ways out of this. Should we decide that it's
important enough to support bytecode-only packages, the semantics
would be as follows:
    % ls -l
    total 8
    -rw-rw-r-- 1 user user    0 2010-03-03 08:29 alpha.py
    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 beta/
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 __init__.py
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 one.py
    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 __pycache__/
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 two.py
* If there is a traditional, non-magic-tagged .pyc file in the
location where a .py file should be found, it will satisfy the
import.
* The `__file__` attribute of the module will point to the .pyc file.
* The `__cached__` attribute of the module will point to the .pyc file
too.
* The existence of a matching `__pycache__/foo.<magic>.pyc` file
without the source py file will *not* satisfy the import. This
means that if the source file is removed, the pyc file will be
ignored (unlike in today's implementation).
Now, with `__cachepy__`::
Other ways to satisfy the bytecode-only packagers' requirements would
have less impact on the general Python user population, and include:
* Add a `-X` switch and/or environment variable to enable
the bytecode-only search algorithm.
* Let those who want more protection against casual py hackers package
their code in a zip file, which is supported today. Sub-options
include supporting pyc-only imports only in zip files, or still
requiring the py file for zip imports.
* Provide a custom importer supporting bytecode-only packages, which
would have to be enabled explicitly by the application. Either
Python would provide such a custom importer or it would be left to
third parties to implement.
* Add a marker to a package's `__init__.py` file to enable
bytecode-only imports for everything else in the package.
* Leave it to third-party tools such as py2exe [20]_ to build an
ecosystem and standards around source-less distributions.
    % ls -l
    total 8
    -rw-rw-r-- 1 user user    0 2010-03-03 08:29 alpha.py
    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 beta/
    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 __cachepy__/
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 __init__.py
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 one.py
    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 two.py
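The sorting concern can be checked with a small sketch using the names from the two sample listings (directories without the trailing slash). The key function `ls_like_key` is a hypothetical, rough stand-in for the locale-aware collation that `ls` uses in common locales, which ignores underscores when comparing; plain code-point order is shown for contrast.

```python
# Names from the sample listings; directories shown without the trailing "/".
pycache_layout = ["alpha.py", "beta", "__init__.py", "one.py", "__pycache__", "two.py"]
cachepy_layout = [n.replace("__pycache__", "__cachepy__") for n in pycache_layout]


def ls_like_key(name: str) -> str:
    # Rough approximation of locale-aware collation: drop underscores and
    # compare case-insensitively. Real collation rules are more involved.
    return name.replace("_", "").lower()


print(sorted(pycache_layout))
# ['__init__.py', '__pycache__', 'alpha.py', 'beta', 'one.py', 'two.py']
print(sorted(pycache_layout, key=ls_like_key))
# ['alpha.py', 'beta', '__init__.py', 'one.py', '__pycache__', 'two.py']
print(sorted(cachepy_layout, key=ls_like_key))
# ['alpha.py', 'beta', '__cachepy__', '__init__.py', 'one.py', 'two.py']
```

Under the approximated `ls` ordering, `__pycache__` lands between `one.py` and `two.py`, while `__cachepy__` lands just before `__init__.py`, matching the two listings above. In plain code-point order, `__init__.py` precedes `__pycache__` as well.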
__cached__ vs. __compiled__
@@ -592,9 +581,6 @@ References
.. [18] importlib: http://docs.python.org/3.1/library/importlib.html
.. [19] http://mail.python.org/pipermail/python-dev/2010-March/098042.html
.. [20] py2exe: http://www.py2exe.org/
ACKNOWLEDGMENTS
===============