diff --git a/pep-3147-1.dia b/pep-3147-1.dia
index 5acc52f68..ea085741a 100644
Binary files a/pep-3147-1.dia and b/pep-3147-1.dia differ
diff --git a/pep-3147-1.png b/pep-3147-1.png
index a3c2e5153..930910adf 100644
Binary files a/pep-3147-1.png and b/pep-3147-1.png differ
diff --git a/pep-3147.txt b/pep-3147.txt
index d6fcab16d..a8745f0fa 100644
--- a/pep-3147.txt
+++ b/pep-3147.txt
@@ -8,7 +8,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2009-12-16
 Python-Version: 3.2
-Post-History: 2010-01-30, 2010-02-25
+Post-History: 2010-01-30, 2010-02-25, 2010-03-03
 
 
 Abstract
@@ -75,7 +75,7 @@ Python release was added or removed from the distribution. Because of
 the sheer number of packages available, this amount of work is
 infeasible.
 
-Even C extensions can be source compatible across multiple versions of
+C extensions can be source compatible across multiple versions of
 Python. Compiled extension modules are usually not compatible though,
 and PEP 384 [7]_ has been proposed to address this by defining a
 stable ABI for extension modules.
@@ -101,10 +101,9 @@ Python's import machinery is extended to write and search for byte
 code cache files in a single directory inside every Python package
 directory. This directory will be called `__pycache__`.
 
-Further, pyc files will contain a magic string that
-differentiates the Python version they were compiled for. This allows
-multiple byte compiled cache files to co-exist for a single Python
-source file.
+Further, pyc files will contain a magic string that differentiates the
+Python version they were compiled for. This allows multiple byte
+compiled cache files to co-exist for a single Python source file.
 
 This scheme has the added benefit of reducing the clutter in a Python
 package directory.
@@ -112,8 +111,8 @@ package directory.
 What would this look like in practice?
 
 Let's say we have a Python package named `alpha` which contains a
-sub-package name `beta`. The source directory layout might look like
-this::
+sub-package name `beta`. The source directory layout before byte
+compilation might look like this::
 
     alpha/
         __init__.py
@@ -144,6 +143,8 @@ following layout::
             three.py
             four.py
 
+*Note: listing order may differ depending on the platform.*
+
 Let's say that two new versions of Python are installed, one is Python
 3.3 and another is Unladen Swallow. After byte compilation, the file
 system would look like this::
@@ -240,23 +241,29 @@ It's possible that the `foo.py` file somehow got removed, while
 leaving the cached pyc file still on the file system. If the
 `__pycache__/foo.<magic>.pyc` file exists, but the `foo.py` file used
 to create it does not, Python will raise an `ImportError` when asked
-to import foo. In other words, by default, Python will not support
-importing a module unless the source file exists.
-
-Python users who want to deploy sourceless imports are instructed to
-create a custom importer that supports this behavior. Options include
-importing pycs from a zip file, or locating pyc files where the py
-source file would have existed. (See the Open Issues section for more
-discussion.)
+to import foo. In other words, Python will not import a pyc file from
+the cache directory unless the source file exists.
 
 
-Case 4: legacy pyc files
-------------------------
+Case 4: legacy pyc files and source-less imports
+------------------------------------------------
+
+Python will ignore all legacy pyc files when a source file exists next
+to it. In other words, if a `foo.pyc` file exists next to the
+`foo.py` file, the pyc file will be ignored in all cases
+
+In order to continue to support source-less distributions though, if
+the source file is missing, Python will import a lone pyc file if it
+lives where the source file would have been.
+
+
+Case 5: read-only file systems
+------------------------------
+
+When the source lives on a read-only file system, or the `__pycache__`
+directory or pyc file cannot otherwise be written, all the same rules
+apply.
 
-Python will ignore all legacy pyc files. In other words, if a
-`foo.pyc` file exists next to the `foo.py` file, it will be ignored in
-all cases, including sourceless deployments. Python users wishing to
-support this use case can create a custom importer.
 
 
 Flow chart
@@ -273,7 +280,7 @@ Magic identifiers
 
 pyc files inside of the `__pycache__` directories contain a magic
 identifier in their file names. These are mnemonic tags for the
-actual magic numbers used by the importer. For example, for Python
+actual magic numbers used by the importer. For example, in Python
 3.2, we could use the hexlified [10]_ magic number as a unique
 identifier::
 
@@ -402,8 +409,8 @@ possible to backport this PEP. However, in Python 3.2 (and possibly
 2.7), this behavior will be turned on by default, and in fact, it will
 replace the old behavior. Backports will need to support the old
 layout by default. We suggest supporting PEP 3147 through the use of
-an environment variable called `$PYTHONCACHEDIR` or the command line
-switch `-Xcachedir` to enable the feature.
+an environment variable called `$PYTHONENABLECACHEDIR` or the command
+line switch `-Xenablecachedir` to enable the feature.
 
 
 Alternatives
@@ -482,58 +489,40 @@ implementation remain in sync.
 Open issues
 ===========
 
-Byte code only packages
------------------------
+__pycache__ vs. __cachepy__
+-----------------------------
 
-Some users of Python distribute packages containing only the byte code
-files (pyc). The use cases for this are to make it more difficult for
-end-users to view the source code, and to reduce maintenance burdens
-when end users casually edit the source files.
+Minor point, but __pycache__ sorts after __init__.py alphabetically so
+that might be a little jarring (see the directory layout examples
+above). It seems that `ls(1)` on Linux at least also sorts the files
+alphabetically, ignoring the leading underscores.
 
-This PEP currently promote no default support for bytecode-only
-packages. The primary motivator for this are that we can reduce stat
-calls if the importer only looks for .py files, making Python start-up
-and import faster.
+Should we name the cache directory something like `__cachepy__` so
+that it sorts before `__init__.py`? OTOH, many graphical file system
+navigators sort directories before plain files anyway, so maybe it
+doesn't matter.
 
-The question is how to balance the requirements of bytecode-only users
-with the more universally beneficial faster start up times for
-requiring source files? Should all Python users pay the extra stat
-call penalty in the general case for a minority use case by default?
-Evidence shows that the extra stats can be fairly costly to start up
-time.
+Here are some sample `ls(1) -l` output. First, with `__pycache__`::
 
-There are several ways out of this. Should we decide that it's
-important enough to support bytecode-only packages, the semantics
-would be as follows:
+    % ls -l
+    total 8
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:29 alpha.py
+    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 beta/
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 __init__.py
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 one.py
+    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 __pycache__/
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 two.py
 
-* If there is a traditional, non-magic-tagged .pyc file in the
-  location where a .py file should be found, it will satisfy the
-  import.
-* The `__file__` attribute of the module will point to the .pyc file.
-* The `__cached__` attribute of the module will point to the .pyc file
-  too.
-* The existence of a matching `__pycached__/foo.<magic>.pyc` file
-  without the source py file will *not* satisfy the import. This
-  means that if the source file is removed, the pyc file will be
-  ignored (unlike in today's implementation).
+Now, with `__cachepy__`::
 
-Other ways to satisfy the bytecode-only packagers requirements would
-have less impact on the general Python user population, and include:
-
-* Add a `-X` switch and/or environment variable to enable
-  the bytecode-only search algorithm.
-* Let those who want more protection against casual py hackers package
-  their code in a zip file, which is supported today. Sub-options
-  include supporting pyc-only imports only in zip files, or still
-  requiring the py file for zip imports.
-* Provide a custom importer supporting bytecode-only packages, which
-  would have to be enabled explicitly by the application. Either
-  Python would provide such a custom importer or it would be left to
-  third parties to implement.
-* Add a marker to a package's `__init__.py` file to enable
-  bytecode-only imports for everything else in the package.
-* Leave it to third-party tools such as py2exe [20]_ to build an
-  ecosystem and standards around source-less distributions.
+    % ls -l
+    total 8
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:29 alpha.py
+    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 beta/
+    drwxrwxr-x 2 user user 4096 2010-03-03 08:28 __cachepy__/
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 __init__.py
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 one.py
+    -rw-rw-r-- 1 user user    0 2010-03-03 08:28 two.py
 
 
 __cached__ vs. __compiled__
@@ -592,9 +581,6 @@ References
 
 .. [18] importlib: http://docs.python.org/3.1/library/importlib.html
 
-.. [19] http://mail.python.org/pipermail/python-dev/2010-March/098042.html
-
-.. [20] py2exe: http://www.py2exe.org/
 
 ACKNOWLEDGMENTS
 ===============