352 lines
14 KiB
Plaintext
352 lines
14 KiB
Plaintext
PEP: 488
|
||
Title: Elimination of PYO files
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Brett Cannon <brett@python.org>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 20-Feb-2015
|
||
Post-History:
|
||
2015-03-06
|
||
2015-03-13
|
||
2015-03-20
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes eliminating the concept of PYO files from Python.
|
||
To continue the support of the separation of bytecode files based on
|
||
their optimization level, this PEP proposes extending the PYC file
|
||
name to include the optimization level in the bytecode repository
|
||
directory when there are optimizations applied.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
As of today, bytecode files come in two flavours: PYC and PYO. A PYC
|
||
file is the bytecode file generated and read from when no
|
||
optimization level is specified at interpreter startup (i.e., ``-O``
|
||
is not specified). A PYO file represents the bytecode file that is
|
||
read/written when **any** optimization level is specified (i.e., when
|
||
``-O`` **or** ``-OO`` is specified). This means that while PYC
|
||
files clearly delineate the optimization level used when they were
|
||
generated -- namely no optimizations beyond the peepholer -- the same
|
||
is not true for PYO files. To put this in terms of optimization
|
||
levels and the file extension:
|
||
|
||
- 0: ``.pyc``
|
||
- 1 (``-O``): ``.pyo``
|
||
- 2 (``-OO``): ``.pyo``
|
||
|
||
The reuse of the ``.pyo`` file extension for both level 1 and 2
|
||
optimizations means that there is no clear way to tell what
|
||
optimization level was used to generate the bytecode file. In terms
|
||
of reading PYO files, this can lead to an interpreter using a mixture
|
||
of optimization levels with its code if the user was not careful to
|
||
make sure all PYO files were generated using the same optimization
|
||
level (typically done by blindly deleting all PYO files and then
|
||
using the `compileall` module to compile all-new PYO files [1]_).
|
||
This issue is only compounded when people optimize Python code beyond
|
||
what the interpreter natively supports, e.g., using the astoptimizer
|
||
project [2]_.
|
||
|
||
In terms of writing PYO files, the need to delete all PYO files
|
||
every time one either changes the optimization level they want to use
|
||
or are unsure of what optimization was used the last time PYO files
|
||
were generated leads to unnecessary file churn. The change proposed
|
||
by this PEP also allows for **all** optimization levels to be
|
||
pre-compiled for bytecode files ahead of time, something that is
|
||
currently impossible thanks to the reuse of the ``.pyo`` file
|
||
extension for multiple optimization levels.
|
||
|
||
As for distributing bytecode-only modules, having to distribute both
|
||
``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
|
||
of code obfuscation and smaller file deployments. This means that
|
||
bytecode-only modules will only load from their non-optimized
|
||
``.pyc`` file name.
|
||
|
||
|
||
Proposal
|
||
========
|
||
|
||
To eliminate the ambiguity that PYO files present, this PEP proposes
|
||
eliminating the concept of PYO files and their accompanying ``.pyo``
|
||
file extension. To allow for the optimization level to be unambiguous
|
||
as well as to avoid having to regenerate optimized bytecode files
|
||
needlessly in the `__pycache__` directory, the optimization level
|
||
used to generate the bytecode file will be incorporated into the
|
||
bytecode file name. When no optimization level is specified, the
|
||
pre-PEP ``.pyc`` file name will be used (i.e., no optimization level
|
||
will be specified in the file name). For example, a source file named
|
||
``foo.py`` in CPython 3.5 could have the following bytecode files
|
||
based on the interpreter's optimization level (none, ``-O``, and
|
||
``-OO``):
|
||
|
||
- 0: ``foo.cpython-35.pyc``
|
||
- 1: ``foo.cpython-35.opt-1.pyc``
|
||
- 2: ``foo.cpython-35.opt-2.pyc``
|
||
|
||
Currently bytecode file names are created by
|
||
``importlib.util.cache_from_source()``, approximately using the
|
||
following expression defined by PEP 3147 [3]_, [4]_, [5]_::
|
||
|
||
'{name}.{cache_tag}.pyc'.format(name=module_name,
|
||
cache_tag=sys.implementation.cache_tag)
|
||
|
||
This PEP proposes to change the expression when an optimization
|
||
level is specified to::
|
||
|
||
'{name}.{cache_tag}.opt-{optimization}.pyc'.format(
|
||
name=module_name,
|
||
cache_tag=sys.implementation.cache_tag,
|
||
optimization=str(sys.flags.optimize))
|
||
|
||
The "opt-" prefix was chosen so as to provide a visual separator
|
||
from the cache tag. The placement of the optimization level after
|
||
the cache tag was chosen to preserve lexicographic sort order of
|
||
bytecode file names based on module name and cache tag which will
|
||
not vary for a single interpreter. The "opt-" prefix was chosen over
|
||
"o" so as to be somewhat self-documenting. The "opt-" prefix was
|
||
chosen over "O" so as to not have any confusion in case "0" was the
|
||
leading prefix of the optimization level.
|
||
|
||
A period was chosen over a hyphen as a separator so as to distinguish
|
||
clearly that the optimization level is not part of the interpreter
|
||
version as specified by the cache tag. It also lends to the use of
|
||
the period in the file name to delineate semantically different
|
||
concepts.
|
||
|
||
For example, if ``-OO`` had been passed to the interpreter then
|
||
instead of ``importlib.cpython-35.pyo`` the file name would be
|
||
``importlib.cpython-35.opt-2.pyc``.
|
||
|
||
Leaving out the new ``opt-`` tag when no optimization level is
|
||
applied should increase backwards-compatibility. This is also more
|
||
understanding of Python implementations which have no use for
|
||
optimization levels (e.g., PyPy[10]_).
|
||
|
||
It should be noted that this change in no way affects the performance
|
||
of import. Since the import system looks for a single bytecode file
|
||
based on the optimization level of the interpreter already and
|
||
generates a new bytecode file if it doesn't exist, the introduction
|
||
of potentially more bytecode files in the ``__pycache__`` directory
|
||
has no effect in terms of stat calls. The interpreter will continue
|
||
to look for only a single bytecode file based on the optimization
|
||
level and thus no increase in stat calls will occur.
|
||
|
||
The only potentially negative result of this PEP is the probable
|
||
increase in the number of ``.pyc`` files and thus increase in storage
|
||
use. But for platforms where this is an issue,
|
||
``sys.dont_write_bytecode`` exists to turn off bytecode generation so
|
||
that it can be controlled offline.
|
||
|
||
|
||
Implementation
|
||
==============
|
||
|
||
importlib
|
||
---------
|
||
|
||
As ``importlib.util.cache_from_source()`` is the API that exposes
|
||
bytecode file paths as well as being directly used by importlib, it
|
||
requires the most critical change. As of Python 3.4, the function's
|
||
signature is::
|
||
|
||
importlib.util.cache_from_source(path, debug_override=None)
|
||
|
||
This PEP proposes changing the signature in Python 3.5 to::
|
||
|
||
importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
|
||
|
||
The introduced ``optimization`` keyword-only parameter will control
|
||
what optimization level is specified in the file name. If the
|
||
argument is ``None`` then the current optimization level of the
|
||
interpreter will be assumed (including no optimization). Any argument
|
||
given for ``optimization`` will be passed to ``str()`` and must have
|
||
``str.isalnum()`` be true, else ``ValueError`` will be raised (this
|
||
prevents invalid characters being used in the file name). If the
|
||
empty string is passed in for ``optimization`` then the addition of
|
||
the optimization will be suppressed, reverting to the file name
|
||
format which predates this PEP.
|
||
|
||
It is expected that beyond Python's own two optimization levels,
|
||
third-party code will use a hash of optimization names to specify the
|
||
optimization level, e.g.
|
||
``hashlib.sha256(','.join(['no dead code', 'const folding'])).hexdigest()``.
|
||
While this might lead to long file names, it is assumed that most
|
||
users never look at the contents of the __pycache__ directory and so
|
||
this won't be an issue.
|
||
|
||
The ``debug_override`` parameter will be deprecated. As the parameter
|
||
expects a boolean, the integer value of the boolean will be used as
|
||
if it had been provided as the argument to ``optimization`` (a
|
||
``None`` argument will mean the same as for ``optimization``). A
|
||
deprecation warning will be raised when ``debug_override`` is given a
|
||
value other than ``None``, but there are no plans for the complete
|
||
removal of the parameter at this time (but removal will be no later
|
||
than Python 4).
|
||
|
||
The various module attributes for importlib.machinery which relate to
|
||
bytecode file suffixes will be updated [7]_. The
|
||
``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
|
||
both be documented as deprecated and set to the same value as
|
||
``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
|
||
``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
|
||
not later than Python 4).
|
||
|
||
All various finders and loaders will also be updated as necessary,
|
||
but updating the previous mentioned parts of importlib should be all
|
||
that is required.
|
||
|
||
|
||
Rest of the standard library
|
||
----------------------------
|
||
|
||
The various functions exposed by the ``py_compile`` and
|
||
``compileall`` functions will be updated as necessary to make sure
|
||
they follow the new bytecode file name semantics [6]_, [1]_. The CLI
|
||
for the ``compileall`` module will not be directly affected (the
|
||
``-b`` flag will be implicit as it will no longer generate ``.pyo``
|
||
files when ``-O`` is specified).
|
||
|
||
|
||
Compatibility Considerations
|
||
============================
|
||
|
||
Any code directly manipulating bytecode files from Python 3.2 on
|
||
will need to consider the impact of this change on their code (prior
|
||
to Python 3.2 -- including all of Python 2 -- there was no
|
||
__pycache__ which already necessitates bifurcating bytecode file
|
||
handling support). If code was setting the ``debug_override``
|
||
argument to ``importlib.util.cache_from_source()`` then care will be
|
||
needed if they want the path to a bytecode file with an optimization
|
||
level of 2. Otherwise only code **not** using
|
||
``importlib.util.cache_from_source()`` will need updating.
|
||
|
||
As for people who distribute bytecode-only modules (i.e., use a
|
||
bytecode file instead of a source file), they will have to choose
|
||
which optimization level they want their bytecode files to be since
|
||
distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
|
||
of any use. Since people typically only distribute bytecode files for
|
||
code obfuscation purposes or smaller distribution size then only
|
||
having to distribute a single ``.pyc`` should actually be beneficial
|
||
to these use-cases. And since the magic number for bytecode files
|
||
changed in Python 3.5 to support PEP 465 there is no need to support
|
||
pre-existing ``.pyo`` files [8]_.
|
||
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Completely dropping optimization levels from CPython
|
||
----------------------------------------------------
|
||
|
||
Some have suggested that instead of accommodating the various
|
||
optimization levels in CPython, we should instead drop them
|
||
entirely. The argument is that significant performance gains would
|
||
occur from runtime optimizations through something like a JIT and not
|
||
through pre-execution bytecode optimizations.
|
||
|
||
This idea is rejected for this PEP as that ignores the fact that
|
||
there are people who do find the pre-existing optimization levels for
|
||
CPython useful. It also assumes that no other Python interpreter
|
||
would find what this PEP proposes useful.
|
||
|
||
|
||
Alternative formatting of the optimization level in the file name
|
||
-----------------------------------------------------------------
|
||
|
||
Using the "opt-" prefix and placing the optimization level between
|
||
the cache tag and file extension is not critical. All options which
|
||
have been considered are:
|
||
|
||
* ``importlib.cpython-35.opt-1.pyc``
|
||
* ``importlib.cpython-35.opt1.pyc``
|
||
* ``importlib.cpython-35.o1.pyc``
|
||
* ``importlib.cpython-35.O1.pyc``
|
||
* ``importlib.cpython-35.1.pyc``
|
||
* ``importlib.cpython-35-O1.pyc``
|
||
* ``importlib.O1.cpython-35.pyc``
|
||
* ``importlib.o1.cpython-35.pyc``
|
||
* ``importlib.1.cpython-35.pyc``
|
||
|
||
These were initially rejected either because they would change the
|
||
sort order of bytecode files, possible ambiguity with the cache tag,
|
||
or were not self-documenting enough. An informal poll was taken and
|
||
people clearly preferred the formatting proposed by the PEP [9]_.
|
||
Since this topic is non-technical and of personal choice, the issue
|
||
is considered solved.
|
||
|
||
|
||
Embedding the optimization level in the bytecode metadata
|
||
---------------------------------------------------------
|
||
|
||
Some have suggested that rather than embedding the optimization level
|
||
of bytecode in the file name that it be included in the file's
|
||
metadata instead. This would mean every interpreter had a single copy
|
||
of bytecode at any time. Changing the optimization level would thus
|
||
require rewriting the bytecode, but there would also only be a single
|
||
file to care about.
|
||
|
||
This has been rejected due to the fact that Python is often installed
|
||
as a root-level application and thus modifying the bytecode file for
|
||
modules in the standard library are always possible. In this
|
||
situation integrators would need to guess at what a reasonable
|
||
optimization level was for users for any/all situations. By
|
||
allowing multiple optimization levels to co-exist simultaneously it
|
||
frees integrators from having to guess what users want and allows
|
||
users to utilize the optimization level they want.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] The compileall module
|
||
(https://docs.python.org/3/library/compileall.html#module-compileall)
|
||
|
||
.. [2] The astoptimizer project
|
||
(https://pypi.python.org/pypi/astoptimizer)
|
||
|
||
.. [3] ``importlib.util.cache_from_source()``
|
||
(https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
|
||
|
||
.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1
|
||
(https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
|
||
|
||
.. [5] PEP 3147, PYC Repository Directories, Warsaw
|
||
(http://www.python.org/dev/peps/pep-3147)
|
||
|
||
.. [6] The py_compile module
|
||
(https://docs.python.org/3/library/compileall.html#module-compileall)
|
||
|
||
.. [7] The importlib.machinery module
|
||
(https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
|
||
|
||
.. [8] ``importlib.util.MAGIC_NUMBER``
|
||
(https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)
|
||
|
||
.. [9] Informal poll of file name format options on Google+
|
||
(https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
|
||
|
||
.. [10] The PyPy Project
|
||
(http://pypy.org/)
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|