Add PEP 488: elimination of PYO files

commit e213a64794 (parent d2ed46e511)

@@ -0,0 +1,303 @@

PEP: 488
Title: Elimination of PYO files
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Feb-2015
Post-History:
    2015-03-06

Abstract
========

This PEP proposes eliminating the concept of PYO files from Python.
To continue the support of the separation of bytecode files based on
their optimization level, this PEP proposes extending the PYC file
name to include the optimization level in the bytecode repository
directory (i.e., the ``__pycache__`` directory).


Rationale
=========

As of today, bytecode files come in two flavours: PYC and PYO. A PYC
file is the bytecode file generated and read from when no
optimization level is specified at interpreter startup (i.e., ``-O``
is not specified). A PYO file represents the bytecode file that is
read/written when **any** optimization level is specified (i.e., when
``-O`` is specified, including ``-OO``). This means that while PYC
files clearly delineate the optimization level used when they were
generated -- namely no optimizations beyond the peepholer -- the same
is not true for PYO files. Put in terms of optimization levels and
the file extension:

- 0: ``.pyc``
- 1 (``-O``): ``.pyo``
- 2 (``-OO``): ``.pyo``

The reuse of the ``.pyo`` file extension for both level 1 and 2
optimizations means that there is no clear way to tell what
optimization level was used to generate the bytecode file. In terms
of reading PYO files, this can lead to an interpreter using a mixture
of optimization levels with its code if the user was not careful to
make sure all PYO files were generated using the same optimization
level (typically done by blindly deleting all PYO files and then
using the ``compileall`` module to compile all-new PYO files [1]_).
This issue is only compounded when people optimize Python code beyond
what the interpreter natively supports, e.g., using the astoptimizer
project [2]_.

In terms of writing PYO files, the need to delete all PYO files
every time one either changes the optimization level to use or is
unsure of what optimization level was used the last time PYO files
were generated leads to unnecessary file churn. The change proposed
by this PEP also allows for **all** optimization levels to be
pre-compiled for bytecode files ahead of time, something that is
currently impossible because the ``.pyo`` file extension is reused
for multiple optimization levels.

As for distributing bytecode-only modules, having to distribute both
``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
of code obfuscation and smaller file deployments.


Proposal
========

To eliminate the ambiguity that PYO files present, this PEP proposes
eliminating the concept of PYO files and their accompanying ``.pyo``
file extension. To allow for the optimization level to be unambiguous
as well as to avoid having to regenerate optimized bytecode files
needlessly in the ``__pycache__`` directory, the optimization level
used to generate a PYC file will be incorporated into the bytecode
file name. Currently bytecode file names are created by
``importlib.util.cache_from_source()``, approximately using the
following expression defined by PEP 3147 [3]_, [4]_, [5]_::

    '{name}.{cache_tag}.pyc'.format(name=module_name,
                                    cache_tag=sys.implementation.cache_tag)

This PEP proposes to change the expression to::

    '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
            name=module_name,
            cache_tag=sys.implementation.cache_tag,
            optimization=str(sys.flags.optimize))

The "opt-" prefix was chosen to provide a visual separator from the
cache tag. The optimization level is placed after the cache tag to
preserve the lexicographic sort order of bytecode file names based on
module name and cache tag, which will not vary for a single
interpreter. The "opt-" prefix was chosen over "o" so as to be
somewhat self-documenting, and over "O" so as to avoid any confusion
with "0" while being so close to the interpreter version number.

A period was chosen over a hyphen as a separator so as to distinguish
clearly that the optimization level is not part of the interpreter
version as specified by the cache tag. It also lends itself to the
use of the period in the file name to delineate semantically
different concepts.

For example, the bytecode file name of ``importlib.cpython-35.pyc``
would become ``importlib.cpython-35.opt-0.pyc``. If ``-OO`` had been
passed to the interpreter then instead of
``importlib.cpython-35.pyo`` the file name would be
``importlib.cpython-35.opt-2.pyc``.

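As a minimal sketch of the proposed naming scheme, the expression
above can be wrapped in a small helper. The function name
``proposed_bytecode_name`` and its parameters are hypothetical and
exist only for illustration; the real logic would live in
``importlib.util.cache_from_source()``.

```python
import sys


def proposed_bytecode_name(module_name, cache_tag=None, optimization=None):
    """Build a bytecode file name in the format proposed by this PEP.

    Hypothetical helper for illustration only; not part of importlib.
    """
    if cache_tag is None:
        cache_tag = sys.implementation.cache_tag
    if optimization is None:
        # Default to the optimization level of the running interpreter.
        optimization = sys.flags.optimize
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name, cache_tag=cache_tag, optimization=str(optimization))


print(proposed_bytecode_name('importlib', 'cpython-35', 0))
print(proposed_bytecode_name('importlib', 'cpython-35', 2))
```

The two calls reproduce the ``opt-0`` and ``opt-2`` file names from
the example above.
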
Implementation
==============

importlib
---------

As ``importlib.util.cache_from_source()`` is the API that exposes
bytecode file paths as well as being directly used by importlib, it
requires the most critical change. As of Python 3.4, the function's
signature is::

    importlib.util.cache_from_source(path, debug_override=None)

This PEP proposes changing the signature in Python 3.5 to::

    importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)

The introduced ``optimization`` keyword-only parameter will control
what optimization level is specified in the file name. If the
argument is ``None`` then the current optimization level of the
interpreter will be assumed. Any argument given for ``optimization``
will be passed to ``str()`` and the result must satisfy
``str.isalnum()``, else ``ValueError`` will be raised (this prevents
invalid characters from being used in the file name). The one
exception is the empty string: if it is passed in for
``optimization`` then the optimization level will be left out of the
file name, reverting to the file name format which predates this
PEP.

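The rules for the ``optimization`` argument can be sketched roughly
as follows. The helper ``optimization_suffix`` is hypothetical and
only models the checks described above; it is not the actual
importlib implementation.

```python
import sys


def optimization_suffix(optimization=None):
    """Return the file-name fragment for an optimization level.

    Hypothetical helper sketching the rules described in this PEP.
    """
    if optimization is None:
        # Fall back to the interpreter's current optimization level.
        optimization = sys.flags.optimize
    optimization = str(optimization)
    if optimization == '':
        # The empty string suppresses the tag (pre-PEP file name format).
        return ''
    if not optimization.isalnum():
        raise ValueError(
            '{!r} is not an alphanumeric value'.format(optimization))
    return '.opt-' + optimization
```

Under these assumptions ``optimization_suffix(2)`` gives ``'.opt-2'``,
``optimization_suffix('')`` gives ``''``, and a value such as
``'a/b'`` raises ``ValueError``.
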
It is expected that beyond Python's own 0-2 optimization levels,
third-party code will use a hash of optimization names to specify the
optimization level, e.g.
``hashlib.sha256(','.join(['dead code elimination', 'constant folding']).encode()).hexdigest()``.
While this might lead to long file names, it is assumed that most
users never look at the contents of the ``__pycache__`` directory and
so this won't be an issue.

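A minimal sketch of how a third-party tool might derive such a tag is
shown below. The list of optimization names is hypothetical, and
sorting the names before hashing is an assumption made here so that
the tag does not depend on the order in which they are listed.

```python
import hashlib

# Hypothetical names of optimizations applied by a third-party tool.
optimizations = ['dead code elimination', 'constant folding']

# hashlib operates on bytes, so the joined names are encoded first.
# Sorting (an assumption of this sketch) makes the tag order-independent.
joined = ','.join(sorted(optimizations))
optimization_tag = hashlib.sha256(joined.encode('utf-8')).hexdigest()

# A hex digest only contains 0-9 and a-f, so it passes str.isalnum().
print(optimization_tag.isalnum(), len(optimization_tag))
```

A SHA-256 hex digest is 64 characters long, which illustrates the
long file names mentioned above.
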
The ``debug_override`` parameter will be deprecated. As the parameter
expects a boolean, the integer value of the boolean will be used as
if it had been provided as the argument to ``optimization`` (a
``None`` argument will mean the same as for ``optimization``). A
deprecation warning will be raised when ``debug_override`` is given a
value other than ``None``, but there are no plans for the complete
removal of the parameter at this time (but removal will be no later
than Python 4).

The various module attributes for ``importlib.machinery`` which
relate to bytecode file suffixes will be updated [7]_. The
``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
both be documented as deprecated and set to the same value as
``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will
happen no later than Python 4).

The various finders and loaders will also be updated as necessary,
but updating the previously mentioned parts of importlib should be
all that is required.


Rest of the standard library
----------------------------

The various functions exposed by the ``py_compile`` and
``compileall`` modules will be updated as necessary to make sure
they follow the new bytecode file name semantics [6]_, [1]_. The CLI
for the ``compileall`` module will not be directly affected (the
``-b`` flag will be implicit as it will no longer generate ``.pyo``
files when ``-O`` is specified).

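As an illustration of pre-compiling every optimization level ahead of
time, the sketch below compiles one source file three times with
``py_compile.compile()``. It assumes an interpreter that encodes the
optimization level in the bytecode file name as this PEP proposes;
the file name ``example.py`` is made up for the example.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as directory:
    source_path = os.path.join(directory, 'example.py')
    with open(source_path, 'w') as source_file:
        source_file.write('assert True\n')

    # Compile the same source once per optimization level.
    bytecode_paths = [
        py_compile.compile(source_path, optimize=level, doraise=True)
        for level in (0, 1, 2)
    ]
    # With the level in the file name, no compile overwrites another.
    distinct_paths = len(set(bytecode_paths))
    all_exist = all(os.path.exists(path) for path in bytecode_paths)

print(distinct_paths, all_exist)
```

When levels 1 and 2 share the ``.pyo`` extension, the second compile
overwrites the first, which is the file churn described in the
rationale.
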
Compatibility Considerations
============================

Any code directly manipulating bytecode files from Python 3.2 on
will need to consider the impact of this change on their code (prior
to Python 3.2 -- including all of Python 2 -- there was no
``__pycache__`` directory, which already necessitates bifurcating
bytecode file handling support). If code was setting the
``debug_override`` argument to
``importlib.util.cache_from_source()`` then care will be needed if
they want the path to a bytecode file with an optimization level of
2. Otherwise only code **not** using
``importlib.util.cache_from_source()`` will need updating.

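For code that inspects ``__pycache__`` file names by hand, the change
roughly amounts to handling one extra, optional component. The helper
``split_bytecode_name`` below is a hypothetical sketch; it assumes
that the cache tag contains no periods and that the optimization part
always starts with "opt-".

```python
def split_bytecode_name(file_name):
    """Split a ``__pycache__`` file name into its parts.

    Hypothetical helper; returns (module_name, cache_tag, optimization),
    where optimization is None for names without an "opt-" part.
    """
    stem, _, extension = file_name.rpartition('.')
    if extension != 'pyc':
        raise ValueError('not a PYC file: {!r}'.format(file_name))
    parts = stem.split('.')
    optimization = None
    if parts[-1].startswith('opt-'):
        # Strip the "opt-" prefix to recover the optimization level.
        optimization = parts.pop()[len('opt-'):]
    if len(parts) != 2:
        raise ValueError('unexpected file name: {!r}'.format(file_name))
    module_name, cache_tag = parts
    return module_name, cache_tag, optimization


print(split_bytecode_name('importlib.cpython-35.opt-2.pyc'))
```

Names in the format which predates this PEP, such as
``importlib.cpython-35.pyc``, are reported with an optimization of
``None``.
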
As for people who distribute bytecode-only modules (i.e., use a
bytecode file instead of a source file), they will have to choose
which optimization level they want their bytecode files to be since
distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
of any use. Since people typically only distribute bytecode files for
code obfuscation purposes or smaller distribution size, only having
to distribute a single ``.pyc`` file should actually be beneficial
to these use-cases. And since the magic number for bytecode files
changed in Python 3.5 to support PEP 465, there is no need to support
pre-existing ``.pyo`` files [8]_.


Rejected Ideas
==============

N/A


Open Issues
===========

Formatting of the optimization level in the file name
-----------------------------------------------------

Using the "opt-" prefix and placing the optimization level between
the cache tag and file extension is not critical. All options which
have been considered are:

* ``importlib.cpython-35.opt-0.pyc``
* ``importlib.cpython-35.opt0.pyc``
* ``importlib.cpython-35.o0.pyc``
* ``importlib.cpython-35.O0.pyc``
* ``importlib.cpython-35.0.pyc``
* ``importlib.cpython-35-O0.pyc``
* ``importlib.O0.cpython-35.pyc``
* ``importlib.o0.cpython-35.pyc``
* ``importlib.0.cpython-35.pyc``

All but the first (the format proposed above) were initially rejected
because they would change the sort order of bytecode files, could be
confused with the cache tag, or were not self-documenting enough.


Not specifying the optimization level when it is at 0
-----------------------------------------------------

It has been suggested that for the common case of when the
optimizations are at level 0, the entire part of the file name
relating to the optimization level be left out. This would allow for
file names of ``.pyc`` files to go unchanged, potentially leading to
fewer backwards-compatibility issues (although Python 3.5 introduces
a new magic number for bytecode so all bytecode files will have to be
regenerated regardless of the outcome of this PEP).

It would also allow a potentially redundant bit of information to be
left out of the file name if an implementation of Python did not
allow for optimizing bytecode. This would only occur, though, if the
interpreter didn't support ``-O`` **and** didn't implement the ast
module, else users could implement their own optimizations.

Arguments against allowing this special case are "explicit is better
than implicit" and "special cases aren't special enough to break the
rules". There are also currently no Python 3 interpreters that don't
support ``-O``, so a potential Python 3 implementation which doesn't
allow bytecode optimization is entirely theoretical at the moment.


References
==========

.. [1] The compileall module
   (https://docs.python.org/3/library/compileall.html#module-compileall)

.. [2] The astoptimizer project
   (https://pypi.python.org/pypi/astoptimizer)

.. [3] ``importlib.util.cache_from_source()``
   (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)

.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1
   (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)

.. [5] PEP 3147, PYC Repository Directories, Warsaw
   (http://www.python.org/dev/peps/pep-3147)

.. [6] The py_compile module
   (https://docs.python.org/3/library/py_compile.html#module-py_compile)

.. [7] The importlib.machinery module
   (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)

.. [8] ``importlib.util.MAGIC_NUMBER``
   (https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: