simplify pyc header format
parent 2c88e2d062
commit e909d34d27
31
pep-0552.txt
@@ -61,19 +61,26 @@ address here that can make pycs non-deterministic.)
 Specification
 =============
 
-Python will begin to recognize two magic number variants for every pyc
-version. One magic number will correspond to the current pyc format and the
-other to "hash-based" pycs introduced by this PEP.
+The pyc header currently consists of 3 32-bit words. We will expand it to 4. The
+first word will continue to be the magic number, versioning the bytecode and pyc
+format. The second word, conceptually the new word, will be a bit field. The
+interpretation of the rest of the header and invalidation behavior of the pyc
+depend on the contents of the bit field.
 
-In hash-based pycs, the second field in the pyc header (currently the
-"timestamp" field) will become a bitset of flags. We call the lowest flag in
-this bitset ``check_source``. Following the bitset is a 64-bit hash of
-the source file. We will use a SipHash_ with a hardcoded key of the contents of
-the source file. Another fast hash like MD5 or BLAKE2_ would also work. We
-choose SipHash because Python already has a builtin implementation of it from
-:pep:`456`, although an interface that allows picking the SipHash key must be
-exposed to Python. Security of the hash is not a concern, though we pass over
-red-flag hashes like MD5 to ease auditing of Python in controlled environments.
+If the bit field is 0, the pyc is a traditional timestamp-based pyc. I.e., the
+third and fourth words will be the timestamp and file size respectively, and
+invalidation will be done by comparing the timestamp and file size of the source
+file with those in the header.
+
+If the lowest bit of the bit field is set, the pyc is a hash-based pyc. We call
+the second lowest bit the ``check_source`` flag. Following the bitset is a
+64-bit hash of the source file. We will use a SipHash_ with a hardcoded key of
+the contents of the source file. Another fast hash like MD5 or BLAKE2_ would
+also work. We choose SipHash because Python already has a builtin implementation
+of it from :pep:`456`, although an interface that allows picking the SipHash key
+must be exposed to Python. Security of the hash is not a concern, though we pass
+over red-flag hashes like MD5 to ease auditing of Python in controlled
+environments.
 
 When Python encounters a hash-based pyc, its behavior depends on the setting of
 the ``check_source`` flag. If the ``check_source`` flag is set, Python will
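The four-word header introduced by this change can be illustrated with a short decoding sketch. It is not part of the commit. It assumes the words are stored little-endian, treats the magic number as four opaque bytes, and rejects bit fields that are non-zero but lack the hash-based bit, a case the text above does not define. The names `parse_pyc_header`, `HASH_BASED`, and `CHECK_SOURCE` are hypothetical.

```python
import struct

# Bit positions taken from the text above: the lowest bit marks a
# hash-based pyc, the second lowest bit is the check_source flag.
HASH_BASED = 0x1
CHECK_SOURCE = 0x2
HEADER_SIZE = 16  # four 32-bit words


def parse_pyc_header(header: bytes) -> dict:
    """Decode the 16-byte header layout described in the diff (a sketch)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("pyc header must be at least 16 bytes")
    # Assumption: little-endian fields; the magic number is kept as raw bytes.
    magic, flags = struct.unpack_from("<4sI", header, 0)
    if flags == 0:
        # Traditional pyc: third word is the timestamp, fourth the file size.
        mtime, size = struct.unpack_from("<II", header, 8)
        return {"magic": magic, "kind": "timestamp",
                "mtime": mtime, "source_size": size}
    if flags & HASH_BASED:
        # Hash-based pyc: the last two words hold the 64-bit source hash.
        return {"magic": magic, "kind": "hash",
                "check_source": bool(flags & CHECK_SOURCE),
                "source_hash": header[8:16]}
    # Assumption: any other bit pattern is treated as an error here.
    raise ValueError(f"unrecognized bit field: {flags:#x}")


# Usage with placeholder magic bytes and an all-zero hash.
example = struct.pack("<4sI8s", b"\x00\x00\r\n",
                      HASH_BASED | CHECK_SOURCE, bytes(8))
info = parse_pyc_header(example)
```

Keeping the bit field in the second word lets a reader decide how to interpret the remaining eight bytes after reading only the first eight.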