PEP 751: A file format to list Python dependencies for installation reproducibility (#3870)

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
This commit is contained in:
Brett Cannon 2024-07-24 16:13:38 -07:00 committed by GitHub
parent d0bbb6bdbb
commit 92634ee01f
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 937 additions and 1 deletions

4
.github/CODEOWNERS vendored
View File

@ -625,9 +625,11 @@ peps/pep-0743.rst @vstinner
peps/pep-0744.rst @brandtbucher
peps/pep-0745.rst @hugovk
peps/pep-0746.rst @JelleZijlstra
peps/pep-0747.rst @JelleZijlstra
# ...
peps/pep-0749.rst @JelleZijlstra
# ...
peps/pep-0747.rst @JelleZijlstra
peps/pep-0751.rst @brettcannon
# ...
# peps/pep-0754.rst
# ...

934
peps/pep-0751.rst Normal file
View File

@ -0,0 +1,934 @@
PEP: 751
Title: A file format to list Python dependencies for installation reproducibility
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 24-Jul-2024
Replaces: 665
========
Abstract
========
This PEP proposes a new file format for dependency specification
to enable reproducible installation in a Python environment. The format is
designed to be human-readable and machine-generated. Installers consuming the
file should be able to evaluate each package in question in isolation, with no
need for dependency resolution at install-time.
==========
Motivation
==========
Currently, no standard exists to:
- Specify what top-level dependencies should be installed into a Python
environment.
- Create an immutable record, such as a lock file, of which dependencies were
installed.
Considering there are at least four well-known solutions to this problem in the
community (``pip freeze``, pip-tools_, Poetry_, and PDM_), there seems to be an
appetite for lock files in general.
Those tools also vary in what locking scenarios they support. For instance,
``pip freeze`` and pip-tools only generate lock files for the current
environment while PDM and Poetry try to lock for *any* environment to some
degree. And none of them directly support locking to specific files to install
which can be important for some workflows. There's also concerns around the lack
of secure defaults in the face of supply chain attacks (e.g., always including
hashes for files). Finally, not all the formats are easy to audit to determine
what would be installed into an environment ahead of time.
The lack of a standard also has some drawbacks. For instance, any tooling that
wants to work with lock files must choose which format to support, potentially
leaving users unsupported (e.g., if Dependabot_ chose not to support PDM,
support by cloud providers who can do dependency installations on your behalf,
etc.).
=========
Rationale
=========
The format is designed so that a *locker* which produces the lock file
and an *installer* which consumes the lock file can be separate tools. This
allows for situations such as cloud hosting providers to use their own installer
that's optimized for their system which is independent of what locker the user
used to create their lock file.
The file format is designed to be human-readable. This is
so that the contents of the file can be audited by a human to make sure no
undesired dependencies end up being included in the lock file. It is also to
facilitate easy understanding of what would be installed if the lock file
without necessitating running a tool, once again to help with auditing. Finally,
the format is designed so that viewing a diff of the file is easy by centralizing
relevant details.
The file format is also designed to not require a resolver at install time.
Being able to analyze dependencies in isolation from one another when listed in
a lock file provides a few benefits. First, it supports auditing by making it
easy to figure out if a certain dependency would be installed for a certain
environment without needing to reference other parts of the file contextually.
It should also lead to faster installs which are much more frequent than
creating a lock file. Finally, the four tools mentioned in the Motivation_
section either already implement this approach of evaluating dependencies in
isolation or have suggested they could (in
`Poetry's case <https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/83>`__).
-----------------
Locking Scenarios
-----------------
The lock file format is designed to support two locking scenarios. The format
should also be flexible enough that adding support for other locking scenarios
is possible via a separate PEP.
Per-file Locking
================
*Per-file locking* operates under the premise that one wants to install exactly
the same files in any matching environment. As such, the lock file specifies the
with the files to install. There can be multiple environments specified in a
single file, each with their own set of files to install. By specifying the
exact files to install, installers avoid performing any resolution to decide what
to install.
The motivation for this approach to locking is for those who have controlled
environments that they work with. For instance, if you have specific, controlled
development and production environments then you can use per-file locking to
make sure the **same** files are installed in both environments for everyone.
This is similar to what ``pip freeze`` and pip-tools_
support, but with more strictness of the exact files as well as incorporating
support to specify the locked files for multiple environments in the same file.
Package Locking
===============
*Package locking* lists the packages and their versions that *may* apply to any
environment being installed for. The list of packages and their versions are
evaluated individually and independently from any other packages and versions
listed in the file. This allows installation to be linear -- read each package
and version and make an isolated decision as to whether it should be installed.
This avoids requiring the installer to perform a *resolution* (i.e.
determine what to install based on what else is to be installed).
The motivation of this approach comes from
`PDM lock files <https://frostming.com/en/2024/pdm-lockfile/>`__. By listing the
potential packages and versions that may be installed, what's installed is
controlled in a way that's easy to reason about. This also allows for not
specifying the exact environments that would be supported by the lock file so
there's more flexibility for what environments are compatible with the lock
file. This approach supports scenarios like open-source projects that want to
lock what people should use to build the documentation without knowing upfront
what environments their contributors are working from.
As already mentioned, this approach is supported by PDM_. Poetry_ has
`shown some interest <https://discuss.python.org/t/46593/83>`__.
=============
Specification
=============
---------
File Name
---------
A lock file MUST be named :file:`pylock.toml` or match the regular expression
``r"pylock\.(.+)\.toml"`` if a name for the lock file is desired or if multiple lock files exist.
The use of the ``.toml`` file extension is to make syntax highlighting in
editors easier and to reinforce the fact that the file format is meant to be
human-readable. The prefix and suffix of a named file MUST be lowercase for easy
detection and stripping off to find the name, e.g.::
if filename.startswith("pylock.") and filename.endswith(".toml"):
name = filename.removeprefix("pylock.").removesuffix(".toml")
This PEP has no opinion as to the location of lock files (i.e. in the root or
the subdirectory of a project).
-----------
File Format
-----------
The format of the file is TOML_.
All keys listed below are required unless otherwise noted. If two keys are
mutually exclusive to one another, then one of the keys is required while the
other is disallowed.
``version``
===========
- String
- The version of the lock file format.
- This PEP specifies the initial version -- and only valid value until future
updates to the standard change it -- as ``"1.0"``.
``hash-algorithm``
==================
- String
- The name of the hash algorithm used for calculating all hash values.
- Only a single hash algorithm is used for the entire file to allow the
``[[package.files]]`` table to be written inline for readability and
compactness purposes by only listing a single hash value instead of multiple
values based on multiple hash algorithms.
- Specifying a single hash algorithm guarantees that an algorithm that the user
prefers is used consistently throughout the file without having to audit
each file hash value separately.
- Allows for updating the entire file to a new hash algorithm without running
the risk of accidentally leaving an old hash value in the file.
- :ref:`packaging:simple-repository-api-json` and the ``hashes`` dictionary of
of the ``files`` dictionary of the Project Details dictionary specifies what
values are valid and guidelines on what hash algorithms to use.
- Failure to validate any hash values for any file that is to be installed MUST
raise an error.
``dependencies``
================
- Array of strings
- A listing of the `dependency specifiers`_ that act as the input to the lock file,
representing the direct, top-level dependencies to be installed.
``[[file-lock]]``
=================
- Array of tables
- Mutually exclusive with ``[package-lock]``.
- The array's existence implies the use of the per-file locking approach.
- An environment that meets all of the specified criteria in the table will be
considered compatible with the environment that was locked for.
- Lockers MUST NOT generate multiple ``[file-lock]`` tables which would be
considered compatible for the same environment.
- In instances where there would be a conflict but the lock is still desired,
either separate lock files can be written or per-package locking can be used.
- Entries in array SHOULD be sorted by ``file-lock.name`` lexicographically.
``file-lock.name``
------------------
- String
- A unique name within the array for the environment this table represents.
``[file-lock.marker-values]``
-----------------------------
- Optional
- Table of strings
- The keys represent the names of `environment markers`_ and the values are the
values for those markers.
- Compatibility is defined by the environment's values matching what is in the
table.
- Lockers SHOULD sort the keys lexicographically to minimize changes when
updating the file.
``file-lock.wheel-tags``
------------------------
- Optional
- Array of strings
- An unordered array of `wheel tags`_ which must be supported by the environment.
- The array MAY not be exhaustive to allow for a smaller array as well as to
help prevent multiple ``[[file-lock]]`` tables being compatible with the
same environment by having one array being a strict subset of another
``file-lock.wheel-tags`` entry in the same file's
``[[file-lock]]`` tables.
- Lockers SHOULD sort the keys lexicographically to minimize changes when
updating the file.
- Lockers MUST NOT include
`compressed tag sets <https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/#compressed-tag-sets>`__
or duplicate tags for consistency across lockers and to simplify checking for
compatibility.
``[package-lock]``
==================
- Table
- Mutually exclusive with ``[[file-lock]]``.
- Signifies the use of the package locking approach.
``package-lock.requires-python``
--------------------------------
- String
- Holds the `version specifiers`_ for Python version compatibility for the
overall package locking.
- Provides at-a-glance information to know if the lock file *may* apply to a
version of Python instead of having to scan the entire file to compile the
same information.
``[[package]]``
===============
- Array of tables
- The array contains all data on the locked package versions.
- Lockers SHOULD record packages in order by ``package.name`` lexicographically
and ``package.version`` by the sort order for `version specifiers`_.
- Lockers SHOULD record keys in the same order as written in this PEP to
minimmize changes when updating.
- Designed so that relevant details as to why a package is included are
in one place to make diff reading easier.
``package.name``
----------------
- String
- The `normalized name`_ of the package.
- Part of what's required to uniquely identify this entry.
``package.version``
-------------------
- String
- The version of the package.
- Part of what's required to uniquely identify this entry.
``package.multiple-entries``
----------------------------
- Boolean
- If package locking via ``[package-lock]``, then the multiple entries for the
same package MUST be mutually exclusive via ``package.marker`` (this is not
required for per-file locking as the ``package.*.lock`` entries imply mutual
exclusivity).
- Aids in auditing by knowing that there are multiple entries for the same
package that may need to be considered.
``package.description``
-----------------------
- Optional
- String
- The package's ``Summary`` from its `core metadata`_.
- Useful to help understand why a package was included in the file based on its
purpose.
``package.simple-repo-package-url``
-----------------------------------
- Optional (although mutually exclusive with
``package.files.simple-repo-package-url``)
- String
- Stores the `project detail`_ URL from the `Simple Repository API`_.
- Useful for generating Packaging URLs (aka PURLs).
- When possible, lockers SHOULD include this or
``package.files.simple-repo-package-url`` to assist with generating
`software bill of materials`_ (aka SBOMs).
``package.marker``
------------------
- Optional
- String
- The `environment markers`_ expression which specifies whether this package and
version applies to the environment.
- Only applicable via ``[package-lock]`` and the package locking scenario.
- The lack of this key means this package and version is required to be
installed.
``package.requires-python``
---------------------------
- Optional
- String
- Holds the `version specifiers`_ for Python version compatibility for the
package and version.
- Useful for documenting why this package and version was included in the file.
- Also helps document why the version restriction in
``package-lock.requires-python`` was chosen.
- It should not provide useful information for installers as it would be
captured by ``package-lock.requires-python`` and isn't relevant when
``[[file-lock]]`` is used.
``package.dependents``
----------------------
- Optional
- Array of strings
- A record of the packages that depend on this package and version.
- Useful for analyzing why a package happens to be listed in the file
for auditing purposes.
- This does not provide information which influences installers.
``package.dependencies``
------------------------
- Optional
- Array of strings
- A record of the dependencies of the package and version.
- Useful in analyzing why a package happens to be listed in the file
for auditing purposes.
- This does not provide information which influences the installer as
``[[file-lock]]`` specifies the exact files to use and ``[package-lock]``
applicability is determined by ``package.marker``.
``package.direct``
------------------
- Optional (defaults to ``false``)
- Boolean
- Represents whether the installation is via a `direct URL reference`_.
``[[package.files]]``
---------------------
- Must be specified if ``[package.vcs]`` is not
- Array of tables
- Tables can be written inline.
- Represents the files to potentially install for the package and version.
- Entries in ``[[package.files]]`` SHOULD be lexicographically sorted by
``package.files.name`` key to minimze changes in diffs.
``package.files.name``
''''''''''''''''''''''
- String
- The file name.
- Necessary for installers to decide what to install when using package locking.
``package.files.lock``
''''''''''''''''''''''
- Required when ``[[file-lock]]`` is used
- Array of strings
- An array of ``file-lock.name`` values which signify that the file is to be
installed when the corresponding ``[[file-lock]]`` table applies to the
environment.
- There MUST only be a single file with any one ``file-lock.name`` entry per
package, regardless of version.
``package.files.simple-repo-package-url``
'''''''''''''''''''''''''''''''''''''''''
- Optional (although mutually exclusive with
``package.simple-repo-package-url``)
- String
- The value has the same meaning as ``package.simple-repo-package-url``.
- This key is available per-file to support :pep:`708` when some files override
what's provided by another `Simple Repository API`_ index.
``package.files.origin``
''''''''''''''''''''''''
- Optional
- String
- URI where the file was found when the lock file was generated.
- Useful for documenting where the file came from and potentially where to look
for the file if not already downloaded/available.
``package.files.hash``
''''''''''''''''''''''
- String
- The hash value of the file contents using the hash algorithm specified by
``hash-algorithm``.
- Used by installers to verify the file contents match what the locker worked
with.
``[package.vcs]``
-----------------
- Must be specified if ``[[package.files]]`` is not (although may be specified
simultaneously with ``[[package.files]]``).
- Table representing the version control system containing the package and
version.
``package.vcs.type``
''''''''''''''''''''
- String
- The type of version control system used.
- The valid values are specified by the
`registered VCSs <https://packaging.python.org/en/latest/specifications/direct-url-data-structure/#registered-vcs>`__
of the direct URL data structure.
``package.vcs.origin``
''''''''''''''''''''''
- String
- The URI of where the repository was located when the lock file was generated.
``package.vcs.commit``
''''''''''''''''''''''
- String
- The commit ID for the repository which represents the package and version.
- The value MUST be immutable for the VCS for security purposes
(e.g. no Git tags).
``package.vcs.lock``
''''''''''''''''''''
- Required when ``[[file-lock]]`` is used
- An array of strings
- An array of ``file-lock.name`` values which signify that the repository at the
specified commit is to be installed when the corresponding ``[[file-lock]]``
table applies to the environment.
- A name in the array may only appear if no file listed in
``package.files.lock`` contains the name for the same package, regardless of
version.
``package.directory``
---------------------
- Optional and only valid when ``[package-lock]`` is specified
- String
- A local directory where a source tree for the package and version exists.
- Not valid under ``[[file-lock]]`` as this PEP does not make an attempt to
specify a mechanism for verifying file contents have not changed since locking
was performed.
``[[package.build-requires]]``
------------------------------
- Optional
- An array of tables whose structure matches that of ``[[package]]``.
- Each entry represents a package and version to use when building the
enclosing package and version.
- The array is complete/locked like ``[[package]]`` itself (i.e. installers
follow the same installation procedure for ``[[package.build-requires]]`` as
``[[package]]``)
- Selection of which entries to use for an environment as the same as
``[[package]]`` itself, albeit only applying when installing the build
back-end and its dependencies.
- This helps with reproducibility of the building of a package by recording
either what was or would have been used if the locker needed to build the
package.
- If the installer and user choose to install from source and this array is
missing then the installer MAY choose to resolve what to install for building
at install time, otherwise the installer MUST raise an error.
``[package.tool]``
------------------
- Optional
- Same usage as that of the equivalent table from the
`pyproject.toml specification`_.
``[tool]``
==========
- Optional
- Same usage as that of the equivalent table from the
`pyproject.toml specification`_.
------------------------
Expectations for Lockers
------------------------
- When creating a lock file for ``[package-lock]``, the locker SHOULD read
the metadata of **all** files that end up being listed in
``[[package.files]]`` to make sure all potential metadata cases are covered
- If a locker chooses not to check every file for its metadata, the tool MUST
either provide the user with the option to have all files checked (whether
that is opt-in or out is left up to the tool), or the user is somehow notified
that such a standards-violating shortcut is being taken (whether this is by
documentation or at runtime is left to the tool)
- Lockers MAY want to provide a way to let users provide the information
necessary to install for multiple environments at once when doing per-file
locking, e.g. supporting a JSON file format which specifies wheel tags and
marker values much like in ``[[file-lock]]`` for which multiple files can be
specified, which could then be directly recorded in the corresponding
``[[file-lock]]`` table (if it allowed for unambiguous per-file locking
environment selection)
.. code-block:: JSON
{
"marker-values": {"<marker>": "<value>"},
"wheel-tags": ["<tag>"]
}
---------------------------
Expectations for Installers
---------------------------
- Installers MAY support installation of non-binary files
(i.e. source distributions, source trees, and VCS), but are not required to
- Installers MUST provide a way to avoid non-binary file installation for
reproducibility and security purposes
- Installers SHOULD make it opt-in to use non-binary file installation to
facilitate a secure-by-default approach
- Under per-file locking, if what to install is ambiguous then the installer
MUST raise an error
Installing for per-file locking
===============================
An example workflow is:
- Iterate through each ``[[file-lock]]`` table to find the one that applies to
the environment being installed for
- If no compatible environment is found an error MUST be raised
- If multiple environments are found to be compatible then an error MUST be raised
- For the compatible environment, iterate through each entry in ``[[package]]``
- For each ``[[package]]`` entry, iterate through ``[[package.files]]`` to look
for any files with ``file-lock.name`` listed in ``package.files.lock``
- If a file is found with a matching lock name, add it to the list of candidate
files to install and move on to the next ``[[package]]`` entry
- If no file is found then check if ``package.vcs.lock`` contains a match (no
match is also acceptable)
- If a ``[[package.files]]`` contains multiple matching entries an error MUST
be raised due to ambiguity for what is to be installed
- If multiple ``[[package]]`` entries for the same package have matching files
an error MUST be raised due to ambiguity for what is to be installed
- Find and verify the candidate files and/or VCS entries based on their hash or
commit ID as appropriate
- If a source distribution or VCS was selected and
``[[package.build-requires]]`` exists, then repeat the above process as
appropriate to install the build dependencies necessary to build the package
- Install the candidate files
Installing for package locking
==============================
An example workflow is:
- Verify that the environment is compatible with
``package-lock.requires-python``; if it isn't an error MUST be raised
- Iterate through each entry in ``[package]]``
- For each entry, if there's a ``package.marker`` key, evaluate the expression
- If the expression is false, then move on
- Otherwise the package entry must be installed somehow
- Iterate through the files listed in ``[[package.files]]``, looking for the
"best" file to install
- If no file is found, check for ``[package.vcs]``
- If no match is found, an error MUST be raised
- Find and verify the selected files and/or VCS entries based on their hash or
commit ID as appropriate
- If the match is a source distribution or VCS and
``[[package.build-requires]]`` is provided, repeat the above as appropriate to
build the package
- Install the selected files
=======================
Backwards Compatibility
=======================
Because there is no preexisting lock file format, there are no explicit
backwards-compatibility concerns in terms of Python packaging standards.
As for packaging tools themselves, that will be a per-tool decision. For tools
that don't document their lock file format, they could choose to simply start
using the format internally and then transition to saving their lock files with
a name supported by this PEP. For tools with a preexisting, documented format,
they could provide an option to choose which format to emit.
=====================
Security Implications
=====================
The hope is that by standardizing on a lock file format that starts from a
security-first posture it will help make overall packaging installation safer.
However, this PEP does not solve all potential security concerns.
One potential concern is tampering with a lock file. If a lock file is not kept
in source control and properly audited, a bad actor could change the file in
nefarious ways (e.g. point to a malware version of a package). Tampering could
also occur in transit to e.g. a cloud provider who will perform an installation
on the user's behalf. Both could be mitigated by signing the lock file either
within the file in a ``[tool]`` entry or via a side channel external to the lock
file itself.
This PEP does not do anything to prevent a user from installing an incorrect
package. While including many details to help in auditing a package's inclusion,
there isn't any mechanism to stop e.g. name confusion attacks via typosquatting.
Lockers may be able to provide some UX to help with this (e.g. by providing
download counts for a package).
=================
How to Teach This
=================
Users should be informed that when they ask to install some package, that
package may have its own dependencies, those dependencies may have dependencies,
and so on. Without writing down what gets installed as part of installing the
package they requested, things could change from underneath them (e.g. package
versions). Changes to the underlying dependencies can lead to accidental
breakage of their code. Lock files help deal with that by providing a way to
write down what was installed.
Having what to install written down also helps in collaborating with others. By
agreeing to a lock file's contents, everyone ends up with the same packages
installed. This helps make sure no one relies on e.g. an API that's only
available in a certain version that not everyone working on the project has
installed.
Lock files also help with security by making sure you always get the same files
installed and not a malicious one that someone may have slipped in. It also
lets one be more deliberate in upgrading their dependencies and thus making sure
the change is on purpose and not one slipped in by a bad actor.
========================
Reference Implementation
========================
A rough proof-of-concept for per-file locking can be found at
https://github.com/brettcannon/mousebender/tree/pep. An example lock file can
be seen at
https://github.com/brettcannon/mousebender/blob/pep/pylock.example.toml.
For per-package locking, PDM_ indirectly proves the approach works as this PEP
maintains equivalent data as PDM does for its lock files (whose format was
inspired by Poetry_). Some of the details of PDM's approach are covered in
https://frostming.com/en/2024/pdm-lockfile/ and
https://frostming.com/en/2024/pdm-lock-strategy/.
==============
Rejected Ideas
==============
----------------------------
Only support package locking
----------------------------
At one point it was suggested to skip per-file locking and only support package
locking as the former was not explicitly supported in the larger Python
ecosystem while the latter was. But because this PEP has taken the position
that security is important and per-file locking is the more secure of the two
options, leaving out per-file locking was never considered.
-------------------------------------------------------------------------------------
Specifying a new core metadata version that requires consistent metadata across files
-------------------------------------------------------------------------------------
At one point, to handle the issue of metadata varying between files and thus
require examining every released file for a package and version for accurate
locking results, the idea was floated to introduce a new core metadata version
which would require all metadata for all wheel files be the same for a single
version of a package. Ultimately, though, it was deemed unnecessary as this PEP
will put pressure on people to make files consistent for performance reasons or
to make indexes provide all the metadata separate from the wheel files
themselves. As well, there's no easy enforcement mechanism, and so community
expectation would work as well as a new metadata version.
-------------------------------------------
Have the installer do dependency resolution
-------------------------------------------
In order to support a format more akin to how Poetry worked when this PEP was
drafted, it was suggested that lockers effectively record the packages and their
versions which may be necessary to make an install work in any possible
scenario, and then the installer resolves what to install. But that complicates
auditing a lock file by requiring much more mental effort to know what packages
may be installed in any given scenario. Also, one of the Poetry developers
`suggested <https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/83>`__
that markers as represented in the package locking approach of this PEP may be
sufficient to cover the needs of Poetry. Not having the installer do a
resolution also simplifies their implementation, centralizing complexity in
lockers.
-----------------------------------------
Requiring specific hash algorithm support
-----------------------------------------
It was proposed to require a baseline hash algorithm for the files. This was
rejected as no other Python packaging specification requires specific hash
algorithm support. As well, the minimum hash algorithm suggested may eventually
become an outdated/unsafe suggestion, requiring further updates. In order to
promote using the best algorithm at all times, no baseline is provided to avoid
simply defaulting to the baseline in tools without considering the security
ramifications of that hash algorithm.
-----------
File naming
-----------
Using ``*.pylock.toml`` as the file name
========================================
It was proposed to put the ``pylock`` constant part of the file name after the
identifier for the purpose of the lock file. It was decided not to do this so
that lock files would sort together when looking at directory contents instead
of purely based on their purpose which could spread them out in a directory.
Using ``*.pylock`` as the file name
===================================
Not using ``.toml`` as the file extension and instead making it ``.pylock``
itself was proposed. This was decided against so that code editors would know
how to provide syntax highlighting to a lock file without having special
knowledge about the file extension.
Not having a naming convention for the file
===========================================
Having no requirements or guidance for a lock file's name was considered, but
ultimately rejected. By having a standardized naming convention it makes it easy
to identify a lock file for both a human and a code editor. This helps
facilitate discovery when e.g. a tool wants to know all of the lock files that
are available.
-----------
File format
-----------
Use JSON over TOML
==================
Since having a format that is machine-writable was a goal of this PEP, it was
suggested to use JSON. But it was deemed less human-readable than TOML while
not improving on the machine-writable aspect enough to warrant the change.
Use YAML over TOML
==================
Some argued that YAML met the machine-writable/human-readable requirement in a
better way than TOML. But as that's subjective and ``pyproject.toml`` already
existed as the human-writable file used by Python packaging standards it was
deemed more important to keep using TOML.
----------
Other keys
----------
Multiple hashes per file
========================
An initial version of this PEP proposed supporting multiple hashes per file. The
idea was to allow one to choose which hashing algorithm they wanted to go with
when installing. But upon reflection it seemed like an unnecessary complication
as there was no guarantee the hashes provided would satisfy the user's needs.
As well, if the single hash algorithm used in the lock file wasn't sufficient,
rehashing the files involved as a way to migrate to a different algorithm didn't
seem insurmountable.
Hashing the contents of the lock file itself
============================================
Hashing the contents of the bytes of the file and storing hash value within the
file itself was proposed at some point. This was removed to make it easier
when merging changes to the lock file as each merge would have to recalculate
the hash value to avoid a merge conflict.
Hashing the semantic contents of the file was also proposed, but it would lead
to the same merge conflict issue.
Regardless of which contents were hashed, either approach could have the hash
value stored outside of the file if such a hash was desired.
Recording the creation date of the lock file
============================================
To know how potentially stale the lock file was, an earlier proposal suggested
recording the creation date of the lock file. But for some same merge conflict
reasons as storing the hash of the file contents, this idea was dropped.
Recording the package indexes used
==================================
Recording what package indexes were used by the locker to decide what to lock
for was considered. In the end, though, it was rejected as it was deemed
unnecessary bookkeeping.
===========
Open Issues
===========
N/A
================
Acknowledgements
================
Thanks to everyone who participated in the discussions in
https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/,
especially Alyssa Coghlan who probably caused the biggest structural shifts from
the initial proposal.
Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for
providing feedback on a draft version of this PEP.
=========
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _Dependabot: https://docs.github.com/en/code-security/dependabot
.. _dependency specifiers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/
.. _direct URL reference: https://packaging.python.org/en/latest/specifications/direct-url/
.. _environment markers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers
.. _normalized name: https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
.. _PDM: https://pypi.org/project/pdm/
.. _pip-tools: https://pypi.org/project/pip-tools/
.. _Poetry: https://python-poetry.org/
.. _project detail: https://packaging.python.org/en/latest/specifications/simple-repository-api/#project-detail
.. _pyproject.toml specification: https://packaging.python.org/en/latest/specifications/pyproject-toml/#pyproject-toml-specification
.. _Simple Repository API: https://packaging.python.org/en/latest/specifications/simple-repository-api/
.. _software bill of materials: https://www.cisa.gov/sbom
.. _TOML: https://toml.io/
.. _version specifiers: https://packaging.python.org/en/latest/specifications/version-specifiers/
.. _wheel tags: https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/