PEP 751: A file format to list Python dependencies for installation reproducibility (#3870)

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>

Parent: d0bbb6bdbb
Commit: 92634ee01f

@@ -625,9 +625,11 @@ peps/pep-0743.rst @vstinner
peps/pep-0744.rst @brandtbucher
peps/pep-0745.rst @hugovk
peps/pep-0746.rst @JelleZijlstra
peps/pep-0747.rst @JelleZijlstra
# ...
peps/pep-0749.rst @JelleZijlstra
# ...
peps/pep-0747.rst @JelleZijlstra
peps/pep-0751.rst @brettcannon
# ...
# peps/pep-0754.rst
# ...

@@ -0,0 +1,934 @@
PEP: 751
Title: A file format to list Python dependencies for installation reproducibility
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 24-Jul-2024
Replaces: 665

========
Abstract
========

This PEP proposes a new file format for dependency specification
to enable reproducible installation in a Python environment. The format is
designed to be human-readable and machine-generated. Installers consuming the
file should be able to evaluate each package in question in isolation, with no
need for dependency resolution at install-time.


==========
Motivation
==========

Currently, no standard exists to:

- Specify what top-level dependencies should be installed into a Python
  environment.
- Create an immutable record, such as a lock file, of which dependencies were
  installed.

Considering there are at least four well-known solutions to this problem in the
community (``pip freeze``, pip-tools_, Poetry_, and PDM_), there seems to be an
appetite for lock files in general.

Those tools also vary in what locking scenarios they support. For instance,
``pip freeze`` and pip-tools only generate lock files for the current
environment while PDM and Poetry try to lock for *any* environment to some
degree. None of them directly support locking to specific files to install,
which can be important for some workflows. There are also concerns around the
lack of secure defaults in the face of supply chain attacks (e.g., always
including hashes for files). Finally, not all the formats are easy to audit to
determine what would be installed into an environment ahead of time.

The lack of a standard also has some drawbacks. For instance, any tooling that
wants to work with lock files must choose which format to support, potentially
leaving users unsupported (e.g., if Dependabot_ chose not to support PDM, or if
cloud providers that install dependencies on your behalf supported only some
formats).


=========
Rationale
=========

The format is designed so that a *locker* which produces the lock file
and an *installer* which consumes the lock file can be separate tools. This
allows for situations such as cloud hosting providers using their own installer
that's optimized for their system, independent of which locker the user used to
create their lock file.

The file format is designed to be human-readable. This is
so that the contents of the file can be audited by a human to make sure no
undesired dependencies end up being included in the lock file. It is also to
make it easy to understand what would be installed from the lock file
without needing to run a tool, once again to help with auditing. Finally,
the format is designed so that viewing a diff of the file is easy by centralizing
relevant details.

The file format is also designed to not require a resolver at install time.
Being able to analyze dependencies in isolation from one another when listed in
a lock file provides a few benefits. First, it supports auditing by making it
easy to figure out if a certain dependency would be installed for a certain
environment without needing to reference other parts of the file contextually.
It should also lead to faster installs, which are much more frequent than
creating a lock file. Finally, the four tools mentioned in the Motivation_
section either already implement this approach of evaluating dependencies in
isolation or have suggested they could (in
`Poetry's case <https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/83>`__).


-----------------
Locking Scenarios
-----------------

The lock file format is designed to support two locking scenarios. The format
should also be flexible enough that adding support for other locking scenarios
is possible via a separate PEP.


Per-file Locking
================

*Per-file locking* operates under the premise that one wants to install exactly
the same files in any matching environment. As such, the lock file specifies
which files to install. There can be multiple environments specified in a
single file, each with their own set of files to install. By specifying the
exact files to install, installers avoid performing any resolution to decide what
to install.

The motivation for this approach to locking is for those who have controlled
environments that they work with. For instance, if you have specific, controlled
development and production environments then you can use per-file locking to
make sure the **same** files are installed in both environments for everyone.
This is similar to what ``pip freeze`` and pip-tools_
support, but is stricter about the exact files and also supports specifying
the locked files for multiple environments in the same file.


Package Locking
===============

*Package locking* lists the packages and their versions that *may* apply to any
environment being installed for. The listed packages and versions are
evaluated individually and independently from any other packages and versions
listed in the file. This allows installation to be linear: read each package
and version and make an isolated decision as to whether it should be installed.
This avoids requiring the installer to perform a *resolution* (i.e.
determine what to install based on what else is to be installed).

The motivation of this approach comes from
`PDM lock files <https://frostming.com/en/2024/pdm-lockfile/>`__. By listing the
potential packages and versions that may be installed, what's installed is
controlled in a way that's easy to reason about. This also allows for not
specifying the exact environments that would be supported by the lock file so
there's more flexibility for what environments are compatible with the lock
file. This approach supports scenarios like open-source projects that want to
lock what people should use to build the documentation without knowing upfront
what environments their contributors are working from.

As already mentioned, this approach is supported by PDM_. Poetry_ has
`shown some interest <https://discuss.python.org/t/46593/83>`__.


=============
Specification
=============

---------
File Name
---------

A lock file MUST be named :file:`pylock.toml` or match the regular expression
``r"pylock\.(.+)\.toml"`` if a name for the lock file is desired or if multiple
lock files exist. The use of the ``.toml`` file extension is to make syntax
highlighting in editors easier and to reinforce the fact that the file format is
meant to be human-readable. The prefix and suffix of a named file MUST be
lowercase for easy detection and stripping off to find the name, e.g.::

    # Skip the unnamed "pylock.toml", which would otherwise yield the name "toml".
    if filename != "pylock.toml" and filename.startswith("pylock.") and filename.endswith(".toml"):
        name = filename.removeprefix("pylock.").removesuffix(".toml")


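The naming rule can also be checked with the regular expression itself, which
keeps the unnamed :file:`pylock.toml` separate from named lock files. This is a
minimal sketch. It assumes the caller passes a bare file name with no directory
part. ``lock_file_name()`` is a hypothetical helper and is not part of any
tool:

.. code-block:: python

    import re
    from typing import Optional

    # The pattern for named lock files, as given above.
    NAMED_LOCK_FILE = re.compile(r"pylock\.(.+)\.toml")


    def lock_file_name(filename: str) -> Optional[str]:
        """Return a lock file's name, "" if it is unnamed, or None if it is
        not a lock file at all (assumes a bare file name, no directories)."""
        if filename == "pylock.toml":
            return ""
        # fullmatch() is case-sensitive, so an upper-case prefix or suffix
        # (which the naming rule disallows) is not treated as a lock file.
        match = NAMED_LOCK_FILE.fullmatch(filename)
        return match.group(1) if match else None


    print(lock_file_name("pylock.dev.toml"))  # dev
    print(lock_file_name("PYLOCK.dev.toml"))  # None

``fullmatch()`` is used instead of ``match()`` so that the whole file name must
match. As a result, a name such as ``pylock.dev.toml.bak`` is rejected.

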
This PEP has no opinion as to the location of lock files (i.e. in the root or
a subdirectory of a project).


-----------
File Format
-----------

The format of the file is TOML_.

All keys listed below are required unless otherwise noted. If two keys are
mutually exclusive to one another, then one of the keys is required while the
other is disallowed.


``version``
===========

- String
- The version of the lock file format.
- This PEP specifies the initial version (and the only valid value until future
  updates to the standard change it) as ``"1.0"``.


``hash-algorithm``
==================

- String
- The name of the hash algorithm used for calculating all hash values.
- Only a single hash algorithm is used for the entire file to allow the
  ``[[package.files]]`` table to be written inline for readability and
  compactness purposes by only listing a single hash value instead of multiple
  values based on multiple hash algorithms.
- Specifying a single hash algorithm guarantees that an algorithm that the user
  prefers is used consistently throughout the file without having to audit
  each file hash value separately.
- Allows for updating the entire file to a new hash algorithm without running
  the risk of accidentally leaving an old hash value in the file.
- The ``hashes`` dictionary of the ``files`` dictionary of the Project Details
  dictionary in :ref:`packaging:simple-repository-api-json` specifies what
  values are valid and gives guidelines on what hash algorithms to use.
- Failure to validate any hash values for any file that is to be installed MUST
  raise an error.


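The last rule above can be illustrated with the standard library's
``hashlib``. This is a minimal sketch with two assumptions. First, hash values
are hex-encoded digests, as in the Simple Repository API's ``hashes``
dictionary. Second, ``hash-algorithm`` names a fixed-length algorithm that
``hashlib.new()`` recognizes. ``verify_file_hash()`` and ``HashMismatchError``
are hypothetical names and are not part of any installer's API:

.. code-block:: python

    import hashlib


    class HashMismatchError(Exception):
        """A file to be installed could not be validated against its hash."""


    def verify_file_hash(contents: bytes, hash_algorithm: str, expected: str) -> None:
        """Raise HashMismatchError unless ``contents`` hash to ``expected``.

        ``hash_algorithm`` is the lock file's ``hash-algorithm`` value and
        ``expected`` is a file's ``package.files.hash`` (assumed hex-encoded).
        """
        try:
            # Assumes a fixed-length algorithm such as "sha256"; hashlib.new()
            # raises ValueError for names it does not recognize.
            digest = hashlib.new(hash_algorithm, contents).hexdigest()
        except ValueError:
            raise HashMismatchError(f"unsupported hash algorithm {hash_algorithm!r}") from None
        if digest != expected.lower():
            raise HashMismatchError(f"expected {expected}, got {digest}")


    data = b"example file contents"
    verify_file_hash(data, "sha256", hashlib.sha256(data).hexdigest())  # passes

An unknown algorithm is treated as a failure rather than a reason to skip the
check. This follows the requirement that any failure to validate a hash MUST
raise an error.

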
``dependencies``
================

- Array of strings
- A listing of the `dependency specifiers`_ that act as the input to the lock file,
  representing the direct, top-level dependencies to be installed.


``[[file-lock]]``
=================

- Array of tables
- Mutually exclusive with ``[package-lock]``.
- The array's existence implies the use of the per-file locking approach.
- An environment that meets all of the specified criteria in the table will be
  considered compatible with the environment that was locked for.
- Lockers MUST NOT generate multiple ``[[file-lock]]`` tables which would be
  considered compatible for the same environment.
- In instances where there would be a conflict but the lock is still desired,
  either separate lock files can be written or per-package locking can be used.
- Entries in the array SHOULD be sorted by ``file-lock.name`` lexicographically.


``file-lock.name``
------------------

- String
- A unique name within the array for the environment this table represents.


``[file-lock.marker-values]``
-----------------------------

- Optional
- Table of strings
- The keys represent the names of `environment markers`_ and the values are the
  values for those markers.
- Compatibility is defined by the environment's values matching what is in the
  table.
- Lockers SHOULD sort the keys lexicographically to minimize changes when
  updating the file.


``file-lock.wheel-tags``
------------------------

- Optional
- Array of strings
- An unordered array of `wheel tags`_ which must be supported by the environment.
- The array MAY be non-exhaustive, to allow for a smaller array as well as to
  help prevent multiple ``[[file-lock]]`` tables from being compatible with the
  same environment by one array being a strict subset of another
  ``file-lock.wheel-tags`` entry in the same file's ``[[file-lock]]`` tables.
- Lockers SHOULD sort the entries lexicographically to minimize changes when
  updating the file.
- Lockers MUST NOT include
  `compressed tag sets <https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/#compressed-tag-sets>`__
  or duplicate tags for consistency across lockers and to simplify checking for
  compatibility.


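Together, the two optional keys above describe a compatibility check that an
installer could run against each ``[[file-lock]]`` table. This is a minimal
sketch. It assumes the environment's marker values and supported wheel tags
have already been collected as plain strings. An installer might get them from
the third-party ``packaging`` library, for example from
``packaging.markers.default_environment()`` and by converting the tags that
``packaging.tags.sys_tags()`` yields to strings.
``file_lock_is_compatible()`` is a hypothetical helper:

.. code-block:: python

    from collections.abc import Iterable, Mapping
    from typing import Any


    def file_lock_is_compatible(
        file_lock: Mapping[str, Any],
        env_marker_values: Mapping[str, str],
        env_wheel_tags: Iterable[str],
    ) -> bool:
        """Return True if the environment meets every criterion in a
        ``[[file-lock]]`` table; absent optional keys impose no constraint."""
        # [file-lock.marker-values]: each recorded value must match exactly.
        for marker, value in file_lock.get("marker-values", {}).items():
            if env_marker_values.get(marker) != value:
                return False
        # file-lock.wheel-tags: every listed tag must be supported. The list may
        # be non-exhaustive, so the environment can support additional tags.
        supported = set(env_wheel_tags)
        return all(tag in supported for tag in file_lock.get("wheel-tags", []))


    env_markers = {"sys_platform": "linux", "python_version": "3.12"}
    env_tags = ["cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"]
    linux = {"name": "linux", "marker-values": {"sys_platform": "linux"}}
    windows = {"name": "windows", "marker-values": {"sys_platform": "win32"}}
    print(file_lock_is_compatible(linux, env_markers, env_tags))    # True
    print(file_lock_is_compatible(windows, env_markers, env_tags))  # False

An installer would run this check against every ``[[file-lock]]`` table. Under
the rules above, exactly one table should match.

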
``[package-lock]``
==================

- Table
- Mutually exclusive with ``[[file-lock]]``.
- Signifies the use of the package locking approach.


``package-lock.requires-python``
--------------------------------

- String
- Holds the `version specifiers`_ for Python version compatibility for the
  overall package locking.
- Provides at-a-glance information to know if the lock file *may* apply to a
  version of Python instead of having to scan the entire file to compile the
  same information.


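Because ``[[file-lock]]`` and ``[package-lock]`` are mutually exclusive, a
consumer can tell which locking approach a file uses from its top-level keys
alone. This is a minimal sketch that assumes Python 3.11 or later for the
standard library's ``tomllib``. ``check_top_level()`` is a hypothetical helper
that covers only a few of the rules in this section. The embedded lock file,
including its package name and hash placeholder, is purely illustrative:

.. code-block:: python

    import tomllib
    from typing import Any


    def check_top_level(lock: dict[str, Any]) -> str:
        """Check a few top-level rules and return the locking approach used.

        Only some of the rules in this section are checked.
        """
        if lock.get("version") != "1.0":
            raise ValueError(f"unsupported lock file version: {lock.get('version')!r}")
        for key in ("hash-algorithm", "dependencies", "package"):
            if key not in lock:
                raise ValueError(f"missing required key: {key!r}")
        # Mutually exclusive keys: exactly one of the two must be present.
        has_file_lock = "file-lock" in lock
        if has_file_lock == ("package-lock" in lock):
            raise ValueError("exactly one of [[file-lock]] and [package-lock] is required")
        return "per-file locking" if has_file_lock else "package locking"


    # An illustrative lock file; the package and hash are placeholders.
    EXAMPLE = """
    version = "1.0"
    hash-algorithm = "sha256"
    dependencies = ["example"]

    [package-lock]
    requires-python = ">=3.9"

    [[package]]
    name = "example"
    version = "1.0"
    multiple-entries = false
    files = [
        {name = "example-1.0-py3-none-any.whl", hash = "<hex digest>"},
    ]
    """

    print(check_top_level(tomllib.loads(EXAMPLE)))  # package locking

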
``[[package]]``
===============

- Array of tables
- The array contains all data on the locked package versions.
- Lockers SHOULD record packages in order by ``package.name`` lexicographically
  and ``package.version`` by the sort order for `version specifiers`_.
- Lockers SHOULD record keys in the same order as written in this PEP to
  minimize changes when updating.
- Designed so that relevant details as to why a package is included are
  in one place to make diff reading easier.


``package.name``
----------------

- String
- The `normalized name`_ of the package.
- Part of what's required to uniquely identify this entry.


``package.version``
-------------------

- String
- The version of the package.
- Part of what's required to uniquely identify this entry.


``package.multiple-entries``
----------------------------

- Boolean
- If using package locking via ``[package-lock]``, then multiple entries for the
  same package MUST be mutually exclusive via ``package.marker`` (this is not
  required for per-file locking as the ``package.*.lock`` entries imply mutual
  exclusivity).
- Aids in auditing by knowing that there are multiple entries for the same
  package that may need to be considered.


``package.description``
-----------------------

- Optional
- String
- The package's ``Summary`` from its `core metadata`_.
- Useful to help understand why a package was included in the file based on its
  purpose.


``package.simple-repo-package-url``
-----------------------------------

- Optional (although mutually exclusive with
  ``package.files.simple-repo-package-url``)
- String
- Stores the `project detail`_ URL from the `Simple Repository API`_.
- Useful for generating Packaging URLs (aka PURLs).
- When possible, lockers SHOULD include this or
  ``package.files.simple-repo-package-url`` to assist with generating
  `software bill of materials`_ (aka SBOMs).


``package.marker``
------------------

- Optional
- String
- The `environment markers`_ expression which specifies whether this package and
  version applies to the environment.
- Only applicable via ``[package-lock]`` and the package locking scenario.
- The lack of this key means this package and version is required to be
  installed.


``package.requires-python``
---------------------------

- Optional
- String
- Holds the `version specifiers`_ for Python version compatibility for the
  package and version.
- Useful for documenting why this package and version was included in the file.
- Also helps document why the version restriction in
  ``package-lock.requires-python`` was chosen.
- It should not provide useful information for installers as it would be
  captured by ``package-lock.requires-python`` and isn't relevant when
  ``[[file-lock]]`` is used.


``package.dependents``
----------------------

- Optional
- Array of strings
- A record of the packages that depend on this package and version.
- Useful for analyzing why a package happens to be listed in the file
  for auditing purposes.
- This does not provide information which influences installers.


``package.dependencies``
------------------------

- Optional
- Array of strings
- A record of the dependencies of the package and version.
- Useful in analyzing why a package happens to be listed in the file
  for auditing purposes.
- This does not provide information which influences the installer as
  ``[[file-lock]]`` specifies the exact files to use and ``[package-lock]``
  applicability is determined by ``package.marker``.


``package.direct``
------------------

- Optional (defaults to ``false``)
- Boolean
- Represents whether the installation is via a `direct URL reference`_.


``[[package.files]]``
---------------------

- Must be specified if ``[package.vcs]`` is not
- Array of tables
- Tables can be written inline.
- Represents the files to potentially install for the package and version.
- Entries in ``[[package.files]]`` SHOULD be lexicographically sorted by
  ``package.files.name`` key to minimize changes in diffs.


``package.files.name``
''''''''''''''''''''''

- String
- The file name.
- Necessary for installers to decide what to install when using package locking.


``package.files.lock``
''''''''''''''''''''''

- Required when ``[[file-lock]]`` is used
- Array of strings
- An array of ``file-lock.name`` values which signify that the file is to be
  installed when the corresponding ``[[file-lock]]`` table applies to the
  environment.
- There MUST only be a single file with any one ``file-lock.name`` entry per
  package, regardless of version.


``package.files.simple-repo-package-url``
'''''''''''''''''''''''''''''''''''''''''

- Optional (although mutually exclusive with
  ``package.simple-repo-package-url``)
- String
- The value has the same meaning as ``package.simple-repo-package-url``.
- This key is available per-file to support :pep:`708` when some files override
  what's provided by another `Simple Repository API`_ index.


``package.files.origin``
''''''''''''''''''''''''

- Optional
- String
- URI where the file was found when the lock file was generated.
- Useful for documenting where the file came from and potentially where to look
  for the file if not already downloaded/available.


``package.files.hash``
''''''''''''''''''''''

- String
- The hash value of the file contents using the hash algorithm specified by
  ``hash-algorithm``.
- Used by installers to verify the file contents match what the locker worked
  with.


``[package.vcs]``
-----------------

- Must be specified if ``[[package.files]]`` is not (although may be specified
  simultaneously with ``[[package.files]]``).
- Table representing the version control system containing the package and
  version.


``package.vcs.type``
''''''''''''''''''''

- String
- The type of version control system used.
- The valid values are specified by the
  `registered VCSs <https://packaging.python.org/en/latest/specifications/direct-url-data-structure/#registered-vcs>`__
  of the direct URL data structure.


``package.vcs.origin``
''''''''''''''''''''''

- String
- The URI of where the repository was located when the lock file was generated.


``package.vcs.commit``
''''''''''''''''''''''

- String
- The commit ID for the repository which represents the package and version.
- The value MUST be immutable for the VCS for security purposes
  (e.g. no Git tags).


``package.vcs.lock``
''''''''''''''''''''

- Required when ``[[file-lock]]`` is used
- Array of strings
- An array of ``file-lock.name`` values which signify that the repository at the
  specified commit is to be installed when the corresponding ``[[file-lock]]``
  table applies to the environment.
- A name in the array may only appear if no file listed in
  ``package.files.lock`` contains the name for the same package, regardless of
  version.


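The uniqueness rules for ``package.files.lock`` and ``package.vcs.lock`` can be
checked together for each package name, across all of its versions. This is a
minimal sketch over the parsed ``[[package]]`` array as plain dictionaries.
``check_lock_names()`` is a hypothetical helper. It also treats two VCS entries
that claim the same name as a conflict. That is an assumption drawn from the
installer ambiguity rules, not a requirement stated above:

.. code-block:: python

    from collections import defaultdict
    from typing import Any


    def check_lock_names(packages: list[dict[str, Any]]) -> None:
        """Raise ValueError if one package claims a file-lock name twice.

        ``packages`` is the parsed ``[[package]]`` array. Names are tracked
        per package name, across all versions, files, and VCS entries.
        """
        claimed: dict[str, set[str]] = defaultdict(set)
        for package in packages:
            seen = claimed[package["name"]]
            lock_arrays = [file.get("lock", []) for file in package.get("files", [])]
            if "vcs" in package:
                lock_arrays.append(package["vcs"].get("lock", []))
            for lock_names in lock_arrays:
                for lock_name in lock_names:
                    if lock_name in seen:
                        raise ValueError(
                            f"{package['name']}: {lock_name!r} is claimed more than once"
                        )
                    seen.add(lock_name)


    check_lock_names([
        {"name": "example", "version": "1.0",
         "files": [{"name": "example-1.0-py3-none-any.whl", "lock": ["linux"]}]},
        {"name": "example", "version": "2.0",
         "vcs": {"type": "git", "lock": ["windows"]}},
    ])  # passes: each name is claimed once for "example"

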
``package.directory``
---------------------

- Optional and only valid when ``[package-lock]`` is specified
- String
- A local directory where a source tree for the package and version exists.
- Not valid under ``[[file-lock]]`` as this PEP does not make an attempt to
  specify a mechanism for verifying file contents have not changed since locking
  was performed.


``[[package.build-requires]]``
------------------------------

- Optional
- An array of tables whose structure matches that of ``[[package]]``.
- Each entry represents a package and version to use when building the
  enclosing package and version.
- The array is complete/locked like ``[[package]]`` itself (i.e. installers
  follow the same installation procedure for ``[[package.build-requires]]`` as
  ``[[package]]``)
- Selection of which entries to use for an environment is the same as
  ``[[package]]`` itself, albeit only applying when installing the build
  back-end and its dependencies.
- This helps with reproducibility of the building of a package by recording
  either what was or would have been used if the locker needed to build the
  package.
- If the installer and user choose to install from source and this array is
  missing then the installer MAY choose to resolve what to install for building
  at install time, otherwise the installer MUST raise an error.


``[package.tool]``
------------------

- Optional
- Same usage as that of the equivalent table from the
  `pyproject.toml specification`_.


``[tool]``
==========

- Optional
- Same usage as that of the equivalent table from the
  `pyproject.toml specification`_.


------------------------
Expectations for Lockers
------------------------

- When creating a lock file for ``[package-lock]``, the locker SHOULD read
  the metadata of **all** files that end up being listed in
  ``[[package.files]]`` to make sure all potential metadata cases are covered
- If a locker chooses not to check every file for its metadata, the tool MUST
  either provide the user with the option to have all files checked (whether
  that is opt-in or opt-out is left up to the tool), or notify the user somehow
  that such a standards-violating shortcut is being taken (whether this is by
  documentation or at runtime is left to the tool)
- Lockers MAY provide a way to let users provide the information
  necessary to install for multiple environments at once when doing per-file
  locking, e.g. supporting a JSON file format which specifies wheel tags and
  marker values much like in ``[[file-lock]]`` for which multiple files can be
  specified, which could then be directly recorded in the corresponding
  ``[[file-lock]]`` table (if it allowed for unambiguous per-file locking
  environment selection)

.. code-block:: JSON

   {
       "marker-values": {"<marker>": "<value>"},
       "wheel-tags": ["<tag>"]
   }


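A locker that accepts such JSON descriptions still has to apply the
``[[file-lock]]`` rules when it records them. This is a minimal sketch. It
assumes one JSON document per environment, in the shape shown above, and a
table name chosen by the caller. ``file_lock_from_json()`` is hypothetical. It
rejects any tag containing ``.``, which relies on the assumption that a dot
only appears in compressed tag sets:

.. code-block:: python

    import json
    from typing import Any


    def file_lock_from_json(name: str, json_text: str) -> dict[str, Any]:
        """Turn one JSON environment description into a ``[[file-lock]]`` table."""
        description = json.loads(json_text)
        table: dict[str, Any] = {"name": name}
        if "marker-values" in description:
            # Lockers SHOULD sort the marker keys.
            table["marker-values"] = dict(sorted(description["marker-values"].items()))
        if "wheel-tags" in description:
            tags = set(description["wheel-tags"])  # drops duplicate tags
            # Assumption: a "." only appears in compressed tag sets, which
            # lockers MUST NOT record.
            if any("." in tag for tag in tags):
                raise ValueError("compressed tag sets must be expanded first")
            table["wheel-tags"] = sorted(tags)
        return table


    table = file_lock_from_json(
        "linux-py312",
        '{"marker-values": {"sys_platform": "linux", "implementation_name": "cpython"},'
        ' "wheel-tags": ["py3-none-any", "cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"]}',
    )
    print(table["wheel-tags"])
    # ['cp312-cp312-manylinux_2_17_x86_64', 'py3-none-any']

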
---------------------------
Expectations for Installers
---------------------------

- Installers MAY support installation of non-binary files
  (i.e. source distributions, source trees, and VCS), but are not required to
- Installers MUST provide a way to avoid non-binary file installation for
  reproducibility and security purposes
- Installers SHOULD make it opt-in to use non-binary file installation to
  facilitate a secure-by-default approach
- Under per-file locking, if what to install is ambiguous then the installer
  MUST raise an error


Installing for per-file locking
===============================

An example workflow is:

- Iterate through each ``[[file-lock]]`` table to find the one that applies to
  the environment being installed for
- If no compatible environment is found, an error MUST be raised
- If multiple environments are found to be compatible, an error MUST be raised
- For the compatible environment, iterate through each entry in ``[[package]]``
- For each ``[[package]]`` entry, iterate through ``[[package.files]]`` to look
  for any files with ``file-lock.name`` listed in ``package.files.lock``
- If a file is found with a matching lock name, add it to the list of candidate
  files to install and move on to the next ``[[package]]`` entry
- If no file is found, check if ``package.vcs.lock`` contains a match (no
  match is also acceptable)
- If a ``[[package.files]]`` contains multiple matching entries, an error MUST
  be raised due to ambiguity for what is to be installed
- If multiple ``[[package]]`` entries for the same package have matching files,
  an error MUST be raised due to ambiguity for what is to be installed
- Find and verify the candidate files and/or VCS entries based on their hash or
  commit ID as appropriate
- If a source distribution or VCS was selected and
  ``[[package.build-requires]]`` exists, then repeat the above process as
  appropriate to install the build dependencies necessary to build the package
- Install the candidate files


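The selection steps of the per-file workflow above can be sketched as follows.
The sketch stops before hash or commit verification, build requirements, and
the actual installation. It works on the parsed lock file as plain dictionaries
and assumes the environment's marker values and wheel tags are already known.
``select_per_file()`` and ``AmbiguousLockError`` are hypothetical names. One
small assumption goes beyond the listed steps: a VCS match also counts when
detecting multiple matching ``[[package]]`` entries for the same package.

.. code-block:: python

    from collections.abc import Iterable, Mapping
    from typing import Any


    class AmbiguousLockError(Exception):
        """The lock file does not identify exactly one thing to install."""


    def _matches(file_lock: Mapping[str, Any], markers: Mapping[str, str], tags: set[str]) -> bool:
        # Same criteria as [file-lock.marker-values] and file-lock.wheel-tags.
        return all(
            markers.get(marker) == value
            for marker, value in file_lock.get("marker-values", {}).items()
        ) and all(tag in tags for tag in file_lock.get("wheel-tags", []))


    def select_per_file(
        lock: Mapping[str, Any], markers: Mapping[str, str], tags: Iterable[str]
    ) -> list[tuple[str, dict[str, Any]]]:
        """Return (package name, file or VCS table) pairs to install.

        Verifying hashes/commits, build requirements, and installing are omitted.
        """
        supported = set(tags)
        compatible = [fl for fl in lock["file-lock"] if _matches(fl, markers, supported)]
        if len(compatible) != 1:
            # Zero matches and multiple matches are both errors.
            raise AmbiguousLockError(f"{len(compatible)} [[file-lock]] tables match")
        env_name = compatible[0]["name"]

        selected: dict[str, dict[str, Any]] = {}
        for package in lock["package"]:
            files = [f for f in package.get("files", []) if env_name in f.get("lock", [])]
            if len(files) > 1:
                raise AmbiguousLockError(f"{package['name']}: several files match")
            candidate = files[0] if files else None
            vcs = package.get("vcs")
            if candidate is None and vcs is not None and env_name in vcs.get("lock", []):
                candidate = vcs
            if candidate is None:
                continue  # No match is acceptable; this entry is simply skipped.
            if package["name"] in selected:
                raise AmbiguousLockError(f"{package['name']}: several entries match")
            selected[package["name"]] = candidate
        return list(selected.items())


    lock = {
        "file-lock": [
            {"name": "linux", "marker-values": {"sys_platform": "linux"}},
            {"name": "windows", "marker-values": {"sys_platform": "win32"}},
        ],
        "package": [
            {"name": "example", "version": "1.0", "files": [
                {"name": "example-1.0-py3-none-any.whl", "lock": ["linux", "windows"]},
            ]},
        ],
    }
    for name, entry in select_per_file(lock, {"sys_platform": "linux"}, ["py3-none-any"]):
        print(name, entry["name"])  # example example-1.0-py3-none-any.whl

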
Installing for package locking
==============================

An example workflow is:

- Verify that the environment is compatible with
  ``package-lock.requires-python``; if it isn't, an error MUST be raised
- Iterate through each entry in ``[[package]]``
- For each entry, if there's a ``package.marker`` key, evaluate the expression

  - If the expression is false, then move on
  - Otherwise the package entry must be installed somehow

- Iterate through the files listed in ``[[package.files]]``, looking for the
  "best" file to install
- If no file is found, check for ``[package.vcs]``
- If no match is found, an error MUST be raised
- Find and verify the selected files and/or VCS entries based on their hash or
  commit ID as appropriate
- If the match is a source distribution or VCS and
  ``[[package.build-requires]]`` is provided, repeat the above as appropriate to
  build the package
- Install the selected files


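The selection part of the package locking workflow can be sketched the same
way. Evaluating version specifiers and environment markers, and ranking wheels,
are all installer-specific. This minimal sketch therefore takes them as
functions supplied by the caller. ``select_package_lock()`` and its three
function parameters are hypothetical. The toy lambdas in the usage lines stand
in for real evaluation, which an installer might do with the third-party
``packaging`` library:

.. code-block:: python

    from collections.abc import Callable, Mapping
    from typing import Any, Optional

    FileTable = dict[str, Any]


    def select_package_lock(
        lock: Mapping[str, Any],
        python_ok: Callable[[str], bool],
        marker_ok: Callable[[str], bool],
        pick_file: Callable[[list[FileTable]], Optional[FileTable]],
    ) -> list[tuple[str, FileTable]]:
        """Return (package name, file or VCS table) pairs to install.

        python_ok: does the running Python satisfy a version specifier?
        marker_ok: does an environment marker expression evaluate to true?
        pick_file: choose the "best" file, or None if none is usable.
        """
        if not python_ok(lock["package-lock"]["requires-python"]):
            raise RuntimeError("environment does not match package-lock.requires-python")
        selected = []
        for package in lock["package"]:
            marker = package.get("marker")
            if marker is not None and not marker_ok(marker):
                continue  # The entry does not apply to this environment.
            # Without a marker, or with a true one, the entry must be installed.
            choice = pick_file(package.get("files", []))
            if choice is None:
                choice = package.get("vcs")  # Fall back to [package.vcs].
            if choice is None:
                raise RuntimeError(f"{package['name']} {package['version']}: no match")
            selected.append((package["name"], choice))
        return selected


    lock = {
        "package-lock": {"requires-python": ">=3.9"},
        "package": [
            {"name": "example-win", "version": "1.0", "marker": "sys_platform == 'win32'",
             "files": [{"name": "example_win-1.0-py3-none-any.whl"}]},
            {"name": "example", "version": "1.0",
             "files": [{"name": "example-1.0-py3-none-any.whl"}]},
        ],
    }
    # Toy stand-ins for real specifier, marker, and wheel-ranking logic.
    result = select_package_lock(
        lock,
        python_ok=lambda specifier: True,
        marker_ok=lambda marker: marker == "sys_platform == 'linux'",
        pick_file=lambda files: files[0] if files else None,
    )
    print([name for name, _ in result])  # ['example']

Each entry is decided on its own, so the loop never revisits an earlier choice.
This is the property that lets package locking avoid a resolver at install
time.

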
=======================
Backwards Compatibility
=======================

Because there is no preexisting lock file format, there are no explicit
backwards-compatibility concerns in terms of Python packaging standards.

As for packaging tools themselves, that will be a per-tool decision. For tools
that don't document their lock file format, they could choose to simply start
using the format internally and then transition to saving their lock files with
a name supported by this PEP. For tools with a preexisting, documented format,
they could provide an option to choose which format to emit.


=====================
Security Implications
=====================

The hope is that standardizing on a lock file format that starts from a
security-first posture will help make overall packaging installation safer.
However, this PEP does not solve all potential security concerns.

One potential concern is tampering with a lock file. If a lock file is not kept
in source control and properly audited, a bad actor could change the file in
nefarious ways (e.g. point to a malicious version of a package). Tampering could
also occur in transit to e.g. a cloud provider who will perform an installation
on the user's behalf. Both could be mitigated by signing the lock file either
within the file in a ``[tool]`` entry or via a side channel external to the lock
file itself.

This PEP does not do anything to prevent a user from installing an incorrect
package. While the format includes many details to help in auditing a package's
inclusion, there isn't any mechanism to stop e.g. name confusion attacks via
typosquatting. Lockers may be able to provide some UX to help with this (e.g. by
providing download counts for a package).


=================
|
||||
How to Teach This
|
||||
=================
|
||||
|
||||
Users should be informed that when they ask to install some package, that
|
||||
package may have its own dependencies, those dependencies may have dependencies,
|
||||
and so on. Without writing down what gets installed as part of installing the
|
||||
package they requested, things could change from underneath them (e.g. package
|
||||
versions). Changes to the underlying dependencies can lead to accidental
|
||||
breakage of their code. Lock files help deal with that by providing a way to
|
||||
write down what was installed.
|
||||
|
||||
Having what to install written down also helps in collaborating with others. By
|
||||
agreeing to a lock file's contents, everyone ends up with the same packages
|
||||
installed. This helps make sure no one relies on e.g. an API that's only
|
||||
available in a certain version that not everyone working on the project has
|
||||
installed.
|
||||
|
||||
Lock files also help with security by making sure you always get the same files
|
||||
installed and not a malicious one that someone may have slipped in. It also
|
||||
lets one be more deliberate in upgrading their dependencies and thus making sure
|
||||
the change is on purpose and not one slipped in by a bad actor.


========================
Reference Implementation
========================

A rough proof-of-concept for per-file locking can be found at
https://github.com/brettcannon/mousebender/tree/pep. An example lock file can
be seen at
https://github.com/brettcannon/mousebender/blob/pep/pylock.example.toml.

For per-package locking, PDM_ indirectly proves the approach works, as this PEP
records data equivalent to what PDM records in its lock files (whose format was
inspired by Poetry_). Some of the details of PDM's approach are covered in
https://frostming.com/en/2024/pdm-lockfile/ and
https://frostming.com/en/2024/pdm-lock-strategy/.


==============
Rejected Ideas
==============

----------------------------
Only support package locking
----------------------------

At one point it was suggested to skip per-file locking and only support package
locking, as the former was not explicitly supported in the larger Python
ecosystem while the latter was. But because this PEP has taken the position
that security is important and per-file locking is the more secure of the two
options, leaving out per-file locking was never considered.


-------------------------------------------------------------------------------------
Specifying a new core metadata version that requires consistent metadata across files
-------------------------------------------------------------------------------------

At one point, to handle the issue of metadata varying between files (which
requires examining every released file for a package and version to get
accurate locking results), the idea was floated to introduce a new core metadata
version which would require all metadata for all wheel files to be the same for
a single version of a package. Ultimately, though, it was deemed unnecessary, as
this PEP will put pressure on people to make files consistent for performance
reasons or to make indexes provide all the metadata separate from the wheel
files themselves. As well, there's no easy enforcement mechanism, and so
community expectation would work as well as a new metadata version.
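
To make the cost of inconsistent metadata concrete, the sketch below checks
whether every wheel of a single release declares the same ``Requires-Dist``
entries. It assumes the ``METADATA`` text of each wheel has already been
obtained (for example from an index that serves metadata separately, or by
reading it out of the wheel archive). The helper names are hypothetical, and a
real locker might also need to compare other fields such as
``Requires-Python``.

.. code-block:: python

    from email.parser import HeaderParser


    def requires_dist(metadata_text: str) -> frozenset[str]:
        """Extract the set of Requires-Dist values from core metadata text."""
        # Core metadata uses an email-header-style format.
        message = HeaderParser().parsestr(metadata_text)
        return frozenset(message.get_all("Requires-Dist") or [])


    def metadata_is_consistent(metadata_by_wheel: dict[str, str]) -> bool:
        """Return True if every wheel of a release declares the same dependencies."""
        distinct = {requires_dist(text) for text in metadata_by_wheel.values()}
        return len(distinct) <= 1


    common = "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"
    wheels = {
        "example-1.0-py3-none-any.whl": common + "Requires-Dist: numpy\n",
        "example-1.0-cp312-cp312-win_amd64.whl": (
            common + "Requires-Dist: numpy\nRequires-Dist: pywin32\n"
        ),
    }
    print(metadata_is_consistent(wheels))  # False: one wheel adds pywin32

Without a guarantee of consistency, a locker has to perform this kind of check
across every file of a release before it can trust any one file's metadata.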


-------------------------------------------
Have the installer do dependency resolution
-------------------------------------------

In order to support a format more akin to how Poetry worked when this PEP was
drafted, it was suggested that lockers effectively record the packages and their
versions which may be necessary to make an install work in any possible
scenario, and then the installer resolves what to install. But that complicates
auditing a lock file by requiring much more mental effort to know what packages
may be installed in any given scenario. Also, one of the Poetry developers
`suggested <https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/83>`__
that markers as represented in the package locking approach of this PEP may be
sufficient to cover the needs of Poetry. Not having the installer do a
resolution also simplifies installer implementations, centralizing complexity
in lockers.
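
The installer side of this design can be sketched as a simple filter: each
locked package is evaluated on its own against the target environment, and
nothing is resolved. In this sketch the ``LockedPackage`` class is
hypothetical, and markers are simplified to a mapping of marker names to
required values that must all match; real lock entries use full
`environment markers`_, which a tool would evaluate with a proper marker
implementation.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class LockedPackage:
        """Hypothetical, simplified stand-in for a package entry in a lock file."""

        name: str
        version: str
        # Simplified marker: every key must equal the environment's value.
        # An empty mapping means "install in every environment".
        marker: dict[str, str] = field(default_factory=dict)


    def applies(package: LockedPackage, environment: dict[str, str]) -> bool:
        """Evaluate one package in isolation; no other package is consulted."""
        return all(environment.get(k) == v for k, v in package.marker.items())


    def select_packages(locked: list[LockedPackage], environment: dict[str, str]) -> list[str]:
        """Return 'name==version' for each locked package that applies here."""
        return [f"{p.name}=={p.version}" for p in locked if applies(p, environment)]


    lock = [
        LockedPackage("requests", "2.32.3"),
        LockedPackage("colorama", "0.4.6", {"sys_platform": "win32"}),
    ]
    print(select_packages(lock, {"sys_platform": "linux"}))  # ['requests==2.32.3']
    print(select_packages(lock, {"sys_platform": "win32"}))  # ['requests==2.32.3', 'colorama==0.4.6']

Because every decision depends only on one entry and the environment, an
auditor can answer "what gets installed on this platform?" by reading the
lock file, without reproducing a resolver's reasoning.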


-----------------------------------------
Requiring specific hash algorithm support
-----------------------------------------

It was proposed to require a baseline hash algorithm for the files. This was
rejected as no other Python packaging specification requires specific hash
algorithm support. As well, the minimum hash algorithm suggested may eventually
become an outdated/unsafe suggestion, requiring further updates. To promote
using the best algorithm at all times, no baseline is provided, so that tools
do not simply default to the baseline without considering the security
ramifications of that hash algorithm.
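
Since no baseline algorithm is required, an installer has to handle whichever
algorithm a lock file names. One possible policy, sketched below, accepts any
algorithm that ``hashlib`` provides while refusing ones the installer deems too
weak. The ``WEAK_ALGORITHMS`` set and ``hasher_for()`` function are
hypothetical policy choices, not requirements of this PEP, and the sketch
assumes algorithms are named the way ``hashlib`` names them.

.. code-block:: python

    import hashlib

    # Hypothetical installer policy: algorithms considered too weak to trust.
    WEAK_ALGORITHMS = frozenset({"md5", "sha1"})


    def hasher_for(algorithm: str):
        """Return a new hash object for ``algorithm`` or raise ValueError."""
        name = algorithm.lower()
        if name in WEAK_ALGORITHMS:
            raise ValueError(f"refusing weak hash algorithm: {algorithm}")
        if name not in hashlib.algorithms_available:
            raise ValueError(f"unsupported hash algorithm: {algorithm}")
        return hashlib.new(name)


    h = hasher_for("sha256")
    h.update(b"wheel bytes")
    print(h.name)  # sha256

    try:
        hasher_for("md5")
    except ValueError as exc:
        print(exc)  # refusing weak hash algorithm: md5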


-----------
File naming
-----------

Using ``*.pylock.toml`` as the file name
========================================

It was proposed to put the ``pylock`` constant part of the file name after the
identifier for the purpose of the lock file. It was decided not to do this so
that lock files would sort together when looking at directory contents, instead
of sorting purely by their purpose, which could spread them out in a directory.
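
The sorting argument can be seen with a plain code-point sort, as done by
Python's ``sorted()``; directory listings often sort in a similar way, though
the exact order depends on the tool and locale. The file names are made up, and
``pylock.<purpose>.toml`` is assumed to be the shape of the adopted convention,
with ``<purpose>.pylock.toml`` as the rejected alternative.

.. code-block:: python

    # Hypothetical project directory contents under each naming scheme.
    chosen = ["pyproject.toml", "pylock.dev.toml", "docs.md", "pylock.docs.toml"]
    rejected = ["pyproject.toml", "dev.pylock.toml", "docs.md", "docs.pylock.toml"]

    print(sorted(chosen))
    # ['docs.md', 'pylock.dev.toml', 'pylock.docs.toml', 'pyproject.toml']
    print(sorted(rejected))
    # ['dev.pylock.toml', 'docs.md', 'docs.pylock.toml', 'pyproject.toml']

With the constant prefix first, the lock files end up adjacent; with it last,
an unrelated file such as ``docs.md`` can land between them.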


Using ``*.pylock`` as the file name
===================================

Not using ``.toml`` as the file extension and instead making it ``.pylock``
itself was proposed. This was decided against so that code editors would know
how to provide syntax highlighting to a lock file without having special
knowledge about the file extension.


Not having a naming convention for the file
===========================================

Having no requirements or guidance for a lock file's name was considered, but
ultimately rejected. A standardized naming convention makes it easy for both a
human and a code editor to identify a lock file. This helps facilitate discovery
when e.g. a tool wants to know all of the lock files that are available.
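
As an example of the discovery benefit, a tool could locate every lock file in
a directory with a single glob. This sketch assumes lock files are named either
``pylock.toml`` or ``pylock.<name>.toml``; ``find_lock_files()`` is a
hypothetical helper.

.. code-block:: python

    import tempfile
    from pathlib import Path


    def find_lock_files(directory: Path) -> list[Path]:
        """Return lock files in ``directory``, assuming the pylock naming convention."""
        candidates = [directory / "pylock.toml", *directory.glob("pylock.*.toml")]
        # glob() yields results in arbitrary order, so sort for stable output.
        return sorted(path for path in candidates if path.is_file())


    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["pylock.toml", "pylock.dev.toml", "pyproject.toml"]:
            (root / name).write_text("")
        print([p.name for p in find_lock_files(root)])
        # ['pylock.dev.toml', 'pylock.toml']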


-----------
File format
-----------

Use JSON over TOML
==================

Since having a format that is machine-writable was a goal of this PEP, it was
suggested to use JSON. But it was deemed less human-readable than TOML while
not improving on the machine-writable aspect enough to warrant the change.


Use YAML over TOML
==================

Some argued that YAML met the machine-writable/human-readable requirement in a
better way than TOML. But as that's subjective and ``pyproject.toml`` already
existed as the human-writable file used by Python packaging standards, it was
deemed more important to keep using TOML.


----------
Other keys
----------

Multiple hashes per file
========================

An initial version of this PEP proposed supporting multiple hashes per file. The
idea was to allow one to choose which hashing algorithm they wanted to go with
when installing. But upon reflection it seemed like an unnecessary complication,
as there was no guarantee the hashes provided would satisfy the user's needs.
As well, if the single hash algorithm used in the lock file wasn't sufficient,
rehashing the files involved as a way to migrate to a different algorithm didn't
seem insurmountable.
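
Migrating to a different algorithm, as described above, amounts to obtaining
each locked file again, confirming it still matches the old hash, and recording
a new one. The minimal sketch below assumes the file's bytes are already in
hand; ``rehash()`` is a hypothetical helper, and how the new hash is written
back into the lock file is out of scope.

.. code-block:: python

    import hashlib
    import hmac


    def rehash(data: bytes, old_algorithm: str, old_hexdigest: str, new_algorithm: str) -> str:
        """Verify ``data`` against its old hash, then hash it with a new algorithm."""
        old_actual = hashlib.new(old_algorithm, data).hexdigest()
        # Only trust the new hash if the file still matches what was locked.
        if not hmac.compare_digest(old_actual, old_hexdigest):
            raise ValueError("file does not match the hash recorded in the lock file")
        return hashlib.new(new_algorithm, data).hexdigest()


    data = b"bytes of a locked wheel"
    old = hashlib.sha256(data).hexdigest()
    new = rehash(data, "sha256", old, "sha3_256")
    print(new == hashlib.sha3_256(data).hexdigest())  # True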


Hashing the contents of the lock file itself
============================================

Hashing the bytes of the file and storing the hash value within the file itself
was proposed at some point. This was removed to make merging changes to the
lock file easier, as each merge would have to recalculate the hash value to
avoid a merge conflict.

Hashing the semantic contents of the file was also proposed, but it would lead
to the same merge conflict issue.

Regardless of which contents were hashed, either approach could have the hash
value stored outside of the file if such a hash was desired.
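
For illustration, the sketch below computes both kinds of hash for a lock file
so the result can be kept outside the file: a byte-level hash, and a
"semantic" hash of the parsed TOML re-serialized as canonical JSON. The
canonicalization scheme and function names are assumptions, not anything this
PEP defines, and ``tomllib`` requires Python 3.11 or newer.

.. code-block:: python

    import hashlib
    import json
    import tomllib  # Python 3.11+


    def byte_hash(lock_bytes: bytes) -> str:
        """Hash the raw bytes; any edit, even to whitespace, changes the result."""
        return hashlib.sha256(lock_bytes).hexdigest()


    def semantic_hash(lock_bytes: bytes) -> str:
        """Hash the parsed data; formatting and key order do not affect the result."""
        data = tomllib.loads(lock_bytes.decode("utf-8"))
        # sort_keys gives a canonical key order; default=str handles TOML dates/times.
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"), default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


    a = b'version = "1.0"\nname = "example"\n'
    b = b'name = "example"\nversion   = "1.0"\n'
    print(byte_hash(a) == byte_hash(b))  # False
    print(semantic_hash(a) == semantic_hash(b))  # True

Either digest could then be stored in a side file or supplied out of band. Any
real change to the lock file, including a merge, still changes both digests;
the semantic hash only ignores formatting.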


Recording the creation date of the lock file
============================================

To know how potentially stale the lock file was, an earlier proposal suggested
recording the creation date of the lock file. But for the same merge conflict
reasons as storing the hash of the file contents, this idea was dropped.


Recording the package indexes used
==================================

Recording what package indexes were used by the locker to decide what to lock
for was considered. In the end, though, it was rejected as it was deemed
unnecessary bookkeeping.


===========
Open Issues
===========

N/A


================
Acknowledgements
================

Thanks to everyone who participated in the discussions in
https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/,
especially Alyssa Coghlan who probably caused the biggest structural shifts from
the initial proposal.

Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for
providing feedback on a draft version of this PEP.


=========
Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _Dependabot: https://docs.github.com/en/code-security/dependabot
.. _dependency specifiers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/
.. _direct URL reference: https://packaging.python.org/en/latest/specifications/direct-url/
.. _environment markers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers
.. _normalized name: https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
.. _PDM: https://pypi.org/project/pdm/
.. _pip-tools: https://pypi.org/project/pip-tools/
.. _Poetry: https://python-poetry.org/
.. _project detail: https://packaging.python.org/en/latest/specifications/simple-repository-api/#project-detail
.. _pyproject.toml specification: https://packaging.python.org/en/latest/specifications/pyproject-toml/#pyproject-toml-specification
.. _Simple Repository API: https://packaging.python.org/en/latest/specifications/simple-repository-api/
.. _software bill of materials: https://www.cisa.gov/sbom
.. _TOML: https://toml.io/
.. _version specifiers: https://packaging.python.org/en/latest/specifications/version-specifiers/
.. _wheel tags: https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/