PEP: 751
Title: A file format to record Python dependencies for installation reproducibility
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 24-Jul-2024
Post-History:
`25-Jul-2024 <https://discuss.python.org/t/59173>`__
`30-Oct-2024 <https://discuss.python.org/t/69721>`__
Replaces: 665
========
Abstract
========
This PEP proposes a new file format for dependency specification
to enable reproducible installation in a Python environment. The format is
designed to be human-readable and machine-generated. Installers consuming the
file should be able to calculate what to install without the need for dependency
resolution at install-time.
==========
Motivation
==========
Currently, no standard exists to create an immutable record, such as a lock
file, which specifies what direct and indirect dependencies should be installed
into a virtual environment.
Considering there are at least five well-known solutions to this problem in the
community (``pip freeze``, pip-tools_, uv_, Poetry_, and PDM_), there seems to
be an appetite for lock files in general.
Those tools also vary in what locking scenarios they support. For instance,
``pip freeze`` and pip-tools only generate lock files for the current
environment while PDM and Poetry try to lock for *any* environment to some
degree. There are also concerns around the lack of secure defaults in the face of
supply chain attacks (e.g., always including hashes for files).
The lack of a standard also has some drawbacks. For instance, any tooling that
wants to work with lock files must choose which format to support, potentially
leaving users unsupported (e.g., Dependabot_ only supporting select tools,
same for cloud providers who can do dependency installations on your behalf,
etc.). It also impacts portability between tools, which causes vendor lock-in.
Without compatibility and interoperability, tooling around lock files is
fractured: both users and tools have to choose upfront which lock file format
to use, which makes it costly to use or switch to other formats. Rallying
around a single format removes that cost/barrier.
.. note::

   Much of the motivation from :pep:`665` also applies to this PEP.

=========
Rationale
=========
The format is designed so that a *locker* which produces the lock file
and an *installer* which consumes the lock file can be separate tools. This
allows, for example, cloud hosting providers to use their own installer that's
optimized for their system, independent of which locker the user used to
create their lock file.
The file format is designed to be human-readable. This is so that the contents
of the file can be audited by a human to make sure no undesired dependencies end
up being included in the lock file.
The file format is also designed to not require a resolver at install time. This
greatly simplifies installers and thus reasoning about what would be installed
when consuming a lock file. It should also lead to faster installs, which are
much more frequent than creating a lock file.

Finally, the lock file is meant to be flexible enough to meet the various needs
tools have for choosing what to install. That means the lock file records the
dependency graph of what *may* be installed. This allows tools to enter the
graph at any point and still have reproducible results from that root of the
graph. Flexibility also means supporting different installation scenarios within
the same lock file (e.g., with or without test dependencies).
=============
Specification
=============
---------
File Name
---------
A lock file MUST be named :file:`pylock.toml`. The use of the ``.toml`` file
extension is to make syntax highlighting in editors easier and to reinforce the
fact that the file format is meant to be human-readable.
The lock file SHOULD be located in the directory appropriate for the scope of
the lock file. Locking against a single ``pyproject.toml``, for instance, would
place the ``pylock.toml`` in the same directory. If the lock file covered
multiple projects in a monorepo, then the expectation is the ``pylock.toml``
file would be in the directory that held all the projects being locked.
-----------
File Format
-----------
The format of the file is TOML_.
All keys listed below are required unless otherwise noted. If two keys are
mutually exclusive to one another, then one of the keys is required while the
other is disallowed.
Keys in tables -- including the top-level table -- SHOULD be emitted by lockers
in the order they are listed in this PEP when applicable, unless another sort
order is specified, to minimize noise in diffs. If the keys are not explicitly
specified in this PEP, then the keys SHOULD be sorted by lexicographic order.
For the same reason, lockers SHOULD sort arrays in lexicographic order unless
otherwise specified.
``version``
===========
- String
- The version of the lock file format.
- This PEP specifies the initial version -- and only valid value until future
updates to the standard change it -- as ``"1.0"``.
- If an installer supports the major version but not the minor version, a tool
SHOULD warn when an unknown key is seen.
- If an installer doesn't support a major version, it MUST raise an error.
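
The version rules above could be checked along the following lines. This is a
minimal sketch rather than part of the specification: ``check_version``,
``UnsupportedVersionError``, and the ``SUPPORTED_*`` constants are hypothetical
names, and the code assumes the value has the form ``"<major>.<minor>"``.

.. code-block:: Python

    SUPPORTED_MAJOR = 1  # Assumption: this installer implements format 1.x.
    SUPPORTED_MINOR = 0


    class UnsupportedVersionError(Exception):
        """Raised when the lock file's major version is not supported."""


    def check_version(version):
        """Return True if unknown keys are to be expected (and warned about).

        Raises UnsupportedVersionError for an unsupported major version.
        """
        major, _, minor = version.partition(".")
        if int(major) != SUPPORTED_MAJOR:
            raise UnsupportedVersionError(
                f"lock file version {version!r} is not supported"
            )
        # A newer minor version may introduce keys this installer doesn't know.
        return int(minor or 0) > SUPPORTED_MINOR

An installer written this way would call ``check_version()`` before reading any
other key and only warn about unknown keys when it returns ``True``.
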
``hash-algorithm``
==================
- String
- The name of the hash algorithm used for calculating all hash values.
- Only a single hash algorithm is used for the entire file to allow hash values
to be written in inline tables for readability and compactness purposes by
only listing a single hash value instead of multiple values based on multiple
hash algorithms.
- Specifying a single hash algorithm guarantees that an algorithm that the user
prefers is used consistently throughout the file without having to audit
each file hash value separately.
- Allows for updating the entire file to a new hash algorithm without running
the risk of accidentally leaving an old hash value in the file.
- :ref:`packaging:simple-repository-api-json` and the ``hashes`` dictionary of
  the ``files`` dictionary of the Project Details dictionary specify what
  values are valid and guidelines on what hash algorithms to use.
- Failure to validate any hash values for any file that is to be installed MUST
raise an error.
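
As an illustration of the validation requirement, the sketch below checks a
file's contents against a recorded hash using the standard library's
``hashlib``. The names ``verify_hash`` and ``HashMismatchError`` are
hypothetical, and the sketch assumes hash values are hex-encoded digests and
that ``hash-algorithm`` names an algorithm ``hashlib`` supports.

.. code-block:: Python

    import hashlib


    class HashMismatchError(Exception):
        """Raised when a file does not match its recorded hash."""


    def verify_hash(data, hash_algorithm, expected_hash):
        """Raise HashMismatchError unless ``data`` hashes to ``expected_hash``."""
        actual = hashlib.new(hash_algorithm, data).hexdigest()
        # Assumption: hash values are compared case-insensitively as hex.
        if actual != expected_hash.lower():
            raise HashMismatchError(f"expected {expected_hash}, got {actual}")

Because the algorithm is recorded once for the whole file, the same
``hash_algorithm`` value would be passed for every file being installed.
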
``[locker]``
============
- Table
- Record of the tool that generated the lock file.
- Enough details SHOULD be provided such that the lock file can be reproduced
  from the details in this table (provided the same I/O data is available,
  e.g., by Dependabot if only files from a repository are necessary to run the
  command).
``locker.name``
---------------
- String
- The name of the tool used to create the lock file.
- If the locker is a Python project, its normalized name SHOULD be used.
``locker.version``
------------------
- String
- The version of the tool used.
``locker.run``
--------------
- Optional
- Inline table
- Records the command used to create the lock file.
``locker.run.module``
'''''''''''''''''''''
- Optional
- String
- The module name used for running the locker (i.e. what would be passed to
``python -m``).
- Lockers MUST specify this key if the locker can be executed via ``python -m``.
``locker.run.args``
'''''''''''''''''''
- Optional
- Array of strings
- If the locker has a CLI, the arguments to pass to the locker.
- All paths MUST be relative to the lock file so that another tool could use
the lock file's location as the current working directory.
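
One way a tool might turn the ``locker.run`` table back into a command line is
sketched below. ``locker_command`` is a hypothetical helper; falling back to
``locker.name`` as the executable when ``module`` is absent is an assumption of
this sketch, not something this PEP specifies.

.. code-block:: Python

    import sys


    def locker_command(locker):
        """Return an argv list that re-runs the locker, or None if not recorded."""
        run = locker.get("run")
        if run is None:
            return None
        args = list(run.get("args", []))
        if "module" in run:
            # Equivalent to ``python -m <module> <args...>``.
            return [sys.executable, "-m", run["module"], *args]
        # Assumption: without a module, the locker's name is also its executable.
        return [locker["name"], *args]

Since all paths in ``args`` are relative to the lock file, the resulting
command would be run with the lock file's directory as the current working
directory (e.g. via the ``cwd`` argument of ``subprocess.run()``).
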
``[[groups]]``
==============
- Array of tables
- A named subset of packages as found in ``[[packages]]``.
- Act as roots into the dependency graph.
- Installers MUST allow the user to select one or more groups by name to
install all relevant packages together.
- Installers SHOULD let the user skip specifying a name if there is only one
entry in the array.
``groups.name``
---------------
- String
- The name of the group.
``groups.project``
------------------
- Mutually-exclusive with ``requirements``
- String
- The normalized name of a package to act as the starting point into the
dependency graph.
- Analogous to locking to the ``[project]`` table in ``pyproject.toml``.
- Installers MUST let a user specify any optional features/extras that the
package provides.
- Lockers MUST NOT allow for ambiguity by specifying multiple package versions
of the same package under the same group name when a package is listed in any
``project`` key.
``groups.requirements``
-----------------------
- Mutually-exclusive with ``project``
- Array of tables
- Represents the installation requirements for this group.
- Analogous to a key in ``[dependency-groups]`` in ``pyproject.toml``.
- Lockers MUST make sure that resolving any requirement for any environment does
not lead to ambiguity by having multiple values in ``[[packages]]`` match the
same requirement.
- Values in the array SHOULD be written as inline tables, sorted
lexicographically by ``name``, then by ``feature`` with the lack of that key
sorting first.
``groups.requirements.name``
''''''''''''''''''''''''''''''
- String
- Normalized name of the package.
``groups.requirements.extras``
'''''''''''''''''''''''''''''''
- Optional
- Array of strings
- The names of the extras specified for the requirement
(i.e. what comes between ``[...]``).
``groups.requirements.version``
'''''''''''''''''''''''''''''''''
- Optional
- String
- The `version specifiers`_ for the requirement.
``groups.requirements.marker``
''''''''''''''''''''''''''''''''
- Optional
- String
- The `environment markers`_ for the requirement.
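
To show how the four keys relate to a conventional dependency specifier, the
sketch below renders a requirement table as a single string. The function name
``requirement_string`` is hypothetical and the code is only illustrative;
installers are not required to build such strings.

.. code-block:: Python

    def requirement_string(requirement):
        """Render a requirement table as a dependency specifier string."""
        text = requirement["name"]
        if extras := requirement.get("extras"):
            text += "[" + ",".join(sorted(extras)) + "]"
        if version := requirement.get("version"):
            text += version
        if marker := requirement.get("marker"):
            text += "; " + marker
        return text

For instance, ``{ name = 'coverage', extras = ['toml'], version = '>=5.3' }``
corresponds to ``coverage[toml]>=5.3``.
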
``[[packages]]``
================
- Array of tables
- The array contains all data on the nodes of the dependency graph.
- Lockers SHOULD record packages in order by ``name``
lexicographically, ``version`` by its Python `version specifiers`_
ordering, and then by ``groups`` following Python's sort order for lists of
strings (i.e. item by item, then by length as a tiebreaker).
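
The recommended ordering can be expressed as a sort key. The sketch below is
deliberately simplified: ``release_key`` is a hypothetical helper that assumes
versions consist only of dot-separated integers, whereas a real locker would
more likely compare versions with ``packaging.version.Version``.

.. code-block:: Python

    def release_key(version):
        """Simplified version key: assumes purely numeric, dot-separated parts."""
        return tuple(int(part) for part in version.split("."))


    def sort_packages(packages):
        """Return packages ordered by name, then version, then groups."""
        return sorted(
            packages,
            key=lambda pkg: (
                pkg["name"],
                release_key(pkg["version"]),
                pkg["groups"],  # Lists compare item by item, then by length.
            ),
        )

Comparing versions numerically rather than as strings matters: ``9.0.0`` sorts
before ``24.2.0`` here, while a plain string comparison would reverse them.
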
.. Identification
``packages.name``
-----------------
- String
- The `normalized name`_ of the package.
``packages.version``
--------------------
- String
- The version of the package.
``packages.groups``
-------------------
- Array of strings
- Associates this table with the ``groups.name`` entries of the same names.
``packages.index-url``
----------------------
- Optional
- String
- Stores the `project index`_ URL from the `Simple Repository API`_.
- Useful for generating Packaging URLs (aka PURLs).
- When possible, lockers SHOULD include this to assist with generating
`software bill of materials`_ (aka SBOMs).
``packages.direct``
-------------------
- Optional (defaults to ``false``)
- Boolean
- Represents whether the installation is via a `direct URL reference`_.
.. Requirements
``packages.requires-python``
----------------------------
- String
- Holds the `version specifiers`_ for Python version compatibility for the
package and version.
- The value MUST match what's provided by the package version, if available, via
:ref:`packaging:core-metadata-requires-python`.
``[[packages.dependencies]]``
-----------------------------
- Array of tables
- A record of the dependency requirements of the package and version.
- The values MUST semantically match what's provided by the package version via
:ref:`packaging:core-metadata-requires-dist` for all dependencies referenced
  in the lock file (i.e. all base dependencies plus all dependencies for extras
referenced in the lock file); lock files MAY list all dependencies for unused
extras if desired.
- Values in the array SHOULD be written as inline tables, sorted
lexicographically by ``name``, then by ``feature`` with the lack of that key
sorting first.
``packages.dependencies.name``
''''''''''''''''''''''''''''''
See ``groups.requirements.name``.
``packages.dependencies.extras``
''''''''''''''''''''''''''''''''
See ``groups.requirements.extras``.
``packages.dependencies.version``
'''''''''''''''''''''''''''''''''
See ``groups.requirements.version``.
``packages.dependencies.marker``
''''''''''''''''''''''''''''''''
See ``groups.requirements.marker``.
``packages.dependencies.feature``
'''''''''''''''''''''''''''''''''
- Optional
- String
- The optional feature/:ref:`packaging:core-metadata-provides-extra` that this
requirement is conditional on.
.. Installing
``packages.editable``
---------------------
- Optional (defaults to ``false``)
- Boolean
- Specifies whether the package should be installed in editable mode.
``[packages.source-tree]``
--------------------------
- Optional
- Table
- For recording where to find the `source tree`_ for the package version.
- Lockers SHOULD write this table inline.
- Support for source trees by installers is optional.
- If support is provided by an installer it SHOULD be opt-in.
- If multiple source trees are provided, installers MUST prefer either the
``vcs`` option or a file for security/reproducibility due to their commit or
hash, respectively.
``packages.source-tree.vcs``
''''''''''''''''''''''''''''
- Optional
- String
- If specifying a VCS, the type of version control system used.
- The valid values are specified by the
`registered VCSs <https://packaging.python.org/en/latest/specifications/direct-url-data-structure/#registered-vcs>`__
of the direct URL data structure.
``packages.source-tree.path``
'''''''''''''''''''''''''''''
- Required if ``url`` is not set
- String
- A path to the source tree, which may be absolute or relative.
- If the path is relative it MUST be relative to the lock file.
- The path may be to a directory, a file archive, or a VCS checkout if
  ``vcs`` is specified.
``packages.source-tree.url``
''''''''''''''''''''''''''''
- Required if ``path`` is not set
- String
- A URL to a file archive containing the source tree, or a VCS checkout if
``vcs`` is specified.
``packages.source-tree.commit``
'''''''''''''''''''''''''''''''
- Required if ``vcs`` is set
- String
- The commit ID for the repository which represents the package and version.
- The value MUST be immutable for the VCS for security purposes
(e.g. no Git tags).
``packages.source-tree.size``
'''''''''''''''''''''''''''''
- Optional
- Integer
- The size in bytes for the source tree if it is a file.
- Installers MUST verify the file size matches this value.
``packages.source-tree.hash``
'''''''''''''''''''''''''''''
- Required if ``url`` or ``path`` points to a file
- String
- The hash value of the file contents using the hash algorithm specified by
``hash-algorithm``.
- Installers MUST verify the hash matches the file.
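
The conditional requirements among the ``packages.source-tree`` keys can be
summarized as a small validation routine. This is a sketch under stated
assumptions: ``source_tree_problems`` is a hypothetical name, and the hash
check only covers the ``url`` case because a ``path`` may point to a directory,
which the table's keys alone cannot reveal.

.. code-block:: Python

    def source_tree_problems(source_tree):
        """Return a list of problems found in a ``packages.source-tree`` table."""
        problems = []
        if "path" not in source_tree and "url" not in source_tree:
            problems.append("one of 'path' or 'url' is required")
        if "vcs" in source_tree and "commit" not in source_tree:
            problems.append("'commit' is required when 'vcs' is set")
        if (
            "url" in source_tree
            and "vcs" not in source_tree
            and "hash" not in source_tree
        ):
            # Without 'vcs', a URL refers to a file archive, which needs a hash.
            problems.append("'hash' is required when 'url' points to a file")
        return problems

An empty list means none of these particular rules were violated; it does not
imply that the table is otherwise valid.
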
``[packages.sdist]``
--------------------
- Optional
- Table
- The location of a source distribution as specified by
:ref:`packaging:source-distribution-format`.
- Lockers SHOULD write the table inline.
- Support for source distributions by installers is optional.
- If support is provided by an installer it SHOULD be opt-in.
``packages.sdist.url``
''''''''''''''''''''''
- Optional; mutually-exclusive with ``path``
- String
- The URL to the file.
``packages.sdist.path``
'''''''''''''''''''''''
- Optional; mutually-exclusive with ``url``
- String
- A path to the file, which may be absolute or relative.
- If the path is relative it MUST be relative to the lock file.
``packages.sdist.upload-time``
''''''''''''''''''''''''''''''
- Optional and only applicable when ``url`` is specified
- Offset date time
- The upload date and time of the file as specified by a valid ISO 8601
date/time string for the ``.files[]."upload-time"`` field in the JSON
version of :ref:`packaging:simple-repository-api`.
``packages.sdist.size``
'''''''''''''''''''''''
- Optional
- Integer
- The size of the file in bytes.
- Installers MUST verify the file size matches this value.
``packages.sdist.hash``
'''''''''''''''''''''''
- String
- The hash value of the file contents using the hash algorithm specified by
``hash-algorithm``.
- Installers MUST verify the hash matches the file.
``[[packages.wheels]]``
-----------------------
- Optional
- Array of tables
- For recording the wheel files as specified by
:ref:`packaging:binary-distribution-format` for the package version.
- Lockers SHOULD write the table inline.
- Lockers SHOULD sort the array values lexicographically by ``tags``.
``packages.wheels.tags``
''''''''''''''''''''''''
- Array of strings
- The uncompressed tag portion of the wheel file: Python, ABI, and platform.
- Lockers MUST make sure the tag values are unique within the
``packages.wheels`` array.
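
"Uncompressed" means that a compressed tag set in a wheel file name (parts
joined by ``.``) is expanded into every Python/ABI/platform combination. A
minimal standard-library sketch follows; ``wheel_tags`` is a hypothetical
helper, and real tools would more likely rely on
``packaging.utils.parse_wheel_filename()``.

.. code-block:: Python

    import itertools


    def wheel_tags(filename):
        """Return the sorted, uncompressed tags encoded in a wheel file name."""
        stem = filename.removesuffix(".whl")
        # The last three dash-separated fields are the Python, ABI, and
        # platform tags; each may be a '.'-separated compressed set.
        python_tags, abi_tags, platform_tags = stem.split("-")[-3:]
        return sorted(
            "-".join(parts)
            for parts in itertools.product(
                python_tags.split("."),
                abi_tags.split("."),
                platform_tags.split("."),
            )
        )

Applied to a file name such as
``numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl``, this
yields two tags, one per platform.
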
``packages.wheels.build``
'''''''''''''''''''''''''
- Optional
- String
- The build tag for the wheel file (if appropriate).
``packages.wheels.url``
'''''''''''''''''''''''
See ``packages.sdist.url``.
``packages.wheels.path``
''''''''''''''''''''''''
See ``packages.sdist.path``.
``packages.wheels.upload-time``
'''''''''''''''''''''''''''''''
See ``packages.sdist.upload-time``.
``packages.wheels.size``
''''''''''''''''''''''''
See ``packages.sdist.size``.
``packages.wheels.hash``
''''''''''''''''''''''''
See ``packages.sdist.hash``.
``[packages.tool]``
-------------------
- Optional
- Table
- Similar usage as that of the ``[tool]`` table from the
  `pyproject.toml specification`_, but at the package version level instead of
  at the lock file level (which is also available via ``[tool]``).
- Useful for scoping package version/release details (e.g., recording signing
identities to then use to verify package integrity separately from where the
package is hosted, prototyping future extensions to this file format, etc.).
``[tool]``
==========
- Optional
- Table
- Same usage as that of the equivalent ``[tool]`` table from the
`pyproject.toml specification`_.
--------
Examples
--------
.. code-block:: TOML
version = '1.0'
hash-algorithm = 'sha256'
[locker]
name = 'mousebender'
version = 'pep'
run = { module = 'mousebender', args = ['lock', '--platform', 'cpython3.12-manylinux2014-x64', '--platform', 'cpython3.12-windows-x64', 'cattrs', 'numpy'] }
[[groups]]
name = 'Default'
requirements = [
{ name = 'cattrs' },
{ name = 'numpy' },
]
[[packages]]
name = 'attrs'
version = '24.2.0'
groups = ['Default']
index-url = 'https://pypi.org/simple/attrs'
direct = false
requires-python = '>=3.7'
dependencies = [
{ name = 'importlib-metadata', marker = 'python_version < "3.8"' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'benchmark' },
{ name = 'hypothesis', feature = 'benchmark' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'benchmark' },
{ name = 'pympler', feature = 'benchmark' },
{ name = 'pytest-codspeed', feature = 'benchmark' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'benchmark' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'benchmark' },
{ name = 'pytest', version = '>=4.3.0', feature = 'benchmark' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'cov' },
{ name = 'coverage', extras = ['toml'], version = '>=5.3', feature = 'cov' },
{ name = 'hypothesis', feature = 'cov' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'cov' },
{ name = 'pympler', feature = 'cov' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'cov' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'cov' },
{ name = 'pytest', version = '>=4.3.0', feature = 'cov' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'dev' },
{ name = 'hypothesis', feature = 'dev' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'dev' },
{ name = 'pre-commit', feature = 'dev' },
{ name = 'pympler', feature = 'dev' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'dev' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'dev' },
{ name = 'pytest', version = '>=4.3.0', feature = 'dev' },
{ name = 'cogapp', feature = 'docs' },
{ name = 'furo', feature = 'docs' },
{ name = 'myst-parser', feature = 'docs' },
{ name = 'sphinx', feature = 'docs' },
{ name = 'sphinx-notfound-page', feature = 'docs' },
{ name = 'sphinxcontrib-towncrier', feature = 'docs' },
{ name = 'towncrier', version = '<24.7', feature = 'docs' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'tests' },
{ name = 'hypothesis', feature = 'tests' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests' },
{ name = 'pympler', feature = 'tests' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'tests' },
{ name = 'pytest', version = '>=4.3.0', feature = 'tests' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests-mypy' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests-mypy' }
]
editable = false
wheels = [
{ tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/6a/21/5b6702a7f963e95456c0de2d495f67bf5fd62840ac655dc451586d23d39a/attrs-24.2.0-py3-none-any.whl', hash = '81921eb96de3191c8258c199618104dd27ac608d9366f5e35d011eae1867ede2', upload-time = 2024-08-06T14:37:36.958006+00:00, size = 63001 }
]
[[packages]]
name = 'cattrs'
version = '24.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/cattrs'
direct = false
requires-python = '>=3.8'
dependencies = [
{ name = 'attrs', version = '>=23.1.0' },
{ name = 'exceptiongroup', version = '>=1.1.1', marker = 'python_version < "3.11"' },
{ name = 'typing-extensions', version = '!=4.6.3,>=4.1.0', marker = 'python_version < "3.11"' },
{ name = 'pymongo', version = '>=4.4.0', feature = 'bson' },
{ name = 'cbor2', version = '>=5.4.6', feature = 'cbor2' },
{ name = 'msgpack', version = '>=1.0.5', feature = 'msgpack' },
{ name = 'msgspec', version = '>=0.18.5', marker = 'implementation_name == "cpython"', feature = 'msgspec' },
{ name = 'orjson', version = '>=3.9.2', marker = 'implementation_name == "cpython"', feature = 'orjson' },
{ name = 'pyyaml', version = '>=6.0', feature = 'pyyaml' },
{ name = 'tomlkit', version = '>=0.11.8', feature = 'tomlkit' },
{ name = 'ujson', version = '>=5.7.0', feature = 'ujson' }
]
editable = false
wheels = [
{ tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/c8/d5/867e75361fc45f6de75fe277dd085627a9db5ebb511a87f27dc1396b5351/cattrs-24.1.2-py3-none-any.whl', hash = '67c7495b760168d931a10233f979b28dc04daf853b30752246f4f8471c6d68d0', upload-time = 2024-09-22T14:58:34.812643+00:00, size = 66446 }
]
[[packages]]
name = 'numpy'
version = '2.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/numpy'
direct = false
requires-python = '>=3.10'
dependencies = [
]
editable = false
wheels = [
{ tags = ['cp312-cp312-manylinux2014_x86_64', 'cp312-cp312-manylinux_2_17_x86_64'], url = 'https://files.pythonhosted.org/packages/9b/b4/e3c7e6fab0f77fff6194afa173d1f2342073d91b1d3b4b30b17c3fb4407a/numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', hash = '6d95f286b8244b3649b477ac066c6906fbb2905f8ac19b170e2175d3d799f4df', upload-time = 2024-10-05T18:36:20.729642+00:00, size = 16041825 },
{ tags = ['cp312-cp312-win_amd64'], url = 'https://files.pythonhosted.org/packages/4c/79/73735a6a5dad6059c085f240a4e74c9270feccd2bc66e4d31b5ca01d329c/numpy-2.1.2-cp312-cp312-win_amd64.whl', hash = '456e3b11cb79ac9946c822a56346ec80275eaf2950314b249b512896c0d2505e', upload-time = 2024-10-05T18:37:38.159022+00:00, size = 12568254 }
]
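
Because the format is plain TOML, a lock file like the one above can be read
with the standard library alone. The sketch below assumes Python 3.11 or newer
(for ``tomllib``); ``packages_by_group`` is a hypothetical helper and
``LOCK_FILE`` is a trimmed-down stand-in that omits keys a complete lock file
would contain.

.. code-block:: Python

    import tomllib  # Python 3.11+

    # Abbreviated for illustration; not a complete, valid lock file.
    LOCK_FILE = """
    version = '1.0'
    hash-algorithm = 'sha256'

    [[groups]]
    name = 'Default'
    requirements = [{ name = 'attrs' }]

    [[packages]]
    name = 'attrs'
    version = '24.2.0'
    groups = ['Default']
    """


    def packages_by_group(lock_file_text):
        """Map each group name to the (name, version) pairs locked for it."""
        data = tomllib.loads(lock_file_text)
        result = {group["name"]: [] for group in data.get("groups", [])}
        for package in data.get("packages", []):
            for group_name in package["groups"]:
                result.setdefault(group_name, []).append(
                    (package["name"], package["version"])
                )
        return result

A mapping like this is a convenient starting point for auditing which package
versions each group can pull in.
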
------------------------
Expectations for Lockers
------------------------
- Lockers MUST make sure that entering the dependency graph via a specific group
will not lead to ambiguity for installers as to which value in
``[[packages]]`` to install for any environment (this can be controlled for
via ``packages.version`` and ``packages.groups``).
- Lockers SHOULD try to make all logically related groups resolve together
(i.e. no ambiguity if grouped together).
- If a ``groups.project`` would have extras that cause ambiguity or installation
  failure due to conflicts between the extras, the locker MAY create
  separate ``groups.requirements`` entries instead; otherwise the locker MUST
  raise an error.
- Lockers MAY try to lock for multiple environments in a single lock file.
- Lockers MAY try to update a lock file containing ``[tool]`` and
``[packages.tool]`` for other tools than themselves.
- Lockers MAY want to provide a way to let users provide the information
necessary to lock for other environments, e.g., supporting a JSON
file format which specifies wheel tags and marker values.
.. code-block:: JSON

    {
        "marker-values": {"<marker>": "<value>"},
        "wheel-tags": ["<tag>"]
    }

---------------------------
Expectations for Installers
---------------------------
- Installers MAY support installation of non-binary files
(i.e. source trees and source distributions), but are not required to.
- Installers MUST provide a way to avoid non-binary file installation for
reproducibility and security purposes.
- Installers SHOULD make it opt-in to use non-binary file installation to
facilitate a secure-by-default approach.
- If a traversal of the graph leads to any ambiguity as to what package version
to install (i.e. more than one package version qualifies), an error MUST be
raised.
- Installers MUST only consider package versions included in any selected
groups (i.e. installers cannot consider packages outside of the groups
selected to install from).
- Installers MUST error out if a package version lacks a way to install into the
chosen environment.
- Installers MUST support installing into an empty environment.
Pseudo-Code
===========
.. code-block:: Python

    import operator

    import packaging.markers
    import packaging.specifiers
    import packaging.tags

    # Hard-coded out of laziness; matches the group name in the example above.
    GROUP_NAME = "Default"


    class UnsatisfiableError(Exception):
        """Raised when a requirement cannot be satisfied."""


    class AmbiguityError(Exception):
        """Raised when a requirement has multiple solutions."""


    def install_packages(lock_file_contents):
        packages = choose_packages(lock_file_contents, (GROUP_NAME, frozenset()))
        tags = [str(tag) for tag in packaging.tags.sys_tags()]

        for package in packages:
            for tag_str in tags:  # Prioritize by tag order.
                if any(tag_str in wheel["tags"] for wheel in package.get("wheels", [])):
                    break
            else:
                raise UnsatisfiableError(
                    f"No wheel for {package['name']} {package['version']}"
                )
            print(f"Installing {package['name']} {package['version']} ({tag_str})")


    def choose_packages(lock_file_data, *selected_groups):
        """Select the package versions that should be installed based on the requested groups.

        'selected_groups' is a sequence of two-item tuples, representing a group name and
        optionally any requested extras if the group is a project.
        """
        group_names = frozenset(map(operator.itemgetter(0), selected_groups))
        available_packages = {}  # The packages in the selected groups.
        for pkg in lock_file_data["packages"]:
            if frozenset(pkg["groups"]) & group_names:
                available_packages.setdefault(pkg["name"], []).append(pkg)
        selected_packages = {}  # The package versions that have been selected.
        handled_extras = {}  # The extras that have been handled.
        requirements = []  # A stack of requirements to satisfy.

        # First, get our starting list of requirements.
        for group in selected_groups:
            requirements.extend(gather_requirements(lock_file_data, group))

        # Next, go through the requirements and try to find a **single** package version
        # that satisfies each requirement.
        while requirements:
            req = requirements.pop()
            # Ignore requirements whose markers disqualify it.
            if not applies_to_env(req):
                continue
            name = req["name"]
            if pkg := selected_packages.get(name):
                # Safety check that the cross-section of groups doesn't cause issues.
                # It somewhat assumes the locker didn't mess up such that there would be
                # ambiguity by what package version was initially selected.
                if not version_satisfies(req, pkg):
                    raise UnsatisfiableError(
                        f"requirement {req!r} not satisfied by {pkg!r}"
                    )
                if "extras" not in req:
                    continue
                needed_extras = set(req["extras"])
                extras = handled_extras.setdefault(name, set())
                # Skip the requirement if it asks for no extras beyond those handled.
                if not needed_extras.difference(extras):
                    continue
                # This isn't optimal as we may tread over the same extras multiple times,
                # but eventually the maximum set of extras for the package will be handled
                # and thus the above guard will short-circuit adding any more requirements.
                extras.update(needed_extras)
            else:
                # Raises UnsatisfiableError or AmbiguityError if no suitable, single package
                # version is found.
                pkg = compatible_package_version(req, available_packages.get(name, []))
                selected_packages[name] = pkg
            requirements.extend(dependencies(pkg, req))

        return selected_packages.values()


    def gather_requirements(locked_file_data, group):
        """Return a collection of all requirements for a group."""
        # Hard-coded to support `groups.requirements` out of laziness.
        group_name, _extras = group
        for group_table in locked_file_data["groups"]:
            if group_table["name"] == group_name:
                return group_table["requirements"]
        raise ValueError(f"Group {group_name!r} not found in lock file")


    def applies_to_env(requirement):
        """Check if the requirement applies to the current environment."""
        try:
            markers = requirement["marker"]
        except KeyError:
            return True
        else:
            return packaging.markers.Marker(markers).evaluate()


    def version_satisfies(requirement, package):
        """Check if the package version satisfies the requirement."""
        try:
            raw_specifier = requirement["version"]
        except KeyError:
            return True
        else:
            specifier = packaging.specifiers.SpecifierSet(raw_specifier)
            return specifier.contains(package["version"], prereleases=True)


    def compatible_package_version(requirement, available_packages):
        """Return the package version that satisfies the requirement.

        If no package version can satisfy the requirement, raise UnsatisfiableError. If
        multiple package versions can satisfy the requirement, raise AmbiguityError.
        """
        possible_packages = [
            pkg for pkg in available_packages if version_satisfies(requirement, pkg)
        ]
        if not possible_packages:
            raise UnsatisfiableError(f"No package version satisfies {requirement!r}")
        elif len(possible_packages) > 1:
            raise AmbiguityError(f"Multiple package versions satisfy {requirement!r}")
        return possible_packages[0]


    def dependencies(package, requirement):
        """Return the dependencies of the package.

        The extras from the requirement will extend the base requirements as needed.
        """
        applicable_deps = []
        extras = frozenset(requirement.get("extras", []))
        for dep in package["dependencies"]:
            if "feature" not in dep or dep["feature"] in extras:
                applicable_deps.append(dep)
        return applicable_deps

=======================
Backwards Compatibility
=======================
Because there is no preexisting lock file format, there are no explicit
backwards-compatibility concerns in terms of Python packaging standards.
As for packaging tools themselves, that will be a per-tool decision. Tools
that don't document their lock file format could choose to simply start
using the format internally and then transition to saving their lock files with
a name supported by this PEP. Tools with a preexisting, documented format
could provide an option to choose which format to emit.
=====================
Security Implications
=====================
The hope is that by standardizing on a lock file format that starts from a
security-first posture it will help make overall packaging installation safer.
However, this PEP does not solve all potential security concerns.
One potential concern is tampering with a lock file. If a lock file is not kept
in source control and properly audited, a bad actor could change the file in
nefarious ways (e.g. point to a malware version of a package). Tampering could
also occur in transit to e.g. a cloud provider who will perform an installation
on the user's behalf. Both could be mitigated by signing the lock file either
within the file in a ``[tool]`` entry or via a side channel external to the lock
file itself.
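As a minimal sketch of the side-channel approach, assuming a shared secret key
is an acceptable trust model, a detached HMAC tag over the lock file's bytes
could be computed and checked as shown here. This PEP does not specify any
signing scheme; the function names ``sign_lock_file`` and ``verify_lock_file``
are hypothetical.

.. code-block:: python

    import hashlib
    import hmac


    def sign_lock_file(contents: bytes, key: bytes) -> str:
        """Return a hex HMAC-SHA256 tag meant to be stored outside the lock file."""
        return hmac.new(key, contents, hashlib.sha256).hexdigest()


    def verify_lock_file(contents: bytes, key: bytes, expected_tag: str) -> bool:
        """Check the lock file's bytes against a previously recorded tag."""
        actual_tag = sign_lock_file(contents, key)
        # compare_digest avoids leaking information through timing differences.
        return hmac.compare_digest(actual_tag, expected_tag)

Any change to the file's bytes, whether at rest or in transit, yields a
different tag, so verification fails unless the party making the change also
holds the key.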
This PEP does not do anything to prevent a user from installing an incorrect
package. While the format includes many details to help audit a package's
inclusion, there isn't any mechanism to stop e.g. name confusion attacks via
typosquatting.
Lockers may be able to provide some UX to help with this (e.g. by providing
download counts for a package).
=================
How to Teach This
=================
Users should be informed that when they ask to install some package, that
package may have its own dependencies, those dependencies may have dependencies,
and so on. Without writing down what gets installed as part of installing the
package they requested, things could change from underneath them (e.g., package
versions). Changes to the underlying dependencies can lead to accidental
breakage of their code. Lock files help deal with that by providing a way to
write down what was (and should be) installed.
Having what to install written down also helps in collaborating with others. By
agreeing to a lock file's contents, everyone ends up with the same packages
installed. This helps make sure no one relies on e.g. an API that's only
available in a certain version that not everyone working on the project has
installed.
Lock files also help with security by making sure you always get the same files
installed and not a malicious one that someone may have slipped in. They also
let one be more deliberate in upgrading dependencies, making sure each change
is on purpose and not one slipped in by a bad actor.
========================
Reference Implementation
========================
A proof-of-concept implementing most of this PEP for wheels can be found at
https://github.com/brettcannon/mousebender/tree/pep .
==============
Rejected Ideas
==============
---------------------------------
A flat set of packages to install
---------------------------------
An earlier version of this PEP proposed to use a flat set of package versions
instead of a graph. The idea was that each package version could be evaluated in
isolation as to whether it applied to an environment for installation. The hope
was that would lend itself to easier auditing as one wouldn't have to worry
about how a package version fit into the graph when looking at e.g., a diff for
a lock file.
Unfortunately this was deemed not as flexible as using a graph. For instance,
recording the graph
`assists in dependency analysis for tools like GitHub <https://discuss.python.org/t/pep-751-lock-files-again/59173/327>`__.
A graph also makes it easier to follow how you ended up with a dependency in
your lock file from any point in the graph. It also balances the implementation
costs between lockers and installers: it takes some complexity off of lockers
in exchange for only a minor increase in complexity for installers, which use
standard graph-traversal algorithms instead of a linear walk.
And if the dependency graph is already being recorded for the above benefits,
then recording that same data in a flattened manner is redundant and makes
lock files larger and potentially more unwieldy.
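To illustrate how a recorded graph helps explain why a package is present, here
is a minimal sketch of a breadth-first search from the top-level requirements
to a given package. It assumes a simplified, hypothetical mapping of package
name to dependency names rather than the lock file schema of this PEP, and
``explain_dependency`` is an illustrative name.

.. code-block:: python

    from collections import deque


    def explain_dependency(graph, roots, target):
        """Return one shortest chain of package names from a root to ``target``.

        ``graph`` maps a package name to the names it depends on. Returns
        ``None`` if ``target`` is unreachable from every root.
        """
        queue = deque([root] for root in roots)
        seen = set(roots)
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for dep in graph.get(path[-1], []):
                if dep not in seen:  # Guard against cycles and repeated work.
                    seen.add(dep)
                    queue.append(path + [dep])
        return None

With a flat list of packages this question cannot be answered from the lock
file alone, since the edges between packages are not recorded.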
-------------------------------------------------------------------------------------
Specifying a new core metadata version that requires consistent metadata across files
-------------------------------------------------------------------------------------
At one point, to handle the issue of metadata varying between files and thus
requiring examination of every released file for a package and version for
accurate locking results, the idea was floated to introduce a new core metadata
version which would require all metadata for all wheel files to be the same for
a single version of a package. Ultimately, though, it was deemed unnecessary as
this PEP will put pressure on people to make files consistent for performance
reasons or to make indexes provide all the metadata separate from the wheel
files themselves. As well, there's no easy enforcement mechanism, and so
community expectation would work as well as a new metadata version.
-------------------------------------------
Have the installer do dependency resolution
-------------------------------------------
In order to support a format more akin to how Poetry worked when this PEP was
drafted, it was suggested that lockers effectively record the packages and their
versions which may be necessary to make an install work in any possible
scenario, and then the installer resolves what to install. But that complicates
auditing a lock file by requiring much more mental effort to know what packages
may be installed in any given scenario. Also, one of the Poetry developers
`suggested <https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/83>`__
that markers as represented in the package locking approach of this PEP may be
sufficient to cover the needs of Poetry. Not having installers do a
resolution also simplifies their implementation, centralizing complexity in
lockers.
-----------------------------------------
Requiring specific hash algorithm support
-----------------------------------------
It was proposed to require a baseline hash algorithm for the files. This was
rejected as no other Python packaging specification requires specific hash
algorithm support. As well, the minimum hash algorithm suggested may eventually
become an outdated/unsafe suggestion, requiring further updates. In order to
promote using the best algorithm at all times, no baseline is provided to avoid
simply defaulting to the baseline in tools without considering the security
ramifications of that hash algorithm.
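Since no baseline algorithm is mandated, an installer has to handle whichever
algorithm a lock file names. A minimal sketch of that, using only the standard
library, might look like the following. The function name and its parameters
are hypothetical and do not mirror keys defined by this PEP.

.. code-block:: python

    import hashlib


    def file_matches_hash(data: bytes, algorithm: str, expected_hex: str) -> bool:
        """Return True if ``data`` hashes to ``expected_hex`` under ``algorithm``.

        Raises ValueError if this Python build does not support the algorithm.
        Variable-length digests (e.g. SHAKE) are not handled by this sketch.
        """
        if algorithm not in hashlib.algorithms_available:
            raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
        digest = hashlib.new(algorithm, data).hexdigest()
        return digest == expected_hex.lower()

Failing loudly on an unknown algorithm, instead of skipping verification, keeps
the check from silently degrading.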
------------------------------------
Require a URL or file path for files
------------------------------------
Originally references to files were required, e.g., ``packages.sdist.url`` or
``packages.sdist.path``. But at least
`one use-case <https://discuss.python.org/t/pep-751-now-with-graphs/69721/34>`__
surfaced during discussions about this PEP where statically specifying the
location of files would be problematic. And in earlier discussions the idea of
the location being a hint wasn't preferred. Hence the PEP now makes the data
optional, but considers the locations accurate if specified.
-----------
File naming
-----------
Using ``*.pylock.toml`` as the file name
========================================
It was proposed to put the ``pylock`` constant part of the file name after the
identifier for the purpose of the lock file. It was decided not to do this so
that lock files would sort together when looking at directory contents instead
of purely based on their purpose which could spread them out in a directory.
Using ``*.pylock`` as the file name
===================================
Not using ``.toml`` as the file extension and instead making it ``.pylock``
itself was proposed. This was decided against so that code editors would know
how to provide syntax highlighting to a lock file without having special
knowledge about the file extension.
Not having a naming convention for the file
===========================================
Having no requirements or guidance for a lock file's name was considered, but
ultimately rejected. A standardized naming convention makes it easy
to identify a lock file for both a human and a code editor. This helps
facilitate discovery when e.g. a tool wants to know all of the lock files that
are available.
-----------
File format
-----------
Use JSON over TOML
==================
Since having a format that is machine-writable was a goal of this PEP, it was
suggested to use JSON. But it was deemed less human-readable than TOML while
not improving on the machine-writable aspect enough to warrant the change.
Use YAML over TOML
==================
Some argued that YAML met the machine-writable/human-readable requirement in a
better way than TOML. But as that's subjective and ``pyproject.toml`` already
existed as the human-writable file used by Python packaging standards it was
deemed more important to keep using TOML.
----------
Other keys
----------
Multiple hashes per file
========================
An initial version of this PEP proposed supporting multiple hashes per file. The
idea was to allow one to choose which hashing algorithm they wanted to go with
when installing. But upon reflection it seemed like an unnecessary complication
as there was no guarantee the hashes provided would satisfy the user's needs.
As well, if the single hash algorithm used in the lock file wasn't sufficient,
rehashing the files involved as a way to migrate to a different algorithm didn't
seem insurmountable.
Hashing the contents of the lock file itself
============================================
Hashing the bytes of the file and storing the hash value within the file
itself was proposed at some point. This was removed to make merging changes to
the lock file easier, as every merge would have to recalculate the hash value
to avoid a merge conflict.
Hashing the semantic contents of the file was also proposed, but it would lead
to the same merge conflict issue.
Regardless of which contents were hashed, either approach could have the hash
value stored outside of the file if such a hash was desired.
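The difference between the two kinds of hash can be sketched as follows,
assuming Python 3.11+ for ``tomllib`` and using canonical JSON as one possible
(not standardized) definition of "semantic contents". The function names are
hypothetical, and either value would be stored outside of the lock file.

.. code-block:: python

    import hashlib
    import json
    import tomllib  # Standard library as of Python 3.11.


    def byte_hash(text: str) -> str:
        """Hash the exact bytes of the file; any edit changes the result."""
        return hashlib.sha256(text.encode("utf-8")).hexdigest()


    def semantic_hash(text: str) -> str:
        """Hash the parsed data, ignoring comments, whitespace, and key order."""
        data = tomllib.loads(text)
        # Sorted keys and fixed separators give a stable serialization;
        # default=str covers TOML date/time values that JSON cannot encode.
        canonical = json.dumps(
            data, sort_keys=True, separators=(",", ":"), default=str
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

Both values change whenever the locked data changes, which is why neither
avoids the merge conflict issue when stored in the file itself.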
Recording the creation date of the lock file
============================================
To know how potentially stale the lock file was, an earlier proposal suggested
recording the creation date of the lock file. But for the same merge conflict
reasons as storing the hash of the file contents, this idea was dropped.
Recording the package indexes used
==================================
Recording what package indexes were used by the locker to decide what to lock
for was considered. In the end, though, it was rejected as it was deemed
unnecessary bookkeeping.
Locking build requirements for sdists
=====================================
An earlier version of this PEP tried to lock the build requirements for sdists
under a ``packages.build-requires`` key. Unfortunately it confused enough people
about how it was expected to operate and there were enough edge case issues to
decide it wasn't worth trying to do in this PEP upfront. Instead, a future PEP
could propose a solution.
===========
Open Issues
===========
----------------------------------------------
Specify ``requires-python`` at the file level?
----------------------------------------------
The lock file formats from PDM_, Poetry_, and uv_ all specify
``requires-python`` at the top level for the absolute minimum Python version
needed for the lock file. This can be inferred, though, by examining all
``packages.requires-python`` values. The global value might also not be
accurate for all platforms depending on how environment markers influence what
package versions are installed and what their Python version requirements are.
---------------------
Don't pre-parse data?
---------------------
This PEP currently takes the viewpoint that if a piece of data is going to be
parsed by installers every time they run, then trying to pre-parse as much as
possible so the TOML parser can help is a good thing. The thinking is TOML
parsers have a higher chance of being optimized, and so letting them do more
parsing leads to a faster outcome. It should also increase readability by
breaking apart data upfront more.
But in the case of doing this to wheel file names, some might consider it too
much. The question becomes whether separating out all the parts of a wheel
file name hinders readability because people are already used to reading the
file names, or whether clearly separating the parts makes installers faster
and easier to write without hindering readability.
This all equally applies to requirement specifiers.
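For a sense of the work an installer would otherwise repeat on every run, here
is a minimal sketch of splitting a wheel file name into its parts per the
wheel file name convention. The function name and the keys of the returned
dictionary are hypothetical and are not the keys defined by this PEP; escaping
rules for project names are not validated.

.. code-block:: python

    def split_wheel_filename(filename: str) -> dict:
        """Split ``name-version[-build]-python-abi-platform.whl`` into parts."""
        if not filename.endswith(".whl"):
            raise ValueError(f"not a wheel file name: {filename!r}")
        parts = filename.removesuffix(".whl").split("-")
        if len(parts) == 5:
            name, version, python_tag, abi_tag, platform_tag = parts
            build = None
        elif len(parts) == 6:
            name, version, build, python_tag, abi_tag, platform_tag = parts
        else:
            raise ValueError(f"unexpected number of components: {filename!r}")
        return {
            "name": name,
            "version": version,
            "build": build,
            # Compressed tag sets separate alternatives with ".".
            "python": python_tag.split("."),
            "abi": abi_tag.split("."),
            "platform": platform_tag.split("."),
        }

Pre-parsing would record the equivalent of this dictionary in the lock file at
locking time, at the cost of no longer showing the familiar file name as a
single string.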
==============
Deferred Ideas
==============
----------------
Per-file locking
----------------
An earlier version of this PEP supported two approaches to locking: *per-file*
and *per-package*. The idea for the former approach to locking was that if you
were locking for an a priori set of environments you could lock to just the
files necessary to install into those environments. The thinking was that by
only listing a subset of files, auditing would be easier.
Unfortunately there was disagreement on how best to express upfront what the
supported environment requirements would be. Since what this PEP currently
proposes still prevents accidental success of installation into unsupported
environments, this idea has been deferred until such time as someone can come
up with a representation that makes sense.
--------------------------------
Allowing for multiple lock files
--------------------------------
Before the introduction of ``[[groups]]``, this PEP proposed supporting multiple
lock files that would match the regular expression
``r"pylock\.(.+)\.toml"`` if a name for the lock file is desired or if multiple
lock files exist. But since ``[[groups]]`` subsumes a lot of the need to support
multiple lock files, this specific feature can be postponed until such time that
a need is shown to support multiple lock files.
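As a sketch of how a tool might have recognized such names under the deferred
scheme, the regular expression above can be applied with ``re.fullmatch`` so
that the entire file name must match. ``lock_file_purpose`` is a hypothetical
helper name.

.. code-block:: python

    import re

    # The pattern quoted above; the captured group is the lock file's name.
    LOCK_FILE_NAME = re.compile(r"pylock\.(.+)\.toml")


    def lock_file_purpose(filename: str):
        """Return the name embedded in a lock file name, or None if no match."""
        match = LOCK_FILE_NAME.fullmatch(filename)
        return match.group(1) if match else None

Note that a bare ``pylock.toml`` does not match this pattern, as the pattern
requires a non-empty name between ``pylock.`` and ``.toml``.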
================
Acknowledgements
================
Thanks to everyone who participated in the discussions on discuss.python.org.
Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for
providing feedback on a draft version of this PEP.
=========
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
.. _core metadata: https://packaging.python.org/en/latest/specifications/core-metadata/
.. _Dependabot: https://docs.github.com/en/code-security/dependabot
.. _dependency specifiers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/
.. _direct URL reference: https://packaging.python.org/en/latest/specifications/direct-url/
.. _environment markers: https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers
.. _normalized name: https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
.. _PDM: https://pypi.org/project/pdm/
.. _pip-tools: https://pypi.org/project/pip-tools/
.. _Poetry: https://python-poetry.org/
.. _project index: https://packaging.python.org/en/latest/specifications/simple-repository-api/#project-list
.. _pyproject.toml specification: https://packaging.python.org/en/latest/specifications/pyproject-toml/#pyproject-toml-specification
.. _Simple Repository API: https://packaging.python.org/en/latest/specifications/simple-repository-api/
.. _software bill of materials: https://www.cisa.gov/sbom
.. _source tree: https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-trees
.. _TOML: https://toml.io/
.. _uv: https://github.com/astral-sh/uv
.. _version specifiers: https://packaging.python.org/en/latest/specifications/version-specifiers/
.. _wheel tags: https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/