PEP: 665
Title: Specifying Installation Requirements for Python Projects
Author: Brett Cannon <brett@python.org>,
Pradyun Gedam <pradyunsg@gmail.com>,
Tzu-ping Chung <uranusjr@gmail.com>
PEP-Delegate:
Discussions-To: https://discuss.python.org/t/pep-665-specifying-installation-requirements-for-python-projects/9911
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jul-2021
Post-History: 29-Jul-2021
Resolution:
========
Abstract
========
This PEP specifies a file format to list the Python package
installation requirements for a project. The list of projects is
considered exhaustive for the installation target and thus
*locked down*: an installer needs nothing beyond the platform being
installed for and the *lock file* itself to perform a successful
installation of the dependencies.
==========
Motivation
==========
Thanks to PEP 621, projects have a way to list their direct/top-level
dependencies which they need to have installed. But PEP 621 also
(purposefully) omits two key details that often become important for
projects:
#. A listing of all indirect/transitive dependencies
#. Specifying (at least) specific versions of dependencies for
reproducible installations
Both needs are important for various reasons. One is that without a
complete listing of all dependencies and the specific versions to use,
there can be a skew between developers of the same project, or
developer and user, based on what versions of a project's dependencies
happen to be available at the time of installation. For instance,
a dependency may have v1 as the newest version on Monday when one
developer installed the dependency, while v2 comes out on Wednesday
when another developer installs the same dependency. Now the two
developers are working against two different versions of the same
dependency, which can lead to different outcomes.
Another important reason for reproducible installations is for
security purposes. Guaranteeing that the same binary data is
downloaded and installed for all installations makes sure that no bad
actor has somehow changed a dependency's binary data in a malicious
way. A lock file can assist in this guarantee by recording the exact
details of what should be installed and how to verify that those
dependencies have not changed any bytes unexpectedly.
The community itself has also shown a need for lock files based on the
fact that multiple tools have independently created their own lock
file formats:
#. PDM_
#. `pip-tools`_
#. Pipenv_
#. Poetry_
#. Pyflow_
Other programming language communities have also shown the usefulness
of lock files by developing their own solution to this problem. Some
of those communities include:
#. Dart_
#. npm_/Node
#. Rust_
=========
Rationale
=========
To begin, two key terms should be defined. A **locker** is a tool
which *produces* a lock file. An **installer** is a tool which
*consumes* a lock file to install the appropriate dependencies.
-----
Goals
-----
The file format should be *machine-readable*, *machine-writable*, and
*human-readable*. Since the assumption is the vast majority of lock
files will be generated by a locker tool, the format should be easy
to write by a locker. As install tools will be consuming the lock
file, the format also needs to be easily read by an installer. But the
format should also be readable by a person as people will inevitably
be performing audits on lock files. Having a format that does not lend
itself towards being read by people would hinder that. This includes
changes to a lock file being readable in a diff format for auditing
changes. It also means that understanding *why* something is in
the lock file should be comprehensible in a diff to assist in auditing
changes.
The lock file format needs to be general enough to support
*cross-platform and cross-environment* specifications of dependencies.
This allows having a single lock file which can work on a myriad of
platforms and environments when that makes sense. This has been shown
as a necessary feature by the various tools in the Python packaging
ecosystem which already have a lock file format (e.g. Pipenv_,
Poetry_, PDM_).
The lock file also needs to support *reproducible installations*. If
one wants to restrict what the lock file covers to a single platform
to guarantee the exact dependencies and files which will be installed,
that should be doable. This can be critical in security contexts for
projects like SecureDrop_.
When a computation could be performed either in the locker or
installer, the preference is to *perform the computation in the
locker*. This is because the assumption is a locker will be executed
less frequently than an installer.
The installer should be able to resolve what to install based entirely
on platform/environment information and what is contained within the
lock file. There should be
*no need to use network or other file system I/O* in order to resolve
what to install.
The lock file should provide enough flexibility to allow lockers and
installers to innovate. While the lock file specification provides a
*common denominator of functionality*, it should not act as a ceiling
for functionality.
---------
Non-Goals
---------
Because of the expected size of lock files, no effort was put into
making lock files *human-writable*.
=============
Specification
=============
-------
Details
-------
Lock files MUST use the TOML_ file format thanks to its adoption by
PEP 518 for ``pyproject.toml``. This not only prevents the need to
have another file format in the Python packaging ecosystem, but it
also assists in making lock files human-readable.
Lock files MUST be kept in a directory named ``pyproject-lock.d``.
Lock files MUST end with a ``.toml`` file extension. Projects may have
as many lock files as they want using whatever file name stems they
choose. This PEP prescribes no specific way to automatically select
between multiple lock files and installers SHOULD avoid guessing which
lock file is "best-fitting" (this does not preclude situations where
only a single lock file with a certain name is expected to exist and
will be used by default, e.g. a documentation hosting site always
using a lock file named ``pyproject-lock.d/rftd.toml`` when provided).
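The layout rules above can be sketched as follows. This is a minimal
illustration, not a reference implementation: the helper names
``find_lock_files`` and ``select_lock_file`` are hypothetical, and the
only facts taken from this PEP are the directory name and the
``.toml`` extension.

```python
from pathlib import Path

LOCK_DIR_NAME = "pyproject-lock.d"  # directory name mandated by this PEP


def find_lock_files(project_root):
    """Map each lock file's name stem to its path (hypothetical helper)."""
    lock_dir = Path(project_root) / LOCK_DIR_NAME
    if not lock_dir.is_dir():
        return {}
    return {
        path.stem: path
        for path in sorted(lock_dir.iterdir())
        if path.is_file() and path.suffix == ".toml"
    }


def select_lock_file(project_root, stem):
    """Return the lock file the user named; never guess a "best fit"."""
    lock_files = find_lock_files(project_root)
    try:
        return lock_files[stem]
    except KeyError:
        available = ", ".join(lock_files) or "none"
        raise LookupError(
            f"no lock file named {stem!r} (available: {available})"
        ) from None
```

The selection helper deliberately fails instead of falling back to
another file, mirroring the recommendation that installers avoid
guessing.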
The following are the top-level keys of the TOML file data format.
``version``
===========
The version of the lock file being used. The key MUST be specified and
it MUST be set to ``1``. The number MUST always be an integer and it
MUST only increment in future updates to the specification. What
constitutes a version number increase is left to future PEPs or
standards changes.
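As an illustration, an installer might validate the key along these
lines. The function name ``check_version`` is hypothetical, and
``lock_data`` is assumed to be the dictionary produced by whatever
TOML parser the tool already uses.

```python
SUPPORTED_VERSION = 1  # the only value this PEP permits


def check_version(lock_data):
    """Return the version, or raise ValueError if it is missing or invalid."""
    if "version" not in lock_data:
        raise ValueError("lock file is missing the required 'version' key")
    version = lock_data["version"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"'version' must be an integer, got {version!r}")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported lock file version: {version}")
    return version
```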
``[tool]``
==========
Tools may create their own sub-tables under the ``tool`` table. The
rules for this table match those for ``pyproject.toml`` and its
``[tool]`` table from the `build system declaration spec`_.
``[metadata]``
==============
A table containing data applying to the overall lock file.
``metadata.marker``
-------------------
An optional key storing a string containing an environment marker as
specified in the `dependency specifier spec`_.
The locker MAY specify an environment marker which specifies any
restrictions the lock file was generated under (e.g. specific Python
versions supported).
If the installer is installing for an environment which does not
satisfy the specified environment marker, the installer MUST raise an
error as the lock file does not support the environment.
``metadata.tags``
-----------------
An optional array of inline tables representing
`platform compatibility tags`_ that the lock file supports. The locker
MAY specify tables in the array which represent the compatibility the
lock file was generated for.
The tables have the possible keys of:
- ``interpreter``
- ``abi``
- ``platform``
representing the parts of the platform compatibility tags. Each key is
optional in a table. These keys MUST represent a single value, i.e.
the values are exploded and not compressed in wheel tag parlance.
If the environment an installer is installing for does not match
**any** table in the array (missing keys in the table mean implicit
support for that part of the compatibility), the installer MUST raise
an error as the lock file does not support the environment.
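A minimal sketch of this matching rule follows. It assumes the
installer already knows the exploded ``(interpreter, abi, platform)``
triples its environment supports; the names ``table_matches`` and
``check_tags`` are hypothetical. An empty array is treated as matching
nothing, which is one possible reading of the rule above.

```python
def table_matches(table, supported_tags):
    """Return True if any supported tag triple satisfies `table`.

    A key missing from `table` implicitly accepts every value for that
    part of the tag.
    """
    for interpreter, abi, platform in supported_tags:
        if (
            table.get("interpreter", interpreter) == interpreter
            and table.get("abi", abi) == abi
            and table.get("platform", platform) == platform
        ):
            return True
    return False


def check_tags(metadata, supported_tags):
    """Raise RuntimeError if ``metadata.tags`` excludes this environment."""
    tables = metadata.get("tags")
    if tables is None:
        return  # the key is optional; nothing to check
    if not any(table_matches(table, supported_tags) for table in tables):
        raise RuntimeError("lock file does not support this environment")
```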
``metadata.needs``
------------------
An array of strings representing the package specifiers for the
top-level/direct dependencies of the lock file as defined by the
`dependency specifier spec`_ (i.e. the root of the dependency graph
for the lock file).
Lockers MUST only allow specifiers which may be satisfiable by the
lock file and the dependency graph the lock file encodes. Lockers MUST
normalize project names according to the `simple repository API`_.
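For reference, the normalization defined by the simple repository API
collapses runs of ``-``, ``_``, and ``.`` into a single ``-`` and
lowercases the result. A locker could apply it as sketched below; the
function name ``normalize_name`` is only illustrative.

```python
import re


def normalize_name(name):
    """Normalize a project name as the simple repository API describes."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

Under this rule ``Mouse_Bender`` and ``mouse.bender`` both become
``mouse-bender``, so either spelling refers to the same key in the
lock file.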
``[package]``
===============
A table containing arrays of tables for each dependency recorded
in the lock file.
Each key of the table is the name of a package which MUST be
normalized according to the `simple repository API`_. If extras are
specified as part of the project to install, the extras are to be
included in the key name and are to be sorted in lexicographic order.
Within the file, the tables for the projects MUST be
sorted by:
#. Project/key name in lexicographic order
#. Package version, newest/highest to older/lowest according to the
`version specifiers spec`_
#. Extras via lexicographic order
``package.<name>.version``
--------------------------
A required string of the version of the package as specified by the
`version specifiers spec`_.
``package.<name>.needs``
------------------------
An optional key containing an array of strings following the
`dependency specifier spec`_ which specify what other packages this
package depends on. See ``metadata.needs`` for full details.
``package.<name>.required-by``
------------------------------
A key containing an array of package names which depend on this
package. The package names MUST match the package name as used in the
``package`` table.
The lack of a ``required-by`` key implies that the package is a
top-level package listed in ``metadata.needs``.
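One way a locker could derive ``required-by`` is to invert the
``needs`` relationships. The sketch below is an assumption-laden
illustration: ``compute_required_by`` and ``requirement_name`` are
hypothetical helpers, the name extraction is a simplified regular
expression and not a full dependency specifier parser, and extras and
markers are ignored.

```python
import re

# Simplified: grabs only the leading project name of a specifier.
_NAME = re.compile(r"\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def requirement_name(specifier):
    """Return the normalized project name at the start of `specifier`."""
    match = _NAME.match(specifier)
    if match is None:
        raise ValueError(f"cannot find a project name in {specifier!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def compute_required_by(packages):
    """Invert a mapping of package name -> list of ``needs`` specifiers.

    Returns package name -> sorted list of packages depending on it. An
    empty list corresponds to omitting the ``required-by`` key.
    """
    required_by = {name: set() for name in packages}
    for name, needs in packages.items():
        for specifier in needs:
            required_by.setdefault(requirement_name(specifier), set()).add(name)
    return {name: sorted(parents) for name, parents in required_by.items()}
```

A package that ends up with an empty list here is one that nothing
else depends on, which is the case the missing ``required-by`` key
signals.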
``package.<name>.code``
-----------------------
An array of tables listing files that are available to satisfy
the installation of the package for the specified version in the
``version`` key.
Each table has a ``type`` key which specifies how the code is stored.
All other keys in the table are dependent on the value set for
``type``. The acceptable values for ``type`` are listed below; all
other possible values are reserved for future use.
Tables in the array MUST be sorted in lexicographic order of the value
of ``type``, then lexicographic order for the value of ``url``.
When recording a table, the fields SHOULD be listed in the order
the fields are listed in this specification for consistency to make
diffs of a lock file easier to read.
For all types other than "wheel", an installer MAY refuse to install
code to avoid arbitrary code execution during installation.
An installer MUST verify the hash of any specified file.
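The ordering requirement for the array can be expressed as a two-part
sort key. The sketch below assumes each table is a plain dictionary
and that comparing strings by code point is an acceptable reading of
"lexicographic order"; ``sort_code_tables`` is a hypothetical name.

```python
def sort_code_tables(code_tables):
    """Order ``code`` tables by ``type``, then by ``url``."""
    return sorted(code_tables, key=lambda table: (table["type"], table["url"]))
```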
``type="wheel"``
''''''''''''''''
A `wheel file`_ for the package version.
Supported keys in the table are:
- ``url``: a string of the location of the wheel file (use the
``file://`` protocol for the local file system)
- ``hash-algorithm``: a string of the algorithm used to generate the
hash value stored in ``hash-value``
- ``hash-value``: a string of the hash of the file contents
- ``interpreter-tag``: (optional) a string of the interpreter portion
of the wheel tag as specified by the `platform compatibility tags`_
spec
- ``abi-tag``: (optional) a string of the ABI portion of the wheel tag
as specified by the `platform compatibility tags`_ spec
- ``platform-tag``: (optional) a string of the platform portion of the
wheel tag as specified by the `platform compatibility tags`_ spec
If the keys related to `platform compatibility tags`_ are absent then
the installer MUST infer the tags from the URL's file name. If any of
the `platform compatibility tags`_ are specified by a key in the table
then a locker MUST provide all three related keys. The values of the
keys may be compressed tags.
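The fallback to the file name could look roughly like the following.
This is a sketch under stated assumptions: ``wheel_tags`` and
``expand_tags`` are hypothetical helpers, and the parsing relies on
the wheel file name convention in which the last three ``-`` separated
components are the interpreter, ABI, and platform tags.

```python
from itertools import product
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

TAG_KEYS = ("interpreter-tag", "abi-tag", "platform-tag")


def wheel_tags(table):
    """Return (interpreter, abi, platform), possibly compressed."""
    present = [key in table for key in TAG_KEYS]
    if all(present):
        return tuple(table[key] for key in TAG_KEYS)
    if any(present):
        raise ValueError("all three tag keys must be provided together")
    # No tag keys: infer the tags from the URL's file name.
    filename = PurePosixPath(unquote(urlparse(table["url"]).path)).name
    if not filename.endswith(".whl"):
        raise ValueError(f"{filename!r} is not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return tuple(parts[-3:])


def expand_tags(interpreter, abi, platform):
    """Expand compressed tags such as ``py2.py3`` into single-value triples."""
    return list(
        product(interpreter.split("."), abi.split("."), platform.split("."))
    )
```

Expanding the tags gives triples in the same single-value form that
``metadata.tags`` uses, which makes the two directly comparable.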
``type="sdist"``
''''''''''''''''
A `source distribution file`_ (sdist) for the package version.
- ``url``: a string of the location of the sdist file (use the
``file://`` protocol for the local file system)
- ``hash-algorithm``: a string of the algorithm used to generate the
hash value stored in ``hash-value``
- ``hash-value``: a string of the hash of the file contents
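Hash verification for a downloaded file can be sketched with the
standard library. Two assumptions are made here that this PEP does not
spell out: ``hash-algorithm`` is a name that ``hashlib`` recognizes,
and ``hash-value`` is a hexadecimal digest (as in the example lock
file). The function name ``verify_hash`` is hypothetical.

```python
import hashlib


def verify_hash(data, algorithm, expected):
    """Raise ValueError unless `data` hashes to `expected` (hex digest)."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    # Note: variable-length digests (the SHAKE family) are not handled here.
    actual = hashlib.new(algorithm, data).hexdigest()
    if actual != expected.lower():
        raise ValueError(
            f"hash mismatch: expected {expected}, computed {actual}"
        )
```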
``type="git"``
''''''''''''''
A Git_ version control repository for the package.
- ``url``: a string of the location of the repository (use the
``file://`` protocol for the local file system)
- ``commit``: a string of the commit of the repository which
represents the version of the package
The repository MUST follow the `source distribution file`_ spec
for source trees, otherwise an error is to be raised by the locker.
As the commit ID for a Git repository is a hash of the repository's
contents, there is no hash to verify.
``type="source tree"``
''''''''''''''''''''''
A source tree which can be used to build a wheel.
- ``url``: a string of the location of the source tree (use the
``file://`` protocol for the local file system)
- ``mime-type``: (optional) a string representing the MIME type of the
URL
- ``hash-algorithm``: (optional for a local directory) a string of the
algorithm used to generate the hash value stored in ``hash-value``
- ``hash-value``: (optional for a local directory) a string of the
hash of the file contents
The collection of files MUST follow the `source distribution file`_
spec for source trees, otherwise an error is to be raised by the
locker.
Installers MAY use the file extension, MIME type from HTTP headers,
etc. to infer whether they support the storage mechanism used for the
source tree. If the MIME type cannot be inferred and it is not
specified via ``mime-type`` then an error MUST be raised.
If the source tree is NOT a local directory, then an installer MUST
verify the hash value. Otherwise if the source tree is a local
directory then the ``hash-algorithm`` and ``hash-value`` keys MUST be
left out. The installer MAY warn the user of the use of a local
directory due to the potential change in code since the lock file
was created.
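The hash rules for source trees can be summarized in a small check.
The sketch assumes that "local directory" means a ``file://`` URL
whose path is an existing directory on the installing machine; this
PEP does not define the test, and the helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

HASH_KEYS = ("hash-algorithm", "hash-value")


def is_local_directory(url):
    """Assumption: a ``file://`` URL naming an existing directory."""
    parsed = urlparse(url)
    return parsed.scheme == "file" and Path(url2pathname(parsed.path)).is_dir()


def needs_hash_verification(table):
    """Return True if the installer must verify a hash for `table`.

    Raises ValueError when the hash keys contradict the rules above.
    """
    present = [key in table for key in HASH_KEYS]
    if is_local_directory(table["url"]):
        if any(present):
            raise ValueError("hash keys MUST be left out for a local directory")
        return False  # the installer MAY still warn about a local directory
    if not all(present):
        raise ValueError("hash keys are required unless a local directory is used")
    return True
```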
-------
Example
-------
::
version = 1
[tool]
# Tool-specific table in the style of PEP 518's `[tool]` table.
[metadata]
marker = "python_version>='3.6'"
needs = ["mousebender"]
[[package.attrs]]
version = "21.2.0"
required-by = ["mousebender"]
[[package.attrs.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"
[[package.mousebender]]
version = "2.0.0"
needs = ["attrs>=19.3", "packaging>=20.3"]
[[package.mousebender.code]]
type = "sdist"
url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"
hash-algorithm = "sha256"
hash-value = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"
[[package.mousebender.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"
[[package.packaging]]
version = "20.9"
needs = ["pyparsing>=2.0.2"]
required-by = ["mousebender"]
[[package.packaging.code]]
type = "git"
url = "https://github.com/pypa/packaging.git"
commit = "53fd698b1620aca027324001bf53c8ffda0c17d1"
[[package.pyparsing]]
version = "2.4.7"
required-by = ["packaging"]
[[package.pyparsing.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
hash-algorithm = "sha256"
hash-value = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"
interpreter-tag = "py2.py3"
abi-tag = "none"
platform-tag = "any"
----------------------
Installer Expectations
----------------------
Installers MUST implement the
`direct URL origin of installed distributions spec`_ as all packages
installed from a lock file inherently originate from a URL and not a
search of an index by package name and version.
Example Flow
============
#. Have the user specify which lock file they would like to use in
``pyproject-lock.d`` (e.g. ``dev``, ``prod``)
#. Check if the environment supports what is specified in
``metadata.tags``; error out if it doesn't
#. Check if the environment supports what is specified in
``metadata.marker``; error out if it doesn't
#. Gather the list of package names from ``metadata.needs``, and for
each listed package ...
#. Resolve any markers to find the appropriate package to install
#. Find the most appropriate code to install for the package
#. Repeat the above steps for packages listed in the ``needs`` key
for each package found to install
#. For each project collected to install ...
#. Gather the specified code for the package
#. Verify hashes of code
#. Install the packages (if necessary)
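The graph walk in the middle of this flow can be sketched as below.
This is a deliberately simplified illustration and not a complete
installer: the helper names are hypothetical, ``lock_data`` is assumed
to be the parsed lock file as nested dictionaries, markers and version
specifiers are not evaluated, and the first entry recorded for each
package is taken as the one to install.

```python
import re
from collections import deque

# Simplified: grabs only the leading project name of a specifier.
_NAME = re.compile(r"\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def requirement_name(specifier):
    """Return the normalized project name at the start of `specifier`."""
    match = _NAME.match(specifier)
    if match is None:
        raise ValueError(f"cannot find a project name in {specifier!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def collect_packages(lock_data):
    """Walk from ``metadata.needs`` and return (name, version) pairs."""
    packages = lock_data.get("package", {})
    pending = deque(
        requirement_name(spec) for spec in lock_data["metadata"]["needs"]
    )
    seen = set()
    collected = []
    while pending:
        name = pending.popleft()
        if name in seen:
            continue
        if name not in packages:
            raise LookupError(f"{name!r} is needed but not in the lock file")
        seen.add(name)
        # Assumption: a real installer would choose among the entries
        # using the specifier and any markers; this sketch takes the first.
        entry = packages[name][0]
        collected.append((name, entry["version"]))
        pending.extend(requirement_name(s) for s in entry.get("needs", []))
    return collected
```

No network or file system access is involved: everything the walk
needs comes from the lock file data itself, which is the property the
goals of this PEP ask for.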
=======================
Backwards Compatibility
=======================
As there is no pre-existing specification regarding lock files, there
are no explicit backwards compatibility concerns.
As for pre-existing tools that have their own lock file, some updating
will be required. Most document the lock file name, but not its
contents, in which case the file name of the lock file(s) is the
important part. For projects which do not commit their lock file to
version control, they will need to update the equivalent of their
``.gitignore`` file. For projects that do commit their lock file to
version control, what file(s) get committed will need an update.
For projects which do document their lock file format like Pipenv_,
they will very likely need a new major version release.
Specifically for Poetry_, it has an
`export command <https://python-poetry.org/docs/cli/#export>`_ which
should allow Poetry to support this lock file format even if the
project chose not to adopt this PEP as Poetry's primary lock file
format.
=====================
Security Implications
=====================
A lock file should not introduce security issues but instead help
solve them. By requiring the recording of hashes of code, a lock file
is able to help prevent tampering with code since the hash details
were recorded. A lock file also helps prevent unexpected package
updates being installed which may be malicious.
=================
How to Teach This
=================
Teaching of this PEP will very much be dependent on the lockers and
installers being used for day-to-day use. Conceptually, though, users
could be taught that the ``pyproject-lock.d`` directory contains files
which specify what should be installed for a project to work. The
benefits of consistency and security should be emphasized to help
users realize why they should care about lock files.
========================
Reference Implementation
========================
No proof-of-concept or reference implementation currently exists.
==============
Rejected Ideas
==============
----------------------------
File Formats Other Than TOML
----------------------------
JSON_ was briefly considered, but due to:
#. TOML already being used for ``pyproject.toml``
#. TOML being more human-readable
#. TOML leading to better diffs
the decision was made to go with TOML. There was some concern over
Python's standard library lacking a TOML parser, but most packaging
tools already use a TOML parser thanks to ``pyproject.toml`` so this
issue did not seem to be a showstopper. Some have also argued against
this concern in the past on the grounds that if packaging tools abhor
installing dependencies and feel they can't vendor a package, then the
packaging ecosystem has much bigger issues to rectify than needing to
depend on a third-party TOML parser.
----------------------------------------
Alternative Name to ``pyproject-lock.d``
----------------------------------------
The name ``__lockfile__`` was briefly considered, but the directory
would not sort next to ``pyproject.toml`` in instances where files
and directories were sorted together in lexicographic order. The
current naming is also more obvious in terms of its relationship
to ``pyproject.toml``.
-----------------------------
Supporting a Single Lock File
-----------------------------
At one point the idea of not using a directory of lock files but a
single lock file which contained all possible lock information was
considered. But it quickly became apparent that trying to devise a
data format which could encompass both a lock file format which could
support multiple environments as well as strict lock outcomes for
reproducible builds would become quite complex and cumbersome.
The idea of supporting a directory of lock files as well as a single
lock file named ``pyproject-lock.toml`` was also considered. But any
possible simplicity from skipping the directory in the case of a
single lock file seemed unnecessary. Trying to define appropriate
logic for what should be the ``pyproject-lock.toml`` file and what
should go into ``pyproject-lock.d`` seemed unnecessarily complicated.
-----------------------------------------------
Using a Flat List Instead of a Dependency Graph
-----------------------------------------------
The first version of this PEP proposed that the lock file have no
concept of a dependency graph. Instead, the lock file would list
exactly what should be installed for a specific platform such that
installers did not have to make any decisions about *what* to install,
only validating that the lock file would work for the target platform.
This idea was eventually rejected due to the number of combinations
of potential PEP 508 environment markers. The decision was made that
trying to have lockers generate all possible combinations when a
project wants to be cross-platform would be too much.
-------------------------------------------------------------------------
Being Concerned About Different Dependencies Per Wheel File For a Project
-------------------------------------------------------------------------
It is technically possible for a project to specify different
dependencies between its various wheel files. Taking that into
consideration would then require the lock file to operate not
per-project but per-file. Luckily, specifying different dependencies
in this way is very rare and frowned upon and so it was deemed not
worth supporting.
-------------------------------
Use Wheel Tags in the File Name
-------------------------------
Instead of having the ``metadata.tags`` field there was a suggestion
of encoding the tags into the file name. But due to the addition of
the ``metadata.marker`` field and what to do when no tags were needed,
the idea was dropped.
-----------------------------------------
Using Semantic Versioning for ``version``
-----------------------------------------
Instead of a monotonically increasing integer, using a float was
considered to attempt to convey semantic versioning. In the end,
though, it was deemed more hassle than it was worth as adding a new
key would likely constitute a "major" version change (only if the
key was entirely optional would it be considered "minor"), and
experience with the `core metadata spec`_ suggests there's a bigger
chance parsing will be relaxed or made more strict, which is also a
"major" change. As such, the simplicity of using an integer made
sense.
-------------------------------
Alternative Names for ``needs``
-------------------------------
Some other names for what became ``needs`` were ``installs`` and
``dependencies``. In the end a Python beginner was asked which term
they preferred and they found ``needs`` clearer. Since there wasn't
any reason to disagree with that, the decision was to go with
``needs``.
-------------------------------------
Alternative Names for ``required-by``
-------------------------------------
Other names that were considered were ``dependents``, ``depended-by``,
and ``supports``. In the end, ``required-by`` simply seemed like the
best fit.
-------------------------------------
Support for Branches and Tags for Git
-------------------------------------
Due to the `direct URL origin of installed distributions spec`_
supporting the specification of branches and tags, it was suggested
that lock files support the same thing. But because branches and tags
can change what commit they point to between locking and installation,
that was viewed as a security concern (Git commit IDs are hashes of
metadata and thus are viewed as immutable).
===========
Open Issues
===========
---------------------------------------
Allow for Tool-Specific ``type`` Values
---------------------------------------
It has been suggested to allow for custom ``type`` values in the
``code`` table. They would be prefixed with ``x-`` and followed by
the tool's name and then the type, i.e. ``x-<tool>-<type>``. This
would provide enough flexibility for things such as other version
control systems, innovative container formats, etc. to be officially
usable in a lock file.
-----------------------------------------------
Support Variable Expansion in the ``url`` field
-----------------------------------------------
This could include predefined variables like ``PROJECT_ROOT`` for the
directory containing ``pyproject-lock.d`` so URLs to local directories
and files could be relative to the project itself.
Environment variables could be supported to avoid hardcoding things
such as user credentials for Git.
===============
Acknowledgments
===============
Thanks to Frost Ming of PDM_ and Sébastien Eustace of Poetry_ for
providing input around dynamic install-time resolution of PEP 508
requirements.
Thanks to Kushal Das for making sure reproducible builds stayed a
concern for this PEP.
Thanks to Andrea McInnes for settling the bikeshedding and choosing
the paint colour of ``needs``.
=========
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
.. _build system declaration spec: https://packaging.python.org/specifications/declaring-build-dependencies/
.. _core metadata spec: https://packaging.python.org/specifications/core-metadata/
.. _Dart: https://dart.dev/
.. _dependency specifier spec: https://packaging.python.org/specifications/dependency-specifiers/
.. _Git: https://git-scm.com/
.. _JSON: https://www.json.org/
.. _npm: https://www.npmjs.com/
.. _PDM: https://pypi.org/project/pdm/
.. _pip-tools: https://pypi.org/project/pip-tools/
.. _Pipenv: https://pypi.org/project/pipenv/
.. _platform compatibility tags: https://packaging.python.org/specifications/platform-compatibility-tags/
.. _Poetry: https://pypi.org/project/poetry/
.. _Pyflow: https://pypi.org/project/pyflow/
.. _direct URL origin of installed distributions spec: https://packaging.python.org/specifications/direct-url/
.. _Rust: https://www.rust-lang.org/
.. _SecureDrop: https://securedrop.org/
.. _simple repository API: https://packaging.python.org/specifications/simple-repository-api/
.. _source distribution file: https://packaging.python.org/specifications/source-distribution-format/
.. _TOML: https://toml.io
.. _version specifiers spec: https://packaging.python.org/specifications/version-specifiers/
.. _wheel file: https://packaging.python.org/specifications/binary-distribution-format/
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: