PEP 710: Recording the provenance of installed packages (#3076)
parent
70437b31d8
commit
0c6f86f0b9
|
@@ -590,6 +590,7 @@ pep-0706.rst @encukou
|
|||
pep-0707.rst @iritkatriel
|
||||
pep-0708.rst @dstufft
|
||||
pep-0709.rst @carljm
|
||||
pep-0710.rst @dstufft
|
||||
# ...
|
||||
# pep-0754.txt
|
||||
# ...
|
||||
|
|
|
@@ -0,0 +1,619 @@
|
|||
PEP: 710
|
||||
Title: Recording the provenance of installed packages
|
||||
Author: Fridolín Pokorný <fridolin.pokorny at gmail.com>
|
||||
Sponsor: Donald Stufft <donald@stufft.io>
|
||||
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
|
||||
Discussions-To: https://discuss.python.org/t/draft-pep-recording-provenance-of-installed-packages/24838
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Topic: Packaging
|
||||
Content-Type: text/x-rst
|
||||
Created: 27-Mar-2023
|
||||
Post-History: `03-Dec-2021 <https://discuss.python.org/t/pip-installation-reports/12316>`__,
|
||||
`30-Jan-2023 <https://discuss.python.org/t/pre-pep-recording-provenance-of-installed-packages/23340>`__,
|
||||
`14-Mar-2023 <https://discuss.python.org/t/draft-pep-recording-provenance-of-installed-packages/24838>`__
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP describes a way to record the provenance of installed Python distributions.
|
||||
The record is created by an installer and is available to users in
|
||||
the form of a JSON file ``provenance_url.json`` in the ``.dist-info`` directory.
|
||||
The mentioned JSON file captures additional metadata to allow recording a URL to a
|
||||
:term:`distribution package` together with the installed distribution hash. This
|
||||
proposal is built on top of :pep:`610` following
|
||||
:ref:`its corresponding canonical PyPA spec <packaging:direct-url>` and
|
||||
complements ``direct_url.json`` with ``provenance_url.json`` for when packages
|
||||
are identified by a name, and optionally a version.
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Installing a Python :term:`Project` involves downloading a :term:`Distribution Package`
|
||||
from a :term:`Package Index`
|
||||
and extracting its content to an appropriate place. After the installation
|
||||
process is done, information about the release artifact used as well as its source
|
||||
is generally lost. However, there are use cases for keeping records of
|
||||
distributions used for installing packages and their provenance.
|
||||
|
||||
Python wheels can be built with different compiler flags or supporting
|
||||
different wheel tags. In both cases, users might get into a situation in which
|
||||
multiple wheels might be considered by installers (possibly from different
|
||||
package indexes) and immediately finding out which wheel file was actually used
|
||||
during the installation might be helpful. This way, developers can use
|
||||
information about wheels to debug issues making sure the desired wheel was
|
||||
actually installed. Another use case could be tools reporting software
|
||||
installed, such as tools reporting an SBOM (Software Bill of Materials), that might
|
||||
give more accurate reports. Yet another use case could be reconstruction of the
|
||||
Python environment by pinning each installed package to a specific distribution
|
||||
artifact consumed from a Python package index.
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
The motivation described in this PEP is an extension of that in :pep:`610`.
|
||||
In addition to recording provenance information for packages installed using a direct URL,
|
||||
installers should also do so for packages installed by name
|
||||
(and optionally version) from Python package indexes.
|
||||
|
||||
The idea described in this PEP originated in a tool called `micropipenv`_
|
||||
that is used to install
|
||||
:term:`distribution packages <Distribution Package>` in containerized
|
||||
environments (see the reported issue `thoth-station/micropipenv#206`_).
|
||||
Currently, the assembled containerized application does not implicitly carry
|
||||
information about the provenance of installed distribution packages
|
||||
(unless these are installed from full URLs and recorded via ``direct_url.json``).
|
||||
This requires container image suppliers to link
|
||||
container images with the corresponding build process, its configuration and
|
||||
the application source code for checking requirements files in cases when
|
||||
software present in containerized environments needs to be audited.
|
||||
|
||||
The `subsequent discussion in the Discourse thread
|
||||
<https://discuss.python.org/t/12316>`__ also brought up
|
||||
pip's new ``--report`` option that can
|
||||
`generate a detailed JSON report <pip_installation_report_>`__ about
|
||||
the installation process. This option could help with the provenance problem
|
||||
this PEP approaches. Nevertheless, this option needs to be *explicitly* passed
|
||||
to pip to obtain the provenance information, and includes additional metadata that
|
||||
might not be necessary for checking the provenance (such as Python version
|
||||
requirements of each distribution package). Also, this option is
|
||||
specific to pip as of the writing of this PEP.
|
||||
|
||||
Note the current :ref:`spec for recording installed packages
|
||||
<packaging:recording-installed-packages>` defines a ``RECORD`` file that
|
||||
records installed files, but not the distribution artifact from which these
|
||||
files were obtained. Auditing installed artifacts can be performed
|
||||
based on matching the entries listed in the ``RECORD`` file. However, this
|
||||
technique requires a pre-computed database of files each artifact provides or a
|
||||
comparison with the actual artifact content. Both approaches are relatively
|
||||
expensive and time-consuming operations that could be eliminated with the
|
||||
proposed ``provenance_url.json`` file.
|
||||
|
||||
Recording provenance information for installed distribution packages,
|
||||
both those obtained from direct URLs and by name/version from an index,
|
||||
can simplify auditing Python environments in general, beyond just
|
||||
the specific use case for containerized applications mentioned earlier.
|
||||
A community project `pip-audit
|
||||
<https://github.com/pypa/pip-audit>`__ has expressed possible interest; see
|
||||
`pypa/pip-audit#170`_.
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”,
|
||||
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL”
|
||||
in this document are to be interpreted as described in :rfc:`2119`.
|
||||
|
||||
The ``provenance_url.json`` file SHOULD be created in the ``.dist-info``
|
||||
directory by installers when installing a :term:`Distribution Package`
|
||||
specified by name (and optionally by :term:`Version Specifier`).
|
||||
|
||||
This file MUST NOT be created when installing a distribution package from a requirement
|
||||
specifying a direct URL reference (including a VCS URL).
|
||||
|
||||
Only one of the files ``provenance_url.json`` and ``direct_url.json`` (from :pep:`610`)
|
||||
may be present in a given ``.dist-info`` directory; installers MUST NOT add both.
|
||||
|
||||
The ``provenance_url.json`` JSON file MUST be a dictionary, compliant with
|
||||
:rfc:`8259` and UTF-8 encoded.
|
||||
|
||||
If present, it MUST contain exactly two keys. The first one is ``url``, with
|
||||
type ``string``. The second key MUST be ``archive_info`` with a value defined
|
||||
below.
|
||||
|
||||
The value of the ``url`` key MUST be the URL from which the distribution package was downloaded. If a wheel is
|
||||
built from a source distribution, the ``url`` value MUST be the URL from which
|
||||
the source distribution was downloaded. If a wheel is downloaded and installed directly,
|
||||
the ``url`` field MUST be the URL from which the wheel was downloaded.
|
||||
As in the :ref:`direct URL origin specification<packaging:direct-url>`, the ``url`` value
|
||||
MUST be stripped of any sensitive authentication information for security reasons.
|
||||
|
||||
The user:password section of the URL MAY however be composed of environment
|
||||
variables, matching the following regular expression:
|
||||
|
||||
.. code-block:: text

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?
|
||||
|
||||
Additionally, the user:password section of the URL MAY be a well-known,
|
||||
non-security sensitive string. A typical example is ``git`` in the case of a
|
||||
URL such as ``ssh://git@gitlab.com``.
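The rules above for the ``url`` value can be sketched as follows. This is a minimal illustration using only the standard library, not the logic of pip or of any other installer: the function name ``sanitize_url`` and the ``WELL_KNOWN_USERS`` set are hypothetical, and treating ``git`` as the only well-known user is an assumption made for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Regular expression from the specification: ${USER} or ${USER}:${PASSWORD}.
ENV_VAR_AUTH = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

# Assumption for this sketch: "git" is the only well-known, non-sensitive user.
WELL_KNOWN_USERS = {"git"}


def sanitize_url(url: str) -> str:
    """Return ``url`` with sensitive user:password information removed."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no user:password section at all
    auth, _, host = parts.netloc.rpartition("@")
    if ENV_VAR_AUTH.fullmatch(auth) or auth in WELL_KNOWN_USERS:
        return url  # environment variables or a well-known user may stay
    # Anything else is treated as sensitive and dropped.
    return urlunsplit(parts._replace(netloc=host))
```

With this sketch, a URL carrying literal credentials loses them, while a URL whose user:password section is made of environment variable references, or is ``git``, is returned unchanged.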
|
||||
|
||||
The value of ``archive_info`` MUST be a dictionary with a single key
|
||||
``hashes``. The value of ``hashes`` is a dictionary mapping hash function names to a
|
||||
hex-encoded digest of the file referenced by the ``url`` value. Multiple hashes
|
||||
can be included, and it is up to the consumer to decide what to do with
|
||||
multiple hashes (it may validate all of them or a subset of them, or nothing at
|
||||
all).
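As an illustration of the structure described above, the following sketch builds the content of a ``provenance_url.json`` file from the bytes of a downloaded artifact. The helper names ``compute_hashes`` and ``build_provenance`` are hypothetical and not taken from any installer; the default choice of ``sha256`` alone is an assumption of the example.

```python
import hashlib
import json


def compute_hashes(data: bytes, algorithms=("sha256",)) -> dict:
    """Map each hash function name to the hex-encoded digest of ``data``."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def build_provenance(url: str, data: bytes, algorithms=("sha256",)) -> dict:
    """Assemble the two top-level keys: ``archive_info`` and ``url``."""
    return {
        "archive_info": {"hashes": compute_hashes(data, algorithms)},
        "url": url,
    }


if __name__ == "__main__":
    # Hypothetical artifact bytes and URL, for demonstration only.
    doc = build_provenance("https://example.com/app-1.0-py3-none-any.whl", b"artifact bytes")
    print(json.dumps(doc, indent=2, sort_keys=True))
```

A real installer would hash the file it actually downloaded (or retrieved from its cache) and write the result into the ``.dist-info`` directory.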
|
||||
|
||||
Each hash MUST be one of the single argument hashes provided by
|
||||
:data:`py3.11:hashlib.algorithms_guaranteed`, excluding ``sha1`` and ``md5`` which MUST NOT be used.
|
||||
As of Python 3.11, with ``shake_128`` and ``shake_256`` excluded
|
||||
for being multi-argument, the allowed set of hashes is:
|
||||
|
||||
.. code-block:: python

    >>> import hashlib
    >>> sorted(hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"})
    ['blake2b', 'blake2s', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512']
|
||||
|
||||
Each hash MUST be referenced by the canonical name of the hash, always lower case.
|
||||
|
||||
Hashes ``sha1`` and ``md5`` MUST NOT be present, due to the security
|
||||
limitations of these hash algorithms. Conversely, hash ``sha256`` SHOULD
|
||||
be included.
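A consumer could check these requirements roughly as sketched below. The function ``validate_provenance`` is a hypothetical helper written for this illustration, not part of any existing tool, and it covers only the structural rules stated above (it does not check the SHOULD-level recommendation to include ``sha256``).

```python
import hashlib
import string

# Allowed hash names, computed as in the specification text above.
ALLOWED_HASHES = hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"}


def validate_provenance(doc) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    if not isinstance(doc, dict):
        return ["top-level value must be a dictionary"]
    problems = []
    if set(doc) != {"url", "archive_info"}:
        problems.append("must contain exactly the keys 'url' and 'archive_info'")
    if not isinstance(doc.get("url"), str):
        problems.append("'url' must be a string")
    archive_info = doc.get("archive_info")
    if not isinstance(archive_info, dict) or set(archive_info) != {"hashes"}:
        problems.append("'archive_info' must be a dictionary with the single key 'hashes'")
        return problems
    hashes = archive_info["hashes"]
    if not isinstance(hashes, dict):
        problems.append("'hashes' must be a dictionary")
        return problems
    for name, digest in hashes.items():
        if name not in ALLOWED_HASHES:
            # Catches sha1, md5 and non-canonical spellings such as "SHA-256".
            problems.append(f"hash name {name!r} is not allowed")
        elif not isinstance(digest, str) or not digest or set(digest) - set(string.hexdigits):
            problems.append(f"digest for {name!r} is not hex-encoded")
    return problems
```

Returning a list of problems instead of raising on the first one is a design choice of the sketch; it lets an auditing tool report everything wrong with a file in one pass.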
|
||||
|
||||
Installers that cache distribution packages from an index SHOULD keep
|
||||
information related to the cached distribution artifact, so that
|
||||
the ``provenance_url.json`` file can be created even when installing distribution packages
|
||||
from the installer's cache.
|
||||
|
||||
Backwards Compatibility
|
||||
=======================
|
||||
|
||||
Following the :ref:`packaging:recording-installed-packages` specification,
|
||||
installers may keep additional installer-specific files in the ``.dist-info``
|
||||
directory. To make sure this PEP does not cause any backwards compatibility
|
||||
issues, a :ref:`comprehensive survey of installers and libraries <710-tool-survey>`
|
||||
found no current tools that are using a similarly-named file,
|
||||
or other major feasibility concerns.
|
||||
|
||||
The :ref:`Wheel specification <packaging:binary-distribution-format>` lists files that can be
|
||||
present in the ``.dist-info`` directory. None of these file names collide with
|
||||
the proposed ``provenance_url.json`` file from this PEP.
|
||||
|
||||
Presence of provenance_url.json in installers and libraries
|
||||
-----------------------------------------------------------
|
||||
|
||||
A comprehensive survey of the existing installers, libraries, and dependency
|
||||
managers in the Python ecosystem analyzed the implications of adding support for
|
||||
``provenance_url.json`` to each tool.
|
||||
In summary, no major backwards compatibility issues, conflicts or feasibility blockers
|
||||
were found as of the time of writing of this PEP. More details about the survey
|
||||
can be found in the :ref:`710-tool-survey` section.
|
||||
|
||||
Compatibility with direct_url.json
|
||||
----------------------------------
|
||||
|
||||
This proposal does not make any changes to the ``direct_url.json`` file
|
||||
described in :pep:`610` and :ref:`its corresponding canonical PyPA spec
|
||||
<direct-url>`.
|
||||
|
||||
The content of the ``provenance_url.json`` file was designed in a way to eventually
|
||||
allow installers to reuse some of the logic supporting ``direct_url.json`` when a
|
||||
direct URL refers to a source archive or a wheel.
|
||||
|
||||
The main difference between the ``provenance_url.json`` and ``direct_url.json``
|
||||
files is the mandatory keys and their values in the ``provenance_url.json`` file.
|
||||
This helps make sure consumers of the ``provenance_url.json`` file can rely
|
||||
on its content, if the file is present in the ``.dist-info`` directory.
|
||||
|
||||
Security Implications
|
||||
=====================
|
||||
|
||||
One of the main security features of the ``provenance_url.json`` file is the
|
||||
ability to audit installed artifacts in Python environments. Tools can check
|
||||
which Python package indexes were used to install Python :term:`distribution
|
||||
packages <Distribution Package>` as well as the hash digests of their release
|
||||
artifacts.
|
||||
|
||||
As an example, we can take the recent compromised dependency chain in `the
|
||||
PyTorch incident <https://pytorch.org/blog/compromised-nightly-dependency/>`__.
|
||||
The PyTorch index provided a package named ``torchtriton``. An attacker
|
||||
published ``torchtriton`` on PyPI, which ran a malicious binary. By checking
|
||||
the URL of the installed Python distribution stated in the
|
||||
``provenance_url.json`` file, tools can automatically check the source of the
|
||||
installed Python distribution. In case of the PyTorch incident, the URL of
|
||||
``torchtriton`` should point to the PyTorch index, not PyPI. Tools can help
|
||||
identify such malicious Python distributions installed by checking the
|
||||
installed Python distribution URL. A more exact check can also include the hash
|
||||
of the installed Python distribution stated in the ``provenance_url.json``
|
||||
file. Such checks on hashes can be helpful for mirrored Python package indexes
|
||||
where Python distributions are not distinguishable by their source URLs, making
|
||||
sure only desired Python package distributions are installed.
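The kind of check described above could look roughly like the following sketch. It assumes installers have written ``provenance_url.json`` as proposed; the function names and the idea of a ``trusted_hosts`` set are hypothetical and chosen for illustration only. ``importlib.metadata`` is used to read files from each ``.dist-info`` directory.

```python
import json
from importlib.metadata import distributions
from urllib.parse import urlsplit


def origin_is_trusted(provenance: dict, trusted_hosts: set) -> bool:
    """Check whether the recorded URL points at one of the trusted hosts."""
    return urlsplit(provenance["url"]).hostname in trusted_hosts


def untrusted_distributions(trusted_hosts: set) -> list:
    """List (name, url) pairs of installed distributions from other hosts."""
    findings = []
    for dist in distributions():
        text = dist.read_text("provenance_url.json")
        if text is None:
            continue  # file not recorded, e.g. a direct URL install
        provenance = json.loads(text)
        if not origin_is_trusted(provenance, trusted_hosts):
            findings.append((dist.metadata["Name"], provenance["url"]))
    return findings
```

A tool could call ``untrusted_distributions`` with the hosts of the indexes it expects and report anything that was installed from elsewhere. Comparing the recorded hashes against known-good digests would be a stricter variant of the same loop.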
|
||||
|
||||
A malicious actor can intentionally adjust the content of
|
||||
``provenance_url.json`` to possibly hide provenance information of the
|
||||
installed Python distribution. A security check which would uncover such
|
||||
malicious activity is beyond the scope of this PEP as it would require monitoring
|
||||
actions on the filesystem and eventually reviewing user or file permissions.
|
||||
|
||||
How to Teach This
|
||||
=================
|
||||
|
||||
The ``provenance_url.json`` metadata file is intended for tools and is not
|
||||
directly visible to end users.
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
Examples of a valid provenance_url.json
|
||||
---------------------------------------
|
||||
|
||||
A valid ``provenance_url.json`` listing multiple hashes:
|
||||
|
||||
.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "blake2s": "fffeaf3d0bd71dc960ca2113af890a2f2198f2466f8cd58ce4b77c1fc54601ff",
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
          "sha3_256": "c856930e0f707266d30e5b48c667a843d45e79bb30473c464e92dfa158285eab",
          "sha512": "6bad5536c30a0b2d5905318a1592948929fbac9baf3bcf2e7faeaf90f445f82bc2b656d0a89070d8a6a9395761f4793c83187bd640c64b2656a112b5be41f73d"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
|
||||
|
||||
A valid ``provenance_url.json`` listing a single hash entry:
|
||||
|
||||
.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
|
||||
|
||||
A valid ``provenance_url.json`` listing a source distribution which was used to
|
||||
build and install a wheel:
|
||||
|
||||
.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "sha256": "8bfe29f17c10e2f2e619de8033a07a224058d96b3bfe2ed61777596f7ffd7fa9"
        }
      },
      "url": "https://files.pythonhosted.org/packages/1d/43/ad8ae671de795ec2eafd86515ef9842ab68455009d864c058d0c3dcf680d/micropipenv-0.0.1.tar.gz"
    }
|
||||
|
||||
Examples of an invalid provenance_url.json
|
||||
------------------------------------------
|
||||
|
||||
The following example includes a ``hash`` key in the ``archive_info`` dictionary
|
||||
as originally designed in :pep:`610` and the data structure documented in
|
||||
:ref:`packaging:direct-url`.
|
||||
The ``hash`` key MUST NOT be present, to prevent any possible confusion
|
||||
with ``hashes`` and additional checks that would be required to keep hash
|
||||
values in sync.
|
||||
|
||||
.. code-block:: json

    {
      "archive_info": {
        "hash": "sha256=236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
|
||||
|
||||
Another example demonstrates an invalid hash name. The referenced hash name does not
|
||||
correspond to the canonical hash names described in this PEP and
|
||||
in the Python docs under :attr:`py3.11:hashlib.hash.name`.
|
||||
|
||||
.. code-block:: json

    {
      "archive_info": {
        "hashes": {
          "SHA-256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
|
||||
|
||||
|
||||
Example pip commands and their effect on provenance_url.json and direct_url.json
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
These commands generate a ``direct_url.json`` file but do not generate a
|
||||
``provenance_url.json`` file. These examples follow examples from :pep:`610`:
|
||||
|
||||
* ``pip install https://example.com/app-1.0.tgz``
|
||||
* ``pip install https://example.com/app-1.0.whl``
|
||||
* ``pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"``
|
||||
* ``pip install ./app``
|
||||
* ``pip install file:///home/user/app``
|
||||
* ``pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"`` (in which case, ``url`` will be the local directory where the git repository has been cloned to, and ``dir_info`` will be present with ``"editable": true`` and no ``vcs_info`` will be set)
|
||||
* ``pip install -e ./app``
|
||||
|
||||
Commands that generate a ``provenance_url.json`` file but do not generate
|
||||
a ``direct_url.json`` file:
|
||||
|
||||
* ``pip install app``
|
||||
* ``pip install app~=2.2.0``
|
||||
* ``pip install app --no-index --find-links "https://example.com/"``
|
||||
|
||||
This behaviour can be tested using changes to pip implemented in the PR
|
||||
`pypa/pip#11865`_.
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
A proof-of-concept for creating the ``provenance_url.json`` metadata file when
|
||||
installing a Python :term:`Distribution Package` is available in the PR to pip
|
||||
`pypa/pip#11865`_. It reuses the already available implementation for the
|
||||
:ref:`direct URL data structure <packaging:direct-url-data-structure>` to provide
|
||||
the ``provenance_url.json`` metadata file for cases when ``direct_url.json`` is not
|
||||
created.
|
||||
|
||||
A prototype called `pip-preserve <pip_preserve_>`_ was developed to
|
||||
demonstrate creation of ``requirements.txt`` files considering ``direct_url.json``
|
||||
and ``provenance_url.json`` metadata files. This tool mimics the ``pip
|
||||
freeze`` functionality, but the listing of installed packages also includes
|
||||
the hashes of the Python distribution artifacts.
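To illustrate the idea, the sketch below turns the content of a ``provenance_url.json`` file into a pinned requirement line with a ``--hash`` option, the syntax pip accepts in requirements files. This is not the implementation of pip-preserve; ``requirement_line`` is a hypothetical helper, and emitting only the ``sha256`` digest is an assumption made to keep the output within what pip's hash-checking mode is known to accept.

```python
def requirement_line(name: str, version: str, provenance: dict) -> str:
    """Build a ``name==version --hash=sha256:<digest>`` requirement line."""
    hashes = provenance["archive_info"]["hashes"]
    line = f"{name}=={version}"
    digest = hashes.get("sha256")
    if digest is not None:
        # Assumption: only sha256 is emitted; other recorded hashes are ignored.
        line += f" --hash=sha256:{digest}"
    return line


if __name__ == "__main__":
    provenance = {
        "archive_info": {
            "hashes": {
                "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
            }
        },
        "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl",
    }
    print(requirement_line("pip", "23.0.1", provenance))
```

The name and version would come from the distribution's own metadata, since ``provenance_url.json`` records only the URL and the hashes.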
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
Naming the file direct_url.json instead of provenance_url.json
|
||||
--------------------------------------------------------------
|
||||
|
||||
To preserve backwards compatibility with the
|
||||
:ref:`Direct URL Origin specification <packaging:direct-url>`,
|
||||
the file cannot be named ``direct_url.json``, as per the text of that specification:
|
||||
|
||||
    This file MUST NOT be created when installing a distribution from an other
    type of requirement (i.e. name plus version specifier).
|
||||
|
||||
Such a change might introduce backwards compatibility issues for consumers of
|
||||
``direct_url.json`` who rely on its presence only when distributions are
|
||||
installed using a direct URL reference.
|
||||
|
||||
Deprecating direct_url.json and using only provenance_url.json
|
||||
--------------------------------------------------------------
|
||||
|
||||
The ``direct_url.json`` file is already well established, with :pep:`610` accepted, and is
|
||||
already used by installers. For example, ``pip`` uses ``direct_url.json`` to
|
||||
report a direct URL reference on ``pip freeze``. Deprecating
|
||||
``direct_url.json`` would require additional changes to the ``pip freeze``
|
||||
implementation in pip (see PR `fridex/pip#2`_) and could introduce backwards compatibility
|
||||
issues for already existing ``direct_url.json`` consumers.
|
||||
|
||||
Keeping the hash key in the archive_info dictionary
|
||||
---------------------------------------------------
|
||||
|
||||
:pep:`610` and :ref:`its corresponding canonical PyPA spec <direct-url>` discuss
|
||||
the possibility to include the ``hash`` key alongside the ``hashes`` key in the
|
||||
``archive_info`` dictionary. This PEP explicitly does not include the ``hash`` key in
|
||||
the ``provenance_url.json`` file and allows only the ``hashes`` key to be present.
|
||||
By doing so we eliminate possible redundancy in the file, possible confusion,
|
||||
and any additional checks that would need to be done to make sure the hashes are in
|
||||
sync.
|
||||
|
||||
Making the hashes key optional
|
||||
------------------------------
|
||||
|
||||
:pep:`610` and :ref:`its corresponding canonical PyPA spec <direct-url>`
|
||||
recommend including the ``hashes`` key of the ``archive_info`` in the
|
||||
``direct_url.json`` file but it is not required (per the :rfc:`2119` language):
|
||||
|
||||
    A hashes key SHOULD be present as a dictionary mapping a hash name to a hex
    encoded digest of the file.
|
||||
|
||||
This PEP requires the ``hashes`` key be included in ``archive_info``
|
||||
in the ``provenance_url.json`` file if that file is created; per this PEP:
|
||||
|
||||
    The value of ``archive_info`` MUST be a dictionary with a single key
    ``hashes``.
|
||||
|
||||
By doing so, consumers of ``provenance_url.json`` can check
|
||||
artifact digests when the ``provenance_url.json`` file is created by installers.
|
||||
|
||||
Open Issues
|
||||
===========
|
||||
|
||||
Availability of the provenance_url.json file in Conda
|
||||
-----------------------------------------------------
|
||||
|
||||
We would like to get feedback on the ``provenance_url.json`` file from the Conda
|
||||
maintainers. It is not clear whether Conda would like to adopt the
|
||||
``provenance_url.json`` file. Conda already stores provenance related
|
||||
information (similar to the provenance information proposed in this PEP) in
|
||||
JSON files located in the ``conda-meta`` directory `following its actions
|
||||
during installation
|
||||
<https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/install.html>`__.
|
||||
|
||||
Using provenance_url.json in downstream installers
|
||||
--------------------------------------------------
|
||||
|
||||
The proposed ``provenance_url.json`` file was meant to be adopted primarily by
|
||||
Python installers. Other installers, such as APT or DNF, might record the
|
||||
provenance of the installed downstream Python distributions in their own
|
||||
way specific to downstream package management. The proposed file is
|
||||
not expected to be created by these downstream package installers and thus they
|
||||
were intentionally left out of this PEP. However, any input by developers or
|
||||
maintainers of these installers is valuable to possibly enrich the
|
||||
``provenance_url.json`` file with information that would help in some way.
|
||||
|
||||
.. _710-tool-survey:
|
||||
|
||||
Appendix: Survey of installers and libraries
|
||||
============================================
|
||||
|
||||
pip
|
||||
---
|
||||
|
||||
The function from pip's internal API responsible for installing wheels, named
|
||||
`_install_wheel
|
||||
<https://github.com/pypa/pip/blob/10d9cbc601e5cadc45163452b1bc463d8ad2c1f7/src/pip/_internal/operations/install/wheel.py#L432>`__,
|
||||
does not store any ``provenance_url.json`` file in the ``.dist-info``
|
||||
directory. Additionally, a prototype introducing the mentioned file to pip in
|
||||
`pypa/pip#11865`_ demonstrates incorporating logic for handling the
|
||||
``provenance_url.json`` file in pip's source code.
|
||||
|
||||
As pip is used by some of the tools mentioned below to install Python package
|
||||
distributions, findings for pip apply to these tools as well, because pip does not
|
||||
allow parametrizing creation of files in the ``.dist-info`` directory in its
|
||||
internal API. Most of the tools mentioned below that use pip invoke pip as a
|
||||
subprocess which has no effect on the eventual presence of the
|
||||
``provenance_url.json`` file in the ``.dist-info`` directory.
|
||||
|
||||
distlib
|
||||
-------
|
||||
|
||||
`distlib`_ implements low-level functionality to manipulate the
|
||||
``dist-info`` directory. The database of installed distributions does not use
|
||||
any file named ``provenance_url.json``, based on `distlib's source code
|
||||
<https://github.com/pypa/distlib/blob/05375908c1b2d6b0e74bdeb574569d3609db9f56/distlib/database.py#L39-L40>`__.
|
||||
|
||||
Pipenv
|
||||
------
|
||||
|
||||
`Pipenv`_ uses pip `to install Python package distributions
|
||||
<https://github.com/pypa/pipenv/blob/babd428d8ee3c5caeb818d746f715c02f338839b/pipenv/routines/install.py#L262>`__.
|
||||
No additional logic was identified that would cause backwards
|
||||
compatibility issues when introducing the ``provenance_url.json`` file in the
|
||||
``.dist-info`` directory.
|
||||
|
||||
installer
|
||||
---------
|
||||
|
||||
`installer`_ does not create a ``provenance_url.json`` file explicitly.
|
||||
Nevertheless, as per the :ref:`Recording Installed Projects <packaging:recording-installed-packages>`
|
||||
specification, installer allows passing the ``additional_metadata`` argument to
|
||||
create a file in the ``.dist-info`` directory - see `the source code
|
||||
<https://github.com/pypa/installer/blob/f89b5d93a643ef5e9858a6e3f450c83a57bbe1f1/src/installer/_core.py#L67>`__.
|
||||
To avoid any backwards compatibility issues, any library or tool using
|
||||
installer must not request creating the ``provenance_url.json`` file using the
|
||||
mentioned ``additional_metadata`` argument.
|
||||
|
||||
Poetry
|
||||
------
|
||||
|
||||
The installation logic in `Poetry`_ depends on the
|
||||
``installer.modern-installation`` configuration option (`see docs
|
||||
<https://python-poetry.org/docs/configuration#installermodern-installation>`__).
|
||||
|
||||
For cases when the ``installer.modern-installation`` configuration option is set
|
||||
to ``false``, Poetry uses `pip for installing Python package distributions
|
||||
<https://github.com/python-poetry/poetry/blob/2b15ce10f02b0c6347fe2f12ae902488edeaaf7c/src/poetry/installation/executor.py#L543-L544>`__.
|
||||
|
||||
On the other hand, when the ``installer.modern-installation`` configuration option is
|
||||
set to ``true``, Poetry uses `installer to install Python package distributions
|
||||
<https://github.com/python-poetry/poetry/blob/2b15ce10f02b0c6347fe2f12ae902488edeaaf7c/src/poetry/installation/wheel_installer.py#L99-L109>`__.
|
||||
As can be seen from the linked sources, Poetry does not pass any additional
|
||||
metadata file named ``provenance_url.json`` that would cause compatibility
|
||||
issues with this PEP.
|
||||
|
||||
Conda
|
||||
-----
|
||||
|
||||
`Conda`_ does not create any ``provenance_url.json`` file
|
||||
`when Python package distributions are installed
|
||||
<https://github.com/conda/conda/blob/86e83925e17c68233ac659633bdc4d76b05a245a/conda/common/pkg_formats/python.py#L370-L390>`__.
|
||||
|
||||
Hatch
|
||||
-----
|
||||
|
||||
`Hatch`_ uses pip `to install project dependencies
|
||||
<https://github.com/pypa/hatch/blob/dd6e9545a355a0b5b58e065b489c1ef087e3bcaf/src/hatch/env/system.py#L28-L29>`__.
|
||||
|
||||
micropipenv
|
||||
-----------
|
||||
|
||||
As `micropipenv`_ is a wrapper on top of pip, it uses
|
||||
pip to install Python distributions, for both `lock files
|
||||
<https://github.com/thoth-station/micropipenv/blob/8176862ec96df23e152938659d6f45645246e398/micropipenv.py#L393>`__
|
||||
as well as `for requirements files
|
||||
<https://github.com/thoth-station/micropipenv/blob/8176862ec96df23e152938659d6f45645246e398/micropipenv.py#L977>`__.
|
||||
|
||||
Thamos
|
||||
------
|
||||
|
||||
`Thamos`_ uses micropipenv `to install Python package
|
||||
distributions
|
||||
<https://github.com/thoth-station/thamos/blob/234351025c77cfe28b0df07f7ee017469b57d3f4/thamos/lib.py#L1290>`__,
|
||||
hence any findings for micropipenv apply to Thamos.
|
||||
|
||||
PDM
|
||||
---
|
||||
|
||||
`PDM`_ uses installer `to install binary distributions
|
||||
<https://github.com/pdm-project/pdm/blob/d39a8e5b36c37093ea31e666d0e55fe21b38c16b/src/pdm/installers/installers.py#L241>`__.
|
||||
The only additional metadata file it eventually creates in the ``.dist-info``
|
||||
directory is `the REFER_TO file
|
||||
<https://github.com/pdm-project/pdm/blob/d39a8e5b36c37093ea31e666d0e55fe21b38c16b/src/pdm/installers/installers.py#L197>`__.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. _pypa/pip#11865: https://github.com/pypa/pip/pull/11865
|
||||
|
||||
.. _fridex/pip#2: https://github.com/fridex/pip/pull/2/
|
||||
|
||||
.. _pip_preserve: https://pypi.org/project/pip-preserve/
|
||||
|
||||
.. _thoth-station/micropipenv#206: https://github.com/thoth-station/micropipenv/issues/206
|
||||
|
||||
.. _pypa/pip-audit#170: https://github.com/pypa/pip-audit/issues/170
|
||||
|
||||
.. _pip_installation_report: https://pip.pypa.io/en/stable/reference/installation-report/
|
||||
|
||||
.. _distlib: https://distlib.readthedocs.io/
|
||||
|
||||
.. _Pipenv: https://pipenv.pypa.io/
|
||||
|
||||
.. _installer: https://github.com/pypa/installer
|
||||
|
||||
.. _Poetry: https://python-poetry.org/
|
||||
|
||||
.. _Conda: https://docs.conda.io/
|
||||
|
||||
.. _Hatch: https://hatch.pypa.io/
|
||||
|
||||
.. _micropipenv: https://github.com/thoth-station/micropipenv
|
||||
|
||||
.. _Thamos: https://github.com/thoth-station/thamos/
|
||||
|
||||
.. _PDM: https://pdm.fming.dev/
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
Thanks to Dustin Ingram, Brett Cannon, and Paul Moore for the initial discussion in
|
||||
which this idea originated.
|
||||
|
||||
Thanks to Donald Stufft, Ofek Lev, and Trishank Kuppusamy for early feedback
|
||||
and support to work on this PEP.
|
||||
|
||||
Thanks to Gregory P. Smith, Stéphane Bidoul, and C.A.M. Gerlach for
|
||||
reviewing this PEP and providing valuable suggestions.
|
||||
|
||||
Thanks to Stéphane Bidoul and Chris Jerdonek for :pep:`610`.
|
||||
|
||||
Last, but not least, thanks to Donald Stufft for sponsoring this PEP.
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the CC0-1.0-Universal
|
||||
license, whichever is more permissive.
|