diff --git a/peps/pep-0710.rst b/peps/pep-0710.rst index ae28e2db3..063c2d075 100644 --- a/peps/pep-0710.rst +++ b/peps/pep-0710.rst @@ -21,11 +21,11 @@ This PEP describes a way to record the provenance of installed Python distributi The record is created by an installer and is available to users in the form of a JSON file ``provenance_url.json`` in the ``.dist-info`` directory. The mentioned JSON file captures additional metadata to allow recording a URL to a -:term:`distribution package` together with the installed distribution hash. This -proposal is built on top of :pep:`610` following -:ref:`its corresponding canonical PyPA spec ` and -complements ``direct_url.json`` with ``provenance_url.json`` for when packages -are identified by a name, and optionally a version. +:term:`distribution package` together with the installed distribution hash. +This proposal is built on top of :pep:`610` following :ref:`its corresponding +canonical PyPA spec ` and complements ``direct_url.json`` +with ``provenance_url.json`` for when packages are identified by a name, and +optionally a version. Motivation ========== @@ -38,7 +38,7 @@ is generally lost. However, there are use cases for keeping records of distributions used for installing packages and their provenance. Python wheels can be built with different compiler flags or supporting -different wheel tags. In both cases, users might get into a situation in which +different wheel tags. In both cases, users might get into a situation in which multiple wheels might be considered by installers (possibly from different package indexes) and immediately finding out which wheel file was actually used during the installation might be helpful. This way, developers can use @@ -52,10 +52,11 @@ artifact consumed from a Python package index. Rationale ========= -The motivation described in this PEP is an extension of that in :pep:`610`. 
-In addition to recording provenance information for packages installed using a direct URL, -installers should also do so for packages installed by name -(and optionally version) from Python package indexes. +The motivation described in this PEP is an extension of :ref:`Recording the +Direct URL Origin of installed distributions ` +specification. In addition to recording provenance information for packages +installed using a direct URL, installers should also do so for packages +installed by name (and optionally version) from Python package indexes. The idea described in this PEP originated in a tool called `micropipenv`_ that is used to install @@ -112,22 +113,28 @@ specified by name (and optionally by :term:`Version Specifier`). This file MUST NOT be created when installing a distribution package from a requirement specifying a direct URL reference (including a VCS URL). -Only one of the files ``provenance_url.json`` and ``direct_url.json`` (from :pep:`610`), -may be present in a given ``.dist-info`` directory; installers MUST NOT add both. +Only one of the files ``provenance_url.json`` and ``direct_url.json`` (from +:ref:`Recording the Direct URL Origin of installed distributions +` specification and the corresponding specification of +the :ref:`Direct URL Data Structure `), +may be present in a given ``.dist-info`` directory; installers MUST NOT add +both. The ``provenance_url.json`` JSON file MUST be a dictionary, compliant with :rfc:`8259` and UTF-8 encoded. If present, it MUST contain exactly two keys. The first MUST be ``url``, with -type ``string``. The second key MUST be ``archive_info`` with a value defined +type ``string``. The second key MUST be ``archive_info`` with a value defined below. -The value of the ``url`` key MUST be the URL from which the distribution package was downloaded. If a wheel is -built from a source distribution, the ``url`` value MUST be the URL from which -the source distribution was downloaded. 
If a wheel is downloaded and installed directly, -the ``url`` field MUST be the URL from which the wheel was downloaded. -As in the :ref:`direct URL origin specification`, the ``url`` value -MUST be stripped of any sensitive authentication information for security reasons. +The value of the ``url`` key MUST be the URL from which the distribution +package was downloaded. If a wheel is built from a source distribution, the +``url`` value MUST be the URL from which the source distribution was +downloaded. If a wheel is downloaded and installed directly, the ``url`` field +MUST be the URL from which the wheel was downloaded. As in the :ref:`Direct URL +Data Structure ` specification, the ``url`` +value MUST be stripped of any sensitive authentication information for security +reasons. The user:password section of the URL MAY however be composed of environment variables, matching the following regular expression: @@ -141,7 +148,7 @@ non-security sensitive string. A typical example is ``git`` in the case of an URL such as ``ssh://git@gitlab.com``. The value of ``archive_info`` MUST be a dictionary with a single key -``hashes``. The value of ``hashes`` is a dictionary mapping hash function +``hashes``. The value of ``hashes`` is a dictionary mapping hash function names to a hex-encoded digest of the file referenced by the ``url`` value. At least one hash MUST be recorded. Multiple hashes MAY be included, and it is up to the consumer to decide what to do with multiple hashes (it may validate all @@ -174,7 +181,7 @@ Backwards Compatibility Following the :ref:`packaging:recording-installed-packages` specification, installers may keep additional installer-specific files in the ``.dist-info`` -directory. To make sure this PEP does not cause any backwards compatibility +directory. 
To make sure this PEP does not cause any backwards compatibility issues, a `comprehensive survey of installers and libraries <710-tool-survey_>`_ found no current tools that are using a similarly-named file, or other major feasibility concerns. @@ -204,7 +211,7 @@ The content of ``provenance_url.json`` file was designed in a way to eventually allow installers reuse some of the logic supporting ``direct_url.json`` when a direct URL refers to a source archive or a wheel. -The main difference between the ``provenance_url.json`` and ``direct_url.json`` +The main difference between the ``provenance_url.json`` and ``direct_url.json`` files are the mandatory keys and their values in the ``provenance_url.json`` file. This helps make sure consumers of the ``provenance_url.json`` file can rely on its content, if the file is present in the ``.dist-info`` directory. @@ -297,12 +304,11 @@ build and install a wheel: Examples of an invalid provenance_url.json ------------------------------------------ -The following example includes a ``hash`` key in the ``archive_info`` dictionary -as originally designed in :pep:`610` and the data structure documented in -:ref:`packaging:direct-url`. -The ``hash`` key MUST NOT be present to prevent from any possible confusion -with ``hashes`` and additional checks that would be required to keep hash -values in sync. +The following example includes a ``hash`` key in the ``archive_info`` +dictionary as originally designed in the data structure documented in +:ref:`packaging:direct-url`. The ``hash`` key MUST NOT be present to prevent +from any possible confusion with ``hashes`` and additional checks that would be +required to keep hash values in sync. .. 
code-block:: json @@ -347,7 +353,8 @@ Example pip commands and their effect on provenance_url.json and direct_url.json -------------------------------------------------------------------------------- These commands generate a ``direct_url.json`` file but do not generate a -``provenance_url.json`` file. These examples follow examples from :pep:`610`: +``provenance_url.json`` file. These examples follow examples from :ref:`Direct +URL Data Structure ` specification: * ``pip install https://example.com/app-1.0.tgz`` * ``pip install https://example.com/app-1.0.whl`` @@ -373,16 +380,16 @@ Reference Implementation A proof-of-concept for creating the ``provenance_url.json`` metadata file when installing a Python :term:`Distribution Package` is available in the PR to pip `pypa/pip#11865`_. It reuses the already available implementation for the -:ref:`direct URL data structure ` to provide -the ``provenance_url.json`` metadata file for cases when ``direct_url.json`` is not -created. +:ref:`direct URL data structure ` to +provide the ``provenance_url.json`` metadata file for cases when +``direct_url.json`` is not created. A reference implementation for supporting the ``provenance_url.json`` file in PDM exists is available in `pdm-project/pdm#3013`_. A prototype called `pip-preserve `_ was developed to demonstrate creation of ``requirements.txt`` files considering ``direct_url.json`` -and ``provenance_url.json`` metadata files. This tool mimics the ``pip +and ``provenance_url.json`` metadata files. This tool mimics the ``pip freeze`` functionality, but the listing of installed packages also includes the hashes of the Python distribution artifacts. 
@@ -396,9 +403,8 @@ Rejected Ideas Naming the file direct_url.json instead of provenance_url.json -------------------------------------------------------------- -To preserve backwards compatibility with the -:ref:`Direct URL Origin specification `, -the file cannot be named ``direct_url.json``, as per the text of that specification: +To preserve backwards compatibility with the :ref:`Recording the Direct URL Origin of installed distributions `, the file cannot be named +``direct_url.json``, as per the text of that specification: This file MUST NOT be created when installing a distribution from an other type of requirement (i.e. name plus version specifier). @@ -410,23 +416,24 @@ installed using a direct URL reference. Deprecating direct_url.json and using only provenance_url.json -------------------------------------------------------------- -File ``direct_url.json`` is already well established with :pep:`610` being accepted and is +File ``direct_url.json`` is already well established by the :ref:`Direct URL +Data Structure ` specification and is already used by installers. For example, ``pip`` uses ``direct_url.json`` to report a direct URL reference on ``pip freeze``. Deprecating ``direct_url.json`` would require additional changes to the ``pip freeze`` -implementation in pip (see PR `fridex/pip#2`_) and could introduce backwards compatibility -issues for already existing ``direct_url.json`` consumers. +implementation in pip (see PR `fridex/pip#2`_) and could introduce backwards +compatibility issues for already existing ``direct_url.json`` consumers. Keeping the hash key in the archive_info dictionary --------------------------------------------------- -:pep:`610` and :ref:`its corresponding canonical PyPA spec ` -discuss the possibility to include the ``hash`` key alongside the ``hashes`` key in the -``archive_info`` dictionary. 
This PEP explicitly does not include the ``hash`` key in -the ``provenance_url.json`` file and allows only the ``hashes`` key to be present. -By doing so we eliminate possible redundancy in the file, possible confusion, -and any additional checks that would need to be done to make sure the hashes are in -sync. +:ref:`Direct URL Data Structure ` +specification discusses the possibility to include the ``hash`` key alongside +the ``hashes`` key in the ``archive_info`` dictionary. This PEP explicitly does +not include the ``hash`` key in the ``provenance_url.json`` file and allows +only the ``hashes`` key to be present. By doing so we eliminate possible +redundancy in the file, possible confusion, and any additional checks that +would need to be done to make sure the hashes are in sync. Allowing no hashes stated ------------------------- @@ -670,7 +677,10 @@ reviewing this PEP and providing valuable suggestions. Thanks to Seth Michael Larson for providing valuable suggestions and for the proposed pip-sbom prototype. -Thanks to Stéphane Bidoul and Chris Jerdonek for :pep:`610`. +Thanks to Stéphane Bidoul and Chris Jerdonek for :pep:`610`, and related +:ref:`Recording the Direct URL Origin of installed distributions +` and :ref:`Direct URL Data Structure +` specifications. Thanks to Frost Ming for raising possible concern around storing index URL in the ``provenance_url.json`` file and initial PEP 710 support in PDM.
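The structural rules for ``provenance_url.json`` that the hunks above restate (exactly two top-level keys, ``archive_info`` holding only ``hashes``, at least one hex-encoded digest, no ``hash`` key) can be sketched as a small checker. This is an illustrative sketch only, not part of the patch or of any installer: the function name ``check_provenance_url``, the ``example.com`` URL, and the shortened digest are hypothetical, and the check deliberately does not cover the credential-stripping rule for ``url``.

```python
import json
import string


def check_provenance_url(text: str) -> list[str]:
    """Return a list of problems found in a provenance_url.json document.

    Hypothetical helper: it covers only the structural rules quoted in the
    diff above, not the credential-stripping rule for the ``url`` value.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a dictionary"]

    problems = []
    # The file MUST contain exactly two keys: url and archive_info.
    if set(data) != {"url", "archive_info"}:
        problems.append("expected exactly the keys 'url' and 'archive_info'")
    if not isinstance(data.get("url"), str):
        problems.append("'url' must be a string")

    # archive_info MUST have the single key 'hashes'; this also rejects 'hash'.
    archive_info = data.get("archive_info")
    if not isinstance(archive_info, dict) or set(archive_info) != {"hashes"}:
        problems.append(
            "'archive_info' must be a dictionary with the single key 'hashes'"
        )
        return problems

    # At least one hash MUST be recorded, each as a hex-encoded digest.
    hashes = archive_info["hashes"]
    if not isinstance(hashes, dict) or not hashes:
        problems.append("'hashes' must be a dictionary with at least one entry")
        return problems
    for name, digest in hashes.items():
        if (
            not isinstance(digest, str)
            or not digest
            or set(digest) - set(string.hexdigits)
        ):
            problems.append(f"digest for {name!r} is not hex-encoded")
    return problems


# Placeholder document: the URL and the shortened digest are made up.
sample = json.dumps(
    {
        "url": "https://example.com/app-1.0.whl",
        "archive_info": {"hashes": {"sha256": "ab12"}},
    }
)
print(check_provenance_url(sample))
```

Comparing the key set of ``archive_info`` against ``{"hashes"}`` means a document that carries the legacy ``hash`` key next to ``hashes`` is reported without a separate rule for it.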