From 1e094877fd08138acccc9285fa2be5752f584fa3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?St=C3=A9phane=20Bidoul=20=28ACSONE=29?=
Date: Thu, 14 Nov 2019 18:41:21 +0100
Subject: [PATCH] Recording the Direct URL Origin of installed distributions (#1145)

---
 pep-9999.rst | 508 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 508 insertions(+)
 create mode 100644 pep-9999.rst

diff --git a/pep-9999.rst b/pep-9999.rst
new file mode 100644
index 000000000..5cce2034b
--- /dev/null
+++ b/pep-9999.rst
@@ -0,0 +1,508 @@
+PEP: 9999
+Title: Recording the Direct URL Origin of installed distributions
+Author: Stéphane Bidoul, Chris Jerdonek
+Discussions-To: https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 21-Apr-2019
+Post-History:
+
+Abstract
+========
+
+Following PEP 440, a distribution can be identified by a name and either a
+version or a direct URL reference (see `PEP440 Direct References`_).
+After installation, the name and version are captured in the project metadata,
+but currently there is no way to obtain details of the URL used when the
+distribution was identified by a direct URL reference.
+
+This proposal defines
+additional metadata, to be added to the installed distribution by the
+installation front end, which records the Direct URL Origin for use by
+consumers which introspect the database of installed packages (see PEP 376).
+
+Motivation
+==========
+
+The original motivation of this PEP was to let tools with a "freeze"
+operation, which allows a Python environment to be recreated, work in a broader
+range of situations.
+
+Specifically, the PEP originated from the desire to address `pip issue #609`_:
+i.e. improving the behavior of ``pip freeze`` in the presence of distributions
+installed from direct URL references. It follows a
+`thread on discuss.python.org`_ about the best course of action to implement
+it.
+
+Installation from direct URL references
+---------------------------------------
+
+Python installers such as pip are capable of downloading and installing
+distributions from package indexes. They are also capable of downloading
+and installing source code from requirements specifying arbitrary URLs of
+source archives and Version Control System (VCS) repositories,
+as standardized in `PEP440 Direct References`_.
+
+In other words, two relevant installation modes exist:
+
+1. The package to install is specified as a name and version specifier:
+
+   In this case, the installer looks in a package index (or optionally
+   using ``--find-links`` in the case of pip) to find the distribution to install.
+
+2. The package to install is specified as a direct URL reference:
+
+   In this case, the installer downloads whatever is specified by the URL
+   (typically a wheel, a source archive or a VCS repository) and installs it.
+
+   In this mode, installers typically download the source code into a
+   temporary directory, invoke the PEP 517 build backend to produce a wheel
+   if needed, install the wheel, and delete the temporary directory.
+
+   After installation, no trace of the URL the user requested to download the
+   package is left on the user's system.
+
+Freezing an environment
+-----------------------
+
+Pip also sports a command named ``pip freeze`` which examines the Database of
+Installed Python Distributions to generate a list of requirements. The main
+goal of this command is to help users generate a list of requirements that
+will later allow the re-installation of the same environment with the highest
+possible fidelity.
+
+The ``pip freeze`` command outputs a ``name==version`` line for each installed
+distribution (except for editable installs). To achieve the goal of
+reinstalling the same environment, this requires the (name, version)
+tuple to refer to an immutable version of the
+distribution. The immutability is guaranteed by package indexes
+such as Warehouse. The package index to use is typically known from
+environmental or command line parameters of the installer.
+
+This freeze mechanism therefore works fine for installation mode 1 (i.e.
+when the package to install was specified as a name plus version specifier).
+
+For installation mode 2, i.e. when the package to install was specified as a
+direct URL reference, the ``name==version`` tuple is not sufficient
+to reinstall the same distribution, and users of the freeze command expect it
+to output the URL that was originally requested.
+
+The reasoning above is equally applicable to tools, other than ``pip freeze``,
+that would attempt to generate a ``Pipfile.lock`` or any other similar format
+from the Database of Installed Python Distributions. Unless specified
+otherwise, "freeze" is used in this document as a generic term for such
+an operation.
+
+The importance of installing from (VCS) URLs for application integrators
+------------------------------------------------------------------------
+
+For an application integrator, it is important to be able to reliably install
+and freeze unreleased versions of Python distributions.
+For instance, when a developer needs to deploy an unreleased patched version
+of a dependency, it is common to install the dependency directly from a VCS
+branch that has the patch, while waiting for the maintainer to release an
+updated version.
+
+In such cases, it is important for "freeze" to pin the exact VCS
+reference (commit-hash if available) that was installed, in order to create
+reproducible builds with the highest possible fidelity.
+
+Additional origin metadata available for VCS URLs
+-------------------------------------------------
+
+For VCS URLs, there is additional origin information, available only at
+install time, that is useful for introspection and certain workflows. For example,
+when installing a revision from a VCS URL, a tool can determine if the
+revision corresponds to a branch, tag or (in the case of Git) a ref. This
+information can be used when introspecting the database of installed packages
+to communicate to users more information about what version was installed
+(e.g. whether a branch or tag was installed and, if so, the name of the
+branch or tag). This also permits one to know whether a PEP 440 direct
+reference URL can be constructed using the tag form, as only tags have the
+semantics of immutability.
+
+In cases where the revision is mutable (e.g. branches and Git refs), knowing
+this information enables workflows where users can e.g. update to the latest
+version of a branch they are tracking, or update to the latest version of a
+pull request they are reviewing locally. In contrast, when the revision is a
+tag, tools can know in advance (e.g. without network calls) that no update is
+needed.
+
+As with the URL itself, if this information isn't recorded at install time,
+when the VCS repository is available, it is lost.
+
+Note about "editable" installs
+------------------------------
+
+The editable installation mode of pip roughly lets a user insert a
+local directory in ``sys.path`` for development purposes. This mode is somewhat
+abused to work around the fact that a non-editable install from a VCS URL
+loses track of the origin after installation.
+Indeed, editable installs implicitly record the VCS origin in the checkout
+directory, so the information can be recovered when running "freeze".
+
+The use of this workaround, although useful, is fragile, creates confusion
+about the purpose of the editable mode, and works only when the distribution
+can be installed with setuptools (i.e. it is not usable with other PEP 517
+build backends).
+
+For the sake of clarity, it is important to note that this PEP is otherwise
+unrelated to editable installs.
+
+Rationale
+=========
+
+This PEP specifies a new ``direct_url.json`` metadata file in the
+``.dist-info`` directory of an installed distribution.
+
+The fields specified are sufficient to reproduce the source archive and `VCS
+URLs supported by pip`_. They are also sufficient to reproduce `PEP440 Direct
+References`_, as well as `Pipfile and Pipfile.lock`_ entries. Finally, they
+are sufficient to record the branch, tag, and/or Git ref origin of the
+installed version that is already available for editable installs by virtue
+of a VCS checkout being present.
+
+Since at least three different ways already exist to encode this type of
+information, this PEP uses a key-value format, so as not to make any
+assumption on how a direct
+URL reference must ultimately be encoded in a requirement or lockfile. See also
+the `Alternatives`_ section below for more discussion about this choice.
+
+Information has been taken from Ruby's bundler manual to verify it has similar
+capabilities and inform the selection and naming of fields in this
+specification.
+
+The JSON format allows additional fields to be added in the future.
+
+Specification
+=============
+
+This PEP specifies a ``direct_url.json`` file in the ``.dist-info`` directory
+of an installed distribution, to record the Direct URL Origin of the distribution.
+
+This file MUST be created by installers when installing a distribution
+from a requirement specifying a direct URL reference (including a VCS URL)
+in *non*-editable mode.
+
+This file MUST NOT be created when installing a distribution from another
+type of requirement (i.e. name plus version specifier, or URL in editable mode).
+
+This JSON MUST be a flat dictionary where all keys and values are of string type.
+For the sake of forward compatibility, tools SHOULD ignore values which are
+not of string type.
+
+If present, it MUST contain at least a field named ``url``.
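The constraints above (a flat dictionary of strings with a mandatory ``url`` field) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name ``load_direct_url`` and the choice of raising ``ValueError`` are hypothetical and are not defined by this PEP.

```python
import json


def load_direct_url(text):
    """Parse the text of a direct_url.json file (hypothetical helper).

    Values that are not strings are ignored, for forward compatibility.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("direct_url.json must contain a JSON object")
    # Keep only string values; anything else may come from a future revision.
    fields = {key: value for key, value in data.items() if isinstance(value, str)}
    if "url" not in fields:
        raise ValueError("direct_url.json must contain a 'url' field")
    return fields


# "future_field" is a made-up, non-string value that a reader should skip.
example = '{"url": "https://example.com/app-1.0.tgz", "future_field": {"a": 1}}'
print(load_direct_url(example))
```

Dropping non-string values instead of rejecting the file is one way to follow the forward-compatibility recommendation; a stricter tool could log the ignored keys.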
+
+``url`` MUST be stripped of any sensitive authentication information,
+for security reasons. The user:password section of the URL MAY however
+be composed of environment variables, matching the following regular
+expression::
+
+    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?
+
+When ``url`` refers to a VCS repository:
+
+- A ``vcs`` field MUST be present, containing the name of the VCS
+  (i.e. one of ``git``, ``hg``, ``bzr``, ``svn``). Other VCS's SHOULD be registered by
+  amending this PEP.
+- The ``url`` value MUST be compatible with the corresponding VCS,
+  so an installer can hand it off without transformation to a
+  checkout/download command of the VCS.
+- A ``requested_revision`` field MAY be present naming a
+  branch/tag/ref/commit/revision/etc (in a format compatible with the VCS)
+  to install.
+- A ``commit_id`` field MUST be present, containing the
+  exact commit/revision number that was installed.
+  If the VCS supports commit-hash
+  based revision identifiers, such a commit-hash MUST be used as
+  ``commit_id`` in order to reference the immutable
+  version of the source code that was installed.
+- A ``tag`` field naming a tag MAY be present to indicate that a particular
+  tag was installed.
+- A ``branch`` field naming a branch MAY be present to indicate that a
+  particular branch was installed. If ``branch`` is present, ``tag`` MUST NOT
+  be present.
+
+When ``url`` refers to a source archive, a wheel, or a local directory:
+
+- A ``hash`` field SHOULD be present, with value
+  ``<hash-algorithm>=<expected-hash>``.
+  It is RECOMMENDED that only hashes which are unconditionally provided by
+  the latest version of the standard library's ``hashlib`` module be used for
+  source archive hashes. At the time of writing, that list consists of 'md5',
+  'sha1', 'sha224', 'sha256', 'sha384', and 'sha512'.
+
+.. note::
+
+   When the requested URL points to a local directory that happens to contain a
+   VCS checkout, installers MUST NOT attempt to infer any VCS information and
+   therefore MUST NOT output any VCS-related information (such as the ``vcs`` field)
+   in ``direct_url.json``.
+
+A ``subdirectory`` field MAY be present containing a directory path,
+relative to the root of the VCS repository, source archive or local directory,
+to specify where ``pyproject.toml`` or ``setup.py`` is located.
+
+.. note::
+
+   As a general rule, installers should as much as possible preserve the
+   information that was provided in the requested URL when generating
+   ``direct_url.json``. For example, user:password environment variables
+   should be preserved and ``requested_revision`` should reflect the revision that was
+   provided in the requested URL as faithfully as possible. This information is
+   however *enriched* with more precise data, such as ``commit_id``.
+
+Registered VCS
+--------------
+
+This section lists the registered VCS's; expanded, VCS-specific information
+on how to use the ``vcs``, ``requested_revision``, and other fields; and in
+some cases additional VCS-specific fields.
+Tools MAY support other VCS's although it is RECOMMENDED to register
+them by amending this PEP. The ``vcs`` field SHOULD be the command name
+(lowercased). Additional fields that would be necessary to
+support such a VCS SHOULD be prefixed with the VCS command name.
+
+Git
++++
+
+Home page
+
+   https://git-scm.com/
+
+vcs command
+
+   git
+
+vcs field
+
+   git
+
+requested_revision field
+
+   A tag name, branch name, Git ref, commit hash, shortened commit hash,
+   or other commit-ish.
+
+commit_id field
+
+   A commit hash (40 hexadecimal characters, SHA-1).
+
+branch field
+
+   If no ``requested_revision`` is provided and the remote repository has
+   a default branch, this field can be used to record the default branch that
+   was installed.
+
+VCS-specific fields:
+
+- A ``git_ref`` field naming a Git ref (string beginning with ``refs/``) MAY
+  be present to indicate that a particular ref was installed (e.g.
+  ``refs/pull/123/head``).
+
+.. note::
+
+   Installers can use the ``git show-ref`` and ``git symbolic-ref`` commands
+   to determine if the ``requested_revision`` corresponds to a Git ref.
+   In turn, a ref beginning with ``refs/tags/`` corresponds to a tag, and
+   a ref beginning with ``refs/remotes/origin/`` after cloning corresponds
+   to a branch.
+
+Mercurial
++++++++++
+
+Home page
+
+   https://www.mercurial-scm.org/
+
+vcs command
+
+   hg
+
+vcs field
+
+   hg
+
+requested_revision field
+
+   A tag name, branch name, changeset ID, or shortened changeset ID.
+
+commit_id field
+
+   A changeset ID (40 hexadecimal characters).
+
+Bazaar
++++++++
+
+Home page
+
+   https://bazaar.canonical.com/
+
+vcs command
+
+   bzr
+
+vcs field
+
+   bzr
+
+requested_revision field
+
+   A tag name, branch name, or revision id.
+
+commit_id field
+
+   A revision id.
+
+Subversion
++++++++++++
+
+Home page
+
+   https://subversion.apache.org/
+
+vcs command
+
+   svn
+
+vcs field
+
+   svn
+
+requested_revision field
+
+   ``requested_revision`` must be compatible with the ``--revision`` option of ``svn checkout``.
+   In Subversion, branch or tag is part of ``url``.
+
+commit_id field
+
+   Since Subversion does not support globally unique identifiers,
+   this field is the Subversion revision number in the corresponding
+   repository.
+
+Examples
+========
+
+Example direct_url.json
+-----------------------
+
+Source archive:
+
+.. code::
+
+    {
+        "url": "https://github.com/pypa/pip/archive/1.3.1.zip",
+        "hash": "sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
+    }
+
+Git URL with tag and commit-hash:
+
+.. code::
+
+    {
+        "url": "https://github.com/pypa/pip.git",
+        "vcs": "git",
+        "requested_revision": "1.3.1",
+        "commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
+    }
+
+Example pip commands and their effect on direct_url.json
+--------------------------------------------------------
+
+Commands that generate a ``direct_url.json``:
+
+* pip install https://example.com/app-1.0.tgz
+* pip install https://example.com/app-1.0.whl
+* pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
+* pip install ./app
+* pip install file:///home/user/app
+
+Commands that *do not* generate a ``direct_url.json``:
+
+* pip install app
+* pip install app --no-index --find-links https://example.com/
+* pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
+* pip install -e ./app
+
+Use cases
+=========
+
+"Freezing" an environment
+
+  Tools, such as ``pip freeze``, which generate requirements from the Database
+  of Installed Python Distributions SHOULD exploit ``direct_url.json``
+  if it is present, and give it priority over the Version metadata in order
+  to generate a higher fidelity output. In the presence of a ``vcs`` direct URL reference,
+  the ``commit_id`` field SHOULD be used in preference in order to provide
+  the highest possible fidelity to the originally installed version. If
+  supported by their requirement format, tools are also encouraged to output
+  the ``tag`` value if present, as it has immutable semantics.
+  Tools MAY choose another approach, depending on the needs of their users.
+
+Backwards Compatibility
+=======================
+
+Since this PEP specifies a new file in the ``.dist-info`` directory,
+there are no backwards compatibility implications.
+
+Alternatives
+============
+
+PEP426 source_url
+-----------------
+
+The now withdrawn PEP 426 specifies a ``source_url`` metadata entry.
+It is also implemented in `distlib`_.
+
+It was intended for a slightly different purpose, for use in sdists.
+
+This format lacks support for the ``subdirectory`` option of pip requirement
+URLs. The same limitation is present in `PEP440 Direct References`_.
+
+It also lacks explicit support for `environment variables in the user:password
+part of URLs`_.
+
+The introduction of a key/value extensibility mechanism and support
+for environment variables for user:password in PEP 440 would be necessary
+for use in this PEP.
+
+revision vs ref
+---------------
+
+The ``requested_revision`` key was retained over ``requested_ref`` as it is a more generic term
+across various VCS's, whereas ``ref`` has a specific meaning for ``git``.
+
+
+References
+==========
+
+.. _`pip issue #609`: https://github.com/pypa/pip/issues/609
+.. _`thread on discuss.python.org`: https://discuss.python.org/t/pip-freeze-vcs-urls-and-pep-517-feat-editable-installs/1473
+.. _PEP440: http://www.python.org/dev/peps/pep-0440
+.. _`VCS URLs supported by pip`: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
+.. _`PEP440 Direct References`: https://www.python.org/dev/peps/pep-0440/#direct-references
+.. _`Pipfile and Pipfile.lock`: https://github.com/pypa/pipfile
+.. _distlib: https://distlib.readthedocs.io
+.. _`environment variables in the user:password part of URLs`: https://pip.pypa.io/en/stable/reference/pip_install/#id10
+
+Acknowledgements
+================
+
+Various people helped make this PEP a reality. Paul F. Moore provided the
+essence of the abstract. Nick Coghlan suggested the direct_url name.
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: