PEP: 610
Title: Recording the Direct URL Origin of installed distributions
Author: Stéphane Bidoul, Chris Jerdonek
Sponsor: Alyssa Coghlan
BDFL-Delegate: Pradyun Gedam
Discussions-To: https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535
Status: Final
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 21-Apr-2019
Post-History:
Resolution: https://discuss.python.org/t/1535/56

.. canonical-pypa-spec:: :ref:`packaging:direct-url`

Abstract
========

Following :pep:`440`, a distribution can be identified by a name and either a
version, or a direct URL reference (see :pep:`PEP440 Direct References
<440#direct-references>`). After installation, the name and version are
captured in the project metadata, but currently there is no way to obtain
details of the URL used when the distribution was identified by a direct URL
reference.

This proposal defines additional metadata, to be added to the installed
distribution by the installation front end, which records the Direct URL
Origin for use by consumers which introspect the database of installed
packages (see :pep:`376`).

Motivation
==========

The original motivation of this PEP was to permit tools with a "freeze"
operation allowing a Python environment to be recreated to work in a broader
range of situations.

Specifically, the PEP originated from the desire to address `pip issue #609`_:
i.e. improving the behavior of ``pip freeze`` in the presence of distributions
installed from direct URL references. It follows a
`thread on discuss.python.org`_ about the best course of action to implement
it.

Installation from direct URL references
---------------------------------------

Python installers such as pip are capable of downloading and installing
distributions from package indexes.
They are also capable of downloading and installing source code from
requirements specifying arbitrary URLs of source archives and Version Control
System (VCS) repositories, as standardized in :pep:`PEP440 Direct References
<440#direct-references>`.

In other words, two relevant installation modes exist.

1. The package to install is specified as a name and version specifier:

   In this case, the installer looks in a package index (or optionally
   using ``--find-links`` in the case of pip) to find the distribution to
   install.

2. The package to install is specified as a direct URL reference:

   In this case, the installer downloads whatever is specified by the URL
   (typically a wheel, a source archive or a VCS repository) and installs it.

   In this mode, installers typically download the source code in a
   temporary directory, invoke the :pep:`517` build backend to produce a
   wheel if needed, install the wheel, and delete the temporary directory.

   After installation, no trace of the URL the user requested to download the
   package is left on the user system.

Freezing an environment
-----------------------

Pip also sports a command named ``pip freeze`` which examines the Database of
Installed Python Distributions to generate a list of requirements. The main
goal of this command is to help users generate a list of requirements that
will later allow the re-installation of the same environment with the highest
possible fidelity.

As of pip version 19.3, the ``pip freeze`` command outputs a ``name==version``
line for each installed distribution (except for editable installs). To
achieve the goal of reinstalling the same environment, this requires the
(name, version) tuple to refer to an immutable version of the distribution.
The immutability is guaranteed by package indexes such as Warehouse. The
package index to use is typically known from environmental or command line
parameters of the installer.

This freeze mechanism therefore works fine for installation mode 1 (i.e.
when the package to install was specified as a name plus version specifier).

For installation mode 2, i.e. when the package to install was specified as a
direct URL reference, the ``name==version`` tuple is obviously not sufficient
to reinstall the same distribution and users of the freeze command expect it
to output the URL that was originally requested.

The reasoning above is equally applicable to tools, other than ``pip freeze``,
that would attempt to generate a ``Pipfile.lock`` or any other similar format
from the Database of Installed Python Distributions. Unless specified
otherwise, "freeze" is used in this document as a generic term for such an
operation.

The importance of installing from (VCS) URLs for application integrators
------------------------------------------------------------------------

For an application integrator, it is important to be able to reliably install
and freeze unreleased versions of Python distributions. For instance, when a
developer needs to deploy an unreleased patched version of a dependency, it is
common to install the dependency directly from a VCS branch that has the
patch, while waiting for the maintainer to release an updated version.

In such cases, it is important for "freeze" to pin the exact VCS reference
(commit-hash if available) that was installed, in order to create reproducible
builds with the highest possible fidelity.

Additional origin metadata available for VCS URLs
-------------------------------------------------

For VCS URLs, there is additional origin information available only at install
time useful for introspection and certain workflows. For example, when
installing a revision from a VCS URL, a tool can determine if the revision
corresponds to a branch, tag or (in the case of Git) a ref. This information
can be used when introspecting the Database of Installed Distributions to
communicate to users more information about what version was installed (e.g.
whether a branch or tag was installed and, if so, the name of the branch or
tag). This also permits one to know whether a :pep:`440` direct reference URL
can be constructed using the tag form, as only tags have the semantics of
immutability.

In cases where the revision is mutable (e.g. branches and Git refs), knowing
this information enables workflows where users can e.g. update to the latest
version of a branch they are tracking, or update to the latest version of a
pull request they are reviewing locally. In contrast, when the revision is a
tag, tools can know in advance (e.g. without network calls) that no update is
needed.

As with the URL itself, if this information isn't recorded at install time
when the VCS repository is available, it would otherwise be lost.

Note about "editable" installs
------------------------------

The editable installation mode of pip roughly lets a user insert a local
directory in ``sys.path`` for development purposes. This mode is somewhat
abused to work around the fact that a non-editable install from a VCS URL
loses track of the origin after installation. Indeed, editable installs
implicitly record the VCS origin in the checkout directory, so the information
can be recovered when running "freeze".

The use of this workaround, although useful, is fragile, creates confusion
about the purpose of the editable mode, and works only when the distribution
can be installed with setuptools (i.e. it is not usable with other :pep:`517`
build backends).

When this PEP is implemented, it will not be necessary anymore to use editable
installs for the purpose of making ``pip freeze`` work correctly with VCS
references.

Rationale
=========

This PEP specifies a new ``direct_url.json`` metadata file in the
``.dist-info`` directory of an installed distribution.

The fields specified are sufficient to reproduce the source archive and
`VCS URLs supported by pip`_.
They are also sufficient to reproduce :pep:`PEP440 Direct References
<440#direct-references>`, as well as `Pipfile and Pipfile.lock`_ entries.
Finally, they are sufficient to record the branch, tag, and/or Git ref origin
of the installed version that is already available for editable installs by
virtue of a VCS checkout being present.

Since at least three different ways already exist to encode this type of
information, this PEP uses a dictionary format, so as not to make any
assumption on how a direct URL reference must ultimately be encoded in a
requirement or lockfile. See also the `Alternatives`_ section below for more
discussion about this choice.

Information has been taken from Ruby's bundler manual to verify it has similar
capabilities and inform the selection and naming of fields in this
specification.

The JSON format allows for the addition of additional fields in the future.

Specification
=============

This PEP specifies a ``direct_url.json`` file in the ``.dist-info`` directory
of an installed distribution, to record the Direct URL Origin of the
distribution.

The canonical source for the name and semantics of this metadata file is the
`Recording the Direct URL Origin of installed distributions`_ document.

This file MUST be created by installers when installing a distribution from a
requirement specifying a direct URL reference (including a VCS URL).

This file MUST NOT be created when installing a distribution from another type
of requirement (i.e. name plus version specifier).

This JSON file MUST be a dictionary, compliant with :rfc:`8259` and UTF-8
encoded.

If present, it MUST contain at least two fields. The first one is ``url``,
with type ``string``. Depending on what ``url`` refers to, the second field
MUST be one of ``vcs_info`` (if ``url`` is a VCS reference), ``archive_info``
(if ``url`` is a source archive or a wheel), or ``dir_info`` (if ``url`` is a
local directory).
These info fields have a (possibly empty) subdictionary as value, with the
possible keys defined below.

``url`` MUST be stripped of any sensitive authentication information, for
security reasons.

The user:password section of the URL MAY however be composed of environment
variables, matching the following regular expression::

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?

Additionally, the user:password section of the URL MAY be a well-known, non
security sensitive string. A typical example is ``git`` in the case of a URL
such as ``ssh://git@gitlab.com``.

When ``url`` refers to a VCS repository, the ``vcs_info`` key MUST be present
as a dictionary with the following keys:

- A ``vcs`` key (type ``string``) MUST be present, containing the name of the
  VCS (i.e. one of ``git``, ``hg``, ``bzr``, ``svn``). Other VCS's SHOULD be
  registered by writing a PEP to amend this specification. The ``url`` value
  MUST be compatible with the corresponding VCS, so an installer can hand it
  off without transformation to a checkout/download command of the VCS.
- A ``requested_revision`` key (type ``string``) MAY be present naming a
  branch/tag/ref/commit/revision/etc (in a format compatible with the VCS) to
  install.
- A ``commit_id`` key (type ``string``) MUST be present, containing the exact
  commit/revision number that was installed. If the VCS supports commit-hash
  based revision identifiers, such commit-hash MUST be used as ``commit_id``
  in order to reference the immutable version of the source code that was
  installed.
- If the installer could discover additional information about the requested
  revision, it MAY add a ``resolved_revision`` and/or
  ``resolved_revision_type`` field. If no revision was provided in the
  requested URL, ``resolved_revision`` MAY contain the default branch that was
  installed, and ``resolved_revision_type`` will be ``branch``. If the
  installer determines that ``requested_revision`` was a tag, it MAY add
  ``resolved_revision_type`` with value ``tag``.

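
The ``vcs_info`` key requirements above can be checked mechanically. The
following is a minimal sketch rather than a reference implementation: the
helper name ``check_vcs_info`` is hypothetical, and treating the optional
``resolved_revision`` and ``resolved_revision_type`` values as strings is an
assumption, since the text only gives explicit types for the other keys.

```python
def check_vcs_info(vcs_info):
    """Return a list of problems found in a ``vcs_info`` dictionary.

    Hypothetical helper, not part of any installer's API.
    """
    problems = []

    # ``vcs`` and ``commit_id`` MUST be present, both of type string.
    for key in ("vcs", "commit_id"):
        if not isinstance(vcs_info.get(key), str):
            problems.append(f"{key} MUST be present as a string")

    # These keys MAY be present; assume a string value when they are.
    optional = ("requested_revision", "resolved_revision",
                "resolved_revision_type")
    for key in optional:
        if key in vcs_info and not isinstance(vcs_info[key], str):
            problems.append(f"{key} should be a string when present")

    return problems
```

An empty list means that no problem was detected by these particular checks;
it does not establish that the file as a whole is valid.
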

When ``url`` refers to a source archive or a wheel, the ``archive_info`` key
MUST be present as a dictionary with the following key:

- A ``hash`` key (type ``string``) SHOULD be present, with value
  ``<hash-algorithm>=<expected-hash>``. It is RECOMMENDED that only hashes
  which are unconditionally provided by the latest version of the standard
  library's ``hashlib`` module be used for source archive hashes. At time of
  writing, that list consists of 'md5', 'sha1', 'sha224', 'sha256', 'sha384',
  and 'sha512'.

When ``url`` refers to a local directory, the ``dir_info`` key MUST be present
as a dictionary with the following key:

- ``editable`` (type: ``boolean``): ``true`` if the distribution was installed
  in editable mode, ``false`` otherwise. If absent, default to ``false``.

When ``url`` refers to a local directory, it MUST have the ``file`` scheme and
be compliant with :rfc:`8089`. In particular, the path component must be
absolute. Symbolic links SHOULD be preserved when making relative paths
absolute.

.. note::

   When the requested URL has the file:// scheme and points to a local
   directory that happens to contain a VCS checkout, installers MUST NOT
   attempt to infer any VCS information and therefore MUST NOT output any VCS
   related information (such as ``vcs_info``) in ``direct_url.json``.

A top-level ``subdirectory`` field MAY be present containing a directory path,
relative to the root of the VCS repository, source archive or local directory,
to specify where ``pyproject.toml`` or ``setup.py`` is located.

.. note::

   As a general rule, installers should as much as possible preserve the
   information that was provided in the requested URL when generating
   ``direct_url.json``. For example, user:password environment variables
   should be preserved and ``requested_revision`` should reflect the revision
   that was provided in the requested URL as faithfully as possible. This
   information is however *enriched* with more precise data, such as
   ``commit_id``.

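
The rules for the user:password section of ``url`` can be illustrated with a
short sketch. It assumes the requested URL is parsed with the standard
library's ``urllib.parse``; the function name ``redact_url`` and the
``WELL_KNOWN_USERS`` set are hypothetical, and ``git`` is the only well-known
value named in this document.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Regular expression from this specification, copied verbatim.
ENV_VAR_AUTH = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

# Assumption: which strings count as well-known is left to the installer.
WELL_KNOWN_USERS = {"git"}


def redact_url(url):
    """Drop the user:password section unless the rules above allow it."""
    parts = urlsplit(url)
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    if not sep:
        return url  # no user:password section at all
    if ENV_VAR_AUTH.fullmatch(userinfo) or userinfo in WELL_KNOWN_USERS:
        return url  # environment variables or a well-known string
    # Anything else may be sensitive, so only host and port are kept.
    return urlunsplit(parts._replace(netloc=hostport))
```

With this sketch, ``https://user:secret@example.com/repo.git`` is reduced to
``https://example.com/repo.git``, while ``ssh://git@gitlab.com/org/repo.git``
and a URL using ``${USER}:${PASS}`` are returned unchanged.
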

Registered VCS
--------------

This section lists the registered VCS's; expanded, VCS-specific information on
how to use the ``vcs``, ``requested_revision``, and other fields of
``vcs_info``; and in some cases additional VCS-specific fields. Tools MAY
support other VCS's although it is RECOMMENDED to register them by writing a
PEP to amend this specification. The ``vcs`` field SHOULD be the command name
(lowercased). Additional fields that would be necessary to support such VCS
SHOULD be prefixed with the VCS command name.

Git
+++

Home page

   https://git-scm.com/

vcs command

   git

``vcs`` field

   git

``requested_revision`` field

   A tag name, branch name, Git ref, commit hash, shortened commit hash, or
   other commit-ish.

``commit_id`` field

   A commit hash (40 hexadecimal characters sha1).

.. note::

   Installers can use the ``git show-ref`` and ``git symbolic-ref`` commands
   to determine if the ``requested_revision`` corresponds to a Git ref. In
   turn, a ref beginning with ``refs/tags/`` corresponds to a tag, and a ref
   beginning with ``refs/remotes/origin/`` after cloning corresponds to a
   branch.

Mercurial
+++++++++

Home page

   https://www.mercurial-scm.org/

vcs command

   hg

``vcs`` field

   hg

``requested_revision`` field

   A tag name, branch name, changeset ID, shortened changeset ID.

``commit_id`` field

   A changeset ID (40 hexadecimal characters).

Bazaar
++++++

Home page

   https://bazaar.canonical.com/

vcs command

   bzr

``vcs`` field

   bzr

``requested_revision`` field

   A tag name, branch name, revision id.

``commit_id`` field

   A revision id.

Subversion
++++++++++

Home page

   https://subversion.apache.org/

vcs command

   svn

``vcs`` field

   svn

``requested_revision`` field

   ``requested_revision`` must be compatible with the ``--revision`` option of
   ``svn checkout``. In Subversion, branch or tag is part of ``url``.

``commit_id`` field

   Since Subversion does not support globally unique identifiers, this field
   is the Subversion revision number in the corresponding repository.

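
The ``commit_id`` descriptions of the registered VCS's above lend themselves
to a simple plausibility check. This is only a sketch: the function name
``commit_id_is_plausible`` is hypothetical, lowercase hexadecimal digits are
assumed for Git and Mercurial, a Subversion revision number is assumed to be
written as decimal digits, and Bazaar revision ids (like ids of unregistered
VCS's) are only required to be non-empty.

```python
import re

# Assumed shapes, read off the per-VCS ``commit_id`` descriptions above.
COMMIT_ID_PATTERNS = {
    "git": re.compile(r"[0-9a-f]{40}"),  # 40 hexadecimal characters (sha1)
    "hg": re.compile(r"[0-9a-f]{40}"),   # 40 hexadecimal characters
    "svn": re.compile(r"[0-9]+"),        # repository-local revision number
}


def commit_id_is_plausible(vcs, commit_id):
    """Return True if ``commit_id`` has the expected shape for ``vcs``."""
    pattern = COMMIT_ID_PATTERNS.get(vcs)
    if pattern is None:
        # bzr and unregistered VCS's: no fixed shape is assumed here.
        return bool(commit_id)
    return pattern.fullmatch(commit_id) is not None
```

Such a check can catch a tag name or shortened hash stored by mistake in
``commit_id``, but it cannot confirm that the commit exists in the repository.
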

Examples
========

Example direct_url.json
-----------------------

Source archive:

.. code::

    {
        "url": "https://github.com/pypa/pip/archive/1.3.1.zip",
        "archive_info": {
            "hash": "sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
        }
    }

Git URL with tag and commit-hash:

.. code::

    {
        "url": "https://github.com/pypa/pip.git",
        "vcs_info": {
            "vcs": "git",
            "requested_revision": "1.3.1",
            "resolved_revision_type": "tag",
            "commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
        }
    }

Local directory:

.. code::

    {
        "url": "file:///home/user/project",
        "dir_info": {}
    }

Local directory installed in editable mode:

.. code::

    {
        "url": "file:///home/user/project",
        "dir_info": {
            "editable": true
        }
    }

Example pip commands and their effect on direct_url.json
--------------------------------------------------------

Commands that generate a ``direct_url.json``:

* pip install https://example.com/app-1.0.tgz
* pip install https://example.com/app-1.0.whl
* pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
* pip install ./app
* pip install file:///home/user/app
* pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
  (in which case, ``url`` will be the local directory where the git repository
  has been cloned to, and ``dir_info`` will be present with
  ``"editable": true`` and no ``vcs_info`` will be set)
* pip install -e ./app

Commands that *do not* generate a ``direct_url.json``:

* pip install app
* pip install app --no-index --find-links https://example.com/

Use cases
=========

"Freezing" an environment

  Tools, such as ``pip freeze``, which generate requirements from the Database
  of Installed Python Distributions SHOULD exploit ``direct_url.json`` if it
  is present, and give it priority over the Version metadata in order to
  generate a higher fidelity output.
  In the presence of a ``vcs`` direct URL reference, the ``commit_id`` field
  SHOULD be used in priority in order to provide the highest possible fidelity
  to the originally installed version. If supported by their requirement
  format, tools are encouraged also to output the ``tag`` value if present, as
  it has immutable semantics. Tools MAY choose another approach, depending on
  the needs of their users.

  Note the initial iteration of this PEP does not attempt to make environments
  that include editable installs or installs from local directories
  reproducible, but it does attempt to make them readily identifiable. By
  locating the local project directory via the ``url`` and ``dir_info`` fields
  of this specification, tools can implement any strategy that fits their use
  cases.

Backwards Compatibility
=======================

Since this PEP specifies a new file in the ``.dist-info`` directory, there are
no backwards compatibility implications.

Alternatives
============

PEP 426 source_url
------------------

The now withdrawn :pep:`426` specifies a ``source_url`` metadata entry. It is
also implemented in `distlib`_.

It was intended for a slightly different purpose, for use in sdists.

This format lacks support for the ``subdirectory`` option of pip requirement
URLs. The same limitation is present in :pep:`PEP440 Direct References
<440#direct-references>`.

It also lacks explicit support for
`environment variables in the user:password part of URLs`_.

The introduction of a key/value extensibility mechanism and support for
environment variables for user:password in :pep:`440` would be necessary for
use in this PEP.

revision vs ref
---------------

The ``requested_revision`` key was retained over ``requested_ref`` as it is a
more generic term across various VCS and ``ref`` has a specific meaning for
``git``.

References
==========

.. _`pip issue #609`: https://github.com/pypa/pip/issues/609
.. _`thread on discuss.python.org`: https://discuss.python.org/t/pip-freeze-vcs-urls-and-pep-517-feat-editable-installs/1473
.. _`VCS URLs supported by pip`: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
.. _`Pipfile and Pipfile.lock`: https://github.com/pypa/pipfile
.. _distlib: https://distlib.readthedocs.io
.. _`environment variables in the user:password part of URLs`: https://pip.pypa.io/en/stable/reference/pip_install/#id10
.. _`Recording the Direct URL Origin of installed distributions`: https://packaging.python.org/specifications/direct-url

Acknowledgements
================

Various people helped make this PEP a reality. Paul F. Moore provided the
essence of the abstract. Alyssa Coghlan suggested the direct_url name.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: