From 1490ed1ef5ede3dd65f07a87ef19485f5473872b Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Wed, 2 Oct 2024 16:53:07 -0700 Subject: [PATCH] PEP 759: External Wheel Hosting (#4011) PEP 759, External Wheel Hosting; initial published version Co-authored-by: Jelle Zijlstra Co-authored-by: Ethan Smith Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> --------- Co-authored-by: Jelle Zijlstra Co-authored-by: Ethan Smith Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> --- .github/CODEOWNERS | 1 + .gitignore | 3 +- peps/pep-0759.rst | 472 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 475 insertions(+), 1 deletion(-) create mode 100644 peps/pep-0759.rst diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 60450e4ee..f59bf136f 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -636,6 +636,7 @@ peps/pep-0755.rst @warsaw peps/pep-0756.rst @vstinner peps/pep-0757.rst @vstinner peps/pep-0758.rst @pablogsal @brettcannon +peps/pep-0759.rst @warsaw # ... peps/pep-0789.rst @njsmith # ... diff --git a/.gitignore b/.gitignore index 6004c5e61..6e056e9f6 100644 --- a/.gitignore +++ b/.gitignore @@ -24,4 +24,5 @@ coverage.xml /venv # Builds -/sphinx-warnings.txt \ No newline at end of file +/sphinx-warnings.txt +/peps/numerical.rst diff --git a/peps/pep-0759.rst b/peps/pep-0759.rst new file mode 100644 index 000000000..810f4d192 --- /dev/null +++ b/peps/pep-0759.rst @@ -0,0 +1,472 @@ +PEP: 759 +Title: External Wheel Hosting +Author: Barry Warsaw , + Ethan Smith +PEP-Delegate: Donald Stufft +Status: Draft +Type: Standards Track +Topic: Packaging +Created: 01-Oct-2024 +Post-History: + +Abstract +======== + +This PEP proposes a mechanism by which projects hosted on `pypi.org +`__ can safely host wheel artifacts on external sites other +than PyPI. 
This PEP explicitly does *not* propose external hosting of +projects, packages, or their metadata. That functionality is already available +by externally hosting independent package indexes. Because this PEP only +provides a mechanism for projects to customize the download URL for specific +released wheel artifacts, dependency resolution as already implemented by +common installer tools such as `pip `__ and +`uv `__ does not need to change. + +This PEP defines what it means to be "safe" in this context, along with a new +package upload file format called a ``.rim`` file. It defines how ``.rim`` +files affect the metadata returned for a package's :ref:`Simple Repository API +` +in both HTML and JSON formats, and how traditional wheels can easily be turned +into ``.rim`` files. + +Rationale +========= + +The Python Package Index, hosted at https://pypi.org, imposes `default limits +`__ on upload artifact file size (100 MiB) and total project size +(10 GiB). Most projects can comfortably fit within these limits during the lifetime of the +project, through years of uploads. A few projects have encountered these limits, and have +been granted both file size and project size exceptions, allowing them to continue +uploading new releases without having to take more drastic measures, such as removing +files which may potentially still be in use by consumers (e.g. through version pins). + +A related workaround is the `"wheel stub" `__ +approach, which provides an indirect link between PyPI and an external third party package +index, where such limitations can be avoided. Wheel stubs are :ref:`source distributions +` (a.k.a. "sdists") which utilize a :pep:`517` build +backend that, instead of turning source code into a binary wheel, performs some logic to +calculate the URL for an existing, externally hosted wheel to download and install. 
This +approach works, but it obscures the connection between PyPI, the sdist, and the externally +hosted wheel, since there is no way to present this information to ``pip`` or other such +tools. + +Historical context +------------------ + +In 2013, :pep:`438` proposed a "backward-compatible two-phase transition +process" to modify several aspects of release file hosting on PyPI. As this +PEP describes, PyPI originally supported only project and release +*registration* without also allowing for artifact file hosting. As such, most +projects hosted release file artifacts elsewhere. Artifact hosting was later +added, but the mix of externally and PyPI-hosted files led to a wide range of +usability and potential security-related problems. PEP 438 was an attempt to +provide several facilities to allow external hosting while promoting a +PyPI-first hosting preference. + +PEP 438 was complex, with three different "hosting modes", ``rel`` metadata in +the simple HTML index pages to signify hosting locations, and a two-phase +transition plan affecting PyPI and installer tools. PEP 438 was ultimately +retracted in 2015 by :pep:`470`, which acknowledges that PEP 438 did succeed +in... + + bringing about more people to utilize PyPI's repository features, an + altogether good thing given the global CDN powering PyPI providing speed + ups for a lot of people[...] + +Instead of external hosting, PEP 470 promoted the use of explicit multiple +repositories, providing full package indexing and artifact hosting, and +enabled through installer tool support, such as ``pip install +--extra-index-url`` allowing ``pip`` to essentially treat multiple +repositories as `one single global repository +`__ +for package installation resolution. Because this has been the blessed norm +for so many years, all Python package installation tools support querying +multiple indexes for dependency resolution. 
+ +The problem with multiple indexes +--------------------------------- + +Why then does this PEP propose to allow a more limited form of external +hosting, and how does this proposal avoid the problems documented in PEP 470? + +One well-known problem that consolidating multiple indexes enables is +`dependency confusion attacks +`__, to +which Python *can* be particularly vulnerable, due to the algorithm that ``pip +install`` uses for resolving package dependencies and preferred versions. The +``uv`` tool addresses this by supporting an additional `index strategy +`__ option, +whereby users can select between, e.g. a ``pip``-compatible strategy, and a +more limited strategy that prevents such dependency confusion attacks. + +:pep:`708` provides additional background about dependency confusion attacks, +and takes a different approach to preventing them. At its core, PEP 708 allows +repository owners to indicate that projects track across different +repositories, which allows installers to determine how to treat the global +package namespace when combined across multiple repositories. PEP 708 has been +provisionally accepted, pending several required conditions as outlined in PEP +708, some of which may have an indeterminate future. As PEP 708 itself says, +this won't by itself solve dependency confusion attacks, but is one way to +provide enough information to installers to help minimize these attacks. + +While there can still be valid use cases for standing up a totally independent +package index (such as providing richer platform support for GPUs until a +fully formed `variant proposal +`__ +is accepted), this PEP takes a different, simpler approach and doesn't replace +any of the existing, proposed, or approved package index cooperation +specifications. + +This PEP also preserves the core purpose of PyPI, and allows it to +remain the traditional, canonical, centralized index of all Python +packages. 
+ +Addressing PyPI limits +---------------------- + +This proposal also addresses the problem of size limits imposed by PyPI, where there is a +`default artifact size limit `__ of 100 MiB and a +default overall `project size limit `__ of 10 +GiB. Most packages and artifacts can easily fit in these limits, even for packages +containing binary extension modules for a variety of platforms. A small, but important +class of packages routinely exceeds these limits, requiring them to submit PyPI `exception +request support tickets`_. It's not necessarily difficult to get resolution on such +exceptions, but it is a special process that can take some time to resolve, and the +criteria for granting such exceptions aren't well documented. + +Reducing operational complexity +------------------------------- + +Setting up and maintaining an entire package index can be a complex +operational solution, both time and resource intensive. This is especially +true if the main purpose of such an index is just to avoid file size +limitations. The external index approach also imposes a tricky UX on consumers +of projects on the external index, requiring them to understand how CLI +options such as ``--extra-index-url`` work, along with the security +implications of such flags. It would be much easier for both producers and +consumers of large wheel packages to just set up and maintain a simple web +server, capable of serving individual files with no more complex API than +``HTTP GET``. Such an interface is also easily cached or placed behind a +`CDN `__. Simple HTTP +servers are also much easier to audit for security purposes, easier to proxy, +and usually take far fewer resources to run, support, and maintain. Even +something like `Amazon S3 `__ could be used to +host external wheels. + +This PEP proposes an approach that favors such operational simplicity. + +Specification +============= + +A new type of uploadable file is defined, called a "RIM" (i.e. 
``.rim``), or "Remote +Installable Metadata" file. The name evokes the image of a wheel with the tire removed, +and emphasizes that ``.rim`` files are easily derived from ``.whl`` files. The process of +turning a ``.whl`` into a ``.rim`` is :ref:`outlined below <dismounting>`. The file name +format exactly matches the :ref:`wheel file naming format +` specification, except that RIM files use the suffix +``.rim``. This means that all the tags used to discriminate ``.whl`` files also +distinguish between different ``.rim`` files, and thus can be used during dependency +resolution steps, exactly as ``.whl`` files are today. In this respect, ``.whl`` and +``.rim`` files are interchangeable. + +The content of a ``.rim`` file is *nearly* identical to that of a ``.whl`` file; however, ``.rim`` +files **MUST** contain only the ``.dist-info`` directory from a wheel. No other top-level +file or directory is allowed in the ``.rim`` zip file. The ``.dist-info`` directory +**MUST** contain exactly one file in addition to those `allowed`_ in a ``.whl`` +file's ``.dist-info`` directory: a file called ``EXTERNAL-HOSTING.json``. + +.. _file-format: + +This is a JSON file containing the following keys: + +``version`` + This is the file format version, which for this PEP **MUST** be ``1.0``. +``owner`` + This **MUST** name the PyPI organization owner of this externally hosted file, for + reasons which will be described in :ref:`detail below <resiliency>`. +``uri`` + This is a single URL naming the location of the physical ``.whl`` file hosted on an + external site. This URL **MUST** use the ``https`` scheme. +``size`` + This is an integer value describing the size in bytes of the physical ``.whl`` file on + the remote host. +``hashes`` + This is a dictionary of the format described in :pep:`694`, used to capture the + :pep:`hashes <694#upload-each-file>` of the physical ``.whl`` file, with the same + constraints as proposed in that PEP. 
Since these hashes are immutable once uploaded + to PyPI, they serve as a critical validation that the externally hosted wheel hasn't + been corrupted or compromised. + +Effects of the RIM file +----------------------- + +The only effect of a ``.rim`` file is to change the download URL for the wheel artifact in +both the HTML and JSON interfaces in the `simple repository API`_. In the HTML page for a +package release, the ``href`` attribute **MUST** be the value of the ``uri`` key, +including a ``#<hashname>=<hashvalue>`` fragment. This hash fragment **MUST** be in +exactly the same format as described in the :pep:`376`-originated `signed wheel file format`_ +in the ``.dist-info/RECORD`` file. The exact same rules for selection of hash algorithm +and encoding are used here. + +Similarly, in the `JSON response`_, the ``url`` key pointing to the download file **MUST** be +the value of the :ref:`uri <file-format>` key, and the ``hashes`` dictionary **MUST** be +included with values populated from the ``hashes`` dictionary provided above. + +In all other respects, a compliant package index should treat ``.rim`` files the same as +``.whl`` files, with some minor exceptions as outlined below. For example, ``.rim`` +files can be `deleted `__ and yanked (:pep:`592`) just +like any ``.whl`` file, with the exact same semantics (i.e. deletions are permanent). When +a ``.rim`` is deleted, an index **MUST NOT** allow a matching ``.whl`` or ``.rim`` file to +be (re-)uploaded. + +Availability order +------------------ + +Externally hosted wheels **MUST** be available before the corresponding ``.rim`` file is +uploaded to PyPI, otherwise a publishing race condition is introduced, although this +requirement **MAY** be relaxed for ``.rim`` files uploaded to a :pep:`694` staged release. + +Wheels can override RIMs +------------------------ + +Indexes **MUST** reject ``.rim`` files if a matching ``.whl`` file already exists with the +exact same file name tags. 
However, indexes **MAY** accept a ``.whl`` file if a matching +``.rim`` file exists, as long as that ``.rim`` file hasn't been deleted or yanked. This +allows uploaders to replace an externally hosted wheel file with an index hosted wheel +file, but the converse is prohibited. Since the default is to host wheels on the same +package index that contains the package metadata, it is not allowed to "downgrade" an +existing wheel file once uploaded. When a ``.whl`` replaces a ``.rim``, the index **MUST** +provide download URLs for the package using its own hosted file service. When uploading +the overriding ``.whl`` file, the package index **MUST** validate the hash from the +existing ``.rim`` file, and these hashes must match or the overriding upload **MUST** be +rejected. + +PyPI API bump unnecessary +------------------------- + +It's likely that the changes are backward compatible enough that a bump in the `PyPI +repository version`_ is not necessary. Since ``.rim`` files are essentially changes only +to the upload API, package resolvers and package installers can continue to function with +the APIs they've always supported. + +.. _resiliency: + +External hosting resiliency +=========================== + +One of the key concerns leading to PEP 438's revocation in PEP 470 was +potential user confusion when an external index disappeared. From PEP 470: + + This confusion comes down to end users of projects not realizing if a + project is hosted on PyPI or if it relies on an external service. This + often manifests itself when the external service is down but PyPI is + not. People will see that PyPI works, and other projects works, but this + one specific one does not. They oftentimes do not realize who they need to + contact in order to get this fixed or what their remediation steps are. 
+ +While the problem of an external wheel hosting service going down is not directly +solved by this PEP, several safeguards are in place to greatly reduce the +potential burden on PyPI administrators. + +This PEP thus proposes that: + +- External wheel hosting is only allowed for packages which are owned by + `organization accounts `__. + External hosting is an organization-wide setting. +- Organization accounts do not automatically gain the ability to externally + host wheels; this feature **MUST** be explicitly enabled by PyPI admins at their discretion. Since + this will not be a common request, we don't expect the overhead to be nearly + as burdensome as :pep:`541` resolutions, account recovery requests, or even + file/project size increase requests. External hosting requests would be + handled in the same manner as those requests, i.e. via the `PyPI GitHub + support tracker `__. +- Organization accounts requesting external wheel hosting **MUST** register their own + support contact URI, be it a ``mailto`` URI for a contact email address, or the URL to + the organization's support tracker. Such a contact URI is optional for organizations + which do not avail themselves of external wheel file hosting. + +Combined with the ``EXTERNAL-HOSTING.json`` file's ``owner`` key, this allows +installer tools to unambiguously redirect any download errors away from the PyPI support +admins and squarely to the organization's support admins. + +While the exact mechanics of storing and retrieving this organization support +URI will be defined separately, for the sake of example, let's say a package +``foo`` externally hosts wheel files on ``https://foo.example.com`` +and that host becomes unreachable. When an +installer tool tries to download and install the package ``foo`` wheel, the +download step will fail. 
The installer would then be able to query PyPI to +provide a useful error message to the end user: + +- The installer downloads the ``.rim`` file and reads the ``owner`` key from the + ``EXTERNAL-HOSTING.json`` file inside the ``.rim`` zip file. +- The installer queries PyPI for the support URI for the organization + owner of the externally hosted wheel. +- An informative error message would then be displayed, e.g.: + + The externally hosted wheel file ``foo-....whl`` could not be + downloaded. Please contact support@foo.example.com for help. Do not report + this to the PyPI administrators. + +.. _dismounting: + +Dismounting wheels +================== + +It is generally very easy to produce a ``.rim`` file from an existing ``.whl`` +file. This could be done efficiently by a :pep:`517` build backend with an +additional command line option, or a separate tool which takes a ``.whl`` file +as input and creates the associated ``.rim`` file. To complete the analogy, +the act of turning a ``.whl`` into a ``.rim`` is called "dismounting". The +steps such a tool would take are: + +- Accept as input the source ``.whl`` file, the organization owner of the + package, the URL at which the ``.whl`` will be hosted, and the support URI + to which download problems should be reported. These could in fact be captured in the + ``pyproject.toml`` file, but that specification is out of scope for this + PEP. +- Unzip the ``.whl`` and create the ``.rim`` zip archive. +- Omit from the ``.rim`` file any path in the ``.whl`` that **isn't** rooted + at the ``.dist-info`` directory. +- Calculate the size and hashes of the source ``.whl`` file. +- Add the ``EXTERNAL-HOSTING.json`` file containing the JSON keys and values as described + above, to the ``.rim`` archive. + +Changes to tools +================ + +Theoretically, installer tools shouldn't need any changes, since when they +have identified the wheel to download and install, they simply consult the +download URLs returned by PyPI's Simple API. 
In practice though, tools such as +``pip`` and ``uv`` may have constrained lists of hosts they will allow +downloads from, such as PyPI's own ``pythonhosted.org`` domain. + +In this case, such tools will need to relax those constraints, but the exact policy for +this is left to the installer tools themselves. Any number of approaches could be +implemented, such as downloading the ``.rim`` file and verifying the +``EXTERNAL-HOSTING.json`` metadata, or simply trusting the external downloads for any +wheel with a matching checksum. They could also query PyPI for the project's organization +owner and support URI before trusting the download. They could warn the user when +externally hosted wheel files are encountered, and/or require the use of a command line +option to enable additional download hosts. Any of these verification policies could be +chosen in configuration files. + +Installer tools should also probably provide better error messages when +externally hosted wheels cannot be downloaded, e.g. because a host is +unreachable. As described above, such tools could query enough metadata from +PyPI to provide clear and distinct error messages pointing users to the +package's external hosting support email or issue tracker. + +Constraints for external hosting services +========================================= + +The following constraints lead to reliable and compatible external wheel hosting services: + +- External wheels **MUST** be served over HTTPS, with a certificate signed by + `Mozilla's root certificate store `__. This ensures + compatibility with `pip `__ + and `uv + `__. At + the time of this writing, ``pip`` 24.2 on Python 3.10 or newer uses the system + certificate store in addition to the Mozilla store provided by the third party `certifi + `__ Python package. ``uv`` uses the Mozilla store + provided by the `webpki-roots `__ crate, but not + the system store unless the ``--native-tls`` flag is given [#fn1]_. 
*The PyPI + administrators may modify this requirement in the future, but compatibility with popular + installers will not be compromised.* +- External wheel hosts **SHOULD** use a content delivery network (`CDN + `__), just as PyPI does. +- External wheel hosts **MUST** commit to a stable URL for all wheels they host. +- Externally hosted wheels **MUST NOT** be removed from an external wheel host unless the + corresponding ``.rim`` file is deleted from PyPI first, and **MUST NOT** be removed + for yanked releases. +- External wheel hosts **MUST** support `HTTP range requests`_. +- External wheel hosts **SHOULD** support the `HTTP/2`_ protocol. + +Security +======== + +Several factors described in this proposal should mitigate security +concerns with externally hosted wheels, such as: + +- Wheel file checksums **MUST** be included in ``.rim`` files, and once uploaded cannot be + changed. Since the checksum stored on PyPI is immutable and required, a substituted + external wheel file will fail checksum verification, even if the owning organization + loses control of its hosting domain. +- Externally hosted wheels **MUST** be served over HTTPS. +- In order to serve externally hosted wheels, organizations **MUST** be approved by the + PyPI admins. + +When users identify malware or vulnerabilities in PyPI-hosted projects, they can now +report this using the `malware reporting facilities `__ on +PyPI, as also described in this `blog post`_. The same process can be used to report +security issues in externally hosted wheels, and the same remediation process should be +used. In addition, since organizations with external hosting enabled **MUST** provide a +support contact URI, that URI can be used in some cases to report the security issue to +the hosting organization. Such organization reporting won't make sense for malware, but +could indeed be a very useful way to report security vulnerabilities in externally hosted +wheels. 
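
The checksum safeguard described in the Security section can be illustrated with a minimal, installer-side sketch. It is not part of the specification. It assumes that the ``hashes`` dictionary maps a ``hashlib`` algorithm name to a hex-encoded digest, which is how :pep:`694` is read here; the function name ``verify_external_wheel`` and the set of accepted algorithms are hypothetical choices made only for this example.

```python
import hashlib

# Hypothetical allow-list for this sketch; the PEP defers the actual
# constraints on hash algorithms to PEP 694.
ACCEPTED_ALGORITHMS = {"sha256", "sha384", "sha512", "blake2b"}


def verify_external_wheel(wheel_bytes: bytes, size: int, hashes: dict[str, str]) -> bool:
    """Check downloaded wheel bytes against ``size`` and ``hashes``.

    ``size`` and ``hashes`` are the values read from EXTERNAL-HOSTING.json.
    Returns True only if the size matches and every listed digest matches.
    """
    if len(wheel_bytes) != size:
        return False
    if not hashes:
        # Without at least one digest there is nothing to validate against.
        return False
    for algorithm, expected_hex in hashes.items():
        if algorithm not in ACCEPTED_ALGORITHMS:
            return False
        actual_hex = hashlib.new(algorithm, wheel_bytes).hexdigest()
        if actual_hex.lower() != expected_hex.lower():
            return False
    return True
```

An installer following this approach would discard the download whenever the function returns ``False``, regardless of which host served the file.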
+ +Rejected ideas +============== + +Several ideas were considered and rejected. + +- Requiring digital signatures on externally hosted wheel files, either in + addition to or instead of hashes. We deem this unnecessary since the + checksum requirement should be enough to validate that the metadata on PyPI + for a wheel exactly matches the downloaded wheel. The added complexity of + key management outweighs any additional benefit such digital signatures + might convey. + +- Hash verification on ``.rim`` file uploads. PyPI *could* verify that the hash in the + uploaded ``.rim`` file matches the externally hosted wheel before it accepts the upload, + but this requires downloading the external wheel and computing the checksum, which also + implies that the upload of the ``.rim`` file cannot be accepted until this external + ``.whl`` file is downloaded and verified. This increases PyPI bandwidth and slows down + the upload query, although :pep:`694` draft uploads could potentially mitigate these + concerns. Still, the benefit is not likely worth the additional complexity. + +- Periodic verification of the download URLs by the index. PyPI could try to periodically + ensure that the external wheel host or the external ``.whl`` file itself is still + available, e.g. via an :rfc:`HTTP HEAD <9110#section-9.3.2>` request. This is likely overkill and, without also + providing the file's checksum in the response [#fn2]_, may not provide much additional + benefit. + +- Fallback download hosts. This PEP could allow for an organization to provide fallback + download hosts, such that a secondary is available if the primary goes down. We believe + that DNS-based replication is a much better, well-known technique, and + probably much more resilient anyway. + +- ``.rim`` file replacement. 
While it is allowed for ``.whl`` files to replace + existing ``.rim`` files, as long as a) the ``.rim`` file hasn't been deleted + or yanked, and b) the checksums match, we do not allow replacing ``.whl`` files + with ``.rim`` files, nor do we allow a ``.rim`` file to overwrite an + existing ``.rim`` file. The latter could be a technique to change the + hosting URL for an externally hosted ``.whl``; however, we do not think this + is a good idea. There are other ways to "fix" an external host URL as + described above, and we do not want to encourage mass re-uploads of existing + ``.rim`` files. + +Footnotes +========= +.. [#fn1] The ``uv --native-tls`` flag `replaces + `__ + the ``webpki-roots`` store. +.. [#fn2] There is no standard way to return the file's checksum in response to an + :rfc:`HTTP HEAD <9110#section-9.3.2>` request. + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + +.. _`exception request support tickets`: https://github.com/pypi/support/issues?q=is%3Aissue+is%3Aclosed+file+limit+request +.. _`allowed`: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-directory +.. _`signed wheel file format`: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#signed-wheel-files +.. _`simple repository API`: https://packaging.python.org/en/latest/specifications/simple-repository-api/# +.. _`JSON response`: https://packaging.python.org/en/latest/specifications/simple-repository-api/#json-based-simple-api-for-python-package-indexes +.. _`PyPI repository version`: https://packaging.python.org/en/latest/specifications/simple-repository-api/#versioning-pypi-s-simple-api +.. _`blog post`: https://blog.pypi.org/posts/2024-03-06-malware-reporting-evolved/ +.. _`HTTP range requests`: https://http.dev/range-request +.. _`HTTP/2`: https://http.dev/2
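
For reviewers who want to experiment with the "Dismounting wheels" steps, the following is a rough sketch of such a tool using only the standard library. It is an illustration under stated assumptions, not a reference implementation. The function name ``dismount`` is hypothetical, ``version`` is written as the string ``"1.0"``, and ``hashes`` carries a single hex-encoded ``sha256`` digest. The PEP text does not pin down any of these details. The sketch also leaves ``RECORD`` untouched and does not handle the support URI, whose storage is defined separately.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def dismount(wheel_path: Path, owner: str, uri: str) -> Path:
    """Create a .rim next to ``wheel_path`` and return its path."""
    wheel_bytes = wheel_path.read_bytes()
    # Same file name and tags as the wheel, with only the suffix changed.
    rim_path = wheel_path.with_suffix(".rim")
    external_hosting = {
        "version": "1.0",  # assumed to be a JSON string
        "owner": owner,
        "uri": uri,
        "size": len(wheel_bytes),
        "hashes": {"sha256": hashlib.sha256(wheel_bytes).hexdigest()},
    }
    with zipfile.ZipFile(wheel_path) as whl, zipfile.ZipFile(
        rim_path, "w", zipfile.ZIP_DEFLATED
    ) as rim:
        dist_info = None
        for info in whl.infolist():
            top_level = info.filename.split("/", 1)[0]
            # Keep only regular files rooted at the .dist-info directory.
            if info.is_dir() or not top_level.endswith(".dist-info"):
                continue
            dist_info = top_level
            rim.writestr(info, whl.read(info))
        if dist_info is None:
            raise ValueError("wheel has no .dist-info directory")
        rim.writestr(
            f"{dist_info}/EXTERNAL-HOSTING.json",
            json.dumps(external_hosting, indent=2),
        )
    return rim_path
```

Calling ``dismount(Path("foo-1.0-py3-none-any.whl"), "foo-org", "https://foo.example.com/foo-1.0-py3-none-any.whl")`` would write ``foo-1.0-py3-none-any.rim`` alongside the wheel. The organization name and URL here are placeholders.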