PEP: 763
Title: Limiting deletions on PyPI
Author: William Woodruff, Alexis Challande
Sponsor: Donald Stufft
PEP-Delegate: Donald Stufft
Discussions-To: https://discuss.python.org/t/69487
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 24-Oct-2024
Post-History: `09-Jul-2022 `__,
              `01-Oct-2024 `__,
              `28-Oct-2024 `__

Abstract
========

We propose limiting when users can delete files, releases, and projects
from PyPI. A project, release, or file may only be deleted within 72 hours
of when it is uploaded to the index. After that window, users may only use
the "yank" mechanism specified by :pep:`592`.

An exception to this restriction is made for releases and files that are
marked with :ref:`pre-release specifiers `, which will remain deletable
at any time.

The PyPI administrators will retain the ability to delete files, releases,
and projects at any time, for example for moderation or security purposes.

Rationale and Motivation
========================

As observed in :pep:`592`, user-level deletion of projects on PyPI enables
a catch-22 situation of dependency breakage:

    Whenever a project detects that a particular release on PyPI might be
    broken, they oftentimes will want to prevent further users from
    inadvertently using that version. However, the obvious solution of
    deleting the existing file from a repository will break users who have
    pinned to a specific version of the project.

    This leaves projects in a catch-22 situation where new projects may be
    pulling down this known broken version, but if they do anything to
    prevent that they’ll break projects that are already using it.

On a technical level, the problem of deletion is mitigated by "yanking,"
also specified in :pep:`592`.
However, deletions continue to be allowed on PyPI, and have caused multiple
notable disruptions to the Python ecosystem over the intervening years:

* July 2022: `atomicwrites `_ was `deleted by its maintainer `_ in an
  attempt to remove the project's "critical" designation, without the
  maintainer realizing that project deletion would also delete all
  previously uploaded releases.

  The project was subsequently restored with the maintainer's consent, but
  at the cost of manual administrator action and extensive downstream
  breakage to projects like `pytest `_. As of October 2024, atomicwrites is
  archived but still has around
  `4.5 million monthly downloads from PyPI `_.

* April 2023: `codecov `_ was deleted by its maintainers after a long
  deprecation period. This caused extensive breakage for many of Codecov's
  CI/CD users, who were unaware of the deprecation period due to limited
  observability of deprecation warnings within CI/CD logs.

  The project was `subsequently re-created `_ by its maintainers, with a
  new release published to compensate for the deleted releases (which were
  not restored), meaning that any pinned installations remained broken. As
  of October 2024, this single release remains the only release on PyPI
  and has around `1.5 million monthly downloads `_.

* June 2023: `python-sonarqube-api `_ deleted all releases prior to 2.0.2.

  The project's maintainer subsequently `deleted conversations `_ and
  force-pushed over the tag history for ``python-sonarqube-api``'s source
  repository, impeding efforts by users to compare changes between
  releases.

* June 2024: `PySimpleGUI `_ changed licenses and deleted
  `nearly all previous releases `_. This resulted in widespread disruption
  for users, who (prior to the relicensing) were downloading PySimpleGUI
  approximately 25,000 times a day.


In addition to their disruptive effect on downstreams, deletions also have
detrimental effects on PyPI's sustainability as well as the overall
security of the ecosystem:

* Deletions increase support workload for PyPI's administrators and
  moderators, as users mistakenly file support requests believing that
  PyPI is broken, or that the administrators themselves have removed the
  project.

* Deletions impair external (meaning end-user) incident response and
  analysis, making it difficult to distinguish "good faith" maintainer
  behavior from a malicious actor attempting to cover their tracks.

The Python ecosystem is continuing to grow, meaning that future deletions
of projects can be reasonably assumed to be *just as, if not more,*
disruptive than the deletions sampled above.

Given all of the above, this PEP concludes that deletions now present a
greater risk and detriment to the Python ecosystem than a benefit.

In addition to these technical arguments, there is also precedent from
other packaging ecosystems for limiting the ability of users to delete
projects and their constituent releases. This precedent is documented in
:ref:`Appendix A `.

Specification
=============

There are three different types of deletable objects:

1. **Files**, which are individual project distributions (such as source
   distributions or wheels).

   Example: ``requests-2.32.3-py3-none-any.whl``.

2. **Releases**, which contain one or more files that share the same
   version number.

   Example: `requests v2.32.3 `_.

3. **Projects**, which contain one or more releases.

   Example: `requests `_.

Deletion eligibility rules
--------------------------

This PEP proposes the following *deletion eligibility rules*:

* A **file** is deletable if and only if it was uploaded to PyPI less than
  72 hours before the current time, **or** if it has a
  :ref:`pre-release specifier `.
* A **release** is deletable if and only if all of its contained files are
  deletable.
* A **project** is deletable if and only if all of its releases are
  deletable.

These rules allow new projects to be deleted entirely, and allow old
projects to delete new files or releases, but do not allow old projects to
delete old files or releases.

Implementation
==============

This PEP's implementation primarily concerns aspects of PyPI that are not
standardized or subject to standardization, such as the web interface and
signed-in user operations. As a result, this section describes its
implementation in behavioral terms.

Changes
-------

* Per the eligibility rules above, PyPI will reject web interface requests
  (using an appropriate HTTP response code of its choosing) for file,
  release, or project deletion if the respective object is not eligible
  for deletion.

* PyPI will amend its web interface to indicate a file/release/project's
  deletion ineligibility, e.g. by styling the relevant UI elements as
  "inactive" and making relevant buttons/forms unclickable.

Security Implications
=====================

This PEP does not identify negative security implications associated with
the proposed approach.

This PEP identifies one minor positive security implication: by
restricting user-controlled deletions, this PEP makes it more difficult
for a malicious actor to cover their tracks by deleting malware from the
index. This is particularly useful for external (i.e. non-PyPI
administrator) triage and incident response, where the defending party
needs easy access to malware samples to develop indicators of compromise.

How To Teach This
=================

This PEP suggests at least two pieces of public-facing material to help
the larger Python packaging community (and its downstream consumers)
understand its changes:

* An announcement post on the `PyPI blog `_ explaining the nature of the
  PEP, its motivations, and its behavioral implications for PyPI.
* An announcement banner on PyPI itself, linking to the above.
* Updates to the `PyPI user documentation `_ explaining the difference
  between deletion and yanking and the limited conditions under which the
  former can still be initiated by package owners.

Rejected Ideas
==============

Conditioning deletion on dependency relationships
-------------------------------------------------

An alternative to time-based deletion windows is deletion eligibility
based on downstream dependents. For example, a release could be considered
deletable if and only if it has fewer than ``N`` downstream dependents on
PyPI, where ``N`` could be as low as 1.

This idea is appealing since it directly links deletion eligibility to
disruptiveness. `npm `_ uses it and conditions project removal on the
absence of any downstream dependencies known to the index.

Despite its appeal, this PEP identifies several disadvantages and
technical limitations that make dependency-conditioned deletion
inappropriate for PyPI:

1. *PyPI is not aware of dependency relationships.* In Python packaging,
   both project builds *and* metadata generation are frequently dynamic
   operations, involving arbitrary project-specified code. This is
   typified by source distributions containing ``setup.py`` scripts, where
   the execution of ``setup.py`` is responsible for computing the set of
   dependencies encoded in the project's metadata.

   This is in marked contrast to ecosystems like npm and Rust's
   `crates `_, where project *builds* can be dynamic but the project's
   metadata itself is static.

   As a result of this, `PyPI doesn't know your project's dependencies `_,
   and is architecturally incapable of knowing them without either running
   arbitrary code (a significant security risk) or performing a long-tail
   deprecation of ``setup.py``-based builds in favor of :pep:`517` and
   :pep:`621`-style static metadata.

2. *Results in an unintuitive permissions model.* Dependency-conditioned
   deletion results in a "reversed" power relationship, where anybody who
   introduces a dependency on a project can prevent that project from
   being deleted.

   This is reasonable at face value, but can be abused to produce
   unexpected and undesirable (in the context of enabling some deletions)
   outcomes. A notable example of this is npm's `everything package `_,
   which depends on every public package on npm (as of 30 Dec 2023) and
   thereby prevents their deletion.

Conditioning deletion on download count
---------------------------------------

Another alternative to time-based deletion windows is to condition
deletion eligibility on the number of downloads. For example, a release
could be considered deletable if and only if it has fewer than ``N``
downloads over some recent period.

While this approach has the advantage of tying a project's deletion
eligibility to its usage, this PEP identifies several limitations:

1. *Ecosystem diversity.* The Python ecosystem includes projects with
   widely varying usage patterns. A fixed download threshold would not
   adequately account for niche but critical projects with naturally low
   download counts.

2. *Time sensitivity.* Download counts do not necessarily reflect a
   project's current status or importance. A previously popular project
   might have low recent downloads but still be crucial for maintaining
   older systems.

3. *Technical complexity.* Accessing the download count of a project
   within PyPI is not straightforward, and there is limited possibility to
   gather a project's download statistics from mirrors or other
   distribution systems.

.. _pep763-appendix-a:

Appendix A: Precedent in other ecosystems
=========================================

The following is a table of support for deletion in different packaging
ecosystems. An ecosystem is considered to **not** support deletion if it
restricts a user's ability to perform deletions in a manner similar to
this PEP.


An earlier version of this table, showing only deletion, was compiled by
Donald Stufft and others on the Python discussion forum in `July 2022 `__.

.. list-table::
   :header-rows: 1

   * - Ecosystem (Index)
     - Deletion
     - Yanking
     - Notes
   * - Python (PyPI)
     - ✅ [#f1]_
     - ✅ [#f2]_
     - Deletion currently completely unrestricted.
   * - Rust (crates.io)
     - ❌
     - ✅ [#f3]_
     - Deletion by users not allowed at all.
   * - JavaScript (npm)
     - ❌ [#f4]_
     - ✅ [#f5]_
     - Deletion is limited by criteria similar to this PEP.
   * - Ruby (RubyGems)
     - ✅ [#f6]_
     - ❌
     - RubyGems calls deletion "yanking." Yanking in PyPI's terms is not
       supported at all.
   * - Java (Maven Central)
     - ❌ [#f7]_
     - ❌
     - Deletion by users not allowed at all.
   * - PHP (Packagist)
     - ❌ [#f8]_
     - ❌
     - Deletion restricted after an undocumented number of installs.
   * - .NET (NuGet)
     - ❌ [#f9]_
     - ✅ [#f10]_
     - NuGet calls yanking "unlisting."
   * - Elixir (Hex)
     - ❌ [#f11]_
     - ✅ [#f11]_
     - Hex calls yanking "retiring."
   * - R (CRAN)
     - ❌ [#f12]_
     - ✅ [#f12]_
     - Deletion is limited to within 24 hours of initial release or
       60 minutes for subsequent versions. CRAN calls yanking "archiving."
   * - Perl (CPAN)
     - ✅
     - ❌
     - Yanking is not supported at all. Deletion seemingly encouraged,
       at least as of 2021 [#f13]_.
   * - Lua (LuaRocks)
     - ✅ [#f14]_
     - ✅ [#f14]_
     - LuaRocks calls yanking "archiving."
   * - Haskell (Hackage)
     - ❌ [#f15]_
     - ✅ [#f16]_
     - Hackage calls yanking "deprecating."
   * - OCaml (OPAM)
     - ❌ [#f17]_
     - ✅ [#f17]_
     - Deletion is allowed if it occurs "reasonably soon" after inclusion.
       Yanking is *de facto* supported by the ``available: false`` marker,
       which effectively disables resolution.

The following trends are present:

* A strong majority of indices **do not** support deletion (9 vs. 4)
* A strong majority of indices **do** support yanking (9 vs. 4)
* An overwhelming majority of indices support one or the other or neither,
  but **not** both (11 vs. 2)
* PyPI and LuaRocks are notable outliers in supporting **both** deletion
  and yanking.

Footnotes
=========

.. [#f1] https://pypi.org/help/#deletion

.. [#f2] https://pypi.org/help/#yanked

.. [#f3] https://doc.rust-lang.org/cargo/commands/cargo-yank.html

.. [#f4] https://docs.npmjs.com/unpublishing-packages-from-the-registry

.. [#f5] https://docs.npmjs.com/deprecating-and-undeprecating-packages-or-package-versions

.. [#f6] https://guides.rubygems.org/removing-a-published-gem/

.. [#f7] https://central.sonatype.org/faq/can-i-change-a-component/

.. [#f8] https://github.com/composer/packagist/issues/875

.. [#f9] https://learn.microsoft.com/en-us/nuget/nuget-org/policies/deleting-packages

.. [#f10] https://learn.microsoft.com/en-us/nuget/nuget-org/policies/deleting-packages#unlisting-a-package

.. [#f11] https://hex.pm/docs/faq#can-packages-be-removed-from-the-repository

.. [#f12] https://cran.r-project.org/web/packages/policies.html

.. [#f13] https://neilb.org/2021/05/10/delete-your-old-releases.html

.. [#f14] https://luarocks.org/changes

.. [#f15] https://hackage.haskell.org/upload

.. [#f16] https://hackage.haskell.org/packages/deprecated

.. [#f17] https://github.com/ocaml/opam-repository/wiki/Policies#1-removal-of-packages-should-be-avoided

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.