diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index c38e688e5..a06a5f3d4 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -640,6 +640,8 @@ peps/pep-0759.rst @warsaw peps/pep-0760.rst @pablogsal @brettcannon peps/pep-0761.rst @sethmlarson @hugovk # ... +peps/pep-0777.rst @warsaw +# ... peps/pep-0789.rst @njsmith # ... peps/pep-0801.rst @warsaw diff --git a/peps/pep-0777.rst b/peps/pep-0777.rst new file mode 100644 index 000000000..e89fe9500 --- /dev/null +++ b/peps/pep-0777.rst @@ -0,0 +1,303 @@ +PEP: 777 +Title: How to Re-invent the Wheel +Author: Ethan Smith +Sponsor: Barry Warsaw +PEP-Delegate: Paul Moore +Status: Draft +Type: Standards Track +Topic: Packaging +Created: 09-Oct-2024 +Post-History: + +Abstract +======== + +The current :pep:`wheel 1.0 specification <427>` was written over a decade ago, +and has been extremely robust to changes in the Python packaging ecosystem. +Previous efforts to improve the wheel specification +:pep:`were deferred <491#pep-deferral>` to focus on other packaging +specifications. Meanwhile, the use of wheels has changed dramatically in the +last decade. There have been many requests for new wheel features over the +years; however, a fundamental obstacle to evolving the wheel specification has +been that there is no defined process for how to handle adding +backwards-incompatible features to wheels. Therefore, to enable other PEPs to +describe new enhancements to the wheel specification, **this PEP prescribes** +**compatibility requirements on future wheel revisions**. This PEP does *not* +specify a new wheel revision. The specification of a new wheel format +(“Wheel 2.0”) is left to a future PEP. + +Rationale +========= + +Currently, wheel specification changes that require new installer behavior are backwards incompatible and require a major version increase in +the wheel metadata format. 
An increase of the wheel major version has yet to +happen, partially because such a change has the potential to be +catastrophically disruptive. Per +`the wheel specification `_, +any installer that does not support the new major version must abort at install +time. This means that if the major version were to be incremented without +further planning, many users would see installation failures as older installers reject new wheels +uploaded to public package indices like the Python Package Index (PyPI). It is +critically important to carefully plan the interactions between build tools, +package indices, and package installers to avoid incompatibility issues, +especially considering the long tail of users who are slow to update their +installers. + +The backward compatibility concerns have prevented valuable improvements +to the wheel file format, such as +`better compression `_, +`wheel data format improvements `_, +`better information about what is included in a wheel `_, +and `JSON formatted metadata in the ".dist-info" folder `_. + +This PEP describes constraints and behavior for new wheel revisions to preserve +stability for existing tools that do not support a new major version of the wheel format. +This ensures that backwards incompatible changes to the wheel specification +will only affect users and tools that are properly set up to use the newer +wheels. With a clear path for evolving the wheel specification, future PEPs +will be able to improve the wheel format without needing to re-define a +completely new compatibility story. + +Specification +============= + +Add Wheel-Version Metadata Field to Core Metadata +------------------------------------------------- + +Currently, the :pep:`wheel 1.0 PEP <427>`, PEP 427, specifies that wheel files +must contain a ``WHEEL`` metadata file that contains the version of the wheel +specification that the file conforms to. 
PEP 427 stipulates that installers +MUST warn on installation of a wheel with a minor version greater than supported, +and MUST abort on installation of wheels with a major version that is greater than +what the installer supports. This ensures that users do not get invalid +installations from wheels that installers cannot properly install. + +However, resolvers do not currently exclude wheels with an incompatible wheel +version. There is also currently no way for a resolver to check a wheel's +version without downloading the wheel directly. To make wheel version filtering +easy for resolvers, the wheel version **MUST** be included in the relevant +metadata file (currently METADATA). This will allow resolvers to efficiently +check the wheel version using the :pep:`658` metadata API without needing to +download and inspect the ``.dist-info/WHEEL`` file. + +To accomplish this, a new core metadata field is introduced called +``Wheel-Version``. While this field is optional for metadata included in a +wheel of major version 1, it is a mandatory field for metadata in wheels of major +version 2 or higher. This enforces that future revisions of the wheel +specification can rely on resolvers skipping incompatible wheels by checking +the ``Wheel-Version`` field. + +The ``Wheel-Version`` field in the metadata file shall contain the exact same entry as the +``Wheel-Version`` entry in the ``WHEEL`` file, or any future replacement file +defining metadata about the wheel file. Installers **MUST** verify that these +entries match when installing a wheel. If ``Wheel-Version`` is absent from the +metadata file, then the implied major version of the wheel is 1. + +Resolver Behavior Regarding ``Wheel-Version`` +--------------------------------------------- + +Resolvers, in the process of selecting a wheel to install, **MUST** check a +candidate wheel's ``Wheel-Version``, and ignore incompatible wheel files. 
+Without ignoring these files, older installers might select a wheel that uses +an unsupported wheel version for that installer, and force the installer to +abort per :pep:`427`. By skipping incompatible wheel files, users will not see +installation errors when a project adopts a new wheel major version. As already +specified in PEP 427, installers **MUST** abort if a user tries to directly +install a wheel that is incompatible. If, in the process of resolving packages +found in multiple indices, a resolver comes across two wheels of the same +distribution and version, it should prioritize the wheel of the highest +compatible wheel version. + +While the above protects users from unexpected breakages, users may miss a new +release of a distribution if their installer does not support the wheel version +used in the release. Imagine in the future that a package publishes 3.0 wheel +files. Downstream users won't see that there is a new release available if +their installers only support 2.x wheels. Therefore, installers **SHOULD** emit +a warning if, in the process of resolving packages, they come across an incompatible wheel +and skip it. + +First Major Version Bump Must Change File Extension +--------------------------------------------------- + +Unfortunately, existing resolvers do not check the compatibility of wheels +before selecting them as installation candidates. Until a majority of users +update to installers that properly check for wheel compatibility, it is unsafe +to allow publishing wheels of a new major version that existing resolvers might +select. It could take upwards of four years before the majority of users are on +updated resolvers, based on current data about PyPI installer usage (see the +:ref:`777-pypi-download-analysis` for +details). To allow for experimentation and faster adoption of 2.0 wheels, +this PEP proposes a one-time change to the file extension of the +wheel file format, from ``.whl`` to ``.whlx``. 
This resolves the initial +transition issue of 2.0 wheels breaking users on existing installers that do +not implement ``Wheel-Version`` checks. By using a different file extension, +2.0 wheels can immediately be uploaded to PyPI, and users will be able to +experiment with the new features right away. Users on older installers will +simply ignore these new files. + +One rejected alternative would be to keep the ``.whl`` extension, but delay the +publishing of wheel 2.0 to PyPI. For more on that, please see Rejected Ideas. + +Recommended Build Backend Behavior with New Wheel Formats +--------------------------------------------------------- + +Build backends are recommended to generate the most compatible wheel based on +features a project uses. For example, if a wheel does not use symbolic links, +and such a feature was introduced in wheel 5.0, the build backend could +generate a wheel of version 4.0. On the other hand, some features will want to +be adopted by default. For example, if wheel 3.0 introduces better compression, +the build backend may wish to enable this feature by default to improve the +wheel size and download performance. + +Limitations on Future Wheel Revisions +------------------------------------- + +While it is difficult to know what future features may be planned for the wheel +format, it is important that certain compatibility promises are maintained. + +Wheel files, when installed, **MUST** stay compatible with the Python standard +library's ``importlib.metadata`` for all supported CPython versions. For +example, replacing ``.dist-info/METADATA`` with a JSON formatted metadata file +MUST be a multi-major version migration with one version introducing the new +JSON file alongside the existing email header format, and another future +version removing the email header format metadata file. 
The version to remove +``.dist-info/METADATA`` also **MUST** be adopted only after the last CPython +release that lacked support for the new file reaches end of life. This ensures +that code using ``importlib.metadata`` will not break with wheel major version +revisions. + +Wheel files **MUST** remain ZIP format files as the outer container format. +Additionally, the ``.dist-info`` metadata directory **MUST** be placed at the +root of the archive without any compression, so that unpacking the wheel file +produces a normal ``.dist-info`` directory holding any metadata for the wheel. +Future wheel revisions **MAY** modify the layout, compression, and other +attributes about non-metadata components of a wheel such as data and code. This +assures that future wheel revisions remain compatible with tools operating on +package metadata, while allowing for improvements to code storage in the wheel, +such as adopting compression. + +Package tooling **MUST NOT** assume that the contents and format of the wheel +file will remain the same for future wheel major versions beyond the +limitations above about metadata folder contents and outer container format. +For example, newer wheel major versions may add or remove filename components, +such as the build tag or the platform tag. Therefore it is incumbent upon +tooling to check the metadata for the ``Wheel-Version`` before attempting to +install a wheel. + +Finally, future wheel revisions **MUST NOT** use any compression formats not in +the CPython standard library of at least the latest release. Wheels generated +using any new compression format should be tagged as requiring at least the +first released version of CPython to support the new compression format, +regardless of the Python API compatibility of the code within the wheel. + +Backwards Compatibility +======================= + +Backwards compatibility is an incredibly important issue for evolving the wheel +format. 
If adopting a new wheel revision is painful for downstream users, +package creators will hesitate to adopt the new standards, and users will be +stuck with failed CI pipelines and other installation woes. + +Several choices in the above specification are made so that the adoption of a +new feature is less painful. For example, today wheels of an incompatible major +version are still selected by pip as installation candidates, which causes +installer failures if a project starts publishing 2.0 wheels. To avoid this +issue, this PEP requires resolvers to filter out wheels with major versions or +features incompatible with the installer. + +This PEP also defines constraints on future wheel revisions, with the goal of +maintaining compatibility with CPython, but allowing evolution of wheel +contents. Wheel revisions shouldn't cause package installations to break on +older CPython revisions, as not only would it be frustrating, it would be +incredibly hard to debug for users. + +The main compatibility limitation of this PEP is for projects that start +publishing solely new wheels alongside a source distribution. If a user on an +older installer tries to install the package, it will fall back to the source +distribution, because the resolver will skip all newer wheels. Users are often +poorly set up to build projects from source, so this could lead to some failed +builds users would not see otherwise. There are several approaches to resolving +this issue, such as allowing dual-publishing for the initial migration, or +marking source distributions as not intended to be built. + +Rejected Ideas +============== + +The Wheel Format is Perfect and Does not Need to be Changed +----------------------------------------------------------- +The wheel format has been around for over 10 years, and in that time, Python +packages have changed a lot. It is much more common for packages to include +Rust or C extension modules, increasing the size of packages. 
Better +compression, such as lzma or zstd, could save a lot of time and bandwidth for +PyPI and its users. Compatibility tags cannot express the wide variety of +hardware used to accelerate Python code today, nor encode shared library +compatibility information. In order to address these issues, evolution of the +wheel package format is necessary. + +Wheel Format Changes Should be Tied to CPython Releases +------------------------------------------------------- +I do not believe that tying wheel revisions to CPython +releases is beneficial. The main benefit of doing so is to make adoption of new +wheels predictable - users with the latest CPython get the latest package +format! This choice has several issues however. First, tying the new format +to the latest CPython makes adoption much slower. Users on LTS versions of +Linux with older Python installations are free to update their pip in a virtual +environment, but cannot update the version of Python as easily. While some +changes to the wheel format must be tied to CPython changes necessarily, such +as adding new compression formats or changing the metadata format, many changes +do not need to be tied to the Python version, such as symlinks, enhanced +compatibility tags, and new formats that use existing compression formats in +the standard library. Additionally, wheels are used across multiple different +language implementations, which lag behind the CPython version. It seems unfair +to prevent their users from using a feature due to the Python version. Lastly, +while this PEP does not suggest tying the wheel version to CPython releases, a +future PEP may still do so at any time, so this choice does not need to be made +in this PEP. + +Keep Using ``.whl`` as the File Extension +----------------------------------------- +While keeping the extension ``.whl`` is appealing for many reasons, it presents +several problems that are difficult to surmount. 
First, current installers +would still pick a new wheel and fail to install the package. Furthermore, +the file name of a wheel would not be able to change without breaking existing +installers that expect a set wheel file name format. While the current filename +specification for wheels is sufficient for current usage, the optional +build tag in the middle of the file name makes any extensions ambiguous (e.g. +``foo-0.3-py3-none-any-fancy_new_tag.whl`` would parse as the build tag being +``py3``). This limits changes to information stored in the wheel file name. + +Discussion Topics +================= + +Should Indices Support Dual-publishing for the First Migration? +--------------------------------------------------------------- +Since ``.whl`` and ``.whlx`` will look different in file name, they could be +uploaded side-by-side to package indices like PyPI. This has some nice +benefits, like dual-support for older and newer installers, so users who upgrade can +get the latest features, while users who don't upgrade can still install the +latest version of a package. + +There are many complications, however. Should we allow wheel 2 uploads to +existing wheel 1-only releases? Should we put any requirements on the +side-by-side wheels, such as: + +.. admonition:: Constraints on dual-published wheels + + A given index may contain identical-content wheels with different wheel + versions, and installers should prefer the newest-available wheel format, + with all other factors held constant. + +Should we only allow uploading both with :pep:`694` allowing "atomic" +dual-publishing? + +Acknowledgements +================ + +The author of this PEP is greatly indebted to the incredibly valuable review, +advice, and feedback of Barry Warsaw and Michael Sarahan. + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. 
diff --git a/peps/pep-0777/appendix-dl-by-installer.png b/peps/pep-0777/appendix-dl-by-installer.png new file mode 100644 index 000000000..7d8cd8568 Binary files /dev/null and b/peps/pep-0777/appendix-dl-by-installer.png differ diff --git a/peps/pep-0777/appendix-dl-by-pip-version.png b/peps/pep-0777/appendix-dl-by-pip-version.png new file mode 100644 index 000000000..2b23afb9b Binary files /dev/null and b/peps/pep-0777/appendix-dl-by-pip-version.png differ diff --git a/peps/pep-0777/appendix-pypi-download-analysis.rst b/peps/pep-0777/appendix-pypi-download-analysis.rst new file mode 100644 index 000000000..227355c2c --- /dev/null +++ b/peps/pep-0777/appendix-pypi-download-analysis.rst @@ -0,0 +1,78 @@ +:orphan: + +.. _777-pypi-download-analysis: + +Appendix: Analysis of Installer Usage on PyPI +============================================= + +.. note:: + This analysis is not perfect. While it uses the best available data, + mirrors, caches used by enterprises, and other confounding factors + could affect the numbers in this analysis. Consider the numbers as trends + rather than concrete reliable figures. + +One pertinent question to :pep:`777` is how frequently Python users update their +installer. If users update quite frequently, compatibility concerns are not as +important; users will be up-to-date by the time new features get added. On the +other hand, if users are frequently using older installers, then incompatible +wheels on PyPI would have a much wider impact. To figure out the relative share +of up-to-date vs outdated installers, we can use PyPI download statistics. + +PyPI publishes a `BigQuery dataset `_, +which contains information about each download PyPI receives, including +installer name and version when available. The following query was used to +collect the data for this analysis: + +.. 
code-block:: sql + + #standardSQL + SELECT + details.installer.name as installer_name, + details.installer.version as installer_version, + COUNT(*) as num_downloads, + FROM `bigquery-public-data.pypi.file_downloads` + WHERE + -- Only query the last 6 months of data + DATE(timestamp) + BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH) + AND CURRENT_DATE() + GROUP BY `installer_name`, `installer_version` + ORDER BY `num_downloads` DESC + +With the raw data available, we can start investigating how up-to-date +installers that download packages from PyPI are. The below chart shows the +breakdown by installer name of all downloads on PyPI for the six month period +from March 10, 2024 to September 10, 2024. + +.. image:: appendix-dl-by-installer.png + :class: invert-in-dark-mode + :width: 600 + :alt: A pie chart breaking down PyPI downloads by installer. pip makes up + 87.5%, uv makes up 4.8%, poetry makes up 3.0%, requests makes up 1.6%, + and "null" makes up 2.1%. + +As can be seen above, pip is the most popular installer in this time frame. +For simplicity's sake, this analysis will focus on pip installations when +considering how up-to-date installers are. pip has existed for a long +time, so analyzing the version of pip used to download packages should +provide an idea of how frequently users update their installers. Below is a +chart breaking down installations in PyPI over the same six month period, now +grouped by pip installer major version. pip uses calendar versioning, so +an installation from pip 20.x means that the user has not updated their pip +in four years. + +.. image:: appendix-dl-by-pip-version.png + :class: invert-in-dark-mode + :width: 600 + :alt: A pie chart breaking down PyPI downloads by pip major version. 24.x + makes up 47.7%, 23.x makes up 19.9%, 22.x makes up 10.5%, 21.x makes up + 13.9%, 20.x makes up 5.4%, and 9.x makes up 1.9%. + +Over two thirds of users currently run pip from this year or last. 
However, +about 7% are on a version that is at least four years old(!). This indicates that +there is a long tail of users who do not regularly update their installers. + +Coming back to the initial question for PEP 777, it appears that caution should +be taken when publishing wheels with major version 2 to PyPI, as they are +likely to cause issues with a small but significant proportion of users who do +not regularly update their pip.