PEP 777: How to Re-invent the Wheel (#4036)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Ethan Smith 2024-10-10 09:51:33 -07:00 committed by GitHub
parent 3b1b3d8ba3
commit fd8070858f
5 changed files with 383 additions and 0 deletions

2
.github/CODEOWNERS vendored

@ -640,6 +640,8 @@ peps/pep-0759.rst @warsaw
peps/pep-0760.rst @pablogsal @brettcannon
peps/pep-0761.rst @sethmlarson @hugovk
# ...
peps/pep-0777.rst @warsaw
# ...
peps/pep-0789.rst @njsmith
# ...
peps/pep-0801.rst @warsaw

303
peps/pep-0777.rst Normal file

@ -0,0 +1,303 @@
PEP: 777
Title: How to Re-invent the Wheel
Author: Ethan Smith <ethan@ethanhs.me>
Sponsor: Barry Warsaw <barry@python.org>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 09-Oct-2024
Post-History:
Abstract
========
The current :pep:`wheel 1.0 specification <427>` was written over a decade ago,
and has been extremely robust to changes in the Python packaging ecosystem.
Previous efforts to improve the wheel specification
:pep:`were deferred <491#pep-deferral>` to focus on other packaging
specifications. Meanwhile, the use of wheels has changed dramatically in the
last decade. There have been many requests for new wheel features over the
years; however, a fundamental obstacle to evolving the wheel specification has
been that there is no defined process for how to handle adding
backwards-incompatible features to wheels. Therefore, to enable other PEPs to
describe new enhancements to the wheel specification, **this PEP prescribes**
**compatibility requirements on future wheel revisions**. This PEP does *not*
specify a new wheel revision. The specification of a new wheel format
(“Wheel 2.0”) is left to a future PEP.
Rationale
=========
Currently, wheel specification changes that require new installer behavior are backwards incompatible and require a major version increase in
the wheel metadata format. An increase of the wheel major version has yet to
happen, partially because such a change has the potential to be
catastrophically disruptive. Per
`the wheel specification <https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl>`_,
any installer that does not support the new major version must abort at install
time. This means that if the major version were to be incremented without
further planning, many users would see installation failures as older installers reject new wheels
uploaded to public package indices like the Python Package Index (PyPI). It is
critically important to carefully plan the interactions between build tools,
package indices, and package installers to avoid incompatibility issues,
especially considering the long tail of users who are slow to update their
installers.
The backward compatibility concerns have prevented valuable improvements
to the wheel file format, such as
`better compression <https://discuss.python.org/t/improving-wheel-compression-by-nesting-data-as-a-second-zip/1747>`_,
`wheel data format improvements <https://discuss.python.org/t/should-there-be-a-new-standard-for-installing-arbitrary-data-files/7853/7>`_,
`better information about what is included in a wheel <https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494>`_,
and `JSON formatted metadata in the ".dist-info" folder <https://discuss.python.org/t/is-was-there-a-goal-with-pep-566s-json-encoding-section/12324/3>`_.
This PEP describes constraints and behavior for new wheel revisions to preserve
stability for existing tools that do not support a new major version of the wheel format.
This ensures that backwards incompatible changes to the wheel specification
will only affect users and tools that are properly set up to use the newer
wheels. With a clear path for evolving the wheel specification, future PEPs
will be able to improve the wheel format without needing to re-define a
completely new compatibility story.
Specification
=============
Add Wheel-Version Metadata Field to Core Metadata
-------------------------------------------------
Currently, the wheel 1.0 PEP, :pep:`427`, specifies that wheel files
must contain a ``WHEEL`` metadata file that contains the version of the wheel
specification that the file conforms to. PEP 427 stipulates that installers
MUST warn on installation of a wheel with a minor version greater than supported,
and MUST abort on installation of wheels with a major version that is greater than
what the installer supports. This ensures that users do not get invalid
installations from wheels that installers cannot properly install.
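The PEP 427 rule described above can be sketched as a small pure function. This is only an illustration, not code from any real installer: the name ``classify_wheel_version`` and the returned strings are assumptions made for the example.

```python
def classify_wheel_version(found: str, supported: tuple[int, int] = (1, 0)) -> str:
    """Classify a ``Wheel-Version`` string against what an installer supports.

    Hypothetical helper: returns "abort", "warn", or "install".
    """
    found_major, found_minor = (int(part) for part in found.strip().split("."))
    supported_major, supported_minor = supported
    if found_major > supported_major:
        # Newer major version: the installer must refuse to install.
        return "abort"
    if found_major == supported_major and found_minor > supported_minor:
        # Same major, newer minor: install, but warn the user.
        return "warn"
    return "install"


print(classify_wheel_version("1.0"))
print(classify_wheel_version("1.1"))
print(classify_wheel_version("2.0"))
```

A real installer would raise an error or emit a warning instead of returning a string; the string result is used here only to keep the sketch easy to inspect.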
However, resolvers do not currently exclude wheels with an incompatible wheel
version. There is also currently no way for a resolver to check a wheel's
version without downloading the wheel directly. To make wheel version filtering
easy for resolvers, the wheel version **MUST** be included in the relevant
metadata file (currently METADATA). This will allow resolvers to efficiently
check the wheel version using the :pep:`658` metadata API without needing to
download and inspect the ``.dist-info/WHEEL`` file.
To accomplish this, a new core metadata field is introduced called
``Wheel-Version``. While this field is optional for metadata included in a
wheel of major version 1, it is a mandatory field for metadata in wheels of major
version 2 or higher. This enforces that future revisions of the wheel
specification can rely on resolvers skipping incompatible wheels by checking
the ``Wheel-Version`` field.
The ``Wheel-Version`` field in the metadata file shall contain the exact same entry as the
``Wheel-Version`` entry in the ``WHEEL`` file, or any future replacement file
defining metadata about the wheel file. Installers **MUST** verify that these
entries match when installing a wheel. If ``Wheel-Version`` is absent from the
metadata file, then the implied major version of the wheel is 1.
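As a rough sketch of the consistency check described above, the following compares the ``Wheel-Version`` entries of the two files using the standard library's ``email.parser``. The function names are hypothetical, and raising ``ValueError`` on a mismatch is an assumption of this example; the PEP only requires that installers verify the entries match.

```python
from email.parser import HeaderParser


def read_wheel_version(text: str):
    """Return the ``Wheel-Version`` header from RFC 822 style text, or None."""
    value = HeaderParser().parsestr(text).get("Wheel-Version")
    return value.strip() if value is not None else None


def verified_major_version(metadata_text: str, wheel_text: str) -> int:
    """Return the wheel major version after checking METADATA against WHEEL."""
    from_wheel = read_wheel_version(wheel_text)
    if from_wheel is None:
        raise ValueError("WHEEL file has no Wheel-Version entry")
    major = int(from_wheel.split(".")[0])

    from_metadata = read_wheel_version(metadata_text)
    if from_metadata is None:
        # Absent from METADATA: the implied major version is 1, so the
        # field may only be omitted when the WHEEL file also says 1.x.
        if major != 1:
            raise ValueError("Wheel-Version is mandatory in METADATA for major version 2+")
        return 1
    if from_metadata != from_wheel:
        raise ValueError(f"METADATA says {from_metadata!r}, WHEEL says {from_wheel!r}")
    return major


wheel_file = "Wheel-Version: 2.0\nGenerator: example-backend\n"
metadata_file = "Name: demo\nVersion: 1.0\nWheel-Version: 2.0\n"
print(verified_major_version(metadata_file, wheel_file))
```

The sample file contents are invented for illustration and omit most of the fields a real ``METADATA`` or ``WHEEL`` file carries.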
Resolver Behavior Regarding ``Wheel-Version``
---------------------------------------------
Resolvers, in the process of selecting a wheel to install, **MUST** check a
candidate wheel's ``Wheel-Version``, and ignore incompatible wheel files.
Without ignoring these files, older installers might select a wheel that uses
an unsupported wheel version for that installer, and force the installer to
abort per :pep:`427`. By skipping incompatible wheel files, users will not see
installation errors when a project adopts a new wheel major version. As already
specified in PEP 427, installers **MUST** abort if a user tries to directly
install a wheel that is incompatible. If, in the process of resolving packages
found in multiple indices, a resolver comes across two wheels of the same
distribution and version, it should prioritize the wheel with the highest
compatible wheel version.
While the above protects users from unexpected breakages, users may miss a new
release of a distribution if their installer does not support the wheel version
used in the release. Imagine in the future that a package publishes 3.0 wheel
files. Downstream users won't see that there is a new release available if
their installers only support 2.x wheels. Therefore, installers **SHOULD** emit
a warning if, in the process of resolving packages, they come across an incompatible wheel
and skip it.
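The resolver behavior in this section might look roughly like the sketch below. It assumes the candidates are wheels of the same distribution and version that are otherwise equally suitable, and that each candidate's ``Wheel-Version`` has already been parsed from its metadata. ``Candidate`` and ``select_wheel`` are hypothetical names, not part of any existing resolver.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    filename: str
    wheel_version: tuple[int, int]  # parsed from the Wheel-Version metadata field


def select_wheel(candidates, supported_major: int):
    """Skip incompatible wheels, then prefer the highest compatible version."""
    compatible = []
    for candidate in candidates:
        if candidate.wheel_version[0] > supported_major:
            # The PEP says installers SHOULD warn when they skip a wheel.
            warnings.warn(
                f"skipping {candidate.filename}: wheel major version "
                f"{candidate.wheel_version[0]} is not supported"
            )
            continue
        compatible.append(candidate)
    if not compatible:
        return None
    return max(compatible, key=lambda candidate: candidate.wheel_version)
```

Returning ``None`` when nothing is compatible leaves the caller free to fall back to a source distribution or report an error.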
First Major Version Bump Must Change File Extension
---------------------------------------------------
Unfortunately, existing resolvers do not check the compatibility of wheels
before selecting them as installation candidates. Until a majority of users
update to installers that properly check for wheel compatibility, it is unsafe
to allow publishing wheels of a new major version that existing resolvers might
select. It could take upwards of four years before the majority of users are on
updated resolvers, based on current data about PyPI installer usage (see
:ref:`777-pypi-download-analysis` for
details). To allow for experimentation and faster adoption of 2.0 wheels,
this PEP proposes a one-time change to the file extension of the
wheel file format, from ``.whl`` to ``.whlx``. This resolves the initial
transition issue of 2.0 wheels breaking users on existing installers that do
not implement ``Wheel-Version`` checks. By using a different file extension,
2.0 wheels can immediately be uploaded to PyPI, and users will be able to
experiment with the new features right away. Older installers will
simply ignore these new files.
One rejected alternative would be to keep the ``.whl`` extension, but delay the
publishing of wheel 2.0 to PyPI. For more on that, please see Rejected Ideas.
Recommended Build Backend Behavior with New Wheel Formats
---------------------------------------------------------
Build backends are recommended to generate the most compatible wheel based on
features a project uses. For example, if a wheel does not use symbolic links,
and such a feature was introduced in wheel 5.0, the build backend could
generate a wheel of version 4.0. On the other hand, some features will want to
be adopted by default. For example, if wheel 3.0 introduces better compression,
the build backend may wish to enable this feature by default to improve the
wheel size and download performance.
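One way a build backend could follow this recommendation is sketched below. The feature names and the versions that introduce them are the hypothetical ones used in the paragraph above (symbolic links in wheel 5.0, better compression in wheel 3.0); no such wheel versions exist today, and ``minimum_wheel_version`` is an invented helper.

```python
# Hypothetical mapping from a feature to the wheel version that introduced it.
FEATURE_INTRODUCED_IN = {
    "better_compression": (3, 0),
    "symlinks": (5, 0),
}
BASELINE = (1, 0)


def minimum_wheel_version(used_features, default_on=frozenset()):
    """Return the lowest wheel version that covers every required feature."""
    required = set(used_features) | set(default_on)
    return max([BASELINE, *(FEATURE_INTRODUCED_IN[name] for name in required)])


print(minimum_wheel_version([]))
print(minimum_wheel_version(["symlinks"]))
print(minimum_wheel_version([], default_on={"better_compression"}))
```

The ``default_on`` parameter models features a backend chooses to enable even when the project does not ask for them, such as better compression.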
Limitations on Future Wheel Revisions
-------------------------------------
While it is difficult to know what future features may be planned for the wheel
format, it is important that certain compatibility promises are maintained.
Wheel files, when installed, **MUST** stay compatible with the Python standard
library's ``importlib.metadata`` for all supported CPython versions. For
example, replacing ``.dist-info/METADATA`` with a JSON formatted metadata file
**MUST** be a multi-major version migration, with one version introducing the new
JSON file alongside the existing email header format, and another future
version removing the email header format metadata file. The version to remove
``.dist-info/METADATA`` also **MUST** be adopted only after the last CPython
release that lacked support for the new file reaches end of life. This ensures
that code using ``importlib.metadata`` will not break with wheel major version
revisions.
Wheel files **MUST** remain ZIP format files as the outer container format.
Additionally, the ``.dist-info`` metadata directory **MUST** be placed at the
root of the archive without any compression, so that unpacking the wheel file
produces a normal ``.dist-info`` directory holding any metadata for the wheel.
Future wheel revisions **MAY** modify the layout, compression, and other
attributes of non-metadata components of a wheel, such as data and code. This
ensures that future wheel revisions remain compatible with tools operating on
package metadata, while allowing for improvements to code storage in the wheel,
such as adopting compression.
Package tooling **MUST NOT** assume that the contents and format of the wheel
file will remain the same for future wheel major versions beyond the
limitations above about metadata folder contents and outer container format.
For example, newer wheel major versions may add or remove filename components,
such as the build tag or the platform tag. Therefore it is incumbent upon
tooling to check the metadata for the ``Wheel-Version`` before attempting to
install a wheel.
Finally, future wheel revisions **MUST NOT** use any compression format that is
not available in the standard library of at least the latest CPython release. Wheels generated
using any new compression format should be tagged as requiring at least the
first released version of CPython to support the new compression format,
regardless of the Python API compatibility of the code within the wheel.
Backwards Compatibility
=======================
Backwards compatibility is an incredibly important issue for evolving the wheel
format. If adopting a new wheel revision is painful for downstream users,
package creators will hesitate to adopt the new standards, and users will be
stuck with failed CI pipelines and other installation woes.
Several choices in the above specification are made so that the adoption of a
new feature is less painful. For example, today wheels of an incompatible major
version are still selected by pip as installation candidates, which causes
installer failures if a project starts publishing 2.0 wheels. To avoid this
issue, this PEP requires resolvers to filter out wheels with major versions or
features incompatible with the installer.
This PEP also defines constraints on future wheel revisions, with the goal of
maintaining compatibility with CPython, but allowing evolution of wheel
contents. Wheel revisions shouldn't cause package installations to break on
older CPython revisions, as not only would it be frustrating, it would be
incredibly hard to debug for users.
The main compatibility limitation of this PEP is for projects that start
publishing only new-format wheels alongside a source distribution. If a user on an
older installer tries to install the package, the installer will fall back to the source
distribution, because the resolver will skip all newer wheels. Users are often
poorly set up to build projects from source, so this could lead to some failed
builds users would not see otherwise. There are several approaches to resolving
this issue, such as allowing dual-publishing for the initial migration, or
marking source distributions as not intended to be built.
Rejected Ideas
==============
The Wheel Format is Perfect and Does not Need to be Changed
-----------------------------------------------------------
The wheel format has been around for over 10 years, and in that time, Python
packages have changed a lot. It is much more common for packages to include
Rust or C extension modules, increasing the size of packages. Better
compression, such as lzma or zstd, could save a lot of time and bandwidth for
PyPI and its users. Compatibility tags cannot express the wide variety of
hardware used to accelerate Python code today, nor encode shared library
compatibility information. In order to address these issues, evolution of the
wheel package format is necessary.
Wheel Format Changes Should be Tied to CPython Releases
-------------------------------------------------------
The author does not believe that tying wheel revisions to CPython
releases is beneficial. The main benefit of doing so is to make adoption of new
wheels predictable: users with the latest CPython get the latest package
format. However, this choice has several issues. First, tying the new format
to the latest CPython makes adoption much slower. Users on LTS versions of
Linux with older Python installations are free to update their pip in a virtual
environment, but cannot update the version of Python as easily. While some
changes to the wheel format must be tied to CPython changes necessarily, such
as adding new compression formats or changing the metadata format, many changes
do not need to be tied to the Python version, such as symlinks, enhanced
compatibility tags, and new formats that use existing compression formats in
the standard library. Additionally, wheels are used across multiple different
language implementations, which lag behind the CPython version. It seems unfair
to prevent their users from using a feature due to the Python version. Lastly,
while this PEP does not suggest tying the wheel version to CPython releases, a
future PEP may still do so at any time, so this choice does not need to be made
in this PEP.
Keep Using ``.whl`` as the File Extension
-----------------------------------------
While keeping the extension ``.whl`` is appealing for many reasons, it presents
several problems that are difficult to surmount. First, current installers
would still pick a new wheel and fail to install the package. Furthermore,
the file name of a wheel would not be able to change without breaking existing
installers that expect a set wheel file name format. While the current filename
specification for wheels is sufficient for current usage, the optional
build tag in the middle of the file name makes any extensions ambiguous (e.g.,
``foo-0.3-py3-none-any-fancy_new_tag.whl`` would parse as the build tag being
``py3``). This limits changes to information stored in the wheel file name.
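The ambiguity can be seen with a deliberately naive parser that only splits on hyphens. This is an illustrative sketch, not the parser used by any real installer; a stricter parser that validates the build tag might reject such a name outright instead of misreading it.

```python
def naive_split_wheel_filename(filename: str) -> dict:
    """Split a wheel file name on hyphens, treating a sixth field as a build tag."""
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": platform,
    }


parsed = naive_split_wheel_filename("foo-0.3-py3-none-any-fancy_new_tag.whl")
print(parsed["build"], parsed["python"], parsed["abi"], parsed["platform"])
```

Because the extra component shifts every later field, the intended Python tag ``py3`` lands in the build tag slot and the new tag is read as the platform tag.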
Discussion Topics
=================
Should Indices Support Dual-publishing for the First Migration?
---------------------------------------------------------------
Since ``.whl`` and ``.whlx`` files will have different file names, they could be
uploaded side-by-side to package indices like PyPI. This has some nice
benefits, like dual support for older and newer installers: users who upgrade can
get the latest features, while users who don't upgrade can still install the
latest version of a package.
There are many complications, however. Should we allow wheel 2 uploads to
existing wheel 1-only releases? Should we put any requirements on the
side-by-side wheels, such as:

.. admonition:: Constraints on dual-published wheels

   A given index may contain identical-content wheels with different wheel
   versions, and installers should prefer the newest-available wheel format,
   with all other factors held constant.

Should we only allow uploading both with :pep:`694` allowing "atomic"
dual-publishing?
Acknowledgements
================
The author of this PEP is greatly indebted to the incredibly valuable review,
advice, and feedback of Barry Warsaw and Michael Sarahan.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


@ -0,0 +1,78 @@
:orphan:
.. _777-pypi-download-analysis:
Appendix: Analysis of Installer Usage on PyPI
=============================================

.. note::

   This analysis is not perfect. While it uses the best available data,
   mirrors, caches used by enterprises, and other confounding factors
   could affect the numbers in this analysis. Consider the numbers as trends
   rather than concrete reliable figures.

One question pertinent to :pep:`777` is how frequently Python users update their
installers. If users update quite frequently, compatibility concerns are not as
important; users will be up-to-date by the time new features get added. On the
other hand, if users are frequently using older installers, then incompatible
wheels on PyPI would have a much wider impact. To figure out the relative share
of up-to-date vs outdated installers, we can use PyPI download statistics.
PyPI publishes a `BigQuery dataset <https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi>`_,
which contains information about each download PyPI receives, including
installer name and version when available. The following query was used to
collect the data for this analysis:

.. code-block:: sql

   #standardSQL
   SELECT
     details.installer.name as installer_name,
     details.installer.version as installer_version,
     COUNT(*) as num_downloads,
   FROM `bigquery-public-data.pypi.file_downloads`
   WHERE
     -- Only query the last 6 months of data
     DATE(timestamp)
       BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
         AND CURRENT_DATE()
   GROUP BY `installer_name`, `installer_version`
   ORDER BY `num_downloads` DESC

With the raw data available, we can start investigating how up-to-date
the installers that download packages from PyPI are. The chart below shows the
breakdown by installer name of all downloads on PyPI for the six-month period
from March 10, 2024 to September 10, 2024.

.. image:: appendix-dl-by-installer.png
   :class: invert-in-dark-mode
   :width: 600
   :alt: A pie chart breaking down PyPI downloads by installer. pip makes up
         87.5%, uv makes up 4.8%, poetry makes up 3.0%, requests makes up 1.6%,
         and "null" makes up 2.1%.

As can be seen above, pip is the most popular installer in this time frame.
For simplicity's sake, this analysis will focus on pip installations when
considering how up-to-date installers are. pip has existed for a long
time, so analyzing the version of pip used to download packages should
provide an idea of how frequently users update their installers. Below is a
chart breaking down downloads from PyPI over the same six-month period, now
grouped by pip major version. pip uses calendar versioning, so
a download from pip 20.x means that the user has not updated their pip
in four years.

.. image:: appendix-dl-by-pip-version.png
   :class: invert-in-dark-mode
   :width: 600
   :alt: A pie chart breaking down PyPI downloads by pip major version. 24.x
         makes up 47.7%, 23.x makes up 19.9%, 22.x makes up 10.5%, 21.x makes up
         13.9%, 20.x makes up 5.4%, and 9.x makes up 1.9%.

Over two thirds of users currently run pip from this year or last. However,
about 7% are on a version that is at least four years old(!). This indicates that
there is a long tail of users who do not regularly update their installers.
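The two figures in this paragraph follow directly from the percentages in the chart above. A short check, assuming the chart's rounded values are used as given:

```python
# Share of PyPI downloads by pip major version, in percent (from the chart).
pip_share = {
    "24.x": 47.7,
    "23.x": 19.9,
    "22.x": 10.5,
    "21.x": 13.9,
    "20.x": 5.4,
    "9.x": 1.9,
}

# pip released this year or last (24.x and 23.x).
recent = pip_share["24.x"] + pip_share["23.x"]
# pip that is at least four years old (20.x and 9.x).
old = pip_share["20.x"] + pip_share["9.x"]

print(f"recent: {recent:.1f}%")  # recent: 67.6%
print(f"old: {old:.1f}%")  # old: 7.3%
```

The rounded chart values do not sum to exactly 100%, so these totals are approximate.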
Coming back to the initial question for PEP 777, it appears that caution should
be taken when publishing wheels with major version 2 to PyPI, as they are
likely to cause issues with a small but significant proportion of users who do
not regularly update their pip.