79 lines
3.6 KiB
ReStructuredText
79 lines
3.6 KiB
ReStructuredText
:orphan:
|
|
|
|
.. _777-pypi-download-analysis:
|
|
|
|
Appendix: Analysis of Installer Usage on PyPI
|
|
=============================================
|
|
|
|
.. note::
|
|
This analysis is not perfect. While it uses the best available data,
|
|
mirrors, caches used by enterprises, and other confounding factors
|
|
could affect the numbers in this analysis. Consider the numbers as trends
|
|
rather than concrete reliable figures.
|
|
|
|
One pertinent question to :pep:`777` is how frequently Python users update their
|
|
installer. If users update quite frequently, compatibility concerns are not as
|
|
important; users will be up-to-date by the time new features get added. On the
|
|
other hand, if users are frequently using older installers, then incompatible
|
|
wheels on PyPI would have a much wider impact. To figure out the relative share
|
|
of up-to-date vs outdated installers, we can use PyPI download statistics.
|
|
|
|
PyPI publishes a `BigQuery dataset <https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi>`_,
|
|
which contains information about each download PyPI receives, including
|
|
installer name and version when available. The following query was used to
|
|
collect the data for this analysis:
|
|
|
|
.. code-block:: sql
|
|
|
|
#standardSQL
|
|
SELECT
|
|
details.installer.name as installer_name,
|
|
details.installer.version as installer_version,
|
|
COUNT(*) as num_downloads,
|
|
FROM `bigquery-public-data.pypi.file_downloads`
|
|
WHERE
|
|
-- Only query the last 6 months of data
|
|
DATE(timestamp)
|
|
BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
|
|
AND CURRENT_DATE()
|
|
GROUP BY `installer_name`, `installer_version`
|
|
ORDER BY `num_downloads` DESC
|
|
|
|
With the raw data available, we can start investigating how up-to-date
|
|
installers that download packages from PyPI are. The below chart shows the
|
|
breakdown by installer name of all downloads on PyPI for the six month period
|
|
from March 10, 2024 to September 10, 2024.
|
|
|
|
.. image:: appendix-dl-by-installer.png
|
|
:class: invert-in-dark-mode
|
|
:width: 600
|
|
:alt: A pie chart breaking down PyPI downloads by installer. pip makes up
|
|
87.5%, uv makes up 4.8%, poetry makes up 3.0%, requests makes up 1.6%,
|
|
and "null" makes up 2.1%.
|
|
|
|
As can be seen above, pip is the most popular installer in this time frame.
|
|
For simplicity's sake, this analysis will focus on pip installations when
|
|
considering how up-to-date installers are. pip has existed for a long
|
|
time, so analyzing the version of pip used to download packages should
|
|
provide an idea of how frequently users update their installers. Below is a
|
|
chart breaking down installations in PyPI over the same six month period, now
|
|
grouped by pip installer major version. pip uses calendar versioning, so
|
|
an installation from pip 20.x means that the user has not updated their pip
|
|
in four years.
|
|
|
|
.. image:: appendix-dl-by-pip-version.png
|
|
:class: invert-in-dark-mode
|
|
:width: 600
|
|
:alt: A pie chart breaking down PyPI downloads by pip major version. 24.x
|
|
makes up 47.7%, 23.x makes up 19.9%, 22.x makes up 10.5%, 21.x makes up
|
|
13.9%, 20.x makes up 5.4%, and 9.x makes up 1.9%.
|
|
|
|
Over two thirds of users currently run pip from this year or last. However,
|
|
about 7% are on a version that is at least four years old(!). This indicates that
|
|
there is a long tail of users who do not regularly update their installers.
|
|
|
|
Coming back to the initial question for PEP 777, it appears that caution should
|
|
be taken when publishing wheels with major version 2 to PyPI, as they are
|
|
likely to cause issues with a small but significant proportion of users who do
|
|
not regularly update their pip.
|