PEP 527 - Removing Un(der)used file types/extensions on PyPI (#73)
parent 2245b0b334
commit f6d8653a77

@@ -0,0 +1,234 @@
PEP: 527
Title: Removing Un(der)used file types/extensions on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Donald Stufft <donald@stufft.io>
BDFL-Delegate: TBD <donald@stufft.io>
Discussions-To: distutils-sig@python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 23-Aug-2016
Post-History: 23-Aug-2016


Abstract
========

This PEP recommends deprecating, and ultimately removing, support for uploading
certain unused or underused file types and extensions to PyPI. In particular
it recommends disallowing further uploads of any files of the types
``bdist_dumb``, ``bdist_rpm``, ``bdist_dmg``, ``bdist_msi``, and
``bdist_wininst``, leaving PyPI to accept new uploads of only the ``sdist``,
``bdist_wheel``, and ``bdist_egg`` file types.

In addition, this PEP proposes removing support for new uploads of sdists using
the ``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``,
``.tbz``, and any other extension besides ``.tar.gz``.


Rationale
=========

File Formats
------------

Currently PyPI supports the following file types:

* ``sdist``
* ``bdist_wheel``
* ``bdist_egg``
* ``bdist_wininst``
* ``bdist_msi``
* ``bdist_dmg``
* ``bdist_rpm``
* ``bdist_dumb``

However, these different types of files have varying amounts of usefulness or
general use in the ecosystem. Continuing to support them adds a maintenance
burden on PyPI as well as tool authors and incurs a cost in both bandwidth and
disk space not only on PyPI itself, but also on any mirrors of PyPI.

bdist_dumb
~~~~~~~~~~

As its name implies, ``bdist_dumb`` is not a very complex format; however, it
is so simple as to be worthless for actual usage.

For instance, if you're using something like pyenv on macOS and you're building
a library using Python 3.5, then ``bdist_dumb`` will produce a ``.tar.gz`` file
named something like ``exampleproject-1.0.macosx-10.11-x86_64.tar.gz``. Right
off the bat this file name is somewhat difficult to differentiate from an
``sdist`` since they both use the same file extension (and with the legacy
pre-PEP 440 versions, ``1.0-macosx-10.11-x86_64`` is a valid, although quite
silly, version number). However, once you open up the created ``.tar.gz``,
you'd find that there is no metadata inside that could be used for things like
dependency discovery; in fact, it is quite simply a tarball containing
hardcoded paths to wherever files would have been installed on the computer
creating the ``bdist_dumb``. Going back to our pyenv on macOS example, this
means that if I created it, it would contain files like:

``Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py``
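
The difference from an ``sdist`` can be illustrated with a small heuristic.
This is a minimal sketch, not code from PyPI or any existing tool: it assumes
that an sdist tarball carries a top-level ``<name>-<version>/PKG-INFO`` member
(as sdists produced by distutils and setuptools do), and the helper names
``make_tarball`` and ``looks_like_sdist`` are hypothetical. Both archives are
built in memory purely for illustration.

```python
import io
import tarfile


def make_tarball(members):
    """Build an in-memory .tar.gz from a {path: bytes} mapping (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in members.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def looks_like_sdist(fileobj, name, version):
    """Heuristic (assumption): an sdist has a <name>-<version>/PKG-INFO member."""
    expected = "{}-{}/PKG-INFO".format(name, version)
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return expected in tar.getnames()


# An archive laid out the way an sdist is: one top-level project directory.
sdist_like = make_tarball({
    "exampleproject-1.0/PKG-INFO": b"Metadata-Version: 1.1\nName: exampleproject\nVersion: 1.0\n",
    "exampleproject-1.0/example.py": b"",
})

# An archive laid out the way the bdist_dumb example above is: install paths only.
dumb_like = make_tarball({
    "Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py": b"",
})

sdist_result = looks_like_sdist(sdist_like, "exampleproject", "1.0")
dumb_result = looks_like_sdist(dumb_like, "exampleproject", "1.0")
```

Both archives share the ``.tar.gz`` extension, so only a look inside the
archive, not the file name, separates them in this sketch.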


bdist_rpm
~~~~~~~~~

The ``bdist_rpm`` format on PyPI allows people to upload ``.rpm`` files for
end users to download and install by hand. However, the common usage of
``rpm`` is with a specially designed repository that allows automatic
installation of dependencies, upgrades, etc., which PyPI does not provide.
Thus, it is a type of file that is barely being used on PyPI, with only ~460
files of this type having been uploaded to PyPI (out of a total of 662,544).

In addition, services like `COPR <https://copr.fedorainfracloud.org/>`_ provide
a better supported mechanism for publishing and using RPM files than we're ever
likely to get on PyPI.


bdist_dmg, bdist_msi, and bdist_wininst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``bdist_dmg``, ``bdist_msi``, and ``bdist_wininst`` formats are similar in
that they are OS-specific installers that will only install a library into an
environment and are not designed for real user-facing installs of applications
(which would require things like bundling a Python interpreter and the like).

Out of these three, the usage for ``bdist_dmg`` and ``bdist_msi`` is very low,
with only ~500 ``bdist_msi`` files and ~50 ``bdist_dmg`` files having been
uploaded to PyPI. The ``bdist_wininst`` format has more use, with ~14,000 files
having ever been uploaded to PyPI.

It's quite easy to look at the low usage of ``bdist_dmg`` and ``bdist_msi`` and
conclude that removing them will be fairly low impact; however,
``bdist_wininst`` has one to two orders of magnitude more uploads. This is
somewhat misleading though, because although more people are *uploading* those
files, the actual usage of those uploaded files is fairly low. Taking a look at
the previous 30 days, we can see that 90% of all downloads of ``bdist_wininst``
files from PyPI were generated by the mirroring infrastructure and 7% of them
were generated by setuptools (which can currently be better covered by
``bdist_egg`` files).

Given the small number of files uploaded for ``bdist_dmg`` and ``bdist_msi``,
and that ``bdist_wininst`` files largely either consume bandwidth and disk
space via the mirroring infrastructure *or* could be trivially replaced with
``bdist_egg``, this PEP proposes to include these three formats in the list of
those to be disallowed.


File Extensions
---------------

Currently ``sdist`` supports a wide variety of file extensions like ``.tar.gz``,
``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``, and
``.tbz``. However, of those the only extensions which get anything more than
negligible usage are ``.tar.gz`` with 444,338 sdists currently, ``.zip`` with
58,774 sdists currently, and ``.tar.bz2`` with 3,265 sdists currently.
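
To put those counts in proportion, the short calculation below turns them into
shares. It is only a worked example: the three counts are the ones quoted
above, the rarely used extensions are left out (so the shares are approximate),
and the variable names are made up for this sketch. It assumes Python 3
division.

```python
# Upload counts quoted in the paragraph above.
sdist_counts = {".tar.gz": 444338, ".zip": 58774, ".tar.bz2": 3265}

# Total across the three extensions with non-negligible usage; the rarely
# used extensions are ignored here, so the shares are approximate.
total = sum(sdist_counts.values())

shares = {ext: count / total for ext, count in sdist_counts.items()}

# Print the extensions from most to least used.
for ext, share in sorted(shares.items(), key=lambda item: -item[1]):
    print("{:<9} {:6.2%}".format(ext, share))
```

Under these assumptions ``.tar.gz`` accounts for roughly 88% of the counted
sdists, ``.zip`` for roughly 12%, and ``.tar.bz2`` for under 1%.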

Having multiple formats accepted requires tooling both within PyPI and outside
of PyPI to handle all of the various extensions that *might* be used (even if
nobody is currently using them). This doesn't only affect PyPI, but ripples out
throughout the ecosystem. In addition, the different formats all have different
requirements for what optional C libraries Python was linked against and
different requirements for what versions of Python they support. Finally,
multiple formats also create a weird situation where there may be two
``sdist`` files for a particular project/release with subtly different content.

It's easy to advocate that anything outside of ``.tar.gz``, ``.zip``, and
``.tar.bz2`` should be disallowed. Outside of a tiny handful, nobody has
actively been uploading these other types of files in the ~15 years of PyPI's
existence, so they've obviously not been particularly useful. In addition, while
``.tar.xz`` is theoretically a nicer format than the other ``.tar.*`` formats
due to the better compression ratio achieved by LZMA, it is only available in
Python 3.3+ and has an optional dependency on the lzma C library.

Looking at the three extensions we *do* have in current use, it's also fairly
easy to conclude that ``.tar.bz2`` can be disallowed as well. It has a fairly
small number of files ever uploaded with it and it requires an additional
optional C library to handle the bzip2 compression.

Finally we get down to ``.tar.gz`` and ``.zip``. Looking at the pure numbers
for these two, we can see that ``.tar.gz`` is by far the most uploaded format,
with 444,338 total uploaded compared to ``.zip``'s 58,774, and on POSIX
operating systems ``.tar.gz`` is also the default produced by all currently
released versions of Python and setuptools. In addition, these two file types
both use the same C library (``zlib``) which is also required for
``bdist_wheel`` and ``bdist_egg``. The two wrinkles with deciding between
``.tar.gz`` and ``.zip`` are that while on POSIX operating systems ``.tar.gz``
is the default, on Windows ``.zip`` is the default, and that the
``bdist_wheel`` format also uses zip.

This PEP proposes that we drop the use of ``.zip`` extensions for sdists on
PyPI and standardize around ``.tar.gz``. For both extensions there is going to
be automation written by end users that makes assumptions about what the
file extension produced by the ``sdist`` command will be. Changing either
default will break some number of those, so by changing the default of ``.zip``
to ``.tar.gz`` we minimize the amount of breakage by taking the smaller number
of users and making them match the larger number. In addition, Windows users
are more likely to upgrade their setuptools and Python releases on a faster
timescale than POSIX users. POSIX users often get their Python and setuptools
from their OS vendor and are discouraged or actively prevented from upgrading
them outside of complete OS upgrades, while Windows users *must* install Python
and setuptools on their own, and thus are more able to upgrade those pieces
without triggering a complete OS upgrade.

While it is true that switching to ``.zip`` would align ``sdist`` with
``bdist_wheel`` in terms of format, this is not a very large benefit because
both formats can be manipulated with the Python standard library just
as easily and both require the same C library (``zlib``). It is also true that
Windows has support for ``.zip`` files out of the box but requires third party
software for ``.tar.gz``; however, only 0.6% of downloads for sdists on PyPI are
initiated by browsers, and we can assume that only a fraction of that 0.6% are
Windows users who want to manually extract the file and do not have a means of
extracting a ``.tar.gz``, particularly since Python itself can be used to
extract a ``.tar.gz`` via the command line since version 3.4. In addition, the
use of ``.tar.gz`` will result in smaller sdists which will reduce the amount
of bandwidth and disk space consumed by ``sdist`` files.
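
As a rough illustration of the claim that both formats are equally easy to
handle with the standard library, the sketch below writes the same file into
an in-memory ``.tar.gz`` and ``.zip`` and reads it back using only ``tarfile``
and ``zipfile``. The helper names and the sample file are hypothetical. On the
command line, the extraction mentioned above is available as
``python -m tarfile -e archive.tar.gz``.

```python
import io
import tarfile
import zipfile

# Sample content, made up for this example.
FILES = {"exampleproject-1.0/example.py": b"print('hello')\n"}


def build_tar_gz(files):
    """Return the bytes of a gzip-compressed tarball holding `files`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_zip(files):
    """Return the bytes of a deflate-compressed zip archive holding `files`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_tar_gz(data):
    """Read every regular file out of a .tar.gz given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}


def read_zip(data):
    """Read every member out of a .zip given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


tar_roundtrip = read_tar_gz(build_tar_gz(FILES))
zip_roundtrip = read_zip(build_zip(FILES))
```

Both round trips return the original mapping, and both compression paths here
go through ``zlib``.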


Removal Process
===============

This PEP does **NOT** propose removing any existing files from PyPI, only
disallowing new ones from being uploaded. This restriction will be phased in on
a per-project basis to allow projects to adjust to the new restrictions where
applicable.

First, any *existing* projects will be flagged to allow legacy file types to be
uploaded, and any project without that flag (i.e. new projects) will not be
able to upload anything but ``sdist`` with a ``.tar.gz`` extension,
``bdist_wheel``, and ``bdist_egg``. Then, any existing projects that have never
uploaded a file that requires the legacy file type flag will have that flag
removed, also making them fall under the new restrictions. Next, an email
will be sent to the maintainers of all projects still given the legacy
flag, which will inform them of the upcoming new restrictions on uploads and
tell them that these restrictions will be applied to future uploads to their
projects starting in 1 month. This email should also contain workarounds for
older versions of Python/setuptools on Windows, to get a ``.tar.gz`` by
default. Finally, after 1 month all projects will have the legacy file type
flag removed, and support for uploading these types of files will cease to
exist on PyPI.
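
The upload rule described above can be sketched as a single check. This is an
illustration under stated assumptions, not PyPI's actual implementation: the
function name ``is_upload_allowed``, its parameters, and the two constant sets
are hypothetical, and the per-project legacy flag is modelled as a plain
boolean.

```python
# File types that remain uploadable under the proposal.
ALLOWED_FILETYPES = {"sdist", "bdist_wheel", "bdist_egg"}

# File types the PEP proposes to stop accepting for new uploads.
LEGACY_FILETYPES = {
    "bdist_dumb", "bdist_rpm", "bdist_dmg", "bdist_msi", "bdist_wininst",
}


def is_upload_allowed(filetype, filename, has_legacy_flag):
    """Decide whether a new upload is accepted (hypothetical helper).

    `has_legacy_flag` stands in for the per-project flag described above.
    """
    # Unknown file types are never accepted.
    if filetype not in ALLOWED_FILETYPES | LEGACY_FILETYPES:
        return False
    # Flagged projects keep the current behaviour until the flag is removed.
    if has_legacy_flag:
        return True
    # Unflagged projects may only upload the three remaining file types.
    if filetype not in ALLOWED_FILETYPES:
        return False
    # For sdists, only the .tar.gz extension remains valid.
    if filetype == "sdist":
        return filename.endswith(".tar.gz")
    return True
```

Removing the flag from a project, as in the final step of the plan, then
amounts to calling the check with ``has_legacy_flag=False`` for every
subsequent upload.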

This plan should provide minimal disruption since it does not remove any
existing files, and the types of files it does prevent from being uploaded are
either not particularly useful (or used) *or* have a similar type of file that
projects can continue to upload with a slight change to their process.


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: