PEP: 527
Title: Removing Un(der)used file types/extensions on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Donald Stufft <donald@stufft.io>
BDFL-Delegate: TBD <donald@stufft.io>
Discussions-To: distutils-sig@python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 23-Aug-2016
Post-History: 23-Aug-2016


Abstract
========

This PEP recommends deprecating, and ultimately removing, support for uploading
certain unused or underused file types and extensions to PyPI. In particular,
it recommends disallowing further uploads of any files of the types
``bdist_dumb``, ``bdist_rpm``, ``bdist_dmg``, ``bdist_msi``, and
``bdist_wininst``, leaving PyPI to only accept new uploads of the ``sdist``,
``bdist_wheel``, and ``bdist_egg`` file types.

In addition, this PEP proposes removing support for new uploads of sdists using
the ``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``, or
``.tbz`` extensions, or any other extension besides ``.tar.gz``.


Rationale
=========

File Formats
------------

Currently PyPI supports the following file types:

* ``sdist``
* ``bdist_wheel``
* ``bdist_egg``
* ``bdist_wininst``
* ``bdist_msi``
* ``bdist_dmg``
* ``bdist_rpm``
* ``bdist_dumb``

However, these different types of files vary in how useful they are and in how
widely they are used in the ecosystem. Continuing to support them adds a
maintenance burden on PyPI as well as on tool authors, and incurs a cost in both
bandwidth and disk space, not only on PyPI itself but also on any mirrors of
PyPI.

bdist_dumb
~~~~~~~~~~

As its name implies, ``bdist_dumb`` is not a very complex format; however, it
is so simple as to be worthless for actual usage.

For instance, if you're using something like pyenv on macOS and you're building
a library using Python 3.5, then ``bdist_dumb`` will produce a ``.tar.gz`` file
named something like ``exampleproject-1.0.macosx-10.11-x86_64.tar.gz``. Right
off the bat, this file name is somewhat difficult to differentiate from an
``sdist`` since they both use the same file extension (and with the legacy
pre-PEP 440 versions, ``1.0.macosx-10.11-x86_64`` is a valid, although quite
silly, version number). However, once you open up the created ``.tar.gz``,
you'd find that there is no metadata inside that could be used for things like
dependency discovery. In fact, it is quite simply a tarball containing
hardcoded paths to wherever files would have been installed on the computer
creating the ``bdist_dumb``. Going back to our pyenv on macOS example, this
means that if I created it, it would contain files like:

``Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py``


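Telling such a file apart from a real ``sdist`` therefore means opening the
archive and looking for metadata. The following is a minimal sketch of that
check, assuming the conventional sdist layout of a single top-level
``{name}-{version}/`` directory that holds a ``PKG-INFO`` file;
``build_tar_gz`` and ``looks_like_sdist`` are hypothetical helpers, not part of
PyPI, setuptools, or any other tool.

.. code-block:: python

    import io
    import tarfile


    def build_tar_gz(files):
        """Build an in-memory .tar.gz from {member_name: bytes} (demo helper)."""
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:gz") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        return buf.getvalue()


    def looks_like_sdist(archive):
        """Heuristic: True if the .tar.gz bytes follow the usual sdist layout.

        Assumption: an sdist has exactly one top-level directory with a
        PKG-INFO file directly inside it. A bdist_dumb tarball instead holds
        install paths hardcoded from the build machine and no such PKG-INFO.
        """
        with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
            names = tar.getnames()
        top_levels = {name.split("/", 1)[0] for name in names}
        if len(top_levels) != 1:
            return False
        (root,) = top_levels
        return root + "/PKG-INFO" in names


    sdist = build_tar_gz({
        "exampleproject-1.0/PKG-INFO": b"Metadata-Version: 1.1\n",
        "exampleproject-1.0/example.py": b"",
    })
    dumb = build_tar_gz({
        "Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/"
        "site-packages/example.py": b"",
    })
    print(looks_like_sdist(sdist))  # True
    print(looks_like_sdist(dumb))   # False

Both archives end in ``.tar.gz``, so only their contents, not their file names,
tell them apart.
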
bdist_rpm
~~~~~~~~~

The ``bdist_rpm`` format on PyPI allows people to upload ``.rpm`` files for
end users to download and install manually. However, the common usage of
``rpm`` is with a specially designed repository that allows automatic
installation of dependencies, upgrades, etc., which PyPI does not provide.
Thus, it is a type of file that is barely being used on PyPI, with only ~460
files of this type having been uploaded to PyPI (out of a total of 662,544).

In addition, services like `COPR <https://copr.fedorainfracloud.org/>`_ provide
a better supported mechanism for publishing and using RPM files than we're ever
likely to get on PyPI.


|
bdist_dmg, bdist_msi, and bdist_wininst
|
|||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|||
|
|
|||
The ``bdist_dmg``, ``bdist_msi``, and ``bdist_wininst`` formats are similar in
that they are OS-specific installers that will only install a library into an
environment, and are not designed for real, user-facing installs of
applications (which would require things like bundling a Python interpreter).

Out of these three, usage of ``bdist_dmg`` and ``bdist_msi`` is very low,
with only ~500 ``bdist_msi`` files and ~50 ``bdist_dmg`` files having been
uploaded to PyPI. The ``bdist_wininst`` format has more use, with ~14,000 files
having ever been uploaded to PyPI.

It's quite easy to look at the low usage of ``bdist_dmg`` and ``bdist_msi`` and
conclude that removing them will be fairly low impact; however,
``bdist_wininst`` has roughly 30 to 300 times as many uploads. This is somewhat
misleading though, because although more people are *uploading* those files,
the actual usage of those uploaded files is fairly low. Looking at the previous
30 days, we can see that 90% of all downloads of ``bdist_wininst`` files from
PyPI were generated by the mirroring infrastructure and 7% of them were
generated by setuptools (which can currently be better covered by
``bdist_egg`` files).

Given the small number of files uploaded for ``bdist_dmg`` and ``bdist_msi``,
and that ``bdist_wininst`` files are largely either consumed by the mirroring
infrastructure (costing bandwidth and disk space) *or* could be trivially
replaced with ``bdist_egg``, this PEP proposes to include these three formats in
the list of those to be disallowed.


File Extensions
---------------

Currently ``sdist`` supports a wide variety of file extensions like ``.tar.gz``,
``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``, and
``.tbz``. However, of those, the only extensions that get more than negligible
usage are ``.tar.gz`` with 444,338 sdists currently, ``.zip`` with 58,774
sdists currently, and ``.tar.bz2`` with 3,265 sdists currently.

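The extension rules above can be expressed as a small check. This is an
illustrative sketch only: the constant and function names are hypothetical,
and the counts are the figures quoted in this section, so the computed shares
are relative to the three extensions with non-negligible usage, not to all
sdists.

.. code-block:: python

    # sdist extensions PyPI currently accepts, as listed above.
    CURRENT_SDIST_EXTENSIONS = (
        ".tar.gz", ".tar.bz2", ".tar.xz", ".tar.Z",
        ".tar", ".zip", ".tgz", ".tbz",
    )

    # The only sdist extension this PEP keeps for new uploads.
    PROPOSED_SDIST_EXTENSIONS = (".tar.gz",)

    # Upload counts quoted above for the three extensions in real use.
    SDIST_COUNTS = {".tar.gz": 444_338, ".zip": 58_774, ".tar.bz2": 3_265}


    def sdist_extension(filename):
        """Return the matching sdist extension, or None if there is none."""
        for ext in CURRENT_SDIST_EXTENSIONS:
            if filename.endswith(ext):
                return ext
        return None


    def allowed_under_proposal(filename):
        """True if a new sdist upload with this name would still be accepted."""
        return sdist_extension(filename) in PROPOSED_SDIST_EXTENSIONS


    def share(ext):
        """Fraction of the counted sdists that use the given extension."""
        return SDIST_COUNTS[ext] / sum(SDIST_COUNTS.values())


    for ext in SDIST_COUNTS:
        print(f"{ext}: {share(ext):.1%}")
    # .tar.gz: 87.7%
    # .zip: 11.6%
    # .tar.bz2: 0.6%

Note that ``sdist_extension("exampleproject-1.0.macosx-10.11-x86_64.tar.gz")``
also returns ``".tar.gz"``: the extension alone cannot distinguish the
``bdist_dumb`` example from an ``sdist``.
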
Having multiple formats accepted requires tooling both within PyPI and outside
of PyPI to handle all of the various extensions that *might* be used (even if
nobody is currently using them). This doesn't only affect PyPI, but ripples out
throughout the ecosystem. In addition, the different formats all have different
requirements for what optional C libraries Python was linked against and
different requirements for what versions of Python they support. Multiple
formats also create an odd situation where there may be two ``sdist`` files for
a particular project/release with subtly different content.

It's easy to advocate that anything outside of ``.tar.gz``, ``.zip``, and
``.tar.bz2`` should be disallowed. Outside of a tiny handful, nobody has
actively been uploading these other types of files in the ~15 years of PyPI's
existence, so they've obviously not been particularly useful. In addition, while
``.tar.xz`` is theoretically a nicer format than the other ``.tar.*`` formats
due to the better compression ratio achieved by LZMA, it is only available in
Python 3.3+ and has an optional dependency on the lzma C library.

Looking at the three extensions we *do* have in current use, it's also fairly
easy to conclude that ``.tar.bz2`` can be disallowed as well. It has a fairly
small number of files ever uploaded with it and it requires an additional
optional C library to handle the bzip2 compression.

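Because ``.tar.bz2`` and ``.tar.xz`` rely on optional modules, a tool cannot
assume it is able to open them. A hedged sketch of how a tool might probe this
at runtime follows; the mapping and function names are hypothetical. It relies
on the standard library ``bz2`` and ``lzma`` modules raising ``ImportError``
when Python was built without the underlying C library.

.. code-block:: python

    import importlib

    # Module needed to read each sdist format (assumed mapping; a deflated
    # .zip needs zlib, just like .tar.gz).
    REQUIRED_MODULE = {
        ".tar.gz": "zlib",
        ".zip": "zlib",
        ".tar.bz2": "bz2",
        ".tar.xz": "lzma",
        ".tar": None,  # uncompressed tarballs need no compression library
    }


    def module_available(name):
        """True if the named module can be imported in this interpreter."""
        try:
            importlib.import_module(name)
        except ImportError:
            return False
        return True


    def readable_extensions():
        """Extensions this interpreter can decompress, sorted by name."""
        return sorted(
            ext
            for ext, module in REQUIRED_MODULE.items()
            if module is None or module_available(module)
        )


    print(readable_extensions())

On a build without the lzma C library, ``.tar.xz`` simply drops out of the
result, which is the kind of per-interpreter variation every consumer of PyPI
has to handle.
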
Finally we get down to ``.tar.gz`` and ``.zip``. Looking at the pure numbers
for these two, we can see that ``.tar.gz`` is by far the most uploaded format,
with 444,338 total uploaded compared to ``.zip``'s 58,774, and on POSIX
operating systems ``.tar.gz`` is also the default produced by all currently
released versions of Python and setuptools. In addition, these two file types
both use the same C library (``zlib``), which is also required for
``bdist_wheel`` and ``bdist_egg``. The two wrinkles with deciding between
``.tar.gz`` and ``.zip`` are that while on POSIX operating systems ``.tar.gz``
is the default, on Windows ``.zip`` is the default, and the ``bdist_wheel``
format also uses zip.

This PEP proposes that we drop the use of ``.zip`` extensions for sdists on
PyPI and standardize around ``.tar.gz``. For both extensions there is going to
be automation written by end users that makes assumptions about what file
extension the ``sdist`` command will produce. Changing either default will break
some number of those, so by changing the default of ``.zip`` to ``.tar.gz`` we
minimize the amount of breakage by taking the smaller number of users and making
them match the larger number. In addition, Windows users are likely to upgrade
their setuptools and Python releases on a faster timescale than POSIX users.
POSIX users often get their Python and setuptools from their OS vendor and are
discouraged or actively prevented from upgrading them outside of complete OS
upgrades, while Windows users *must* install Python and setuptools on their
own, and thus are more able to upgrade those pieces without triggering a
complete OS upgrade.

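Producing a ``.tar.gz`` is not tied to the platform. With the ``sdist``
command itself, ``python setup.py sdist --formats=gztar`` requests it
explicitly. Outside of that command, the standard library can build the same
kind of archive on Windows and POSIX alike; the sketch below uses
``shutil.make_archive`` with the ``"gztar"`` format, and the ``build_gztar``
helper and the directory layout are hypothetical.

.. code-block:: python

    import shutil
    import tarfile
    import tempfile
    from pathlib import Path


    def build_gztar(src_dir, out_dir):
        """Archive src_dir as out_dir/<src_dir name>.tar.gz (hypothetical helper).

        The "gztar" format always yields a gzip-compressed tarball, whatever
        the host OS, and the archive's top-level directory is src_dir's name.
        """
        src_dir = Path(src_dir)
        archive = shutil.make_archive(
            base_name=str(Path(out_dir) / src_dir.name),
            format="gztar",
            root_dir=str(src_dir.parent),
            base_dir=src_dir.name,
        )
        return Path(archive)


    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "exampleproject-1.0")
        src.mkdir()
        (src / "PKG-INFO").write_text("Metadata-Version: 1.1\n")
        archive = build_gztar(src, tmp)
        print(archive.name)  # exampleproject-1.0.tar.gz
        with tarfile.open(archive, "r:gz") as tar:
            print("exampleproject-1.0/PKG-INFO" in tar.getnames())  # True
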
While it is true that switching to ``.zip`` would align ``sdist`` with
``bdist_wheel`` in terms of format, this is not a very large benefit because
both formats can be manipulated with the Python standard library just as easily
and both require the same C library (``zlib``). It is also true that Windows
has support for ``.zip`` files out of the box but requires third-party software
for ``.tar.gz``. However, only 0.6% of downloads for sdists on PyPI are
initiated by browsers, and we can assume that only a fraction of those 0.6% are
Windows users who want to manually extract the file and do not have a means of
extracting a ``.tar.gz``, particularly since Python itself can be used to
extract a ``.tar.gz`` via the command line since version 3.4. In addition, the
use of ``.tar.gz`` will result in smaller sdists, which will reduce the amount
of bandwidth and disk space consumed by ``sdist`` files.


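From the command line, ``python -m tarfile -e exampleproject-1.0.tar.gz``
extracts an sdist into the current directory on Python 3.4 and later. The
sketch below shows a programmatic equivalent that handles both formats with
the standard library alone; the ``extract_sdist`` helper is hypothetical, and
the ``filter="data"`` argument is only passed when the running ``tarfile``
provides it.

.. code-block:: python

    import tarfile
    import tempfile
    import zipfile
    from pathlib import Path


    def extract_sdist(path, dest):
        """Extract a .tar.gz or .zip sdist into dest using only the stdlib."""
        path = Path(path)
        if path.name.endswith(".tar.gz"):
            with tarfile.open(path, "r:gz") as tar:
                if hasattr(tarfile, "data_filter"):
                    # The "data" filter refuses members that would land
                    # outside dest, and special files such as devices.
                    tar.extractall(dest, filter="data")
                else:
                    tar.extractall(dest)
        elif path.name.endswith(".zip"):
            with zipfile.ZipFile(path) as zf:
                zf.extractall(dest)
        else:
            raise ValueError("unsupported sdist extension: " + path.name)


    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny .zip sdist, then extract it with the helper.
        archive = Path(tmp, "exampleproject-1.0.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("exampleproject-1.0/PKG-INFO", "Metadata-Version: 1.1\n")
        extract_sdist(archive, Path(tmp, "out"))
        print(Path(tmp, "out", "exampleproject-1.0", "PKG-INFO").exists())  # True
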
Removal Process
===============

This PEP does **NOT** propose removing any existing files from PyPI, only
disallowing new ones from being uploaded. This restriction will be phased in on
a per-project basis to allow projects to adjust to the new restrictions where
applicable.

First, any *existing* projects will be flagged to allow legacy file types to be
uploaded, and any project without that flag (i.e. new projects) will not be
able to upload anything but ``sdist`` with a ``.tar.gz`` extension,
``bdist_wheel``, and ``bdist_egg``. Then, any existing projects that have never
uploaded a file that requires the legacy file type flag will have that flag
removed, also making them fall under the new restrictions. Next, an email will
be sent to the maintainers of all projects still given the legacy flag, which
will inform them of the upcoming new restrictions on uploads and tell them that
these restrictions will be applied to future uploads to their projects starting
in 1 month. This email should also contain workarounds for getting a
``.tar.gz`` by default from older versions of Python/setuptools on Windows.
Finally, after 1 month all projects will have the legacy file type flag
removed, and support for uploading these types of files will cease to exist on
PyPI.

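The phased rollout can be modeled as a per-project flag. This sketch is purely
illustrative: ``Project``, ``can_upload``, and the ``phase_*`` functions are
hypothetical names rather than PyPI's actual data model, and the one-month
notice period is represented only by the ordering of the calls.

.. code-block:: python

    from dataclasses import dataclass, field

    # File types PyPI accepts today, per the "File Formats" list.
    CURRENT_TYPES = {
        "sdist", "bdist_wheel", "bdist_egg", "bdist_wininst",
        "bdist_msi", "bdist_dmg", "bdist_rpm", "bdist_dumb",
    }


    def is_modern_upload(filetype, filename):
        """True if the upload is allowed without the legacy flag."""
        if filetype == "sdist":
            return filename.endswith(".tar.gz")
        return filetype in {"bdist_wheel", "bdist_egg"}


    @dataclass
    class Project:
        name: str
        uploads: list = field(default_factory=list)  # (filetype, filename)
        allow_legacy: bool = False  # new projects start without the flag


    def can_upload(project, filetype, filename):
        if project.allow_legacy:
            return filetype in CURRENT_TYPES
        return is_modern_upload(filetype, filename)


    def phase_one(existing_projects):
        """Flag every existing project so nothing changes for them yet."""
        for project in existing_projects:
            project.allow_legacy = True


    def phase_two(projects):
        """Drop the flag where it was never needed; return who gets emailed."""
        for project in projects:
            if all(is_modern_upload(t, f) for t, f in project.uploads):
                project.allow_legacy = False
        return [p for p in projects if p.allow_legacy]


    def phase_three(projects):
        """After the notice period, remove the flag from every project."""
        for project in projects:
            project.allow_legacy = False


    old = Project("old", [("bdist_wininst", "old-1.0.win32.exe")])
    clean = Project("clean", [("sdist", "clean-1.0.tar.gz")])
    phase_one([old, clean])
    notify = phase_two([old, clean])
    print([p.name for p in notify])  # ['old']
    print(can_upload(clean, "sdist", "clean-1.1.zip"))  # False
    phase_three([old, clean])
    print(can_upload(old, "bdist_wininst", "old-1.1.win32.exe"))  # False
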
This plan should provide minimal disruption since it does not remove any
existing files, and the types of files it does prevent from being uploaded are
either not particularly useful (or used) *or* can be replaced by a similar type
of file with a slight change to the uploader's process.


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: