258 lines
12 KiB
Plaintext
258 lines
12 KiB
Plaintext
PEP: 527
|
||
Title: Removing Un(der)used file types/extensions on PyPI
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Donald Stufft <donald@stufft.io>
|
||
BDFL-Delegate: Nick Coghlan <ncoghlan@gmail.com>
|
||
Discussions-To: distutils-sig@python.org
|
||
Status: Draft
|
||
Type: Process
|
||
Content-Type: text/x-rst
|
||
Created: 23-Aug-2016
|
||
Post-History: 23-Aug-2016
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP recommends deprecating, and ultimately removing, support for uploading
|
||
certain unused or under used file types and extensions to PyPI. In particular
|
||
it recommends disallowing further uploads of any files of the types
|
||
``bdist_dumb``, ``bdist_rpm``, ``bdist_dmg``, ``bdist_msi``, and
|
||
``bdist_wininst``, leaving PyPI to only accept new uploads of the ``sdist``,
|
||
``bdist_wheel``, and ``bdist_egg`` file types.
|
||
|
||
In addition, this PEP proposes removing support for new uploads of sdists using
|
||
the ``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.tar.Z``, ``.tgz``, ``.tbz``, and
|
||
any other extension besides ``.tar.gz`` and ``.zip``.
|
||
|
||
Finally, this PEP also proposes limiting the number of allowed sdist uploads
|
||
for each individual release of a project on PyPI to one instead of one for each
|
||
allowed extension.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
File Formats
|
||
------------
|
||
|
||
Currently PyPI supports the following file types:
|
||
|
||
* ``sdist``
|
||
* ``bdist_wheel``
|
||
* ``bdist_egg``
|
||
* ``bdist_wininst``
|
||
* ``bdist_msi``
|
||
* ``bdist_dmg``
|
||
* ``bdist_rpm``
|
||
* ``bdist_dumb``
|
||
|
||
However, these different types of files have varying amounts of usefulness or
|
||
general use in the ecosystem. Continuing to support them adds a maintenance
|
||
burden on PyPI as well as tool authors and incurs a cost in both bandwidth and
|
||
disk space not only on PyPI itself, but also on any mirrors of PyPI.
|
||
|
||
|
||
Python packaging is a multi-level ecosystem where PyPI is primarily suited and
|
||
used to distribute virtual environment compatible packages directly from their
|
||
respective project owners. These packages are then consumed either directly
|
||
by end-users, or by downstream distributors that take these packages and turn
|
||
them into their respective system level packages (such as RPM, deb, MSI, etc).
|
||
|
||
While PyPI itself only directly works with these Python specific but platform
|
||
agnostic packages, we encourage community-driven and commercial conversions of
|
||
these packages to downstream formats for particular target environments, like:
|
||
|
||
* The conda cross-platform data analysis ecosystem (conda-forge)
|
||
* The deb based Linux ecosystem (Debian, Ubuntu, etc)
|
||
* The RPM based Linux ecosystem (Fedora, openSuSE, Mageia, etc)
|
||
* The homebrew, MacPorts and fink ecosystems for Mac OS X
|
||
* The Windows Package Management ecosystem (NuGet, Chocalatey, etc)
|
||
* 3rd party creation of Windows MSIs and installers (e.g. Christoph Gohlke's
|
||
work at http://www.lfd.uci.edu/~gohlke/pythonlibs/ )
|
||
* other commercial redistribution formats (ActiveState's PyPM, Enthought
|
||
Canopy, etc)
|
||
* other open source community redistribution formats (Nix, Gentoo, Arch, \*BSD,
|
||
etc)
|
||
|
||
It is the belief of this PEP that the entire ecosystem is best supported by
|
||
keeping PyPI focused on the platform agnostic formats, where the limited amount
|
||
of time by volunteers can be best used instead of spreading the available time
|
||
out amongst several platforms. Further more, this PEP believes that the people
|
||
best positioned to provide well integrated packages for a particular platform
|
||
are people focused on that platform, and not across all possible platforms.
|
||
|
||
|
||
bdist_dumb
|
||
~~~~~~~~~~
|
||
|
||
As it's name implies, ``bdist_dumb`` is not a very complex format, however it
|
||
is so simple as to be worthless for actual usage.
|
||
|
||
For instance, if you're using something like pyenv on macOS and you're building
|
||
a library using Python 3.5, then ``bdist_dumb`` will produce a ``.tar.gz`` file
|
||
named something like ``exampleproject-1.0.macosx-10.11-x86_64.tar.gz``. Right
|
||
off the bat this file name is somewhat difficult to differentiate from an
|
||
``sdist`` since they both use the same file extension (and with the legacy pre
|
||
PEP 440 versions, ``1.0-macosx-10.11-x86_64`` is a valid, although quite silly,
|
||
version number). However, once you open up the created ``.tar.gz``, you'd find
|
||
that there is no metadata inside that could be used for things like dependency
|
||
discovery and in fact, it is quite simply a tarball containing hardcoded paths
|
||
to wherever files would have been installed on the computer creating the
|
||
``bdist_dumb``. Going back to our pyenv on macOS example, this means that if I
|
||
created it, it would contain files like:
|
||
|
||
``Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py``
|
||
|
||
|
||
bdist_rpm
|
||
~~~~~~~~~
|
||
|
||
The ``bdist_rpm`` format on PyPI allows people to upload ``.rpm`` files for
|
||
end users to manually download by hand and then manually install by hand.
|
||
However, the common usage of ``rpm`` is with a specially designed repository
|
||
that allows automatic installation of dependencies, upgrades, etc which PyPI
|
||
does not provide. Thus, it is a type of file that is barely being used on PyPI
|
||
with only ~460 files of this type having been uploaded to PyPI (out a total of
|
||
662,544).
|
||
|
||
In addition, services like `COPR <https://copr.fedorainfracloud.org/>`_ provide
|
||
a better supported mechanism for publishing and using RPM files than we're ever
|
||
likely to get on PyPI.
|
||
|
||
|
||
bdist_dmg, bdist_msi, and bdist_wininst
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
The ``bdist_dmg``, ``bdist_msi``, and ``bdist_winist`` formats are similar in
|
||
that they are an OS specific installer that will only install a library into an
|
||
environment and are not designed for real user facing installs of applications
|
||
(which would require things like bundling a Python interpreter and the like).
|
||
|
||
Out of these three, the usage for ``bdist_dmg`` and ``bdist_msi`` is very low,
|
||
with only ~500 ``bdist_msi`` files and ~50 ``bdist_dmg`` files having been
|
||
uploaded to PyPI. The ``bdist_wininst`` format has more use, with ~14,000 files
|
||
having ever been uploaded to PyPI.
|
||
|
||
It's quite easy to look at the low usage of ``bdist_dmg`` and ``bdist_msi`` and
|
||
conclude that removing them will be fairly low impact, however
|
||
``bdist_wininst`` has several orders of magnitude more usage. This is somewhat
|
||
misleading though, because although it has more people *uploading* those files
|
||
the actual usage of those uploaded files is fairly low. Taking a look at the
|
||
previous 30 days, we can see that 90% of all downloads of ``bdist_winist``
|
||
files from PyPI were generated by the mirroring infrastructure and 7% of them
|
||
were generated by setuptools (which can currently be better covered by
|
||
``bdist_egg`` files).
|
||
|
||
Given the small number of files uploaded for ``bdist_dmg`` and ``bdist_msi``
|
||
and that ``bdist_wininst`` is largely existing to either consume bandwidth and
|
||
disk space via the mirroring infrastructure *or* could be trivially replaced
|
||
with ``bdist_egg``, this PEP proposes to include these three formats in the
|
||
list of those to be disallowed.
|
||
|
||
|
||
File Extensions
|
||
---------------
|
||
|
||
Currently ``sdist`` supports a wide variety of file extensions like `.tar.gz``,
|
||
``.tar``, ``.tar.bz2``, ``.tar.xz``, ``.zip``, ``.tar.Z``, ``.tgz``, and
|
||
``.tbz``. However, of those the only extensions which get anything more than
|
||
negligable usage is ``.tar.gz`` with 444,338 sdists currently, ``.zip`` with
|
||
58,774 sdists currently, and ``.tar.bz2`` with 3,265 sdists currently.
|
||
|
||
Having multiple formats accepted requires tooling both within PyPI and outside
|
||
of PyPI to handle all of the various extensions that *might* be used (even if
|
||
nobody is currently using them). This doesn't only affect PyPI, but ripples out
|
||
throughout the ecosystem. In addition, the different formats all have different
|
||
requirements for what optional C libraries Python was linked against and
|
||
different requirements for what versions of Python they support. In addition,
|
||
multiple formats also create a weird situation where there may be two
|
||
``sdist`` files for a particular project/release with subtly different content.
|
||
|
||
It's easy to advocate that anything outside of ``.tar.gz``, ``.zip``, and
|
||
``.tar.bz2`` should be disallowed. Outside of a tiny handful, nobody has
|
||
actively been uploading these other types of files in the ~15 years of PyPI's
|
||
existence so they've obviously not been particularly useful. In addition, while
|
||
``.tar.xz`` is theoretically a nicer format than the other ``.tar.*`` formats
|
||
due to the better compression ratio achieved by LZMA, it is only available in
|
||
Python 3.3+ and has an optional dependency on the lzma C library.
|
||
|
||
Looking at the three extensions we *do* have in current use, it's also fairly
|
||
easy to conclude that ``.tar.bz2`` can be disallowed as well. It has a fairly
|
||
small number of files ever uploaded with it and it requires an additional
|
||
optional C library to handle the bzip2 compression.
|
||
|
||
Finally we get down to ``.tar.gz`` and ``.zip``. Looking at the pure numbers
|
||
for these two, we can see that ``.tar.gz`` is by far the most uploaded format,
|
||
with 444,338 total uploaded compared to ``.zip``'s 58,774 and on POSIX
|
||
operating systems ``.tar.gz`` is also the default produced by all currently
|
||
released versions of Python and setuptools. In addition, these two file types
|
||
both use the same C library (``zlib``) which is also required for
|
||
``bdist_wheel`` and ``bdist_egg``. The two wrinkles with deciding between
|
||
``.tar.gz`` and ``.zip`` is that while on POSIX operating systems ``.tar.gz``
|
||
is the default, on Windows ``.zip`` is the default and the ``bdist_wheel``
|
||
format also uses zip.
|
||
|
||
Instead of trying to standardize on either ``.tar.gz`` or ``.zip``, this PEP
|
||
proposes that we allow *either* ``.tar.gz`` or ``.zip`` for sdists.
|
||
|
||
|
||
Limiting number of sdists per release
|
||
-------------------------------------
|
||
|
||
A sdist on PyPI should be a single source of truth for a particular release of
|
||
software. However, currently PyPI allows you to upload one sdist for each of
|
||
the sdist file extensions it allows. Currently this allows something like 10
|
||
different sdists for a project, but even with this PEP it allows two different
|
||
sources of truth for a single version. Having multiple sdists often times can
|
||
account for strange bugs that only expose themselves based on which sdist that
|
||
the person used.
|
||
|
||
To resolve this, this PEP proposes to allow one, and only one, sdist per
|
||
release of a project.
|
||
|
||
|
||
Removal Process
|
||
===============
|
||
|
||
This PEP does **NOT** propose removing any existing files from PyPI, only
|
||
disallowing new ones from being uploaded. This restriction will be phased in on
|
||
a per-project basis to allow projects to adjust to the new restrictions where
|
||
applicable.
|
||
|
||
First, any *existing* projects will be flagged to allow legacy file types to be
|
||
uploaded, and any project without that flag (i.e. new projects) will not be
|
||
able to upload anything but ``sdist`` with a ``.tar.gz`` or ``.zip`` extension,
|
||
``bdist_wheel``, and ``bdist_egg``. Then, any existing projects that have never
|
||
uploaded a file that requires the legacy file type flag will have that flag
|
||
removed, also making them fall under the new restrictions. Finally, an email
|
||
will be generated to the maintainers of all projects still given the legacy
|
||
flag, which will inform them of the upcoming new restrictions on uploads and
|
||
tell them that these restrictions will be applied to future uploads to their
|
||
projects starting in 1 month. Finally, after 1 month all projects will have the
|
||
legacy file type flag removed, and support for uploading these types of files
|
||
will cease to exist on PyPI.
|
||
|
||
This plan should provide minimal disruption since it does not remove any
|
||
existing files, and the types of files it does prevent from being uploaded are
|
||
either not particularly useful (or used) types of files *or* they can continue
|
||
to upload a similar type of file with a slight change to their process.
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|