Add PEP 438: Transitioning to release-file hosting on PyPI, submitted by Holger Krekel.
parent 43a0b7904d
commit c084fd16c3

@@ -0,0 +1,387 @@
PEP: 438
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel <holger@merlinux.eu>, Carl Meyer <carl@oddbird.net>
Discussions-To: catalog-sig@python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 15-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to make installing from the pypi.python.org (PyPI) package index
faster, simpler and more robust. To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools. The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links. The first phase
will also default newly-registered projects on PyPI to only serve
links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to installing only release files that are hosted
on PyPI and tell the user if external release files exist, offering a
choice to automatically use those external files.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself. When hosting was added, no
automated downloading tool existed yet. When Phillip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice. Discovery of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for
   any release. Links in the "Download-URL" and "Home-page" metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form "packagename-version.ARCHIVEEXT", is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an "#egg=packagename-version"
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.

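The candidate rules in items 2 and 3 above can be sketched roughly as
follows. This is a minimal illustration, not the code of any existing
installer: the function name ``is_install_candidate`` and the list of
archive extensions are assumptions made for this example, and real tools
recognise their own sets of formats.

```python
from urllib.parse import urlsplit

# Assumed set of archive extensions, chosen only for illustration.
ARCHIVE_EXTS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def is_install_candidate(url, package):
    """Return True if `url` looks like a release file for `package`."""
    parts = urlsplit(url)
    prefix = package.lower() + "-"

    # Item 3: an "#egg=packagename-version" fragment marks a candidate.
    if parts.fragment.lower().startswith("egg=" + prefix):
        return True

    # Item 2: the file name has the form "packagename-version.ARCHIVEEXT".
    filename = parts.path.rsplit("/", 1)[-1].lower()
    for ext in ARCHIVE_EXTS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            # Require a non-empty version part after "packagename-".
            return stem.startswith(prefix) and len(stem) > len(prefix)
    return False
```

Under these assumptions, ``http://example.com/dl/foo-1.0.tar.gz`` and
``http://example.com/svn/trunk#egg=foo-dev`` would both be treated as
candidates for a package named ``foo``, while a plain HTML page would not.
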
Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions, e.g. for crypto-related software

- company policies which require offering open source packages through
  the company's own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived bad reliability of PyPI

- not being aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there is
clearly a history behind why people chose to host files externally,
and for some time it was even the only way to do things. This PEP
takes the position that there are at least some valid reasons for
external hosting.

Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**. Apart from querying pypi.python.org's
simple index pages, an installer also crawls all homepages and
download pages ever specified with any release of a package. The
need for installers to crawl external sites slows down installation
and makes for a brittle and unreliable installation process. Those
sites and packages also don't take part in the :pep:`381` mirroring
infrastructure, further decreasing reliability and speed of automated
installation processes around the world.

Most packages are hosted directly on pypi.python.org [1]_. Even for
these packages, installers still crawl their homepage and
download-url, if specified. Many package uploaders are not aware that
specifying the "homepage" or "download-url" in their package metadata
will needlessly slow down the installation process for all users.

Relying on third party sites also opens up more attack vectors for
injecting malicious packages into sites using automated installs. A
simple attack might just involve getting hold of an old, now-unused
homepage domain and placing malicious packages there. Moreover,
performing a man-in-the-middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on
the installation site. As many homepages and download locations are
using HTTP and not HTTPS, such attacks are not hard to launch. Such
MITM attacks can easily happen even for packages which never intended
to host files externally, as their homepages are contacted by
installers anyway.

There is currently no way for package maintainers to avoid
external-link crawling, other than removing all homepage/download url
metadata for all historic releases. While a script [3]_ has been
written to perform this action, it is not a good general solution
because it removes useful metadata from PyPI releases.

Even if the sites referenced by "Homepage" and "Download-URL" links
were not scraped for further links, there is no obvious way under the
current system for a package owner to link to an installable file from
a long_description metadata field (which is shown as package
documentation on ``/pypi/PKG``) without installation tools
automatically considering that file a candidate for installation.
Conversely, there is no way to explicitly register multiple external
release files without putting them in metadata fields.


Goals
-----

These are the goals to be achieved by implementation of this PEP:

* Package owners should be able to explicitly control which files are
  presented by PyPI to installer tools as installation
  candidates. Installation should not be slowed or made less reliable
  by extensive and unnecessary crawling of links that package owners
  did not explicitly nominate as installation files.

* It should remain possible for package owners to choose to host their
  release files on their own hosting, external to PyPI. It should be
  easy for a user to request the installation of such releases using
  automated installer tools.

* Automated installer tools should not install externally-hosted
  packages **by default**, but only when explicitly authorized to do
  so by the user. When tools refuse to install such a package by
  default, they should tell the user exactly which external link(s)
  they would need to follow, and what option(s) the user can provide
  to authorize the tool to follow those links. PyPI should provide all
  necessary metadata for installer tools to implement this easily and
  within a single request/reply interaction.

* Migration from the status quo to the above points should be gradual
  and minimize breakage. This includes tooling that makes it easy for
  package owners with an existing release process that uploads to
  non-PyPI hosting to also upload those release files to PyPI.


Solution / two transition phases
================================

The first transition phase introduces a "hosting-mode" field for each
project on PyPI, allowing package owners explicit control of which
release file links are served to present-day installation tools in the
machine-readable ``simple/`` index. After individual early adopters
have successfully changed their hosting modes, the first transition
phase will set a default hosting mode for existing packages, based on
automated analysis. **Maintainers will be notified one month ahead of
any such automated change**. At completion of the first transition
phase, **all present-day existing release and installation processes
and tools are expected to continue working**. Any remaining errors or
problems are expected to only relate to installation of individual
packages and can be easily corrected by package maintainers, or by
PyPI admins if maintainers are not reachable.

Also in the first phase, each link served in the ``simple/`` index
will be explicitly marked as ``rel="internal"`` (hosted by the index
itself) or ``rel="external"`` (linking to an external site that is not
part of the index).

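As a rough illustration of how a client might read these markers, the
following sketch parses a ``simple/`` index page with the standard
library's ``html.parser`` and separates links by their ``rel`` value.
The page content, the URLs and the class name ``SimpleIndexLinks`` are
made up for this example; the exact markup served by PyPI is not
specified here.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a simple/ index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append((attrs["href"], attrs.get("rel")))


# Hypothetical page content; both URLs are invented for illustration.
PAGE = """
<a href="../../packages/source/f/foo/foo-1.0.tar.gz" rel="internal">foo-1.0.tar.gz</a>
<a href="http://example.com/foo-1.1.tar.gz" rel="external">foo-1.1.tar.gz</a>
"""

parser = SimpleIndexLinks()
parser.feed(PAGE)
internal = [href for href, rel in parser.links if rel == "internal"]
external = [href for href, rel in parser.links if rel == "external"]
```

Because the distinction is carried by the link itself, a tool needs only
the single index page to decide which files are hosted by the index.
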
In the second transition phase, PyPI client installation tools shall
be updated to default to only install ``rel="internal"`` packages
unless a user specifies option(s) to permit installing from external
links.

Maintainers of packages which currently host release files on non-PyPI
sites shall receive instructions and tools to ease "re-hosting" of
their historic and future package release files. This re-hosting tool
MUST be available before automated hosting-mode changes are announced
to package maintainers.


Implementation
==============

Hosting modes
-------------

The foundation of the first transition phase is the introduction of
three "modes" of PyPI hosting for a package, affecting which links are
generated for the ``simple/`` index. These modes are implemented
through changes to the algorithm for generating the machine-readable
``simple/`` index and require no changes to installation tools.

The modes are:

- ``pypi-scrape-crawl``: no change from the current situation of
  generating machine-readable links for installation tools, as
  outlined in the history_.

- ``pypi-scrape``: for a package in this mode, links to be added to
  the ``simple/`` index are still scraped from package
  metadata. However, the "Home-page" and "Download-URL" links are
  given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
  instead of ``rel=homepage`` and ``rel=download``. The effect of this
  (with no change in installation tools necessary) is that these links
  will not be followed and scraped for further candidate links by
  present-day installation tools: only installable files directly
  hosted from PyPI or linked directly from PyPI metadata will be
  considered for installation. Installation tools MAY evolve to offer
  an option to use the new rel-attribution to crawl external pages but
  MUST NOT default to it.

- ``pypi-explicit``: for a package in this mode, only links to release
  files uploaded to PyPI, and external links to release files
  explicitly nominated by the package owner (via a new interface
  exposed by PyPI) will be added to the ``simple/`` index.

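The effect of the three modes on index generation can be sketched as
below. This is only an illustration under stated assumptions: the
function ``simple_index_links`` and its parameters are hypothetical and
do not describe the actual PyPI code base, and the additional
``rel="internal"``/``rel="external"`` labelling is left out to keep the
sketch short.

```python
def simple_index_links(mode, uploaded, scraped=(), homepage=None,
                       download_url=None, explicit=()):
    """Return (url, rel) pairs for one project's simple/ index page."""
    links = [(url, None) for url in uploaded]  # files uploaded to PyPI

    if mode == "pypi-explicit":
        # Only uploads plus external files nominated by the package owner.
        return links + [(url, None) for url in explicit]

    if mode == "pypi-scrape-crawl":
        home_rel, download_rel = "homepage", "download"
    elif mode == "pypi-scrape":
        # Present-day tools do not crawl these renamed rel values.
        home_rel, download_rel = "ext-homepage", "ext-download"
    else:
        raise ValueError("unknown hosting mode: %r" % (mode,))

    # Both scrape modes keep the links scraped from package metadata.
    links += [(url, None) for url in scraped]
    if homepage:
        links.append((homepage, home_rel))
    if download_url:
        links.append((download_url, download_rel))
    return links
```

Note that the two scrape modes differ only in the ``rel`` values given
to the homepage and download links, which is why no change to
present-day installation tools is needed to stop the crawling.
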
Thus the hope is that eventually all projects on PyPI can be migrated
to the ``pypi-explicit`` mode, while preserving the ability to install
release files hosted externally via installer tools. Deprecation of
hosting modes to eventually only allow the ``pypi-explicit`` mode is
NOT REGULATED by this PEP but is expected to become feasible some time
after successful implementation of the transition phases described in
this PEP. It is expected that deprecation requires **a new process to
deal with abandoned packages** because of unreachable maintainers for
still popular packages.


First transition phase (PyPI)
-----------------------------

The proposed solution consists of multiple implementation and
communication steps:

#. Implement in PyPI the three modes described above, with an
   interface for package owners to select the mode for each package
   and register explicit external file URLs.

#. For packages in all modes, label all links in the ``simple/`` index
   with ``rel="internal"`` or ``rel="external"``, to make it easier
   for client tools to distinguish the types of links in the second
   transition phase.

#. Default all newly-registered packages to ``pypi-explicit`` mode
   (package owners can still switch to the other modes as desired).

#. Determine (via an automated analysis tool) which packages have all
   installable files available on PyPI itself (group A), which have
   all installable files linked directly from PyPI metadata (group B),
   and which have installable versions available that are linked only
   from external homepage/download HTML pages (group C).

#. Send mail to maintainers of projects in group A informing them that
   their project will be automatically configured to ``pypi-explicit``
   mode in one month, and similarly to maintainers of projects in
   group B that their project will be automatically configured to
   ``pypi-scrape`` mode. Inform them that this change is not expected
   to affect installability of their project at all, but will result
   in faster and safer installs for their users. Encourage them to set
   this mode themselves sooner to benefit their users.

#. Send mail to maintainers of packages in group C informing them that
   their package hosting mode is ``pypi-scrape-crawl``, list the URLs
   which currently are crawled, and suggest that they either re-host
   their packages directly on PyPI and switch to ``pypi-explicit``, or
   at least provide direct links to release files in PyPI metadata and
   switch to ``pypi-scrape``. Provide instructions and tools to help
   with these transitions.


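The grouping performed by the analysis tool in the steps above might
look roughly like this. It is a sketch under assumptions, not the
actual tool: the function ``classify`` and its three set-valued
arguments are hypothetical, and group B is interpreted here as "every
installable file is reachable without crawling external pages".

```python
# Hosting mode announced to maintainers for each group (from the steps above).
PROPOSED_MODE = {
    "A": "pypi-explicit",
    "B": "pypi-scrape",
    "C": "pypi-scrape-crawl",
}


def classify(pypi_files, metadata_files, crawled_files):
    """Return the group ("A", "B" or "C") of a project.

    Each argument is a set of installable file names found, respectively,
    on PyPI itself, via direct links in PyPI metadata, and only by
    crawling the external homepage/download pages.
    """
    all_files = pypi_files | metadata_files | crawled_files
    if all_files <= pypi_files:
        return "A"  # everything is hosted on PyPI
    if all_files <= pypi_files | metadata_files:
        return "B"  # everything is reachable without crawling
    return "C"      # some files are only found by crawling
```

For example, a project whose only release file is found by crawling its
homepage falls into group C and would keep ``pypi-scrape-crawl`` until
its maintainer acts.

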
Second transition phase (installer tools)
-----------------------------------------

For the second transition phase, maintainers of installation tools are
asked to release two updates.

The first update shall warn clearly whenever externally-hosted
release files (that is, files whose link is ``rel="external"``) are
selected for download, state exactly which projects and URLs are
affected, and warn that in future versions externally-hosted downloads
will be disabled by default.

The second update should change the default mode to allow only
installation of ``rel="internal"`` package files, and allow
installation of externally-hosted packages only when the user supplies
an option (ideally an option specifying exactly which external domains
are to be trusted as download sources). When download of an
externally-hosted package is disallowed, the user should be notified,
with instructions for how to make the install succeed and warnings
about the implication (that a file will be downloaded from a site that
is not part of the package index).


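The behaviour asked of the second update could be sketched as follows.
The helper names and the ``--allow-external-host`` option are
hypothetical and chosen only for this illustration; no existing
installer is claimed to offer them.

```python
from urllib.parse import urlsplit


def select_links(links, allowed_hosts=()):
    """Split (url, rel) pairs into usable links and refused external ones."""
    usable, refused = [], []
    for url, rel in links:
        # Internal files are always usable; others need a trusted host.
        if rel == "internal" or urlsplit(url).hostname in allowed_hosts:
            usable.append(url)
        else:
            refused.append(url)
    return usable, refused


def refusal_message(project, refused):
    """Tell the user which links were skipped and how to permit them."""
    lines = ["%s: skipped %d externally hosted file(s):" % (project, len(refused))]
    lines += ["  " + url for url in refused]
    # Hypothetical option name, used only for this example.
    lines.append("Pass --allow-external-host HOST to download from that host;")
    lines.append("the file will then come from a site outside the package index.")
    return "\n".join(lines)
```

Trusting hosts individually, rather than enabling all external links at
once, follows the suggestion above that the option should ideally name
exactly which external domains are trusted.

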
Open Questions / tasks
======================

- Should we introduce some form of PyPI API versioning in this PEP?
  (it might complicate matters and delay the implementation but is
  often seen as good practice).

- Do another round of discussions with installation tool authors and
  see about incorporating their feedback. There is one known issue in
  particular from Phillip J. Eby, who considers a host-based pattern
  matching algorithm preferable to interpreting "rel" attributes.


References
==========

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html
   (XXX need to update this data for all easy_install-supported formats)

.. [2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, script to remove homepage/download metadata for
   all releases,
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html


Acknowledgments
===============

Phillip Eby for precise information and the basic ideas to implement
the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to
implement both a Pull Request for the necessary PyPI changes and the
analysis tool to drive transition phase 1.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
thinking through issues regarding getting rid of "external hosting".


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: