473 lines
20 KiB
Plaintext
473 lines
20 KiB
Plaintext
PEP: 438
|
||
Title: Transitioning to release-file hosting on PyPI
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Holger Krekel <holger@merlinux.eu>, Carl Meyer <carl@oddbird.net>
|
||
BDFL-Delegate: Richard Jones <richard@python.org>
|
||
Discussions-To: distutils-sig@python.org
|
||
Status: Superseded
|
||
Type: Process
|
||
Topic: Packaging
|
||
Content-Type: text/x-rst
|
||
Created: 15-Mar-2013
|
||
Post-History: 19-May-2013
|
||
Superseded-By: 470
|
||
Resolution: https://mail.python.org/pipermail/distutils-sig/2013-May/020773.html
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP proposes a backward-compatible two-phase transition process
|
||
to speed up, simplify and robustify installing from the
|
||
pypi.python.org (PyPI) package index. To ease the transition and
|
||
minimize client-side friction, **no changes to distutils or existing
|
||
installation tools are required in order to benefit from the first
|
||
transition phase, which will result in faster, more reliable installs
|
||
for most existing packages**.
|
||
|
||
The first transition phase implements easy and explicit means for a
|
||
package maintainer to control which release file links are served to
|
||
present-day installation tools. The first phase also includes the
|
||
implementation of analysis tools for present-day packages, to support
|
||
communication with package maintainers and the automated setting of
|
||
default modes for controlling release file links. The first phase
|
||
also will default newly-registered projects on PyPI to only serve
|
||
links to release files which were uploaded to PyPI.
|
||
|
||
The second transition phase concerns end-user installation tools,
|
||
which shall default to only install release files that are hosted on
|
||
PyPI and tell the user if external release files exist, offering a
|
||
choice to automatically use those external files. External release
|
||
files shall in the future be registered together with a checksum
|
||
hash so that installation tools can verify the integrity of the
|
||
eventual download (PyPI-hosted release files always carry such
|
||
a checksum).
|
||
|
||
Alternative PyPI server implementations should implement the new
|
||
simple index serving behaviour of transition phase 1 to avoid
|
||
installation tools treating their release links as external ones in
|
||
phase 2.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
.. _history:
|
||
|
||
History and motivations for external hosting
|
||
--------------------------------------------
|
||
|
||
When PyPI went online, it offered release registration but had no
|
||
facility to host release files itself. When hosting was added, no
|
||
automated downloading tool existed yet. When Phillip Eby implemented
|
||
automated downloading (through setuptools), he made the choice to
|
||
allow people to use download hosts of their choice. The finding of
|
||
externally-hosted packages was implemented as follows:
|
||
|
||
#. The PyPI ``simple/`` index for a package contains all links found
|
||
by scraping them from that package's long_description metadata for
|
||
any release. Links in the "Download-URL" and "Home-page" metadata
|
||
fields are given ``rel=download`` and ``rel=homepage`` attributes,
|
||
respectively.
|
||
|
||
#. Any of these links whose target is a file whose name appears to be
|
||
in the form of an installable source or binary distribution, with
|
||
name in the form "packagename-version.ARCHIVEEXT", is considered a
|
||
potential installation candidate by installation tools.
|
||
|
||
#. Similarly, any links suffixed with an "#egg=packagename-version"
|
||
fragment are considered an installation candidate.
|
||
|
||
#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
|
||
crawled by installation tools and, if HTML, are themselves scraped
|
||
for release-file links in the above formats.
|
||
|
||
See the easy_install documentation for a complete description of this
|
||
behavior. [1]_
|
||
|
||
Today, most packages indexed on PyPI host their release files on
|
||
PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%)
|
||
include any links to installable files that are available only
|
||
off-PyPI. [2]_
|
||
|
||
There are many reasons [3]_ why people have chosen external
|
||
hosting. To cite just a few:
|
||
|
||
- release processes and scripts have been developed already and upload
|
||
to external sites
|
||
|
||
- it takes too long to upload large files from some places in the
|
||
world
|
||
|
||
- export restrictions e.g. for crypto-related software
|
||
|
||
- company policies which require offering open source packages through
|
||
own sites
|
||
|
||
- problems with integrating uploading to PyPI into one's release
|
||
process (because of release policies)
|
||
|
||
- desiring download statistics different from those maintained by PyPI
|
||
|
||
- perceived bad reliability of PyPI
|
||
|
||
- not aware that PyPI offers file-hosting
|
||
|
||
Irrespective of the present-day validity of these reasons, there
|
||
clearly is a history why people choose to host files externally and it
|
||
even was for some time the only way you could do things. This PEP
|
||
takes the position that there remain some valid reasons for
|
||
external hosting even today.
|
||
|
||
Problem
|
||
-------
|
||
|
||
**Today, python package installers (pip, easy_install, buildout, and
|
||
others) often need to query many non-PyPI URLs even if there are no
|
||
externally hosted files**. Apart from querying pypi.python.org's
|
||
simple index pages, also all homepages and download pages ever
|
||
specified with any release of a package are crawled by an installer.
|
||
The need for installers to crawl external sites slows down
|
||
installation and makes for a brittle and unreliable installation
|
||
process. Those sites and packages also don't take part in the
|
||
:pep:`381` mirroring infrastructure, further decreasing reliability
|
||
and speed of automated installation processes around the world.
|
||
|
||
Most packages are hosted directly on pypi.python.org [2]_. Even for
|
||
these packages, installers still crawl their homepage and
|
||
download-url, if specified. Many package uploaders are not aware that
|
||
specifying the "homepage" or "download-url" in their package metadata
|
||
will needlessly slow down the installation process for all users.
|
||
|
||
Relying on third party sites also opens up more attack vectors for
|
||
injecting malicious packages into sites using automated installs. A
|
||
simple attack might just involve getting hold of an old now-unused
|
||
homepage domain and placing malicious packages there. Moreover,
|
||
performing a Man-in-The-Middle (MITM) attack between an installation
|
||
site and any of the download sites can inject malicious packages on
|
||
the installation site. As many homepages and download locations are
|
||
using HTTP and not HTTPS, such attacks are not hard to launch. Such
|
||
MITM attacks can easily happen even for packages which never intended
|
||
to host files externally as their homepages are contacted by
|
||
installers anyway.
|
||
|
||
There is currently no way for package maintainers to avoid
|
||
external-link crawling, other than removing all homepage/download url
|
||
metadata for all historic releases. While a script [4]_ has been
|
||
written to perform this action, it is not a good general solution
|
||
because it removes useful metadata from PyPI releases.
|
||
|
||
Even if the sites referenced by "Homepage" and "Download-URL" links
|
||
were not scraped for further links, there is no obvious way under the
|
||
current system for a package owner to link to an installable file from
|
||
a long_description metadata field (which is shown as package
|
||
documentation on ``/pypi/PKG``) without installation tools
|
||
automatically considering that file a candidate for installation.
|
||
Conversely, there is no way to explicitly register multiple external
|
||
release files without putting them in metadata fields.
|
||
|
||
|
||
Goals
|
||
-----
|
||
|
||
These are the goals to be achieved by implementation of this PEP:
|
||
|
||
* Package owners should be able to explicitly control which files are
|
||
presented by PyPI to installer tools as installation
|
||
candidates. Installation should not be slowed and made less reliable
|
||
by extensive and unnecessary crawling of links that package owners
|
||
did not explicitly nominate as installation files.
|
||
|
||
* It should remain possible for package owners to choose to host their
|
||
release files on their own hosting, external to PyPI. It should be
|
||
easy for a user to request the installation of such releases using
|
||
automated installer tools, especially if the external release files
|
||
were registered together with a checksum hash.
|
||
|
||
* Automated installer tools should not install externally-hosted
|
||
packages **by default**, but require explicit authorization to do so
|
||
by the user. When tools refuse to install such a package by default,
|
||
they should tell the user exactly which external link(s) the
|
||
installer needs to follow, and what option(s) the user can provide
|
||
to authorize the tool to follow those links. PyPI should provide all
|
||
necessary metadata for installer tools to implement this easily and
|
||
within a single request/reply interaction.
|
||
|
||
* Migration from the status quo to the above points should be gradual
|
||
and minimize breakage. This includes tooling that makes it easy for
|
||
package owners with an existing release process that uploads to
|
||
non-PyPI hosting to also upload those release files to PyPI.
|
||
|
||
|
||
Solution / two transition phases
|
||
================================
|
||
|
||
The first transition phase introduces a "hosting-mode" field for each
|
||
project on PyPI, allowing package owners explicit control of which
|
||
release file links are served to present-day installation tools in the
|
||
machine-readable ``simple/`` index. The first transition will, after
|
||
successful hosting-mode manipulations by individual early-adopters,
|
||
set a default hosting mode for existing packages, based on automated
|
||
analysis. **Maintainers will be notified one month ahead of any such
|
||
automated change**. At completion of the first transition phase,
|
||
**all present-day existing release and installation processes and
|
||
tools are expected to continue working**. Any remaining errors or
|
||
problems are expected to only relate to installation of individual
|
||
packages and can be easily corrected by package maintainers or PyPI
|
||
admins if maintainers are not reachable.
|
||
|
||
Also in the first phase, each link served in the ``simple/`` index
|
||
will be explicitly marked as ``rel="internal"`` if it is hosted by the
|
||
index itself (even if on a separate domain, which may be the case if
|
||
the index uses a CDN for file-serving). Any link not so marked will be
|
||
considered an external link.
|
||
|
||
In the second transition phase, PyPI client installation tools shall
|
||
be updated to default to only install ``rel="internal"`` packages
|
||
unless a user specifies option(s) to permit installing from external
|
||
links. See `second transition phase`_ for details on how installers
|
||
should behave.
|
||
|
||
Maintainers of packages which currently host release files on non-PyPI
|
||
sites shall receive instructions and tools to ease "re-hosting" of
|
||
their historic and future package release files. This re-hosting tool
|
||
MUST be available before automated hosting-mode changes are announced
|
||
to package maintainers.
|
||
|
||
|
||
Implementation
|
||
==============
|
||
|
||
Hosting modes
|
||
-------------
|
||
|
||
The foundation of the first transition phase is the introduction of
|
||
three "modes" of PyPI hosting for a package, affecting which links are
|
||
generated for the ``simple/`` index. These modes are implemented
|
||
without requiring changes to installation tools via changes to the
|
||
algorithm for generating the machine-readable ``simple/`` index.
|
||
|
||
The modes are:
|
||
|
||
- ``pypi-scrape-crawl``: no change from the current situation of
|
||
generating machine-readable links for installation tools, as
|
||
outlined in the history_.
|
||
|
||
- ``pypi-scrape``: for a package in this mode, links to be added to
|
||
the ``simple/`` index are still scraped from package
|
||
metadata. However, the "Home-page" and "Download-url" links are
|
||
given ``rel=ext-homepage`` and ``rel=ext-download`` attributes
|
||
instead of ``rel=homepage`` and ``rel=download``. The effect of this
|
||
(with no change in installation tools necessary) is that these links
|
||
will not be followed and scraped for further candidate links by
|
||
present-day installation tools: only installable files directly
|
||
hosted from PyPI or linked directly from PyPI metadata will be
|
||
considered for installation. Installation tools MAY evolve to offer
|
||
an option to use the new rel-attribution to crawl external pages but
|
||
MUST NOT default to it.
|
||
|
||
- ``pypi-explicit``: for a package in this mode, only links to release
|
||
files uploaded to PyPI, and external links to release files
|
||
explicitly nominated by the package owner, will be added to the
|
||
``simple/`` index. PyPI will provide a new interface for package
|
||
owners to supply external release-file URLs. These URLs MUST include
|
||
a URL fragment in the form "#hashtype=hashvalue" specifying a hash
|
||
of the externally-linked file which installer tools MUST use to
|
||
validate that they have downloaded the intended file.
|
||
|
||
Thus the hope is that eventually all projects on PyPI can be migrated
|
||
to the ``pypi-explicit`` mode, while preserving the ability to install
|
||
release files hosted externally via installer tools. Deprecation of
|
||
hosting modes to eventually only allow the ``pypi-explicit`` mode is
|
||
NOT REGULATED by this PEP but is expected to become feasible some time
|
||
after successful implementation of the transition phases described in
|
||
this PEP. It is expected that deprecation requires **a new process to
|
||
deal with abandoned packages** because of unreachable maintainers for
|
||
still popular packages.
|
||
|
||
|
||
First transition phase (PyPI)
|
||
-----------------------------
|
||
|
||
The proposed solution consists of multiple implementation and
|
||
communication steps:
|
||
|
||
#. Implement in PyPI the three modes described above, with an
|
||
interface for package owners to select the mode for each package
|
||
and register explicit external file URLs.
|
||
|
||
#. For packages in all modes, label links in the ``simple/`` index to
|
||
index-hosted files with ``rel="internal"``, to make it easier for
|
||
client tools to distinguish these links in the second phase.
|
||
|
||
#. Add an HTML tag ``<meta name="api-version" value="2">`` to all
|
||
``simple/`` index pages, to allow clients to distinguish between
|
||
indexes providing the ``rel="internal"`` metadata and older ones
|
||
that do not.
|
||
|
||
#. Default all newly-registered packages to ``pypi-explicit`` mode
|
||
(package owners can still switch to the other modes as desired).
|
||
|
||
#. Determine (via automated analysis [2]_) which packages have all
|
||
installable files available on PyPI itself (group A), which have
|
||
all installable files on PyPI or linked directly from PyPI metadata
|
||
(group B), and which have installable versions available that are
|
||
linked only from external homepage/download HTML pages (group C).
|
||
|
||
#. Send mail to maintainers of projects in group A that their project
|
||
will be automatically configured to ``pypi-explicit`` mode in one
|
||
month, and similarly to maintainers of projects in group B that
|
||
their project will be automatically configured to ``pypi-scrape``
|
||
mode. Inform them that this change is not expected to affect
|
||
installability of their project at all, but will result in faster
|
||
and safer installs for their users. Encourage them to set this
|
||
mode themselves sooner to benefit their users.
|
||
|
||
#. Send mail to maintainers of packages in group C that their package
|
||
hosting mode is ``pypi-scrape-crawl``, list the URLs which
|
||
currently are crawled, and suggest that they either re-host their
|
||
packages directly on PyPI and switch to ``pypi-explicit``, or at
|
||
least provide direct links to release files in PyPI metadata and
|
||
switch to ``pypi-scrape``. Provide instructions and tools to help
|
||
with these transitions.
|
||
|
||
|
||
.. _`second transition phase`:
|
||
|
||
Second transition phase (installer tools)
|
||
-----------------------------------------
|
||
|
||
For the second transition phase, maintainers of installation tools are
|
||
asked to release two updates.
|
||
|
||
The first update shall provide clear warnings if externally-hosted
|
||
release files (that is, files whose link does not include
|
||
``rel="internal"``) are selected for download, for which projects and
|
||
URLs exactly this happens, and warn that in future versions
|
||
externally-hosted downloads will be disabled by default.
|
||
|
||
The second update should change the default mode to allow only
|
||
installation of ``rel="internal"`` package files, and allow
|
||
installation of externally-hosted packages only when the user supplies
|
||
an option.
|
||
|
||
The installer should distinguish between verifiable and non-verifiable
|
||
external links. A verifiable external link is a direct link to an
|
||
installable file from the PyPI ``simple/`` index that includes a hash
|
||
in the URL fragment ("#hashtype=hashvalue") which can be used to
|
||
verify the integrity of the downloaded file. A non-verifiable external
|
||
link is any link (other than those explicitly supplied by the user of
|
||
an installer tool) without a hash, scraped from external HTML, or
|
||
injected into the search via some other non-PyPI source
|
||
(e.g. setuptools' ``dependency_links`` feature).
|
||
|
||
Installers should provide a blanket option to allow
|
||
installing any verifiable external link. Non-verifiable external links
|
||
should only be installed if the user-provided option specifies exactly
|
||
which external domains can be used or for which specific package names
|
||
external links can be used.
|
||
|
||
When download of an externally-hosted package is disallowed by the
|
||
default configuration, the user should be notified, with instructions
|
||
for how to make the install succeed and warnings about the implication
|
||
(that a file will be downloaded from a site that is not part of the
|
||
package index). The warning given for non-verifiable links should
|
||
clearly state that the installer cannot verify the integrity of the
|
||
downloaded file. The warning given for verifiable external links
|
||
should simply note that the file will be downloaded from an external
|
||
URL, but that the file integrity can be verified by checksum.
|
||
|
||
Alternative PyPI-compatible index implementations should upgrade to
|
||
begin providing the ``rel="internal"`` metadata and the ``<meta
|
||
name="api-version" value="2">`` tag as soon as possible. For
|
||
alternative indexes which do not yet provide the meta tag in their
|
||
``simple/`` pages, installation tools should provide
|
||
backwards-compatible fallback behavior (treat links as internal as in
|
||
pre-PEP times and provide a warning).
|
||
|
||
|
||
API For Submitting External Distribution URLs
|
||
---------------------------------------------
|
||
|
||
New distribution URLs may be submitted by performing a HTTP POST to
|
||
the URL:
|
||
|
||
https://pypi.python.org/pypi
|
||
|
||
With the following form-encoded data:
|
||
|
||
============== ================================
|
||
Name Value
|
||
-------------- --------------------------------
|
||
:action The string "urls"
|
||
name The package name as a string
|
||
version The release version as a string
|
||
new-url The new URL to store
|
||
submit_new_url The string "yes"
|
||
============== ================================
|
||
|
||
The POST must be accompanied by an HTTP Basic Auth header encoding the
|
||
username and password of the user authorized to maintain the package
|
||
on PyPI.
|
||
|
||
The HTTP response to this request will be one of:
|
||
|
||
======= ============ ================================================
|
||
Code Meaning URL submission implications
|
||
------- ------------ ------------------------------------------------
|
||
200 OK Everything worked just fine
|
||
400 Bad request Data provided for submission was malformed
|
||
401 Unauthorised The username or password supplied were incorrect
|
||
403 Forbidden User does not have permission to update the
|
||
package information (not Owner or Maintainer)
|
||
======= ============ ================================================
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] Phillip Eby, easy_install 'Package Index "API"' documentation,
|
||
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
|
||
|
||
.. [2] Donald Stufft, automated analysis of PyPI project links,
|
||
https://github.com/dstufft/pypi.linkcheck
|
||
|
||
.. [3] Marc-Andre Lemburg, reasons for external hosting,
|
||
https://mail.python.org/pipermail/catalog-sig/2013-March/005626.html
|
||
|
||
.. [4] Holger Krekel, script to remove homepage/download metadata for
|
||
all releases
|
||
https://mail.python.org/pipermail/catalog-sig/2013-February/005423.html
|
||
|
||
|
||
Acknowledgments
|
||
===============
|
||
|
||
Phillip Eby for precise information and the basic ideas to implement
|
||
the transition via server-side changes only.
|
||
|
||
Donald Stufft for pushing away from external hosting and offering to
|
||
implement both a Pull Request for the necessary PyPI changes and the
|
||
analysis tool to drive the transition phase 1.
|
||
|
||
Marc-Andre Lemburg, Nick Coghlan and catalog-sig in general for
|
||
thinking through issues regarding getting rid of "external hosting".
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|