Update PEP 470 with feedback from distutils-sig

Donald Stufft 2014-10-13 13:59:22 -04:00
parent 6b6e86d391
commit d19108d0e6
1 changed files with 174 additions and 143 deletions


@ -24,7 +24,7 @@ located and what they need to do to enable this additional repository. In
addition to adding discovery information to make explicit multiple repositories
easy to use, this PEP also deprecates and removes the implicit multiple
repository support which currently functions through directly or indirectly
linking off site via the simple API. Finally this PEP also proposes deprecating
and removing the functionality added by PEP 438, particularly the additional
rel information and the meta tag to indicate the API version.
@ -52,17 +52,24 @@ This gives PyPI two equally important roles that it plays in the Python
ecosystem, that of index to enable easy discovery of Python projects and
central repository to enable easy hosting, download, and installation of Python
projects. Due to the history behind PyPI and the very organic growth it has
experienced, the lines between these two roles are blurry. This blurring has
caused confusion for the end users of both of these roles, and this has in turn
caused ire between people attempting to use PyPI in different capacities, most
often when end users want to use PyPI as a repository but the author wants to
use PyPI solely as an index.
This confusion comes down to end users of projects not realizing whether a
project is hosted on PyPI or relies on an external service. This often
manifests itself when the external service is down but PyPI is not. People
will see that PyPI works, and other projects work, but this one specific
project does not. They often do not realize who they need to contact in order
to get this fixed or what their remediation steps are.

By moving to using explicit multiple repositories we can make the lines between
these two roles much more explicit and remove the "hidden" surprises caused by
the current implementation of handling people who do not want to use PyPI as a
repository. However, simply moving to explicit multiple repositories is a
regression in discoverability, and for that reason this PEP adds an extension
to the current simple API which will enable easy discovery of the specific
repository that a project can be found in.
@ -79,11 +86,32 @@ PyPI providing speed ups for a lot of people, however it did so by introducing
a new point of confusion and pain for both the end users and the authors.
Key User Experience Expectations
--------------------------------
#. Easily allow external hosting to "just work" when appropriately configured
at the system, user or virtual environment level.
#. Easily allow package authors to tell PyPI "my releases are hosted <here>"
and have that advertised in such a way that tools can clearly communicate it
to users, without silently introducing unexpected dependencies on third
party services.
#. Eliminate any and all references to the confusing "verifiable external" and
"unverifiable external" distinction from the user experience (both when
installing and when releasing packages).
#. The repository aspects of PyPI should become *just* the default package
hosting location (i.e. the only one that is treated as opt-out rather than
opt-in by most client tools in their default configuration). Aside from that
aspect, hosting on PyPI should not otherwise provide an enhanced user
experience over hosting your own package repository.
#. Do all of the above while providing default behaviour that is secure against
most attackers below the nation state adversary level.
Why Additional Repositories?
----------------------------
The two common installer tools, pip and easy_install/setuptools, both support
the concept of additional locations to search for files to satisfy the
installation requirements and have done so for many years. This means that
there is no need to "phase" in a new flag or concept and the solution to
installing a project from a repository other than PyPI will function regardless
@ -91,43 +119,17 @@ of how old (within reason) the end user's installer is. Not only has this
concept existed in the Python tooling for some time, but it is a concept that
exists across languages and even extends to the OS level, with OS package tools
almost universally using multiple repository support, making it extremely
likely that someone is already familiar with the concept.
Additionally, the multiple repository approach is a concept that is useful
outside of the narrow scope of allowing projects which wish to be included on
the index portion of PyPI but do not wish to utilize the repository portion of
PyPI. This includes places where a company may wish to host a repository that
contains their internal packages or where a project may wish to have multiple
"channels" of releases, such as alpha, beta, release candidate, and final
release. This could also be used for projects wishing to host files which
cannot be uploaded to PyPI, such as multi-gigabyte data files or, currently at
least, Linux Wheels.

Setting up an external repository is very simple: it can be achieved with
nothing more than a filesystem, some files to host, and any web server capable
of serving files and generating an automated index of directories (commonly
called "autoindex"). This can be as simple as::

    $ mkdir -p /var/www/index.example.com/
    $ mkdir -p /var/www/index.example.com/myproject/
    $ mv ~/myproject-1.0.tar.gz /var/www/index.example.com/myproject/
    $ twistd -n web --path /var/www/index.example.com/

Using this additional location within pip is also simple and can be included
on a per invocation, per shell, or per user basis. pip 6.0 will also include
the ability to configure this on a per virtual environment or per machine
basis. This can be as simple as::

    $ # As a CLI argument
    $ pip install --extra-index-url https://index.example.com/ myproject
    $ # As an environment variable
    $ PIP_EXTRA_INDEX_URL=https://index.example.com/ pip install myproject
    $ # With a configuration file (printf, unlike plain echo, expands \n)
    $ mkdir -p ~/.pip
    $ printf '[global]\nextra-index-url = https://index.example.com/\n' > ~/.pip/pip.conf
    $ pip install myproject
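If the web server in use has no autoindex feature, any static file server can play the same role once index pages exist. The following is a minimal sketch, not part of pip or PyPI. The helper names ``write_index_pages`` and ``_write_links`` are hypothetical, and the sketch assumes the one-directory-per-project layout from the ``mkdir``/``mv`` example with the index URL pointing at the root directory.

```python
from __future__ import annotations

import html
from pathlib import Path
from urllib.parse import quote


def write_index_pages(root: str) -> list[Path]:
    """Hypothetical helper: write static index pages for a directory tree.

    Assumes one sub-directory per project, each holding that project's
    release files. Writes <root>/index.html (links to projects) and
    <root>/<project>/index.html (links to files); returns the paths written.
    """
    base = Path(root)
    projects = sorted(p for p in base.iterdir() if p.is_dir())
    written = []
    for project in projects:
        files = sorted(
            f.name for f in project.iterdir()
            if f.is_file() and f.name != "index.html"
        )
        written.append(_write_links(project / "index.html", files))
    # Link each project directory with a trailing slash so the relative file
    # links on that project's page resolve inside the directory.
    written.append(_write_links(base / "index.html", [p.name + "/" for p in projects]))
    return written


def _write_links(path: Path, names: list[str]) -> Path:
    """Write a bare HTML page containing one <a> link per name."""
    links = "\n".join(
        f'<a href="{quote(name)}">{html.escape(name)}</a><br>' for name in names
    )
    path.write_text(f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>\n")
    return path
```

Rerun the helper after copying new files in, for example with ``write_index_pages("/var/www/index.example.com/")``. pip requests a per-project page beneath the index URL (such as ``https://index.example.com/myproject/``) and reads the links on it.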
Why Not PEP 438 or Similar?
@ -138,11 +140,11 @@ for quite some time support for PEP 438 has only existed in pip since the 1.4
version, and still has yet to be implemented in setuptools. The design of
PEP 438 did mean that users still benefited, even with older installers, for
projects which did not require external files; however, for projects which
*did* require external files, users are still silently being given either
potentially unreliable or, even worse, unsafe files to download. This system is
also unique to Python, as it arises out of the history of PyPI; this means that
it is almost certain that this concept will be foreign to most, if not all,
users until they encounter it while attempting to use the Python toolchain.
Additionally, the classification system proposed by PEP 438 has, in practice,
turned out to be extremely confusing to end users, so much so that it is a
@ -158,33 +160,28 @@ install.
This UX failure exists for several reasons.
#. If pip can locate files at all for a project on the Simple API it will
   simply use that instead of attempting to locate more. This is generally the
   right thing to do, as attempting to locate more would erase a large part of
   the benefit of PEP 438. This means that if a project *ever* uploaded a file
   that matches what the user has requested for install, that file will be used
   regardless of how old it is.
#. PEP 438 makes an implicit assumption that most projects would either upload
   themselves to PyPI or would update themselves to directly linking to release
   files. While a large number of projects did ultimately decide to upload to
   PyPI, some of them did so only because the UX around PEP 438 was so bad that
   they felt forced to do so. More concerning, however, is the fact that very
   few projects have opted to directly and safely link to files; instead they
   still simply link to pages which must be scraped in order to find the actual
   files, thus rendering the safe variant (``--allow-external``) largely
   useless.
#. Even if an author wishes to directly link to their files, doing so safely is
   non-obvious. It requires the inclusion of an MD5 hash (for historical
   reasons) in the fragment of the URL (e.g. ``#md5=<hexdigest>``). If they do
   not include this then their files will be considered "unverified".
#. PEP 438 takes a security centric view and disallows any form of a global opt
   in for unverified projects. While this is generally a good thing, it creates
   extremely verbose and repetitive command invocations such as::

      $ pip install --allow-external myproject --allow-unverified myproject myproject
      $ pip install --allow-all-external --allow-unverified myproject myproject
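The third point describes a link counting as "verified" only when its URL fragment carries an MD5 digest that the downloaded bytes match. The following is a minimal sketch of that check. It is not pip's implementation, the function names are hypothetical, and MD5 appears only because the text above names it.

```python
import hashlib
from typing import Optional
from urllib.parse import urldefrag


def md5_from_url(url: str) -> Optional[str]:
    """Return the hex digest from a '#md5=<hexdigest>' fragment, or None."""
    _, fragment = urldefrag(url)
    name, sep, value = fragment.partition("=")
    if sep and name == "md5" and value:
        return value.lower()
    return None


def is_verified(url: str, content: bytes) -> bool:
    """True only if the URL names an MD5 digest and the content matches it."""
    expected = md5_from_url(url)
    if expected is None:
        return False  # no hash in the link: the file counts as "unverified"
    return hashlib.md5(content).hexdigest() == expected
```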
@ -195,7 +192,7 @@ Multiple Repository/Index Support
Installers SHOULD implement, or continue to offer, the ability to point the
installer at multiple URL locations. The exact mechanisms for a user to
indicate they wish to use an additional location are left up to each
individual implementation.
Additionally the mechanism discovering an installation candidate when multiple
@ -210,11 +207,12 @@ essentially treating it as if it were one large repository.
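Treating several repositories as one large repository amounts to pooling the candidates found in each before selecting one. The sketch below illustrates that idea under stated assumptions and is not any installer's actual logic. Versions are pre-parsed integer tuples rather than real version strings, ``Candidate`` and ``best_candidate`` are hypothetical names, and ties go to whichever repository was listed first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    version: tuple[int, ...]   # assumption: already parsed, e.g. (1, 1)
    filename: str
    repository: str


def best_candidate(
    found: dict[str, list[tuple[tuple[int, ...], str]]]
) -> Candidate | None:
    """Pool the files found in every repository, then pick the newest.

    `found` maps a repository URL to the (version, filename) pairs found
    there. Because max() keeps the first of several equal maxima, a version
    offered by more than one repository is taken from the one listed first.
    """
    pool = [
        Candidate(version, filename, repository)
        for repository, files in found.items()
        for version, filename in files
    ]
    return max(pool, key=lambda c: c.version, default=None)
```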
Installers SHOULD also implement some mechanism for removing or otherwise
disabling use of the default repository. The exact specifics of how that is
achieved are up to each individual implementation.
End users wishing to limit what files they pull from which repository can
simply use `devpi <http://doc.devpi.net/latest/>`_ to whitelist projects from
PyPI or another repository.
Installers SHOULD also implement some mechanism for whitelisting and
blacklisting which projects a user wishes to install from a particular
repository. The exact specifics of how that is achieved are up to each
individual implementation.
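A per-repository whitelist/blacklist could be as simple as the following sketch. The data shapes and the ``normalize`` and ``allowed`` functions are hypothetical. The normalization rule, which folds runs of ``-``, ``_`` and ``.`` into ``-`` and lowercases the result, is a common convention assumed here, not something this PEP specifies.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Fold runs of '-', '_' and '.' into '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def allowed(project: str, repository: str,
            allow: dict[str, set[str]], deny: dict[str, set[str]]) -> bool:
    """Decide whether `project` may be installed from `repository`.

    A deny-list entry always wins. If the repository has an allow-list, only
    projects on it pass; repositories without one accept any project.
    """
    name = normalize(project)
    if name in {normalize(p) for p in deny.get(repository, ())}:
        return False
    whitelist = allow.get(repository)
    return whitelist is None or name in {normalize(p) for p in whitelist}
```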
External Index Discovery
@ -234,43 +232,47 @@ page however they will not be linked or provided in a form that older
installers will automatically search them.
This ability will take the form of a ``<meta>`` tag. The name of this tag must
be set to ``repository`` or ``find-link`` and the content will be a link to the
location of the repository. An optional ``data-description`` attribute will
convey any comments or description that the author has provided.

An example would look something like::

    <meta name="repository" content="https://index.example.com/" data-description="Primary Repository">
    <meta name="repository" content="https://index.example.com/Ubuntu-14.04/" data-description="Wheels built for Ubuntu 14.04">
    <meta name="find-link" content="https://links.example.com/find-links/" data-description="A flat index for find links">

When an external repository is added to a project, new uploads will no longer
be permitted to that project. However, any existing files will simply be hidden
from the simple API and the web interface until all of the external
repositories are removed, at which point they will be visible again. PyPI MUST
warn authors if adding an external repository will hide files, and that warning
must persist on any of the project management pages for that particular
project.
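An installer could collect these tags using nothing beyond the standard library's ``html.parser``. The following is a minimal sketch that assumes the tag format described above. ``RepositoryHint`` and ``find_repository_hints`` are hypothetical names, and any ``<meta>`` tag with a different name is ignored.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Optional


class RepositoryHint(NamedTuple):
    kind: str                   # "repository" or "find-link"
    url: str
    description: Optional[str]  # value of data-description, if present


class _MetaCollector(HTMLParser):
    KINDS = {"repository", "find-link"}

    def __init__(self) -> None:
        super().__init__()
        self.hints: List[RepositoryHint] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        kind, url = attributes.get("name"), attributes.get("content")
        if kind in self.KINDS and url:
            self.hints.append(
                RepositoryHint(kind, url, attributes.get("data-description"))
            )


def find_repository_hints(page: str) -> List[RepositoryHint]:
    """Return every repository/find-link hint found on a simple API page."""
    parser = _MetaCollector()
    parser.feed(page)
    parser.close()
    return parser.hints
```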
When an installer fetches the simple page for a project, if it finds this
additional meta-data then it should use this data to tell the user how to add
one or more of the additional URLs to search in. This message should include
any comments that the project has included to enable them to communicate to the
user and provide hints as to which URL they might want (e.g. if some are only
useful or compatible with certain platforms or situations). When the installer
has implemented the auto discovery mechanisms, it should also deprecate any of
the mechanisms added for PEP 438 (such as ``--allow-external``) for removal at
the end of the deprecation period proposed by the PEP.
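Once the hints are collected, turning them into the message described above is straightforward. The sketch below maps ``repository`` hints to pip's existing ``--extra-index-url`` option and ``find-link`` hints to ``--find-links``. That mapping is an assumption, and the function name and message wording are illustrative only.

```python
from __future__ import annotations

from typing import Optional


def discovery_message(project: str,
                      hints: list[tuple[str, str, Optional[str]]]) -> str:
    """Build the text an installer might print when it finds repository hints.

    Each hint is a (kind, url, description) tuple as read from the page.
    """
    flags = {"repository": "--extra-index-url", "find-link": "--find-links"}
    lines = [f"{project} is hosted outside of PyPI. Try one of:"]
    for kind, url, description in hints:
        flag = flags.get(kind)
        if flag is None:
            continue  # unknown hint type: skip it rather than guess
        lines.append(f"  pip install {flag} {url} {project}")
        if description:
            lines.append(f"      ({description})")
    return "\n".join(lines)
```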
In addition to the API for programmatic access to the registered external
repositories, PyPI will also present these URLs in the UI so that users with
an installer that does not implement the discovery mechanism can still easily
discover what repository the project is using to host itself.

This feature **MUST** be added to PyPI and be contained in a released version
of pip prior to starting the deprecation and removal process for the implicit
offsite hosting functionality.
Deprecation and Removal of Link Spidering
=========================================
.. important:: The deprecation specified in this section **MUST NOT** start
   until after the discovery mechanisms have been implemented and released in
   pip.

   The only exception to this is the addition of the ``pypi-only`` mode and
   defaulting new projects to it without the ability to switch to a different
   mode.
A new hosting mode will be added to PyPI. This hosting mode will be called
``pypi-only`` and will be in addition to the three that PEP 438 has already
given us: ``pypi-explicit``, ``pypi-scrape``, and ``pypi-scrape-crawl``.
@ -295,16 +297,18 @@ After that switch, an email will be sent to projects which rely on hosting
external to PyPI. This email will warn these projects that externally hosted
files have been deprecated on PyPI and that in 6 months from the time of that
email that all external links will be removed from the installer APIs. This
email **MUST** include instructions for converting their projects to be hosted
on PyPI and **MUST** include links to a script or package that will enable them
to enter their PyPI credentials and package name and have it automatically
download and re-host all of their files on PyPI. This email **MUST** also
include instructions for setting up their own index page and registering that
with PyPI, including the fact that they can use pythonhosted.org as a host for
an index page without requiring them to host any additional infrastructure or
purchase a TLS certificate. This email must also contain a link to the Terms of
Service for PyPI, as many users may have signed up a long time ago and may not
recall what those terms are. Finally, this email must also contain a list of
the links registered with PyPI where we were able to detect that an installable
file was located.
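A first pass at detecting which registered links point directly at an installable file could look only at the file name. This sketch is hypothetical. ``installable_links`` and the suffix list are illustrative assumptions, and a real check would likely also need to fetch linked pages to find files listed on them.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative, not exhaustive: common sdist, wheel and egg suffixes.
INSTALLABLE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".whl", ".egg")


def installable_links(urls):
    """Return the URLs whose path ends in a known distribution suffix.

    The query string and any '#md5=...' fragment are ignored because
    urlsplit() separates them from the path.
    """
    found = []
    for url in urls:
        filename = posixpath.basename(urlsplit(url).path).lower()
        if filename.endswith(INSTALLABLE_SUFFIXES):
            found.append(url)
    return found
```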
Five months after the initial email, another email must be sent to any projects
still relying on external hosting. This email will include all of the same
@ -312,24 +316,48 @@ information that the first email contained, except that the removal date will
be one month away instead of six.
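The schedule (a first warning, a reminder after five months, removal after six) can be computed from the date of the first email. The following is a minimal sketch. It assumes "month" means calendar month, with the day clamped to the end of shorter months. The PEP does not fix a start date, so any date passed in is illustrative, and ``add_months`` and ``deprecation_schedule`` are hypothetical names.

```python
from __future__ import annotations

import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift `start` by whole calendar months, clamping the day if needed."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def deprecation_schedule(first_email: date) -> dict[str, date]:
    """Dates of the first email, the one-month reminder, and the removal."""
    return {
        "first_email": first_email,
        "reminder_email": add_months(first_email, 5),
        "removal": add_months(first_email, 6),
    }
```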
Finally, a month later, all projects will be switched to the ``pypi-only`` mode
and PyPI will be modified to remove the externally linked files functionality.
When switching these projects to the ``pypi-only`` mode, we will automatically
convert any links which can be used for discovering other projects into
external repositories.
Summary of Changes
==================
Repository side
---------------
#. Implement simple API changes to allow the addition of an external
repository.
#. *(Optional, Mandatory on PyPI)* Deprecate and remove the hosting modes as
defined by PEP 438.
#. *(Optional, Mandatory on PyPI)* Restrict simple API to only list the files
that are contained within the repository and the external repository
metadata.
Client side
-----------
#. Implement multiple repository support.
#. Implement some mechanism for removing/disabling the default repository.
#. Implement the discovery mechanism.
#. *(Optional)* Deprecate / Remove PEP 438
Impact
======
The largest impact of this PEP will be on users of older installation clients,
who will not get a discovery mechanism built into the install command. This
will require them to browse to the PyPI web UI and discover the repository
there. Since any URLs required to install a project will be automatically
migrated to the new format, the biggest change for users will be requiring a
new option to install these projects.
Looking at the numbers the actual impact should be quite low, with it affecting
just 3.8% of projects which host any files only externally or 2.2% which have
their latest version hosted only externally.
6674 unique IP addresses have accessed the Simple API for these 3.8% of
projects in a single day (2014-09-30). Of those, 99.5% of them installed
@ -337,6 +365,10 @@ something which could not be verified, and thus they were open to a Remote Code
Execution via a Man-In-The-Middle attack, while 7.9% installed something which
could be verified and only 0.4% only installed things which could be verified.
This means that 99.5% of users of these features, both new and old, are doing
something unsafe, and anyone using an older copy of pip, or using setuptools at
all, is silently unsafe.
Projects Which Rely on Externally Hosted files
----------------------------------------------
@ -403,20 +435,6 @@ pyDes 76
============================== ==========
PIL
---
It's obvious from the numbers above that the vast bulk of the impact comes
from the PIL project. On 2014-05-17 an email was sent to the contact for PIL
inquiring whether or not they would be willing to upload to PyPI. A response
has not been received as of yet (2014-10-03), nor has any change in the
hosting happened. Due to the popularity of PIL, this PEP also proposes that
during the deprecation period PyPI Administrators will set the PIL download URL
as the external index for that project, allowing the users of PIL to take
advantage of the auto discovery mechanisms although the project has seemingly
become unmaintained.
Rejected Proposals
==================
@ -424,19 +442,20 @@ Keep the current classification system but adjust the options
-------------------------------------------------------------
This PEP rejects several related proposals which attempt to fix some of the
usability problems with the current system while still keeping the general
gist of PEP 438.
This includes:
* Default to allowing safely externally hosted files, but disallow unsafely
hosted.
* Default to disallowing safely externally hosted files with only a global flag
to enable them, but disallow unsafely hosted.
* Continue on the suggested path of PEP 438 and remove the option to unsafely
host externally but continue to allow the option to safely host externally.
These proposals are rejected because:
* The classification system introduced in PEP 438 is an entirely unique concept
@ -445,8 +464,8 @@ These proposals are rejected because:
* The classification system itself is non-obvious to explain, and
pre-determining what classification of link a project will require entails
inspecting the project's ``/simple/<project>/`` page, and possibly any URLs
linked from that page.
* The ability to host externally while still being linked for automatic
discovery is mostly a historic relic which causes a fair amount of pain and
@ -457,16 +476,28 @@ These proposals are rejected because:
This extends to the ``--allow-*`` options as well as the inability to
determine if a link is expected to fail or not.
* The mechanism paints a very broad brush when enabling an option, while
PEP 438 attempts to limit this with per package options. However, a project
that has existed for an extended period of time may often have several
different URLs listed in its simple index. It is not unusual for at least
one of these to no longer be under control of the project. While an
unregistered domain will sit there relatively harmless most of the time, pip
will continue to attempt to install from it on every discovery phase. This
means that an attacker simply needs to look at projects which rely on unsafe
external URLs and register expired domains to attack users.
Implement this PEP, but Do Not Remove the Existing Links
--------------------------------------------------------
This is essentially the backwards compatible version of this PEP. It attempts
to allow people using older clients, or clients which do not implement this
PEP to continue on as if nothing had changed. This proposal is rejected because
the vast bulk of those scenarios are unsafe uses of the deprecated features. It
is the opinion of this PEP that silently allowing unsafe actions to take place
on behalf of end users is simply not an acceptable solution.
Copyright
=========