merge
commit
f880710236
276
pep-0470.txt
|
@ -1,36 +1,25 @@
|
|||
PEP: 470
|
||||
Title: Using Multi Repository Support for External to PyPI Package File Hosting
|
||||
Title: Removing External Hosting Support on PyPI
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Donald Stufft <donald@stufft.io>,
|
||||
BDFL-Delegate: Richard Jones <richard@python.org>
|
||||
BDFL-Delegate: TBD
|
||||
Discussions-To: distutils-sig@python.org
|
||||
Status: Draft
|
||||
Type: Process
|
||||
Content-Type: text/x-rst
|
||||
Created: 12-May-2014
|
||||
Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014
|
||||
Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015
|
||||
Replaces: 438
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes a mechanism for project authors to register with PyPI an
|
||||
external repository where their project's downloads can be located. This
|
||||
information can then be included as part of the simple API so that installers
|
||||
can use it to tell users where the item they are attempting to install is
|
||||
located and what they need to do to enable this additional repository. In
|
||||
addition to adding discovery information to make explicit multiple repositories
|
||||
easy to use, this PEP also deprecates and removes the implicit multiple
|
||||
repository support which currently functions through directly or indirectly
|
||||
linking off site via the simple API. Finally this PEP also proposes deprecating
|
||||
and removing the functionality added by PEP 438, particularly the additional
|
||||
rel information and the meta tag to indicate the API version.
|
||||
|
||||
This PEP *does* not propose mandating that all authors upload their projects to
|
||||
PyPI in order to exist in the index nor does it propose any change to the human
|
||||
facing elements of PyPI.
|
||||
This PEP proposes the deprecation and removal of support for hosting files
|
||||
externally to PyPI as well as the deprecation and removal of the functionality
|
||||
added by PEP 438, particularly rel information to classify different types of
|
||||
links and the meta-tag to indicate API version.
|
||||
|
||||
|
||||
Rationale
|
||||
|
@ -65,14 +54,6 @@ PyPI works, and other projects works, but this one specific one does not. They
|
|||
often times do not realize who they need to contact in order to get this fixed
|
||||
or what their remediation steps are.
|
||||
|
||||
By moving to using explicit multiple repositories we can make the lines between
|
||||
these two roles much more explicit and remove the "hidden" surprises caused by
|
||||
the current implementation of handling people who do not want to use PyPI as a
|
||||
repository. However simply moving to explicit multiple repositories is a
|
||||
regression in discoverability, and for that reason this PEP adds an extension
|
||||
to the current simple API which will enable easy discovery of the specific
|
||||
repository that a project can be found in.
|
||||
|
||||
PEP 438 attempted to solve this issue by allowing projects to explicitly
|
||||
declare if they were using the repository features or not, and if they were
|
||||
not, it had the installers classify the links it found as either "internal",
|
||||
|
@ -85,16 +66,17 @@ repository features, an altogether good thing given the global CDN powering
|
|||
PyPI providing speed ups for a lot of people, however it did so by introducing
|
||||
a new point of confusion and pain for both the end users and the authors.
|
||||
|
||||
By moving to using explicit multiple repositories we can make the lines between
|
||||
these two roles much more explicit and remove the "hidden" surprises caused by
|
||||
the current implementation of handling people who do not want to use PyPI as a
|
||||
repository.
|
||||
|
||||
|
||||
Key User Experience Expectations
|
||||
--------------------------------
|
||||
|
||||
#. Easily allow external hosting to "just work" when appropriately configured
|
||||
at the system, user or virtual environment level.
|
||||
#. Easily allow package authors to tell PyPI "my releases are hosted <here>"
|
||||
and have that advertised in such a way that tools can clearly communicate it
|
||||
to users, without silently introducing unexpected dependencies on third
|
||||
party services.
|
||||
#. Eliminate any and all references to the confusing "verifiable external" and
|
||||
"unverifiable external" distinction from the user experience (both when
|
||||
installing and when releasing packages).
|
||||
|
@ -122,7 +104,7 @@ tools almost universally using multiple repository support making it extremely
|
|||
likely that someone is already familiar with the concept.
|
||||
|
||||
Additionally, the multiple repository approach is a concept that is useful
|
||||
outside of the narrow scope of allowing projects which wish to be included on
|
||||
outside of the narrow scope of allowing projects that wish to be included on
|
||||
the index portion of PyPI but do not wish to utilize the repository portion of
|
||||
PyPI. This includes places where a company may wish to host a repository that
|
||||
contains their internal packages or where a project may wish to have multiple
|
||||
|
@ -215,64 +197,9 @@ repository. The exact specifics of how that is achieved is up to each
|
|||
individual implementation.
|
||||
|
||||
|
||||
External Index Discovery
|
||||
========================
|
||||
|
||||
One of the problems with using an additional index is one of discovery. Users
|
||||
will not generally be aware that an additional index is required at all much
|
||||
less where that index can be found. Projects can attempt to convey this
|
||||
information using their description on the PyPI page however that excludes
|
||||
people who discover their project organically through ``pip search``.
|
||||
|
||||
To support projects that wish to externally host their files and to enable
|
||||
users to easily discover what additional indexes are required, PyPI will gain
|
||||
the ability for projects to register external index URLs along with an
|
||||
associated comment for each. These URLs will be made available on the simple
|
||||
page however they will not be linked or provided in a form that older
|
||||
installers will automatically search them.
|
||||
|
||||
This ability will take the form of a ``<meta>`` tag. The name of this tag must
|
||||
be set to ``repository`` or ``find-link`` and the content will be a link to the
|
||||
location of the repository. An optional data-description attribute will convey
|
||||
any comments or description that the author has provided.
|
||||
|
||||
An example would look something like::
|
||||
|
||||
<meta name="repository" content="https://index.example.com/" data-description="Primary Repository">
|
||||
<meta name="repository" content="https://index.example.com/Ubuntu-14.04/" data-description="Wheels built for Ubuntu 14.04">
|
||||
<meta name="find-link" content="https://links.example.com/find-links/" data-description="A flat index for find links">
|
||||
|
||||
When an installer fetches the simple page for a project, if it finds this
|
||||
additional meta-data then it should use this data to tell the user how to add
|
||||
one or more of the additional URLs to search in. This message should include
|
||||
any comments that the project has included to enable them to communicate to the
|
||||
user and provide hints as to which URL they might want (e.g. if some are only
|
||||
useful or compatible with certain platforms or situations). When the installer
|
||||
has implemented the auto discovery mechanisms they should also deprecate any of
|
||||
the mechanisms added for PEP 438 (such as ``--allow-external``) for removal at
|
||||
the end of the deprecation period proposed by the PEP.
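The discovery step described above can be sketched in Python. This is only an illustration of how an installer might read the ``<meta>`` tags from a simple page, using the standard library ``html.parser`` module; the class name ``ExternalIndexParser`` and the function ``discover_external_indexes`` are hypothetical and are not part of pip or PyPI.

```python
from html.parser import HTMLParser


class ExternalIndexParser(HTMLParser):
    """Collect <meta name="repository"> and <meta name="find-link"> tags."""

    # Tag names taken from the text above; everything else is ignored.
    KINDS = ("repository", "find-link")

    def __init__(self):
        super().__init__()
        self.indexes = []  # list of (kind, url, description) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name") in self.KINDS and attrs.get("content"):
            # data-description is optional, so fall back to an empty string.
            description = attrs.get("data-description") or ""
            self.indexes.append((attrs["name"], attrs["content"], description))


def discover_external_indexes(simple_page_html):
    """Return the external indexes advertised on a project's simple page."""
    parser = ExternalIndexParser()
    parser.feed(simple_page_html)
    parser.close()
    return parser.indexes
```

An installer could then print each URL together with its description and tell the user how to add it as an additional repository, rather than searching it automatically.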
|
||||
|
||||
In addition to the API for programmatic access to the registered external
|
||||
repositories, PyPI will also present these URLs in the UI so that users with
|
||||
an installer that does not implement the discovery mechanism can still easily
|
||||
discover what repository the project is using to host itself.
|
||||
|
||||
This feature **MUST** be added to PyPI and be contained in a released version
|
||||
of pip prior to starting the deprecation and removal process for the implicit
|
||||
offsite hosting functionality.
|
||||
|
||||
|
||||
Deprecation and Removal of Link Spidering
|
||||
=========================================
|
||||
|
||||
.. important:: The deprecation specified in this section **MUST NOT** start
|
||||
until after the discovery mechanisms have been implemented and released in
|
||||
pip.
|
||||
|
||||
The only exception to this is the addition of the ``pypi-only`` mode and
|
||||
defaulting new projects to it without the ability to switch to a different
|
||||
mode.
|
||||
|
||||
A new hosting mode will be added to PyPI. This hosting mode will be called
|
||||
``pypi-only`` and will be in addition to the three that PEP 438 has already
|
||||
given us which are ``pypi-explicit``, ``pypi-scrape``, ``pypi-scrape-crawl``.
|
||||
|
@ -282,44 +209,34 @@ else.
|
|||
|
||||
Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all new
|
||||
projects will be defaulted to the PyPI only mode and they will be locked to
|
||||
this mode and unable to change this particular setting. ``pypi-only`` projects
|
||||
will still be able to register external index URLs as described above - the
|
||||
"pypi-only" refers only to the download links that are published directly on
|
||||
PyPI.
|
||||
this mode and unable to change this particular setting.
|
||||
|
||||
An email will then be sent out to all of the projects which are hosted only on
|
||||
PyPI informing them that in one month their project will be automatically
|
||||
converted to the ``pypi-only`` mode. A month after these emails have been sent
|
||||
any of those projects which were emailed, which still are hosted only on PyPI
|
||||
will have their mode set to ``pypi-only``.
|
||||
will have their mode set permanently to ``pypi-only``.
|
||||
|
||||
After that switch, an email will be sent to projects which rely on hosting
|
||||
At the same time, an email will be sent to projects which rely on hosting
|
||||
external to PyPI. This email will warn these projects that externally hosted
|
||||
files have been deprecated on PyPI and that in 6 months from the time of that
|
||||
files have been deprecated on PyPI and that in 3 months from the time of that
|
||||
email that all external links will be removed from the installer APIs. This
|
||||
email **MUST** include instructions for converting their projects to be hosted
|
||||
on PyPI and **MUST** include links to a script or package that will enable them
|
||||
to enter their PyPI credentials and package name and have it automatically
|
||||
download and re-host all of their files on PyPI. This email **MUST** also
|
||||
include instructions for setting up their own index page and registering that
|
||||
with PyPI, including the fact that they can use pythonhosted.org as a host for
|
||||
an index page without requiring them to host any additional infrastructure or
|
||||
purchase a TLS certificate. This email must also contain a link to the Terms of
|
||||
Service for PyPI as many users may have signed up a long time ago and may not
|
||||
recall what those terms are. Finally this email must also contain a list of
|
||||
the links registered with PyPI where we were able to detect an installable file
|
||||
was located.
|
||||
include instructions for setting up their own index page. This email must also contain a link to the Terms of Service for PyPI as many users may have signed
|
||||
up a long time ago and may not recall what those terms are. Finally this email
|
||||
must also contain a list of the links registered with PyPI where we were able
|
||||
to detect an installable file was located.
|
||||
|
||||
Five months after the initial email, another email must be sent to any projects
|
||||
Two months after the initial email, another email must be sent to any projects
|
||||
still relying on external hosting. This email will include all of the same
|
||||
information that the first email contained, except that the removal date will
|
||||
be one month away instead of six.
|
||||
be one month away instead of three.
|
||||
|
||||
Finally a month later all projects will be switched to the ``pypi-only`` mode
|
||||
and PyPI will be modified to remove the externally linked files functionality,
|
||||
when switching these projects to the ``pypi-only`` mode we will move any links
|
||||
which are able to be used for discovering other projects automatically to be registered as
|
||||
an external repository.
|
||||
and PyPI will be modified to remove the externally linked files functionality.
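The revised timeline (first email, a reminder two months later, removal three months after the first email) can be worked through with a small date calculation. This is a sketch only: the helper names ``add_months`` and ``removal_schedule`` are hypothetical, the start date in the usage note is made up, and clamping to the end of a short month is an assumption about how "a month later" would be interpreted.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Shift `day` by `months`, clamping to the last day of short months."""
    index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def removal_schedule(first_email):
    """Key dates for projects that rely on external hosting."""
    return {
        "first_email": first_email,
        # Reminder goes out two months after the initial email.
        "reminder_email": add_months(first_email, 2),
        # External links are removed one month after the reminder.
        "external_links_removed": add_months(first_email, 3),
    }
```

For example, ``removal_schedule(date(2015, 9, 1))`` places the reminder on 1 November 2015 and the removal on 1 December 2015.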
|
||||
|
||||
|
||||
Summary of Changes
|
||||
|
@ -328,116 +245,85 @@ Summary of Changes
|
|||
Repository side
|
||||
---------------
|
||||
|
||||
#. Implement simple API changes to allow the addition of an external
|
||||
#. Deprecate and remove the hosting modes as defined by PEP 438.
|
||||
#. Restrict simple API to only list the files that are contained within the
|
||||
repository.
|
||||
#. *(Optional, Mandatory on PyPI)* Deprecate and remove the hosting modes as
|
||||
defined by PEP 438.
|
||||
#. *(Optional, Mandatory on PyPI)* Restrict simple API to only list the files
|
||||
that are contained within the repository and the external repository
|
||||
metadata.
|
||||
|
||||
|
||||
Client side
|
||||
-----------
|
||||
|
||||
#. Implement multiple repository support.
|
||||
#. Implement some mechanism for removing/disabling the default repository.
|
||||
#. Implement the discovery mechanism.
|
||||
#. *(Optional)* Deprecate / Remove PEP 438
|
||||
#. Deprecate / Remove PEP 438
|
||||
|
||||
|
||||
Impact
|
||||
======
|
||||
|
||||
The large impact of this PEP will be that for users of older installation
|
||||
clients they will not get a discovery mechanism built into the install command.
|
||||
This will require them to browse to the PyPI web UI and discover the repository
|
||||
there. Since any URLs required to install a project will be automatically
|
||||
migrated to the new format, the biggest change to users will be requiring a new
|
||||
option to install these projects.
|
||||
To determine impact, we've looked at all projects using a method of searching
|
||||
PyPI which is similar to what pip and setuptools use and searched for all
|
||||
files available on PyPI, safely linked from PyPI, unsafely linked from PyPI,
|
||||
and finally unsafely available outside of PyPI. When the same file was found
|
||||
in multiple locations it was deduplicated and only counted in one location
|
||||
based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI.
|
||||
This gives us the broadest possible definition of impact, it means that any
|
||||
single file for this project may no longer be visible by default, however that
|
||||
file could be years old, or it could be a binary file while there is a sdist
|
||||
available on PyPI. This means that the *real* impact will likely be much
|
||||
smaller, but in an attempt not to miscount we take the broadest possible
|
||||
definition.
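The deduplication rule (PyPI > Safely Off PyPI > Unsafely Off PyPI) can be illustrated with a short sketch. The location labels and the function name ``classify_files`` are hypothetical; this is not the script that produced the numbers in this PEP, only one way the stated preference order could be applied.

```python
# Most preferred location first, following the order given in the text.
PREFERENCE = ("pypi", "safe-external", "unsafe-external")


def classify_files(found):
    """Map each filename to the single most preferred location it was seen in.

    `found` is an iterable of (filename, location) pairs, where the same
    filename may appear several times with different locations.
    """
    rank = {location: i for i, location in enumerate(PREFERENCE)}
    best = {}
    for filename, location in found:
        current = best.get(filename)
        if current is None or rank[location] < rank[current]:
            best[filename] = location
    return best
```

A file that is available both on PyPI and through an unsafe external link is therefore counted once, as a PyPI file.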
|
||||
|
||||
Looking at the numbers the actual impact should be quite low, with it affecting
|
||||
just 3.8% of projects which host any files only externally or 2.2% which have
|
||||
their latest version hosted only externally.
|
||||
|
||||
6674 unique IP addresses have accessed the Simple API for these 3.8% of
|
||||
projects in a single day (2014-09-30). Of those, 99.5% of them installed
|
||||
something which could not be verified, and thus they were open to a Remote Code
|
||||
Execution via a Man-In-The-Middle attack, while 7.9% installed something which
|
||||
could be verified and just 0.4% installed only things which could be verified.
|
||||
|
||||
This means that 99.5% of users of these features, both new and old, are doing
|
||||
something unsafe, and for anything using an older copy of pip or using
|
||||
setuptools at all they are silently unsafe.
|
||||
At the time of this writing there are 65,232 projects hosted on PyPI and of
|
||||
those, 59 of them rely on external files that are safely hosted outside of PyPI
|
||||
and 931 of them rely on external files which are unsafely hosted outside of
|
||||
PyPI. This shows us that 1.5% of projects will be affected in some way by this
|
||||
change while 98.5% will continue to function as they always have. In addition,
|
||||
only 5% of the projects affected are using the features provided by PEP 438 to
|
||||
safely host outside of PyPI while 95% of them are exposing their users to
|
||||
Remote Code Execution via a Man In The Middle attack.
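The percentages above follow from the three counts quoted in the paragraph. A minimal check, using only those counts (the variable names are illustrative):

```python
total_projects = 65232   # projects hosted on PyPI
safe_external = 59       # rely on safely hosted external files
unsafe_external = 931    # rely on unsafely hosted external files

affected = safe_external + unsafe_external

# Share of all projects affected by the change.
affected_share = 100 * affected / total_projects

# Split of the affected projects into safe and unsafe external hosting.
safe_share = 100 * safe_external / affected
unsafe_share = 100 * unsafe_external / affected

print(f"affected: {affected} projects ({affected_share:.1f}% of all projects)")
print(f"safe: {safe_share:.1f}%  unsafe: {unsafe_share:.1f}%")
```

The affected share rounds to 1.5%. The safe and unsafe shares computed this way come out close to 6% and 94%, within about a point of the 5% and 95% quoted in the text.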
|
||||
|
||||
|
||||
Projects Which Rely on Externally Hosted files
|
||||
----------------------------------------------
|
||||
Data Sovereignty
|
||||
================
|
||||
|
||||
This is determined by crawling the simple index and looking for installable
|
||||
files using a similar detection method as pip and setuptools use. The "latest"
|
||||
version is determined using ``pkg_resources.parse_version`` sort order and it
|
||||
is used to show whether or not the latest version is hosted externally or only
|
||||
old versions are.
|
||||
In the discussions around previous versions of this PEP, one of the key use
|
||||
cases for wanting to host files externally to PyPI was due to data sovereignty
|
||||
requirements for people living in jurisdictions outside of the USA, where PyPI
|
||||
is currently hosted. The author of this PEP is not blind to these concerns and
|
||||
realizes that this PEP represents a regression for the people that have these
|
||||
concerns, however the current situation is presenting an extremely poor user
|
||||
experience and the feature is only being used by a small percentage of
|
||||
projects. In addition, the data sovereignty problem requires familiarity with
|
||||
the laws outside of the home jurisdiction of the author of this PEP, who is
|
||||
also the principal developer and operator of PyPI. For these reasons, a
|
||||
solution for the problem of data sovereignty has been deferred and is
|
||||
considered outside of the scope for this PEP.
|
||||
|
||||
============ ======= ================ =================== =======
|
||||
\ PyPI External (old) External (latest) Total
|
||||
============ ======= ================ =================== =======
|
||||
**Safe** 43313 16 39 43368
|
||||
**Unsafe** 0 756 1092 1848
|
||||
**Total** 43313 772 1131 45216
|
||||
============ ======= ================ =================== =======
|
||||
|
||||
|
||||
Top Externally Hosted Projects by Requests
|
||||
------------------------------------------
|
||||
|
||||
This is determined by looking at the number of requests the
|
||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
||||
requests during that day was 10,623,831.
|
||||
|
||||
============================== ========
|
||||
Project Requests
|
||||
============================== ========
|
||||
PIL 63869
|
||||
Pygame 2681
|
||||
mysql-connector-python 1562
|
||||
pyodbc 724
|
||||
elementtree 635
|
||||
salesforce-python-toolkit 316
|
||||
wxPython 295
|
||||
PyXML 251
|
||||
RBTools 235
|
||||
python-graph-core 123
|
||||
cElementTree 121
|
||||
============================== ========
|
||||
|
||||
|
||||
Top Externally Hosted Projects by Unique IPs
|
||||
--------------------------------------------
|
||||
|
||||
This is determined by looking at the IP addresses of requests the
|
||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
||||
unique IP addresses during that day was 124,604.
|
||||
|
||||
============================== ==========
|
||||
Project Unique IPs
|
||||
============================== ==========
|
||||
PIL 4553
|
||||
mysql-connector-python 462
|
||||
Pygame 202
|
||||
pyodbc 181
|
||||
elementtree 166
|
||||
wxPython 126
|
||||
RBTools 114
|
||||
PyXML 87
|
||||
salesforce-python-toolkit 76
|
||||
pyDes 76
|
||||
============================== ==========
|
||||
If someone for whom the issue of data sovereignty matters wishes to
|
||||
put forth the effort, then at that time a system can be designed, implemented,
|
||||
and ultimately deployed and operated that would satisfy both the needs of non
|
||||
US users that cannot upload their projects to a system on US soil and the
|
||||
quality of user experience that is attempted to be created on PyPI.
|
||||
|
||||
|
||||
Rejected Proposals
|
||||
==================
|
||||
|
||||
Allow easier discovery of externally hosted indexes
|
||||
---------------------------------------------------
|
||||
|
||||
A previous version of this PEP included a new feature added to both PyPI and
|
||||
installers that would allow project authors to enter into PyPI a list of
|
||||
URLs that would instruct installers to ignore any files uploaded to PyPI and
|
||||
instead return an error telling the end user about these extra URLs that they
|
||||
can add to their installer to make the installation work.
|
||||
|
||||
This idea is rejected because it provides a similar painful end user experience
|
||||
where people will first attempt to install something, get an error, then have
|
||||
to re-run the installation with the correct options.
|
||||
|
||||
|
||||
Keep the current classification system but adjust the options
|
||||
-------------------------------------------------------------
|
||||
|
||||
|
|