merge
commit
f880710236
276
pep-0470.txt
|
@ -1,36 +1,25 @@
|
||||||
PEP: 470
|
PEP: 470
|
||||||
Title: Using Multi Repository Support for External to PyPI Package File Hosting
|
Title: Removing External Hosting Support on PyPI
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Donald Stufft <donald@stufft.io>,
|
Author: Donald Stufft <donald@stufft.io>,
|
||||||
BDFL-Delegate: Richard Jones <richard@python.org>
|
BDFL-Delegate: TBD
|
||||||
Discussions-To: distutils-sig@python.org
|
Discussions-To: distutils-sig@python.org
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Process
|
Type: Process
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
Created: 12-May-2014
|
Created: 12-May-2014
|
||||||
Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014
|
Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015
|
||||||
Replaces: 438
|
Replaces: 438
|
||||||
|
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
This PEP proposes a mechanism for project authors to register with PyPI an
|
This PEP proposes the deprecation and removal of support for hosting files
|
||||||
external repository where their project's downloads can be located. This
|
externally to PyPI as well as the deprecation and removal of the functionality
|
||||||
information can then be included as part of the simple API so that installers
|
added by PEP 438, particularly rel information to classify different types of
|
||||||
can use it to tell users where the item they are attempting to install is
|
links and the meta-tag to indicate API version.
|
||||||
located and what they need to do to enable this additional repository. In
|
|
||||||
addition to adding discovery information to make explicit multiple repositories
|
|
||||||
easy to use, this PEP also deprecates and removes the implicit multiple
|
|
||||||
repository support which currently functions through directly or indirectly
|
|
||||||
linking off site via the simple API. Finally this PEP also proposes deprecating
|
|
||||||
and removing the functionality added by PEP 438, particularly the additional
|
|
||||||
rel information and the meta tag to indicate the API version.
|
|
||||||
|
|
||||||
This PEP *does* not propose mandating that all authors upload their projects to
|
|
||||||
PyPI in order to exist in the index nor does it propose any change to the human
|
|
||||||
facing elements of PyPI.
|
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
|
@ -65,14 +54,6 @@ PyPI works, and other projects works, but this one specific one does not. They
|
||||||
often times do not realize who they need to contact in order to get this fixed
|
often times do not realize who they need to contact in order to get this fixed
|
||||||
or what their remediation steps are.
|
or what their remediation steps are.
|
||||||
|
|
||||||
By moving to using explicit multiple repositories we can make the lines between
|
|
||||||
these two roles much more explicit and remove the "hidden" surprises caused by
|
|
||||||
the current implementation of handling people who do not want to use PyPI as a
|
|
||||||
repository. However simply moving to explicit multiple repositories is a
|
|
||||||
regression in discoverability, and for that reason this PEP adds an extension
|
|
||||||
to the current simple API which will enable easy discovery of the specific
|
|
||||||
repository that a project can be found in.
|
|
||||||
|
|
||||||
PEP 438 attempted to solve this issue by allowing projects to explicitly
|
PEP 438 attempted to solve this issue by allowing projects to explicitly
|
||||||
declare if they were using the repository features or not, and if they were
|
declare if they were using the repository features or not, and if they were
|
||||||
not, it had the installers classify the links it found as either "internal",
|
not, it had the installers classify the links it found as either "internal",
|
||||||
|
@ -85,16 +66,17 @@ repository features, an altogether good thing given the global CDN powering
|
||||||
PyPI providing speed ups for a lot of people, however it did so by introducing
|
PyPI providing speed ups for a lot of people, however it did so by introducing
|
||||||
a new point of confusion and pain for both the end users and the authors.
|
a new point of confusion and pain for both the end users and the authors.
|
||||||
|
|
||||||
|
By moving to using explicit multiple repositories we can make the lines between
|
||||||
|
these two roles much more explicit and remove the "hidden" surprises caused by
|
||||||
|
the current implementation of handling people who do not want to use PyPI as a
|
||||||
|
repository.
|
||||||
|
|
||||||
|
|
||||||
Key User Experience Expectations
|
Key User Experience Expectations
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
#. Easily allow external hosting to "just work" when appropriately configured
|
#. Easily allow external hosting to "just work" when appropriately configured
|
||||||
at the system, user or virtual environment level.
|
at the system, user or virtual environment level.
|
||||||
#. Easily allow package authors to tell PyPI "my releases are hosted <here>"
|
|
||||||
and have that advertised in such a way that tools can clearly communicate it
|
|
||||||
to users, without silently introducing unexpected dependencies on third
|
|
||||||
party services.
|
|
||||||
#. Eliminate any and all references to the confusing "verifiable external" and
|
#. Eliminate any and all references to the confusing "verifiable external" and
|
||||||
"unverifiable external" distinction from the user experience (both when
|
"unverifiable external" distinction from the user experience (both when
|
||||||
installing and when releasing packages).
|
installing and when releasing packages).
|
||||||
|
@ -122,7 +104,7 @@ tools almost universally using multiple repository support making it extremely
|
||||||
likely that someone is already familiar with the concept.
|
likely that someone is already familiar with the concept.
|
||||||
|
|
||||||
Additionally, the multiple repository approach is a concept that is useful
|
Additionally, the multiple repository approach is a concept that is useful
|
||||||
outside of the narrow scope of allowing projects which wish to be included on
|
outside of the narrow scope of allowing projects that wish to be included on
|
||||||
the index portion of PyPI but do not wish to utilize the repository portion of
|
the index portion of PyPI but do not wish to utilize the repository portion of
|
||||||
PyPI. This includes places where a company may wish to host a repository that
|
PyPI. This includes places where a company may wish to host a repository that
|
||||||
contains their internal packages or where a project may wish to have multiple
|
contains their internal packages or where a project may wish to have multiple
|
||||||
|
@ -215,64 +197,9 @@ repository. The exact specifics of how that is achieved is up to each
|
||||||
individual implementation.
|
individual implementation.
|
||||||
|
|
||||||
|
|
||||||
External Index Discovery
|
|
||||||
========================
|
|
||||||
|
|
||||||
One of the problems with using an additional index is one of discovery. Users
|
|
||||||
will not generally be aware that an additional index is required at all much
|
|
||||||
less where that index can be found. Projects can attempt to convey this
|
|
||||||
information using their description on the PyPI page however that excludes
|
|
||||||
people who discover their project organically through ``pip search``.
|
|
||||||
|
|
||||||
To support projects that wish to externally host their files and to enable
|
|
||||||
users to easily discover what additional indexes are required, PyPI will gain
|
|
||||||
the ability for projects to register external index URLs along with an
|
|
||||||
associated comment for each. These URLs will be made available on the simple
|
|
||||||
page however they will not be linked or provided in a form that older
|
|
||||||
installers will automatically search them.
|
|
||||||
|
|
||||||
This ability will take the form of a ``<meta>`` tag. The name of this tag must
|
|
||||||
be set to ``repository`` or ``find-link`` and the content will be a link to the
|
|
||||||
location of the repository. An optional data-description attribute will convey
|
|
||||||
any comments or description that the author has provided.
|
|
||||||
|
|
||||||
An example would look something like::
|
|
||||||
|
|
||||||
<meta name="repository" content="https://index.example.com/" data-description="Primary Repository">
|
|
||||||
<meta name="repository" content="https://index.example.com/Ubuntu-14.04/" data-description="Wheels built for Ubuntu 14.04">
|
|
||||||
<meta name="find-link" content="https://links.example.com/find-links/" data-description="A flat index for find links">
|
|
||||||
|
|
||||||
When an installer fetches the simple page for a project, if it finds this
|
|
||||||
additional meta-data then it should use this data to tell the user how to add
|
|
||||||
one or more of the additional URLs to search in. This message should include
|
|
||||||
any comments that the project has included to enable them to communicate to the
|
|
||||||
user and provide hints as to which URL they might want (e.g. if some are only
|
|
||||||
useful or compatible with certain platforms or situations). When the installer
|
|
||||||
has implemented the auto discovery mechanisms they should also deprecate any of
|
|
||||||
the mechanisms added for PEP 438 (such as ``--allow-external``) for removal at
|
|
||||||
the end of the deprecation period proposed by the PEP.
|
|
||||||
|
|
||||||
In addition to the API for programmatic access to the registered external
|
|
||||||
repositories, PyPI will also present these URLs in the UI so that users with
|
|
||||||
an installer that does not implement the discovery mechanism can still easily
|
|
||||||
discover what repository the project is using to host itself.
|
|
||||||
|
|
||||||
This feature **MUST** be added to PyPI and be contained in a released version
|
|
||||||
of pip prior to starting the deprecation and removal process for the implicit
|
|
||||||
offsite hosting functionality.
|
|
||||||
|
|
||||||
|
|
||||||
Deprecation and Removal of Link Spidering
|
Deprecation and Removal of Link Spidering
|
||||||
=========================================
|
=========================================
|
||||||
|
|
||||||
.. important:: The deprecation specified in this section **MUST** not start
|
|
||||||
until after the discovery mechanisms have been implemented and released in
|
|
||||||
pip.
|
|
||||||
|
|
||||||
The only exception to this is the addition of the ``pypi-only`` mode and
|
|
||||||
defaulting new projects to it without ability to switch to a different
|
|
||||||
mode.
|
|
||||||
|
|
||||||
A new hosting mode will be added to PyPI. This hosting mode will be called
|
A new hosting mode will be added to PyPI. This hosting mode will be called
|
||||||
``pypi-only`` and will be in addition to the three that PEP 438 has already
|
``pypi-only`` and will be in addition to the three that PEP 438 has already
|
||||||
given us which are ``pypi-explicit``, ``pypi-scrape``, ``pypi-scrape-crawl``.
|
given us which are ``pypi-explicit``, ``pypi-scrape``, ``pypi-scrape-crawl``.
|
||||||
|
@ -282,44 +209,34 @@ else.
|
||||||
|
|
||||||
Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all new
|
Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all new
|
||||||
projects will be defaulted to the PyPI only mode and they will be locked to
|
projects will be defaulted to the PyPI only mode and they will be locked to
|
||||||
this mode and unable to change this particular setting. ``pypi-only`` projects
|
this mode and unable to change this particular setting.
|
||||||
will still be able to register external index URLs as described above - the
|
|
||||||
"pypi-only" refers only to the download links that are published directly on
|
|
||||||
PyPI.
|
|
||||||
|
|
||||||
An email will then be sent out to all of the projects which are hosted only on
|
An email will then be sent out to all of the projects which are hosted only on
|
||||||
PyPI informing them that in one month their project will be automatically
|
PyPI informing them that in one month their project will be automatically
|
||||||
converted to the ``pypi-only`` mode. A month after these emails have been sent
|
converted to the ``pypi-only`` mode. A month after these emails have been sent
|
||||||
any of those projects which were emailed, which still are hosted only on PyPI
|
any of those projects which were emailed, which still are hosted only on PyPI
|
||||||
will have their mode set to ``pypi-only``.
|
will have their mode set permanently to ``pypi-only``.
|
||||||
|
|
||||||
After that switch, an email will be sent to projects which rely on hosting
|
At the same time, an email will be sent to projects which rely on hosting
|
||||||
external to PyPI. This email will warn these projects that externally hosted
|
external to PyPI. This email will warn these projects that externally hosted
|
||||||
files have been deprecated on PyPI and that in 6 months from the time of that
|
files have been deprecated on PyPI and that in 3 months from the time of that
|
||||||
email that all external links will be removed from the installer APIs. This
|
email that all external links will be removed from the installer APIs. This
|
||||||
email **MUST** include instructions for converting their projects to be hosted
|
email **MUST** include instructions for converting their projects to be hosted
|
||||||
on PyPI and **MUST** include links to a script or package that will enable them
|
on PyPI and **MUST** include links to a script or package that will enable them
|
||||||
to enter their PyPI credentials and package name and have it automatically
|
to enter their PyPI credentials and package name and have it automatically
|
||||||
download and re-host all of their files on PyPI. This email **MUST** also
|
download and re-host all of their files on PyPI. This email **MUST** also
|
||||||
include instructions for setting up their own index page and registering that
|
include instructions for setting up their own index page. This email must also contain a link to the Terms of Service for PyPI as many users may have signed
|
||||||
with PyPI, including the fact that they can use pythonhosted.org as a host for
|
up a long time ago and may not recall what those terms are. Finally this email
|
||||||
an index page without requiring them to host any additional infrastructure or
|
must also contain a list of the links registered with PyPI where we were able
|
||||||
purchase a TLS certificate. This email must also contain a link to the Terms of
|
to detect an installable file was located.
|
||||||
Service for PyPI as many users may have signed up a long time ago and may not
|
|
||||||
recall what those terms are. Finally this email must also contain a list of
|
|
||||||
the links registered with PyPI where we were able to detect an installable file
|
|
||||||
was located.
|
|
||||||
|
|
||||||
Five months after the initial email, another email must be sent to any projects
|
Two months after the initial email, another email must be sent to any projects
|
||||||
still relying on external hosting. This email will include all of the same
|
still relying on external hosting. This email will include all of the same
|
||||||
information that the first email contained, except that the removal date will
|
information that the first email contained, except that the removal date will
|
||||||
be one month away instead of six.
|
be one month away instead of three.
|
||||||
|
|
||||||
Finally a month later all projects will be switched to the ``pypi-only`` mode
|
Finally a month later all projects will be switched to the ``pypi-only`` mode
|
||||||
and PyPI will be modified to remove the externally linked files functionality,
|
and PyPI will be modified to remove the externally linked files functionality.
|
||||||
when switching these projects to the ``pypi-only`` mode we will move any links
|
|
||||||
which are able to be used for discovering other projects automatically to as
|
|
||||||
an external repository.
|
|
||||||
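The schedule on the new side of this hunk (an initial email, a reminder two months later, and removal one month after that) can be sketched as simple date arithmetic. This is only an illustration: the start date is a made-up placeholder, and `add_months` and `external_hosting_schedule` are hypothetical helpers, not part of PyPI or pip.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by whole months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def external_hosting_schedule(initial_email: date) -> dict:
    """Milestones for projects that still rely on external hosting.

    Offsets follow the new text of the PEP: a reminder two months after
    the initial email and removal of external links one month after that.
    """
    return {
        "initial_email": initial_email,
        "reminder_email": add_months(initial_email, 2),
        "links_removed": add_months(initial_email, 3),
    }


if __name__ == "__main__":
    # Placeholder start date, chosen only for the example.
    for name, when in external_hosting_schedule(date(2015, 11, 30)).items():
        print(f"{name}: {when.isoformat()}")
```

The old side of the hunk used the same structure with offsets of five and six months instead of two and three.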
|
|
||||||
|
|
||||||
Summary of Changes
|
Summary of Changes
|
||||||
|
@ -328,116 +245,85 @@ Summary of Changes
|
||||||
Repository side
|
Repository side
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
#. Implement simple API changes to allow the addition of an external
|
#. Deprecate and remove the hosting modes as defined by PEP 438.
|
||||||
|
#. Restrict simple API to only list the files that are contained within the
|
||||||
repository.
|
repository.
|
||||||
#. *(Optional, Mandatory on PyPI)* Deprecate and remove the hosting modes as
|
|
||||||
defined by PEP 438.
|
|
||||||
#. *(Optional, Mandatory on PyPI)* Restrict simple API to only list the files
|
|
||||||
that are contained within the repository and the external repository
|
|
||||||
metadata.
|
|
||||||
|
|
||||||
Client side
|
Client side
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
#. Implement multiple repository support.
|
#. Implement multiple repository support.
|
||||||
#. Implement some mechanism for removing/disabling the default repository.
|
#. Implement some mechanism for removing/disabling the default repository.
|
||||||
#. Implement the discovery mechanism.
|
#. Deprecate / Remove PEP 438
|
||||||
#. *(Optional)* Deprecate / Remove PEP 438
|
|
||||||
|
|
||||||
|
|
||||||
Impact
|
Impact
|
||||||
======
|
======
|
||||||
|
|
||||||
The large impact of this PEP will be that for users of older installation
|
To determine impact, we've looked at all projects using a method of searching
|
||||||
clients they will not get a discovery mechanism built into the install command.
|
PyPI which is similar to what pip and setuptools use and searched for all
|
||||||
This will require them to browse to the PyPI web UI and discover the repository
|
files available on PyPI, safely linked from PyPI, unsafely linked from PyPI,
|
||||||
there. Since any URLs required to install a project will be automatically
|
and finally unsafely available outside of PyPI. When the same file was found
|
||||||
migrated to the new format, the biggest change to users will be requiring a new
|
in multiple locations it was deduplicated and counted in only one location
|
||||||
option to install these projects.
|
based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI.
|
||||||
|
This gives us the broadest possible definition of impact, it means that any
|
||||||
|
single file for this project may no longer be visible by default, however that
|
||||||
|
file could be years old, or it could be a binary file while there is a sdist
|
||||||
|
available on PyPI. This means that the *real* impact will likely be much
|
||||||
|
smaller, but in an attempt not to miscount we take the broadest possible
|
||||||
|
definition.
|
||||||
|
|
||||||
Looking at the numbers the actual impact should be quite low, with it affecting
|
At the time of this writing there are 65,232 projects hosted on PyPI and of
|
||||||
just 3.8% of projects which host any files only externally or 2.2% which have
|
those, 59 of them rely on external files that are safely hosted outside of PyPI
|
||||||
their latest version hosted only externally.
|
and 931 of them rely on external files which are unsafely hosted outside of
|
||||||
|
PyPI. This shows us that 1.5% of projects will be affected in some way by this
|
||||||
6674 unique IP addresses have accessed the Simple API for these 3.8% of
|
change while 98.5% will continue to function as they always have. In addition,
|
||||||
projects in a single day (2014-09-30). Of those, 99.5% of them installed
|
only 5% of the projects affected are using the features provided by PEP 438 to
|
||||||
something which could not be verified, and thus they were open to a Remote Code
|
safely host outside of PyPI while 95% of them are exposing their users to
|
||||||
Execution via a Man-In-The-Middle attack, while 7.9% installed something which
|
Remote Code Execution via a Man In The Middle attack.
|
||||||
could be verified and only 0.4% only installed things which could be verified.
|
|
||||||
|
|
||||||
This means that 99.5% users of these features, both new and old, are doing
|
|
||||||
something unsafe, and for anything using an older copy of pip or using
|
|
||||||
setuptools at all they are silently unsafe.
|
|
||||||
|
|
||||||
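As a rough check on the percentages quoted on the new side of the Impact section, the arithmetic can be reproduced from the three counts given there (65,232 projects in total, 59 relying on safely hosted external files, 931 relying on unsafely hosted ones). The variable names are illustrative and do not come from the PEP.

```python
# Counts taken from the new text of the Impact section.
total_projects = 65232
safe_external = 59
unsafe_external = 931

# Projects affected in any way by removing external hosting.
affected = safe_external + unsafe_external

pct_affected = 100 * affected / total_projects
pct_unaffected = 100 - pct_affected

# Split of the affected projects into safe and unsafe external hosting.
pct_safe_of_affected = 100 * safe_external / affected
pct_unsafe_of_affected = 100 * unsafe_external / affected

print(f"affected:   {affected} projects ({pct_affected:.1f}%)")
print(f"unaffected: {pct_unaffected:.1f}%")
print(f"safe share of affected:   {pct_safe_of_affected:.1f}%")
print(f"unsafe share of affected: {pct_unsafe_of_affected:.1f}%")
```

Computed this way, the affected share comes to about 1.5%, matching the text. The safe/unsafe split works out to roughly 6% and 94%, which the text states as 5% and 95%.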
|
|
||||||
Projects Which Rely on Externally Hosted files
|
Data Sovereignty
|
||||||
----------------------------------------------
|
================
|
||||||
|
|
||||||
This is determined by crawling the simple index and looking for installable
|
In the discussions around previous versions of this PEP, one of the key use
|
||||||
files using a similar detection method as pip and setuptools use. The "latest"
|
cases for wanting to host files externally to PyPI was due to data sovereignty
|
||||||
version is determined using ``pkg_resources.parse_version`` sort order and it
|
requirements for people living in jurisdictions outside of the USA, where PyPI
|
||||||
is used to show whether or not the latest version is hosted externally or only
|
is currently hosted. The author of this PEP is not blind to these concerns and
|
||||||
old versions are.
|
realizes that this PEP represents a regression for the people that have these
|
||||||
|
concerns, however the current situation is presenting an extremely poor user
|
||||||
|
experience and the feature is only being used by a small percentage of
|
||||||
|
projects. In addition, the data sovereignty problems require familiarity with
|
||||||
|
the laws outside of the home jurisdiction of the author of this PEP, who is
|
||||||
|
also the principal developer and operator of PyPI. For these reasons, a
|
||||||
|
solution for the problem of data sovereignty has been deferred and is
|
||||||
|
considered outside of the scope for this PEP.
|
||||||
|
|
||||||
============ ======= ================ =================== =======
|
If someone for whom the issue of data sovereignty matters to them wishes to
|
||||||
\ PyPI External (old) External (latest) Total
|
put forth the effort, then at that time a system can be designed, implemented,
|
||||||
============ ======= ================ =================== =======
|
and ultimately deployed and operated that would satisfy both the needs of non
|
||||||
**Safe** 43313 16 39 43368
|
US users that cannot upload their projects to a system on US soil and the
|
||||||
**Unsafe** 0 756 1092 1848
|
quality of user experience that is attempted to be created on PyPI.
|
||||||
**Total** 43313 772 1131 45216
|
|
||||||
============ ======= ================ =================== =======
|
|
||||||
|
|
||||||
|
|
||||||
Top Externally Hosted Projects by Requests
|
|
||||||
------------------------------------------
|
|
||||||
|
|
||||||
This is determined by looking at the number of requests the
|
|
||||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
|
||||||
requests during that day was 10,623,831.
|
|
||||||
|
|
||||||
============================== ========
|
|
||||||
Project Requests
|
|
||||||
============================== ========
|
|
||||||
PIL 63869
|
|
||||||
Pygame 2681
|
|
||||||
mysql-connector-python 1562
|
|
||||||
pyodbc 724
|
|
||||||
elementtree 635
|
|
||||||
salesforce-python-toolkit 316
|
|
||||||
wxPython 295
|
|
||||||
PyXML 251
|
|
||||||
RBTools 235
|
|
||||||
python-graph-core 123
|
|
||||||
cElementTree 121
|
|
||||||
============================== ========
|
|
||||||
|
|
||||||
|
|
||||||
Top Externally Hosted Projects by Unique IPs
|
|
||||||
--------------------------------------------
|
|
||||||
|
|
||||||
This is determined by looking at the IP addresses of requests the
|
|
||||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
|
||||||
unique IP addresses during that day was 124,604.
|
|
||||||
|
|
||||||
============================== ==========
|
|
||||||
Project Unique IPs
|
|
||||||
============================== ==========
|
|
||||||
PIL 4553
|
|
||||||
mysql-connector-python 462
|
|
||||||
Pygame 202
|
|
||||||
pyodbc 181
|
|
||||||
elementtree 166
|
|
||||||
wxPython 126
|
|
||||||
RBTools 114
|
|
||||||
PyXML 87
|
|
||||||
salesforce-python-toolkit 76
|
|
||||||
pyDes 76
|
|
||||||
============================== ==========
|
|
||||||
|
|
||||||
|
|
||||||
Rejected Proposals
|
Rejected Proposals
|
||||||
==================
|
==================
|
||||||
|
|
||||||
|
Allow easier discovery of externally hosted indexes
|
||||||
|
---------------------------------------------------
|
||||||
|
|
||||||
|
A previous version of this PEP included a new feature added to both PyPI and
|
||||||
|
installers that would allow project authors to enter into PyPI a list of
|
||||||
|
URLs that would instruct installers to ignore any files uploaded to PyPI and
|
||||||
|
instead return an error telling the end user about these extra URLs that they
|
||||||
|
can add to their installer to make the installation work.
|
||||||
|
|
||||||
|
This idea is rejected because it provides a similar painful end user experience
|
||||||
|
where people will first attempt to install something, get an error, then have
|
||||||
|
to re-run the installation with the correct options.
|
||||||
|
|
||||||
|
|
||||||
Keep the current classification system but adjust the options
|
Keep the current classification system but adjust the options
|
||||||
-------------------------------------------------------------
|
-------------------------------------------------------------
|
||||||
|
|
||||||
|
|