diff --git a/pep-0470.txt b/pep-0470.txt
index 462b51df3..ae5dc1140 100644
--- a/pep-0470.txt
+++ b/pep-0470.txt
@@ -1,36 +1,25 @@
 PEP: 470
-Title: Using Multi Repository Support for External to PyPI Package File Hosting
+Title: Removing External Hosting Support on PyPI
 Version: $Revision$
 Last-Modified: $Date$
 Author: Donald Stufft ,
-BDFL-Delegate: Richard Jones
+BDFL-Delegate: TBD
 Discussions-To: distutils-sig@python.org
 Status: Draft
 Type: Process
 Content-Type: text/x-rst
 Created: 12-May-2014
-Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014
+Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015
 Replaces: 438
 
 
 Abstract
 ========
 
-This PEP proposes a mechanism for project authors to register with PyPI an
-external repository where their project's downloads can be located. This
-information can than be included as part of the simple API so that installers
-can use it to tell users where the item they are attempting to install is
-located and what they need to do to enable this additional repository. In
-addition to adding discovery information to make explicit multiple repositories
-easy to use, this PEP also deprecates and removes the implicit multiple
-repository support which currently functions through directly or indirectly
-linking off site via the simple API. Finally this PEP also proposes deprecating
-and removing the functionality added by PEP 438, particularly the additional
-rel information and the meta tag to indicate the API version.
-
-This PEP *does* not propose mandating that all authors upload their projects to
-PyPI in order to exist in the index nor does it propose any change to the human
-facing elements of PyPI.
+This PEP proposes the deprecation and removal of support for hosting files
+externally to PyPI as well as the deprecation and removal of the functionality
+added by PEP 438, particularly rel information to classify different types of
+links and the meta-tag to indicate API version.
 
 
 Rationale
@@ -65,14 +54,6 @@ PyPI works, and other projects works, but this one specific one does not.
 They often times do not realize who they need to contact in order to get this
 fixed or what their remediation steps are.
 
-By moving to using explicit multiple repositories we can make the lines between
-these two roles much more explicit and remove the "hidden" surprises caused by
-the current implementation of handling people who do not want to use PyPI as a
-repository. However simply moving to explicit multiple repositories is a
-regression in discoverability, and for that reason this PEP adds an extension
-to the current simple API which will enable easy discovery of the specific
-repository that a project can be found in.
-
 PEP 438 attempted to solve this issue by allowing projects to explicitly
 declare if they were using the repository features or not, and if they were
 not, it had the installers classify the links it found as either "internal",
@@ -85,16 +66,17 @@ repository features, an altogether good thing given the global CDN powering
 PyPI providing speed ups for a lot of people, however it did so by introducing
 a new point of confusion and pain for both the end users and the authors.
 
+By moving to using explicit multiple repositories we can make the lines between
+these two roles much more explicit and remove the "hidden" surprises caused by
+the current implementation of handling people who do not want to use PyPI as a
+repository.
+
 
 Key User Experience Expectations
 --------------------------------
 
 #. Easily allow external hosting to "just work" when appropriately configured
    at the system, user or virtual environment level.
-#. Easily allow package authors to tell PyPI "my releases are hosted "
-   and have that advertised in such a way that tools can clearly communicate it
-   to users, without silently introducing unexpected dependencies on third
-   party services.
 #. Eliminate any and all references to the confusing "verifiable external" and
    "unverifiable external" distinction from the user experience (both when
    installing and when releasing packages).
@@ -122,7 +104,7 @@ tools almost universally using multiple repository support making it extremely
 likely that someone is already familiar with the concept.
 
 Additionally, the multiple repository approach is a concept that is useful
-outside of the narrow scope of allowing projects which wish to be included on
+outside of the narrow scope of allowing projects that wish to be included on
 the index portion of PyPI but do not wish to utilize the repository portion of
 PyPI. This includes places where a company may wish to host a repository that
 contains their internal packages or where a project may wish to have multiple
@@ -215,64 +197,9 @@ repository. The exact specifics of how that is achieved is up to each
 individual implementation.
 
 
-External Index Discovery
-========================
-
-One of the problems with using an additional index is one of discovery. Users
-will not generally be aware that an additional index is required at all much
-less where that index can be found. Projects can attempt to convey this
-information using their description on the PyPI page however that excludes
-people who discover their project organically through ``pip search``.
-
-To support projects that wish to externally host their files and to enable
-users to easily discover what additional indexes are required, PyPI will gain
-the ability for projects to register external index URLs along with an
-associated comment for each. These URLs will be made available on the simple
-page however they will not be linked or provided in a form that older
-installers will automatically search them.
-
-This ability will take the form of a ```` tag. The name of this tag must
-be set to ``repository`` or ``find-link`` and the content will be a link to the
-location of the repository. An optional data-description attribute will convey
-any comments or description that the author has provided.
-
-An example would look something like::
-
-
-
-
-
-When an installer fetches the simple page for a project, if it finds this
-additional meta-data then it should use this data to tell the user how to add
-one or more of the additional URLs to search in. This message should include
-any comments that the project has included to enable them to communicate to the
-user and provide hints as to which URL they might want (e.g. if some are only
-useful or compatible with certain platforms or situations). When the installer
-has implemented the auto discovery mechanisms they should also deprecate any of
-the mechanisms added for PEP 438 (such as ``--allow-external``) for removal at
-the end of the deprecation period proposed by the PEP.
-
-In addition to the API for programtic access to the registered external
-repositories, PyPI will also prevent these URLs in the UI so that users with
-an installer that does not implement the discovery mechanism can still easily
-discover what repository the project is using to host itself.
-
-This feature **MUST** be added to PyPI and be contained in a released version
-of pip prior to starting the deprecation and removal process for the implicit
-offsite hosting functionality.
-
-
 Deprecation and Removal of Link Spidering
 =========================================
 
-.. important:: The deprecation specified in this section **MUST** not start to
-   until after the discovery mechanisms have been implemented and released in
-   pip.
-
-   The only exception to this is the addition of the ``pypi-only`` mode and
-   defaulting new projects to it without abilility to switch to a different
-   mode.
-
 A new hosting mode will be added to PyPI. This hosting mode will be called
 ``pypi-only`` and will be in addition to the three that PEP 438 has already
 given us which are ``pypi-explicit``, ``pypi-scrape``, ``pypi-scrape-crawl``.
@@ -282,44 +209,34 @@ else.
 
 Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all new
 projects will be defaulted to the PyPI only mode and they will be locked to
-this mode and unable to change this particular setting. ``pypi-only`` projects
-will still be able to register external index URLs as described above - the
-"pypi-only" refers only to the download links that are published directly on
-PyPI.
+this mode and unable to change this particular setting.
 
 An email will then be sent out to all of the projects which are hosted only on
 PyPI informing them that in one month their project will be automatically
 converted to the ``pypi-only`` mode. A month after these emails have been sent
 any of those projects which were emailed, which still are hosted only on PyPI
-will have their mode set to ``pypi-only``.
+will have their mode set permanently to ``pypi-only``.
 
-After that switch, an email will be sent to projects which rely on hosting
+At the same time, an email will be sent to projects which rely on hosting
 external to PyPI. This email will warn these projects that externally hosted
-files have been deprecated on PyPI and that in 6 months from the time of that
+files have been deprecated on PyPI and that in 3 months from the time of that
 email that all external links will be removed from the installer APIs. This
 email **MUST** include instructions for converting their projects to be hosted
 on PyPI and **MUST** include links to a script or package that will enable them
 to enter their PyPI credentials and package name and have it automatically
 download and re-host all of their files on PyPI. This email **MUST** also
-include instructions for setting up their own index page and registering that
-with PyPI, including the fact that they can use pythonhosted.org as a host for
-an index page without requiring them to host any additional infrastructure or
-purchase a TLS certificate. This email must also contain a link to the Terms of
-Service for PyPI as many users may have signed up a long time ago and may not
-recall what those terms are. Finally this email must also contain a list of
-the links registered with PyPI where we were able to detect an installable file
-was located.
+include instructions for setting up their own index page. This email must also contain a link to the Terms of Service for PyPI as many users may have signed
+up a long time ago and may not recall what those terms are. Finally this email
+must also contain a list of the links registered with PyPI where we were able
+to detect an installable file was located.
 
-Five months after the initial email, another email must be sent to any projects
+Two months after the initial email, another email must be sent to any projects
 still relying on external hosting. This email will include all of the same
 information that the first email contained, except that the removal date will
-be one month away instead of six.
+be one month away instead of three.
 
 Finally a month later all projects will be switched to the ``pypi-only`` mode
-and PyPI will be modified to remove the externally linked files functionality,
-when switching these projects to the ``pypi-only`` mode we will move any links
-which are able to be used for discovering other projects automatically to as
-an external repository.
+and PyPI will be modified to remove the externally linked files functionality.
 
 
 Summary of Changes
@@ -328,116 +245,85 @@ Summary of Changes
 Repository side
 ---------------
 
-#. Implement simple API changes to allow the addition of an external
+#. Deprecate and remove the hosting modes as defined by PEP 438.
+#. Restrict simple API to only list the files that are contained within the
    repository.
-#. *(Optional, Mandatory on PyPI)* Deprecate and remove the hosting modes as
-   defined by PEP 438.
-#. *(Optional, Mandatory on PyPI)* Restrict simple API to only list the files
-   that are contained within the repository and the external repository
-   metadata.
+
 
 
 Client side
 -----------
 
 #. Implement multiple repository support.
 #. Implement some mechanism for removing/disabling the default repository.
-#. Implement the discovery mechanism.
-#. *(Optional)* Deprecate / Remove PEP 438
+#. Deprecate / Remove PEP 438
 
 
 Impact
 ======
 
-The large impact of this PEP will be that for users of older installation
-clients they will not get a discovery mechanism built into the install command.
-This will require them to browse to the PyPI web UI and discover the repository
-there. Since any URLs required to instal a project will be automatically
-migrated to the new format, the biggest change to users will be requiring a new
-option to install these projects.
+To determine impact, we've looked at all projects using a method of searching
+PyPI which is similar to what pip and setuptools use and searched for all
+files available on PyPI, safely linked from PyPI, unsafely linked from PyPI,
+and finally unsafely available outside of PyPI. When the same file was found
+in multiple locations it was deduplicated and counted in only one location
+based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI.
+This gives us the broadest possible definition of impact: it means that any
+single file for a project may no longer be visible by default. However, that
+file could be years old, or it could be a binary file while there is an sdist
+available on PyPI. This means that the *real* impact will likely be much
+smaller, but in an attempt not to miscount we take the broadest possible
+definition.
 
-Looking at the numbers the actual impact should be quite low, with it affecting
-just 3.8% of projects which host any files only externally or 2.2% which have
-their latest version hosted only externally.
-
-6674 unique IP addresses have accessed the Simple API for these 3.8% of
-projects in a single day (2014-09-30). Of those, 99.5% of them installed
-something which could not be verified, and thus they were open to a Remote Code
-Execution via a Man-In-The-Middle attack, while 7.9% installed something which
-could be verified and only 0.4% only installed things which could be verified.
-
-This means that 99.5% users of these features, both new and old, are doing
-something unsafe, and for anything using an older copy of pip or using
-setuptools at all they are silently unsafe.
+At the time of this writing there are 65,232 projects hosted on PyPI and of
+those, 59 of them rely on external files that are safely hosted outside of PyPI
+and 931 of them rely on external files which are unsafely hosted outside of
+PyPI. This shows us that 1.5% of projects will be affected in some way by this
+change while 98.5% will continue to function as they always have. In addition,
+only 6% of the projects affected are using the features provided by PEP 438 to
+safely host outside of PyPI while 94% of them are exposing their users to
+Remote Code Execution via a Man In The Middle attack.
 
-Projects Which Rely on Externally Hosted files
-----------------------------------------------
+Data Sovereignty
+================
 
-This is determined by crawling the simple index and looking for installable
-files using a similar detection method as pip and setuptools use. The "latest"
-version is determined using ``pkg_resources.parse_version`` sort order and it
-is used to show whether or not the latest version is hosted externally or only
-old versions are.
+In the discussions around previous versions of this PEP, one of the key use
+cases for wanting to host files externally to PyPI was due to data sovereignty
+requirements for people living in jurisdictions outside of the USA, where PyPI
+is currently hosted. The author of this PEP is not blind to these concerns and
+realizes that this PEP represents a regression for the people that have these
+concerns; however, the current situation is presenting an extremely poor user
+experience and the feature is only being used by a small percentage of
+projects. In addition, the data sovereignty problem requires familiarity with
+the laws outside of the home jurisdiction of the author of this PEP, who is
+also the principal developer and operator of PyPI. For these reasons, a
+solution for the problem of data sovereignty has been deferred and is
+considered outside of the scope for this PEP.
 
-============ ======= ================ =================== =======
-\            PyPI    External (old)   External (latest)   Total
-============ ======= ================ =================== =======
- **Safe**    43313   16               39                  43368
- **Unsafe**  0       756              1092                1848
- **Total**   43313   772              1131                45216
-============ ======= ================ =================== =======
-
-
-Top Externally Hosted Projects by Requests
-------------------------------------------
-
-This is determined by looking at the number of requests the
-``/simple//`` page had gotten in a single day. The total number of
-requests during that day was 10,623,831.
-
-============================== ========
-Project                        Requests
-============================== ========
-PIL                            63869
-Pygame                         2681
-mysql-connector-python         1562
-pyodbc                         724
-elementtree                    635
-salesforce-python-toolkit      316
-wxPython                       295
-PyXML                          251
-RBTools                        235
-python-graph-core              123
-cElementTree                   121
-============================== ========
-
-
-Top Externally Hosted Projects by Unique IPs
---------------------------------------------
-
-This is determined by looking at the IP addresses of requests the
-``/simple//`` page had gotten in a single day. The total number of
-unique IP addresses during that day was 124,604.
- -============================== ========== -Project Unique IPs -============================== ========== -PIL 4553 -mysql-connector-python 462 -Pygame 202 -pyodbc 181 -elementtree 166 -wxPython 126 -RBTools 114 -PyXML 87 -salesforce-python-toolkit 76 -pyDes 76 -============================== ========== +If someone for whom the issue of data sovereignty matters to them wishes to +put forth the effort, then at that time a system can be designed, implemented, +and ultimately deployed and operated that would satisfy both the needs of non +US users that cannot upload their projects to a system on US soil and the +quality of user experience that is attempted to be created on PyPI. Rejected Proposals ================== +Allow easier discovery of externally hosted indexes +--------------------------------------------------- + +A previous version of this PEP included a new feature added to both PyPI and +installers that would allow project authors to enter into PyPI a list of +URLs that would instruct installers to ignore any files uploaded to PyPI and +instead return an error telling the end user about these extra URLs that they +can add to their installer to make the installation work. + +This idea is rejected because it provides a similar painful end user experience +where people will first attempt to install something, get an error, then have +to re-run the installation with the correct options. + + Keep the current classification system but adjust the options -------------------------------------------------------------