Create a new draft of PEP 470
* Clarifies the text throughout
* Address comments from the previous threads
parent
ca057e1e8b
commit
f23ac16974
537
pep-0470.txt
|
@ -1,5 +1,5 @@
|
|||
PEP: 470
|
||||
Title: Using Multi Index Support for External to PyPI Package File Hosting
|
||||
Title: Using Multi Repository Support for External to PyPI Package File Hosting
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Donald Stufft <donald@stufft.io>,
|
||||
|
@ -9,192 +9,212 @@ Status: Draft
|
|||
Type: Process
|
||||
Content-Type: text/x-rst
|
||||
Created: 12-May-2014
|
||||
Post-History: 14-May-2014, 05-Jun-2014
|
||||
Post-History: 14-May-2014, 05-Jun-2014, 03-Oct-2014
|
||||
Replaces: 438
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes that the official means of having an installer locate and
|
||||
find package files which are hosted externally to PyPI become the use of
|
||||
multi index support instead of the practice of using external links on the
|
||||
simple installer API.
|
||||
This PEP proposes a mechanism for project authors to register with PyPI an
|
||||
external repository where their project's downloads can be located. This
|
||||
information can then be included as part of the simple API so that installers
|
||||
can use it to tell users where the item they are attempting to install is
|
||||
located and what they need to do to enable this additional repository. In
|
||||
addition to adding discovery information to make explicit multiple repositories
|
||||
easy to use, this PEP also deprecates and removes the implicit multiple
|
||||
repository support which currently functions through directly or indirectly
|
||||
linking offsite via the simple API. Finally this PEP also proposes deprecating
|
||||
and removing the functionality added by PEP 438, particularly the additional
|
||||
rel information and the meta tag to indicate the API version.
|
||||
|
||||
It is important to remember that this is **not** about forcing anyone to host
|
||||
their files on PyPI. If someone does not wish to do so they will never be under
|
||||
any obligation to. They can still list their project in PyPI as an index, and
|
||||
the tooling will still allow them to host it elsewhere.
|
||||
|
||||
This PEP is strictly concerned with the Simple Installer API and how automated
|
||||
installers interact with PyPI; it has no bearing on the informational pages
|
||||
which are primarily for human consumption.
|
||||
This PEP does *not* propose mandating that all authors upload their projects to
|
||||
PyPI in order to exist in the index nor does it propose any change to the human
|
||||
facing elements of PyPI.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
There is a long history documented in PEP 438 that explains why externally
|
||||
hosted files exist today in the state that they do on PyPI. For the sake of
|
||||
brevity I will not duplicate that and instead urge readers to first take a look
|
||||
at PEP 438 for background.
|
||||
Historically PyPI did not have any method of hosting files nor any method of
|
||||
automatically retrieving installables; it was instead focused on providing a
|
||||
central registry of names, to prevent naming collisions, and as a means of
|
||||
discovery for finding projects to use. In the course of time setuptools began
|
||||
to scrape these human facing pages, as well as pages linked from those pages,
|
||||
looking for things it could automatically download and install. Eventually this
|
||||
became the "Simple" API, which used a similar URL structure; however, it
|
||||
eliminated any of the extraneous links and information to make the API more
|
||||
efficient. Additionally PyPI grew the ability for a project to upload release
|
||||
files directly to PyPI enabling PyPI to act as a repository in addition to an
|
||||
index.
|
||||
|
||||
There are currently two primary ways for a project to make itself available
|
||||
without directly hosting the package files on PyPI. They can either include
|
||||
links to the package files in the simple installer API or they can publish
|
||||
a custom package index which contains their project.
|
||||
This gives PyPI two equally important roles that it plays in the Python
|
||||
ecosystem, that of index to enable easy discovery of Python projects and
|
||||
central repository to enable easy hosting, download, and installation of Python
|
||||
projects. Due to the history behind PyPI and the very organic growth it has
|
||||
experienced the lines between these two roles are blurry, and this blurriness
|
||||
has caused confusion for the end users of both of these roles and this has in
|
||||
turn caused ire between people attempting to use PyPI in different capacities,
|
||||
most often when end users want to use PyPI as a repository but the author wants
|
||||
to use PyPI solely as an index.
|
||||
|
||||
By moving to using explicit multiple repositories we can make the lines between
|
||||
these two roles much more explicit and remove the "hidden" surprises caused
|
||||
by the current implementation of handling people who do not want to use PyPI
|
||||
as a repository. However simply moving to explicit multiple repositories is
|
||||
a regression in discoverability, and for that reason this PEP adds an extension
|
||||
to the current simple API which will enable easy discovery of the specific
|
||||
repository that a project can be found in.
|
||||
|
||||
PEP 438 attempted to solve this issue by allowing projects to explicitly
|
||||
declare if they were using the repository features or not, and if they were
|
||||
not, it had the installers classify the links it found as either "internal",
|
||||
"verifiable external" or "unverifiable external". PEP 438 was accepted and
|
||||
implemented in pip 1.4 (released on Jul 23, 2013) with the final transition
|
||||
implemented in pip 1.5 (released on Jan 2, 2014).
|
||||
|
||||
PEP 438 was successful in bringing about more people to utilize PyPI's
|
||||
repository features, an altogether good thing given the global CDN powering
|
||||
PyPI providing speed ups for a lot of people, however it did so by introducing
|
||||
a new point of confusion and pain for both the end users and the authors.
|
||||
|
||||
|
||||
Custom Additional Index
|
||||
-----------------------
|
||||
Why Additional Repositories?
|
||||
----------------------------
|
||||
|
||||
Each installer which speaks to PyPI offers a mechanism for the user invoking
|
||||
that installer to provide additional custom locations to search for files
|
||||
during the dependency resolution phase. For pip these locations can be
|
||||
configured per invocation, per shell environment, per requirements file, per
|
||||
virtual environment, and per user. The mechanisms for specifying additional
|
||||
locations have existed within pip and setuptools for many years; by comparison
|
||||
the mechanisms in PEP 438 and any other new mechanism will have existed for
|
||||
only a short period of time (if they exist at all currently).
|
||||
The two common installer tools, pip and easy_install/setuptools, both support
|
||||
the concept of additional locations to search for files to satisfy the
|
||||
installation requirements and have done so for many years. This means that
|
||||
there is no need to "phase" in a new flag or concept and the solution to
|
||||
installing a project from a repository other than PyPI will function regardless
|
||||
of how old (within reason) the end user's installer is. Not only has this
|
||||
concept existed in the Python tooling for some time, but it is a concept that
|
||||
exists across languages and even extending to the OS level with OS package
|
||||
tools almost universally using multiple repository support making it extremely
|
||||
likely that someone is already familiar with the concept.
|
||||
|
||||
The use of additional indexes instead of external links on the simple
|
||||
installer API provides a simple clean interface which is consistent with the
|
||||
way most Linux package systems work (apt-get, yum, etc). More importantly it
|
||||
works the same even for projects which are commercial or otherwise have their
|
||||
access restricted in some form (private networks, password, IP ACLs etc)
|
||||
while the external links method only realistically works for projects which
|
||||
do not have their access restricted.
|
||||
Additionally, the multiple repository approach is a concept that is useful
|
||||
outside of the narrow scope of allowing projects which wish to be included on
|
||||
the index portion of PyPI but do not wish to utilize the repository portion
|
||||
of PyPI. This includes places where a company may wish to host a repository
|
||||
that contains their internal packages or where a project may wish to have
|
||||
multiple "channels" of releases, such as alpha, beta, release candidate, and
|
||||
final release.
|
||||
|
||||
Compared to the complex rules which a project must be aware of to prevent
|
||||
themselves from being considered unsafely hosted setting up an index is fairly
|
||||
trivial and in the simplest case does not require anything more than a
|
||||
filesystem and a standard web server such as Nginx or Twisted Web. Even if
|
||||
using simple static hosting without autoindexing support, it is still
|
||||
straightforward to generate appropriate index pages as static HTML.
|
||||
|
||||
Example Index with Twisted Web
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
1. Create a root directory for your index, for the purposes of the example
|
||||
I'll assume you've chosen ``/var/www/index.example.com/``.
|
||||
2. Inside of this root directory, create a directory for each project such
|
||||
as ``mkdir -p /var/www/index.example.com/{foo,bar,other}/``.
|
||||
3. Place the package files for each project in their respective folder,
|
||||
creating paths like ``/var/www/index.example.com/foo/foo-1.0.tar.gz``.
|
||||
4. Configure Twisted Web to serve the root directory, ideally with TLS.
|
||||
Setting up an external repository is very simple, it can be achieved with
|
||||
nothing more than a filesystem, some files to host, and any web server capable
|
||||
of serving files and generating an automated index of directories (commonly
|
||||
called "autoindex"). This can be as simple as:
|
||||
|
||||
::
|
||||
|
||||
$ mkdir -p /var/www/index.example.com/
|
||||
$ mkdir -p /var/www/index.example.com/myproject/
|
||||
$ mv ~/myproject-1.0.tar.gz /var/www/index.example.com/myproject/
|
||||
$ twistd -n web --path /var/www/index.example.com/
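For hosts without autoindex support, the index pages can be generated ahead of time as static HTML. The following is a minimal sketch, assuming the directory layout from the commands above and a web server that serves ``index.html`` for directory requests; the function name ``write_static_index`` is hypothetical and not part of any existing tool.

```python
import html
from pathlib import Path
from urllib.parse import quote


def _page(links):
    """Render a bare HTML page containing one anchor per (href, text) pair."""
    body = "\n".join(
        f'<a href="{quote(href)}">{html.escape(text)}</a><br>' for href, text in links
    )
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"


def write_static_index(root):
    """Write index.html into `root` and into each project directory below it.

    Returns the sorted list of project directory names that were indexed.
    """
    root = Path(root)
    projects = sorted(p.name for p in root.iterdir() if p.is_dir())

    # Root page: one link per project directory.
    (root / "index.html").write_text(_page((name + "/", name) for name in projects))

    # Project pages: one link per package file, skipping the index itself.
    for name in projects:
        files = sorted(
            f.name
            for f in (root / name).iterdir()
            if f.is_file() and f.name != "index.html"
        )
        (root / name / "index.html").write_text(_page((f, f) for f in files))
    return projects
```

Calling ``write_static_index("/var/www/index.example.com/")`` after adding or removing a file would regenerate every page; the result can then be uploaded to any static host.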
|
||||
|
||||
|
||||
Examples of Additional indexes with pip
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Invocation:**
|
||||
Using this additional location within pip is also simple and can be included
|
||||
on a per invocation, per shell, or per user basis. pip 6.0 will also
|
||||
include the ability to configure this on a per virtual environment or per
|
||||
machine basis as well. This can be as simple as:
|
||||
|
||||
::
|
||||
|
||||
$ pip install --extra-index-url https://pypi.example.com/ foobar
|
||||
|
||||
**Shell Environment:**
|
||||
|
||||
::
|
||||
|
||||
$ export PIP_EXTRA_INDEX_URL=https://pypi.example.com/
|
||||
$ pip install foobar
|
||||
|
||||
**Requirements File:**
|
||||
|
||||
::
|
||||
|
||||
$ printf '%s\n' "--extra-index-url https://pypi.example.com/" "foobar" > requirements.txt
|
||||
$ pip install -r requirements.txt
|
||||
|
||||
**Virtual Environment:**
|
||||
|
||||
::
|
||||
|
||||
$ python -m venv myvenv
|
||||
$ printf '[global]\nextra-index-url = https://pypi.example.com/\n' > myvenv/pip.conf
|
||||
$ myvenv/bin/pip install foobar
|
||||
|
||||
**User:**
|
||||
|
||||
::
|
||||
|
||||
$ printf '[global]\nextra-index-url = https://pypi.example.com/\n' > ~/.pip/pip.conf
|
||||
$ pip install foobar
|
||||
$ # As a CLI argument
|
||||
$ pip install --extra-index-url https://index.example.com/ myproject
|
||||
$ # As an environment variable
|
||||
$ PIP_EXTRA_INDEX_URL=https://pypi.example.com/ pip install myproject
|
||||
$ # With a configuration file
|
||||
$ printf '[global]\nextra-index-url = https://pypi.example.com/\n' > ~/.pip/pip.conf
|
||||
$ pip install myproject
|
||||
|
||||
|
||||
External Links on the Simple Installer API
|
||||
------------------------------------------
|
||||
Why Not PEP 438 or Similar?
|
||||
---------------------------
|
||||
|
||||
PEP 438 proposed a system of classifying file links as either internal,
|
||||
external, or unsafe. It recommended that by default only internal links would
|
||||
be installed by an installer however users could opt into external links on
|
||||
either a global or a per package basis. Additionally they could also opt into
|
||||
unsafe links on a per package basis.
|
||||
While the additional search location support has existed in pip and setuptools
|
||||
for quite some time support for PEP 438 has only existed in pip since the 1.4
|
||||
version, and still has yet to be implemented in setuptools. The design of
|
||||
PEP 438 did mean that users still benefited for projects which did not require
|
||||
external files even with older installers, however for projects which *did*
|
||||
require external files, users are still silently being given either
|
||||
potentially unreliable or, even worse, unsafe files to download. This system
|
||||
is also unique to Python as it arises out of the history of PyPI; this means
|
||||
that it is almost certain that this concept will be foreign to most, if not all
|
||||
users, until they encounter it while attempting to use the Python toolchain.
|
||||
|
||||
This system has turned out to be *extremely* unfriendly towards the end users
|
||||
and it is the position of this PEP that the situation has become untenable. The
|
||||
situation as provided by PEP 438 requires an end user to be aware not only of
|
||||
the difference between internal, external, and unsafe, but also to be aware of
|
||||
what hosting mode the package they are trying to install is in, what links are
|
||||
available on that project's /simple/ page, whether or not those links have
|
||||
a properly formatted hash fragment, and what links are available from pages
|
||||
linked to from that project's /simple/ page.
|
||||
Additionally, the classification system proposed by PEP 438 has, in practice,
|
||||
turned out to be extremely confusing to end users, so much so that it is a
|
||||
position of this PEP that the situation as it stands is completely untenable.
|
||||
The common pattern for a user with this system is to attempt to install a
|
||||
project possibly get an error message (or maybe not if the project ever
|
||||
uploaded something to PyPI but later switched without removing old files), see
|
||||
that the error message suggests ``--allow-external``, they reissue the command
|
||||
adding that flag most likely getting another error message, see that this time
|
||||
the error message suggests also adding ``--allow-unverified``, and again issue
|
||||
the command a third time, this time finally getting the thing they wish to
|
||||
install.
|
||||
|
||||
There are a number of common confusion/pain points with this system that I
|
||||
have witnessed:
|
||||
This UX failure exists for several reasons.
|
||||
|
||||
* Users unaware what the simple installer api is at all or how an installer
|
||||
locates installable files.
|
||||
* Users unaware that even if the simple api links to a file, if it does
|
||||
not include a ``#md5=...`` fragment that it will be counted as unsafe.
|
||||
* Users unaware that an installer can look at pages linked from the
|
||||
simple api to determine additional links, or that any links found in this
|
||||
fashion are considered unsafe.
|
||||
* Users are unaware and often surprised that PyPI supports hosting your files
|
||||
someplace other than PyPI at all.
|
||||
1. If pip can locate files at all for a project on the Simple API it will
|
||||
simply use that instead of attempting to locate more. This is generally the
|
||||
right thing to do as attempting to locate more would erase a large part of
|
||||
the benefit of PEP 438. This means that if a project *ever* uploaded
|
||||
a file that matches what the user has requested for install that will be
|
||||
used regardless of how old it is.
|
||||
|
||||
In addition to that, the information that an installer is able to provide
|
||||
when an installation fails is pretty minimal. We are able to detect if there
|
||||
are externally hosted files directly linked from the simple installer api,
|
||||
however we cannot detect if there are files hosted on a linked page without
|
||||
fetching that page and doing so would cause a massive performance hit just to
|
||||
see if there might be a file there so that a better error message could be
|
||||
provided.
|
||||
2. PEP 438 makes an implicit assumption that most projects would either upload
|
||||
themselves to PyPI or would update themselves to directly linking to release
|
||||
files. While a large number of projects *did* ultimately decide to upload
|
||||
to PyPI, some of them did so only because the UX around PEP 438 was so
|
||||
bad that they felt forced to do so. More concerning however, is the fact
|
||||
that very few projects have opted to directly and safely link to files and
|
||||
instead they still simply link to pages which must be scraped in order to
|
||||
find the actual files, thus rendering the safe variant
|
||||
(``--allow-external``) largely useless.
|
||||
|
||||
Finally very few projects have properly linked to their external files so that
|
||||
they can be safely downloaded and verified. At the time of this writing there
|
||||
are a total of 65 projects which have files that are only available externally
|
||||
and are safely hosted.
|
||||
3. Even if an author wishes to directly link to their files, doing so safely is
|
||||
non-obvious. It requires the inclusion of an MD5 hash (for historical
|
||||
reasons) in the fragment of the URL. If they do not include this then their
|
||||
files will be considered "unverified".
|
||||
|
||||
The end result of all of this, is that with PEP 438, when a user attempts to
|
||||
install a file that is not hosted on PyPI typically the steps they follow are:
|
||||
4. PEP 438 takes a security centric view and disallows any form of a global
|
||||
opt in for unverified projects. While this is generally a good thing, it
|
||||
creates extremely verbose and repetitive command invocations such as:
|
||||
|
||||
1. First, they attempt to install it normally, using ``pip install foobar``.
|
||||
This fails because the file is not hosted on PyPI and PEP 438 has us default
|
||||
to only hosted on PyPI. If pip detected any externally hosted files or other
|
||||
pages that we *could* have attempted to find other files at it will give an
|
||||
error message suggesting that they try ``--allow-external foobar``.
|
||||
2. They then attempt to install their package using
|
||||
``pip install --allow-external foobar foobar``. If they are lucky foobar is
|
||||
one of the packages which is hosted externally and safely and this will
|
||||
succeed. If they are unlucky they will get a different error message
|
||||
suggesting that they *also* try ``--allow-unverified foobar``.
|
||||
3. They then attempt to install their package using
|
||||
``pip install --allow-external foobar --allow-unverified foobar foobar``
|
||||
and this finally works.
|
||||
::
|
||||
|
||||
These are the same basic steps that practically everyone goes through every time
|
||||
they try to install something that is not hosted on PyPI. If they are lucky it'll
|
||||
only take them two steps, but typically it requires three steps. Worse there is
|
||||
no real indication to these people why one package might install after two
|
||||
but most require three. Even worse than that most of them will never get an
|
||||
externally hosted package that does not take three steps, so they will be
|
||||
increasingly annoyed and frustrated at the intermediate step and will likely
|
||||
eventually just start skipping it.
|
||||
$ pip install --allow-external myproject --allow-unverified myproject myproject
|
||||
$ pip install --allow-all-external --allow-unverified myproject myproject
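Item 3 above notes that linking to a file safely requires an MD5 digest in the URL fragment. As a rough illustration of what an author would have to produce by hand, the sketch below builds such a link; the helper name ``link_with_md5_fragment`` and the example URL are hypothetical.

```python
import hashlib


def link_with_md5_fragment(url, content):
    """Return `url` with a '#md5=<hexdigest>' fragment computed from `content`.

    `content` is the raw bytes of the release file the URL points to.
    """
    digest = hashlib.md5(content).hexdigest()
    return f"{url}#md5={digest}"


if __name__ == "__main__":
    # Empty bytes stand in for the real file contents in this demonstration.
    print(link_with_md5_fragment("https://example.com/myproject-1.0.tar.gz", b""))
```

A link without this fragment, or a file found only by scraping a linked page, falls into the "unverified" class and needs ``--allow-unverified``.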
|
||||
|
||||
|
||||
Multiple Repository/Index Support
|
||||
=================================
|
||||
|
||||
Installers SHOULD implement, or continue to offer, the ability to point the
|
||||
installer at multiple URL locations. The exact mechanism for a user to
|
||||
indicate they wish to use an additional location is left up to each individual
|
||||
implementation.
|
||||
|
||||
Additionally the mechanism for discovering an installation candidate when multiple
|
||||
repositories are being used is also up to each individual implementation,
|
||||
however once configured an implementation should not discourage, warn, or
|
||||
otherwise cast a negative light upon the use of a repository simply because it
|
||||
is not the default repository.
|
||||
|
||||
Currently both pip and setuptools implement multiple repository support by
|
||||
using the best installation candidate it can find from either repository,
|
||||
essentially treating it as if it were one large repository.
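The "one large repository" behaviour described above can be sketched as follows. This is a deliberately simplified illustration, not how pip or setuptools are implemented: it assumes the file listings have already been fetched, understands only ``<project>-<version>.tar.gz`` names with dotted numeric versions, and all function names are hypothetical.

```python
import re


def parse_version(filename, project):
    """Return a comparable tuple for '<project>-<version>.tar.gz', else None."""
    pattern = re.escape(project) + r"-(\d+(?:\.\d+)*)\.tar\.gz"
    match = re.fullmatch(pattern, filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def best_candidate(project, repositories):
    """Pick the highest version of `project` offered by any repository.

    `repositories` maps a repository URL to the list of file names it offers.
    Returns (version, repository_url, filename), or None if nothing matches.
    """
    candidates = []
    for url, filenames in repositories.items():
        for filename in filenames:
            version = parse_version(filename, project)
            if version is not None:
                candidates.append((version, url, filename))
    if not candidates:
        return None
    # All repositories are pooled; only the version decides the winner.
    return max(candidates, key=lambda candidate: candidate[0])
```

Because the candidates are pooled before comparison, the default repository gets no special preference over an additional one.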
|
||||
|
||||
Installers SHOULD also implement some mechanism for removing or otherwise
|
||||
disabling use of the default repository. The exact specifics of how that is
|
||||
achieved is up to each individual implementation.
|
||||
|
||||
End users wishing to limit what files they pull from which repository can
|
||||
simply use `devpi <http://doc.devpi.net/latest/>`_ to whitelist projects from
|
||||
PyPI or another repository.
|
||||
|
||||
|
||||
External Index Discovery
|
||||
|
@ -208,24 +228,44 @@ people who discover their project organically through ``pip search``.
|
|||
|
||||
To support projects that wish to externally host their files and to enable
|
||||
users to easily discover what additional indexes are required, PyPI will gain
|
||||
the ability for projects to register external index URLs and additionally an
|
||||
the ability for projects to register external index URLs along with an
|
||||
associated comment for each. These URLs will be made available on the simple
|
||||
page however they will not be linked or provided in a form that older
|
||||
installers will automatically search them.
|
||||
|
||||
This ability will take the form of a ``<meta>`` tag. The name of this tag must
|
||||
be set to ``external-repository`` and the content will be a link to the location
|
||||
of the external repository. An optional data-description attribute will convey
|
||||
any comments or description that the author has provided.
|
||||
|
||||
An example would look something like:
|
||||
|
||||
::
|
||||
|
||||
<meta name="external-repository" content="https://index.example.com/" data-description="Primary Repository">
|
||||
<meta name="external-repository" content="https://index.example.com/Ubuntu-14.04/" data-description="Wheels built for Ubuntu 14.04">
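An installer could read these tags with nothing more than the standard library. The sketch below is one possible approach, assuming the tag name and attributes described above; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ExternalRepositoryParser(HTMLParser):
    """Collect (url, description) pairs from external-repository meta tags."""

    def __init__(self):
        super().__init__()
        self.repositories = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") != "external-repository":
            return
        url = attributes.get("content")
        if url:
            # data-description is optional, so it may be None here.
            self.repositories.append((url, attributes.get("data-description")))


def find_external_repositories(page):
    """Return the external repositories advertised on a simple API page."""
    parser = ExternalRepositoryParser()
    parser.feed(page)
    parser.close()
    return parser.repositories
```

Because the URLs live in ``<meta>`` tags rather than ``<a>`` tags, an installer that only follows links will not see them.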
|
||||
|
||||
|
||||
When an external repository is added to a project, new uploads will no longer
|
||||
be permitted to that project. However any existing files will simply be hidden
|
||||
from the simple API and the web interface until all of the external repositories
|
||||
are removed, in which case they will be visible again. PyPI MUST warn authors
|
||||
if adding an external repository will hide files and that warning must persist
|
||||
on any of the project management pages for that particular project.
|
||||
|
||||
When an installer fetches the simple page for a project, if it finds this
|
||||
additional meta-data and it cannot find any files for that project in its
|
||||
configured URLs then it should use this data to tell the user how to add one
|
||||
or more of the additional URLs to search in. This message should include any
|
||||
comments that the project has included to enable them to communicate to the
|
||||
user and provide hints as to which URL they might want if some are only
|
||||
useful or compatible with certain platforms or situations. When the installer
|
||||
user and provide hints as to which URL they might want (e.g. if some are only
|
||||
useful or compatible with certain platforms or situations). When the installer
|
||||
has implemented the auto discovery mechanisms they should also deprecate any
|
||||
of the mechanisms added for PEP 438 (such as ``--allow-external``) for removal
|
||||
at the end of the deprecation period proposed by the PEP.
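The kind of message an installer might show can be sketched as below. The wording of the message and the function name ``format_repository_hint`` are assumptions made for illustration; only the ``--extra-index-url`` option is an existing pip flag.

```python
def format_repository_hint(project, repositories):
    """Build a hint telling the user which extra repositories to enable.

    `repositories` is a list of (url, description) pairs, where the
    description may be None when the author supplied no comment.
    """
    lines = [f"No files were found for {project} in the configured repositories."]
    if not repositories:
        return lines[0]
    lines.append("The project lists these external repositories:")
    for url, description in repositories:
        note = f" ({description})" if description else ""
        lines.append(f"  {url}{note}")
        lines.append(f"    pip install --extra-index-url {url} {project}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        format_repository_hint(
            "myproject",
            [("https://index.example.com/", "Primary Repository")],
        )
    )
```

Including the author's description next to each URL lets users choose between repositories that target different platforms.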
|
||||
|
||||
This feature *must* be added to PyPI prior to starting the deprecation and
|
||||
removal process for link spidering.
|
||||
removal process for the implicit offsite hosting functionality.
|
||||
|
||||
|
||||
Deprecation and Removal of Link Spidering
|
||||
|
@ -278,20 +318,6 @@ deprecated PEP 438 functionality such as ``--allow-external`` and
|
|||
``--allow-unverified`` in pip.
|
||||
|
||||
|
||||
PIL
|
||||
---
|
||||
|
||||
It's obvious from the numbers below that the vast bulk of the impact comes from
|
||||
the PIL project. On 2014-05-17 an email was sent to the contact for PIL
|
||||
inquiring whether or not they would be willing to upload to PyPI. A response
|
||||
has not been received as of yet (2014-06-05) nor has any change in the hosting
|
||||
happened. Due to the popularity of PIL this PEP also proposes that during the
|
||||
deprecation period PyPI Administrators will set the PIL download URL as
|
||||
the external index for that project, allowing the users of PIL to take
|
||||
advantage of the auto discovery mechanisms although the project has seemingly
|
||||
become unmaintained.
|
||||
|
||||
|
||||
Impact
|
||||
======
|
||||
|
||||
|
@ -300,12 +326,16 @@ no longer maintaining the project, for one reason or another. For these
|
|||
projects it's unlikely that a maintainer will arrive to set the external index
|
||||
metadata which would allow the auto discovery mechanism to find it.
|
||||
|
||||
Looking at the numbers factoring out PIL (which has been special cased above)
|
||||
the actual impact should be quite low, with it affecting just 6.9% of projects
|
||||
which host only externally or 2.8% which have their latest version hosted
|
||||
externally. This represents a mere 3883 unique IP addresses. The break down of
|
||||
this is that of those 3883 addresses, 100% of them installed something that
|
||||
could not be verified while only 3% installed something which could be.
|
||||
Looking at the numbers factoring out PIL (which has been special cased below)
|
||||
the actual impact should be quite low, with it affecting just 3.8% of projects
|
||||
which host any files only externally or 2.2% which have their latest version
|
||||
hosted only externally.
|
||||
|
||||
6674 unique IP addresses have accessed the Simple API for these 3.8% of
|
||||
projects in a single day (2014-09-30). Of those, 99.5% of them installed
|
||||
something which could not be verified, and thus they were open to a Remote Code
|
||||
Execution via a Man-In-The-Middle attack, while 7.9% installed something which
|
||||
could be verified and just 0.4% installed only things which could be verified.
|
||||
|
||||
|
||||
Projects Which Rely on Externally Hosted files
|
||||
|
@ -320,9 +350,9 @@ old versions are.
|
|||
============ ======= ================ =================== =======
|
||||
\ PyPI External (old) External (latest) Total
|
||||
============ ======= ================ =================== =======
|
||||
**Safe** 38716 31 35 38782
|
||||
**Unsafe** 0 1659 1169 2828
|
||||
**Total** 38716 1690 1204 41610
|
||||
**Safe** 43313 16 39 43368
|
||||
**Unsafe** 0 756 1092 1848
|
||||
**Total** 43313 772 1131 45216
|
||||
============ ======= ================ =================== =======
|
||||
|
||||
|
||||
|
@ -331,21 +361,22 @@ Top Externally Hosted Projects by Requests
|
|||
|
||||
This is determined by looking at the number of requests the
|
||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
||||
requests during that day was 17,960,467.
|
||||
requests during that day was 10,623,831.
|
||||
|
||||
============================== ========
|
||||
Project Requests
|
||||
============================== ========
|
||||
PIL 13470
|
||||
mysql-connector-python 321
|
||||
salesforce-python-toolkit 54
|
||||
pyodbc 50
|
||||
elementtree 44
|
||||
atfork 39
|
||||
RBTools 29
|
||||
django-contrib-requestprovider 28
|
||||
wadofstuff-django-serializers 23
|
||||
Pygame 21
|
||||
PIL 63869
|
||||
Pygame 2681
|
||||
mysql-connector-python 1562
|
||||
pyodbc 724
|
||||
elementtree 635
|
||||
salesforce-python-toolkit 316
|
||||
wxPython 295
|
||||
PyXML 251
|
||||
RBTools 235
|
||||
python-graph-core 123
|
||||
cElementTree 121
|
||||
============================== ========
|
||||
|
||||
|
||||
|
@ -354,25 +385,38 @@ Top Externally Hosted Projects by Unique IPs
|
|||
|
||||
This is determined by looking at the IP addresses of requests the
|
||||
``/simple/<project>/`` page had gotten in a single day. The total number of
|
||||
unique IP addresses during that day was 105,587.
|
||||
unique IP addresses during that day was 124,604.
|
||||
|
||||
============================== ==========
|
||||
Project Unique IPs
|
||||
============================== ==========
|
||||
PIL 3515
|
||||
mysql-connector-python 117
|
||||
pyodbc 34
|
||||
elementtree 21
|
||||
RBTools 19
|
||||
egenix-mx-base 16
|
||||
Pygame 14
|
||||
salesforce-python-toolkit 13
|
||||
django-contrib-requestprovider 12
|
||||
wxPython 11
|
||||
python-apt 10
|
||||
PIL 4553
|
||||
mysql-connector-python 462
|
||||
Pygame 202
|
||||
pyodbc 181
|
||||
elementtree 166
|
||||
wxPython 126
|
||||
RBTools 114
|
||||
PyXML 87
|
||||
salesforce-python-toolkit 76
|
||||
pyDes 76
|
||||
============================== ==========
|
||||
|
||||
|
||||
PIL
|
||||
---
|
||||
|
||||
It's obvious from the numbers above that the vast bulk of the impact comes from
|
||||
the PIL project. On 2014-05-17 an email was sent to the contact for PIL
|
||||
inquiring whether or not they would be willing to upload to PyPI. A response
|
||||
has not been received as of yet (2014-10-03) nor has any change in the hosting
|
||||
happened. Due to the popularity of PIL this PEP also proposes that during the
|
||||
deprecation period PyPI Administrators will set the PIL download URL as
|
||||
the external index for that project, allowing the users of PIL to take
|
||||
advantage of the auto discovery mechanisms although the project has seemingly
|
||||
become unmaintained.
|
||||
|
||||
|
||||
Rejected Proposals
|
||||
==================
|
||||
|
||||
|
@ -395,80 +439,33 @@ This includes:
|
|||
|
||||
These proposals are rejected because:
|
||||
|
||||
* The classification "system" is complex, hard to explain, and requires an
|
||||
intimate knowledge of how the simple API works in order to be able to reason
|
||||
about which classification is required. This is reflected in the fact that
|
||||
the code to implement it is complicated and hard to understand as well.
|
||||
* The classification system introduced in PEP 438 is an entirely unique concept
|
||||
to PyPI which is not generically applicable even in the context of Python
|
||||
packaging. Adding additional concepts comes at a cost.
|
||||
|
||||
* People are generally surprised that PyPI allows externally linking to files
|
||||
and doesn't require people to host on PyPI. In contrast most of them are
|
||||
familiar with the concept of multiple software repositories such as is in
|
||||
use by many OSs.
|
||||
* The classification system itself is non-obvious to explain, and
  pre-determining what classification of link a project will require entails
  inspecting the project's ``/simple/<project>/`` page, and possibly any
  URLs linked from that page.
|
||||
|
||||
* PyPI is fronted by a globally distributed CDN which has improved the
  reliability and speed for end users. It is unlikely that any particular
  external host has something comparable. This can lead to extremely bad
  performance for end users when the external host is located in different
  parts of the world or does not generally have good connectivity.
|
||||
* The ability to host externally while still being linked for automatic
  discovery is mostly a historic relic which causes a fair amount of pain and
  complexity for little reward.
|
||||
|
||||
  As a data point, many users reported sub-DSL speeds and latency when
  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
|
||||
* The installer's ability to optimize or clean up the user interface is limited
  due to the nature of the implicit link scraping which would need to be done.
  This extends to the ``--allow-*`` options as well as the inability to
  determine if a link is expected to fail or not.
|
||||
|
||||
* PyPI has monitoring and an on-call rotation of sysadmins who can respond to
  downtime quickly. Again it is unlikely that any particular external host
  will have this. This can lead to single packages in a dependency chain
  being un-installable. This will often confuse users, who often have no idea
  that this package relies on an external host, and they cannot figure out
  why PyPI appears to be up but the installer cannot find a package.
|
||||
|
||||
* PyPI supports mirroring, both for private organizations and public mirrors.
  The legal terms of uploading to PyPI ensure that mirror operators, both
  public and private, have the right to distribute the software found on PyPI.
  However software that is hosted externally does not have this, causing
  private organizations to need to investigate each package individually and
  manually to determine if the license allows them to mirror it.
|
||||
|
||||
  For public mirrors this essentially means that these externally hosted
  packages *cannot* be reasonably mirrored. This is particularly troublesome
  in countries such as China where the bandwidth to outside of China is
  highly congested, making a mirror within China often a massively better
  experience.
|
||||
|
||||
* Installers have no method to determine if they should expect any particular
  URL to be available or not. It is not unusual for the simple API to reference
  old packages and URLs which have long since stopped working. This causes
  installers to have to assume that it is OK for any particular URL to not be
  accessible. This causes problems where a URL is temporarily down or
  otherwise unavailable (a common cause of this is using a copy of Python
  linked against a really ancient copy of OpenSSL which is unable to verify
  the SSL certificate on PyPI) but it *should* be expected to be up. In this
  case installers will typically silently ignore this URL and later the user
  will get a confusing error stating that the installer couldn't find any
  versions instead of getting the real error message indicating that the URL
  was unavailable.
|
||||
|
||||
* In the long run, global opt in flags like ``--allow-all-external`` will
  become little annoyances that developers cargo cult around in order to make
  their installer work. When they run into a project that requires it they
  will most likely simply add it to their configuration file for that installer
  and continue on with whatever they were actually trying to do. This will
  continue until they try to install their requirements on another computer
  or attempt to deploy to a server where their install will fail again until
  they add the "make it work" flag in their configuration file.
|
||||
|
||||
* The URL classification only works for a certain subset of projects; it
  does not allow for any project which needs additional restrictions such
  as Access Controls. This means that there would be two methods of doing the
  same thing, linking to a file safely and hosting an index. Hosting an index
  works in all situations and by relying on this we make for a more consistent
  experience no matter the reason for external hosting.
|
||||
|
||||
* The safe external hosting option hampers the ability of PyPI to upgrade its
  security infrastructure. For instance if MD5 becomes broken in the future
  there will be no way for PyPI to upgrade the hashes of the projects which
  rely on safe external hosting via MD5 while files that are hosted on PyPI
  can simply be processed over with a new hash function.
|
||||
* The mechanism paints with a very broad brush when enabling an option, while
  PEP 438 attempts to limit this with per package options. However a project
  that has existed for an extended period of time may often have several
  different URLs listed in their simple index. It is not unusual for at least
  one of these to no longer be under control of the project. While an
  unregistered domain will sit there relatively harmless most of the time, pip
  will continue to attempt to install from it on every discovery phase. This
  means that an attacker simply needs to look at projects which rely on unsafe
  external URLs and register expired domains to attack users.
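
As a rough illustration of the link-scraping concern above, the following
sketch lists every host other than the index itself that a simple page links
to. It is not taken from pip or any other installer: the helper
``external_hosts``, the sample page, and the ``example.org`` /
``example.net`` host names are all hypothetical, and a real simple page may
be structured differently. Each host it reports is one that an installer
following unsafe external links could end up contacting during discovery.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_hosts(page_url, html):
    """Return the sorted hosts linked from ``html`` that differ from page_url's host.

    Hypothetical helper for illustration only.
    """
    collector = _LinkCollector()
    collector.feed(html)
    index_host = urlsplit(page_url).hostname
    hosts = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlsplit(urljoin(page_url, href)).hostname
        if host and host != index_host:
            hosts.add(host)
    return sorted(hosts)


# Made-up page: one file on the index itself, two links leading elsewhere.
SAMPLE_PAGE = """
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a href="http://downloads.example.org/example-0.9.tar.gz">example-0.9.tar.gz</a>
<a href="http://old-homepage.example.net/" rel="homepage">home page</a>
"""

print(external_hosts("https://pypi.python.org/simple/example/", SAMPLE_PAGE))
```

If the registration of either reported domain lapsed, whoever registered it
next would be serving content that such an installer would still look at.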
|
||||
|
||||
Copyright
=========
|
||||
|
|