PEP: 381
Title: Mirroring infrastructure for PyPI
Author: Tarek Ziadé <tarek@ziade.org>, Martin von Löwis <martin@v.loewis.de>
Status: Withdrawn
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 21-Mar-2009
Post-History:

Abstract
========

This PEP describes a mirroring infrastructure for PyPI.

PEP Withdrawal
==============

The main PyPI web service was moved behind the Fastly caching CDN in May 2013:
https://mail.python.org/pipermail/distutils-sig/2013-May/020848.html

Subsequently, this arrangement was formalised as an in-kind sponsorship with
the PSF, and the PSF has also taken on the task of risk management in the event
that the sponsorship arrangement were ever to cease.

The download statistics that were previously provided directly on PyPI are now
published indirectly via Google BigQuery:
https://packaging.python.org/guides/analyzing-pypi-package-downloads/

Accordingly, the mirroring proposal described in this PEP is no longer required,
and has been marked as Withdrawn.

Rationale
=========

PyPI hosts over 6000 projects and is used on a daily basis
by people to build applications. Tools like ``easy_install``
and ``zc.buildout`` in particular make intensive use of PyPI.

For people making intensive use of PyPI, it can act as a single point
of failure. People have started to set up mirrors, both private
and public. Those mirrors are active mirrors, which means that they
browse PyPI to stay synchronized.

In order to make the system more reliable, this PEP describes:

- the mirror listing and registering at PyPI
- the pages a public mirror should maintain. These pages will be used
  by PyPI, in order to get hit counts and the last modified date.
- how a mirror should synchronize with PyPI
- how a client can implement a fail-over mechanism

Mirror listing and registering
==============================

People who want to mirror PyPI make a proposal on catalog-SIG.
When a mirror is proposed on the mailing list, it is manually
added to a mirror list in the PyPI application after it
has been checked for compliance with the mirroring rules.

The mirror list is provided as a list of host names of the
form

    X.pypi.python.org

The values of X are the sequence a,b,c,...,aa,ab,...
a.pypi.python.org is the master server; the mirrors start
with b. A CNAME record last.pypi.python.org points to the
last host name. Mirror operators should use a static address,
and report planned changes to that address in advance to
distutils-sig.

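The naming scheme above can be sketched in a few lines of Python. This is
only an illustration: it assumes that the sequence continues past ``z`` as
``aa``, ``ab``, ... (which the ellipsis in the text suggests), and the
function names ``mirror_label``, ``label_index`` and ``mirror_hosts`` are
hypothetical, not part of any PyPI client.

```python
import string

LETTERS = string.ascii_lowercase


def mirror_label(index):
    """Return the label for a 0-based position: 0 -> 'a', 25 -> 'z', 26 -> 'aa'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    label = ""
    n = index + 1  # the sequence has no "zero" letter, so count from 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = LETTERS[rem] + label
    return label


def label_index(label):
    """Inverse of mirror_label: 'a' -> 0, 'h' -> 7, 'aa' -> 26."""
    if not label or any(ch not in LETTERS for ch in label):
        raise ValueError("label must consist of lowercase ASCII letters")
    n = 0
    for ch in label:
        n = n * 26 + (LETTERS.index(ch) + 1)
    return n - 1


def mirror_hosts(last_host, domain="pypi.python.org"):
    """List the mirror host names from 'b' up to and including last_host.

    'a' is skipped because it names the master server, not a mirror.
    """
    last_label = last_host.split(".", 1)[0]
    last = label_index(last_label)
    return ["%s.%s" % (mirror_label(i), domain) for i in range(1, last + 1)]
```

Given the host name that ``last.pypi.python.org`` resolves to, a client
could call ``mirror_hosts('h.pypi.python.org')`` to obtain the seven names
``b`` through ``h``.
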
The new mirror also appears at ``http://pypi.python.org/mirrors``
which is a human-readable page that gives the list of mirrors.
This page also explains how to register a new mirror.

Statistics page
:::::::::::::::

PyPI provides statistics on downloads at ``/stats``. This page is
calculated daily by PyPI, by reading all mirrors' local stats and
summing them.

The stats are presented in daily or monthly files, under ``/stats/days``
and ``/stats/months``. Each file is a ``bzip2`` file named as follows:

- YYYY-MM-DD.bz2 for daily files
- YYYY-MM.bz2 for monthly files

Examples:

- /stats/days/2008-11-06.bz2
- /stats/days/2008-11-07.bz2
- /stats/days/2008-11-08.bz2
- /stats/months/2008-11.bz2
- /stats/months/2008-10.bz2

Mirror Authenticity
===================

With a distributed mirroring system, clients may want to verify that
the mirrored copies are authentic. There are multiple threats to
consider:

1. the central index may get compromised
2. the central index is assumed to be trusted, but the mirrors might
   be tampered with.
3. a man in the middle between the central index and the end user,
   or between a mirror and the end user might tamper with datagrams.

This specification only deals with the second threat. Some provisions
are made to detect man-in-the-middle attacks. To detect the first
attack, package authors need to sign their packages using PGP keys, so
that users can verify that the package comes from the author they trust.

The central index provides a DSA key at the URL /serverkey, in the PEM
format as generated by "openssl dsa -pubout" (i.e. :rfc:`3280`
SubjectPublicKeyInfo, with the algorithm 1.3.14.3.2.12). This URL must
*not* be mirrored, and clients must fetch the official serverkey from
PyPI directly, or use the copy that came with the PyPI client software.
Mirrors should still download the key, to detect a key rollover.

For each package, a mirrored signature is provided at
/serversig/<package>. This is the DSA signature of the parallel URL
/simple/<package>, in DER form, using SHA-1 with DSA (i.e. as a
:rfc:`3279` Dsa-Sig-Value, created by algorithm 1.2.840.10040.4.3).

Clients using a mirror need to perform the following steps to verify
a package:

1. download the /simple page, and compute its SHA-1 hash
2. download the corresponding /serversig
3. verify, using the server key, that the /serversig value is a valid
   DSA signature of the hash computed in step 1 (a client only holds the
   public key, so it checks the signature rather than recomputing it)
4. compute and verify (against the /simple page) the MD5 hashes
   of all files they download from the mirror.

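Step 4 above can be sketched with the standard library alone. The sketch
assumes that each link on the /simple page carries the expected digest as
an ``#md5=<hex>`` URL fragment; this PEP does not spell out that layout, so
treat it as an assumption. The names ``expected_md5s`` and ``file_matches``
are hypothetical. The DSA signature check is deliberately left out, since
it needs a cryptography library that is not part of the standard library.

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def expected_md5s(simple_page_html):
    """Map each linked file name to the MD5 digest given in its URL fragment."""
    collector = _LinkCollector()
    collector.feed(simple_page_html)
    digests = {}
    for href in collector.hrefs:
        parts = urlsplit(href)
        # Assumed layout: .../zc.buildout-1.6.0.tgz#md5=<32 hex digits>
        if parts.fragment.startswith("md5="):
            filename = posixpath.basename(parts.path)
            digests[filename] = parts.fragment[len("md5="):].lower()
    return digests


def file_matches(filename, data, digests):
    """Return True only if data hashes to the digest listed for filename."""
    expected = digests.get(filename)
    if expected is None:
        return False  # a file the /simple page does not list is not trusted
    return hashlib.md5(data).hexdigest() == expected
```

A client would call ``expected_md5s`` on the /simple page only after its
signature has been verified, then call ``file_matches`` on every file
downloaded from the mirror.
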
An implementation of the verification algorithm is available from
https://svn.python.org/packages/trunk/pypi/tools/verify.py

Verification is not needed when downloading from the central index, and
should be avoided to reduce the computation overhead.

About once a year, the key will be replaced with a new one. Mirrors
will have to re-fetch all /serversig pages. Clients using mirrors need
to find a trusted copy of the new server key. One way to obtain one
is to download it from https://pypi.python.org/serverkey. To detect
man-in-the-middle attacks, clients need to verify the SSL server
certificate, which will be signed by the CAcert authority.

Special pages a mirror needs to provide
=======================================

A mirror is a subset copy of PyPI, so it provides the same structure
by copying it.

- simple: REST version of the package index
- packages: packages, stored by Python version and first letter
- serversig: signatures for the simple pages

It also needs to provide two specific elements:

- last-modified
- local-stats

Last modified date
::::::::::::::::::

CPAN uses a freshness date system where the mirror's last
synchronisation date is made available.

For PyPI, each mirror needs to maintain a URL with simple text content
that represents the last synchronisation date the mirror maintains.

The date is provided in GMT, using the ISO 8601 format [#iso8601]_.
Each mirror will be responsible for maintaining its last modified date.

This page must be located at ``/last-modified`` and must be a
text/plain page.

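As a rough illustration, a mirror could produce the ``/last-modified``
body, and a client could judge a mirror's freshness, along the following
lines. The exact ISO 8601 layout (``YYYY-MM-DDTHH:MM:SS``) and the one-day
staleness threshold are assumptions made for this sketch; the text above
only requires ISO 8601 in GMT. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of the page body; the PEP only says "ISO 8601, GMT".
LAST_MODIFIED_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_last_modified(moment):
    """Render a timezone-aware datetime as the /last-modified page body."""
    return moment.astimezone(timezone.utc).strftime(LAST_MODIFIED_FORMAT)


def parse_last_modified(text):
    """Parse the page body back into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(text.strip(), LAST_MODIFIED_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(text, now, max_age=timedelta(days=1)):
    """Return True if the mirror last synchronized more than max_age ago."""
    return now - parse_last_modified(text) > max_age
```

A client could fetch ``/last-modified`` from each candidate mirror and skip
those for which ``is_stale`` returns true.
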
Local statistics
::::::::::::::::

Each mirror is responsible for counting all the downloads that were done
via it. This is used by PyPI to sum up all downloads, to be able to
display the grand total.

These statistics are in CSV-like form, with a header in the first
line. The file needs to obey :pep:`305`. Basically, it should be
readable by Python's ``csv`` module.

The fields in this file are:

- package: the distutils id of the package.
- filename: the filename that has been downloaded.
- useragent: the User-Agent of the client that has downloaded the
  package.
- count: the number of downloads.

The content will look like this::

    # package,filename,useragent,count
    zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
    ...

The counting starts the day the mirror is launched, and there is one
file per day, compressed using the ``bzip2`` format. Each file is named
after the day. For example, ``2008-11-06.bz2`` is the file for the 6th of
November 2008.

They are then provided in a folder called ``days``. For example:

- /local-stats/days/2008-11-06.bz2
- /local-stats/days/2008-11-07.bz2
- /local-stats/days/2008-11-08.bz2

This page must be located at ``/local-stats``.

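The file format described above maps directly onto Python's ``bz2`` and
``csv`` modules. The following sketch shows one way a mirror might write a
daily file and how the per-mirror files might be summed. UTF-8 encoding and
the helper names ``write_local_stats``, ``read_local_stats`` and
``total_downloads`` are assumptions made for this illustration.

```python
import bz2
import csv
import io
from collections import Counter

FIELDS = ["package", "filename", "useragent", "count"]


def write_local_stats(rows):
    """Serialize (package, filename, useragent, count) rows to bzip2 CSV bytes."""
    buffer = io.StringIO()
    # The header is written by hand because of its leading "# " marker.
    buffer.write("# " + ",".join(FIELDS) + "\n")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return bz2.compress(buffer.getvalue().encode("utf-8"))


def read_local_stats(payload):
    """Yield (package, filename, useragent, count) tuples from one daily file."""
    text = bz2.decompress(payload).decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header line
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        package, filename, useragent, count = row
        yield package, filename, useragent, int(count)


def total_downloads(payloads):
    """Sum the counts per (package, filename) across several mirrors' files."""
    totals = Counter()
    for payload in payloads:
        for package, filename, _useragent, count in read_local_stats(payload):
            totals[(package, filename)] += count
    return totals
```

Passing the same day's file from every mirror to ``total_downloads`` gives
the kind of grand total that the central statistics page is meant to show.
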
How a mirror should synchronize with PyPI
=========================================

A mirroring protocol called ``Simple Index`` was described and
implemented by Martin v. Loewis and Jim Fulton, based on how
``easy_install`` works. This section summarizes it and gives a few
relevant links, plus a small part about ``User-Agent``.

The mirroring protocol
::::::::::::::::::::::

Mirrors must reduce the amount of data transferred between the central
server and the mirror. To achieve that, they MUST use the changelog()
PyPI XML-RPC call, and only refetch the packages that have been
changed since the last time. For each package P, they MUST copy
documents /simple/P/ and /serversig/P. If a package is deleted on the
central server, they MUST delete the package and all associated files.
To detect modification of package files, they MAY cache the file's
ETag, and MAY request skipping it using the If-None-Match header.

Each mirroring tool MUST identify itself using a descriptive User-agent
header.

The pep381client package [#pep381client]_ provides an application that
respects this protocol to browse PyPI.

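The bookkeeping side of this protocol can be sketched without any network
access. The sketch assumes that each changelog entry is a sequence whose
first item is the package name (for example ``(name, version, timestamp,
action)``); that shape, the helper names, and the ``example-mirror/0.1``
user agent string used below are illustrative assumptions, not something
this PEP defines.

```python
def packages_to_sync(changelog_entries):
    """Return the sorted, de-duplicated package names found in the changelog.

    Each entry is assumed to start with the package name.
    """
    return sorted({entry[0] for entry in changelog_entries})


def documents_for(package):
    """Return the two documents a mirror must copy for a package."""
    return ["/simple/%s/" % package, "/serversig/%s" % package]


def conditional_headers(user_agent, cached_etag=None):
    """Build request headers that identify the tool and reuse a cached ETag."""
    headers = {"User-Agent": user_agent}
    if cached_etag:
        # The server can then answer 304 Not Modified instead of resending.
        headers["If-None-Match"] = cached_etag
    return headers
```

In a real mirroring tool the entries would come from the ``changelog()``
XML-RPC call, for instance through ``xmlrpc.client.ServerProxy`` from the
standard library, and the headers would be attached to each file request.
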
User-agent request header
:::::::::::::::::::::::::

In order to be able to differentiate actions taken by clients on
PyPI, a specific user agent name should be provided by all mirroring
software.

This is also true for all clients like:

- zc.buildout [#zc.buildout]_.
- setuptools [#setuptools]_.
- pip [#pip]_.

XXX user agent registering mechanism at PyPI ?

How a client can use PyPI and its mirrors
:::::::::::::::::::::::::::::::::::::::::

Clients that are browsing PyPI should be able to use alternative
mirrors, by getting the list of the mirrors using ``last.pypi.python.org``.

Code example::

    >>> import socket
    >>> socket.gethostbyname_ex('last.pypi.python.org')[0]
    'h.pypi.python.org'

The clients so far that could use this mechanism:

- setuptools
- zc.buildout (through setuptools)
- pip

Fail-over mechanism
:::::::::::::::::::

Clients that are browsing PyPI should be able to use a fail-over
mechanism when PyPI or the mirror in use is not responding.

It is up to the client to decide which mirror should be used, maybe by
looking at its geographical location and its responsiveness.

This PEP does not describe how this fail-over mechanism should work,
but it is strongly encouraged that the clients try to use the nearest
mirror.

The clients so far that could use this mechanism:

- setuptools
- zc.buildout (through setuptools)
- pip

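Since this PEP leaves the fail-over policy to the client, the following is
only one possible shape for it: try a list of hosts in the client's
preferred order and fall through to the next one on a network error. The
function ``fetch_with_failover`` and its ``fetch`` callable are
hypothetical; ``fetch(host, path)`` is assumed to raise ``OSError`` (which
covers socket and ``urllib`` errors) when a host does not respond.

```python
def fetch_with_failover(path, hosts, fetch):
    """Try each host in order and return (host, result) from the first success.

    hosts is expected to be sorted by the client's own preference,
    for example nearest mirror first.
    """
    errors = []
    for host in hosts:
        try:
            return host, fetch(host, path)
        except OSError as exc:
            errors.append((host, exc))  # remember the failure, try the next host
    raise ConnectionError("no host answered for %s: %r" % (path, errors))
```

Keeping the transport behind a callable lets the same loop serve any HTTP
library, and makes the ordering policy (location, responsiveness) a matter
of how the caller sorts ``hosts``.
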
Extra package indexes
:::::::::::::::::::::

It is obvious that some packages will not be uploaded to PyPI, either
because they are private or because the project maintainer
runs their own server where people might get the project package.
However, it is strongly encouraged that a public package index follows
PyPI and Distutils protocols.

In other words, the ``register`` and ``upload`` commands should be
compatible with any package index server out there.

Software compatible with PyPI and Distutils so far:

- PloneSoftwareCenter [#psc]_ which is used to run plone.org products section.
- EggBasket [#eggbasket]_.

**An extra package index is not a mirror of PyPI, but can have some
mirrors itself.**

Merging several indexes
:::::::::::::::::::::::

When a client needs to get some packages from several distinct
indexes, it should be able to use each one of them as a potential
source of packages. Different indexes should be defined as a sorted
list for the client to look for a package.

Each independent index can of course provide a list of its mirrors.

XXX define how to get the hostname for the mirrors of an arbitrary
index.

That permits all combinations at client level, for a reliable
packaging system with all levels of privacy.

It is up to the client to deal with the merging.

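One simple reading of the "sorted list" of indexes is a first-match
lookup: the client asks each index in order and uses the first one that
knows the package. That policy is an assumption of this sketch, as are the
name ``find_package`` and the ``lookup`` callables; other merging
strategies are equally compatible with the text above.

```python
def find_package(package, indexes):
    """Return (index_name, files) from the first index that lists the package.

    indexes is an ordered list of (name, lookup) pairs, where
    lookup(package) returns a possibly empty list of file URLs.
    Returns None when no index knows the package.
    """
    for name, lookup in indexes:
        files = lookup(package)
        if files:
            return name, files
    return None
```

Placing a private index before the public one means that an internal
package shadows a public package of the same name, which is usually the
intent when privacy is a concern.
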
References
==========

.. [#pep381client]
   http://pypi.python.org/pypi/pep381client

.. [#iso8601]
   http://en.wikipedia.org/wiki/ISO_8601

.. [#zc.buildout]
   http://pypi.python.org/pypi/zc.buildout

.. [#setuptools]
   http://pypi.python.org/pypi/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#psc]
   http://plone.org/products/plonesoftwarecenter

.. [#eggbasket]
   http://www.chrisarndt.de/projects/eggbasket

Acknowledgments
===============

Georg Brandl.

Copyright
=========

This document has been placed in the public domain.