PEP: 381
Title: Mirroring infrastructure for PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-March-2009
Python-Version: N.A.
Post-History:


Abstract
========

This PEP describes a mirroring infrastructure for PyPI.


Rationale
=========

PyPI hosts over 6000 projects and is used daily by people building
applications. Tools such as `easy_install` and `zc.buildout` make
particularly intensive use of PyPI.

For people who rely on it this heavily, PyPI can act as a single point
of failure. People have started to set up mirrors, both private and
public. These are active mirrors, which means that they browse PyPI
themselves to stay in sync.

In order to make the system more reliable, this PEP describes:

- the mirror listing and registering at PyPI
- the pages a public mirror should maintain. These pages will be used
  by PyPI, in order to get hit counts and the last modified date.
- how a mirror should synchronize with PyPI
- how a client can implement a fail-over mechanism


Mirror listing and registering
==============================

People who want to mirror PyPI make a proposal on catalog-SIG.
When a mirror is proposed on the mailing list, it is manually
added to the mirror list in the PyPI application after it
has been checked for compliance with the mirroring rules.

The mirror list is handled by a DNS entry for this hostname:

    mirrors.pypi.python.org

When a mirror is added to the DNS, its address becomes an official
IP for `mirrors.pypi.python.org`, and requests will be sent
to that IP. Therefore the mirror maintainer should not
change the IP provided. If the IP has to change for any reason,
the mirror maintainer has to send a mail to catalog-SIG at least
one week before the change so the DNS entry can be updated in time.

The new mirror also appears at `http://pypi.python.org/mirrors`,
a human-readable page that lists the mirrors.
This page also explains how to register a new mirror.

Statistics page
:::::::::::::::

PyPI provides statistics on downloads at `/stats`. This page is
calculated daily by PyPI, by reading all mirrors' local stats and
summing them.

The stats are presented in daily or monthly files, under `/stats/days`
and `/stats/months`. Each file is a `bzip2` file named as follows:

- YYYY-MM-DD.bz2 for daily files
- YYYY-MM.bz2 for monthly files

Examples:

- /stats/days/2008-11-06.bz2
- /stats/days/2008-11-07.bz2
- /stats/days/2008-11-08.bz2
- /stats/months/2008-11.bz2
- /stats/months/2008-10.bz2


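The naming scheme above can be derived mechanically from a date. The
following is a minimal sketch, assuming Python 3; the function names
and the `prefix` parameter are illustrative and are not defined by
this PEP::

    import datetime

    # Assumption: helper names and the `prefix` parameter are
    # illustrative; only the path layout comes from the text above.
    def daily_stats_path(day, prefix="/stats"):
        """Return the path of the daily file covering `day`."""
        return "%s/days/%s.bz2" % (prefix, day.strftime("%Y-%m-%d"))

    def monthly_stats_path(day, prefix="/stats"):
        """Return the path of the monthly file covering `day`."""
        return "%s/months/%s.bz2" % (prefix, day.strftime("%Y-%m"))

    print(daily_stats_path(datetime.date(2008, 11, 6)))
    print(monthly_stats_path(datetime.date(2008, 11, 6)))

Run as a script, this prints `/stats/days/2008-11-06.bz2` and
`/stats/months/2008-11.bz2`.
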
Special pages a mirror needs to provide
=======================================

A mirror is a strict copy of PyPI, so it provides the same structure
by copying it.

- pypi: HTML version of the package index
- simple: REST version of the package index
- packages: packages, stored by Python version, and letters
- stats: statistics on downloads
- XXX

It also needs to provide two specific elements:

- last-modified
- local-stats

Last modified date
::::::::::::::::::

CPAN uses a freshness date system where the mirror's last
synchronisation date is made available.

For PyPI, each mirror needs to maintain a URL with simple text content
that gives the date of the mirror's last synchronisation.

The date is provided in GMT, using the ISO 8601 format [#iso8601]_.
Each mirror is responsible for maintaining its last modified date.

This page must be located at `/last-modified` and must be served as
text/plain.

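A client could use this page to estimate how stale a mirror is. The
sketch below assumes Python 3 and one particular ISO 8601 form,
`YYYY-MM-DDTHH:MM:SS`; this PEP does not say which ISO 8601 variant a
mirror must emit, and the function names are illustrative::

    import datetime

    # Assumption: the page body looks like "2009-03-21T12:00:00".
    # The PEP only requires ISO 8601 in GMT, so other variants may occur.
    ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S"

    def parse_last_modified(text):
        """Parse the page body and return a timezone-aware GMT datetime."""
        stamp = datetime.datetime.strptime(text.strip(), ASSUMED_FORMAT)
        return stamp.replace(tzinfo=datetime.timezone.utc)

    def mirror_age(text, now=None):
        """Return how long ago the mirror last synchronised."""
        if now is None:
            now = datetime.datetime.now(datetime.timezone.utc)
        return now - parse_last_modified(text)

A client might, for instance, skip any mirror whose age exceeds a
threshold of its own choosing.
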
Local statistics
::::::::::::::::

Each mirror is responsible for counting all the downloads that were
made through it. PyPI uses these counts to sum up all downloads and
display the grand total.

These statistics are in CSV-like form, with a header in the first
line. They need to obey PEP 305 [#pep305]_. Basically, they should be
readable by Python's `csv` module.

The fields in this file are:

- package: the distutils id of the package.
- filename: the filename that has been downloaded.
- useragent: the User-Agent of the client that has downloaded the
  package.
- count: the number of downloads.

The content will look like this::

    # package,filename,useragent,count
    zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
    ...

The counting starts the day the mirror is launched, and there is one
file per day, compressed using the `bzip2` format. Each file is named
after the day it covers. For example `2008-11-06.bz2` is the file for
the 6th of November 2008.

The files are then provided in a folder called `days`. For example:

- /local-stats/days/2008-11-06.bz2
- /local-stats/days/2008-11-07.bz2
- /local-stats/days/2008-11-08.bz2

This page must be located at `/local-stats`.


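Reading such a file and summing several mirrors could be sketched as
follows, assuming Python 3. The function names, the UTF-8 decoding and
the choice to key totals by package and filename are assumptions made
for illustration; the PEP does not specify them::

    import bz2
    import csv
    import io
    from collections import Counter

    FIELDS = ("package", "filename", "useragent", "count")

    def read_local_stats(raw):
        """Map (package, filename) to downloads for one daily file.

        `raw` is the bzip2-compressed content of the file, as bytes.
        """
        # Assumption: the file is UTF-8 text once decompressed.
        text = bz2.decompress(raw).decode("utf-8")
        totals = Counter()
        for row in csv.reader(io.StringIO(text)):
            # Skip the "#" header line and any malformed row.
            if len(row) != len(FIELDS) or row[0].startswith("#"):
                continue
            record = dict(zip(FIELDS, row))
            key = (record["package"], record["filename"])
            totals[key] += int(record["count"])
        return totals

    def sum_mirrors(raw_files):
        """Add up the counts of several mirrors' files for the same day."""
        grand_total = Counter()
        for raw in raw_files:
            grand_total.update(read_local_stats(raw))
        return grand_total

`Counter.update` adds counts instead of replacing them, so
`sum_mirrors` yields the per-file grand total across mirrors.
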
How a mirror should synchronize with PyPI
=========================================

A mirroring protocol called `Simple Index` was described and
implemented by Martin v. Loewis and Jim Fulton, based on how
`easy_install` works. This section summarizes it and gives a few
relevant links, plus a short note about the `User-Agent` header.

The mirroring protocol
::::::::::::::::::::::

XXX Need to describe the protocol here.

The z3c.pypimirror package [#zcpkg]_ provides an application that
respects this protocol to browse PyPI.

User-agent request header
:::::::::::::::::::::::::

In order to be able to differentiate actions taken by clients over
PyPI, a specific user agent name should be provided by all mirroring
software.

This is also true for all clients like:

- zc.buildout [#zc.buildout]_.
- setuptools [#setuptools]_.
- pip [#pip]_.

XXX user agent registering mechanism at PyPI?

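Setting the header from Python's standard library could look like the
sketch below, assuming Python 3. The agent string `examplemirror/0.1`
and the function name are hypothetical; this PEP does not yet define a
naming scheme for user agents::

    import urllib.request

    def build_request(url, agent="examplemirror/0.1"):
        """Return a request that identifies itself with `agent`.

        Assumption: `agent` is a made-up value used for illustration.
        """
        return urllib.request.Request(url, headers={"User-Agent": agent})

    request = build_request("http://pypi.python.org/simple/")
    # Passing `request` to urllib.request.urlopen() would send the header.

Building the request does not touch the network, so the header can be
inspected with `request.get_header("User-agent")` before it is sent.
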
How a client can use PyPI and its mirrors
:::::::::::::::::::::::::::::::::::::::::

Clients that are browsing PyPI should be able to use alternative
mirrors, by getting the list of the mirrors using `mirrors.pypi.python.org`.

Code example::

    >>> import socket
    >>> socket.gethostbyname_ex('mirrors.pypi.python.org')[-1]
    ['82.94.164.163', '88.191.64.248']

The clients so far that could use this mechanism:

- setuptools
- zc.buildout (through setuptools)
- pip

Fail-over mechanism
:::::::::::::::::::

Clients that are browsing PyPI should be able to use a fail-over
mechanism when PyPI or the mirror in use is not responding.

It is up to the client to decide which mirror should be used, maybe by
looking at its geographical location and its responsiveness.

This PEP does not describe how this fail-over mechanism should work,
but it is strongly encouraged that the clients try to use the nearest
mirror.

The clients so far that could use this mechanism:

- setuptools
- zc.buildout (through setuptools)
- pip

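Since the mechanism is left to the client, the following is only one
possible sketch, assuming Python 3: it tries hosts in the order given
and returns the first successful response. The function name and the
`opener` parameter are illustrative, and ranking hosts by proximity or
responsiveness is left to the caller::

    import urllib.request

    def fetch_with_failover(path, hosts, timeout=5,
                            opener=urllib.request.urlopen):
        """Return the body of `path` from the first host that answers.

        Assumption: `hosts` is already sorted by the caller's
        preference, for example nearest mirror first.
        """
        errors = []
        for host in hosts:
            url = "http://%s%s" % (host, path)
            try:
                with opener(url, timeout=timeout) as response:
                    return response.read()
            except OSError as exc:
                # URLError and socket timeouts are OSError subclasses.
                errors.append((host, exc))
        raise OSError("no index or mirror responded: %r" % (errors,))

The `opener` argument exists so that the logic can be exercised
without network access by passing a stand-in callable.
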
Extra package indexes
:::::::::::::::::::::

Some packages will not be uploaded to PyPI, either because they are
private or because the project maintainer runs a separate server
from which people can get the project's packages.
However, it is strongly encouraged that a public package index follows
PyPI and Distutils protocols.

In other words, the `register` and `upload` commands should be
compatible with any package index server out there.

Software compatible with PyPI and Distutils so far:

- PloneSoftwareCenter [#psc]_, which is used to run the plone.org
  products section.
- EggBasket [#eggbasket]_.

**An extra package index is not a mirror of PyPI, but can have some
mirrors itself.**

Merging several indexes
:::::::::::::::::::::::

When a client needs to get some packages from several distinct
indexes, it should be able to use each one of them as a potential
source of packages. Different indexes should be defined as a sorted
list for the client to look for a package.

Each independent index can of course provide a list of its mirrors.

XXX define how to get the hostname for the mirrors of an arbitrary
index.

That permits all combinations at client level, for a reliable
packaging system with all levels of privacy.

It is up to the client to deal with the merging.


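One simple reading of the sorted list is "first index that has the
package wins". The sketch below assumes Python 3 and models each index
as a name paired with a lookup callable; both the function name and
this representation are illustrative, since the PEP leaves the merging
strategy to the client::

    def find_package(name, indexes):
        """Return (index_name, files) from the first index with `name`.

        `indexes` is a sorted list of (index_name, lookup) pairs, where
        `lookup(name)` returns a list of files, possibly empty.
        Assumption: this pair-based interface is made up for the sketch.
        """
        for index_name, lookup in indexes:
            files = lookup(name)
            if files:
                return index_name, files
        return None

    # Stand-in indexes backed by dictionaries, private index first.
    private = {"internal.tool": ["internal.tool-0.1.tar.gz"]}
    public = {"zc.buildout": ["zc.buildout-1.6.0.tgz"]}
    indexes = [("private", lambda n: private.get(n, [])),
               ("pypi", lambda n: public.get(n, []))]

With these stand-ins, `find_package("zc.buildout", indexes)` falls
through the private index and returns the entry from the second one.
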
References
==========

.. [#pep305]
   http://www.python.org/dev/peps/pep-0305/#id19

.. [#zcpkg]
   http://pypi.python.org/pypi/z3c.pypimirror

.. [#iso8601]
   http://en.wikipedia.org/wiki/ISO_8601

.. [#zc.buildout]
   http://pypi.python.org/pypi/zc.buildout

.. [#setuptools]
   http://pypi.python.org/pypi/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#psc]
   http://plone.org/products/plonesoftwarecenter

.. [#eggbasket]
   http://www.chrisarndt.de/projects/eggbasket


Acknowledgments
===============

Martin von Loewis, Georg Brandl.


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: