PEP: 381
Title: Mirroring infrastructure for PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-March-2009
Python-Version: N.A.
Post-History:
Abstract
========
This PEP describes a mirroring infrastructure for PyPI.
Rationale
=========
PyPI hosts over 6000 projects and is used daily by people to build
applications. Tools such as `easy_install` and `zc.buildout` in
particular make intensive use of PyPI.
For people making intensive use of PyPI, it can act as a single point
of failure. People have started to set up mirrors, both private
and public. These are active mirrors, meaning that they browse PyPI
to stay in sync.
In order to make the system more reliable, this PEP describes:
- the mirror listing and registering at PyPI
- the pages a public mirror should maintain. These pages will be used
  by PyPI, in order to get hit counts and the last modified date.
- how a mirror should synchronize with PyPI
- how a client can implement a fail-over mechanism
Mirror listing and registering
==============================
People who want to mirror PyPI make a proposal on catalog-SIG.
When a mirror is proposed on the mailing list, it is manually
added in a mirror list in the PyPI application after it
has been checked to be compliant with the mirroring rules.
The mirror list is handled by a DNS entry for this hostname:

    mirrors.pypi.python.org

When a mirror is added into the DNS, it becomes an official
IP for `mirrors.pypi.python.org`, and requests will be sent
to the given IP. Therefore the mirror maintainer should not
change the IP provided. If the IP has to change for any reason,
the mirror maintainer has to send a mail to catalog-SIG at least
one week before the change so the DNS entry can be changed on time.
The new mirror also appears at `http://pypi.python.org/mirrors`
which is a human-readable page that gives the list of mirrors.
This page also explains how to register a new mirror.
Statistics page
:::::::::::::::
PyPI provides statistics on downloads at `/stats`. This page is
calculated daily by PyPI, by reading all mirrors' local stats and
summing them.
The stats are presented in daily or monthly files, under `/stats/days`
and `/stats/months`. Each file is compressed with `bzip2` and named as
follows:
- YYYY-MM-DD.bz2 for daily files
- YYYY-MM.bz2 for monthly files
Examples:
- /stats/days/2008-11-06.bz2
- /stats/days/2008-11-07.bz2
- /stats/days/2008-11-08.bz2
- /stats/months/2008-11.bz2
- /stats/months/2008-10.bz2
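
As an illustration of the naming scheme above, the following minimal
sketch builds the daily and monthly paths for a given date. The helper
name `stats_paths` is hypothetical and not part of PyPI::

    import datetime


    def stats_paths(day):
        """Return the (daily, monthly) stats paths covering `day`."""
        # Daily files are named YYYY-MM-DD.bz2, monthly files YYYY-MM.bz2.
        daily = "/stats/days/%s.bz2" % day.strftime("%Y-%m-%d")
        monthly = "/stats/months/%s.bz2" % day.strftime("%Y-%m")
        return daily, monthly


    daily, monthly = stats_paths(datetime.date(2008, 11, 6))

Here `daily` is `/stats/days/2008-11-06.bz2` and `monthly` is
`/stats/months/2008-11.bz2`, matching the examples listed above.
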
Special pages a mirror needs to provide
=======================================
A mirror is a strict copy of PyPI, so it reproduces the same
structure:

- pypi: HTML version of the package index
- simple: REST version of the package index
- packages: packages, stored by Python version and by letter
- stats: statistics on downloads
- XXX
It also needs to provide two specific elements:
- last-modified
- local-stats
Last modified date
::::::::::::::::::
CPAN uses a freshness date system where the mirror's last
synchronisation date is made available.
For PyPI, each mirror needs to maintain a URL whose plain-text content
gives the date of the mirror's last synchronisation.

The date is provided in GMT, using the ISO 8601 format [#iso8601]_.

Each mirror is responsible for maintaining its last modified date.

This page must be located at `/last-modified` and must be served as
text/plain.
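
A minimal sketch of how a mirror could produce this value, and how a
client could measure a mirror's age from it. ISO 8601 allows several
notations and this PEP does not pick one, so the `YYYY-MM-DDTHH:MM:SS`
form used here is an assumption, as are the function names::

    import datetime

    # Assumed notation; the PEP only requires "ISO 8601 in GMT".
    STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"
    UTC = datetime.timezone.utc


    def format_last_modified(moment):
        """Render a timezone-aware datetime as text for /last-modified."""
        return moment.astimezone(UTC).strftime(STAMP_FORMAT)


    def parse_last_modified(text):
        """Parse the page content back into an aware UTC datetime."""
        parsed = datetime.datetime.strptime(text.strip(), STAMP_FORMAT)
        return parsed.replace(tzinfo=UTC)


    def mirror_age(text, now):
        """Return how long ago the mirror last synchronised."""
        return now - parse_last_modified(text)

A client could, for instance, skip any mirror whose age exceeds a
threshold of its own choosing.
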
Local statistics
::::::::::::::::
Each mirror is responsible for counting all the downloads that were
done via it. PyPI uses these counts to sum up all downloads and
display the grand total.
These statistics are in CSV-like form, with a header in the first
line. The file needs to obey PEP 305 [#pep305]_. In other words, it
should be readable by Python's `csv` module.
The fields in this file are:
- package: the distutils id of the package.
- filename: the filename that has been downloaded.
- useragent: the User-Agent of the client that has downloaded the
  package.
- count: the number of downloads.
The content will look like this::

    # package,filename,useragent,count
    zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
    ...

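
Such a file can be read with the `csv` module. The sketch below sums
the counts per package. The function name and the second data row are
made up for illustration, and the first line is simply skipped as the
header::

    import csv
    import io
    from collections import Counter

    # Illustrative content; only the first data row comes from this PEP.
    SAMPLE = (
        "# package,filename,useragent,count\n"
        "zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142\n"
        "zc.buildout,zc.buildout-1.6.0.tgz,OtherAgent,8\n"
    )


    def downloads_per_package(text):
        """Sum the `count` column for each package in one stats file."""
        reader = csv.reader(io.StringIO(text))
        next(reader, None)  # skip the header line
        totals = Counter()
        for row in reader:
            if not row:  # ignore blank lines
                continue
            package, filename, useragent, count = row
            totals[package] += int(count)
        return totals


    totals = downloads_per_package(SAMPLE)

Because the results are `Counter` objects, the totals of several
mirrors can be added together with `+` to obtain a grand total.
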
The counting starts the day the mirror is launched, and there is one
file per day, compressed using the `bzip2` format. Each file is named
after the day it covers. For example, `2008-11-06.bz2` is the file for
the 6th of November 2008.

The files are provided in a folder called `days`. For example:
- /local-stats/days/2008-11-06.bz2
- /local-stats/days/2008-11-07.bz2
- /local-stats/days/2008-11-08.bz2
This page must be located at `/local-stats`.
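
A mirror could produce one daily file along these lines. This is a
sketch under assumptions: the helper name `build_daily_stats` is
hypothetical, and UTF-8 is assumed for the text encoding, which this
PEP does not specify::

    import bz2
    import csv
    import datetime
    import io

    # Header fields as shown in the example content of this section.
    HEADER = ["# package", "filename", "useragent", "count"]


    def build_daily_stats(day, rows):
        """Return (path, compressed bytes) for one day of local stats."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
        # Files are named after the day, e.g. 2008-11-06.bz2.
        path = "/local-stats/days/%s.bz2" % day.isoformat()
        return path, bz2.compress(buffer.getvalue().encode("utf-8"))


    path, payload = build_daily_stats(
        datetime.date(2008, 11, 6),
        [("zc.buildout", "zc.buildout-1.6.0.tgz", "MyAgent", 142)],
    )

Decompressing `payload` with `bz2.decompress` gives back the CSV text,
which can then be read with the `csv` module.
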
How a mirror should synchronize with PyPI
=========================================
A mirroring protocol called `Simple Index` was described and
implemented by Martin v. Loewis and Jim Fulton, based on how
`easy_install` works. This section summarizes it and gives a few
relevant links, plus a short note about the `User-Agent` header.
The mirroring protocol
::::::::::::::::::::::
XXX Need to describe the protocol here.
The z3c.pypimirror package [#zcpkg]_ provides an application that
respects this protocol to browse PyPI.
User-agent request header
:::::::::::::::::::::::::
So that PyPI can differentiate the actions taken by its clients, all
mirroring software should provide a specific user agent name.

The same applies to clients such as:
- zc.buildout [#zc.buildout]_.
- setuptools [#setuptools]_.
- pip [#pip]_.
XXX user agent registering mechanism at PyPI ?
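
For example, a client written in Python could set the header as
follows. The agent string `example-mirror/0.1` and the helper name are
placeholders, since no naming scheme has been defined yet::

    import urllib.request

    # Hypothetical agent name; this PEP does not define a format.
    USER_AGENT = "example-mirror/0.1"


    def build_request(url, user_agent=USER_AGENT):
        """Build (but do not send) a request carrying the User-Agent."""
        return urllib.request.Request(url, headers={"User-Agent": user_agent})


    request = build_request("http://pypi.python.org/simple/")

Passing `request` to `urllib.request.urlopen` would then send the
header along with the request.
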
How a client can use PyPI and its mirrors
:::::::::::::::::::::::::::::::::::::::::
Clients that are browsing PyPI should be able to use alternative
mirrors, by getting the list of the mirrors using `mirrors.pypi.python.org`.
Code example::

    >>> import socket
    >>> socket.gethostbyname_ex('mirrors.pypi.python.org')[-1]
    ['82.94.164.163', '88.191.64.248']

The clients so far that could use this mechanism:
- setuptools
- zc.buildout (through setuptools)
- pip
Fail-over mechanism
:::::::::::::::::::
Clients that are browsing PyPI should be able to use a fail-over
mechanism when PyPI or the used mirror is not responding.
It is up to the client to decide which mirror should be used, maybe by
looking at its geographical location and its responsiveness.
This PEP does not describe how this fail-over mechanism should work,
but it is strongly encouraged that the clients try to use the nearest
mirror.
The clients so far that could use this mechanism:
- setuptools
- zc.buildout (through setuptools)
- pip
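
Since the mechanism is left to the client, the following is only one
possible sketch: hosts are tried in the order given and the first one
that answers wins. The function names, the `fetch` callable and the
timeout value are assumptions made for this example::

    import urllib.request


    def http_fetch(host, path, timeout=5):
        """Fetch `path` from `host` over HTTP and return the body."""
        url = "http://%s%s" % (host, path)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()


    def fetch_with_failover(path, hosts, fetch=http_fetch):
        """Return (host, content) from the first host that answers."""
        errors = []
        for host in hosts:
            try:
                return host, fetch(host, path)
            except OSError as exc:  # covers URLError and socket timeouts
                errors.append((host, exc))
        raise OSError("no host answered for %s: %r" % (path, errors))

A client that prefers the nearest mirror would sort `hosts` accordingly
before calling `fetch_with_failover`.
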
Extra package indexes
:::::::::::::::::::::
It is obvious that some packages will not be uploaded to PyPI, either
because they are private or because the project maintainer runs their
own server from which people can get the project package.
However, it is strongly encouraged that a public package index follows
PyPI and Distutils protocols.
In other words, the `register` and `upload` commands should be
compatible with any package index server out there.

Software that is compatible with PyPI and Distutils so far:

- PloneSoftwareCenter [#psc]_ which is used to run the plone.org
  products section.
- EggBasket [#eggbasket]_.
**An extra package index is not a mirror of PyPI, but can have some
mirrors itself.**
Merging several indexes
:::::::::::::::::::::::
When a client needs to get some packages from several distinct
indexes, it should be able to use each one of them as a potential
source of packages. Different indexes should be defined as a sorted
list for the client to look for a package.
Each independent index can of course provide a list of its mirrors.
XXX define how to get the hostname for the mirrors of an arbitrary
index.
That permits all combinations at client level, for a reliable
packaging system with all levels of privacy.
It is up to the client to deal with the merging.
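
One simple way for a client to merge indexes is to walk the sorted
list and stop at the first index that knows the package. The sketch
below assumes a hypothetical `lookup` callable that returns the list of
download links an index has for a project (empty if none). Other
merging policies are equally possible::

    def find_package(name, indexes, lookup):
        """Return (index, links) from the first index providing `name`.

        Returns None when no index in the list provides the package.
        """
        for index in indexes:
            links = lookup(index, name)
            if links:
                return index, links
        return None

With this policy, placing a private index before PyPI in the list
makes the private copy of a package take precedence.
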
References
==========
.. [#pep305]
   http://www.python.org/dev/peps/pep-0305/#id19

.. [#zcpkg]
   http://pypi.python.org/pypi/z3c.pypimirror

.. [#iso8601]
   http://en.wikipedia.org/wiki/ISO_8601

.. [#zc.buildout]
   http://pypi.python.org/pypi/zc.buildout

.. [#setuptools]
   http://pypi.python.org/pypi/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#psc]
   http://plone.org/products/plonesoftwarecenter

.. [#eggbasket]
   http://www.chrisarndt.de/projects/eggbasket

Acknowledgments
===============
Martin von Loewis, Georg Brandl.
Copyright
=========
This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: