From b5bf5d858d9a752fee743aa0549b3762c7669779 Mon Sep 17 00:00:00 2001
From: Tarek Ziadé
Date: Sat, 21 Mar 2009 14:08:19 +0000
Subject: [PATCH] added PEP 381 (mirroring infrastructure for PyPI)

---
 pep-0381.txt | 307 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 pep-0381.txt

diff --git a/pep-0381.txt b/pep-0381.txt
new file mode 100644
index 000000000..22b320b77
--- /dev/null
+++ b/pep-0381.txt
@@ -0,0 +1,307 @@
+PEP: 381
+Title: Mirroring infrastructure for PyPI
+Version: $Revision$
+Last-Modified: $Date$
+Author: Tarek Ziadé
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 21-Mar-2009
+Python-Version: N.A.
+Post-History:
+
+Abstract
+========
+
+This PEP describes a mirroring infrastructure for PyPI.
+
+Rationale
+=========
+
+PyPI hosts over 4000 projects and is used daily by people
+building applications. Tools such as `easy_install` and `zc.buildout`,
+in particular, make intensive use of PyPI.
+
+For people who rely heavily on PyPI, it can act as a single point
+of failure. Some mirrors, both private and public, have already been
+set up. These are active mirrors, which means that they browse
+PyPI to stay synchronized.
+
+In order to make the system more reliable, this PEP describes:
+
+- how mirrors are listed and registered at PyPI
+- the pages a public mirror should maintain.
+  These pages will be used by PyPI to get
+  hit counts and the last modified date.
+- how a mirror should synchronize with PyPI
+- how a client can implement a fail-over mechanism
+- a contact form for package maintainers
+
+Mirror listing and registering
+==============================
+
+A new text page will be added at `http://pypi.python.org/mirrors`
+that can be browsed like the simple index. This page gives a list of
+the mirrors as a list of links.
+
+These links are the URLs of the simple index of each mirror.
+
+The page will look like this::
+
+   # PyPI mirrors
+   #
+   # If you want to register a new mirror, send an email
+   # to catalog-SIG@python.org with:
+   #
+   # - The URLs of your mirror:
+   #   - the root of the server
+   #   - the index page
+   #   - the last modified page
+   #   - the local stats page
+   #   - the global stats page
+   #   - the mirrors page
+   #
+   # - The name and email of the maintainer.
+   #
+   # Registration is done manually. To become a
+   # mirror, you need to strictly follow the mirror protocol
+   # described here:
+   #
+   # http://wiki.python.org/PEP_374
+   #
+   # root,index,last-modified,local-stats,stats,mirrors
+   http://example.com/pypi,index,last-modified,local-stats,stats,mirrors
+   http://example2.com/pypi,index,last-modified,local-stats,stats,mirrors
+
+When a mirror is proposed on the mailing list, it is manually
+added to the mirror list in the PyPI application once it
+has been checked for compliance with the mirroring rules.
+
+The mirror list page is a simple text page that can be browsed
+by any tool that wants to get a list of registered mirrors.
+Package indexes that are not mirrors of PyPI are not added to
+PyPI's mirror list, although they can provide the
+same mirror-listing mechanism for their own mirrors.
+
+Special pages a mirror needs to provide
+=======================================
+
+A mirror needs to provide four pages, besides the index:
+
+- last-modified
+- local-stats
+- stats
+- mirrors
+
+Last modified date
+::::::::::::::::::
+
+CPAN uses a freshness date system where each mirror's last synchronisation
+date is made available.
+
+For PyPI, each mirror needs to maintain a URL with plain text content
+that represents the mirror's last synchronisation date.
+
+The date is provided in GMT, using the ISO 8601 format
+(see http://en.wikipedia.org/wiki/ISO_8601).
+
+Each mirror is responsible for maintaining its last modified date.
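Below is a minimal sketch of writing and reading the `/last-modified` body using only the standard library. The PEP requires only "GMT, ISO 8601". The exact layout `%Y-%m-%dT%H:%M:%S` (no offset suffix) is an assumption, as are the hypothetical names `format_last_modified`, `parse_last_modified` and `is_fresh`.

```python
from datetime import datetime, timedelta, timezone

# Assumed text layout of the /last-modified page: ISO 8601, GMT, no offset.
LAST_MODIFIED_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_last_modified(moment: datetime) -> str:
    """Render a synchronisation time as the body of /last-modified."""
    # Convert to GMT (UTC) first; a naive datetime is treated as local time.
    return moment.astimezone(timezone.utc).strftime(LAST_MODIFIED_FORMAT)


def parse_last_modified(body: str) -> datetime:
    """Parse a /last-modified body back into a timezone-aware GMT datetime."""
    naive = datetime.strptime(body.strip(), LAST_MODIFIED_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


def is_fresh(body: str, now: datetime, max_age: timedelta) -> bool:
    """Hypothetical client check: was the mirror synced within max_age?

    `now` must be timezone-aware so it can be compared with the parsed date.
    """
    return now - parse_last_modified(body) <= max_age


if __name__ == "__main__":
    synced = datetime(2009, 3, 21, 14, 8, 19, tzinfo=timezone.utc)
    print(format_last_modified(synced))  # prints 2009-03-21T14:08:19
```

A client could compare this date across mirrors and skip one whose last synchronisation is older than some threshold it chooses. The PEP itself sets no such threshold.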
+
+Conventionally, this page should be reachable at `/last-modified`.
+
+Local statistics
+::::::::::::::::
+
+Each mirror is responsible for counting all the downloads
+that were made from it. PyPI uses these counts to sum up all
+downloads and display the grand total.
+
+These statistics are in a CSV-like format, with a header on the first
+line. The format needs to follow PEP 305; in practice,
+it should be readable by Python's `csv` module.
+
+The fields in this file are:
+
+- package: the distutils id of the package.
+- filename: the filename that has been downloaded.
+- useragent: the User-Agent of the client that has downloaded the package.
+- count: the number of downloads.
+
+The content will look like this::
+
+   # package,filename,useragent,count
+   zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
+   ...
+
+The counting starts the day the mirror is launched, and there is one file per
+day, compressed using the `bzip2` format. Each file is named after the
+day. For example, `2008-11-06.bz2` is the file for the 6th of November 2008.
+
+They are then provided in a folder called `days`. For example:
+
+- /local-stats/days/2008-11-06.bz2
+- /local-stats/days/2008-11-07.bz2
+- /local-stats/days/2008-11-08.bz2
+
+Conventionally, the name should be `local-stats`, but it can be any name
+provided when the mirror is registered.
+
+Statistics page
+:::::::::::::::
+
+PyPI and each mirror are responsible for providing the grand total
+page at `/stats`. This page is calculated daily by PyPI,
+by reading the local stats of all mirrors and summing them.
+
+Therefore, mirrors should not try to rebuild this stats page but simply
+fetch PyPI's copy during each synchronization.
+
+It has the same structure as `local-stats`, but also provides
+counts for months.
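The daily summation PyPI performs could look roughly like the sketch below. It decodes bzip2-compressed stats files in the CSV layout shown above, adds up the counts per (package, filename, useragent), and writes the result back in the same layout. How the files are downloaded is left out, and UTF-8 encoding is an assumption. `read_stats`, `sum_stats` and `write_stats` are hypothetical names.

```python
import bz2
import csv
import io
from collections import Counter
from typing import Iterable

# Header line of a stats file, as shown in the example above.
HEADER = ["# package", "filename", "useragent", "count"]


def read_stats(compressed: bytes) -> Counter:
    """Decode one bzip2 stats file into a Counter keyed by
    (package, filename, useragent)."""
    text = bz2.decompress(compressed).decode("utf-8")  # UTF-8 is assumed
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and the commented header ("# package,...").
        if not row or row[0].startswith("#"):
            continue
        package, filename, useragent, count = row
        counts[(package, filename, useragent)] += int(count)
    return counts


def sum_stats(files: Iterable[bytes]) -> Counter:
    """Add up several stats files, e.g. one day's local-stats from all mirrors."""
    total: Counter = Counter()
    for compressed in files:
        total.update(read_stats(compressed))  # Counter.update adds counts
    return total


def write_stats(counts: Counter) -> bytes:
    """Serialize counts back into the bzip2-compressed CSV layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for (package, filename, useragent), count in sorted(counts.items()):
        writer.writerow([package, filename, useragent, count])
    return bz2.compress(buffer.getvalue().encode("utf-8"))
```

The same `sum_stats` call could in principle build a monthly file such as `/stats/months/2008-11.bz2` by summing that month's day files.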
+
+Examples:
+
+- /stats/days/2008-11-06.bz2
+- /stats/days/2008-11-07.bz2
+- /stats/days/2008-11-08.bz2
+- /stats/months/2008-11.bz2
+- /stats/months/2008-10.bz2
+
+Conventionally, the name should be `stats`, but it can be any name
+provided when the mirror is registered.
+
+
+Mirrors listing page
+::::::::::::::::::::
+
+Like `/stats`, each mirror should fetch and serve a copy of PyPI's `/mirrors`
+page.
+
+Conventionally, the name should be `mirrors`, but it can be any name
+provided when the mirror is registered.
+
+How a mirror should synchronize with PyPI
+=========================================
+
+A mirroring protocol called `Simple Index` was described
+and implemented by Martin v. Loewis and Jim Fulton, based on
+how `easy_install` works. This section summarizes it
+and gives a few relevant links, plus a short part about
+`User-Agent`.
+
+The mirroring protocol
+::::::::::::::::::::::
+
+XXX Need to describe the protocol here.
+
+The `zc.pypimirror` package
+provides an application that follows this protocol to browse PyPI.
+
+User-agent request header
+:::::::::::::::::::::::::
+
+In order to differentiate the actions that clients take
+on PyPI, all mirroring software should provide a specific
+user agent name.
+
+The same applies to all clients, such as:
+
+- `zc.buildout`
+- `setuptools`
+- `pip`
+- etc.
+
+XXX user agent registering mechanism at PyPI?
+
+How a client can use PyPI and its mirrors
+:::::::::::::::::::::::::::::::::::::::::
+
+Clients that are browsing PyPI should be able to use
+alternative mirrors by reading the `/mirrors` page
+at PyPI.
+
+Clients that could use this mechanism so far:
+
+- setuptools
+- zc.buildout (through setuptools)
+- pip
+
+Fail-over mechanism
+:::::::::::::::::::
+
+Clients that are browsing PyPI should be able to use
+a fail-over mechanism when PyPI or the mirror in use
+is not responding.
+
+This can be done by parsing the `/mirrors` page of PyPI
+or the one located on any PyPI mirror.
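One possible client-side fail-over is sketched below. It parses the `/mirrors` page format shown earlier and tries each mirror's index in turn. Several details are assumptions, not part of the PEP: the page names are treated as paths relative to the mirror root, any `OSError` counts as "not responding", and a hypothetical `USER_AGENT` string identifies the client as the User-Agent section asks. The `fetch` parameter can be swapped out so the logic can be exercised without network access.

```python
import urllib.request
from typing import Callable, Dict, List

# Column order taken from the header comment of the /mirrors page.
PAGE_NAMES = ["index", "last-modified", "local-stats", "stats", "mirrors"]
# Hypothetical client name; the PEP asks every client to send its own.
USER_AGENT = "example-client/0.1"


def parse_mirrors_page(text: str) -> List[Dict[str, str]]:
    """Turn the /mirrors text page into one dict of page URLs per mirror."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments, including the column header.
        if not line or line.startswith("#"):
            continue
        root, *pages = [field.strip() for field in line.split(",")]
        entry = {"root": root}
        for name, page in zip(PAGE_NAMES, pages):
            # Assumption: page names are paths relative to the mirror root.
            entry[name] = root.rstrip("/") + "/" + page
        mirrors.append(entry)
    return mirrors


def default_fetch(url: str) -> bytes:
    """GET a URL with a short timeout, identifying the client by User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def fetch_with_failover(
    path: str,
    mirrors: List[Dict[str, str]],
    fetch: Callable[[str], bytes] = default_fetch,
) -> bytes:
    """Try each mirror's index in order; return the first successful response."""
    failures = []
    for mirror in mirrors:
        url = mirror["index"].rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            failures.append((url, exc))
    raise OSError("no mirror responded: %r" % (failures,))
```

The order of `mirrors` decides which mirror is tried first. A client might put PyPI itself at the front, or sort the list by measured response time, before calling `fetch_with_failover`.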
+
+It is up to the client to decide which mirror should
+be used, for example by looking at its geographical location and
+its responsiveness.
+
+This PEP does not describe how this fail-over
+mechanism should work, but clients are strongly
+encouraged to try to use the nearest mirror.
+
+Clients that could use this mechanism so far:
+
+- setuptools
+- zc.buildout (through setuptools)
+- pip
+
+Extra package indexes
+:::::::::::::::::::::
+
+Some packages will not be uploaded to PyPI, either
+because they are private or because the project
+maintainer runs his own server where people can
+get the project's packages. However, public package
+indexes are strongly encouraged to follow the PyPI
+and Distutils protocols.
+
+In other words, the `register` and `upload` commands
+should be compatible with any package index server out
+there.
+
+Software that is compatible with PyPI and Distutils so
+far:
+
+- `PloneSoftwareCenter`,
+  which is used to run the plone.org products section.
+- `EggBasket`
+
+**An extra package index is not a mirror of PyPI, but it can have
+mirrors of its own.**
+
+Merging several indexes
+:::::::::::::::::::::::
+
+When a client needs to get some packages from several
+distinct indexes, it should be able to use each one of them
+as a potential source of packages. The indexes
+should be defined as an ordered list in which the client
+looks for a package.
+
+Each independent index can, of course, provide a list of
+its mirrors if its `/mirrors` page is available.
+
+This permits any combination at the client level, for a reliable
+packaging system with all levels of privacy.
+
+It is up to the client to handle the merging.
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: