reorganized the PEP so implementation details are not mixed with the proposal. Also renamed the directory to distinfo and the metadata file to METADATA

Tarek Ziadé 2010-03-29 09:03:10 +00:00
parent 120165cb88
commit b3350684fb
1 changed files with 136 additions and 176 deletions


@ -1,5 +1,5 @@
PEP: 376
Title: Changing the .egg-info structure
Title: Database of Installed Python Distributions
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
@ -10,24 +10,23 @@ Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:
.. contents::
Abstract
========
The overall goal of this PEP is providing an standard infrastructure to manage
project distributions. This should allow third party tools to do installation,
uninstallation and distribution management in a distutils compatible fashion
and share information between them.
The goal of this PEP is to provide a standard infrastructure to manage
project distributions installed on a system, so all tools that are
installing or removing projects are interoperable.
It also provides a sample uninstall feature using this infrastructure.
To achieve this goal, the PEP proposes a new format to describe installed
distributions on a system. It also describes a reference implementation
for the standard library.
For it, the PEP proposes various enhancements for Distutils:
In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).
- A new format to install projects, as an .egg-info structure.
- New APIs to read a project meta-data
- Replace PEP 262, adding capabilities to record and query information about
installed packages.
- A reference uninstall feature
Combined with PEP 345, the current proposal supersedes PEP 262.
Definitions
===========
@ -64,13 +63,13 @@ There are two problems right now in the way distributions are installed in
Python:
- There are too many ways to do it and this makes interoperation difficult.
- There is no API to get the metadata of installed distributions.
- There is no API to get information on installed distributions.
How distributions are installed
-------------------------------
Right now, when a distribution is installed in Python, every element it
contains are installed in various directories.
contains is installed in various directories.
For instance, `Distutils` installs the pure Python code in the `purelib`
directory, which is `lib/python2.6/site-packages` for unix-like systems and
@ -99,25 +98,29 @@ to install distributions, called `EggFormats` [#eggformats]_:
- a self-contained `.egg` directory, that contains all the distribution files
and the distribution metadata in a file called `PKG-INFO` in a subdirectory
called `EGG-INFO`. `setuptools` creates other fils in that directory that can
called `EGG-INFO`. `setuptools` creates other files in that directory that can
be considered as complementary metadata.
- a `.egg-info` directory installed in `site-packages`, that contains the same
- an `.egg-info` directory installed in `site-packages`, that contains the same
files `EGG-INFO` has in the `.egg` format.
The first format is automatically used when you install a distribution that
uses the ``setuptools.setup`` function in its setup.py file, instead of
the ``distutils.core.setup`` one.
The `setuptools` project also provides an executable script called
`setuptools` also adds a reference to the distribution into an
``easy-install.pth`` file.
Last, the `setuptools` project provides an executable script called
`easy_install` [#easyinstall]_ that installs all distributions, including
distutils-based ones in self-contained `.egg` directories.
If you want to have a standalone `.egg.info` directory distributions, e.g.
the second `setuptools` format, you have to force it when you work
If you want to have standalone `.egg-info` directories for your distributions,
e.g. the second `setuptools` format, you have to force it when you work
with a setuptools-based distribution or with the `easy_install` script.
You can force it by using the `--single-version-externally-managed` option
**or** the `--root` option.
**or** the `--root` option. This will make the `setuptools` project install
the project like distutils does.
This option is used by :
@ -145,21 +148,24 @@ But there's a common behavior: when you install a distribution, files are
copied in your system. And it's possible to keep track of these files for
later removal.
Moreover, the Pip project has gained an `uninstall` feature lately. It
records all installed files, using the `record` option of the `install`
command.
What this PEP proposes
----------------------
To address those issues, this PEP proposes a few changes:
- A new `.egg-info` structure using a directory, based on one format of
- A new `.dist-info` structure using a directory, inspired by one format of
the `EggFormats` standard from `setuptools`.
- New APIs in `pkgutil` to be able to query the information of installed
distributions.
- A de-facto replacement for PEP 262
- An uninstall function and an uninstall script in Distutils.
.egg-info becomes a directory
=============================
One .dist-info directory per installed distribution
===================================================
As explained earlier, the `EggFormats` standard from `setuptools` proposes two
formats to install the metadata information of a distribution:
@ -172,57 +178,34 @@ formats to install the metadata information of a distribution:
with the metadata inside.
This PEP proposes to keep just one format and make it the standard way to
install the metadata of a distribution : a distinct `.egg-info` directory
located in the site-packages directory, containing the metadata.
install the metadata of a distribution: a distinct `.dist-info` directory
located in the site-packages directory, containing the PKG-INFO metadata
file, renamed to METADATA, and some other files.
This `.egg-info` directory contains a `PKG-INFO` file built by the
`write_pkg_file` method of the `Distribution` class in Distutils.
This change does not impact Python itself because the metadata files are not
This change will not impact Python itself because the metadata files are not
used anywhere yet in the standard library besides Distutils.
It does impact the `setuptools` and `pip` projects, but given the fact that
It will impact the `setuptools` and `pip` projects, but given the fact that
they already work with a directory that contains a `PKG-INFO` file, the change
will have no deep consequences.
Let's take an example of the new format with the `docutils` distribution.
The elements installed in `site-packages` are::
The syntax of the `dist-info` directory name is as follows::
- docutils/
- roman.py
- docutils-0.5.egg-info/
PKG-INFO
name + '-' + version + '.dist-info'
The syntax of the egg-info directory name is as follows::
This `.dist-info` directory will contain these files:
name + '-' + version + '.egg-info'
- `METADATA`: the metadata, as described in PEP 345, PEP 314 and PEP 241.
- `RECORD`: list of installed files
- `INSTALLER`: the installer that was used
- `REQUESTED`: a marker to know whether the project was installed as a dependency
or not.
The egg-info directory name is created using a new function called
``egginfo_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name by replacing any runs of
non-alphanumeric characters with a single '-'. ``version`` is converted
to a standard version string. Spaces become dots, and all other
non-alphanumeric characters (except dots) become dashes, with runs of
multiple dashes condensed to a single dash. Both attributes are then
converted into their filename-escaped form, i.e. any '-' characters are
replaced with '_' other than the one in 'egg-info' and the one
separating the name from the version number.
Examples::
RECORD
------
>>> egginfo_dirname('docutils', '0.5')
'docutils-0.5.egg-info'
>>> egginfo_dirname('python-ldap', '2.5')
'python_ldap-2.5.egg-info'
>>> egginfo_dirname('python-ldap', '2.5 a---5')
'python_ldap-2.5.a_5.egg-info'
Adding a RECORD file in the .egg-info directory
===============================================
A `RECORD` file is added inside the `.egg-info` directory at installation
A `RECORD` file is added inside the `.dist-info` directory at installation
time when installing a source distribution using the `install` command.
Notice that when installing a binary distribution created with the `bdist` command
or a `bdist`-based command, the `RECORD` file will be installed as well since
@ -240,9 +223,6 @@ that are not in a RECORD file without prompting or warning.
This RECORD file is inspired by PEP 262 FILES [#pep262]_.
The RECORD format
-----------------
The `RECORD` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
these options:
@ -251,22 +231,9 @@ these options:
- quoting char : `"`.
- line terminator : ``os.linesep`` (so ``\r\n`` or ``\n``)
Each record is composed of three elements.
Each record is composed of three elements:
- the file's full **path**
- if the installed file is located in the directory where the `.egg-info`
directory of the package is located, it's a '/'-separated relative
path, no matter what the target system is. This makes this information
cross-compatible and allows simple installations to be relocatable.
- if the installed file is located under ``sys.prefix`` or
`sys.exec_prefix``, it's a it's a '/'-separated relative path prefixed
by the `$PREFIX` or the `$EXEC_PREFIX` string. The `install` command
decides which prefix to use depending on the files. For instance if
it's an executable script defined in the `scripts` option of the
setup script, `$EXEC_PREFIX` will be used. If `install` doesn't know
which prefix to use, `$PREFIX` is preferred.
- the file's full **path** (XXX wait for feedback, rephrasing)
- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
generated files don't have any hash because they are automatically produced
@ -284,38 +251,18 @@ support (see PEP 278 [#pep278]_) is activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.
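A minimal sketch of reading such a file with the ``csv`` module, written for Python 3 and assuming a comma as the field delimiter (as in the examples of this PEP). ``read_record`` is a hypothetical helper used for illustration only, not part of the proposed API:

```python
import csv
import io


def read_record(stream):
    """Yield (path, md5, size) tuples from an open RECORD stream.

    Missing hash or size fields (for instance for the RECORD file
    itself) are reported as None.
    """
    reader = csv.reader(stream, delimiter=',', quotechar='"')
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so that path-only entries still unpack cleanly.
        path, md5, size = (row + ['', ''])[:3]
        yield path, md5 or None, int(size) if size else None


# Two lines taken from the docutils example in this PEP.
sample = io.StringIO(
    "/usr/lib/python2.6/site-packages/docutils/__init__.py,"
    "b690274f621402dda63bf11ba5373bf2,9544\n"
    "/usr/lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD\n"
)
entries = list(read_record(sample))
```

Each entry is a three-element tuple; an entry that only carries a path, such as the one for `RECORD` itself, gets ``None`` for the hash and the size.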
Example
-------
Here's an example of a RECORD file (extract)::
Back to our `docutils` example, we now have::
/usr/lib/python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
/usr/lib/python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
/usr/lib/python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
/usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
/usr/lib/python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
/usr/lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD
- docutils/
- roman.py
- docutils-0.5.egg-info/
PKG-INFO
RECORD
Notice that the `RECORD` file can't contain a hash of itself and is just mentioned here.
And the RECORD file contains (extract)::
docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
$EXEC_PREFIX/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
docutils-0.5.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
docutils-0.5.egg-info/RECORD
Notice that:
- the `RECORD` file can't contain a hash of itself and is just mentioned here
- `docutils` and `docutils-0.5.egg-info` are located in `site-packages` so the file
paths are relative to it.
Example 2
---------
If a project has files installed elswhere than under the Python installation
root, they are added in the RECORD file as full paths. For example a project
that installs a `config.ini` file in `/etc/myapp` will be added like this::
A `config.ini` file that a project installs in `/etc/myapp` will be added like this::
/etc/myapp/config.ini,b690274f621402dda63bf11ba5373bf2,9544
@ -325,8 +272,8 @@ so a file that is copied in c:\MyApp\ will be::
c:\etc\myapp\config.ini,b690274f621402dda63bf11ba5373bf2,9544
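As an illustration of how one such line could be produced, here is a small Python 3 sketch that computes the three elements of a record (path, MD5 hash in hex, size in bytes) for a file. ``record_entry`` is a hypothetical name, and the temporary `config.ini` file only exists for the demonstration:

```python
import hashlib
import os
import tempfile


def record_entry(path):
    """Return a (path, md5_hex, size) tuple for one installed file."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large files are not loaded in memory at once.
        for chunk in iter(lambda: f.read(8192), b''):
            md5.update(chunk)
    return path, md5.hexdigest(), os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'config.ini')
    with open(target, 'wb') as f:
        f.write(b'[myapp]\n')
    entry = record_entry(target)
```

The resulting tuple can be written out with ``csv.writer`` to obtain a line in the format shown above.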
Adding an INSTALLER file in the .egg-info directory
===================================================
INSTALLER
---------
The `install` command has a new option called `installer`. This option
is the name of the tool used to invoke the installation. It's a normalized
@ -337,11 +284,12 @@ lower-case string matching `[a-z0-9_\-\.]`.
It defaults to `distutils` if not provided.
When a distribution is installed, the INSTALLER file is generated in the
`.egg-info` directory with this value, to keep track of **who** installed the
`.dist-info` directory with this value, to keep track of **who** installed the
distribution. The file is a single-line text file.
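The following Python 3 sketch shows one way a tool could validate the installer name against the pattern given above and write the single-line file. ``write_installer`` and ``read_installer`` are hypothetical helpers, not functions defined by this PEP:

```python
import os
import re
import tempfile

# Character class taken from the text above; one or more characters assumed.
_INSTALLER_NAME = re.compile(r'[a-z0-9_\-\.]+')


def write_installer(distinfo_dir, installer='distutils'):
    """Write the INSTALLER file and return its path."""
    if not _INSTALLER_NAME.fullmatch(installer):
        raise ValueError('invalid installer name: %r' % installer)
    path = os.path.join(distinfo_dir, 'INSTALLER')
    with open(path, 'w') as f:
        f.write(installer + '\n')
    return path


def read_installer(distinfo_dir):
    """Return the installer name recorded in the INSTALLER file."""
    with open(os.path.join(distinfo_dir, 'INSTALLER')) as f:
        return f.readline().strip()


with tempfile.TemporaryDirectory() as tmp:
    write_installer(tmp, 'pip')
    recorded = read_installer(tmp)
```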
Adding a REQUESTED file in the .egg-info directory
==================================================
REQUESTED
---------
Some install tools automatically detect unfulfilled dependencies and
install them. In these cases, it is useful to track which
@ -350,7 +298,7 @@ dependent distribution is later uninstalled, the user can be alerted
to the orphaned dependency.
If a distribution is installed by direct user request (the usual
case), a file REQUESTED is added to the .egg-info directory of the
case), a file REQUESTED is added to the .dist-info directory of the
installed distribution. The REQUESTED file may be empty, or may
contain a marker comment line beginning with the "#" character.
@ -364,31 +312,48 @@ specify whether the file is created.
If a package that was already installed on the system as a dependency
is later installed by name, the distutils ``install`` command will
create the REQUESTED file in the .egg-info directory of the existing
create the REQUESTED file in the .dist-info directory of the existing
installation.
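Since the marker only needs to exist, handling it can be very small. A Python 3 sketch, with the hypothetical helpers ``mark_requested`` and ``is_requested``:

```python
import os
import tempfile


def mark_requested(distinfo_dir):
    """Create an empty REQUESTED file, keeping any existing one."""
    path = os.path.join(distinfo_dir, 'REQUESTED')
    with open(path, 'a'):
        pass  # opening in append mode creates the file if needed
    return path


def is_requested(distinfo_dir):
    """Tell whether the distribution was installed by direct request."""
    return os.path.isfile(os.path.join(distinfo_dir, 'REQUESTED'))


with tempfile.TemporaryDirectory() as tmp:
    before = is_requested(tmp)
    mark_requested(tmp)
    after = is_requested(tmp)
```

Opening the file in append mode means that marking an already requested distribution again leaves an existing comment line untouched.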
New APIs in pkgutil
===================
To use the `.egg-info` directory content, we need to add in the standard
Implementation details
======================
New functions and classes in pkgutil
------------------------------------
To use the `.dist-info` directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is `pkgutil`.
Query functions
---------------
Functions
~~~~~~~~~
The new functions added in the ``pkgutil`` are :
The new functions added in the ``pkgutil`` module are :
- ``distinfo_dirname(name, version)`` -> directory name
``name`` is converted to a standard distribution name by replacing any
runs of non-alphanumeric characters with a single '-'.
``version`` is converted to a standard version string. Spaces become
dots, and all other non-alphanumeric characters (except dots) become
dashes, with runs of multiple dashes condensed to a single dash.
Both attributes are then converted into their filename-escaped form,
i.e. any '-' characters are replaced with '_' other than the one in
'dist-info' and the one separating the name from the version number.
- ``get_distributions()`` -> iterator of ``Distribution`` instances.
Provides an iterator that looks for ``.egg-info`` directories in
Provides an iterator that looks for ``.dist-info`` directories in
``sys.path`` and returns ``Distribution`` instances for
each one of them.
- ``get_distribution(name)`` -> ``Distribution`` or None.
Scans all elements in ``sys.path`` and looks for all directories ending with
``.egg-info``. Returns a ``Distribution`` corresponding to the
``.egg-info`` directory that contains a PKG-INFO that matches `name`
``.dist-info``. Returns a ``Distribution`` corresponding to the
``.dist-info`` directory that contains a METADATA that matches `name`
for the `name` metadata.
This function only returns the first result found, as no more than one
@ -400,11 +365,11 @@ The new functions added in the ``pkgutil`` are :
``path`` can be a local absolute path or a relative '/'-separated path.
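The normalization rules given for ``distinfo_dirname`` can be sketched in a few lines of Python 3. This is one reading of the rules stated above, not the reference implementation:

```python
import re


def distinfo_dirname(name, version):
    """Build the .dist-info directory name for a distribution."""
    # Runs of non-alphanumeric characters in the name become one '-'.
    name = re.sub(r'[^A-Za-z0-9]+', '-', name)
    # Spaces in the version become dots; runs of any other
    # non-alphanumeric characters (except dots) become one '-'.
    version = version.replace(' ', '.')
    version = re.sub(r'[^A-Za-z0-9.]+', '-', version)
    # Filename-escaped form: remaining '-' characters become '_'.
    name = name.replace('-', '_')
    version = version.replace('-', '_')
    return '%s-%s.dist-info' % (name, version)
```

With these rules, ``distinfo_dirname('python-ldap', '2.5 a---5')`` gives ``'python_ldap-2.5.a_5.dist-info'``.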
Distribution class
------------------
~~~~~~~~~~~~~~~~~~
A new class called ``Distribution`` is created with the path of the
`.egg-info` directory provided to the constructor. It reads the metadata
contained in `PKG-INFO` when it is instanciated.
`.dist-info` directory provided to the constructor. It reads the metadata
contained in `METADATA` when it is instantiated.
``Distribution(path)`` -> instance
@ -415,7 +380,7 @@ contained in `PKG-INFO` when it is instanciated.
- ``name``: The name of the distribution.
- ``metadata``: A ``DistributionMetadata`` instance loaded with the
distribution's PKG-INFO file.
distribution's METADATA file.
- ``requested``: A boolean that indicates whether the REQUESTED
metadata file is present (in other words, whether the package was
@ -437,25 +402,25 @@ And following methods:
Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
can be a local absolute path or a relative '/'-separated path.
- ``get_egginfo_file(path, binary=False)`` -> file object
- ``get_distinfo_file(path, binary=False)`` -> file object
Returns a file located under the `.egg-info` directory.
Returns a file located under the `.dist-info` directory.
Returns a ``file`` instance for the file pointed to by ``path``.
``path`` has to be a '/'-separated path relative to the `.egg-info`
``path`` has to be a '/'-separated path relative to the `.dist-info`
directory or an absolute path.
If ``path`` is an absolute path and doesn't start with the `.egg-info`
If ``path`` is an absolute path and doesn't start with the `.dist-info`
directory path, a ``DistutilsError`` is raised.
If ``binary`` is ``True``, opens the file in read-only binary mode (`rb`),
otherwise opens it in read-only mode (`r`).
- ``get_egginfo_files(local=False)`` -> iterator of paths
- ``get_distinfo_files(local=False)`` -> iterator of paths
Iterates over the `RECORD` entries and returns paths for each line if the path
is pointing to a file located in the `.egg-info` directory or one of its
is pointing to a file located in the `.dist-info` directory or one of its
subdirectories.
If ``local`` is ``True``, each path is transformed into a
@ -467,27 +432,36 @@ and Zip files (so it works with files included in Zip files, see PEP 273 for
more details [#pep273]_). These classes are described in the documentation
of the prototype implementation for interested readers [#prototype]_.
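To give an idea of what the query functions do on a plain filesystem, here is a rough Python 3 sketch that scans a list of directories for `.dist-info` directories and matches on the `Name` field of `METADATA`. The helper names are hypothetical, the metadata is assumed to use RFC 822 style headers, and Zip files are not handled:

```python
import email.parser
import os
import tempfile


def iter_distinfo_dirs(paths):
    """Yield every .dist-info directory found directly under paths."""
    for entry in paths:
        if not os.path.isdir(entry):
            continue
        for item in sorted(os.listdir(entry)):
            full = os.path.join(entry, item)
            if item.endswith('.dist-info') and os.path.isdir(full):
                yield full


def read_metadata(distinfo_dir):
    """Parse the METADATA file as a set of RFC 822 style headers."""
    with open(os.path.join(distinfo_dir, 'METADATA')) as f:
        return email.parser.Parser().parse(f)


def find_distribution(name, paths):
    """Return the first .dist-info directory whose Name matches."""
    for distinfo_dir in iter_distinfo_dirs(paths):
        if read_metadata(distinfo_dir)['Name'] == name:
            return distinfo_dir
    return None


with tempfile.TemporaryDirectory() as tmp:
    distinfo = os.path.join(tmp, 'docutils-0.5.dist-info')
    os.mkdir(distinfo)
    with open(os.path.join(distinfo, 'METADATA'), 'w') as f:
        f.write('Metadata-Version: 1.2\nName: docutils\nVersion: 0.5\n')
    found = find_distribution('docutils', [tmp])
    version = read_metadata(found)['Version']
```

Passing ``sys.path`` as ``paths`` would mimic the lookup order described for ``get_distribution``.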
Usage example
-------------
Examples
~~~~~~~~
Let's use some of the new APIs with our `docutils` example::
>>> from pkgutil import get_distribution, get_file_users
>>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
>>> dist = get_distribution('docutils')
>>> dist.name
'docutils'
>>> dist.metadata.version
'0.5'
>>> distinfo_dirname('docutils', '0.5')
'docutils-0.5.dist-info'
>>> distinfo_dirname('python-ldap', '2.5')
'python_ldap-2.5.dist-info'
>>> distinfo_dirname('python-ldap', '2.5 a---5')
'python_ldap-2.5.a_5.dist-info'
>>> for path, hash, size in dist.get_installed_files():
... print '%s %s %d' % (path, hash, size)
...
docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
roman.py a4b84aff68aa55f2e9bf70481b943D3 234
/usr/local/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
docutils-0.5.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
docutils-0.5.egg-info/RECORD None None
/usr/lib/python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
/usr/lib/python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
/usr/lib/python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
/usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
/usr/lib/python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
/usr/lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD
>>> dist.uses('docutils/core.py')
True
@ -495,34 +469,15 @@ Let's use some of the new APIs with our `docutils` example::
>>> dist.uses('/usr/local/bin/rst2html.py')
True
>>> dist.get_egginfo_file('PKG-INFO')
>>> dist.get_distinfo_file('METADATA')
<open file at ...>
>>> dist.requested
True
PEP 262 replacement
===================
In the past an attempt was made to create a installation database (see PEP 262
[#pep262]_).
Extract from PEP 262 Requirements:
" We need a way to figure out what distributions, and what versions of
those distributions, are installed on a system..."
Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 replaces PEP 262 and becomes the official
`installation database` standard.
The new version of PEP 345 (XXX work in progress) extends the Metadata
standard and fullfills the requirements described in PEP 262, like the
`REQUIRES` section.
Adding an Uninstall function
============================
New functions in Distutils
--------------------------
Distutils already provides a very basic way to install a distribution, which
is running the `install` command over the `setup.py` script of the
@ -545,7 +500,7 @@ directories left behind.
If the distribution is not found, a ``DistutilsUninstallError`` is raised.
Filtering
---------
~~~~~~~~~
To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It's
@ -570,10 +525,10 @@ Of course, a third-party tool can use ``pkgutil`` APIs to implement
its own uninstall feature.
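A third-party tool could, for example, drive the removal directly from the `RECORD` file. The Python 3 sketch below is only an illustration: ``uninstall_from_record`` is a hypothetical function, and the filter callable is assumed to take a path and return ``True`` when the file may be removed:

```python
import csv
import os
import tempfile


def uninstall_from_record(record_path, file_filter=None):
    """Remove the files listed in a RECORD file and return their paths."""
    # Read every path first so RECORD is closed before it gets removed.
    with open(record_path, newline='') as f:
        paths = [row[0] for row in csv.reader(f) if row]
    removed = []
    for path in paths:
        if file_filter is not None and not file_filter(path):
            continue  # the caller asked to keep this file
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    module = os.path.join(tmp, 'roman.py')
    config = os.path.join(tmp, 'config.ini')
    record = os.path.join(tmp, 'RECORD')
    for path in (module, config):
        with open(path, 'w') as f:
            f.write('')
    with open(record, 'w', newline='') as f:
        csv.writer(f).writerows(
            [[module, '', ''], [config, '', ''], [record]])
    # Keep configuration files, remove everything else.
    removed = uninstall_from_record(
        record, file_filter=lambda p: not p.endswith('.ini'))
    config_kept = os.path.exists(config)
```

This sketch leaves empty directories in place and does not check the `INSTALLER` file; a real tool would need to handle both.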
Installer marker
----------------
~~~~~~~~~~~~~~~~
As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.egg-info` directory with the name of the installer.
file in the `.dist-info` directory with the name of the installer.
To avoid removing distributions that were installed by another packaging system,
the ``uninstall`` function takes an extra argument ``installer`` which defaults
@ -596,7 +551,7 @@ on Distutils APIs does extra steps on the system at installation time,
it has to undo at uninstallation time.
Adding an Uninstall script
==========================
~~~~~~~~~~~~~~~~~~~~~~~~~~
An `uninstall` script is added in Distutils and is used like this::
@ -613,12 +568,17 @@ provide more advanced dependency management.
Backward compatibility and roadmap
==================================
These changes don't introduce any compatibility problems with the previous
version of Distutils, and will also work with existing third-party tools.
These changes don't introduce any compatibility problems since they
will be implemented in:
The plan is to include the functionality outlined in this PEP in distutils for
Python 2.7 and Python 3.2. A backport of the new distutils for 2.5, 2.6, 3.0
and 3.1 is provided so people can benefit from these new features.
- pkgutil in new functions
- distutils2
The plan is to include the functionality outlined in this PEP in pkgutil for
Python 2.7 and Python 3.2, and in Distutils2.
Distutils2 will also contain a backport of the new pkgutil, and can be used for
2.4 onward.
Distributions installed using existing, pre-standardization formats do not have
the necessary metadata available for the new API, and thus will be