PEP: 376
Title: Changing the .egg-info structure
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:
Abstract
========
This PEP proposes various enhancements for Distutils:

- A new format for the .egg-info structure.
- Some APIs to read the metadata of a project.
- A replacement for PEP 262.
- An uninstall feature.
Definitions
===========
A **project** is a distribution of one or several files, which can be Python
modules, extensions or data. It is distributed using a `setup.py` script
with Distutils and/or Setuptools. The `setup.py` script indicates where each
element should be installed.

Once installed, the elements are located in various places in the system, like:

- in Python's site-packages (Python modules, Python modules organized into
  packages, extensions, etc.)
- in Python's `include` directory.
- in Python's `bin` or `Scripts` directory.
- etc.
Rationale
=========
There are two problems right now in the way projects are installed in Python:

- There are too many ways to do it.
- There is no API to get the metadata of installed projects.
How projects are installed
--------------------------
Right now, when a project is installed in Python, every element it contains
is installed in various directories.

The pure Python code, for instance, is installed in the `purelib` directory,
which is located in the Python installation in `lib/python2.6/site-packages`
for example under unix-like systems or Mac OS X, and in `Lib\site-packages`
under Windows. This is done with the Distutils `install` command, which calls
various subcommands.

The `install_egg_info` subcommand is called during this process, in order to
create an `.egg-info` file in the `purelib` directory.

For example, if the `zlib` project (which contains one package) is installed,
two elements will be installed in `site-packages`::

    - zlib
    - zlib-2.5.2-py2.4.egg-info

Where `zlib` is a Python package, and `zlib-2.5.2-py2.4.egg-info` is
a file containing the project metadata as described in PEP 314 [#pep314]_.
This file corresponds to the file called `PKG-INFO`, built by
the `sdist` command.

The problem is that many people use `easy_install` (setuptools [#setuptools]_)
or `pip` [#pip]_ to install their packages, and these third-party tools do not
install packages in the same way that Distutils does:

- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory,
  and adds a `PKG-INFO` file inside this directory. The `.egg` directory
  contains in that case all the elements of the project that are supposed to
  be installed in `site-packages`, and is placed in the `site-packages`
  directory.

- `pip` creates an `.egg-info` directory inside the `site-packages` directory
  and adds a `PKG-INFO` file inside it. Elements of the project are then
  installed in various places like Distutils does.

They both add other files in the `EGG-INFO` or `.egg-info` directory, and
create or modify `.pth` files.
Uninstall information
---------------------
Distutils doesn't provide any `uninstall` command. If you want to uninstall
a project, you have to be a power user and remove the various elements that
were installed, then look over the `.pth` files to clean them if necessary.

The process also differs depending on the tools you used to install the
project, and on whether the project's `setup.py` uses Distutils or Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another project by
removing a file that was shared among several projects.

But there is a common behavior: when you install a project, files are copied
onto your system. It is therefore possible to keep track of these files, so
that they can be removed later.
What this PEP proposes
----------------------
To address those issues, this PEP proposes a few changes:

- a new `.egg-info` structure using a directory, based on one form of
  the `EggFormats` standard from `setuptools` [#eggformats]_.
- new APIs in `pkgutil` to be able to query the information of installed
  projects.
- a de-facto replacement for PEP 262.
- an uninstall function in Distutils.
.egg-info becomes a directory
=============================
The first change would be to make `.egg-info` a directory and let it
hold the `PKG-INFO` file built by the `write_pkg_file` method of
the `Distribution` class in Distutils.

Notice that this change is based on the standard proposed by `EggFormats`.
This standard, however, proposes two ways to install files:

- a self-contained directory that can be zipped or left unzipped and that
  contains the project files *and* the `.egg-info` directory.

- a distinct `.egg-info` directory located in the site-packages directory.

You may refer to the `EggFormats` documentation for more details.

This change will not impact Python itself, because `egg-info` files are not
used anywhere yet in the standard library besides Distutils.

It will impact the `setuptools` and `pip` projects, but since they already
work with a directory that contains a `PKG-INFO` file, the change will have
no deep consequences.

For example, if the `zlib` package is installed, the elements that
will be installed in `site-packages` will become::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO

The syntax of the egg-info directory name is as follows::

    name + '-' + version + '.egg-info'

The egg-info directory name is created using a new function called
``egginfo_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name: any runs of non-alphanumeric
characters are replaced with a single '-'. ``version`` is converted
to a standard version string: spaces become dots, and all other
non-alphanumeric characters become dashes, with runs of multiple dashes
condensed to a single dash. Both attributes are then converted into their
filename-escaped form: any '-' characters are currently replaced with '_'.
Examples::

    >>> egginfo_dirname('zlib', '2.5.2')
    'zlib-2.5.2.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.egg-info'
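
The naming rules can be sketched in a few lines of Python. This is only an
illustration, not the final ``pkgutil`` implementation. It assumes that dots
are preserved next to alphanumeric characters (the examples require this for
versions), in the same way the setuptools helpers behave.

```python
import re


def egginfo_dirname(name, version):
    """Build an .egg-info directory name from a project name and version."""
    # Assumption: '.' is kept along with letters and digits; any other run
    # of characters collapses into a single '-'.
    safe_name = re.sub(r'[^A-Za-z0-9.]+', '-', name)
    # Spaces become dots first, then remaining runs collapse into '-'.
    safe_version = re.sub(r'[^A-Za-z0-9.]+', '-', version.replace(' ', '.'))
    # Filename-escaped form: '-' is replaced with '_' so that the only '-'
    # left in the result is the separator between name and version.
    return '%s-%s.egg-info' % (safe_name.replace('-', '_'),
                               safe_version.replace('-', '_'))
```

With these rules the separator between name and version is unambiguous, which
makes it possible to split a directory name back into its two parts.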
Adding a RECORD file in the .egg-info directory
===============================================
A `RECORD` file will be added inside the `.egg-info` directory at installation
time. The `RECORD` file will hold the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and will
be generated by default. This will allow uninstallation, as explained later in this
PEP. The `install` command will also provide an option to prevent the `RECORD`
file from being written; this option should be used when creating system
packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired by PEP 262 FILES [#pep262]_.
The RECORD format
-----------------
The `RECORD` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
the `excel` dialect, which uses these options:

- field delimiter : `,`
- quoting char : `"`
- line terminator : `\r\n`

Each record is composed of three elements:

- the file's full **path**

  - if the installed file is located in the directory where the .egg-info
    directory of the package is located, it will be a '/'-separated relative
    path, no matter what the target system is. This makes this information
    cross-compatible and allows simple installations to be relocatable.

  - if the installed file is located elsewhere in the system, a
    '/'-separated absolute path is used.

- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
  generated files will not have a hash.

- the file's size in bytes

The ``csv`` module with its default options will be used to generate this file,
so the field separator will be ",". Any "," characters found within a field
will be escaped automatically by ``csv``.
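
As an illustration, a `RECORD` file could be produced along the following
lines. The ``write_record`` function and its parameters are hypothetical names
chosen for this sketch. It only handles files located next to the `.egg-info`
directory, leaves out the `pyc`/`pyo` case, and lists `RECORD` itself without
a hash or a size.

```python
import csv
import hashlib
import os


def write_record(base_dir, relative_paths, record_rel_path):
    """Write a RECORD file for files installed under ``base_dir``.

    ``relative_paths`` and ``record_rel_path`` are '/'-separated paths
    relative to ``base_dir`` (hypothetical parameters for this sketch).
    """
    record_local = os.path.join(base_dir, *record_rel_path.split('/'))
    # newline='' lets the csv module write its own '\r\n' line terminator.
    with open(record_local, 'w', newline='') as stream:
        writer = csv.writer(stream, dialect='excel')
        for rel_path in relative_paths:
            local_path = os.path.join(base_dir, *rel_path.split('/'))
            with open(local_path, 'rb') as installed:
                data = installed.read()
            # One record per file: path, hex MD5 digest, size in bytes.
            writer.writerow([rel_path, hashlib.md5(data).hexdigest(),
                             len(data)])
        # RECORD cannot contain its own hash, so only its path is listed.
        writer.writerow([record_rel_path])
```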
Example
-------
Back to our `zlib` example, we will have::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO
        RECORD

And the RECORD file will contain::

    zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544
    zlib/include/zlib.h,9c4b84aff68aa55f2e9bf70481b94333,66188
    zlib/lib/libz.a,e6d43fb94292411909404b07d0692d46,91128
    zlib/share/man/man3/zlib.3,785dc03452f0508ff0678fba2457e0ba,4486
    zlib-2.5.2.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
    zlib-2.5.2.egg-info/RECORD

Notice that:

- the `RECORD` file can't contain a hash of itself and is just mentioned here
- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file
  paths are relative to it.
Adding an INSTALLER file in the .egg-info directory
===================================================
The `install` command will have a new option called `installer`. This option
is the name of the tool used to invoke the installation. It is a normalized
lower-case string matching `[a-z0-9_\-\.]`. For example::

    $ python setup.py install --installer=pkg-system

It will default to `distutils` if not provided.

When a project is installed, the INSTALLER file is generated in the
`.egg-info` directory with this value, to keep track of **who** installed the
project. The file is a single-line text file.
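
A possible way to validate the value and write the file is sketched below.
The ``write_installer`` helper is a hypothetical name, and the sketch assumes
that the character class given above applies to every character of a
non-empty string.

```python
import os
import re

# Assumption: the whole name must be made of one or more characters taken
# from the class given in the text.
_INSTALLER_NAME = re.compile(r'[a-z0-9_\-.]+')


def write_installer(egginfo_dir, installer='distutils'):
    """Write the single-line INSTALLER file into ``egginfo_dir``."""
    if _INSTALLER_NAME.fullmatch(installer) is None:
        raise ValueError('invalid installer name: %r' % installer)
    with open(os.path.join(egginfo_dir, 'INSTALLER'), 'w') as stream:
        stream.write(installer + '\n')
```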
New APIs in pkgutil
===================
To use the `.egg-info` directory content, we need to add a set of APIs to the
standard library. The best place to put these APIs seems to be `pkgutil`.

The API is organized in three classes:

- ``Distribution``: manages an `.egg-info` directory.
- ``DistributionDirectory``: manages a directory that contains some `.egg-info`
  directories.
- ``DistributionDirectories``: manages ``DistributionDirectory`` instances.
Distribution class
------------------
A new class called ``Distribution`` is created with the path of the
`.egg-info` directory provided to the constructor. It reads the metadata
contained in `PKG-INFO` when it is instantiated.

``Distribution`` provides the following attributes:

- ``name``: The name of the distribution.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the
  distribution's PKG-INFO file.

And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)

  Iterates over the `RECORD` entries and returns a tuple ``(path, md5, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``get_egginfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``, which
  must be located under the `.egg-info` directory.

  ``path`` has to be a '/'-separated path relative to the `.egg-info`
  directory or an absolute path.

  If ``path`` is an absolute path and doesn't start with the `.egg-info`
  directory path, a ``DistutilsError`` is raised.

  If ``binary`` is ``True``, opens the file in binary mode.

- ``get_egginfo_files(local=False)`` -> iterator of paths

  Iterates over the `RECORD` entries and returns the path of each line that
  points to a file located in the `.egg-info` directory or one of its
  subdirectories.

  If ``local`` is ``True``, each path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.
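
To make the behaviour of ``get_installed_files`` and ``uses`` more concrete,
here is a minimal sketch. ``SketchDistribution`` is a hypothetical stand-in,
not the proposed class: it ignores `PKG-INFO`, and it assumes that records
without a hash or size (such as `RECORD` itself) are reported with ``None``
values.

```python
import csv
import os


class SketchDistribution:
    """Hypothetical stand-in for the RECORD-related part of Distribution."""

    def __init__(self, egginfo_path):
        self.egginfo_path = os.path.abspath(egginfo_path)
        # Relative RECORD paths are relative to the directory that
        # contains the .egg-info directory.
        self.base = os.path.dirname(self.egginfo_path)

    def _to_local(self, record_path):
        # Turn a '/'-separated RECORD path into a local absolute path.
        if os.path.isabs(record_path):
            return os.path.normpath(record_path)
        return os.path.join(self.base, *record_path.split('/'))

    def get_installed_files(self, local=False):
        record = os.path.join(self.egginfo_path, 'RECORD')
        with open(record, newline='') as stream:
            for row in csv.reader(stream):
                if not row:
                    continue
                # Pad short rows so missing hash and size become None.
                path, md5, size = (row + [None, None])[:3]
                if local:
                    path = self._to_local(path)
                yield path, md5 or None, int(size) if size else None

    def uses(self, path):
        # Accept both a local absolute path and a relative RECORD path.
        target = self._to_local(path)
        return any(target == recorded
                   for recorded, _, _ in self.get_installed_files(local=True))
```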
DistributionDirectory class
---------------------------
A new class called ``DistributionDirectory`` is created with a path
corresponding to a directory. For each `.egg-info` directory found in
`path`, the class creates a corresponding ``Distribution``.

The class is a ``set`` of ``Distribution`` instances. ``DistributionDirectory``
provides a ``path`` attribute corresponding to the path it was created with.

It also provides two methods besides the ones from ``set``:

- ``file_users(path)`` -> Iterator of ``Distribution``.

  Returns all ``Distribution`` instances which use ``path``, by calling
  ``Distribution.uses(path)`` on all ``Distribution`` instances.

- ``owner(path)`` -> ``Distribution`` instance or None

  If ``path`` is used by only one ``Distribution`` instance, returns it.
  Otherwise returns None.
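
The relation between ``file_users`` and ``owner`` can be sketched as follows.
Both classes are hypothetical stand-ins written for this illustration:
``FakeDistribution`` only provides ``uses``, and the directory scanning done
by the real constructor is left out.

```python
class FakeDistribution:
    """Hypothetical stand-in exposing only the ``uses`` method."""

    def __init__(self, name, files):
        self.name = name
        self._files = set(files)

    def uses(self, path):
        return path in self._files


class SketchDistributionDirectory(set):
    """A set of distribution objects, without the directory scanning."""

    def file_users(self, path):
        # Delegate to uses() on every distribution in the set.
        for dist in self:
            if dist.uses(path):
                yield dist

    def owner(self, path):
        # A file has an owner only when exactly one distribution uses it.
        users = list(self.file_users(path))
        return users[0] if len(users) == 1 else None
```

A file shared by several distributions has no owner, which is the information
an uninstaller needs before deleting anything.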
DistributionDirectories class
-----------------------------
A new class called ``DistributionDirectories`` is created. It's a collection of
``DistributionDirectory`` instances. The constructor takes one optional
argument ``use_cache`` set to ``True`` by default. When ``True``,
``DistributionDirectories`` will use a global cache to reduce the number of
I/O accesses and speed up the lookups.

The cache is a global mapping containing ``DistributionDirectory`` instances.
When a ``DistributionDirectories`` object is created, it will use the cache to
add an entry for each path it visits, or reuse existing entries. The cache
usage can be disabled at any time with the ``use_cache`` attribute.

The cache can also be emptied with the global ``purge_cache`` function.

The class is a ``dict`` where the values are ``DistributionDirectory``
instances and the keys are their path attributes.

``DistributionDirectories`` also provides the following methods besides the
ones from ``dict``:

- ``append(path)``

  Creates a ``DistributionDirectory`` instance for ``path`` and adds it
  in the mapping.

- ``load(paths)``

  Creates and adds ``DistributionDirectory`` instances corresponding to
  ``paths``.

- ``reload()``

  Reloads existing entries.

- ``get_distributions()`` -> Iterator of ``Distribution`` instances.

  Iterates over all ``Distribution`` contained in ``DistributionDirectory``
  instances.

- ``get_distribution(project_name)`` -> ``Distribution`` or None.

  Returns a ``Distribution`` instance for the given project name.
  If not found, returns None.

- ``get_file_users(path)`` -> Iterator of ``Distribution`` instances.

  Iterates over all projects to find out which project uses the file.
  Returns ``Distribution`` instances.
.egg-info functions
-------------------
The new functions added in the ``pkgutil`` module are:

- ``get_distributions()`` -> iterator of ``Distribution`` instances.

  Provides an iterator that looks for ``.egg-info`` directories in ``sys.path``
  and returns ``Distribution`` instances for each one of them.

- ``get_distribution(name)`` -> ``Distribution`` or None.

  Scans all elements in ``sys.path`` and looks for all directories ending with
  ``.egg-info``. Returns a ``Distribution`` corresponding to the ``.egg-info``
  directory that contains a PKG-INFO that matches `name` for the `name`
  metadata.

  Notice that there should be at most one result. The first result found
  will be returned. If the directory is not found, returns None.

- ``get_file_users(path)`` -> iterator of ``Distribution`` instances.

  Iterates over all projects to find out which project uses ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.

All these functions use the same global instance of ``DistributionDirectories``
to use the cache. Notice that the cache is never emptied explicitly.
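
The lookup performed by ``get_distribution`` can be illustrated with a small
function that scans a list of paths. ``find_egginfo_dir`` is a hypothetical
helper for this sketch: it returns the matching directory path instead of a
``Distribution`` instance, does not use the cache, and parses `PKG-INFO` with
the ``email`` package since the file uses RFC 822 style headers.

```python
import os
import sys
from email.parser import Parser


def find_egginfo_dir(name, paths=None):
    """Return the first .egg-info directory whose PKG-INFO Name matches."""
    for entry in (sys.path if paths is None else paths):
        if not os.path.isdir(entry):
            continue
        for candidate in sorted(os.listdir(entry)):
            if not candidate.endswith('.egg-info'):
                continue
            pkg_info = os.path.join(entry, candidate, 'PKG-INFO')
            # Old-style .egg-info *files* have no PKG-INFO inside: skip them.
            if not os.path.isfile(pkg_info):
                continue
            with open(pkg_info, encoding='utf-8') as stream:
                metadata = Parser().parse(stream, headersonly=True)
            if metadata['Name'] == name:
                return os.path.join(entry, candidate)
    return None
```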
Example
-------
Let's use some of the new APIs with our `zlib` example::

    >>> from pkgutil import get_distribution, get_file_users
    >>> dist = get_distribution('zlib')
    >>> dist.name
    'zlib'
    >>> dist.metadata.version
    '2.5.2'

    >>> for path, hash, size in dist.get_installed_files():
    ...     print('%s %s %s' % (path, hash, size))
    ...
    zlib/include/zconf.h b690274f621402dda63bf11ba5373bf2 9544
    zlib/include/zlib.h 9c4b84aff68aa55f2e9bf70481b94333 66188
    zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128
    zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486
    zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
    zlib-2.5.2.egg-info/RECORD None None

    >>> dist.uses('zlib/include/zlib.h')
    True

    >>> dist.get_egginfo_file('PKG-INFO')
    <open file at ...>
PEP 262 replacement
===================
In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).

Extract from PEP 262 Requirements:

    "We need a way to figure out what distributions, and what versions of
    those distributions, are installed on a system..."

Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 will replace PEP 262 and will become the official
`installation database` standard.

The new version of PEP 345 (XXX work in progress) will extend the Metadata
standard and will fulfill the requirements described in PEP 262, like the
`REQUIRES` section.
Adding an Uninstall function
============================
Distutils already provides a very basic way to install a project, which is running
the `install` command over the `setup.py` script of the distribution.

Distutils will provide a very basic ``uninstall`` function, that will be added
in ``distutils.util`` and will take the name of the project to uninstall as
its argument. ``uninstall`` will use the APIs described earlier and remove all
unique files, as long as their hash didn't change. Then it will remove
empty directories left behind.

``uninstall`` will return a list of uninstalled files::

    >>> from distutils.util import uninstall
    >>> uninstall('zlib')
    ['/opt/local/lib/python2.6/site-packages/zlib/file1',
     '/opt/local/lib/python2.6/site-packages/zlib/file2']

If the project is not found, a ``DistutilsUninstallError`` will be raised.
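
The removal step could be sketched as below. ``remove_recorded_files`` and its
parameters are hypothetical names for this illustration. It assumes that
entries without a hash are removed without a check, that files shared with
other projects are passed in explicitly through ``shared``, and that only
directories located under ``base_dir`` are cleaned up.

```python
import hashlib
import os


def remove_recorded_files(base_dir, entries, shared=frozenset()):
    """Remove unmodified recorded files, then the directories left empty.

    ``entries`` holds (relative '/'-separated path, md5 or None, size)
    tuples; ``shared`` holds relative paths used by other projects.
    """
    base_dir = os.path.abspath(base_dir)
    removed = []
    parents = set()
    for rel_path, md5, _size in entries:
        local = os.path.join(base_dir, *rel_path.split('/'))
        if rel_path in shared or not os.path.isfile(local):
            continue
        if md5 is not None:
            with open(local, 'rb') as stream:
                if hashlib.md5(stream.read()).hexdigest() != md5:
                    continue  # modified since installation: keep it
        os.remove(local)
        removed.append(local)
        parents.add(os.path.dirname(local))
    # Remove emptied directories, deepest first, never base_dir itself.
    for directory in sorted(parents, key=len, reverse=True):
        while (directory.startswith(base_dir + os.sep)
               and os.path.isdir(directory)
               and not os.listdir(directory)):
            os.rmdir(directory)
            directory = os.path.dirname(directory)
    return removed
```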
Filtering
---------
To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It will be
called for each file that is about to be removed. If the callable returns
`True`, the file will be removed. If it returns `False`, it will be left alone.

Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('zlib', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('zlib', _dry_run)

Of course, a third-party tool can use ``pkgutil`` APIs to implement
its own uninstall feature.
Installer marker
----------------
As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.egg-info` directory with the name of the installer.

To avoid removing projects that were installed by another packaging system,
the ``uninstall`` function takes an extra argument ``installer`` which defaults
to ``distutils``.

When called, ``uninstall`` will check that the ``INSTALLER`` file matches
this argument. If not, it will raise a ``DistutilsUninstallError``::

    >>> uninstall('zlib')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: zlib was installed by 'cool-pkg-manager'

    >>> uninstall('zlib', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function
and make sure it's the only program that can remove a project it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs performs extra steps on the system at installation time
that it has to undo at uninstallation time.
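
The check itself is small. In the sketch below, ``UninstallError`` is a
hypothetical stand-in for ``DistutilsUninstallError`` and ``check_installer``
is a hypothetical helper that ``uninstall`` could call before removing
anything.

```python
import os


class UninstallError(Exception):
    """Hypothetical stand-in for DistutilsUninstallError."""


def check_installer(egginfo_dir, project, installer='distutils'):
    """Raise if the project was installed by a different tool."""
    with open(os.path.join(egginfo_dir, 'INSTALLER')) as stream:
        # INSTALLER is a single-line text file.
        recorded = stream.readline().strip()
    if recorded != installer:
        raise UninstallError('%s was installed by %r' % (project, recorded))
```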
Backward compatibility and roadmap
==================================
These changes will not introduce any compatibility problems with the previous
version of Distutils, and will also work with existing third-party tools.

In addition, a backport of the new Distutils for 2.5, 2.6, 3.0 and 3.1 will be
provided so people can benefit from these new features.

The plan is to integrate them for Python 2.7 and Python 3.2.
References
==========
.. [#pep262]
   http://www.python.org/dev/peps/pep-0262

.. [#pep314]
   http://www.python.org/dev/peps/pep-0314

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats
Acknowledgments
===============
Jim Fulton, Ian Bicking, Phillip Eby, and many people at Pycon and Distutils-SIG.
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: