PEP: 376
Title: Changing the .egg-info structure
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:

Abstract
========
This PEP proposes various enhancements for Distutils:

- A new format for the .egg-info structure.
- Some APIs to read the meta-data of a distribution.
- A replacement for PEP 262.
- An uninstall feature.
Definitions
===========
A **distribution** is a collection of files, which can be Python modules,
extensions, or data. A distribution is managed by a special module called
`setup.py` which contains a call to the `distutils.core.setup` function.
The arguments passed to that function describe the distribution, like
its `name`, its `version`, and so on.
Distutils provides, among other things, **commands** that can be called
through the shell using the `setup.py` script. An `sdist` command is provided,
for instance, to create a source distribution archive. An `install` command
is also provided to install the distribution into the Python installation
the script is invoked with::

    $ python setup.py install

See the Distutils [#distutils]_ documentation for more information.
Once installed, the elements are located in various places in the system, like:

- In Python's site-packages (Python modules, Python modules organized into
  packages, extensions, etc.)
- In Python's `include` directory.
- In Python's `bin` or `Scripts` directory.
- Etc.
Rationale
=========
There are two problems right now in the way distributions are installed in
Python:
- There are too many ways to do it.
- There is no API to get the metadata of installed distributions.
How distributions are installed
-------------------------------
Right now, when a distribution is installed in Python, the elements it
contains are installed in various directories.
The pure Python code, for instance, is installed in the `purelib` directory,
which is located in the Python installation at, for example,
``lib/python2.6/site-packages`` under Unix-like systems and Mac OS X, and at
``Lib\site-packages`` under Windows. This is done with the Distutils `install`
command, which calls various subcommands.
The `install_egg_info` subcommand is called during this process in order to
create an `.egg-info` file in the `purelib` directory.
For example, for the `docutils` distribution, which contains one package, an
extra module and executable scripts, three elements will be installed in
`site-packages`:

- `docutils`: The ``docutils`` package.
- `roman.py`: An extra module used by `docutils`.
- `docutils-0.5-py2.6.egg-info`: A file containing the distribution metadata
as described in PEP 314 [#pep314]_. This file corresponds to the file
called `PKG-INFO`, built by the `sdist` command.
Some executable scripts, such as `rst2html.py`, will also be added in the
`bin` directory of the Python installation.
The problem is that many people use `easy_install` (from the `setuptools`
project [#setuptools]_) or `pip` [#pip]_ to install their packages, and
these third-party tools do not install packages in the same way that Distutils
does:
- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory
and adds a `PKG-INFO` file inside this directory. The `.egg` directory
contains all the elements of the distribution that are supposed to be
installed in `site-packages` and is placed in the `site-packages`
directory.
- `pip` creates an `.egg-info` directory inside the `site-packages` directory
and adds a `PKG-INFO` file inside it. Elements of the distribution are then
installed in various places like Distutils does.
They both add other files in the `EGG-INFO` or `.egg-info` directory and
create or modify `.pth` files.
Uninstall information
---------------------
Distutils doesn't provide an `uninstall` command. If you want to uninstall
a distribution, you have to be a power user: remove the various elements
that were installed, then check the `.pth` files and clean them up if
necessary.
The process also differs depending on the tool used to install the
distribution and on whether the distribution's `setup.py` uses Distutils or
Setuptools.
Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another distribution by
removing a file that is shared among several distributions.
There is, however, a common behavior: when you install a distribution, files
are copied onto your system, and it is possible to keep track of these files
for later removal.
What this PEP proposes
----------------------
To address those issues, this PEP proposes a few changes:
- A new `.egg-info` structure using a directory, based on one form of
the `EggFormats` standard from `setuptools` [#eggformats]_.
- New APIs in `pkgutil` to be able to query the information of installed
distributions.
- A de facto replacement for PEP 262.
- An uninstall function and an uninstall script in Distutils.
.egg-info becomes a directory
=============================
The first change would be to make `.egg-info` a directory and let it
hold the `PKG-INFO` file built by the `write_pkg_file` method of
the `Distribution` class in Distutils.
Notice that this change is based on the standard proposed by `EggFormats`,
although this standard proposes two ways to install files:
- A self-contained directory that can be zipped or left unzipped and contains
the distribution files *and* the `.egg-info` directory.
- A distinct `.egg-info` directory located in the site-packages directory.
You may refer to the `EggFormats` documentation for more details.
This change will not impact Python itself because `egg-info` files are not
used anywhere in the standard library yet, besides Distutils.
It will, however, impact the `setuptools` and `pip` projects, but since they
already work with a directory that contains a `PKG-INFO` file, the change
will have no deep consequences.
For example, if the `docutils` package is installed, the elements that
will be installed in `site-packages` will become::

    - docutils/
    - roman.py
    - docutils-0.5-py2.6.egg-info/
        PKG-INFO

The syntax of the egg-info directory name is as follows::

    name + '-' + version + '.egg-info'

The egg-info directory name is created using a new function called
``egginfo_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name by replacing any runs of
non-alphanumeric characters with a single '-'. ``version`` is converted
to a standard version string. Spaces become dots, and all other
non-alphanumeric characters (except dots) become dashes, with runs of
multiple dashes condensed to a single dash. Both attributes are then
converted into their filename-escaped form, i.e. any '-' characters are
replaced with '_'.
Examples::

    >>> egginfo_dirname('docutils', '0.5')
    'docutils-0.5.egg-info'
    >>> egginfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.egg-info'
    >>> egginfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.egg-info'

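The naming rules above can be sketched with the ``re`` module. This is a
minimal, hypothetical illustration written from the rules as stated, not the
actual ``pkgutil`` code; note that, applied literally, the name rule also turns
dots into dashes (``zope.interface`` would become ``zope_interface``).

```python
import re


def egginfo_dirname(name, version):
    """Return the .egg-info directory name for a distribution.

    Hypothetical sketch of the rules described above; the function
    proposed for pkgutil may differ in details.
    """
    # Any run of non-alphanumeric characters in the name becomes one '-'.
    name = re.sub(r'[^A-Za-z0-9]+', '-', name)
    # In the version, spaces become dots, then any run of characters that
    # are neither alphanumeric nor dots becomes one '-'.
    version = re.sub(r'[^A-Za-z0-9.]+', '-', version.replace(' ', '.'))
    # Filename-escape both parts so '-' only separates name and version.
    return '%s-%s.egg-info' % (name.replace('-', '_'),
                               version.replace('-', '_'))
```

With ``'2.5 a---5'``, the space first becomes a dot (``2.5.a---5``), the run
of dashes collapses (``2.5.a-5``), and escaping yields ``2.5.a_5``.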
Adding a RECORD file in the .egg-info directory
===============================================
A `RECORD` file will be added inside the `.egg-info` directory at installation
time. The `RECORD` file will hold the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and will
be generated by default. This will allow uninstallation, as explained later in
this PEP. The `install` command will also provide an option to prevent the
`RECORD` file from being written and this option should be used when creating
system packages.
Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.
This `RECORD` file is inspired by PEP 262 FILES [#pep262]_.
The RECORD format
-----------------
The `RECORD` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
the `excel` dialect, which uses these options to read the file:

- field delimiter: `,`
- quoting char: `"`
- line terminator: ``os.linesep`` (so ``\r\n`` or ``\n``)

Each record is composed of three elements:

- the file's full **path**

  - if the installed file is located in the directory where the `.egg-info`
    directory of the package is located, it will be a '/'-separated relative
    path, no matter what the target system is. This makes this information
    cross-compatible and allows simple installations to be relocatable.
  - if the installed file is located elsewhere in the system, a
    '/'-separated absolute path is used.

- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
  generated files will not have a hash because they are automatically produced
  from `py` files. So checking the hash of the corresponding `py` file is
  enough to decide if the file and its associated `pyc` or `pyo` files have
  changed.
- the file's size in bytes

The ``csv`` module with its default options will be used to generate this file,
so the field separator will be ",". Any field containing a "," character
will be quoted automatically by ``csv``.
When the file is read, the `U` option will be used so the universal newline
support (see PEP 278 [#pep278]_) will be activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.
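A possible way to produce and parse `RECORD` rows with the ``csv`` and
``hashlib`` modules, written for Python 3. The helper names ``record_row``,
``write_record`` and ``read_record`` are hypothetical, and keeping the size of
`pyc`/`pyo` files while leaving their hash empty is an assumption. In Python 3,
opening the file with ``newline=''`` takes the place of the `U` option:
``csv`` then handles both ``\r\n`` and ``\n`` itself.

```python
import csv
import hashlib
import os


def record_row(path, base_dir):
    """Build one [path, md5, size] RECORD row for an installed file."""
    full = os.path.abspath(path)
    base = os.path.abspath(base_dir)
    if full.startswith(base + os.sep):
        # Under the directory holding the .egg-info: '/'-separated relative path.
        rec_path = os.path.relpath(full, base).replace(os.sep, '/')
    else:
        # Anywhere else: '/'-separated absolute path.
        rec_path = full.replace(os.sep, '/')
    with open(full, 'rb') as f:
        data = f.read()
    if rec_path.endswith(('.pyc', '.pyo')):
        md5 = ''  # byte-compiled files get no hash (size kept: an assumption)
    else:
        md5 = hashlib.md5(data).hexdigest()
    return [rec_path, md5, str(len(data))]


def write_record(stream, rows):
    """Write rows with the default 'excel' dialect (',' delimiter, '"' quote)."""
    csv.writer(stream).writerows(rows)


def read_record(stream):
    """Yield (path, md5, size) tuples; missing fields become None."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        path = row[0]
        md5 = row[1] if len(row) > 1 and row[1] else None
        size = int(row[2]) if len(row) > 2 and row[2] else None
        yield path, md5, size
```

For an actual `RECORD` file, open it with ``open(path, 'w', newline='')`` when
writing and ``open(path, newline='')`` when reading.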
Example
-------
Back to our `docutils` example, we will have::

    - docutils/
    - roman.py
    - docutils-0.5-py2.6.egg-info/
        PKG-INFO
        RECORD

And the RECORD file will contain (extract)::

    docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
    docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
    roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
    /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
    docutils-0.5-py2.6.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
    docutils-0.5-py2.6.egg-info/RECORD

Notice that:

- the `RECORD` file can't contain a hash of itself and is just mentioned here
- `docutils` and `docutils-0.5-py2.6.egg-info` are located in `site-packages`
  so the file paths are relative to it.
Adding an INSTALLER file in the .egg-info directory
===================================================
The `install` command will have a new option called `installer`. This option
is the name of the tool used to invoke the installation. It is a normalized
lower-case string made of characters matching `[a-z0-9_\-\.]`::

    $ python setup.py install --installer=pkg-system

It will default to `distutils` if not provided.
When a distribution is installed, the INSTALLER file is generated in the
`.egg-info` directory with this value, to keep track of **who** installed the
distribution. The file is a single-line text file.
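Writing and reading the `INSTALLER` file can be sketched in a few lines. The
helpers ``write_installer`` and ``read_installer`` are hypothetical; reading
"normalized" as "lower-cased" and requiring every character of the value to
match the class above are assumptions.

```python
import os
import re

# Assumption: every character of the value must come from [a-z0-9_\-\.].
_INSTALLER_NAME = re.compile(r'[a-z0-9_\-.]+')


def write_installer(egginfo_dir, installer='distutils'):
    """Write the single-line INSTALLER file (hypothetical helper)."""
    installer = installer.lower()  # assumption: normalizing means lower-casing
    if not _INSTALLER_NAME.fullmatch(installer):
        raise ValueError('invalid installer name: %r' % installer)
    with open(os.path.join(egginfo_dir, 'INSTALLER'), 'w') as f:
        f.write(installer + '\n')


def read_installer(egginfo_dir):
    """Return the installer name recorded in INSTALLER."""
    with open(os.path.join(egginfo_dir, 'INSTALLER')) as f:
        return f.readline().strip()
```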
New APIs in pkgutil
===================
To use the `.egg-info` directory content, we need to add a set of APIs to the
standard library. The best place to put these APIs seems to be `pkgutil`.
The API is organized in five classes that work with directories and Zip files
(so it works with files included in Zip files, see PEP 273 for more details
[#pep273]_):

- ``Distribution``: manages an `.egg-info` directory.
- ``ZippedDistribution``: manages an `.egg-info` directory contained in a zip
  file.
- ``DistributionDir``: manages a directory that contains some `.egg-info`
  directories.
- ``ZippedDistributionDir``: manages a zipped directory that contains
  some `.egg-info` directories.
- ``DistributionDirMap``: manages ``DistributionDir`` instances.
Distribution class
------------------
A new class called ``Distribution`` is created with the path of the
`.egg-info` directory provided to the constructor. It reads the metadata
contained in `PKG-INFO` when it is instantiated.
``Distribution(path)`` -> instance
Creates a ``Distribution`` instance for the given ``path``.
``Distribution`` provides the following attributes:
- ``name``: The name of the distribution.
- ``metadata``: A ``DistributionMetadata`` instance loaded with the
distribution's PKG-INFO file.
And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)

  Iterates over the `RECORD` entries and returns a tuple ``(path, md5, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``get_egginfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file located under the `.egg-info`
  directory that ``path`` points to.
  ``path`` has to be a '/'-separated path relative to the `.egg-info`
  directory or an absolute path.
  If ``path`` is an absolute path and doesn't start with the `.egg-info`
  directory path, a ``DistutilsError`` is raised.
  If ``binary`` is ``True``, opens the file in read-only binary mode (`rb`),
  otherwise opens it in read-only mode (`r`).

- ``get_egginfo_files(local=False)`` -> iterator of paths

  Iterates over the `RECORD` entries and returns the path of each line
  that points to a file located in the `.egg-info` directory or one of its
  subdirectories.
  If ``local`` is ``True``, each path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.
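A minimal sketch of ``Distribution`` covering ``name``, ``get_installed_files``
and ``uses``, written for Python 3. It is not the proposed implementation:
metadata is parsed with the standard ``email`` package instead of
``DistributionMetadata`` (so the version is ``metadata['Version']`` here, not
``metadata.version``), and zip files, ``get_egginfo_file`` and
``get_egginfo_files`` are left out.

```python
import csv
import email
import os


class Distribution(object):
    """Minimal sketch of the proposed Distribution class (hypothetical)."""

    def __init__(self, path):
        self.path = os.path.abspath(path)  # the .egg-info directory
        # Relative RECORD paths start from the directory holding the .egg-info.
        self._base = os.path.dirname(self.path)
        with open(os.path.join(self.path, 'PKG-INFO')) as f:
            # Assumption: parse the PEP 314 headers with the email package.
            self.metadata = email.message_from_file(f)
        self.name = self.metadata['Name']

    def _local(self, path):
        """Turn a RECORD path (or a local path) into a local absolute path."""
        local = path.replace('/', os.sep)
        if not os.path.isabs(local):
            local = os.path.join(self._base, local)
        return os.path.normcase(os.path.normpath(local))

    def _rows(self):
        with open(os.path.join(self.path, 'RECORD'), newline='') as f:
            return [row for row in csv.reader(f) if row]

    def get_installed_files(self, local=False):
        for row in self._rows():
            row = row + [''] * (3 - len(row))  # pad rows lacking md5/size
            path, md5, size = row[:3]
            if local:
                path = self._local(path)
            yield path, md5 or None, int(size) if size else None

    def uses(self, path):
        target = self._local(path)
        return any(self._local(row[0]) == target for row in self._rows())
```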
ZippedDistribution class
------------------------
A ``ZippedDistribution`` class is provided. It subclasses ``Distribution``
so its methods work with an `.egg-info` directory located in a zip file.
``ZippedDistribution(zipfile, path)`` -> instance
Creates a ``ZippedDistribution`` instance for the given relative ``path``
located in the ``zipfile`` file.
Other public methods and attributes are similar to ``Distribution``.
DistributionDir class
---------------------------
A new class called ``DistributionDir`` is created with a path
corresponding to a directory. For each `.egg-info` directory found in
`path`, the class creates a corresponding ``Distribution``.
The class is a ``set`` of ``Distribution`` instances. ``DistributionDir``
provides a ``path`` attribute corresponding to the path it was created with.
``DistributionDir(path)`` -> instance
Creates a ``DistributionDir`` instance for the given ``path``.
It also provides one extra method besides the ones from ``set``:
- ``get_file_users(path)`` -> Iterator of ``Distribution``.
Returns all ``Distribution`` instances that use ``path``, by calling
``Distribution.uses(path)`` on each of them.
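The set semantics of ``DistributionDir`` and its ``get_file_users`` method
might look like the following sketch. The ``dist_factory`` argument is a
hypothetical addition standing in for the ``Distribution`` class, so the
sketch works with any object that has a ``uses(path)`` method.

```python
import os


class DistributionDir(set):
    """Sketch: the set of distributions found in one directory."""

    def __init__(self, path, dist_factory):
        super().__init__()
        self.path = path
        # Hypothetical: dist_factory stands in for Distribution and is called
        # with the full path of each .egg-info directory.
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if name.endswith('.egg-info') and os.path.isdir(full):
                self.add(dist_factory(full))

    def get_file_users(self, path):
        """Yield every distribution whose uses(path) returns True."""
        for dist in self:
            if dist.uses(path):
                yield dist
```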
ZippedDistributionDir class
---------------------------------
A ``ZippedDistributionDir`` is provided. It subclasses
``DistributionDir`` so its methods work with a Zip file.
``ZippedDistributionDir(path)`` -> instance
Creates a ``ZippedDistributionDir`` instance for the given ``path``.
Other public methods and attributes are similar to ``DistributionDir``.
DistributionDirMap class
-----------------------------
A new class called ``DistributionDirMap`` is created. It's a collection of
``DistributionDir`` and ``ZippedDistributionDir`` instances.
``DistributionDirMap(paths=None, use_cache=True)`` -> instance
If ``paths`` is not ``None``, it's a sequence of paths the constructor loads
into the instance.
The constructor also takes an optional ``use_cache`` argument.
When it's ``True``, ``DistributionDirMap`` will use a global
cache to reduce the number of I/O accesses and speed up the lookups.
The cache is a global mapping containing ``DistributionDir`` and
``ZippedDistributionDir`` instances. When a
``DistributionDirMap`` object is created, it will use the cache to
add an entry for each path it visits, or reuse existing entries. The
cache usage can be disabled at any time with the ``use_cache`` attribute.
The cache can also be emptied with the global ``purge_cache`` function.
The class is a ``dict`` where the values are ``DistributionDir``
and ``ZippedDistributionDir`` instances and the keys are their path
attributes.
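The caching behavior described above could be sketched as follows. The
module-level ``_cache`` dictionary and the ``dir_factory`` argument (standing
in for creating a ``DistributionDir`` or ``ZippedDistributionDir``) are
hypothetical, as is the choice that ``reload()`` also refreshes the cache; the
query methods are omitted.

```python
# Hypothetical module-level cache shared by every map: path -> directory object.
_cache = {}


def purge_cache():
    """Empty the global cache."""
    _cache.clear()


class DistributionDirMap(dict):
    """Sketch of the caching behavior: keys are paths, values directory objects."""

    def __init__(self, paths=None, use_cache=True, dir_factory=None):
        super().__init__()
        self.use_cache = use_cache
        # Hypothetical: builds a DistributionDir or ZippedDistributionDir.
        self._dir_factory = dir_factory
        if paths is not None:
            self.load(*paths)

    def _get(self, path, refresh=False):
        # Reuse a cached entry unless caching is off or a refresh is requested.
        if self.use_cache and not refresh and path in _cache:
            return _cache[path]
        entry = self._dir_factory(path)
        if self.use_cache:
            _cache[path] = entry
        return entry

    def load(self, *paths):
        for path in paths:
            self[path] = self._get(path)

    def reload(self):
        # Rebuild every entry, refreshing the cache as well.
        for path in list(self):
            self[path] = self._get(path, refresh=True)
```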
``DistributionDirMap`` also provides the following methods besides the ones
from ``dict``:
- ``load(*paths)``
Creates and adds ``DistributionDir`` (or
``ZippedDistributionDir``) instances corresponding to ``paths``.
- ``reload()``
Reloads existing entries.
- ``get_distributions()`` -> Iterator of ``Distribution`` (or
``ZippedDistribution``) instances.
Iterates over all ``Distribution`` and ``ZippedDistribution`` contained
in ``DistributionDir`` and ``ZippedDistributionDir`` instances.
- ``get_distribution(dist_name)`` -> ``Distribution`` (or
``ZippedDistribution``) or None.
Returns a ``Distribution`` (or ``ZippedDistribution``) instance for the
given distribution name. If not found, returns None.
- ``get_file_users(path)`` -> Iterator of ``Distribution`` (or
``ZippedDistribution``) instances.
Iterates over all distributions to find out which distributions use the file.
Returns ``Distribution`` (or ``ZippedDistribution``) instances.
.egg-info functions
-------------------
The new functions added to ``pkgutil`` are:

- ``get_distributions()`` -> iterator of ``Distribution`` (or
``ZippedDistribution``) instances.
Provides an iterator that looks for ``.egg-info`` directories in ``sys.path``
and returns ``Distribution`` (or ``ZippedDistribution``) instances for
each one of them.
- ``get_distribution(name)`` -> ``Distribution`` (or ``ZippedDistribution``)
or None.
Scans all elements in ``sys.path`` and looks for all directories ending with
``.egg-info``. Returns a ``Distribution`` (or ``ZippedDistribution``)
corresponding to the ``.egg-info`` directory that contains a PKG-INFO whose
`name` metadata matches ``name``.
Notice that there should be at most one result. The first result found
will be returned. If the directory is not found, returns None.
- ``get_file_users(path)`` -> iterator of ``Distribution`` (or
``ZippedDistribution``) instances.
Iterates over all distributions to find out which distributions use ``path``.
``path`` can be a local absolute path or a relative '/'-separated path.
All these functions use the same global instance of ``DistributionDirMap``
to use the cache. Notice that the cache is never emptied explicitly.
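A simplified sketch of the ``sys.path`` scan behind ``get_distribution``.
Unlike the proposed function, it returns the path of the matching `.egg-info`
directory rather than a ``Distribution`` instance, skips zip files, uses no
cache, and takes a hypothetical ``paths`` argument so it can be tried without
touching ``sys.path``.

```python
import email
import os
import sys


def iter_egginfo_dirs(paths=None):
    """Yield each *.egg-info directory found directly in the given paths."""
    for entry in (sys.path if paths is None else paths):
        entry = entry or os.curdir  # '' in sys.path means the current directory
        if not os.path.isdir(entry):
            continue  # this sketch skips zip files and missing entries
        for name in sorted(os.listdir(entry)):
            full = os.path.join(entry, name)
            if name.endswith('.egg-info') and os.path.isdir(full):
                yield full


def get_distribution(name, paths=None):
    """Return the first .egg-info directory whose PKG-INFO Name is name."""
    for egginfo in iter_egginfo_dirs(paths):
        pkg_info = os.path.join(egginfo, 'PKG-INFO')
        if not os.path.isfile(pkg_info):
            continue
        with open(pkg_info) as f:
            if email.message_from_file(f)['Name'] == name:
                return egginfo
    return None
```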
Example
-------
Let's use some of the new APIs with our `docutils` example::

    >>> from pkgutil import get_distribution, get_file_users
    >>> dist = get_distribution('docutils')
    >>> dist.name
    'docutils'
    >>> dist.metadata.version
    '0.5'
    >>> for path, md5, size in dist.get_installed_files():
    ...     print('%s %s %s' % (path, md5, size))
    ...
    docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
    docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
    roman.py a4b84aff68aa55f2e9bf70481b943D3 234
    /usr/local/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
    docutils-0.5-py2.6.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
    docutils-0.5-py2.6.egg-info/RECORD None None
    >>> dist.uses('docutils/core.py')
    True
    >>> dist.uses('/usr/local/bin/rst2html.py')
    True
    >>> dist.get_egginfo_file('PKG-INFO')
    <open file at ...>

PEP 262 replacement
===================
In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).
Extract from PEP 262 Requirements:

    "We need a way to figure out what distributions, and what versions of
    those distributions, are installed on a system..."

Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 will replace PEP 262 and will become the official
`installation database` standard.
The new version of PEP 345 (XXX work in progress) will extend the Metadata
standard and will fulfill the requirements described in PEP 262, like the
`REQUIRES` section.
Adding an Uninstall function
============================
Distutils already provides a very basic way to install a distribution, which
is running the `install` command over the `setup.py` script of the
distribution.
Distutils will provide a very basic ``uninstall`` function, which will be added
to ``distutils.util`` and will take the name of the distribution to uninstall
as its argument. ``uninstall`` will use the APIs described earlier and remove all
unique files, as long as their hash didn't change. Then it will remove
empty directories left behind.
``uninstall`` will return a list of uninstalled files::

    >>> from distutils.util import uninstall
    >>> uninstall('docutils')
    ['/opt/local/lib/python2.6/site-packages/docutils/core.py',
     ...
     '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']

If the distribution is not found, a ``DistutilsUninstallError`` will be raised.
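The removal rules above (skip shared files, skip files whose hash changed,
then prune empty directories) might be sketched like this, in Python 3.
``uninstall_egginfo`` and its ``shared`` argument are hypothetical: the
function takes the `.egg-info` path instead of a distribution name, and the
caller is assumed to have collected the local paths used by other
distributions. It is also an assumption that files without a hash (`RECORD`,
`pyc`, `pyo`) are removed without a check, that the optional ``remove``
argument plays the role of the filter callable from the Filtering subsection,
and that directories are only pruned below the directory containing the
`.egg-info`.

```python
import csv
import hashlib
import os


def uninstall_egginfo(egginfo_dir, shared=(), remove=None):
    """Remove the unique, unmodified files listed in RECORD (hypothetical).

    Returns the list of removed paths.
    """
    egginfo_dir = os.path.abspath(egginfo_dir)
    base = os.path.dirname(egginfo_dir)
    shared = {os.path.normpath(p) for p in shared}
    # Read every row first: RECORD itself is one of the files to remove.
    with open(os.path.join(egginfo_dir, 'RECORD'), newline='') as f:
        rows = [row for row in csv.reader(f) if row]
    removed = []
    for row in rows:
        path = row[0].replace('/', os.sep)
        if not os.path.isabs(path):
            path = os.path.join(base, path)
        path = os.path.normpath(path)
        md5 = row[1] if len(row) > 1 else ''
        if path in shared or not os.path.isfile(path):
            continue  # used by another distribution, or already gone
        if md5:
            with open(path, 'rb') as f:
                if hashlib.md5(f.read()).hexdigest() != md5:
                    continue  # modified since installation: keep it
        if remove is not None and not remove(path):
            continue  # the filter asked to leave this file alone
        os.remove(path)
        removed.append(path)
    # Remove directories left empty, walking upward but never past base.
    for path in removed:
        parent = os.path.dirname(path)
        while (parent.startswith(base + os.sep)
               and os.path.isdir(parent) and not os.listdir(parent)):
            os.rmdir(parent)
            parent = os.path.dirname(parent)
    return removed
```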
Filtering
---------
To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It will be
called for each file that would be removed. If the callable returns ``True``,
the file will be removed. If it returns ``False``, it will be left alone.
Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('docutils', _remove_and_log)
    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('docutils', _dry_run)

Of course, a third-party tool can use ``pkgutil`` APIs to implement
its own uninstall feature.
Installer marker
----------------
As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.egg-info` directory with the name of the installer.
To avoid removing distributions that were installed by another packaging system,
the ``uninstall`` function takes an extra argument ``installer`` which defaults
to ``distutils``.
When called, ``uninstall`` will check that the ``INSTALLER`` file matches
this argument. If not, it will raise a ``DistutilsUninstallError``::

    >>> uninstall('docutils')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'
    >>> uninstall('docutils', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function
and make sure it's the only program that can remove a distribution it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs performs extra steps on the system at installation time
that it has to undo at uninstallation time.
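The `INSTALLER` check could look like the following sketch. ``check_installer``
is a hypothetical helper, ``DistutilsUninstallError`` is defined locally as a
stand-in for the exception named above, and treating a missing `INSTALLER`
file as ``distutils`` is an assumption.

```python
import os


class DistutilsUninstallError(Exception):
    """Local stand-in for the exception named in this PEP."""


def check_installer(egginfo_dir, name, installer='distutils'):
    """Raise DistutilsUninstallError unless INSTALLER matches installer."""
    path = os.path.join(egginfo_dir, 'INSTALLER')
    try:
        with open(path) as f:
            recorded = f.readline().strip()
    except FileNotFoundError:
        # Assumption: no INSTALLER file means the default, 'distutils'.
        recorded = 'distutils'
    if recorded != installer:
        raise DistutilsUninstallError(
            '%s was installed by %r' % (name, recorded))
```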
Adding an Uninstall script
==========================
An `uninstall` script will be added to Distutils and will be used
like this::

    $ python -m distutils.uninstall packagename

Notice that the script will not check whether the removal of a distribution
breaks another distribution, although, by using the ``uninstall`` function,
it will make sure that none of the files it removes is used by any other
distribution.
Backward compatibility and roadmap
==================================
These changes will not introduce any compatibility problems with the previous
version of Distutils, and will also work with existing third-party tools.
In addition, a backport of the new Distutils for 2.5, 2.6, 3.0 and 3.1 will be
provided so people can benefit from these new features.
The plan is to integrate them for Python 2.7 and Python 3.2.
References
==========
.. [#distutils]
   http://docs.python.org/distutils

.. [#pep262]
   http://www.python.org/dev/peps/pep-0262

.. [#pep314]
   http://www.python.org/dev/peps/pep-0314

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

.. [#pep273]
   http://www.python.org/dev/peps/pep-0273

.. [#pep278]
   http://www.python.org/dev/peps/pep-0278

Acknowledgments
===============
Jim Fulton, Ian Bicking, Phillip Eby, and many people at Pycon and Distutils-SIG.
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: