PEP: 376 Title: Database of Installed Python Distributions Version: $Revision$ Last-Modified: $Date$ Author: Tarek Ziadé Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 22-Feb-2009 Python-Version: 2.7, 3.2 Post-History: Abstract ======== The goal of this PEP is to provide a standard infrastructure to manage project distributions installed on a system, so all tools that are installing or removing projects are interoperable. To achieve this goal, the PEP proposes a new format to describe installed distributions on a system. It also describes a reference implementation for the standard library. In the past an attempt was made to create an installation database (see PEP 262 [#pep262]_). Combined with PEP 345, the current proposal supersedes PEP 262. Rationale ========= There are two problems right now in the way distributions are installed in Python: - There are too many ways to do it and this makes interoperation difficult. - There is no API to get information on installed distributions. How distributions are installed ------------------------------- Right now, when a distribution is installed in Python, every element can be installed in a different directory. For instance, `Distutils` installs the pure Python code in the `purelib` directory, which is ``lib\python2.6\site-packages`` for unix-like systems and Mac OS X, or `Lib/site-packages` under Python's installation directory for Windows. Additionally, the `install_egg_info` subcommand of the Distutils `install` command adds an `.egg-info` file for the project into the `purelib` directory. For example, for the `docutils` distribution, which contains one package an extra module and executable scripts, three elements are installed in `site-packages`: - `docutils`: The ``docutils`` package. - `roman.py`: An extra module used by `docutils`. - `docutils-0.5-py2.6.egg-info`: A file containing the distribution metadata as described in PEP 314 [#pep314]_. This file corresponds to the file called `PKG-INFO`, built by the `sdist` command. Some executable scripts, such as `rst2html.py`, are also added in the `bin` directory of the Python installation. Another project called `setuptools` [#setuptools]_ has two other formats to install distributions, called `EggFormats` [#eggformats]_: - a self-contained `.egg` directory, that contains all the distribution files and the distribution metadata in a file called `PKG-INFO` in a subdirectory called `EGG-INFO`. `setuptools` creates other files in that directory that can be considered as complementary metadata. - an `.egg-info` directory installed in `site-packages`, that contains the same files `EGG-INFO` has in the `.egg` format. The first format is automatically used when you install a distribution that uses the ``setuptools.setup`` function in its setup.py file, instead of the ``distutils.core.setup`` one. `setuptools` also add a reference to the distribution into an ``easy-install.pth`` file. Last, the `setuptools` project provides an executable script called `easy_install` [#easyinstall]_ that installs all distributions, including distutils-based ones in self-contained `.egg` directories. If you want to have standalone `.egg-info` directories for your distributions, e.g. the second `setuptools` format, you have to force it when you work with a setuptools-based distribution or with the `easy_install` script. You can force it by using the `-–single-version-externally-managed` option **or** the `--root` option. This will make the `setuptools` project install the project like distutils does. This option is used by : - the `pip` [#pip]_ installer - the Fedora packagers [#fedora]_. - the Debian packagers [#debian]_. Uninstall information --------------------- Distutils doesn't provide an `uninstall` command. If you want to uninstall a distribution, you have to be a power user and remove the various elements that were installed, and then look over the `.pth` file to clean them if necessary. And the process differs depending on the tools you have used to install the distribution and if the distribution's `setup.py` uses Distutils or Setuptools. Under some circumstances, you might not be able to know for sure that you have removed everything, or that you didn't break another distribution by removing a file that is shared among several distributions. But there's a common behavior: when you install a distribution, files are copied in your system. And it's possible to keep track of these files for later removal. Moreover, the Pip project has gained an `uninstall` feature lately. It records all installed files, using the `record` option of the `install` command. What this PEP proposes ---------------------- To address those issues, this PEP proposes a few changes: - A new `.dist-info` structure using a directory, inspired on one format of the `EggFormats` standard from `setuptools`. - New APIs in `pkgutil` to be able to query the information of installed distributions. - An uninstall function and an uninstall script in Distutils. One .dist-info directory per installed distribution =================================================== This PEP proposes an installation format inspired by one of the options in the `EggFormats` standard, the one that uses a distinct directory located in the site-packages directory. This distinct directory is named as follows:: name + '-' + version + '.dist-info' This `.dist-info` directory can contain these files: - `METADATA`: contains metadata, as described in PEP 345, PEP 314 and PEP 241. - `RECORD`: records the list of installed files - `INSTALLER`: records the name of the tool used to install the project - `REQUESTED`: the presence of this file indicates that the project installation was explicitly requested (i.e., not installed as a dependency). The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED may be missing. This proposal will not impact Python itself because the metadata files are not used anywhere yet in the standard library besides Distutils. It will impact the `setuptools` and `pip` projects but, given the fact that they already work with a directory that contains a `PKG-INFO` file, the change will have no deep consequences. RECORD ------ A `RECORD` file is added inside the `.dist-info` directory at installation time when installing a source distribution using the `install` command. Notice that when installing a binary distribution created with `bdist` command or a `bdist`-based command, the `RECORD` file will be installed as well since these commands use the `install` command to create binary distributions. The `RECORD` file holds the list of installed files. These correspond to the files listed by the `record` option of the `install` command, and will be generated by default. This allows the implementation of an uninstallation feature, as explained later in this PEP. The `install` command also provides an option to prevent the `RECORD` file from being written and this option should be used when creating system packages. Third-party installation tools also should not overwrite or delete files that are not in a RECORD file without prompting or warning. This RECORD file is inspired from PEP 262 FILES [#pep262]_. The `RECORD` file is a CSV file, composed of records, one line per installed file. The ``csv`` module is used to read the file, with these options: - field delimiter : `,` - quoting char : `"`. - line terminator : ``os.linesep`` (so ``\r\n`` or ``\n``) Each record is composed of three elements: - the file's **path**, relative to ``sys.prefix``. When the file is not under ``sys.prefix``, an absolute path is used. - the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo` generated files don't have any hash because they are automatically produced from `py` files. So checking the hash of the corresponding `py` file is enough to decide if the file and its associated `pyc` or `pyo` files have changed. - the file's size in bytes The ``csv`` module is used to generate this file, so the field separator is ",". Any "," character found within a field is escaped automatically by ``csv``. When the file is read, the `U` option is used so the universal newline support (see PEP 278 [#pep278]_) is activated, avoiding any trouble reading a file produced on a platform that uses a different new line terminator. Here's an example of a RECORD file (extract):: lib/python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544 lib/python2.6/site-packages/docutils/__init__.pyc lib/python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188 lib/python2.6/site-packages/docutils/core.pyc lib/python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234 lib/python2.6/site-packages/roman.pyc /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234 /usr/local/bin/rst2html.pyc python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195 lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD Notice that the `RECORD` file can't contain a hash of itself and is just mentioned here A project that installs a `config.ini` file in `/etc/myapp` will be added like this:: /etc/myapp/config.ini,b690274f621402dda63bf11ba5373bf2,9544 For a windows platform, the drive letter is added for the absolute paths, so a file that is copied in c:\MyApp\ will be:: c:\etc\myapp\config.ini,b690274f621402dda63bf11ba5373bf2,9544 INSTALLER --------- The `install` command has a new option called `installer`. This option is the name of the tool used to invoke the installation. It's an normalized lower-case string matching `[a-z0-9_\-\.]`. $ python setup.py install --installer=pkg-system It defaults to `distutils` if not provided. When a distribution is installed, the INSTALLER file is generated in the `.dist-info` directory with this value, to keep track of **who** installed the distribution. The file is a single-line text file. REQUESTED --------- Some install tools automatically detect unfulfilled dependencies and install them. In these cases, it is useful to track which distributions were installed purely as a dependency, so if their dependent distribution is later uninstalled, the user can be alerted of the orphaned dependency. If a distribution is installed by direct user request (the usual case), a file REQUESTED is added to the .dist-info directory of the installed distribution. The REQUESTED file may be empty, or may contain a marker comment line beginning with the "#" character. If an install tool installs a distribution automatically, as a dependency of another distribution, the REQUESTED file should not be created. The ``install`` command of distutils by default creates the REQUESTED file. It accepts ``--requested`` and ``--no-requested`` options to explicitly specify whether the file is created. If a distribution that was already installed on the system as a dependency is later installed by name, the distutils ``install`` command will create the REQUESTED file in the .dist-info directory of the existing installation. Implementation details ====================== New functions and classes in pkgutil ------------------------------------ To use the `.dist-info` directory content, we need to add in the standard library a set of APIs. The best place to put these APIs is `pkgutil`. Functions ~~~~~~~~~ The new functions added in the ``pkgutil`` module are : - ``distinfo_dirname(name, version)`` -> directory name ``name`` is converted to a standard distribution name by replacing any runs of non-alphanumeric characters with a single '-'. ``version`` is converted to a standard version string. Spaces become dots, and all other non-alphanumeric characters (except dots) become dashes, with runs of multiple dashes condensed to a single dash. Both attributes are then converted into their filename-escaped form, i.e. any '-' characters are replaced with '_' other than the one in 'dist-info' and the one separating the name from the version number. - ``get_distributions()`` -> iterator of ``Distribution`` instances. Provides an iterator that looks for ``.dist-info`` directories in ``sys.path`` and returns ``Distribution`` instances for each one of them. - ``get_distribution(name)`` -> ``Distribution`` or None. Scans all elements in ``sys.path`` and looks for all directories ending with ``.dist-info``. Returns a ``Distribution`` corresponding to the ``.dist-info`` directory that contains a METADATA that matches `name` for the `name` metadata. This function only returns the first result founded, since no more than one values are expected. If the directory is not found, returns None. - ``get_file_users(path)`` -> iterator of ``Distribution`` instances. Iterates over all distributions to find out which distributions uses ``path``. ``path`` can be a local absolute path or a relative '/'-separated path. A local absolute path is an absolute path in which occurrences of '/' have been replaced by the system separator given by ``os.sep``. Distribution class ~~~~~~~~~~~~~~~~~~ A new class called ``Distribution`` is created with the path of the `.dist-info` directory provided to the constructor. It reads the metadata contained in `METADATA` when it is instanciated. ``Distribution(path)`` -> instance Creates a ``Distribution`` instance for the given ``path``. ``Distribution`` provides the following attributes: - ``name``: The name of the distribution. - ``metadata``: A ``DistributionMetadata`` instance loaded with the distribution's METADATA file. - ``requested``: A boolean that indicates whether the REQUESTED metadata file is present (in other words, whether the distribution was installed by user request). And following methods: - ``get_installed_files(local=False)`` -> iterator of (path, md5, size) Iterates over the `RECORD` entries and return a tuple ``(path, md5, size)`` for each line. If ``local`` is ``True``, the path is transformed into a local absolute path. Otherwise the raw value from `RECORD` is returned. A local absolute path is an absolute path in which occurrences of '/' have been replaced by the system separator given by ``os.sep``. - ``uses(path)`` -> Boolean Returns ``True`` if ``path`` is listed in `RECORD`. ``path`` can be a local absolute path or a relative '/'-separated path. - ``get_distinfo_file(path, binary=False)`` -> file object Returns a file located under the `.dist-info` directory. Returns a ``file`` instance for the file pointed by ``path``. ``path`` has to be a '/'-separated path relative to the `.dist-info` directory or an absolute path. If ``path`` is an absolute path and doesn't start with the `.dist-info` directory path, a ``DistutilsError`` is raised. If ``binary`` is ``True``, opens the file in read-only binary mode (`rb`), otherwise opens it in read-only mode (`r`). - ``get_distinfo_files(local=False)`` -> iterator of paths Iterates over the `RECORD` entries and returns paths for each line if the path is pointing to a file located in the `.dist-info` directory or one of its subdirectories. If ``local`` is ``True``, each path is transformed into a local absolute path. Otherwise the raw value from `RECORD` is returned. Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details [#pep273]_). These classes are described in the documentation of the prototype implementation for interested readers [#prototype]_. Examples ~~~~~~~~ Let's use some of the new APIs with our `docutils` example:: >>> from pkgutil import get_distribution, get_file_users, distinfo_dirname >>> dist = get_distribution('docutils') >>> dist.name 'docutils' >>> dist.metadata.version '0.5' >>> distinfo_dirname('docutils', '0.5') 'docutils-0.5.dist-info' >>> distinfo_dirname('python-ldap', '2.5') 'python_ldap-2.5.dist-info' >>> distinfo_dirname('python-ldap', '2.5 a---5') 'python_ldap-2.5.a_5.dist-info' >>> for path, hash, size in dist.get_installed_files():: ... print '%s %s %d' % (path, hash, size) ... python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544 python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188 python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234 /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234 python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195 python2.6/site-packages/docutils-0.5.dist-info/RECORD >>> dist.uses('docutils/core.py') True >>> dist.uses('/usr/local/bin/rst2html.py') True >>> dist.get_distinfo_file('METADATA') >>> dist.requested True New functions in Distutils -------------------------- Distutils already provides a very basic way to install a distribution, which is running the `install` command over the `setup.py` script of the distribution. Distutils will provide a very basic ``uninstall`` function, that is added in ``distutils.util`` and takes the name of the distribution to uninstall as its argument. ``uninstall`` uses the APIs described earlier and remove all unique files, as long as their hash didn't change. Then it removes empty directories left behind. ``uninstall`` returns a list of uninstalled files:: >>> from distutils.util import uninstall >>> uninstall('docutils') ['/opt/local/lib/python2.6/site-packages/docutils/core.py', ... '/opt/local/lib/python2.6/site-packages/docutils/__init__.py'] If the distribution is not found, a ``DistutilsUninstallError`` is raised. Filtering ~~~~~~~~~ To make it a reference API for third-party projects that wish to control how `uninstall` works, a second callable argument can be used. It's called for each file that is removed. If the callable returns `True`, the file is removed. If it returns False, it's left alone. Examples:: >>> def _remove_and_log(path): ... logging.info('Removing %s' % path) ... return True ... >>> uninstall('docutils', _remove_and_log) >>> def _dry_run(path): ... logging.info('Removing %s (dry run)' % path) ... return False ... >>> uninstall('docutils', _dry_run) Of course, a third-party tool can use lower-level ``pkgutil`` APIs to implement its own uninstall feature. Installer marker ~~~~~~~~~~~~~~~~ As explained earlier in this PEP, the `install` command adds an `INSTALLER` file in the `.dist-info` directory with the name of the installer. To avoid removing distributions that were installed by another packaging system, the ``uninstall`` function takes an extra argument ``installer`` which defaults to ``distutils``. When called, ``uninstall`` controls that the ``INSTALLER`` file matches this argument. If not, it raises a ``DistutilsUninstallError``:: >>> uninstall('docutils') Traceback (most recent call last): ... DistutilsUninstallError: docutils was installed by 'cool-pkg-manager' >>> uninstall('docutils', installer='cool-pkg-manager') This allows a third-party application to use the ``uninstall`` function and strongly suggest that no other program remove a distribution it has previously installed. This is useful when a third-party program that relies on Distutils APIs does extra steps on the system at installation time, it has to undo at uninstallation time. Adding an Uninstall script ~~~~~~~~~~~~~~~~~~~~~~~~~~ An `uninstall` script is added in Distutils. and is used like this:: $ python -m distutils.uninstall projectname Notice that script doesn't control if the removal of a distribution breaks another distribution. Although it makes sure that all the files it removes are not used by any other distribution, by using the uninstall function. Also note that this uninstall script pays no attention to the REQUESTED metadata; that is provided only for use by external tools to provide more advanced dependency management. Backward compatibility and roadmap ================================== These changes don't introduce any compatibility problems since they will be implemented in: - pkgutil in new functions - distutils2 The plan is to include the functionality outlined in this PEP in pkgutil for Python 2.7 and Python 3.2, and in Distutils2. Distutils2 will also contain a backport of the new pgkutil, and can be used for 2.4 onward. Distributions installed using existing, pre-standardization formats do not have the necessary metadata available for the new API, and thus will be ignored. Third-party tools may of course to continue to support previous formats in addition to the new format, in order to ease the transition. References ========== .. [#distutils] http://docs.python.org/distutils .. [#pep262] http://www.python.org/dev/peps/pep-0262 .. [#pep314] http://www.python.org/dev/peps/pep-0314 .. [#setuptools] http://peak.telecommunity.com/DevCenter/setuptools .. [#easyinstall] http://peak.telecommunity.com/DevCenter/EasyInstall .. [#pip] http://pypi.python.org/pypi/pip .. [#eggformats] http://peak.telecommunity.com/DevCenter/EggFormats .. [#pep273] http://www.python.org/dev/peps/pep-0273 .. [#pep278] http://www.python.org/dev/peps/pep-0278 .. [#fedora] http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools .. [#debian] http://wiki.debian.org/DebianPython/NewPolicy .. [#prototype] http://bitbucket.org/tarek/pep376/ Acknowledgements ================ Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many people at Pycon and Distutils-SIG. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: