PEP: 376
Title: Changing the .egg-info structure
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:


Abstract
========

This PEP proposes various enhancements for Distutils:

- A new format for the .egg-info structure.
- Some APIs to read the metadata of a project.
- A replacement for PEP 262.
- An uninstall feature.

Definitions
===========

A **project** is a distribution of one or several files, which can be Python
modules, extensions or data. It is distributed using a `setup.py` script
with Distutils and/or Setuptools. The `setup.py` script indicates where each
element should be installed.

Once installed, the elements are located in various places in the system,
like:

- in Python's site-packages (Python modules, Python modules organized into
  packages, Extensions, etc.)
- in Python's `include` directory.
- in Python's `bin` or `Scripts` directory.
- etc.

Rationale
=========

There are two problems right now in the way projects are installed in
Python:

- There are too many ways to do it.
- There is no API to get the metadata of installed projects.

How projects are installed
--------------------------

Right now, when a project is installed in Python, every element it contains
is installed in various directories.

The pure Python code, for instance, is installed in the `purelib` directory,
which is located inside the Python installation: for example in
``lib/python2.6/site-packages`` under Unix-like systems or Mac OS X, and in
``Lib\site-packages`` under Windows. This is done with the Distutils
`install` command, which calls various subcommands.

The `install_egg_info` subcommand is called during this process, in order to
create an `.egg-info` file in the `purelib` directory.

For example, if the `zlib` project (which contains one package) is
installed, two elements will be installed in `site-packages`::

    - zlib
    - zlib-2.5.2-py2.4.egg-info

Where `zlib` is a Python package, and `zlib-2.5.2-py2.4.egg-info` is a file
containing the project metadata as described in PEP 314 [#pep314]_. This
file corresponds to the file called `PKG-INFO`, built by the `sdist`
command.

The problem is that many people use `easy_install` (setuptools
[#setuptools]_) or `pip` [#pip]_ to install their packages, and these
third-party tools do not install packages in the same way that Distutils
does:

- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory,
  and adds a `PKG-INFO` file inside this directory. In that case the `.egg`
  directory contains all the elements of the project that are supposed to be
  installed in `site-packages`, and is itself placed in the `site-packages`
  directory.

- `pip` creates an `.egg-info` directory inside the `site-packages`
  directory and adds a `PKG-INFO` file inside it. Elements of the project
  are then installed in various places, as Distutils does.

Both tools add other files to the `EGG-INFO` or `.egg-info` directory, and
create or modify `.pth` files.

Uninstall information
---------------------

Distutils doesn't provide any `uninstall` command. If you want to uninstall
a project, you have to be a power user: remove the various elements that
were installed by hand, then look over the `.pth` files and clean them if
necessary.

The process also differs depending on the tool you used to install the
project, and on whether the project's `setup.py` uses Distutils or
Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another project by
removing a file that was shared among several projects.

But there is a common behavior: when you install a project, files are copied
onto your system. It is therefore possible to keep track of these files so
that they can be removed later.

What this PEP proposes
----------------------

To address those issues, this PEP proposes a few changes:

- a new `.egg-info` structure using a directory, based on one form of the
  `EggFormats` standard from `setuptools` [#eggformats]_.
- new APIs in `pkgutil` to be able to query the information of installed
  projects.
- a de-facto replacement for PEP 262.
- an uninstall function in Distutils.

.egg-info becomes a directory
=============================

The first change would be to make `.egg-info` a directory and let it hold
the `PKG-INFO` file built by the `write_pkg_file` method of the
`Distribution` class in Distutils.

Notice that this change is based on the standard proposed by `EggFormats`.
That standard, however, proposes two ways to install files:

- a self-contained directory that can be zipped or left unzipped and that
  contains the project files *and* the `.egg-info` directory.

- a distinct `.egg-info` directory located in the site-packages directory.

You may refer to the `EggFormats` documentation for more details.

This change will not impact Python itself, because `egg-info` files are not
used anywhere yet in the standard library besides Distutils.

It will impact the `setuptools` and `pip` projects, but since they already
work with a directory that contains a `PKG-INFO` file, the change will have
no deep consequences.

For example, if the `zlib` package is installed, the elements that will be
installed in `site-packages` will become::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO

The syntax of the egg-info directory name is as follows::

    name + '-' + version + '.egg-info'

The egg-info directory name is created using a new function called
``egginfo_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name: any runs of non-alphanumeric
characters are replaced with a single '-'. ``version`` is converted to a
standard version string.
Spaces become dots, and all other non-alphanumeric characters become dashes,
with runs of multiple dashes condensed to a single dash. Both attributes are
then converted into their filename-escaped form. Any '-' characters are
currently replaced with '_'.

Examples::

    >>> egginfo_dirname('zlib', '2.5.2')
    'zlib-2.5.2.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.egg-info'

Adding a RECORD file in the .egg-info directory
===============================================

A `RECORD` file will be added inside the `.egg-info` directory at
installation time. The `RECORD` file will hold the list of installed files.
These correspond to the files listed by the `record` option of the `install`
command, and will be generated by default. This will allow uninstallation,
as explained later in this PEP. The `install` command will also provide an
option to prevent the `RECORD` file from being written, and this option
should be used when creating system packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired by PEP 262 FILES [#pep262]_.

The RECORD format
-----------------

The `RECORD` file is a CSV file, composed of records, one line per installed
file. The ``csv`` module is used to read the file with the `excel` dialect,
which uses these options:

- field delimiter: ``,``
- quoting char: ``"``
- line terminator: ``\r\n``

Each record is composed of three elements.

- the file's full **path**

  - if the installed file is located in the directory where the .egg-info
    directory of the package is located, it will be a '/'-separated relative
    path, regardless of the target system. This makes the information
    cross-compatible and allows simple installations to be relocatable.

  - if the installed file is located elsewhere in the system, a
    '/'-separated absolute path is used.

- the **MD5** hash of the file, encoded in hex. Notice that generated `pyc`
  and `pyo` files will not have a hash.

- the file's size in bytes

The ``csv`` module with its default options will be used to generate this
file, so the field separator will be ",". Any "," characters found within a
field will be escaped automatically by ``csv``.

Example
-------

Back to our `zlib` example, we will have::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO
        RECORD

And the RECORD file will contain::

    zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544
    zlib/include/zlib.h,9c4b84aff68aa55f2e9bf70481b94333,66188
    zlib/lib/libz.a,e6d43fb94292411909404b07d0692d46,91128
    zlib/share/man/man3/zlib.3,785dc03452f0508ff0678fba2457e0ba,4486
    zlib-2.5.2.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
    zlib-2.5.2.egg-info/RECORD

Notice that:

- the `RECORD` file can't contain a hash of itself, so it is only listed
  here by path
- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages`, so the
  file paths are relative to it.

Adding an INSTALLER file in the .egg-info directory
===================================================

The `install` command will have a new option called `installer`. This option
is the name of the tool used to invoke the installation. It is a normalized
lower-case string matching ``[a-z0-9_\-\.]``. For example::

    $ python setup.py install --installer=pkg-system

It will default to `distutils` if not provided.

When a project is installed, the INSTALLER file is generated in the
`.egg-info` directory with this value, to keep track of **who** installed
the project. The file is a single-line text file.

New APIs in pkgutil
===================

To use the content of the `.egg-info` directory, we need to add a set of
APIs to the standard library. The best place to put these APIs seems to be
`pkgutil`.

The API is organized in three classes:

- ``Distribution``: manages an `.egg-info` directory.
- ``DistributionDirectory``: manages a directory that contains some
  `.egg-info` directories.
- ``DistributionDirectories``: manages ``DistributionDirectory`` instances.

Distribution class
------------------

A new class called ``Distribution`` is created with the path of the
`.egg-info` directory provided to the constructor. It reads the metadata
contained in `PKG-INFO` when it is instantiated.

``Distribution`` provides the following attributes:

- ``name``: The name of the distribution.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the
  distribution's PKG-INFO file.

And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)

  Iterates over the `RECORD` entries and returns a tuple
  ``(path, md5, size)`` for each line. If ``local`` is ``True``, the path is
  transformed into a local absolute path. Otherwise the raw value from
  `RECORD` is returned.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path`` can be a
  local absolute path or a relative '/'-separated path.

- ``get_egginfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``, which
  must be located under the `.egg-info` directory.

  ``path`` has to be a '/'-separated path relative to the `.egg-info`
  directory or an absolute path.

  If ``path`` is an absolute path and doesn't start with the `.egg-info`
  directory path, a ``DistutilsError`` is raised.

  If ``binary`` is ``True``, opens the file in binary mode.

- ``get_egginfo_files(local=False)`` -> iterator of paths

  Iterates over the `RECORD` entries and returns the path of each entry
  that points to a file located in the `.egg-info` directory or one of its
  subdirectories.

  If ``local`` is ``True``, each path is transformed into a local absolute
  path. Otherwise the raw value from `RECORD` is returned.
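
The way ``get_installed_files`` is described above can be illustrated with a
small sketch. It is not the proposed implementation: ``read_record``,
``to_local`` and ``SAMPLE_RECORD`` are hypothetical names introduced only for
this illustration, the code is written for a current Python 3 interpreter,
and it assumes that missing hash or size fields are reported as ``None``, as
in the `zlib` example of this PEP.

```python
import csv
import io
import os

# Two entries taken from the zlib example in this PEP; the entry for RECORD
# itself has no hash and no size.
SAMPLE_RECORD = (
    "zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544\r\n"
    "zlib-2.5.2.egg-info/RECORD\r\n"
)


def to_local(path, base_dir):
    """Turn a '/'-separated RECORD path into a local path (hypothetical)."""
    if path.startswith("/"):
        # Absolute entries are only normalized to the local separator.
        return os.path.normpath(path)
    # Relative entries are resolved against the directory holding .egg-info.
    return os.path.join(base_dir, *path.split("/"))


def read_record(record_text, base_dir, local=False):
    """Yield (path, md5, size) for each line of a RECORD file's text."""
    # newline="" keeps the "\r\n" terminators intact for the csv module.
    reader = csv.reader(io.StringIO(record_text, newline=""), dialect="excel")
    for row in reader:
        if not row:
            continue
        # Pad short rows so that missing hash/size fields become None
        # (an assumption of this sketch).
        path, md5, size = (row + ["", ""])[:3]
        if local:
            path = to_local(path, base_dir)
        yield path, (md5 or None), (int(size) if size else None)
```

Calling ``list(read_record(SAMPLE_RECORD, base_dir))`` gives the raw
'/'-separated paths, while passing ``local=True`` resolves relative entries
against ``base_dir``, which stands in for the directory that contains the
`.egg-info` directory.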

DistributionDirectory class
---------------------------

A new class called ``DistributionDirectory`` is created with a path
corresponding to a directory. For each `.egg-info` directory found in
`path`, the class creates a corresponding ``Distribution``.

The class is a ``set`` of ``Distribution`` instances.
``DistributionDirectory`` provides a ``path`` attribute corresponding to the
path it was created with.

It also provides two methods besides the ones from ``set``:

- ``file_users(path)`` -> Iterator of ``Distribution``.

  Returns all ``Distribution`` instances that use ``path``, by calling
  ``Distribution.uses(path)`` on all ``Distribution`` instances.

- ``owner(path)`` -> ``Distribution`` instance or None

  If ``path`` is used by only one ``Distribution`` instance, returns it.
  Otherwise returns None.

DistributionDirectories class
-----------------------------

A new class called ``DistributionDirectories`` is created. It's a collection
of ``DistributionDirectory`` instances. The constructor takes one optional
argument ``use_cache`` set to ``True`` by default. When ``True``,
``DistributionDirectories`` will use a global cache to reduce the number of
I/O accesses and speed up the lookups.

The cache is a global mapping containing ``DistributionDirectory``
instances. When a ``DistributionDirectories`` object is created, it will use
the cache to add an entry for each path it visits, or reuse existing
entries. The cache usage can be disabled at any time with the ``use_cache``
attribute.

The cache can also be emptied with the global ``purge_cache`` function.

The class is a ``dict`` where the values are ``DistributionDirectory``
instances and the keys are their path attributes.

``DistributionDirectories`` also provides the following methods besides the
ones from ``dict``:

- ``append(path)``

  Creates a ``DistributionDirectory`` instance for ``path`` and adds it to
  the mapping.

- ``load(paths)``

  Creates and adds ``DistributionDirectory`` instances corresponding to
  ``paths``.

- ``reload()``

  Reloads existing entries.

- ``get_distributions()`` -> Iterator of ``Distribution`` instances.

  Iterates over all ``Distribution`` instances contained in the
  ``DistributionDirectory`` instances.

- ``get_distribution(project_name)`` -> ``Distribution`` or None.

  Returns a ``Distribution`` instance for the given project name. If not
  found, returns None.

- ``get_file_users(path)`` -> Iterator of ``Distribution`` instances.

  Iterates over all projects to find out which project uses the file.
  Returns ``Distribution`` instances.

.egg-info functions
-------------------

The new functions added to ``pkgutil`` are:

- ``get_distributions()`` -> iterator of ``Distribution`` instances.

  Provides an iterator that looks for ``.egg-info`` directories in
  ``sys.path`` and returns a ``Distribution`` instance for each one of them.

- ``get_distribution(name)`` -> ``Distribution`` or None.

  Scans all elements in ``sys.path`` and looks for all directories ending
  with ``.egg-info``. Returns a ``Distribution`` corresponding to the
  ``.egg-info`` directory that contains a PKG-INFO whose `name` metadata
  matches `name`.

  Notice that there should be at most one result. The first result found
  will be returned. If the directory is not found, returns None.

- ``get_file_users(path)`` -> iterator of ``Distribution`` instances.

  Iterates over all projects to find out which project uses ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.

All these functions use the same global instance of
``DistributionDirectories`` in order to use the cache. Notice that the cache
is never emptied explicitly.

Example
-------

Let's use some of the new APIs with our `zlib` example::

    >>> from pkgutil import get_distribution, get_file_users
    >>> dist = get_distribution('zlib')
    >>> dist.name
    'zlib'
    >>> dist.metadata.version
    '2.5.2'

    >>> for path, hash, size in dist.get_installed_files():
    ...     print '%s %s %s' % (path, hash, size)
    ...
    zlib/include/zconf.h b690274f621402dda63bf11ba5373bf2 9544
    zlib/include/zlib.h 9c4b84aff68aa55f2e9bf70481b94333 66188
    zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128
    zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486
    zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
    zlib-2.5.2.egg-info/RECORD None None

    >>> dist.uses('zlib/include/zlib.h')
    True

    >>> dist.get_egginfo_file('PKG-INFO')

PEP 262 replacement
===================

In the past an attempt was made to create an installation database (see PEP
262 [#pep262]_).

Extract from PEP 262 Requirements:

    "We need a way to figure out what distributions, and what versions of
    those distributions, are installed on a system..."

Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 will replace PEP 262 and will become the official
`installation database` standard.

The new version of PEP 345 (XXX work in progress) will extend the Metadata
standard and will fulfill the requirements described in PEP 262, like the
`REQUIRES` section.

Adding an Uninstall function
============================

Distutils already provides a very basic way to install a project, which is
running the `install` command over the `setup.py` script of the
distribution.

Distutils will provide a very basic ``uninstall`` function, which will be
added to ``distutils.util`` and will take the name of the project to
uninstall as its argument. ``uninstall`` will use the APIs described earlier
and remove all unique files, as long as their hash didn't change. Then it
will remove the empty directories left behind.

``uninstall`` will return a list of uninstalled files::

    >>> from distutils.util import uninstall
    >>> uninstall('zlib')
    ['/opt/local/lib/python2.6/site-packages/zlib/file1',
     '/opt/local/lib/python2.6/site-packages/zlib/file2']

If the project is not found, a ``DistutilsUninstallError`` will be raised.
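
The removal steps described above (skip files whose hash changed, then prune
the directories left empty) might look roughly like the following sketch. It
is not the proposed ``distutils.util.uninstall``: ``uninstall_files`` and
``md5_of`` are hypothetical names, the records are assumed to be
``(absolute_path, md5, size)`` tuples, ``shared`` stands in for the paths
that other distributions also use, and entries without a recorded hash are
assumed to be removed without a check. The code targets a current Python 3
interpreter.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex-encoded MD5 hash of a file's content."""
    with open(path, "rb") as stream:
        return hashlib.md5(stream.read()).hexdigest()


def uninstall_files(records, root, shared=()):
    """Remove unchanged, unshared files and return the removed paths.

    records -- iterable of (absolute_path, md5, size) tuples
    root    -- directory that must never be removed (e.g. site-packages)
    shared  -- paths also used by other distributions (left alone)
    """
    shared = set(shared)
    removed = []
    for path, md5, _size in records:
        if path in shared or not os.path.isfile(path):
            continue
        # A different hash means the file was modified after installation.
        if md5 is not None and md5_of(path) != md5:
            continue
        os.remove(path)
        removed.append(path)

    # Prune directories left empty, walking upward but stopping at root.
    root = os.path.abspath(root)
    for path in removed:
        directory = os.path.dirname(os.path.abspath(path))
        while directory != root and directory.startswith(root + os.sep):
            if os.path.isdir(directory):
                if os.listdir(directory):
                    break  # still holds other files or directories
                os.rmdir(directory)
            directory = os.path.dirname(directory)
    return removed
```

Passing an explicit ``root`` is a design choice of this sketch: it keeps the
pruning step from walking above the installation directory when that
directory happens to become empty.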

Filtering
---------

To make it a reference API for third-party projects that wish to control how
`uninstall` works, a second callable argument can be used. It will be called
for each file that is about to be removed. If the callable returns `True`,
the file will be removed. If it returns `False`, it will be left alone.

Examples::

    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('zlib', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('zlib', _dry_run)

Of course, a third-party tool can use ``pkgutil`` APIs to implement its own
uninstall feature.

Installer marker
----------------

As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.egg-info` directory with the name of the installer.

To avoid removing projects that were installed by another packaging system,
the ``uninstall`` function takes an extra argument ``installer`` which
defaults to ``distutils``.

When called, ``uninstall`` will check that the ``INSTALLER`` file matches
this argument. If not, it will raise a ``DistutilsUninstallError``::

    >>> uninstall('zlib')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: zlib was installed by 'cool-pkg-manager'

    >>> uninstall('zlib', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function and
make sure it's the only program that can remove a project it has previously
installed. This is useful when a third-party program that relies on
Distutils APIs performs extra steps on the system at installation time that
it has to undo at uninstallation time.

Backward compatibility and roadmap
==================================

These changes will not introduce any compatibility problems with the
previous version of Distutils, and will also work with existing third-party
tools.

In addition, a backport of the new Distutils for 2.5, 2.6, 3.0 and 3.1 will
be provided so people can benefit from these new features.

The plan is to integrate them for Python 2.7 and Python 3.2.

References
==========

.. [#pep262]
   http://www.python.org/dev/peps/pep-0262

.. [#pep314]
   http://www.python.org/dev/peps/pep-0314

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

Acknowledgments
===============

Jim Fulton, Ian Bicking, Phillip Eby, and many people at Pycon and
Distutils-SIG.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: