PEP: 376
Title: Changing the .egg-info structure
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.1
Post-History:


Abstract
========

This PEP proposes various enhancements for Distutils:

- A new format for the .egg-info structure.
- Some APIs to read the metadata of a project.


Definitions
===========

A **project** is a Python application composed of one or many Python
packages. It is distributed using a `setup.py` script with Distutils
and/or Setuptools.

Once installed, one or several **packages** are added to Python's
`site-packages` directory.


Rationale
=========

There are two problems right now in the way projects are installed in
Python:

- There are too many ways to install a project in Python.
- There is no API to get the metadata of installed packages.

How projects are installed
--------------------------

Right now, when a project is installed in Python, every package it
contains is installed in the `site-packages` directory with the Distutils
`install` command.

The `install_egg_info` subcommand is called during this process in order
to create an `.egg-info` file in the `site-packages` directory.

For example, if the `zlib` project (which contains one package) is
installed, two elements will be installed in `site-packages`::

    - zlib
    - zlib-2.5.2-py2.4.egg-info

Here `zlib` is the package, and `zlib-2.5.2-py2.4.egg-info` is a file
containing the package metadata as described in PEP 314. This file
corresponds to the file called `PKG-INFO`, built by the `sdist` command.

The problem is that many people use `easy_install` (setuptools) or `pip`
to install their packages, and these third-party tools do not install
packages in the same way that Distutils does:

- `easy_install` creates an `EGG-INFO` directory inside an `.egg`
  directory, and adds a `PKG-INFO` file inside this directory. In that
  case, the `.egg` directory contains the packages of the project.
- `pip` creates an `.egg-info` directory inside the `site-packages`
  directory and adds a `PKG-INFO` file inside it. Packages are installed
  in the `site-packages` directory in the regular way.

Both tools add other files to the `EGG-INFO` or `.egg-info` directory,
and create or modify `.pth` files.

Uninstall information
---------------------

Distutils doesn't provide any `uninstall` command. If you want to
uninstall a project, you have to be a power user: remove the various
package directories from the right `site-packages` directory, then clean
up the relevant `.pth` files. This procedure also differs depending on
the tool that was used to install the project.

The worst issue is that you depend on the way the packager created the
package: `python setup.py install` does not install a project the same
way depending on the tool used by the packager (mainly Distutils or
Setuptools).

However, there is a common behavior: files are copied into your
installation, and it is possible to keep track of these files so that
they can be removed later.

What this PEP proposes
----------------------

To address those issues, this PEP proposes a few changes:

- a new `.egg-info` structure using a directory;
- a list of elements this directory holds;
- new functions in `pkgutil` to be able to query the information of
  installed projects.


.egg-info becomes a directory
=============================

The first change would be to make `.egg-info` a directory and let it hold
the `PKG-INFO` file built by the `write_pkg_file` method.

This change will not impact Python itself, because this file is not used
anywhere yet in the standard library, so no deprecation is needed.

It will, however, impact the `setuptools` and `pip` projects. Since they
already work with a directory that contains a `PKG-INFO` file, the change
will be small.

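
To illustrate the difference between the two layouts, the following is a
minimal sketch, not part of this proposal, of a reader that accepts
`.egg-info` either as the current single file or as the proposed
directory. `read_pkg_info` is a hypothetical helper; the sketch assumes
Python 3 and relies on `PKG-INFO` using the RFC 822-style `Field: value`
headers described in PEP 314, which the standard `email.parser` module
can parse::

    import email.parser
    import os
    import tempfile


    def read_pkg_info(egg_info_path):
        """Parse the PKG-INFO metadata of an .egg-info entry.

        Hypothetical helper: accepts both the current layout, where the
        .egg-info entry *is* the PKG-INFO file, and the proposed layout,
        where .egg-info is a directory containing a PKG-INFO file.
        """
        if os.path.isdir(egg_info_path):
            # Proposed layout: <project>-<version>.egg-info/PKG-INFO
            pkg_info = os.path.join(egg_info_path, 'PKG-INFO')
        else:
            # Current layout: the .egg-info file holds the metadata itself.
            pkg_info = egg_info_path
        with open(pkg_info, encoding='utf-8') as f:
            # PKG-INFO is a set of RFC 822-style "Field: value" headers.
            return email.parser.Parser().parse(f)


    # Demo: a scratch directory standing in for site-packages.
    METADATA = 'Metadata-Version: 1.1\nName: zlib\nVersion: 2.5.2\n'
    site_packages = tempfile.mkdtemp()

    old_style = os.path.join(site_packages, 'zlib-2.5.2-py2.4.egg-info')
    with open(old_style, 'w', encoding='utf-8') as f:
        f.write(METADATA)

    new_style = os.path.join(site_packages, 'zlib-2.5.2.egg-info')
    os.mkdir(new_style)
    with open(os.path.join(new_style, 'PKG-INFO'), 'w', encoding='utf-8') as f:
        f.write(METADATA)

    old_meta = read_pkg_info(old_style)
    new_meta = read_pkg_info(new_style)
    print(old_meta['Name'], old_meta['Version'])  # zlib 2.5.2
    print(new_meta['Name'], new_meta['Version'])  # zlib 2.5.2

In this sketch, supporting both layouts during a transition costs a single
extra `os.path.isdir` check, since the directory form only adds one level
of nesting around the same `PKG-INFO` content.
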
For example, if the `zlib` package is installed, two elements will be
installed in `site-packages`::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO

The Python version will also be removed from the `.egg-info` directory
name.

To implement this change, the `install_egg_info` command in Distutils
must be updated, as well as the various third-party projects.


Adding a RECORD in the .egg-info directory
==========================================

A `RECORD` file will be added inside the `.egg-info` directory at
installation time.

The `RECORD` file will hold the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and
will always be generated. This will allow uninstallation, as explained
later in this PEP.

By default, the `install` command will record installed files in the
`RECORD` file, using these rules:

- if the installed file is located in a directory in `site-packages`, it
  will be a '/'-separated relative path, no matter what the target system
  is. This makes this information cross-compatible and allows simple
  installations to be relocatable.

- if the installed file is located elsewhere in the system, a
  '/'-separated absolute path is used.

This will require changing the way the `install` command writes the
record file, so the old `record` behavior will be deprecated.

Back to our `zlib` example, we will have::

    - zlib
    - zlib-2.5.2.egg-info/
        PKG-INFO
        RECORD


New functions in pkgutil
========================

To use the `.egg-info` directory content, we need to add a set of APIs to
the standard library. The best place to put these APIs seems to be
`pkgutil`.

The new functions added to the module are:

- get_egg_info(project_name) -> path or None

  Scans all elements in `sys.path` and looks for all directories ending
  with `.egg-info`. Returns the path of the directory that contains a
  `PKG-INFO` whose `name` metadata matches `project_name`.
  Notice that there should be at most one result.
  The first result found will be returned. If no such directory is found,
  returns None.

  XXX The implementation of `get_egg_info` will focus on minimizing the
  I/O accesses.

- get_metadata(project_name) -> DistributionMetadata or None

  Uses `get_egg_info` to get the `PKG-INFO` file, and returns a
  `DistributionMetadata` instance that contains the metadata.

  This will require a small change in `DistributionMetadata` (see #4908).

- get_egg_info_file(project_name, filename) -> file object or None

  Uses `get_egg_info` and returns a file object for the file named
  `filename` inside that directory.

Let's use them with our `zlib` example::

    >>> from pkgutil import get_egg_info, get_metadata, get_egg_info_file
    >>> get_egg_info('zlib')
    '/opt/local/lib/python2.6/site-packages/zlib-2.5.2.egg-info'
    >>> metadata = get_metadata('zlib')
    >>> metadata.version
    '2.5.2'
    >>> get_egg_info_file('zlib', 'PKG-INFO').read()
    some
    ...
    files


Adding an Uninstall function
============================

Distutils provides a very basic way to install a project, which is
running the `install` command over the `setup.py` script of the
distribution.

Distutils will provide a very basic `uninstall` function that will remove
all files listed in the `RECORD` file of a project, as long as they are
not mentioned in another `RECORD` file and as long as the project was
installed using the standard described earlier.

This function will be added in ``distutils.util`` and will take the name
of the project to uninstall as its argument. A call to `uninstall` will
return the list of uninstalled files::

    >>> from distutils.util import uninstall
    >>> uninstall('zlib')
    ['/opt/local/lib/python2.6/site-packages/zlib/file1',
     '/opt/local/lib/python2.6/site-packages/zlib/file2']

If the project is not found, a ``DistutilsUninstallError`` will be raised.

The goal is to make this function a reference API for third-party
projects that wish to provide an uninstall feature.

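
The removal rules above, together with the optional per-file callable
that this section also describes, can be sketched in a few lines. This is
a hypothetical illustration, not the proposed ``distutils.util.uninstall``:
`uninstall_sketch` and `UninstallError` are invented names, the sketch
scans a single `site-packages` directory instead of all of `sys.path`, it
assumes Python 3, and it assumes that `RECORD` lists one '/'-separated
path per line, following the rules of the RECORD section::

    import email.parser
    import os
    import tempfile


    class UninstallError(Exception):
        """Hypothetical stand-in for DistutilsUninstallError."""


    def _project_name(egg_info_dir):
        # The project name is the "Name" field of <egg_info_dir>/PKG-INFO.
        with open(os.path.join(egg_info_dir, 'PKG-INFO'), encoding='utf-8') as f:
            return email.parser.Parser().parse(f)['Name']


    def _read_record(egg_info_dir, site_packages):
        # Assumed RECORD format: one '/'-separated path per line. Relative
        # paths are resolved against site-packages; absolute ones are kept.
        paths = set()
        with open(os.path.join(egg_info_dir, 'RECORD'), encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if os.path.isabs(line):
                    paths.add(os.path.normpath(line))
                else:
                    parts = line.split('/')
                    paths.add(os.path.normpath(os.path.join(site_packages, *parts)))
        return paths


    def uninstall_sketch(project_name, site_packages, callback=None):
        """Remove the files recorded for project_name; return removed paths."""
        target = None
        shared = set()  # files listed in some other project's RECORD
        for entry in os.listdir(site_packages):
            path = os.path.join(site_packages, entry)
            if not (entry.endswith('.egg-info')
                    and os.path.isfile(os.path.join(path, 'PKG-INFO'))):
                continue  # not a directory-style .egg-info
            if target is None and _project_name(path) == project_name:
                target = path
            elif os.path.isfile(os.path.join(path, 'RECORD')):
                shared |= _read_record(path, site_packages)
        if target is None:
            raise UninstallError('project %r not found' % project_name)

        removed = []
        for path in sorted(_read_record(target, site_packages) - shared):
            # The optional callable decides, file by file, whether to remove.
            if callback is None or callback(path):
                os.remove(path)
                removed.append(path)
        return removed


    # Demo: a scratch site-packages with zlib and a second project whose
    # RECORD also lists zlib/shared.py.
    site = tempfile.mkdtemp()


    def _write(rel_path, text=''):
        path = os.path.join(site, *rel_path.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)


    _write('zlib/__init__.py')
    _write('zlib/shared.py')
    _write('zlib-2.5.2.egg-info/PKG-INFO', 'Name: zlib\nVersion: 2.5.2\n')
    _write('zlib-2.5.2.egg-info/RECORD', 'zlib/__init__.py\nzlib/shared.py\n')
    _write('other-1.0.egg-info/PKG-INFO', 'Name: other\nVersion: 1.0\n')
    _write('other-1.0.egg-info/RECORD', 'zlib/shared.py\n')

    dry_run = uninstall_sketch('zlib', site, lambda path: False)  # removes nothing
    removed = uninstall_sketch('zlib', site)  # removes zlib/__init__.py only

Under these assumptions, the dry-run callable leaves every file in place,
and the real call removes `zlib/__init__.py` but keeps `zlib/shared.py`,
because another project's `RECORD` still lists it. The sketch does not
prune directories left empty, nor does it remove the `.egg-info`
directory itself.
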
The ``uninstall`` function can also be invoked with a second argument, a
callable that will be invoked for each file to be removed. If this
callable returns `True`, the file will be removed. Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s', path)
    ...     return True
    ...
    >>> uninstall('zlib', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)', path)
    ...     return False
    ...
    >>> uninstall('zlib', _dry_run)


Backward compatibility and roadmap
==================================

These changes will not introduce any compatibility problems with the
previous version of Distutils, and will also work with existing
third-party tools.

The plan is to integrate them for Python 2.7 and Python 3.2.


Acknowledgments
===============

Jim Fulton, Ian Bicking, Phillip Eby, and many people at PyCon and
Distutils-SIG.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: