From 58cbf93c364c69975144a37e124937ec2e1e9681 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tarek=20Ziad=C3=A9?= Date: Mon, 25 May 2009 10:22:46 +0000 Subject: [PATCH] updated PEP 376 to reflect the prototype API + more details --- pep-0376.txt | 300 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 193 insertions(+), 107 deletions(-) diff --git a/pep-0376.txt b/pep-0376.txt index 66f814556..101c99f03 100644 --- a/pep-0376.txt +++ b/pep-0376.txt @@ -18,18 +18,20 @@ This PEP proposes various enhancements for Distutils: - A new format for the .egg-info structure. - Some APIs to read the meta-data of a project +- Replace PEP 262 +- An uninstall feature Definitions =========== A **project** is a Python application composed of one or several files, which can -be Python modules, extensions or data. It is distributed using a `setup.py` script -with Distutils and/or Setuptools. The `setup.py` script indicates where each +be Python modules, extensions or data. It is distributed using a `setup.py` script +with Distutils and/or Setuptools. The `setup.py` script indicates where each elements should be installed. Once installed, the elements are located in various places in the system, like: -- in Python's site-packages (Python modules, Python modules organized into packages, +- in Python's site-packages (Python modules, Python modules organized into packages, Extensions, etc.) - in Python's `include` directory. - in Python's `bin` or `Script` directory. @@ -46,16 +48,16 @@ There are two problems right now in the way projects are installed in Python: How projects are installed -------------------------- -Right now, when a project is installed in Python, every elements its contains -is installed in various directories. +Right now, when a project is installed in Python, every elements its contains +is installed in various directories. 
The pure Python code for instance is installed in the `purelib` directory, which is located in the Python installation in `lib\python2.6\site-packages` -for example under unix-like systems or Mac OS X, and in `Lib/site-packages` +for example under unix-like systems or Mac OS X, and in `Lib/site-packages` under Windows. This is done with the Distutils `install` command, which calls various subcommands. -The `install_egg_info` subcommand is called during this process, in order to +The `install_egg_info` subcommand is called during this process, in order to create an `.egg-info` file in the `purelib` directory. For example, if the `zlib` project (which contains one package) is installed, @@ -67,17 +69,17 @@ two elements will be installed in `site-packages`:: Where `zlib` is a Python package, and `zlib-2.5.2-py2.4.egg-info` is a file containing the project metadata as described in PEP 314 [#pep314]_. -This file corresponds to the file called `PKG-INFO`, built by +This file corresponds to the file called `PKG-INFO`, built by the `sdist` command. -The problem is that many people use `easy_install` (setuptools [#setuptools]_) -or `pip` [#pip]_ to install their packages, and these third-party tools do not +The problem is that many people use `easy_install` (setuptools [#setuptools]_) +or `pip` [#pip]_ to install their packages, and these third-party tools do not install packages in the same way that Distutils does: -- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory, - and adds a `PKG-INFO` file inside this directory. The `.egg` directory +- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory, + and adds a `PKG-INFO` file inside this directory. The `.egg` directory contains in that case all the elements of the project that are supposed to - be installed in `site-packages`, and is placed in the `site-packages` + be installed in `site-packages`, and is placed in the `site-packages` directory. 
- `pip` creates an `.egg-info` directory inside the `site-packages` directory @@ -97,12 +99,12 @@ were installed. Then look over the `.pth` file to clean them if necessary. And the process differs, depending on the tools you have used to install the project, and if the project's `setup.py` uses Distutils or Setuptools. -Under some circumstances, you might not be able to know for sure that you +Under some circumstances, you might not be able to know for sure that you have removed everything, or that you didn't break another project by -removing a file that was shared among the two projects. +removing a file that was shared among several projects. -But there's common behavior: when you install a project, files are copied -in your system. And there's a way to keep track of theses files, so to remove +But there's common behavior: when you install a project, files are copied +in your system. And there's a way to keep track of theses files, so to remove them. What this PEP proposes @@ -110,23 +112,29 @@ What this PEP proposes To address those issues, this PEP proposes a few changes: -- a new `.egg-info` structure using a directory; -- a list of elements this directory holds; -- new functions in `pkgutil` to be able to query the information - of installed projects. +- a new `.egg-info` structure using a directory, based on the `EggFormats` + standard from `setuptools` [#eggformats]_. +- new APIs in `pkgutil` to be able to query the information of installed + projects. +- a de-facto replacement for PEP 262 +- an uninstall function in Distutils. + .egg-info becomes a directory ============================= The first change would be to make `.egg-info` a directory and let it -hold the `PKG-INFO` file built by the `write_pkg_file` method of +hold the `PKG-INFO` file built by the `write_pkg_file` method of the `Distribution` class in Distutils. 
-This change will not impact Python itself, because `egg-info` files are not -used anywhere yet in the standard library besides Distutils. +Notice that this change is based on the standard proposed by `EggFormats`. +You may refer to its documentation for more information. -Although it will impact the `setuptools` and `pip` projects, but given -the fact that they already work with a directory that contains a `PKG-INFO` +This change will not impact Python itself, because `egg-info` files are not +used anywhere yet in the standard library besides Distutils. + +Although it will impact the `setuptools` and `pip` projects, but given +the fact that they already work with a directory that contains a `PKG-INFO` file, the change will have no deep consequences. For example, if the `zlib` package is installed, the elements that @@ -136,32 +144,53 @@ will be installed in `site-packages` will become:: - zlib-2.5.2.egg-info/ PKG-INFO -The Python version will also be removed from the `.egg-info` directory -name. +The syntax of the egg-info directory name is as follows:: -Adding a RECORD in the .egg-info directory -========================================== + name + '-' + version + '.egg-info' + +The egg-info directory name is created using a new function called +``egg_info_dirname(name, version)`` added to ``pkgutil``. ``name`` is +converted to a standard distribution name: any runs of non-alphanumeric +characters are replaced with a single '-'. ``version`` is converted +to a standard version string. Spaces become dots, and all other +non-alphanumeric characters become dashes, with runs of multiple dashes +condensed to a single dash. Both attributes are then converted into their +filename-escaped form. Any '-' characters are currently replaced with '_'. 
+ +Examples:: + + >>> egg_info_dirname('zlib', '2.5.2') + 'zlib-2.5.2.egg-info' + + >>> egg_info_dirname('python-ldap', '2.5') + 'python_ldap-2.5.egg-info' + + >>> egg_info_dirname('python-ldap', '2.5 a---5') + 'python_ldap-2.5.a_5.egg-info' + +Adding a RECORD file in the .egg-info directory +=============================================== A `RECORD` file will be added inside the `.egg-info` directory at installation -time. The `RECORD` file will hold the list of installed files. These correspond -to the files listed by the `record` option of the `install` command, and will -always be generated. This will allow uninstallation, as explained later in this +time. The `RECORD` file will hold the list of installed files. These correspond +to the files listed by the `record` option of the `install` command, and will +always be generated. This will allow uninstallation, as explained later in this PEP. This RECORD file is inspired from PEP 262 FILES [#pep262]_. The RECORD format ----------------- -The `RECORD` file is composed of records, one line per installed file. -Each record is composed of three elements separated by a character: +The `RECORD` file is a CSV-like file, composed of records, one line per +installed file. Each record is composed of three elements. - the file's full **path** - if the installed file is located in the directory where the .egg-info - directory of the package is located, it will be a '/'-separated relative - path, no matter what is the target system. This makes this information + directory of the package is located, it will be a '/'-separated relative + path, no matter what is the target system. This makes this information cross-compatible and allows simple installation to be relocatable. - - if the installed file is located elsewhere in the system, a + - if the installed file is located elsewhere in the system, a '/'-separated absolute path is used. - the **MD5** hash of the file, encoded in hex. 
Notice that `pyc` and `pyo` @@ -169,6 +198,10 @@ Each record is composed of three elements separated by a character: - the file's size in bytes +The ``csv`` module with its default options will be used to generate this file, +so the field separator will be ",". Any "," characters found within a field +will be escaped automatically by ``csv``. + Example ------- @@ -181,116 +214,166 @@ Back to our `zlib` example, we will have:: And the RECORD file will contain:: - zlib/include/zconf.h b690274f621402dda63bf11ba5373bf2 9544 - zlib/include/zlib.h 9c4b84aff68aa55f2e9bf70481b94333 66188 - zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128 - zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486 - zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195 + zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544 + zlib/include/zlib.h,9c4b84aff68aa55f2e9bf70481b94333,66188 + zlib/lib/libz.a,e6d43fb94292411909404b07d0692d46,91128 + zlib/share/man/man3/zlib.3,785dc03452f0508ff0678fba2457e0ba,4486 + zlib-2.5.2.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195 zlib-2.5.2.egg-info/RECORD Notice that: - the `RECORD` file can't contain a hash of itself and is just mentioned here -- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file +- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file paths are relative to it. -New functions in pkgutil -======================== +New APIs in pkgutil +=================== -To use the `.egg-info` directory content, we need to add in the standard +To use the `.egg-info` directory content, we need to add in the standard library a set of APIs. The best place to put these APIs seems to be `pkgutil`. 
-The new functions added in the package are : +EggInfo class +------------- -- get_projects() -> iterator +A new class called ``EggInfo`` is created, which provides the following +attributes: - Provides an iterator that will return (name, path) tuples, where `name` - is the name of a registered project and `path` the path to its `egg-info` - directory. +- ``name``: The name of the project -- get_egg_info(project_name) -> path or None +- ``metadata``: A ``DistributionMetadata`` instance loaded with the project's + PKG-INFO file - Scans all elements in `sys.path` and looks for all directories ending with - `.egg-info`. Returns the directory path that contains a PKG-INFO that matches - `project_name` for the `name` metadata. Notice that there should be at most - one result. The first result founded will be returned. +The following methods are provided: - If the directory is not found, returns None. +- ``get_installed_files(local=False)`` -> iterator of (path, md5, size) - XXX The implementation of `get_egg_info` will focus on minimizing the I/O - accesses. + Iterates over the `RECORD` entries and returns a tuple ``(path, md5, size)`` + for each line. If ``local`` is ``True``, the path is transformed into a + local absolute path. Otherwise the raw value from `RECORD` is returned. -- get_metadata(project_name) -> DistributionMetadata or None +- ``uses(path)`` -> Boolean - Uses `get_egg_info` to get the `PKG-INFO` file, and returns a - `DistributionMetadata` instance that contains the metadata. + Returns ``True`` if ``path`` is listed in `RECORD`. ``path`` + can be a local absolute path or a relative '/'-separated path. -- get_files(project_name, local=False) -> iterator of (path, hash, size, - other_projects) +- ``owns(path)`` -> Boolean - Uses `get_egg_info` to get the `RECORD` file, and returns an iterator. + Returns ``True`` if ``path`` is owned by the project. + Owned means that the path is used only by this project and is not used + by any other project. 
``path`` can be a local absolute path or a relative + '/'-separated path. - Each returned element is a tuple `(path, hash, size, other_projects)` where - ``path``, ``hash``, ``size`` are the values found in the RECORD file. +- ``get_file(path, binary=False)`` -> file object - `path` is the raw value founded in the RECORD file. If `local` is - set to True, `path` will be translated to its real absolute path, using - the local path separator. + Returns a ``file`` instance for the file pointed to by ``path``. ``path`` can be + a local absolute path or a relative '/'-separated path. If ``binary`` is + ``True``, opens the file in binary mode. - `other_projects` is a tuple containing the name of the projects that are - also referring to this file in their own RECORD file (same path). +.egg-info functions +------------------- - If `other_projects` is empty, it means that the file is only referred by the - current project. In other words, it can be removed if the project is removed. +The new functions added to ``pkgutil`` are: -- get_egg_info_file(project_name, path, binary=False) -> file object or None +- ``get_egg_infos()`` -> iterator - Uses `get_egg_info` and gets any element inside the directory, - pointed by its relative path. `get_egg_info_file` will perform - an `os.path.join` on `get_egg_info(project_name)` and `path` to build the - whole path. + Provides an iterator that looks for ``.egg-info`` directories in ``sys.path`` + and returns ``EggInfo`` instances for each one of them. - `path` can be a '/'-separated path or can use the local separator. - `get_egg_info_file` will automatically convert it using the platform path - separator, to look for the file. +- ``get_egg_info(project_name)`` -> ``EggInfo`` or None - If `binary` is set True, the file will be opened using the binary mode. + Scans all elements in ``sys.path`` and looks for all directories ending with + ``.egg-info``. 
Returns an ``EggInfo`` corresponding to the ``.egg-info`` + directory that contains a PKG-INFO that matches `project_name` for the `name` + metadata. -Let's use it with our `zlib` example:: + Notice that there should be at most one result. The first result found + will be returned. If the directory is not found, returns None. + +- ``get_file_users(path)`` -> iterator of ``EggInfo`` instances. + + Iterates over all projects to find out which project uses ``path``. + ``path`` can be a local absolute path or a relative '/'-separated path. + +Cache functions +--------------- + +The functions from the previous section work with a global memory cache to +reduce the number of I/O accesses and speed up the lookups. + +The cache can be managed with these functions: + +- ``purge_cache``: removes all entries from cache. +- ``cache_enabled``: returns ``True`` if the cache is enabled. +- ``enable_cache``: enables the cache. +- ``disable_cache``: disables the cache. + +Example +------- + +Let's use some of the new APIs with our `zlib` example:: + + >>> from pkgutil import get_egg_info, get_file_users + >>> egg_info = get_egg_info('zlib') + >>> egg_info.name + 'zlib' + >>> egg_info.metadata.version + '2.5.2' - >>> from pkgutil import (get_egg_info, get_metadata, get_egg_info_file, - ... get_files) - >>> get_egg_info('zlib') '/opt/local/lib/python2.6/site-packages/zlib-2.5.2.egg-info' >>> metadata = get_metadata('zlib') >>> metadata.version '2.5.2' - >>> get_egg_info_file('zlib', 'PKG-INFO').read() - some - ... - files - >>> for path, hash, size, other_projects in get_files('zlib'): - ... print '%s %s %d %s' % (path, hash, size, ','.join(other_projects)) + + >>> for path, hash, size in egg_info.get_installed_files(): + ... print '%s %s %s' % (path, hash, size) ... 
zlib/include/zconf.h b690274f621402dda63bf11ba5373bf2 9544 zlib/include/zlib.h 9c4b84aff68aa55f2e9bf70481b94333 66188 - zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128 - zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486 - zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195 - zlib-2.5.2.egg-info/RECORD None None + zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128 + zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486 + zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195 + zlib-2.5.2.egg-info/RECORD None None + >>> egg_info.uses('zlib/include/zlib.h') + True + >>> egg_info.owns('zlib/include/zlib.h') + True + + >>> egg_info.get_file('zlib/include/zlib.h') + + +PEP 262 replacement +=================== + +In the past an attempt was made to create an installation database (see PEP 262 +[#pep262]_). + +Extract from PEP 262 Requirements: + + " We need a way to figure out what distributions, and what versions of + those distributions, are installed on a system..." + + +Since the APIs proposed in the current PEP provide everything needed to meet +this requirement, PEP 376 will replace PEP 262 and will become the official +`installation database` standard. + +The new version of PEP 345 (XXX work in progress) will extend the Metadata +standard and will fulfill the requirements described in PEP 262, like the +`REQUIRES` section. Adding an Uninstall function ============================ -Distutils provides a very basic way to install a project, which is running +Distutils already provides a very basic way to install a project, which is running the `install` command over the `setup.py` script of the distribution. -Distutils will provide a very basic ``uninstall`` function, that will be added -in ``distutils.util`` and will take the name of the project to uninstall as -its argument. 
``uninstall`` will use ``pkgutil.get_files`` and remove all +Distutils will provide a very basic ``uninstall`` function, that will be added +in ``distutils.util`` and will take the name of the project to uninstall as +its argument. ``uninstall`` will use the APIs described earlier and remove all unique files, as long as their hash didn't change. Then it will remove -directories where it removed the last elements. +empty directories left behind. ``uninstall`` will return a list of uninstalled files:: @@ -301,9 +384,9 @@ directories where it removed the last elements. If the project is not found, a ``DistutilsUninstallError`` will be raised. -To make it a reference API for third-party projects that wish to control -how `uninstall` works, a second callable argument can be used. It will be -called for each file that is removed. If the callable returns `True`, the +To make it a reference API for third-party projects that wish to control +how `uninstall` works, a second callable argument can be used. It will be +called for each file that is removed. If the callable returns `True`, the file will be removed. If it returns False, it will be left alone. Examples:: @@ -320,7 +403,7 @@ Examples:: ... >>> uninstall('zlib', _dry_run) -Of course, a third-party tool can use ``pkgutil.get_files``, to implement +Of course, a third-party tool can use ``pkgutil`` APIs to implement its own uninstall feature. Backward compatibility and roadmap @@ -349,6 +432,9 @@ References .. [#pip] http://pypi.python.org/pypi/pip +.. [#eggformats] + http://peak.telecommunity.com/DevCenter/EggFormats + Aknowledgments ==============
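
Note on the RECORD format introduced by this patch: the rule it describes (one CSV row per installed file holding a '/'-separated path, the MD5 hex digest and the size in bytes, written with the ``csv`` module's default options) can be illustrated with a minimal Python 3 sketch. The helper names ``record_rows`` and ``format_record`` are hypothetical and are not part of the proposed ``pkgutil`` or Distutils APIs. The sketch also assumes that every listed file lives under the directory that holds the ``.egg-info`` directory, so only relative paths are produced.

```python
import csv
import hashlib
import io
import os


def record_rows(base_dir, relative_paths, record_path):
    """Yield one RECORD row per installed file.

    Hypothetical helper, not part of the proposed API. Paths are
    '/'-separated and relative to base_dir (for example site-packages).
    """
    for rel in relative_paths:
        # Convert the portable '/'-separated path to a local path.
        full = os.path.join(base_dir, *rel.split("/"))
        with open(full, "rb") as f:
            data = f.read()
        yield (rel, hashlib.md5(data).hexdigest(), str(len(data)))
    # RECORD cannot contain a hash of itself, so it is listed by path only.
    yield (record_path,)


def format_record(rows):
    """Serialize rows using the csv module's default options."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Because the default ``csv`` dialect quotes any field that contains a comma, a path with "," in it survives a round trip through ``csv.reader`` unchanged.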