updated PEP 376 to reflect the prototype API + more details

Tarek Ziadé 2009-05-25 10:22:46 +00:00
parent 51b7df09e2
commit 58cbf93c36
1 changed files with 193 additions and 107 deletions

@ -18,18 +18,20 @@ This PEP proposes various enhancements for Distutils:
- A new format for the .egg-info structure.
- Some APIs to read the meta-data of a project
- Replace PEP 262
- An uninstall feature

Definitions
===========

A **project** is a Python application composed of one or several files, which can
be Python modules, extensions or data. It is distributed using a `setup.py` script
with Distutils and/or Setuptools. The `setup.py` script indicates where each
element should be installed.

Once installed, the elements are located in various places in the system, like:

- in Python's site-packages (Python modules, Python modules organized into packages,
  Extensions, etc.)
- in Python's `include` directory.
- in Python's `bin` or `Scripts` directory.
@ -46,16 +48,16 @@ There are two problems right now in the way projects are installed in Python:
How projects are installed
--------------------------

Right now, when a project is installed in Python, every element it contains
is installed in various directories.

The pure Python code, for instance, is installed in the `purelib` directory,
which is located in the Python installation at `lib/python2.6/site-packages`
under Unix-like systems or Mac OS X, and at `Lib/site-packages`
under Windows. This is done with the Distutils `install` command, which calls
various subcommands.

The `install_egg_info` subcommand is called during this process, in order to
create an `.egg-info` file in the `purelib` directory.

For example, if the `zlib` project (which contains one package) is installed,
@ -67,17 +69,17 @@ two elements will be installed in `site-packages`::
Where `zlib` is a Python package, and `zlib-2.5.2-py2.4.egg-info` is
a file containing the project metadata as described in PEP 314 [#pep314]_.
This file corresponds to the file called `PKG-INFO`, built by
the `sdist` command.

The problem is that many people use `easy_install` (setuptools [#setuptools]_)
or `pip` [#pip]_ to install their packages, and these third-party tools do not
install packages in the same way that Distutils does:

- `easy_install` creates an `EGG-INFO` directory inside an `.egg` directory,
  and adds a `PKG-INFO` file inside this directory. In that case, the `.egg`
  directory contains all the elements of the project that are supposed to
  be installed in `site-packages`, and is placed in the `site-packages`
  directory.

- `pip` creates an `.egg-info` directory inside the `site-packages` directory
@ -97,12 +99,12 @@ were installed. Then look over the `.pth` file to clean them if necessary.
And the process differs depending on the tool used to install the
project, and on whether the project's `setup.py` uses Distutils or Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another project by
removing a file that was shared among several projects.

But there is a common behavior: when you install a project, files are copied
to your system, and it is possible to keep track of these files in order to
remove them later.
What this PEP proposes
@ -110,23 +112,29 @@ What this PEP proposes
To address those issues, this PEP proposes a few changes:

- a new `.egg-info` structure using a directory, based on the `EggFormats`
  standard from `setuptools` [#eggformats]_;
- new APIs in `pkgutil` to be able to query the information of installed
  projects;
- a de-facto replacement for PEP 262;
- an uninstall function in Distutils.

.egg-info becomes a directory
=============================

The first change would be to make `.egg-info` a directory and let it
hold the `PKG-INFO` file built by the `write_pkg_file` method of
the `Distribution` class in Distutils.

Notice that this change is based on the standard proposed by `EggFormats`.
You may refer to its documentation for more information.

This change will not impact Python itself, because `egg-info` files are not
used anywhere yet in the standard library besides Distutils.

It will impact the `setuptools` and `pip` projects, but given that they
already work with a directory that contains a `PKG-INFO` file, the change
will have no deep consequences.

For example, if the `zlib` package is installed, the elements that
@ -136,32 +144,53 @@ will be installed in `site-packages` will become::
    - zlib-2.5.2.egg-info/
        PKG-INFO

The Python version will also be removed from the `.egg-info` directory
name.

The syntax of the egg-info directory name is as follows::

    name + '-' + version + '.egg-info'

The egg-info directory name is created using a new function called
``egg_info_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name: any run of non-alphanumeric
characters (other than dots) is replaced with a single '-'. ``version`` is
converted to a standard version string: spaces become dots, and all other
non-alphanumeric characters (again except dots) become dashes, with runs of
multiple dashes condensed to a single dash. Both attributes are then converted
into their filename-escaped form, in which any '-' characters are currently
replaced with '_'.

Examples::

    >>> egg_info_dirname('zlib', '2.5.2')
    'zlib-2.5.2.egg-info'
    >>> egg_info_dirname('python-ldap', '2.5')
    'python_ldap-2.5.egg-info'
    >>> egg_info_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.egg-info'
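
The naming rules above can be illustrated with a short Python sketch. It assumes the regex-based behaviour of the `safe_name`, `safe_version` and `to_filename` helpers from setuptools' `pkg_resources` (dots are kept, which the examples require); the private helper names used here are hypothetical and are not part of the proposed `pkgutil` API.

```python
import re


def _safe_name(name):
    # Hypothetical helper: collapse each run of characters that are not
    # letters, digits or dots into a single '-'.
    return re.sub(r'[^A-Za-z0-9.]+', '-', name)


def _safe_version(version):
    # Hypothetical helper: spaces become dots, then each run of other
    # characters that are not letters, digits or dots becomes one '-'.
    version = version.replace(' ', '.')
    return re.sub(r'[^A-Za-z0-9.]+', '-', version)


def _to_filename(value):
    # Filename-escaped form: every '-' is replaced with '_'.
    return value.replace('-', '_')


def egg_info_dirname(name, version):
    """Sketch of the proposed pkgutil.egg_info_dirname(name, version)."""
    return '%s-%s.egg-info' % (_to_filename(_safe_name(name)),
                               _to_filename(_safe_version(version)))


print(egg_info_dirname('python-ldap', '2.5 a---5'))  # python_ldap-2.5.a_5.egg-info
```

Running the three examples above through this sketch gives the same directory names.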

Adding a RECORD file in the .egg-info directory
===============================================

A `RECORD` file will be added inside the `.egg-info` directory at installation
time. The `RECORD` file will hold the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and will
always be generated. This will allow uninstallation, as explained later in this
PEP. This `RECORD` file is inspired by the FILES file of PEP 262 [#pep262]_.

The RECORD format
-----------------

The `RECORD` file is a CSV-like file, composed of records, one line per
installed file. Each record is composed of three elements:

- the file's full **path**

  - if the installed file is located in the directory where the `.egg-info`
    directory of the package is located, it will be a '/'-separated relative
    path, whatever the target system is. This makes this information
    cross-compatible and allows simple installations to be relocatable.

  - if the installed file is located elsewhere in the system, a
    '/'-separated absolute path is used.

- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
@ -169,6 +198,10 @@ Each record is composed of three elements separated by a <tab> character:
- the file's size in bytes

The ``csv`` module with its default options will be used to generate this file,
so the field separator will be ",". Any field containing a "," character will
be quoted automatically by ``csv``.
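
As an illustration of the format, the following sketch builds one `RECORD` row per file and writes the rows with the default ``csv`` dialect. The function names ``record_row`` and ``write_record`` are hypothetical, and ``base_dir`` is assumed to be the directory that contains the `.egg-info` directory (`site-packages` in the examples).

```python
import csv
import hashlib
import io
import os


def record_row(path, base_dir):
    """Return a (path, md5, size) row for one installed file.

    Files under ``base_dir`` get a '/'-separated path relative to it;
    other files get a '/'-separated absolute path.
    """
    full = os.path.abspath(path)
    base = os.path.abspath(base_dir)
    if full.startswith(base + os.sep):
        stored = os.path.relpath(full, base)
    else:
        stored = full
    with open(full, 'rb') as f:
        data = f.read()
    return stored.replace(os.sep, '/'), hashlib.md5(data).hexdigest(), len(data)


def write_record(rows, stream):
    # Default csv dialect: ',' separates fields, and a field that
    # contains ',' is wrapped in double quotes.
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow(row)


# Usage: write a row to an in-memory buffer instead of a real RECORD file.
buf = io.StringIO()
write_record([('zlib/include/zconf.h', 'b690274f621402dda63bf11ba5373bf2', 9544)], buf)
print(buf.getvalue())  # zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544
```

With the default dialect, a path containing "," is written between double quotes, so ``csv.reader`` can still split the record correctly.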
Example
-------
@ -181,116 +214,166 @@ Back to our `zlib` example, we will have::
And the `RECORD` file will contain::

    zlib/include/zconf.h,b690274f621402dda63bf11ba5373bf2,9544
    zlib/include/zlib.h,9c4b84aff68aa55f2e9bf70481b94333,66188
    zlib/lib/libz.a,e6d43fb94292411909404b07d0692d46,91128
    zlib/share/man/man3/zlib.3,785dc03452f0508ff0678fba2457e0ba,4486
    zlib-2.5.2.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
    zlib-2.5.2.egg-info/RECORD

Notice that:

- the `RECORD` file can't contain a hash of itself and is just mentioned here;
- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file
  paths are relative to it.

New APIs in pkgutil
===================

To use the `.egg-info` directory content, we need to add a set of APIs to the
standard library. The best place to put these APIs seems to be `pkgutil`.

EggInfo class
-------------

A new class called ``EggInfo`` is created, which provides the following
attributes:

- ``name``: The name of the project.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the project's
  PKG-INFO file.

The following methods are provided:

- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)

  Iterates over the `RECORD` entries and returns a tuple ``(path, md5, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``owns(path)`` -> Boolean

  Returns ``True`` if ``path`` is owned by the project.
  Owned means that the path is used only by this project and is not used
  by any other project. ``path`` can be a local absolute path or a relative
  '/'-separated path.

- ``get_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``. ``path`` can
  be a local absolute path or a relative '/'-separated path. If ``binary`` is
  ``True``, opens the file in binary mode.
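
To make the method descriptions above concrete, here is a minimal sketch of an ``EggInfo``-like class. It is not the prototype implementation: the class name ``EggInfoSketch`` is hypothetical, ``metadata`` and ``owns`` are omitted (``owns`` needs the other installed projects), and it assumes relative `RECORD` paths are resolved against the directory containing the `.egg-info` directory, as described in the RECORD format section.

```python
import csv
import os


class EggInfoSketch(object):
    """Hypothetical, minimal stand-in for the proposed ``EggInfo`` class.

    ``path`` is the ``<name>-<version>.egg-info`` directory. Relative
    RECORD paths are resolved against its parent directory.
    """

    def __init__(self, path):
        self.path = os.path.abspath(path)
        self._base = os.path.dirname(self.path)

    def _to_local(self, raw):
        # Turn a '/'-separated RECORD value into a local absolute path.
        local = raw.replace('/', os.sep)
        if os.path.isabs(local):
            return local
        return os.path.join(self._base, local)

    def get_installed_files(self, local=False):
        # Yield (path, md5, size); missing fields (e.g. the RECORD
        # entry itself) are reported as None.
        with open(os.path.join(self.path, 'RECORD'), newline='') as f:
            for row in csv.reader(f):
                if not row:
                    continue
                path, md5, size = (row + ['', ''])[:3]
                if local:
                    path = self._to_local(path)
                yield path, (md5 or None), (int(size) if size else None)

    def uses(self, path):
        # Accept a local absolute path or a '/'-separated RECORD path.
        if os.path.isabs(path):
            target = os.path.normcase(os.path.normpath(path))
            return any(os.path.normcase(os.path.normpath(p)) == target
                       for p, _, _ in self.get_installed_files(local=True))
        return any(p == path for p, _, _ in self.get_installed_files())

    def get_file(self, path, binary=False):
        # Relative paths are resolved the same way as RECORD values.
        if not os.path.isabs(path):
            path = self._to_local(path)
        return open(path, 'rb' if binary else 'r')
```

Keeping the raw `RECORD` values by default, and converting only when ``local=True``, lets callers compare entries across platforms without touching the file system.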

.egg-info functions
-------------------

The following functions are added to ``pkgutil``:

- ``get_egg_infos()`` -> iterator

  Provides an iterator that looks for ``.egg-info`` directories in ``sys.path``
  and returns ``EggInfo`` instances for each one of them.

- ``get_egg_info(project_name)`` -> ``EggInfo`` instance or None

  Scans all elements in ``sys.path`` and looks for all directories ending with
  ``.egg-info``. Returns an ``EggInfo`` corresponding to the ``.egg-info``
  directory that contains a PKG-INFO that matches `project_name` for the `name`
  metadata.

  Notice that there should be at most one result. The first result found
  will be returned. If the directory is not found, returns None.

- ``get_file_users(path)`` -> iterator of ``EggInfo`` instances.

  Iterates over all projects to find out which project uses ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.
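
The lookups described above could be implemented roughly as follows. This sketch is an assumption, not the prototype: it returns `.egg-info` directory paths rather than ``EggInfo`` instances, takes an extra hypothetical ``paths`` argument (defaulting to ``sys.path``) so it can be tested in isolation, does no caching, and reads the ``Name`` field of `PKG-INFO` with the standard ``email`` parser, since that file uses RFC 822-style headers. Its ``get_file_users`` only accepts '/'-separated `RECORD` values.

```python
import csv
import os
import sys
from email.parser import Parser


def _egg_info_dirs(paths=None):
    # Yield every '*.egg-info' directory found directly inside the
    # entries of sys.path (or of ``paths``, if given).
    for entry in (sys.path if paths is None else paths):
        if not os.path.isdir(entry):
            continue
        for child in sorted(os.listdir(entry)):
            full = os.path.join(entry, child)
            if child.endswith('.egg-info') and os.path.isdir(full):
                yield full


def _project_name(egg_info_dir):
    # PKG-INFO uses RFC 822-style headers; read its 'Name' field.
    pkg_info = os.path.join(egg_info_dir, 'PKG-INFO')
    if not os.path.isfile(pkg_info):
        return None
    with open(pkg_info) as f:
        return Parser().parse(f, headersonly=True)['Name']


def get_egg_info(project_name, paths=None):
    # The first matching directory wins.
    for directory in _egg_info_dirs(paths):
        if _project_name(directory) == project_name:
            return directory
    return None


def get_file_users(path, paths=None):
    # Yield the .egg-info directories whose RECORD lists ``path``.
    for directory in _egg_info_dirs(paths):
        record = os.path.join(directory, 'RECORD')
        if not os.path.isfile(record):
            continue
        with open(record, newline='') as f:
            if any(row and row[0] == path for row in csv.reader(f)):
                yield directory
```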

Cache functions
---------------

The functions from the previous section work with a global memory cache to
reduce the number of I/O accesses and speed up the lookups.

The cache can be managed with these functions:

- ``purge_cache``: removes all entries from cache.
- ``cache_enabled``: returns ``True`` if the cache is enabled.
- ``enable_cache``: enables the cache.
- ``disable_cache``: disables the cache.
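
A global cache with the four management functions listed above might look like the sketch below. The ``cached_lookup`` helper and the dictionary-based storage are hypothetical; the text does not specify how the lookups use the cache, nor whether disabling it also purges it (this sketch does not).

```python
_cache = {}
_enabled = True


def purge_cache():
    """Remove all entries from the cache."""
    _cache.clear()


def cache_enabled():
    """Return True if the cache is enabled."""
    return _enabled


def enable_cache():
    """Enable the cache."""
    global _enabled
    _enabled = True


def disable_cache():
    """Disable the cache (existing entries are kept until purged)."""
    global _enabled
    _enabled = False


def cached_lookup(key, compute):
    """Hypothetical helper: memoize ``compute()`` under ``key``."""
    if not _enabled:
        return compute()
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]
```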

Example
-------

Let's use some of the new APIs with our `zlib` example::

    >>> from pkgutil import get_egg_info, get_file_users
    >>> egg_info = get_egg_info('zlib')
    >>> egg_info.name
    'zlib'
    >>> egg_info.metadata.version
    '2.5.2'
    >>> for path, md5, size in egg_info.get_installed_files():
    ...     print '%s %s %s' % (path, md5, size)
    ...
    zlib/include/zconf.h b690274f621402dda63bf11ba5373bf2 9544
    zlib/include/zlib.h 9c4b84aff68aa55f2e9bf70481b94333 66188
    zlib/lib/libz.a e6d43fb94292411909404b07d0692d46 91128
    zlib/share/man/man3/zlib.3 785dc03452f0508ff0678fba2457e0ba 4486
    zlib-2.5.2.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
    zlib-2.5.2.egg-info/RECORD None None
    >>> egg_info.uses('zlib/include/zlib.h')
    True
    >>> egg_info.owns('zlib/include/zlib.h')
    True
    >>> egg_info.get_file('zlib/include/zlib.h')
    <open file at ...>

PEP 262 replacement
===================

In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).

Extract from PEP 262 Requirements:

    "We need a way to figure out what distributions, and what versions of
    those distributions, are installed on a system..."

Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 will replace PEP 262 and will become the official
`installation database` standard.

The new version of PEP 345 (XXX work in progress) will extend the Metadata
standard and will fulfill the requirements described in PEP 262, like the
`REQUIRES` section.

Adding an Uninstall function
============================

Distutils already provides a very basic way to install a project, which is running
the `install` command over the `setup.py` script of the distribution.

Distutils will provide a very basic ``uninstall`` function, that will be added
in ``distutils.util`` and will take the name of the project to uninstall as
its argument. ``uninstall`` will use the APIs described earlier and remove all
unique files (that is, files not used by any other project), as long as their
hash didn't change. Then it will remove empty directories left behind.

``uninstall`` will return a list of uninstalled files::
@ -301,9 +384,9 @@ directories where it removed the last elements.
If the project is not found, a ``DistutilsUninstallError`` will be raised.

To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It will be
called for each file that is about to be removed. If the callable returns
`True`, the file will be removed. If it returns `False`, it will be left alone.
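
The removal step of ``uninstall`` can be sketched as follows. The function name ``remove_recorded_files`` is hypothetical, and the sketch assumes the caller has already selected the `RECORD` entries the project owns (see ``EggInfo.owns``). Entries without a hash, such as `RECORD` itself, are skipped here; a complete implementation would also remove the `.egg-info` directory.

```python
import hashlib
import os


def _md5(path):
    # Hex MD5 of a file's current content, as stored in RECORD.
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


def remove_recorded_files(entries, base_dir, callback=None):
    """Hypothetical sketch of the removal step of ``uninstall``.

    ``entries`` are (path, md5, size) tuples for files the project owns,
    with '/'-separated paths relative to ``base_dir``. A file is removed
    only if its MD5 is unchanged and ``callback`` (when given) returns
    True for its local path. Directories left empty are then removed.
    Returns the list of removed files.
    """
    removed = []
    parents = set()
    for rel_path, md5, _size in entries:
        local = os.path.join(base_dir, rel_path.replace('/', os.sep))
        if md5 is None or not os.path.isfile(local):
            continue
        if _md5(local) != md5:
            continue  # modified since installation: leave it alone
        if callback is not None and not callback(local):
            continue
        os.remove(local)
        removed.append(local)
        parents.add(os.path.dirname(os.path.abspath(local)))
    # Remove now-empty directories, deepest first, without leaving base_dir.
    base = os.path.abspath(base_dir)
    for directory in sorted(parents, key=len, reverse=True):
        while directory != base and directory.startswith(base + os.sep):
            if not os.path.isdir(directory) or os.listdir(directory):
                break
            os.rmdir(directory)
            directory = os.path.dirname(directory)
    return removed
```

A callable that always returns `False` turns this into a dry run: it sees every candidate file, and nothing is deleted.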
Examples::
@ -320,7 +403,7 @@ Examples::
    ...
    >>> uninstall('zlib', _dry_run)

Of course, a third-party tool can use ``pkgutil`` APIs to implement
its own uninstall feature.
Backward compatibility and roadmap
@ -349,6 +432,9 @@ References
.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

Acknowledgments
===============