333 lines
12 KiB
Plaintext
333 lines
12 KiB
Plaintext
PEP: 262
|
||
Title: A Database of Installed Python Packages
|
||
Version: $Revision$
|
||
Author: A.M. Kuchling <amk@amk.ca>
|
||
Type: Standards Track
|
||
Created: 08-Jul-2001
|
||
Status: Draft
|
||
Post-History: 27-Mar-2002
|
||
|
||
Introduction
|
||
|
||
This PEP describes a format for a database of the Python software
|
||
installed on a system.
|
||
|
||
(In this document, the term "distribution" is used to mean a set
|
||
of code that's developed and distributed together. A "distribution"
|
||
is the same as a Red Hat or Debian package, but the term "package"
|
||
already has a meaning in Python terminology, meaning "a directory
|
||
with an __init__.py file in it.")
|
||
|
||
|
||
Requirements
|
||
|
||
We need a way to figure out what distributions, and what versions of
|
||
those distributions, are installed on a system. We want to provide
|
||
features similar to CPAN, APT, or RPM. Required use cases that
|
||
should be supported are:
|
||
|
||
* Is distribution X on a system?
|
||
* What version of distribution X is installed?
|
||
* Where can the new version of distribution X be found? (This can
|
||
be defined as either "a home page where the user can go and
|
||
find a download link", or "a place where a program can find
|
||
the newest version?" Both should probably be supported.)
|
||
* What files did distribution X put on my system?
|
||
* What distribution did the file x/y/z.py come from?
|
||
* Has anyone modified x/y/z.py locally?
|
||
* What other distributions does this software need?
|
||
* What Python modules does this distribution provide?
|
||
|
||
|
||
Database Location
|
||
|
||
The database lives in a bunch of files under
|
||
<prefix>/lib/python<version>/install-db/. This location will be
|
||
called INSTALLDB through the remainder of this PEP.
|
||
|
||
The structure of the database is deliberately kept simple; each
|
||
file in this directory or its subdirectories (if any) describes a
|
||
single distribution. Binary packagings of Python software such as
|
||
RPMs can then update Python's database by just installing the
|
||
corresponding file into the INSTALLDB directory.
|
||
|
||
The rationale for scanning subdirectories is that we can move to a
|
||
directory-based indexing scheme if the database directory contains
|
||
too many entries. For example, this would let us transparently
|
||
switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some
|
||
similar hashing scheme.
|
||
|
||
|
||
Database Contents
|
||
|
||
Each file in INSTALLDB or its subdirectories describes a single
|
||
distribution, and has the following contents:
|
||
|
||
An initial line listing the sections in this file, separated
|
||
by whitespace. Currently this will always be 'PKG-INFO FILES
|
||
REQUIRES PROVIDES'. This is for future-proofing; if we add a
|
||
new section, for example to list documentation files, then
|
||
we'd add a DOCS section and list it in the contents. Sections
|
||
are always separated by blank lines.
|
||
|
||
A distribution that uses the Distutils for installation should
|
||
automatically update the database. Distributions that roll their
|
||
own installation will have to use the database's API to to
|
||
manually add or update their own entry. System package managers
|
||
such as RPM or pkgadd can just create the new file in the
|
||
INSTALLDB directory.
|
||
|
||
Each section of the file is used for a different purpose.
|
||
|
||
PKG-INFO section
|
||
|
||
An initial set of RFC-822 headers containing the distribution
|
||
information for a file, as described in PEP 241, "Metadata for
|
||
Python Software Packages".
|
||
|
||
FILES section
|
||
|
||
An entry for each file installed by the
|
||
distribution. Generated files such as .pyc and .pyo files are
|
||
on this list as well as the original .py files installed by a
|
||
distribution; their checksums won't be stored or checked,
|
||
though.
|
||
|
||
Each file's entry is a single tab-delimited line that contains
|
||
the following fields:
|
||
|
||
* The file's full path, as installed on the system.
|
||
|
||
* The file's size
|
||
|
||
* The file's permissions. On Windows, this field will always be
|
||
'unknown'
|
||
|
||
* The owner and group of the file, separated by a tab.
|
||
On Windows, these fields will both be 'unknown'.
|
||
|
||
* A SHA1 digest of the file, encoded in hex. For generated files
|
||
such as *.pyc files, this field must contain the string "-",
|
||
which indicates that the file's checksum should not be verified.
|
||
|
||
|
||
REQUIRES section
|
||
|
||
This section is a list of strings giving the services required for
|
||
this module distribution to run properly. This list includes the
|
||
distribution name ("python-stdlib") and module names ("rfc822",
|
||
"htmllib", "email", "email.Charset"). It will be specified
|
||
by an extra 'requires' argument to the distutils.core.setup()
|
||
function. For example:
|
||
|
||
setup(..., requires=['xml.utils.iso8601',
|
||
|
||
Eventually there may be automated tools that look through all of
|
||
the code and produce a list of requirements, but it's unlikely
|
||
that these tools can handle all possible cases; a manual
|
||
way to specify requirements will always be necessary.
|
||
|
||
|
||
PROVIDES section
|
||
|
||
This section is a list of strings giving the services provided by
|
||
an installed distribution. This list includes the distribution name
|
||
("python-stdlib") and module names ("rfc822", "htmllib", "email",
|
||
"email.Charset").
|
||
|
||
XXX should files be listed? e.g. $PREFIX/lib/color-table.txt,
|
||
to pick up data files, required scripts, etc.
|
||
|
||
Eventually there may be an option to let module developers add
|
||
their own strings to this section. For example, you might add
|
||
"XML parser" to this section, and other module distributions could
|
||
then list "XML parser" as one of their dependencies to indicate
|
||
that multiple different XML parsers can be used. For now this
|
||
ability isn't supported because it raises too many issues: do we
|
||
need a central registry of legal strings, or just let people put
|
||
whatever they like? Etc., etc...
|
||
|
||
|
||
API Description
|
||
|
||
There's a single fundamental class, InstallationDatabase. The
|
||
code for it lives in distutils/install_db.py. (XXX any
|
||
suggestions for alternate locations in the standard library, or an
|
||
alternate module name?)
|
||
|
||
The InstallationDatabase returns instances of Distribution that contain
|
||
all the information about an installed distribution.
|
||
|
||
XXX Several of the fields in Distribution are duplicates of ones in
|
||
distutils.dist.Distribution. Probably they should be factored out
|
||
into the Distribution class proposed here, but can this be done in a
|
||
backward-compatible way?
|
||
|
||
InstallationDatabase has the following interface:
|
||
|
||
class InstallationDatabase:
|
||
def __init__ (self, path=None):
|
||
"""InstallationDatabase(path:string)
|
||
Read the installation database rooted at the specified path.
|
||
If path is None, INSTALLDB is used as the default.
|
||
"""
|
||
|
||
def get_distribution (self, distribution_name):
|
||
"""get_distribution(distribution_name:string) : Distribution
|
||
Get the object corresponding to a single distribution.
|
||
"""
|
||
|
||
def list_distributions (self):
|
||
"""list_distributions() : [Distribution]
|
||
Return a list of all distributions installed on the system,
|
||
enumerated in no particular order.
|
||
"""
|
||
|
||
def find_distribution (self, path):
|
||
"""find_file(path:string) : Distribution
|
||
Search and return the distribution containing the file 'path'.
|
||
Returns None if the file doesn't belong to any distribution
|
||
that the InstallationDatabase knows about.
|
||
XXX should this work for directories?
|
||
"""
|
||
|
||
class Distribution:
|
||
"""Instance attributes:
|
||
name : string
|
||
Distribution name
|
||
files : {string : (size:int, perms:int, owner:string, group:string,
|
||
digest:string)}
|
||
Dictionary mapping the path of a file installed by this distribution
|
||
to information about the file.
|
||
|
||
The following fields all come from PEP 241.
|
||
|
||
version : distutils.version.Version
|
||
Version of this distribution
|
||
platform : [string]
|
||
summary : string
|
||
description : string
|
||
keywords : string
|
||
home_page : string
|
||
author : string
|
||
author_email : string
|
||
license : string
|
||
"""
|
||
|
||
def add_file (self, path):
|
||
"""add_file(path:string):None
|
||
Record the size, ownership, &c., information for an installed file.
|
||
XXX as written, this would stat() the file. Should the size/perms/
|
||
checksum all be provided as parameters to this method instead?
|
||
"""
|
||
|
||
def has_file (self, path):
|
||
"""has_file(path:string) : Boolean
|
||
Returns true if the specified path belongs to a file in this
|
||
distribution.
|
||
"""
|
||
|
||
def check_file (self, path):
|
||
"""check_file(path:string) : Boolean
|
||
Checks whether the file's size, checksum, and ownership match,
|
||
returning true if they do.
|
||
"""
|
||
|
||
|
||
|
||
Deliverables
|
||
|
||
A description of the database API, to be added to this PEP.
|
||
|
||
Patches to the Distutils that 1) implement an InstallationDatabase
|
||
class, 2) Update the database when a new distribution is installed. 3)
|
||
add a simple package management tool, features to be added to this
|
||
PEP. (Or should that be a separate PEP?) See [2] for the current
|
||
patch.
|
||
|
||
|
||
Open Issues
|
||
|
||
PJE suggests the installation database "be potentially present on
|
||
every directory in sys.path, with the contents merged in sys.path
|
||
order. This would allow home-directory or other
|
||
alternate-location installs to work, and ease the process of a
|
||
distutils install command writing the file." Nice feature: it does
|
||
mean that package manager tools can take into account Python
|
||
packages that a user has privately installed.
|
||
|
||
AMK wonders: what does setup.py do if it's told to install
|
||
packages to a directory not on sys.path? Does it write an
|
||
install-db directory to the directory it's told to write to, or
|
||
does it do nothing?
|
||
|
||
Should the package-database file itself be included in the files
|
||
list? (PJE would think yes, but of course it can't contain its
|
||
own checksum. AMK can't think of a use case where including the
|
||
DB file matters.)
|
||
|
||
PJE wonders about writing the package DB file
|
||
*first*, before installing any other files, so that failed partial
|
||
installations can both be backed out, and recognized as broken.
|
||
This PEP may have to specify some algorithm for recognizing this
|
||
situation.
|
||
|
||
Should we guarantee the format of installation databases remains
|
||
compatible across Python versions, or is it subject to arbitrary
|
||
change? Probably we need to guarantee compatibility.
|
||
|
||
|
||
Rejected Suggestions
|
||
|
||
Instead of using one text file per distribution, one large text
|
||
file or an anydbm file could be used. This has been rejected for
|
||
a few reasons. First, performance is probably not an extremely
|
||
pressing concern as the database is only used when installing or
|
||
removing software, a relatively infrequent task. Scalability also
|
||
likely isn't a problem, as people may have hundreds of Python
|
||
packages installed, but thousands or tens of thousands seems
|
||
unlikely. Finally, individual text files are compatible with
|
||
installers such as RPM or DPKG because a binary packager can just
|
||
drop the new database file into the database directory. If one
|
||
large text file or a binary file were used, the Python database
|
||
would then have to be updated by running a postinstall script.
|
||
|
||
On Windows, the permissions and owner/group of a file aren't
|
||
stored. Windows does in fact support ownership and access
|
||
permissions, but reading and setting them requires the win32all
|
||
extensions, and they aren't present in the basic Python installer
|
||
for Windows.
|
||
|
||
|
||
References
|
||
|
||
[1] Michael Muller's patch (posted to the Distutils-SIG around 28
|
||
Dec 1999) generates a list of installed files.
|
||
|
||
[2] A patch to implement this PEP will be tracked as
|
||
patch #562100 on SourceForge.
|
||
http://www.python.org/sf/562100 .
|
||
Code implementing the installation database is currently in
|
||
Python CVS in the nondist/sandbox/pep262 directory.
|
||
|
||
|
||
Acknowledgements
|
||
|
||
Ideas for this PEP originally came from postings by Greg Ward,
|
||
Fred L. Drake Jr., Thomas Heller, Mats Wichmann, Phillip J. Eby,
|
||
and others.
|
||
|
||
Many changes and rewrites to this document were suggested by the
|
||
readers of the Distutils SIG.
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|