PEP: 262
Title: A Database of Installed Python Packages
Version: $Revision$
Author: A.M. Kuchling <amk@amk.ca>
Type: Standards Track
Created: 08-Jul-2001
Status: Draft
Post-History: 27-Mar-2002
Introduction
This PEP describes a format for a database of Python packages
installed on a system.
Requirements
We need a way to figure out what packages, and what versions of
those packages, are installed on a system. We want to provide
features similar to CPAN, APT, or RPM. Required use cases that
should be supported are:
* Is package X on a system?
* What version of package X is installed?
* Where can the new version of package X be found? (This can
be defined as either "a home page where the user can go and
find a download link", or "a place where a program can find
the newest version". Both should probably be supported.)
* What files did package X put on my system?
* What package did the file x/y/z.py come from?
* Has anyone modified x/y/z.py locally?
* What other packages does this package need?
* What Python modules does this package provide?
Database Location
The database is stored as a set of files under
<prefix>/lib/python<version>/install/. This location will be
called INSTALLDB through the remainder of this PEP.
The structure of the database is deliberately kept simple; each
file in this directory or its subdirectories (if any) describes a
single package. Binary packages of Python software such as RPMs
can then update Python's database by just installing the
corresponding file into the INSTALLDB directory.
The rationale for scanning subdirectories is that we can move to a
directory-based indexing scheme if the package directory contains
too many entries. For example, this would let us transparently
switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some
similar hashing scheme.
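As a rough illustration of the kind of hashing scheme mentioned
above, the following sketch maps a package name to a nested path.
The function name entry_path and its depth parameter are
hypothetical; this PEP does not define them, and a real scheme
might choose prefixes differently.

```python
import posixpath


def entry_path(installdb, package_name, depth=0):
    """Return a hypothetical location for a package's database file.

    With depth=0 the file sits directly in INSTALLDB; with depth=2,
    'Numeric' is placed under 'N/Nu/'.  Both this function and the
    'depth' parameter are illustrative assumptions, not part of the PEP.
    """
    # Build prefixes of increasing length: 'N', 'Nu', ...
    prefixes = [package_name[:i] for i in range(1, depth + 1)]
    return posixpath.join(installdb, *prefixes, package_name)
```

Because a reader of the database scans subdirectories anyway, it
would not need to know which depth was used when the file was
written.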
Database Contents
Each file in INSTALLDB or its subdirectories describes a single
package, and has the following contents:
An initial line listing the sections in this file, separated
by whitespace. Currently this will always be 'PKG-INFO FILES
REQUIRES PROVIDES'. This is for future-proofing; if we add a
new section, for example to list documentation files, then
we'd add a DOCS section and list it in the contents. Sections
are always separated by blank lines.
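A minimal sketch of how a reader might split such a file into its
sections is shown below. It assumes that a blank line follows the
initial line, that blank lines occur only between sections, and
that every listed section is non-empty; the PEP does not spell out
these details, and the function name split_sections is
hypothetical.

```python
def split_sections(text):
    """Split the text of one database file into named sections.

    Assumptions (not stated by the PEP): the first line lists the
    section names, blank lines appear only between sections, and no
    section is empty.
    """
    lines = text.splitlines()
    names = lines[0].split()

    chunks = []
    current = []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the section collected so far.
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    if len(chunks) != len(names):
        raise ValueError("expected %d sections, found %d"
                         % (len(names), len(chunks)))
    # Pair each name from the initial line with its block of lines.
    return dict(zip(names, chunks))
```

Reading the section names from the initial line, rather than
hard-coding them, is what would let an added section such as DOCS
be picked up without changing the reader.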
A package that uses the Distutils for installation should
automatically update the database. Packages that roll their own
installation will have to use the database's API to manually
add or update their own entry. System package managers such as
RPM or pkgadd can just create the new 'package name' file in the
INSTALLDB directory.
Each section of the file is used for a different purpose.
PKG-INFO section
An initial set of RFC-822 headers containing the package
information for a file, as described in PEP 241, "Metadata for
Python Software Packages".
A blank line indicating the end of the PKG-INFO section.
FILES section
An entry for each file installed by the package. Generated files
such as .pyc and .pyo files are on this list as well as the original
.py files installed by a package; their checksums won't be stored or
checked, though.
Each file's entry is a single tab-delimited line that contains
the following fields:
* The file's full path, as installed on the system.
* The file's size.
* The file's permissions. On Windows, this field will always be
'unknown'.
* The owner and group of the file, separated by a tab.
On Windows, these fields will both be 'unknown'.
* A SHA1 digest of the file, encoded in hex.
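The field list above could be read back with something like the
following sketch. The names FileEntry and parse_file_entry are
hypothetical. Permissions are kept as a string because the field
may be 'unknown', and how an unstored checksum (for .pyc and .pyo
files) is written is not specified here, so the digest is passed
through unchanged.

```python
from collections import namedtuple

# Hypothetical record type mirroring the six tab-delimited fields.
FileEntry = namedtuple("FileEntry", "path size perms owner group digest")


def parse_file_entry(line):
    """Parse one line of the FILES section into a FileEntry."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-delimited fields, got %d"
                         % len(fields))
    path, size, perms, owner, group, digest = fields
    # Only the size is converted; the other fields may hold 'unknown'.
    return FileEntry(path, int(size), perms, owner, group, digest)
```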
REQUIRES section
This section is a list of strings giving the services required for
this module distribution to run properly. This list includes the
package name ("python-stdlib") and module names ("rfc822",
"htmllib", "email", "email.Charset"). It will be specified
by an extra 'requires' argument to the distutils.core.setup()
function. For example:
    setup(..., requires=['xml.utils.iso8601', ...])
Eventually there may be automated tools that look through all of
the code and produce a list of requirements, but it's unlikely
that these tools can handle all possible cases; a manual
way to specify requirements will always be necessary.
PROVIDES section
This section is a list of strings giving the services provided by
an installed package. This list includes the package name
("python-stdlib") and module names ("rfc822", "htmllib", "email",
"email.Charset").
XXX should files be listed? e.g. $PREFIX/lib/color-table.txt,
to pick up data files, required scripts, etc.
Eventually there may be an option to let module developers add
their own strings to this section. For example, you might add
"XML parser" to this section, and other module distributions could
then list "XML parser" as one of their dependencies to indicate
that multiple different XML parsers can be used. For now this
ability isn't supported because it raises too many issues: do we
need a central registry of legal strings, or just let people put
whatever they like? Etc., etc...
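To show how the REQUIRES and PROVIDES lists might be used together,
here is a small sketch that reports requirements no installed
package satisfies. The function name unmet_requirements and the
dictionary layout are assumptions made for illustration, and
matching by exact string equality is only one possible rule.

```python
def unmet_requirements(requires, provides_by_package):
    """Return the required strings that no installed package provides.

    'requires' is a list of strings from one package's REQUIRES
    section; 'provides_by_package' maps each installed package name to
    the strings in its PROVIDES section.  Both shapes are assumptions.
    """
    available = set()
    for provided in provides_by_package.values():
        available.update(provided)
    # Exact string comparison; no version or wildcard handling.
    return sorted(set(requires) - available)
```

For instance, if the only installed package provides
"python-stdlib", "rfc822", "htmllib", "email", and "email.Charset",
then a package requiring "email" and "xml.utils.iso8601" would have
"xml.utils.iso8601" reported as unmet.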
API Description
There's a single fundamental class, InstallationDatabase. The
code for it lives in distutils/install_db.py. (XXX any
suggestions for alternate locations in the standard library, or an
alternate module name?)
The InstallationDatabase returns instances of Package that contain
all the information about an installed package.
XXX Several of the fields in Package are duplicates of ones in
distutils.dist.Distribution. Probably they should be factored out
into the Package class proposed here, but can this be done in a
backward-compatible way?
InstallationDatabase has the following interface:
    class InstallationDatabase:
        def __init__ (self, path=None):
            """InstallationDatabase(path:string)
            Read the installation database rooted at the specified path.
            If path is None, INSTALLDB is used as the default.
            """

        def get_package (self, package_name):
            """get_package(package_name:string) : Package
            Get the object corresponding to a single package.
            """

        def list_packages (self):
            """list_packages() : [Package]
            Return a list of all packages installed on the system,
            enumerated in no particular order.
            """

        def find_package (self, path):
            """find_package(path:string) : Package
            Search and return the package containing the file 'path'.
            Returns None if the file doesn't belong to any package
            that the InstallationDatabase knows about.
            XXX should this work for directories?
            """
    class Package:
        """Instance attributes:

        name : string
          Package name
        files : {string : (size:int, perms:int, owner:string, group:string,
                           digest:string)}
          Dictionary mapping the path of a file installed by this package
          to information about the file.

        The following fields all come from PEP 241.

        version : distutils.version.Version
          Version of this package
        platform : [string]
        summary : string
        description : string
        keywords : string
        home_page : string
        author : string
        author_email : string
        license : string
        """

        def add_file (self, path):
            """add_file(path:string):None
            Record the size, ownership, &c., information for an installed file.
            XXX as written, this would stat() the file.  Should the size/perms/
            checksum all be provided as parameters to this method instead?
            """

        def has_file (self, path):
            """has_file(path:string) : Boolean
            Returns true if the specified path belongs to a file in this
            package.
            """

        def check_file (self, path):
            """check_file(path:string) : Boolean
            Checks whether the file's size, checksum, and ownership match,
            returning true if they do.
            """
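The comparison that check_file describes could look roughly like
the sketch below. It is not the patch's implementation: the helper
names sha1_hex and file_matches are hypothetical, the recorded
values are passed in explicitly rather than read from a Package
instance, and the ownership check is left out because it is
platform-dependent.

```python
import hashlib
import os


def sha1_hex(path):
    """Return the SHA1 digest of a file's contents, encoded in hex."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in blocks so large files need not fit in memory.
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def file_matches(path, recorded_size, recorded_digest):
    """Return True if the file on disk has the recorded size and digest.

    Ownership and permissions are deliberately not compared in this
    sketch.
    """
    if not os.path.isfile(path):
        return False
    # Compare the cheap attribute first, then the checksum.
    if os.path.getsize(path) != recorded_size:
        return False
    return sha1_hex(path) == recorded_digest
```

A check of this kind is what would answer the earlier use case
"Has anyone modified x/y/z.py locally?".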
Deliverables
A description of the database API, to be added to this PEP.
Patches to the Distutils that 1) implement an InstallationDatabase
class, 2) update the database when a new package is installed, and
3) add a simple package management tool, whose features are to be
added to this PEP. (Or should that be a separate PEP?) See [2] for
the current patch.
Rejected Suggestions
Instead of using one text file per package, one large text file or
an anydbm file could be used. This has been rejected for a few
reasons. First, performance is probably not an extremely pressing
concern as the package database is only used when installing or
removing packages, a relatively infrequent task. Scalability also
likely isn't a problem, as people may have hundreds of Python
packages installed, but thousands seems unlikely. Finally,
individual text files are compatible with installers such as RPM
or DPKG because a package can just drop the new database file into
the database directory. If one large text file or a binary file
were used, the Python database would then have to be updated by
running a postinstall script.
On Windows, the permissions and owner/group of a file aren't
stored. Windows does in fact support ownership and access
permissions, but reading and setting them requires the win32all
extensions, and they aren't present in the basic Python installer
for Windows.
References
[1] Michael Muller's patch (posted to the Distutils-SIG around 28
Dec 1999) generates a list of installed files.
[2] A patch to implement this PEP will be tracked as
patch #562100 on SourceForge.
http://www.python.org/sf/562100
Acknowledgements
Ideas for this PEP originally came from postings by Greg Ward,
Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others.
Many changes and rewrites to this document were suggested by the
readers of the Distutils SIG.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: