PEP: 262
Title: A Database of Installed Python Packages
Version: $Revision$
Author: A.M. Kuchling
Type: Standards Track
Created: 08-Jul-2001
Status: Draft
Post-History: 27-Mar-2002


Introduction

    This PEP describes a format for a database of Python packages
    installed on a system.


Requirements

    We need a way to figure out what packages, and what versions of
    those packages, are installed on a system.  We want to provide
    features similar to CPAN, APT, or RPM.  Required use cases that
    should be supported are:

    * Is package X on a system?
    * What version of package X is installed?
    * Where can the new version of package X be found?  (This can be
      defined as either "a home page where the user can go and find a
      download link", or "a place where a program can find the newest
      version".  Both should probably be supported.)
    * What files did package X put on my system?
    * What package did the file x/y/z.py come from?
    * Has anyone modified x/y/z.py locally?


Database Location

    The database lives in a bunch of files under /lib/python/install/.
    This location will be called INSTALLDB through the remainder of
    this PEP.

    The structure of the database is deliberately kept simple; each
    file in this directory or its subdirectories (if any) describes a
    single package.  The rationale for scanning subdirectories is that
    we can move to a directory-based indexing scheme if the database
    directory comes to contain too many entries.  For example, this
    would let us transparently switch from INSTALLDB/Numeric to
    INSTALLDB/N/Nu/Numeric or some similar hashing scheme.


Database Contents

    Each file in INSTALLDB or its subdirectories describes a single
    package, and has the following contents:

    An initial line listing the sections in this file, separated by
    whitespace.  Currently this will always be 'PKG-INFO FILES'.
    This is for future-proofing; if we add a new section, for example
    to list documentation files, then we'd add a DOCS section and list
    it in the contents.  Sections are always separated by blank lines.
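    To make this layout concrete, here is a minimal sketch of splitting
    one database file into its sections.  It is not part of the
    proposed API: the helper name split_sections and the sample file
    contents are hypothetical, and the sketch assumes that no section
    contains blank lines of its own.  Blank lines directly after the
    first line are skipped, so the parse works whether or not one is
    present.  Because the PKG-INFO section is a set of RFC-822 headers,
    the standard email package can parse it.

```python
import email


def split_sections(text):
    """Split one INSTALLDB file into a {section name: [lines]} mapping.

    Hypothetical helper: assumes the first line lists the section names
    and that each section is a run of non-blank lines, with sections
    separated by one or more blank lines.
    """
    lines = text.splitlines()
    names = lines[0].split()              # e.g. ['PKG-INFO', 'FILES']
    blocks, current = [], []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:                     # a blank line ends the current section
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    if len(blocks) != len(names):
        raise ValueError("expected %d sections, found %d" % (len(names), len(blocks)))
    return dict(zip(names, blocks))


# Hypothetical file contents (the FILES entry describes an empty file).
sample = (
    "PKG-INFO FILES\n"
    "\n"
    "Name: Numeric\n"
    "Version: 1.0\n"
    "\n"
    "/usr/lib/python/Numeric.py\t0\t644\troot\troot\td41d8cd98f00b204e9800998ecf8427e\n"
)
sections = split_sections(sample)

# The PKG-INFO lines are RFC-822 headers, so the email package parses them.
pkg_info = email.message_from_string("\n".join(sections["PKG-INFO"]))
```

    A section-name line is used rather than a fixed layout so that a
    reader can detect, and reject or skip, sections it does not know,
    such as the DOCS section mentioned above.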
PKG-INFO section

    An initial set of RFC-822 headers containing the metadata of the
    package described by this file, as defined in PEP 241, "Metadata
    for Python Software Packages".

    A blank line indicating the end of the PKG-INFO section.


FILES section

    An entry for each file installed by the package.  Generated files
    such as .pyc and .pyo files are on this list as well as the
    original .py files installed by a package; the checksums of
    generated files won't be stored or checked, though.

    Each file's entry is a single tab-delimited line that contains
    the following fields:

    * The file's full path, as installed on the system.
    * The file's size.
    * The file's permissions.  On Windows, this field will always be
      'unknown'.
    * The owner and group of the file, separated by a tab.  On
      Windows, these fields will both be 'unknown'.
    * An MD5 digest of the file, encoded in hex.

    A package that uses the Distutils for installation should
    automatically update the database.  Packages that roll their own
    installation will have to use the database's API to manually add
    or update their own entry.  System package managers such as RPM
    or pkgadd can just create the new 'package name' file in the
    INSTALLDB directory.


API Description

    There's a single fundamental class, InstallationDatabase.  The
    code for it lives in distutils/install_db.py.  (XXX any
    suggestions for alternate locations in the standard library, or
    an alternate module name?)

    The InstallationDatabase returns instances of Package that
    contain all the information about an installed package.

    XXX Several of the fields in Package are duplicates of ones in
    distutils.dist.Distribution.  Probably they should be factored
    out into the Package class proposed here, but can this be done
    in a backward-compatible way?

    InstallationDatabase and Package have the following interfaces:

    class InstallationDatabase:
        def __init__(self, path=None):
            """InstallationDatabase(path:string)
            Read the installation database rooted at the specified path.
            If path is None, INSTALLDB is used as the default.
            """

        def get_package(self, package_name):
            """get_package(package_name:string) : Package
            Get the object corresponding to a single package.
            """

        def list_packages(self):
            """list_packages() : [Package]
            Return a list of all packages installed on the system,
            enumerated in no particular order.
            """

        def find_package(self, path):
            """find_package(path:string) : Package
            Search and return the package containing the file 'path'.
            Returns None if the file doesn't belong to any package
            that the InstallationDatabase knows about.
            XXX should this work for directories?
            """

    class Package:
        """Instance attributes:
        name : string
            Package name
        files : {string : (size:int, perms:int, owner:string,
                           group:string, digest:string)}
            Dictionary mapping the path of a file installed by this
            package to information about the file.

        The following fields all come from PEP 241.

        version : distutils.version.Version
            Version of this package
        platform : [string]
        summary : string
        description : string
        keywords : string
        home_page : string
        author : string
        author_email : string
        license : string
        """

        def add_file(self, path):
            """add_file(path:string) : None
            Record the size, ownership, &c., information for an
            installed file.
            XXX as written, this would stat() the file.  Should the
            size/perms/checksum all be provided as parameters to this
            method instead?
            """

        def has_file(self, path):
            """has_file(path:string) : Boolean
            Returns true if the specified path belongs to a file in
            this package.
            """

        def check_file(self, path):
            """check_file(path:string) : Boolean
            Checks whether the file's size, checksum, and ownership
            match, returning true if they do.
            """


Deliverables

    A description of the database API, to be added to this PEP.

    Patches to the Distutils that 1) implement an InstallationDatabase
    class, 2) update the database when a new package is installed, and
    3) add a simple package management tool, whose features are to be
    added to this PEP.  (Or should that be a separate PEP?)  See [2]
    for the current patch.
Rejected Suggestions

    Instead of using one text file per package, one large text file or
    an anydbm file could be used.  This has been rejected for a few
    reasons.  First, performance is probably not an extremely pressing
    concern as the package database is only used when installing or
    removing packages, a relatively infrequent task.  Scalability also
    likely isn't a problem, as people may have hundreds of Python
    packages installed, but thousands seems unlikely.  Finally,
    individual text files are compatible with installers such as RPM
    or DPKG because a package can just drop the new database file into
    the database directory.  If one large text file or a binary file
    were used, the Python database would then have to be updated by
    running a postinstall script.

    On Windows, the permissions and owner/group of a file aren't
    stored.  Windows does in fact support ownership and access
    permissions, but reading and setting them requires the win32all
    extensions, and they aren't present in the basic Python installer
    for Windows.


References

    [1] Michael Muller's patch (posted to the Distutils-SIG around
        28 Dec 1999) generates a list of installed files.

    [2] A patch to implement this PEP will be tracked as patch #562100
        on SourceForge.
        http://www.python.org/sf/562100


Acknowledgements

    Ideas for this PEP originally came from postings by Greg Ward,
    Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others.

    Many changes and rewrites to this document were suggested by the
    readers of the Distutils SIG.


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: