diff --git a/pep-0262.txt b/pep-0262.txt index 4463914fd..a6c3c085d 100644 --- a/pep-0262.txt +++ b/pep-0262.txt @@ -22,10 +22,10 @@ Requirements * Is package X on a system? * What version of package X is installed? - * Where can the new version of package X be found? - XXX Does this mean "a home page where the user can go and + * Where can the new version of package X be found? (This can + be defined as either "a home page where the user can go and find a download link", or "a place where a program can find - the newest version?" Perhaps both... + the newest version?" Both should probably be supported.) * What files did package X put on my system? * What package did the file x/y/z.py come from? * Has anyone modified x/y/z.py locally? @@ -46,18 +46,9 @@ Database Location The rationale for scanning subdirectories is that we can move to a directory-based indexing scheme if the package directory contains - too many entries. That is, instead of INSTALLDB/Numeric, we - could switch to INSTALLDB/N/Nu/Numeric or some similar scheme. - - XXX how much do we care about performance? Do we really need to - use an anydbm file or something similar? - - XXX is the actual filename important? Let's say the installation - data for PIL is in the file INSTALLDB/Numeric. Is this OK? When - we want to figure out if Numeric is installed, do we want to open - a single file, or have to scan them all? Note that for - human-interface purposes, we'll often have to scan all the - packages anyway, for a case-insensitive or keyword search. + too many entries. For example, this would let us transparently + switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some + similar hashing scheme. Database Contents @@ -70,31 +61,31 @@ Database Contents FILES'. This is for future-proofing; if we add a new section, for example to list documentation files, then we'd add a DOCS section and list it in the contents. Sections are always - separated by blank lines. XXX too simple? 
+ separated by blank lines. - [PKG-INFO section] An initial set of RFC-822 headers - containing the package information for a file, as described in - PEP 241, "Metadata for Python Software Packages". + PKG-INFO section + + An initial set of RFC-822 headers containing the package + information for a file, as described in PEP 241, "Metadata for + Python Software Packages". A blank line indicating the end of the PKG-INFO section. + FILES section + An entry for each file installed by the package. XXX Are .pyc and .pyo files in this list? What about compiled .so files? AMK thinks "no" and "yes", respectively. - Each file's entry is a single tab-delimited line that contains the - following fields: - XXX should each file entry be all on one line and - tab-delimited? More RFC-822 headers? AMK thinks tab-delimited - seems sufficent. + Each file's entry is a single tab-delimited line that contains + the following fields: * The file's size - - * XXX do we need to store permissions? The owner/group? - * An MD5 digest of the file, written in hex. (XXX All 16 - bytes of the digest seems unnecessary; first 8 bytes only, - maybe? Is a zlib.crc32() hash sufficient?) + * The file's permissions, and the owner/group of the file. + XXX what to do on Windows? + + * An MD5 digest of the file, encoded in hex. * The file's full path, as installed on the system. (XXX should it be relative to sys.prefix, or sys.prefix + @@ -104,28 +95,42 @@ Database Contents * XXX some sort of type indicator, to indicate whether this is a Python module, binary module, documentation file, config - file? Do we need this? + file? Do we need this? - A package that uses the Distutils for installation will + A package that uses the Distutils for installation should automatically update the database. Packages that roll their own - installation + installation will have to use the database's API to manually + add or update their own entry. 
System package managers such as + RPM or pkgadd can just create the new 'package name' file in the + INSTALLDB directory. - XXX what's the relationship between this database and the RPM or - DPKG database? I'm tempted to make the Python database completely - optional; a distributor can preserve the interface of the package - management tool and replace it with their own wrapper on top of - their own package manager. (XXX but how would the Distutils know - that, and not bother to update the Python database?) - Deliverables + + A description of the database API, to be added to this PEP. - Patches to the Distutils that 1) implement a InstallationDatabase + Patches to the Distutils that 1) implement an InstallationDatabase class, 2) Update the database when a new package is installed. 3) a simple package management tool, features to be added to this PEP. (Or a separate PEP?) +Rejected Suggestions + + Instead of using one text file per package, one large text file or + an anydbm file could be used. This has been rejected for a few + reasons. First, performance is probably not an extremely pressing + concern as the package database is only used when installing or + removing packages, a relatively infrequent task. Scalability also + likely isn't a problem, as people may have hundreds of Python + packages installed, but thousands seems unlikely. Finally, + individual text files are compatible with installers such as RPM + or DPKG because a package can just drop the new database file into + the database directory. If one large text file or a binary file + were used, the Python database would then have to be updated by + running a postinstall script. + + References [1] Michael Muller's patch (posted to the Distutils-SIG around 28 @@ -135,7 +140,7 @@ References Acknowledgements Ideas for this PEP originally came from postings by Greg Ward, - Fred Drake, Mats Wichmann, and others. + Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others. 
Many changes and rewrites to this document were suggested by the readers of the Distutils SIG.
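
Note on the FILES section introduced by this patch: the sketch below shows one way a tab-delimited file entry could be written, parsed, and checked against the file on disk, which is what the "Has anyone modified x/y/z.py locally?" requirement needs. It is an illustration only, not the PEP's API. The field order, the octal encoding of permissions, the numeric uid/gid fields, the absolute path, and the function names make_file_entry, parse_file_entry, and is_modified are all assumptions; the PEP text still leaves several of these points open (XXX).

```python
import hashlib
import os


def make_file_entry(path):
    """Build one tab-delimited FILES line for `path`.

    Assumed field order (not fixed by the PEP):
    size, permissions (octal), uid, gid, MD5 hex digest, full path.
    """
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    fields = [
        str(st.st_size),
        oct(st.st_mode & 0o7777),  # permission bits only, e.g. '0o644'
        str(st.st_uid),            # 0 on Windows; the PEP leaves this open
        str(st.st_gid),
        digest,
        os.path.abspath(path),
    ]
    return "\t".join(fields)


def parse_file_entry(line):
    """Split one FILES line back into a dictionary of typed fields."""
    # maxsplit=5 keeps the path intact even if it contains a tab.
    size, mode, uid, gid, digest, path = line.rstrip("\n").split("\t", 5)
    return {
        "size": int(size),
        "mode": int(mode, 8),
        "uid": int(uid),
        "gid": int(gid),
        "md5": digest,
        "path": path,
    }


def is_modified(entry):
    """Return True if the file is missing or differs from the recorded entry."""
    try:
        with open(entry["path"], "rb") as f:
            data = f.read()
    except OSError:
        return True
    if len(data) != entry["size"]:
        return True
    return hashlib.md5(data).hexdigest() != entry["md5"]
```

Comparing the size first is a cheap early exit; the digest comparison catches edits that leave the size unchanged.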