Various changes inspired by Thomas Heller's c.l.p comments

2002-03-25 13:36:05 +00:00 · 8dd23c1dcf
parent 8e48462a3c
commit 8dd23c1dcf
1 changed file with 46 additions and 41 deletions
--- a/pep-0262.txt
+++ b/pep-0262.txt
@@ -22,10 +22,10 @@ Requirements
 
        * Is package X on a system?  
        * What version of package X is installed?
-        * Where can the new version of package X be found?  
-          XXX Does this mean "a home page where the user can go and
+        * Where can the new version of package X be found?  (This can
+          be defined as either "a home page where the user can go and
          find a download link", or "a place where a program can find
-          the newest version?"  Perhaps both...
+          the newest version?"  Both should probably be supported.)
        * What files did package X put on my system?
        * What package did the file x/y/z.py come from?
        * Has anyone modified x/y/z.py locally?
@@ -46,18 +46,9 @@ Database Location

    The rationale for scanning subdirectories is that we can move to a
    directory-based indexing scheme if the package directory contains
-    too many entries.  That is, instead of INSTALLDB/Numeric, we
-    could switch to INSTALLDB/N/Nu/Numeric or some similar scheme.
-
-    XXX how much do we care about performance?  Do we really need to
-    use an anydbm file or something similar?
-
-    XXX is the actual filename important?  Let's say the installation
-    data for PIL is in the file INSTALLDB/Numeric.  Is this OK?  When
-    we want to figure out if Numeric is installed, do we want to open
-    a single file, or have to scan them all?  Note that for
-    human-interface purposes, we'll often have to scan all the
-    packages anyway, for a case-insensitive or keyword search.
+    too many entries.  For example, this would let us transparently
+    switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some
+    similar hashing scheme.


 Database Contents
@@ -70,31 +61,31 @@ Database Contents
        FILES'.  This is for future-proofing; if we add a new section,
        for example to list documentation files, then we'd add a DOCS
        section and list it in the contents.  Sections are always
-        separated by blank lines.  XXX too simple?
+        separated by blank lines.

-        [PKG-INFO section] An initial set of RFC-822 headers
-        containing the package information for a file, as described in
-        PEP 241, "Metadata for Python Software Packages".
+    PKG-INFO section
+
+        An initial set of RFC-822 headers containing the package
+        information for a file, as described in PEP 241, "Metadata for
+        Python Software Packages".

        A blank line indicating the end of the PKG-INFO section.

+    FILES section 
+   
        An entry for each file installed by the package.  
        XXX Are .pyc and .pyo files in this list?  What about compiled
        .so files?  AMK thinks "no" and "yes", respectively.

-    Each file's entry is a single tab-delimited line that contains the
-    following fields: 
-    XXX should each file entry be all on one line and
-    tab-delimited?  More RFC-822 headers?  AMK thinks tab-delimited
-    seems sufficient.
+    Each file's entry is a single tab-delimited line that contains
+    the following fields: 

        * The file's size
- 
-        * XXX do we need to store permissions?  The owner/group?  

-        * An MD5 digest of the file, written in hex.  (XXX All 16
-          bytes of the digest seems unnecessary; first 8 bytes only,
-          maybe?  Is a zlib.crc32() hash sufficient?)
+        * The file's permissions, and the owner/group of the file.
+          XXX what to do on Windows?
+ 
+        * An MD5 digest of the file, encoded in hex.  

        * The file's full path, as installed on the system.  (XXX
          should it be relative to sys.prefix, or sys.prefix +
@@ -104,28 +95,42 @@ Database Contents
          
        * XXX some sort of type indicator, to indicate whether this is
          a Python module, binary module, documentation file, config
-          file?  Do we need this?
+          file?  Do we need this?  

-    A package that uses the Distutils for installation will
+    A package that uses the Distutils for installation should
    automatically update the database.  Packages that roll their own
-    installation 
+    installation will have to use the database's API to manually
+    add or update their own entry.  System package managers such as
+    RPM or pkgadd can just create the new 'package name' file in the
+    INSTALLDB directory.

-    XXX what's the relationship between this database and the RPM or
-    DPKG database?  I'm tempted to make the Python database completely
-    optional; a distributor can preserve the interface of the package
-    management tool and replace it with their own wrapper on top of
-    their own package manager.  (XXX but how would the Distutils know
-    that, and not bother to update the Python database?)

-    
 Deliverables
+
+    A description of the database API, to be added to this PEP.
  
-    Patches to the Distutils that 1) implement a InstallationDatabase
+    Patches to the Distutils that 1) implement an InstallationDatabase
    class, 2) Update the database when a new package is installed.  3)
    a simple package management tool, features to be added to this
    PEP.  (Or a separate PEP?)  


+Rejected Suggestions
+
+    Instead of using one text file per package, one large text file or
+    an anydbm file could be used.  This has been rejected for a few
+    reasons.  First, performance is probably not an extremely pressing
+    concern as the package database is only used when installing or
+    removing packages, a relatively infrequent task.  Scalability also
+    likely isn't a problem, as people may have hundreds of Python
+    packages installed, but thousands seems unlikely.  Finally,
+    individual text files are compatible with installers such as RPM
+    or DPKG because a package can just drop the new database file into
+    the database directory.  If one large text file or a binary file
+    were used, the Python database would then have to be updated by 
+    running a postinstall script.
+
+
 References

    [1] Michael Muller's patch (posted to the Distutils-SIG around 28
@@ -135,7 +140,7 @@ References
 Acknowledgements

    Ideas for this PEP originally came from postings by Greg Ward,
-    Fred Drake, Mats Wichmann, and others.
+    Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others.

    Many changes and rewrites to this document were suggested by the
    readers of the Distutils SIG.
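
Illustration (not part of the commit): the per-package file layout and the tab-delimited FILES entries described in the patched text can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PEP's API, which the patch leaves undefined. The names `FileEntry`, `sharded_db_path`, `parse_file_entry`, and `is_modified` are hypothetical, and the field order (size, MD5 hex digest, path) is assumed. The permissions and owner/group fields are left out because the text still marks them as open questions.

```python
import hashlib
import os
from typing import NamedTuple


class FileEntry(NamedTuple):
    """One line of the FILES section (assumed fields, hypothetical name)."""
    size: int
    md5_hex: str
    path: str


def sharded_db_path(installdb: str, package: str) -> str:
    """Map 'Numeric' to INSTALLDB/N/Nu/Numeric, one possible hashed layout."""
    return os.path.join(installdb, package[:1], package[:2], package)


def parse_file_entry(line: str) -> FileEntry:
    """Parse one tab-delimited entry: size, MD5 hex digest, path (assumed order)."""
    # Split at most twice so a path containing a tab stays in one piece.
    size, md5_hex, path = line.rstrip("\n").split("\t", 2)
    return FileEntry(int(size), md5_hex.lower(), path)


def is_modified(entry: FileEntry) -> bool:
    """Return True if the file on disk no longer matches its recorded entry."""
    # A size mismatch is enough; this avoids reading the file.
    if os.path.getsize(entry.path) != entry.size:
        return True
    with open(entry.path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return digest != entry.md5_hex
```

Comparing the size first answers "Has anyone modified x/y/z.py locally?" cheaply in the common case, and the MD5 comparison catches edits that leave the size unchanged.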