Jim's latest revision.

Barry Warsaw 2001-11-13 20:25:43 +00:00
parent e3756de95c
commit 9acb37124b
1 changed files with 70 additions and 37 deletions

@ -12,7 +12,9 @@ Python-Version: 2.3
Abstract
This PEP adds the ability to import Python modules
*.py, *.py[co] and packages from zip archives.
*.py, *.py[co] and packages from zip archives. The
same code is used to speed up normal directory imports
provided os.listdir is available.
Specification
@ -34,19 +36,13 @@ Specification
zip archive name is added too. Otherwise there is no way to
import all Python library files from an archive.
Reading compressed zip archives requires the zlib module. An
import of zlib will be attempted prior to any other imports. If
zlib is not available at that time, only uncompressed archives
will be readable, even if zlib subsequently becomes available.
Subdirectory Equivalence
The zip archive must be treated exactly as a subdirectory tree so
we can support package imports based on current and future rules.
Zip archive files must be created with relative path names. That
is, archive file names are of the form: file1, file2, dir1/file3,
dir2/dir3/file4.
All zip data is taken from the Central Directory; the data must be
correct, and brain-dead zip files are not accommodated.
Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
and we are trying to import modfoo from the Q package. Then
@ -93,29 +89,16 @@ Efficiency
This is exactly the name generated by import.c, and makes lookup
easy.
This same mechanism is used to speed up directory (non-zip) imports.
See below.
zlib
Compressed zip archives require zlib for decompression. Prior to
any other imports, we attempt an import of zlib, and set a flag if
it is available. All compressed files are invisible unless this
flag is true.
It could happen that zlib was available later. For example, the
import of site.py might add the correct directory to sys.path so a
dynamic load succeeds. But compressed files will still be
invisible. It is unknown if it can happen that importing site.py
can cause zlib to appear, so maybe we're worrying about nothing.
On Windows and Linux, the early import of zlib succeeds without
site.py.
The problem here is the confusion caused by the reverse. Either a
zip file satisfies imports or it doesn't. It is silly to say that
site.py needs to be uncompressed, and that maybe imports will
succeed later. If you don't like this, create uncompressed zip
archives or make sure zlib is available, for example, as a
built-in module. Or we can write special search logic during zip
initialization.
any other imports, we attempt an import of zlib. Import of
compressed files will fail with a message "missing zlib" unless
zlib is available.
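As a rough illustration of the revised zlib rule, the sketch below
returns stored members unchanged and raises an error with the message
"missing zlib" for deflated members when zlib cannot be imported. The
function name read_member and its arguments are hypothetical and are
not part of the PEP; the method codes 0 (stored) and 8 (deflated) come
from the zip format itself.

```python
try:
    import zlib
except ImportError:
    zlib = None  # compressed members cannot be read without zlib

# Compression method codes defined by the zip file format.
STORED = 0
DEFLATED = 8


def read_member(raw_bytes, compress_type):
    """Return the uncompressed bytes of one archive member.

    Hypothetical helper: raw_bytes is the member data as stored in the
    archive, compress_type is the method code from the Central Directory.
    """
    if compress_type == STORED:
        return raw_bytes
    if compress_type == DEFLATED:
        if zlib is None:
            raise ImportError("missing zlib")
        # Zip members hold a raw deflate stream, hence the negative wbits.
        return zlib.decompress(raw_bytes, -15)
    raise ImportError("unsupported compression method %d" % compress_type)
```

Uncompressed archives remain importable without zlib in this sketch,
which matches the intent of the text above.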
Booting
@ -133,15 +116,65 @@ Booting
find its corresponding libraries even when there are multiple
Python versions on the same machine.
I propose that there is one name added to sys.path, and the
file name is "python%s%s.zip" % (sys.version[0], sys.version[2]).
For example, python22.zip. This is the same on all platforms.
On Unix, the directory is sys.prefix + "/lib". So for prefix
/usr/local, the path /usr/local/lib/python2.2/ is already on
sys.path, and /usr/local/lib/python22.zip would be added.
On Windows, the directory is the directory of sys.executable.
The zip archive name is always inserted as the second item
in sys.path. The first is the directory of the main.py (thanks Tim).
We add one name to sys.path. On Unix, the directory is
sys.prefix + "/lib", and the file name is
"python%s%s.zip" % (sys.version[0], sys.version[2]).
So for Python 2.2 and prefix /usr/local, the path
/usr/local/lib/python2.2/ is already on sys.path, and
/usr/local/lib/python22.zip would be added.
On Windows, the file is the full path to python22.dll, with
"dll" replaced by "zip". The zip archive name is always inserted
as the second item in sys.path. The first is the directory of the
main.py (thanks Tim).
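The naming rule can be sketched in Python as follows. This is only an
illustration under stated assumptions: the helper names are
hypothetical, and the real logic would live in the interpreter's C
start-up code rather than in Python.

```python
import sys


def zip_archive_name(version=None):
    """Build the archive file name, e.g. 'python22.zip' for Python 2.2."""
    if version is None:
        version = sys.version
    # Characters 0 and 2 of the version string are the major and minor
    # digits; this only works for single-digit version numbers.
    return "python%s%s.zip" % (version[0], version[2])


def unix_zip_path(prefix, version=None):
    """Unix: the archive lives in sys.prefix + '/lib'."""
    return prefix + "/lib/" + zip_archive_name(version)


def windows_zip_path(dll_path):
    """Windows: full path of the Python DLL with 'dll' replaced by 'zip'."""
    if not dll_path.lower().endswith("dll"):
        raise ValueError("expected a path ending in 'dll'")
    return dll_path[:-3] + "zip"


def add_zip_to_path(path, archive):
    """Insert the archive as the second item of a sys.path-like list."""
    path.insert(1, archive)
    return path
```

For prefix /usr/local and version 2.2, unix_zip_path yields
/usr/local/lib/python22.zip, as in the example above.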
Directory Imports
The static Python dictionary used to speed up zip imports can be
used to speed up normal directory imports too. For each item in
sys.path that is not a zip archive, we call os.listdir, and add
the directory contents to the dictionary. Then instead of calling
fopen() in a double loop, we just check the dictionary. This
greatly speeds up imports. If os.listdir doesn't exist, the
dictionary is not used.
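A minimal Python sketch of the idea follows. The actual dictionary is
built in C inside the import machinery; the function names here are
hypothetical, and the sketch ignores case-insensitive file systems.

```python
import os


def build_directory_cache(path_entries):
    """Map every file found in the non-zip path entries to True.

    Returns None when os.listdir is unavailable, in which case the
    dictionary is not used.
    """
    listdir = getattr(os, "listdir", None)
    if listdir is None:
        return None
    cache = {}
    for entry in path_entries:
        if entry.lower().endswith(".zip"):
            continue  # zip archives are indexed separately
        try:
            names = listdir(entry)
        except OSError:
            continue  # missing or unreadable directory
        for name in names:
            cache[os.path.join(entry, name)] = True
    return cache


def find_module_file(cache, path_entries, modname,
                     suffixes=(".py", ".pyc", ".pyo")):
    """Search path entries and suffixes in order; return the first hit."""
    for entry in path_entries:
        for suffix in suffixes:
            candidate = os.path.join(entry, modname + suffix)
            if cache is None:
                found = os.path.isfile(candidate)  # fall back to the disk
            else:
                found = candidate in cache  # dictionary lookup only
            if found:
                return candidate
    return None
```

The double loop over path entries and suffixes is still present, but
each probe becomes a dictionary lookup instead of a file system call.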
Benchmarks
Case   Original 2.2a3      Using os.listdir    Zip Uncomp   Zip Compr
----   -----------------   -----------------   ----------   ----------
 1     3.2 2.5 3.2->1.02   2.3 2.5 2.3->0.87   1.66->0.93   1.5->1.07
 2     2.8 3.9 3.0->1.32   Same as Case 1.
 3     5.7 5.7 5.7->5.7    2.1 2.1 2.1->1.8    1.25->0.99   1.19->1.13
 4     9.4 9.4 9.3->9.35   Same as Case 3.
Case 1: Local drive C:, sys.path has its default value.
Case 2: Local drive C:, directory with files is at the end of sys.path.
Case 3: Network drive, sys.path has its default value.
Case 4: Network drive, directory with files is at the end of sys.path.
Benchmarks were performed on a 1.4 GHz Pentium 4 clone with 256 MB of RAM.
The machine was running Windows 2000 with a Linux/Samba network server.
Times are in seconds, and are the time to import about 100 Lib modules.
Case 2 and 4 have the "correct" directory moved to the end of sys.path.
"Uncomp" means uncompressed zip archive, "Compr" means compressed.
Initial times are after a re-boot of the system; the time after
"->" is the time after repeated runs. Times to import from C:
after a re-boot are rather highly variable for the "Original" case,
but are more realistic.
Custom Imports
The logic demonstrates the ability to import using default searching
until a needed Python module (in this case, os) becomes available.
This can be used to bootstrap custom importers. For example, if
"importer()" in __init__.py exists, then it could be used for imports.
The "importer()" can freely import os and other modules, and these
will be satisfied from the default mechanism. This PEP does not
define any custom importers, and this note is for information only.
Implementation