PEP: 273
Title: Import Modules from Zip Archives
Version: $Revision$
Last-Modified: $Date$
Author: jim@interet.com (James C. Ahlstrom)
Status: Draft
Type: Standards Track
Created: 11-Oct-2001
Post-History: 26-Oct-2001
Python-Version: 2.3


Abstract

This PEP adds the ability to import compiled Python modules
*.py[co] and packages from zip archives.


Specification

Currently, sys.path is a list of directory names as strings. If
this PEP is implemented, an item of sys.path can be a string
naming a zip file archive. The zip archive can contain a
subdirectory structure to support package imports. The zip
archive satisfies imports exactly as a subdirectory would.

The implementation is in C code in the Python core and works on
all supported Python platforms.

Any files may be present in the zip archive, but only files *.pyc,
*.pyo and __init__.py[co] are available for import. Zip import of
*.py and dynamic modules (*.pyd, *.so) is disallowed.

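As a rough illustration of the file-type rule above, the following
sketch classifies archive member names. The function name
importable_kind and its return labels are hypothetical and are not
taken from the C patch; only the extensions come from this PEP.

```python
import posixpath

# Extensions this PEP allows for zip import; *.py, *.pyd and *.so are excluded.
IMPORTABLE_EXTENSIONS = (".pyc", ".pyo")


def importable_kind(member_name):
    """Classify a zip member name under the rule described above.

    Returns "package" for __init__.py[co], "module" for any other
    *.pyc or *.pyo file, and None for files that zip import ignores.
    (Hypothetical helper, for illustration only.)
    """
    base = posixpath.basename(member_name)
    stem, ext = posixpath.splitext(base)
    if ext not in IMPORTABLE_EXTENSIONS:
        return None
    return "package" if stem == "__init__" else "module"


for name in ("Q/__init__.pyc", "Q/R/modfoo.pyo", "Q/R/modfoo.py", "ext.so"):
    print(name, "->", importable_kind(name))
```

Member names inside a zip archive always use "/" as the separator,
which is why the sketch uses posixpath rather than os.path.
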
Just as sys.path currently has default directory names, default
zip archive names are added too. Otherwise there is no way to
import all Python library files from an archive.

Reading compressed zip archives requires the zlib module. An
import of zlib will be attempted prior to any other imports. If
zlib is not available at that time, only uncompressed archives
will be readable, even if zlib subsequently becomes available.


Subdirectory Equivalence

The zip archive must be treated exactly as a subdirectory tree so
we can support package imports based on current and future rules.
Zip archive files must be created with relative path names. That
is, archive file names are of the form: file1, file2, dir1/file3,
dir2/dir3/file4.

Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
and we are trying to import modfoo from the Q.R package. Then
import.c will generate a list of paths and extensions and will
look for the file. The list of generated paths does not change
for zip imports. Suppose import.c generates the path
"/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
"/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.

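The path generation described above can be sketched in a few lines.
This is a simplification under stated assumptions: the real import.c
resolves each package level in turn and tries more extensions, while
the hypothetical candidate_paths function below flattens the search
into one list to show that directory and archive entries are treated
alike.

```python
import posixpath


def candidate_paths(sys_path, dotted_name, extensions=(".pyc", ".pyo")):
    """Return every path the (simplified) search would try, in order."""
    # "Q.R.modfoo" becomes the relative name "Q/R/modfoo".
    relative = posixpath.join(*dotted_name.split("."))
    return [
        posixpath.join(entry, relative + ext)
        for entry in sys_path
        for ext in extensions
    ]


paths = candidate_paths(["/A/B/SubDir", "/C/D/E/Archive.zip"], "Q.R.modfoo")
for p in paths:
    print(p)
```

The archive entry yields "/C/D/E/Archive.zip/Q/R/modfoo.pyc" by the
same rule that yields "/A/B/SubDir/Q/R/modfoo.pyc"; no zip-specific
branch is needed at this stage.
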
Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
your zip file will satisfy imports just as your subdirectory did.

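The usage pattern can be tried with the zip import support found in
present-day Python, which is not the patch described in this PEP and
differs from it in detail (for example, it also accepts *.py
sources). The sketch below is only meant to show the shape of the
idea: compile a small tree, store the *.pyc files under relative
names, and put the archive on sys.path. The helper name
zip_compiled_tree and the module contents are hypothetical.

```python
import importlib
import os
import py_compile
import sys
import tempfile
import zipfile


def zip_compiled_tree(archive_path, sources):
    """Compile each source text and store only the *.pyc in the archive.

    `sources` maps relative names such as "Q/R/modfoo.py" to source text.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "module.py")
        pyc = os.path.join(tmp, "module.pyc")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
            for rel_path, text in sources.items():
                with open(src, "w") as f:
                    f.write(text)
                py_compile.compile(src, cfile=pyc, dfile=rel_path, doraise=True)
                # Relative member name with "c" appended, e.g. Q/R/modfoo.pyc
                zf.write(pyc, rel_path + "c")


workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "Archive.zip")
zip_compiled_tree(archive, {
    "Q/__init__.py": "",
    "Q/R/__init__.py": "",
    "Q/R/modfoo.py": "ANSWER = 42\n",
})

sys.path.insert(0, archive)  # a sys.path item naming a zip archive
importlib.invalidate_caches()
modfoo = importlib.import_module("Q.R.modfoo")
print(modfoo.ANSWER, modfoo.__file__)
```

The archive is written uncompressed (ZIP_STORED) so that the example
does not depend on zlib.
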
Well, not quite. You can't satisfy dynamic modules from a zip
file. Dynamic modules have extensions like .dll, .pyd, and .so.
They are operating system dependent, and probably can't be loaded
except from a file. It might be possible to extract the dynamic
module from the zip file, write it to a plain file and load it.
But that would mean creating temporary files and dealing with all
the dynload_*.c files, and that's probably not a good idea.

You also can't import source files *.py from a zip archive. The
problem here is what to do with the compiled files. Python would
normally write these to the same directory as *.py, but surely we
don't want to write to the zip file. We could write to the
directory of the zip archive, but that would clutter it up, which
is not good if it is /usr/bin, for example. We could just fail to
write the compiled files, but that makes zip imports very slow,
and the user would probably not figure out what is wrong. It is
probably best for users to put *.pyc into zip archives in the
first place, and this PEP enforces that rule.

So the only imports zip archives support are *.pyc and *.pyo, plus
the import of __init__.py[co] for packages, and the search of the
subdirectory structure for the same.


Efficiency

The only way to find files in a zip archive is linear search. So
for each zip file in sys.path, we scan its names once, and put
the names plus other relevant data into a static Python
dictionary. The key is the archive name from sys.path joined with
the file name (including any subdirectories) within the archive.
This is exactly the name generated by import.c, and makes lookup
easy.


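A minimal Python sketch of the dictionary described under Efficiency
follows. It is not the C implementation; the names _zip_directory,
index_archive and lookup are hypothetical, and the tuple stored as
"other relevant data" is an assumption about what a loader would
need.

```python
import os
import tempfile
import zipfile

# Hypothetical stand-in for the static dictionary described above.
_zip_directory = {}


def index_archive(archive_name):
    """Read the archive's names once and record them in _zip_directory."""
    with zipfile.ZipFile(archive_name) as zf:
        for info in zf.infolist():
            # Key: the sys.path entry joined with the name inside the archive.
            key = archive_name + "/" + info.filename
            _zip_directory[key] = (info.header_offset, info.compress_type,
                                   info.compress_size, info.file_size)


def lookup(path):
    """Return the recorded data for a generated path, or None if absent."""
    return _zip_directory.get(path)


demo_dir = tempfile.mkdtemp()
demo_archive = os.path.join(demo_dir, "Archive.zip")
with zipfile.ZipFile(demo_archive, "w") as zf:
    zf.writestr("Q/R/modfoo.pyc", b"placeholder bytes, not real bytecode")

index_archive(demo_archive)
print(lookup(demo_archive + "/Q/R/modfoo.pyc"))
print(lookup(demo_archive + "/Q/R/missing.pyc"))
```

After the one-time scan, each generated path costs a single
dictionary lookup, whether or not the file exists in the archive.
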
zlib

Compressed zip archives require zlib for decompression. Prior to
any other imports, we attempt an import of zlib, and set a flag if
it is available. All compressed files are invisible unless this
flag is true.

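The flag and the visibility rule in the zlib section can be sketched
as follows. The names HAVE_ZLIB and visible_members are
hypothetical, and the demo archive is built only for illustration;
the sketch assumes the standard zipfile module is available to read
the member list.

```python
import os
import tempfile
import zipfile

try:
    import zlib  # attempted once, up front
    HAVE_ZLIB = True
except ImportError:
    HAVE_ZLIB = False


def visible_members(archive_name, have_zlib=HAVE_ZLIB):
    """List member names that are readable given the zlib flag."""
    with zipfile.ZipFile(archive_name) as zf:
        return [
            info.filename
            for info in zf.infolist()
            # Stored members are always visible; others need the flag.
            if info.compress_type == zipfile.ZIP_STORED or have_zlib
        ]


demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "mixed.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("stored.pyc", b"x" * 64, compress_type=zipfile.ZIP_STORED)
    if HAVE_ZLIB:
        zf.writestr("packed.pyc", b"x" * 64,
                    compress_type=zipfile.ZIP_DEFLATED)

print(visible_members(demo_zip, have_zlib=False))
print(visible_members(demo_zip))
```

Because the default value of have_zlib is fixed when the function is
defined, a later successful import of zlib does not change what is
visible, which mirrors the behavior this PEP describes.
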
It could happen that zlib becomes available later. For example,
the import of site.py might add the correct directory to sys.path
so a dynamic load succeeds. But compressed files will still be
invisible. It is unknown whether importing site.py can cause zlib
to appear, so maybe we're worrying about nothing. On Windows and
Linux, the early import of zlib succeeds without site.py.

The problem is the confusion that the reverse behavior would
cause. Either a zip file satisfies imports or it doesn't. It is
silly to say that site.py needs to be uncompressed, and that maybe
imports will succeed later. If you don't like this, create
uncompressed zip archives or make sure zlib is available, for
example, as a built-in module. Or we can write special search
logic during zip initialization.


Booting

Python imports site.py itself, and this imports os, nt, ntpath,
stat, and UserDict. It also imports sitecustomize.py which may
import more modules. Zip imports must be available before site.py
is imported.

Just as there are default directories in sys.path, there must be
one or more default zip archives too.

The problem is what the name should be. The name should be linked
with the Python version, so the Python executable can correctly
find its corresponding libraries even when there are multiple
Python versions on the same machine.

This PEP suggests a zip archive name equal to the Python
interpreter path with extension ".zip" (e.g., /usr/bin/python.zip)
which is always prepended to sys.path. So a directory with python
and python.zip is complete. This would work fine on Windows, as
it is common to put supporting files in the directory of the
executable. But it may offend Unix fans, who dislike bin
directories being used for libraries. It might be fine to
generate different defaults for Windows and Unix if necessary, but
the code will be in C, and there is no sense getting complicated.


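The naming suggestion in the Booting section can be sketched like
this. The function names are hypothetical, and stripping a trailing
".exe" is an assumption about how "with extension .zip" would apply
on Windows; this PEP gives only the Unix example. The directory
"/usr/lib/python2.3" is likewise just a placeholder default.

```python
def default_zip_name(executable):
    """Return the interpreter path with the extension ".zip"."""
    # Assumption: a Windows ".exe" suffix is replaced rather than kept.
    if executable.lower().endswith(".exe"):
        executable = executable[:-len(".exe")]
    return executable + ".zip"


def boot_path(executable, default_dirs):
    """Prepend the default archive to the usual default directories."""
    return [default_zip_name(executable)] + list(default_dirs)


print(boot_path("/usr/bin/python", ["/usr/lib/python2.3"]))
print(default_zip_name(r"C:\Python23\python.exe"))
```

Deriving the archive name from the interpreter path ties each
archive to one executable, so several Python versions on the same
machine each find their own library archive.
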
Implementation

A C implementation is available as SourceForge patch 476047.
http://sourceforge.net/tracker/index.php?func=detail&aid=476047&group_id=5470&atid=305470


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: