PEP: 273
Title: Import Modules from Zip Archives
Version: $Revision$
Last-Modified: $Date$
Author: jim@interet.com (James C. Ahlstrom)
Status: Draft
Type: Standards Track
Created: 11-Oct-2001
Post-History: 26-Oct-2001
Python-Version: 2.3


Abstract

    This PEP adds the ability to import compiled Python modules
    *.py[co] and packages from zip archives.


Specification

    Currently, sys.path is a list of directory names as strings. If
    this PEP is implemented, an item of sys.path can be a string
    naming a zip file archive. The zip archive can contain a
    subdirectory structure to support package imports. The zip
    archive satisfies imports exactly as a subdirectory would.

    The implementation is in C code in the Python core and works on
    all supported Python platforms.

    Any files may be present in the zip archive, but only files
    *.pyc, *.pyo and __init__.py[co] are available for import. Zip
    import of *.py and dynamic modules (*.pyd, *.so) is disallowed.

    Just as sys.path currently has default directory names, default
    zip archive names are added too. Otherwise there is no way to
    import all Python library files from an archive.

    Reading compressed zip archives requires the zlib module. An
    import of zlib will be attempted prior to any other imports. If
    zlib is not available at that time, only uncompressed archives
    will be readable, even if zlib subsequently becomes available.


Subdirectory Equivalence

    The zip archive must be treated exactly as a subdirectory tree so
    we can support package imports based on current and future rules.
    Zip archive files must be created with relative path names. That
    is, archive file names are of the form: file1, file2, dir1/file3,
    dir2/dir3/file4.

    Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
    and we are trying to import modfoo from the Q package. Then
    import.c will generate a list of paths and extensions and will
    look for the file. The list of generated paths does not change
    for zip imports.
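
    As a rough illustration of the path list described above, the
    following sketch builds candidate file names for each sys.path
    entry. It is a simplified model and not the logic of import.c.
    The names candidate_paths and EXTENSIONS are hypothetical, and
    the dotted name "Q.R.modfoo" is an assumption chosen to match
    the path Q/R/modfoo.pyc used in this section.

```python
# Simplified model of the search described above; NOT the real import.c
# logic. candidate_paths() and EXTENSIONS are hypothetical names.

EXTENSIONS = (".pyc", ".pyo")  # the only importable kinds under this PEP


def candidate_paths(sys_path, dotted_name):
    """Return the file names that would be tried for dotted_name."""
    relative = "/".join(dotted_name.split("."))
    candidates = []
    for entry in sys_path:
        for ext in EXTENSIONS:
            # A plain module file, e.g. Q/R/modfoo.pyc
            candidates.append(entry + "/" + relative + ext)
        for ext in EXTENSIONS:
            # A package directory, e.g. Q/R/modfoo/__init__.pyc
            candidates.append(entry + "/" + relative + "/__init__" + ext)
    return candidates


# The directory entry and the archive entry are treated identically.
paths = candidate_paths(["/A/B/SubDir", "/C/D/E/Archive.zip"], "Q.R.modfoo")
for path in paths:
    print(path)
```

    The same function handles both entries without a special case,
    which is the point of treating the archive as a subdirectory.
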
    Suppose import.c generates the path "/A/B/SubDir/Q/R/modfoo.pyc".
    Then it will also generate the path
    "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
    exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.

    Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
    your zip file will satisfy imports just as your subdirectory did.

    Well, not quite. You can't import dynamic modules from a zip
    file. Dynamic modules have extensions like .dll, .pyd, and .so.
    They are operating system dependent, and probably can't be loaded
    except from a file. It might be possible to extract the dynamic
    module from the zip file, write it to a plain file and load it.
    But that would mean creating temporary files, and dealing with
    all the dynload_*.c, and that's probably not a good idea.

    You also can't import source files *.py from a zip archive. The
    problem here is what to do with the compiled files. Python would
    normally write these to the same directory as *.py, but surely we
    don't want to write to the zip file. We could write to the
    directory of the zip archive, but that would clutter it up, which
    is not good if it is /usr/bin, for example. We could just fail to
    write the compiled files, but that makes zip imports very slow,
    and the user would probably not figure out what is wrong. It is
    probably best for users to put *.pyc into zip archives in the
    first place, and this PEP enforces that rule.

    So the only imports zip archives support are *.pyc and *.pyo,
    plus the import of __init__.py[co] for packages, and the search
    of the subdirectory structure for the same.


Efficiency

    The only way to find files in a zip archive is linear search. So
    for each zip file in sys.path, we search for its names once, and
    put the names plus other relevant data into a static Python
    dictionary. The key is the archive name from sys.path joined with
    the file name (including any subdirectories) within the archive.
    This is exactly the name generated by import.c, and makes lookup
    easy.


zlib

    Compressed zip archives require zlib for decompression. Prior to
    any other imports, we attempt an import of zlib, and set a flag
    if it is available. All compressed files are invisible unless
    this flag is true.

    It could happen that zlib becomes available later. For example,
    the import of site.py might add the correct directory to sys.path
    so a dynamic load succeeds. But compressed files will still be
    invisible. It is unknown if it can happen that importing site.py
    can cause zlib to appear, so maybe we're worrying about nothing.
    On Windows and Linux, the early import of zlib succeeds without
    site.py.

    The problem here is the confusion caused by the reverse. Either a
    zip file satisfies imports or it doesn't. It is silly to say that
    site.py needs to be uncompressed, and that maybe imports will
    succeed later. If you don't like this, create uncompressed zip
    archives or make sure zlib is available, for example, as a
    built-in module. Or we can write special search logic during zip
    initialization.


Booting

    Python imports site.py itself, and this imports os, nt, ntpath,
    stat, and UserDict. It also imports sitecustomize.py, which may
    import more modules. Zip imports must be available before site.py
    is imported.

    Just as there are default directories in sys.path, there must be
    one or more default zip archives too.

    The problem is what the name should be. The name should be linked
    with the Python version, so the Python executable can correctly
    find its corresponding libraries even when there are multiple
    Python versions on the same machine.

    This PEP suggests a zip archive name equal to the Python
    interpreter path with extension ".zip" (e.g., /usr/bin/python.zip)
    which is always prepended to sys.path. So a directory with python
    and python.zip is complete. This would work fine on Windows, as it
    is common to put supporting files in the directory of the
    executable. But it may offend Unix fans, who dislike bin
    directories being used for libraries.
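
    The naming rule suggested above might look like the following
    sketch. The function names are hypothetical, and so is the
    library directory "/usr/lib/python2.3" used in the usage lines.
    Treating only a trailing ".exe" as a replaceable extension is an
    assumption made here so that a versioned interpreter name keeps
    its version; the PEP does not spell this out.

```python
def default_zip_archive(executable):
    """Return the interpreter path with the extension ".zip"."""
    # Assumption: only a trailing ".exe" is treated as an extension, so a
    # versioned name such as "python2.3" is kept whole.
    if executable.lower().endswith(".exe"):
        executable = executable[:-4]
    return executable + ".zip"


def prepend_default_archive(path_list, executable):
    """Return a new search path with the default archive in front."""
    return [default_zip_archive(executable)] + list(path_list)


# Usage with a hypothetical library directory.
search_path = prepend_default_archive(["/usr/lib/python2.3"],
                                      "/usr/bin/python")
print(search_path)
```
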
    It might be fine to generate different defaults for Windows and
    Unix if necessary, but the code will be in C, and there is no
    sense getting complicated.


Implementation

    A C implementation is available as SourceForge patch 476047.

    http://sourceforge.net/tracker/index.php?func=detail&aid=476047&group_id=5470&atid=305470


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: