PEP: 273
Title: Import Modules from Zip Archives
Author: James C. Ahlstrom
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Oct-2001
Python-Version: 2.3
Post-History: 26-Oct-2001


Abstract
========

This PEP adds the ability to import Python modules
``*.py``, ``*.py[co]`` and packages from zip archives. The
same code is used to speed up normal directory imports provided
``os.listdir`` is available.


Note
====

Zip imports were added to Python 2.3, but the final implementation
uses an approach different from the one described in this PEP.
The 2.3 implementation is SourceForge patch #652586 [1]_, which adds
new import hooks described in :pep:`302`.

The rest of this PEP is therefore only of historical interest.


Specification
=============

Currently, ``sys.path`` is a list of directory names as strings. If
this PEP is implemented, an item of ``sys.path`` can be a string
naming a zip file archive. The zip archive can contain a
subdirectory structure to support package imports. The zip
archive satisfies imports exactly as a subdirectory would.

The implementation is in C code in the Python core and works on
all supported Python platforms.

Any files may be present in the zip archive, but only files
``*.py`` and ``*.py[co]`` are available for import. Zip import of
dynamic modules (``*.pyd``, ``*.so``) is disallowed.

Just as ``sys.path`` currently has default directory names, a default
zip archive name is added too. Otherwise there is no way to
import all Python library files from an archive.


Subdirectory Equivalence
========================

The zip archive must be treated exactly as a subdirectory tree so
we can support package imports based on current and future rules.
All zip data is taken from the Central Directory, the data must be
correct, and brain dead zip files are not accommodated.

Suppose ``sys.path`` contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
and we are trying to import ``modfoo`` from the ``Q`` package.
Then ``import.c`` will generate a list of paths and extensions and
will look for the file. The list of generated paths does not
change for zip imports. Suppose ``import.c`` generates the path
"/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
"/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.

Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
your zip file will satisfy imports just as your subdirectory did.

Well, not quite. You can't satisfy dynamic modules from a zip
file. Dynamic modules have extensions like ``.dll``, ``.pyd``, and
``.so``. They are operating system dependent, and probably can't be
loaded except from a file. It might be possible to extract the
dynamic module from the zip file, write it to a plain file and
load it. But that would mean creating temporary files, and
dealing with all the ``dynload_*.c``, and that's probably not a good
idea.

When trying to import ``*.pyc``, if it is not available then
``*.pyo`` will be used instead. And vice versa when looking for ``*.pyo``.
If neither ``*.pyc`` nor ``*.pyo`` is available, or if the magic numbers
are invalid, then ``*.py`` will be compiled and used to satisfy the
import, but the compiled file will not be saved. Python would
normally write it to the same directory as ``*.py``, but surely we
don't want to write to the zip file. We could write to the
directory of the zip archive, but that would clutter it up, not
good if it is ``/usr/bin`` for example.

Failing to write the compiled files will make zip imports very slow,
and the user will probably not figure out what is wrong. So it
is best to put ``*.pyc`` and ``*.pyo`` in the archive with the ``*.py``.


Efficiency
==========

The only way to find files in a zip archive is linear search. So
for each zip file in ``sys.path``, we search for its names once, and
put the names plus other relevant data into a static Python
dictionary.
The key is the archive name from ``sys.path`` joined with the file
name (including any subdirectories) within the archive. This is
exactly the name generated by ``import.c``, and makes lookup easy.

This same mechanism is used to speed up directory (non-zip) imports.
See below.


zlib
====

Compressed zip archives require ``zlib`` for decompression. Prior to
any other imports, we attempt an import of ``zlib``. Import of
compressed files will fail with a message "missing ``zlib``" unless
``zlib`` is available.


Booting
=======

Python imports ``site.py`` itself, and this imports ``os``, ``nt``,
``ntpath``, ``stat``, and ``UserDict``. It also imports ``sitecustomize.py``
which may import more modules. Zip imports must be available
before ``site.py`` is imported.

Just as there are default directories in ``sys.path``, there must be
one or more default zip archives too.

The problem is what the name should be. The name should be linked
with the Python version, so the Python executable can correctly
find its corresponding libraries even when there are multiple
Python versions on the same machine.

We add one name to ``sys.path``. On Unix, the directory is
``sys.prefix + "/lib"``, and the file name is
``"python%s%s.zip" % (sys.version[0], sys.version[2])``.
So for Python 2.2 and prefix ``/usr/local``, the path
``/usr/local/lib/python2.2/`` is already on ``sys.path``, and
``/usr/local/lib/python22.zip`` would be added.
On Windows, the file is the full path to ``python22.dll``, with
"dll" replaced by "zip". The zip archive name is always inserted
as the second item in ``sys.path``. The first is the directory of the
``main.py`` (thanks Tim).


Directory Imports
=================

The static Python dictionary used to speed up zip imports can be
used to speed up normal directory imports too. For each item in
``sys.path`` that is not a zip archive, we call ``os.listdir``, and add
the directory contents to the dictionary.
Then instead of calling ``fopen()`` in a double loop, we just check
the dictionary. This greatly speeds up imports. If ``os.listdir``
doesn't exist, the dictionary is not used.


Benchmarks
==========

====  =================  =================  ==========  ==========
Case  Original 2.2a3     Using os.listdir   Zip Uncomp  Zip Compr
====  =================  =================  ==========  ==========
 1    3.2 2.5 3.2->1.02  2.3 2.5 2.3->0.87  1.66->0.93  1.5->1.07
 2    2.8 3.9 3.0->1.32  Same as Case 1.
 3    5.7 5.7 5.7->5.7   2.1 2.1 2.1->1.8   1.25->0.99  1.19->1.13
 4    9.4 9.4 9.3->9.35  Same as Case 3.
====  =================  =================  ==========  ==========

Case 1: Local drive C:, ``sys.path`` has its default value.

Case 2: Local drive C:, directory with files is at the end of ``sys.path``.

Case 3: Network drive, ``sys.path`` has its default value.

Case 4: Network drive, directory with files is at the end of ``sys.path``.

Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, 256 Meg.
The machine was running Windows 2000 with a Linux/Samba network server.
Times are in seconds, and are the time to import about 100 Lib modules.
Case 2 and 4 have the "correct" directory moved to the end of ``sys.path``.
"Uncomp" means uncompressed zip archive, "Compr" means compressed.

Initial times are after a re-boot of the system; the time after
"->" is the time after repeated runs. Times to import from C:
after a re-boot are rather highly variable for the "Original" case,
but are more realistic.


Custom Imports
==============

The logic demonstrates the ability to import using default searching
until a needed Python module (in this case, ``os``) becomes available.
This can be used to bootstrap custom importers. For example, if
"``importer()``" in ``__init__.py`` exists, then it could be used for imports.
The "``importer()``" can freely import os and other modules, and these
will be satisfied from the default mechanism. This PEP does not
define any custom importers, and this note is for information only.

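
The dictionary lookup described under Efficiency and Directory Imports
can be sketched in modern Python. This is only an illustration under
stated assumptions, not the C code of any of the patches: the function
name ``build_import_cache`` is hypothetical, the standard ``zipfile``
module stands in for the patch's own Central Directory reader, only the
top level of each directory is listed, and keys are joined with "/"
to mirror paths such as "/C/D/E/Archive.zip/Q/R/modfoo.pyc".

```python
import os
import tempfile
import zipfile


def build_import_cache(path_entries):
    """Map every candidate file path to the sys.path entry holding it.

    Hypothetical helper: a rough model of the static dictionary the PEP
    describes, not the actual implementation.
    """
    cache = {}
    for entry in path_entries:
        if os.path.isdir(entry):
            # One os.listdir call replaces repeated fopen() probes.
            for name in os.listdir(entry):
                cache[entry + "/" + name] = entry
        elif zipfile.is_zipfile(entry):
            # Scan the archive's member names once and remember them.
            with zipfile.ZipFile(entry) as archive:
                for member in archive.namelist():
                    cache[entry + "/" + member] = entry
    return cache


# Small demonstration with a throwaway directory and archive.
with tempfile.TemporaryDirectory() as tmp:
    subdir = os.path.join(tmp, "SubDir")
    os.mkdir(subdir)
    with open(os.path.join(subdir, "modbar.py"), "w") as handle:
        handle.write("VALUE = 1\n")

    archive_path = os.path.join(tmp, "Archive.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("Q/R/modfoo.py", "VALUE = 2\n")

    cache = build_import_cache([subdir, archive_path])

    # Membership tests stand in for the lookups import.c would perform.
    found_in_dir = (subdir + "/modbar.py") in cache
    found_in_zip = (archive_path + "/Q/R/modfoo.py") in cache
    missing = (archive_path + "/Q/R/absent.py") in cache
```

With such a cache, checking whether a generated path exists becomes a
single dictionary membership test, whether the path points into a
directory or into an archive.
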

Implementation
==============

A C implementation is available as SourceForge patch 492105.
Superseded by patch 652586 and current CVS. [2]_

A newer version (updated for recent CVS by Paul Moore) is 645650.
Superseded by patch 652586 and current CVS. [3]_

A competing implementation by Just van Rossum is 652586, which is
the basis for the final implementation of :pep:`302`. :pep:`273` has
been implemented using :pep:`302`'s import hooks. [1]_


References
==========

.. [1] Just van Rossum, New import hooks + Import from Zip files
   https://bugs.python.org/issue652586

.. [2] Import from Zip archive, James C. Ahlstrom
   https://bugs.python.org/issue492105

.. [3] Import from Zip Archive, Paul Moore
   https://bugs.python.org/issue645650


Copyright
=========

This document has been placed in the public domain.