245 lines
8.8 KiB
Plaintext
245 lines
8.8 KiB
Plaintext
PEP: 273
|
|
Title: Import Modules from Zip Archives
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: jim@interet.com (James C. Ahlstrom)
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 11-Oct-2001
|
|
Python-Version: 2.3
|
|
Post-History: 26-Oct-2001
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP adds the ability to import Python modules
|
|
``*.py``, ``*.py[co]`` and packages from zip archives. The
|
|
same code is used to speed up normal directory imports
|
|
provided ``os.listdir`` is available.
|
|
|
|
|
|
Note
|
|
====
|
|
|
|
Zip imports were added to Python 2.3, but the final implementation
|
|
uses an approach different from the one described in this PEP.
|
|
The 2.3 implementation is SourceForge patch #652586 [1]_, which adds
|
|
new import hooks described in PEP 302.
|
|
|
|
The rest of this PEP is therefore only of historical interest.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
Currently, ``sys.path`` is a list of directory names as strings. If
|
|
this PEP is implemented, an item of ``sys.path`` can be a string
|
|
naming a zip file archive. The zip archive can contain a
|
|
subdirectory structure to support package imports. The zip
|
|
archive satisfies imports exactly as a subdirectory would.
|
|
|
|
The implementation is in C code in the Python core and works on
|
|
all supported Python platforms.
|
|
|
|
Any files may be present in the zip archive, but only files
|
|
``*.py`` and ``*.py[co]`` are available for import. Zip import of
|
|
dynamic modules (``*.pyd``, ``*.so``) is disallowed.
|
|
|
|
Just as ``sys.path`` currently has default directory names, a default
|
|
zip archive name is added too. Otherwise there is no way to
|
|
import all Python library files from an archive.
|
|
|
|
|
|
Subdirectory Equivalence
|
|
========================
|
|
|
|
The zip archive must be treated exactly as a subdirectory tree so
|
|
we can support package imports based on current and future rules.
|
|
All zip data is taken from the Central Directory, the data must be
|
|
correct, and brain dead zip files are not accommodated.
|
|
|
|
Suppose ``sys.path`` contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
|
|
and we are trying to import ``modfoo`` from the ``Q`` package. Then
|
|
``import.c`` will generate a list of paths and extensions and will
|
|
look for the file. The list of generated paths does not change
|
|
for zip imports. Suppose ``import.c`` generates the path
|
|
"/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
|
|
"/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
|
|
exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.
|
|
|
|
Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
|
|
your zip file will satisfy imports just as your subdirectory did.
|
|
|
|
Well, not quite. You can't satisfy dynamic modules from a zip
|
|
file. Dynamic modules have extensions like ``.dll``, ``.pyd``, and ``.so``.
|
|
They are operating system dependent, and probably can't be loaded
|
|
except from a file. It might be possible to extract the dynamic
|
|
module from the zip file, write it to a plain file and load it.
|
|
But that would mean creating temporary files, and dealing with all
|
|
the ``dynload_*.c``, and that's probably not a good idea.
|
|
|
|
When trying to import ``*.pyc``, if it is not available then
|
|
``*.pyo`` will be used instead. And vice versa when looking for ``*.pyo``.
|
|
If neither ``*.pyc`` nor ``*.pyo`` is available, or if the magic numbers
|
|
are invalid, then ``*.py`` will be compiled and used to satisfy the
|
|
import, but the compiled file will not be saved. Python would
|
|
normally write it to the same directory as ``*.py``, but surely we
|
|
don't want to write to the zip file. We could write to the
|
|
directory of the zip archive, but that would clutter it up, not
|
|
good if it is ``/usr/bin`` for example.
|
|
|
|
Failing to write the compiled files will make zip imports very slow,
|
|
and the user will probably not figure out what is wrong. So it
|
|
is best to put ``*.pyc`` and ``*.pyo`` in the archive with the ``*.py``.
|
|
|
|
|
|
Efficiency
|
|
==========
|
|
|
|
The only way to find files in a zip archive is linear search. So
|
|
for each zip file in ``sys.path``, we search for its names once, and
|
|
put the names plus other relevant data into a static Python
|
|
dictionary. The key is the archive name from ``sys.path`` joined with
|
|
the file name (including any subdirectories) within the archive.
|
|
This is exactly the name generated by ``import.c``, and makes lookup
|
|
easy.
|
|
|
|
This same mechanism is used to speed up directory (non-zip) imports.
|
|
See below.
|
|
|
|
|
|
zlib
|
|
====
|
|
|
|
Compressed zip archives require ``zlib`` for decompression. Prior to
|
|
any other imports, we attempt an import of ``zlib``. Import of
|
|
compressed files will fail with a message "missing ``zlib``" unless
|
|
``zlib`` is available.
|
|
|
|
|
|
Booting
|
|
=======
|
|
|
|
Python imports ``site.py`` itself, and this imports ``os``, ``nt``, ``ntpath``,
|
|
``stat``, and ``UserDict``. It also imports ``sitecustomize.py`` which may
|
|
import more modules. Zip imports must be available before ``site.py``
|
|
is imported.
|
|
|
|
Just as there are default directories in ``sys.path``, there must be
|
|
one or more default zip archives too.
|
|
|
|
The problem is what the name should be. The name should be linked
|
|
with the Python version, so the Python executable can correctly
|
|
find its corresponding libraries even when there are multiple
|
|
Python versions on the same machine.
|
|
|
|
We add one name to ``sys.path``. On Unix, the directory is
|
|
``sys.prefix + "lib"``, and the file name is
|
|
``"python%s%s.zip" % (sys.version[0], sys.version[2])``.
|
|
So for Python 2.2 and prefix ``/usr/local``, the path
|
|
``/usr/local/lib/python2.2/`` is already on ``sys.path``, and
|
|
``/usr/local/lib/python22.zip`` would be added.
|
|
On Windows, the file is the full path to ``python22.dll``, with
|
|
"dll" replaced by "zip". The zip archive name is always inserted
|
|
as the second item in ``sys.path``. The first is the directory of the
|
|
``main.py`` (thanks Tim).
|
|
|
|
|
|
Directory Imports
|
|
=================
|
|
|
|
The static Python dictionary used to speed up zip imports can be
|
|
used to speed up normal directory imports too. For each item in
|
|
``sys.path`` that is not a zip archive, we call ``os.listdir``, and add
|
|
the directory contents to the dictionary. Then instead of calling
|
|
``fopen()`` in a double loop, we just check the dictionary. This
|
|
greatly speeds up imports. If ``os.listdir`` doesn't exist, the
|
|
dictionary is not used.
|
|
|
|
|
|
Benchmarks
|
|
==========
|
|
|
|
==== ================= ================= ========== ==========
|
|
Case Original 2.2a3 Using os.listdir Zip Uncomp Zip Compr
|
|
==== ================= ================= ========== ==========
|
|
1 3.2 2.5 3.2->1.02 2.3 2.5 2.3->0.87 1.66->0.93 1.5->1.07
|
|
2 2.8 3.9 3.0->1.32 Same as Case 1.
|
|
3 5.7 5.7 5.7->5.7 2.1 2.1 2.1->1.8 1.25->0.99 1.19->1.13
|
|
4 9.4 9.4 9.3->9.35 Same as Case 3.
|
|
==== ================= ================= ========== ==========
|
|
|
|
Case 1: Local drive C:, ``sys.path`` has its default value.
|
|
Case 2: Local drive C:, directory with files is at the end of ``sys.path``.
|
|
Case 3: Network drive, ``sys.path`` has its default value.
|
|
Case 4: Network drive, directory with files is at the end of ``sys.path``.
|
|
|
|
Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, 256 Meg.
|
|
The machine was running Windows 2000 with a Linux/Samba network server.
|
|
Times are in seconds, and are the time to import about 100 Lib modules.
|
|
Case 2 and 4 have the "correct" directory moved to the end of ``sys.path``.
|
|
"Uncomp" means uncompressed zip archive, "Compr" means compressed.
|
|
|
|
Initial times are after a re-boot of the system; the time after
|
|
"->" is the time after repeated runs. Times to import from C:
|
|
after a re-boot are rather highly variable for the "Original" case,
|
|
but are more realistic.
|
|
|
|
|
|
Custom Imports
|
|
==============
|
|
|
|
The logic demonstrates the ability to import using default searching
|
|
until a needed Python module (in this case, os) becomes available.
|
|
This can be used to bootstrap custom importers. For example, if
|
|
"``importer()``" in ``__init__.py`` exists, then it could be used for imports.
|
|
The "``importer()``" can freely import os and other modules, and these
|
|
will be satisfied from the default mechanism. This PEP does not
|
|
define any custom importers, and this note is for information only.
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
A C implementation is available as SourceForge patch 492105.
|
|
Superseded by patch 652586 and current CVS. [2]_
|
|
|
|
A newer version (updated for recent CVS by Paul Moore) is 645650.
|
|
Superseded by patch 652586 and current CVS. [3]_
|
|
|
|
A competing implementation by Just van Rossum is 652586, which is
|
|
the basis for the final implementation of PEP 302. PEP 273 has
|
|
been implemented using PEP 302's import hooks. [1]_
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Just van Rossum, New import hooks + Import from Zip files
|
|
https://bugs.python.org/issue652586
|
|
|
|
.. [2] Import from Zip archive, James C. Ahlstrom
|
|
https://bugs.python.org/issue492105
|
|
|
|
.. [3] Import from Zip Archive, Paul Moore
|
|
https://bugs.python.org/issue645650
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
fill-column: 70
|
|
End:
|