Fix backward-compatibility hole described by Jeff Hardy in:

http://mail.python.org/pipermail/python-dev/2011-July/112370.html

Using the approach described here:

  http://mail.python.org/pipermail/python-dev/2011-July/112374.html

This should now restrict backward-compatibility concerns to tool-support
questions, unless somebody comes up with another way to break it.  ;-)
pje 2011-07-20 14:48:00 -04:00
parent 62a9a418f9
commit d65528918f
1 changed files with 135 additions and 56 deletions


@ -339,17 +339,57 @@ it. If this is done *only* by importing a top-level module (i.e., not
checking for a ``__version__`` or some other attribute), *and* there
is a directory of the same name as the sought-for package on
``sys.path`` somewhere, *and* the package is not actually installed,
then such code could *perhaps* be fooled into thinking a package is
installed that really isn't.
then such code could be fooled into thinking a package is installed
that really isn't.
However, even in the rare case where all these conditions line up to
happen at once, the failure is more likely to be annoying than
damaging. In most cases, after all, the code will simply fail a
little later on, when it actually tries to DO something with the
imported (but empty) module. (And code that checks ``__version__``
attributes or for the presence of some desired function, class, or
module in the package will not see a false positive result in the
first place.)
For example, suppose someone writes a script (``datagen.py``)
containing the following code::
    try:
        import json
    except ImportError:
        import simplejson as json
And runs it in a directory laid out like this::
    datagen.py
    json/
        foo.js
        bar.js
If ``import json`` succeeded due to the mere presence of the ``json/``
subdirectory, the code would incorrectly believe that the ``json``
module was available, and proceed to fail with an error.
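As a hedged illustration of the earlier point about attribute checks, a script like ``datagen.py`` could guard against an empty placeholder module by verifying that a function it needs is present. The helper name ``load_json_backend`` is hypothetical and is not part of the proposal:

```python
def load_json_backend():
    """Return a usable JSON module, rejecting an empty placeholder module."""
    try:
        import json
    except ImportError:
        import simplejson as json
    # An import that succeeded only because of a stray directory would
    # produce a module with no contents, so check for what we need.
    if not hasattr(json, "dumps"):
        raise ImportError("'json' was imported but provides no dumps()")
    return json


json_backend = load_json_backend()
print(json_backend.dumps({"ok": True}))
```

A check of this kind does not produce a false positive even when a clashing directory exists, which is why the problem is limited to code that tests *only* whether the import succeeds.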
However, we can prevent corner cases like these from arising, simply
by making one small change to the algorithm presented so far. Instead
of allowing you to import a "pure virtual" package (like ``zc``),
we allow only importing of the *contents* of virtual packages.
That is, a statement like ``import zc`` should raise ``ImportError``
if there is no ``zc.py`` or ``zc/__init__.py`` on ``sys.path``. But,
doing ``import zc.buildout`` should still succeed, as long as there's
a ``zc/buildout.py`` or ``zc/buildout/__init__.py`` on ``sys.path``.
In other words, we don't allow pure virtual packages to be imported
directly, only modules and self-contained packages. (This is an
acceptable limitation, because there is no *functional* value to
importing such a package by itself. After all, the module object
will have no *contents* until you import at least one of its
subpackages or submodules!)
Once ``zc.buildout`` has been successfully imported, though, there
*will* be a ``zc`` module in ``sys.modules``, and trying to import it
will of course succeed. We are only preventing an *initial* import
from succeeding, in order to prevent false-positive import successes
when clashing subdirectories are present on ``sys.path``.
So, with this slight change, the ``datagen.py`` example above will
work correctly. When it does ``import json``, the mere presence of a
``json/`` directory will simply not affect the import process at all,
even if it contains ``.py`` files. The ``json/`` directory will still
only be searched in the case where an import like ``import
json.converter`` is attempted.
Meanwhile, tools that expect to locate packages and modules by
walking a directory tree can be updated to use the existing
@ -361,41 +401,54 @@ packages in memory should use the other APIs described in the
Specification
=============
Two changes are made to the existing import process.
A change is made to the existing import process, when importing
names containing at least one ``.`` -- that is, imports of modules
that have a parent package.
First, the built-in ``__import__`` function must not raise an
``ImportError`` when importing a submodule of a module with no
``__path__``. Instead, it must attempt to *create* a ``__path__``
attribute for the parent module first, as described in `__path__
creation`_, below.
Specifically, if the parent package does not exist, or exists but
lacks a ``__path__`` attribute, an attempt is first made to create a
"virtual path" for the parent package (following the algorithm
described in the section on `virtual paths`_, below).
Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
package ``__path__``) fails to find a module being imported, the
import process must attempt to create a ``__path__`` attribute for
the missing module. If the attempt succeeds, an empty module is
created and its ``__path__`` is set. Otherwise, importing fails.
If the computed "virtual path" is empty, an ``ImportError`` results,
just as it would today. However, if a non-empty virtual path is
obtained, the normal import of the submodule or subpackage proceeds,
using that virtual path to find the submodule or subpackage. (Just
as it would have with the parent's ``__path__``, if the parent package
had existed and had a ``__path__``.)
In both of the above cases, if a non-empty ``__path__`` is created,
the name of the module whose ``__path__`` was created is added to
``sys.virtual_packages`` -- an initially-empty ``set()`` of package
names.
When a submodule or subpackage is found (but not yet loaded),
the parent package is created and added to ``sys.modules`` (if it
didn't exist before), and its ``__path__`` is set to the computed
virtual path (if it wasn't already set).
(This way, code that extends ``sys.path`` at runtime can find out
what virtual packages are currently imported, and thereby add any
new subdirectories to those packages' ``__path__`` attributes. See
`Standard Library Changes/Additions`_ below for more details.)
In this way, when the actual loading of the submodule or subpackage
occurs, it will see a parent package existing, and any relative
imports will work correctly. However, if no submodule or subpackage
exists, then the parent package will *not* be created, nor will a
standalone module be converted into a package (by the addition of a
spurious ``__path__`` attribute).
Conversely, if an empty ``__path__`` results, an ``ImportError``
is immediately raised, and the module is not created or changed, nor
is its name added to ``sys.virtual_packages``.
Note, by the way, that this change must be applied *recursively*: that
is, if ``foo`` and ``foo.bar`` are pure virtual packages, then
``import foo.bar.baz`` must wait until ``foo.bar.baz`` is found before
creating module objects for *both* ``foo`` and ``foo.bar``, and then
create both of them together, properly setting the ``foo`` module's
``.bar`` attribute to point to the ``foo.bar`` module.
In this way, pure virtual packages are never directly importable:
an ``import foo`` or ``import foo.bar`` by itself will fail, and the
corresponding modules will not appear in ``sys.modules`` until they
are needed to point to a *successfully* imported submodule or
self-contained subpackage.
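The rule above can be sketched with a toy model. This is only an illustration under simplifying assumptions: ``simulate_import`` is a hypothetical name, ``files`` stands in for a single ``sys.path`` entry, and caching, loaders, and multiple path entries are all ignored:

```python
def simulate_import(name, files):
    """Return the module names that would be created for `import name`.

    `files` is a set of POSIX-style relative paths standing in for the
    contents of one sys.path entry. Raises ImportError when only a bare
    directory (a pure virtual package) matches the requested name.
    """
    base = name.replace(".", "/")
    if base + ".py" not in files and base + "/__init__.py" not in files:
        # A directory alone is never enough to satisfy an import.
        raise ImportError("No module named " + name)
    parts = name.split(".")
    # Parent modules are created only now, after the leaf has been found.
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


tree = {"foo/bar/baz.py"}
print(simulate_import("foo.bar.baz", tree))
try:
    simulate_import("foo.bar", tree)
except ImportError as exc:
    print("refused:", exc)
```

In this model, ``foo`` and ``foo.bar`` come into existence only as a side effect of finding ``foo.bar.baz``, which mirrors the recursive behaviour described above.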
``__path__`` Creation
---------------------
Virtual Paths
-------------
A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
object for each of the path entries found in ``sys.path`` (for a
top-level module) or the parent ``__path__`` (for a submodule).
A virtual path is created by obtaining a PEP 302 "importer" object for
each of the path entries found in ``sys.path`` (for a top-level
module) or the parent ``__path__`` (for a submodule).
(Note: because ``sys.meta_path`` importers are not associated with
``sys.path`` or ``__path__`` entry strings, such importers do *not*
@ -403,18 +456,34 @@ participate in this process.)
Each importer is checked for a ``get_subpath()`` method, and if
present, the method is called with the full name of the module/package
the ``__path__`` is being constructed for. The return value is either
a string representing a subdirectory for the requested package, or
the path is being constructed for. The return value is either a
string representing a subdirectory for the requested package, or
``None`` if no such subdirectory exists.
The strings returned by the importers are added to the ``__path__``
The strings returned by the importers are added to the path list
being built, in the same order as they are found. (``None`` values
and missing ``get_subpath()`` methods are simply skipped.)
In Python code, the algorithm would look something like this::
The resulting list (whether empty or not) is then stored in a
``sys.virtual_package_paths`` dictionary, keyed by module name.
This dictionary has two purposes. First, it serves as a cache, in
the event that more than one attempt is made to import a submodule
of a virtual package.
Second, and more importantly, the dictionary can be used by code that
extends ``sys.path`` at runtime to *update* imported packages'
``__path__`` attributes accordingly. (See `Standard Library
Changes/Additions`_ below for more details.)
In Python code, the virtual path construction algorithm would look
something like this::
    def get_virtual_path(modulename, parent_path=None):
        if modulename in sys.virtual_package_paths:
            return sys.virtual_package_paths[modulename]
        if parent_path is None:
            parent_path = sys.path
@ -429,6 +498,7 @@ In Python code, the algorithm would look something like this::
                if subpath is not None:
                    path.append(subpath)
        sys.virtual_package_paths[modulename] = path
        return path
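To make the caching behaviour concrete, here is a self-contained sketch that can be run today. It rests on several assumptions: ``DirImporter`` is a hypothetical stand-in for a PEP 302 importer, its ``get_subpath()`` is assumed to map the last name component to a subdirectory, and a module-level dictionary replaces the proposed ``sys.virtual_package_paths``, which does not exist in current Python:

```python
import os
import tempfile


class DirImporter:
    """Hypothetical importer bound to a single directory path entry."""

    def __init__(self, entry):
        self.entry = entry

    def get_subpath(self, fullname):
        # Assumption: the last name component names the subdirectory.
        candidate = os.path.join(self.entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None


# Local stand-in for the proposed sys.virtual_package_paths dictionary.
virtual_package_paths = {}


def get_virtual_path(modulename, parent_path):
    if modulename in virtual_package_paths:
        return virtual_package_paths[modulename]  # cached result
    path = []
    for entry in parent_path:
        subpath = DirImporter(entry).get_subpath(modulename)
        if subpath is not None:
            path.append(subpath)
    virtual_package_paths[modulename] = path  # cached even when empty
    return path


# Demo: entries "a" and "c" each contain a "zc" directory; "b" does not.
with tempfile.TemporaryDirectory() as root:
    entries = [os.path.join(root, n) for n in ("a", "b", "c")]
    for entry in entries:
        os.mkdir(entry)
    os.mkdir(os.path.join(entries[0], "zc"))
    os.mkdir(os.path.join(entries[2], "zc"))
    zc_path = get_virtual_path("zc", entries)
    print(zc_path)
```

The resulting list keeps the order of the path entries, and a second call for the same name returns the same list object from the cache rather than scanning again.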
And a function like this one should be exposed in the standard
@ -453,19 +523,25 @@ Specifically the proposed changes and additions to ``pkgutil`` are:
path.
The implementation of this function does a simple top-down traversal
of ``sys.virtual_packages``, and performs any necessary
``get_subpath()`` calls to identify what path entries need to
be added to each package's ``__path__``, given that `path_entry`
of ``sys.virtual_package_paths``, and performs any necessary
``get_subpath()`` calls to identify what path entries need to be
added to the virtual path for that package, given that `path_entry`
has been added to ``sys.path``. (Or, in the case of sub-packages,
adding a derived subpath entry, based on their parent namespace's
``__path__``.)
adding a derived subpath entry, based on their parent package's
virtual path.)
(Note: this function must update both the path values in
``sys.virtual_package_paths`` as well as the ``__path__`` attributes
of any corresponding modules in ``sys.modules``, even though in the
common case they will both be the same ``list`` object.)
* A new ``iter_virtual_packages(parent='')`` function to allow
top-down traversal of virtual packages in ``sys.virtual_packages``,
by yielding the child virtual packages of `parent`. For example,
calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
and ``zope.products`` (if they are imported virtual packages listed
in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
top-down traversal of virtual packages from
``sys.virtual_package_paths``, by yielding the child virtual
packages of `parent`. For example, calling
``iter_virtual_packages("zope")`` might yield ``zope.app``
and ``zope.products`` (if they are virtual packages listed in
``sys.virtual_package_paths``), but **not** ``zope.foo.bar``.
(This function is needed to implement ``extend_virtual_paths()``,
but is also potentially useful for other code that needs to inspect
imported virtual packages.)
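One possible reading of the traversal described for ``iter_virtual_packages()`` is sketched below. It is not the proposed implementation: the dictionary is passed in explicitly (the proposal would read ``sys.virtual_package_paths`` instead), and the sorted iteration order is an assumption made here for predictable output:

```python
def iter_virtual_packages(virtual_package_paths, parent=""):
    """Yield the direct child virtual packages of `parent`.

    `virtual_package_paths` is a plain dict standing in for the proposed
    sys.virtual_package_paths; only its keys are inspected.
    """
    prefix = parent + "." if parent else ""
    for name in sorted(virtual_package_paths):
        rest = name[len(prefix):]
        # A direct child has exactly one more name component than parent.
        if name.startswith(prefix) and rest and "." not in rest:
            yield name


paths = {
    "zope": [],
    "zope.app": [],
    "zope.products": [],
    "zope.foo.bar": [],
    "zc": [],
}
print(list(iter_virtual_packages(paths, "zope")))
print(list(iter_virtual_packages(paths)))
```

Calling the generator again on each yielded name gives the top-down walk that ``extend_virtual_paths()`` needs.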
@ -500,10 +576,11 @@ For users, developers, and distributors of virtual packages:
and do other things that make more sense for a self-contained
project than for a mere "namespace" package.
* ``sys.virtual_packages`` is allowed to contain non-existent or
not-yet-imported package names; code that uses its contents should
not assume that every name in this set is also present in
``sys.modules`` or that importing the name will necessarily succeed.
* ``sys.virtual_package_paths`` is allowed to contain entries for
non-existent or not-yet-imported package names; code that uses its
contents should not assume that every key in this dictionary is also
present in ``sys.modules`` or that importing the name will
necessarily succeed.
* If you are changing a currently self-contained package into a
virtual one, it's important to note that you can no longer use its
@ -539,7 +616,9 @@ For those implementing PEP \302 importer objects:
XXX This might list a lot of not-really-packages. Should we
require importable contents to exist? If so, how deep do we
search, and how do we prevent e.g. link loops, or traversing onto
different filesystems, etc.? Ick.
different filesystems, etc.? Ick. Also, if virtual packages are
listed, they still can't be *imported*, which is a problem for the
way that ``pkgutil.walk_modules()`` is currently implemented.
* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
not need to implement ``get_subpath()``, because the method