287 lines
9.7 KiB
ReStructuredText
287 lines
9.7 KiB
ReStructuredText
PEP: 267
|
|
Title: Optimized Access to Module Namespaces
|
|
Author: Jeremy Hylton <jeremy@alum.mit.edu>
|
|
Status: Deferred
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 23-May-2001
|
|
Python-Version: 2.2
|
|
Post-History:
|
|
|
|
|
|
Deferral
|
|
========
|
|
|
|
While this PEP is a nice idea, no-one has yet emerged to do the work of
|
|
hashing out the differences between this PEP, :pep:`266` and :pep:`280`.
|
|
Hence, it is being deferred.
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP proposes a new implementation of global module namespaces
|
|
and the builtin namespace that speeds name resolution. The
|
|
implementation would use an array of object pointers for most
|
|
operations in these namespaces. The compiler would assign indices
|
|
for global variables and module attributes at compile time.
|
|
|
|
The current implementation represents these namespaces as
|
|
dictionaries. A global name incurs a dictionary lookup each time
|
|
it is used; a builtin name incurs two dictionary lookups, a failed
|
|
lookup in the global namespace and a second lookup in the builtin
|
|
namespace.
|
|
|
|
This implementation should speed Python code that uses
|
|
module-level functions and variables. It should also eliminate
|
|
awkward coding styles that have evolved to speed access to these
|
|
names.
|
|
|
|
The implementation is complicated because the global and builtin
|
|
namespaces can be modified dynamically in ways that are impossible
|
|
for the compiler to detect. (Example: A module's namespace is
|
|
modified by a script after the module is imported.) As a result,
|
|
the implementation must maintain several auxiliary data structures
|
|
to preserve these dynamic features.
|
|
|
|
|
|
Introduction
|
|
============
|
|
|
|
This PEP proposes a new implementation of attribute access for
|
|
module objects that optimizes access to module variables known at
|
|
compile time. The module will store these variables in an array
|
|
and provide an interface to lookup attributes using array offsets.
|
|
For globals, builtins, and attributes of imported modules, the
|
|
compiler will generate code that uses the array offsets for fast
|
|
access.
|
|
|
|
[describe the key parts of the design: dlict, compiler support,
|
|
stupid name trick workarounds, optimization of other module's
|
|
globals]
|
|
|
|
The implementation will preserve existing semantics for module
|
|
namespaces, including the ability to modify module namespaces at
|
|
runtime in ways that affect the visibility of builtin names.
|
|
|
|
|
|
DLict design
|
|
============
|
|
|
|
The namespaces are implemented using a data structure that has
|
|
sometimes gone under the name ``dlict``. It is a dictionary that has
|
|
numbered slots for some dictionary entries. The type must be
|
|
implemented in C to achieve acceptable performance. The new
|
|
type-class unification work should make this fairly easy. The
|
|
``DLict`` will presumably be a subclass of dictionary with an
|
|
alternate storage module for some keys.
|
|
|
|
A Python implementation is included here to illustrate the basic
|
|
design::
|
|
|
|
"""A dictionary-list hybrid"""
|
|
|
|
import types
|
|
|
|
class DLict:
|
|
def __init__(self, names):
|
|
assert isinstance(names, types.DictType)
|
|
self.names = {}
|
|
self.list = [None] * size
|
|
self.empty = [1] * size
|
|
self.dict = {}
|
|
self.size = 0
|
|
|
|
def __getitem__(self, name):
|
|
i = self.names.get(name)
|
|
if i is None:
|
|
return self.dict[name]
|
|
if self.empty[i] is not None:
|
|
raise KeyError, name
|
|
return self.list[i]
|
|
|
|
def __setitem__(self, name, val):
|
|
i = self.names.get(name)
|
|
if i is None:
|
|
self.dict[name] = val
|
|
else:
|
|
self.empty[i] = None
|
|
self.list[i] = val
|
|
self.size += 1
|
|
|
|
def __delitem__(self, name):
|
|
i = self.names.get(name)
|
|
if i is None:
|
|
del self.dict[name]
|
|
else:
|
|
if self.empty[i] is not None:
|
|
raise KeyError, name
|
|
self.empty[i] = 1
|
|
self.list[i] = None
|
|
self.size -= 1
|
|
|
|
def keys(self):
|
|
if self.dict:
|
|
return self.names.keys() + self.dict.keys()
|
|
else:
|
|
return self.names.keys()
|
|
|
|
def values(self):
|
|
if self.dict:
|
|
return self.names.values() + self.dict.values()
|
|
else:
|
|
return self.names.values()
|
|
|
|
def items(self):
|
|
if self.dict:
|
|
return self.names.items()
|
|
else:
|
|
return self.names.items() + self.dict.items()
|
|
|
|
def __len__(self):
|
|
return self.size + len(self.dict)
|
|
|
|
def __cmp__(self, dlict):
|
|
c = cmp(self.names, dlict.names)
|
|
if c != 0:
|
|
return c
|
|
c = cmp(self.size, dlict.size)
|
|
if c != 0:
|
|
return c
|
|
for i in range(len(self.names)):
|
|
c = cmp(self.empty[i], dlict.empty[i])
|
|
if c != 0:
|
|
return c
|
|
if self.empty[i] is None:
|
|
c = cmp(self.list[i], dlict.empty[i])
|
|
if c != 0:
|
|
return c
|
|
return cmp(self.dict, dlict.dict)
|
|
|
|
def clear(self):
|
|
self.dict.clear()
|
|
for i in range(len(self.names)):
|
|
if self.empty[i] is None:
|
|
self.empty[i] = 1
|
|
self.list[i] = None
|
|
|
|
def update(self):
|
|
pass
|
|
|
|
def load(self, index):
|
|
"""dlict-special method to support indexed access"""
|
|
if self.empty[index] is None:
|
|
return self.list[index]
|
|
else:
|
|
raise KeyError, index # XXX might want reverse mapping
|
|
|
|
def store(self, index, val):
|
|
"""dlict-special method to support indexed access"""
|
|
self.empty[index] = None
|
|
self.list[index] = val
|
|
|
|
def delete(self, index):
|
|
"""dlict-special method to support indexed access"""
|
|
self.empty[index] = 1
|
|
self.list[index] = None
|
|
|
|
|
|
Compiler issues
|
|
===============
|
|
|
|
The compiler currently collects the names of all global variables
|
|
in a module. These are names bound at the module level or bound
|
|
in a class or function body that declares them to be global.
|
|
|
|
The compiler would assign indices for each global name and add the
|
|
names and indices of the globals to the module's code object.
|
|
Each code object would then be bound irrevocably to the module it
|
|
was defined in. (Not sure if there are some subtle problems with
|
|
this.)
|
|
|
|
For attributes of imported modules, the module will store an
|
|
indirection record. Internally, the module will store a pointer
|
|
to the defining module and the offset of the attribute in the
|
|
defining module's global variable array. The offset would be
|
|
initialized the first time the name is looked up.
|
|
|
|
|
|
Runtime model
|
|
=============
|
|
|
|
The PythonVM will be extended with new opcodes to access globals
|
|
and module attributes via a module-level array.
|
|
|
|
A function object would need to point to the module that defined
|
|
it in order to provide access to the module-level global array.
|
|
|
|
For module attributes stored in the ``dlict`` (call them static
|
|
attributes), the get/delattr implementation would need to track
|
|
access to these attributes using the old by-name interface. If a
|
|
static attribute is updated dynamically, e.g.::
|
|
|
|
mod.__dict__["foo"] = 2
|
|
|
|
The implementation would need to update the array slot instead of
|
|
the backup dict.
|
|
|
|
|
|
Backwards compatibility
|
|
=======================
|
|
|
|
The ``dlict`` will need to maintain meta-information about whether a
|
|
slot is currently used or not. It will also need to maintain a
|
|
pointer to the builtin namespace. When a name is not currently
|
|
used in the global namespace, the lookup will have to fail over to
|
|
the builtin namespace.
|
|
|
|
In the reverse case, each module may need a special accessor
|
|
function for the builtin namespace that checks to see if a global
|
|
shadowing the builtin has been added dynamically. This check
|
|
would only occur if there was a dynamic change to the module's
|
|
``dlict``, i.e. when a name is bound that wasn't discovered at
|
|
compile-time.
|
|
|
|
These mechanisms would have little if any cost for the common case
|
|
whether a module's global namespace is not modified in strange
|
|
ways at runtime. They would add overhead for modules that did
|
|
unusual things with global names, but this is an uncommon practice
|
|
and probably one worth discouraging.
|
|
|
|
It may be desirable to disable dynamic additions to the global
|
|
namespace in some future version of Python. If so, the new
|
|
implementation could provide warnings.
|
|
|
|
|
|
Related PEPs
|
|
============
|
|
|
|
:pep:`266`, Optimizing Global Variable/Attribute Access, proposes a
|
|
different mechanism for optimizing access to global variables as
|
|
well as attributes of objects. The mechanism uses two new opcodes
|
|
``TRACK_OBJECT`` and ``UNTRACK_OBJECT`` to create a slot in the local
|
|
variables array that aliases the global or object attribute. If
|
|
the object being aliases is rebound, the rebind operation is
|
|
responsible for updating the aliases.
|
|
|
|
The objecting tracking approach applies to a wider range of
|
|
objects than just module. It may also have a higher runtime cost,
|
|
because each function that uses a global or object attribute must
|
|
execute extra opcodes to register its interest in an object and
|
|
unregister on exit; the cost of registration is unclear, but
|
|
presumably involves a dynamically resizable data structure to hold
|
|
a list of callbacks.
|
|
|
|
The implementation proposed here avoids the need for registration,
|
|
because it does not create aliases. Instead it allows functions
|
|
that reference a global variable or module attribute to retain a
|
|
pointer to the location where the original binding is stored. A
|
|
second advantage is that the initial lookup is performed once per
|
|
module rather than once per function call.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|