285 lines
10 KiB
Plaintext
285 lines
10 KiB
Plaintext
PEP: 267
|
||
Title: Optimized Access to Module Namespaces
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: jeremy@alum.mit.edu (Jeremy Hylton)
|
||
Status: Deferred
|
||
Type: Standards Track
|
||
Created: 23-May-2001
|
||
Python-Version: 2.2
|
||
Post-History:
|
||
|
||
Deferral
|
||
|
||
While this PEP is a nice idea, no-one has yet emerged to do the work of
|
||
hashing out the differences between this PEP, PEP 266 and PEP 280.
|
||
Hence, it is being deferred.
|
||
|
||
|
||
Abstract
|
||
|
||
This PEP proposes a new implementation of global module namespaces
|
||
and the builtin namespace that speeds name resolution. The
|
||
implementation would use an array of object pointers for most
|
||
operations in these namespaces. The compiler would assign indices
|
||
for global variables and module attributes at compile time.
|
||
|
||
The current implementation represents these namespaces as
|
||
dictionaries. A global name incurs a dictionary lookup each time
|
||
it is used; a builtin name incurs two dictionary lookups, a failed
|
||
lookup in the global namespace and a second lookup in the builtin
|
||
namespace.
|
||
|
||
This implementation should speed Python code that uses
|
||
module-level functions and variables. It should also eliminate
|
||
awkward coding styles that have evolved to speed access to these
|
||
names.
|
||
|
||
The implementation is complicated because the global and builtin
|
||
namespaces can be modified dynamically in ways that are impossible
|
||
for the compiler to detect. (Example: A module's namespace is
|
||
modified by a script after the module is imported.) As a result,
|
||
the implementation must maintain several auxiliary data structures
|
||
to preserve these dynamic features.
|
||
|
||
|
||
Introduction
|
||
|
||
This PEP proposes a new implementation of attribute access for
|
||
module objects that optimizes access to module variables known at
|
||
compile time. The module will store these variables in an array
|
||
and provide an interface to lookup attributes using array offsets.
|
||
For globals, builtins, and attributes of imported modules, the
|
||
compiler will generate code that uses the array offsets for fast
|
||
access.
|
||
|
||
[describe the key parts of the design: dlict, compiler support,
|
||
stupid name trick workarounds, optimization of other module's
|
||
globals]
|
||
|
||
The implementation will preserve existing semantics for module
|
||
namespaces, including the ability to modify module namespaces at
|
||
runtime in ways that affect the visibility of builtin names.
|
||
|
||
|
||
DLict design
|
||
|
||
The namespaces are implemented using a data structure that has
|
||
sometimes gone under the name dlict. It is a dictionary that has
|
||
numbered slots for some dictionary entries. The type must be
|
||
implemented in C to achieve acceptable performance. The new
|
||
type-class unification work should make this fairly easy. The
|
||
DLict will presumably be a subclass of dictionary with an
|
||
alternate storage module for some keys.
|
||
|
||
A Python implementation is included here to illustrate the basic
|
||
design:
|
||
|
||
"""A dictionary-list hybrid"""
|
||
|
||
import types
|
||
|
||
class DLict:
|
||
def __init__(self, names):
|
||
assert isinstance(names, types.DictType)
|
||
self.names = {}
|
||
self.list = [None] * size
|
||
self.empty = [1] * size
|
||
self.dict = {}
|
||
self.size = 0
|
||
|
||
def __getitem__(self, name):
|
||
i = self.names.get(name)
|
||
if i is None:
|
||
return self.dict[name]
|
||
if self.empty[i] is not None:
|
||
raise KeyError, name
|
||
return self.list[i]
|
||
|
||
def __setitem__(self, name, val):
|
||
i = self.names.get(name)
|
||
if i is None:
|
||
self.dict[name] = val
|
||
else:
|
||
self.empty[i] = None
|
||
self.list[i] = val
|
||
self.size += 1
|
||
|
||
def __delitem__(self, name):
|
||
i = self.names.get(name)
|
||
if i is None:
|
||
del self.dict[name]
|
||
else:
|
||
if self.empty[i] is not None:
|
||
raise KeyError, name
|
||
self.empty[i] = 1
|
||
self.list[i] = None
|
||
self.size -= 1
|
||
|
||
def keys(self):
|
||
if self.dict:
|
||
return self.names.keys() + self.dict.keys()
|
||
else:
|
||
return self.names.keys()
|
||
|
||
def values(self):
|
||
if self.dict:
|
||
return self.names.values() + self.dict.values()
|
||
else:
|
||
return self.names.values()
|
||
|
||
def items(self):
|
||
if self.dict:
|
||
return self.names.items()
|
||
else:
|
||
return self.names.items() + self.dict.items()
|
||
|
||
def __len__(self):
|
||
return self.size + len(self.dict)
|
||
|
||
def __cmp__(self, dlict):
|
||
c = cmp(self.names, dlict.names)
|
||
if c != 0:
|
||
return c
|
||
c = cmp(self.size, dlict.size)
|
||
if c != 0:
|
||
return c
|
||
for i in range(len(self.names)):
|
||
c = cmp(self.empty[i], dlict.empty[i])
|
||
if c != 0:
|
||
return c
|
||
if self.empty[i] is None:
|
||
c = cmp(self.list[i], dlict.empty[i])
|
||
if c != 0:
|
||
return c
|
||
return cmp(self.dict, dlict.dict)
|
||
|
||
def clear(self):
|
||
self.dict.clear()
|
||
for i in range(len(self.names)):
|
||
if self.empty[i] is None:
|
||
self.empty[i] = 1
|
||
self.list[i] = None
|
||
|
||
def update(self):
|
||
pass
|
||
|
||
def load(self, index):
|
||
"""dlict-special method to support indexed access"""
|
||
if self.empty[index] is None:
|
||
return self.list[index]
|
||
else:
|
||
raise KeyError, index # XXX might want reverse mapping
|
||
|
||
def store(self, index, val):
|
||
"""dlict-special method to support indexed access"""
|
||
self.empty[index] = None
|
||
self.list[index] = val
|
||
|
||
def delete(self, index):
|
||
"""dlict-special method to support indexed access"""
|
||
self.empty[index] = 1
|
||
self.list[index] = None
|
||
|
||
|
||
Compiler issues
|
||
|
||
The compiler currently collects the names of all global variables
|
||
in a module. These are names bound at the module level or bound
|
||
in a class or function body that declares them to be global.
|
||
|
||
The compiler would assign indices for each global name and add the
|
||
names and indices of the globals to the module's code object.
|
||
Each code object would then be bound irrevocably to the module it
|
||
was defined in. (Not sure if there are some subtle problems with
|
||
this.)
|
||
|
||
For attributes of imported modules, the module will store an
|
||
indirection record. Internally, the module will store a pointer
|
||
to the defining module and the offset of the attribute in the
|
||
defining module's global variable array. The offset would be
|
||
initialized the first time the name is looked up.
|
||
|
||
|
||
Runtime model
|
||
|
||
The PythonVM will be extended with new opcodes to access globals
|
||
and module attributes via a module-level array.
|
||
|
||
A function object would need to point to the module that defined
|
||
it in order to provide access to the module-level global array.
|
||
|
||
For module attributes stored in the dlict (call them static
|
||
attributes), the get/delattr implementation would need to track
|
||
access to these attributes using the old by-name interface. If a
|
||
static attribute is updated dynamically, e.g.
|
||
|
||
mod.__dict__["foo"] = 2
|
||
|
||
The implementation would need to update the array slot instead of
|
||
the backup dict.
|
||
|
||
|
||
Backwards compatibility
|
||
|
||
The dlict will need to maintain meta-information about whether a
|
||
slot is currently used or not. It will also need to maintain a
|
||
pointer to the builtin namespace. When a name is not currently
|
||
used in the global namespace, the lookup will have to fail over to
|
||
the builtin namespace.
|
||
|
||
In the reverse case, each module may need a special accessor
|
||
function for the builtin namespace that checks to see if a global
|
||
shadowing the builtin has been added dynamically. This check
|
||
would only occur if there was a dynamic change to the module's
|
||
dlict, i.e. when a name is bound that wasn't discovered at
|
||
compile-time.
|
||
|
||
These mechanisms would have little if any cost for the common case
|
||
whether a module's global namespace is not modified in strange
|
||
ways at runtime. They would add overhead for modules that did
|
||
unusual things with global names, but this is an uncommon practice
|
||
and probably one worth discouraging.
|
||
|
||
It may be desirable to disable dynamic additions to the global
|
||
namespace in some future version of Python. If so, the new
|
||
implementation could provide warnings.
|
||
|
||
|
||
Related PEPs
|
||
|
||
PEP 266, Optimizing Global Variable/Attribute Access, proposes a
|
||
different mechanism for optimizing access to global variables as
|
||
well as attributes of objects. The mechanism uses two new opcodes
|
||
TRACK_OBJECT and UNTRACK_OBJECT to create a slot in the local
|
||
variables array that aliases the global or object attribute. If
|
||
the object being aliases is rebound, the rebind operation is
|
||
responsible for updating the aliases.
|
||
|
||
The objecting tracking approach applies to a wider range of
|
||
objects than just module. It may also have a higher runtime cost,
|
||
because each function that uses a global or object attribute must
|
||
execute extra opcodes to register its interest in an object and
|
||
unregister on exit; the cost of registration is unclear, but
|
||
presumably involves a dynamically resizable data structure to hold
|
||
a list of callbacks.
|
||
|
||
The implementation proposed here avoids the need for registration,
|
||
because it does not create aliases. Instead it allows functions
|
||
that reference a global variable or module attribute to retain a
|
||
pointer to the location where the original binding is stored. A
|
||
second advantage is that the initial lookup is performed once per
|
||
module rather than once per function call.
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
End:
|