Flesh out a more aggressive builtin-caching variation.

Tim Peters 2002-02-11 07:05:01 +00:00
parent 27dd1ffaf0
commit 805ab456a8
1 changed files with 189 additions and 2 deletions


@@ -183,7 +183,8 @@ Description
name of the corresponding global and this name is used to index the
function's globals dict.
Additional ideas:
Additional Ideas
- Never make func_cell a NULL pointer; instead, make up an array
of empty cells, so that LOAD_GLOBAL_CELL can index func_cells
@@ -204,7 +205,193 @@ Description
return obj # Existing global
return c.cellptr.objptr # Built-in or NULL
XXX Incorporate Tim's most recent posts. (Tim, can you do this?)
- Be more aggressive: put the actual values of builtins into module
dicts, not just pointers to cells containing the actual values.
There are two points to this: (1) Simplify and speed access, which
is the most common operation. (2) Support faithful emulation of
extreme existing corner cases.
WRT #2, the set of builtins in the scheme above is captured at the
time a module dict is first created. Mutations to the set of builtin
names following that don't get reflected in the module dicts. Example:
consider files main.py and cheater.py:
[main.py]
    import cheater
    def f():
        cheater.cheat()
        return pachinko()
    print f()
[cheater.py]
    def cheat():
        import __builtin__
        __builtin__.pachinko = lambda: 666
If main.py is run under Python 2.2 (or before), 666 is printed. But
under the proposal, __builtin__.pachinko doesn't exist at the time
main's __dict__ is initialized. When the function object for
f is created, main.__dict__ grows a pachinko cell mapping to two
NULLs. When cheat() is called, __builtin__.__dict__ grows a pachinko
cell too, but main.__dict__ doesn't know -- and will never know -- about
that. When f's return stmt references pachinko, it will still find
the double-NULLs in main.__dict__'s pachinko cell, and so raise
NameError.
A similar (in cause) break in compatibility can occur if a module
global foo is del'ed, but a builtin foo was created prior to that
but after the module dict was first created. Then the builtin foo
becomes visible in the module under 2.2 and before, but remains
invisible under the proposal.
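Both corner cases can be reproduced with the ordinary lookup rules. The following is a minimal sketch in modern Python 3 syntax (so the builtins module is spelled builtins rather than __builtin__), collapsed into one module for brevity; the names read_foo, late_builtin_result, shadowed and unmasked are made up for the illustration. It shows the behavior a faithful emulation would have to preserve, not the proposal itself.

```python
import builtins

def f():
    # 'pachinko' is looked up when f runs: module globals first, then builtins.
    return pachinko()

def cheat():
    # Add a builtin long after this module's namespace was created.
    builtins.pachinko = lambda: 666

cheat()
late_builtin_result = f()   # the late-added builtin is visible to f

# Second case: a builtin 'foo' appears while a module global 'foo' shadows it.
foo = "module foo"
builtins.foo = "builtin foo"

def read_foo():
    return foo

shadowed = read_foo()       # the module global wins while it exists
del foo
unmasked = read_foo()       # after the del, the builtin shows through

# Clean up the builtins added for this demonstration.
del builtins.pachinko
del builtins.foo
```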
Mutating builtins is extremely rare (most programs never mutate the
builtins, and it's hard to imagine a plausible use for frequent
mutation of the builtins -- I've never seen or heard of one), so it
doesn't matter how expensive mutating the builtins becomes. OTOH,
referencing globals and builtins is very common. Combining those
observations suggests a more aggressive caching of builtins in module
globals, speeding access at the expense of making mutations of the
builtins (potentially much) more expensive to keep the caches in
synch.
Much of the scheme above remains the same, and most of the rest is
just a little different. A cell changes to:
    class cell(object):
        def __init__(self, obj=NULL, builtin=0):
            self.objptr = obj
            self.builtinflag = builtin
and a celldict maps strings to this version of cells. builtinflag
is true when and only when objptr contains a value obtained from
the builtins; in other words, it's true when and only when a cell
is acting as a cached value. When builtinflag is false, objptr is
the value of a module global (possibly NULL). celldict changes to:
    class celldict(object):
        def __init__(self, builtindict=()):
            self.basedict = builtindict
            self.__dict = d = {}
            for k, v in builtindict.items():
                d[k] = cell(v, 1)
        def __getitem__(self, key):
            c = self.__dict.get(key)
            if c is None or c.objptr is NULL or c.builtinflag:
                raise KeyError, key
            return c.objptr
        def __setitem__(self, key, value):
            c = self.__dict.get(key)
            if c is None:
                c = cell()
                self.__dict[key] = c
            c.objptr = value
            c.builtinflag = 0
        def __delitem__(self, key):
            c = self.__dict.get(key)
            if c is None or c.objptr is NULL or c.builtinflag:
                raise KeyError, key
            c.objptr = NULL
            # We may have unmasked a builtin. Note that because
            # we're checking the builtin dict for that *now*, this
            # still works if the builtin first came into existence
            # after we were constructed. Note too that del on
            # namespace dicts is rare, so the expense of this check
            # shouldn't matter.
            if key in self.basedict:
                c.objptr = self.basedict[key]
                assert c.objptr is not NULL # else "in" lied
                c.builtinflag = 1
            else:
                # There is no builtin with the same name.
                assert not c.builtinflag
        def keys(self):
            return [k for k, c in self.__dict.iteritems()
                    if c.objptr is not NULL and not c.builtinflag]
        def items(self):
            return [(k, c.objptr) for k, c in self.__dict.iteritems()
                    if c.objptr is not NULL and not c.builtinflag]
        def values(self):
            return [c.objptr for c in self.__dict.itervalues()
                    if c.objptr is not NULL and not c.builtinflag]
        def clear(self):
            for c in self.__dict.values():
                if not c.builtinflag:
                    c.objptr = NULL
        # Etc.
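For experimenting with the masking and unmasking rules, here is a trimmed sketch of the same idea in modern Python 3 syntax. It is an illustration under assumptions, not the proposed implementation: NULL is modeled with a sentinel object, the classes are renamed Cell and CellDict, and _global_cell and load_global are hypothetical helpers (load_global stands in for what LOAD_GLOBAL_CELL would read out of a cell).

```python
NULL = object()  # stand-in for a C NULL pointer (assumption of this sketch)

class Cell:
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin

class CellDict:
    def __init__(self, builtindict=None):
        self.basedict = builtindict if builtindict is not None else {}
        # Pre-load every builtin as a cached cell.
        self._cells = {k: Cell(v, True) for k, v in self.basedict.items()}

    def _global_cell(self, key):
        # Return the cell only if it currently holds a real module global.
        c = self._cells.get(key)
        if c is None or c.objptr is NULL or c.builtinflag:
            raise KeyError(key)
        return c

    def __getitem__(self, key):
        return self._global_cell(key).objptr

    def __setitem__(self, key, value):
        c = self._cells.setdefault(key, Cell())
        c.objptr = value
        c.builtinflag = False

    def __delitem__(self, key):
        c = self._global_cell(key)
        c.objptr = NULL
        if key in self.basedict:
            # Deleting the global unmasked a builtin: cache it again.
            c.objptr = self.basedict[key]
            c.builtinflag = True

    def keys(self):
        return [k for k, c in self._cells.items()
                if c.objptr is not NULL and not c.builtinflag]

    def load_global(self, key):
        # Hypothetical helper: the raw cell contents, cached builtins included.
        return self._cells[key].objptr


d = CellDict({"len": len})
keys_initially = d.keys()              # builtins are cached but not listed
d["len"] = "shadow"                    # a module global shadows the builtin
shadowed_value = d.load_global("len")
del d["len"]                           # unmasks the builtin again
restored_value = d.load_global("len")
```

Through the dict interface the cached builtin stays invisible: keys() omits it and d["len"] raises KeyError once the global is deleted, while the raw cell still yields the builtin.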
The speed benefit comes from simplifying LOAD_GLOBAL_CELL, which
I expect is executed more frequently than all other namespace
operations combined:
    def LOAD_GLOBAL_CELL(self, i):
        # self is the frame
        c = self.func_cells[i]
        return c.objptr # may be NULL (also true before)
That is, accessing builtins and accessing module globals are equally
fast. For module globals, a NULL-pointer test+branch is saved. For
builtins, an additional pointer chase is also saved.
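The saving can be seen by putting the two lookups side by side. This is a rough sketch in Python 3 syntax: load_global_old is reconstructed from the fragment of the earlier scheme visible in the context above (a cell with objptr and cellptr), and the names OldCell, NewCell, load_global_old and load_global_new are invented for the comparison.

```python
NULL = object()  # stand-in for a C NULL pointer (assumption of this sketch)

class OldCell:
    """Earlier scheme: a module cell points at the builtin's cell."""
    def __init__(self, obj=NULL, cellptr=None):
        self.objptr = obj
        self.cellptr = cellptr

class NewCell:
    """Aggressive scheme: the cell holds the value itself plus a flag."""
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin

def load_global_old(func_cells, i):
    c = func_cells[i]
    obj = c.objptr
    if obj is not NULL:          # NULL test + branch
        return obj               # existing global
    return c.cellptr.objptr      # extra pointer chase: builtin or NULL

def load_global_new(func_cells, i):
    return func_cells[i].objptr  # one load, globals and builtins alike

# Index 0: the builtin 'len', not shadowed.  Index 1: a plain global, 42.
builtin_len = OldCell(len)
no_builtin = OldCell()           # empty cell, so cellptr is never None
old_cells = [OldCell(NULL, builtin_len), OldCell(42, no_builtin)]
new_cells = [NewCell(len, True), NewCell(42)]

old_results = [load_global_old(old_cells, i) for i in range(2)]
new_results = [load_global_new(new_cells, i) for i in range(2)]
```

Both versions return the same objects; the difference is only in how much work each lookup does.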
The other part needed to make this fly is expensive: propagating
mutations of builtins into the module dicts that were initialized
from the builtins. This is much like, in 2.2, propagating changes
in new-style base classes to their descendants: the builtins need to
maintain a list of weakrefs to the modules (or module dicts)
initialized from the builtin's dict. Given a mutation to the builtin
dict (adding a new key, changing the value associated with an
existing key, or deleting a key), traverse the list of module dicts
and make corresponding mutations to them. This is straightforward;
for example, if a key is deleted from builtins, execute
reflect_bltin_del in each module:
    def reflect_bltin_del(self, key):
        c = self.__dict.get(key)
        assert c is not None # else we were already out of synch
        if c.builtinflag:
            # Put us back in synch.
            c.objptr = NULL
            c.builtinflag = 0
        # Else we're shadowing the builtin, so don't care that
        # the builtin went away.
Note that c.builtinflag protects us from erroneously deleting a
module global of the same name. Adding a new (key, value) builtin
pair is similar:
    def reflect_bltin_new(self, key, value):
        c = self.__dict.get(key)
        if c is None:
            # Never heard of it before: cache the builtin value.
            self.__dict[key] = cell(value, 1)
        elif c.objptr is NULL:
            # This used to exist in the module or the builtins,
            # but doesn't anymore; rehabilitate it.
            assert not c.builtinflag
            c.objptr = value
            c.builtinflag = 1
        else:
            # We're shadowing it already.
            assert not c.builtinflag
Changing the value of an existing builtin can be viewed as deleting
the name, then adding it again. Indeed, since mutating builtins is
so rare, that's probably the right way to implement it too (speed
doesn't matter here):
    def reflect_bltin_change(self, key, newvalue):
        assert key in self.__dict
        self.reflect_bltin_del(key)
        self.reflect_bltin_new(key, newvalue)
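To see the propagation work end to end, here is one possible sketch in Python 3 syntax. It makes several assumptions not in the text: the builtin dict is modeled as a dict subclass (BuiltinDict) that intercepts only item assignment and deletion, a weakref.WeakSet stands in for the list of weakrefs, and ModuleCells is a stripped-down stand-in for a module's celldict. All of these names are hypothetical.

```python
import weakref

NULL = object()  # stand-in for a C NULL pointer (assumption of this sketch)

class Cell:
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin

class ModuleCells:
    """Stripped-down module-side cache: name -> Cell."""
    def __init__(self, builtins):
        self.cells = {k: Cell(v, True) for k, v in builtins.items()}
        builtins.register(self)

    def reflect_bltin_del(self, key):
        c = self.cells[key]
        if c.builtinflag:
            c.objptr = NULL
            c.builtinflag = False
        # Else a module global shadows the name; nothing to do.

    def reflect_bltin_new(self, key, value):
        c = self.cells.get(key)
        if c is None:
            self.cells[key] = Cell(value, True)
        elif c.objptr is NULL:
            c.objptr = value
            c.builtinflag = True
        # Else a module global shadows the name; nothing to do.

    def reflect_bltin_change(self, key, newvalue):
        self.reflect_bltin_del(key)
        self.reflect_bltin_new(key, newvalue)

class BuiltinDict(dict):
    """Builtin dict that pushes mutations to the modules built from it."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Weak references, so dead modules drop out automatically.
        self._dependents = weakref.WeakSet()

    def register(self, module_cells):
        self._dependents.add(module_cells)

    def __setitem__(self, key, value):
        existed = key in self
        super().__setitem__(key, value)
        for m in list(self._dependents):
            if existed:
                m.reflect_bltin_change(key, value)
            else:
                m.reflect_bltin_new(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        for m in list(self._dependents):
            m.reflect_bltin_del(key)

b = BuiltinDict(len=len)
m = ModuleCells(b)
b["pachinko"] = 666                    # new builtin reaches the module
added = m.cells["pachinko"].objptr
b["pachinko"] = 777                    # change = delete, then add
changed = m.cells["pachinko"].objptr
m.cells["len"] = Cell("shadow")        # module global shadows 'len'
del b["len"]                           # must not disturb the global
still_shadowed = m.cells["len"].objptr
del b["pachinko"]
gone = m.cells["pachinko"].objptr is NULL
```

A real implementation would also have to cover every other mutating dict method (update, pop, setdefault, clear and so on), which this sketch ignores.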
Comparison