Flesh out a more aggressive builtin-caching variation.

parent 27dd1ffaf0
commit 805ab456a8

pep-0280.txt | 191

@@ -183,7 +183,8 @@ Description

name of the corresponding global and this name is used to index the
function's globals dict.


Additional Ideas

- Never make func_cell a NULL pointer; instead, make up an array
  of empty cells, so that LOAD_GLOBAL_CELL can index func_cells

@@ -204,7 +205,193 @@ Description

            return obj # Existing global
        return c.cellptr.objptr # Built-in or NULL

- Be more aggressive: put the actual values of builtins into module
  dicts, not just pointers to cells containing the actual values.

There are two points to this: (1) simplify and speed up access, which
is the most common operation; (2) support faithful emulation of
extreme existing corner cases.

Regarding point (2): in the scheme above, the set of builtins is
captured at the time a module dict is first created. Later changes to
the set of builtin names are not reflected in the module dicts. For
example, consider files main.py and cheater.py:

    [main.py]
    import cheater
    def f():
        cheater.cheat()
        return pachinko()
    print f()

    [cheater.py]
    def cheat():
        import __builtin__
        __builtin__.pachinko = lambda: 666

If main.py is run under Python 2.2 (or before), 666 is printed. But
under the proposal, __builtin__.pachinko doesn't exist at the time
main's __dict__ is initialized. When the function object for f is
created, main.__dict__ grows a pachinko cell mapping to two NULLs.
When cheat() is called, __builtin__.__dict__ grows a pachinko cell
too, but main.__dict__ doesn't know (and will never know) about
that. When f's return statement references pachinko, it will still
find the double NULLs in main.__dict__'s pachinko cell, and so raise
NameError.

A break in compatibility with a similar cause can occur if a module
global foo is del'ed, but a builtin foo was created after the module
dict was first created and before the del. Then the builtin foo
becomes visible in the module under 2.2 and before, but remains
invisible under the proposal.
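
Both compatibility breaks can be reproduced with a toy model. This is a sketch under stated assumptions, written in Python 3, and not part of the proposal. Plain dicts stand in for __builtin__.__dict__ and main.__dict__. The set `captured` models the builtin names that got cells when main's dict was first created. lookup_22, lookup_proposal and outcome are hypothetical helpers written for illustration.

```python
# Toy model (assumption): plain dicts stand in for __builtin__.__dict__ and
# main.__dict__; `captured` is the set of builtin names that received cells
# when the module dict was first created.

def lookup_22(name, module_dict, builtin_dict):
    """Python 2.2: module globals first, then whatever builtins exist *now*."""
    if name in module_dict:
        return module_dict[name]
    if name in builtin_dict:
        return builtin_dict[name]
    raise NameError(name)

def lookup_proposal(name, module_dict, builtin_dict, captured):
    """Cell scheme: a builtin is reachable only if it had a cell at creation."""
    if name in module_dict:
        return module_dict[name]
    if name in captured and name in builtin_dict:
        return builtin_dict[name]
    raise NameError(name)

def outcome(thunk):
    """Return the looked-up value, or the string "NameError"."""
    try:
        return thunk()
    except NameError:
        return "NameError"

bltins = {"len": len}
main_globals = {"foo": "module foo"}
captured = set(bltins)            # fixed when main's dict is first created

# Case 1: cheater.cheat() adds a brand-new builtin afterwards.
bltins["pachinko"] = lambda: 666
print(outcome(lambda: lookup_22("pachinko", main_globals, bltins)()))              # 666
print(outcome(lambda: lookup_proposal("pachinko", main_globals, bltins, captured)))  # NameError

# Case 2: a builtin foo appears later, then the module global foo is deleted.
bltins["foo"] = "builtin foo"
del main_globals["foo"]
print(outcome(lambda: lookup_22("foo", main_globals, bltins)))                     # builtin foo
print(outcome(lambda: lookup_proposal("foo", main_globals, bltins, captured)))     # NameError
```

Both lookups agree on builtins that already existed when the module dict was created (len here). They differ only for builtin names that appear afterwards.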

Mutating builtins is extremely rare (most programs never mutate the
builtins, and it's hard to imagine a plausible use for frequent
mutation of the builtins; I've never seen or heard of one), so it
doesn't matter how expensive mutating the builtins becomes. On the
other hand, referencing globals and builtins is very common. Combining
those observations suggests a more aggressive caching of builtins in
module globals, speeding access at the expense of making mutations of
the builtins (potentially much) more expensive to keep the caches in
synch.

Much of the scheme above remains the same, and most of the rest is
just a little different. A cell changes to:

    class cell(object):
        def __init__(self, obj=NULL, builtin=0):
            self.objptr = obj
            self.builtinflag = builtin

and a celldict maps strings to this version of cells. builtinflag
is true when and only when objptr contains a value obtained from
the builtins; in other words, it's true when and only when a cell
is acting as a cached value. When builtinflag is false, objptr is
the value of a module global (possibly NULL). celldict changes to:

    class celldict(object):

        def __init__(self, builtindict={}):
            self.basedict = builtindict
            self.__dict = d = {}
            for k, v in builtindict.items():
                d[k] = cell(v, 1)

        def __getitem__(self, key):
            c = self.__dict.get(key)
            if c is None or c.objptr is NULL or c.builtinflag:
                raise KeyError(key)
            return c.objptr

        def __setitem__(self, key, value):
            c = self.__dict.get(key)
            if c is None:
                c = cell()
                self.__dict[key] = c
            c.objptr = value
            c.builtinflag = 0

        def __delitem__(self, key):
            c = self.__dict.get(key)
            if c is None or c.objptr is NULL or c.builtinflag:
                raise KeyError(key)
            c.objptr = NULL
            # We may have unmasked a builtin. Note that because
            # we're checking the builtin dict for that *now*, this
            # still works if the builtin first came into existence
            # after we were constructed. Note too that del on
            # namespace dicts is rare, so the expense of this check
            # shouldn't matter.
            if key in self.basedict:
                c.objptr = self.basedict[key]
                assert c.objptr is not NULL # else "in" lied
                c.builtinflag = 1
            else:
                # There is no builtin with the same name.
                assert not c.builtinflag

        def keys(self):
            return [k for k, c in self.__dict.iteritems()
                    if c.objptr is not NULL and not c.builtinflag]

        def items(self):
            return [(k, c.objptr) for k, c in self.__dict.iteritems()
                    if c.objptr is not NULL and not c.builtinflag]

        def values(self):
            return [c.objptr for c in self.__dict.itervalues()
                    if c.objptr is not NULL and not c.builtinflag]

        def clear(self):
            for c in self.__dict.values():
                if not c.builtinflag:
                    c.objptr = NULL

        # Etc.
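
The celldict pseudocode above can be turned into running Python 3 under one assumption: a unique sentinel object plays the role of the C NULL pointer. The sketch keeps the PEP's method names, with three changes. It stores the cells in `_cells` rather than a name-mangled `__dict`. It omits items/values/clear for brevity. It adds a hypothetical `cached` helper, not in the PEP, that exposes the raw cell value LOAD_GLOBAL_CELL would see.

```python
NULL = object()  # assumption: sentinel standing in for a C NULL pointer

class cell:
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin   # true iff objptr caches a builtin

class celldict:
    def __init__(self, builtindict=None):
        self.basedict = builtindict if builtindict is not None else {}
        # Every builtin starts out cached in its own cell.
        self._cells = {k: cell(v, True) for k, v in self.basedict.items()}

    def __getitem__(self, key):
        c = self._cells.get(key)
        if c is None or c.objptr is NULL or c.builtinflag:
            raise KeyError(key)          # cached builtins are not module keys
        return c.objptr

    def __setitem__(self, key, value):
        c = self._cells.get(key)
        if c is None:
            c = self._cells[key] = cell()
        c.objptr = value                 # a module global shadows any builtin
        c.builtinflag = False

    def __delitem__(self, key):
        c = self._cells.get(key)
        if c is None or c.objptr is NULL or c.builtinflag:
            raise KeyError(key)
        c.objptr = NULL
        if key in self.basedict:         # we may have unmasked a builtin
            c.objptr = self.basedict[key]
            c.builtinflag = True

    def keys(self):
        return [k for k, c in self._cells.items()
                if c.objptr is not NULL and not c.builtinflag]

    def cached(self, key):
        """Hypothetical helper: the raw value LOAD_GLOBAL_CELL would see."""
        return self._cells[key].objptr

d = celldict({"len": len})
d["len"] = "shadow"                      # module global hides the builtin
print(d["len"], d.keys())                # shadow ['len']
del d["len"]                             # unmasks the cached builtin
print(d.cached("len") is len, d.keys())  # True []
```

Deleting the shadowing module global makes the cached builtin visible to LOAD_GLOBAL_CELL again. keys() still reports only genuine module globals.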

The speed benefit comes from simplifying LOAD_GLOBAL_CELL, which
I expect is executed more frequently than all other namespace
operations combined:

    def LOAD_GLOBAL_CELL(self, i):
        # self is the frame
        c = self.func_cells[i]
        return c.objptr # may be NULL (also true before)

That is, accessing builtins and accessing module globals are equally
fast. For module globals, a NULL-pointer test and branch are saved.
For builtins, an additional pointer chase is also saved.
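
A minimal model of the simplified LOAD_GLOBAL_CELL shows that a module global and a cached builtin take exactly the same path. The sketch rests on several assumptions. NULL is a sentinel object. The frame is reduced to a hypothetical Frame class holding func_cells. The cells are the same objects the module's celldict holds, so rebinding a global is visible through the frame without any further lookup.

```python
NULL = object()  # assumption: sentinel standing in for a C NULL pointer

class cell:
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin

class Frame:
    """Hypothetical frame reduced to the per-function array of cells."""

    def __init__(self, func_cells):
        self.func_cells = func_cells

    def LOAD_GLOBAL_CELL(self, i):
        c = self.func_cells[i]
        return c.objptr  # may be NULL; the interpreter would then raise NameError

# Cells owned by the module's celldict; the function shares the same objects.
module_cells = {
    "len": cell(len, True),   # cached builtin
    "answer": cell(42),       # module global
    "missing": cell(),        # referenced by the code but never bound
}
frame = Frame([module_cells["len"], module_cells["answer"], module_cells["missing"]])

print(frame.LOAD_GLOBAL_CELL(0) is len)   # True
print(frame.LOAD_GLOBAL_CELL(1))          # 42
print(frame.LOAD_GLOBAL_CELL(2) is NULL)  # True

module_cells["answer"].objptr = 43        # rebinding the global in the module
print(frame.LOAD_GLOBAL_CELL(1))          # 43
```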

The other part needed to make this work is the expensive one:
propagating mutations of builtins into the module dicts that were
initialized from the builtins. This is much like, in 2.2, propagating
changes in new-style base classes to their descendants: the builtins
need to maintain a list of weakrefs to the modules (or module dicts)
initialized from the builtins' dict. Given a mutation to the builtins
dict (adding a new key, changing the value associated with an
existing key, or deleting a key), traverse the list of module dicts
and make corresponding mutations to them. This is straightforward;
for example, if a key is deleted from builtins, execute
reflect_bltin_del in each module:

    def reflect_bltin_del(self, key):
        c = self.__dict.get(key)
        assert c is not None # else we were already out of synch
        if c.builtinflag:
            # Put us back in synch.
            c.objptr = NULL
            c.builtinflag = 0
        # Else we're shadowing the builtin, so don't care that
        # the builtin went away.

Note that c.builtinflag protects us from erroneously deleting a
module global of the same name. Adding a new (key, value) builtin
pair is similar:

    def reflect_bltin_new(self, key, value):
        c = self.__dict.get(key)
        if c is None:
            # Never heard of it before: cache the builtin value.
            self.__dict[key] = cell(value, 1)
        elif c.objptr is NULL:
            # This used to exist in the module or the builtins,
            # but doesn't anymore; rehabilitate it.
            assert not c.builtinflag
            c.objptr = value
            c.builtinflag = 1
        else:
            # We're shadowing it already.
            assert not c.builtinflag

Changing the value of an existing builtin can be viewed as deleting
the name, then adding it again. Indeed, since mutating builtins is
so rare, that's probably the right way to implement it too (speed
doesn't matter here):

    def reflect_bltin_change(self, key, newvalue):
        assert key in self.__dict
        self.reflect_bltin_del(key)
        self.reflect_bltin_new(key, newvalue)
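
The propagation machinery can be sketched end to end. This is an illustration under assumptions, not the PEP's implementation. NULL is a sentinel. BuiltinDict is a hypothetical dict subclass: it keeps a weakref.WeakSet of the module dicts initialized from it and calls the reflect_bltin_* methods above on each of them. Only item assignment and del are intercepted; update(), pop() and similar methods would bypass the hooks in this sketch. Replaying the main.py/cheater.py example shows the aggressive scheme restoring the Python 2.2 behavior: the pachinko cell created when f was compiled picks up the new builtin.

```python
import gc
import weakref

NULL = object()  # assumption: sentinel standing in for a C NULL pointer

class cell:
    def __init__(self, obj=NULL, builtin=False):
        self.objptr = obj
        self.builtinflag = builtin

class celldict:
    """Only the parts of the PEP's celldict needed for this demo."""

    def __init__(self, builtindict):
        self._cells = {k: cell(v, True) for k, v in builtindict.items()}
        builtindict.register(self)       # builtins keep a weak reference to us

    def __setitem__(self, key, value):   # module-level assignment
        c = self._cells.setdefault(key, cell())
        c.objptr = value
        c.builtinflag = False

    def getcell(self, key):
        # What creating a function object does: find or make the cell.
        return self._cells.setdefault(key, cell())

    def reflect_bltin_del(self, key):
        c = self._cells.get(key)
        assert c is not None             # else we were already out of synch
        if c.builtinflag:
            c.objptr = NULL
            c.builtinflag = False

    def reflect_bltin_new(self, key, value):
        c = self._cells.get(key)
        if c is None:
            self._cells[key] = cell(value, True)
        elif c.objptr is NULL:
            assert not c.builtinflag
            c.objptr = value
            c.builtinflag = True
        else:
            assert not c.builtinflag     # a module global shadows it

    def reflect_bltin_change(self, key, newvalue):
        assert key in self._cells
        self.reflect_bltin_del(key)
        self.reflect_bltin_new(key, newvalue)

class BuiltinDict(dict):
    """Hypothetical builtins dict that pushes mutations to module dicts."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dependents = weakref.WeakSet()

    def register(self, moddict):
        self.dependents.add(moddict)

    def __setitem__(self, key, value):
        existed = key in self
        super().__setitem__(key, value)
        for m in list(self.dependents):
            if existed:
                m.reflect_bltin_change(key, value)
            else:
                m.reflect_bltin_new(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        for m in list(self.dependents):
            m.reflect_bltin_del(key)

builtins_dict = BuiltinDict(len=len)
main = celldict(builtins_dict)

pachinko = main.getcell("pachinko")      # f is created: a cell holding NULL
builtins_dict["pachinko"] = lambda: 666  # cheater.cheat()
print(pachinko.objptr())                 # 666, as under Python 2.2

main["pachinko"] = lambda: 1             # a module global now shadows it
builtins_dict["pachinko"] = lambda: 777  # changing the builtin leaves it alone
print(pachinko.objptr())                 # 1

tmp = celldict(builtins_dict)            # a module that goes away again
del tmp
gc.collect()
print(len(builtins_dict.dependents))     # 1: dead module dicts drop out
```

Using a WeakSet rather than a list of strong references keeps the builtins from holding otherwise-dead modules alive. The registry therefore never needs explicit cleanup.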


Comparison