added bit about caching functions in pystone

added (very) short bit about threads
added "Questions" and "Unresolved Issues" sections
Skip Montanaro 2001-08-16 01:04:55 +00:00
parent 0b34071cba
commit f0d981671f
1 changed files with 253 additions and 4 deletions


@ -49,6 +49,18 @@ Introduction
objects themselves changes rarely, the cost of keeping track of
such objects should be low and the potential payoff fairly large.

In an attempt to gauge the effect of this proposal, I modified the
Pystone benchmark program included in the Python distribution to
cache global functions. Its main function, Proc0, makes calls to
ten different functions inside its for loop. In addition, Func2
calls Func1 repeatedly inside a loop. If local copies of these 11
global identifiers are made before the functions' loops are entered,
performance on this particular benchmark improves by about two per
cent (from 5561 pystones to 5685 on my laptop). It gives some
indication that performance would be improved by caching most
global variable access. Note also that the pystone benchmark
makes essentially no accesses of global module attributes, an
anticipated area of improvement for this PEP.
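
The kind of manual caching described above can be sketched as follows. This is not the modified Pystone code, only a minimal illustration written for current Python 3; the function names (is_vowel, count_vowels_global, count_vowels_cached) are hypothetical, and any speedup will vary by interpreter version and machine.

```python
import timeit


def is_vowel(ch):
    """A small global function, standing in for Pystone's Func1."""
    return ch in "aeiou"


def count_vowels_global(text):
    """Look up the global name is_vowel on every iteration."""
    count = 0
    for ch in text:
        if is_vowel(ch):
            count += 1
    return count


def count_vowels_cached(text):
    """Copy the global into a local name once, before the loop starts."""
    local_is_vowel = is_vowel
    count = 0
    for ch in text:
        if local_is_vowel(ch):
            count += 1
    return count


if __name__ == "__main__":
    sample = "global variables and attributes rarely change " * 1000
    for func in (count_vowels_global, count_vowels_cached):
        seconds = timeit.timeit(lambda: func(sample), number=50)
        print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions return the same count; only the way the callee is looked up differs.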
Proposed Change
@ -63,6 +75,17 @@ Proposed Change
any association between the name and the local variable slot.

Threads

Operation of this code in threaded programs will be no different
than in unthreaded programs.  If you need to lock an object to
access it, you would have to acquire that lock before TRACK_OBJECT
is executed and hold it until you stop using the object.

FIXME: I suspect I need more here.

Rationale

Global variables and attributes rarely change.  For example, once
@ -103,7 +126,9 @@ Rationale
optimizer) could recognize that both math.sin and l.append are
being called in the loop and decide to generate the tracked local
code, avoiding it for the builtin range() function because it's
only called once during loop setup.  Performance issues related to
accessing local variables make tracking l.append less attractive
than tracking globals such as math.sin.

According to a post to python-dev by Marc-Andre Lemburg [1],
LOAD_GLOBAL opcodes account for over 7% of all instructions
@ -111,8 +136,8 @@ Rationale
expensive instruction, at least relative to a LOAD_FAST
instruction, which is a simple array index and requires no extra
function calls by the virtual machine.  I believe many LOAD_GLOBAL
instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to
LOAD_FAST instructions.

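
The difference between the two kinds of load can be seen with the standard dis module. The sketch below is only an illustration in current Python 3; the function names are hypothetical, and exact opcode names vary between interpreter versions (newer releases use variants whose names begin with LOAD_FAST).

```python
import dis
import math


def sine_via_global(x):
    # Compiles to a LOAD_GLOBAL for "math" followed by an attribute load.
    return math.sin(x)


def sine_via_local(x, sin=math.sin):
    # "sin" is a fast local here, so no global lookup is needed.
    return sin(x)


def opnames(func):
    """Return the opcode names the compiler generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    for func in (sine_via_global, sine_via_local):
        print(func.__name__, opnames(func))
```

The default-argument trick used in sine_via_local is one of the hand-written workarounds that tracking is meant to make unnecessary.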
Code that uses global variables heavily often resorts to various
tricks to avoid global variable and attribute lookup.  The
@ -126,6 +151,228 @@ Rationale
array that it imports from sre_constants.py.

Questions

Q.  What about threads?  What if math.sin changes while in cache?

A.  I believe the global interpreter lock will protect values from
being corrupted.  In any case, the situation would be no worse
than it is today.  If one thread modified math.sin after another
thread had already executed both "LOAD_GLOBAL math" and
"LOAD_ATTR sin", but before it used the result, the client
thread would still be using the old value of math.sin.

The idea is this. I use a multi-attribute load below as an
example, not because it would happen very often, but because by
demonstrating the recursive nature with an extra call hopefully
it will become clearer what I have in mind. Suppose a function
defined in module foo wants to access spam.eggs.ham and that
spam is a module imported at the module level in foo:

    import spam
    ...
    def somefunc():
        ...
        x = spam.eggs.ham

Upon entry to somefunc, a TRACK_GLOBAL instruction will be
executed:

    TRACK_GLOBAL spam.eggs.ham n

"spam.eggs.ham" is a string literal stored in the function's
constants array. "n" is a fastlocals index. "&fastlocals[n]"
is a reference to slot "n" in the executing frame's fastlocals
array, the location in which the spam.eggs.ham reference will
be stored. Here's what I envision happening:

1. The TRACK_GLOBAL instruction locates the object referred to
   by the name "spam" and finds it in its module scope.  It
   then executes a C function like

       _PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])

   where "m" is the module object with an attribute "spam".

2. The module object strips the leading "spam.", then stores the
   necessary information ("eggs.ham" and &fastlocals[n]) in
   case its binding for the name "eggs" changes.  It then
   locates the object referred to by the key "eggs" in its
   dict and recursively calls

       _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])

3. The eggs object strips the leading "eggs.", stores the
   ("ham", &fastlocals[n]) info, locates the object in its
   namespace called "ham" and calls _PyObject_TrackName once
   again:

       _PyObject_TrackName(ham, "ham", &fastlocals[n])

4. The "ham" object strips the leading string (no "." this
   time, but that's a minor point), sees that the result is
   empty, then uses its own value (self, probably) to update
   the location it was handed:

       Py_XDECREF(fastlocals[n]);
       fastlocals[n] = self;
       Py_INCREF(fastlocals[n]);

At this point, each object involved in resolving
"spam.eggs.ham" knows which entry in its namespace needs to be
tracked and what location to update if that name changes.
Furthermore, if the one name it is tracking in its local
storage changes, it can call _PyObject_TrackName using the new
object once the change has been made. At the bottom end of
the food chain, the last object will always strip a name, see
the empty string and know that its value should be stuffed
into the location it's been passed.
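
The recursive idea can be modelled in pure Python. What follows is a toy sketch under several assumptions, not the proposed C implementation: the Tracked class and the track_name and follow helpers are hypothetical, a Python list stands in for the fastlocals array, each call is made on the object that owns the first name in the path (a simplification of the calling convention in the steps above), and nothing is ever untracked.

```python
class Tracked:
    """Hypothetical namespace whose attribute rebinding notifies trackers."""

    def __init__(self, **attrs):
        # Use object.__setattr__ so setup does not trigger notifications.
        object.__setattr__(self, "_watchers", {})
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Re-resolve every chain that passes through the rebound name.
        for rest, slots, n in self._watchers.get(name, ()):
            follow(value, rest, slots, n)


def track_name(owner, path, slots, n):
    """Toy stand-in for _PyObject_TrackName: mirror `path` into slots[n]."""
    head, _, rest = path.partition(".")
    # Remember what is left of the path in case `head` is rebound later.
    owner._watchers.setdefault(head, []).append((rest, slots, n))
    follow(getattr(owner, head), rest, slots, n)


def follow(value, rest, slots, n):
    if rest:
        track_name(value, rest, slots, n)  # keep walking the chain
    else:
        slots[n] = value  # end of the chain: fill the slot


# Usage: track "spam.eggs.ham" from a module-like object into slot 0.
eggs = Tracked(ham="original ham")
module = Tracked(spam=Tracked(eggs=eggs))
fastlocals = [None]
track_name(module, "spam.eggs.ham", fastlocals, 0)
eggs.ham = "new ham"  # the slot follows the rebinding
```

Because this sketch never removes tracking information, an object that has been replaced in the chain could still overwrite the slot later; cleaning that up is the job the text assigns to UNTRACK_GLOBAL.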
When the object referred to by the dotted expression
"spam.eggs.ham" is going to go out of scope, an
"UNTRACK_GLOBAL spam.eggs.ham n" instruction is executed. It
has the effect of deleting all the tracking information that
TRACK_GLOBAL established.
The tracking operation may seem expensive, but recall that the
objects being tracked are assumed to be "almost constant", so
the setup cost should be repaid by (hopefully many) loads that
become local instead of global.  For globals with attributes
the tracking setup cost grows but is offset by avoiding the
extra LOAD_ATTR cost.  The TRACK_GLOBAL instruction needs to
perform a PyDict_GetItemString for the first name in the chain
to determine where the top-level object resides. Each object
in the chain has to store a string and an address somewhere,
probably in a dict that uses storage locations as keys
(e.g. the &fastlocals[n]) and strings as values. (This dict
could possibly be a central dict of dicts whose keys are
object addresses instead of a per-object dict.) It shouldn't
be the other way around because multiple active frames may
want to track "spam.eggs.ham", but only one frame will want to
associate that name with one of its fast locals slots.
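
One possible shape for the central "dict of dicts" is sketched below in Python. Everything here is an assumption made for illustration: the names tracking, remember and slots_tracking are hypothetical, id() stands in for an object address, and a (frame, index) tuple stands in for &fastlocals[n].

```python
from collections import defaultdict

# Hypothetical central registry: object address -> {slot location: name}.
tracking = defaultdict(dict)


def remember(obj, name, frame_id, n):
    """Record that slot n of the given frame tracks `name` under obj."""
    tracking[id(obj)][(frame_id, n)] = name


def slots_tracking(obj, name):
    """Return every slot location that must be updated if `name` changes."""
    per_object = tracking.get(id(obj), {})
    return sorted(loc for loc, tracked in per_object.items() if tracked == name)


class Eggs:
    pass


eggs = Eggs()
# Two active frames track the same name; each gets its own entry.
remember(eggs, "ham", "frame A", 3)
remember(eggs, "ham", "frame B", 0)
```

Keying the inner dict by location keeps both frames' entries. If the name were the key instead, the second call would overwrite the first.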
Unresolved Issues
Threading -
What about this (dumb) code?

    l = []
    lock = threading.Lock()
    ...
    def fill_l():
        for i in range(1000):
            lock.acquire()
            l.append(math.sin(i))
            lock.release()
    ...
    def consume_l():
        while 1:
            lock.acquire()
            if l:
                elt = l.pop()
            lock.release()
            fiddle(elt)

It's not clear from a static analysis of the code what the lock is
protecting.  (You can't tell at compile time that threads are even
involved, can you?)  Would or should it affect attempts to track
"l.append" or "math.sin" in the fill_l function?
If we annotate the code with mythical track_object and untrack_object
builtins (I'm not proposing this, just illustrating where stuff would
go!), we get

    l = []
    lock = threading.Lock()
    ...
    def fill_l():
        track_object("l.append", append)
        track_object("math.sin", sin)
        for i in range(1000):
            lock.acquire()
            append(sin(i))
            lock.release()
        untrack_object("math.sin", sin)
        untrack_object("l.append", append)
    ...
    def consume_l():
        while 1:
            lock.acquire()
            if l:
                elt = l.pop()
            lock.release()
            fiddle(elt)

Is that correct both with and without threads (or at least equally
incorrect with and without threads)?
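
For comparison, here is a runnable approximation of fill_l that uses ordinary local aliases in place of the mythical track_object calls. It is a sketch in current Python 3 and makes no claim about how tracking itself would behave: plain aliases never notice a later rebinding of l.append or math.sin, and the run_fillers helper is hypothetical.

```python
import math
import threading

l = []
lock = threading.Lock()


def fill_l(count=1000):
    # Manual aliases, made once before the loop is entered.
    append = l.append
    sin = math.sin
    for i in range(count):
        with lock:  # same effect as lock.acquire() / lock.release()
            append(sin(i))


def run_fillers(workers=2, count=1000):
    """Run several fill_l threads and return the final length of l."""
    threads = [threading.Thread(target=fill_l, args=(count,))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(l)
```

With the lock held around each append, every worker contributes exactly count elements, so the aliases do not change the result with or without threads.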
Nested Scopes -
The presence of nested scopes will affect where TRACK_GLOBAL finds
a global variable, but shouldn't affect anything after that. (I
think.)
Missing Attributes -
Suppose I am tracking the object referred to by "spam.eggs.ham"
and "spam.eggs" is rebound to an object that does not have a "ham"
attribute. It's clear this will be an AttributeError if the
programmer attempts to resolve "spam.eggs.ham" in the current
Python virtual machine, but suppose the programmer has anticipated
this case:

    if hasattr(spam.eggs, "ham"):
        print spam.eggs.ham
    elif hasattr(spam.eggs, "bacon"):
        print spam.eggs.bacon
    else:
        print "what? no meat?"

The virtual machine can't raise an AttributeError when the tracking
information is recalculated, because the guarded code may never
evaluate "spam.eggs.ham" at all.  If it does not raise
AttributeError and instead lets the stale tracking stand, it may be
setting the programmer up for a very subtle error.
One solution to this problem would be to track the shortest
possible root of each dotted expression the function refers to
directly. In the above example, "spam.eggs" would be tracked, but
"spam.eggs.ham" and "spam.eggs.bacon" would not.
Who does the dirty work? -
In the Questions section I postulated the existence of a
_PyObject_TrackName function. While the API is fairly easy to
specify, the implementation behind-the-scenes is not so obvious.
A central dictionary could be used to track the name/location
mappings, but it appears that all setattr functions might need to
be modified to accommodate this new functionality.
If all types used the PyObject_GenericSetAttr function to set
attributes, that would localize the update code somewhat.  They
don't, however (which is not too surprising), so it seems that all
setattrfunc and setattrofunc functions will have to be updated.
In addition, this would place an absolute requirement on C
extension module authors to call some function when an attribute
changes value (PyObject_TrackUpdate?).
Finally, it's quite possible that some attributes will be set by
side effect and not by any direct call to a setattr method of some
sort. Consider a device interface module that has an interrupt
routine that copies the contents of a device register into a slot
in the object's struct whenever it changes. In these situations,
more extensive modifications would have to be made by the module
author. To identify such situations at compile time would be
impossible. I think an extra slot could be added to PyTypeObjects
to indicate if an object's code is safe for global tracking. It
would have a default value of 0 (Py_TRACKING_NOT_SAFE). If an
extension module author has implemented the necessary tracking
support, that field could be initialized to 1 (Py_TRACKING_SAFE).
_PyObject_TrackName could check that field and issue a warning if
it is asked to track an object that the author has not explicitly
said was safe for tracking.
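
The safety check could look roughly like the following Python model. The constants mirror the Py_TRACKING_NOT_SAFE and Py_TRACKING_SAFE values mentioned above; the attribute name tp_tracking_safe and the function is_safe_to_track are hypothetical, and a class attribute stands in for the extra PyTypeObject slot.

```python
import warnings

PY_TRACKING_NOT_SAFE = 0  # default for types that say nothing
PY_TRACKING_SAFE = 1      # set by authors who implemented tracking support


def is_safe_to_track(obj):
    """Warn and return False unless obj's type is marked safe for tracking."""
    flag = getattr(type(obj), "tp_tracking_safe", PY_TRACKING_NOT_SAFE)
    if flag != PY_TRACKING_SAFE:
        warnings.warn(
            f"{type(obj).__name__} objects are not marked safe for tracking",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True


class DeviceInterface:
    """A type whose author has not opted in, so the default applies."""


class PlainNamespace:
    """A type whose author has declared tracking support."""
    tp_tracking_safe = PY_TRACKING_SAFE
```

Defaulting to "not safe" means extension types that update attributes by side effect are flagged unless their authors opt in explicitly.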

Discussion

Jeremy Hylton has an alternate proposal on the table [2].  His
@ -135,7 +382,8 @@ Discussion
available to examine, the Python implementation given in his
proposal still appears to require dictionary key lookup.  It
doesn't appear that his proposal could speed local variable
attribute lookup, which might be worthwhile in some situations if
potential performance burdens could be addressed.

Backwards Compatibility
@ -201,4 +449,5 @@ Copyright
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: