added bit about caching functions in pystone
added (very) short bit about threads
added "Questions" and "Unresolved Issues" sections
parent 0b34071cba
commit f0d981671f
pep-0266.txt | 257
@@ -49,6 +49,18 @@ Introduction
 objects themselves changes rarely, the cost of keeping track of
 such objects should be low and the potential payoff fairly large.
 
+In an attempt to gauge the effect of this proposal, I modified the
+Pystone benchmark program included in the Python distribution to
+cache global functions. Its main function, Proc0, makes calls to
+ten different functions inside its for loop. In addition, Func2
+calls Func1 repeatedly inside a loop. If local copies of these 11
+global identifiers are made before the functions' loops are entered,
+performance on this particular benchmark improves by about two per
+cent (from 5561 pystones to 5685 on my laptop). It gives some
+indication that performance would be improved by caching most
+global variable access. Note also that the pystone benchmark
+makes essentially no accesses of global module attributes, an
+anticipated area of improvement for this PEP.
 
 Proposed Change
 
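The hand optimization described in the Pystone paragraph amounts to copying a global function reference into a local before a hot loop. A minimal sketch in present-day Python follows; `func1`, `run_global`, and `run_cached` are hypothetical stand-ins rather than the benchmark's actual code, and no timing figures are claimed for them.

```python
def func1(a, b):
    """Stand-in for a small global function called inside a hot loop."""
    return a == b


def run_global(n):
    # Every iteration looks func1 up in the module globals.
    hits = 0
    for i in range(n):
        if func1(i % 3, 0):
            hits += 1
    return hits


def run_cached(n):
    # One global lookup before the loop; the loop body uses a local slot.
    f = func1
    hits = 0
    for i in range(n):
        if f(i % 3, 0):
            hits += 1
    return hits
```

Both functions compute the same result; only the name lookup inside the loop differs, which is the cost the PEP hopes to remove automatically.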
@@ -63,6 +75,17 @@ Proposed Change
 any association between the name and the local variable slot.
 
 
+Threads
+
+Operation of this code in threaded programs will be no different
+than in unthreaded programs. If you need to lock an object to
+access it, you would have had to do that before TRACK_OBJECT would
+have been executed and retain that lock until after you stop using
+it.
+
+FIXME: I suspect I need more here.
+
+
 Rationale
 
 Global variables and attributes rarely change. For example, once
@@ -103,7 +126,9 @@ Rationale
 optimizer) could recognize that both math.sin and l.append are
 being called in the loop and decide to generate the tracked local
 code, avoiding it for the builtin range() function because it's
-only called once during loop setup.
+only called once during loop setup. Performance issues related to
+accessing local variables make tracking l.append less attractive
+than tracking globals such as math.sin.
 
 According to a post to python-dev by Marc-Andre Lemburg [1],
 LOAD_GLOBAL opcodes account for over 7% of all instructions
@@ -111,8 +136,8 @@ Rationale
 expensive instruction, at least relative to a LOAD_FAST
 instruction, which is a simple array index and requires no extra
 function calls by the virtual machine. I believe many LOAD_GLOBAL
-instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be converted
-to LOAD_FAST instructions.
+instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to
+LOAD_FAST instructions.
 
 Code that uses global variables heavily often resorts to various
 tricks to avoid global variable and attribute lookup. The
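The difference between a global lookup and a local one can be observed with the standard `dis` module. The sketch below is illustrative only: `via_global`, `via_local`, and `opnames` are hypothetical names, the default-argument trick is one of the manual workarounds the Rationale alludes to, and the exact opcode used for the attribute load and for local loads varies between CPython versions.

```python
import dis
import math


def via_global(x):
    # Compiles to a LOAD_GLOBAL for "math" followed by an attribute load.
    return math.sin(x)


def via_local(x, sin=math.sin):
    # The default argument binds math.sin once, at definition time,
    # so the body only reads a local variable slot.
    return sin(x)


def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]
```

Comparing `opnames(via_global)` with `opnames(via_local)` shows that only the first contains `LOAD_GLOBAL`. The catch with the manual trick is that `via_local` never notices a later rebinding of `math.sin`, which is the gap the proposed tracking is meant to close.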
@@ -126,6 +151,228 @@ Rationale
 array that it imports from sre_constants.py.
 
 
+Questions
+
+Q. What about threads? What if math.sin changes while in cache?
+
+A. I believe the global interpreter lock will protect values from
+being corrupted. In any case, the situation would be no worse
+than it is today. If one thread modified math.sin after another
+thread had already executed "LOAD_GLOBAL math", but before it
+executed "LOAD_ATTR sin", the client thread would see the old
+value of math.sin.
+
+The idea is this. I use a multi-attribute load below as an
+example, not because it would happen very often, but because by
+demonstrating the recursive nature with an extra call hopefully
+it will become clearer what I have in mind. Suppose a function
+defined in module foo wants to access spam.eggs.ham and that
+spam is a module imported at the module level in foo:
+
+    import spam
+    ...
+    def somefunc():
+        ...
+        x = spam.eggs.ham
+
+Upon entry to somefunc, a TRACK_GLOBAL instruction will be
+executed:
+
+    TRACK_GLOBAL spam.eggs.ham n
+
+"spam.eggs.ham" is a string literal stored in the function's
+constants array. "n" is a fastlocals index. "&fastlocals[n]"
+is a reference to slot "n" in the executing frame's fastlocals
+array, the location in which the spam.eggs.ham reference will
+be stored. Here's what I envision happening:
+
+1. The TRACK_GLOBAL instruction locates the object referred to
+   by the name "spam" and finds it in its module scope. It
+   then executes a C function like
+
+       _PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])
+
+   where "m" is the module object with an attribute "spam".
+
+2. The module object strips the leading "spam.", stores the
+   necessary information ("eggs.ham" and &fastlocals[n]) in
+   case its binding for the name "eggs" changes. It then
+   locates the object referred to by the key "eggs" in its
+   dict and recursively calls
+
+       _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
+
+3. The eggs object strips the leading "eggs.", stores the
+   ("ham", &fastlocals[n]) info, locates the object in its
+   namespace called "ham" and calls _PyObject_TrackName once
+   again:
+
+       _PyObject_TrackName(ham, "ham", &fastlocals[n])
+
+4. The "ham" object strips the leading string (no "." this
+   time, but that's a minor point), sees that the result is
+   empty, then uses its own value (self, probably) to update
+   the location it was handed:
+
+       Py_XDECREF(fastlocals[n]);
+       fastlocals[n] = self;
+       Py_INCREF(fastlocals[n]);
+
+At this point, each object involved in resolving
+"spam.eggs.ham" knows which entry in its namespace needs to be
+tracked and what location to update if that name changes.
+Furthermore, if the one name it is tracking in its local
+storage changes, it can call _PyObject_TrackName using the new
+object once the change has been made. At the bottom end of
+the food chain, the last object will always strip a name, see
+the empty string and know that its value should be stuffed
+into the location it's been passed.
+
+When the object referred to by the dotted expression
+"spam.eggs.ham" is going to go out of scope, an
+"UNTRACK_GLOBAL spam.eggs.ham n" instruction is executed. It
+has the effect of deleting all the tracking information that
+TRACK_GLOBAL established.
+
+The tracking operation may seem expensive, but recall that the
+objects being tracked are assumed to be "almost constant", so
+the setup cost will be traded off against hopefully multiple
+local instead of global loads. For globals with attributes
+the tracking setup cost grows but is offset by avoiding the
+extra LOAD_ATTR cost. The TRACK_GLOBAL instruction needs to
+perform a PyDict_GetItemString for the first name in the chain
+to determine where the top-level object resides. Each object
+in the chain has to store a string and an address somewhere,
+probably in a dict that uses storage locations as keys
+(e.g. the &fastlocals[n]) and strings as values. (This dict
+could possibly be a central dict of dicts whose keys are
+object addresses instead of a per-object dict.) It shouldn't
+be the other way around because multiple active frames may
+want to track "spam.eggs.ham", but only one frame will want to
+associate that name with one of its fast locals slots.
+
+
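The recursive tracking scheme from the Questions section can be modelled in pure Python. This is a toy sketch under stated assumptions, not the proposed C implementation: `Slot` stands in for `&fastlocals[n]`, `Tracked` is a hypothetical namespace class whose `__setattr__` plays the role of the modified setattr functions, and `track_name` / `untrack_name` are loose analogues of `_PyObject_TrackName` and `UNTRACK_GLOBAL`. Each object keeps a dict keyed by storage location with the remaining dotted name as the value, as the text suggests.

```python
class Slot:
    """Stands in for &fastlocals[n]: a location that can be overwritten."""

    def __init__(self):
        self.value = None


class Tracked:
    """Toy namespace that re-resolves tracked names when rebound."""

    def __init__(self, **attrs):
        # Maps location -> dotted name still to be resolved from here.
        object.__setattr__(self, "_tracking", {})
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        old = getattr(self, name, None)
        object.__setattr__(self, name, value)
        for slot, dotted in list(self._tracking.items()):
            head, _, rest = dotted.partition(".")
            if head == name:
                untrack_name(old, rest, slot)  # drop stale registrations
                track_name(value, rest, slot)  # follow the new binding


def track_name(obj, dotted, slot):
    """Register the chain 'dotted' starting at obj and fill in slot."""
    if not dotted:  # end of the chain: store the object itself
        slot.value = obj
        return
    obj._tracking[slot] = dotted
    head, _, rest = dotted.partition(".")
    track_name(getattr(obj, head), rest, slot)


def untrack_name(obj, dotted, slot):
    """Forget every registration made for slot along the chain."""
    if not dotted:
        return
    obj._tracking.pop(slot, None)
    head, _, rest = dotted.partition(".")
    untrack_name(getattr(obj, head), rest, slot)
```

For example, with `foo = Tracked(spam=Tracked(eggs=Tracked(ham=1)))` and `slot = Slot()`, calling `track_name(foo, "spam.eggs.ham", slot)` leaves `slot.value == 1`, and a later `foo.spam.eggs.ham = 2` updates the slot to `2`. Rebinding `foo.spam.eggs` to an object without a `ham` attribute raises AttributeError from inside `__setattr__` in this sketch, which is the problem raised under "Missing Attributes".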
+Unresolved Issues
+
+Threading -
+
+What about this (dumb) code?
+
+    l = []
+    lock = threading.Lock()
+    ...
+    def fill_l():
+        for i in range(1000):
+            lock.acquire()
+            l.append(math.sin(i))
+            lock.release()
+    ...
+    def consume_l():
+        while 1:
+            lock.acquire()
+            if l:
+                elt = l.pop()
+            lock.release()
+            fiddle(elt)
+
+It's not clear from a static analysis of the code what the lock is
+protecting. (You can't tell at compile-time that threads are even
+involved, can you?) Would or should it affect attempts to track
+"l.append" or "math.sin" in the fill_l function?
+
+If we annotate the code with mythical track_object and untrack_object
+builtins (I'm not proposing this, just illustrating where stuff would
+go!), we get
+
+    l = []
+    lock = threading.Lock()
+    ...
+    def fill_l():
+        track_object("l.append", append)
+        track_object("math.sin", sin)
+        for i in range(1000):
+            lock.acquire()
+            append(sin(i))
+            lock.release()
+        untrack_object("math.sin", sin)
+        untrack_object("l.append", append)
+    ...
+    def consume_l():
+        while 1:
+            lock.acquire()
+            if l:
+                elt = l.pop()
+            lock.release()
+            fiddle(elt)
+
+Is that correct both with and without threads (or at least equally
+incorrect with and without threads)?
+
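A runnable approximation of the annotated example may help when reasoning about the threading question. It is only a sketch: the mythical track_object calls are replaced by plain local assignments, `run` and the `count` / `out` parameters are hypothetical additions, and the consumer is changed so that it terminates and never uses an unset `elt`.

```python
import math
import threading

l = []
lock = threading.Lock()


def fill_l(count):
    # Hand-made stand-ins for the locals that tracking would maintain.
    append = l.append
    sin = math.sin
    for i in range(count):
        with lock:
            append(sin(i))


def consume_l(count, out):
    # Unlike the original loop, stop after count items and skip empty polls.
    while len(out) < count:
        with lock:
            elt = l.pop() if l else None
        if elt is not None:
            out.append(elt)


def run(count=1000):
    """Run one producer and one consumer to completion."""
    out = []
    threads = [
        threading.Thread(target=fill_l, args=(count,)),
        threading.Thread(target=consume_l, args=(count, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Here `append` is bound to the list object that `l` named when fill_l started. If another thread rebound the global `l` in the middle of the loop, the cached copy would keep filling the old list; tracking would be expected to repair exactly that case, and whether it can do so safely with respect to the lock is the open question.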
+Nested Scopes -
+
+The presence of nested scopes will affect where TRACK_GLOBAL finds
+a global variable, but shouldn't affect anything after that. (I
+think.)
+
+Missing Attributes -
+
+Suppose I am tracking the object referred to by "spam.eggs.ham"
+and "spam.eggs" is rebound to an object that does not have a "ham"
+attribute. It's clear this will be an AttributeError if the
+programmer attempts to resolve "spam.eggs.ham" in the current
+Python virtual machine, but suppose the programmer has anticipated
+this case:
+
+    if hasattr(spam.eggs, "ham"):
+        print spam.eggs.ham
+    elif hasattr(spam.eggs, "bacon"):
+        print spam.eggs.bacon
+    else:
+        print "what? no meat?"
+
+You can't raise an AttributeError when the tracking information is
+recalculated. If it does not raise AttributeError and instead
+lets the tracking stand, it may be setting the programmer up for a
+very subtle error.
+
+One solution to this problem would be to track the shortest
+possible root of each dotted expression the function refers to
+directly. In the above example, "spam.eggs" would be tracked, but
+"spam.eggs.ham" and "spam.eggs.bacon" would not.
+
+Who does the dirty work? -
+
+In the Questions section I postulated the existence of a
+_PyObject_TrackName function. While the API is fairly easy to
+specify, the implementation behind the scenes is not so obvious.
+A central dictionary could be used to track the name/location
+mappings, but it appears that all setattr functions might need to
+be modified to accommodate this new functionality.
+
+If all types used the PyObject_GenericSetAttr function to set
+attributes, that would localize the update code somewhat. They
+don't, however (which is not too surprising), so it seems that all
+setattrfunc and setattrofunc functions will have to be updated.
+In addition, this would place an absolute requirement on C
+extension module authors to call some function when an attribute
+changes value (PyObject_TrackUpdate?).
+
+Finally, it's quite possible that some attributes will be set by
+side effect and not by any direct call to a setattr method of some
+sort. Consider a device interface module that has an interrupt
+routine that copies the contents of a device register into a slot
+in the object's struct whenever it changes. In these situations,
+more extensive modifications would have to be made by the module
+author. To identify such situations at compile time would be
+impossible. I think an extra slot could be added to PyTypeObjects
+to indicate if an object's code is safe for global tracking. It
+would have a default value of 0 (Py_TRACKING_NOT_SAFE). If an
+extension module author has implemented the necessary tracking
+support, that field could be initialized to 1 (Py_TRACKING_SAFE).
+_PyObject_TrackName could check that field and issue a warning if
+it is asked to track an object that the author has not explicitly
+said was safe for tracking.
+
 Discussion
 
 Jeremy Hylton has an alternate proposal on the table [2]. His
@@ -135,7 +382,8 @@ Discussion
 available to examine, the Python implementation given in his
 proposal still appears to require dictionary key lookup. It
 doesn't appear that his proposal could speed local variable
-attribute lookup, which might be worthwhile in some situations.
+attribute lookup, which might be worthwhile in some situations if
+potential performance burdens could be addressed.
 
 
 Backwards Compatibility
@@ -201,4 +449,5 @@ Copyright
 Local Variables:
 mode: indented-text
 indent-tabs-mode: nil
+fill-column: 70
 End: