added bit about caching functions in pystone

added (very) short bit about threads
added "Questions" and "Unresolved Issues" sections
2001-08-16 01:04:55 +00:00
parent 0b34071cba
commit f0d981671f
1 changed file with 253 additions and 4 deletions
--- a/pep-0266.txt
+++ b/pep-0266.txt
@@ -49,6 +49,18 @@ Introduction
    objects themselves changes rarely, the cost of keeping track of
    such objects should be low and the potential payoff fairly large.

+    In an attempt to gauge the effect of this proposal, I modified the
+    Pystone benchmark program included in the Python distribution to
+    cache global functions.  Its main function, Proc0, makes calls to
+    ten different functions inside its for loop.  In addition, Func2
+    calls Func1 repeatedly inside a loop.  If local copies of these 11
+    global identifiers are made before the functions' loops are entered,
+    performance on this particular benchmark improves by about two per
+    cent (from 5561 pystones to 5685 on my laptop).  This gives some
+    indication that performance would be improved by caching most
+    global variable access.  Note also that the pystone benchmark
+    makes essentially no accesses to global module attributes, an
+    anticipated area of improvement for this PEP.
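The Pystone experiment relies on the familiar hand-optimization of binding a global to a local before a loop. As a minimal sketch (CountingModule is a hypothetical wrapper introduced only to count attribute lookups; it is not part of the modified Pystone), the two versions of a loop differ in how many lookups they perform:

```python
import math


class CountingModule:
    """Hypothetical wrapper that counts attribute lookups on a module."""

    def __init__(self, module):
        self._module = module
        self.lookups = 0

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself,
        # i.e. the attributes delegated to the wrapped module.
        self.lookups += 1
        return getattr(self._module, name)


m = CountingModule(math)


def uncached(n):
    total = 0.0
    for i in range(n):
        total += m.sin(i)   # global lookup of m plus attribute lookup every pass
    return total


def cached(n):
    sin = m.sin             # one global lookup and one attribute lookup
    total = 0.0
    for i in range(n):
        total += sin(i)     # plain local variable access inside the loop
    return total


m.lookups = 0
uncached(1000)
print(m.lookups)  # 1000
m.lookups = 0
cached(1000)
print(m.lookups)  # 1
```

The cached loop performs one lookup instead of one per iteration; the proposal aims to have the virtual machine produce the equivalent of the second form automatically while still noticing if the global is rebound.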

 Proposed Change

@@ -63,6 +75,17 @@ Proposed Change
    any association between the name and the local variable slot.


+Threads
+
+    Operation of this code in threaded programs will be no different
+    than in unthreaded programs.  If you need to lock an object to
+    access it, you must acquire the lock before TRACK_OBJECT is
+    executed and hold it until after you stop using the object.
+
+    FIXME: I suspect I need more here.
+
+
 Rationale

    Global variables and attributes rarely change.  For example, once
@@ -103,7 +126,9 @@ Rationale
    optimizer) could recognize that both math.sin and l.append are
    being called in the loop and decide to generate the tracked local
    code, avoiding it for the builtin range() function because it's
-    only called once during loop setup.
+    only called once during loop setup.  Performance issues related to
+    accessing local variables make tracking l.append less attractive
+    than tracking globals such as math.sin.

    According to a post to python-dev by Marc-Andre Lemburg [1],
    LOAD_GLOBAL opcodes account for over 7% of all instructions
@@ -111,8 +136,8 @@ Rationale
    expensive instruction, at least relative to a LOAD_FAST
    instruction, which is a simple array index and requires no extra
    function calls by the virtual machine.  I believe many LOAD_GLOBAL
-    instructions and LOAD_GLOBAL/ LOAD_ATTR pairs could be converted
-    to LOAD_FAST instructions.
+    instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to
+    LOAD_FAST instructions.

    Code that uses global variables heavily often resorts to various
    tricks to avoid global variable and attribute lookup.  The
@@ -126,6 +151,228 @@ Rationale
    array that it imports from sre_constants.py.


+Questions
+
+    Q.  What about threads?  What if math.sin changes while in cache?
+
+    A.  I believe the global interpreter lock will protect values from
+        being corrupted.  In any case, the situation would be no worse
+        than it is today.  If one thread modified math.sin after another
+        thread had already executed "LOAD_GLOBAL math", but before it
+        executed "LOAD_ATTR sin", the client thread would see the old
+        value of math.sin.
+
+        The idea is this.  I use a multi-attribute load below as an
+        example, not because it would happen very often, but because
+        demonstrating the recursive nature with an extra call will, I
+        hope, make it clearer what I have in mind.  Suppose a function
+        defined in module foo wants to access spam.eggs.ham and that
+        spam is a module imported at the module level in foo:
+
+            import spam
+            ...
+            def somefunc():
+                ...
+                x = spam.eggs.ham
+
+        Upon entry to somefunc, a TRACK_GLOBAL instruction will be
+        executed:
+
+            TRACK_GLOBAL spam.eggs.ham n
+
+        "spam.eggs.ham" is a string literal stored in the function's
+        constants array.  "n" is a fastlocals index.  "&fastlocals[n]"
+        is a reference to slot "n" in the executing frame's fastlocals
+        array, the location in which the spam.eggs.ham reference will
+        be stored.  Here's what I envision happening:
+
+        1. The TRACK_GLOBAL instruction locates the object referred to
+           by the name "spam" and finds it in its module scope.  It
+           then executes a C function like
+
+               _PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])
+
+           where "m" is the module object bound to the name "spam".
+
+        2. The module object strips the leading "spam.", stores the
+           necessary information ("eggs.ham" and &fastlocals[n]) in
+           case its binding for the name "eggs" changes.  It then
+           locates the object referred to by the key "eggs" in its
+           dict and recursively calls
+
+               _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
+
+        3. The eggs object strips the leading "eggs.", stores the
+           ("ham", &fastlocals[n]) info, locates the object in its
+           namespace called "ham" and calls _PyObject_TrackName once
+           again:
+
+               _PyObject_TrackName(ham, "ham", &fastlocals[n])
+
+        4. The "ham" object strips the leading string (no "." this
+           time, but that's a minor point), sees that the result is
+           empty, then uses its own value (self, probably) to update
+           the location it was handed:
+
+               Py_INCREF(self);            /* new reference first */
+               Py_XDECREF(fastlocals[n]);  /* drop the old value, if any */
+               fastlocals[n] = self;
+
+        At this point, each object involved in resolving
+        "spam.eggs.ham" knows which entry in its namespace needs to be
+        tracked and what location to update if that name changes.
+        Furthermore, if the one name it is tracking in its local
+        storage changes, it can call _PyObject_TrackName using the new
+        object once the change has been made.  At the bottom end of
+        the food chain, the last object will always strip a name, see
+        the empty string and know that its value should be stuffed
+        into the location it's been passed.
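The four steps and the rebinding behavior described above can be modeled in pure Python. This is a sketch under stated assumptions, not the proposed C implementation: Slot stands in for &fastlocals[n], Namespace is a hypothetical object type that records (slot, remaining name) pairs, and track_name and untrack_name play the roles of _PyObject_TrackName and UNTRACK_GLOBAL. As in steps 2 through 4, each object receives a dotted name that begins with its own name.

```python
class Slot:
    """Stands in for one fastlocals entry (&fastlocals[n] in the text)."""

    def __init__(self):
        self.value = None


class Namespace:
    """Hypothetical object that records which tracked names pass through it."""

    def __init__(self, **attrs):
        object.__setattr__(self, "_tracked", {})  # Slot -> remaining dotted name
        for key, value in attrs.items():
            object.__setattr__(self, key, value)

    def __setattr__(self, key, value):
        # Chains whose next step goes through the name being rebound.
        affected = [(slot, rest) for slot, rest in self._tracked.items()
                    if rest.split(".", 1)[0] == key]
        for slot, rest in affected:
            untrack_name(getattr(self, key), rest, slot)  # drop stale info
        object.__setattr__(self, key, value)
        for slot, rest in affected:
            track_name(value, rest, slot)  # re-resolve with the new object


def track_name(obj, name, slot):
    """Model of _PyObject_TrackName; name begins with obj's own name."""
    _, _, rest = name.partition(".")
    if not rest:
        slot.value = obj  # end of the chain: store the object itself
        return
    obj._tracked[slot] = rest  # remember what to redo if a binding changes
    track_name(getattr(obj, rest.split(".", 1)[0]), rest, slot)


def untrack_name(obj, name, slot):
    """Model of UNTRACK_GLOBAL: forget the tracking info along the chain."""
    _, _, rest = name.partition(".")
    if rest:
        obj._tracked.pop(slot, None)
        untrack_name(getattr(obj, rest.split(".", 1)[0]), rest, slot)


spam = Namespace(eggs=Namespace(ham="old ham"))
slot = Slot()
track_name(spam, "spam.eggs.ham", slot)
print(slot.value)  # old ham
spam.eggs = Namespace(ham="new ham")
print(slot.value)  # new ham
```

Rebinding spam.eggs removes the stale entry from the old eggs object and re-resolves the chain through the new one. If the new object lacked a ham attribute, getattr would raise AttributeError inside the assignment, which is the situation discussed under Missing Attributes. The binding of spam in module foo itself is not modeled here.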
+
+        When the object referred to by the dotted expression
+        "spam.eggs.ham" is going to go out of scope, an
+        "UNTRACK_GLOBAL spam.eggs.ham n" instruction is executed.  It
+        has the effect of deleting all the tracking information that
+        TRACK_GLOBAL established.
+
+        The tracking operation may seem expensive, but recall that the
+        objects being tracked are assumed to be "almost constant", so
+        the setup cost will be traded off against hopefully multiple
+        local instead of global loads.  For globals with attributes,
+        the tracking setup cost grows but is offset by avoiding the
+        extra LOAD_ATTR cost.  The TRACK_GLOBAL instruction needs to
+        perform a PyDict_GetItemString for the first name in the chain
+        to determine where the top-level object resides.  Each object
+        in the chain has to store a string and an address somewhere,
+        probably in a dict that uses storage locations as keys
+        (e.g. the &fastlocals[n]) and strings as values.  (This dict
+        could possibly be a central dict of dicts whose keys are
+        object addresses instead of a per-object dict.)  It shouldn't
+        be the other way around because multiple active frames may
+        want to track "spam.eggs.ham", but only one frame will want to
+        associate that name with one of its fast locals slots.
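The argument for keying by storage location can be checked with plain dicts. In this sketch, Slot is a hypothetical stand-in for &fastlocals[n], spam is a placeholder object, and two frames both track "spam.eggs.ham"; the dicts show what spam would record under each choice of key, plus the central dict-of-dicts variant:

```python
class Slot:
    """Stands in for one frame's fastlocals entry (&fastlocals[n])."""

    def __init__(self, frame, index):
        self.frame = frame
        self.index = index

    def __repr__(self):
        return f"&{self.frame}.fastlocals[{self.index}]"


spam = object()                  # placeholder for the spam module
frame_a_slot = Slot("frame_a", 0)
frame_b_slot = Slot("frame_b", 3)

# Storage locations as keys, names as values: both frames' requests survive.
by_location = {frame_a_slot: "eggs.ham"}
by_location[frame_b_slot] = "eggs.ham"

# Names as keys, locations as values: the second frame clobbers the first.
by_name = {"eggs.ham": frame_a_slot}
by_name["eggs.ham"] = frame_b_slot

print(len(by_location), len(by_name))  # 2 1

# Central variant: one dict of dicts keyed by object identity.
central = {id(spam): by_location}

# Untracking for frame_a removes only frame_a's entry.
del central[id(spam)][frame_a_slot]
print(by_location)  # {&frame_b.fastlocals[3]: 'eggs.ham'}
```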
+
+
+Unresolved Issues
+
+    Threading -
+
+    What about this (dumb) code?
+
+        l = []
+        lock = threading.Lock()
+        ...
+        def fill_l():
+            for i in range(1000):
+                lock.acquire()
+                l.append(math.sin(i))
+                lock.release()
+        ...
+        def consume_l():
+            while 1:
+                lock.acquire()
+                if l:
+                    elt = l.pop()
+                lock.release()
+                fiddle(elt)
+
+    It's not clear from a static analysis of the code what the lock is
+    protecting.  (You can't tell at compile time that threads are even
+    involved, can you?)  Would or should it affect attempts to track
+    "l.append" or "math.sin" in the fill_l function?
+
+    If we annotate the code with mythical track_object and untrack_object
+    builtins (I'm not proposing this, just illustrating where stuff would
+    go!), we get
+
+        l = []
+        lock = threading.Lock()
+        ...
+        def fill_l():
+            track_object("l.append", append)
+            track_object("math.sin", sin)
+            for i in range(1000):
+                lock.acquire()
+                append(sin(i))
+                lock.release()
+            untrack_object("math.sin", sin)
+            untrack_object("l.append", append)
+        ...
+        def consume_l():
+            while 1:
+                lock.acquire()
+                if l:
+                    elt = l.pop()
+                lock.release()
+                fiddle(elt)
+
+    Is that correct both with and without threads (or at least equally
+    incorrect with and without threads)?
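For comparison, the mythical annotation corresponds to what a programmer can already write by hand by copying l.append and math.sin into locals. The sketch below is runnable under assumptions not in the original: consume_l is bounded so the program terminates, fiddle is replaced by appending to a consumed list, and an empty list yields None instead of reusing a stale elt.

```python
import math
import threading

l = []
lock = threading.Lock()
consumed = []


def fill_l(count=1000):
    # Manual equivalent of the mythical track_object calls: bind local
    # copies of l.append and math.sin once, before the loop.
    append = l.append
    sin = math.sin
    for i in range(count):
        with lock:              # the lock guards the list's contents
            append(sin(i))


def consume_l(count=1000):
    while len(consumed) < count:
        with lock:
            elt = l.pop() if l else None
        if elt is not None:
            consumed.append(elt)    # stand-in for fiddle(elt)


producer = threading.Thread(target=fill_l)
consumer = threading.Thread(target=consume_l)
producer.start()
consumer.start()
producer.join()
consumer.join()
print(len(consumed))  # 1000
```

Here the cached append remains valid because the lock protects the list's contents, while the name l is never rebound. The sketch does not settle the question for compiler-generated tracking, which would also have to handle another thread rebinding l or math.sin.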
+
+    Nested Scopes -
+
+    The presence of nested scopes will affect where TRACK_GLOBAL finds
+    a global variable, but shouldn't affect anything after that.  (I
+    think.)
+
+    Missing Attributes -
+
+    Suppose I am tracking the object referred to by "spam.eggs.ham"
+    and "spam.eggs" is rebound to an object that does not have a "ham"
+    attribute.  It's clear this will be an AttributeError if the
+    programmer attempts to resolve "spam.eggs.ham" in the current
+    Python virtual machine, but suppose the programmer has anticipated
+    this case:
+
+        if hasattr(spam.eggs, "ham"):
+            print spam.eggs.ham
+        elif hasattr(spam.eggs, "bacon"):
+            print spam.eggs.bacon
+        else:
+            print "what? no meat?"
+
+    You can't raise an AttributeError when the tracking information is
+    recalculated.  If the tracking code does not raise AttributeError
+    and instead lets the stale tracking stand, it may be setting the
+    programmer up for a very subtle error.
+
+    One solution to this problem would be to track the shortest
+    possible root of each dotted expression the function refers to
+    directly.  In the above example, "spam.eggs" would be tracked, but
+    "spam.eggs.ham" and "spam.eggs.bacon" would not.
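One possible reading of the shortest-root rule: keep a dotted expression only if no proper prefix of it is itself referenced directly by the function. A minimal sketch of that reading (tracked_roots is a hypothetical helper, and the list of references is taken by hand from the hasattr example):

```python
def tracked_roots(dotted_names):
    """Keep only expressions that no other referenced expression prefixes."""
    names = set(dotted_names)
    roots = set()
    for name in names:
        parts = name.split(".")
        # Every proper prefix: "spam", "spam.eggs", ... (not the name itself).
        prefixes = (".".join(parts[:k]) for k in range(1, len(parts)))
        if not any(p in names for p in prefixes):
            roots.add(name)
    return roots


# Dotted expressions referenced directly in the hasattr example.
refs = ["spam.eggs", "spam.eggs.ham", "spam.eggs", "spam.eggs.bacon"]
print(sorted(tracked_roots(refs)))  # ['spam.eggs']
```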
+
+    Who does the dirty work? -
+
+    In the Questions section I postulated the existence of a
+    _PyObject_TrackName function.  While the API is fairly easy to
+    specify, the implementation behind the scenes is not so obvious.
+    A central dictionary could be used to track the name/location
+    mappings, but it appears that all setattr functions might need to
+    be modified to accommodate this new functionality.
+
+    If all types used the PyObject_GenericSetAttr function to set
+    attributes, that would localize the update code somewhat.  They
+    don't, however (which is not too surprising), so it seems that all
+    setattrfunc and setattrofunc functions will have to be updated.
+    In addition, this would place an absolute requirement on C
+    extension module authors to call some function when an attribute
+    changes value (PyObject_TrackUpdate?).
+
+    Finally, it's quite possible that some attributes will be set by
+    side effect and not by any direct call to a setattr method of some
+    sort.  Consider a device interface module that has an interrupt
+    routine that copies the contents of a device register into a slot
+    in the object's struct whenever it changes.  In these situations,
+    more extensive modifications would have to be made by the module
+    author.  To identify such situations at compile time would be
+    impossible.  I think an extra slot could be added to PyTypeObjects
+    to indicate if an object's code is safe for global tracking.  It
+    would have a default value of 0 (Py_TRACKING_NOT_SAFE).  If an
+    extension module author has implemented the necessary tracking
+    support, that field could be initialized to 1 (Py_TRACKING_SAFE).
+    _PyObject_TrackName could check that field and issue a warning if
+    it is asked to track an object that the author has not explicitly
+    said was safe for tracking.
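The proposed type flag can be sketched in Python. ExtensionType, SafeType and check_tracking_safe are hypothetical stand-ins for a PyTypeObject carrying the new slot and for the check _PyObject_TrackName might make; the constant names come from the text, while RuntimeWarning and the boolean return value are assumptions:

```python
import warnings

Py_TRACKING_NOT_SAFE = 0
Py_TRACKING_SAFE = 1


class ExtensionType:
    """Stand-in for a type whose author did nothing: default flag value."""
    tracking = Py_TRACKING_NOT_SAFE


class SafeType(ExtensionType):
    """Stand-in for a type whose author implemented tracking support."""
    tracking = Py_TRACKING_SAFE


def check_tracking_safe(obj):
    """Hypothetical check made before tracking obj; warns if not marked safe."""
    flag = getattr(type(obj), "tracking", Py_TRACKING_NOT_SAFE)
    if flag != Py_TRACKING_SAFE:
        warnings.warn(
            f"{type(obj).__name__} objects are not marked safe for tracking",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(check_tracking_safe(SafeType()), check_tracking_safe(ExtensionType()))  # True False
print(len(caught))  # 1
```

Types that know nothing about the flag (here, anything without a tracking attribute) fall back to the unsafe default, matching the proposed default value of 0.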
+
 Discussion

    Jeremy Hylton has an alternate proposal on the table [2].  His
@@ -135,7 +382,8 @@ Discussion
    available to examine, the Python implementation given in his
    proposal still appears to require dictionary key lookup.  It
    doesn't appear that his proposal could speed local variable
-    attribute lookup, which might be worthwhile in some situations.
+    attribute lookup, which might be worthwhile in some situations if
+    potential performance burdens could be addressed.


 Backwards Compatibility
@@ -201,4 +449,5 @@ Copyright
 Local Variables:
 mode: indented-text
 indent-tabs-mode: nil
+fill-column: 70
 End: