From 34771adaea6fa19da1acfd1275a4bab774d948e0 Mon Sep 17 00:00:00 2001
From: Alyssa Coghlan
Date: Mon, 5 Aug 2024 13:54:25 +1000
Subject: [PATCH] PEP 558: Clarify rationale for locals() snapshots (#3895)

---
 peps/pep-0558.rst | 284 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 265 insertions(+), 19 deletions(-)

diff --git a/peps/pep-0558.rst b/peps/pep-0558.rst
index 5eb32ef68..16219f760 100644
--- a/peps/pep-0558.rst
+++ b/peps/pep-0558.rst
@@ -34,6 +34,12 @@ are outweighed by the availability of a viable reference implementation.
 Accordingly, this PEP has been withdrawn in favour of proceeding with
 :pep:`667`.
 
+Note: while implementing :pep:`667` it became apparent that the rationale for and impact
+of ``locals()`` being updated to return independent snapshots in
+:term:`optimized scopes ` was not entirely clear in either PEP.
+The Motivation and Rationale sections in this PEP have been updated accordingly (since those
+aspects are equally applicable to the accepted :pep:`667`).
+
 Abstract
 ========
 
@@ -64,6 +70,7 @@ Python C API/ABI:
 It also proposes the addition of several supporting functions and type
 definitions to the CPython C API.
 
+.. _pep-558-motivation:
 
 Motivation
 ==========
@@ -89,6 +96,32 @@ independent snapshot of the function locals and closure variables on each
 call, rather than continuing to return the semi-dynamic intermittently updated
 shared copy that it has historically returned in CPython.
 
+Specifically, the proposal in this PEP eliminates the historical behaviour where
+adding a new local variable can change the behaviour of code executed with
+``exec()`` in function scopes, even if that code runs *before* the local variable
+is defined.
+
+For example::
+
+    def f():
+        exec("x = 1")
+        print(locals().get("x"))
+    f()
+
+prints ``1``, but::
+
+    def f():
+        exec("x = 1")
+        print(locals().get("x"))
+        x = 0
+    f()
+
+prints ``None`` (the default value from the ``.get()`` call).
+
+With this PEP both examples would print ``None``, as the call to
+``exec()`` and the subsequent call to ``locals()`` would use
+independent dictionary snapshots of the local variables rather
+than using the same shared dictionary cached on the frame object.
 
 Proposal
 ========
@@ -797,25 +830,6 @@ frame machinery will allow rebinding of local and nonlocal variable
 references in a way that is hidden from static analysis.
 
 
-Retaining the internal frame value cache
-----------------------------------------
-
-Retaining the internal frame value cache results in some visible quirks when
-frame proxy instances are kept around and re-used after name binding and
-unbinding operations have been executed on the frame.
-
-The primary reason for retaining the frame value cache is to maintain backwards
-compatibility with the ``PyEval_GetLocals()`` API. That API returns a borrowed
-reference, so it must refer to persistent state stored on the frame object.
-Storing a fast locals proxy object on the frame creates a problematic reference
-cycle, so the cleanest option is to instead continue to return a frame value
-cache, just as this function has done since optimised frames were first
-introduced.
-
-With the frame value cache being kept around anyway, it then further made sense
-to rely on it to simplify the fast locals proxy mapping implementation.
-
-
 What happens with the default args for ``eval()`` and ``exec()``?
 -----------------------------------------------------------------
 
@@ -840,9 +854,241 @@ namespace on each iteration).
 to make a list from the keys).
 
 
+.. _pep-558-exec-eval-impact:
+
+Additional considerations for ``eval()`` and ``exec()`` in optimized scopes
+---------------------------------------------------------------------------
+
+Note: while implementing :pep:`667`, it was noted that neither that PEP nor this one
+clearly explained the impact the ``locals()`` changes would have on code execution APIs
+like ``exec()`` and ``eval()``. This section was added to this PEP's rationale to better
+describe the impact and explain the intended benefits of the change.
+
+When ``exec()`` was converted from a statement to a builtin function
+in Python 3.0 (part of the core language changes in :pep:`3100`), the
+associated implicit call to ``PyFrame_LocalsToFast()`` was removed, so
+it typically appears as if attempts to write to local variables with
+``exec()`` in optimized frames are ignored::
+
+    >>> def f():
+    ...     x = 0
+    ...     exec("x = 1")
+    ...     print(x)
+    ...     print(locals()["x"])
+    ...
+    >>> f()
+    0
+    0
+
+In truth, the writes aren't being ignored, they just aren't
+being copied from the dictionary cache back to the optimized local
+variable array. The changes to the dictionary are then overwritten
+the next time the dictionary cache is refreshed from the array::
+
+    >>> def f():
+    ...     x = 0
+    ...     locals_cache = locals()
+    ...     exec("x = 1")
+    ...     print(x)
+    ...     print(locals_cache["x"])
+    ...     print(locals()["x"])
+    ...
+    >>> f()
+    0
+    1
+    0
+
+.. _pep-558-ctypes-example:
+
+The behaviour becomes even stranger if a tracing function
+or another piece of code invokes ``PyFrame_LocalsToFast()`` before
+the cache is next refreshed. In those cases the change *is*
+written back to the optimized local variable array::
+
+    >>> from sys import _getframe
+    >>> from ctypes import pythonapi, py_object, c_int
+    >>> _locals_to_fast = pythonapi.PyFrame_LocalsToFast
+    >>> _locals_to_fast.argtypes = [py_object, c_int]
+    >>> def f():
+    ...     _frame = _getframe()
+    ...     _f_locals = _frame.f_locals
+    ...     x = 0
+    ...     exec("x = 1")
+    ...     _locals_to_fast(_frame, 0)
+    ...     print(x)
+    ...     print(locals()["x"])
+    ...     print(_f_locals["x"])
+    ...
+    >>> f()
+    1
+    1
+    1
+
+This situation was more common in Python 3.10 and earlier
+versions, as merely installing a tracing function was enough
+to trigger implicit calls to ``PyFrame_LocalsToFast()`` after
+every line of Python code. However, it can still happen in Python
+3.11+ depending on exactly which tracing functions are active
+(e.g. interactive debuggers intentionally do this so that changes
+made at the debugging prompt are visible when code execution
+resumes).
+
+All of the above comments in relation to ``exec()`` apply to
+*any* attempt to mutate the result of ``locals()`` in optimized
+scopes, and are the main reason that the ``locals()`` builtin
+docs contain this caveat:
+
+    Note: The contents of this dictionary should not be modified;
+    changes may not affect the values of local and free variables
+    used by the interpreter.
+
+While the exact wording in the library reference is not entirely explicit,
+both ``exec()`` and ``eval()`` have long used the results of calling
+``globals()`` and ``locals()`` in the calling Python frame as their default
+execution namespace.
+
+This was historically also equivalent to using the calling frame's
+``frame.f_globals`` and ``frame.f_locals`` attributes, but this PEP maps
+the default namespace arguments for ``exec()`` and ``eval()`` to
+``globals()`` and ``locals()`` in the calling frame in order to preserve
+the property of defaulting to ignoring attempted writes to the local
+namespace in optimized scopes.
+
+This poses a potential compatibility issue for some code, as with the
+previous implementation that returns the same dict when ``locals()`` is called
+multiple times in function scope, the following code usually worked due to
+the implicitly shared local variable namespace::
+
+    def f():
+        exec('a = 0')  # equivalent to exec('a = 0', globals(), locals())
+        exec('print(a)')  # equivalent to exec('print(a)', globals(), locals())
+        print(locals())  # {'a': 0}
+        # However, print(a) will not work here
+    f()
+
+With ``locals()`` in an optimised scope returning the same shared dict for each call,
+it was possible to store extra "fake locals" in that dict. While these aren't real
+locals known by the compiler (so they can't be printed with code like ``print(a)``),
+they can still be accessed via ``locals()`` and shared between multiple ``exec()``
+calls in the same function scope. Furthermore, because they're *not* real locals,
+they don't get implicitly updated or removed when the shared cache is refreshed
+from the local variable storage array.
+
+When the code in ``exec()`` tries to write to an existing local variable, the
+runtime behaviour gets harder to predict::
+
+    def f():
+        a = None
+        exec('a = 0')  # equivalent to exec('a = 0', globals(), locals())
+        exec('print(a)')  # equivalent to exec('print(a)', globals(), locals())
+        print(locals())  # {'a': None}
+    f()
+
+``print(a)`` will print ``None`` because the implicit ``locals()`` call in
+``exec()`` refreshes the cached dict with the actual values on the frame.
+This means that, unlike the "fake" locals created by writing back to ``locals()``
+(including via previous calls to ``exec()``), the real locals known by the
+compiler can't easily be modified by ``exec()`` (it can be done, but it requires
+both retrieving the ``frame.f_locals`` attribute to enable writes back to the frame,
+and then invoking ``PyFrame_LocalsToFast()``, as :ref:`shown <pep-558-ctypes-example>`
+using ``ctypes`` above).
+
+As noted in the :ref:`pep-558-motivation` section, this confusing side effect
+happens even if the local variable is only defined *after* the ``exec()`` calls::
+
+    >>> def f():
+    ...     exec("a = 0")
+    ...     exec("print('a' in locals())")  # Printing 'a' directly won't work
+    ...     print(locals())
+    ...     a = None
+    ...     print(locals())
+    ...
+    >>> f()
+    False
+    {}
+    {'a': None}
+
+Because ``a`` is a real local variable that is not currently bound to a value, it
+gets explicitly removed from the dictionary returned by ``locals()`` whenever
+``locals()`` is called prior to the ``a = None`` line. This removal is intentional,
+as it allows the contents of ``locals()`` to be updated correctly in optimized
+scopes when ``del`` statements are used to delete previously bound local variables.
+
+As noted in the ``ctypes`` :ref:`example <pep-558-ctypes-example>`, the above behavioural
+description may be invalidated if the CPython ``PyFrame_LocalsToFast()`` API gets invoked
+while the frame is still running. In that case, the changes to ``a`` *might* become visible
+to the running code, depending on exactly when that API is called (and whether the frame
+has been primed for locals modification by accessing the ``frame.f_locals`` attribute).
+
+As described above, two options were considered to replace this confusing behaviour:
+
+* make ``locals()`` return write-through proxy instances (similar
+  to ``frame.f_locals``)
+* make ``locals()`` return genuinely independent snapshots so that
+  attempts to change the values of local variables via ``exec()``
+  would be *consistently* ignored without any of the caveats
+  noted above.
+
+The PEP chooses the second option for the following reasons:
+
+* returning independent snapshots in optimized scopes preserves
+  the Python 3.0 change to ``exec()`` that resulted in attempts
+  to mutate local variables via ``exec()`` being ignored in most
+  cases
+* the distinction between "``locals()`` gives an instantaneous
+  snapshot of the local variables in optimized scopes, and
+  read/write access in other scopes" and "``frame.f_locals``
+  gives read/write access to the local variables in all scopes,
+  including optimized scopes" allows the intent of a piece of
+  code to be clearer than it would be if both APIs granted
+  full read/write access in optimized scopes, even when write
+  access wasn't needed or desired
+* in addition to improving clarity for human readers, ensuring
+  that name rebinding in optimized scopes remains lexically
+  visible in the code (as long as the frame introspection APIs
+  are not accessed) allows compilers and interpreters to apply
+  related performance optimizations more consistently
+* only Python implementations that support the optional frame
+  introspection APIs will need to provide the new write-through
+  proxy support for optimized frames
+
+With the semantic changes to ``locals()`` in this PEP, it becomes much easier to explain
+the behavior of ``exec()`` and ``eval()``: in optimized scopes, they will *never* implicitly
+affect local variables; in other scopes, they will *always* implicitly affect local
+variables. In optimized scopes, any implicit assignment to the local variables will be
+discarded when the code execution API returns, since a fresh copy of the local variables
+is used on each invocation.
+
+
+Retaining the internal frame value cache
+----------------------------------------
+
+Retaining the internal frame value cache results in some visible quirks when
+frame proxy instances are kept around and re-used after name binding and
+unbinding operations have been executed on the frame.
+
+The primary reason for retaining the frame value cache is to maintain backwards
+compatibility with the ``PyEval_GetLocals()`` API. That API returns a borrowed
+reference, so it must refer to persistent state stored on the frame object.
+Storing a fast locals proxy object on the frame creates a problematic reference
+cycle, so the cleanest option is to instead continue to return a frame value
+cache, just as this function has done since optimised frames were first
+introduced.
+
+With the frame value cache being kept around anyway, it then further made sense
+to rely on it to simplify the fast locals proxy mapping implementation.
+
+Note: the fact :pep:`667` *doesn't* use the internal frame value cache as part of the
+write-through proxy implementation is the key Python level difference between the two PEPs.
+
+
 Changing the frame API semantics in regular operation
 -----------------------------------------------------
 
+Note: when this PEP was first written, it predated the Python 3.11 change to drop the
+implicit writeback of the frame local variables whenever a tracing function was installed,
+so making that change was included as part of the proposal.
+
 Earlier versions of this PEP proposed having the semantics of the frame
 ``f_locals`` attribute depend on whether or not a tracing hook was currently
 installed - only providing the write-through proxy behaviour when a tracing hook