PEP 558: Update to use a write-through proxy

It turns out that *any* write-back based design has inherent flaws that make it difficult to build a source debugger that reliably allows mutation of function local variables. So this switches to Nathaniel's suggested write-through proxy idea, but constrains it to only applying when a trace hook is installed. This means the official language level semantics can just use the simpler model where rebinding function local variables via locals() simply isn't possible - only folks already working with frames and trace functions will need to be aware of the semantics of the write-through proxy.
2017-10-22 16:12:25 +10:00 · 2017-10-22 16:12:25 +10:00 · 9a8e590a52
parent 86c06a737c
commit 9a8e590a52
1 changed file with 225 additions and 139 deletions
--- a/pep-0558.rst
+++ b/pep-0558.rst
@@ -32,6 +32,12 @@ Other implementations such as PyPy are currently replicating that behaviour,
 up to and including replication of local variable mutation bugs that
 can arise when a trace hook is installed [1]_.

+While we consider CPython's current behaviour when no trace hooks are installed
+acceptable and desirable, we consider the current behaviour when trace hooks
+are installed to be problematic, as it causes bugs like [1]_ *without* reliably
+enabling the desired functionality of allowing debuggers like ``pdb`` to mutate
+local variables [3]_.
+

 Proposal
 ========
@@ -46,6 +52,65 @@ execution scope. For this purpose, the defined scopes of execution are:
  namespaces
 * function scope: code in the body of a ``def`` or ``async def`` statement

+We also allow interpreters to define two "modes" of execution, with only the
+first mode being considered part of the language specification itself:
+
+* regular operation: the way the interpreter behaves by default
+* tracing mode: the way the interpreter behaves when a trace hook has been
+  registered in one or more threads via an implementation dependent mechanism
+  like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or
+  ``PyEval_SetTrace`` ([5]_) in CPython's C API
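
As a rough illustration of what "tracing mode" means in CPython, the sketch below installs a trace hook with ``sys.settrace`` and reads ``frame.f_locals`` when the traced function returns. The helper name ``capture_locals_on_return`` and the sample function ``add`` are hypothetical and exist only for this example.

```python
import sys


def capture_locals_on_return(func, *args):
    """Run func under a trace hook and copy its locals at the 'return' event."""
    captured = {}

    def tracer(frame, event, arg):
        # Only trace the frame that belongs to the function of interest.
        if frame.f_code is func.__code__:
            if event == "return":
                captured.update(frame.f_locals)
            return tracer
        return None

    previous = sys.gettrace()  # preserve any hook that is already installed
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return captured


def add(a, b):
    total = a + b
    return total


print(capture_locals_on_return(add, 2, 3))
```

The hook only ever receives a frame object, which is why trace functions go through ``frame.f_locals`` rather than the ``locals()`` builtin.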
+
+For regular operation, this PEP proposes elevating the current behaviour of
+the CPython reference implementation to become part of the language
+specification.
+
+For tracing mode, this PEP proposes changes to CPython's behaviour at function
+scope that bring the ``locals()`` builtin semantics closer to those used in
+regular operation, while also making the related frame API semantics clearer
+and easier for interactive debuggers to rely on.
+
+
+New ``locals()`` documentation
+------------------------------
+
+The heart of this proposal is to revise the documentation for the ``locals()``
+builtin to read as follows:
+
+    Return a dictionary representing the current local symbol table, with
+    variable names as the keys, and their currently bound references as the
+    values. This will always be the same dictionary for a given runtime
+    execution frame.
+
+    At module scope, as well as when using ``exec()`` or ``eval()`` with a
+    single namespace, this function returns the same namespace as ``globals()``.
+
+    At class scope, it returns the namespace that will be passed to the
+    metaclass constructor.
+
+    When using ``exec()`` or ``eval()`` with separate local and global
+    namespaces, it returns the local namespace passed in to the function call.
+
+    At function scope (including for generators and coroutines), it returns a
+    dynamic snapshot of the function's local variables and any nonlocal cell
+    references. In this case, changes made via the snapshot are *not* written
+    back to the corresponding local variables or nonlocal cell references, and
+    any such changes to the snapshot will be overwritten if the snapshot is
+    subsequently refreshed (e.g. by another call to ``locals()``).
+
+    CPython implementation detail: the dynamic snapshot for the current frame
+    will be implicitly refreshed before each call to the trace function when a
+    trace function is active.
+
+For reference, the current documentation of this builtin reads as follows:
+
+    Update and return a dictionary representing the current local symbol table.
+    Free variables are returned by locals() when it is called in function
+    blocks, but not in class blocks.
+
+    Note: The contents of this dictionary should not be modified; changes may
+    not affect the values of local and free variables used by the interpreter.
+
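The function-scope wording in the revised documentation can be illustrated with a minimal sketch, assuming regular operation (no trace hook installed). The function name ``rebind_via_locals`` is hypothetical.

```python
def rebind_via_locals():
    value = "original"
    snapshot = locals()
    # This changes the snapshot only; the real local variable is untouched.
    snapshot["value"] = "changed"
    return value


print(rebind_via_locals())  # prints "original"
```

Whether the edited entry survives a later ``locals()`` call depends on when the snapshot is refreshed, so the sketch deliberately checks only the local variable itself.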

 Module scope
 ------------
@@ -61,6 +126,15 @@ dynamically change the contents of the returned mapping, and changes to the
 returned mapping must change the values bound to local variable names in the
 execution environment.

+The semantics at module scope are required to be the same in both tracing
+mode (if provided by the implementation) and in regular operation.
+
+To capture this expectation as part of the language specification, the following
+paragraph will be added to the documentation for ``locals()``:
+
+   At module scope, as well as when using ``exec()`` or ``eval()`` with a
+   single namespace, this function returns the same namespace as ``globals()``.
+
 This part of the proposal does not require any changes to the reference
 implementation - it is standardisation of the current behaviour.
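
The single-namespace behaviour described in this hunk can be checked with a short sketch. It uses ``exec()`` with one namespace, which the proposed documentation treats the same way as module scope; the variable names are hypothetical.

```python
namespace = {}

# With a single namespace, locals() and globals() are the same object.
exec("same_namespace = locals() is globals()", namespace)

# Writes through locals() are therefore visible as ordinary names.
exec("locals()['created_via_locals'] = 42\nresult = created_via_locals", namespace)

print(namespace["same_namespace"], namespace["result"])
```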

@@ -81,14 +155,26 @@ change the contents of the returned mapping, and changes to the returned mapping
 must change the values bound to local variable names in the
 execution environment.

-The mapping returned by ``locals()`` will *not* be used as the actual class namespace
-underlying the defined class (the class creation process will copy the contents
-to a fresh dictionary that is only accessible by going through the class
-machinery).
+The mapping returned by ``locals()`` will *not* be used as the actual class
+namespace underlying the defined class (the class creation process will copy
+the contents to a fresh dictionary that is only accessible by going through the
+class machinery).

 For nested classes defined inside a function, any nonlocal cells referenced from
 the class scope are *not* included in the ``locals()`` mapping.

+The semantics at class scope are required to be the same in both tracing
+mode (if provided by the implementation) and in regular operation.
+
+To capture this expectation as part of the language specification, the following
+two paragraphs will be added to the documentation for ``locals()``:
+
+   When using ``exec()`` or ``eval()`` with separate local and global
+   namespaces, [this function] returns the given local namespace.
+
+   At class scope, it returns the namespace that will be passed to the metaclass
+   constructor.
+
 This part of the proposal does not require any changes to the reference
 implementation - it is standardisation of the current behaviour.
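
A small sketch of the two cases covered by this hunk: ``locals()`` in a class body, and ``locals()`` under ``exec()`` with separate namespaces. The class and variable names are hypothetical.

```python
class Example:
    # At class scope, locals() is the namespace handed to the metaclass,
    # so an entry stored here becomes a class attribute.
    locals()["injected"] = "set via locals()"


local_ns = {}
global_ns = {"expected": local_ns}

# With separate namespaces, locals() is the local namespace that was passed in.
exec("is_given_namespace = locals() is expected", global_ns, local_ns)

print(Example.injected, local_ns["is_given_namespace"])
```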

@@ -101,91 +187,109 @@ to optimise local variable access, and hence are NOT required to permit
 arbitrary modification of local and nonlocal variable bindings through the
 mapping returned from ``locals()``.

-Instead, ``locals()`` is expected to return a mutable *snapshot* of the
-function's local variables and any referenced nonlocal cells with the following
-semantics:
+Historically, this leniency has been described in the language specification
+with the words "The contents of this dictionary should not be modified; changes
+may not affect the values of local and free variables used by the interpreter."

-* each call to ``locals()`` returns the *same* mapping object
-* each call to ``locals()`` updates the mapping to the current state of the
-  local variables and any nonlocal cells referenced from either the function
-  itself, or from any nested class definitions
-* changes to the returned mapping are *not* written back to the
-  local variable bindings or the nonlocal cell references
-* changes to the returned mapping may be overwritten by subsequent calls to
-  ``locals()`` and other operations that cause the mapping to be refreshed from
-  the actual execution state
-* for interpreters that provide access to frame objects, the reference returned
-  by ``locals()`` *must* be a reference to the same namespace as is returned by
-  ``inspect.currentframe().f_locals`` (in a running function, generator, or
-  coroutine), ``inspect.getgeneratorlocals()`` (in a running or suspended
-  generator), and ``inspect.getcoroutinelocals()`` (in a running or suspended
-  coroutine)
+This PEP proposes to change that text to instead say:

-Additional entries may also be added through ``locals()`` or ``frame.f_locals``
-and will then be accessible through both ``frame.f_locals`` and ``locals()``,
-but will not be accessible by name from within the function (as any
-names which don't appear as local or nonlocal variables at compile time will
-only be looked up in the module globals and process builtins, not in the
-function locals).
+    At function scope (including for generators and coroutines), [this
+    function] returns a dynamic snapshot of the function's local variables and
+    any nonlocal cell references. In this case, changes made via the snapshot
+    are *not* written back to the corresponding local variables or nonlocal
+    cell references, and any such changes to the snapshot will be overwritten
+    if the snapshot is subsequently refreshed (e.g. by another call to
+    ``locals()``).
+
+    CPython implementation detail: the dynamic snapshot for the currently
+    executing frame will be implicitly refreshed before each call to the trace
+    function when a trace function is active.
+
+This part of the proposal *does* require changes to the CPython reference
+implementation, as while it accurately describes the behaviour in regular
+operation, the "write back" strategy currently used to support namespace changes
+from trace functions doesn't comply with it (and also causes the quirky
+behavioural problems mentioned in the Rationale).


-Allowing trace hooks to reliably mutate local variables
-------------------------------------------------------
+CPython Implementation Changes
+==============================

-To allow for the implementation of runtime debuggers that can update local
-variable state, trace functions are required to write changes made to
-``frame.f_locals`` back to the actual execution namespace.
+The current cause of CPython's tracing mode quirks (both the side effects from
+simply installing a tracing function and the fact that writing values back to
+function locals only works for the specific function being traced) is the way
+that locals mutation support for trace hooks is currently implemented: the
+``PyFrame_LocalsToFast`` function.

-This is not a problem for trace hooks executed at module or class scope, as
-any changes made via ``frame.f_locals`` are made directly to the actual local
-namespace used for code execution, and hence no special handling of trace hooks
-is required.
+When a trace function is installed, CPython currently does the following for
+function frames (those where the code object uses "fast locals" semantics):

-At function scope, however, special trace hook handling is needed in order to
-copy changes made through ``frame.f_locals`` back into the actual execution
-state.
+1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
+2. Calls the trace hook (with tracing of the hook itself disabled)
+3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the dynamic
+   snapshot

-For Python versions up to and including Python 3.6, this worked as follows:
+This approach is problematic for a few different reasons:

-1. Before calling the trace hook, update ``frame.f_locals`` from the current
-   execution state
-2. Run the trace hook
-3. After the trace hook returns, update the current execution state from
-   ``frame.f_locals``
+* Even if the trace function doesn't mutate the snapshot, the final step resets
+  any cell references back to the state they were in before the trace function
+  was called (this is the root cause of the bug report in [1]_)
+* If the trace function *does* mutate the snapshot, but then does something
+  that causes the snapshot to be refreshed, those changes are lost (this is
+  one aspect of the bug report in [3]_)
+* If the trace function attempts to mutate the local variables of a frame other
+  than the one being traced (e.g. ``frame.f_back.f_locals``), those changes
+  will almost certainly be lost (this is another aspect of the bug report in
+  [3]_)
+* If a ``locals()`` reference is passed to another function, and *that*
+  function mutates the snapshot namespace, then those changes *may* be written
+  back to the execution frame *if* a trace hook is installed
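
The first two failure modes listed above can be reproduced in a toy model. This is a pure-Python sketch, not CPython code: ``ToyFrame`` and ``call_trace_hook_with_write_back`` are hypothetical stand-ins for a frame and for the three-step sequence around each trace hook call.

```python
class ToyFrame:
    """Hypothetical stand-in for a function frame (not a CPython API)."""

    def __init__(self, **fast_locals):
        self.fast = dict(fast_locals)  # models fast locals and cells
        self.snapshot = {}             # models the f_locals snapshot


def call_trace_hook_with_write_back(frame, hook):
    frame.snapshot.update(frame.fast)  # step 1: like PyFrame_FastToLocals
    hook(frame)                        # step 2: run the trace hook
    frame.fast.update(frame.snapshot)  # step 3: like PyFrame_LocalsToFast


def hook_with_side_effect(frame):
    # Models code that rebinds a cell while the hook runs,
    # without touching the snapshot at all.
    frame.fast["counter"] += 1


def hook_that_refreshes(frame):
    frame.snapshot["counter"] = 99     # deliberate edit by the hook
    frame.snapshot.update(frame.fast)  # a later refresh discards the edit


frame = ToyFrame(counter=0)

call_trace_hook_with_write_back(frame, hook_with_side_effect)
after_side_effect = frame.fast["counter"]  # reset to the stale value 0

call_trace_hook_with_write_back(frame, hook_that_refreshes)
after_refresh = frame.fast["counter"]      # the edit to 99 was lost

print(after_side_effect, after_refresh)
```

In both cases the stale snapshot wins, which is the behaviour the write-through design is meant to remove.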

-Due to the problems this behaviour creates for closure references (as reported
-in [1]_), this PEP proposes to amend this behaviour as follows:
+The proposed resolution to this problem is to take advantage of the fact that
+whereas functions typically access their *own* namespace using the language
+defined ``locals()`` builtin, trace functions necessarily use the implementation
+dependent ``frame.f_locals`` interface, as a frame reference is what gets
+passed to hook implementations.

-1. Before calling the trace hook, update ``frame.f_locals`` from the current
-   execution state, but include the actual cell object for all closure
-   references, *not* the value referred to by the cell
-2. Run the trace hook
-3. After the trace hook returns:
+In regular operation, nothing will change - ``frame.f_locals`` will be a direct
+reference to the dynamic snapshot, and ``locals()`` will return a reference to
+that snapshot. This reflects the fact that it's only CPython's tracing mode
+semantics that are currently problematic.

-  * update the current execution state from ``frame.f_locals``, but leave
-    closure reference values unmodified if ``frame.f_locals`` still contains
-    the relevant cell object for that variable reference (and hence clearly
-    hasn't been modified by the trace function)
-  * after updating the execution state, replace the cells for closure references
-    in ``frame.f_locals`` with the values referenced by those cells (restoring
-    the expected behaviour of ``locals()`` at function scope)
+In tracing mode, however, we will change ``frame.f_locals`` to instead return
+a dedicated proxy type (probably implemented as a private subclass of
+``types.MappingProxyType``) that has two internal attributes not exposed as
+part of either the Python or public C API:

+* *mapping*: the dynamic snapshot that would be returned by ``frame.f_locals``
+  during regular operation
+* *frame*: the underlying frame that the snapshot is for

-Open Questions
-==============
+The ``locals()`` builtin would be aware of this proxy type, and continue to
+return a reference to the dynamic snapshot even when in tracing mode.

-How much compatibility is enough compatibility?
-----------------------------------------------
+As long as the process remains in tracing mode, then ``__setitem__`` and
+``__delitem__`` operations on the proxy will affect not only the dynamic
+snapshot, but *also* the corresponding fast local or cell reference on the
+underlying frame.

-As discussed below, the proposed design aims to keep almost all current code
-working, *except* code that relies on being able to read the values of
-closure references directly from ``frame.f_locals`` while a trace hook is
-running.
+If the process leaves tracing mode (i.e. all previously installed trace hooks
+are uninstalled), then any already created proxy objects will remain in place,
+but their ``__setitem__`` and ``__delitem__`` methods will skip mutating
+the underlying frame.

-This is considered reasonable, as trace hooks may use
-``frame.f_code.co_freevars`` and ``frame.f_code.co_cellvars`` to identify
-variables for which they need to read ``frame.f_locals[varname].cell_contents``
-to get the actual current value, rather than the cell object.
+At the C API layer, ``PyEval_GetLocals()`` will implement the same semantics
+as the Python level ``locals()`` builtin, and a new ``PyFrame_GetLocals(frame)``
+accessor API will be provided to allow the proxy bypass logic to be encapsulated
+entirely inside the frame implementation. The C level equivalent of accessing
+``pyframe.f_locals`` in Python will be to access ``cframe->f_locals`` directly
+(the one difference is that the Python descriptor will continue to include an
+implicit snapshot refresh).
+
+The ``PyFrame_LocalsToFast()`` function will be changed to always raise
+``RuntimeError``, explaining that it is no longer a supported operation, and
+affected code should be updated to rely on the write-through tracing mode
+proxy instead.
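
The proxy described above can be sketched in pure Python. This is only a model under stated assumptions: ``ToyFrame``, ``WriteThroughLocals`` and the ``is_tracing`` callable are hypothetical names, and the sketch builds on ``collections.abc.MutableMapping`` rather than on ``types.MappingProxyType``, since the real type would be implemented in C inside the frame machinery.

```python
from collections.abc import MutableMapping


class ToyFrame:
    """Hypothetical stand-in for a function frame (not a CPython API)."""

    def __init__(self, local_names, **bound):
        self.local_names = frozenset(local_names)  # names known at compile time
        self.fast = dict(bound)                    # models fast locals and cells
        self.snapshot = dict(bound)                # models the dynamic snapshot


class WriteThroughLocals(MutableMapping):
    """Toy proxy: writes reach the snapshot and, while tracing, the frame."""

    def __init__(self, frame, is_tracing):
        self._frame = frame
        self._mapping = frame.snapshot
        self._is_tracing = is_tracing  # callable returning True in tracing mode

    def __getitem__(self, key):
        return self._mapping[key]

    def __setitem__(self, key, value):
        self._mapping[key] = value
        # Extra keys stay in the snapshot only; real locals are written through.
        if self._is_tracing() and key in self._frame.local_names:
            self._frame.fast[key] = value

    def __delitem__(self, key):
        del self._mapping[key]
        if self._is_tracing():
            self._frame.fast.pop(key, None)

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)


state = {"tracing": True}
frame = ToyFrame(("x",), x=1)
proxy = WriteThroughLocals(frame, lambda: state["tracing"])

proxy["x"] = 2          # written through to the frame
proxy["extra"] = 3      # stored in the snapshot only
state["tracing"] = False
proxy["x"] = 5          # tracing is off, so the frame is left alone

print(frame.fast, frame.snapshot)
```

Passing tracing state in as a callable is just a convenience for the sketch; it lets an existing proxy stop mutating the frame once tracing ends, as the text above requires.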


 Design Discussion
@@ -199,8 +303,10 @@ reference implementation it has historically returned a mutable mapping with
 the following characteristics:

 * each call to ``locals()`` returns the *same* mapping
-* each call to ``locals()`` updates the mapping with the current
-  state of the local variables and any referenced nonlocal cells
+* for namespaces where ``locals()`` returns a reference to something other than
+  the actual local execution namespace, each call to ``locals()`` updates the
+  mapping with the current state of the local variables and any referenced
+  nonlocal cells
 * changes to the returned mapping *usually* aren't written back to the
  local variable bindings or the nonlocal cell references, but write backs
  can be triggered by doing one of the following:
@@ -211,10 +317,31 @@ the following characteristics:
  * running an ``exec`` statement in the function's scope (Py2 only, since
    ``exec`` became an ordinary builtin in Python 3)

-The current proposal aims to retain the first two properties (to maintain
-backwards compatibility with as much code as possible) while still
-eliminating the ability to dynamically alter local and nonlocal variable
-bindings through the mapping returned by ``locals()``.
+The proposal in this PEP aims to retain the first two properties (to maintain
+backwards compatibility with as much code as possible) while ensuring that
+simply installing a trace hook can't enable rebinding of function locals via
+the ``locals()`` builtin (whereas enabling rebinding via
+``inspect.currentframe().f_locals`` is fully intended).
+
+
+Ensuring any semantic changes are restricted to tracing mode
+------------------------------------------------------------
+
+It would be possible to say that ``frame.f_locals`` should *always* return a
+write-through proxy, even in regular operation.
+
+This PEP avoids that option for a couple of key reasons, one pragmatic and one
+more philosophical:
+
+* Object allocations and method wrappers aren't free, and tracing functions
+  aren't the only operations that access frame locals from outside the function.
+  Restricting the changes to tracing mode means that the additional memory and
+  execution time overhead of these changes is going to be as close to zero in
+  regular operation as we can possibly make it
+* "Don't change what isn't broken": the current tracing mode problems are caused
+  by a requirement that's specific to tracing mode (support for external
+  rebinding of function local variable references), so it makes sense to also
+  restrict any related fixes to tracing mode


 What happens with the default args for ``eval()`` and ``exec()``?
@@ -240,7 +367,7 @@ are rather quirky due to historical implementation details:

  * allowing trace functions to read the state of local variables
  * allowing traceback processors to read the state of local variables
-  * allowing locals() to read the state of local variables
+  * allowing ``locals()`` to read the state of local variables
 * a direct reference to ``frame.f_locals`` is returned from ``locals()``, so if
  you hand out multiple concurrent references, then all those references will be
  to the exact same dictionary
@@ -259,72 +386,21 @@ only make sense in terms of the historical evolution of the language and the
 reference implementation, rather than being deliberately designed.


-Rejected Alternatives
-=====================
-
-Allowing local variable binding mutation outside trace functions
----------------------------------------------------------------
-
-Earlier versions of this PEP allowed local variable bindings to be mutated
-whenever code had access to the frame object - it didn't restrict that ability
-to trace functions the way the status quo does.
-
-This was considered undesirable, so the design was changed to retain the
-characteristic where only trace hooks can mutate local variable bindings
-from outside a function.
-
-
-Making ``frame.f_locals`` a write-through proxy at function scope
-----------------------------------------------------------------
-
-While frame objects and related APIs are an explicitly optional feature of
-Python implementations, there are nevertheless a lot of debuggers and other
-introspection tools that expect them to behave in certain ways, including the
-ability to update the bindings of local variables and nonlocal cell references
-by modifying ``frame.f_locals`` in a trace hook, as well as being able to store
-custom keys in the local namespace for arbitrary frames and retrieve those
-values later.
-
-Rather than the proposed approach of temporarily injecting the closure cells
-into ``frame.f_locals`` and using that to determine if a trace hook has
-rebound a particular local variable reference, it would technically be
-possible to devise a write-through proxy that *immediately* wrote local variable
-rebindings back to the frame execution state, closer to the way things work
-at module and class scope.
-
-However, in addition to being more complex to implement, adopting such an
-approach would *also* allow arbitrary changes to local variables in suspended
-generators and coroutines, as well as potentially allowing other threads to
-mutate a regular synchronous function's local variables while it was running.
-
-While it does introduce some additional runtime overhead when calling trace
-hooks in frames that provide or reference closure variables, the proposal in
-the PEP more specifically targets the actual problem being solved (i.e. updates
-to closure variable references being unexpectedly overwritten by the trace hook
-machinery) while otherwise preserving the existing semantics of both
-``locals()`` and ``frame.f_locals``.
-
-
-Making ``locals()`` and ``frame.f_locals`` refer to different namespaces
------------------------------------------------------------------------
-
-Rather than replacing closure references in ``frame.f_locals`` before and
-after calling trace hooks, it would also be possible to persistently maintain
-two different namespaces, one containing the cell objects, and one containing
-the values they reference.
-
-Similar to the write-through proxy idea, this has been rejected mainly on the
-basis of it being a larger divergence from established semantics than is needed
-to actually solve the problem with changes to closure variable references being
-unexpectedly overwritten by the trace hook machinery.
-
-
 Implementation
 ==============

 The reference implementation update is TBD - when available, it will be linked
 from [2]_.

+
+Acknowledgements
+================
+
+Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in
+[1]_ and pointing out some critical design flaws in earlier iterations of the
+PEP that attempted to avoid introducing such a proxy.
+
+
 References
 ==========

@@ -334,6 +410,16 @@ References
 .. [2] Clarify the required behaviour of ``locals()``
   (https://bugs.python.org/issue17960)

+.. [3] Updating function local variables from pdb is unreliable
+   (https://bugs.python.org/issue9633)
+
+.. [4] CPython's Python API for installing trace hooks
+   (https://docs.python.org/dev/library/sys.html#sys.settrace)
+
+.. [5] CPython's C API for installing trace hooks
+   (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)
+
+
 Copyright
 =========