PEP 558: Switch to independent snapshots at function scope (#1094)

* Switch to independent snapshots at function scope
* New public C API, PyEval_GetPyLocals(), that matches the updated locals() builtin
* At function scope, PyEval_GetLocals() returns the internal shared mapping
  from inside the proxy (returning a borrowed reference means this API can't
  offer the new independent snapshot semantics)

parent 8067cb323e
commit 54888058ce

255 pep-0558.rst
@@ -7,7 +7,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2017-09-08
 Python-Version: 3.9
-Post-History: 2017-09-08
+Post-History: 2017-09-08, 2019-05-22, 2019-05-30, 2019-12-30


 Abstract
@@ -34,11 +34,18 @@ up to and including replication of local variable mutation bugs that
 can arise when a trace hook is installed [1]_.

 While this PEP considers CPython's current behaviour when no trace hooks are
-installed to be acceptable (and largely desirable), it considers the current
+installed to be largely acceptable, it considers the current
 behaviour when trace hooks are installed to be problematic, as it causes bugs
 like [1]_ *without* even reliably enabling the desired functionality of allowing
 debuggers like ``pdb`` to mutate local variables [3]_.

+Review of the initial PEP and the draft implementation then identified an
+opportunity for simplification of both the documentation and implementation
+of the function level ``locals()`` behaviour by updating it to return an
+independent snapshot of the function locals and closure variables on each call,
+rather than continuing to return the semi-dynamic snapshot that it has
+historically returned in CPython.
+

 Proposal
 ========
@@ -51,7 +58,9 @@ execution scope. For this purpose, the defined scopes of execution are:
 * class scope: code in the body of a ``class`` statement, as well as any other
   code executed using ``exec()`` or ``eval()`` with separate local and global
   namespaces
-* function scope: code in the body of a ``def`` or ``async def`` statement
+* function scope: code in the body of a ``def`` or ``async def`` statement,
+  or any other construct that creates an optimized code block in CPython (e.g.
+  comprehensions, lambda functions)

 We also allow interpreters to define two "modes" of execution, with only the
 first mode being considered part of the language specification itself:
@@ -62,12 +71,14 @@ first mode being considered part of the language specification itself:
   like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or
   ``PyEval_SetTrace`` ([5]_) in CPython's C API

-For regular operation, this PEP proposes elevating the current behaviour of
-the CPython reference implementation to become part of the language
-specification.
+For regular operation, this PEP proposes elevating most of the current behaviour
+of the CPython reference implementation to become part of the language
+specification, *except* that each call to ``locals()`` at function scope will
+create a new dictionary object, rather than caching a common dict instance in
+the frame object that each invocation will update and return.

 For tracing mode, this PEP proposes changes to CPython's behaviour at function
-scope that bring the ``locals()`` builtin semantics closer to those used in
+scope that make the ``locals()`` builtin semantics identical to those used in
 regular operation, while also making the related frame API semantics clearer
 and easier for interactive debuggers to rely on.

@@ -82,10 +93,9 @@ New ``locals()`` documentation
 The heart of this proposal is to revise the documentation for the ``locals()``
 builtin to read as follows:

-    Return a dictionary representing the current local symbol table, with
+    Return a mapping object representing the current local symbol table, with
     variable names as the keys, and their currently bound references as the
-    values. This will always be the same dictionary for a given runtime
-    execution frame.
+    values.

     At module scope, as well as when using ``exec()`` or ``eval()`` with a
     single namespace, this function returns the same namespace as ``globals()``.
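The single-namespace rule quoted above, and its separate-namespace counterpart, can be checked with a short sketch. The names ``SOURCE``, ``single``, ``glob`` and ``loc`` are illustrative only and are not part of the PEP. ``exec()`` with one dict stands in for module scope, and ``exec()`` with two dicts stands in for class scope.

```python
# Code run at exec() scope: locals() is expected to be the live namespace here.
SOURCE = """
same_as_globals = locals() is globals()
locals()["injected"] = 42   # bind a name through the returned mapping
seen = injected             # read it back as an ordinary variable
"""

# Single namespace: behaves like module scope.
single = {}
exec(SOURCE, single)

# Separate global and local namespaces: behaves like class scope.
glob, loc = {}, {}
exec(SOURCE, glob, loc)

print(single["same_as_globals"], single["seen"])
print(loc["same_as_globals"], loc["seen"], "injected" in glob)
```

With a single namespace the mapping returned by ``locals()`` is the globals dict itself. With separate namespaces the binding lands in the local dict that was passed in, and the globals dict is left alone.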
@@ -96,16 +106,32 @@ builtin to read as follows:
     When using ``exec()`` or ``eval()`` with separate local and global
     namespaces, it returns the local namespace passed in to the function call.

-    At function scope (including for generators and coroutines), it returns a
-    dynamic snapshot of the function's local variables and any nonlocal cell
-    references. In this case, changes made via the snapshot are *not* written
-    back to the corresponding local variables or nonlocal cell references, and
-    any such changes to the snapshot will be overwritten if the snapshot is
-    subsequently refreshed (e.g. by another call to ``locals()``).
+    In all of the above cases, each call to ``locals()`` in a given frame of
+    execution will return the *same* mapping object. Changes made through
+    the mapping object returned from ``locals()`` will be visible as bound,
+    rebound, or deleted local variables, and binding, rebinding, or deleting
+    local variables will immediately affect the contents of the returned mapping
+    object.
+
+    At function scope (including for generators and coroutines), each call to
+    ``locals()`` instead returns a fresh snapshot of the function's local
+    variables and any nonlocal cell references. In this case, changes made via
+    the snapshot are *not* written back to the corresponding local variables or
+    nonlocal cell references, and binding, rebinding, or deleting local
+    variables and nonlocal cell references does *not* affect the contents
+    of previously created snapshots.
+
+
+There would also be a versionchanged note for Python 3.9:
+
+    In prior versions, the semantics of mutating the mapping object returned
+    from ``locals()`` were formally undefined. In CPython specifically,
+    the mapping returned at function scope could be implicitly refreshed by
+    other operations, such as calling ``locals()`` again, or the interpreter
+    implicitly invoking a Python level trace function. Obtaining the legacy
+    CPython behaviour now requires explicit calls to update the originally
+    returned snapshot from a freshly updated one.

-    CPython implementation detail: the dynamic snapshot for the current frame
-    will be implicitly refreshed before each call to the trace function when a
-    trace function is active.

 For reference, the current documentation of this builtin reads as follows:

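The function-scope wording added in the hunk above can be illustrated with a minimal sketch. It wraps each call in ``dict(locals())`` so that the copy is explicit and the result does not depend on whether the running interpreter already implements the proposed semantics. That wrapping is an assumption made for portability, not something the PEP requires. The function names are illustrative.

```python
def independent_snapshots():
    x = 1
    first = dict(locals())     # explicit copy, so it is independent everywhere
    x = 2                      # rebinding must not change the earlier snapshot
    second = dict(locals())
    second["x"] = 99           # mutating a snapshot must not rebind the local
    return first["x"], second["x"], x


def legacy_style_refresh():
    value = "old"
    shared = dict(locals())
    value = "new"
    # Explicit refresh, in the spirit of the versionchanged note: update the
    # originally returned snapshot from a freshly created one.
    shared.update(locals())
    return shared["value"]


print(independent_snapshots())   # (1, 99, 2)
print(legacy_style_refresh())    # new
```

The second function shows the kind of explicit update call the versionchanged note refers to when it describes recovering the legacy shared-snapshot behaviour.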
@@ -117,11 +143,11 @@ For reference, the current documentation of this builtin reads as follows:
     not affect the values of local and free variables used by the interpreter.

 (In other words: the status quo is that the semantics and behaviour of
-``locals()`` are currently formally implementation defined, whereas the proposed
+``locals()`` are formally implementation defined, whereas the proposed
 state after this PEP is that the only implementation defined behaviour will be
-that encountered at function scope when a tracing function is defined, with the
-behaviour in all other cases being defined by the language and library
-references)
+that associated with whether or not the implementation emulates the CPython
+frame API, with the behaviour in all other cases being defined by the language
+and library references)


 Module scope
@@ -205,27 +231,28 @@ may not affect the values of local and free variables used by the interpreter."

 This PEP proposes to change that text to instead say:

-    At function scope (including for generators and coroutines), [this function]
-    returns a dynamic snapshot of the function's local variables and any
-    nonlocal cell references. In this case, changes made via the snapshot are
-    *not* written back to the corresponding local variables or nonlocal cell
-    references, and any such changes to the snapshot will be overwritten if the
-    snapshot is subsequently refreshed (e.g. by another call to ``locals()``).
-
-    CPython implementation detail: the dynamic snapshot for the currently
-    executing frame will be implicitly refreshed before each call to the trace
-    function when a trace function is active.
+    At function scope (including for generators and coroutines), each call to
+    ``locals()`` instead returns a fresh snapshot of the function's local
+    variables and any nonlocal cell references. In this case, changes made via
+    the snapshot are *not* written back to the corresponding local variables or
+    nonlocal cell references, and binding, rebinding, or deleting local
+    variables and nonlocal cell references does *not* affect the contents
+    of previously created snapshots.

 This part of the proposal *does* require changes to the CPython reference
-implementation, as while it accurately describes the behaviour in regular
-operation, the "write back" strategy currently used to support namespace changes
-from trace functions doesn't comply with it (and also causes the quirky
+implementation, as CPython currently returns a shared mapping object that may
+be implicitly refreshed by additional calls to ``locals()``, and the
+"write back" strategy currently used to support namespace changes
+from trace functions also doesn't comply with it (and causes the quirky
 behavioural problems mentioned in the Rationale).


 CPython Implementation Changes
 ==============================

+Resolving the issues with tracing mode behaviour
+------------------------------------------------
+
 The current cause of CPython's tracing mode quirks (both the side effects from
 simply installing a tracing function and the fact that writing values back to
 function locals only works for the specific function being traced) is the way
@@ -268,48 +295,103 @@ proxy type (implemented as a private subclass of the existing
 ``types.MappingProxyType``) that has two internal attributes not exposed as
 part of either the Python or public C API:

-* *mapping*: the dynamic snapshot that is returned by the ``locals()`` builtin
+* *mapping*: an implicitly updated snapshot of the function local variables
+  and closure references, as well as any arbitrary items that have been set via
+  the mapping API, even if they don't have storage allocated for them on the
+  underlying frame
 * *frame*: the underlying frame that the snapshot is for

+``__getitem__`` operations on the proxy will read directly from the stored
+snapshot.
+
+The stored snapshot is implicitly updated when the ``f_locals`` attribute is
+retrieved from the frame object, as well as individual keys being updated by
+mutating operations on the proxy itself. This means that if a reference to the
+proxy is obtained from within the function, the proxy won't implicitly pick up
+name binding operations that take place as the function executes - the
+``f_locals`` attribute on the frame will need to be accessed again in order to
+trigger a refresh.
+
 ``__setitem__`` and ``__delitem__`` operations on the proxy will affect not only
 the dynamic snapshot, but *also* the corresponding fast local or cell reference
 on the underlying frame.

-The ``locals()`` builtin will be made aware of this proxy type, and continue to
-return a reference to the dynamic snapshot rather than to the write-through
-proxy.
+After a frame has finished executing, cell references can still be updated via
+the proxy, but the link back to the underlying frame is explicitly broken to
+avoid creating a persistent reference cycle that unexpectedly keeps frames
+alive.

-At the C API layer, ``PyEval_GetLocals()`` will implement the same semantics
-as the Python level ``locals()`` builtin, and a new
-``PyFrame_GetPyLocals(frame)`` accessor API will be provided to allow the
-function level proxy bypass logic to be encapsulated entirely inside the frame
-implementation.
+Other MutableMapping methods will behave as expected for a mapping with these
+essential method semantics.

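The proxy behaviour described in the added paragraphs above can be modelled in pure Python. This is only a toy under stated assumptions: ``ToyFrame``, ``ToyLocalsProxy`` and ``fast_locals`` are hypothetical names, a plain dict stands in for the frame's fast local storage, and the refresh never drops names that have become unbound. It is not the CPython implementation.

```python
class ToyFrame:
    """Hypothetical stand-in for a frame: a dict plays the fast locals array."""

    def __init__(self, **fast_locals):
        self.fast_locals = dict(fast_locals)
        self._proxy = None

    @property
    def f_locals(self):
        # Retrieving f_locals refreshes the stored snapshot, then returns
        # the (single, cached) write-through proxy for this frame.
        if self._proxy is None:
            self._proxy = ToyLocalsProxy(self)
        self._proxy.mapping.update(self.fast_locals)
        return self._proxy


class ToyLocalsProxy:
    """Hypothetical write-through proxy with *mapping* and *frame* attributes."""

    def __init__(self, frame):
        self.mapping = {}
        self.frame = frame

    def __getitem__(self, name):
        return self.mapping[name]        # reads come from the stored snapshot

    def __setitem__(self, name, value):
        self.mapping[name] = value       # always update the snapshot...
        if name in self.frame.fast_locals:
            self.frame.fast_locals[name] = value   # ...and write through

    def __delitem__(self, name):
        del self.mapping[name]
        self.frame.fast_locals.pop(name, None)


frame = ToyFrame(x=1)
proxy = frame.f_locals
frame.fast_locals["x"] = 2       # simulates the running function rebinding x
stale = proxy["x"]               # still the old snapshot value
fresh = frame.f_locals["x"]      # accessing f_locals again triggers a refresh
proxy["x"] = 3                   # write-through to the frame storage
proxy["extra"] = "kept"          # no frame storage: lives only in the mapping
```

The ``stale`` and ``fresh`` reads mirror the point that a proxy obtained earlier does not pick up later name bindings until ``f_locals`` is retrieved again.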
-The C level equivalent of accessing ``pyframe.f_locals`` in Python will be a
-new ``PyFrame_GetLocalsAttr(frame)`` API. Like the Python level descriptor, the
-new API will implicitly refresh the dynamic snapshot at function scope before
-returning a reference to the write-through proxy.

+Making the behaviour at function scope less surprising
+------------------------------------------------------
+
+The ``locals()`` builtin will be made aware of the new fast locals proxy type,
+and when it detects it on a frame, will return a fresh snapshot of the local
+namespace (i.e. the equivalent of ``dict(frame.f_locals)``) rather than
+returning the proxy directly.
+
+
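The ``dict(frame.f_locals)`` equivalence mentioned above can be tried directly from Python. The helper name ``snapshot_of_caller_locals`` is hypothetical, and ``sys._getframe`` is a CPython-specific API, so this is a sketch rather than a portable recipe.

```python
import sys


def snapshot_of_caller_locals():
    # Copy the caller's frame locals into a brand new dict, which is what the
    # text describes locals() doing at function scope.
    return dict(sys._getframe(1).f_locals)


def demo():
    a = 1
    first = snapshot_of_caller_locals()
    a = 2
    second = snapshot_of_caller_locals()
    # Each call produced a separate dict, and the earlier one kept a == 1.
    return first["a"], second["a"], first is second


print(demo())   # (1, 2, False)
```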
+Changes to the public CPython C API
+-----------------------------------
+
+The existing ``PyEval_GetLocals()`` API returns a borrowed reference, which
+means it cannot be updated to return the new dynamic snapshots at function
+scope. Instead, it will return a borrowed reference to the internal mapping
+maintained by the fast locals proxy. This shared mapping will behave similarly
+to the existing shared mapping in Python 3.8 and earlier, but the exact
+conditions under which it gets refreshed will be different. Specifically:
+
+* accessing the Python level ``f_locals`` frame attribute
+* any call to ``PyFrame_GetPyLocals()`` or ``PyFrame_GetLocalsAttribute()``
+  for the frame
+* any call to ``PyEval_GetLocals()``, ``PyEval_GetPyLocals()`` or the Python
+  ``locals()`` builtin while the frame is running
+
+A new ``PyFrame_GetPyLocals(frame)`` API will be provided such that
+``PyFrame_GetPyLocals(PyEval_GetFrame())`` directly matches the
+semantics of the Python ``locals()`` builtin, returning a shallow copy of the
+internal mapping at function scope, rather than a direct reference to it.
+
+A new ``PyEval_GetPyLocals()`` API will be provided as a convenience wrapper
+for the above operation that is suitable for inclusion in the stable ABI.
+
+A new ``PyFrame_GetLocalsAttribute(frame)`` API will be provided as the C level
+equivalent of accessing ``pyframe.f_locals`` in Python. Like the Python level
+descriptor, the new API will implicitly create the write-through proxy object
+for function level frames if it doesn't already exist, and update the stored
+mapping to ensure it reflects the current state of the function local variables
+and closure references.
+
 The ``PyFrame_LocalsToFast()`` function will be changed to always emit
 ``RuntimeError``, explaining that it is no longer a supported operation, and
-affected code should be updated to rely on the write-through tracing mode
-proxy instead.
+affected code should be updated to use ``PyFrame_GetPyLocals(frame)`` or
+``PyFrame_GetLocalsAttribute(frame)`` instead.


+Additions to the stable ABI
+---------------------------
+
+The new ``PyEval_GetPyLocals()`` API will be added to the stable ABI. The other
+new C API functions will be part of the CPython specific API only.
+
+
 Design Discussion
 =================

-Ensuring ``locals()`` returns a shared snapshot at function scope
------------------------------------------------------------------
+Changing ``locals()`` to return independent snapshots at function scope
+-----------------------------------------------------------------------

 The ``locals()`` builtin is a required part of the language, and in the
 reference implementation it has historically returned a mutable mapping with
 the following characteristics:

-* each call to ``locals()`` returns the *same* mapping
+* each call to ``locals()`` returns the *same* mapping object
 * for namespaces where ``locals()`` returns a reference to something other than
   the actual local execution namespace, each call to ``locals()`` updates the
-  mapping with the current state of the local variables and any referenced
+  mapping object with the current state of the local variables and any referenced
   nonlocal cells
 * changes to the returned mapping *usually* aren't written back to the
   local variable bindings or the nonlocal cell references, but write backs
@@ -321,19 +403,27 @@ the following characteristics:
   * running an ``exec`` statement in the function's scope (Py2 only, since
     ``exec`` became an ordinary builtin in Python 3)

-The proposal in this PEP aims to retain the first two properties (to maintain
-backwards compatibility with as much code as possible) while ensuring that
-simply installing a trace hook can't enable rebinding of function locals via
-the ``locals()`` builtin (whereas enabling rebinding via
-``frame.f_locals`` inside the tracehook implementation is fully intended).
+Originally this PEP proposed to retain the first two of these properties,
+while changing the third in order to address the outright behaviour bugs that
+it can cause.
+
+In [7]_ Nathaniel Smith made a persuasive case that we could make the behaviour
+of ``locals()`` at function scope substantially less confusing by retaining only
+the second property and having each call to ``locals()`` at function scope
+return an *independent* snapshot of the local variables and closure references
+rather than updating an implicitly shared snapshot.
+
+As this revised design also made the implementation markedly easier to follow,
+the PEP was updated to propose this change in behaviour, rather than retaining
+the historical shared snapshot.


-Keeping ``locals()`` as a dynamic snapshot at function scope
-------------------------------------------------------------
+Keeping ``locals()`` as a snapshot at function scope
+----------------------------------------------------

-It would theoretically be possible to change the semantics of the ``locals()``
-builtin to return the write-through proxy at function scope, rather than
-continuing to return a dynamic snapshot.
+As discussed in [7]_, it would theoretically be possible to change the semantics
+of the ``locals()`` builtin to return the write-through proxy at function scope,
+rather than switching it to return independent snapshots.

 This PEP doesn't (and won't) propose this as it's a backwards incompatible
 change in practice, even though code that relies on the current behaviour is
@@ -360,7 +450,8 @@ on the current reference interpreter implementation::
     1

 Similarly, ``locals()`` can be passed to the ``exec()`` and ``eval()`` builtins
-at function scope without risking unexpected rebinding of local variables.
+at function scope (either explicitly or implicitly) without risking unexpected
+rebinding of local variables or closure references.

 Provoking the reference interpreter into incorrectly mutating the local variable
 state requires a more complex setup where a nested function closes over a
@@ -379,6 +470,11 @@ JIT-compiled implementations only need to enable it when a frame introspection
 API is invoked, or a trace hook is installed, not whenever ``locals()`` is
 accessed at function scope.

+Returning snapshots from ``locals()`` at function scope also means that static
+analysis for function level code will be more reliable, as only access to the
+frame machinery will allow mutation of local and nonlocal variables in a way
+that's hidden from static analysis.
+

 What happens with the default args for ``eval()`` and ``exec()``?
 -----------------------------------------------------------------
@@ -388,6 +484,21 @@ the calling scope by default.

 There isn't any need for the PEP to change these defaults, so it doesn't.

+However, usage of the C level ``PyEval_GetLocals()`` API in the CPython
+reference implementation will need to be reviewed to determine which cases
+need to be changed to use the new ``PyEval_GetPyLocals()`` API instead.
+
+These changes will also have potential performance implications, especially
+for functions with large numbers of local variables (e.g. if these functions
+are called in a loop, calling ``locals()`` once before the loop and then passing
+the namespace into the function explicitly will give the same semantics and
+performance characteristics as the status quo, whereas relying on the implicit
+default would create a new snapshot on each iteration).
+
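The loop pattern mentioned in the performance paragraph above might look like the following sketch. ``evaluate_all``, ``base`` and ``scale`` are illustrative names, and ``dict(locals())`` is used so the single snapshot is an explicit copy on any interpreter version.

```python
def evaluate_all(expressions):
    base, scale = 10, 3
    # Take one snapshot before the loop and pass it to eval() explicitly,
    # instead of letting every eval() call build its own default namespace.
    namespace = dict(locals())
    results = []
    for expr in expressions:
        results.append(eval(expr, globals(), namespace))
    return results


print(evaluate_all(["base * scale", "base + scale"]))   # [30, 13]
```

Passing the namespace explicitly also makes it obvious that names bound inside the loop after the snapshot was taken are not visible to the evaluated expressions.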
+(Note: the reference implementation draft PR has updated the ``locals()`` and
+``vars()`` builtins to use ``PyEval_GetPyLocals()``, but has not yet
+updated the default local namespace arguments for ``eval()`` and ``exec()``).
+

 Changing the frame API semantics in regular operation
 -----------------------------------------------------
@@ -395,7 +506,8 @@ Changing the frame API semantics in regular operation
 Earlier versions of this PEP proposed having the semantics of the frame
 ``f_locals`` attribute depend on whether or not a tracing hook was currently
 installed - only providing the write-through proxy behaviour when a tracing hook
-was active, and otherwise behaving the same as the ``locals()`` builtin.
+was active, and otherwise behaving the same as the historical ``locals()``
+builtin.

 That was adopted as the original design proposal for a couple of key reasons,
 one pragmatic and one more philosophical:
@@ -403,7 +515,7 @@ one pragmatic and one more philosophical:
 * Object allocations and method wrappers aren't free, and tracing functions
   aren't the only operations that access frame locals from outside the function.
   Restricting the changes to tracing mode meant that the additional memory and
-  execution time overhead of these changes would as close to zero in regular
+  execution time overhead of these changes would be as close to zero in regular
   operation as we can possibly make them.
 * "Don't change what isn't broken": the current tracing mode problems are caused
   by a requirement that's specific to tracing mode (support for external
@@ -418,11 +530,11 @@ and removed.

 Accordingly, the design was switched to the current one, where
 ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is always
-a dynamic snapshot, which is both simpler to implement and easier to explain.
+a snapshot, which is both simpler to implement and easier to explain.

 Regardless of how the CPython reference implementation chooses to handle this,
 optimising compilers and interpreters also remain free to impose additional
-restrictions on debuggers, by making local variable mutation through frame
+restrictions on debuggers, such as making local variable mutation through frame
 objects an opt-in behaviour that may disable some optimisations (just as the
 emulation of CPython's frame API is already an opt-in flag in some Python
 implementations).
@@ -497,6 +609,9 @@ References
 .. [6] PEP 558 reference implementation
    (https://github.com/python/cpython/pull/3640/files)

+.. [7] Nathaniel's review of possible function level semantics for locals()
+   (https://mail.python.org/pipermail/python-dev/2019-May/157738.html)
+

 Copyright
 =========