PEP: 558
Title: Defined semantics for locals()
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate: Nathaniel J. Smith
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-08
Python-Version: 3.7
Post-History: 2017-09-08


Abstract
========

The semantics of the ``locals()`` builtin have historically been underspecified
and hence implementation dependent.

This PEP proposes formally standardising on the behaviour of the CPython 3.6
reference implementation for most execution scopes, with some adjustments to the
behaviour at function scope to make it more predictable and independent of the
presence or absence of tracing functions.


Rationale
=========

While the precise semantics of the ``locals()`` builtin are nominally undefined,
in practice, many Python programs depend on it behaving exactly as it behaves in
CPython (at least when no tracing functions are installed).

Other implementations such as PyPy are currently replicating that behaviour,
up to and including replication of local variable mutation bugs that
can arise when a trace hook is installed [1]_.

While this PEP considers CPython's current behaviour when no trace hooks are
installed to be acceptable (and even desirable), it considers the current
behaviour when trace hooks are installed to be problematic, as it causes bugs
like [1]_ *without* even reliably enabling the desired functionality of allowing
debuggers like ``pdb`` to mutate local variables [3]_.


Proposal
========

The expected semantics of the ``locals()`` builtin change based on the current
execution scope. For this purpose, the defined scopes of execution are:

* module scope: top-level module code, as well as any other code executed using
  ``exec()`` or ``eval()`` with a single namespace
* class scope: code in the body of a ``class`` statement, as well as any other
  code executed using ``exec()`` or ``eval()`` with separate local and global
  namespaces
* function scope: code in the body of a ``def`` or ``async def`` statement

We also allow interpreters to define two "modes" of execution, with only the
first mode being considered part of the language specification itself:

* regular operation: the way the interpreter behaves by default
* tracing mode: the way the interpreter behaves when a trace hook has been
  registered in one or more threads via an implementation dependent mechanism
  like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or
  ``PyEval_SetTrace`` ([5]_) in CPython's C API

For regular operation, this PEP proposes elevating the current behaviour of
the CPython reference implementation to become part of the language
specification.

For tracing mode, this PEP proposes changes to CPython's behaviour at function
scope that bring the ``locals()`` builtin semantics closer to those used in
regular operation, while also making the related frame API semantics clearer
and easier for interactive debuggers to rely on.


New ``locals()`` documentation
------------------------------

The heart of this proposal is to revise the documentation for the ``locals()``
builtin to read as follows:

    Return a dictionary representing the current local symbol table, with
    variable names as the keys, and their currently bound references as the
    values. This will always be the same dictionary for a given runtime
    execution frame.

    At module scope, as well as when using ``exec()`` or ``eval()`` with a
    single namespace, this function returns the same namespace as ``globals()``.

    At class scope, it returns the namespace that will be passed to the
    metaclass constructor.

    When using ``exec()`` or ``eval()`` with separate local and global
    namespaces, it returns the local namespace passed in to the function call.

    At function scope (including for generators and coroutines), it returns a
    dynamic snapshot of the function's local variables and any nonlocal cell
    references. In this case, changes made via the snapshot are *not* written
    back to the corresponding local variables or nonlocal cell references, and
    any such changes to the snapshot will be overwritten if the snapshot is
    subsequently refreshed (e.g. by another call to ``locals()``).

    CPython implementation detail: the dynamic snapshot for the current frame
    will be implicitly refreshed before each call to the trace function when a
    trace function is active.

For reference, the current documentation of this builtin reads as follows:

    Update and return a dictionary representing the current local symbol table.
    Free variables are returned by locals() when it is called in function
    blocks, but not in class blocks.

    Note: The contents of this dictionary should not be modified; changes may
    not affect the values of local and free variables used by the interpreter.

(In other words: the status quo is that the semantics and behaviour of
``locals()`` are formally implementation defined, whereas the proposed
state after this PEP is that the only implementation defined behaviour will be
that encountered at function scope when a tracing function is installed, with
the behaviour in all other cases being defined by the language and library
references.)


Module scope
------------

At module scope, as well as when using ``exec()`` or ``eval()`` with a
single namespace, ``locals()`` must return the same object as ``globals()``,
which must be the actual execution namespace (available as
``inspect.currentframe().f_locals`` in implementations that provide access
to frame objects).

Variable assignments during subsequent code execution in the same scope must
dynamically change the contents of the returned mapping, and changes to the
returned mapping must change the values bound to local variable names in the
execution environment.

The semantics at module scope are required to be the same in both tracing
mode (if provided by the implementation) and in regular operation.

To capture this expectation as part of the language specification, the following
paragraph will be added to the documentation for ``locals()``:

    At module scope, as well as when using ``exec()`` or ``eval()`` with a
    single namespace, this function returns the same namespace as ``globals()``.

This part of the proposal does not require any changes to the reference
implementation - it is standardisation of the current behaviour.


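As a minimal sketch of the single-namespace rule, the following runs a small
piece of source through ``exec()`` with one namespace and records what the
executed code observes. The helper name ``run_in_single_namespace`` and the
variable names are hypothetical, chosen only for this illustration.

```python
def run_in_single_namespace(source):
    """Execute *source* with a single namespace, as at module scope."""
    namespace = {}
    exec(source, namespace)
    return namespace


SOURCE = """
same_object = locals() is globals()
locals()["injected"] = 1      # write through the returned mapping
seen = injected               # the new binding is visible to later code
"""

result = run_in_single_namespace(SOURCE)
```

Here ``result["same_object"]`` reports whether ``locals()`` and ``globals()``
were the same object inside the executed code, and ``result["seen"]`` shows
that a write through the mapping rebound a name in the execution environment.
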
Class scope
-----------

At class scope, as well as when using ``exec()`` or ``eval()`` with separate
global and local namespaces, ``locals()`` must return the specified local
namespace (which may be supplied by the metaclass ``__prepare__`` method
in the case of classes). As for module scope, this must be a direct reference
to the actual execution namespace (available as
``inspect.currentframe().f_locals`` in implementations that provide access
to frame objects).

Variable assignments during subsequent code execution in the same scope must
change the contents of the returned mapping, and changes to the returned mapping
must change the values bound to local variable names in the
execution environment.

The mapping returned by ``locals()`` will *not* be used as the actual class
namespace underlying the defined class (the class creation process will copy
the contents to a fresh dictionary that is only accessible by going through the
class machinery).

For nested classes defined inside a function, any nonlocal cells referenced from
the class scope are *not* included in the ``locals()`` mapping.

The semantics at class scope are required to be the same in both tracing
mode (if provided by the implementation) and in regular operation.

To capture this expectation as part of the language specification, the following
two paragraphs will be added to the documentation for ``locals()``:

    When using ``exec()`` or ``eval()`` with separate local and global
    namespaces, [this function] returns the given local namespace.

    At class scope, it returns the namespace that will be passed to the metaclass
    constructor.

This part of the proposal does not require any changes to the reference
implementation - it is standardisation of the current behaviour.


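The class scope rules can be illustrated with a short sketch. The metaclass
``RecordingMeta`` and the class ``Example`` are hypothetical names introduced
here; the metaclass simply remembers the mapping it hands out from
``__prepare__`` so that it can be compared with what ``locals()`` returns in
the class body.

```python
class RecordingMeta(type):
    """Hypothetical metaclass that remembers the namespace it prepared."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        mcls.prepared = {}
        return mcls.prepared


class Example(metaclass=RecordingMeta):
    body_locals = locals()        # the mapping supplied by __prepare__
    locals()["injected"] = 1      # writes rebind names in the class body
    seen = injected


# exec() with separate global and local namespaces follows the same rule:
# locals() is the local namespace that was passed in.
global_ns, local_ns = {}, {}
exec("same = locals()", global_ns, local_ns)
```

After this runs, ``Example.body_locals`` is the dictionary returned by
``__prepare__``, while ``Example.__dict__`` exposes a separate copy made
during class creation.
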
Function scope
--------------

At function scope, interpreter implementations are granted significant freedom
to optimise local variable access, and hence are NOT required to permit
arbitrary modification of local and nonlocal variable bindings through the
mapping returned from ``locals()``.

Historically, this leniency has been described in the language specification
with the words "The contents of this dictionary should not be modified; changes
may not affect the values of local and free variables used by the interpreter."

This PEP proposes to change that text to instead say:

    At function scope (including for generators and coroutines), [this function]
    returns a dynamic snapshot of the function's local variables and any
    nonlocal cell references. In this case, changes made via the snapshot are
    *not* written back to the corresponding local variables or nonlocal cell
    references, and any such changes to the snapshot will be overwritten if the
    snapshot is subsequently refreshed (e.g. by another call to ``locals()``).

    CPython implementation detail: the dynamic snapshot for the currently
    executing frame will be implicitly refreshed before each call to the trace
    function when a trace function is active.

This part of the proposal *does* require changes to the CPython reference
implementation, as while it accurately describes the behaviour in regular
operation, the "write back" strategy currently used to support namespace changes
from trace functions doesn't comply with it (and also causes the quirky
behavioural problems mentioned in the Rationale).


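A minimal sketch of the "no write back" rule in regular operation follows
(the function name ``mutate_snapshot`` is hypothetical). It assumes no trace
hook is installed while it runs, and it deliberately does not compare the
results of repeated ``locals()`` calls, since that detail is not the same
across CPython releases.

```python
def mutate_snapshot():
    value = 1
    snapshot = locals()
    snapshot["value"] = 2     # changes the snapshot only
    snapshot["extra"] = 3     # does not create a new local variable
    # The real local variable is untouched; the snapshot holds the edit
    # until something refreshes it.
    return value, snapshot["value"]


local_value, snapshot_value = mutate_snapshot()
```
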
CPython Implementation Changes
==============================

CPython's tracing mode quirks (both the side effects from simply installing a
tracing function and the fact that writing values back to function locals only
works for the specific function being traced) stem from the way that locals
mutation support for trace hooks is currently implemented: the
``PyFrame_LocalsToFast`` function.

When a trace function is installed, CPython currently does the following for
function frames (those where the code object uses "fast locals" semantics):

1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
2. Calls the trace hook (with tracing of the hook itself disabled)
3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the dynamic
   snapshot

This approach is problematic for a few different reasons:

* Even if the trace function doesn't mutate the snapshot, the final step resets
  any cell references back to the state they were in before the trace function
  was called (this is the root cause of the bug report in [1]_)
* If the trace function *does* mutate the snapshot, but then does something
  that causes the snapshot to be refreshed, those changes are lost (this is
  one aspect of the bug report in [3]_)
* If the trace function attempts to mutate the local variables of a frame other
  than the one being traced (e.g. ``frame.f_back.f_locals``), those changes
  will almost certainly be lost (this is another aspect of the bug report in
  [3]_)
* If a ``locals()`` reference is passed to another function, and *that*
  function mutates the snapshot namespace, then those changes *may* be written
  back to the execution frame *if* a trace hook is installed

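The three steps above, and the first two problems in the list, can be
modelled with a toy sketch in pure Python. Nothing here is CPython's real
implementation: ``ToyFrame``, ``fast_to_locals``, ``locals_to_fast`` and
``call_trace_hook`` are hypothetical stand-ins that use plain dictionaries in
place of the fast locals array, the cells, and the snapshot.

```python
class ToyFrame:
    """Toy stand-in for a function frame (not CPython's frame type)."""

    def __init__(self, **fast_locals):
        self.fast = dict(fast_locals)   # models fast locals and cells
        self.f_locals = {}              # models the dynamic snapshot


def fast_to_locals(frame):
    frame.f_locals.update(frame.fast)   # models PyFrame_FastToLocals


def locals_to_fast(frame):
    frame.fast.update(frame.f_locals)   # models PyFrame_LocalsToFast


def call_trace_hook(frame, hook):
    fast_to_locals(frame)   # step 1: refresh the snapshot
    hook(frame)             # step 2: run the trace hook
    locals_to_fast(frame)   # step 3: write the snapshot back


def passive_hook(frame):
    # The hook never touches the snapshot, but other code rebinds a
    # shared cell while the hook is running.
    frame.fast["counter"] += 1


def mutating_hook(frame):
    frame.f_locals["counter"] = 10   # intended rebinding
    fast_to_locals(frame)            # an incidental refresh discards it


first = ToyFrame(counter=0)
call_trace_hook(first, passive_hook)
lost_update = first.fast["counter"]       # reset by step 3

second = ToyFrame(counter=0)
call_trace_hook(second, mutating_hook)
discarded_write = second.fast["counter"]  # the hook's edit never lands
```

In this model both ``lost_update`` and ``discarded_write`` end up as ``0``:
the unconditional write back undoes a concurrent update, and a refresh inside
the hook silently drops the hook's own change.
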
The proposed resolution to this problem is to take advantage of the fact that
whereas functions typically access their *own* namespace using the language
defined ``locals()`` builtin, trace functions necessarily use the implementation
dependent ``frame.f_locals`` interface, as a frame reference is what gets
passed to hook implementations.

In regular operation, nothing will change - ``frame.f_locals`` will be a direct
reference to the dynamic snapshot, and ``locals()`` will return a reference to
that snapshot. This reflects the fact that it's only CPython's tracing mode
semantics that are currently problematic.

In tracing mode, however, we will change ``frame.f_locals`` to instead return
a dedicated proxy type (implemented as a private subclass of the existing
``types.MappingProxyType``) that has two internal attributes not exposed as
part of either the Python or public C API:

* *mapping*: the dynamic snapshot that would be returned by ``frame.f_locals``
  during regular operation
* *frame*: the underlying frame that the snapshot is for

The ``locals()`` builtin would be aware of this proxy type, and continue to
return a reference to the dynamic snapshot even when in tracing mode.

As long as the process remains in tracing mode, ``__setitem__`` and
``__delitem__`` operations on the proxy will affect not only the dynamic
snapshot, but *also* the corresponding fast local or cell reference on the
underlying frame.

If the process leaves tracing mode (i.e. all previously installed trace hooks
are uninstalled), then any already created proxy objects will remain in place,
but their ``__setitem__`` and ``__delitem__`` methods will skip mutating
the underlying frame.

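The intended proxy behaviour can be sketched in pure Python. This is only an
illustration under stated assumptions: ``ToyFrame`` and
``ToyWriteThroughProxy`` are hypothetical names, the frame is modelled with
plain dictionaries, the "is tracing active" check is passed in as a callable,
and the sketch builds on ``collections.abc.MutableMapping`` rather than on
``types.MappingProxyType`` as the actual proposal describes.

```python
from collections.abc import MutableMapping


class ToyFrame:
    """Toy stand-in for a function frame (not CPython's frame type)."""

    def __init__(self, **fast_locals):
        self.fast = dict(fast_locals)      # models fast locals and cells
        self.snapshot = dict(fast_locals)  # models the dynamic snapshot


class ToyWriteThroughProxy(MutableMapping):
    """Writes reach the snapshot and, while tracing, the frame as well."""

    def __init__(self, frame, is_tracing):
        self._mapping = frame.snapshot   # the PEP's *mapping* attribute
        self._frame = frame              # the PEP's *frame* attribute
        self._is_tracing = is_tracing    # callable: is a hook installed?

    def __getitem__(self, key):
        return self._mapping[key]

    def __setitem__(self, key, value):
        self._mapping[key] = value
        if self._is_tracing():
            self._frame.fast[key] = value

    def __delitem__(self, key):
        del self._mapping[key]
        if self._is_tracing():
            self._frame.fast.pop(key, None)

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)


tracing = {"active": True}
frame = ToyFrame(x=1)
proxy = ToyWriteThroughProxy(frame, lambda: tracing["active"])

proxy["x"] = 2              # tracing mode: snapshot and frame both updated
tracing["active"] = False
proxy["x"] = 3              # tracing mode left: only the snapshot changes
```

At the end of the sketch the frame still holds ``2`` for ``x`` while the
snapshot holds ``3``, matching the rule that existing proxies stop mutating
the frame once tracing mode ends.
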
At the C API layer, ``PyEval_GetLocals()`` will implement the same semantics
as the Python level ``locals()`` builtin, and a new ``PyFrame_GetLocals(frame)``
accessor API will be provided to allow the proxy bypass logic to be encapsulated
entirely inside the frame implementation. The C level equivalent of accessing
``pyframe.f_locals`` in Python will be to access ``cframe->f_locals`` directly
(the one difference is that the Python descriptor will continue to include an
implicit snapshot refresh).

The ``PyFrame_LocalsToFast()`` function will be changed to always raise
``RuntimeError``, explaining that it is no longer a supported operation, and
that affected code should be updated to rely on the write-through tracing mode
proxy instead.


Open Questions
==============

Is it necessary to restrict frame semantic changes to tracing mode?
-------------------------------------------------------------------

It would be possible to say that ``frame.f_locals`` should *always* return a
write-through proxy, even in regular operation.

This PEP currently avoids that option for a couple of key reasons, one pragmatic
and one more philosophical:

* Object allocations and method wrappers aren't free, and tracing functions
  aren't the only operations that access frame locals from outside the function.
  Restricting the changes to tracing mode means that the additional memory and
  execution time overhead of these changes will be as close to zero in
  regular operation as we can possibly make it.
* "Don't change what isn't broken": the current tracing mode problems are caused
  by a requirement that's specific to tracing mode (support for external
  rebinding of function local variable references), so it makes sense to also
  restrict any related fixes to tracing mode

The counter-argument to this is that it makes how ``frame.f_locals`` works
depend on runtime state in a really subtle way, and creates some new edge cases
around how ``f_locals`` behaves as trace functions are added and removed. If
the design adopted were instead "``frame.f_locals`` is always a write-through
proxy, while ``locals()`` is always a dynamic snapshot", that would likely be
both simpler to implement and easier to explain, suggesting that it may be a
good idea.

Regardless of how the CPython reference implementation chooses to handle this,
optimising compilers and interpreters can potentially impose additional
restrictions on debuggers, by making local variable mutation through frame
objects an opt-in behaviour that may disable some optimisations.


Design Discussion
=================

Ensuring ``locals()`` returns a shared snapshot at function scope
-----------------------------------------------------------------

The ``locals()`` builtin is a required part of the language, and in the
reference implementation it has historically returned a mutable mapping with
the following characteristics:

* each call to ``locals()`` returns the *same* mapping
* for namespaces where ``locals()`` returns a reference to something other than
  the actual local execution namespace, each call to ``locals()`` updates the
  mapping with the current state of the local variables and any referenced
  nonlocal cells
* changes to the returned mapping *usually* aren't written back to the
  local variable bindings or the nonlocal cell references, but write backs
  can be triggered by doing one of the following:

  * installing a Python level trace hook (write backs then happen whenever
    the trace hook is called)
  * running a function level wildcard import (requires bytecode injection in Py3)
  * running an ``exec`` statement in the function's scope (Py2 only, since
    ``exec`` became an ordinary builtin in Python 3)

The proposal in this PEP aims to retain the first two properties (to maintain
backwards compatibility with as much code as possible) while ensuring that
simply installing a trace hook can't enable rebinding of function locals via
the ``locals()`` builtin (whereas enabling rebinding via
``inspect.currentframe().f_locals`` is fully intended).


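As an illustration of the rebinding that is meant to keep working, the sketch
below installs a trace hook with ``sys.settrace`` and rebinds a local variable
of the traced function through ``frame.f_locals``. The names
``rebinding_tracer``, ``target`` and ``run_traced`` are hypothetical, and the
example assumes CPython with no other trace hook (such as a debugger or
coverage tool) interfering.

```python
import sys


def rebinding_tracer(frame, event, arg):
    # Rebind ``x`` in the traced frame once it has been assigned.
    if (event == "line" and frame.f_code.co_name == "target"
            and "x" in frame.f_locals):
        frame.f_locals["x"] = 42
    return rebinding_tracer


def target():
    x = 1
    y = x       # the hook runs before this line executes and rebinds x
    return y


def run_traced():
    previous = sys.gettrace()
    sys.settrace(rebinding_tracer)
    try:
        return target()
    finally:
        sys.settrace(previous)   # restore whatever hook was there before


result = run_traced()
```

Note that the hook writes to ``frame.f_locals`` as its last access to that
attribute; under the current write back strategy, a later refresh inside the
hook would discard the change.
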
What happens with the default args for ``eval()`` and ``exec()``?
-----------------------------------------------------------------

These are formally defined as inheriting ``globals()`` and ``locals()`` from
the calling scope by default.

There doesn't seem to be any reason for the PEP to change this.


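A short sketch of that default, with hypothetical function names: when no
namespaces are passed, ``eval()`` can read the caller's local variables, and
when an explicit namespace is passed, it cannot.

```python
def evaluate_with_defaults():
    a, b = 1, 2
    # No namespaces given: eval() uses the caller's globals() and locals()
    return eval("a + b")


def evaluate_with_explicit_namespace():
    a, b = 1, 2
    # Explicit namespace given: the caller's locals are not consulted
    return eval("a + b", {"a": 10, "b": 20})


default_result = evaluate_with_defaults()
explicit_result = evaluate_with_explicit_namespace()
```
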
Historical semantics at function scope
--------------------------------------

The current semantics of mutating ``locals()`` and ``frame.f_locals`` in CPython
are rather quirky due to historical implementation details:

* actual execution uses the fast locals array for local variable bindings and
  cell references for nonlocal variables
* there's a ``PyFrame_FastToLocals`` operation that populates the frame's
  ``f_locals`` attribute based on the current state of the fast locals array
  and any referenced cells. This exists for three reasons:

  * allowing trace functions to read the state of local variables
  * allowing traceback processors to read the state of local variables
  * allowing ``locals()`` to read the state of local variables
* a direct reference to ``frame.f_locals`` is returned from ``locals()``, so if
  you hand out multiple concurrent references, then all those references will be
  to the exact same dictionary
* the two common calls to the reverse operation, ``PyFrame_LocalsToFast``, were
  removed in the migration to Python 3: ``exec`` is no longer a statement (and
  hence can no longer affect function local namespaces), and the compiler now
  disallows the use of ``from module import *`` operations at function scope
* however, two obscure calling paths remain: ``PyFrame_LocalsToFast`` is called
  as part of returning from a trace function (which allows debuggers to make
  changes to the local variable state), and you can also still inject the
  ``IMPORT_STAR`` opcode when creating a function directly from a code object
  rather than via the compiler

This proposal deliberately *doesn't* formalise these semantics as is, since they
only make sense in terms of the historical evolution of the language and the
reference implementation, rather than being deliberately designed.


Implementation
==============

The reference implementation update is in development as a draft pull
request on GitHub ([6]_).


Acknowledgements
================

Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in
[1]_ and pointing out some critical design flaws in earlier iterations of the
PEP that attempted to avoid introducing such a proxy.


References
==========

.. [1] Broken local variable assignment given threads + trace hook + closure
   (https://bugs.python.org/issue30744)

.. [2] Clarify the required behaviour of ``locals()``
   (https://bugs.python.org/issue17960)

.. [3] Updating function local variables from pdb is unreliable
   (https://bugs.python.org/issue9633)

.. [4] CPython's Python API for installing trace hooks
   (https://docs.python.org/dev/library/sys.html#sys.settrace)

.. [5] CPython's C API for installing trace hooks
   (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)

.. [6] PEP 558 reference implementation
   (https://github.com/python/cpython/pull/3640/files)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: