PEP 768: Safe external debugger interface for CPython (#4158)
parent
f91366e50c
commit
24e078b7ce
@ -0,0 +1,351 @@
PEP: 768
Title: Safe external debugger interface for CPython
Author: Pablo Galindo Salgado <pablogsal@python.org>, Matt Wozniski <godlygeek@gmail.com>, Ivona Stojanovic <stojanovic.i@hotmail.com>
Status: Draft
Type: Standards Track
Created: 25-Nov-2024
Python-Version: 3.14

Abstract
========

This PEP proposes adding a zero-overhead debugging interface to CPython that
allows debuggers and profilers to safely attach to running Python processes. The
interface provides safe execution points for attaching debugger code without
modifying the interpreter's normal execution path or adding runtime overhead.

A key application of this interface will be enabling pdb to attach to live
processes by process ID, similar to ``gdb -p``, allowing developers to inspect and
debug Python applications interactively in real-time without stopping or
restarting them.

Motivation
==========

Debugging Python processes in production and live environments presents unique
challenges. Developers often need to analyze application behavior without
stopping or restarting services, which is especially crucial for
high-availability systems. Common scenarios include diagnosing deadlocks,
inspecting memory usage, or investigating unexpected behavior in real time.

Very few Python tools can attach to running processes, primarily because doing
so requires deep expertise in both operating system debugging interfaces and
CPython internals. While C/C++ debuggers like GDB and LLDB can attach to
processes using well-understood techniques, Python tools must implement all of
these low-level mechanisms plus handle additional complexity. For example, when
GDB needs to execute code in a target process, it:

1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
2. Writes a small sequence of machine code - typically a function prologue, the
   desired instructions, and code to restore registers
3. Saves all the target thread's registers
4. Changes the instruction pointer to the injected code
5. Lets the process run until it hits a breakpoint at the end of the injected code
6. Restores the original registers and continues execution

Python tools face this same challenge of code injection, but with an additional
layer of complexity. Not only do they need to implement the above mechanism,
they must also understand and safely interact with CPython's runtime state,
including the interpreter loop, garbage collector, thread state, and reference
counting system. This combination of low-level system manipulation and
deep, domain-specific interpreter knowledge makes implementing Python debugging tools
exceptionally difficult.

The few tools (see for example `DebugPy
<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__
and `Memray
<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__)
that do attempt this resort to suboptimal and unsafe methods,
using system debuggers like GDB and LLDB to forcefully inject code. This
approach is fundamentally unsafe because the injected code can execute at any
point during the interpreter's execution cycle - even during critical operations
like memory allocation, garbage collection, or thread state management. When
this happens, the results are catastrophic: attempting to allocate memory while
already inside ``malloc()`` causes crashes, modifying objects during garbage
collection corrupts the interpreter's state, and touching thread state at the
wrong time leads to deadlocks.

Various tools attempt to minimize these risks through complex workarounds, such
as spawning separate threads for injected code, carefully timing their
operations, or trying to choose good points at which to stop the process. However,
these mitigations cannot fully solve the underlying problem: without cooperation
from the interpreter, there's no way to know if it's safe to execute code at any
given moment. Even carefully implemented tools can crash the interpreter because
they're fundamentally working against it rather than with it.

Rationale
=========

Rather than forcing tools to work around interpreter limitations with unsafe
code injection, we can extend CPython with a proper debugging interface that
guarantees safe execution. By adding a few thread state fields and integrating
with the interpreter's existing evaluation loop, we can ensure debugging
operations only occur at well-defined safe points. This eliminates the
possibility of crashes and corruption while maintaining zero overhead during
normal execution.

The key insight is that we don't need to inject code at arbitrary points - we
just need to signal to the interpreter that we want code executed at the next
safe opportunity. This approach works with the interpreter's natural execution
flow rather than fighting against it.

After we described this idea to the PyPy development team, the proposal was
`implemented in PyPy <https://github.com/pypy/pypy/pull/5135>`__,
proving both its feasibility and effectiveness. Their implementation
demonstrates that we can provide safe debugging capabilities with zero runtime
overhead during normal execution. The proposed mechanism not only reduces risks
associated with current debugging approaches but also lays the foundation for
future enhancements. For instance, this framework could enable integration with
popular observability tools, providing real-time insights into interpreter
performance or memory usage. One compelling use case for this interface is
enabling pdb to attach to running Python processes, similar to how gdb allows
users to attach to a program by process ID (``gdb -p <pid>``). With this
feature, developers could inspect the state of a running application, evaluate
expressions, and step through code dynamically. This approach would align
Python's debugging capabilities with those of other major programming languages
and debugging tools that support this mode.

Specification
=============

This proposal introduces a safe debugging mechanism that allows external
processes to trigger code execution in a Python interpreter at well-defined safe
points. The key insight is that rather than injecting code directly via system
debuggers, we can leverage the interpreter's existing evaluation loop and thread
state to coordinate debugging operations.

The mechanism works by having debuggers write to specific memory locations in
the target process that the interpreter then checks during its normal execution
cycle. When the interpreter detects that a debugger wants to attach, it executes the
requested operations only when it's safe to do so - that is, when no internal
locks are held and all data structures are in a consistent state.

Runtime State Extensions
------------------------

A new structure is added to ``PyThreadState`` to support remote debugging:

.. code-block:: C

    typedef struct _remote_debugger_support {
        int debugger_pending_call;
        char debugger_script[MAX_SCRIPT_SIZE];
    } _PyRemoteDebuggerSupport;

This structure is appended to ``PyThreadState``, adding only a few fields that
are **never accessed during normal execution**. The ``debugger_pending_call`` field
indicates when a debugger has requested execution, while ``debugger_script``
provides Python code to be executed when the interpreter reaches a safe point.

Debug Offsets Table
-------------------

Python 3.12 introduced a debug offsets table placed at the start of the
``PyRuntime`` structure. This section contains the ``_Py_DebugOffsets`` structure that
allows external tools to reliably find critical runtime structures regardless of
`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or
how Python was compiled.

This proposal extends the existing debug offsets table with new fields for
debugger support:

.. code-block:: C

    struct _debugger_support {
        uint64_t eval_breaker;             // Location of the eval breaker flag
        uint64_t remote_debugger_support;  // Offset to our support structure
        uint64_t debugger_pending_call;    // Where to write the pending flag
        uint64_t debugger_script;          // Where to write the script
    } debugger_support;

These offsets allow debuggers to locate critical debugging control structures in
the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
structure, allowing the new structure and its fields to be found regardless of
where they are in memory.
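
As a rough illustration of how these offsets compose, the sketch below resolves
the absolute addresses a debugger would write to, given the address of one
``PyThreadState``. The class name, function name, and all numeric values are
hypothetical; real offsets must be read from the target process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebuggerSupportOffsets:
    """Hypothetical mirror of the ``debugger_support`` offsets shown above."""
    eval_breaker: int             # relative to PyThreadState
    remote_debugger_support: int  # relative to PyThreadState
    debugger_pending_call: int    # relative to _PyRemoteDebuggerSupport
    debugger_script: int          # relative to _PyRemoteDebuggerSupport


def resolve_addresses(tstate_addr: int, offsets: DebuggerSupportOffsets) -> dict:
    """Turn relative offsets into absolute addresses in the target process."""
    # The support structure lives inside the thread state.
    support_addr = tstate_addr + offsets.remote_debugger_support
    return {
        "eval_breaker": tstate_addr + offsets.eval_breaker,
        # The last two offsets are relative to the support structure itself.
        "debugger_pending_call": support_addr + offsets.debugger_pending_call,
        "debugger_script": support_addr + offsets.debugger_script,
    }
```

The two-level lookup means the layout of ``_PyRemoteDebuggerSupport`` can change
without a tool needing anything beyond the published offsets.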

Attachment Protocol
-------------------

When a debugger wants to attach to a Python process, it follows these steps:

1. Locate the ``PyRuntime`` structure in the process:

   - Find the Python binary (executable or libpython) in process memory (an OS-dependent procedure)
   - Extract the ``.PyRuntime`` section offset from the binary's format (ELF/Mach-O/PE)
   - Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address

2. Access debug offset information by reading the ``_Py_DebugOffsets`` at the start of the ``PyRuntime`` structure.

3. Use the offsets to locate the desired thread state.

4. Use the offsets to locate the debugger interface fields within that thread state.

5. Write control information:

   - Write the Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport``
   - Set the ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport``
   - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field

Once the interpreter reaches the next safe point, it will execute the script
provided by the debugger.
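
Step 5 can be sketched against a simulated memory buffer. This is only a model:
a ``bytearray`` stands in for the target's address space, and the value of
``MAX_SCRIPT_SIZE``, the bit position of the stop flag, the field widths, and
the little-endian layout are all assumptions made for illustration. A real tool
would perform the same writes through an OS-specific memory-write call.

```python
import struct

MAX_SCRIPT_SIZE = 512      # assumed value; the PEP does not fix it here
PLEASE_STOP_BIT = 1 << 5   # assumed bit position for _PY_EVAL_PLEASE_STOP_BIT


def request_remote_exec(memory: bytearray, eval_breaker_addr: int,
                        pending_addr: int, script_addr: int, code: str) -> None:
    """Perform the three writes of step 5 on a simulated address space."""
    payload = code.encode("utf-8") + b"\0"   # C string, NUL-terminated
    if len(payload) > MAX_SCRIPT_SIZE:
        raise ValueError("script does not fit in debugger_script")

    # 1. Write the script first, so it is complete before any flag is set.
    memory[script_addr:script_addr + len(payload)] = payload

    # 2. Set debugger_pending_call (assumed to be a 4-byte C int).
    struct.pack_into("<i", memory, pending_addr, 1)

    # 3. Set the stop bit in eval_breaker (assumed 64-bit), keeping other bits.
    (breaker,) = struct.unpack_from("<Q", memory, eval_breaker_addr)
    struct.pack_into("<Q", memory, eval_breaker_addr, breaker | PLEASE_STOP_BIT)
```

Writing the script before the flags mirrors the order of the list above: the
interpreter should never observe a pending request whose script is incomplete.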

Interpreter Integration
-----------------------

The interpreter's regular evaluation loop already includes a check of the
``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We
leverage this existing mechanism by checking for debugger pending calls only
when the ``eval_breaker`` is set, ensuring zero overhead during normal execution.
Profiling with Linux ``perf`` shows that this branch is highly predictable: the
``debugger_pending_call`` check is never taken during normal execution, allowing
modern CPUs to effectively speculate past it.

When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``,
the interpreter will execute the provided debugging code at the next safe point.
This all happens in a completely safe context, since the interpreter is
guaranteed to be in a consistent state whenever the eval breaker is checked.

.. code-block:: c

    // In ceval.c
    if (tstate->eval_breaker) {
        if (tstate->remote_debugger_support.debugger_pending_call) {
            tstate->remote_debugger_support.debugger_pending_call = 0;
            if (tstate->remote_debugger_support.debugger_script[0]) {
                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) {
                    PyErr_Clear();
                }
                // ...
            }
        }
    }
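
The control flow of this check can also be modelled in pure Python. The sketch
below is not interpreter code: ``ThreadStateModel`` and ``check_eval_breaker``
are hypothetical names, and ``exec`` stands in for ``PyRun_SimpleString``. It
only shows that the flag is cleared before the script runs and that script
errors are swallowed.

```python
from dataclasses import dataclass


@dataclass
class ThreadStateModel:
    """Hypothetical stand-in for the relevant PyThreadState fields."""
    eval_breaker: int = 0
    debugger_pending_call: int = 0
    debugger_script: str = ""


def check_eval_breaker(tstate: ThreadStateModel, namespace: dict) -> bool:
    """Return True if a debugger script was run at this safe point."""
    if not tstate.eval_breaker:
        return False              # fast path: nothing else is inspected
    if not tstate.debugger_pending_call:
        return False
    tstate.debugger_pending_call = 0   # clear first, so the script runs once
    if not tstate.debugger_script:
        return False
    try:
        exec(tstate.debugger_script, namespace)
    except Exception:
        pass                      # mirrors PyErr_Clear(): errors are discarded
    return True
```

For example, with ``eval_breaker`` and ``debugger_pending_call`` both set and a
script of ``"x = 41 + 1"``, one call runs the script and a second call does
nothing because the pending flag has already been cleared.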

Python API
----------

To support safe execution of Python code in a remote process without having to
re-implement all these steps in every tool, this proposal extends the ``sys`` module
with a new function. This function allows debuggers or external tools to execute
arbitrary Python code within the context of a specified Python process:

.. code-block:: python

    def remote_exec(pid: int, code: str) -> None:
        """
        Executes a block of Python code in a given remote Python process.

        Args:
            pid (int): The process ID of the target Python process.
            code (str): A string containing the Python code to be executed.
        """

An example usage of the API would look like:

.. code-block:: python

    import sys
    # Execute a print statement in a remote Python process with PID 12345
    try:
        sys.remote_exec(12345, "print('Hello from remote execution!')")
    except Exception as e:
        print(f"Failed to execute code: {e}")
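
A slightly more realistic use is asking a stuck process to dump its thread
stacks. The sketch below assumes the ``sys.remote_exec(pid, code)`` signature
proposed above is available; the helper names and the output path are
hypothetical. It uses the standard library's ``faulthandler.dump_traceback()``
so that the injected snippet stays short.

```python
import sys


def build_stack_dump_snippet(output_path: str) -> str:
    """Build the code string that the target process will execute."""
    return (
        "import faulthandler\n"
        f"with open({output_path!r}, 'w') as _f:\n"
        "    faulthandler.dump_traceback(file=_f, all_threads=True)\n"
    )


def dump_remote_stacks(pid: int, output_path: str) -> None:
    """Ask process ``pid`` to write all of its thread stacks to a file."""
    remote_exec = getattr(sys, "remote_exec", None)
    if remote_exec is None:
        # The proposed API is not present in this interpreter.
        raise RuntimeError("sys.remote_exec is not available")
    remote_exec(pid, build_stack_dump_snippet(output_path))
```

Because the snippet is ordinary Python, it can be tried locally with ``exec()``
before being sent to another process.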

Backwards Compatibility
=======================

This change has no impact on existing Python code or interpreter performance.
The added fields are only accessed during debugger attachment, and the checking
mechanism piggybacks on existing interpreter safe points.

Security Implications
=====================

This interface does not introduce new security concerns as it relies entirely on
existing operating system security mechanisms for process memory access. Although
the PEP doesn't specify how memory should be written to the target process, in practice
this will be done using standard system calls that are already being used by other
debuggers and tools. Some examples are:

* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls
  are used to read and write memory from another process. These operations are
  controlled by ptrace access mode checks - the same ones that govern debugger
  attachment. A process can only read from or write to another process's memory
  if it has the appropriate permissions (typically requiring either root or the
  ``CAP_SYS_PTRACE`` capability, though less security-minded distributions may
  allow any process running as the same uid to attach).

* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and
  ``mach_vm_write()`` through the Mach task system. These operations require
  ``task_for_pid()`` access, which is strictly controlled by the operating
  system. By default, access is limited to processes running as root or those
  with specific entitlements granted by Apple's security framework.

* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions
  provide similar functionality. Access is controlled through the Windows
  security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE``
  permissions, which typically require the same user context or appropriate
  privileges. These are the same permissions required by debuggers, ensuring
  consistent security semantics across platforms.

All mechanisms ensure that:

1. Only authorized processes can read/write memory
2. The same security model that governs traditional debugger attachment applies
3. No additional attack surface is exposed beyond what the OS already provides for debugging

The memory operations themselves are well-established and have been used safely
for decades in tools like GDB, LLDB, and various system profilers.

It’s important to note that any attempt to attach to a Python process via this
mechanism would be detectable by system-level monitoring tools. This
transparency provides an additional layer of accountability, allowing
administrators to audit debugging operations in sensitive environments.

Further, the strict reliance on OS-level security controls ensures that existing
system policies remain effective. For enterprise environments, this means
administrators can continue to enforce debugging restrictions using standard
tools and policies without requiring additional configuration. For instance,
leveraging Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict
debugger access will equally govern the proposed interface.
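
On Linux systems that use the Yama security module, a tool could inspect the
current ``ptrace_scope`` setting before attempting to attach. The sketch below
is an illustration only: the helper names are hypothetical, and it assumes the
setting is exposed at ``/proc/sys/kernel/yama/ptrace_scope``, which is not the
case on every system.

```python
from pathlib import Path

# Assumed location of the Yama setting; absent on non-Linux or non-Yama systems.
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

_DESCRIPTIONS = {
    0: "classic ptrace permissions (same-uid attach allowed)",
    1: "restricted (only descendants, unless privileged)",
    2: "admin-only (CAP_SYS_PTRACE required)",
    3: "no attach allowed",
}


def read_ptrace_scope(path=YAMA_PTRACE_SCOPE):
    """Return the ptrace_scope value as an int, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_ptrace_scope(value):
    """Map a ptrace_scope value to a short human-readable description."""
    if value is None:
        return "Yama ptrace_scope not available"
    return _DESCRIPTIONS.get(value, f"unknown value {value}")
```

A tool might print ``describe_ptrace_scope(read_ptrace_scope())`` when an
attach attempt fails, to help explain a permission error to the user.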

By maintaining compatibility with existing security frameworks, this design
ensures that adopting the new interface requires no changes to established
security practices, thereby minimizing barriers to adoption.

How to Teach This
=================

For tool authors, this interface becomes the standard way to implement debugger
attachment, replacing unsafe system debugger approaches. A section in the Python
Developer Guide could describe the internal workings of the mechanism, including
the ``debugger_support`` offsets and how to interact with them using system
APIs.

End users need not be aware of the interface, benefiting only from improved
debugging tool stability and reliability.

Reference Implementation
========================

https://github.com/pablogsal/cpython/commits/remote_pdb/

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.