PEP: 768
Title: Safe external debugger interface for CPython
Author: Pablo Galindo Salgado <pablogsal@python.org>, Matt Wozniski <godlygeek@gmail.com>, Ivona Stojanovic <stojanovic.i@hotmail.com>
Status: Draft
Type: Standards Track
Created: 25-Nov-2024
Python-Version: 3.14

Abstract
========

This PEP proposes adding a zero-overhead debugging interface to CPython that
allows debuggers and profilers to safely attach to running Python processes. The
interface provides safe execution points for attaching debugger code without
modifying the interpreter's normal execution path or adding runtime overhead.

A key application of this interface will be enabling pdb to attach to live
processes by process ID, similar to ``gdb -p``, allowing developers to inspect and
debug Python applications interactively in real time without stopping or
restarting them.

Motivation
==========

Debugging Python processes in production and live environments presents unique
challenges. Developers often need to analyze application behavior without
stopping or restarting services, which is especially crucial for
high-availability systems. Common scenarios include diagnosing deadlocks,
inspecting memory usage, or investigating unexpected behavior in real time.

Very few Python tools can attach to running processes, primarily because doing
so requires deep expertise in both operating system debugging interfaces and
CPython internals. While C/C++ debuggers like GDB and LLDB can attach to
processes using well-understood techniques, Python tools must implement all of
these low-level mechanisms and handle additional complexity on top of them. For
example, when GDB needs to execute code in a target process, it:

1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
2. Writes a small sequence of machine code - typically a function prologue, the
   desired instructions, and code to restore registers
3. Saves all the target thread's registers
4. Changes the instruction pointer to the injected code
5. Lets the process run until it hits a breakpoint at the end of the injected code
6. Restores the original registers and continues execution

Python tools face this same challenge of code injection, but with an additional
layer of complexity. Not only do they need to implement the above mechanism,
they must also understand and safely interact with CPython's runtime state,
including the interpreter loop, garbage collector, thread state, and reference
counting system. This combination of low-level system manipulation and
deep, domain-specific interpreter knowledge makes implementing Python debugging
tools exceptionally difficult.

The few tools that do attempt this (see for example `DebugPy
<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__
and `Memray
<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__)
resort to suboptimal and unsafe methods,
using system debuggers like GDB and LLDB to forcefully inject code. This
approach is fundamentally unsafe because the injected code can execute at any
point during the interpreter's execution cycle - even during critical operations
like memory allocation, garbage collection, or thread state management. When
this happens, the results are catastrophic: attempting to allocate memory while
already inside ``malloc()`` causes crashes, modifying objects during garbage
collection corrupts the interpreter's state, and touching thread state at the
wrong time leads to deadlocks.

Various tools attempt to minimize these risks through complex workarounds, such
as spawning separate threads for injected code, carefully timing their
operations, or trying to pick good points at which to stop the process. However,
these mitigations cannot fully solve the underlying problem: without cooperation
from the interpreter, there's no way to know if it's safe to execute code at any
given moment. Even carefully implemented tools can crash the interpreter because
they're fundamentally working against it rather than with it.

Rationale
=========

Rather than forcing tools to work around interpreter limitations with unsafe
code injection, we can extend CPython with a proper debugging interface that
guarantees safe execution. By adding a few thread state fields and integrating
with the interpreter's existing evaluation loop, we can ensure debugging
operations only occur at well-defined safe points. This eliminates the
possibility of crashes and corruption while maintaining zero overhead during
normal execution.

The key insight is that we don't need to inject code at arbitrary points - we
just need to signal to the interpreter that we want code executed at the next
safe opportunity. This approach works with the interpreter's natural execution
flow rather than fighting against it.

After we described this idea to the PyPy development team, the proposal was
`implemented in PyPy <https://github.com/pypy/pypy/pull/5135>`__,
proving both its feasibility and effectiveness. Their implementation
demonstrates that we can provide safe debugging capabilities with zero runtime
overhead during normal execution. The proposed mechanism not only reduces risks
associated with current debugging approaches but also lays the foundation for
future enhancements. For instance, this framework could enable integration with
popular observability tools, providing real-time insights into interpreter
performance or memory usage.

One compelling use case for this interface is
enabling pdb to attach to running Python processes, similar to how gdb allows
users to attach to a program by process ID (``gdb -p <pid>``). With this
feature, developers could inspect the state of a running application, evaluate
expressions, and step through code dynamically. This approach would align
Python's debugging capabilities with those of other major programming languages
and debugging tools that support this mode.

Specification
=============

This proposal introduces a safe debugging mechanism that allows external
processes to trigger code execution in a Python interpreter at well-defined safe
points. The key insight is that rather than injecting code directly via system
debuggers, we can leverage the interpreter's existing evaluation loop and thread
state to coordinate debugging operations.

The mechanism works by having debuggers write to specific memory locations in
the target process that the interpreter then checks during its normal execution
cycle. When the interpreter detects that a debugger wants to attach, it executes the
requested operations only when it's safe to do so - that is, when no internal
locks are held and all data structures are in a consistent state.

Runtime State Extensions
------------------------

A new structure is added to ``PyThreadState`` to support remote debugging:

.. code-block:: C

    typedef struct _remote_debugger_support {
        int debugger_pending_call;
        char debugger_script[MAX_SCRIPT_SIZE];
    } _PyRemoteDebuggerSupport;

This structure is appended to ``PyThreadState``, adding only a few fields that
are **never accessed during normal execution**. The ``debugger_pending_call`` field
indicates when a debugger has requested execution, while ``debugger_script``
provides Python code to be executed when the interpreter reaches a safe point.

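To make the layout concrete, the following is a minimal sketch of how a tool
written in Python might mirror this structure with ``ctypes``. It is
illustrative only: the class name ``RemoteDebuggerSupport``, the helper
``encode_script``, and the value chosen for ``MAX_SCRIPT_SIZE`` are assumptions
made for this example, since this PEP does not fix a concrete size.

```python
import ctypes

# Assumption: the PEP does not state MAX_SCRIPT_SIZE; 512 is a placeholder.
MAX_SCRIPT_SIZE = 512


class RemoteDebuggerSupport(ctypes.Structure):
    """Illustrative mirror of _PyRemoteDebuggerSupport (hypothetical name)."""

    _fields_ = [
        ("debugger_pending_call", ctypes.c_int),
        ("debugger_script", ctypes.c_char * MAX_SCRIPT_SIZE),
    ]


def encode_script(source: str) -> bytes:
    """Return NUL-terminated UTF-8 bytes that fit in debugger_script."""
    data = source.encode("utf-8") + b"\x00"
    if len(data) > MAX_SCRIPT_SIZE:
        raise ValueError("script does not fit in debugger_script")
    return data


# Build a local copy of the structure the way a debugger would fill it in.
support = RemoteDebuggerSupport()
support.debugger_script = encode_script("print('hi')")
support.debugger_pending_call = 1
```

Because the script lives in a fixed-size buffer, a tool has to check the length
of the code it wants to send before writing it, which is what the hypothetical
``encode_script`` helper does here.
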
Debug Offsets Table
-------------------

Python 3.12 introduced a debug offsets table placed at the start of the
``PyRuntime`` structure. This section contains the ``_Py_DebugOffsets`` structure that
allows external tools to reliably find critical runtime structures regardless of
`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or
how Python was compiled.

This proposal extends the existing debug offsets table with new fields for
debugger support:

.. code-block:: C

    struct _debugger_support {
        uint64_t eval_breaker;             // Location of the eval breaker flag
        uint64_t remote_debugger_support;  // Offset to our support structure
        uint64_t debugger_pending_call;    // Where to write the pending flag
        uint64_t debugger_script;          // Where to write the script
    } debugger_support;

These offsets allow debuggers to locate critical debugging control structures in
the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
structure, allowing the new structure and its fields to be found regardless of
where they are in memory.

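The two levels of relative offsets described above can be sketched as simple
address arithmetic. The following is an illustration under stated assumptions,
not part of the proposal: the names ``DebuggerSupportOffsets`` and
``resolve_addresses`` are hypothetical, and the numeric offsets and thread
state address are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebuggerSupportOffsets:
    """Values a tool would read from the debugger_support table."""

    eval_breaker: int             # relative to PyThreadState
    remote_debugger_support: int  # relative to PyThreadState
    debugger_pending_call: int    # relative to _PyRemoteDebuggerSupport
    debugger_script: int          # relative to _PyRemoteDebuggerSupport


def resolve_addresses(tstate_addr: int, offs: DebuggerSupportOffsets) -> dict:
    """Turn the relative offsets into absolute addresses in the target."""
    support_addr = tstate_addr + offs.remote_debugger_support
    return {
        "eval_breaker": tstate_addr + offs.eval_breaker,
        "debugger_pending_call": support_addr + offs.debugger_pending_call,
        "debugger_script": support_addr + offs.debugger_script,
    }


# Made-up example values; real ones come from the target process.
offsets = DebuggerSupportOffsets(
    eval_breaker=0x18,
    remote_debugger_support=0x200,
    debugger_pending_call=0x0,
    debugger_script=0x4,
)
addresses = resolve_addresses(0x7F0000001000, offsets)
```

Keeping the field offsets relative to the support structure rather than to the
thread state means the structure can move inside ``PyThreadState`` between
builds without tools having to change this calculation.
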
Attachment Protocol
-------------------

When a debugger wants to attach to a Python process, it follows these steps:

1. Locate the ``PyRuntime`` structure in the process:

   - Find the Python binary (executable or libpython) in process memory (an OS-dependent process)
   - Extract the ``.PyRuntime`` section offset from the binary's format (ELF/Mach-O/PE)
   - Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address

2. Access debug offset information by reading the ``_Py_DebugOffsets`` at the start of the ``PyRuntime`` structure.

3. Use the offsets to locate the desired thread state.

4. Use the offsets to locate the debugger interface fields within that thread state.

5. Write control information:

   - Write the Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport``
   - Set the ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport``
   - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field

Once the interpreter reaches the next safe point, it will execute the script
provided by the debugger.

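The three writes in step 5 can be sketched against a simulated address space. This
is a minimal model, not a real attach routine: a ``bytearray`` stands in for the
target's memory, the function name ``request_remote_exec`` is hypothetical, the
addresses are made up, and both the bit position used for
``_PY_EVAL_PLEASE_STOP_BIT`` and the 8-byte little-endian width of
``eval_breaker`` are assumptions rather than values stated in this PEP.

```python
import struct

# Assumption: bit position chosen for illustration only.
PLEASE_STOP_BIT = 1 << 5


def request_remote_exec(memory, eval_breaker_addr, pending_addr,
                        script_addr, max_script_size, source):
    """Perform the step 5 writes, in the order listed above."""
    payload = source.encode("utf-8") + b"\x00"
    if len(payload) > max_script_size:
        raise ValueError("script does not fit in debugger_script")
    # 1. Write the script into debugger_script.
    memory[script_addr:script_addr + len(payload)] = payload
    # 2. Set debugger_pending_call (a C int).
    struct.pack_into("<i", memory, pending_addr, 1)
    # 3. Set the stop bit, preserving any bits that are already set.
    (breaker,) = struct.unpack_from("<Q", memory, eval_breaker_addr)
    struct.pack_into("<Q", memory, eval_breaker_addr,
                     breaker | PLEASE_STOP_BIT)


# Simulated target memory with one unrelated eval_breaker bit already set.
memory = bytearray(2048)
struct.pack_into("<Q", memory, 0x18, 0b1)
request_remote_exec(memory, eval_breaker_addr=0x18, pending_addr=0x200,
                    script_addr=0x204, max_script_size=512,
                    source="print('attached')")
```

A real tool would replace the ``bytearray`` accesses with the platform's
cross-process memory calls. Writing the script before the flags keeps the
interpreter from observing a pending call whose script is only partially
written.
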
Interpreter Integration
-----------------------

The interpreter's regular evaluation loop already includes a check of the
``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We
leverage this existing mechanism by checking for debugger pending calls only
when the ``eval_breaker`` is set, ensuring zero overhead during normal execution.
Profiling with Linux ``perf`` shows that this branch is highly predictable: the
``debugger_pending_call`` branch is never taken during normal execution,
allowing modern CPUs to effectively speculate past it.

When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``,
the interpreter executes the provided debugging code at the next safe point.
This all happens in a completely safe context, since
the interpreter is guaranteed to be in a consistent state whenever the eval breaker
is checked.

.. code-block:: c

    // In ceval.c
    if (tstate->eval_breaker) {
        if (tstate->remote_debugger_support.debugger_pending_call) {
            // Clear the flag first so the script runs only once per request
            tstate->remote_debugger_support.debugger_pending_call = 0;
            if (tstate->remote_debugger_support.debugger_script[0]) {
                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) {
                    PyErr_Clear();
                }
                // ...
            }
        }
    }

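The control flow of this check can also be modelled in pure Python, which may
help when reasoning about the ordering of the flag tests. This is a sketch of
the logic only, not of the interpreter: ``ThreadStateModel`` and
``handle_eval_breaker`` are hypothetical names, and ``exec`` with a
caller-supplied namespace stands in for ``PyRun_SimpleString``.

```python
from dataclasses import dataclass


@dataclass
class ThreadStateModel:
    """Hypothetical stand-in for the fields the check reads."""

    eval_breaker: int = 0
    debugger_pending_call: int = 0
    debugger_script: str = ""


def handle_eval_breaker(tstate, namespace) -> bool:
    """Mirror the nested checks; return True if a script was run."""
    if not tstate.eval_breaker:
        return False  # normal execution: debugger fields are never read
    if not tstate.debugger_pending_call:
        return False  # eval breaker was set for some other reason
    # Clear the flag first so the script runs once per request.
    tstate.debugger_pending_call = 0
    if not tstate.debugger_script:
        return False
    try:
        exec(tstate.debugger_script, namespace)
    except Exception:
        pass  # loosely mirrors PyErr_Clear(): errors do not propagate
    return True


idle = ThreadStateModel(debugger_pending_call=1, debugger_script="x = 1")
idle_ns = {}
idle_ran = handle_eval_breaker(idle, idle_ns)

requested = ThreadStateModel(eval_breaker=1, debugger_pending_call=1,
                             debugger_script="result = 6 * 7")
requested_ns = {}
requested_ran = handle_eval_breaker(requested, requested_ns)
```

In the first case the eval breaker is clear, so the pending flag is never even
inspected; in the second, the script runs once and the pending flag is reset.
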
Python API
----------

To support safe execution of Python code in a remote process without having to
re-implement all these steps in every tool, this proposal extends the ``sys`` module
with a new function. This function allows debuggers or external tools to execute
arbitrary Python code within the context of a specified Python process:

.. code-block:: python

    def remote_exec(pid: int, code: str) -> None:
        """
        Executes a block of Python code in a given remote Python process.

        Args:
            pid (int): The process ID of the target Python process.
            code (str): A string containing the Python code to be executed.
        """

An example usage of the API would look like:

.. code-block:: python

    import sys
    # Execute a print statement in a remote Python process with PID 12345
    try:
        sys.remote_exec(12345, "print('Hello from remote execution!')")
    except Exception as e:
        print(f"Failed to execute code: {e}")

Backwards Compatibility
=======================

This change has no impact on existing Python code or interpreter performance.
The added fields are only accessed during debugger attachment, and the checking
mechanism piggybacks on existing interpreter safe points.

Security Implications
=====================

This interface does not introduce new security concerns as it relies entirely on
existing operating system security mechanisms for process memory access. Although
the PEP doesn't specify how memory should be written to the target process, in practice
this will be done using standard system calls that are already being used by other
debuggers and tools. Some examples are:

* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls
  are used to read and write memory from another process. These operations are
  controlled by ptrace access mode checks - the same ones that govern debugger
  attachment. A process can only read from or write to another process's memory
  if it has the appropriate permissions (typically requiring either root or the
  ``CAP_SYS_PTRACE`` capability, though less security-minded distributions may
  allow any process running as the same uid to attach).

* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and
  ``mach_vm_write()`` through the Mach task system. These operations require
  ``task_for_pid()`` access, which is strictly controlled by the operating
  system. By default, access is limited to processes running as root or those
  with specific entitlements granted by Apple's security framework.

* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions
  provide similar functionality. Access is controlled through the Windows
  security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE``
  permissions, which typically require the same user context or appropriate
  privileges. These are the same permissions required by debuggers, ensuring
  consistent security semantics across platforms.

All mechanisms ensure that:

1. Only authorized processes can read/write memory
2. The same security model that governs traditional debugger attachment applies
3. No additional attack surface is exposed beyond what the OS already provides for debugging

The memory operations themselves are well-established and have been used safely
for decades in tools like GDB, LLDB, and various system profilers.

It’s important to note that any attempt to attach to a Python process via this
mechanism would be detectable by system-level monitoring tools. This
transparency provides an additional layer of accountability, allowing
administrators to audit debugging operations in sensitive environments.

Further, the strict reliance on OS-level security controls ensures that existing
system policies remain effective. For enterprise environments, this means
administrators can continue to enforce debugging restrictions using standard
tools and policies without requiring additional configuration. For instance,
using Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict
debugger access will equally govern the proposed interface.

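As an illustration of how a tool might surface the Linux policy to its users
before attempting to attach, the sketch below reads the Yama ``ptrace_scope``
setting and maps it to a short description. The helper names are hypothetical
and not part of this proposal; the file path and the meaning of the values 0
through 3 follow the Linux kernel's Yama documentation, and systems without
Yama simply have no such file.

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings as described in the Linux kernel's Yama documentation.
PTRACE_SCOPE_DESCRIPTIONS = {
    0: "classic ptrace permissions: same-uid processes may attach",
    1: "restricted ptrace: only descendants or declared tracers may attach",
    2: "admin-only attach: CAP_SYS_PTRACE is required",
    3: "no attach: ptrace attachment is disabled",
}


def read_ptrace_scope(path: Path = PTRACE_SCOPE_PATH):
    """Return the current ptrace_scope value, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_ptrace_scope(value) -> str:
    """Map a ptrace_scope value to a human-readable description."""
    if value is None:
        return "ptrace_scope not available on this system"
    return PTRACE_SCOPE_DESCRIPTIONS.get(value, "unknown ptrace_scope value")
```

A tool could call ``describe_ptrace_scope(read_ptrace_scope())`` when an attach
attempt fails with a permission error, to explain which system policy is in
effect rather than reporting a bare failure.
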

By maintaining compatibility with existing security frameworks, this design
ensures that adopting the new interface requires no changes to established
security practices, thereby minimizing barriers to adoption.

How to Teach This
=================

For tool authors, this interface becomes the standard way to implement debugger
attachment, replacing unsafe system debugger approaches. A section in the Python
Developer Guide could describe the internal workings of the mechanism, including
the ``debugger_support`` offsets and how to interact with them using system
APIs.

End users need not be aware of the interface, benefiting only from improved
debugging tool stability and reliability.

Reference Implementation
========================

https://github.com/pablogsal/cpython/commits/remote_pdb/

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.