PEP: 768
Title: Safe external debugger interface for CPython
Author: Pablo Galindo Salgado <pablogsal@python.org>, Matt Wozniski <godlygeek@gmail.com>, Ivona Stojanovic <stojanovic.i@hotmail.com>
Status: Draft
Type: Standards Track
Created: 25-Nov-2024
Python-Version: 3.14

Abstract
========

This PEP proposes adding a zero-overhead debugging interface to CPython that
allows debuggers and profilers to safely attach to running Python processes. The
interface provides safe execution points for attaching debugger code without
modifying the interpreter's normal execution path or adding runtime overhead.

A key application of this interface will be enabling pdb to attach to live
processes by process ID, similar to ``gdb -p``, allowing developers to inspect and
debug Python applications interactively in real-time without stopping or
restarting them.

Motivation
==========


Debugging Python processes in production and live environments presents unique
challenges. Developers often need to analyze application behavior without
stopping or restarting services, which is especially crucial for
high-availability systems. Common scenarios include diagnosing deadlocks,
inspecting memory usage, or investigating unexpected behavior in real-time.

Very few Python tools can attach to running processes, primarily because doing
so requires deep expertise in both operating system debugging interfaces and
CPython internals. While C/C++ debuggers like GDB and LLDB can attach to
processes using well-understood techniques, Python tools must implement all of
these low-level mechanisms plus handle additional complexity. For example, when
GDB needs to execute code in a target process, it:

1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
2. Writes a small sequence of machine code - typically a function prologue, the
   desired instructions, and code to restore registers
3. Saves all the target thread's registers
4. Changes the instruction pointer to the injected code
5. Lets the process run until it hits a breakpoint at the end of the injected code
6. Restores the original registers and continues execution

Python tools face this same challenge of code injection, but with an additional
layer of complexity. Not only do they need to implement the above mechanism,
they must also understand and safely interact with CPython's runtime state,
including the interpreter loop, garbage collector, thread state, and reference
counting system. This combination of low-level system manipulation and
deep, domain-specific interpreter knowledge makes implementing Python debugging tools
exceptionally difficult.

The few tools (see for example `DebugPy
<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__
and `Memray
<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__)
that do attempt this resort to suboptimal and unsafe methods,
using system debuggers like GDB and LLDB to forcefully inject code. This
approach is fundamentally unsafe because the injected code can execute at any
point during the interpreter's execution cycle - even during critical operations
like memory allocation, garbage collection, or thread state management. When
this happens, the results are catastrophic: attempting to allocate memory while
already inside ``malloc()`` causes crashes, modifying objects during garbage
collection corrupts the interpreter's state, and touching thread state at the
wrong time leads to deadlocks.

Various tools attempt to minimize these risks through complex workarounds, such
as spawning separate threads for injected code, carefully timing their
operations, or trying to pick good points at which to stop the process. However,
these mitigations cannot fully solve the underlying problem: without cooperation
from the interpreter, there's no way to know if it's safe to execute code at any
given moment. Even carefully implemented tools can crash the interpreter because
they're fundamentally working against it rather than with it.


Rationale
=========


Rather than forcing tools to work around interpreter limitations with unsafe
code injection, we can extend CPython with a proper debugging interface that
guarantees safe execution. By adding a few thread state fields and integrating
with the interpreter's existing evaluation loop, we can ensure debugging
operations only occur at well-defined safe points. This eliminates the
possibility of crashes and corruption while maintaining zero overhead during
normal execution.

The key insight is that we don't need to inject code at arbitrary points - we
just need to signal to the interpreter that we want code executed at the next
safe opportunity. This approach works with the interpreter's natural execution
flow rather than fighting against it.

After we described this idea to the PyPy development team, the proposal was
`implemented in PyPy <https://github.com/pypy/pypy/pull/5135>`__, proving both
its feasibility and effectiveness. Their implementation demonstrates that we can
provide safe debugging capabilities with zero runtime overhead during normal
execution. The proposed mechanism not only reduces the risks associated with
current debugging approaches but also lays the foundation for future
enhancements. For instance, this framework could enable integration with popular
observability tools, providing real-time insights into interpreter performance
or memory usage.

One compelling use case for this interface is enabling pdb to attach to running
Python processes, similar to how gdb allows users to attach to a program by
process ID (``gdb -p <pid>``). With this feature, developers could inspect the
state of a running application, evaluate expressions, and step through code
dynamically. This approach would align Python's debugging capabilities with those
of other major programming languages and debugging tools that support this mode.

Specification
=============


This proposal introduces a safe debugging mechanism that allows external
processes to trigger code execution in a Python interpreter at well-defined safe
points. Rather than injecting code directly via system debuggers, the mechanism
leverages the interpreter's existing evaluation loop and thread state to
coordinate debugging operations.

The mechanism works by having debuggers write to specific memory locations in
the target process that the interpreter then checks during its normal execution
cycle. When the interpreter detects that a debugger wants to attach, it executes the
requested operations only when it's safe to do so - that is, when no internal
locks are held and all data structures are in a consistent state.


Runtime State Extensions
------------------------

A new structure is added to ``PyThreadState`` to support remote debugging:

.. code-block:: C

    typedef struct _remote_debugger_support {
        int debugger_pending_call;
        char debugger_script[MAX_SCRIPT_SIZE];
    } _PyRemoteDebuggerSupport;


This structure is appended to ``PyThreadState``, adding only a few fields that
are **never accessed during normal execution**. The ``debugger_pending_call`` field
indicates when a debugger has requested execution, while ``debugger_script``
provides Python code to be executed when the interpreter reaches a safe point.
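As an illustration of the layout, a tool written in Python could mirror this
structure with ``ctypes`` and let it compute where each field lands. This is a
sketch only: the value of ``MAX_SCRIPT_SIZE`` below is a hypothetical
placeholder (the value is not fixed here), and ``RemoteDebuggerSupport`` and
``field_offsets`` are illustrative names. Real tools should not hardcode the
layout; they should read the offsets from the debug offsets table described
next.

.. code-block:: python

    import ctypes

    # Hypothetical value: MAX_SCRIPT_SIZE is not fixed by this proposal.
    MAX_SCRIPT_SIZE = 4096


    class RemoteDebuggerSupport(ctypes.Structure):
        """Illustrative ctypes mirror of the C _PyRemoteDebuggerSupport struct."""

        _fields_ = [
            ("debugger_pending_call", ctypes.c_int),
            ("debugger_script", ctypes.c_char * MAX_SCRIPT_SIZE),
        ]


    def field_offsets(struct_type) -> dict:
        """Map each field name to its byte offset, as laid out by the C ABI."""
        return {name: getattr(struct_type, name).offset
                for name, _ in struct_type._fields_}

On common platforms, where ``int`` is 4 bytes, ``debugger_pending_call`` sits at
offset 0 and ``debugger_script`` immediately follows it at offset 4.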


Debug Offsets Table
-------------------


Python 3.12 introduced a debug offsets table placed at the start of the
``PyRuntime`` structure. This table is a ``_Py_DebugOffsets`` structure that
allows external tools to reliably find critical runtime structures regardless of
`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or
how Python was compiled.
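Finding ``PyRuntime`` in the first place means undoing ASLR: the section's
address recorded in the binary must be shifted by wherever the binary was
loaded. On Linux, a minimal sketch of that relocation reads the target's
``/proc/<pid>/maps``. It assumes the section's virtual address
(``section_vaddr``) has already been read from the ELF section headers, and that
the binary is position-independent with its first loadable segment at virtual
address 0 (typical for shared libraries and PIE executables); other layouts need
that segment's address subtracted. ``find_load_base``, ``relocate`` and
``read_maps`` are hypothetical helper names.

.. code-block:: python

    import os
    import re
    from typing import Optional

    # One line of /proc/<pid>/maps: "start-end perms offset dev inode [path]".
    _MAPS_LINE = re.compile(
        r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+\S+\s+"
        r"(?P<offset>[0-9a-f]+)\s+\S+\s+\d+\s*(?P<path>.*)$"
    )


    def find_load_base(maps_text: str, binary_name: str) -> Optional[int]:
        """Start address of the mapping of `binary_name` that begins at file offset 0."""
        for line in maps_text.splitlines():
            match = _MAPS_LINE.match(line)
            if match is None:
                continue
            path = match.group("path").strip()
            if os.path.basename(path) == binary_name and int(match.group("offset"), 16) == 0:
                return int(match.group("start"), 16)
        return None


    def relocate(load_base: int, section_vaddr: int) -> int:
        """Runtime address of a section, assuming the first PT_LOAD segment has vaddr 0."""
        return load_base + section_vaddr


    def read_maps(pid: int) -> str:
        """Read the memory map of a live process (Linux only)."""
        with open(f"/proc/{pid}/maps") as maps_file:
            return maps_file.read()

Other platforms obtain the load address differently (for example, through the
Mach task APIs on macOS or module enumeration on Windows).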

This proposal extends the existing debug offsets table with new fields for
debugger support:

.. code-block:: C

    struct _debugger_support {
        uint64_t eval_breaker;            // Location of the eval breaker flag
        uint64_t remote_debugger_support; // Offset to our support structure
        uint64_t debugger_pending_call;   // Where to write the pending flag
        uint64_t debugger_script;         // Where to write the script
    } debugger_support;

These offsets allow debuggers to locate critical debugging control structures in
the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
structure, allowing the new structure and its fields to be found regardless of
where they are in memory.
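Because the offsets are relative, a debugger resolves absolute addresses in two
hops: from the thread state to the embedded support structure, then from that
structure to its fields. A minimal sketch, where ``DebuggerSupportOffsets``,
``FieldAddresses`` and ``resolve_fields`` are illustrative names and the offset
values would come from the target's ``_Py_DebugOffsets``:

.. code-block:: python

    from typing import NamedTuple


    class DebuggerSupportOffsets(NamedTuple):
        """Illustrative mirror of the debugger_support fields of _Py_DebugOffsets."""

        eval_breaker: int             # relative to a PyThreadState
        remote_debugger_support: int  # relative to a PyThreadState
        debugger_pending_call: int    # relative to _PyRemoteDebuggerSupport
        debugger_script: int          # relative to _PyRemoteDebuggerSupport


    class FieldAddresses(NamedTuple):
        eval_breaker: int
        debugger_pending_call: int
        debugger_script: int


    def resolve_fields(tstate_addr: int, offsets: DebuggerSupportOffsets) -> FieldAddresses:
        """Turn relative offsets into absolute addresses in the target process."""
        # The support structure is embedded in PyThreadState, not pointed to.
        support_addr = tstate_addr + offsets.remote_debugger_support
        return FieldAddresses(
            eval_breaker=tstate_addr + offsets.eval_breaker,
            debugger_pending_call=support_addr + offsets.debugger_pending_call,
            debugger_script=support_addr + offsets.debugger_script,
        )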

Attachment Protocol
-------------------

When a debugger wants to attach to a Python process, it follows these steps:

1. Locate the ``PyRuntime`` structure in the process:

   - Find the Python binary (executable or libpython) in process memory (the method is OS-dependent)
   - Extract the ``.PyRuntime`` section offset from the binary's format (ELF/Mach-O/PE)
   - Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address

2. Access debug offset information by reading the ``_Py_DebugOffsets`` at the start of the ``PyRuntime`` structure.

3. Use the offsets to locate the desired thread state.

4. Use the offsets to locate the debugger interface fields within that thread state.

5. Write control information:

   - Write the Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport``
   - Set the ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport``
   - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field
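Step 5 can be sketched against an in-memory stand-in for the target's address
space. ``FakeProcessMemory`` is hypothetical and plays the role of
``process_vm_readv()``/``process_vm_writev()`` or their equivalents; the bit
index of ``_PY_EVAL_PLEASE_STOP_BIT``, the script buffer size, and the 64-bit
width of ``eval_breaker`` are assumptions, since a real tool must take them from
the target build.

.. code-block:: python

    import struct

    # Assumed values, for illustration only.
    PY_EVAL_PLEASE_STOP_BIT = 5  # hypothetical bit index of _PY_EVAL_PLEASE_STOP_BIT
    MAX_SCRIPT_SIZE = 4096       # hypothetical capacity of debugger_script


    class FakeProcessMemory:
        """In-memory stand-in for reading/writing another process's address space."""

        def __init__(self, size: int) -> None:
            self._mem = bytearray(size)

        def read(self, addr: int, size: int) -> bytes:
            return bytes(self._mem[addr:addr + size])

        def write(self, addr: int, data: bytes) -> None:
            self._mem[addr:addr + len(data)] = data


    def request_remote_exec(mem, eval_breaker_addr: int, pending_addr: int,
                            script_addr: int, code: str) -> None:
        """Perform step 5 of the attachment protocol, in the order listed above."""
        payload = code.encode("utf-8") + b"\0"  # NUL-terminated C string
        if len(payload) > MAX_SCRIPT_SIZE:
            raise ValueError("code does not fit in debugger_script")
        # Write the script first, so the flags are only set once it is complete.
        mem.write(script_addr, payload)
        # debugger_pending_call is a C int (assumed 4 bytes, native byte order).
        mem.write(pending_addr, struct.pack("=i", 1))
        # Set the stop bit while preserving other bits in eval_breaker (assumed 64-bit).
        # This read-modify-write is not atomic with respect to the running interpreter.
        (breaker,) = struct.unpack("=Q", mem.read(eval_breaker_addr, 8))
        mem.write(eval_breaker_addr,
                  struct.pack("=Q", breaker | (1 << PY_EVAL_PLEASE_STOP_BIT)))

Rejecting oversized code up front matters because ``debugger_script`` is a
fixed-size buffer: silently truncating it would hand the interpreter a different
program from the one the user asked to run.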

Once the interpreter reaches the next safe point, it will execute the script
provided by the debugger.

Interpreter Integration
-----------------------

The interpreter's regular evaluation loop already includes a check of the
``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We
leverage this existing mechanism by checking for debugger pending calls only
when the ``eval_breaker`` is set, ensuring zero overhead during normal execution.
Profiling with Linux ``perf`` confirms this: the added branch is highly
predictable, since the ``debugger_pending_call`` check is never taken during
normal execution, allowing modern CPUs to effectively speculate past it.


When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``,
the interpreter will execute the provided debugging code at the next safe point.
This all happens in a completely safe context, since the interpreter is
guaranteed to be in a consistent state whenever the eval breaker is checked.

.. code-block:: c

    // In ceval.c
    if (tstate->eval_breaker) {
        if (tstate->remote_debugger_support.debugger_pending_call) {
            tstate->remote_debugger_support.debugger_pending_call = 0;
            if (tstate->remote_debugger_support.debugger_script[0]) {
                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) {
                    PyErr_Clear();
                }
                // ...
            }
        }
    }
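The control flow of that C fragment can be modelled in Python to make its
guarantees explicit: the debugger fields are only consulted once the eval
breaker is set, the pending flag is cleared before the script runs so each
request executes once, the script is read as a NUL-terminated C string, and an
exception raised by the script is reported but never propagates into the
interrupted code. ``ToyThreadState`` and ``handle_eval_breaker`` are
hypothetical names, and this toy is not CPython's real data layout.

.. code-block:: python

    import traceback
    from dataclasses import dataclass


    @dataclass
    class ToyThreadState:
        """Toy stand-in for the relevant fields; not CPython's real layout."""

        eval_breaker: int = 0
        debugger_pending_call: int = 0
        debugger_script: bytes = b""


    def handle_eval_breaker(tstate: ToyThreadState, namespace: dict) -> bool:
        """Model of the C check; returns True if a debugger script ran."""
        if not tstate.eval_breaker:
            return False  # common case: the debugger fields are never touched
        if not tstate.debugger_pending_call:
            return False  # eval breaker set for another reason (signals, GC, ...)
        tstate.debugger_pending_call = 0  # clear first so each request runs once
        script = tstate.debugger_script.split(b"\0", 1)[0]  # C string: stop at NUL
        if not script:
            return False
        try:
            exec(script.decode("utf-8"), namespace)
        except Exception:
            # PyRun_SimpleString() prints the traceback; the error never propagates.
            traceback.print_exc()
        return True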


Python API
----------

To support safe execution of Python code in a remote process without having to
re-implement all these steps in every tool, this proposal extends the ``sys`` module
with a new function. This function allows debuggers or external tools to execute
arbitrary Python code within the context of a specified Python process:

.. code-block:: python

    def remote_exec(pid: int, code: str) -> None:
        """
        Executes a block of Python code in a given remote Python process.

        Args:
            pid (int): The process ID of the target Python process.
            code (str): A string containing the Python code to be executed.
        """

An example usage of the API would look like:

.. code-block:: python

    import sys
    # Execute a print statement in a remote Python process with PID 12345
    try:
        sys.remote_exec(12345, "print('Hello from remote execution!')")
    except Exception as e:
        print(f"Failed to execute code: {e}")
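Because the code runs inside the target, output such as the ``print()`` above
appears on the target's stdout, not the debugger's. A tool that wants results
back therefore has to route them somewhere it can read, such as a file. As a
sketch, the hypothetical helper below builds a script that dumps every thread's
stack into a file, using the CPython-specific ``sys._current_frames()``; the
output path is an assumption chosen by the caller.

.. code-block:: python

    import textwrap


    def make_stack_dump_script(output_path: str) -> str:
        """Build code that, run inside the target, writes every thread's stack to a file."""
        return textwrap.dedent(f"""
            import sys, traceback
            with open({output_path!r}, "w") as _f:
                for _tid, _frame in sys._current_frames().items():
                    _f.write(f"Thread {{_tid}}:\\n")
                    _f.write("".join(traceback.format_stack(_frame)))
        """)

A debugger would pass the result as the ``code`` argument of
``sys.remote_exec()``. Since the script only runs once the target reaches its
next safe point, the file may not exist immediately after the call returns.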


Backwards Compatibility
=======================

This change has no impact on existing Python code or interpreter performance.
The added fields are only accessed during debugger attachment, and the checking
mechanism piggybacks on existing interpreter safe points.


Security Implications
=====================

This interface does not introduce new security concerns as it relies entirely on
existing operating system security mechanisms for process memory access. Although
the PEP doesn't specify how memory should be written to the target process, in practice
this will be done using standard system calls that are already being used by other
debuggers and tools. Some examples are:

* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls
  are used to read and write memory from another process. These operations are
  controlled by ptrace access mode checks - the same ones that govern debugger
  attachment. A process can only read from or write to another process's memory
  if it has the appropriate permissions (typically requiring either root or the
  ``CAP_SYS_PTRACE`` capability, though less security-minded distributions may
  allow any process running as the same uid to attach).

* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and
  ``mach_vm_write()`` through the Mach task system. These operations require
  ``task_for_pid()`` access, which is strictly controlled by the operating
  system. By default, access is limited to processes running as root or those
  with specific entitlements granted by Apple's security framework.

* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions
  provide similar functionality. Access is controlled through the Windows
  security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE``
  permissions, which typically require the same user context or appropriate
  privileges. These are the same permissions required by debuggers, ensuring
  consistent security semantics across platforms.

All mechanisms ensure that:

1. Only authorized processes can read/write memory
2. The same security model that governs traditional debugger attachment applies
3. No additional attack surface is exposed beyond what the OS already provides for debugging

The memory operations themselves are well-established and have been used safely
for decades in tools like GDB, LLDB, and various system profilers.

It’s important to note that any attempt to attach to a Python process via this
mechanism would be detectable by system-level monitoring tools. This
transparency provides an additional layer of accountability, allowing
administrators to audit debugging operations in sensitive environments.

Further, the strict reliance on OS-level security controls ensures that existing
system policies remain effective. For enterprise environments, this means
administrators can continue to enforce debugging restrictions using standard
tools and policies without requiring additional configuration. For instance,
leveraging Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict
debugger access will equally govern the proposed interface.
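On Linux systems with the Yama security module, for example, an administrator
or a tool can check which policy is in force before attempting to attach. The
sketch below reads the ``kernel.yama.ptrace_scope`` sysctl and maps its
documented values to their meaning; ``read_ptrace_scope`` and
``describe_ptrace_scope`` are hypothetical helper names.

.. code-block:: python

    from pathlib import Path
    from typing import Optional

    # Meanings of the Yama LSM's kernel.yama.ptrace_scope values.
    PTRACE_SCOPE_MEANINGS = {
        0: "classic: processes running as the same uid may attach",
        1: "restricted: by default only ancestors of the target (or CAP_SYS_PTRACE holders) may attach",
        2: "admin-only: attaching requires CAP_SYS_PTRACE",
        3: "no attach: ptrace attachment is disabled entirely",
    }


    def read_ptrace_scope(path: str = "/proc/sys/kernel/yama/ptrace_scope") -> Optional[int]:
        """Return the current ptrace_scope, or None if Yama is unavailable."""
        try:
            return int(Path(path).read_text().strip())
        except (OSError, ValueError):
            return None  # not Linux, Yama not enabled, or unreadable value


    def describe_ptrace_scope(value: Optional[int]) -> str:
        if value is None:
            return "unknown: Yama is not available on this system"
        return PTRACE_SCOPE_MEANINGS.get(value, f"unrecognized value {value}")

Because ``process_vm_writev()`` is subject to the same ptrace access checks, a
value of 3 would block the proposed interface just as it blocks GDB.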

By maintaining compatibility with existing security frameworks, this design
ensures that adopting the new interface requires no changes to established
security practices, thereby minimizing barriers to adoption.

How to Teach This
=================

For tool authors, this interface becomes the standard way to implement debugger
attachment, replacing unsafe system debugger approaches. A section in the Python
Developer Guide could describe the internal workings of the mechanism, including
the ``debugger_support`` offsets and how to interact with them using system
APIs.

End users need not be aware of the interface, benefiting only from improved
debugging tool stability and reliability.

Reference Implementation
========================

https://github.com/pablogsal/cpython/commits/remote_pdb/


Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.