python-peps/peps/pep-0669.rst

PEP: 669
Title: Low Impact Monitoring for CPython
Author: Mark Shannon <mark@hotpy.org>
Discussions-To: https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/
Status: Final
Type: Standards Track
Created: 18-Aug-2021
Python-Version: 3.12
Post-History: `07-Dec-2021 <https://mail.python.org/archives/list/python-dev@python.org/thread/VNSD4TSAM2BM64FJNIQPAOPNEGNX4MDX/>`__,
              `10-Jan-2022 <https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018>`__,
Resolution: https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/42

.. canonical-doc:: :mod:`python:sys.monitoring`

Abstract
========

Using a profiler or debugger in CPython can have a severe impact on
performance. Slowdowns by an order of magnitude are common.

This PEP proposes an API for monitoring Python programs running
on CPython that will enable monitoring at low cost.

Although this PEP does not specify an implementation, it is expected that
it will be implemented using the quickening step of
:pep:`659`.

A ``sys.monitoring`` namespace will be added, which will contain
the relevant functions and constants.


Motivation
==========

Developers should not have to pay an unreasonable cost to use debuggers,
profilers and other similar tools.

C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger.
Python developers should expect that too.

Rationale
=========

The quickening mechanism provided by :pep:`659` provides a way to dynamically
modify executing Python bytecode. These modifications have little cost beyond
the parts of the code that are modified and a relatively low cost to those 
parts that are modified. We can leverage this to provide an efficient
mechanism for monitoring that was not possible in 3.10 or earlier.

By using quickening, we expect that code run under a debugger on 3.12
should outperform code run without a debugger on 3.11.
Profiling will still slow down execution, but by much less than in 3.11.


Specification
=============

Monitoring of Python programs is done by registering callback functions
for events and by activating a set of events.

Activating events and registering callback functions are independent of each other.

Both registering callbacks and activating events are done on a per-tool basis.
It is possible to have multiple tools that respond to different sets of events.

Note that, unlike ``sys.settrace()``, events and callbacks are per interpreter, not per thread.

Events
------

As a code object executes various events occur that might be of interest
to tools. By activating events and by registering callback functions
tools can respond to these events in any way that suits them.
Events can be set globally, or for individual code objects.

For 3.12, CPython will support the following events:

* PY_START: Start of a Python function (occurs immediately after the call, the callee's frame will be on the stack)
* PY_RESUME: Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
* PY_THROW: A Python function is resumed by a throw() call.
* PY_RETURN: Return from a Python function (occurs immediately before the return, the callee's frame will be on the stack).
* PY_YIELD: Yield from a Python function (occurs immediately before the yield, the callee's frame will be on the stack).
* PY_UNWIND:  Exit from a Python function during exception unwinding.
* CALL: A call in Python code (event occurs before the call).
* C_RETURN: Return from any callable, except Python functions (event occurs after the return).
* C_RAISE: Exception raised from any callable, except Python functions (event occurs after the exit).
* RAISE: An exception is raised, except those that cause a ``STOP_ITERATION`` event.
* EXCEPTION_HANDLED: An exception is handled.
* LINE: An instruction is about to be executed that has a different line number from the preceding instruction.
* INSTRUCTION -- A VM instruction is about to be executed.
* JUMP -- An unconditional jump in the control flow graph is made.
* BRANCH -- A conditional branch is taken (or not).
* STOP_ITERATION -- An artificial ``StopIteration`` is raised;
  see `the STOP_ITERATION event`_.

More events may be added in the future.

All events will be attributes of the ``events`` namespace in ``sys.monitoring``.
All events will represented by a power of two integer, so that they can be combined
with the ``|`` operator.

Events are divided into three groups:

Local events
''''''''''''

Local events are associated with normal execution of the program and happen
at clearly defined locations. All local events can be disabled.
The local events are:

* PY_START
* PY_RESUME
* PY_RETURN
* PY_YIELD
* CALL
* LINE
* INSTRUCTION
* JUMP
* BRANCH
* STOP_ITERATION

Ancilliary events
'''''''''''''''''

Ancillary events can be monitored like other events, but are controlled
by another event:

* C_RAISE
* C_RETURN

The ``C_RETURN`` and ``C_RAISE`` events are are controlled by the ``CALL``
event. ``C_RETURN`` and ``C_RAISE`` events will only be seen if the
corresponding ``CALL`` event is being monitored.

Other events
''''''''''''

Other events are not necessarily tied to a specific location in the
program and cannot be individually disabled.

The other events that can be monitored are:

* PY_THROW
* PY_UNWIND
* RAISE
* EXCEPTION_HANDLED


The STOP_ITERATION event
''''''''''''''''''''''''

:pep:`PEP 380 <380#use-of-stopiteration-to-return-values>`
specifies that a ``StopIteration`` exception is raised when returning a value
from a generator or coroutine. However, this is a very inefficient way to 
return a value, so some Python implementations, notably CPython 3.12+, do not
raise an exception unless it would be visible to other code.

To allow tools to monitor for real exceptions without slowing down generators
and coroutines, the ``STOP_ITERATION`` event is provided.
``STOP_ITERATION`` can be locally disabled, unlike ``RAISE``.

Tool identifiers
----------------

The VM can support up to 6 tools at once.
Before registering or activating events, a tool should choose an identifier.
Identifiers are integers in the range 0 to 5.

::

  sys.monitoring.use_tool_id(id, name:str) -> None
  sys.monitoring.free_tool_id(id) -> None
  sys.monitoring.get_tool(id) ->  str | None

``sys.monitoring.use_tool_id`` raises a ``ValueError`` if ``id`` is in use.
``sys.monitoring.get_tool`` returns the name of the tool if ``id`` is in use,
otherwise it returns ``None``.

All IDs are treated the same by the VM with regard to events, but the
following IDs are pre-defined to make co-operation of tools easier::

  sys.monitoring.DEBUGGER_ID = 0
  sys.monitoring.COVERAGE_ID = 1
  sys.monitoring.PROFILER_ID = 2
  sys.monitoring.OPTIMIZER_ID = 5

There is no obligation to set an ID, nor is there anything preventing a tool
from using an ID even it is already in use.
However, tools are encouraged to use a unique ID and respect other tools.

For example, if a debugger were attached and ``DEBUGGER_ID`` were in use, it
should report an error, rather than carrying on regardless.

The ``OPTIMIZER_ID`` is provided for tools like Cinder or PyTorch
that want to optimize Python code, but need to decide what to
optimize in a way that depends on some wider context.

Setting events globally
-----------------------

Events can be controlled globally by modifying the set of events being monitored:

* ``sys.monitoring.get_events(tool_id:int)->int``
  Returns the ``int`` representing all the active events.

* ``sys.monitoring.set_events(tool_id:int, event_set: int)``
  Activates all events which are set in ``event_set``.
  Raises a ``ValueError`` if ``tool_id`` is not in use.

No events are active by default.

Per code object events
----------------------

Events can also be controlled on a per code object basis:

* ``sys.monitoring.get_local_events(tool_id:int, code: CodeType)->int``
  Returns all the local events for ``code``

* ``sys.monitoring.set_local_events(tool_id:int, code: CodeType, event_set: int)``
  Activates all the local events for ``code``  which are set in ``event_set``.
  Raises a ``ValueError`` if ``tool_id`` is not in use.

Local events add to global events, but do not mask them.
In other words, all global events will trigger for a code object,
regardless of the local events.

Register callback functions
---------------------------

To register a callable for events call::

  sys.monitoring.register_callback(tool_id:int, event: int, func: Callable | None) -> Callable | None

If another callback was registered for the given ``tool_id`` and ``event``,
it is unregistered and returned.
Otherwise ``register_callback`` returns ``None``.

Functions can be unregistered by calling
``sys.monitoring.register_callback(tool_id, event, None)``.

Callback functions can be registered and unregistered at any time.

Registering or unregistering a callback function will generate a ``sys.audit`` event.

Callback function arguments
'''''''''''''''''''''''''''

When an active event occurs, the registered callback function is called.
Different events will provide the callback function with different arguments, as follows:

* ``PY_START`` and ``PY_RESUME``::

    func(code: CodeType, instruction_offset: int) -> DISABLE | Any

* ``PY_RETURN`` and ``PY_YIELD``:

    ``func(code: CodeType, instruction_offset: int, retval: object) -> DISABLE | Any``

* ``CALL``, ``C_RAISE`` and ``C_RETURN``:

    ``func(code: CodeType, instruction_offset: int, callable: object, arg0: object | MISSING) -> DISABLE | Any``

    If there are no arguments, ``arg0`` is set to ``MISSING``.

* ``RAISE`` and ``EXCEPTION_HANDLED``:

    ``func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any``

* ``LINE``:

    ``func(code: CodeType, line_number: int) -> DISABLE | Any``

* ``BRANCH``:

    ``func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any``

  Note that the ``destination_offset`` is where the code will next execute.
  For an untaken branch this will be the offset of the instruction following
  the branch.

* ``INSTRUCTION``:

    ``func(code: CodeType, instruction_offset: int) -> DISABLE | Any``


If a callback function returns ``DISABLE``, then that function will no longer
be called for that ``(code, instruction_offset)`` until
``sys.monitoring.restart_events()`` is called.
This feature is provided for coverage and other tools that are only interested
seeing an event once. 

Note that ``sys.monitoring.restart_events()`` is not specific to one tool,
so tools must be prepared to receive events that they have chosen to DISABLE.

Events in callback functions
----------------------------

Events are suspended in callback functions and their callees for the tool
that registered that callback.

That means that other tools will see events in the callback functions for other
tools. This could be useful for debugging a profiling tool, but would produce
misleading profiles, as the debugger tool would show up in the profile.

Order of events
---------------

If an instructions triggers several events they occur in the following order:

* LINE
* INSTRUCTION
* All other events (only one of these events can occur per instruction)

Each event is delivered to tools in ascending order of ID.

The "call" event group
----------------------

Most events are independent; setting or disabling one event has no effect on the others.
However, the ``CALL``, ``C_RAISE`` and ``C_RETURN`` events form a group.
If any of those events are set or disabled, then all events in the group are.
Disabling a ``CALL`` event will not disable the matching ``C_RAISE`` or ``C_RETURN``,
but will disable all subsequent events.


Attributes of the ``sys.monitoring`` namespace
----------------------------------------------

* ``def use_tool_id(id)->None``
* ``def free_tool_id(id)->None``
* ``def get_events(tool_id: int)->int``
* ``def set_events(tool_id: int, event_set: int)->None``
* ``def get_local_events(tool_id: int, code: CodeType)->int``
* ``def set_local_events(tool_id: int, code: CodeType, event_set: int)->None``
* ``def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]``
* ``def restart_events()->None``
* ``DISABLE: object``
* ``MISSING: object``

Access to "debug only" features
-------------------------------

Some features of the standard library are not accessible to normal code,
but are accessible to debuggers. For example, setting local variables, or
the line number.

These features will be available to callback functions.

Backwards Compatibility
=======================

This PEP is mostly backwards compatible.

There are some compatibility issues with :pep:`523`, as the behavior
of :pep:`523` plugins is outside of the VM's control.
It is up to :pep:`523` plugins to ensure that they respect the semantics
of this PEP. Simple plugins that do not change the state of the VM, and
defer execution to ``_PyEval_EvalFrameDefault()`` should continue to work.

:func:`sys.settrace` and :func:`sys.setprofile` will act as if they were tools
6 and 7 respectively, so can be used alongside this PEP.

This means that :func:`sys.settrace` and :func:`sys.setprofile` may not work
correctly with all :pep:`523` plugins. Although, simple :pep:`523`
plugins, as described above, should be fine.

Performance
-----------

If no events are active, this PEP should have a small positive impact on
performance. Experiments show between 1 and 2% speedup from not supporting
:func:`sys.settrace` directly.

The performance of :func:`sys.settrace` will be about the same.
The performance of :func:`sys.setprofile` should be better.
However, tools relying on :func:`sys.settrace` and
:func:`sys.setprofile` can be made a lot faster by using the
API provided by this PEP.

If a small set of events are active, e.g. for a debugger, then the overhead
of callbacks will be orders of magnitudes less than for :func:`sys.settrace`
and much cheaper than using :pep:`523`.

Coverage tools can be implemented at very low cost,
by returning ``DISABLE`` in all callbacks.

For heavily instrumented code, e.g. using ``LINE``, performance should be
better than ``sys.settrace``, but not by that much as performance will be
dominated by the time spent in callbacks.

For optimizing virtual machines, such as future versions of CPython
(and ``PyPy`` should they choose to support this API), changes to the set
active events in the midst of a long running program could be quite
expensive, possibly taking hundreds of milliseconds as it triggers
de-optimizations. Once such de-optimization has occurred, performance should
recover as the VM can re-optimize the instrumented code.

In general these operations can be considered to be fast:

* ``def get_events(tool_id: int)->int``
* ``def get_local_events(tool_id: int, code: CodeType)->int``
* ``def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]``
* ``def get_tool(tool_id) -> str | None``

These operations are slower, but not especially so:

* ``def set_local_events(tool_id: int, code: CodeType, event_set: int)->None``

And these operations should be regarded as slow:

* ``def use_tool_id(id, name:str)->None``
* ``def free_tool_id(id)->None``
* ``def set_events(tool_id: int, event_set: int)->None``
* ``def restart_events()->None``

How slow the slow operations are depends on when they happen.
If done early in the program, before modules are loaded,
they should be fairly inexpensive.

Memory Consumption
''''''''''''''''''

When not in use, this PEP will have a negligible change on memory consumption.

How memory is used is very much an implementation detail.
However, we expect that for 3.12 the additional memory consumption per
code object will be **roughly** as follows:

+-------------+--------+--------+-------------+
|                      |   Events             |
+-------------+--------+--------+-------------+
|    Tools    | Others |  LINE  | INSTRUCTION |
+=============+========+========+=============+
|      One    | None   |  ≈40%  |    ≈80%     |
+-------------+--------+--------+-------------+
+ Two or more |  ≈40%  | ≈120%  |    ≈200%    |
+-------------+--------+--------+-------------+


Security Implications
=====================

Allowing modification of running code has some security implications,
but no more than the ability to generate and call new code.

All the new functions listed above will trigger audit hooks.

Implementation
==============

This outlines the proposed implementation for CPython 3.12. The actual
implementation for later versions of CPython and other Python implementations
may differ considerably.

The proposed implementation of this PEP will be built on top of the quickening
step of CPython 3.11, as described in :pep:`PEP 659 <659#quickening>`.
Instrumentation works in much the same way as quickening, bytecodes are
replaced with instrumented ones as needed.

For example, if the ``CALL`` event is turned on,
then all call instructions will be
replaced with a ``INSTRUMENTED_CALL`` instruction.

Note that this will interfere with specialization, which will result in some
performance degradation in addition to the overhead of calling the
registered callable.

When the set of active events changes, the VM will immediately update
all code objects present on the call stack of any thread. It will also set in
place traps to ensure that all code objects are correctly instrumented when
called. Consequently changing the set of active events should be done as 
infrequently as possible, as it could be quite an expensive operation.

Other events, such as ``RAISE`` can be turned on or off cheaply,
as they do not rely on code instrumentation, but runtime checks when the
underlying event occurs.

The exact set of events that require instrumentation is an implementation detail,
but for the current design, the following events will require instrumentation:

* PY_START
* PY_RESUME
* PY_RETURN
* PY_YIELD
* CALL
* LINE
* INSTRUCTION
* JUMP
* BRANCH

Each instrumented bytecode will require an additional 8 bits of information to
note which tool the instrumentation applies to.
``LINE`` and ``INSTRUCTION`` events require additional information, as they
need to store the original instruction, or even the instrumented instruction
if they overlap other instrumentation.


Implementing tools
==================

It is the philosophy of this PEP that it should be possible for third-party monitoring
tools to achieve high-performance, not that it should be easy for them to do so.

Converting events into data that is meaningful to the users is
the responsibility of the tool.

All events have a cost, and tools should attempt to the use set of events
that trigger the least often and still provide the necessary information.

Debuggers
---------

Inserting breakpoints
'''''''''''''''''''''

Breakpoints can be inserted setting per code object events, either ``LINE`` or ``INSTRUCTION``,
and returning ``DISABLE`` for any events not matching a breakpoint.

Stepping
''''''''

Debuggers usually offer the ability to step execution by a
single instruction or line.

Like breakpoints, stepping can be implemented by setting per code object events.
As soon as normal execution is to be resumed, the local events can be unset.

Attaching
'''''''''

Debuggers can use the ``PY_START`` and ``PY_RESUME`` events to be informed
when a code object is first encountered, so that any necessary breakpoints
can be inserted.

Coverage Tools
--------------

Coverage tools need to track which parts of the control graph have been
executed. To do this, they need to register for the ``PY_`` events,
plus ``JUMP`` and ``BRANCH``.

This information can be then be converted back into a line based report
after execution has completed.

Profilers
---------

Simple profilers need to gather information about calls.
To do this profilers should register for the following events:

* PY_START
* PY_RESUME
* PY_THROW
* PY_RETURN
* PY_YIELD
* PY_UNWIND
* CALL
* C_RAISE
* C_RETURN


Line based profilers
''''''''''''''''''''

Line based profilers can use the ``LINE`` and ``JUMP`` events.
Implementers of profilers should be aware that instrumenting ``LINE``
events will have a large impact on performance.

.. note::

  Instrumenting profilers have significant overhead and will distort 
  the results of profiling. Unless you need exact call counts,
  consider using a statistical profiler.


Rejected ideas
==============

A draft version of this PEP proposed making the user responsible
for inserting the monitoring instructions, rather than have VM do it.
However, that puts too much of a burden on the tools, and would make
attaching a debugger nearly impossible.

An earlier version of this PEP, proposed storing events as ``enums``::

  class Event(enum.IntFlag):
      PY_START = ...

However, that would prevent monitoring of code before the ``enum`` module was
loaded and could cause unnecessary overhead.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.