PEP: 669
Title: Low Impact Monitoring for CPython
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Aug-2021
Post-History: 07-Dec-2021


Abstract
========

Using a profiler or debugger in CPython can have a severe impact on
performance. Slowdowns by an order of magnitude are common.

This PEP proposes an API for monitoring Python programs running
on CPython that will enable monitoring at low cost.

Although this PEP does not specify an implementation, it is expected that
it will be implemented using the quickening step of
:pep:`659`.

A ``sys.monitoring`` namespace will be added, which will contain
the relevant functions and constants.


Motivation
==========

Developers should not have to pay an unreasonable cost to use debuggers,
profilers and other similar tools.

C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger.
Python developers should expect that too.

Rationale
=========

The quickening mechanism provided by :pep:`659` provides a way to dynamically
modify executing Python bytecode. These modifications have little cost beyond
the parts of the code that are modified and a relatively low cost to those
parts that are modified. We can leverage this to provide an efficient
mechanism for monitoring that was not possible in 3.10 or earlier.

By using quickening, we expect that code run under a debugger on 3.12
should outperform code run without a debugger on 3.11.
Profiling will still slow down execution, but by much less than in 3.11.


Specification
=============

Monitoring of Python programs is done by registering callback functions
for events and by activating a set of events.

Activating events and registering callback functions are independent of each other.

Both registering callbacks and activating events are done on a per-tool basis.
It is possible to have multiple tools that respond to different sets of events.

Note that, unlike ``sys.settrace()``, events and callbacks are per interpreter, not per thread.

Events
------

As a code object executes, various events occur that might be of interest
to tools. By activating events and by registering callback functions,
tools can respond to these events in any way that suits them.
Events can be set globally, or for individual code objects.

For 3.12, CPython will support the following events:

* PY_START: Start of a Python function (occurs immediately after the call, the callee's frame will be on the stack).
* PY_RESUME: Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
* PY_THROW: A Python function is resumed by a throw() call.
* PY_RETURN: Return from a Python function (occurs immediately before the return, the callee's frame will be on the stack).
* PY_YIELD: Yield from a Python function (occurs immediately before the yield, the callee's frame will be on the stack).
* PY_UNWIND: Exit from a Python function during exception unwinding.
* CALL: A call in Python code (event occurs before the call).
* C_RETURN: Return from any callable, except Python functions (event occurs after the return).
* C_RAISE: Exception raised from any callable, except Python functions (event occurs after the exit).
* RAISE: An exception is raised, except those that cause a ``STOP_ITERATION`` event.
* EXCEPTION_HANDLED: An exception is handled.
* LINE: An instruction is about to be executed that has a different line number from the preceding instruction.
* INSTRUCTION: A VM instruction is about to be executed.
* JUMP: An unconditional jump in the control flow graph is made.
* BRANCH: A conditional branch is taken (or not).
* STOP_ITERATION: An artificial ``StopIteration`` is raised; see :ref:`669-stopiteration`.

More events may be added in the future.

All events will be attributes of the ``events`` namespace in ``sys.monitoring``.
All events will be represented by a power-of-two integer, so that they can be combined
with the ``|`` operator.

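
As a rough illustration of the power-of-two encoding, the following sketch
models a few events as bit flags. The numeric values are assumptions chosen
for the example (the PEP does not fix them), and ``is_active`` is a
hypothetical helper; real code would read the constants from
``sys.monitoring.events``.

```python
# Hypothetical values: the PEP only says each event is a distinct power of two.
PY_START = 1 << 0
PY_RETURN = 1 << 1
CALL = 1 << 2
LINE = 1 << 3


def is_active(event_set: int, event: int) -> bool:
    """Return True if ``event`` is part of the combined ``event_set``."""
    return bool(event_set & event)


# Several events are combined into a single integer with the ``|`` operator.
profiler_events = PY_START | PY_RETURN | CALL
```

With these assumed values, ``is_active(profiler_events, CALL)`` is true and
``is_active(profiler_events, LINE)`` is false.
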

Events are divided into three groups:

Local events
''''''''''''

Local events are associated with normal execution of the program and happen
at clearly defined locations. All local events can be disabled.
The local events are:

* PY_START
* PY_RESUME
* PY_RETURN
* PY_YIELD
* CALL
* LINE
* INSTRUCTION
* JUMP
* BRANCH
* STOP_ITERATION

Ancillary events
''''''''''''''''

Ancillary events can be monitored like other events, but are controlled
by another event:

* C_RAISE
* C_RETURN

The ``C_RETURN`` and ``C_RAISE`` events are controlled by the ``CALL``
event. ``C_RETURN`` and ``C_RAISE`` events will only be seen if the
corresponding ``CALL`` event is being monitored.

Other events
''''''''''''

Other events are not necessarily tied to a specific location in the
program and cannot be individually disabled.

The other events that can be monitored are:

* PY_THROW
* PY_UNWIND
* RAISE
* EXCEPTION_HANDLED


.. _669-stopiteration:

The STOP_ITERATION event
''''''''''''''''''''''''

:pep:`PEP 380 <380#use-of-stopiteration-to-return-values>`
specifies that a ``StopIteration`` exception is raised when returning a value
from a generator or coroutine. However, this is a very inefficient way to
return a value, so some Python implementations, notably CPython 3.12+, do not
raise an exception unless it would be visible to other code.

To allow tools to monitor for real exceptions without slowing down generators
and coroutines, the ``STOP_ITERATION`` event is provided.
``STOP_ITERATION`` can be locally disabled, unlike ``RAISE``.

Tool identifiers
----------------

The VM can support up to 6 tools at once.
Before registering or activating events, a tool should choose an identifier.
Identifiers are integers in the range 0 to 5.

::

  sys.monitoring.use_tool_id(id, name:str) -> None
  sys.monitoring.free_tool_id(id) -> None
  sys.monitoring.get_tool(id) -> str | None

``sys.monitoring.use_tool_id`` raises a ``ValueError`` if ``id`` is in use.
``sys.monitoring.get_tool`` returns the name of the tool if ``id`` is in use,
otherwise it returns ``None``.

All IDs are treated the same by the VM with regard to events, but the
following IDs are pre-defined to make co-operation of tools easier::

  sys.monitoring.DEBUGGER_ID = 0
  sys.monitoring.COVERAGE_ID = 1
  sys.monitoring.PROFILER_ID = 2
  sys.monitoring.OPTIMIZER_ID = 5

There is no obligation to set an ID, nor is there anything preventing a tool
from using an ID even if it is already in use.
However, tools are encouraged to use a unique ID and respect other tools.

For example, if a debugger were attached and ``DEBUGGER_ID`` were in use, it
should report an error, rather than carrying on regardless.

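
The cooperative behavior described above might look like the following
sketch. ``claim_tool_id`` is a hypothetical helper, not part of the proposed
API. It takes the monitoring namespace as a parameter, so it can be handed
``sys.monitoring`` on an interpreter that implements this PEP or a stand-in
object elsewhere. It relies only on ``get_tool`` and ``use_tool_id`` as
specified in this section.

```python
def claim_tool_id(monitoring, tool_id: int, name: str) -> None:
    """Claim ``tool_id`` for the tool called ``name``.

    ``monitoring`` is the namespace to operate on (an assumption of this
    sketch: ``sys.monitoring`` where available). Raises ``RuntimeError``
    instead of silently sharing an identifier with another tool.
    """
    owner = monitoring.get_tool(tool_id)
    if owner is not None:
        raise RuntimeError(f"tool id {tool_id} is already used by {owner!r}")
    # use_tool_id() itself raises ValueError if the id was taken in between.
    monitoring.use_tool_id(tool_id, name)
```

A debugger would call this once at start-up with ``DEBUGGER_ID`` and report
the error to the user if the identifier is already taken.
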

The ``OPTIMIZER_ID`` is provided for tools like Cinder or PyTorch
that want to optimize Python code, but need to decide what to
optimize in a way that depends on some wider context.

Setting events globally
-----------------------

Events can be controlled globally by modifying the set of events being monitored:

* ``sys.monitoring.get_events(tool_id:int)->int``
  Returns the ``int`` representing all the active events.

* ``sys.monitoring.set_events(tool_id:int, event_set: int)``
  Activates all events which are set in ``event_set``.

No events are active by default.

Per code object events
----------------------

Events can also be controlled on a per code object basis:

* ``sys.monitoring.get_local_events(tool_id:int, code: CodeType)->int``
  Returns all the local events for ``code``.

* ``sys.monitoring.set_local_events(tool_id:int, code: CodeType, event_set: int)``
  Activates all the local events for ``code`` which are set in ``event_set``.

Local events add to global events, but do not mask them.
In other words, all global events will trigger for a code object,
regardless of the local events.

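
The "add to, but do not mask" rule can be stated as a one-line computation.
The sketch below is only a model of the rule, not how the VM stores event
sets; ``effective_events`` is a hypothetical name and the bit patterns used
in the note are arbitrary.

```python
def effective_events(global_events: int, local_events: int) -> int:
    """Events that fire for one code object under the rule above.

    Local events are OR-ed into the global set, so an empty local set
    never removes a globally active event.
    """
    return global_events | local_events
```

For example, ``effective_events(0b0101, 0b0010)`` yields ``0b0111``, and
``effective_events(0b0101, 0)`` is still ``0b0101``.
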

Register callback functions
---------------------------

To register a callable for events call::

  sys.monitoring.register_callback(tool_id:int, event: int, func: Callable | None) -> Callable | None

If another callback was registered for the given ``tool_id`` and ``event``,
it is unregistered and returned.
Otherwise ``register_callback`` returns ``None``.

Functions can be unregistered by calling
``sys.monitoring.register_callback(tool_id, event, None)``.

Callback functions can be registered and unregistered at any time.

Registering or unregistering a callback function will generate a ``sys.audit`` event.

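
Because ``register_callback`` returns the callback it replaces, a tool can
install a callback temporarily and put the previous one back afterwards. The
context manager below is a sketch of that pattern. ``temporary_callback`` is
a hypothetical helper, and the ``monitoring`` parameter is assumed to be
``sys.monitoring`` or any object with a compatible ``register_callback``.

```python
from contextlib import contextmanager


@contextmanager
def temporary_callback(monitoring, tool_id, event, func):
    """Install ``func`` for ``(tool_id, event)`` and restore the old callback on exit."""
    # register_callback() returns the previously registered callable, or None.
    previous = monitoring.register_callback(tool_id, event, func)
    try:
        yield previous
    finally:
        # Passing None here unregisters, which is correct when nothing
        # was registered before.
        monitoring.register_callback(tool_id, event, previous)
```

Since each registration is expected to generate an audit event, this helper
would presumably generate two: one on entry and one on exit.
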

Callback function arguments
'''''''''''''''''''''''''''

When an active event occurs, the registered callback function is called.
Different events will provide the callback function with different arguments, as follows:

* ``PY_START`` and ``PY_RESUME``::

    func(code: CodeType, instruction_offset: int) -> DISABLE | Any

* ``PY_RETURN`` and ``PY_YIELD``:

  ``func(code: CodeType, instruction_offset: int, retval: object) -> DISABLE | Any``

* ``CALL``, ``C_RAISE`` and ``C_RETURN``:

  ``func(code: CodeType, instruction_offset: int, callable: object, arg0: object | MISSING) -> DISABLE | Any``

  If there are no arguments, ``arg0`` is set to ``MISSING``.

* ``RAISE`` and ``EXCEPTION_HANDLED``:

  ``func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any``

* ``LINE``:

  ``func(code: CodeType, line_number: int) -> DISABLE | Any``

* ``BRANCH``:

  ``func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any``

  Note that the ``destination_offset`` is where the code will next execute.
  For an untaken branch this will be the offset of the instruction following
  the branch.

* ``INSTRUCTION``:

  ``func(code: CodeType, instruction_offset: int) -> DISABLE | Any``


If a callback function returns ``DISABLE``, then that function will no longer
be called for that ``(code, instruction_offset)`` until
``sys.monitoring.restart_events()`` is called.
This feature is provided for coverage and other tools that are only interested
in seeing an event once.

Note that ``sys.monitoring.restart_events()`` is not specific to one tool,
so tools must be prepared to receive events that they have chosen to ``DISABLE``.

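
A "see each location once" callback could be sketched as follows.
``make_once_only_callback`` is a hypothetical helper. The ``disable``
argument stands in for ``sys.monitoring.DISABLE``, so the sketch does not
depend on an interpreter that implements this PEP. Recording into a set
means a repeated event, for instance after another tool calls
``restart_events()``, is harmless.

```python
def make_once_only_callback(disable, seen):
    """Build a LINE callback that records each line and then disables it.

    ``disable`` is the sentinel to return (``sys.monitoring.DISABLE`` in
    real use); ``seen`` is a set collecting ``(code, line_number)`` pairs.
    """
    def on_line(code, line_number):
        # A set tolerates duplicates, so re-delivered events are safe.
        seen.add((code, line_number))
        # Ask the VM not to report this location again.
        return disable

    return on_line
```

The returned function matches the ``LINE`` signature given above and would
be passed to ``register_callback`` for that event.
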

Events in callback functions
----------------------------

Events are suspended in callback functions and their callees for the tool
that registered that callback.

That means that other tools will see events in the callback functions for other
tools. This could be useful for debugging a profiling tool, but would produce
misleading profiles, as the debugger tool would show up in the profile.

Order of events
---------------

If an instruction triggers several events, they occur in the following order:

* INSTRUCTION
* LINE
* All other events (only one of these events can occur per instruction)

Each event is delivered to tools in ascending order of ID.

The "call" event group
----------------------

Most events are independent; setting or disabling one event has no effect on the others.
However, the ``CALL``, ``C_RAISE`` and ``C_RETURN`` events form a group.
If any of those events are set or disabled, then all events in the group are.
Disabling a ``CALL`` event will not disable the matching ``C_RAISE`` or ``C_RETURN``,
but will disable all subsequent events.


Attributes of the ``sys.monitoring`` namespace
----------------------------------------------

* ``def use_tool_id(id, name:str)->None``
* ``def free_tool_id(id)->None``
* ``def get_tool(id)->str | None``
* ``def get_events(tool_id: int)->int``
* ``def set_events(tool_id: int, event_set: int)->None``
* ``def get_local_events(tool_id: int, code: CodeType)->int``
* ``def set_local_events(tool_id: int, code: CodeType, event_set: int)->None``
* ``def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]``
* ``def restart_events()->None``
* ``DISABLE: object``
* ``MISSING: object``

Access to "debug only" features
-------------------------------

Some features of the standard library are not accessible to normal code,
but are accessible to debuggers. For example, setting local variables, or
the line number.

These features will be available to callback functions.

Backwards Compatibility
=======================

This PEP is mostly backwards compatible.

There are some compatibility issues with :pep:`523`, as the behavior
of :pep:`523` plugins is outside of the VM's control.
It is up to :pep:`523` plugins to ensure that they respect the semantics
of this PEP. Simple plugins that do not change the state of the VM, and
defer execution to ``_PyEval_EvalFrameDefault()`` should continue to work.

:func:`sys.settrace` and :func:`sys.setprofile` will act as if they were tools
6 and 7 respectively, so can be used alongside this PEP.

This means that :func:`sys.settrace` and :func:`sys.setprofile` may not work
correctly with all :pep:`523` plugins. However, simple :pep:`523`
plugins, as described above, should be fine.

Performance
-----------

If no events are active, this PEP should have a small positive impact on
performance. Experiments show between 1 and 2% speedup from not supporting
:func:`sys.settrace` directly.

The performance of :func:`sys.settrace` will be about the same.
The performance of :func:`sys.setprofile` should be better.
However, tools relying on :func:`sys.settrace` and
:func:`sys.setprofile` can be made a lot faster by using the
API provided by this PEP.

If a small set of events are active, e.g. for a debugger, then the overhead
of callbacks will be orders of magnitude less than for :func:`sys.settrace`
and much cheaper than using :pep:`523`.

Coverage tools can be implemented at very low cost,
by returning ``DISABLE`` in all callbacks.

For heavily instrumented code, e.g. using ``LINE``, performance should be
better than ``sys.settrace``, but not by that much, as performance will be
dominated by the time spent in callbacks.

For optimizing virtual machines, such as future versions of CPython
(and ``PyPy`` should they choose to support this API), changes to the set
of active events in the midst of a long running program could be quite
expensive, possibly taking hundreds of milliseconds as it triggers
de-optimizations. Once such de-optimization has occurred, performance should
recover as the VM can re-optimize the instrumented code.

In general these operations can be considered to be fast:

* ``def get_events(tool_id: int)->int``
* ``def get_local_events(tool_id: int, code: CodeType)->int``
* ``def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]``
* ``def get_tool(tool_id) -> str | None``

These operations are slower, but not especially so:

* ``def set_local_events(tool_id: int, code: CodeType, event_set: int)->None``

And these operations should be regarded as slow:

* ``def use_tool_id(id, name:str)->None``
* ``def free_tool_id(id)->None``
* ``def set_events(tool_id: int, event_set: int)->None``
* ``def restart_events()->None``

How slow the slow operations are depends on when they happen.
If done early in the program, before modules are loaded,
they should be fairly inexpensive.

Memory Consumption
''''''''''''''''''

When not in use, this PEP will have a negligible effect on memory consumption.

How memory is used is very much an implementation detail.
However, we expect that for 3.12 the additional memory consumption per
code object will be **roughly** as follows:

+-------------+--------+--------+-------------+
|             |            Events             |
+-------------+--------+--------+-------------+
| Tools       | Others | LINE   | INSTRUCTION |
+=============+========+========+=============+
| One         | None   | ≈40%   | ≈80%        |
+-------------+--------+--------+-------------+
| Two or more | ≈40%   | ≈120%  | ≈200%       |
+-------------+--------+--------+-------------+


Security Implications
=====================

Allowing modification of running code has some security implications,
but no more than the ability to generate and call new code.

All the new functions listed above will trigger audit hooks.

Implementation
==============

This outlines the proposed implementation for CPython 3.12. The actual
implementation for later versions of CPython and other Python implementations
may differ considerably.

The proposed implementation of this PEP will be built on top of the quickening
step of CPython 3.11, as described in :pep:`PEP 659 <659#quickening>`.
Instrumentation works in much the same way as quickening: bytecodes are
replaced with instrumented ones as needed.

For example, if the ``CALL`` event is turned on,
then all call instructions will be
replaced with an ``INSTRUMENTED_CALL`` instruction.

Note that this will interfere with specialization, which will result in some
performance degradation in addition to the overhead of calling the
registered callable.

When the set of active events changes, the VM will immediately update
all code objects present on the call stack of any thread. It will also set in
place traps to ensure that all code objects are correctly instrumented when
called. Consequently, changing the set of active events should be done as
infrequently as possible, as it could be quite an expensive operation.

Other events, such as ``RAISE``, can be turned on or off cheaply,
as they do not rely on code instrumentation, but on runtime checks when the
underlying event occurs.

The exact set of events that require instrumentation is an implementation detail,
but for the current design, the following events will require instrumentation:

* PY_START
* PY_RESUME
* PY_RETURN
* PY_YIELD
* CALL
* LINE
* INSTRUCTION
* JUMP
* BRANCH

Each instrumented bytecode will require an additional 8 bits of information to
note which tool the instrumentation applies to.
``LINE`` and ``INSTRUCTION`` events require additional information, as they
need to store the original instruction, or even the instrumented instruction
if they overlap other instrumentation.


Implementing tools
==================

It is the philosophy of this PEP that it should be possible for third-party monitoring
tools to achieve high performance, not that it should be easy for them to do so.

Converting events into data that is meaningful to the users is
the responsibility of the tool.

All events have a cost, and tools should attempt to use the set of events
that trigger the least often and still provide the necessary information.

Debuggers
---------

Inserting breakpoints
'''''''''''''''''''''

Breakpoints can be inserted by setting per code object events, either ``LINE`` or ``INSTRUCTION``,
and returning ``DISABLE`` for any events not matching a breakpoint.

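
A line-based version of this scheme might be sketched as below.
``make_breakpoint_callback`` and its parameters are hypothetical names.
``disable`` stands in for ``sys.monitoring.DISABLE``, and breakpoints are
assumed to be ``(filename, line_number)`` pairs. The callback follows the
``LINE`` signature from the specification.

```python
def make_breakpoint_callback(breakpoints, disable, on_hit):
    """Build a LINE callback for a set of ``(filename, line_number)`` breakpoints.

    ``on_hit(code, line_number)`` is invoked when execution reaches a
    breakpoint; every other line is disabled so it is not reported again.
    """
    def on_line(code, line_number):
        if (code.co_filename, line_number) in breakpoints:
            on_hit(code, line_number)
            # Returning anything other than ``disable`` keeps the event
            # enabled, so the breakpoint fires on the next visit too.
            return None
        return disable

    return on_line
```

Since disabled locations stay disabled until ``restart_events()`` is called,
a debugger that adds a breakpoint to an already-running code object would
presumably need to call ``restart_events()`` afterwards.
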

Stepping
''''''''

Debuggers usually offer the ability to step execution by a
single instruction or line.

Like breakpoints, stepping can be implemented by setting per code object events.
As soon as normal execution is to be resumed, the local events can be unset.

Attaching
'''''''''

Debuggers can use the ``PY_START`` and ``PY_RESUME`` events to be informed
when a code object is first encountered, so that any necessary breakpoints
can be inserted.

Coverage Tools
--------------

Coverage tools need to track which parts of the control flow graph have been
executed. To do this, they need to register for the ``PY_`` events,
plus ``JUMP`` and ``BRANCH``.

This information can then be converted back into a line-based report
after execution has completed.

Profilers
---------

Simple profilers need to gather information about calls.
To do this, profilers should register for the following events:

* PY_START
* PY_RESUME
* PY_THROW
* PY_RETURN
* PY_YIELD
* PY_UNWIND
* CALL
* C_RAISE
* C_RETURN

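
To give a feel for how such a profiler could be wired up, here is a
deliberately minimal sketch that counts function entries only. It handles
just ``PY_START`` for brevity; a real profiler would also register the other
events listed above. ``CallCounter`` and ``install`` are hypothetical names,
and ``monitoring`` is assumed to be ``sys.monitoring`` as specified by this
PEP.

```python
from collections import Counter


class CallCounter:
    """Count Python function entries, keyed by (filename, function name)."""

    def __init__(self):
        self.calls = Counter()

    def on_py_start(self, code, instruction_offset):
        # Matches the PY_START callback signature from the specification.
        self.calls[(code.co_filename, code.co_name)] += 1

    def install(self, monitoring, tool_id):
        """Register the callback, then activate PY_START for ``tool_id``."""
        event = monitoring.events.PY_START
        monitoring.register_callback(tool_id, event, self.on_py_start)
        monitoring.set_events(tool_id, event)
```

Registering the callback before activating the event means no ``PY_START``
event is delivered while no callback is in place.
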


Line based profilers
''''''''''''''''''''

Line based profilers can use the ``LINE`` and ``JUMP`` events.
Implementers of profilers should be aware that instrumenting ``LINE``
events will have a large impact on performance.

.. note::

   Instrumenting profilers have significant overhead and will distort
   the results of profiling. Unless you need exact call counts,
   consider using a statistical profiler.


Rejected ideas
==============

A draft version of this PEP proposed making the user responsible
for inserting the monitoring instructions, rather than have the VM do it.
However, that puts too much of a burden on the tools, and would make
attaching a debugger nearly impossible.

An earlier version of this PEP proposed storing events as ``enums``::

  class Event(enum.IntFlag):
      PY_START = ...

However, that would prevent monitoring of code before the ``enum`` module was
loaded and could cause unnecessary overhead.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: