PEP: 626
Title: Precise line numbers for debugging and other tools.
Author: Mark Shannon <mark@hotpy.org>
BDFL-Delegate: Pablo Galindo <pablogsal@python.org>
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Jul-2020
Python-Version: 3.10
Post-History: 17-Jul-2020

Abstract
========

Python should guarantee that when tracing is turned on, "line" tracing events are generated for *all* lines of code executed and *only* for lines of
code that are executed.

The ``f_lineno`` attribute of frame objects should always contain the expected line number.
During frame execution, the expected line number is the line number of source code currently being executed.
After a frame has completed, either by returning or by raising an exception,
the expected line number is the line number of the last line of source that was executed.

A side effect of ensuring correct line numbers is that some bytecodes will need to be marked as artificial, and not have a meaningful line number.
To assist tools, a new ``co_lines()`` method will be added that describes the mapping from bytecode to source.

Motivation
==========

Users of ``sys.settrace`` and associated tools should be able to rely on tracing events being
generated for all lines of code, and only for actual code.
They should also be able to assume that the line number in ``f_lineno`` is correct.

The current implementation mostly does this, but fails in a few cases.
This requires workarounds in tooling and is a nuisance for alternative Python implementations.

Having this guarantee also benefits implementers of CPython in the long term, as the current behaviour is not obvious and has some odd corner cases.

Rationale
=========

In order to guarantee that line events are generated when expected, the ``co_lnotab`` attribute, in its current form,
can no longer be the source of truth for line number information.

Rather than attempt to fix the ``co_lnotab`` attribute, a new method
``co_lines()`` will be added, which returns an iterator over bytecode offsets and source code lines.

Ensuring that the bytecode is annotated correctly to enable accurate line number information means that
some bytecodes must be marked as artificial, and not have a line number.

Some care must be taken not to break existing tooling.
To minimize breakage, the ``co_lnotab`` attribute will be retained, but lazily generated on demand.

Specification
=============

Line events and the ``f_lineno`` attribute should act as an experienced Python user would expect in *all* cases.

Tracing
'''''''

Tracing generates events for calls, returns, exceptions, lines of source code executed, and, under some circumstances, instructions executed.

Only line events are covered by this PEP.

When tracing is turned on, line events will be generated when:

* A new line of source code is reached.
* A backwards jump occurs, even if it jumps to the same line, as may happen in list comprehensions.

Additionally, line events will *never* be generated for source code lines that are not executed.

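As a rough illustration of the tracing rules above, the following sketch records the "line" events produced while a small function runs.
The helper ``collect_line_events`` and the sample function ``spam`` are names made up for this example and are not part of this PEP.
Line numbers are reported relative to the ``def`` line, and the guarantee about unexecuted lines is assumed to hold only on Python 3.10 or later.

```python
import sys


def collect_line_events(func, *args):
    """Run func(*args) and return the line numbers of its "line" events,
    relative to the line of its ``def`` statement."""
    code = func.__code__
    lines = []

    def tracer(frame, event, arg):
        # Only follow the frame of the function under test.
        if frame.f_code is not code:
            return None
        if event == "line":
            lines.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Always restore whatever trace function was installed before.
        sys.settrace(previous)
    return lines


def spam(a):
    if a:
        result = "eggs"
    else:
        pass


# When ``a`` is true the ``else`` branch (relative lines 3 and 4) is not
# executed, so under this PEP no line event should be reported for it.
print(collect_line_events(spam, True))
```

Restoring the previous trace function in the ``finally`` clause keeps the sketch usable under a debugger or coverage tool that has already installed its own tracer.
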
The f_lineno attribute
''''''''''''''''''''''

* When a frame object is created, the ``f_lineno`` attribute will be set to the line
  at which the function or class is defined; that is the line on which the ``def`` or ``class`` keyword appears.
  For modules it will be set to zero.
* The ``f_lineno`` attribute will be updated to match the line number about to be executed,
  even if tracing is turned off and no event is generated.

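The second point can be observed without any trace function installed.
The sketch below reads ``f_lineno`` twice from the running frame; ``observe_lineno`` is a hypothetical name used only for illustration, and ``inspect.currentframe()`` is assumed to return a real frame, which is the case on CPython.

```python
import inspect


def observe_lineno():
    """Read f_lineno twice from the running frame, with tracing off."""
    frame = inspect.currentframe()  # may be None on non-CPython implementations
    first = frame.f_lineno
    second = frame.f_lineno
    return first, second


first, second = observe_lineno()
# The two reads sit on consecutive source lines, so the values differ by one.
print(second - first)
```
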
The new co_lines() method of code objects
'''''''''''''''''''''''''''''''''''''''''

The ``co_lines()`` method will return an iterator which yields tuples of values,
each representing the line number of a range of bytecodes. Each tuple will consist of three values:

* ``start`` -- The offset (inclusive) of the start of the bytecode range
* ``end`` -- The offset (exclusive) of the end of the bytecode range
* ``line`` -- The line number, or ``None`` if the bytecodes in the given range do not have a line number.

The sequence generated will have the following properties:

* The first range in the sequence will have a ``start`` of ``0``
* The ``(start, end)`` ranges will be strictly increasing and consecutive.
  That is, for any pair of consecutive tuples, the ``start`` of the second
  will be equal to the ``end`` of the first.
* No range will be empty, that is ``end`` > ``start`` for all triples.
* The final range in the sequence will have ``end`` equal to the size of the bytecode.
* ``line`` will either be a positive integer, or ``None``

The co_linetable attribute
''''''''''''''''''''''''''

The ``co_linetable`` attribute will hold the line number information.
The format is opaque, unspecified and may be changed without notice.
The attribute is public only to support creation of new code objects.

The co_lnotab attribute
'''''''''''''''''''''''

Historically, the ``co_lnotab`` attribute held a mapping from bytecode offset to line number, but it does not support bytecodes without a line number.
For backward compatibility, the ``co_lnotab`` bytes object will be lazily created when needed.
For ranges of bytecodes without a line number, the line number of the previous bytecode range will be used.

Tools that parse the ``co_lnotab`` table should move to using the new ``co_lines()`` method as soon as is practical.

Backwards Compatibility
=======================

The ``co_lnotab`` attribute will be deprecated in 3.10 and removed in 3.12.

Any tools that parse the ``co_lnotab`` attribute of code objects will need to move to using ``co_lines()`` before 3.12 is released.
Tools that use ``sys.settrace`` will be unaffected, except in cases where the "line" events they receive are more accurate.

Examples of code for which the sequence of trace events will change
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

In the following examples, events are listed as "name", ``f_lineno`` pairs.

``pass`` statement in an ``if`` statement.
------------------------------------------

::

    0.  def spam(a):
    1.      if a:
    2.          eggs()
    3.      else:
    4.          pass

If ``a`` is ``True``, then the sequence of events generated by Python 3.9 is::

    "line" 1
    "line" 2
    "line" 4
    "return" 4

From 3.10 the sequence will be::

    "line" 1
    "line" 2
    "return" 2

Multiple ``pass`` statements.
-----------------------------

::

    0.  def bar():
    1.      pass
    2.      pass
    3.      pass

The sequence of events generated by Python 3.9 is::

    "line" 3
    "return" 3

From 3.10 the sequence will be::

    "line" 1
    "line" 2
    "line" 3
    "return" 3

C API
'''''

Access to the ``f_lineno`` attribute of frame objects through C API functions is unchanged.
``f_lineno`` can be read by ``PyFrame_GetLineNumber``. ``f_lineno`` can only be set via ``PyObject_SetAttr`` and similar functions.

Accessing ``f_lineno`` directly through the underlying data structure is forbidden.

Out of process debuggers and profilers
''''''''''''''''''''''''''''''''''''''

Out of process tools, such as py-spy [1]_, cannot use the C-API, and must parse the line number table themselves.
Although the line number table format may change without warning,
it will not change during a release unless absolutely necessary for a bug fix.

To reduce the work required to implement these tools, the following C struct and utility functions are provided.
Note that these functions are not part of the C-API, so will need to be linked into any code that needs to use them.

::

    typedef struct addressrange {
        int ar_start;
        int ar_end;
        int ar_line;
        int opaque1;
        void *opaque2;
    } PyCodeAddressRange;

    void PyLineTable_InitAddressRange(char *linetable, int firstlineno, PyCodeAddressRange *range);
    int PyLineTable_NextAddressRange(PyCodeAddressRange *range);
    int PyLineTable_PreviousAddressRange(PyCodeAddressRange *range);

``PyLineTable_InitAddressRange`` initializes the ``PyCodeAddressRange`` struct from the line number table and first line number.

``PyLineTable_NextAddressRange`` advances the range to the next entry, returning non-zero if valid.

``PyLineTable_PreviousAddressRange`` retreats the range to the previous entry, returning non-zero if valid.

.. note::
    The data in ``linetable`` is immutable, but its lifetime depends on its code object.
    For reliable operation, ``linetable`` should be copied into a local buffer before calling ``PyLineTable_InitAddressRange``.

Although these functions are not part of the C-API, they will be provided by all future versions of CPython.
The ``PyLineTable_`` functions do not call into the C-API, so can be safely copied into any tool that needs to use them.
The ``PyCodeAddressRange`` struct may acquire additional ``opaque`` fields in future versions, but the ``ar_`` fields will remain unchanged.

For example, the following code prints out all the address ranges:

::

    #include <stdio.h>

    /* Assumes the PyCodeAddressRange struct and the PyLineTable_ functions
       declared above are in scope. */
    void print_address_ranges(char *linetable, int firstlineno)
    {
        PyCodeAddressRange range;
        PyLineTable_InitAddressRange(linetable, firstlineno, &range);
        while (PyLineTable_NextAddressRange(&range)) {
            printf("Bytecodes from %d (inclusive) to %d (exclusive) ",
                   range.ar_start, range.ar_end);
            if (range.ar_line < 0) {
                /* line < 0 means no line number */
                printf("have no line number\n");
            }
            else {
                printf("have line number %d\n", range.ar_line);
            }
        }
    }

Performance Implications
========================

In general, there should be no change in performance.
When tracing, programs should run a little faster as the new table format can be designed with line number calculation speed in mind.
Code with long sequences of ``pass`` statements will probably become a bit slower.

Reference Implementation
========================

https://github.com/markshannon/cpython/tree/new-linetable-format-version-2

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

References
==========

.. [1] py-spy: Sampling profiler for Python programs
   (https://github.com/benfred/py-spy)

..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End: