PEP: 626 Title: Precise line numbers for debugging and other tools. Author: Mark Shannon BDFL-Delegate: Pablo Galindo Status: Accepted Type: Standards Track Content-Type: text/x-rst Created: 15-Jul-2020 Python-Version: 3.10 Post-History: 17-Jul-2020 Abstract ======== Python should guarantee that when tracing is turned on, "line" tracing events are generated for *all* lines of code executed and *only* for lines of code that are executed. The ``f_lineno`` attribute of frame objects should always contain the expected line number. During frame execution, the expected line number is the line number of source code currently being executed. After a frame has completed, either by returning or by raising an exception, the expected line number is the line number of the last line of source that was executed. A side effect of ensuring correct line numbers, is that some bytecodes will need to be marked as artificial, and not have a meaningful line number. To assist tools, a new ``co_lines`` attribute will be added that describes the mapping from bytecode to source. Motivation ========== Users of ``sys.settrace`` and associated tools should be able to rely on tracing events being generated for all lines of code, and only for actual code. They should also be able to assume that the line number in ``f_lineno`` is correct. The current implementation mostly does this, but fails in a few cases. This requires workarounds in tooling and is a nuisance for alternative Python implementions. Having this guarantee also benefits implementers of CPython in the long term, as the current behaviour is not obvious and has some odd corner cases. Rationale ========= In order to guarantee that line events are generated when expected, the ``co_lnotab`` attribute, in its current form, can no longer be the source of truth for line number information. Rather than attempt to fix the ``co_lnotab`` attribute, a new method ``co_lines()`` will be added, which returns an iterator over bytecode offsets and source code lines. Ensuring that the bytecode is annotated correctly to enable accurate line number information means that some bytecodes must be marked as artificial, and not have a line number. Some care must be taken not to break existing tooling. To minimize breakage, the ``co_lnotab`` attribute will be retained, but lazily generated on demand. Specification ============= Line events and the ``f_lineno`` attribute should act as an experienced Python user would expect in *all* cases. Tracing ''''''' Tracing generates events for calls, returns, exceptions, lines of source code executed, and, under some circumstances, instructions executed. Only line events are covered by this PEP. When tracing is turned on, line events will be generated when: * A new line of source code is reached. * A backwards jump occurs, even if it jumps to the same line, as may happen in list comprehensions. Additionally, line events will *never* be generated for source code lines that are not executed. What is considered to be code for the purposes of tracing ''''''''''''''''''''''''''''''''''''''''''''''''''''''''' All expressions and parts of expressions are considered to be executable code. In general, all statements are also considered to be executable code. However, when a statement is spread over several lines, we must consider which parts of a statement are considered to be executable code. Statements are made up of keywords and expressions. Not all keywords have a direct runtime effect, so not all keywords are considered to be executable code. For example, ``else``, is a necessary part of an ``if`` statement, but there is no runtime effect associated with an ``else``. For the purposes of tracing, the following keywords will *not* be considered to be executable code: * ``del`` -- The expression to be deleted is treated as the executable code. * ``else`` -- No runtime effect * ``finally`` -- No runtime effect * ``global`` -- Purely declarative * ``nonlocal`` -- Purely declarative All other keywords are considered to be executable code. Example event sequences ''''''''''''''''''''''' In the following examples, events are listed as "name", ``f_lineno`` pairs. The code :: 1. global x 2. x = a generates the following event:: "line" 2 The code :: 1. try: 2. pass 3. finally: 4. pass generates the following events:: "line" 1 "line" 2 "line" 4 The code :: 1. for ( 2. x) in [1]: 3. pass 4. return generates the following events:: "line" 2 # evaluate [1] "line" 1 # for "line" 2 # store to x "line" 3 # pass "line" 1 # for "line" 4 # return "return" 1 The f_lineno attribute '''''''''''''''''''''' * When a frame object is created, the ``f_lineno`` attribute will be set to the line at which the function or class is defined; that is the line on which the ``def`` or ``class`` keyword appears. For modules it will be set to zero. * The ``f_lineno`` attribute will be updated to match the line number about to be executed, even if tracing is turned off and no event is generated. The new co_lines() method of code objects ''''''''''''''''''''''''''''''''''''''''' The ``co_lines()`` method will return an iterator which yields tuples of values, each representing the line number of a range of bytecodes. Each tuple will consist of three values: * ``start`` -- The offset (inclusive) of the start of the bytecode range * ``end`` -- The offset (exclusive) of the end of the bytecode range * ``line`` -- The line number, or ``None`` if the bytecodes in the given range do not have a line number. The sequence generated will have the following properties: * The first range in the sequence with have a ``start`` of ``0`` * The ``(start, end)`` ranges will be non-decreasing and consecutive. That is, for any pair of tuples the ``start`` of the second will equal to the ``end`` of the first. * No range will be backwards, that is ``end >= start`` for all triples. * The final range in the sequence with have ``end`` equal to the size of the bytecode. * ``line`` will either be a positive integer, or ``None`` Zero width ranges ----------------- Zero width range, that is ranges where ``start == end`` are allowed. Zero width ranges are used for lines that are present in the source code, but have been eliminated by the bytecode compiler. The co_linetable attribute '''''''''''''''''''''''''' The co_linetable attribute will hold the line number information. The format is opaque, unspecified and may be changed without notice. The attribute is public only to support creation of new code objects. The co_lnotab attribute ''''''''''''''''''''''' Historically the ``co_lnotab`` attribute held a mapping from bytecode offset to line number, but does not support bytecodes without a line number. For backward compatibility, the ``co_lnotab`` bytes object will be lazily created when needed. For ranges of bytecodes without a line number, the line number of the previous bytecode range will be used. Tools that parse the ``co_lnotab`` table should move to using the new ``co_lines()`` method as soon as is practical. Backwards Compatibility ======================= The ``co_lnotab`` attribute will be deprecated in 3.10 and removed in 3.12. Any tools that parse the ``co_lnotab`` attribute of code objects will need to move to using ``co_lines()`` before 3.12 is released. Tools that use ``sys.settrace`` will be unaffected, except in cases where the "line" events they receive are more accurate. Examples of code for which the sequence of trace events will change ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' In the following examples, events are listed as "name", ``f_lineno`` pairs. ``pass`` statement in an ``if`` statement. ------------------------------------------ :: 0. def spam(a): 1. if a: 2. eggs() 3. else: 4. pass If ``a`` is ``True``, then the sequence of events generated by Python 3.9 is:: "line" 1 "line" 2 "line" 4 "return" 4 From 3.10 the sequence will be:: "line" 1 "line" 2 "return" 2 Multiple ``pass`` statements. ----------------------------- :: 0. def bar(): 1. pass 2. pass 3. pass The sequence of events generated by Python 3.9 is:: "line" 3 "return" 3 From 3.10 the sequence will be:: "line" 1 "line" 2 "line" 3 "return" 3 C API ''''' Access to the ``f_lineno`` attribute of frame objects through C API functions is unchanged. ``f_lineno`` can be read by ``PyFrame_GetLineNumber``. ``f_lineno`` can only be set via ``PyObject_SetAttr`` and similar functions. Accessing ``f_lineno`` directly through the underlying data structure is forbidden. Out of process debuggers and profilers '''''''''''''''''''''''''''''''''''''' Out of process tools, such as py-spy [1]_, cannot use the C-API, and must parse the line number table themselves. Although the line number table format may change without warning, it will not change during a release unless absolutely necessary for a bug fix. To reduce the work required to implement these tools, the following C struct and utility functions are provided. Note that these functions are not part of the C-API, so will be need to be linked into any code that needs to use them. :: typedef struct addressrange { int ar_start; int ar_end; int ar_line; int opaque1; void *opaque2; } PyCodeAddressRange; void PyLineTable_InitAddressRange(char *linetable, int firstlineno, PyCodeAddressRange *range); int PyLineTable_NextAddressRange(PyCodeAddressRange *range); int PyLineTable_PreviousAddressRange(PyCodeAddressRange *range); ``PyLineTable_InitAddressRange`` initializes the ``PyCodeAddressRange`` struct from the line number table and first line number. ``PyLineTable_NextAddressRange`` advances the range to the next entry, returning non-zero if valid. ``PyLineTable_PreviousAddressRange`` retreats the range to the previous entry, returning non-zero if valid. .. note:: The data in ``linetable`` is immutable, but its lifetime depends on its code object. For reliable operation, ``linetable`` should be copied into a local buffer before calling ``PyLineTable_InitAddressRange``. Although these functions are not part of C-API, they will provided by all future versions of CPython. The ``PyLineTable_`` functions do not call into the C-API, so can be safely copied into any tool that needs to use them. The ``PyCodeAddressRange`` struct may acquire additional ``opaque`` fields in future versions, but the ``ar_`` fields will remain unchanged. For example, the following code prints out all the address ranges: :: void print_address_ranges(char *linetable, int firstlineno) { PyCodeAddressRange range; PyLineTable_InitAddressRange(linetable, firstlineno, &range); while (PyLineTable_NextAddressRange(&range)) { printf("Bytecodes from %d (inclusive) to %d (exclusive) ", range.start, range.end); if (range.line < 0) { /* line < 0 means no line number */ printf("have no line number\n"); } else { printf("have line number %d\n", range.line); } } } Performance Implications ======================== In general, there should be no change in performance. When tracing, programs should run a little faster as the new table format can be designed with line number calculation speed in mind. Code with long sequences of ``pass`` statements will probably become a bit slower. Reference Implementation ======================== https://github.com/markshannon/cpython/tree/new-linetable-format-version-2 Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. References ========== .. [1] py-spy: Sampling profiler for Python programs (https://github.com/benfred/py-spy) .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: