PEP 657: Include fine grained error locations in tracebacks (GH-1950)

2021-05-08 18:27:58 +01:00 · 2021-05-08 18:27:58 +01:00 · 822755724f
parent fae0ce2014
commit 822755724f
1 changed files with 405 additions and 0 deletions
--- a/pep-0657.rst
+++ b/pep-0657.rst
@@ -0,0 +1,405 @@
+PEP: 657
+Title: Include Fine Grained Error Locations in Tracebacks
+Version: $Revision$
+Last-Modified: $Date$
+Author: Pablo Galindo <pablogsal@python.org>,
+        Batuhan Taskaya <batuhan@python.org>,
+        Ammar Askar <ammar@ammaraskar.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 08-May-2021
+Python-Version: 3.11
+Post-History:
+
+Abstract
+========
+
+This PEP proposes adding a mapping from each bytecode instruction to the start
+and end column offsets of the source code that generated it. This data will be
+used to improve tracebacks displayed by the CPython interpreter, making errors
+easier to debug. The PEP also proposes adding APIs that allow other
+tools (such as coverage analysis tools, profilers, tracers, debuggers) to
+consume this information from code objects.
+
+Motivation
+==========
+
+The primary motivation for this PEP is to give more precise feedback about the
+location of errors, which makes debugging easier.
+
+Python currently keeps a mapping of bytecode to line numbers from compilation.
+The interpreter uses this mapping to point to the source line associated with
+an error. While this line-level granularity for instructions is useful, a
+single line of Python code can compile into dozens of bytecode operations,
+making it hard to track which part of the line caused the error.
+
+Consider the following line of Python code::
+
+    x['a']['b']['c']['d'] = 1
+
+If any of the values in the dictionaries are ``None``, the error shown is::
+
+    Traceback (most recent call last):
+      File "test.py", line 2, in <module>
+        x['a']['b']['c']['d'] = 1
+    TypeError: 'NoneType' object is not subscriptable
+
+From the traceback, it is impossible to determine which one of the dictionaries
+had the ``None`` element that caused the error. Users often have to attach a
+debugger or split up their expression to track down the problem.
+
+However, if the interpreter had a mapping of bytecode to column offsets as well
+as line numbers, it could helpfully display::
+
+    Traceback (most recent call last):
+      File "test.py", line 2, in <module>
+        x['a']['b']['c']['d'] = 1
+        ^^^^^^^^^^^^^^^^
+    TypeError: 'NoneType' object is not subscriptable
+
+indicating to the user that the object ``x['a']['b']`` must have been ``None``.
+This highlighting will occur for every frame in the traceback. For instance, if
+a similar error is part of a complex function call chain, the traceback would
+display the code associated with the current instruction in every frame::
+
+    Traceback (most recent call last):
+      File "test.py", line 14, in <module>
+        lel3(x)
+        ^^^^^^^
+      File "test.py", line 12, in lel3
+        return lel2(x) / 23
+               ^^^^^^^
+      File "test.py", line 9, in lel2
+        return 25 + lel(x) + lel(x)
+                    ^^^^^^
+      File "test.py", line 6, in lel
+        return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
+                             ^^^^^^^^^^^^^^^^^^^^^
+    TypeError: 'NoneType' object is not subscriptable
+
+The lack of column information is a problem in the following situations:
+
+* When passing down multiple objects to function calls while
+  accessing the same attribute in them.
+  For instance, this error::
+
+    Traceback (most recent call last):
+      File "test.py", line 19, in <module>
+        foo(a.name, b.name, c.name)
+    AttributeError: 'NoneType' object has no attribute 'name'
+
+  With the improvements in this PEP, this would show::
+
+    Traceback (most recent call last):
+      File "test.py", line 19, in <module>
+        foo(a.name, b.name, c.name)
+                    ^^^^^^
+    AttributeError: 'NoneType' object has no attribute 'name'
+
+* When dealing with lines with complex mathematical expressions,
+  especially with libraries such as numpy where arithmetic
+  operations can fail based on the arguments. For example: ::
+
+    Traceback (most recent call last):
+      File "test.py", line 1, in <module>
+        x = (a + b) @ (c + d)
+    ValueError: operands could not be broadcast together with shapes (1,2) (2,3)
+
+  There is no clear indication as to which operation failed: was it the addition
+  on the left, the addition on the right, or the matrix multiplication in the
+  middle? With this PEP the new error message would look like::
+
+    Traceback (most recent call last):
+      File "test.py", line 1, in <module>
+        x = (a + b) @ (c + d)
+                       ^^^^^
+    ValueError: operands could not be broadcast together with shapes (1,2) (2,3)
+
+  This gives a much clearer error message that is easier to debug.
+
+
+Debugging aside, this extra information would also be useful for code
+coverage tools, enabling them to measure expression-level coverage instead of
+just line-level coverage. For instance, given the following line: ::
+
+    x = foo() if bar() else baz()
+
+coverage, profiling or state analysis tools will highlight the full line no
+matter which branch runs, making it impossible to tell which branch was taken.
+This is a known problem in pycoverage_.
+
+Similar efforts to this PEP have taken place in other languages such as Java in
+the form of JEP358_. ``NullPointerExceptions`` in Java were similarly nebulous when
+it came to lines with complicated expressions. A ``NullPointerException`` would
+provide very little aid in finding the root cause of an error. The
+implementation for JEP358 is fairly complex, requiring walking back through the
+bytecode by using a control flow graph analyzer and decompilation techniques to
+recover the source code that led to the null pointer. Although the complexity
+of this solution is high and requires maintenance for the decompiler every time
+Java bytecode is changed, this improvement was deemed to be worth it for the
+extra information provided for *just one exception type*.
+
+
+Rationale
+=========
+
+In order to identify the range of source code being executed when exceptions
+are raised, this proposal requires adding new data for every bytecode
+instruction. This will have an impact on the size of ``pyc`` files on disk and
+the size of code objects in memory. The authors of this proposal have chosen
+the data types in a way that tries to minimize this impact. The proposed
+overhead is storing two ``uint8_t`` (one for the start offset and one for the
+end offset) for every bytecode instruction.
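+
+As a rough illustration of this cost, the sketch below estimates the extra
+bytes needed for one code object and everything nested inside it. It assumes
+that every 2-byte unit of ``co_code`` counts as one instruction, which is a
+simplification, and the helper names ``count_code_units`` and
+``estimated_offset_overhead`` are hypothetical and not part of this proposal.
+

```python
import types


def count_code_units(code):
    """Count 2-byte bytecode units in a code object and all nested ones."""
    total = len(code.co_code) // 2
    for const in code.co_consts:
        # Functions, classes and comprehensions live in co_consts.
        if isinstance(const, types.CodeType):
            total += count_code_units(const)
    return total


def estimated_offset_overhead(code):
    """Extra bytes: two uint8_t values (start and end) per instruction."""
    return 2 * count_code_units(code)


source = "def f(x):\n    return x['a']['b']\n"
module_code = compile(source, "<example>", "exec")
overhead = estimated_offset_overhead(module_code)
print(f"estimated extra bytes: {overhead}")
```

+
+Under this assumption the extra data is roughly as large as the bytecode
+itself, which is why the choice of a one-byte type per offset matters.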
+
+As an illustrative example to gauge the impact of this change, we have
+calculated that this change will increase the size of the standard library’s
+pyc files by 22% (6MB) from 70MB to 76MB. The overhead in memory usage will be
+the same (assuming the *full standard library* is loaded into the same
+program). We believe that this overhead is acceptable because it is small
+relative to the storage and memory capacity of modern computers. Additionally,
+the memory usage of a Python program is generally not dominated by code
+objects. To check this
+assumption we have executed the test suite of several popular PyPI projects
+(including NumPy, pytest, Django and Cython) as well as several applications
+(Black, pylint, mypy executed over either mypy or the standard library) and we
+found that code objects normally represent 3-6% of the average memory usage of
+the program.
+
+We understand that the extra cost of this information may not be acceptable for
+some users, so we propose an opt-out mechanism when Python is executed in
+optimized mode (``python -O``), which will cause the ``pyc`` files generated in
+that mode to not include the extra information.
+
+
+Specification
+=============
+
+In order to have enough information to correctly resolve the location within a
+given line where an error was raised, a map linking bytecode instructions and
+column offsets (start and end offset) is needed. This is similar in fashion to
+how line numbers are currently linked to bytecode instructions.
+
+The following changes will be performed as part of the implementation of this PEP:
+
+* The offset information will be exposed to Python via a new attribute in the
+  code object class called ``co_col_offsets`` that will return a sequence of
+  two-element tuples (each containing a start offset and an end offset), or
+  ``None`` if the code object was created without the offset information.
+* Two new C-API functions, ``PyCode_Addr2StartOffset`` and
+  ``PyCode_Addr2EndOffset``, will be added that can obtain the start and end
+  offsets respectively given the index of a bytecode instruction. These
+  functions will return 0 if the offset information is not available.
+* A new private (underscore prefixed) C-API constructor for code objects will
+  be added that takes a bytes object containing the start offsets in the even
+  positions and the end offsets in the odd positions. Old constructors will be
+  left untouched for backwards compatibility and will create code objects
+  without the new field.
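+
+To make the layout of that bytes object concrete, here is a minimal sketch in
+pure Python. ``pack_offsets`` and ``unpack_offsets`` are hypothetical helpers
+used only for illustration; the proposal itself only describes the interleaved
+layout and the tuple-based view returned by ``co_col_offsets``.
+

```python
def pack_offsets(pairs):
    """Interleave (start, end) pairs: starts at even indexes, ends at odd ones."""
    packed = bytearray()
    for start, end in pairs:
        # bytes() rejects values outside 0..255, matching uint8_t storage.
        packed += bytes((start, end))
    return bytes(packed)


def unpack_offsets(packed):
    """Rebuild the sequence of (start, end) tuples from the packed bytes."""
    if len(packed) % 2:
        raise ValueError("expected one start and one end offset per instruction")
    return list(zip(packed[0::2], packed[1::2]))


pairs = [(1, 17), (5, 0), (0, 0)]
packed = pack_offsets(pairs)
print(unpack_offsets(packed))
```

+
+A pair of ``(0, 0)`` stands for an instruction with no offset information.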
+
+Offset semantics
+^^^^^^^^^^^^^^^^
+
+These offsets are propagated by the compiler from the ones stored currently in
+all AST nodes. They are 1-indexed and a value of 0 will mean that the
+information is not available. Although the AST nodes use ``int`` types to store
+these values, ``uint8_t`` types will be used for storage in the new map to
+minimize storage impact. This decision allows offsets to range from 1 to 255,
+while larger offsets will be treated as missing (value of 0).
+We believe this is an acceptable compromise as line lengths in Python tend to
+be much lower than this limit (a query of the top 100 packages in PyPI shows
+that less than 0.01% of lines were longer than 255 characters).
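+
+The truncation rule can be sketched as follows. ``to_stored_offset`` is a
+hypothetical name, and the input is assumed to be an offset that is already
+1-indexed.
+

```python
MAX_STORED_OFFSET = 255  # largest value that fits in a uint8_t


def to_stored_offset(offset):
    """Map a 1-indexed column offset to the value kept in the new map."""
    if 1 <= offset <= MAX_STORED_OFFSET:
        return offset
    return 0  # missing: unknown, or too large to be stored in one byte


print([to_stored_offset(n) for n in (0, 1, 255, 256)])
```
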
+
+Maintaining the current behavior, only a single line will be displayed in
+tracebacks. For instructions that span multiple lines (the end offset and the
+start offset belong to different lines), the end offset will be set to 0
+(meaning it is unavailable). If the start offset is not 0, this will be
+interpreted by the displaying code as if the range spans from the starting
+offset to the end of the line. The actual end offset cannot be calculated at
+compile time since the compiler does not know how many characters “the end of
+the line” actually represents.
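+
+The compile-time side of this rule can be illustrated with the ``ast`` module.
+This is only a sketch: ``stored_pair`` is a hypothetical helper, the conversion
+from the 0-indexed AST offsets to 1-indexed values is an assumption about how
+the compiler would do it, the end offset is assumed to point one past the last
+highlighted column, and the truncation to 255 is left out for brevity.
+

```python
import ast


def stored_pair(node):
    """Return the (start, end) pair that would be stored for an AST node."""
    start = node.col_offset + 1  # AST column offsets are 0-indexed
    if node.end_lineno != node.lineno:
        return start, 0  # the node spans several lines: end is unavailable
    return start, node.end_col_offset + 1


source = "total = compute(\n    a,\n    b,\n)\nx = y + z\n"
tree = ast.parse(source)
call = tree.body[0].value   # the call that spans four lines
binop = tree.body[1].value  # the expression y + z

call_pair = stored_pair(call)    # (9, 0)
binop_pair = stored_pair(binop)  # (5, 10)
print(call_pair, binop_pair)
```
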
+
+Displaying tracebacks
+^^^^^^^^^^^^^^^^^^^^^
+
+When displaying tracebacks, the default exception hook will be modified to
+query this information from the code objects and use it to display a sequence
+of carets for every displayed line in the traceback if the information is
+available. For instance::
+
+      File "test.py", line 6, in lel
+        return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
+                             ^^^^^^^^^^^^^^^^^^^^^
+    TypeError: 'NoneType' object is not subscriptable
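+
+A minimal sketch of how the displaying code could turn a pair of offsets into
+the caret line is shown here. ``render_frame_line`` is a hypothetical helper;
+it assumes 1-indexed offsets relative to the unstripped source line and an end
+offset that points one past the highlighted range.
+

```python
def render_frame_line(source_line, start, end, indent="    "):
    """Return the source line as shown in a traceback, plus carets if possible."""
    line = source_line.rstrip()
    text = line.lstrip()
    if start == 0:
        return indent + text  # no column information: keep the current output
    if end == 0:
        end = len(line) + 1  # unknown end: highlight up to the end of the line
    leading = len(line) - len(text)  # whitespace removed when displaying
    carets = " " * (start - 1 - leading) + "^" * (end - start)
    return f"{indent}{text}\n{indent}{carets}"


line = "    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)\n"
print(render_frame_line(line, 26, 47))
```

+
+With these offsets the carets land under ``x['z']['x']['y']['z']``, as in the
+example above.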
+
+When displaying tracebacks, instruction offsets will be taken from the
+traceback objects. This makes highlighting exceptions that are re-raised work
+naturally without the need to store the new information in the stack. For
+example, for this code::
+
+    def foo(x):
+        1 + 1/0 + 2
+
+    def bar(x):
+        try:
+            1 + foo(x) + foo(x)
+        except Exception as e:
+            raise ValueError("oh no!") from e
+
+    bar(bar(bar(2)))
+
+The printed traceback would look like this::
+
+    Traceback (most recent call last):
+      File "test.py", line 6, in bar
+        1 + foo(x) + foo(x)
+            ^^^^^^
+      File "test.py", line 2, in foo
+        1 + 1/0 + 2
+            ^^^
+    ZeroDivisionError: division by zero
+
+    The above exception was the direct cause of the following exception:
+
+    Traceback (most recent call last):
+      File "test.py", line 10, in <module>
+        bar(bar(bar(2)))
+                ^^^^^^
+      File "test.py", line 8, in bar
+        raise ValueError("oh no!") from e
+        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ValueError: oh no!
+
+While this code::
+
+    def foo(x):
+        1 + 1/0 + 2
+
+    def bar(x):
+        try:
+            1 + foo(x) + foo(x)
+        except Exception:
+            raise
+
+    bar(bar(bar(2)))
+
+will be displayed as::
+
+    Traceback (most recent call last):
+      File "test.py", line 10, in <module>
+        bar(bar(bar(2)))
+                ^^^^^^
+      File "test.py", line 6, in bar
+        1 + foo(x) + foo(x)
+            ^^^^^^
+      File "test.py", line 2, in foo
+        1 + 1/0 + 2
+            ^^^
+    ZeroDivisionError: division by zero
+
+
+Opt-out mechanism
+^^^^^^^^^^^^^^^^^
+
+To offer an opt-out mechanism for those users that care about the storage and
+memory overhead, the functionality will be deactivated along with the extra
+information when Python is executed in optimized mode (``python -O``). As a
+result, ``pyc`` files generated in that mode will not have the overhead
+associated with the extra required data.
+
+To allow third party tools and other programs that are currently parsing
+tracebacks to catch up and to allow users to deactivate the new feature, the
+following methods will be provided to deactivate displaying the new highlight
+carets (but not to avoid storing the data; users will need to use Python in
+optimized mode for that):
+
+* A new environment variable: ``PY_DEACTIVATE_TRACEBACK_RANGES``
+* A new command line option for the dev mode: ``python -Xnotracebackranges``.
+
+These flags will be removed in the next version of the Python interpreter
+(counting from the version that releases this feature).
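+
+A sketch of how the displaying code could honour these two switches follows.
+The function name ``traceback_ranges_enabled`` is hypothetical, and treating
+any non-empty value of the environment variable as a request to deactivate is
+an assumption, since the accepted values are not specified here. The sketch
+relies on the CPython-specific ``sys._xoptions`` mapping, which records ``-X``
+options.
+

```python
import os
import sys


def traceback_ranges_enabled(environ=None, xoptions=None):
    """Decide whether carets should be displayed (illustrative only)."""
    if environ is None:
        environ = os.environ
    if xoptions is None:
        # sys._xoptions is CPython-specific, so fall back to an empty mapping.
        xoptions = getattr(sys, "_xoptions", {})
    if environ.get("PY_DEACTIVATE_TRACEBACK_RANGES"):
        return False  # assumption: any non-empty value deactivates the carets
    if "notracebackranges" in xoptions:
        return False  # set by: python -Xnotracebackranges
    return True


print(traceback_ranges_enabled({}, {}))
```
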
+
+Backwards Compatibility
+=======================
+
+The change is fully backwards compatible.
+
+
+Reference Implementation
+========================
+
+A reference implementation can be found in the implementation_ fork.
+
+Rejected Ideas
+==============
+
+Include end line number
+^^^^^^^^^^^^^^^^^^^^^^^
+Some instructions can span across multiple lines and therefore the end offset
+and the start offset can be located in two different lines. We have decided to
+set the start offset to the correct value and the end offset to 0. This will
+result in highlighting the line from the starting offset up to its end. The
+reason behind this decision is that
+storing the end line will require us to store another field similar to
+``co_lnotab``, but our traceback machinery only highlights a single line
+per frame so this information would only be used to decide to highlight to the
+end of the line. On the other hand, the end line could be useful for other
+tools such as coverage-measuring tools and tracers.
+
+Have a configure flag to opt out
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Having a configure flag to opt out of the overhead even when executing Python
+in non-optimized mode may sound desirable, but it may cause problems when
+reading pyc files that were created with a version of the interpreter that was
+not compiled with the flag activated. This can lead to crashes that would be
+very difficult to debug for regular users and will make different pyc files
+incompatible with each other. As these pyc files could be shipped as part of
+libraries or applications without the original source, it is also not always
+possible to force recompilation of said pyc files. For these reasons we have
+decided to use the ``-O`` flag to opt out of this behaviour.
+
+Lazy loading of column information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+One potential solution to reduce the memory usage of this feature is to not
+load the column information from the pyc file when code is imported. Only if an
+uncaught exception bubbles up or if a call to the C-API functions is made will
+the column information be loaded from the pyc file. This is similar to how we
+only read source lines to display them in the traceback when an exception
+bubbles up. While this would indeed lower memory usage, it also results in a
+far more complex implementation requiring changes to the importing machinery to
+selectively ignore a part of the code object. We consider this an interesting
+avenue to explore but ultimately we think it is out of scope for this particular
+PEP. It also means that column information will not be available if the user is
+not using pyc files or for code objects created dynamically at runtime.
+
+Implement compression
+^^^^^^^^^^^^^^^^^^^^^
+Although it would be possible to implement some form of compression over the
+pyc files and the new data in code objects, we believe that this is out of the
+scope of this proposal due to its larger impact (in the case of pyc files) and
+the fact that we expect column offsets to not compress well due to the lack of
+patterns in them (in the case of the new data in code objects).
+
+Acknowledgments
+===============
+Thanks to Carl Friedrich Bolz-Tereick for showing an initial prototype of this
+idea for the PyPy interpreter and for the helpful discussion.
+
+
+References
+==========
+
+.. _JEP358: https://openjdk.java.net/jeps/358
+.. _implementation: https://github.com/colnotab/cpython/tree/bpo-43950
+.. _pycoverage: https://github.com/nedbat/coveragepy/issues/509
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: