PEP 657: Update the public API and the opt-out mechanism (#1959)
parent
bcf1f22b20
commit
27c75f67c4
135
pep-0657.rst
|
@ -17,16 +17,17 @@ Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
This PEP proposes adding a mapping from each bytecode instruction to the start
|
This PEP proposes adding a mapping from each bytecode instruction to the start
|
||||||
and end column offsets of the line that generated them. This data will be used
|
and end column offsets of the line that generated them as well as the end line
|
||||||
to improve tracebacks displayed by the CPython interpreter in order to improve
|
number. This data will be used to improve tracebacks displayed by the CPython
|
||||||
the debugging experience. The PEP also proposes adding APIs that allow other
|
interpreter in order to improve the debugging experience. The PEP also proposes
|
||||||
tools (such as coverage analysis tools, profilers, tracers, debuggers) to
|
adding APIs that allow other tools (such as coverage analysis tools, profilers,
|
||||||
consume this information from code objects.
|
tracers, debuggers) to consume this information from code objects.
|
||||||
|
|
||||||
Motivation
|
Motivation
|
||||||
==========
|
==========
|
||||||
|
|
||||||
The primary motivation for this PEP is to improve the feedback presented about the location of errors to aid with debugging.
|
The primary motivation for this PEP is to improve the feedback presented about
|
||||||
|
the location of errors to aid with debugging.
|
||||||
|
|
||||||
Python currently keeps a mapping of bytecode to line numbers from compilation.
|
Python currently keeps a mapping of bytecode to line numbers from compilation.
|
||||||
The interpreter uses this mapping to point to the source line associated with
|
The interpreter uses this mapping to point to the source line associated with
|
||||||
|
@ -150,51 +151,55 @@ instruction. This will have an impact on the size of ``pyc`` files on disk and
|
||||||
the size of code objects in memory. The authors of this proposal have chosen
|
the size of code objects in memory. The authors of this proposal have chosen
|
||||||
the data types in a way that tries to minimize this impact. The proposed
|
the data types in a way that tries to minimize this impact. The proposed
|
||||||
overhead is storing two ``uint8_t`` (one for the start offset and one for the
|
overhead is storing two ``uint8_t`` (one for the start offset and one for the
|
||||||
end offset) for every bytecode instruction.
|
end offset) and the end line information for every bytecode instruction (in
|
||||||
|
the same encoded fashion as the start line is stored currently).
|
||||||
|
|
||||||
As an illustrative example to gauge the impact of this change, we have
|
As an illustrative example to gauge the impact of this change, we have
|
||||||
calculated that this change will increase the size of the standard library’s
|
calculated that including the start and end offsets will increase the size of
|
||||||
pyc files by 22% (6MB) from 28.4MB to 34.7MB. The overhead in memory usage will be
|
the standard library’s pyc files by 22% (6MB) from 28.4MB to 34.7MB. The
|
||||||
the same (assuming the *full standard library* is loaded into the same
|
overhead in memory usage will be the same (assuming the *full standard library*
|
||||||
program). We believe that this is a very acceptable number since the order of
|
is loaded into the same program). We believe that this is a very acceptable
|
||||||
magnitude of the overhead is very small, especially considering the storage
|
number since the order of magnitude of the overhead is very small, especially
|
||||||
size and memory capabilities of modern computers. Additionally, in general the
|
considering the storage size and memory capabilities of modern computers.
|
||||||
memory size of a Python program is not dominated by code objects. To check this
|
Additionally, in general the memory size of a Python program is not dominated
|
||||||
assumption we have executed the test suite of several popular PyPI projects
|
by code objects. To check this assumption we have executed the test suite of
|
||||||
(including NumPy, pytest, Django and Cython) as well as several applications
|
several popular PyPI projects (including NumPy, pytest, Django and Cython) as
|
||||||
(Black, pylint, mypy executed over either mypy or the standard library) and we
|
well as several applications (Black, pylint, mypy executed over either mypy or
|
||||||
found that code objects represent normally 3-6% of the average memory size of
|
the standard library) and we found that code objects normally represent 3-6% of
|
||||||
the program.
|
the average memory size of the program.
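
As a rough check of the figures quoted above, the following sketch recomputes the relative growth of the ``pyc`` files and combines it with the measured 3-6% share of code objects. It assumes, as the text does, that code objects in memory grow at the same rate as the files on disk; the function names are hypothetical and exist only for illustration.

```python
OLD_SIZE_MB = 28.4  # standard library pyc files before the change (from the text)
NEW_SIZE_MB = 34.7  # standard library pyc files after the change (from the text)


def relative_growth(old, new):
    """Return the growth from old to new as a fraction of old."""
    return (new - old) / old


def whole_program_growth(code_share, code_growth):
    """Estimate the growth of a whole program's memory.

    code_share: fraction of program memory taken by code objects.
    code_growth: fractional growth of the code objects themselves.
    Assumption: everything except code objects stays the same size.
    """
    return code_share * code_growth


pyc_growth = relative_growth(OLD_SIZE_MB, NEW_SIZE_MB)
print(f"pyc growth: {100 * pyc_growth:.1f}%")

for share in (0.03, 0.06):
    estimate = whole_program_growth(share, pyc_growth)
    print(f"code objects at {share:.0%} of memory: about {100 * estimate:.2f}% growth")
```

Under these assumptions the whole-program increase stays close to one percent, which is the reasoning behind describing the overhead as small.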
|
||||||
|
|
||||||
We understand that the extra cost of this information may not be acceptable for
|
We understand that the extra cost of this information may not be acceptable for
|
||||||
some users, so we propose an opt-out mechanism when Python is executed in
|
some users, so we propose an opt-out mechanism which will cause generated code
|
||||||
"opt-2" optimized mode (``python -OO``), which will cause pyc files to not include
|
objects to not have the extra information while also allowing pyc files to not
|
||||||
the extra information.
|
include the extra information.
|
||||||
|
|
||||||
|
|
||||||
Specification
|
Specification
|
||||||
=============
|
=============
|
||||||
|
|
||||||
In order to have enough information to correctly resolve the location within a
|
In order to have enough information to correctly resolve the location
|
||||||
given line where an error was raised, a map linking bytecode instructions and
|
within a given line where an error was raised, a map linking bytecode
|
||||||
column offsets (start and end offset) is needed. This is similar in fashion to
|
instructions to column offsets (start and end offset) and end line numbers
|
||||||
how line numbers are currently linked to bytecode instructions.
|
is needed. This is similar in fashion to how line numbers are currently linked
|
||||||
|
to bytecode instructions.
|
||||||
|
|
||||||
The following changes will be performed as part of the implementation of this PEP:
|
The following changes will be performed as part of the implementation of
|
||||||
|
this PEP:
|
||||||
|
|
||||||
* The offset information will be exposed to Python via a new attribute in the
|
* The offset information will be exposed to Python via a new attribute in the
|
||||||
code object class called ``co_col_offsets`` that will return a sequence of
|
code object class called ``co_positions`` that will return a sequence of
|
||||||
two-element tuples (containing the start offsets and end offsets) or None if
|
four-element tuples containing the full location of every instruction
|
||||||
the code object was created without the offset information.
|
(including start line, end line, start column offset and end column offset)
|
||||||
* Two new C-API functions, ``PyCode_Addr2StartOffset`` and
|
or ``None`` if the code object was created without the offset information.
|
||||||
``PyCode_Addr2EndOffset`` will be added that can obtain the start and end
|
* Three new C-API functions, ``PyCode_Addr2EndLine``, ``PyCode_Addr2StartOffset``
|
||||||
offsets respectively given the index of a bytecode instruction. These
|
and ``PyCode_Addr2EndOffset`` will be added that can obtain the end line, the
|
||||||
functions will return 0 if the offset information is not available.
|
  start column offset and the end column offset respectively, given the index
|
||||||
* A new private (underscore prefixed) C-API constructor for code objects will
|
of a bytecode instruction. These functions will return 0 if the information
|
||||||
be added that takes a bytes object containing the start offsets in the even
|
is not available.
|
||||||
position and the end offsets in the odd positions. Old constructors will be
|
|
||||||
left untouched for backwards compatibility and will create code objects
|
The internal storage, compression and encoding of the information is left as an
|
||||||
without the new field.
|
implementation detail and can be changed at any point as long as the public API
|
||||||
|
remains unchanged.
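
A minimal sketch of how a tool might consume this information from Python. It rests on one assumption that differs from the wording above: in the CPython releases that ship this feature (3.11 and later) ``co_positions`` is a method returning an iterable, not a plain attribute. The helper ``instruction_positions`` is a hypothetical name; it accepts either form and returns an empty list when the data is absent.

```python
def instruction_positions(code):
    """Return a list of (start_line, end_line, start_col, end_col) tuples.

    Handles both a plain attribute (as described in the text) and a
    method (as in released CPython). Returns [] if the information is
    missing or the interpreter predates the feature.
    """
    positions = getattr(code, "co_positions", None)
    if positions is None:
        return []
    if callable(positions):
        positions = positions()
    return [tuple(entry) for entry in positions]


def divide(a, b):
    return a / b


# One tuple per bytecode instruction; individual fields may be None.
for position in instruction_positions(divide.__code__):
    print(position)
```

Because the storage format is an implementation detail, a consumer written this way depends only on the tuples it receives, not on how they are encoded inside the code object.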
|
||||||
|
|
||||||
Offset semantics
|
Offset semantics
|
||||||
^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^
|
||||||
|
@ -209,14 +214,12 @@ We believe this is an acceptable compromise as line lengths in Python tend to
|
||||||
be much lower than this limit (a query of the top 100 packages in PyPI shows
|
be much lower than this limit (a query of the top 100 packages in PyPI shows
|
||||||
that less than 0.01% of lines were longer than 255 characters).
|
that less than 0.01% of lines were longer than 255 characters).
|
||||||
|
|
||||||
Maintaining the current behavior, only a single line will be displayed in
|
As specified previously, the underlying storage of the offsets should be
|
||||||
tracebacks. For instructions that span multiple lines (the end offset and the
|
considered an implementation detail, as the public APIs to obtain these values
|
||||||
start offset belong to different lines), the end offset will be set to 0
|
will return either C ``int`` types or Python ``int`` objects, which makes it possible to
|
||||||
(meaning it is unavailable). If the start offset is not 0, this will be
|
implement better compression/encoding in the future if bigger ranges would need
|
||||||
interpreted by the displaying code as if the range spans from the starting
|
to be supported. This PEP proposes to start with this simpler version and
|
||||||
offset to the end of the line. The actual end offset cannot be calculated at
|
defer improvements to future work.
|
||||||
compile time since the compiler does not know how many characters “the end of
|
|
||||||
the line” actually represents.
|
|
||||||
|
|
||||||
Displaying tracebacks
|
Displaying tracebacks
|
||||||
^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
@ -294,27 +297,37 @@ Will be displayed as::
|
||||||
^^^
|
^^^
|
||||||
ZeroDivisionError: division by zero
|
ZeroDivisionError: division by zero
|
||||||
|
|
||||||
|
Maintaining the current behavior, only a single line will be displayed
|
||||||
|
in tracebacks. For instructions that span multiple lines (the end offset
|
||||||
|
and the start offset belong to different lines), the end line number must
|
||||||
|
be inspected to know if the end offset applies to the same line as the
|
||||||
|
starting offset.
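
The check described above can be sketched as follows. This is an illustration, not the interpreter's display code: the function name ``caret_marker`` is hypothetical, extending the highlight to the end of the displayed line for multi-line instructions is an assumption, and offsets are treated as character indices, which only holds for ASCII source.

```python
def caret_marker(source_line, start_line, end_line, start_col, end_col):
    """Build the caret line shown under source_line, or '' if none applies.

    Only the line where the instruction starts is displayed, so the end
    offset is used as-is only when it refers to that same line.
    """
    if None in (start_line, end_line, start_col, end_col):
        # Location information is missing for this instruction.
        return ""
    if end_line != start_line:
        # Assumption: the instruction continues on a later line, so the
        # highlight runs to the end of the displayed line.
        end_col = len(source_line)
    end_col = min(end_col, len(source_line))
    if end_col <= start_col:
        return ""
    return " " * start_col + "^" * (end_col - start_col)


line = "    return a / b"
print(line)
print(caret_marker(line, 2, 2, 11, 16))
```

Without the end line number, an end offset taken from a later line would be applied to the first line and underline the wrong columns.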
|
||||||
|
|
||||||
Opt-out mechanism
|
Opt-out mechanism
|
||||||
^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
To offer an opt-out mechanism for those users that care about the storage and
|
To offer an opt-out mechanism for those users that care about the
|
||||||
memory overhead, the functionality will be deactivated along with the extra
|
storage and memory overhead and to allow third party tools and other
|
||||||
information when Python is executed in "opt-2" optimized mode (``python -OO``)
|
programs that are currently parsing tracebacks to catch up, the following
|
||||||
resulting in ``pyc`` files not having the overhead associated with the extra
|
methods will be provided to deactivate this feature:
|
||||||
required data.
|
|
||||||
|
|
||||||
To allow third party tools and other programs that are currently parsing
|
* A new environment variable: ``PYNODEBUGRANGES``.
|
||||||
tracebacks to catch up and to allow users to deactivate the new feature, the
|
* A new command line option for the dev mode: ``python -Xnodebugranges``.
|
||||||
following methods will be provided to deactivate displaying the new highlight
|
|
||||||
carets (but not to avoid storing the data; users will need to use Python in
|
|
||||||
"opt-2" optimized mode for that):
|
|
||||||
|
|
||||||
* A new environment variable: ``PY_DEACTIVATE_TRACEBACK_RANGES``
|
If any of these methods are used, the Python compiler will **not** populate
|
||||||
* A new command line option for the dev mode: ``python -Xnotracebackranges``.
|
code objects with the new information (``None`` will be used instead) and any
|
||||||
|
unmarshalled code objects that contain the extra information will have it stripped
|
||||||
|
away and replaced with ``None``. This method allows users to:
|
||||||
|
|
||||||
These flags will be removed in the next version of the Python interpreter
|
* Create smaller ``pyc`` files by using one of the two methods when said files
|
||||||
(counting from the version that releases this feature).
|
are created.
|
||||||
|
* Don't load the extra information from ``pyc`` files if those were created with
|
||||||
|
the extra information in the first place.
|
||||||
|
|
||||||
|
Doing this has a **very small** performance hit as the interpreter state needs
|
||||||
|
to be fetched when code objects are created to look up the configuration.
|
||||||
|
Creating code objects is not a performance sensitive operation so this should
|
||||||
|
not be a concern.
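
A sketch of exercising the opt-out from a parent process. Note the assumption: it uses the spellings found in released CPython (3.11 and later), ``-X no_debug_ranges`` and ``PYTHONNODEBUGRANGES``, which differ from the names proposed in this revision of the text. The helper ``run_child`` and the child script are hypothetical, and on older interpreters the child simply reports that the feature is unavailable.

```python
import os
import subprocess
import sys

# Script run in the child interpreter: print the positions of a small function.
CHILD_SCRIPT = """\
def f(a, b):
    return a / b

get = getattr(f.__code__, "co_positions", None)
print(list(get()) if callable(get) else "co_positions not available")
"""


def run_child(extra_args, extra_env):
    """Run CHILD_SCRIPT in a fresh interpreter and return its output."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("default:     ", run_child([], {}))
print("command line:", run_child(["-X", "no_debug_ranges"], {}))
print("environment: ", run_child([], {"PYTHONNODEBUGRANGES": "1"}))
```

Comparing the three outputs shows which fields the interpreter omits when the feature is deactivated; the exact values are left unstated here because they depend on the interpreter version.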
|
||||||
|
|
||||||
Backwards Compatibility
|
Backwards Compatibility
|
||||||
=======================
|
=======================
|
||||||
|
|