PEP 670: Add benchmarks (#2156)

* Add benchmarks.
* Elaborate the Debug Build section
This commit is contained in:
Victor Stinner 2021-11-23 17:30:43 +01:00 committed by GitHub
parent 855dc06e8a
commit 570cea56c2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 126 additions and 46 deletions

View File

@ -60,13 +60,13 @@ The `GCC documentation
<https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ lists several
common macro pitfalls:
- Misnesting
- Operator precedence problems
- Swallowing the semicolon
- Duplication of side effects
- Self-referential macros
- Argument prescan
- Newlines in arguments
- Misnesting;
- Operator precedence problems;
- Swallowing the semicolon;
- Duplication of side effects;
- Self-referential macros;
- Argument prescan;
- Newlines in arguments.
Performance and inlining
@ -77,19 +77,39 @@ compilers have efficient heuristics to decide if a function should be
inlined or not.
When a C compiler decides to not inline, there is likely a good reason.
For example, inlining would reuse a register which require to
save/restore the register value on the stack and so increase the stack
memory usage or be less efficient.
For example, inlining would reuse a register which requires to
save/restore the register value on the stack and so increases the stack
memory usage, or be less efficient.
Debug build
-----------
When Python is built in debug mode, most compiler optimizations are
disabled. For example, Visual Studio disables inlining. Benchmarks must
not be run on a Python debug build, only on release build: using LTO and
PGO is recommended for reliable benchmarks. PGO helps the compiler to
decide if function should be inlined or not.
Benchmarks must not be run on a Python debug build, only on release
build. Moreover, using LTO and PGO optimizations is recommended for best
performances and reliable benchmarks. PGO helps the compiler to decide
if function should be inlined or not.
``./configure --with-pydebug`` uses the ``-Og`` compiler option if it's
supported by the compiler (GCC and LLVM clang support it): optimize
debugging experience. Otherwise, the ``-O0`` compiler option is used:
disable most optimizations.
With GCC 11, ``gcc -Og`` can inline static inline functions, whereas
``gcc -O0`` does not inline static inline functions. Examples:
* Call ``Py_INCREF()`` in ``PyBool_FromLong()``:
* ``gcc -Og``: inlined
* ``gcc -O0``: not inlined, call ``Py_INCREF()`` function
* Call ``_PyErr_Occurred()`` in ``_Py_CheckFunctionResult()``:
* ``gcc -Og``: inlined
* ``gcc -O0``: not inlined, call ``_PyErr_Occurred()`` function
On Windows, when Python is built in debug mode by Visual Studio, static
inline functions are not inlined.
Force inlining
@ -154,6 +174,11 @@ functions should be measured with benchmarks. If there is a significant
slowdown, there should be a good reason to do the conversion. One reason
can be hiding implementation details.
To avoid any risk of performance slowdown on Python built without LTO,
it is possible to keep a private static inline function in the internal
C API and use it in Python, but expose a regular function in the public
C API.
Using static inline functions in the internal C API is fine: the
internal C API exposes implementation details by design and should not be
used outside Python.
@ -164,8 +189,8 @@ Cast to PyObject*
When a macro is converted to a function and the macro casts its
arguments to ``PyObject*``, the new function comes with a new macro
which cast arguments to ``PyObject*`` to prevent emitting new compiler
warnings. So the converted functions still accept pointers to structures
inheriting from ``PyObject`` (ex: ``PyTupleObject``).
warnings. So the converted functions still accept pointers to other
structures inheriting from ``PyObject`` (ex: ``PyTupleObject``).
For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
``PyObject*``::
@ -224,9 +249,47 @@ the macro.
People using macros should be considered "consenting adults". People who
feel unsafe with macros should simply not use them.
The idea was rejected because macros are error prone and it is too easy
to miss a macro pitfall when writing a macro. Moreover, macros are
harder to read and to maintain than functions.
Examples of hard to read macros
===============================
PyObject_INIT()
---------------
Example showing the usage of commas in a macro which has a return value.
Python 3.7 macro::
#define PyObject_INIT(op, typeobj) \
( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
Python 3.8 function (simplified code)::
static inline PyObject*
_PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
{
Py_TYPE(op) = typeobj;
_Py_NewReference(op);
return op;
}
#define PyObject_INIT(op, typeobj) \
_PyObject_INIT(_PyObject_CAST(op), (typeobj))
* The function doesn't need the line continuation character ``"\"``.
* It has an explicit ``"return op;"`` rather than the surprising
``", (op)"`` syntax at the end of the macro.
* It uses short statements on multiple lines, rather than being written
as a single long line.
* Inside the function, the *op* argument has the well defined type
``PyObject*`` and so doesn't need casts like ``(PyObject *)(op)``.
* Arguments don't need to be put inside parenthesis: use ``typeobj``,
rather than ``(typeobj)``.
_Py_NewReference()
------------------
@ -254,35 +317,6 @@ Python 3.8 function (simplified code)::
Py_REFCNT(op) = 1;
}
PyObject_INIT()
---------------
Example showing the usage of commas in a macro.
Python 3.7 macro::
#define PyObject_INIT(op, typeobj) \
( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
Python 3.8 function (simplified code)::
static inline PyObject*
_PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
{
Py_TYPE(op) = typeobj;
_Py_NewReference(op);
return op;
}
#define PyObject_INIT(op, typeobj) \
_PyObject_INIT(_PyObject_CAST(op), (typeobj))
The function doesn't need the line continuation character. It has an
explicit ``"return op;"`` rather than a surprising ``", (op)"`` at the
end of the macro. It uses one short statement per line, rather than a
single long line. Inside the function, the *op* argument has a well
defined type: ``PyObject*``.
Macros converted to functions since Python 3.8
==============================================
@ -346,6 +380,52 @@ private static inline function has been added to the internal C API:
* ``_PyVectorcall_FunctionInline()``
Benchmarks
==========
Benchmarks run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
logical CPUs (4 physical CPU cores).
gcc -O0 versus gcc -Og
----------------------
Benchmark of the ``./python -m test -j10`` command on a Python debug
build:
* ``gcc -Og``: 220 sec ± 3 sec
* ``gcc -O0``: 360 sec ± 6 sec
Python built with ``gcc -O0`` is **1.6x slower** than Python built with
``gcc -Og``.
Replace macros with static inline functions
-------------------------------------------
The `PR 29728 <https://github.com/python/cpython/pull/29728>`_ replaces
existing the following static inline functions with macros:
* ``PyObject_TypeCheck()``
* ``PyType_Check()``, ``PyType_CheckExact()``
* ``PyType_HasFeature()``
* ``PyVectorcall_NARGS()``
* ``Py_DECREF()``, ``Py_XDECREF()``
* ``Py_INCREF()``, ``Py_XINCREF()``
* ``Py_IS_TYPE()``
* ``Py_NewRef()``
* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``
Benchmark of the ``./python -m test -j10`` command on a Python debug
build:
* Macros (PR 29728), ``gcc -O0``: 345 sec ± 5 sec
* Static inline functions (reference), ``gcc -O0``: 360 sec ± 6 sec
Replacing macros with static inline functions makes Python
**1.04x slower** when the compiler **does not inline** static inline
functions.
References
==========