PEP 670: Add benchmarks (#2156)
* Add benchmarks.
* Elaborate the Debug Build section
parent 855dc06e8a
commit 570cea56c2
172
pep-0670.rst
|
@ -60,13 +60,13 @@ The `GCC documentation
|
||||||
<https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ lists several
|
<https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ lists several
|
||||||
common macro pitfalls:
|
common macro pitfalls:
|
||||||
|
|
||||||
- Misnesting
|
- Misnesting;
|
||||||
- Operator precedence problems
|
- Operator precedence problems;
|
||||||
- Swallowing the semicolon
|
- Swallowing the semicolon;
|
||||||
- Duplication of side effects
|
- Duplication of side effects;
|
||||||
- Self-referential macros
|
- Self-referential macros;
|
||||||
- Argument prescan
|
- Argument prescan;
|
||||||
- Newlines in arguments
|
- Newlines in arguments.
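
Two of the pitfalls listed above (duplication of side effects and operator precedence problems) can be illustrated with a minimal C sketch. All names here (``SQUARE_MACRO``, ``DOUBLE_MACRO``, ``next_value``, ``square_func``, ``check_pitfalls``) are hypothetical and chosen only for illustration; none of them is part of the Python C API.

```c
#include <assert.h>

/* Hypothetical macro: its argument is expanded, and so evaluated, twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Hypothetical macro without parentheses: operator precedence pitfall. */
#define DOUBLE_MACRO(x) x + x

/* Number of calls to next_value(), to make side effects visible. */
static int calls = 0;

static int next_value(void)
{
    calls++;
    return 3;
}

/* Function equivalent of SQUARE_MACRO: the argument is evaluated once. */
static inline int square_func(int x)
{
    return x * x;
}

/* Exercise both pitfalls; every assertion below holds. */
static void check_pitfalls(void)
{
    calls = 0;
    assert(SQUARE_MACRO(next_value()) == 9);
    assert(calls == 2);             /* side effect duplicated */

    calls = 0;
    assert(square_func(next_value()) == 9);
    assert(calls == 1);             /* side effect happens once */

    /* Expands to 2 + 2 * 3, not (2 + 2) * 3. */
    assert(DOUBLE_MACRO(2) * 3 == 8);
}
```

The function version has neither problem: its argument is a regular C expression evaluated once before the call, so no extra parentheses are needed.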
|
||||||
|
|
||||||
|
|
||||||
Performance and inlining
|
Performance and inlining
|
||||||
|
@ -77,19 +77,39 @@ compilers have efficient heuristics to decide if a function should be
|
||||||
inlined or not.
|
inlined or not.
|
||||||
|
|
||||||
When a C compiler decides to not inline, there is likely a good reason.
|
When a C compiler decides to not inline, there is likely a good reason.
|
||||||
For example, inlining would reuse a register which require to
|
For example, inlining would reuse a register, which requires
|
||||||
save/restore the register value on the stack and so increase the stack
|
saving/restoring the register value on the stack and so increases the stack
|
||||||
memory usage or be less efficient.
|
memory usage, or would be less efficient.
|
||||||
|
|
||||||
|
|
||||||
Debug build
|
Debug build
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
When Python is built in debug mode, most compiler optimizations are
|
Benchmarks must not be run on a Python debug build, only on a release
|
||||||
disabled. For example, Visual Studio disables inlining. Benchmarks must
|
build. Moreover, using LTO and PGO optimizations is recommended for best
|
||||||
not be run on a Python debug build, only on release build: using LTO and
|
performance and reliable benchmarks. PGO helps the compiler to decide
|
||||||
PGO is recommended for reliable benchmarks. PGO helps the compiler to
|
if a function should be inlined or not.
|
||||||
decide if function should be inlined or not.
|
|
||||||
|
``./configure --with-pydebug`` uses the ``-Og`` compiler option if it's
|
||||||
|
supported by the compiler (GCC and LLVM clang support it): optimize
|
||||||
|
debugging experience. Otherwise, the ``-O0`` compiler option is used:
|
||||||
|
disable most optimizations.
|
||||||
|
|
||||||
|
With GCC 11, ``gcc -Og`` can inline static inline functions, whereas
|
||||||
|
``gcc -O0`` does not inline static inline functions. Examples:
|
||||||
|
|
||||||
|
* Call ``Py_INCREF()`` in ``PyBool_FromLong()``:
|
||||||
|
|
||||||
|
* ``gcc -Og``: inlined
|
||||||
|
* ``gcc -O0``: not inlined, call ``Py_INCREF()`` function
|
||||||
|
|
||||||
|
* Call ``_PyErr_Occurred()`` in ``_Py_CheckFunctionResult()``:
|
||||||
|
|
||||||
|
* ``gcc -Og``: inlined
|
||||||
|
* ``gcc -O0``: not inlined, call ``_PyErr_Occurred()`` function
|
||||||
|
|
||||||
|
On Windows, when Python is built in debug mode by Visual Studio, static
|
||||||
|
inline functions are not inlined.
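
The difference between ``gcc -O0`` and ``gcc -Og`` can be checked on a small file. The sketch below is only an approximation of the real code: ``my_object``, ``my_incref`` and ``my_new_ref`` are hypothetical stand-ins for ``PyObject``, ``Py_INCREF()`` and a caller such as ``PyBool_FromLong()``.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct {
    long refcnt;
} my_object;

/* Hypothetical stand-in for Py_INCREF() as a static inline function. */
static inline void my_incref(my_object *op)
{
    op->refcnt++;
}

/* Caller, loosely modeled on how PyBool_FromLong() calls Py_INCREF(). */
my_object *my_new_ref(my_object *op)
{
    assert(op != NULL);
    my_incref(op);
    return op;
}
```

Assuming the file is saved as ``example.c``, the generated assembly can be compared with ``gcc -O0 -S example.c -o -`` and ``gcc -Og -S example.c -o -``. Based on the GCC 11 observations above, a call to ``my_incref`` would be expected only in the ``-O0`` output, but the result may differ with other compilers or versions.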
|
||||||
|
|
||||||
|
|
||||||
Force inlining
|
Force inlining
|
||||||
|
@ -154,6 +174,11 @@ functions should be measured with benchmarks. If there is a significant
|
||||||
slowdown, there should be a good reason to do the conversion. One reason
|
slowdown, there should be a good reason to do the conversion. One reason
|
||||||
can be hiding implementation details.
|
can be hiding implementation details.
|
||||||
|
|
||||||
|
To avoid any risk of performance slowdown on Python built without LTO,
|
||||||
|
it is possible to keep a private static inline function in the internal
|
||||||
|
C API and use it in Python, but expose a regular function in the public
|
||||||
|
C API.
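
A minimal sketch of this layout, assuming a hypothetical ``demo_object`` type and hypothetical function names (``demo_get_refcnt_inline`` for the internal C API, ``demo_get_refcnt`` for the public C API), could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type, standing in for PyObject. */
typedef struct {
    long refcnt;
} demo_object;

/* "Internal C API" (hypothetical): private static inline function,
 * used inside Python itself so that hot paths avoid a function call. */
static inline long demo_get_refcnt_inline(const demo_object *op)
{
    return op->refcnt;
}

/* "Public C API" (hypothetical): regular function built on top of the
 * inline one. Callers only see its declaration, not its body. */
long demo_get_refcnt(const demo_object *op)
{
    assert(op != NULL);
    return demo_get_refcnt_inline(op);
}
```

Code inside the interpreter would call ``demo_get_refcnt_inline()`` directly, while extension code would only see ``demo_get_refcnt()``, so implementation details stay hidden from the public API.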
|
||||||
|
|
||||||
Using static inline functions in the internal C API is fine: the
|
Using static inline functions in the internal C API is fine: the
|
||||||
internal C API exposes implementation details by design and should not be
|
internal C API exposes implementation details by design and should not be
|
||||||
used outside Python.
|
used outside Python.
|
||||||
|
@ -164,8 +189,8 @@ Cast to PyObject*
|
||||||
When a macro is converted to a function and the macro casts its
|
When a macro is converted to a function and the macro casts its
|
||||||
arguments to ``PyObject*``, the new function comes with a new macro
|
arguments to ``PyObject*``, the new function comes with a new macro
|
||||||
which casts arguments to ``PyObject*`` to prevent emitting new compiler
|
which casts arguments to ``PyObject*`` to prevent emitting new compiler
|
||||||
warnings. So the converted functions still accept pointers to structures
|
warnings. So the converted functions still accept pointers to other
|
||||||
inheriting from ``PyObject`` (ex: ``PyTupleObject``).
|
structures inheriting from ``PyObject`` (ex: ``PyTupleObject``).
|
||||||
|
|
||||||
For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
|
For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
|
||||||
``PyObject*``::
|
``PyObject*``::
|
||||||
|
@ -224,9 +249,47 @@ the macro.
|
||||||
People using macros should be considered "consenting adults". People who
|
People using macros should be considered "consenting adults". People who
|
||||||
feel unsafe with macros should simply not use them.
|
feel unsafe with macros should simply not use them.
|
||||||
|
|
||||||
|
The idea was rejected because macros are error-prone and it is too easy
|
||||||
|
to miss a macro pitfall when writing a macro. Moreover, macros are
|
||||||
|
harder to read and to maintain than functions.
|
||||||
|
|
||||||
|
|
||||||
Examples of hard to read macros
|
Examples of hard to read macros
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
|
PyObject_INIT()
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Example showing the usage of commas in a macro which has a return value.
|
||||||
|
|
||||||
|
Python 3.7 macro::
|
||||||
|
|
||||||
|
#define PyObject_INIT(op, typeobj) \
|
||||||
|
( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
|
||||||
|
|
||||||
|
Python 3.8 function (simplified code)::
|
||||||
|
|
||||||
|
static inline PyObject*
|
||||||
|
_PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
|
||||||
|
{
|
||||||
|
Py_TYPE(op) = typeobj;
|
||||||
|
_Py_NewReference(op);
|
||||||
|
return op;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define PyObject_INIT(op, typeobj) \
|
||||||
|
_PyObject_INIT(_PyObject_CAST(op), (typeobj))
|
||||||
|
|
||||||
|
* The function doesn't need the line continuation character ``"\"``.
|
||||||
|
* It has an explicit ``"return op;"`` rather than the surprising
|
||||||
|
``", (op)"`` syntax at the end of the macro.
|
||||||
|
* It uses short statements on multiple lines, rather than being written
|
||||||
|
as a single long line.
|
||||||
|
* Inside the function, the *op* argument has the well defined type
|
||||||
|
``PyObject*`` and so doesn't need casts like ``(PyObject *)(op)``.
|
||||||
|
* Arguments don't need to be put inside parentheses: use ``typeobj``,
|
||||||
|
rather than ``(typeobj)``.
|
||||||
|
|
||||||
_Py_NewReference()
|
_Py_NewReference()
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
|
@ -254,35 +317,6 @@ Python 3.8 function (simplified code)::
|
||||||
Py_REFCNT(op) = 1;
|
Py_REFCNT(op) = 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
PyObject_INIT()
|
|
||||||
---------------
|
|
||||||
|
|
||||||
Example showing the usage of commas in a macro.
|
|
||||||
|
|
||||||
Python 3.7 macro::
|
|
||||||
|
|
||||||
#define PyObject_INIT(op, typeobj) \
|
|
||||||
( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
|
|
||||||
|
|
||||||
Python 3.8 function (simplified code)::
|
|
||||||
|
|
||||||
static inline PyObject*
|
|
||||||
_PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
|
|
||||||
{
|
|
||||||
Py_TYPE(op) = typeobj;
|
|
||||||
_Py_NewReference(op);
|
|
||||||
return op;
|
|
||||||
}
|
|
||||||
|
|
||||||
#define PyObject_INIT(op, typeobj) \
|
|
||||||
_PyObject_INIT(_PyObject_CAST(op), (typeobj))
|
|
||||||
|
|
||||||
The function doesn't need the line continuation character. It has an
|
|
||||||
explicit ``"return op;"`` rather than a surprising ``", (op)"`` at the
|
|
||||||
end of the macro. It uses one short statement per line, rather than a
|
|
||||||
single long line. Inside the function, the *op* argument has a well
|
|
||||||
defined type: ``PyObject*``.
|
|
||||||
|
|
||||||
|
|
||||||
Macros converted to functions since Python 3.8
|
Macros converted to functions since Python 3.8
|
||||||
==============================================
|
==============================================
|
||||||
|
@ -346,6 +380,52 @@ private static inline function has been added to the internal C API:
|
||||||
* ``_PyVectorcall_FunctionInline()``
|
* ``_PyVectorcall_FunctionInline()``
|
||||||
|
|
||||||
|
|
||||||
|
Benchmarks
|
||||||
|
==========
|
||||||
|
|
||||||
|
Benchmarks were run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
|
||||||
|
logical CPUs (4 physical CPU cores).
|
||||||
|
|
||||||
|
|
||||||
|
gcc -O0 versus gcc -Og
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
Benchmark of the ``./python -m test -j10`` command on a Python debug
|
||||||
|
build:
|
||||||
|
|
||||||
|
* ``gcc -Og``: 220 sec ± 3 sec
|
||||||
|
* ``gcc -O0``: 360 sec ± 6 sec
|
||||||
|
|
||||||
|
Python built with ``gcc -O0`` is **1.6x slower** than Python built with
|
||||||
|
``gcc -Og``.
|
||||||
|
|
||||||
|
Replace macros with static inline functions
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
The `PR 29728 <https://github.com/python/cpython/pull/29728>`_ replaces
|
||||||
|
the following existing static inline functions with macros:
|
||||||
|
|
||||||
|
* ``PyObject_TypeCheck()``
|
||||||
|
* ``PyType_Check()``, ``PyType_CheckExact()``
|
||||||
|
* ``PyType_HasFeature()``
|
||||||
|
* ``PyVectorcall_NARGS()``
|
||||||
|
* ``Py_DECREF()``, ``Py_XDECREF()``
|
||||||
|
* ``Py_INCREF()``, ``Py_XINCREF()``
|
||||||
|
* ``Py_IS_TYPE()``
|
||||||
|
* ``Py_NewRef()``
|
||||||
|
* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``
|
||||||
|
|
||||||
|
Benchmark of the ``./python -m test -j10`` command on a Python debug
|
||||||
|
build:
|
||||||
|
|
||||||
|
* Macros (PR 29728), ``gcc -O0``: 345 sec ± 5 sec
|
||||||
|
* Static inline functions (reference), ``gcc -O0``: 360 sec ± 6 sec
|
||||||
|
|
||||||
|
Replacing macros with static inline functions makes Python
|
||||||
|
**1.04x slower** when the compiler **does not inline** static inline
|
||||||
|
functions.
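
The slowdown factors quoted in this section are ratios of the mean timings, ignoring the ± uncertainty. The sketch below reproduces that arithmetic; the ``slowdown`` helper is a hypothetical name used only for this illustration.

```c
#include <assert.h>

/* Slowdown factor of a slower run relative to a faster run.
 * Both timings are in seconds; fast_sec must be positive. */
static double slowdown(double slow_sec, double fast_sec)
{
    assert(fast_sec > 0.0);
    return slow_sec / fast_sec;
}
```

With the timings above, ``slowdown(360.0, 220.0)`` is about 1.64 (``gcc -O0`` versus ``gcc -Og``) and ``slowdown(360.0, 345.0)`` is about 1.04 (static inline functions versus macros, both with ``gcc -O0``).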
|
||||||
|
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
|
|