PEP 670: Add benchmarks (#2156)
* Add benchmarks.
* Elaborate the Debug Build section
parent 855dc06e8a
commit 570cea56c2
172
pep-0670.rst
@@ -60,13 +60,13 @@ The `GCC documentation
 <https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html>`_ lists several
 common macro pitfalls:

-- Misnesting
-- Operator precedence problems
-- Swallowing the semicolon
-- Duplication of side effects
-- Self-referential macros
-- Argument prescan
-- Newlines in arguments
+- Misnesting;
+- Operator precedence problems;
+- Swallowing the semicolon;
+- Duplication of side effects;
+- Self-referential macros;
+- Argument prescan;
+- Newlines in arguments.
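
To make one of the listed pitfalls concrete, duplication of side effects, here is a minimal sketch comparing a macro with a static inline function. The names ``MAX_MACRO``, ``max_func``, ``increments_with_macro`` and ``increments_with_function`` are hypothetical and chosen only for illustration; they are not part of CPython.

```c
/* Hypothetical example, not CPython code. */

/* The macro pastes its argument text, so "a" can be evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* The function receives values: each argument is evaluated exactly once. */
static inline int max_func(int a, int b)
{
    return a > b ? a : b;
}

/* Returns how many times "i++" ran when passed to the macro. */
int increments_with_macro(void)
{
    int i = 5;
    /* Expands to ((i++) > (3) ? (i++) : (3)): "i++" runs twice. */
    int result = MAX_MACRO(i++, 3);
    (void)result;
    return i - 5;
}

/* Returns how many times "i++" ran when passed to the function. */
int increments_with_function(void)
{
    int i = 5;
    int result = max_func(i++, 3);
    (void)result;
    return i - 5;
}
```

With the macro, the counter is incremented twice by what looks like a single call; with the function, it is incremented once.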


 Performance and inlining
@@ -77,19 +77,39 @@ compilers have efficient heuristics to decide if a function should be
 inlined or not.

 When a C compiler decides to not inline, there is likely a good reason.
-For example, inlining would reuse a register which require to
-save/restore the register value on the stack and so increase the stack
-memory usage or be less efficient.
+For example, inlining would reuse a register, which requires saving and
+restoring the register value on the stack, and so increases stack
+memory usage or is less efficient.


 Debug build
 -----------

-When Python is built in debug mode, most compiler optimizations are
-disabled. For example, Visual Studio disables inlining. Benchmarks must
-not be run on a Python debug build, only on release build: using LTO and
-PGO is recommended for reliable benchmarks. PGO helps the compiler to
-decide if function should be inlined or not.
+Benchmarks must not be run on a Python debug build, only on a release
+build. Moreover, using LTO and PGO optimizations is recommended for the
+best performance and reliable benchmarks. PGO helps the compiler decide
+whether a function should be inlined.
+
+``./configure --with-pydebug`` uses the ``-Og`` compiler option if the
+compiler supports it (GCC and LLVM clang do), which optimizes the
+debugging experience. Otherwise, it uses the ``-O0`` compiler option,
+which disables most optimizations.
+
+With GCC 11, ``gcc -Og`` can inline static inline functions, whereas
+``gcc -O0`` does not inline static inline functions. Examples:
+
+* Call ``Py_INCREF()`` in ``PyBool_FromLong()``:
+
+  * ``gcc -Og``: inlined
+  * ``gcc -O0``: not inlined, call ``Py_INCREF()`` function
+
+* Call ``_PyErr_Occurred()`` in ``_Py_CheckFunctionResult()``:
+
+  * ``gcc -Og``: inlined
+  * ``gcc -O0``: not inlined, call ``_PyErr_Occurred()`` function
+
+On Windows, when Python is built in debug mode by Visual Studio, static
+inline functions are not inlined.
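
One way to observe the inlining difference described here on a small scale is to compile a file like the sketch below at both optimization levels and compare the generated assembly. The ``counter_t`` type and the ``counter_incr()`` and ``bump_twice()`` functions are hypothetical stand-ins, not CPython code. Whether the call is actually inlined depends on the compiler and its version.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object carrying a reference count. */
typedef struct {
    ptrdiff_t refcnt;
} counter_t;

/* Small static inline function, similar in spirit to Py_INCREF(). */
static inline void counter_incr(counter_t *c)
{
    c->refcnt++;
}

/* The caller returns the same value whether or not the compiler inlines
   counter_incr(); only the generated machine code differs. */
ptrdiff_t bump_twice(counter_t *c)
{
    counter_incr(c);
    counter_incr(c);
    return c->refcnt;
}
```

Assuming the file is saved as ``example.c`` (a hypothetical name), ``gcc -O0 -S example.c`` and ``gcc -Og -S example.c`` each write ``example.s``. An explicit call to ``counter_incr`` in that file means the function was not inlined.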


 Force inlining
@@ -154,6 +174,11 @@ functions should be measured with benchmarks. If there is a significant
 slowdown, there should be a good reason to do the conversion. One reason
 can be hiding implementation details.

+To avoid any risk of performance slowdown on Python built without LTO,
+it is possible to keep a private static inline function in the internal
+C API and use it in Python, but expose a regular function in the public
+C API.
+
 Using static inline functions in the internal C API is fine: the
 internal C API exposes implementation details by design and should not be
 used outside Python.
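
The split between a private static inline function and a public regular function can be sketched as follows. This is only an illustration under assumed names: ``myobj_t``, ``myobj_get_size_inline()``, ``myobj_get_size()`` and ``total_size()`` are hypothetical and do not exist in CPython.

```c
#include <stddef.h>

/* Hypothetical object layout: an implementation detail. */
typedef struct {
    size_t size;
} myobj_t;

/* "Internal C API" (would live in an internal header): a static inline
   function that reads the structure member directly. */
static inline size_t myobj_get_size_inline(const myobj_t *obj)
{
    return obj->size;
}

/* "Public C API": a regular function. Only its declaration would be
   public, so callers do not depend on the structure layout. */
size_t myobj_get_size(const myobj_t *obj)
{
    return myobj_get_size_inline(obj);
}

/* Code inside the project itself can call the inline variant directly,
   so it does not rely on LTO to avoid function call overhead. */
size_t total_size(const myobj_t *objs, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += myobj_get_size_inline(&objs[i]);
    }
    return total;
}
```

Both entry points return the same value. The difference is only in what is exposed to external code and in how much the compiler can optimize without LTO.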
@@ -164,8 +189,8 @@ Cast to PyObject*
 When a macro is converted to a function and the macro casts its
 arguments to ``PyObject*``, the new function comes with a new macro
 which casts arguments to ``PyObject*`` to prevent emitting new compiler
-warnings. So the converted functions still accept pointers to structures
-inheriting from ``PyObject`` (ex: ``PyTupleObject``).
+warnings. So the converted functions still accept pointers to other
+structures inheriting from ``PyObject`` (for example, ``PyTupleObject``).

 For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
 ``PyObject*``::
@@ -224,9 +249,47 @@ the macro.
 People using macros should be considered "consenting adults". People who
 feel unsafe with macros should simply not use them.

+The idea was rejected because macros are error-prone and it is too easy
+to miss a macro pitfall when writing a macro. Moreover, macros are
+harder to read and to maintain than functions.
+
+
 Examples of hard to read macros
 ===============================

+PyObject_INIT()
+---------------
+
+Example showing the usage of commas in a macro which has a return value.
+
+Python 3.7 macro::
+
+    #define PyObject_INIT(op, typeobj) \
+        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
+
+Python 3.8 function (simplified code)::
+
+    static inline PyObject*
+    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
+    {
+        Py_TYPE(op) = typeobj;
+        _Py_NewReference(op);
+        return op;
+    }
+
+    #define PyObject_INIT(op, typeobj) \
+        _PyObject_INIT(_PyObject_CAST(op), (typeobj))
+
+* The function doesn't need the line continuation character ``"\"``.
+* It has an explicit ``"return op;"`` rather than the surprising
+  ``", (op)"`` syntax at the end of the macro.
+* It uses short statements on multiple lines, rather than being written
+  as a single long line.
+* Inside the function, the *op* argument has the well-defined type
+  ``PyObject*`` and so doesn't need casts like ``(PyObject *)(op)``.
+* Arguments don't need to be put inside parentheses: use ``typeobj``,
+  rather than ``(typeobj)``.
+
 _Py_NewReference()
 ------------------

@@ -254,35 +317,6 @@ Python 3.8 function (simplified code)::
         Py_REFCNT(op) = 1;
     }

-PyObject_INIT()
----------------
-
-Example showing the usage of commas in a macro.
-
-Python 3.7 macro::
-
-    #define PyObject_INIT(op, typeobj) \
-        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
-
-Python 3.8 function (simplified code)::
-
-    static inline PyObject*
-    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
-    {
-        Py_TYPE(op) = typeobj;
-        _Py_NewReference(op);
-        return op;
-    }
-
-    #define PyObject_INIT(op, typeobj) \
-        _PyObject_INIT(_PyObject_CAST(op), (typeobj))
-
-The function doesn't need the line continuation character. It has an
-explicit ``"return op;"`` rather than a surprising ``", (op)"`` at the
-end of the macro. It uses one short statement per line, rather than a
-single long line. Inside the function, the *op* argument has a well
-defined type: ``PyObject*``.
-

 Macros converted to functions since Python 3.8
 ==============================================
@@ -346,6 +380,52 @@ private static inline function has been added to the internal C API:
 * ``_PyVectorcall_FunctionInline()``


+Benchmarks
+==========
+
+Benchmarks were run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
+logical CPUs (4 physical CPU cores).
+
+
+gcc -O0 versus gcc -Og
+----------------------
+
+Benchmark of the ``./python -m test -j10`` command on a Python debug
+build:
+
+* ``gcc -Og``: 220 sec ± 3 sec
+* ``gcc -O0``: 360 sec ± 6 sec
+
+Python built with ``gcc -O0`` is **1.6x slower** than Python built with
+``gcc -Og``.
+
+Replace macros with static inline functions
+-------------------------------------------
+
+`PR 29728 <https://github.com/python/cpython/pull/29728>`_ replaces the
+following existing static inline functions with macros:
+
+* ``PyObject_TypeCheck()``
+* ``PyType_Check()``, ``PyType_CheckExact()``
+* ``PyType_HasFeature()``
+* ``PyVectorcall_NARGS()``
+* ``Py_DECREF()``, ``Py_XDECREF()``
+* ``Py_INCREF()``, ``Py_XINCREF()``
+* ``Py_IS_TYPE()``
+* ``Py_NewRef()``
+* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``
+
+Benchmark of the ``./python -m test -j10`` command on a Python debug
+build:
+
+* Macros (PR 29728), ``gcc -O0``: 345 sec ± 5 sec
+* Static inline functions (reference), ``gcc -O0``: 360 sec ± 6 sec
+
+Replacing macros with static inline functions makes Python
+**1.04x slower** when the compiler **does not inline** static inline
+functions.
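
The slowdown factors quoted in this section are plain ratios of the mean timings. The sketch below recomputes them; the ``slowdown()`` and ``print_slowdowns()`` helpers are hypothetical names introduced only for this illustration, and the ± deviations are not propagated.

```c
#include <stdio.h>

/* How many times slower a run taking slow_sec is than one taking fast_sec. */
double slowdown(double slow_sec, double fast_sec)
{
    return slow_sec / fast_sec;
}

/* Print both ratios using the mean timings reported in this section. */
void print_slowdowns(void)
{
    /* gcc -O0 (360 sec) compared with gcc -Og (220 sec) */
    printf("-O0 vs -Og: %.2fx\n", slowdown(360.0, 220.0));
    /* static inline functions (360 sec) compared with macros (345 sec),
       both built with gcc -O0 */
    printf("functions vs macros: %.2fx\n", slowdown(360.0, 345.0));
}
```

The first ratio is about 1.64, reported as 1.6x, and the second is about 1.04.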
+
+
 References
 ==========
