python-peps/pep-0674.rst

423 lines
14 KiB
ReStructuredText

PEP: 674
Title: Disallow using macros as l-value
Author: Victor Stinner <vstinner@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Dec-2021
Python-Version: 3.11
Abstract
========
Incompatible C API change disallowing using macros as l-value to:
* Allow evolving CPython internals;
* Ease the C API implementation on other Python implementation;
* Help migrating existing C extensions to the HPy API.
On the PyPI top 5000 projects, only 14 projects (0.3%) are affected by 4
macro changes. Moreover, 24 projects just have to regenerate their
Cython code to use ``Py_SET_TYPE()``.
In practice, the majority of affected projects only have to make two
changes:
* Replace ``Py_TYPE(obj) = new_type;``
with ``Py_SET_TYPE(obj, new_type);``.
* Replace ``Py_SIZE(obj) = new_size;``
with ``Py_SET_SIZE(obj, new_size);``.
Rationale
=========
Using a macro as a l-value
--------------------------
In the Python C API, some functions are implemented as macro because
writing a macro is simpler than writing a regular function. If a macro
exposes directly a structure member, it is technically possible to use
this macro to not only get the structure member but also set it.
Example with the Python 3.10 ``Py_TYPE()`` macro::
#define Py_TYPE(ob) (((PyObject *)(ob))->ob_type)
This macro can be used as a **r-value** to **get** an object type::
type = Py_TYPE(object);
It can also be used as **l-value** to **set** an object type::
Py_TYPE(object) = new_type;
It is also possible to set an object reference count and an object size
using ``Py_REFCNT()`` and ``Py_SIZE()`` macros.
Setting directly an object attribute relies on the current exact CPython
implementation. Implementing this feature in other Python
implementations can make their C API implementation less efficient.
CPython nogil fork
------------------
Sam Gross forked Python 3.9 to remove the GIL: the `nogil branch
<https://github.com/colesbury/nogil/>`_. This fork has no
``PyObject.ob_refcnt`` member, but a more elaborated implementation for
reference counting, and so the ``Py_REFCNT(obj) = new_refcnt;`` code
fails with a compiler error.
Merging the nogil fork into the upstream CPython main branch requires
first to fix this C API compatibility issue. It is a concrete example of
a Python optimization blocked indirectly by the C API.
This issue was already fixed in Python 3.10: the ``Py_REFCNT()`` macro
has been already modified to disallow using it as a l-value.
These statements are endorsed by Sam Gross (nogil developer).
HPy project
-----------
The `HPy project <https://hpyproject.org/>`_ is a brand new C API for
Python using only handles and function calls: handles are opaque,
structure members cannot be accessed directly, and pointers cannot be
dereferenced.
Searching and replacing ``Py_SET_SIZE()`` is easier and safer than
searching and replacing some strange macro uses of ``Py_SIZE()``.
``Py_SIZE()`` can be semi-mechanically replaced by ``HPy_Length()``,
whereas seeing ``Py_SET_SIZE()`` would immediately make clear that the
code needs bigger changes in order to be ported to HPy (for example by
using ``HPyTupleBuilder`` or ``HPyListBuilder``).
The fewer internal details exposed via macros, the easier it will be for
HPy to provide direct equivalents. Any macro that references
"non-public" interfaces effectively exposes those interfaces publicly.
These statements are endorsed by Antonio Cuni (HPy developer).
GraalVM Python
--------------
In GraalVM, when a Python object is accessed by the Python C API, the C API
emulation layer has to wrap the GraalVM objects into wrappers that expose
the internal structure of the CPython structures (PyObject, PyLongObject,
PyTypeObject, etc). This is because when the C code accesses it directly or via
macros, all GraalVM can intercept is a read at the struct offset, which has
to be mapped back to the representation in GraalVM. The smaller the
"effective" number of exposed struct members (by replacing macros with
functions), the simpler GraalVM wrappers can be.
This PEP alone is not enough to get rid of the wrappers in GraalVM, but it
is a step towards this long term goal. GraalVM already supports HPy which is a better
solution in the long term.
These statements are endorsed by Tim Felgentreff (GraalVM Python developer).
Specification
=============
Disallow using macros as l-value
--------------------------------
PyObject and PyVarObject macros
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``Py_TYPE()``: ``Py_SET_TYPE()`` must be used instead
* ``Py_SIZE()``: ``Py_SET_SIZE()`` must be used instead
"GET" macros
^^^^^^^^^^^^
* ``PyByteArray_GET_SIZE()``
* ``PyBytes_GET_SIZE()``
* ``PyCFunction_GET_CLASS()``
* ``PyCFunction_GET_FLAGS()``
* ``PyCFunction_GET_FUNCTION()``
* ``PyCFunction_GET_SELF()``
* ``PyCell_GET()``
* ``PyCode_GetNumFree()``
* ``PyDict_GET_SIZE()``
* ``PyFunction_GET_ANNOTATIONS()``
* ``PyFunction_GET_CLOSURE()``
* ``PyFunction_GET_CODE()``
* ``PyFunction_GET_DEFAULTS()``
* ``PyFunction_GET_GLOBALS()``
* ``PyFunction_GET_KW_DEFAULTS()``
* ``PyFunction_GET_MODULE()``
* ``PyHeapType_GET_MEMBERS()``
* ``PyInstanceMethod_GET_FUNCTION()``
* ``PyList_GET_SIZE()``
* ``PyMemoryView_GET_BASE()``
* ``PyMemoryView_GET_BUFFER()``
* ``PyMethod_GET_FUNCTION()``
* ``PyMethod_GET_SELF()``
* ``PySet_GET_SIZE()``
* ``PyTuple_GET_SIZE()``
* ``PyUnicode_GET_DATA_SIZE()``
* ``PyUnicode_GET_LENGTH()``
* ``PyUnicode_GET_LENGTH()``
* ``PyUnicode_GET_SIZE()``
* ``PyWeakref_GET_OBJECT()``
"AS" macros
^^^^^^^^^^^
* ``PyByteArray_AS_STRING()``
* ``PyBytes_AS_STRING()``
* ``PyFloat_AS_DOUBLE()``
* ``PyUnicode_AS_DATA()``
* ``PyUnicode_AS_UNICODE()``
PyUnicode macros
^^^^^^^^^^^^^^^^
* ``PyUnicode_1BYTE_DATA()``
* ``PyUnicode_2BYTE_DATA()``
* ``PyUnicode_4BYTE_DATA()``
* ``PyUnicode_DATA()``
* ``PyUnicode_IS_ASCII()``
* ``PyUnicode_IS_COMPACT()``
* ``PyUnicode_IS_READY()``
* ``PyUnicode_KIND()``
* ``PyUnicode_READ()``
* ``PyUnicode_READ_CHAR()``
PyDateTime "GET" macros
^^^^^^^^^^^^^^^^^^^^^^^
* ``PyDateTime_DATE_GET_FOLD()``
* ``PyDateTime_DATE_GET_HOUR()``
* ``PyDateTime_DATE_GET_MICROSECOND()``
* ``PyDateTime_DATE_GET_MINUTE()``
* ``PyDateTime_DATE_GET_SECOND()``
* ``PyDateTime_DATE_GET_TZINFO()``
* ``PyDateTime_DELTA_GET_DAYS()``
* ``PyDateTime_DELTA_GET_MICROSECONDS()``
* ``PyDateTime_DELTA_GET_SECONDS()``
* ``PyDateTime_GET_DAY()``
* ``PyDateTime_GET_MONTH()``
* ``PyDateTime_GET_YEAR()``
* ``PyDateTime_TIME_GET_FOLD()``
* ``PyDateTime_TIME_GET_HOUR()``
* ``PyDateTime_TIME_GET_MICROSECOND()``
* ``PyDateTime_TIME_GET_MINUTE()``
* ``PyDateTime_TIME_GET_SECOND()``
* ``PyDateTime_TIME_GET_TZINFO()``
PyDescr macros
^^^^^^^^^^^^^^
* ``PyDescr_NAME()``
* ``PyDescr_TYPE()``
Port C extensions to Python 3.11
--------------------------------
In practice, the majority of projects affected by these PEP only have to
make two changes:
* Replace ``Py_TYPE(obj) = new_type;``
with ``Py_SET_TYPE(obj, new_type);``.
* Replace ``Py_SIZE(obj) = new_size;``
with ``Py_SET_SIZE(obj, new_size);``.
The `pythoncapi_compat project
<https://github.com/pythoncapi/pythoncapi_compat>`_ can be used to
update automatically C extensions: add Python 3.11 support without
losing support with older Python versions. The project provides a header
file which provides ``Py_SET_REFCNT()``, ``Py_SET_TYPE()`` and
``Py_SET_SIZE()`` functions to Python 3.8 and older.
PyTuple_GET_ITEM() and PyList_GET_ITEM()
----------------------------------------
The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are left
unchanged.
The code pattern ``&PyTuple_GET_ITEM(tuple, 0)`` and
``&PyList_GET_ITEM(list, 0)`` is still commonly used to get access to
the inner ``PyObject**`` array.
Changing these macros is out of the scope of this PEP.
Backwards Compatibility
=======================
The proposed C API changes are backward incompatible on purpose.
At December 1, 2021, a code search on the PyPI top 5000 projects (4760
projects in practice, others don't have a source archive) found that
`only 14 projects are affected
<https://bugs.python.org/issue45476#msg407456>`_ (0.3%):
* datatable (1.0.0)
* frozendict (2.1.1)
* guppy3 (3.1.2)
* M2Crypto (0.38.0)
* mecab-python3 (1.0.4)
* mypy (0.910)
* Naked (0.1.31)
* pickle5 (0.0.12)
* pysha3 (1.0.2)
* python-snappy (0.6.0)
* recordclass (0.16.3)
* scipy (1.7.3)
* zodbpickle (2.2.0)
* zstd (1.5.0.2)
These 14 projects only use 4 macros as l-value:
* ``PyDescr_NAME()`` and ``PyDescr_TYPE()`` (2 projects)
* ``Py_SIZE()`` (8 projects)
* ``Py_TYPE()`` (4 projects)
Moreover, `24 projects just have to regenerate their Cython code
<https://bugs.python.org/issue45476#msg407416>`_ to use
``Py_SET_TYPE()``.
This change does not follow the PEP 387 deprecation process. There is no
known way to emit a deprecation warning only when a macro is used as a
l-value, but not when it's used differently (ex: as a r-value).
Relationship with the HPy project
=================================
The HPy project
---------------
The hope with the HPy project is indeed to provide a C API that is close
to the original API to make porting easy and have it perform as close to
the existing API as possible. At the same time, HPy is sufficently
removed to be a good "C extension API" (as opposed to a stable subset of
the CPython implementation API) that does not leak implementation
details. To ensure this latter property is why the HPy project tries to
develop everything in parallel for CPython, PyPy, and GraalVM Python.
HPy is still evolving very fast. Issues are still being solved while
migrating NumPy, and work has begun on adding support for HPy to Cython. Work on
pybind11 is starting soon. Tim Felgentreff believes by the time HPy has
these users of the existing C API working, HPy should be in a state
where it is generally useful and can be deemed stable enough that
further development can follow a more stable process.
In the long run the HPy project would like to become a promoted API to
write Python C extensions.
The HPy project is a good solution for the long term. It has the
advantage of being developed outside Python and it doesn't require any C
API change.
The C API is here is stay for a few more years
----------------------------------------------
The first concern about HPy is that right now, HPy is not mature nor
widely used, and CPython still has to continue supporting a large amount
of C extensions which are not likely to be ported to HPy soon.
The second concern is the inability to evolve CPython internals to
implement new optimizations, and the inefficient implementation of the
current C API in PyPy, GraalPython, etc. Sadly, HPy will only solve
these problems when most C extensions will be fully ported to HPy:
when it will become reasonable to consider dropping the "legacy" Python
C API.
While porting a C extension to HPy can be done incrementally on CPython,
it requires to modify a lot of code and takes time. Porting most C
extensions to HPy is expected to take a few years.
This PEP proposes to make the C API "less bad" by fixing one problem
which is clearily identified as causing practical issues: macros used as
l-values. This PEP only requires to update a minority of C extensions,
and usually only a few lines need to be changed in impacted extensions.
For example, numpy 1.22 is made of 307,300 lines of C code, and adapting
numpy to the this PEP only modified 11 lines (use Py_SET_TYPE and
Py_SET_SIZE) and adding 4 lines (to define Py_SET_TYPE and Py_SET_SIZE
for Python 3.8 and older). Porting numpy to HPy is expected to require
modifying more lines than that.
Right now, it's hard to bet which approach is the best: fixing the
current C API, or focusing on HPy. It would be risky to only focus on
HPy.
Rejected Idea: Leave the macros as they are
===========================================
The documentation of each function can discourage developers to use
macros to modify Python objects.
If these is a need to make an assignment, a setter function can be added
and the macro documentation can require to use the setter function. For
example, a ``Py_SET_TYPE()`` function has been added to Python 3.9 and
the ``Py_TYPE()`` documentation now requires to use the
``Py_SET_TYPE()`` function to set an object type.
If developers use macros as l-value, it's their responsibility when
their code breaks, not the Python responsibility. We are operating under
the consenting adults principle: we expect users of the Python C API to
use it as documented and expect them to take care of the fallout, if
things break when they don't.
This idea was rejected because only few developers read the
documentation, and only a minority is tracking changes of the Python C
API documentation. The majority of developers are only using CPython and
so are not aware of compatibility issues with other Python
implementations.
Moreover, continuing to allow using macros as l-value does not help the
HPy project and leaves the burden of emulating them on GraalVM's Python
implementation.
Macros already modified
=======================
The following C API macros have already been modified to disallow using
them as l-value:
* ``PyCell_SET()``
* ``PyList_SET_ITEM()``
* ``PyTuple_SET_ITEM()``
* ``Py_REFCNT()`` (Python 3.10): ``Py_SET_REFCNT()`` must be used
* ``_PyGCHead_SET_FINALIZED()``
* ``_PyGCHead_SET_NEXT()``
* ``asdl_seq_GET()``
* ``asdl_seq_GET_UNTYPED()``
* ``asdl_seq_LEN()``
* ``asdl_seq_SET()``
* ``asdl_seq_SET_UNTYPED()``
For example, ``PyList_SET_ITEM(list, 0, item) < 0`` now fails with a
compiler error as expected.
References
==========
* `Python C API: Add functions to access PyObject
<https://vstinner.github.io/c-api-abstract-pyobject.html>`_ (October
2021) article by Victor Stinner
* `[C API] Disallow using PyFloat_AS_DOUBLE() as l-value
<https://bugs.python.org/issue45476>`_
(October 2021)
* `[capi-sig] Py_TYPE() and Py_SIZE() become static inline functions
<https://mail.python.org/archives/list/capi-sig@python.org/thread/WGRLTHTHC32DQTACPPX36TPR2GLJAFRB/>`_
(September 2021)
* `[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type
<https://bugs.python.org/issue39573>`__ (February 2020)
* `bpo-30459: PyList_SET_ITEM could be safer
<https://bugs.python.org/issue30459>`_ (May 2017)
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.