448 lines
15 KiB
ReStructuredText
448 lines
15 KiB
ReStructuredText
PEP: 674
|
|
Title: Disallow using macros as an l-value
|
|
Author: Victor Stinner <vstinner@python.org>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 30-Nov-2021
|
|
Python-Version: 3.11
|
|
|
|
Abstract
|
|
========
|
|
|
|
Incompatible C API change disallowing using macros as an l-value to:
|
|
|
|
* Allow evolving CPython internals;
|
|
* Ease the C API implementation on other Python implementation;
|
|
* Help migrating existing C extensions to the HPy API.
|
|
|
|
Only 9 out of the top 5000 PyPI projects (0.2%) are affected by 2 macro
|
|
changes. An additional 22 projects just have to regenerate their Cython
|
|
code.
|
|
|
|
In practice, the majority of affected projects only have to make two
|
|
changes:
|
|
|
|
* Replace ``Py_TYPE(obj) = new_type;``
|
|
with ``Py_SET_TYPE(obj, new_type);``.
|
|
* Replace ``Py_SIZE(obj) = new_size;``
|
|
with ``Py_SET_SIZE(obj, new_size);``.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Using a macro as a an l-value
|
|
-----------------------------
|
|
|
|
In the Python C API, some functions are implemented as macro because
|
|
writing a macro is simpler than writing a regular function. If a macro
|
|
exposes directly a structure member, it is technically possible to use
|
|
this macro to not only get the structure member but also set it.
|
|
|
|
Example with the Python 3.10 ``Py_TYPE()`` macro::
|
|
|
|
#define Py_TYPE(ob) (((PyObject *)(ob))->ob_type)
|
|
|
|
This macro can be used as a **r-value** to **get** an object type::
|
|
|
|
type = Py_TYPE(object);
|
|
|
|
It can also be used as an **l-value** to **set** an object type::
|
|
|
|
Py_TYPE(object) = new_type;
|
|
|
|
It is also possible to set an object reference count and an object size
|
|
using ``Py_REFCNT()`` and ``Py_SIZE()`` macros.
|
|
|
|
Setting directly an object attribute relies on the current exact CPython
|
|
implementation. Implementing this feature in other Python
|
|
implementations can make their C API implementation less efficient.
|
|
|
|
CPython nogil fork
|
|
------------------
|
|
|
|
Sam Gross forked Python 3.9 to remove the GIL: the `nogil branch
|
|
<https://github.com/colesbury/nogil/>`_. This fork has no
|
|
``PyObject.ob_refcnt`` member, but a more elaborated implementation for
|
|
reference counting, and so the ``Py_REFCNT(obj) = new_refcnt;`` code
|
|
fails with a compiler error.
|
|
|
|
Merging the nogil fork into the upstream CPython main branch requires
|
|
first to fix this C API compatibility issue. It is a concrete example of
|
|
a Python optimization blocked indirectly by the C API.
|
|
|
|
This issue was already fixed in Python 3.10: the ``Py_REFCNT()`` macro
|
|
has been already modified to disallow using it as an l-value.
|
|
|
|
These statements are endorsed by Sam Gross (nogil developer).
|
|
|
|
HPy project
|
|
-----------
|
|
|
|
The `HPy project <https://hpyproject.org/>`_ is a brand new C API for
|
|
Python using only handles and function calls: handles are opaque,
|
|
structure members cannot be accessed directly, and pointers cannot be
|
|
dereferenced.
|
|
|
|
Searching and replacing ``Py_SET_SIZE()`` is easier and safer than
|
|
searching and replacing some strange macro uses of ``Py_SIZE()``.
|
|
``Py_SIZE()`` can be semi-mechanically replaced by ``HPy_Length()``,
|
|
whereas seeing ``Py_SET_SIZE()`` would immediately make clear that the
|
|
code needs bigger changes in order to be ported to HPy (for example by
|
|
using ``HPyTupleBuilder`` or ``HPyListBuilder``).
|
|
|
|
The fewer internal details exposed via macros, the easier it will be for
|
|
HPy to provide direct equivalents. Any macro that references
|
|
"non-public" interfaces effectively exposes those interfaces publicly.
|
|
|
|
These statements are endorsed by Antonio Cuni (HPy developer).
|
|
|
|
GraalVM Python
|
|
--------------
|
|
|
|
In GraalVM, when a Python object is accessed by the Python C API, the C API
|
|
emulation layer has to wrap the GraalVM objects into wrappers that expose
|
|
the internal structure of the CPython structures (PyObject, PyLongObject,
|
|
PyTypeObject, etc). This is because when the C code accesses it directly or via
|
|
macros, all GraalVM can intercept is a read at the struct offset, which has
|
|
to be mapped back to the representation in GraalVM. The smaller the
|
|
"effective" number of exposed struct members (by replacing macros with
|
|
functions), the simpler GraalVM wrappers can be.
|
|
|
|
This PEP alone is not enough to get rid of the wrappers in GraalVM, but it
|
|
is a step towards this long term goal. GraalVM already supports HPy which is a better
|
|
solution in the long term.
|
|
|
|
These statements are endorsed by Tim Felgentreff (GraalVM Python developer).
|
|
|
|
Specification
|
|
=============
|
|
|
|
Disallow using macros as an l-value
|
|
-----------------------------------
|
|
|
|
PyObject and PyVarObject macros
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
* ``Py_TYPE()``: ``Py_SET_TYPE()`` must be used instead
|
|
* ``Py_SIZE()``: ``Py_SET_SIZE()`` must be used instead
|
|
|
|
"GET" macros
|
|
^^^^^^^^^^^^
|
|
|
|
* ``PyByteArray_GET_SIZE()``
|
|
* ``PyBytes_GET_SIZE()``
|
|
* ``PyCFunction_GET_CLASS()``
|
|
* ``PyCFunction_GET_FLAGS()``
|
|
* ``PyCFunction_GET_FUNCTION()``
|
|
* ``PyCFunction_GET_SELF()``
|
|
* ``PyCell_GET()``
|
|
* ``PyCode_GetNumFree()``
|
|
* ``PyDict_GET_SIZE()``
|
|
* ``PyFunction_GET_ANNOTATIONS()``
|
|
* ``PyFunction_GET_CLOSURE()``
|
|
* ``PyFunction_GET_CODE()``
|
|
* ``PyFunction_GET_DEFAULTS()``
|
|
* ``PyFunction_GET_GLOBALS()``
|
|
* ``PyFunction_GET_KW_DEFAULTS()``
|
|
* ``PyFunction_GET_MODULE()``
|
|
* ``PyHeapType_GET_MEMBERS()``
|
|
* ``PyInstanceMethod_GET_FUNCTION()``
|
|
* ``PyList_GET_SIZE()``
|
|
* ``PyMemoryView_GET_BASE()``
|
|
* ``PyMemoryView_GET_BUFFER()``
|
|
* ``PyMethod_GET_FUNCTION()``
|
|
* ``PyMethod_GET_SELF()``
|
|
* ``PySet_GET_SIZE()``
|
|
* ``PyTuple_GET_SIZE()``
|
|
* ``PyUnicode_GET_DATA_SIZE()``
|
|
* ``PyUnicode_GET_LENGTH()``
|
|
* ``PyUnicode_GET_LENGTH()``
|
|
* ``PyUnicode_GET_SIZE()``
|
|
* ``PyWeakref_GET_OBJECT()``
|
|
|
|
"AS" macros
|
|
^^^^^^^^^^^
|
|
|
|
* ``PyByteArray_AS_STRING()``
|
|
* ``PyBytes_AS_STRING()``
|
|
* ``PyFloat_AS_DOUBLE()``
|
|
* ``PyUnicode_AS_DATA()``
|
|
* ``PyUnicode_AS_UNICODE()``
|
|
|
|
PyUnicode macros
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
* ``PyUnicode_1BYTE_DATA()``
|
|
* ``PyUnicode_2BYTE_DATA()``
|
|
* ``PyUnicode_4BYTE_DATA()``
|
|
* ``PyUnicode_DATA()``
|
|
* ``PyUnicode_IS_ASCII()``
|
|
* ``PyUnicode_IS_COMPACT()``
|
|
* ``PyUnicode_IS_READY()``
|
|
* ``PyUnicode_KIND()``
|
|
* ``PyUnicode_READ()``
|
|
* ``PyUnicode_READ_CHAR()``
|
|
|
|
PyDateTime "GET" macros
|
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
* ``PyDateTime_DATE_GET_FOLD()``
|
|
* ``PyDateTime_DATE_GET_HOUR()``
|
|
* ``PyDateTime_DATE_GET_MICROSECOND()``
|
|
* ``PyDateTime_DATE_GET_MINUTE()``
|
|
* ``PyDateTime_DATE_GET_SECOND()``
|
|
* ``PyDateTime_DATE_GET_TZINFO()``
|
|
* ``PyDateTime_DELTA_GET_DAYS()``
|
|
* ``PyDateTime_DELTA_GET_MICROSECONDS()``
|
|
* ``PyDateTime_DELTA_GET_SECONDS()``
|
|
* ``PyDateTime_GET_DAY()``
|
|
* ``PyDateTime_GET_MONTH()``
|
|
* ``PyDateTime_GET_YEAR()``
|
|
* ``PyDateTime_TIME_GET_FOLD()``
|
|
* ``PyDateTime_TIME_GET_HOUR()``
|
|
* ``PyDateTime_TIME_GET_MICROSECOND()``
|
|
* ``PyDateTime_TIME_GET_MINUTE()``
|
|
* ``PyDateTime_TIME_GET_SECOND()``
|
|
* ``PyDateTime_TIME_GET_TZINFO()``
|
|
|
|
Port C extensions to Python 3.11
|
|
--------------------------------
|
|
|
|
In practice, the majority of projects affected by these PEP only have to
|
|
make two changes:
|
|
|
|
* Replace ``Py_TYPE(obj) = new_type;``
|
|
with ``Py_SET_TYPE(obj, new_type);``.
|
|
* Replace ``Py_SIZE(obj) = new_size;``
|
|
with ``Py_SET_SIZE(obj, new_size);``.
|
|
|
|
The `pythoncapi_compat project
|
|
<https://github.com/pythoncapi/pythoncapi_compat>`_ can be used to
|
|
update automatically C extensions: add Python 3.11 support without
|
|
losing support with older Python versions. The project provides a header
|
|
file which provides ``Py_SET_REFCNT()``, ``Py_SET_TYPE()`` and
|
|
``Py_SET_SIZE()`` functions to Python 3.8 and older.
|
|
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
The proposed C API changes are backward incompatible on purpose.
|
|
|
|
On January 26, 2022, a code search on the top 5000 PyPI projects (4762
|
|
projects in practice; others don't have a source archive) found that
|
|
only 9 projects are affected (0.2%):
|
|
|
|
* datatable (1.0.0)
|
|
* guppy3 (3.1.2)
|
|
* pickle5 (0.0.12)
|
|
* psycopg2 (2.9.3)
|
|
* pysha3 (1.0.2)
|
|
* python-snappy (0.6.0)
|
|
* recordclass (0.17.1)
|
|
* scipy (1.7.3)
|
|
* zodbpickle (2.2.0)
|
|
|
|
Of these 9 projects, only 2 macros are used as an l-value:
|
|
``Py_TYPE()`` and ``Py_SIZE()``.
|
|
|
|
An additional 22 projects just have to regenerate their Cython code to
|
|
use ``Py_SET_TYPE()`` and ``Py_SET_SIZE()``.
|
|
|
|
This change does not follow the :pep:`387` deprecation process. There is
|
|
no known way to emit a deprecation warning only when a macro is used as
|
|
an l-value, but not when it's used differently (ex: as a r-value).
|
|
|
|
PyTuple_GET_ITEM() and PyList_GET_ITEM() are left unchanged
|
|
-----------------------------------------------------------
|
|
|
|
The ``PyTuple_GET_ITEM()`` and ``PyList_GET_ITEM()`` macros are left
|
|
unchanged.
|
|
|
|
The code patterns ``&PyTuple_GET_ITEM(tuple, 0)`` and
|
|
``&PyList_GET_ITEM(list, 0)`` are still commonly used to get access to
|
|
the inner ``PyObject**`` array.
|
|
|
|
Changing these macros is out of the scope of this PEP.
|
|
|
|
PyDescr_NAME() and PyDescr_TYPE() are left unchanged
|
|
----------------------------------------------------
|
|
|
|
The ``PyDescr_NAME()`` and ``PyDescr_TYPE()`` macros are left unchanged.
|
|
|
|
These macros give access to ``PyDescrObject.d_name`` and
|
|
``PyDescrObject.d_type`` members. They can be used as l-values to set
|
|
these members.
|
|
|
|
The SWIG project uses these macros as l-values to set these members. It
|
|
would be possible to modify SWIG to prevent setting ``PyDescrObject``
|
|
structure members directly, but it is not really worth it since the
|
|
``PyDescrObject`` structure is not performance critical and is unlikely
|
|
to change soon.
|
|
|
|
See the `bpo-46538 <https://bugs.python.org/issue46538>`_ "[C API] Make
|
|
the PyDescrObject structure opaque: PyDescr_NAME() and PyDescr_TYPE()"
|
|
issue for more details.
|
|
|
|
|
|
Relationship with the HPy project
|
|
=================================
|
|
|
|
The HPy project
|
|
---------------
|
|
|
|
The hope with the HPy project is to provide a C API that is close
|
|
to the original API—to make porting easy—and have it perform as close to
|
|
the existing API as possible. At the same time, HPy is sufficently
|
|
removed to be a good "C extension API" (as opposed to a stable subset of
|
|
the CPython implementation API) that does not leak implementation
|
|
details. To ensure this latter property, the HPy project tries to
|
|
develop everything in parallel for CPython, PyPy, and GraalVM Python.
|
|
|
|
HPy is still evolving very fast. Issues are still being solved while
|
|
migrating NumPy, and work has begun on adding support for HPy to Cython. Work on
|
|
pybind11 is starting soon. Tim Felgentreff believes by the time HPy has
|
|
these users of the existing C API working, HPy should be in a state
|
|
where it is generally useful and can be deemed stable enough that
|
|
further development can follow a more stable process.
|
|
|
|
In the long run the HPy project would like to become a promoted API to
|
|
write Python C extensions.
|
|
|
|
The HPy project is a good solution for the long term. It has the
|
|
advantage of being developed outside Python and it doesn't require any C
|
|
API change.
|
|
|
|
The C API is here is stay for a few more years
|
|
----------------------------------------------
|
|
|
|
The first concern about HPy is that right now, HPy is not mature nor
|
|
widely used, and CPython still has to continue supporting a large amount
|
|
of C extensions which are not likely to be ported to HPy soon.
|
|
|
|
The second concern is the inability to evolve CPython internals to
|
|
implement new optimizations, and the inefficient implementation of the
|
|
current C API in PyPy, GraalPython, etc. Sadly, HPy will only solve
|
|
these problems when most C extensions will be fully ported to HPy:
|
|
when it will become reasonable to consider dropping the "legacy" Python
|
|
C API.
|
|
|
|
While porting a C extension to HPy can be done incrementally on CPython,
|
|
it requires to modify a lot of code and takes time. Porting most C
|
|
extensions to HPy is expected to take a few years.
|
|
|
|
This PEP proposes to make the C API "less bad" by fixing one problem
|
|
which is clearily identified as causing practical issues: macros used as
|
|
l-values. This PEP only requires updating a minority of C
|
|
extensions, and usually only a few lines need to be changed in impacted
|
|
extensions.
|
|
|
|
For example, NumPy 1.22 is made of 307,300 lines of C code, and adapting
|
|
NumPy to the this PEP only modified 11 lines (use Py_SET_TYPE and
|
|
Py_SET_SIZE) and adding 4 lines (to define Py_SET_TYPE and Py_SET_SIZE
|
|
for Python 3.8 and older). The beginnings of the NumPy port to HPy
|
|
already required modifying more lines than that.
|
|
|
|
Right now, it's hard to bet which approach is the best: fixing the
|
|
current C API, or focusing on HPy. It would be risky to only focus on
|
|
HPy.
|
|
|
|
|
|
Rejected Idea: Leave the macros as they are
|
|
===========================================
|
|
|
|
The documentation of each function can discourage developers to use
|
|
macros to modify Python objects.
|
|
|
|
If these is a need to make an assignment, a setter function can be added
|
|
and the macro documentation can require to use the setter function. For
|
|
example, a ``Py_SET_TYPE()`` function has been added to Python 3.9 and
|
|
the ``Py_TYPE()`` documentation now requires to use the
|
|
``Py_SET_TYPE()`` function to set an object type.
|
|
|
|
If developers use macros as an l-value, it's their responsibility when
|
|
their code breaks, not Python's responsibility. We are operating under
|
|
the consenting adults principle: we expect users of the Python C API to
|
|
use it as documented and expect them to take care of the fallout, if
|
|
things break when they don't.
|
|
|
|
This idea was rejected because only few developers read the
|
|
documentation, and only a minority is tracking changes of the Python C
|
|
API documentation. The majority of developers are only using CPython and
|
|
so are not aware of compatibility issues with other Python
|
|
implementations.
|
|
|
|
Moreover, continuing to allow using macros as an l-value does not help
|
|
the HPy project, and leaves the burden of emulating them on GraalVM's
|
|
Python implementation.
|
|
|
|
|
|
Macros already modified
|
|
=======================
|
|
|
|
The following C API macros have already been modified to disallow using
|
|
them as l-value:
|
|
|
|
* ``PyCell_SET()``
|
|
* ``PyList_SET_ITEM()``
|
|
* ``PyTuple_SET_ITEM()``
|
|
* ``Py_REFCNT()`` (Python 3.10): ``Py_SET_REFCNT()`` must be used
|
|
* ``_PyGCHead_SET_FINALIZED()``
|
|
* ``_PyGCHead_SET_NEXT()``
|
|
* ``asdl_seq_GET()``
|
|
* ``asdl_seq_GET_UNTYPED()``
|
|
* ``asdl_seq_LEN()``
|
|
* ``asdl_seq_SET()``
|
|
* ``asdl_seq_SET_UNTYPED()``
|
|
|
|
For example, ``PyList_SET_ITEM(list, 0, item) < 0`` now fails with a
|
|
compiler error as expected.
|
|
|
|
|
|
Discussion
|
|
==========
|
|
|
|
* `PEP 674: Disallow using macros as l-value (version 2)
|
|
<https://mail.python.org/archives/list/python-dev@python.org/thread/J7SXC2YQGP37UYIEULISLUTKW5FHN3Z7/>`_
|
|
(Jan 18, 2022)
|
|
* `PEP 674: Disallow using macros as l-value
|
|
<https://mail.python.org/archives/list/python-dev@python.org/thread/KPIJPPJ6XVNOLGZQD2PFGMT7LBJMTTCO/>`_
|
|
(Nov 30, 2021)
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
* `Python C API: Add functions to access PyObject
|
|
<https://vstinner.github.io/c-api-abstract-pyobject.html>`_ (October
|
|
2021) article by Victor Stinner
|
|
* `[C API] Disallow using PyFloat_AS_DOUBLE() as l-value
|
|
<https://bugs.python.org/issue45476>`_
|
|
(October 2021)
|
|
* `[capi-sig] Py_TYPE() and Py_SIZE() become static inline functions
|
|
<https://mail.python.org/archives/list/capi-sig@python.org/thread/WGRLTHTHC32DQTACPPX36TPR2GLJAFRB/>`_
|
|
(September 2021)
|
|
* `[C API] Avoid accessing PyObject and PyVarObject members directly: add Py_SET_TYPE() and Py_IS_TYPE(), disallow Py_TYPE(obj)=type
|
|
<https://bugs.python.org/issue39573>`__ (February 2020)
|
|
* `bpo-30459: PyList_SET_ITEM could be safer
|
|
<https://bugs.python.org/issue30459>`_ (May 2017)
|
|
|
|
|
|
Version History
|
|
===============
|
|
|
|
* Version 3: No longer change PyDescr_TYPE() and PyDescr_NAME() macros
|
|
* Version 2: Add "Relationship with the HPy project" section, remove
|
|
the PyPy section
|
|
* Version 1: First public version
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|