diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 8d6ba1744..03d9c709e 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -610,6 +610,7 @@ peps/pep-0729.rst @JelleZijlstra @hauntsaninja peps/pep-0730.rst @ned-deily peps/pep-0731.rst @gvanrossum @encukou @vstinner @zooba @iritkatriel peps/pep-0732.rst @Mariatta +peps/pep-0733.rst @encukou @vstinner @zooba @iritkatriel # ... # peps/pep-0754.rst # ... diff --git a/peps/pep-0733.rst b/peps/pep-0733.rst new file mode 100644 index 000000000..f052f375a --- /dev/null +++ b/peps/pep-0733.rst @@ -0,0 +1,685 @@ +PEP: 733 +Title: An Evaluation of Python's Public C API +Author: Erlend Egeberg Aasland , + Domenico Andreoli , + Stefan Behnel , + Carl Friedrich Bolz-Tereick , + Simon Cross , + Steve Dower , + Tim Felgentreff , + David Hewitt <1939362+davidhewitt@users.noreply.github.com>, + Shantanu Jain , + Wenzel Jakob , + Irit Katriel , + Marc-Andre Lemburg , + Donghee Na , + Karl Nelson , + Ronald Oussoren , + Antoine Pitrou , + Neil Schemenauer , + Mark Shannon , + Stepan Sindelar , + Gregory P. Smith , + Eric Snow , + Victor Stinner , + Guido van Rossum , + Petr Viktorin , + Carol Willing , + William Woodruff , + David Woods , + Jelle Zijlstra , +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 16-Oct-2023 + + +Abstract +======== + +This **informational** PEP describes our shared view of the public C API. The +document defines: + +* purposes of the C API +* stakeholders and their particular use cases and requirements +* strengths of the C API +* problems of the C API categorized into nine areas of weakness + +This document does not propose solutions to any of the identified problems. By +creating a shared list of C API issues, this document will help to guide +continuing discussion about change proposals and to identify evaluation +criteria. + + +Introduction +============ + +Python's C API was not designed for the different purposes it currently +fulfills. 
It evolved from what was initially the internal API between +the C code of the interpreter and the Python language and libraries. +In its first incarnation, it was exposed to make it possible to embed +Python into C/C++ applications and to write extension modules in C/C++. +These capabilities were instrumental to the growth of Python's ecosystem. +Over the decades, the C API grew to provide different tiers of stability, +conventions changed, and new usage patterns emerged, such as bindings +to languages other than C/C++. In the next few years, new developments +are expected to further test the C API, such as the removal of the GIL +and the development of a JIT compiler. However, this growth was not +supported by clearly documented guidelines, resulting in inconsistent +approaches to API design in different subsystems of CPython. In addition, +CPython is no longer the only implementation of Python, and some of the +design decisions made when it was are difficult for alternative +implementations to work with +[`Issue 64 `__]. +In the meantime, lessons were learned and mistakes in both the design +and the implementation of the C API were identified. + +Evolving the C API is hard due to the combination of backwards +compatibility constraints and its inherent complexity, both +technical and social. Different types of users bring different, +sometimes conflicting, requirements. The tradeoff between stability +and progress is an ongoing, highly contentious topic of discussion +when suggestions are made for incremental improvements. +Several proposals have been put forward for improvement, redesign +or replacement of the C API, each representing a deep analysis of +the problems. At the 2023 Language Summit, three back-to-back +sessions were devoted to different aspects of the C API. 
There is +general agreement that a new design can remedy the problems that +the C API has accumulated over the last 30 years, while at the +same time updating it for use cases that it was not originally +designed for. + +However, there was also a sense at the Language Summit that we are +trying to discuss solutions without a clear common understanding +of the problems that we are trying to solve. We decided that +we need to agree on the current problems with the C API, before +we are able to evaluate any of the proposed solutions. We +therefore created the +`capi-workgroup `__ +repository on GitHub in order to collect everyone's ideas on that +question. + +Over 60 different issues were created on that repository, each +describing a problem with the C API. We categorized them and +identified a number of recurring themes. The sections below +mostly correspond to these themes, and each contains a combined +description of the issues raised in that category, along with +links to the individual issues. In addition, we included a section +that aims to identify the different stakeholders of the C API, +and the particular requirements that each of them has. + + +C API Stakeholders +================== + +As mentioned in the introduction, the C API was originally +created as the internal interface between CPython's +interpreter and the Python layer. It was later exposed as +a way for third-party developers to extend and embed Python +programs. Over the years, new types of stakeholders emerged, +with different requirements and areas of focus. This section +describes this complex state of affairs in terms of the +actions that different stakeholders need to perform through +the C API. 
+ +Common Actions for All Stakeholders +----------------------------------- + +There are actions which are generic, and required by +all types of API users: + +* Define functions and call them +* Define new types +* Create instances of builtin and user-defined types +* Perform operations on object instances +* Introspect objects, including types, instances, and functions +* Raise and handle exceptions +* Import modules +* Access Python's OS interface + +The following sections look at the unique requirements of various stakeholders. + +Extension Writers +----------------- + +Extension writers are the traditional users of the C API. Their requirements +are the common actions listed above. They also commonly need to: + +* Create new modules +* Efficiently interface between modules at the C level + + +Authors of Embedded Python Applications +--------------------------------------- + +Applications with an embedded Python interpreter. Examples are +`Blender `__ and +`OBS `__. + +They need to be able to: + +* Configure the interpreter (import paths, inittab, ``sys.argv``, memory + allocator, etc.). +* Interact with the execution model and program lifetime, including + clean interpreter shutdown and restart. +* Represent complex data models in a way Python can use without + having to create deep copies. +* Provide and import frozen modules. +* Run and manage multiple independent interpreters (in particular, when + embedded in a library that wants to avoid global effects). + +Python Implementations +---------------------- + +Python implementations such as +`CPython `__, +`PyPy `__, +`GraalPy `__, +`IronPython `__, +`RustPython `__, +`MicroPython `__, +and `Jython `__ may take +very different approaches for the implementation of +different subsystems. They need: + +* The API to be abstract and hide implementation details. +* A specification of the API, ideally with a test suite + that ensures compatibility. 
+* It would be nice to have an ABI that can be shared + across Python implementations. + +Alternative APIs and Binding Generators +--------------------------------------- + +There are several projects that implement alternatives to the +C API, which offer extension users advantages over programming +directly with the C API. These APIs are implemented with the +C API, and in some cases by using CPython internals. + +There are also libraries that create bindings between Python and +other object models, paradigms or languages. + +There is overlap between these categories: binding generators +usually provide alternative APIs, and vice versa. + +Examples are +`Cython `__, +`cffi `__, +`pybind11 `__ and +`nanobind `__ for C++, +`PyO3 `__ for Rust, +`PySide `__ for Qt, +`PyGObject `__ for GTK, +`Pygolo `__ for Go, +`JPype `__ for Java, +`PyJNIus `__ for Android, +`PyObjC `__ for Objective-C, +`SWIG `__ for C/C++, +`Python.NET `__ for .NET (C#), +`HPy `__, +`Mypyc `__, +`Pythran `__ and +`pythoncapi-compat `__. +CPython's DSL for parsing function arguments, the +`Argument Clinic `__, +can also be seen as belonging to this category of stakeholders. + +Alternative APIs need minimal building blocks for accessing CPython +efficiently. They don't necessarily need an ergonomic API, because +they typically generate code that is not intended to be read +by humans. But they do need it to be comprehensive enough so that +they can avoid accessing internals, without sacrificing performance. + +Binding generators often need to: + +* Create custom objects (e.g. function/module objects + and traceback entries) that match the behavior of equivalent + Python code as closely as possible. +* Dynamically create objects which are static in traditional + C extensions (e.g. classes/modules), and need CPython to manage + their state and lifetime. +* Dynamically adapt foreign objects (strings, GC'd containers), with + low overhead. 
+* Adapt external mechanisms, execution models and guarantees to the + Python way (stackful coroutines, continuations, + one-writer-or-multiple-readers semantics, virtual multiple inheritance, + 1-based indexing, super-long inheritance chains, goroutines, channels, + etc.). + +These tools might also benefit from a choice between a more stable +and a faster (possibly lower-level) API. Their users could +then decide whether they can afford to regenerate the code often or +trade some performance for more stability and less maintenance work. + + +Strengths of the C API +====================== + +While the bulk of this document is devoted to problems with the +C API that we would like to see fixed in any new design, it is +also important to point out the strengths of the C API, and to +make sure that they are preserved. + +As mentioned in the introduction, the C API enabled the +development and growth of the Python ecosystem over the last +three decades, while evolving to support use cases that it was +not originally designed for. This track record is in itself +an indication of how effective and valuable it has been. + +A number of specific strengths were mentioned in the +capi-workgroup discussions. Heap types were identified +as much safer and easier to use than static types +[`Issue 4 `__]. + +API functions that take a C string literal for lookups based +on a Python string are very convenient +[`Issue 30 `__]. + +The limited API demonstrates that an API which hides implementation +details makes it easier to evolve Python +[`Issue 30 `__]. + +C API problems +============== + +The remainder of this document summarizes and categorizes the problems that were reported on +the `capi-workgroup `__ repository. +The issues are grouped into several categories. + + +API Evolution and Maintenance +----------------------------- + +The difficulty of making changes in the C API is central to this report. 
It is +implicit in many of the issues we discuss here, particularly when we need to +decide whether an incremental bugfix can resolve the issue, or whether it can +only be addressed as part of an API redesign +[`Issue 44 `__]. The +benefit of each incremental change is often viewed as too small to justify the +disruption. Over time, this implies that every mistake we make in an API's +design or implementation remains with us indefinitely. + +We can take two views on this issue. One is that this is a problem and the +solution needs to be baked into any new C API we design, in the form of a +process for incremental API evolution, which includes deprecation and +removal of API elements. The other possible approach is that this is not +a problem to be solved, but rather a feature of any API. In this +view, API evolution should not be incremental, but rather through large +redesigns, each of which learns from the mistakes of the past and is not +shackled by backwards compatibility requirements (in the meantime, new +API elements may be added, but nothing can ever be removed). A compromise +approach is somewhere between these two extremes, fixing issues which are +easy or important enough to tackle incrementally, and leaving others alone. + +The problem we have in CPython is that we don't have an agreed, official +approach to API evolution. Different members of the core team are pulling in +different directions and this is an ongoing source of disagreements. +Any new C API needs to come with a clear decision about the model +that its maintenance will follow, as well as the technical and +organizational processes by which this will work. + +If the model does include provisions for incremental evolution of the API, +it will include processes for managing the impact of the change on users +[`Issue 60 `__], +perhaps through introducing an external backwards compatibility module +[`Issue 62 `__], +or a new API tier of "blessed" functions +[`Issue 55 `__]. 
+ + +API Specification and Abstraction +--------------------------------- + +The C API does not have a formal specification; it is currently defined +as whatever the reference implementation (CPython) contains in a +particular version. The documentation acts as an incomplete description, +which is not sufficient for verifying the correctness of either the full +API, the limited API, or the stable ABI. As a result, the C API may +change significantly between releases without needing a more visible +specification update, and this leads to a number of problems. + +Bindings for languages other than C/C++ must parse C code +[`Issue 7 `__]. +Some C language features are hard to handle in this way, because +they produce compiler-dependent output (such as enums) or require +a C preprocessor/compiler rather than just a parser (such as macros) +[`Issue 35 `__]. + +Furthermore, C header files tend to expose more than what is intended +to be part of the public API +[`Issue 34 `__]. +In particular, implementation details such as the precise memory +layouts of internal data structures can be exposed +[`Issue 22 `__ +and :pep:`620`]. +This can make API evolution very difficult, in particular when it +occurs in the stable ABI as in the case of ``ob_refcnt`` and ``ob_type``, +which are accessed via the reference counting macros +[`Issue 45 `__]. + +We identified a deeper issue in relation to the way that reference +counting is exposed. The way that C extensions are required to +manage references with calls to ``Py_INCREF`` and ``Py_DECREF`` is +specific to CPython's memory model, and is hard for alternative +Python implementations to emulate +[`Issue 12 `__]. + +Another set of problems arises from the fact that a ``PyObject*`` is +exposed in the C API as an actual pointer rather than a handle. The +address of an object serves as its ID and is used for comparison, +and this complicates matters for alternative Python implementations +that move objects during GC +[`Issue 37 `__]. 
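The difference between a raw pointer and a handle can be sketched with a toy model. The code below deliberately avoids ``Python.h``: every name in it (``ToyObject``, ``ToyHandle``, ``toy_new``, ``toy_deref``, ``toy_move``) is hypothetical, and it is not meant to describe how CPython, HPy, or any alternative implementation actually works. It only illustrates why an address stops being a usable identity once the runtime is allowed to move objects, whereas an opaque handle keeps referring to the same object.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAPACITY 8

/* Hypothetical object type; stands in for any runtime-managed object. */
typedef struct { int payload; } ToyObject;

/* Opaque handle: an index into an indirection table, not an address. */
typedef size_t ToyHandle;

/* Storage that a moving collector is free to rearrange. */
static ToyObject toy_heap[TOY_CAPACITY];
/* Indirection table: handle -> current slot in toy_heap. */
static size_t toy_slot_of[TOY_CAPACITY];
static size_t toy_count = 0;

/* Allocate a new object and return a handle to it. */
static ToyHandle toy_new(int payload)
{
    assert(toy_count < TOY_CAPACITY);
    ToyHandle h = toy_count++;
    toy_slot_of[h] = h;
    toy_heap[h].payload = payload;
    return h;
}

/* Resolve a handle to the object's current address. The result is only
   valid until the next call to toy_move(). */
static ToyObject *toy_deref(ToyHandle h)
{
    assert(h < toy_count);
    return &toy_heap[toy_slot_of[h]];
}

/* Simulate a moving GC by swapping the storage of two live objects and
   updating the indirection table accordingly. */
static void toy_move(ToyHandle a, ToyHandle b)
{
    size_t sa = toy_slot_of[a];
    size_t sb = toy_slot_of[b];
    ToyObject tmp = toy_heap[sa];
    toy_heap[sa] = toy_heap[sb];
    toy_heap[sb] = tmp;
    toy_slot_of[a] = sb;
    toy_slot_of[b] = sa;
}
```

In this sketch, a raw ``ToyObject *`` obtained before ``toy_move`` silently ends up pointing at a different object afterwards, while the handle still resolves to the original one. An API that exposes addresses as identities effectively forbids the runtime from performing such moves.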
+ +A separate issue is that object references are opaque to the runtime, +discoverable only through calls to ``tp_traverse``/``tp_clear``, +which have their own purposes. If there were a way for the runtime to +know the structure of the object graph, and keep up with changes in it, +this would make it possible for alternative implementations to implement +different memory management schemes +[`Issue 33 `__]. + +Object Reference Management +--------------------------- + +There is no consistent naming convention for functions +that makes their reference semantics obvious, and this makes +C API functions that deviate from the typical behaviour +error prone. When a C API function returns a ``PyObject*``, the +caller typically gains ownership of a reference to the object. +However, there are exceptions where a function returns a +"borrowed" reference, which the caller can access but does not own +a reference to. Similarly, functions typically do not change the +ownership of references to their arguments, but there are +exceptions where a function "steals" a reference, i.e., the +ownership of the reference is permanently transferred from the +caller to the callee by the call +[`Issue 8 `__ +and `Issue 52 `__]. +The terminology used to describe these situations in the documentation +can also be improved +[`Issue 11 `__]. + +A more radical change is necessary in the case of functions that +return "borrowed" references (such as ``PyList_GetItem``) +[`Issue 5 `__ and +`Issue 21 `__] +or pointers to parts of the internal structure of an object +(such as ``PyBytes_AsString``) +[`Issue 57 `__]. +In both cases, the reference/pointer is valid for as long as the +owning object holds the reference, but this time is hard to reason about. +Such functions should not exist in the API without a mechanism that can +make them safe. + +For containers, the API is currently missing bulk operations on the +references of contained objects. 
This is particularly important for +a stable ABI where ``INCREF`` and ``DECREF`` cannot be macros, making +bulk operations expensive when implemented as a sequence of function +calls +[`Issue 15 `__]. + +Type Definition and Object Creation +----------------------------------- + +The C API has functions that make it possible to create incomplete +or inconsistent Python objects, such as ``PyTuple_New`` and +``PyUnicode_New``. This causes problems when the object is tracked +by GC or its ``tp_traverse``/``tp_clear`` functions are called. +A related issue is with functions such as ``PyTuple_SetItem``, +which is used to modify a partially initialized tuple (tuples +are immutable once fully initialized) +[`Issue 56 `__]. + +We identified a few issues with type definition APIs. For legacy +reasons, there is often a significant amount of code duplication +between ``tp_new`` and ``tp_vectorcall`` +[`Issue 24 `__]. +Type slot functions should be called indirectly, so that their +signatures can change to include context information +[`Issue 13 `__]. +Several aspects of the type definition and creation process are not +well defined, such as which stage of the process is responsible for +initializing and clearing different fields of the type object +[`Issue 49 `__]. + +Error Handling +-------------- + +Error handling in the C API is based on the error indicator which is stored +on the thread state (in global scope). The design intention was that each +API function returns a value indicating whether an error has occurred (by +convention, ``-1`` or ``NULL``). When the program knows that an error +occurred, it can fetch the exception object which is stored in the +error indicator. We identified a number of problems which are related +to error handling, pointing at APIs which are too easy to use incorrectly. + +There are functions that do not report all errors that occur while they +execute. 
For example, ``PyDict_GetItem`` clears any errors that occur +when it calls the key's hash function, or while performing a lookup +in the dictionary +[`Issue 51 `__]. + +Python code never executes with an in-flight exception (by definition), +and typically code using native functions should also be interrupted by +an error being raised. This is not checked in most C API functions, and +there are places in the interpreter where error handling code calls a C API +function while an exception is set. For example, see the call to +``PyUnicode_FromString`` in the error handler of ``_PyErr_WriteUnraisableMsg`` +[`Issue 2 `__]. + + +There are functions that do not return a value, so a caller is forced to +query the error indicator in order to identify whether an error has occurred. +An example is ``PyBuffer_Release`` +[`Issue 20 `__]. +There are other functions which do have a return value, but this return value +does not unambiguously indicate whether an error has occurred. For example, +``PyLong_AsLong`` returns ``-1`` in case of error, or when the value of the +argument is indeed ``-1`` +[`Issue 1 `__]. +In both cases, the API is error prone because it is possible that the +error indicator was already set before the function was called, and the +error is incorrectly attributed. The fact that the error was not detected +before the call is a bug in the calling code, but the behaviour of the +program in this case doesn't make it easy to identify and debug the +problem. + +There are functions that take a ``PyObject*`` argument, with special meaning +when it is ``NULL``. For example, if ``PyObject_SetAttr`` receives ``NULL`` as +the value to set, this means that the attribute should be cleared. This is error +prone because it could be that ``NULL`` indicates an error in the construction +of the value, and the program failed to check for this error. The program will +misinterpret the ``NULL`` to mean something different than error +[`Issue 47 `__]. 
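The ambiguity described in this section for ``PyLong_AsLong``, where ``-1`` is both a valid result and the error sentinel, can be illustrated with a small toy model. The sketch below deliberately avoids ``Python.h``: ``toy_error_set``, ``toy_parse_long`` and ``toy_parse_long_checked`` are hypothetical names that only mimic the pattern of a sentinel return value combined with a separate error indicator. It is not CPython code, and it ignores overflow for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread error indicator. */
static bool toy_error_set = false;

/* Sentinel-style API: returns -1 on error, but -1 is also a valid
   result, so the return value alone is ambiguous. */
static long toy_parse_long(const char *text)
{
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        toy_error_set = true;   /* "raise" an error */
        return -1;
    }
    return value;
}

/* Checked wrapper: the status is returned separately from the value,
   so the caller never has to guess what -1 means. */
static int toy_parse_long_checked(const char *text, long *out)
{
    /* Precondition: no error may be pending, otherwise a stale error
       would be wrongly attributed to this call. */
    assert(!toy_error_set);
    long value = toy_parse_long(text);
    if (value == -1 && toy_error_set) {
        return -1;              /* genuine failure */
    }
    *out = value;               /* includes a legitimate -1 */
    return 0;
}
```

The wrapper has to consult the indicator only because the underlying function overloads its return value. The assertion at the top makes explicit the assumption that no earlier error is pending, which the section above identifies as the condition that goes unchecked in practice.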
+ + +API Tiers and Stability Guarantees +---------------------------------- + +The different API tiers provide different tradeoffs of stability vs +API evolution, and sometimes performance. + +The stable ABI was identified as an area that needs to be looked into. At +the moment it is incomplete and not widely adopted. At the same time, its +existence is making it hard to make changes to some implementation +details, because it exposes struct fields such as ``ob_refcnt``, +``ob_type`` and ``ob_size``. There was some discussion about whether +the stable ABI is worth keeping. Arguments on both sides can be +found in [`Issue 4 `__] +and [`Issue 9 `__]. + +Alternatively, it was suggested that in order to be able to evolve +the stable ABI, we need a mechanism to support multiple versions of +it in the same Python binary. It was pointed out that versioning +individual functions within a single ABI version is not enough +because it may be necessary to evolve, together, a group of functions +that interoperate with each other +[`Issue 39 `__]. + +The limited API was introduced in 3.2 as a blessed subset of the C API +which is recommended for users who would like to restrict themselves +to high quality APIs which are not likely to change often. The +``Py_LIMITED_API`` flag allows users to restrict their program to older +versions of the limited API, but we now need the opposite option, to +exclude older versions. This would make it possible to evolve the +limited API by replacing flawed elements in it +[`Issue 54 `__]. +More generally, in a redesign we should revisit the way that API +tiers are specified and consider designing a method that will unify the +way we currently select between the different tiers +[`Issue 59 `__]. + +API elements whose names begin with an underscore are considered +private, essentially an API tier with no stability guarantees. +However, this was only clarified recently, in :pep:`689`. 
It is +not clear what the change policy should be with respect to such +API elements that predate PEP 689 +[`Issue 58 `__]. + +There are API functions which have an unsafe (but fast) version as well as +a safe version which performs error checking (for example, +``PyTuple_GET_ITEM`` vs ``PyTuple_GetItem``). It may help to +be able to group them into their own tiers - the "unsafe API" tier and +the "safe API" tier +[`Issue 61 `__]. + +Use of the C Language +--------------------- + +A number of issues were raised with respect to the way that CPython +uses the C language. First there is the issue of which C dialect +we use, and how we test our compatibility with it, as well as API +header compatibility with C++ dialects +[`Issue 42 `__]. + +Usage of ``const`` in the API is currently sparse, but it is not +clear whether this is something that we should consider changing +[`Issue 38 `__]. + +We currently use the C types ``long`` and ``int``, where fixed-width integers +such as ``int32_t`` and ``int64_t`` may now be better choices +[`Issue 27 `__]. + +We are using C language features which are hard for other languages +to interact with, such as macros, variadic arguments, enums, bitfields, +and non-function symbols +[`Issue 35 `__]. + +There are API functions that take a ``PyObject*`` arg which must be +of a more specific type (such as ``PyTuple_Size``, which fails if +its arg is not a ``PyTupleObject*``). It is an open question whether this +is a good pattern to have, or whether the API should expect the +more specific type +[`Issue 31 `__]. + +There are functions in the API that take concrete types, such as +``PyDict_GetItemString`` which performs a dictionary lookup for a key +specified as a C string rather than ``PyObject*``. At the same time, +for ``PyDict_ContainsString`` it is not considered appropriate to +add a concrete type alternative. The principle around this should +be documented in the guidelines +[`Issue 23 `__]. 
+ +Implementation Flaws +-------------------- + +Below is a list of localized implementation flaws. Most of these can +probably be fixed incrementally, if we choose to do so. They should, +in any case, be avoided in any new API design. + +There are functions that don't follow the convention of +returning ``0`` for success and ``-1`` for failure. For +example, ``PyArg_ParseTuple`` returns non-zero for success and +0 for failure +[`Issue 25 `__]. + +The macros ``Py_CLEAR`` and ``Py_SETREF`` access their arg more than +once, so if the arg is an expression with side effects, the side effects +are duplicated +[`Issue 3 `__]. + +The meaning of ``Py_SIZE`` depends on the type and is not always +reliable +[`Issue 10 `__]. + +Some API functions do not have the same behaviour as their Python +equivalents. The behaviour of ``PyIter_Next`` is different from +``tp_iternext`` +[`Issue 29 `__]. +The behaviour of ``PySet_Contains`` is different from ``set.__contains__`` +[`Issue 6 `__]. + +The fact that ``PyArg_ParseTupleAndKeywords`` takes a non-const +``char*`` array as argument makes it more difficult to use +[`Issue 28 `__]. + +``Python.h`` does not expose the whole API. Some headers (like ``marshal.h``) +are not included from ``Python.h`` +[`Issue 43 `__]. + +**Naming** + +``PyLong`` and ``PyUnicode`` use names which no longer match the Python +types they represent (``int``/``str``). This could be fixed in a new API +[`Issue 14 `__]. + +There are identifiers in the API which are lacking a ``Py``/``_Py`` +prefix +[`Issue 46 `__]. + +Missing Functionality +--------------------- + +This section consists of a list of feature requests, i.e., functionality +that was identified as missing in the current C API. + +Debug Mode +~~~~~~~~~~ + +A debug mode that can be activated without recompilation and which +enables checks that can help detect various types of errors +[`Issue 36 `__]. 
+ +Introspection +~~~~~~~~~~~~~ + +There aren't currently reliable introspection capabilities for objects +defined in C in the same way as there are for Python objects +[`Issue 32 `__]. + +Efficient type checking for heap types +[`Issue 17 `__]. + +Improved Interaction with Other Languages +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Interfacing with other GC based languages, and integrating their +GC with Python's GC +[`Issue 19 `__]. + +Inject foreign stack frames to the traceback +[`Issue 18 `__]. + +Concrete strings that can be used in other languages +[`Issue 16 `__]. + +References +========== + +1. `Python/C API Reference Manual `__ +2. `2023 Language Summit Blog Post: Three Talks on the C API `__ +3. `capi-workgroup on GitHub `__ +4. `Irit's Core Sprint 2023 slides about C API workgroup `__ +5. `Petr's Core Sprint 2023 slides `__ +6. `HPy team's Core Sprint 2023 slides for Things to Learn from HPy `__ +7. `Victor's slides of Core Sprint 2023 Python C API talk `__ +8. `The Python's stability promise — Cristián Maureira-Fredes, PySide maintainer `__ +9. `Report on the issues PySide had 5 years ago when switching to the stable ABI `__ + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.