PEP: 757 Title: C API to import-export Python integers Author: Sergey B Kirpichev , Victor Stinner PEP-Delegate: C API Working Group Discussions-To: https://discuss.python.org/t/63895 Status: Draft Type: Standards Track Created: 13-Sep-2024 Python-Version: 3.14 Post-History: `14-Sep-2024 `__ .. highlight:: c Abstract ======== Add a new C API to import and export Python integers, :class:`int` objects: especially :c:func:`PyLongWriter_Create()` and :c:func:`PyLong_Export()` functions. Rationale ========= Projects such as `gmpy2 `_, `SAGE `_ and `Python-FLINT `_ access directly Python "internals" (the :c:type:`PyLongObject` structure) or use an inefficient temporary format (hex strings for Python-FLINT) to import and export Python :class:`int` objects. The Python :class:`int` implementation changed in Python 3.12 to add a tag and "compact values". In the 3.13 alpha 1 release, the private undocumented :c:func:`!_PyLong_New()` function had been removed, but it is being used by these projects to import Python integers. The private function has been restored in 3.13 alpha 2. A public efficient abstraction is needed to interface Python with these projects without exposing implementation details. It would allow Python to change its internals without breaking these projects. For example, implementation for gmpy2 was changed recently for CPython 3.9 and for CPython 3.12. Specification ============= Layout API ---------- Data needed by `GMP `_-like import-export functions. .. c:struct:: PyLongLayout Layout of an array of digits, used by Python :class:`int` object. Use :c:func:`PyLong_GetNativeLayout` to get the native layout of Python :class:`int` objects. See also :data:`sys.int_info` which exposes similar information to Python. .. c:member:: uint8_t bits_per_digit Bits per digit. .. c:member:: uint8_t digit_size Digit size in bytes. .. c:member:: int8_t digits_order Digits order: - ``1`` for most significant digit first - ``-1`` for least significant digit first .. c:member:: int8_t endianness Digit endianness: - ``1`` for most significant byte first (big endian) - ``-1`` for least significant first (little endian) .. c:function:: const PyLongLayout* PyLong_GetNativeLayout(void) Get the native layout of Python :class:`int` objects. See the :c:struct:`PyLongLayout` structure. The function must not be called before Python initialization nor after Python finalization. The returned layout is valid until Python is finalized. The layout is the same for all Python sub-interpreters and so it can be cached. Export API ---------- .. c:struct:: PyLongExport Export of a Python :class:`int` object. There are two cases: * If :c:member:`digits` is ``NULL``, only use the :c:member:`value` member. Calling :c:func:`PyLong_FreeExport` is optional in this case. * If :c:member:`digits` is not ``NULL``, use :c:member:`negative`, :c:member:`ndigits` and :c:member:`digits` members. Calling :c:func:`PyLong_FreeExport` is mandatory in this case. .. c:member:: int64_t value The native integer value of the exported :class:`int` object. Only valid if :c:member:`digits` is ``NULL``. .. c:member:: uint8_t negative 1 if the number is negative, 0 otherwise. Only valid if :c:member:`digits` is not ``NULL``. .. c:member:: Py_ssize_t ndigits Number of digits in :c:member:`digits` array. Only valid if :c:member:`digits` is not ``NULL``. .. c:member:: const void *digits Read-only array of unsigned digits. Can be ``NULL``. If :c:member:`digits` not ``NULL``, a private field of the :c:struct:`PyLongExport` structure stores a strong reference to the Python :class:`int` object to make sure that that structure remains valid until :c:func:`PyLong_FreeExport()` is called. .. c:function:: int PyLong_Export(PyObject *obj, PyLongExport *export_long) Export a Python :class:`int` object. On success, set *\*export_long* and return 0. On error, set an exception and return -1. This function always succeeds if *obj* is a Python :class:`int` object or a subclass. If *export_long->digits* is not ``NULL``, :c:func:`PyLong_FreeExport` must be called when the export is no longer needed. On CPython 3.14, no memory copy is needed, it's just a thin wrapper to expose Python int internal digits array. .. c:function:: void PyLong_FreeExport(PyLongExport *export_long) Release the export *export_long* created by :c:func:`PyLong_Export`. Import API ---------- The :c:type:`PyLongWriter` API can be used to import an integer: create a Python :class:`int` object from a digits array. .. c:struct:: PyLongWriter A Python :class:`int` writer instance. The instance must be destroyed by :c:func:`PyLongWriter_Finish`. .. c:function:: PyLongWriter* PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits) Create a :c:type:`PyLongWriter`. On success, set *\*digits* and return a writer. On error, set an exception and return ``NULL``. *negative* is ``1`` if the number is negative, or ``0`` otherwise. *ndigits* is the number of digits in the *digits* array. It must be greater than or equal to 0. The caller must initialize the array of digits *digits* and then call :c:func:`PyLongWriter_Finish` to get a Python :class:`int`. Digits must be in the range [``0``; ``(1 << sys.int_info.bits_per_digit) - 1``]. Unused digits must be set to ``0``. On CPython 3.14, the implementation is a thin wrapper to the private :c:func:`!_PyLong_New()` function. .. c:function:: PyObject* PyLongWriter_Finish(PyLongWriter *writer) Finish a :c:type:`PyLongWriter` created by :c:func:`PyLongWriter_Create`. On success, return a Python :class:`int` object. On error, set an exception and return ``NULL``. The function takes care of normalizing the digits and converts the object to a compact integer if needed. .. c:function:: void PyLongWriter_Discard(PyLongWriter *writer) Discard the internal object and destroy the writer instance. Optimize import for small integers ================================== Proposed import API is efficient for large integers. Compared to accessing directly Python internals, the proposed import API can have a significant performance overhead on small integers. For small integers of a few digits (for example, 1 or 2 digits), existing APIs can be used: * :external+py3.14:c:func:`PyLong_FromUInt64()`; * :c:func:`PyLong_FromLong()`; * :c:func:`PyLong_FromNativeBytes()`. Implementation ============== * CPython: * https://github.com/python/cpython/pull/121339 * https://github.com/vstinner/cpython/pull/5 * gmpy: * https://github.com/aleaxit/gmpy/pull/495 Benchmarks ========== Code:: /* Query parameters of Python’s internal representation of integers. */ const PyLongLayout *layout = PyLong_GetNativeLayout(); size_t int_digit_size = layout->digit_size; int int_digits_order = layout->digits_order; size_t int_bits_per_digit = layout->bits_per_digit; size_t int_nails = int_digit_size*8 - int_bits_per_digit; int int_endianness = layout->endianness; Export: :c:func:`PyLong_Export()` with gmpy2 -------------------------------------------- Code:: static void mpz_set_PyLong(mpz_t z, PyObject *obj) { static PyLongExport long_export; PyLong_Export(obj, &long_export); if (long_export.digits) { mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size, int_endianness, int_nails, long_export.digits); if (long_export.negative) { mpz_neg(z, z); } PyLong_FreeExport(&long_export); } else { const int64_t value = long_export.value; if (LONG_MIN <= value && value <= LONG_MAX) { mpz_set_si(z, value); } else { mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value); if (value < 0) { mpz_t tmp; mpz_init(tmp); mpz_ui_pow_ui(tmp, 2, 64); mpz_sub(z, z, tmp); mpz_clear(tmp); } } } } Reference code: `mpz_set_PyLong() in the gmpy2 master for commit 9177648 `_. Benchmark: .. code-block:: py import pyperf from gmpy2 import mpz runner = pyperf.Runner() runner.bench_func('1<<7', mpz, 1 << 7) runner.bench_func('1<<38', mpz, 1 << 38) runner.bench_func('1<<300', mpz, 1 << 300) runner.bench_func('1<<3000', mpz, 1 << 3000) Results on Linux Fedora 40 with CPU isolation, Python built in release mode: +----------------+---------+-----------------------+ | Benchmark | ref | pep757 | +================+=========+=======================+ | 1<<7 | 91.3 ns | 89.9 ns: 1.02x faster | +----------------+---------+-----------------------+ | 1<<38 | 120 ns | 94.9 ns: 1.27x faster | +----------------+---------+-----------------------+ | 1<<300 | 196 ns | 203 ns: 1.04x slower | +----------------+---------+-----------------------+ | 1<<3000 | 939 ns | 945 ns: 1.01x slower | +----------------+---------+-----------------------+ | Geometric mean | (ref) | 1.05x faster | +----------------+---------+-----------------------+ Import: :c:func:`PyLongWriter_Create()` with gmpy2 -------------------------------------------------- Code:: static PyObject * GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context) { if (mpz_fits_slong_p(obj->z)) { return PyLong_FromLong(mpz_get_si(obj->z)); } size_t size = (mpz_sizeinbase(obj->z, 2) + int_bits_per_digit - 1) / int_bits_per_digit; void *digits; PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size, &digits); if (writer == NULL) { return NULL; } mpz_export(digits, NULL, int_digits_order, int_digit_size, int_endianness, int_nails, obj->z); return PyLongWriter_Finish(writer); } Reference code: `GMPy_PyLong_From_MPZ() in the gmpy2 master for commit 9177648 `_. Benchmark: .. code-block:: py import pyperf from gmpy2 import mpz runner = pyperf.Runner() runner.bench_func('1<<7', int, mpz(1 << 7)) runner.bench_func('1<<38', int, mpz(1 << 38)) runner.bench_func('1<<300', int, mpz(1 << 300)) runner.bench_func('1<<3000', int, mpz(1 << 3000)) Results on Linux Fedora 40 with CPU isolation, Python built in release mode: +----------------+---------+-----------------------+ | Benchmark | ref | pep757 | +================+=========+=======================+ | 1<<7 | 56.7 ns | 56.2 ns: 1.01x faster | +----------------+---------+-----------------------+ | 1<<300 | 191 ns | 213 ns: 1.12x slower | +----------------+---------+-----------------------+ | Geometric mean | (ref) | 1.03x slower | +----------------+---------+-----------------------+ Benchmark hidden because not significant (2): 1<<38, 1<<3000. Backwards Compatibility ======================= There is no impact on the backward compatibility, only new APIs are added. Open Questions ============== * Should we add *digits_order* and *endianness* members to :data:`sys.int_info` and remove :c:func:`PyLong_GetNativeLayout()`? The :c:func:`PyLong_GetNativeLayout()` function returns a C structure which is more convenient to use in C than :data:`sys.int_info` which uses Python objects. * Currently, all required information for :class:`int` import/export is already available via :c:func:`PyLong_GetInfo()` or :data:`sys.int_info`. Native endianness of "digits" and current order of digits (least significant digit first) --- is a common denominator of all libraries for arbitrary precision integer arithmetic. So, shouldn't we just remove from API both :c:struct:`PyLongLayout` and :c:func:`PyLong_GetNativeLayout()` (which is actually just a minor convenience)? Rejected Ideas ============== Support arbitrary layout ------------------------ It would be convenient to support arbitrary layout to import-export Python integers. For example, it was proposed to add a *layout* parameter to :c:func:`PyLongWriter_Create()` and a *layout* member to the :c:struct:`PyLongExport` structure. The problem is that it's more complex to implement and not really needed. What's strictly needed is only an API to import-export using the Python "native" layout. If later there are use cases for arbitrary layouts, new APIs can be added. Discussions =========== * Discourse: `PEP 757 – C API to import-export Python integers `_ * `C API Working Group decision issue #35 `_ * `Pull request #121339 `_ * `Issue #102471 `_: The C-API for Python to C integer conversion is, to be frank, a mess. * `Add public function PyLong_GetDigits() `_ * `Consider restoring _PyLong_New() function as public `_ * `Pull request gh-106320 `_: Remove private _PyLong_New() function. Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.