python-peps/peps/pep-0757.rst

449 lines
13 KiB
ReStructuredText
Raw Normal View History

PEP: 757
Title: C API to import-export Python integers
Author: Sergey B Kirpichev <skirpichev@gmail.com>,
Victor Stinner <vstinner@python.org>
PEP-Delegate: C API Working Group
2024-09-14 08:26:39 -04:00
Discussions-To: https://discuss.python.org/t/63895
Status: Draft
Type: Standards Track
Created: 13-Sep-2024
Python-Version: 3.14
2024-09-14 08:26:39 -04:00
Post-History: `14-Sep-2024 <https://discuss.python.org/t/63895>`__
.. highlight:: c
Abstract
========
2024-09-14 08:26:39 -04:00
Add a new C API to import and export Python integers, :class:`int` objects:
2024-09-17 11:38:13 -04:00
especially ``PyLongWriter_Create()`` and ``PyLong_Export()`` functions.
Rationale
=========
Projects such as `gmpy2 <https://github.com/aleaxit/gmpy>`_, `SAGE
<https://www.sagemath.org/>`_ and `Python-FLINT
<https://github.com/flintlib/python-flint>`_ access directly Python
"internals" (the ``PyLongObject`` structure) or use an inefficient
temporary format (hex strings for Python-FLINT) to import and
2024-09-14 08:26:39 -04:00
export Python :class:`int` objects. The Python :class:`int` implementation
changed in Python 3.12 to add a tag and "compact values".
In the 3.13 alpha 1 release, the private undocumented ``_PyLong_New()``
function had been removed, but it is being used by these projects to
import Python integers. The private function has been restored in 3.13
alpha 2.
A public efficient abstraction is needed to interface Python with these
projects without exposing implementation details. It would allow Python
to change its internals without breaking these projects. For example,
implementation for gmpy2 was changed recently for CPython 3.9 and
for CPython 3.12.
Specification
=============
Layout API
----------
API::
typedef struct PyLongLayout {
// Bits per digit
uint8_t bits_per_digit;
// Digit size in bytes
uint8_t digit_size;
// Digits order:
// * 1 for most significant digit first
// * -1 for least significant digit first
int8_t digits_order;
// Endianness:
// * 1 for most significant byte first (big endian)
// * -1 for least significant byte first (little endian)
int8_t endianness;
} PyLongLayout;
PyAPI_FUNC(const PyLongLayout*) PyLong_GetNativeLayout(void);
Data needed by `GMP <https://gmplib.org/>`_-like import-export functions.
PyLong_GetNativeLayout()
^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
const PyLongLayout* PyLong_GetNativeLayout(void)
2024-09-14 08:26:39 -04:00
Get the native layout of Python :class:`int` objects.
The function must not be called before Python initialization nor after
Python finalization. The returned layout is valid until Python is
finalized. The layout is the same for all Python sub-interpreters and
so it can be cached.
Export API
----------
Export a Python integer as a digits array::
2024-09-17 11:38:13 -04:00
typedef struct PyLongExport {
// use value, if digits set to NULL.
int64_t value;
// 1 if the number is negative, 0 otherwise.
2024-09-17 11:38:13 -04:00
uint8_t negative;
// Number of digits in the 'digits' array.
Py_ssize_t ndigits;
// Read-only array of unsigned digits.
const void *digits;
// Member used internally, must not be used for other purpose.
Py_uintptr_t _reserved;
2024-09-17 11:38:13 -04:00
} PyLongExport;
2024-09-17 11:38:13 -04:00
int PyLong_Export(PyObject *obj, PyLongExport *array);
void PyLong_FreeExport(PyLongExport *array);
On CPython 3.14, no memory copy is needed, it's just a thin wrapper to
expose Python int internal digits array.
2024-09-17 11:38:13 -04:00
``PyLongExport._reserved``, if ``digits`` not ``NULL``, stores a strong
reference to the Python :class:`int` object to make sure that that structure
remains valid until ``PyLong_FreeExport()`` is called.
2024-09-17 11:38:13 -04:00
PyLong_Export()
^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
2024-09-17 11:38:13 -04:00
int PyLong_Export(PyObject *obj, PyLongExport *array)
2024-09-14 08:26:39 -04:00
Export a Python :class:`int` object as a digits array.
On success, set *\*array* and return 0.
On error, set an exception and return -1.
2024-09-17 11:38:13 -04:00
If ``array->digits`` set to ``NULL``, caller must use instead ``array->value``
to get value of an :class:`int` object.
2024-09-17 11:38:13 -04:00
CPython implementation detail: This function always succeeds if *obj* is a
Python :class:`int` object or a subclass.
2024-09-17 11:38:13 -04:00
``PyLong_FreeExport()`` must be called once done with using *array*.
2024-09-17 11:38:13 -04:00
PyLong_FreeExport()
^^^^^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
2024-09-17 11:38:13 -04:00
void PyLong_FreeExport(PyLongExport *array)
2024-09-17 11:38:13 -04:00
Free the export *array* created by ``PyLong_Export()``.
Import API
----------
Import a Python integer from a digits array::
// A Python integer writer instance.
// The instance must be destroyed by PyLongWriter_Finish().
typedef struct PyLongWriter PyLongWriter;
PyAPI_FUNC(PyLongWriter*) PyLongWriter_Create(
int negative,
Py_ssize_t ndigits,
void **digits);
PyAPI_FUNC(PyObject*) PyLongWriter_Finish(PyLongWriter *writer);
PyAPI_FUNC(void) PyLongWriter_Discard(PyLongWriter *writer);
On CPython 3.14, the implementation is a thin wrapper to the private
``_PyLong_New()`` function.
``PyLongWriter_Finish()`` takes care of normalizing the digits and
2024-09-15 08:33:16 -04:00
converts the object to a compact integer if needed.
PyLongWriter_Create()
^^^^^^^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
PyLongWriter* PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits)
Create a ``PyLongWriter``.
On success, set *\*digits* and return a writer.
On error, set an exception and return ``NULL``.
*negative* is ``1`` if the number is negative, or ``0`` otherwise.
*ndigits* is the number of digits in the *digits* array. It must be
greater than or equal to 0.
The caller must initialize the digits array *digits* and then call
2024-09-14 08:26:39 -04:00
``PyLongWriter_Finish()`` to get a Python :class:`int`. Digits must be
in the range [``0``; ``PyLong_BASE - 1``]. Unused digits must be set to
``0``.
PyLongWriter_Finish()
^^^^^^^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
PyObject* PyLongWriter_Finish(PyLongWriter *writer)
Finish a ``PyLongWriter`` created by ``PyLongWriter_Create()``.
2024-09-14 08:26:39 -04:00
On success, return a Python :class:`int` object.
On error, set an exception and return ``NULL``.
PyLongWriter_Discard()
^^^^^^^^^^^^^^^^^^^^^^
2024-09-14 08:26:39 -04:00
API::
void PyLongWriter_Discard(PyLongWriter *writer)
Discard the internal object and destroy the writer instance.
Optimize import for small integers
==================================
Proposed import API is efficient for large integers. Compared to
accessing directly Python internals, the proposed import API can have a
significant performance overhead on small integers.
For small integers of a few digits (for example, 1 or 2 digits), existing APIs
can be used:
* :external+py3.14:c:func:`PyLong_FromUInt64()`;
* :c:func:`PyLong_FromLong()`;
* :c:func:`PyLong_FromNativeBytes()`.
Implementation
==============
* CPython:
* https://github.com/python/cpython/pull/121339
* https://github.com/vstinner/cpython/pull/5
* gmpy:
* https://github.com/aleaxit/gmpy/pull/495
Benchmarks
==========
2024-09-17 11:38:13 -04:00
Export: PyLong_Export() with gmpy2
----------------------------------
Code::
static void
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
2024-09-17 11:38:13 -04:00
const PyLongLayout* layout = PyLong_GetNativeLayout();
static PyLongExport long_export;
2024-09-17 11:38:13 -04:00
PyLong_Export(obj, &long_export);
if (long_export.digits) {
mpz_import(z, long_export.ndigits, layout->digits_order,
layout->digit_size, layout->endianness,
layout->digit_size*8 - layout->bits_per_digit,
long_export.digits);
if (long_export.negative) {
mpz_neg(z, z);
}
2024-09-17 11:38:13 -04:00
PyLong_FreeExport(&long_export);
}
else {
2024-09-17 11:38:13 -04:00
if (LONG_MIN <= long_export.value && long_export.value <= LONG_MAX) {
mpz_set_si(z, long_export.value);
}
else {
mpz_import(z, 1, -1, sizeof(int64_t), 0, 0,
&long_export.value);
if (long_export.value < 0) {
mpz_t tmp;
mpz_init(tmp);
mpz_ui_pow_ui(tmp, 2, 8*sizeof(size_t));
mpz_sub(z, z, tmp);
mpz_clear(tmp);
}
}
}
}
Benchmark:
.. code-block:: py
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', mpz, 1 << 7)
runner.bench_func('1<<38', mpz, 1 << 38)
runner.bench_func('1<<300', mpz, 1 << 300)
runner.bench_func('1<<3000', mpz, 1 << 3000)
Results on Linux Fedora 40 with CPU isolation, Python built in release
mode:
+----------------+---------+-----------------------+
| Benchmark | ref | pep757 |
+================+=========+=======================+
| 1<<7 | 91.3 ns | 89.9 ns: 1.02x faster |
+----------------+---------+-----------------------+
| 1<<38 | 120 ns | 94.9 ns: 1.27x faster |
+----------------+---------+-----------------------+
| 1<<300 | 196 ns | 203 ns: 1.04x slower |
+----------------+---------+-----------------------+
| 1<<3000 | 939 ns | 945 ns: 1.01x slower |
+----------------+---------+-----------------------+
| Geometric mean | (ref) | 1.05x faster |
+----------------+---------+-----------------------+
Import: PyLongWriter_Create() with gmpy2
----------------------------------------
Code::
static PyObject *
GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context)
{
if (mpz_fits_slong_p(obj->z)) {
return PyLong_FromLong(mpz_get_si(obj->z));
}
const PyLongLayout *layout = PyLong_GetNativeLayout();
size_t size = (mpz_sizeinbase(obj->z, 2) +
layout->bits_per_digit - 1) / layout->bits_per_digit;
void *digits;
PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size,
&digits);
if (writer == NULL) {
return NULL;
}
mpz_export(digits, NULL, layout->endianness,
layout->digit_size, layout->digits_order,
layout->digit_size*8 - layout->bits_per_digit,
obj->z);
return PyLongWriter_Finish(writer);
}
Benchmark:
.. code-block:: py
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', int, mpz(1 << 7))
runner.bench_func('1<<38', int, mpz(1 << 38))
runner.bench_func('1<<300', int, mpz(1 << 300))
runner.bench_func('1<<3000', int, mpz(1 << 3000))
Results on Linux Fedora 40 with CPU isolation, Python built in release
mode:
+----------------+---------+-----------------------+
| Benchmark | ref | pep757 |
+================+=========+=======================+
| 1<<7 | 56.7 ns | 56.2 ns: 1.01x faster |
+----------------+---------+-----------------------+
| 1<<300 | 191 ns | 213 ns: 1.12x slower |
+----------------+---------+-----------------------+
| Geometric mean | (ref) | 1.03x slower |
+----------------+---------+-----------------------+
Benchmark hidden because not significant (2): 1<<38, 1<<3000.
Backwards Compatibility
=======================
There is no impact on the backward compatibility, only new APIs are
added.
2024-09-14 08:26:39 -04:00
Open Questions
==============
* Should we add *digits_order* and *endianness* members to :data:`sys.int_info`
and remove ``PyLong_GetNativeLayout()``? The
``PyLong_GetNativeLayout()`` function returns a C structure
2024-09-14 08:26:39 -04:00
which is more convenient to use in C than :data:`sys.int_info` which uses
Python objects.
2024-09-14 08:26:39 -04:00
* Currenly, all required information for :class:`int` import/export is
already available via :c:func:`PyLong_GetInfo()` or :data:`sys.int_info`.
Native endianness of "digits" and current order of digits (least
significant digit first) --- is a common denominator of all libraries
for aribitrary precision integer arithmetic. So, shouldn't we just remove
from API both ``PyLongLayout`` and ``PyLong_GetNativeLayout()`` (which
is actually just a minor convenience)?
Rejected Ideas
==============
Support arbitrary layout
------------------------
It would be convenient to support arbitrary layout to import-export
Python integers.
For example, it was proposed to add a *layout* parameter to
``PyLongWriter_Create()`` and a *layout* member to the
2024-09-17 11:38:13 -04:00
``PyLongExport`` structure.
The problem is that it's more complex to implement and not really
needed. What's strictly needed is only an API to import-export using the
Python "native" layout.
If later there are use cases for arbitrary layouts, new APIs can be
added.
Discussions
===========
* https://github.com/capi-workgroup/decisions/issues/35
* https://github.com/python/cpython/pull/121339
* https://github.com/python/cpython/issues/102471
* `Add public function PyLong_GetDigits()
<https://github.com/capi-workgroup/decisions/issues/31>`_
* `Consider restoring _PyLong_New() function as public
<https://github.com/python/cpython/issues/111415>`_
* `gh-106320: Remove private _PyLong_New() function
<https://github.com/python/cpython/pull/108604>`_
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.