PEP 756: Remove Open Questions (#3968)

Victor Stinner 2024-09-17 15:34:14 +02:00 committed by GitHub
parent 80f7aadb73
commit b6cf6d47f3
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed file with 21 additions and 20 deletions
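This commit moves the length rule into the ``PyUnicode_Export()`` section. Callers must use ``view->len`` instead of relying on a trailing NUL character, because exported strings may contain embedded NUL characters. The sketch below is plain C and does not use ``Python.h``. ``mock_view``, the ``FMT_*`` bit values and the helper names are hypothetical placeholders, not the PEP's ``PyUnicode_FORMAT_*`` constants or CPython's ``Py_buffer``. ``Py_buffer.len`` counts bytes, so the number of code units is ``len / itemsize``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder bit values: the real PyUnicode_FORMAT_*
   constants are defined in the PEP's "Available formats" table. */
#define FMT_UCS1 0x01
#define FMT_UCS2 0x02
#define FMT_UCS4 0x04
#define FMT_UTF8 0x08

/* The request mask the PEP recommends for the best performance. */
#define FMT_RECOMMENDED (FMT_UCS1 | FMT_UCS2 | FMT_UCS4 | FMT_UTF8)

/* Hypothetical stand-in for the Py_buffer fields a caller reads. */
typedef struct {
    const void *buf;
    ptrdiff_t len;       /* total size in bytes, like Py_buffer.len */
    ptrdiff_t itemsize;  /* bytes per code unit, like Py_buffer.itemsize */
} mock_view;

/* Bytes per code unit for each export format; -1 if unknown. */
static ptrdiff_t format_itemsize(int32_t format)
{
    switch (format) {
    case FMT_UCS1: return 1;
    case FMT_UCS2: return 2;
    case FMT_UCS4: return 4;
    case FMT_UTF8: return 1;  /* UTF-8 code units are single bytes */
    default:       return -1;
    }
}

/* Length in code units, derived from len only: never scan for a NUL,
   because the string may contain embedded NUL characters. */
static ptrdiff_t code_units(const mock_view *view)
{
    assert(view->itemsize > 0);
    assert(view->len % view->itemsize == 0);
    return view->len / view->itemsize;
}

/* Contrast only: the byte count a bounded NUL scan would report. */
static ptrdiff_t nul_scan_bytes(const mock_view *view)
{
    const char *start = view->buf;
    const char *nul = memchr(start, '\0', (size_t)view->len);
    return nul != NULL ? nul - start : view->len;
}
```

For example, a UCS1 buffer holding ``a``, ``b``, NUL, ``c``, ``d`` has ``len == 5``. ``code_units()`` reports 5 characters, but a NUL scan stops after 2 bytes.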

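The unchanged context in the hunk starting at old line 152 notes that when only UTF-8 is requested, the string is encoded to UTF-8 at the first call and the encoded string is then cached. Only the first export pays the O(n) cost. The following is a minimal model of that encode-once cache, assuming a UCS1 (Latin-1) source string. ``model_str``, ``export_utf8()`` and ``model_str_clear()`` are hypothetical names, and the layout is not CPython's ``PyUnicodeObject``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a UCS1 (Latin-1) string with a lazily filled
   UTF-8 cache; not CPython's real object layout. */
typedef struct {
    const unsigned char *latin1;  /* native UCS1 data */
    size_t length;                /* number of characters */
    unsigned char *utf8;          /* cached UTF-8 bytes, NULL until needed */
    size_t utf8_len;              /* size of the cached UTF-8 data in bytes */
    int encode_calls;             /* counts O(n) encodings, for illustration */
} model_str;

/* Returns the UTF-8 bytes: encodes on the first call (O(n)) and reuses
   the cached copy on later calls (O(1)). Returns NULL on allocation failure. */
static const unsigned char *export_utf8(model_str *s, size_t *out_len)
{
    assert(s != NULL && out_len != NULL);
    if (s->utf8 == NULL) {
        /* Worst case: every byte >= 0x80 becomes 2 UTF-8 bytes. */
        unsigned char *dst = malloc(2 * s->length + 1);
        if (dst == NULL)
            return NULL;
        size_t n = 0;
        for (size_t i = 0; i < s->length; i++) {
            unsigned char c = s->latin1[i];
            if (c < 0x80) {
                dst[n++] = c;  /* ASCII, including embedded NUL */
            } else {
                dst[n++] = (unsigned char)(0xC0 | (c >> 6));
                dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
            }
        }
        dst[n] = '\0';  /* trailing NUL; callers should still use *out_len */
        s->utf8 = dst;
        s->utf8_len = n;
        s->encode_calls++;
    }
    *out_len = s->utf8_len;
    return s->utf8;
}

/* Releases the cached UTF-8 copy. */
static void model_str_clear(model_str *s)
{
    free(s->utf8);
    s->utf8 = NULL;
    s->utf8_len = 0;
}
```

Calling ``export_utf8()`` twice on the same string returns the same pointer, and ``encode_calls`` is incremented only once. A real implementation would also have to decide when the cache is freed. This model leaves that to the explicit ``model_str_clear()`` call.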

@@ -102,7 +102,12 @@ longer rationale.
PyUnicode_Export()
------------------
-API: ``int32_t PyUnicode_Export(PyObject *unicode, int32_t requested_formats, Py_buffer *view)``.
+API::
+
+    int32_t PyUnicode_Export(
+        PyObject *unicode,
+        int32_t requested_formats,
+        Py_buffer *view)
Export the contents of the *unicode* string in one of the *requested_formats*.
@@ -116,6 +121,10 @@ The contents of the buffer are valid until they are released.
The buffer is read-only and must not be modified.
+The ``view->len`` member must be used to get the string length. The
+buffer should end with a trailing NUL character, but it's not
+recommended to rely on that because of embedded NUL characters.
*unicode* and *view* must not be NULL.
Available formats:
@@ -152,7 +161,7 @@ needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
* If only UTF-8 is requested: the string is encoded to UTF-8 at the
first call, and then the encoded UTF-8 string is cached.
-To have an *O*\ (1) complexity on CPython and PyPy, it's recommended to
+To get the best performance on CPython and PyPy, it's recommended to
support these 4 formats::
    (PyUnicode_FORMAT_UCS1 \
@@ -160,6 +169,10 @@ support these 4 formats::
     | PyUnicode_FORMAT_UCS4 \
     | PyUnicode_FORMAT_UTF8)
+PyPy uses UTF-8 natively and so the ``PyUnicode_FORMAT_UTF8`` format is
+recommended. It requires a memory copy, since PyPy ``str`` objects can
+be moved in memory (PyPy uses a moving garbage collector).
Py_buffer format and item size
------------------------------
@@ -181,7 +194,12 @@ Export format Buffer format Item size
PyUnicode_Import()
------------------
-API: ``PyObject* PyUnicode_Import(const void *data, Py_ssize_t nbytes, int32_t format)``.
+API::
+
+    PyObject* PyUnicode_Import(
+        const void *data,
+        Py_ssize_t nbytes,
+        int32_t format)
Create a Unicode string object from a buffer in a supported format.
@@ -224,10 +242,6 @@ example, the UTF-8 format uses the ``surrogatepass`` error handler.
Embedded NUL characters are allowed: they can be imported and exported.
-An exported string does not end with a trailing NUL character: the
-``PyUnicode_Export()`` caller must use ``Py_buffer.len`` to get the
-string length.
Implementation
==============
@@ -242,19 +256,6 @@ There is no impact on the backward compatibility, only new C API
functions are added.
-Open Questions
-==============
-* Should we guarantee that the exported buffer always ends with a NUL
-  character? Is it possible to implement it in *O*\ (1) complexity
-  in all Python implementations?
-* Is it ok to allow surrogate characters?
-* Should we add a flag to disallow embedded NUL characters? It would
-  have an *O*\ (*n*) complexity.
-* Should we add a flag to disallow surrogate characters? It would
-  have an *O*\ (*n*) complexity.
Usage of PEP 393 C APIs
=======================