diff --git a/peps/pep-0756.rst b/peps/pep-0756.rst
index 89aa38eeb..9de988b19 100644
--- a/peps/pep-0756.rst
+++ b/peps/pep-0756.rst
@@ -102,7 +102,12 @@ longer rationale.
 PyUnicode_Export()
 ------------------
 
-API: ``int32_t PyUnicode_Export(PyObject *unicode, int32_t requested_formats, Py_buffer *view)``.
+API::
+
+    int32_t PyUnicode_Export(
+        PyObject *unicode,
+        int32_t requested_formats,
+        Py_buffer *view)
 
 Export the contents of the *unicode* string in one of the *requested_formats*.
 
@@ -116,6 +121,10 @@ The contents of the buffer are valid until they are released.
 
 The buffer is read-only and must not be modified.
 
+The ``view->len`` member must be used to get the string length. The
+buffer should end with a trailing NUL character, but it's not
+recommended to rely on that because of embedded NUL characters.
+
 *unicode* and *view* must not be NULL.
 
 Available formats:
@@ -152,7 +161,7 @@ needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
 * If only UTF-8 is requested: the string is encoded to UTF-8 at
   the first call, and then the encoded UTF-8 string is cached.
 
-To have an *O*\ (1) complexity on CPython and PyPy, it's recommended to
+To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
 
     (PyUnicode_FORMAT_UCS1 \
@@ -160,6 +169,10 @@ support these 4 formats::
     | PyUnicode_FORMAT_UCS4 \
     | PyUnicode_FORMAT_UTF8)
 
+PyPy uses UTF-8 natively and so the ``PyUnicode_FORMAT_UTF8`` format is
+recommended. It requires a memory copy, since PyPy ``str`` objects can
+be moved in memory (PyPy uses a moving garbage collector).
+
 
 Py_buffer format and item size
 ------------------------------
@@ -181,7 +194,12 @@ Export format Buffer format Item size
 PyUnicode_Import()
 ------------------
 
-API: ``PyObject* PyUnicode_Import(const void *data, Py_ssize_t nbytes, int32_t format)``.
+API::
+
+    PyObject* PyUnicode_Import(
+        const void *data,
+        Py_ssize_t nbytes,
+        int32_t format)
 
 Create a Unicode string object from a buffer in a supported format.
 
@@ -224,10 +242,6 @@ example, the UTF-8 format uses the ``surrogatepass`` error handler.
 
 Embedded NUL characters are allowed: they can be imported and exported.
 
-An exported string does not end with a trailing NUL character: the
-``PyUnicode_Export()`` caller must use ``Py_buffer.len`` to get the
-string length.
-
 
 Implementation
 ==============
@@ -242,19 +256,6 @@ There is no impact on the backward compatibility, only new C API
 functions are added.
 
 
-Open Questions
-==============
-
-* Should we guarantee that the exported buffer always ends with a NUL
-  character? Is it possible to implement it in *O*\ (1) complexity
-  in all Python implementations?
-* Is it ok to allow surrogate characters?
-* Should we add a flag to disallow embedded NUL characters? It would
-  have an *O*\ (*n*) complexity.
-* Should we add a flag to disallow surrogate characters? It would
-  have an *O*\ (*n*) complexity.
-
-
 Usage of PEP 393 C APIs
 =======================
 