diff --git a/peps/pep-0756.rst b/peps/pep-0756.rst
index 926f918ac..be6174682 100644
--- a/peps/pep-0756.rst
+++ b/peps/pep-0756.rst
@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
   view.
 * ``PyUnicode_Import()``: import a Python str object.

-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
-cases when a copy is needed.
+By default, ``PyUnicode_Export()`` has *O*\ (1) complexity: no memory
+is copied. See the :ref:`specification <export-complexity>` for cases
+when a copy is needed.


 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
     #define PyUnicode_FORMAT_UTF8 0x08 // char*
     #define PyUnicode_FORMAT_ASCII 0x10 // char* (ASCII string)

+    #define PyUnicode_EXPORT_ALLOW_COPY 0x10000
+
 The ``int32_t`` type is used instead of ``int`` to have a well defined
 type size and not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.

 Note that future versions of Python may introduce additional formats.

+By default, no memory is copied and no conversion is done.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
+*requested_formats*, the function can copy memory to provide the
+requested format and convert from one format to another.
+
+The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is required to export a string
+containing surrogate characters to ``PyUnicode_FORMAT_UTF8``.
+
+Available flags:
+
+=============================== =========== ===================================
+Flag                            Value       Description
+=============================== =========== ===================================
+``PyUnicode_EXPORT_ALLOW_COPY`` ``0x10000`` Allow memory copies and conversions
+=============================== =========== ===================================
+
+
 .. _export-complexity:

 Export complexity
 -----------------

-In general, an export has a complexity of *O*\ (1): no memory copy is
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+By default, an export has *O*\ (1) complexity: no memory is copied and
+no conversion is done. There is one exception: if only UTF-8 is
+requested and the UTF-8 cache is not yet filled, the string is encoded
+to UTF-8 (*O*\ (*n*)) to fill the cache, and later exports reuse it.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, a copy with
+*O*\ (*n*) complexity is needed in the following cases:

 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
-  first call, and then the encoded UTF-8 string is cached.
+* If only UTF-8 is requested and the string contains surrogate
+  characters.

 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.


-Surrogate characters and NUL characters
----------------------------------------
+Surrogate characters and embedded NUL characters
+------------------------------------------------

 Surrogate characters are allowed: they can be imported and exported.
 For example, the UTF-8 format uses the ``surrogatepass`` error handler.
@@ -347,6 +372,7 @@ to return NULL on embedded null characters

 Rejecting embedded NUL characters require to scan the string which has
 an *O*\ (*n*) complexity.
+

 Reject surrogate characters
 ---------------------------
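The default and ``PyUnicode_EXPORT_ALLOW_COPY`` rules in the updated *Export complexity* section can be read as a small decision procedure. The C sketch below models that procedure under stated assumptions; it is not CPython's or PyPy's implementation and does not call the proposed API. ``str_state``, ``plan_export()``, and the ``FMT_*`` and ``COST_*`` names are hypothetical, the UCS-1/2/4 bit values are assumed, and so is the preference order used when several formats are requested (native format first, then UTF-8, then widening), which the diff does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* UTF-8 (0x08) and ALLOW_COPY (0x10000) match the diff above; the
   UCS-1/2/4 bit values are assumed for this sketch. */
enum {
    FMT_UCS1 = 0x01,
    FMT_UCS2 = 0x02,
    FMT_UCS4 = 0x04,
    FMT_UTF8 = 0x08,
    EXPORT_ALLOW_COPY = 0x10000
};

typedef enum {
    COST_ERROR,      /* request cannot be satisfied */
    COST_O1,         /* native buffer or cached UTF-8 exposed as-is */
    COST_FILL_CACHE, /* O(n) once: encode to UTF-8 and fill the cache */
    COST_COPY        /* O(n): new buffer, only with EXPORT_ALLOW_COPY */
} export_cost;

/* Hypothetical summary of a str object's internal state. */
typedef struct {
    int32_t native;     /* FMT_UCS1, FMT_UCS2 or FMT_UCS4 */
    int has_surrogates; /* contains code points U+D800..U+DFFF */
    int utf8_cached;    /* UTF-8 cache already filled */
} str_state;

/* Return the format an export would use (0 on error) and store its cost.
   Assumed preference order: native format, then UTF-8, then widening. */
static int32_t plan_export(const str_state *s, int32_t requested,
                           export_cost *cost)
{
    int allow_copy = (requested & EXPORT_ALLOW_COPY) != 0;

    assert(s && cost);

    /* Native format requested: expose the internal buffer, O(1). */
    if (requested & s->native) {
        *cost = COST_O1;
        return s->native;
    }
    if (requested & FMT_UTF8) {
        if (!s->has_surrogates) {
            /* Default path: use the UTF-8 cache, filling it if needed. */
            *cost = s->utf8_cached ? COST_O1 : COST_FILL_CACHE;
            return FMT_UTF8;
        }
        if (allow_copy) {
            /* Surrogates: encode a copy with surrogatepass. */
            *cost = COST_COPY;
            return FMT_UTF8;
        }
    }
    if (allow_copy) {
        /* Widen UCS-1 to UCS-2 into a new buffer. */
        if ((requested & FMT_UCS2) && s->native == FMT_UCS1) {
            *cost = COST_COPY;
            return FMT_UCS2;
        }
        /* Native is UCS-1 or UCS-2 here: a UCS-4 string matched above. */
        if (requested & FMT_UCS4) {
            *cost = COST_COPY;
            return FMT_UCS4;
        }
    }
    *cost = COST_ERROR;
    return 0;
}
```

For a Latin-1 string (native UCS-1), requesting only ``FMT_UCS4`` yields ``COST_ERROR``, while adding ``EXPORT_ALLOW_COPY`` returns ``FMT_UCS4`` with ``COST_COPY``, which corresponds to the second bullet of the *Export complexity* list.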