PEP 756: Add PyUnicode_EXPORT_ALLOW_COPY flag (#3988)

2024-09-24 23:03:10 +02:00 · 2024-09-24 23:03:10 +02:00 · f085d19db9
parent 680c8b1c13
commit f085d19db9
1 changed files with 35 additions and 9 deletions
--- a/peps/pep-0756.rst
+++ b/peps/pep-0756.rst
@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
  view.
 * ``PyUnicode_Import()``: import a Python str object.
-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
+By default, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
+is copied. See the :ref:`specification <export-complexity>` for cases
-cases when a copy is needed.
+when a copy is needed.
 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
    #define PyUnicode_FORMAT_UTF8  0x08   // char*
    #define PyUnicode_FORMAT_ASCII 0x10   // char* (ASCII string)
    #define PyUnicode_EXPORT_ALLOW_COPY 0x10000
 The ``int32_t`` type is used instead of ``int`` to have a well-defined
 type size that does not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.
 Note that future versions of Python may introduce additional formats.
 By default, no memory is copied and no conversion is done.
 If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
 *requested_formats*, the function can copy memory to provide the
 requested format and convert from one format to another.
 The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is needed to export a string
 containing surrogate characters to ``PyUnicode_FORMAT_UTF8``.
 Available flags:
 ===============================  ===========  ===================================
 Flag                             Value        Description
 ===============================  ===========  ===================================
 ``PyUnicode_EXPORT_ALLOW_COPY``  ``0x10000``  Allow memory copies and conversions
 ===============================  ===========  ===================================
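Because the flag travels in the same *requested_formats* bitmask as the format constants, an implementation has to separate the two before choosing a format. The sketch below is a hypothetical illustration, not CPython code: the helpers ``requested_format_bits()`` and ``copy_allowed()`` are invented for this example, and the UCS-1/UCS-2/UCS-4 values (``0x01``, ``0x02``, ``0x04``) are assumptions because they are not shown in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* UTF8, ASCII and ALLOW_COPY values come from the PEP's API listing;
   the UCS1/UCS2/UCS4 values are assumed for this sketch. */
#define PyUnicode_FORMAT_UCS1  0x01
#define PyUnicode_FORMAT_UCS2  0x02
#define PyUnicode_FORMAT_UCS4  0x04
#define PyUnicode_FORMAT_UTF8  0x08
#define PyUnicode_FORMAT_ASCII 0x10
#define PyUnicode_EXPORT_ALLOW_COPY 0x10000

/* Hypothetical helper: keep only the format bits of the request. */
static int32_t requested_format_bits(int32_t requested_formats)
{
    return requested_formats & ~(int32_t)PyUnicode_EXPORT_ALLOW_COPY;
}

/* Hypothetical helper: true if copies and conversions are permitted. */
static bool copy_allowed(int32_t requested_formats)
{
    return (requested_formats & PyUnicode_EXPORT_ALLOW_COPY) != 0;
}
```

A caller willing to accept a converted UTF-8 copy would pass ``PyUnicode_FORMAT_UTF8 | PyUnicode_EXPORT_ALLOW_COPY``. Placing the flag at ``0x10000`` keeps it clear of every format bit, so masking it out never alters the requested formats.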
 .. _export-complexity:
 Export complexity
 -----------------
-In general, an export has a complexity of *O*\ (1): no memory copy is
+By default, an export has a complexity of *O*\ (1): no memory is copied
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+and no conversion is done. There is an exception: if only UTF-8 is
 requested and the UTF-8 cache is not filled, the string is encoded to
 UTF-8 to fill the cache.
 If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, a copy is needed,
 with *O*\ (*n*) complexity, in the following cases:
 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
+* If only UTF-8 is requested and the string contains surrogate
-  first call, and then the encoded UTF-8 string is cached.
+  characters.
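The complexity rules above can be condensed into a decision function. This is a simplified, hypothetical model written for illustration, not CPython's implementation: the ``export_plan`` type and ``plan_export()`` function are invented, the UCS constant values are assumed, the ASCII format is ignored, a filled UTF-8 cache is assumed to be exposable without a copy, and any request not covered by the rules simply fails.

```c
#include <stdbool.h>
#include <stdint.h>

#define PyUnicode_FORMAT_UCS1  0x01   /* assumed value */
#define PyUnicode_FORMAT_UCS2  0x02   /* assumed value */
#define PyUnicode_FORMAT_UCS4  0x04   /* assumed value */
#define PyUnicode_FORMAT_UTF8  0x08
#define PyUnicode_EXPORT_ALLOW_COPY 0x10000

/* Hypothetical result: format == 0 means the export fails. */
typedef struct {
    int32_t format;  /* exported format, or 0 on failure */
    bool linear;     /* true if the export costs O(n): copy, conversion or encoding */
} export_plan;

/* Hypothetical model of the "Export complexity" rules, not CPython code.
   native: the string's native format (UCS1, UCS2 or UCS4).
   utf8_cached: whether the UTF-8 cache is already filled.
   has_surrogates: whether the string contains surrogate characters. */
static export_plan plan_export(int32_t native, int32_t requested,
                               bool utf8_cached, bool has_surrogates)
{
    bool allow_copy = (requested & PyUnicode_EXPORT_ALLOW_COPY) != 0;
    int32_t formats = requested & ~(int32_t)PyUnicode_EXPORT_ALLOW_COPY;
    export_plan fail = {0, false};

    /* Native format requested: expose the existing buffer, O(1). */
    if (formats & native) {
        return (export_plan){native, false};
    }
    /* UTF-8 cache already filled: expose it, O(1). */
    if ((formats & PyUnicode_FORMAT_UTF8) && utf8_cached) {
        return (export_plan){PyUnicode_FORMAT_UTF8, false};
    }
    if (formats == PyUnicode_FORMAT_UTF8) {
        /* Encoding surrogates requires the ALLOW_COPY flag. */
        if (has_surrogates && !allow_copy) {
            return fail;
        }
        /* Encode to UTF-8 (filling the cache when possible), O(n). */
        return (export_plan){PyUnicode_FORMAT_UTF8, true};
    }
    if (!allow_copy) {
        return fail;  /* no other conversion without the flag */
    }
    /* Widening conversions, only with ALLOW_COPY, O(n). */
    if (formats == PyUnicode_FORMAT_UCS2 && native == PyUnicode_FORMAT_UCS1) {
        return (export_plan){PyUnicode_FORMAT_UCS2, true};
    }
    if (formats == PyUnicode_FORMAT_UCS4 && native != PyUnicode_FORMAT_UCS4) {
        return (export_plan){PyUnicode_FORMAT_UCS4, true};
    }
    return fail;
}
```

In this model the only *O*\ (*n*) path reachable without the flag is the UTF-8-only request on a surrogate-free string whose cache is empty, matching the exception described above.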
 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.
-Surrogate characters and NUL characters
+Surrogate characters and embedded NUL characters
---------------------------------------
+------------------------------------------------
 Surrogate characters are allowed: they can be imported and exported. For
 example, the UTF-8 format uses the ``surrogatepass`` error handler.
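With ``surrogatepass``, a lone surrogate such as U+D800 is encoded with the same three-byte pattern as any other BMP code point (``ED A0 80``), whereas a strict UTF-8 encoder rejects it. The sketch below illustrates that rule for a single code point; ``encode_utf8()`` is a hypothetical helper, not CPython's encoder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into `out`.
   Returns the number of bytes written, or 0 if the code point is rejected.
   With surrogatepass == false, U+D800..U+DFFF are rejected like a strict
   encoder; with surrogatepass == true, they are encoded like any other
   BMP code point, as the surrogatepass error handler does. */
static size_t encode_utf8(uint32_t cp, bool surrogatepass, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF && !surrogatepass) {
        return 0;
    }
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;  /* beyond U+10FFFF */
}
```

Because surrogates get no special treatment when the handler is active, a string that went through this path can round-trip through a decoder that also uses ``surrogatepass``.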
@@ -347,6 +372,7 @@ to return NULL on embedded null characters
 Rejecting embedded NUL characters requires scanning the string, which
 has *O*\ (*n*) complexity.
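The cost comes from having to inspect every code point, since a NUL can sit anywhere in the string. A hypothetical scan over a buffer whose code point width is given by ``kind`` (1, 2 or 4 bytes) shows why the check is *O*\ (*n*); the helper name and signature are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if the first `length` code points of
   `data` contain U+0000. `kind` is the code point width in bytes
   (1, 2 or 4). Every code point may have to be read, hence O(n). */
static bool has_embedded_nul(const void *data, size_t length, int kind)
{
    for (size_t i = 0; i < length; i++) {
        uint32_t ch;
        switch (kind) {
        case 1:
            ch = ((const uint8_t *)data)[i];
            break;
        case 2:
            ch = ((const uint16_t *)data)[i];
            break;
        default:
            ch = ((const uint32_t *)data)[i];
            break;
        }
        if (ch == 0) {
            return true;  /* early exit, but the worst case reads everything */
        }
    }
    return false;
}
```

A string without any NUL is exactly the worst case: the loop must reach the last code point before it can answer ``false``.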
 Reject surrogate characters
 ---------------------------