PEP 756: Add PyUnicode_EXPORT_ALLOW_COPY flag (#3988)
Parent: 680c8b1c13
Commit: f085d19db9
@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
 view.
 * ``PyUnicode_Import()``: import a Python str object.
 
-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
-cases when a copy is needed.
+By default, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
+is copied. See the :ref:`specification <export-complexity>` for cases
+when a copy is needed.
 
 
 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
 #define PyUnicode_FORMAT_UTF8 0x08 // char*
 #define PyUnicode_FORMAT_ASCII 0x10 // char* (ASCII string)
 
+#define PyUnicode_EXPORT_ALLOW_COPY 0x10000
+
 The ``int32_t`` type is used instead of ``int`` to have a well defined
 type size and not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.
 
 Note that future versions of Python may introduce additional formats.
 
+By default, no memory is copied and no conversion is done.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
+*requested_formats*, the function can copy memory to provide the
+requested format and convert from a format to another.
+
+The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is needed to export to
+``PyUnicode_FORMAT_UTF8`` a string containing surrogate characters.
+
+Available flags:
+
+===============================  ===========  ===================================
+Flag                             Value        Description
+===============================  ===========  ===================================
+``PyUnicode_EXPORT_ALLOW_COPY``  ``0x10000``  Allow memory copies and conversions
+===============================  ===========  ===================================
+
+
 .. _export-complexity:
 
 Export complexity
 -----------------
 
-In general, an export has a complexity of *O*\ (1): no memory copy is
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+By default, an export has a complexity of *O*\ (1): no memory is copied
+and no conversion is done. There is an exception: if only UTF-8 is
+requested and the UTF-8 cache is not filled, the string is encoded to
+UTF-8 to fill the cache.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, there are cases when a
+copy is needed, *O*\ (*n*) complexity:
 
 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
-first call, and then the encoded UTF-8 string is cached.
+* If only UTF-8 is requested and the string contains surrogate
+characters.
 
 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.
 
 
-Surrogate characters and NUL characters
----------------------------------------
+Surrogate characters and embedded NUL characters
+------------------------------------------------
 
 Surrogate characters are allowed: they can be imported and exported. For
 example, the UTF-8 format uses the ``surrogatepass`` error handler.
@@ -347,6 +372,7 @@ to return NULL on embedded null characters
 Rejecting embedded NUL characters require to scan the string which has
 an *O*\ (*n*) complexity.
 
+
 Reject surrogate characters
 ---------------------------
 
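
Illustration (not part of the commit): the export-complexity hunk says that, once ``PyUnicode_EXPORT_ALLOW_COPY`` is set, an *O*\ (*n*) copy is needed when only UCS-2 is requested for a UCS-1 string, or only UCS-4 for a UCS-1 or UCS-2 string. The sketch below models just those two rules as a standalone C helper. It does not call ``PyUnicode_Export()``. The ``SKETCH_FORMAT_UCS*`` values and both ``sketch_*`` functions are hypothetical names introduced here; only the UTF-8, ASCII, and flag values are quoted from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted from the diff above. */
#define PyUnicode_FORMAT_UTF8       0x08
#define PyUnicode_FORMAT_ASCII      0x10
#define PyUnicode_EXPORT_ALLOW_COPY 0x10000

/* ASSUMPTION: the UCS format values are not shown in this diff;
 * these stand-ins only need to be distinct low bits. */
#define SKETCH_FORMAT_UCS1 0x01
#define SKETCH_FORMAT_UCS2 0x02
#define SKETCH_FORMAT_UCS4 0x04

/* Hypothetical helper: drop the flag bit, keep only the format bits. */
static int32_t
sketch_format_bits(int32_t requested_formats)
{
    return requested_formats & ~(int32_t)PyUnicode_EXPORT_ALLOW_COPY;
}

/* Hypothetical model of the two "widening" bullets in the
 * export-complexity list. Returns 1 if the request would need an
 * O(n) copy, 0 otherwise. The UTF-8 cases are deliberately left out. */
static int
sketch_needs_widening_copy(int32_t requested_formats, int32_t native_format)
{
    /* The model only knows the three fixed-width native formats. */
    assert(native_format == SKETCH_FORMAT_UCS1
           || native_format == SKETCH_FORMAT_UCS2
           || native_format == SKETCH_FORMAT_UCS4);

    /* Default behavior per the text: no copy, no conversion. */
    if (!(requested_formats & PyUnicode_EXPORT_ALLOW_COPY)) {
        return 0;
    }

    int32_t formats = sketch_format_bits(requested_formats);
    if (formats == SKETCH_FORMAT_UCS2) {
        /* Only UCS-2 requested: copy if the string is stored as UCS-1. */
        return native_format == SKETCH_FORMAT_UCS1;
    }
    if (formats == SKETCH_FORMAT_UCS4) {
        /* Only UCS-4 requested: copy if the string is UCS-1 or UCS-2. */
        return native_format != SKETCH_FORMAT_UCS4;
    }
    /* Several formats requested, or a format this sketch does not model. */
    return 0;
}
```

Because the flag lives at bit ``0x10000``, well above the format bits shown in the diff, a caller can OR it into *requested_formats* without colliding with any format value, which is what the masking helper relies on.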