PEP 624: Update alternative ideas (#1793)
Add a note that we can avoid creating a temporary Unicode object in the deprecated APIs for some codecs.
parent d3f48ed58f
commit 814daa8aea
117
pep-0624.rst
|
@@ -33,7 +33,7 @@ This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in Python 3.1
|
||||||
`PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
|
`PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
|
||||||
Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
|
Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
|
||||||
does not relate to the Unicode object. These PEPs are split because they have
|
does not relate to the Unicode object. These PEPs are split because they have
|
||||||
different motivation and need different discussion.
|
different motivations and need different discussions.
|
||||||
|
|
||||||
|
|
||||||
Motivation
|
Motivation
|
||||||
|
@@ -51,7 +51,7 @@ Rationale
|
||||||
Deprecated since Python 3.3
|
Deprecated since Python 3.3
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
``Py_UNICODE`` and APIs using it have been deprecated since Python 3.3.
|
``Py_UNICODE`` and APIs using it have been deprecated since Python 3.3.
|
||||||
|
|
||||||
|
|
||||||
Inefficient
|
Inefficient
|
||||||
|
@@ -65,7 +65,7 @@ object.
|
||||||
Not used widely
|
Not used widely
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
When searching from top 4000 PyPI packages [1]_, only pyodbc use
|
When searching the top 4000 PyPI packages [1]_, only pyodbc uses
|
||||||
these APIs.
|
these APIs.
|
||||||
|
|
||||||
* ``PyUnicode_EncodeUTF8()``
|
* ``PyUnicode_EncodeUTF8()``
|
||||||
|
@@ -139,23 +139,22 @@ Remove these APIs in Python 3.11. They have been deprecated already.
|
||||||
* ``PyUnicode_TransformDecimalToASCII()``
|
* ``PyUnicode_TransformDecimalToASCII()``
|
||||||
|
|
||||||
|
|
||||||
Alternative ideas
|
Alternative Ideas
|
||||||
=================
|
=================
|
||||||
|
|
||||||
Instead of just removing deprecated APIs, we may be able to use their
|
Replace ``Py_UNICODE*`` with ``PyObject*``
|
||||||
names with different signature.
|
------------------------------------------
|
||||||
|
|
||||||
|
As described in the "Alternative APIs" section, some APIs don't have
|
||||||
|
public alternative APIs accepting ``PyObject *unicode`` input.
|
||||||
|
And some public alternative APIs have restrictions like missing
|
||||||
|
``errors`` and ``byteorder`` parameters.
|
||||||
|
|
||||||
Make some private APIs public
|
Instead of removing deprecated APIs, we can reuse their names for
|
||||||
------------------------------
|
alternative public APIs.
|
||||||
|
|
||||||
``PyUnicode_EncodeUTF7()`` doesn't have public alternative APIs.
|
Since private alternative APIs exist already, this only requires renaming
|
||||||
|
them from their private names to the public, deprecated names.
|
||||||
Some APIs have alternative public APIs. But they are missing
|
|
||||||
``const char *errors`` or ``int byteorder`` parameters.
|
|
||||||
|
|
||||||
We can rename some private APIs and make them public to cover missing
|
|
||||||
APIs and parameters.
|
|
||||||
|
|
||||||
============================= ================================
|
============================= ================================
|
||||||
Rename to Rename from
|
Rename to Rename from
|
||||||
|
@@ -170,11 +169,12 @@ APIs and parameters.
|
||||||
|
|
||||||
Pros:
|
Pros:
|
||||||
|
|
||||||
* We have more consistent API set.
|
* We have a more consistent API set.
|
||||||
|
|
||||||
Cons:
|
Cons:
|
||||||
|
|
||||||
* We have more public APIs to maintain.
|
* Backward incompatible.
|
||||||
|
* We have more public APIs to maintain for rare use cases.
|
||||||
* Existing public APIs are enough for most use cases, and
|
* Existing public APIs are enough for most use cases, and
|
||||||
``PyUnicode_AsEncodedString()`` can be used in other cases.
|
``PyUnicode_AsEncodedString()`` can be used in other cases.
|
||||||
|
|
||||||
|
@@ -182,51 +182,71 @@ Cons:
|
||||||
Replace ``Py_UNICODE*`` with ``Py_UCS4*``
|
Replace ``Py_UNICODE*`` with ``Py_UCS4*``
|
||||||
-----------------------------------------
|
-----------------------------------------
|
||||||
|
|
||||||
We can replace ``Py_UNICODE`` (typedef of ``wchar_t``) with
|
We can replace ``Py_UNICODE`` with ``Py_UCS4`` and undeprecate
|
||||||
``Py_UCS4``. Since builtin codecs support UCS-4, we don't need to
|
these APIs.
|
||||||
convert ``Py_UCS4*`` string to Unicode object.
|
|
||||||
|
UTF-8, UTF-16, UTF-32 encoders support ``Py_UCS4`` internally.
|
||||||
|
So ``PyUnicode_EncodeUTF8()``, ``PyUnicode_EncodeUTF16()``, and
|
||||||
|
``PyUnicode_EncodeUTF32()`` can avoid creating a temporary Unicode
|
||||||
|
object.
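
To illustrate why UCS-4 input needs no intermediate object, here is a minimal sketch of encoding a UCS-4 buffer straight to UTF-8. It is not CPython's encoder: the helper name ``ucs4_to_utf8`` and its signature are hypothetical, and ``uint32_t`` stands in for ``Py_UCS4`` so the sketch compiles without ``Python.h``.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API: encode `len` UCS-4 code points
   from `src` as UTF-8 into `dst`, which holds at most `cap` bytes.
   Returns the number of bytes written, or (size_t)-1 if a code point is
   invalid or the output buffer is too small. */
size_t ucs4_to_utf8(const uint32_t *src, size_t len,
                    unsigned char *dst, size_t cap)
{
    size_t n = 0;  /* bytes written so far; always <= cap */
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c < 0x80) {                      /* 1 byte: ASCII */
            if (cap - n < 1) return (size_t)-1;
            dst[n++] = (unsigned char)c;
        } else if (c < 0x800) {              /* 2 bytes */
            if (cap - n < 2) return (size_t)-1;
            dst[n++] = (unsigned char)(0xC0 | (c >> 6));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {            /* 3 bytes */
            if (c >= 0xD800 && c <= 0xDFFF)  /* surrogates are not valid */
                return (size_t)-1;
            if (cap - n < 3) return (size_t)-1;
            dst[n++] = (unsigned char)(0xE0 | (c >> 12));
            dst[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c <= 0x10FFFF) {          /* 4 bytes */
            if (cap - n < 4) return (size_t)-1;
            dst[n++] = (unsigned char)(0xF0 | (c >> 18));
            dst[n++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                             /* beyond the Unicode range */
            return (size_t)-1;
        }
    }
    return n;
}
```

Each code point maps to its UTF-8 bytes independently, which is why a fixed-width 32-bit input buffer can be consumed in a single pass.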
|
||||||
|
|
||||||
|
|
||||||
Pros:
|
Pros:
|
||||||
|
|
||||||
* We have more consistent API set.
|
* We can avoid creating a temporary Unicode object when encoding from
|
||||||
* User can encode UCS-4 string in C without creating Unicode object.
|
``Py_UCS4*`` into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs.
|
||||||
|
|
||||||
Cons:
|
Cons:
|
||||||
|
|
||||||
* We have more public APIs to maintain.
|
* Backward incompatible.
|
||||||
* Applications which uses UTF-8 or UTF-16 can not use these APIs
|
* We have more public APIs to maintain for rare use cases.
|
||||||
anyway.
|
* Other Python implementations that want to support Python/C API need
|
||||||
* Other Python implementations may not have builtin codec for UCS-4.
|
to support these APIs too.
|
||||||
* If we change the Unicode internal representation to UTF-8, we need
|
* If we change the Unicode internal representation to UTF-8 in the
|
||||||
to keep UCS-4 support only for these APIs.
|
future, we need to keep UCS-4 support only for these APIs.
|
||||||
|
|
||||||
|
|
||||||
Replace ``Py_UNICODE*`` with ``wchar_t*``
|
Replace ``Py_UNICODE*`` with ``wchar_t*``
|
||||||
-----------------------------------------
|
-----------------------------------------
|
||||||
|
|
||||||
We can replace ``Py_UNICODE`` to ``wchar_t``.
|
We can replace ``Py_UNICODE`` with ``wchar_t``. Since ``Py_UNICODE``
|
||||||
|
is already a typedef of ``wchar_t``, this is the status quo.
|
||||||
|
|
||||||
|
On platforms where ``sizeof(wchar_t) == 4``, we can avoid creating a
|
||||||
|
temporary Unicode object when encoding from ``wchar_t*`` to bytes
|
||||||
|
objects using the UTF-8, UTF-16, and UTF-32 codecs, like the "Replace
|
||||||
|
``Py_UNICODE*`` with ``Py_UCS4*``" idea.
|
||||||
|
|
||||||
|
|
||||||
Pros:
|
Pros:
|
||||||
|
|
||||||
* We have more consistent API set.
|
|
||||||
* Backward compatible.
|
* Backward compatible.
|
||||||
|
* We can avoid creating a temporary Unicode object when encoding from
|
||||||
|
``Py_UCS4*`` into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs
|
||||||
|
on platforms where ``sizeof(wchar_t) == 4``.
|
||||||
|
|
||||||
Cons:
|
Cons:
|
||||||
|
|
||||||
* We have more public APIs to maintain.
|
* Although Windows is the major platform that uses ``wchar_t`` heavily,
|
||||||
* They are inefficient on platforms ``wchar_t*`` is UTF-16. It is
|
these APIs always need to create a temporary Unicode object there
|
||||||
because built-in codecs supports only UCS-1, UCS-2, and UCS-4
|
because ``sizeof(wchar_t) == 2`` on Windows.
|
||||||
input.
|
* We have more public APIs to maintain for rare use cases.
|
||||||
|
* Other Python implementations that want to support Python/C API need
|
||||||
|
to support these APIs too.
|
||||||
|
* If we change the Unicode internal representation to UTF-8 in the
|
||||||
|
future, we need to keep UCS-4 support only for these APIs.
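
The ``sizeof(wchar_t) == 2`` case differs because such a buffer holds UTF-16 code units, where one code point may span two units (a surrogate pair), so it must be converted before an encoder that expects UCS-4 input can consume it. The following is a minimal sketch of that conversion step. The helper ``utf16_to_ucs4`` is hypothetical and not a CPython API, and ``uint16_t``/``uint32_t`` stand in for a 2-byte ``wchar_t`` and ``Py_UCS4``.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API: decode `len` UTF-16 code units
   from `src` into UCS-4 code points in `dst`. `dst` must have room for
   `len` entries. Returns the number of code points written, or
   (size_t)-1 if the input contains a lone surrogate. */
size_t utf16_to_ucs4(const uint16_t *src, size_t len, uint32_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: it must be followed by a low surrogate. */
            if (i + 1 >= len) return (size_t)-1;
            uint32_t lo = src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return (size_t)-1;
            dst[n++] = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i++;  /* the low surrogate has been consumed as well */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            return (size_t)-1;
        } else {
            /* A BMP code point is stored in a single unit. */
            dst[n++] = u;
        }
    }
    return n;
}
```

This extra pass, plus the storage for its result, is the kind of intermediate work that a 4-byte ``wchar_t`` avoids.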
|
||||||
|
|
||||||
|
|
||||||
Rejected ideas
|
Rejected Ideas
|
||||||
==============
|
==============
|
||||||
|
|
||||||
Using runtime warning
|
Emit runtime warning
|
||||||
---------------------
|
--------------------
|
||||||
|
|
||||||
These APIs doesn't release GIL for now. Emitting a warning from
|
In addition to the existing compiler warning, emitting a runtime
|
||||||
|
``DeprecationWarning`` is suggested.
|
||||||
|
|
||||||
|
But these APIs don't release the GIL for now. Emitting a warning from
|
||||||
such APIs is not safe. See this example.
|
such APIs is not safe. See this example.
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
@@ -244,7 +264,6 @@ filters and other threads may change the ``list`` and ``u`` can be
|
||||||
a dangling reference after ``PyUnicode_EncodeUTF8()`` returned.
|
a dangling reference after ``PyUnicode_EncodeUTF8()`` returned.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Discussions
|
Discussions
|
||||||
===========
|
===========
|
||||||
|
|
||||||
|
@@ -256,22 +275,24 @@ Discussions
|
||||||
Objections
|
Objections
|
||||||
----------
|
----------
|
||||||
|
|
||||||
* Removing these APIs removes ability to use codec without temporary Unicode.
|
* Removing these APIs removes the ability to use codecs without a temporary
|
||||||
|
Unicode object.
|
||||||
|
|
||||||
* Codecs can not encode Unicode buffer directly without temporary Unicode
|
* Codecs cannot encode a Unicode buffer directly without a temporary
|
||||||
object since Python 3.3. All these APIs creates temporary Unicode object
|
Unicode object since Python 3.3. All these APIs create a temporary
|
||||||
for now. So removing them doesn't reduce any abilities.
|
Unicode object for now, so removing them doesn't reduce any
|
||||||
|
abilities.
|
||||||
|
|
||||||
* Why not remove decoder APIs too?
|
* Why not remove decoder APIs too?
|
||||||
|
|
||||||
* They are part of stable ABI.
|
* They are part of stable ABI.
|
||||||
|
|
||||||
* ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are used
|
* ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are
|
||||||
very widely. Deprecating them is not worth enough.
|
used very widely. Deprecating them is not worthwhile.
|
||||||
|
|
||||||
* Decoder APIs can decode from byte buffer directly, without creating
|
* Decoder APIs can decode from byte buffer directly, without
|
||||||
temporary bytes object. On the other hand, encoder APIs can not avoid
|
creating temporary bytes object. On the other hand, encoder APIs
|
||||||
temporary Unicode object.
|
cannot avoid creating a temporary Unicode object.
|
||||||
|
|
||||||
|
|
||||||
References
|
References
|
||||||
|
|