PEP 624: Update alternative ideas (#1793)
Add a note that deprecated APIs can avoid creating a temporary Unicode object for some codecs.
parent d3f48ed58f
commit 814daa8aea
117
pep-0624.rst
|
@ -33,7 +33,7 @@ This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in Python 3.1
|
|||
`PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
|
||||
Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
|
||||
is not related to the Unicode object. These PEPs are split because they have
|
||||
different motivation and need different discussion.
|
||||
different motivations and need different discussions.
|
||||
|
||||
|
||||
Motivation
|
||||
|
@ -51,7 +51,7 @@ Rationale
|
|||
Deprecated since Python 3.3
|
||||
---------------------------
|
||||
|
||||
``Py_UNICODE`` and APIs using it have been deprecated since Python 3.3.
|
||||
``Py_UNICODE`` and APIs using it has been deprecated since Python 3.3.
|
||||
|
||||
|
||||
Inefficient
|
||||
|
@ -65,7 +65,7 @@ object.
|
|||
Not used widely
|
||||
---------------
|
||||
|
||||
When searching from top 4000 PyPI packages [1]_, only pyodbc uses
|
||||
When searching from the top 4000 PyPI packages [1]_, only pyodbc uses
|
||||
these APIs.
|
||||
|
||||
* ``PyUnicode_EncodeUTF8()``
|
||||
|
@ -139,23 +139,22 @@ Remove these APIs in Python 3.11. They have been deprecated already.
|
|||
* ``PyUnicode_TransformDecimalToASCII()``
|
||||
|
||||
|
||||
Alternative ideas
|
||||
Alternative Ideas
|
||||
=================
|
||||
|
||||
Instead of just removing deprecated APIs, we may be able to use their
|
||||
names with different signature.
|
||||
Replace ``Py_UNICODE*`` with ``PyObject*``
|
||||
------------------------------------------
|
||||
|
||||
As described in the "Alternative APIs" section, some APIs don't have
|
||||
public alternative APIs accepting ``PyObject *unicode`` input.
|
||||
And some public alternative APIs have restrictions like missing
|
||||
``errors`` and ``byteorder`` parameters.
|
||||
|
||||
Make some private APIs public
|
||||
------------------------------
|
||||
Instead of removing deprecated APIs, we can reuse their names for
|
||||
alternative public APIs.
|
||||
|
||||
``PyUnicode_EncodeUTF7()`` doesn't have public alternative APIs.
|
||||
|
||||
Some APIs have alternative public APIs. But they are missing
|
||||
``const char *errors`` or ``int byteorder`` parameters.
|
||||
|
||||
We can rename some private APIs and make them public to cover missing
|
||||
APIs and parameters.
|
||||
Since we have private alternative APIs already, it is just renaming
|
||||
from private name to public and deprecated names.
|
||||
|
||||
============================= ================================
|
||||
Rename to Rename from
|
||||
|
@ -170,11 +169,12 @@ APIs and parameters.
|
|||
|
||||
Pros:
|
||||
|
||||
* We have more consistent API set.
|
||||
* We have a more consistent API set.
|
||||
|
||||
Cons:
|
||||
|
||||
* We have more public APIs to maintain.
|
||||
* Backward incompatible.
|
||||
* We have more public APIs to maintain for rare use cases.
|
||||
* Existing public APIs are enough for most use cases, and
|
||||
``PyUnicode_AsEncodedString()`` can be used in other cases.
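As a rough illustration of the last point, the sketch below goes from a ``wchar_t*`` buffer to a bytes object through a temporary Unicode object, using only the public ``PyUnicode_FromWideChar()`` and ``PyUnicode_AsEncodedString()``. It drives the C API from Python through ``ctypes.pythonapi`` purely so that it runs without a compiler; that choice is an assumption of this example and it works on CPython only. ``encode_wide`` is a hypothetical helper name, not part of any API.

```python
import ctypes

api = ctypes.pythonapi  # the C API of the running CPython interpreter

# PyObject *PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
api.PyUnicode_FromWideChar.restype = ctypes.py_object
api.PyUnicode_FromWideChar.argtypes = [ctypes.c_wchar_p, ctypes.c_ssize_t]

# PyObject *PyUnicode_AsEncodedString(PyObject *unicode,
#                                     const char *encoding,
#                                     const char *errors)
api.PyUnicode_AsEncodedString.restype = ctypes.py_object
api.PyUnicode_AsEncodedString.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p,
]


def encode_wide(text, encoding, errors="strict"):
    """Hypothetical helper: wchar_t* -> temporary str -> bytes."""
    # ctypes passes `text` as a NUL-terminated wchar_t*; size -1 asks
    # PyUnicode_FromWideChar() to measure the buffer itself.
    unicode_obj = api.PyUnicode_FromWideChar(text, -1)
    return api.PyUnicode_AsEncodedString(
        unicode_obj, encoding.encode("ascii"), errors.encode("ascii")
    )


utf7_bytes = encode_wide("a+b", "utf-7")
```

In a C extension the same two calls would be made directly, followed by ``Py_DECREF()`` on the temporary Unicode object.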
|
||||
|
||||
|
@ -182,51 +182,71 @@ Cons:
|
|||
Replace ``Py_UNICODE*`` with ``Py_UCS4*``
|
||||
-----------------------------------------
|
||||
|
||||
We can replace ``Py_UNICODE`` (typedef of ``wchar_t``) with
|
||||
``Py_UCS4``. Since builtin codecs support UCS-4, we don't need to
|
||||
convert ``Py_UCS4*`` string to Unicode object.
|
||||
We can replace ``Py_UNICODE`` with ``Py_UCS4`` and undeprecate
|
||||
these APIs.
|
||||
|
||||
The UTF-8, UTF-16, and UTF-32 encoders support ``Py_UCS4`` internally.
|
||||
So ``PyUnicode_EncodeUTF8()``, ``PyUnicode_EncodeUTF16()``, and
|
||||
``PyUnicode_EncodeUTF32()`` can avoid creating a temporary Unicode
|
||||
object.
|
||||
|
||||
|
||||
Pros:
|
||||
|
||||
* We have more consistent API set.
|
||||
* User can encode UCS-4 string in C without creating Unicode object.
|
||||
* We can avoid creating a temporary Unicode object when encoding from
|
||||
``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs.
|
||||
|
||||
Cons:
|
||||
|
||||
* We have more public APIs to maintain.
|
||||
* Applications which uses UTF-8 or UTF-16 can not use these APIs
|
||||
anyway.
|
||||
* Other Python implementations may not have builtin codec for UCS-4.
|
||||
* If we change the Unicode internal representation to UTF-8, we need
|
||||
to keep UCS-4 support only for these APIs.
|
||||
* Backward incompatible.
|
||||
* We have more public APIs to maintain for rare use cases.
|
||||
* Other Python implementations that want to support Python/C API need
|
||||
to support these APIs too.
|
||||
* If we change the Unicode internal representation to UTF-8 in the
|
||||
future, we need to keep UCS-4 support only for these APIs.
|
||||
|
||||
|
||||
Replace ``Py_UNICODE*`` with ``wchar_t*``
|
||||
-----------------------------------------
|
||||
|
||||
We can replace ``Py_UNICODE`` to ``wchar_t``.
|
||||
We can replace ``Py_UNICODE`` with ``wchar_t``. Since ``Py_UNICODE``
|
||||
is already a typedef of ``wchar_t``, this is the status quo.
|
||||
|
||||
On platforms where ``sizeof(wchar_t) == 4``, we can avoid creating a
|
||||
temporary Unicode object when encoding from ``wchar_t*`` to bytes
|
||||
objects using UTF-8, UTF-16, and UTF-32 codecs, like the "Replace
|
||||
``Py_UNICODE*`` with ``Py_UCS4*``" idea.
|
||||
|
||||
|
||||
Pros:
|
||||
|
||||
* We have more consistent API set.
|
||||
* Backward compatible.
|
||||
* We can avoid creating a temporary Unicode object when encoding from
|
||||
``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs
|
||||
  on platforms where ``sizeof(wchar_t) == 4``.
|
||||
|
||||
Cons:
|
||||
|
||||
* We have more public APIs to maintain.
|
||||
* They are inefficient on platforms ``wchar_t*`` is UTF-16. It is
|
||||
because built-in codecs supports only UCS-1, UCS-2, and UCS-4
|
||||
input.
|
||||
* Although Windows is the most prominent platform that uses ``wchar_t``
|
||||
heavily, these APIs need to create a temporary Unicode object
|
||||
always because ``sizeof(wchar_t) == 2`` on Windows.
|
||||
* We have more public APIs to maintain for rare use cases.
|
||||
* Other Python implementations that want to support Python/C API need
|
||||
to support these APIs too.
|
||||
* If we change the Unicode internal representation to UTF-8 in the
|
||||
future, we need to keep UCS-4 support only for these APIs.
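Whether the ``sizeof(wchar_t) == 4`` condition discussed above holds on a given build can be checked from Python with ``ctypes``. This is a minimal sketch; ``wchar_is_ucs4`` is a hypothetical name, not part of any API.

```python
import ctypes


def wchar_is_ucs4():
    """Return True when wchar_t is 4 bytes wide on this platform."""
    return ctypes.sizeof(ctypes.c_wchar) == 4


# True where wchar_t is 4 bytes wide, False where it is 2 bytes wide
# (as the text above notes for Windows).
can_skip_temporary_object = wchar_is_ucs4()
```

Only when this check is true could the ``wchar_t*`` variant skip the temporary Unicode object for the UTF-8, UTF-16, and UTF-32 codecs.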
|
||||
|
||||
|
||||
Rejected ideas
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
Using runtime warning
|
||||
---------------------
|
||||
Emit runtime warning
|
||||
--------------------
|
||||
|
||||
These APIs doesn't release GIL for now. Emitting a warning from
|
||||
In addition to the existing compiler warning, emitting a runtime
|
||||
``DeprecationWarning`` is suggested.
|
||||
|
||||
But these APIs don't release the GIL for now. Emitting a warning from
|
||||
such APIs is not safe. See this example.
|
||||
|
||||
.. code-block::
|
||||
|
@ -244,7 +264,6 @@ filters and other threads may change the ``list`` and ``u`` can be
|
|||
a dangling reference after ``PyUnicode_EncodeUTF8()`` returned.
|
||||
|
||||
|
||||
|
||||
Discussions
|
||||
===========
|
||||
|
||||
|
@ -256,22 +275,24 @@ Discussions
|
|||
Objections
|
||||
----------
|
||||
|
||||
* Removing these APIs removes ability to use codec without temporary Unicode.
|
||||
* Removing these APIs removes the ability to use codecs without a temporary
|
||||
Unicode.
|
||||
|
||||
* Codecs can not encode Unicode buffer directly without temporary Unicode
|
||||
object since Python 3.3. All these APIs creates temporary Unicode object
|
||||
for now. So removing them doesn't reduce any abilities.
|
||||
* Codecs can not encode Unicode buffer directly without temporary
|
||||
    Unicode object since Python 3.3. All these APIs create a temporary
|
||||
Unicode object for now. So removing them doesn't reduce any
|
||||
abilities.
|
||||
|
||||
* Why not remove decoder APIs too?
|
||||
|
||||
* They are part of stable ABI.
|
||||
|
||||
* ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are used
|
||||
very widely. Deprecating them is not worth enough.
|
||||
* ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are
|
||||
    used very widely. Deprecating them is not worthwhile.
|
||||
|
||||
* Decoder APIs can decode from byte buffer directly, without creating
|
||||
temporary bytes object. On the other hand, encoder APIs can not avoid
|
||||
temporary Unicode object.
|
||||
* Decoder APIs can decode from byte buffer directly, without
|
||||
creating temporary bytes object. On the other hand, encoder APIs
|
||||
can not avoid temporary Unicode object.
|
||||
|
||||
|
||||
References
|
||||
|
|