python-peps/pep-0624.rst

PEP: 624
Title: Remove Py_UNICODE encoder APIs
Author: Inada Naoki <songofacandy@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jul-2020
Python-Version: 3.11
Post-History: 08-Jul-2020


Abstract
========

This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in Python 3.11:

* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``

.. note::

   `PEP 623  <https://www.python.org/dev/peps/pep-0623/>`_ propose to remove
   Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
   is not relating to Unicode object. These PEPs are split because they have
   different motivations and need different discussions.


Motivation
==========

In general, reducing the number of APIs that have been deprecated for
a long time and have few users is a good idea for not only it
improves the maintainability of CPython, but it also helps API users
and other Python implementations.


Rationale
=========

Deprecated since Python 3.3
---------------------------

``Py_UNICODE`` and APIs using it has been deprecated since Python 3.3.


Inefficient
-----------

All of these APIs are implemented using ``PyUnicode_FromWideChar``.
So these APIs are inefficient when user want to encode Unicode
object.


Not used widely
---------------

When searching from the top 4000 PyPI packages [1]_, only pyodbc use
these APIs.

* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``

pyodbc uses these APIs to encode Unicode object into bytes object.
So it is easy to fix it. [2]_


Alternative APIs
================

There are alternative APIs to accept ``PyObject *unicode`` instead of
``Py_UNICODE *``. Users can migrate to them.


========================================= ==========================================
Deprecated API                            Alternative APIs
========================================= ==========================================
``PyUnicode_Encode()``                    ``PyUnicode_AsEncodedString()``
``PyUnicode_EncodeASCII()``               ``PyUnicode_AsASCIIString()`` \(1)
``PyUnicode_EncodeLatin1()``              ``PyUnicode_AsLatin1String()`` \(1)
``PyUnicode_EncodeUTF7()``                \(2)
``PyUnicode_EncodeUTF8()``                ``PyUnicode_AsUTF8String()`` \(1)
``PyUnicode_EncodeUTF16()``               ``PyUnicode_AsUTF16String()`` \(3)
``PyUnicode_EncodeUTF32()``               ``PyUnicode_AsUTF32String()`` \(3)
``PyUnicode_EncodeUnicodeEscape()``       ``PyUnicode_AsUnicodeEscapeString()``
``PyUnicode_EncodeRawUnicodeEscape()``    ``PyUnicode_AsRawUnicodeEscapeString()``
``PyUnicode_EncodeCharmap()``             ``PyUnicode_AsCharmapString()`` \(1)
``PyUnicode_TranslateCharmap()``          ``PyUnicode_Translate()``
``PyUnicode_EncodeDecimal()``              \(4)
``PyUnicode_TransformDecimalToASCII()``    \(4)
========================================= ==========================================

Notes:

(1)
   ``const char *errors`` parameter is missing.

(2)
   There is no public alternative API. But user can use generic
   ``PyUnicode_AsEncodedString()`` instead.

(3)
   ``const char *errors, int byteorder`` parameters are missing.

(4)
   There is no direct replacement. But ``Py_UNICODE_TODECIMAL``
   can be used instead. CPython uses
   ``_PyUnicode_TransformDecimalAndSpaceToASCII`` for converting
   from Unicode to numbers instead.


Plan
====

Remove these APIs in Python 3.11. They have been deprecated already.

* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``


Alternative Ideas
=================

Replace ``Py_UNICODE*`` with ``PyObjct*``
-----------------------------------------

As described in the "Alternative APIs" section, some APIs don't have
public alternative APIs accepting ``PyObject *unicode`` input.
And some public alternative APIs have restrictions like missing
``errors`` and ``byteorder`` parameters.

Instead of removing deprecated APIs, we can reuse their names for
alternative public APIs.

Since we have private alternative APIs already, it is just renaming
from private name to public and deprecated names.

============================= ================================
 Rename to                     Rename from
============================= ================================
``PyUnicode_EncodeASCII()``    ``_PyUnicode_AsASCIIString()``
``PyUnicode_EncodeLatin1()``   ``_PyUnicode_AsLatin1String()``
``PyUnicode_EncodeUTF7()``     ``_PyUnicode_EncodeUTF7()``
``PyUnicode_EncodeUTF8()``     ``_PyUnicode_AsUTF8String()``
``PyUnicode_EncodeUTF16()``    ``_PyUnicode_EncodeUTF16()``
``PyUnicode_EncodeUTF32()``    ``_PyUnicode_EncodeUTF32()``
============================= ================================

Pros:

* We have a more consistent API set.

Cons:

* Backward incompatible.
* We have more public APIs to maintain for rare use cases.
* Existing public APIs are enough for most use cases, and
  ``PyUnicode_AsEncodedString()`` can be used in other cases.


Replace ``Py_UNICODE*`` with ``Py_UCS4*``
-----------------------------------------

We can replace ``Py_UNICODE`` with ``Py_UCS4`` and undeprecate
these APIs.

UTF-8, UTF-16, UTF-32 encoders support ``Py_UCS4`` internally.
So ``PyUnicode_EncodeUTF8()``, ``PyUnicode_EncodeUTF16()``, and
``PyUnicode_EncodeUTF32()`` can avoid to create a temporary Unicode
object.


Pros:

* We can avoid creating temporary Unicode object when encoding from
  ``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs.

Cons:

* Backward incompatible.
* We have more public APIs to maintain for rare use cases.
* Other Python implementations that want to support Python/C API need
  to support these APIs too.
* If we change the Unicode internal representation to UTF-8 in the
  future, we need to keep UCS-4 support only for these APIs.


Replace ``Py_UNICODE*`` with ``wchar_t*``
-----------------------------------------

We can replace ``Py_UNICODE`` with ``wchar_t``. Since ``Py_UNICODE``
is typedef of ``wchar_t`` already, this is status quo.

On platforms where ``sizeof(wchar_t) == 4``, we can avoid to create a
temporary Unicode object when encoding from ``wchar_t*`` to bytes
objects using UTF-8, UTF-16, and UTF-32 codec, like the "Replace
``Py_UNICODE*`` with ``Py_UCS4*``" idea.


Pros:

* Backward compatible.
* We can avoid creating temporary Unicode object when encode from
  ``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs
  on platform where ``sizeof(wchar_t) == 4``.

Cons:

* Although Windows is the most major platform that uses ``wchar_t``
  heavily, these APIs need to create a temporary Unicode object
  always because ``sizeof(wchar_t) == 2`` on Windows.
* We have more public APIs to maintain for rare use cases.
* Other Python implementations that want to support Python/C API need
  to support these APIs too.
* If we change the Unicode internal representation to UTF-8 in the
  future, we need to keep UCS-4 support only for these APIs.


Rejected Ideas
==============

Emit runtime warning
--------------------

In addition to existing compiler warning, emitting runtime
``DeprecationWarning`` is suggested.

But these APIs doesn't release GIL for now. Emitting a warning from
such APIs is not safe. See this example.

.. code-block::

   PyObject *u = PyList_GET_ITEM(list, i);  // u is borrowed reference.
   PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
           PyUnicode_GET_SIZE(u), NULL);
   // Assumes u is still living reference.
   PyObject *t = PyTuple_Pack(2, u, b);
   Py_DECREF(b);
   return t;

If we emit Python warning from ``PyUnicode_EncodeUTF8()``, warning
filters and other threads may change the ``list`` and ``u`` can be
a dangling reference after ``PyUnicode_EncodeUTF8()`` returned.


Discussions
===========

* `[python-dev] Plan to remove Py_UNICODE APis except PEP 623
  <https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE/#S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE>`_
* `bpo-41123: Remove Py_UNICODE APIs except PEP 623
  <https://bugs.python.org/issue41123>`_
* `[python-dev] PEP 624: Remove Py_UNICODE encoder APIs
  <https://mail.python.org/archives/list/python-dev@python.org/thread/THXVM7FZVT56B7CPEDIYKJG6VMAYIEK5/#QUGBVLQNBFVNX25AEIL77WSFOHQES6LJ>`_


Objections
----------

* Removing these APIs removes ability to use codec without temporary
  Unicode.

  * Codecs can not encode Unicode buffer directly without temporary
    Unicode object since Python 3.3. All these APIs creates temporary
    Unicode object for now. So removing them doesn't reduce any
    abilities.

* Why not remove decoder APIs too?

  * They are part of stable ABI.

  * ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are
    used very widely. Deprecating them is not worth enough.

  * Decoder APIs can decode from byte buffer directly, without
    creating temporary bytes object. On the other hand, encoder APIs
    can not avoid temporary Unicode object.


References
==========

.. [1] Source package list chosen from top 4000 PyPI packages.
   (https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.txt)

.. [2] pyodbc -- Don't use PyUnicode_Encode API #792
   (https://github.com/mkleehammer/pyodbc/pull/792)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`PEP: 624`
			`Title: Remove Py_UNICODE encoder APIs`
			`Author: Inada Naoki <songofacandy@gmail.com>`
			`Status: Draft`
			`Type: Standards Track`
			`Content-Type: text/x-rst`
			`Created: 06-Jul-2020`
			`Python-Version: 3.11`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00			`Post-History: 08-Jul-2020`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			`Abstract`
			`========`

			This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in Python 3.11:

			* ``PyUnicode_Encode()``
			* ``PyUnicode_EncodeASCII()``
			* ``PyUnicode_EncodeLatin1()``
			* ``PyUnicode_EncodeUTF7()``
			* ``PyUnicode_EncodeUTF8()``
			* ``PyUnicode_EncodeUTF16()``
			* ``PyUnicode_EncodeUTF32()``
			* ``PyUnicode_EncodeUnicodeEscape()``
			* ``PyUnicode_EncodeRawUnicodeEscape()``
			* ``PyUnicode_EncodeCharmap()``
			* ``PyUnicode_TranslateCharmap()``
			* ``PyUnicode_EncodeDecimal()``
			* ``PyUnicode_TransformDecimalToASCII()``

			`.. note::`

			`PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ propose to remove
			Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
			`is not relating to Unicode object. These PEPs are split because they have`
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`different motivations and need different discussions.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			`Motivation`
			`==========`

			`In general, reducing the number of APIs that have been deprecated for`
			`a long time and have few users is a good idea for not only it`
			`improves the maintainability of CPython, but it also helps API users`
			`and other Python implementations.`


			`Rationale`
			`=========`

			`Deprecated since Python 3.3`
			`---------------------------`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			``Py_UNICODE`` and APIs using it has been deprecated since Python 3.3.
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			`Inefficient`
			`-----------`

			All of these APIs are implemented using ``PyUnicode_FromWideChar``.
			`So these APIs are inefficient when user want to encode Unicode`
			`object.`


			`Not used widely`
			`---------------`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`When searching from the top 4000 PyPI packages [1]_, only pyodbc use`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`these APIs.`

			* ``PyUnicode_EncodeUTF8()``
			* ``PyUnicode_EncodeUTF16()``

			`pyodbc uses these APIs to encode Unicode object into bytes object.`
			`So it is easy to fix it. [2]_`


			`Alternative APIs`
			`================`

			There are alternative APIs to accept ``PyObject *unicode`` instead of
			``Py_UNICODE *``. Users can migrate to them.


			`========================================= ==========================================`
			`Deprecated API Alternative APIs`
			`========================================= ==========================================`
			``PyUnicode_Encode()`` ``PyUnicode_AsEncodedString()``
			``PyUnicode_EncodeASCII()`` ``PyUnicode_AsASCIIString()`` \(1)
			``PyUnicode_EncodeLatin1()`` ``PyUnicode_AsLatin1String()`` \(1)
			``PyUnicode_EncodeUTF7()`` \(2)
			``PyUnicode_EncodeUTF8()`` ``PyUnicode_AsUTF8String()`` \(1)
			``PyUnicode_EncodeUTF16()`` ``PyUnicode_AsUTF16String()`` \(3)
			``PyUnicode_EncodeUTF32()`` ``PyUnicode_AsUTF32String()`` \(3)
			``PyUnicode_EncodeUnicodeEscape()`` ``PyUnicode_AsUnicodeEscapeString()``
			``PyUnicode_EncodeRawUnicodeEscape()`` ``PyUnicode_AsRawUnicodeEscapeString()``
			``PyUnicode_EncodeCharmap()`` ``PyUnicode_AsCharmapString()`` \(1)
			``PyUnicode_TranslateCharmap()`` ``PyUnicode_Translate()``
			``PyUnicode_EncodeDecimal()`` \(4)
			``PyUnicode_TransformDecimalToASCII()`` \(4)
			`========================================= ==========================================`

			`Notes:`

			`(1)`
			``const char *errors`` parameter is missing.

			`(2)`
			`There is no public alternative API. But user can use generic`
			``PyUnicode_AsEncodedString()`` instead.

			`(3)`
			``const char *errors, int byteorder`` parameters are missing.

			`(4)`
			There is no direct replacement. But ``Py_UNICODE_TODECIMAL``
			`can be used instead. CPython uses`
			``_PyUnicode_TransformDecimalAndSpaceToASCII`` for converting
			`from Unicode to numbers instead.`


			`Plan`
			`====`

PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			`Remove these APIs in Python 3.11. They have been deprecated already.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			* ``PyUnicode_Encode()``
			* ``PyUnicode_EncodeASCII()``
			* ``PyUnicode_EncodeLatin1()``
			* ``PyUnicode_EncodeUTF7()``
			* ``PyUnicode_EncodeUTF8()``
			* ``PyUnicode_EncodeUTF16()``
			* ``PyUnicode_EncodeUTF32()``
			* ``PyUnicode_EncodeUnicodeEscape()``
			* ``PyUnicode_EncodeRawUnicodeEscape()``
			* ``PyUnicode_EncodeCharmap()``
			* ``PyUnicode_TranslateCharmap()``
			* ``PyUnicode_EncodeDecimal()``
			* ``PyUnicode_TransformDecimalToASCII()``


PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`Alternative Ideas`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`=================`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			Replace ``Py_UNICODE`` with ``PyObjct``
			`-----------------------------------------`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`As described in the "Alternative APIs" section, some APIs don't have`
			public alternative APIs accepting ``PyObject *unicode`` input.
			`And some public alternative APIs have restrictions like missing`
			``errors`` and ``byteorder`` parameters.
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`Instead of removing deprecated APIs, we can reuse their names for`
			`alternative public APIs.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`Since we have private alternative APIs already, it is just renaming`
			`from private name to public and deprecated names.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`============================= ================================`
			`Rename to Rename from`
			`============================= ================================`
			``PyUnicode_EncodeASCII()`` ``_PyUnicode_AsASCIIString()``
			``PyUnicode_EncodeLatin1()`` ``_PyUnicode_AsLatin1String()``
			``PyUnicode_EncodeUTF7()`` ``_PyUnicode_EncodeUTF7()``
			``PyUnicode_EncodeUTF8()`` ``_PyUnicode_AsUTF8String()``
			``PyUnicode_EncodeUTF16()`` ``_PyUnicode_EncodeUTF16()``
			``PyUnicode_EncodeUTF32()`` ``_PyUnicode_EncodeUTF32()``
			`============================= ================================`

			`Pros:`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* We have a more consistent API set.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`Cons:`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* Backward incompatible.`
			`* We have more public APIs to maintain for rare use cases.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`* Existing public APIs are enough for most use cases, and`
			``PyUnicode_AsEncodedString()`` can be used in other cases.


			Replace ``Py_UNICODE`` with ``Py_UCS4``
			`-----------------------------------------`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			We can replace ``Py_UNICODE`` with ``Py_UCS4`` and undeprecate
			`these APIs.`

			UTF-8, UTF-16, UTF-32 encoders support ``Py_UCS4`` internally.
			So ``PyUnicode_EncodeUTF8()``, ``PyUnicode_EncodeUTF16()``, and
			``PyUnicode_EncodeUTF32()`` can avoid to create a temporary Unicode
			`object.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			`Pros:`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* We can avoid creating temporary Unicode object when encoding from`
			``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs.
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`Cons:`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* Backward incompatible.`
			`* We have more public APIs to maintain for rare use cases.`
			`* Other Python implementations that want to support Python/C API need`
			`to support these APIs too.`
			`* If we change the Unicode internal representation to UTF-8 in the`
			`future, we need to keep UCS-4 support only for these APIs.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			Replace ``Py_UNICODE`` with ``wchar_t``
			`-----------------------------------------`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			We can replace ``Py_UNICODE`` with ``wchar_t``. Since ``Py_UNICODE``
			is typedef of ``wchar_t`` already, this is status quo.

			On platforms where ``sizeof(wchar_t) == 4``, we can avoid to create a
			temporary Unicode object when encoding from ``wchar_t*`` to bytes
			`objects using UTF-8, UTF-16, and UTF-32 codec, like the "Replace`
			``Py_UNICODE`` with ``Py_UCS4``" idea.

PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`Pros:`

			`* Backward compatible.`
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* We can avoid creating temporary Unicode object when encode from`
			``Py_UCS4*`` into bytes object with UTF-8, UTF-16, UTF-32 codecs
			on platform where ``sizeof(wchar_t) == 4``.
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`Cons:`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			* Although Windows is the most major platform that uses ``wchar_t``
			`heavily, these APIs need to create a temporary Unicode object`
			always because ``sizeof(wchar_t) == 2`` on Windows.
			`* We have more public APIs to maintain for rare use cases.`
			`* Other Python implementations that want to support Python/C API need`
			`to support these APIs too.`
			`* If we change the Unicode internal representation to UTF-8 in the`
			`future, we need to keep UCS-4 support only for these APIs.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`Rejected Ideas`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`==============`

PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`Emit runtime warning`
			`--------------------`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`In addition to existing compiler warning, emitting runtime`
			``DeprecationWarning`` is suggested.

			`But these APIs doesn't release GIL for now. Emitting a warning from`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00			`such APIs is not safe. See this example.`

			`.. code-block::`

			`PyObject *u = PyList_GET_ITEM(list, i); // u is borrowed reference.`
			`PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),`
			`PyUnicode_GET_SIZE(u), NULL);`
			`// Assumes u is still living reference.`
			`PyObject *t = PyTuple_Pack(2, u, b);`
			`Py_DECREF(b);`
			`return t;`

			If we emit Python warning from ``PyUnicode_EncodeUTF8()``, warning
			filters and other threads may change the ``list`` and ``u`` can be
			a dangling reference after ``PyUnicode_EncodeUTF8()`` returned.

PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			`Discussions`
			`===========`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Add a discussion link in python-dev (#1818) 2021-02-15 03:36:09 -05:00			* `[python-dev] Plan to remove Py_UNICODE APis except PEP 623
PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			<https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE/#S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE>`_
PEP 624: Add a discussion link in python-dev (#1818) 2021-02-15 03:36:09 -05:00			* `bpo-41123: Remove Py_UNICODE APIs except PEP 623
			<https://bugs.python.org/issue41123>`_
			* `[python-dev] PEP 624: Remove Py_UNICODE encoder APIs
			<https://mail.python.org/archives/list/python-dev@python.org/thread/THXVM7FZVT56B7CPEDIYKJG6VMAYIEK5/#QUGBVLQNBFVNX25AEIL77WSFOHQES6LJ>`_
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00

PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			`Objections`
			`----------`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* Removing these APIs removes ability to use codec without temporary`
			`Unicode.`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* Codecs can not encode Unicode buffer directly without temporary`
			`Unicode object since Python 3.3. All these APIs creates temporary`
			`Unicode object for now. So removing them doesn't reduce any`
			`abilities.`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			`* Why not remove decoder APIs too?`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Minor updates (#1666) Remove Python 3.9 from plan, because it is released already. Rewrite objections section. 2020-10-20 04:42:54 -04:00			`* They are part of stable ABI.`
PEP 624: Add rejected idea. (#1508) New section describes why this PEP doesn't deprecate PyUnicode_Decode*** APIs. 2020-07-15 01:31:39 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			* ``PyUnicode_DecodeASCII()`` and ``PyUnicode_DecodeUTF8()`` are
			`used very widely. Deprecating them is not worth enough.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
PEP 624: Update alternative ideas (#1793) Add note about we can avoid creating a temporary Unicode object in deprecated APIs for some codecs. 2021-02-04 03:39:05 -05:00			`* Decoder APIs can decode from byte buffer directly, without`
			`creating temporary bytes object. On the other hand, encoder APIs`
			`can not avoid temporary Unicode object.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00

			`References`
			`==========`

			`.. [1] Source package list chosen from top 4000 PyPI packages.`
			`(https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.txt)`

			`.. [2] pyodbc -- Don't use PyUnicode_Encode API #792`
			`(https://github.com/mkleehammer/pyodbc/pull/792)`


			`Copyright`
			`=========`

PEP 624: Add a discussion link in python-dev (#1818) 2021-02-15 03:36:09 -05:00			`This document is placed in the public domain or under the`
			`CC0-1.0-Universal license, whichever is more permissive.`
PEP 624: Remove Py_UNICODE encoder APIs (#1497) Co-authored-by: Victor Stinner <vstinner@python.org> 2020-07-07 11:08:47 -04:00
			`..`
			`Local Variables:`
			`mode: indented-text`
			`indent-tabs-mode: nil`
			`sentence-end-double-space: t`
			`fill-column: 70`
			`coding: utf-8`
			`End:`