diff --git a/pep-0623.rst b/pep-0623.rst new file mode 100644 index 000000000..1eadb7b2f --- /dev/null +++ b/pep-0623.rst @@ -0,0 +1,172 @@ +PEP: 623 +Title: Remove wstr from Unicode +Author: Inada Naoki +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 25-Jun-2020 +Python-Version: 3.10 + + +Abstract +======== + +PEP 393 deprecated some unicode APIs, and introduced ``wchar_t *wstr``, +and ``Py_ssize_t wstr_length`` in the Unicode structure to support +these deprecated APIs. [1]_ + +This PEP is planning removal of ``wstr``, and ``wstr_length`` with +deprecated APIs using these members by Python 3.12. + +Deprecated APIs which doesn't use the members are out of scope because +they can be removed independently. + + +Motivation +========== + +Memory usage +------------ + +``str`` is one of the most used types in Python. Even most simple ASCII +strings have a ``wstr`` member. It consumes 8 bytes on 64bit systems. + + +Runtime overhead +---------------- + +To support legacy Unicode object created by +``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has +``PyUnicode_READY()`` check. + +When we drop support of legacy unicode object, We can reduce this +overhead too. + + +Simplicity +---------- + +Support of legacy Unicode object makes Unicode implementation complex. +Until we drop legacy Unicode object, it is very hard to try other +Unicode implementation like UTF-8 based implementation in PyPy. + + +Rationale +========= + +Python 4.0 is not scheduled yet +------------------------------- + +PEP 393 introduced efficient internal representation of Unicode and +removed border between "narrow" and "wide" build of Python. + +PEP 393 was implemented in Python 3.3 which is released in 2012. Old +APIs were deprecated since then, and the removal was scheduled in +Python 4.0. + +Python 4.0 was expected as next version of Python 3.9 when PEP 393 +was accepted. But the next version of Python 3.9 is Python 3.10, +not 4.0. This is why this PEP schedule the removal plan again. + + +Python 2 reached EOL +-------------------- + +Since Python 2 didn't have PEP 393 Unicode implementation, legacy +APIs might help C extensiom modules supporting both of Python 2 and 3. + +But Python 2 reached the EOL in 2020. We can remove legacy APIs kept +for compatibility with Python 2. + + +Plan +==== + +Python 3.9 (current) +-------------------- + +These macros and functions are marked as deprecated, using +``Py_DEPRECATED`` macro. + +* ``Py_UNICODE_WSTR_LENGTH()`` +* ``PyUnicode_GET_SIZE()`` +* ``PyUnicode_GetSize()`` +* ``PyUnicode_GET_DATA_SIZE()`` +* ``PyUnicode_AS_UNICODE()`` +* ``PyUnicode_AS_DATA()`` +* ``PyUnicode_AsUnicode()`` +* ``_PyUnicode_AsUnicode()`` +* ``PyUnicode_AsUnicodeAndSize()`` +* ``PyUnicode_FromUnicode()`` + + +Python 3.10 +----------- + +* Following macros, enum members will be marked as deprecated. + ``Py_DEPRECATED(3.10)`` macro will be used as possible. But they + will be deprecated only in comment and document if the macro can + not be used easily. + + * ``PyUnicode_WCHAR_KIND`` + * ``PyUnicode_READY()`` + * ``PyUnicode_IS_READY()`` + * ``PyUnicode_IS_COMPACT()`` + +* ``PyUnicode_FromUnicode(NULL, size)`` and + ``PyUnicode_FromStringAndSize(NULL, size)`` will emit + ``DeprecationWarning`` when ``size > 0``. + +* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit + ``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used. + + +Python 3.12 +----------- + +* Following members will be removed from the Unicode strucutres: + + * ``wstr`` + * ``wstr_length`` + * ``state.compact`` + * ``state.ready`` + +* The ``PyUnicodeObject`` struct will be removed. + +* Following macros and functions, and enum members will be removed: + + * ``Py_UNICODE_WSTR_LENGTH()`` + * ``PyUnicode_GET_SIZE()`` + * ``PyUnicode_GetSize()`` + * ``PyUnicode_GET_DATA_SIZE()`` + * ``PyUnicode_AS_UNICODE()`` + * ``PyUnicode_AS_DATA()`` + * ``PyUnicode_AsUnicode()`` + * ``_PyUnicode_AsUnicode()`` + * ``PyUnicode_AsUnicodeAndSize()`` + * ``PyUnicode_FromUnicode()`` + * ``PyUnicode_WCHAR_KIND`` + * ``PyUnicode_READY()`` + * ``PyUnicode_IS_READY()`` + * ``PyUnicode_IS_COMPACT()`` + +* ``PyUnicode_FromStringAndSize(NULL, size))`` will raise + ``RuntimeError`` when ``size > 0``. + +* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise + ``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used, + as other unsupported format character. + + +References +========== +A collection of URLs used as references through the PEP. + +.. [1] PEP 393 -- Flexible String Representation + (https://www.python.org/dev/peps/pep-0393/) + + +Copyright +========= + +This document has been placed in the public domain.