PEP 623: Remove wstr from Unicode object (#1462)

2020-06-25 20:16:25 +09:00 · 2020-06-25 20:16:25 +09:00 · 9ea076fbfb
parent fa90054ead
commit 9ea076fbfb
1 changed files with 172 additions and 0 deletions
--- a/pep-0623.rst
+++ b/pep-0623.rst
@@ -0,0 +1,172 @@
+PEP: 623
+Title: Remove wstr from Unicode
+Author: Inada Naoki <songofacandy@gmail.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 25-Jun-2020
+Python-Version: 3.10
+
+
+Abstract
+========
+
+PEP 393 deprecated some Unicode APIs and introduced ``wchar_t *wstr``
+and ``Py_ssize_t wstr_length`` in the Unicode structure to support
+these deprecated APIs. [1]_
+
+This PEP plans the removal of ``wstr`` and ``wstr_length``, together
+with the deprecated APIs that use these members, by Python 3.12.
+
+Deprecated APIs which don't use the members are out of scope because
+they can be removed independently.
+
+
+Motivation
+==========
+
+Memory usage
+------------
+
+``str`` is one of the most used types in Python. Even the simplest ASCII
+strings have a ``wstr`` member. It consumes 8 bytes on 64-bit systems.
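As a rough illustration of where those bytes come from, the sketch below compares two simplified structs that differ only in the ``wstr`` pointer. The struct and function names are hypothetical stand-ins, and the layout only loosely follows the PEP 393 object header; it is not the real CPython definition.

```c
#include <stddef.h>   /* size_t, ptrdiff_t, wchar_t */

/* Hypothetical stand-in for the ASCII string header WITH wstr.
   Field names loosely follow PEP 393; this is not CPython's struct. */
typedef struct {
    size_t refcnt;            /* stands in for ob_refcnt */
    void *type;               /* stands in for ob_type */
    ptrdiff_t length;
    ptrdiff_t hash;
    struct {
        unsigned int interned : 2;
        unsigned int kind : 3;
        unsigned int compact : 1;
        unsigned int ascii : 1;
        unsigned int ready : 1;
    } state;
    wchar_t *wstr;            /* the member this PEP proposes to remove */
} sketch_ascii_with_wstr;

/* The same hypothetical header with the wstr pointer dropped. */
typedef struct {
    size_t refcnt;
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
    struct {
        unsigned int interned : 2;
        unsigned int kind : 3;
        unsigned int compact : 1;
        unsigned int ascii : 1;
        unsigned int ready : 1;
    } state;
} sketch_ascii_without_wstr;

/* Per-object cost of carrying the pointer, in bytes. */
static size_t sketch_wstr_cost(void)
{
    return sizeof(sketch_ascii_with_wstr) - sizeof(sketch_ascii_without_wstr);
}
```

On common ABIs ``sketch_wstr_cost()`` should equal the size of one pointer, which is the 8 bytes quoted above for 64-bit systems.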
+
+
+Runtime overhead
+----------------
+
+To support legacy Unicode objects created by
+``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
+``PyUnicode_READY()`` check.
+
+When we drop support for legacy Unicode objects, we can reduce this
+overhead too.
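The shape of this overhead can be sketched with a toy model. Everything below (the ``sketch_str`` struct and both functions) is hypothetical and heavily simplified; it only mimics the idea of a "ready" check and does not reproduce CPython's actual code.

```c
#include <stddef.h>   /* size_t, wchar_t */
#include <stdlib.h>   /* malloc */

/* Hypothetical, heavily simplified string object; not CPython's layout. */
typedef struct {
    size_t length;
    unsigned char *data;   /* canonical storage, valid only once ready */
    wchar_t *wstr;         /* legacy buffer filled in by the caller */
    int ready;             /* 0 while only the legacy buffer is valid */
} sketch_str;

/* Convert the legacy buffer to canonical storage on first use. */
static int sketch_ready(sketch_str *s)
{
    if (s->ready)
        return 0;
    s->data = malloc(s->length ? s->length : 1);
    if (s->data == NULL)
        return -1;
    for (size_t i = 0; i < s->length; i++)
        s->data[i] = (unsigned char)s->wstr[i];   /* assumes Latin-1 range only */
    s->ready = 1;
    return 0;
}

/* Every accessor pays for the check, even for strings that were
   never created through the legacy path. */
static int sketch_read_char(sketch_str *s, size_t i)
{
    if (sketch_ready(s) < 0 || i >= s->length)
        return -1;
    return s->data[i];
}
```

Once no legacy objects can exist, an accessor in this model could skip the ``sketch_ready()`` call entirely, which is the kind of saving the section describes.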
+
+
+Simplicity
+----------
+
+Support for legacy Unicode objects makes the Unicode implementation
+complex. Until we drop legacy Unicode objects, it is very hard to try
+other Unicode implementations, such as the UTF-8 based implementation
+in PyPy.
+
+
+Rationale
+=========
+
+Python 4.0 is not scheduled yet
+-------------------------------
+
+PEP 393 introduced an efficient internal representation of Unicode and
+removed the border between "narrow" and "wide" builds of Python.
+
+PEP 393 was implemented in Python 3.3, which was released in 2012. The
+old APIs have been deprecated since then, and their removal was
+scheduled for Python 4.0.
+
+When PEP 393 was accepted, Python 4.0 was expected to be the version
+following Python 3.9. But the version following Python 3.9 is Python
+3.10, not 4.0. This is why this PEP schedules the removal again.
+
+
+Python 2 reached EOL
+--------------------
+
+Since Python 2 didn't have the PEP 393 Unicode implementation, legacy
+APIs might help C extension modules support both Python 2 and 3.
+
+But Python 2 reached EOL in 2020. We can remove legacy APIs kept
+for compatibility with Python 2.
+
+
+Plan
+====
+
+Python 3.9 (current)
+--------------------
+
+These macros and functions are marked as deprecated, using the
+``Py_DEPRECATED`` macro.
+
+* ``Py_UNICODE_WSTR_LENGTH()``
+* ``PyUnicode_GET_SIZE()``
+* ``PyUnicode_GetSize()``
+* ``PyUnicode_GET_DATA_SIZE()``
+* ``PyUnicode_AS_UNICODE()``
+* ``PyUnicode_AS_DATA()``
+* ``PyUnicode_AsUnicode()``
+* ``_PyUnicode_AsUnicode()``
+* ``PyUnicode_AsUnicodeAndSize()``
+* ``PyUnicode_FromUnicode()``
+
+
+Python 3.10
+-----------
+
+* The following macros and enum members will be marked as deprecated.
+  The ``Py_DEPRECATED(3.10)`` macro will be used where possible, but
+  they will be deprecated only in comments and documentation if the
+  macro cannot be used easily.
+
+   * ``PyUnicode_WCHAR_KIND``
+   * ``PyUnicode_READY()``
+   * ``PyUnicode_IS_READY()``
+   * ``PyUnicode_IS_COMPACT()``
+
+* ``PyUnicode_FromUnicode(NULL, size)`` and
+  ``PyUnicode_FromStringAndSize(NULL, size)`` will emit
+  ``DeprecationWarning`` when ``size > 0``.
+
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
+  ``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used.
+
+
+Python 3.12
+-----------
+
+* The following members will be removed from the Unicode structures:
+
+   * ``wstr``
+   * ``wstr_length``
+   * ``state.compact``
+   * ``state.ready``
+
+* The ``PyUnicodeObject`` struct will be removed.
+
+* The following macros, functions, and enum members will be removed:
+
+   * ``Py_UNICODE_WSTR_LENGTH()``
+   * ``PyUnicode_GET_SIZE()``
+   * ``PyUnicode_GetSize()``
+   * ``PyUnicode_GET_DATA_SIZE()``
+   * ``PyUnicode_AS_UNICODE()``
+   * ``PyUnicode_AS_DATA()``
+   * ``PyUnicode_AsUnicode()``
+   * ``_PyUnicode_AsUnicode()``
+   * ``PyUnicode_AsUnicodeAndSize()``
+   * ``PyUnicode_FromUnicode()``
+   * ``PyUnicode_WCHAR_KIND``
+   * ``PyUnicode_READY()``
+   * ``PyUnicode_IS_READY()``
+   * ``PyUnicode_IS_COMPACT()``
+
+* ``PyUnicode_FromStringAndSize(NULL, size)`` will raise
+  ``RuntimeError`` when ``size > 0``.
+
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
+  ``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
+  as with any other unsupported format character.
+
+
+References
+==========
+A collection of URLs used as references throughout this PEP.
+
+.. [1] PEP 393 -- Flexible String Representation
+       (https://www.python.org/dev/peps/pep-0393/)
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.