Rephrase the PEP 623 (#1492)

* Rephrase the PEP 623
* Add Discussion section and bpo links
* Update pep-0623.rst

Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
2020-07-04 23:12:10 +02:00 · 95ac2ff27f
parent 058b126e7b
commit 95ac2ff27f
1 changed files with 62 additions and 42 deletions
--- a/pep-0623.rst
+++ b/pep-0623.rst
@@ -30,24 +30,25 @@ Memory usage
 ------------

 ``str`` is one of the most used types in Python. Even most simple ASCII
-strings have a ``wstr`` member. It consumes 8 bytes on 64bit systems.
+strings have a ``wstr`` member. It consumes 8 bytes per string on 64-bit
+systems.


 Runtime overhead
 ----------------

-To support legacy Unicode object created by
-``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
-``PyUnicode_READY()`` check.
+To support legacy Unicode object, many Unicode APIs must call
+``PyUnicode_READY()``.

-When we drop support of legacy unicode object, We can reduce this
-overhead too.
+We can remove this overhead too by dropping support of legacy Unicode
+object.


 Simplicity
 ----------

-Support of legacy Unicode object makes Unicode implementation complex.
+Supporting legacy Unicode object makes the Unicode implementation more
+complex.
 Until we drop legacy Unicode object, it is very hard to try other
 Unicode implementation like UTF-8 based implementation in PyPy.

@@ -83,8 +84,8 @@ for compatibility with Python 2.
 Plan
 ====

-Python 3.9 (current)
--------------------
+Python 3.9
+----------

 These macros and functions are marked as deprecated, using
 ``Py_DEPRECATED`` macro.
@@ -104,64 +105,83 @@ These macros and functions are marked as deprecated, using
 Python 3.10
 -----------

-* Following macros, enum members will be marked as deprecated.
-  ``Py_DEPRECATED(3.10)`` macro will be used as possible. But they
-  will be deprecated only in comment and document if the macro can
+* Following macros, enum members are marked as deprecated.
+  ``Py_DEPRECATED(3.10)`` macro are used as possible. But they
+  are deprecated only in comment and document if the macro can
  not be used easily.

-   * ``PyUnicode_WCHAR_KIND``
-   * ``PyUnicode_READY()``
-   * ``PyUnicode_IS_READY()``
-   * ``PyUnicode_IS_COMPACT()``
+  * ``PyUnicode_WCHAR_KIND``
+  * ``PyUnicode_READY()``
+  * ``PyUnicode_IS_READY()``
+  * ``PyUnicode_IS_COMPACT()``

 * ``PyUnicode_FromUnicode(NULL, size)`` and
-  ``PyUnicode_FromStringAndSize(NULL, size)`` will emit
+  ``PyUnicode_FromStringAndSize(NULL, size)`` emit
  ``DeprecationWarning`` when ``size > 0``.

-* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` emit
  ``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used.


 Python 3.12
 -----------

-* Following members will be removed from the Unicode strucutres:
+* Following members are removed from the Unicode structures:

-   * ``wstr``
-   * ``wstr_length``
-   * ``state.compact``
-   * ``state.ready``
+  * ``wstr``
+  * ``wstr_length``
+  * ``state.compact``
+  * ``state.ready``

-* The ``PyUnicodeObject`` struct will be removed.
+* The ``PyUnicodeObject`` structure is removed.

-* Following macros and functions, and enum members will be removed:
+* Following macros and functions, and enum members are removed:

-   * ``Py_UNICODE_WSTR_LENGTH()``
-   * ``PyUnicode_GET_SIZE()``
-   * ``PyUnicode_GetSize()``
-   * ``PyUnicode_GET_DATA_SIZE()``
-   * ``PyUnicode_AS_UNICODE()``
-   * ``PyUnicode_AS_DATA()``
-   * ``PyUnicode_AsUnicode()``
-   * ``_PyUnicode_AsUnicode()``
-   * ``PyUnicode_AsUnicodeAndSize()``
-   * ``PyUnicode_FromUnicode()``
-   * ``PyUnicode_WCHAR_KIND``
-   * ``PyUnicode_READY()``
-   * ``PyUnicode_IS_READY()``
-   * ``PyUnicode_IS_COMPACT()``
+  * ``Py_UNICODE_WSTR_LENGTH()``
+  * ``PyUnicode_GET_SIZE()``
+  * ``PyUnicode_GetSize()``
+  * ``PyUnicode_GET_DATA_SIZE()``
+  * ``PyUnicode_AS_UNICODE()``
+  * ``PyUnicode_AS_DATA()``
+  * ``PyUnicode_AsUnicode()``
+  * ``_PyUnicode_AsUnicode()``
+  * ``PyUnicode_AsUnicodeAndSize()``
+  * ``PyUnicode_FromUnicode()``
+  * ``PyUnicode_WCHAR_KIND``
+  * ``PyUnicode_READY()``
+  * ``PyUnicode_IS_READY()``
+  * ``PyUnicode_IS_COMPACT()``

-* ``PyUnicode_FromStringAndSize(NULL, size))`` will raise
+* ``PyUnicode_FromStringAndSize(NULL, size)`` raises
  ``RuntimeError`` when ``size > 0``.

-* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` raise
  ``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
  as other unsupported format character.


+Discussion
+==========
+
+* `Draft PEP: Remove wstr from Unicode
+  <https://mail.python.org/archives/list/python-dev@python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/#BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH>`_
+* `When can we remove wchar_t* cache from string?
+  <https://mail.python.org/archives/list/python-dev@python.org/thread/7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR/#7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR>`_
+* `PEP 623: Remove wstr from Unicode object #1462
+  <https://github.com/python/peps/pull/1462>`_
+
+
 References
 ==========
-A collection of URLs used as references through the PEP.
+
+* `bpo-38604: Schedule Py_UNICODE API removal
+  <https://bugs.python.org/issue38604>`_
+* `bpo-36346: Prepare for removing the legacy Unicode C API
+  <https://bugs.python.org/issue36346>`_
+* `bpo-30863: Rewrite PyUnicode_AsWideChar() and
+  PyUnicode_AsWideCharString() <https://bugs.python.org/issue30863>`_:
+  They no longer cache the ``wchar_t*`` representation of string
+  objects.

 .. [1] PEP 393 -- Flexible String Representation
       (https://www.python.org/dev/peps/pep-0393/)