Rephrase the PEP 623 (#1492)

* Rephrase the PEP 623

* Add Discussion section and bpo links

* Update pep-0623.rst

Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>

Victor Stinner 2020-07-04 23:12:10 +02:00 committed by GitHub
parent 058b126e7b
commit 95ac2ff27f
1 changed files with 62 additions and 42 deletions

@@ -30,24 +30,25 @@ Memory usage
------------
``str`` is one of the most used types in Python. Even most simple ASCII
strings have a ``wstr`` member. It consumes 8 bytes on 64bit systems.
strings have a ``wstr`` member. It consumes 8 bytes per string on 64-bit
systems.
Runtime overhead
----------------
To support legacy Unicode object created by
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
``PyUnicode_READY()`` check.
To support legacy Unicode object, many Unicode APIs must call
``PyUnicode_READY()``.
When we drop support of legacy unicode object, We can reduce this
overhead too.
We can remove this overhead too by dropping support of legacy Unicode
object.
Simplicity
----------
Support of legacy Unicode object makes Unicode implementation complex.
Supporting legacy Unicode object makes the Unicode implementation more
complex.
Until we drop legacy Unicode object, it is very hard to try other
Unicode implementation like UTF-8 based implementation in PyPy.
@@ -83,8 +84,8 @@ for compatibility with Python 2.
Plan
====
Python 3.9 (current)
--------------------
Python 3.9
----------
These macros and functions are marked as deprecated, using
``Py_DEPRECATED`` macro.
@@ -104,64 +105,83 @@ These macros and functions are marked as deprecated, using
Python 3.10
-----------
* Following macros, enum members will be marked as deprecated.
``Py_DEPRECATED(3.10)`` macro will be used as possible. But they
will be deprecated only in comment and document if the macro can
* Following macros, enum members are marked as deprecated.
``Py_DEPRECATED(3.10)`` macro are used as possible. But they
are deprecated only in comment and document if the macro can
not be used easily.
* ``PyUnicode_WCHAR_KIND``
* ``PyUnicode_READY()``
* ``PyUnicode_IS_READY()``
* ``PyUnicode_IS_COMPACT()``
* ``PyUnicode_WCHAR_KIND``
* ``PyUnicode_READY()``
* ``PyUnicode_IS_READY()``
* ``PyUnicode_IS_COMPACT()``
* ``PyUnicode_FromUnicode(NULL, size)`` and
``PyUnicode_FromStringAndSize(NULL, size)`` will emit
``PyUnicode_FromStringAndSize(NULL, size)`` emit
``DeprecationWarning`` when ``size > 0``.
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` emit
``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used.
Python 3.12
-----------
* Following members will be removed from the Unicode strucutres:
* Following members are removed from the Unicode structures:
* ``wstr``
* ``wstr_length``
* ``state.compact``
* ``state.ready``
* ``wstr``
* ``wstr_length``
* ``state.compact``
* ``state.ready``
* The ``PyUnicodeObject`` struct will be removed.
* The ``PyUnicodeObject`` structure is removed.
* Following macros and functions, and enum members will be removed:
* Following macros and functions, and enum members are removed:
* ``Py_UNICODE_WSTR_LENGTH()``
* ``PyUnicode_GET_SIZE()``
* ``PyUnicode_GetSize()``
* ``PyUnicode_GET_DATA_SIZE()``
* ``PyUnicode_AS_UNICODE()``
* ``PyUnicode_AS_DATA()``
* ``PyUnicode_AsUnicode()``
* ``_PyUnicode_AsUnicode()``
* ``PyUnicode_AsUnicodeAndSize()``
* ``PyUnicode_FromUnicode()``
* ``PyUnicode_WCHAR_KIND``
* ``PyUnicode_READY()``
* ``PyUnicode_IS_READY()``
* ``PyUnicode_IS_COMPACT()``
* ``Py_UNICODE_WSTR_LENGTH()``
* ``PyUnicode_GET_SIZE()``
* ``PyUnicode_GetSize()``
* ``PyUnicode_GET_DATA_SIZE()``
* ``PyUnicode_AS_UNICODE()``
* ``PyUnicode_AS_DATA()``
* ``PyUnicode_AsUnicode()``
* ``_PyUnicode_AsUnicode()``
* ``PyUnicode_AsUnicodeAndSize()``
* ``PyUnicode_FromUnicode()``
* ``PyUnicode_WCHAR_KIND``
* ``PyUnicode_READY()``
* ``PyUnicode_IS_READY()``
* ``PyUnicode_IS_COMPACT()``
* ``PyUnicode_FromStringAndSize(NULL, size))`` will raise
* ``PyUnicode_FromStringAndSize(NULL, size))`` raises
``RuntimeError`` when ``size > 0``.
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` raise
``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
as other unsupported format character.
Discussion
==========
* `Draft PEP: Remove wstr from Unicode
<https://mail.python.org/archives/list/python-dev@python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/#BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH>`_
* `When can we remove wchar_t* cache from string?
<https://mail.python.org/archives/list/python-dev@python.org/thread/7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR/#7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR>`_
* `PEP 623: Remove wstr from Unicode object #1462
<https://github.com/python/peps/pull/1462>`_
References
==========
A collection of URLs used as references through the PEP.
* `bpo-38604: Schedule Py_UNICODE API removal
<https://bugs.python.org/issue38604>`_
* `bpo-36346: Prepare for removing the legacy Unicode C API
<https://bugs.python.org/issue36346>`_
* `bpo-30863: Rewrite PyUnicode_AsWideChar() and
PyUnicode_AsWideCharString() <https://bugs.python.org/issue30863>`_:
They no longer cache the ``wchar_t*`` representation of string
objects.
.. [1] PEP 393 -- Flexible String Representation
(https://www.python.org/dev/peps/pep-0393/)
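
The "Memory usage" text in the diff above says that the ``wstr`` member costs 8 bytes per string on 64-bit systems. As a rough illustration only, the sketch below measures the fixed per-object overhead of ASCII strings with ``sys.getsizeof`` and estimates the total saving for a given number of strings. It assumes CPython; the helper names ``fixed_overhead`` and ``estimated_saving`` are hypothetical and not part of the PEP, and the overhead printed depends on the interpreter version and build.

```python
import sys


def fixed_overhead(text: str) -> int:
    """Return the bytes a str object uses beyond one byte per character.

    Only meaningful for pure-ASCII strings, which CPython stores with
    one byte per character plus a fixed-size header.
    """
    if not text.isascii():
        raise ValueError("expected an ASCII-only string")
    return sys.getsizeof(text) - len(text)


def estimated_saving(string_count: int, bytes_per_string: int = 8) -> int:
    """Estimate bytes saved by dropping one fixed-size field per string.

    The default of 8 bytes is the figure the PEP text gives for the
    ``wstr`` pointer on 64-bit systems.
    """
    return string_count * bytes_per_string


if __name__ == "__main__":
    # The overhead is the same for every ASCII string on a given build.
    for sample in ("", "a", "hello world"):
        print(repr(sample), fixed_overhead(sample))
    # Estimated saving for one million live strings, in bytes.
    print(estimated_saving(1_000_000))
```

Comparing the printed overhead between an interpreter that still carries ``wstr`` and one that has removed it is one way to observe the difference the PEP describes.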