Rephrase the PEP 623 (#1492)
* Rephrase the PEP 623
* Add Discussion section and bpo links
* Update pep-0623.rst

Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
parent 058b126e7b
commit 95ac2ff27f

pep-0623.rst | 104
@@ -30,24 +30,25 @@ Memory usage
 ------------
 
 ``str`` is one of the most used types in Python. Even most simple ASCII
-strings have a ``wstr`` member. It consumes 8 bytes on 64bit systems.
+strings have a ``wstr`` member. It consumes 8 bytes per string on 64-bit
+systems.
 
 
 Runtime overhead
 ----------------
 
-To support legacy Unicode object created by
-``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
-``PyUnicode_READY()`` check.
+To support legacy Unicode object, many Unicode APIs must call
+``PyUnicode_READY()``.
 
-When we drop support of legacy unicode object, We can reduce this
-overhead too.
+We can remove this overhead too by dropping support of legacy Unicode
+object.
 
 
 Simplicity
 ----------
 
-Support of legacy Unicode object makes Unicode implementation complex.
+Supporting legacy Unicode object makes the Unicode implementation more
+complex.
 Until we drop legacy Unicode object, it is very hard to try other
 Unicode implementation like UTF-8 based implementation in PyPy.
 
@@ -83,8 +84,8 @@ for compatibility with Python 2.
 Plan
 ====
 
-Python 3.9 (current)
---------------------
+Python 3.9
+----------
 
 These macros and functions are marked as deprecated, using
 ``Py_DEPRECATED`` macro.
@@ -104,64 +105,83 @@ These macros and functions are marked as deprecated, using
 Python 3.10
 -----------
 
-* Following macros, enum members will be marked as deprecated.
-``Py_DEPRECATED(3.10)`` macro will be used as possible. But they
-will be deprecated only in comment and document if the macro can
+* Following macros, enum members are marked as deprecated.
+``Py_DEPRECATED(3.10)`` macro are used as possible. But they
+are deprecated only in comment and document if the macro can
 not be used easily.
 
-* ``PyUnicode_WCHAR_KIND``
-* ``PyUnicode_READY()``
-* ``PyUnicode_IS_READY()``
-* ``PyUnicode_IS_COMPACT()``
+* ``PyUnicode_WCHAR_KIND``
+* ``PyUnicode_READY()``
+* ``PyUnicode_IS_READY()``
+* ``PyUnicode_IS_COMPACT()``
 
 * ``PyUnicode_FromUnicode(NULL, size)`` and
-``PyUnicode_FromStringAndSize(NULL, size)`` will emit
+``PyUnicode_FromStringAndSize(NULL, size)`` emit
 ``DeprecationWarning`` when ``size > 0``.
 
-* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` emit
 ``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used.
 
 
 Python 3.12
 -----------
 
-* Following members will be removed from the Unicode strucutres:
+* Following members are removed from the Unicode structures:
 
-* ``wstr``
-* ``wstr_length``
-* ``state.compact``
-* ``state.ready``
+* ``wstr``
+* ``wstr_length``
+* ``state.compact``
+* ``state.ready``
 
-* The ``PyUnicodeObject`` struct will be removed.
+* The ``PyUnicodeObject`` structure is removed.
 
-* Following macros and functions, and enum members will be removed:
+* Following macros and functions, and enum members are removed:
 
-* ``Py_UNICODE_WSTR_LENGTH()``
-* ``PyUnicode_GET_SIZE()``
-* ``PyUnicode_GetSize()``
-* ``PyUnicode_GET_DATA_SIZE()``
-* ``PyUnicode_AS_UNICODE()``
-* ``PyUnicode_AS_DATA()``
-* ``PyUnicode_AsUnicode()``
-* ``_PyUnicode_AsUnicode()``
-* ``PyUnicode_AsUnicodeAndSize()``
-* ``PyUnicode_FromUnicode()``
-* ``PyUnicode_WCHAR_KIND``
-* ``PyUnicode_READY()``
-* ``PyUnicode_IS_READY()``
-* ``PyUnicode_IS_COMPACT()``
+* ``Py_UNICODE_WSTR_LENGTH()``
+* ``PyUnicode_GET_SIZE()``
+* ``PyUnicode_GetSize()``
+* ``PyUnicode_GET_DATA_SIZE()``
+* ``PyUnicode_AS_UNICODE()``
+* ``PyUnicode_AS_DATA()``
+* ``PyUnicode_AsUnicode()``
+* ``_PyUnicode_AsUnicode()``
+* ``PyUnicode_AsUnicodeAndSize()``
+* ``PyUnicode_FromUnicode()``
+* ``PyUnicode_WCHAR_KIND``
+* ``PyUnicode_READY()``
+* ``PyUnicode_IS_READY()``
+* ``PyUnicode_IS_COMPACT()``
 
-* ``PyUnicode_FromStringAndSize(NULL, size))`` will raise
+* ``PyUnicode_FromStringAndSize(NULL, size))`` raises
 ``RuntimeError`` when ``size > 0``.
 
-* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
+* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` raise
 ``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
 as other unsupported format character.
 
 
+Discussion
+==========
+
+* `Draft PEP: Remove wstr from Unicode
+<https://mail.python.org/archives/list/python-dev@python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/#BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH>`_
+* `When can we remove wchar_t* cache from string?
+<https://mail.python.org/archives/list/python-dev@python.org/thread/7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR/#7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR>`_
+* `PEP 623: Remove wstr from Unicode object #1462
+<https://github.com/python/peps/pull/1462>`_
+
+
 References
 ==========
-A collection of URLs used as references through the PEP.
 
+* `bpo-38604: Schedule Py_UNICODE API removal
+<https://bugs.python.org/issue38604>`_
+* `bpo-36346: Prepare for removing the legacy Unicode C API
+<https://bugs.python.org/issue36346>`_
+* `bpo-30863: Rewrite PyUnicode_AsWideChar() and
+PyUnicode_AsWideCharString() <https://bugs.python.org/issue30863>`_:
+They no longer cache the ``wchar_t*`` representation of string
+objects.
+
 .. [1] PEP 393 -- Flexible String Representation
 (https://www.python.org/dev/peps/pep-0393/)
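
Illustration (not part of the commit): the bpo-30863 entry says that ``PyUnicode_AsWideCharString()`` no longer caches the ``wchar_t*`` representation on the string object. The sketch below shows what that means for a caller: each call returns a fresh buffer that the caller owns and must release with ``PyMem_Free()``. It assumes CPython with ``ctypes.pythonapi`` available. The helper name ``copy_as_wide`` is hypothetical, and driving the C API through ``ctypes`` is only a way to keep the example runnable from Python, not a recommended migration path.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API
# and holds the GIL for the duration of each call.
_as_wide = ctypes.pythonapi.PyUnicode_AsWideCharString
_as_wide.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_wide.restype = ctypes.c_void_p

_mem_free = ctypes.pythonapi.PyMem_Free
_mem_free.argtypes = [ctypes.c_void_p]
_mem_free.restype = None


def copy_as_wide(text):
    """Round-trip `text` through a caller-owned wchar_t buffer.

    Hypothetical helper for illustration only. The buffer is allocated by
    PyUnicode_AsWideCharString() and is not cached on the str object, so
    it has to be freed explicitly.
    """
    size = ctypes.c_ssize_t()
    address = _as_wide(text, ctypes.byref(size))
    try:
        # Read back exactly `size` wchar_t units from the temporary buffer.
        return ctypes.wstring_at(address, size.value)
    finally:
        _mem_free(address)


if __name__ == "__main__":
    print(copy_as_wide("wstr"))
```

Extension code that previously relied on ``PyUnicode_AsUnicode()`` returning a pointer owned by the string would follow the same allocate, use, free pattern in C.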