PEP 623: Remove wstr from Unicode object (#1462)
Commit 9ea076fbfb (parent fa90054ead)

PEP: 623
Title: Remove wstr from Unicode
Author: Inada Naoki <songofacandy@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 25-Jun-2020
Python-Version: 3.10


Abstract
========

PEP 393 deprecated some Unicode APIs and introduced ``wchar_t *wstr``
and ``Py_ssize_t wstr_length`` in the Unicode structure to support
these deprecated APIs. [1]_

This PEP plans the removal of ``wstr`` and ``wstr_length``, together with
the deprecated APIs that use these members, by Python 3.12.

Deprecated APIs that do not use these members are out of scope because
they can be removed independently.


Motivation
==========

Memory usage
------------

``str`` is one of the most used types in Python. Even the simplest ASCII
strings have a ``wstr`` member. It consumes 8 bytes on 64-bit systems.

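The 8-byte figure is the size of one pointer (``wstr`` is a ``wchar_t *``).
The following is a minimal sketch for inspecting this on a running
interpreter. It assumes CPython, where ``sys.getsizeof()`` reports the
object header plus one byte per ASCII character plus a trailing NUL. The
helper name ``ascii_header_overhead`` is hypothetical and used only for
illustration.

```python
import ctypes
import sys


def ascii_header_overhead() -> int:
    """Bytes reported for an empty str, excluding its trailing NUL byte.

    CPython-specific: this relies on how sys.getsizeof() accounts for
    compact ASCII strings and is not guaranteed by the language.
    """
    return sys.getsizeof("") - 1


# Size of one pointer on this build: 8 on 64-bit systems, 4 on 32-bit ones.
pointer_size = ctypes.sizeof(ctypes.c_void_p)

if __name__ == "__main__":
    print("pointer size:", pointer_size)
    print("fixed per-string overhead:", ascii_header_overhead())
```

Running this on an interpreter with and without the ``wstr`` member should
show the fixed overhead differing by one pointer, although the exact numbers
depend on the build.
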
Runtime overhead
----------------

To support legacy Unicode objects created by
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
``PyUnicode_READY()`` check.

When we drop support for legacy Unicode objects, we can reduce this
overhead too.


Simplicity
----------

Supporting legacy Unicode objects makes the Unicode implementation complex.
Until we drop legacy Unicode objects, it is very hard to try other
Unicode implementations, such as the UTF-8 based implementation in PyPy.


Rationale
=========

Python 4.0 is not scheduled yet
-------------------------------

PEP 393 introduced an efficient internal representation of Unicode and
removed the border between "narrow" and "wide" builds of Python.

PEP 393 was implemented in Python 3.3, which was released in 2012. The old
APIs have been deprecated since then, and their removal was scheduled for
Python 4.0.

Python 4.0 was expected to be the version after Python 3.9 when PEP 393
was accepted. But the version after Python 3.9 is Python 3.10,
not 4.0. This is why this PEP reschedules the removal.


Python 2 reached EOL
--------------------

Since Python 2 didn't have the PEP 393 Unicode implementation, the legacy
APIs might have helped C extension modules support both Python 2 and 3.

But Python 2 reached EOL in 2020. We can remove the legacy APIs kept
for compatibility with Python 2.


Plan
====

Python 3.9 (current)
--------------------

These macros and functions are marked as deprecated, using the
``Py_DEPRECATED`` macro.

* ``Py_UNICODE_WSTR_LENGTH()``
* ``PyUnicode_GET_SIZE()``
* ``PyUnicode_GetSize()``
* ``PyUnicode_GET_DATA_SIZE()``
* ``PyUnicode_AS_UNICODE()``
* ``PyUnicode_AS_DATA()``
* ``PyUnicode_AsUnicode()``
* ``_PyUnicode_AsUnicode()``
* ``PyUnicode_AsUnicodeAndSize()``
* ``PyUnicode_FromUnicode()``

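The functions listed above are the deprecated APIs that rely on ``wstr``.
The following is a minimal sketch of alternatives that do not need it. It
assumes the replacements suggested in the CPython C API documentation
(``PyUnicode_GetLength()``, ``PyUnicode_AsWideChar()``,
``PyUnicode_FromWideChar()``), which this PEP text does not itself name.
The calls go through ``ctypes.pythonapi`` purely so the example can be run
from Python; an extension module would call the same functions directly
from C. The helper names are hypothetical.

```python
import ctypes

api = ctypes.pythonapi  # C API of the running interpreter (CPython only)

# Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
api.PyUnicode_GetLength.argtypes = [ctypes.py_object]
api.PyUnicode_GetLength.restype = ctypes.c_ssize_t

# Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
api.PyUnicode_AsWideChar.argtypes = [
    ctypes.py_object, ctypes.c_wchar_p, ctypes.c_ssize_t]
api.PyUnicode_AsWideChar.restype = ctypes.c_ssize_t

# PyObject *PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
api.PyUnicode_FromWideChar.argtypes = [ctypes.c_wchar_p, ctypes.c_ssize_t]
api.PyUnicode_FromWideChar.restype = ctypes.py_object


def length_in_code_points(text: str) -> int:
    """Stand-in for PyUnicode_GET_SIZE() / PyUnicode_GetSize()."""
    return api.PyUnicode_GetLength(text)


def copy_to_wchar_buffer(text: str):
    """Stand-in for PyUnicode_AsUnicode(): copy into a caller-owned buffer.

    Assumption: one wchar_t per code point plus a NUL is enough. That does
    not hold for non-BMP text on platforms with a 16-bit wchar_t.
    """
    buf = ctypes.create_unicode_buffer(len(text) + 1)
    copied = api.PyUnicode_AsWideChar(text, buf, len(buf))
    return buf, copied


def from_wchar_buffer(buf, size: int) -> str:
    """Stand-in for PyUnicode_FromUnicode(buffer, size)."""
    return api.PyUnicode_FromWideChar(buf, size)


if __name__ == "__main__":
    buf, n = copy_to_wchar_buffer("héllo")
    print(length_in_code_points("héllo"), n, from_wchar_buffer(buf, n))
```

The point of the sketch is that the caller owns the ``wchar_t`` buffer, so
the string object no longer needs to cache one in ``wstr``.
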
Python 3.10
-----------

* The following macros and enum members will be marked as deprecated.
  The ``Py_DEPRECATED(3.10)`` macro will be used where possible. Where the
  macro cannot easily be used, they will be deprecated only in comments
  and documentation.

  * ``PyUnicode_WCHAR_KIND``
  * ``PyUnicode_READY()``
  * ``PyUnicode_IS_READY()``
  * ``PyUnicode_IS_COMPACT()``

* ``PyUnicode_FromUnicode(NULL, size)`` and
  ``PyUnicode_FromStringAndSize(NULL, size)`` will emit
  ``DeprecationWarning`` when ``size > 0``.

* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
  ``DeprecationWarning`` when the ``u``, ``u#``, ``Z``, and ``Z#`` formats
  are used.

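Because the runtime warnings above are ordinary ``DeprecationWarning``
instances, a test suite can turn them into errors to find remaining uses
before the removal. The following is a minimal sketch using only the
standard ``warnings`` module. ``legacy_call`` is a hypothetical stand-in
for an extension function that still uses a deprecated format or API, and
``uses_deprecated_api`` is likewise an illustrative name.

```python
import warnings


def legacy_call() -> str:
    # Hypothetical stand-in: a real extension function would trigger the
    # warning from C, for example through a deprecated "u" format.
    warnings.warn("legacy Unicode API used", DeprecationWarning, stacklevel=2)
    return "ok"


def uses_deprecated_api(func) -> bool:
    """Return True if calling func() raises a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate DeprecationWarning to an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


if __name__ == "__main__":
    print(uses_deprecated_api(legacy_call))
```

The same effect is available for a whole test run by starting the
interpreter with ``-W error::DeprecationWarning``.
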
Python 3.12
-----------

* The following members will be removed from the Unicode structures:

  * ``wstr``
  * ``wstr_length``
  * ``state.compact``
  * ``state.ready``

* The ``PyUnicodeObject`` struct will be removed.

* The following macros, functions, and enum members will be removed:

  * ``Py_UNICODE_WSTR_LENGTH()``
  * ``PyUnicode_GET_SIZE()``
  * ``PyUnicode_GetSize()``
  * ``PyUnicode_GET_DATA_SIZE()``
  * ``PyUnicode_AS_UNICODE()``
  * ``PyUnicode_AS_DATA()``
  * ``PyUnicode_AsUnicode()``
  * ``_PyUnicode_AsUnicode()``
  * ``PyUnicode_AsUnicodeAndSize()``
  * ``PyUnicode_FromUnicode()``
  * ``PyUnicode_WCHAR_KIND``
  * ``PyUnicode_READY()``
  * ``PyUnicode_IS_READY()``
  * ``PyUnicode_IS_COMPACT()``

* ``PyUnicode_FromStringAndSize(NULL, size)`` will raise
  ``RuntimeError`` when ``size > 0``.

* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
  ``SystemError`` when the ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
  as with any other unsupported format character.


References
==========

A collection of URLs used as references throughout the PEP.

.. [1] PEP 393 -- Flexible String Representation
   (https://www.python.org/dev/peps/pep-0393/)


Copyright
=========

This document has been placed in the public domain.