2020-06-25 07:16:25 -04:00
|
|
|
PEP: 623
|
|
|
|
Title: Remove wstr from Unicode
|
|
|
|
Author: Inada Naoki <songofacandy@gmail.com>
|
2020-06-25 19:02:01 -04:00
|
|
|
BDFL-Delegate: Victor Stinner <vstinner@python.org>
|
2020-06-25 07:16:25 -04:00
|
|
|
Status: Draft
|
|
|
|
Type: Standards Track
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
Created: 25-Jun-2020
|
|
|
|
Python-Version: 3.10
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
========
|
|
|
|
|
|
|
|
PEP 393 deprecated some unicode APIs, and introduced ``wchar_t *wstr``,
|
|
|
|
and ``Py_ssize_t wstr_length`` in the Unicode structure to support
|
|
|
|
these deprecated APIs. [1]_
|
|
|
|
|
|
|
|
This PEP is planning removal of ``wstr``, and ``wstr_length`` with
|
|
|
|
deprecated APIs using these members by Python 3.12.
|
|
|
|
|
|
|
|
Deprecated APIs which doesn't use the members are out of scope because
|
|
|
|
they can be removed independently.
|
|
|
|
|
|
|
|
|
|
|
|
Motivation
|
|
|
|
==========
|
|
|
|
|
|
|
|
Memory usage
|
|
|
|
------------
|
|
|
|
|
|
|
|
``str`` is one of the most used types in Python. Even most simple ASCII
|
|
|
|
strings have a ``wstr`` member. It consumes 8 bytes on 64bit systems.
|
|
|
|
|
|
|
|
|
|
|
|
Runtime overhead
|
|
|
|
----------------
|
|
|
|
|
|
|
|
To support legacy Unicode object created by
|
|
|
|
``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs has
|
|
|
|
``PyUnicode_READY()`` check.
|
|
|
|
|
|
|
|
When we drop support of legacy unicode object, We can reduce this
|
|
|
|
overhead too.
|
|
|
|
|
|
|
|
|
|
|
|
Simplicity
|
|
|
|
----------
|
|
|
|
|
|
|
|
Support of legacy Unicode object makes Unicode implementation complex.
|
|
|
|
Until we drop legacy Unicode object, it is very hard to try other
|
|
|
|
Unicode implementation like UTF-8 based implementation in PyPy.
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
=========
|
|
|
|
|
|
|
|
Python 4.0 is not scheduled yet
|
|
|
|
-------------------------------
|
|
|
|
|
|
|
|
PEP 393 introduced efficient internal representation of Unicode and
|
|
|
|
removed border between "narrow" and "wide" build of Python.
|
|
|
|
|
|
|
|
PEP 393 was implemented in Python 3.3 which is released in 2012. Old
|
|
|
|
APIs were deprecated since then, and the removal was scheduled in
|
|
|
|
Python 4.0.
|
|
|
|
|
|
|
|
Python 4.0 was expected as next version of Python 3.9 when PEP 393
|
|
|
|
was accepted. But the next version of Python 3.9 is Python 3.10,
|
|
|
|
not 4.0. This is why this PEP schedule the removal plan again.
|
|
|
|
|
|
|
|
|
|
|
|
Python 2 reached EOL
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
Since Python 2 didn't have PEP 393 Unicode implementation, legacy
|
|
|
|
APIs might help C extensiom modules supporting both of Python 2 and 3.
|
|
|
|
|
|
|
|
But Python 2 reached the EOL in 2020. We can remove legacy APIs kept
|
|
|
|
for compatibility with Python 2.
|
|
|
|
|
|
|
|
|
|
|
|
Plan
|
|
|
|
====
|
|
|
|
|
|
|
|
Python 3.9 (current)
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
These macros and functions are marked as deprecated, using
|
|
|
|
``Py_DEPRECATED`` macro.
|
|
|
|
|
|
|
|
* ``Py_UNICODE_WSTR_LENGTH()``
|
|
|
|
* ``PyUnicode_GET_SIZE()``
|
|
|
|
* ``PyUnicode_GetSize()``
|
|
|
|
* ``PyUnicode_GET_DATA_SIZE()``
|
|
|
|
* ``PyUnicode_AS_UNICODE()``
|
|
|
|
* ``PyUnicode_AS_DATA()``
|
|
|
|
* ``PyUnicode_AsUnicode()``
|
|
|
|
* ``_PyUnicode_AsUnicode()``
|
|
|
|
* ``PyUnicode_AsUnicodeAndSize()``
|
|
|
|
* ``PyUnicode_FromUnicode()``
|
|
|
|
|
|
|
|
|
|
|
|
Python 3.10
|
|
|
|
-----------
|
|
|
|
|
|
|
|
* Following macros, enum members will be marked as deprecated.
|
|
|
|
``Py_DEPRECATED(3.10)`` macro will be used as possible. But they
|
|
|
|
will be deprecated only in comment and document if the macro can
|
|
|
|
not be used easily.
|
|
|
|
|
|
|
|
* ``PyUnicode_WCHAR_KIND``
|
|
|
|
* ``PyUnicode_READY()``
|
|
|
|
* ``PyUnicode_IS_READY()``
|
|
|
|
* ``PyUnicode_IS_COMPACT()``
|
|
|
|
|
|
|
|
* ``PyUnicode_FromUnicode(NULL, size)`` and
|
|
|
|
``PyUnicode_FromStringAndSize(NULL, size)`` will emit
|
|
|
|
``DeprecationWarning`` when ``size > 0``.
|
|
|
|
|
|
|
|
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will emit
|
|
|
|
``DeprecationWarning`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used.
|
|
|
|
|
|
|
|
|
|
|
|
Python 3.12
|
|
|
|
-----------
|
|
|
|
|
|
|
|
* Following members will be removed from the Unicode strucutres:
|
|
|
|
|
|
|
|
* ``wstr``
|
|
|
|
* ``wstr_length``
|
|
|
|
* ``state.compact``
|
|
|
|
* ``state.ready``
|
|
|
|
|
|
|
|
* The ``PyUnicodeObject`` struct will be removed.
|
|
|
|
|
|
|
|
* Following macros and functions, and enum members will be removed:
|
|
|
|
|
|
|
|
* ``Py_UNICODE_WSTR_LENGTH()``
|
|
|
|
* ``PyUnicode_GET_SIZE()``
|
|
|
|
* ``PyUnicode_GetSize()``
|
|
|
|
* ``PyUnicode_GET_DATA_SIZE()``
|
|
|
|
* ``PyUnicode_AS_UNICODE()``
|
|
|
|
* ``PyUnicode_AS_DATA()``
|
|
|
|
* ``PyUnicode_AsUnicode()``
|
|
|
|
* ``_PyUnicode_AsUnicode()``
|
|
|
|
* ``PyUnicode_AsUnicodeAndSize()``
|
|
|
|
* ``PyUnicode_FromUnicode()``
|
|
|
|
* ``PyUnicode_WCHAR_KIND``
|
|
|
|
* ``PyUnicode_READY()``
|
|
|
|
* ``PyUnicode_IS_READY()``
|
|
|
|
* ``PyUnicode_IS_COMPACT()``
|
|
|
|
|
|
|
|
* ``PyUnicode_FromStringAndSize(NULL, size))`` will raise
|
|
|
|
``RuntimeError`` when ``size > 0``.
|
|
|
|
|
|
|
|
* ``PyArg_ParseTuple()`` and ``PyArg_ParseTupleAndKeywords()`` will raise
|
|
|
|
``SystemError`` when ``u``, ``u#``, ``Z``, and ``Z#`` formats are used,
|
|
|
|
as other unsupported format character.
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
==========
|
|
|
|
A collection of URLs used as references through the PEP.
|
|
|
|
|
|
|
|
.. [1] PEP 393 -- Flexible String Representation
|
|
|
|
(https://www.python.org/dev/peps/pep-0393/)
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
=========
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|