<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 623 Remove wstr from Unicode | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0623/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 623 Remove wstr from Unicode | peps.python.org'>
<meta property="og:description" content="PEP 393 deprecated some unicode APIs, and introduced wchar_t *wstr, and Py_ssize_t wstr_length in the Unicode structure to support these deprecated APIs.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0623/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="PEP 393 deprecated some unicode APIs, and introduced wchar_t *wstr, and Py_ssize_t wstr_length in the Unicode structure to support these deprecated APIs.">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 623 Remove wstr from Unicode</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Inada Naoki &lt;songofacandy&#32;&#97;t&#32;gmail.com&gt;</dd>
<dt class="field-even">BDFL-Delegate<span class="colon">:</span></dt>
<dd class="field-even">Victor Stinner &lt;vstinner&#32;&#97;t&#32;python.org&gt;</dd>
<dt class="field-odd">Discussions-To<span class="colon">:</span></dt>
<dd class="field-odd"><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/">Python-Dev thread</a></dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">25-Jun-2020</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">3.10</dd>
<dt class="field-even">Resolution<span class="colon">:</span></dt>
<dd class="field-even"><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/VQKDIZLZ6HF2MLTNCUFURK2IFTXVQEYA/">Python-Dev thread</a></dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a><ul>
<li><a class="reference internal" href="#memory-usage">Memory usage</a></li>
<li><a class="reference internal" href="#runtime-overhead">Runtime overhead</a></li>
<li><a class="reference internal" href="#simplicity">Simplicity</a></li>
</ul>
</li>
<li><a class="reference internal" href="#rationale">Rationale</a><ul>
<li><a class="reference internal" href="#python-4-0-is-not-scheduled-yet">Python 4.0 is not scheduled yet</a></li>
<li><a class="reference internal" href="#python-2-reached-eol">Python 2 reached EOL</a></li>
</ul>
</li>
<li><a class="reference internal" href="#plan">Plan</a><ul>
<li><a class="reference internal" href="#python-3-9">Python 3.9</a></li>
<li><a class="reference internal" href="#python-3-10">Python 3.10</a></li>
<li><a class="reference internal" href="#python-3-12">Python 3.12</a></li>
</ul>
</li>
<li><a class="reference internal" href="#discussion">Discussion</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> deprecated some Unicode APIs and introduced <code class="docutils literal notranslate"><span class="pre">wchar_t</span> <span class="pre">*wstr</span></code>
and <code class="docutils literal notranslate"><span class="pre">Py_ssize_t</span> <span class="pre">wstr_length</span></code> in the Unicode structure to support
these deprecated APIs.</p>
<p>This PEP plans to remove <code class="docutils literal notranslate"><span class="pre">wstr</span></code> and <code class="docutils literal notranslate"><span class="pre">wstr_length</span></code>, together with
the deprecated APIs that use these members, by Python 3.12.</p>
<p>Deprecated APIs that do not use these members are out of scope because
they can be removed independently.</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<section id="memory-usage">
<h3><a class="toc-backref" href="#memory-usage" role="doc-backlink">Memory usage</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">str</span></code> is one of the most used types in Python. Even the simplest ASCII
strings carry a <code class="docutils literal notranslate"><span class="pre">wstr</span></code> member, which consumes 8 bytes per string on 64-bit
systems.</p>
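<p>As a rough illustration of where those bytes come from, the following sketch compares two hypothetical header structs, one with a trailing <code class="docutils literal notranslate"><span class="pre">wstr</span></code> pointer and one without. The names <code class="docutils literal notranslate"><span class="pre">legacy_header</span></code>, <code class="docutils literal notranslate"><span class="pre">slim_header</span></code>, and <code class="docutils literal notranslate"><span class="pre">wstr_saving</span></code> are invented for this example, and the field layout is only an approximation of the real Unicode structure.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for a string header that still has wstr.
   The field list is an assumption, not a copy of CPython's struct. */
typedef struct {
    intptr_t ob_refcnt;          /* reference count */
    void *ob_type;               /* pointer to the type object */
    intptr_t length;             /* number of code points */
    intptr_t hash;               /* cached hash value */
    struct {
        unsigned int interned : 2;
        unsigned int kind : 3;
        unsigned int compact : 1;
        unsigned int ascii : 1;
        unsigned int ready : 1;
    } state;
    wchar_t *wstr;               /* cached wchar_t representation */
} legacy_header;

/* The same hypothetical header with the wstr pointer dropped. */
typedef struct {
    intptr_t ob_refcnt;
    void *ob_type;
    intptr_t length;
    intptr_t hash;
    struct {
        unsigned int interned : 2;
        unsigned int kind : 3;
        unsigned int compact : 1;
        unsigned int ascii : 1;
        unsigned int ready : 1;
    } state;
} slim_header;

/* Bytes saved per object by removing the pointer. */
static inline size_t wstr_saving(void)
{
    /* Guard against unsigned wrap-around in the subtraction below. */
    assert(sizeof(legacy_header) >= sizeof(slim_header));
    return sizeof(legacy_header) - sizeof(slim_header);
}
```

<p>On a typical 64-bit ABI, where pointers are 8 bytes wide and structs are padded to pointer alignment, <code class="docutils literal notranslate"><span class="pre">wstr_saving()</span></code> is expected to return 8.</p>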
</section>
<section id="runtime-overhead">
<h3><a class="toc-backref" href="#runtime-overhead" role="doc-backlink">Runtime overhead</a></h3>
<p>To support legacy Unicode objects, many Unicode APIs must call
<code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code>.</p>
<p>Dropping support for legacy Unicode objects removes this overhead
as well.</p>
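<p>The cost described above can be pictured with a toy model. The sketch below is not CPython code: <code class="docutils literal notranslate"><span class="pre">toy_str</span></code>, <code class="docutils literal notranslate"><span class="pre">toy_ready()</span></code>, and <code class="docutils literal notranslate"><span class="pre">toy_read_char()</span></code> are hypothetical names, and the model assumes a string that may exist only as a <code class="docutils literal notranslate"><span class="pre">wchar_t</span></code> buffer until it is converted on first use. It only shows why every accessor has to pay for a readiness check.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Toy model (an assumption, not CPython's layout): a string that may
   exist only as a wchar_t buffer until it is converted. */
typedef struct {
    int ready;              /* 0: only wstr is valid; 1: data is valid */
    size_t length;          /* number of characters */
    const wchar_t *wstr;    /* legacy buffer supplied by the caller */
    unsigned char *data;    /* canonical one-byte-per-character storage */
} toy_str;

/* Convert the legacy buffer on first use.
   Returns 0 on success and -1 on error. */
static inline int toy_ready(toy_str *s)
{
    assert(s != NULL);
    if (s->ready)
        return 0;           /* fast path, but still a branch per call */
    s->data = malloc(s->length ? s->length : 1);
    if (s->data == NULL)
        return -1;
    for (size_t i = 0; i < s->length; i++) {
        /* This toy model only handles characters up to U+00FF. */
        if ((unsigned long)s->wstr[i] > 0xFF) {
            free(s->data);
            s->data = NULL;
            return -1;
        }
        s->data[i] = (unsigned char)s->wstr[i];
    }
    s->ready = 1;
    return 0;
}

/* Every accessor must check readiness before touching data. */
static inline int toy_read_char(toy_str *s, size_t i)
{
    if (toy_ready(s) < 0 || i >= s->length)
        return -1;
    return s->data[i];
}
```

<p>If the legacy form did not exist, <code class="docutils literal notranslate"><span class="pre">toy_read_char()</span></code> in this model could index <code class="docutils literal notranslate"><span class="pre">data</span></code> directly, without the conversion path or the error case it introduces.</p>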
</section>
<section id="simplicity">
<h3><a class="toc-backref" href="#simplicity" role="doc-backlink">Simplicity</a></h3>
<p>Supporting legacy Unicode objects makes the Unicode implementation more
complex.
Until legacy Unicode objects are dropped, it is very hard to try other
Unicode implementations, such as the UTF-8 based implementation in PyPy.</p>
</section>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<section id="python-4-0-is-not-scheduled-yet">
<h3><a class="toc-backref" href="#python-4-0-is-not-scheduled-yet" role="doc-backlink">Python 4.0 is not scheduled yet</a></h3>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> introduced an efficient internal representation of Unicode and
removed the distinction between “narrow” and “wide” builds of Python.</p>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> was implemented in Python 3.3, which was released in 2012. The old
APIs have been deprecated since then, and their removal was scheduled for
Python 4.0.</p>
<p>When <a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> was accepted, Python 4.0 was expected to follow Python 3.9.
However, the version after Python 3.9 is Python 3.10, not 4.0.
This is why this PEP reschedules the removal.</p>
</section>
<section id="python-2-reached-eol">
<h3><a class="toc-backref" href="#python-2-reached-eol" role="doc-backlink">Python 2 reached EOL</a></h3>
<p>Since Python 2 didn't have the <a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> Unicode implementation, the legacy
APIs could help C extension modules support both Python 2 and 3.</p>
<p>But Python 2 reached EOL in 2020, so the legacy APIs kept
for compatibility with Python 2 can now be removed.</p>
</section>
</section>
<section id="plan">
<h2><a class="toc-backref" href="#plan" role="doc-backlink">Plan</a></h2>
<section id="python-3-9">
<h3><a class="toc-backref" href="#python-3-9" role="doc-backlink">Python 3.9</a></h3>
<p>The following macros and functions are marked as deprecated using the
<code class="docutils literal notranslate"><span class="pre">Py_DEPRECATED</span></code> macro:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">Py_UNICODE_WSTR_LENGTH()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GetSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_DATA_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_UNICODE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_DATA()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeAndSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></li>
</ul>
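<p>For readers unfamiliar with compile-time deprecation markers, the sketch below shows the general technique with a hypothetical <code class="docutils literal notranslate"><span class="pre">TOY_DEPRECATED</span></code> macro. It is written in the spirit of <code class="docutils literal notranslate"><span class="pre">Py_DEPRECATED</span></code> but is not its definition; the compiler checks and the functions <code class="docutils literal notranslate"><span class="pre">toy_get_size()</span></code> and <code class="docutils literal notranslate"><span class="pre">toy_get_length()</span></code> are assumptions made for illustration.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: expands to a compiler-specific deprecation
   attribute. Which compilers are detected here is an assumption. */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_DEPRECATED(version) __attribute__((__deprecated__))
#elif defined(_MSC_VER)
#  define TOY_DEPRECATED(version) __declspec(deprecated)
#else
#  define TOY_DEPRECATED(version)   /* no-op on other compilers */
#endif

/* Replacement API that callers are expected to migrate to. */
static inline size_t toy_get_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Old API: still works, but each call site produces a compiler
   warning on toolchains that support the attribute. */
TOY_DEPRECATED(3.9) static inline size_t toy_get_size(const char *s)
{
    return toy_get_length(s);
}
```

<p>With this pattern the old entry point keeps working during the deprecation period, while extension authors see a warning at build time rather than a failure at run time.</p>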
</section>
<section id="python-3-10">
<h3><a class="toc-backref" href="#python-3-10" role="doc-backlink">Python 3.10</a></h3>
<ul class="simple">
<li>The following macros and enum members are marked as deprecated.
The <code class="docutils literal notranslate"><span class="pre">Py_DEPRECATED(3.10)</span></code> macro is used where possible. Where the
macro cannot be applied easily, the deprecation is noted only in
comments and documentation.<ul>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_WCHAR_KIND</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_COMPACT()</span></code></li>
</ul>
</li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode(NULL,</span> <span class="pre">size)</span></code> and
<code class="docutils literal notranslate"><span class="pre">PyUnicode_FromStringAndSize(NULL,</span> <span class="pre">size)</span></code> emit
<code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> when <code class="docutils literal notranslate"><span class="pre">size</span> <span class="pre">&gt;</span> <span class="pre">0</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">PyArg_ParseTuple()</span></code> and <code class="docutils literal notranslate"><span class="pre">PyArg_ParseTupleAndKeywords()</span></code> emit
<code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> when <code class="docutils literal notranslate"><span class="pre">u</span></code>, <code class="docutils literal notranslate"><span class="pre">u#</span></code>, <code class="docutils literal notranslate"><span class="pre">Z</span></code>, and <code class="docutils literal notranslate"><span class="pre">Z#</span></code> formats are used.</li>
</ul>
</section>
<section id="python-3-12">
<h3><a class="toc-backref" href="#python-3-12" role="doc-backlink">Python 3.12</a></h3>
<ul class="simple">
<li>The following members are removed from the Unicode structures:<ul>
<li><code class="docutils literal notranslate"><span class="pre">wstr</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">wstr_length</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">state.compact</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">state.ready</span></code></li>
</ul>
</li>
<li>The <code class="docutils literal notranslate"><span class="pre">PyUnicodeObject</span></code> structure is removed.</li>
<li>The following macros, functions, and enum members are removed:<ul>
<li><code class="docutils literal notranslate"><span class="pre">Py_UNICODE_WSTR_LENGTH()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GetSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_DATA_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_UNICODE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_DATA()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeAndSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_WCHAR_KIND</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_COMPACT()</span></code></li>
</ul>
</li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromStringAndSize(NULL,</span> <span class="pre">size)</span></code> raises
<code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> when <code class="docutils literal notranslate"><span class="pre">size</span> <span class="pre">&gt;</span> <span class="pre">0</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">PyArg_ParseTuple()</span></code> and <code class="docutils literal notranslate"><span class="pre">PyArg_ParseTupleAndKeywords()</span></code> raise
<code class="docutils literal notranslate"><span class="pre">SystemError</span></code> when <code class="docutils literal notranslate"><span class="pre">u</span></code>, <code class="docutils literal notranslate"><span class="pre">u#</span></code>, <code class="docutils literal notranslate"><span class="pre">Z</span></code>, and <code class="docutils literal notranslate"><span class="pre">Z#</span></code> formats are used,
as with any other unsupported format character.</li>
</ul>
</section>
</section>
<section id="discussion">
<h2><a class="toc-backref" href="#discussion" role="doc-backlink">Discussion</a></h2>
<ul class="simple">
<li><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/#BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH">Draft PEP: Remove wstr from Unicode</a></li>
<li><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR/#7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR">When can we remove wchar_t* cache from string?</a></li>
<li><a class="reference external" href="https://github.com/python/peps/pull/1462">PEP 623: Remove wstr from Unicode object #1462</a></li>
</ul>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue38604">bpo-38604: Schedule Py_UNICODE API removal</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue36346">bpo-36346: Prepare for removing the legacy Unicode C API</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue30863">bpo-30863: Rewrite PyUnicode_AsWideChar() and
PyUnicode_AsWideCharString()</a>:
They no longer cache the <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code> representation of string
objects.</li>
</ul>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0623.rst">https://github.com/python/peps/blob/main/peps/pep-0623.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0623.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>