<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 623 Remove wstr from Unicode | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0623/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 623 Remove wstr from Unicode | peps.python.org'>
<meta property="og:description" content="PEP 393 deprecated some unicode APIs, and introduced wchar_t *wstr, and Py_ssize_t wstr_length in the Unicode structure to support these deprecated APIs.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0623/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="PEP 393 deprecated some unicode APIs, and introduced wchar_t *wstr, and Py_ssize_t wstr_length in the Unicode structure to support these deprecated APIs.">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 623</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 623 Remove wstr from Unicode</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Inada Naoki &lt;songofacandy&#32;&#97;t&#32;gmail.com&gt;</dd>
<dt class="field-even">BDFL-Delegate<span class="colon">:</span></dt>
<dd class="field-even">Victor Stinner &lt;vstinner&#32;&#97;t&#32;python.org&gt;</dd>
<dt class="field-odd">Discussions-To<span class="colon">:</span></dt>
<dd class="field-odd"><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/">Python-Dev thread</a></dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">25-Jun-2020</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">3.10</dd>
<dt class="field-even">Resolution<span class="colon">:</span></dt>
<dd class="field-even"><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/VQKDIZLZ6HF2MLTNCUFURK2IFTXVQEYA/">Python-Dev thread</a></dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a><ul>
<li><a class="reference internal" href="#memory-usage">Memory usage</a></li>
<li><a class="reference internal" href="#runtime-overhead">Runtime overhead</a></li>
<li><a class="reference internal" href="#simplicity">Simplicity</a></li>
</ul>
</li>
<li><a class="reference internal" href="#rationale">Rationale</a><ul>
<li><a class="reference internal" href="#python-4-0-is-not-scheduled-yet">Python 4.0 is not scheduled yet</a></li>
<li><a class="reference internal" href="#python-2-reached-eol">Python 2 reached EOL</a></li>
</ul>
</li>
<li><a class="reference internal" href="#plan">Plan</a><ul>
<li><a class="reference internal" href="#python-3-9">Python 3.9</a></li>
<li><a class="reference internal" href="#python-3-10">Python 3.10</a></li>
<li><a class="reference internal" href="#python-3-12">Python 3.12</a></li>
</ul>
</li>
<li><a class="reference internal" href="#discussion">Discussion</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> deprecated some Unicode APIs and introduced <code class="docutils literal notranslate"><span class="pre">wchar_t</span> <span class="pre">*wstr</span></code>
and <code class="docutils literal notranslate"><span class="pre">Py_ssize_t</span> <span class="pre">wstr_length</span></code> in the Unicode structure to support
these deprecated APIs.</p>
<p>This PEP plans to remove <code class="docutils literal notranslate"><span class="pre">wstr</span></code> and <code class="docutils literal notranslate"><span class="pre">wstr_length</span></code>, together with
the deprecated APIs that use these members, by Python 3.12.</p>
<p>Deprecated APIs that do not use these members are out of scope because
they can be removed independently.</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<section id="memory-usage">
<h3><a class="toc-backref" href="#memory-usage" role="doc-backlink">Memory usage</a></h3>
<p><code class="docutils literal notranslate"><span class="pre">str</span></code> is one of the most used types in Python. Even the simplest ASCII
strings have a <code class="docutils literal notranslate"><span class="pre">wstr</span></code> member, which consumes 8 bytes per string on 64-bit
systems.</p>
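<p>To put the 8-byte figure in context, the following Python sketch (an illustration, not part of the PEP) compares the pointer size of the running build with the fixed header of an ASCII <code class="docutils literal notranslate"><span class="pre">str</span></code>. It assumes CPython, where <code class="docutils literal notranslate"><span class="pre">sys.getsizeof("")</span></code> reports the object header plus one terminating NUL byte; the helper name <code class="docutils literal notranslate"><span class="pre">ascii_header_size</span></code> is hypothetical, and the numbers printed depend on the Python version and platform.</p>

```python
import struct
import sys

# Bytes per C pointer on this build (8 on typical 64-bit systems).
POINTER_SIZE = struct.calcsize("P")


def ascii_header_size():
    """Return the fixed per-object cost of an ASCII str, in bytes.

    Assumption: on CPython an empty str stores only the object header
    plus a single NUL terminator, so subtracting 1 leaves the header.
    """
    return sys.getsizeof("") - 1


header = ascii_header_size()
print(f"pointer size: {POINTER_SIZE} bytes")
print(f"ASCII str header: {header} bytes")
print(f"one pointer is {POINTER_SIZE / header:.0%} of that header")
```

<p>Running the sketch on interpreters before and after the removal should show the header shrinking by roughly one pointer, although alignment and other layout changes can affect the exact difference.</p>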
</section>
<section id="runtime-overhead">
<h3><a class="toc-backref" href="#runtime-overhead" role="doc-backlink">Runtime overhead</a></h3>
<p>To support legacy Unicode objects, many Unicode APIs must call
<code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code>.</p>
<p>We can remove this overhead too by dropping support for legacy Unicode
objects.</p>
</section>
<section id="simplicity">
<h3><a class="toc-backref" href="#simplicity" role="doc-backlink">Simplicity</a></h3>
<p>Supporting legacy Unicode objects makes the Unicode implementation more
complex.
Until we drop legacy Unicode objects, it is very hard to try other
Unicode implementations, such as the UTF-8 based implementation in PyPy.</p>
</section>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<section id="python-4-0-is-not-scheduled-yet">
<h3><a class="toc-backref" href="#python-4-0-is-not-scheduled-yet" role="doc-backlink">Python 4.0 is not scheduled yet</a></h3>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> introduced an efficient internal representation of Unicode and
removed the border between “narrow” and “wide” builds of Python.</p>
<p><a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> was implemented in Python 3.3, which was released in 2012. The old
APIs have been deprecated since then, and their removal was scheduled for
Python 4.0.</p>
<p>When <a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> was accepted, Python 4.0 was expected to be the version after Python 3.9.
But the version after Python 3.9 is Python 3.10,
not 4.0. This is why this PEP schedules the removal again.</p>
</section>
<section id="python-2-reached-eol">
<h3><a class="toc-backref" href="#python-2-reached-eol" role="doc-backlink">Python 2 reached EOL</a></h3>
<p>Since Python 2 did not have the <a class="pep reference internal" href="../pep-0393/" title="PEP 393 Flexible String Representation">PEP 393</a> Unicode implementation, the legacy
APIs could help C extension modules support both Python 2 and 3.</p>
<p>But Python 2 reached EOL in 2020. We can remove the legacy APIs kept
for compatibility with Python 2.</p>
</section>
</section>
<section id="plan">
<h2><a class="toc-backref" href="#plan" role="doc-backlink">Plan</a></h2>
<section id="python-3-9">
<h3><a class="toc-backref" href="#python-3-9" role="doc-backlink">Python 3.9</a></h3>
<p>These macros and functions are marked as deprecated using the
<code class="docutils literal notranslate"><span class="pre">Py_DEPRECATED</span></code> macro.</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">Py_UNICODE_WSTR_LENGTH()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GetSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_DATA_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_UNICODE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_DATA()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeAndSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></li>
</ul>
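<p>Several of the functions listed above return a <code class="docutils literal notranslate"><span class="pre">wchar_t</span></code>-based buffer cached in <code class="docutils literal notranslate"><span class="pre">wstr</span></code>. One existing alternative that does not rely on that cache is <code class="docutils literal notranslate"><span class="pre">PyUnicode_AsWideCharString()</span></code>, which returns a freshly allocated buffer that the caller releases with <code class="docutils literal notranslate"><span class="pre">PyMem_Free()</span></code>. The sketch below is an illustration only and is not part of the PEP: it calls the C API from Python through <code class="docutils literal notranslate"><span class="pre">ctypes.pythonapi</span></code>, which assumes CPython, and the helper name <code class="docutils literal notranslate"><span class="pre">to_wide</span></code> is hypothetical. A real C extension would call the same two functions directly.</p>

```python
import ctypes

api = ctypes.pythonapi  # CPython only: exposes the C API of the running interpreter

# wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
api.PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p

# void PyMem_Free(void *p)
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def to_wide(text):
    """Round-trip a str through a newly allocated wchar_t buffer.

    Hypothetical helper: the buffer is owned by the caller, so it is
    always released with PyMem_Free() after being copied back to a str.
    """
    size = ctypes.c_ssize_t()
    ptr = api.PyUnicode_AsWideCharString(text, ctypes.byref(size))
    try:
        # size.value counts wchar_t units, excluding the trailing NUL.
        return ctypes.wstring_at(ptr, size.value)
    finally:
        api.PyMem_Free(ptr)


print(to_wide("caf\u00e9"))
```

<p>Because the buffer is allocated per call rather than cached on the string object, this pattern works the same way whether or not the <code class="docutils literal notranslate"><span class="pre">wstr</span></code> member exists.</p>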
</section>
<section id="python-3-10">
<h3><a class="toc-backref" href="#python-3-10" role="doc-backlink">Python 3.10</a></h3>
<ul class="simple">
<li>The following macros and enum members are marked as deprecated.
The <code class="docutils literal notranslate"><span class="pre">Py_DEPRECATED(3.10)</span></code> macro is used where possible. Where the
macro cannot be used easily, they are deprecated only in comments
and documentation.<ul>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_WCHAR_KIND</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_COMPACT()</span></code></li>
</ul>
</li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode(NULL,</span> <span class="pre">size)</span></code> and
<code class="docutils literal notranslate"><span class="pre">PyUnicode_FromStringAndSize(NULL,</span> <span class="pre">size)</span></code> emit
<code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> when <code class="docutils literal notranslate"><span class="pre">size</span> <span class="pre">&gt;</span> <span class="pre">0</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">PyArg_ParseTuple()</span></code> and <code class="docutils literal notranslate"><span class="pre">PyArg_ParseTupleAndKeywords()</span></code> emit
<code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> when <code class="docutils literal notranslate"><span class="pre">u</span></code>, <code class="docutils literal notranslate"><span class="pre">u#</span></code>, <code class="docutils literal notranslate"><span class="pre">Z</span></code>, and <code class="docutils literal notranslate"><span class="pre">Z#</span></code> formats are used.</li>
</ul>
</section>
<section id="python-3-12">
<h3><a class="toc-backref" href="#python-3-12" role="doc-backlink">Python 3.12</a></h3>
<ul class="simple">
<li>The following members are removed from the Unicode structures:<ul>
<li><code class="docutils literal notranslate"><span class="pre">wstr</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">wstr_length</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">state.compact</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">state.ready</span></code></li>
</ul>
</li>
<li>The <code class="docutils literal notranslate"><span class="pre">PyUnicodeObject</span></code> structure is removed.</li>
<li>The following macros, functions, and enum members are removed:<ul>
<li><code class="docutils literal notranslate"><span class="pre">Py_UNICODE_WSTR_LENGTH()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GetSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_GET_DATA_SIZE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_UNICODE()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AS_DATA()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeAndSize()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_WCHAR_KIND</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_READY()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_IS_COMPACT()</span></code></li>
</ul>
</li>
<li><code class="docutils literal notranslate"><span class="pre">PyUnicode_FromStringAndSize(NULL,</span> <span class="pre">size)</span></code> raises
<code class="docutils literal notranslate"><span class="pre">RuntimeError</span></code> when <code class="docutils literal notranslate"><span class="pre">size</span> <span class="pre">&gt;</span> <span class="pre">0</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">PyArg_ParseTuple()</span></code> and <code class="docutils literal notranslate"><span class="pre">PyArg_ParseTupleAndKeywords()</span></code> raise
<code class="docutils literal notranslate"><span class="pre">SystemError</span></code> when the <code class="docutils literal notranslate"><span class="pre">u</span></code>, <code class="docutils literal notranslate"><span class="pre">u#</span></code>, <code class="docutils literal notranslate"><span class="pre">Z</span></code>, and <code class="docutils literal notranslate"><span class="pre">Z#</span></code> formats are used,
as with any other unsupported format character.</li>
</ul>
</section>
</section>
<section id="discussion">
<h2><a class="toc-backref" href="#discussion" role="doc-backlink">Discussion</a></h2>
<ul class="simple">
<li><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/#BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH">Draft PEP: Remove wstr from Unicode</a></li>
<li><a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR/#7JVC3IKS2V73K36ISEJAAWMRFN2T4KKR">When can we remove wchar_t* cache from string?</a></li>
<li><a class="reference external" href="https://github.com/python/peps/pull/1462">PEP 623: Remove wstr from Unicode object #1462</a></li>
</ul>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<ul class="simple">
<li><a class="reference external" href="https://bugs.python.org/issue38604">bpo-38604: Schedule Py_UNICODE API removal</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue36346">bpo-36346: Prepare for removing the legacy Unicode C API</a></li>
<li><a class="reference external" href="https://bugs.python.org/issue30863">bpo-30863: Rewrite PyUnicode_AsWideChar() and
PyUnicode_AsWideCharString()</a>:
They no longer cache the <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code> representation of string
objects.</li>
</ul>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0623.rst">https://github.com/python/peps/blob/main/peps/pep-0623.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0623.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a><ul>
<li><a class="reference internal" href="#memory-usage">Memory usage</a></li>
<li><a class="reference internal" href="#runtime-overhead">Runtime overhead</a></li>
<li><a class="reference internal" href="#simplicity">Simplicity</a></li>
</ul>
</li>
<li><a class="reference internal" href="#rationale">Rationale</a><ul>
<li><a class="reference internal" href="#python-4-0-is-not-scheduled-yet">Python 4.0 is not scheduled yet</a></li>
<li><a class="reference internal" href="#python-2-reached-eol">Python 2 reached EOL</a></li>
</ul>
</li>
<li><a class="reference internal" href="#plan">Plan</a><ul>
<li><a class="reference internal" href="#python-3-9">Python 3.9</a></li>
<li><a class="reference internal" href="#python-3-10">Python 3.10</a></li>
<li><a class="reference internal" href="#python-3-12">Python 3.12</a></li>
</ul>
</li>
<li><a class="reference internal" href="#discussion">Discussion</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
<br>
<a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0623.rst">Page Source (GitHub)</a>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>