<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 686 Make UTF-8 mode default | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0686/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 686 Make UTF-8 mode default | peps.python.org'>
<meta property="og:description" content="This PEP proposes enabling UTF-8 mode by default.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0686/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="This PEP proposes enabling UTF-8 mode by default.">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 686</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 686 Make UTF-8 mode default</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Inada Naoki &lt;songofacandy&#32;&#97;t&#32;gmail.com&gt;</dd>
<dt class="field-even">Discussions-To<span class="colon">:</span></dt>
<dd class="field-even"><a class="reference external" href="https://discuss.python.org/t/14737">Discourse thread</a></dd>
<dt class="field-odd">Status<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative proposal accepted for implementation">Accepted</abbr></dd>
<dt class="field-even">Type<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-odd">Created<span class="colon">:</span></dt>
<dd class="field-odd">18-Mar-2022</dd>
<dt class="field-even">Python-Version<span class="colon">:</span></dt>
<dd class="field-even">3.15</dd>
<dt class="field-odd">Post-History<span class="colon">:</span></dt>
<dd class="field-odd"><a class="reference external" href="https://discuss.python.org/t/14435" title="Discourse thread">18-Mar-2022</a>,
<a class="reference external" href="https://discuss.python.org/t/14737" title="Discourse thread">31-Mar-2022</a></dd>
<dt class="field-even">Resolution<span class="colon">:</span></dt>
<dd class="field-even"><a class="reference external" href="https://discuss.python.org/t/14737/9">Discourse message</a></dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#specification">Specification</a><ul>
<li><a class="reference internal" href="#enable-utf-8-mode-by-default">Enable UTF-8 mode by default</a></li>
<li><a class="reference internal" href="#locale-getencoding"><code class="docutils literal notranslate"><span class="pre">locale.getencoding()</span></code></a></li>
<li><a class="reference internal" href="#fixing-encoding-locale-option">Fixing <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code> option</a></li>
</ul>
</li>
<li><a class="reference internal" href="#backward-compatibility">Backward Compatibility</a></li>
<li><a class="reference internal" href="#preceding-examples">Preceding examples</a></li>
<li><a class="reference internal" href="#rejected-alternative">Rejected Alternative</a><ul>
<li><a class="reference internal" href="#deprecate-implicit-encoding">Deprecate implicit encoding</a></li>
<li><a class="reference internal" href="#use-pythonioencoding-for-pipes">Use <code class="docutils literal notranslate"><span class="pre">PYTHONIOENCODING</span></code> for PIPEs</a></li>
</ul>
</li>
<li><a class="reference internal" href="#how-to-teach-this">How to teach this</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>This PEP proposes enabling <a class="pep reference internal" href="../pep-0540/" title="PEP 540 Add a new UTF-8 Mode">UTF-8 mode</a> by default.</p>
<p>With this change, Python consistently uses UTF-8 for default encoding of
files, stdio, and pipes.</p>
</section>
<section id="motivation">
<h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2>
<p>UTF-8 has become the de facto standard text encoding.</p>
<ul class="simple">
<li>The default encoding of Python source files is UTF-8.</li>
<li>JSON, TOML, YAML use UTF-8.</li>
<li>Most text editors, including Visual Studio Code and Windows Notepad, use
UTF-8 by default.</li>
<li>Most websites and text data on the internet use UTF-8.</li>
<li>Many other popular programming languages, including Node.js, Go, Rust,
and Java, use UTF-8 by default.</li>
</ul>
<p>Changing the default encoding to UTF-8 makes it easier for Python to
interoperate with them.</p>
<p>Additionally, many Python developers using Unix forget that the default
encoding is platform dependent.
They omit <code class="docutils literal notranslate"><span class="pre">encoding=&quot;utf-8&quot;</span></code> when they read text files encoded
in UTF-8 (e.g. JSON, TOML, Markdown, and Python source files).
Such code works on most Unix systems but can misread the same files on
platforms whose locale encoding is not UTF-8.
This inconsistent default encoding causes many bugs.</p>
</section>
<section id="specification">
<h2><a class="toc-backref" href="#specification" role="doc-backlink">Specification</a></h2>
<section id="enable-utf-8-mode-by-default">
<h3><a class="toc-backref" href="#enable-utf-8-mode-by-default" role="doc-backlink">Enable UTF-8 mode by default</a></h3>
<p>Python will enable UTF-8 mode by default from Python 3.15.</p>
<p>Users can still disable UTF-8 mode by setting the <code class="docutils literal notranslate"><span class="pre">PYTHONUTF8=0</span></code> environment variable or
passing the <code class="docutils literal notranslate"><span class="pre">-X</span> <span class="pre">utf8=0</span></code> command-line option.</p>
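<p>As a minimal sketch of how these switches can be observed, the helper below starts a child interpreter with a given <code class="docutils literal notranslate"><span class="pre">-X</span></code> option and reports its <code class="docutils literal notranslate"><span class="pre">sys.flags.utf8_mode</span></code>. The helper name <code class="docutils literal notranslate"><span class="pre">utf8_mode_of_child</span></code> is hypothetical and is used only for illustration.</p>

```python
import subprocess
import sys


def utf8_mode_of_child(x_option):
    """Run a child interpreter with `-X x_option` and return its sys.flags.utf8_mode."""
    result = subprocess.run(
        [sys.executable, "-X", x_option, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The child prints a single integer: 1 when UTF-8 mode is on, 0 when it is off.
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("utf8=0:", utf8_mode_of_child("utf8=0"))
    print("utf8=1:", utf8_mode_of_child("utf8=1"))
```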
</section>
<section id="locale-getencoding">
<h3><a class="toc-backref" href="#locale-getencoding" role="doc-backlink"><code class="docutils literal notranslate"><span class="pre">locale.getencoding()</span></code></a></h3>
<p>Since UTF-8 mode affects <code class="docutils literal notranslate"><span class="pre">locale.getpreferredencoding(False)</span></code>,
we need an API to get the locale encoding regardless of UTF-8 mode.</p>
<p><code class="docutils literal notranslate"><span class="pre">locale.getencoding()</span></code> will be added for this purpose.
It also returns the locale encoding, but ignores UTF-8 mode.</p>
<p>When the <code class="docutils literal notranslate"><span class="pre">warn_default_encoding</span></code> option is specified,
<code class="docutils literal notranslate"><span class="pre">locale.getpreferredencoding()</span></code> will emit <code class="docutils literal notranslate"><span class="pre">EncodingWarning</span></code>, as
<code class="docutils literal notranslate"><span class="pre">open()</span></code> does (see also <a class="pep reference internal" href="../pep-0597/" title="PEP 597 Add optional EncodingWarning">PEP 597</a>).</p>
<p>This API was added in Python 3.11.</p>
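<p>The difference between the two functions can be sketched as follows. The helper name <code class="docutils literal notranslate"><span class="pre">locale_encoding</span></code> and its fallback for interpreters older than Python 3.11 are assumptions made for illustration; on those versions the fallback is still affected by UTF-8 mode.</p>

```python
import locale


def locale_encoding():
    """Return the locale encoding, ignoring UTF-8 mode where the API allows it."""
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        # Python 3.11+: not affected by UTF-8 mode.
        return getencoding()
    # Assumed fallback for older versions; this value IS affected by UTF-8 mode.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("locale encoding:", locale_encoding())
    print("preferred encoding:", locale.getpreferredencoding(False))
```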
</section>
<section id="fixing-encoding-locale-option">
<h3><a class="toc-backref" href="#fixing-encoding-locale-option" role="doc-backlink">Fixing <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code> option</a></h3>
<p><a class="pep reference internal" href="../pep-0597/" title="PEP 597 Add optional EncodingWarning">PEP 597</a> added the <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code> option to <code class="docutils literal notranslate"><span class="pre">TextIOWrapper</span></code>.
This option is used to specify the locale encoding explicitly.
<code class="docutils literal notranslate"><span class="pre">TextIOWrapper</span></code> should use locale encoding when the option is specified,
regardless of default text encoding.</p>
<p>However, <code class="docutils literal notranslate"><span class="pre">TextIOWrapper</span></code> currently uses <code class="docutils literal notranslate"><span class="pre">&quot;UTF-8&quot;</span></code> in UTF-8 mode even if
<code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code> is specified.
This behavior is inconsistent with the <a class="pep reference internal" href="../pep-0597/" title="PEP 597 Add optional EncodingWarning">PEP 597</a> motivation.
This is because we didn't expect that Python would change its default text
encoding by making UTF-8 mode the default.</p>
<p>This inconsistency should be fixed before making UTF-8 mode default.
<code class="docutils literal notranslate"><span class="pre">TextIOWrapper</span></code> should use locale encoding when <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code> is
passed even in UTF-8 mode.</p>
<p>This issue was fixed in Python 3.11.</p>
</section>
</section>
<section id="backward-compatibility">
<h2><a class="toc-backref" href="#backward-compatibility" role="doc-backlink">Backward Compatibility</a></h2>
<p>Most Unix systems use a UTF-8 locale, and Python already enables UTF-8 mode
when the locale is C or POSIX.
So this change mostly affects Windows users.</p>
<p>When a Python program depends on the default encoding, this change may cause
<code class="docutils literal notranslate"><span class="pre">UnicodeError</span></code>, mojibake, or even silent data corruption.
So this change should be announced loudly.</p>
<p>This is the guideline to fix this backward compatibility issue:</p>
<ol class="arabic simple">
<li>Disable UTF-8 mode.</li>
<li>Use <code class="docutils literal notranslate"><span class="pre">EncodingWarning</span></code> (<a class="pep reference internal" href="../pep-0597/" title="PEP 597 Add optional EncodingWarning">PEP 597</a>) to find every place that UTF-8 mode
affects.<ul class="simple">
<li>If the <code class="docutils literal notranslate"><span class="pre">encoding</span></code> option is omitted, consider using <code class="docutils literal notranslate"><span class="pre">encoding=&quot;utf-8&quot;</span></code>
or <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code>.</li>
<li>If <code class="docutils literal notranslate"><span class="pre">locale.getpreferredencoding()</span></code> is used, consider using
<code class="docutils literal notranslate"><span class="pre">&quot;utf-8&quot;</span></code> or <code class="docutils literal notranslate"><span class="pre">locale.getencoding()</span></code>.</li>
</ul>
</li>
<li>Test the application with UTF-8 mode.</li>
</ol>
</section>
<section id="preceding-examples">
<h2><a class="toc-backref" href="#preceding-examples" role="doc-backlink">Preceding examples</a></h2>
<ul class="simple">
<li>Ruby <a class="reference external" href="https://bugs.ruby-lang.org/issues/16604">changed</a> the default <code class="docutils literal notranslate"><span class="pre">external_encoding</span></code>
to UTF-8 on Windows in Ruby 3.0 (2020).</li>
<li>Java <a class="reference external" href="https://openjdk.java.net/jeps/400">changed</a> the default text encoding
to UTF-8 in JDK 18 (2022).</li>
</ul>
<p>Both Ruby and Java have an option for backward compatibility.
They don't provide any warning like <a class="pep reference internal" href="../pep-0597/" title="PEP 597 Add optional EncodingWarning">PEP 597</a>'s <code class="docutils literal notranslate"><span class="pre">EncodingWarning</span></code>
in Python for use of the default encoding.</p>
</section>
<section id="rejected-alternative">
<h2><a class="toc-backref" href="#rejected-alternative" role="doc-backlink">Rejected Alternative</a></h2>
<section id="deprecate-implicit-encoding">
<h3><a class="toc-backref" href="#deprecate-implicit-encoding" role="doc-backlink">Deprecate implicit encoding</a></h3>
<p>Deprecating the use of the default encoding was considered.</p>
<p>But there are many cases where the default encoding is used only to read or
write ASCII text.
Additionally, such warnings are not useful for applications that are not
cross-platform and run only on Unix.</p>
<p>So forcing users to specify the <code class="docutils literal notranslate"><span class="pre">encoding</span></code> everywhere would be too painful.
Emitting a lot of <code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> would lead users to ignore warnings.</p>
<p><a class="pep reference internal" href="../pep-0387/" title="PEP 387 Backwards Compatibility Policy">PEP 387</a> requires adding a warning for backward incompatible changes.
But it doesn't require using <code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code>.
So using the optional <code class="docutils literal notranslate"><span class="pre">EncodingWarning</span></code> doesn't violate <a class="pep reference internal" href="../pep-0387/" title="PEP 387 Backwards Compatibility Policy">PEP 387</a>.</p>
<p>Java also rejected this idea in <a class="reference external" href="https://openjdk.java.net/jeps/400">JEP 400</a>.</p>
</section>
<section id="use-pythonioencoding-for-pipes">
<h3><a class="toc-backref" href="#use-pythonioencoding-for-pipes" role="doc-backlink">Use <code class="docutils literal notranslate"><span class="pre">PYTHONIOENCODING</span></code> for PIPEs</a></h3>
<p>To ease the backward compatibility issue, using <code class="docutils literal notranslate"><span class="pre">PYTHONIOENCODING</span></code> as the
default encoding of PIPEs in the <code class="docutils literal notranslate"><span class="pre">subprocess</span></code> module was considered.</p>
<p>With this idea, users can use legacy encoding for
<code class="docutils literal notranslate"><span class="pre">subprocess.Popen(text=True)</span></code> even in UTF-8 mode.</p>
<p>But this idea makes the “default encoding” more complicated,
and it is also backward incompatible.</p>
<p>So this idea is rejected. Users can disable UTF-8 mode until they replace
<code class="docutils literal notranslate"><span class="pre">text=True</span></code> with <code class="docutils literal notranslate"><span class="pre">encoding=&quot;utf-8&quot;</span></code> or <code class="docutils literal notranslate"><span class="pre">encoding=&quot;locale&quot;</span></code>.</p>
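<p>A minimal sketch of the replacement described above: the parent passes <code class="docutils literal notranslate"><span class="pre">encoding=&quot;utf-8&quot;</span></code> instead of <code class="docutils literal notranslate"><span class="pre">text=True</span></code>, so the pipe is decoded as UTF-8 regardless of the parent's default. The helper name <code class="docutils literal notranslate"><span class="pre">run_utf8</span></code> is hypothetical, and the child is started with <code class="docutils literal notranslate"><span class="pre">-X</span> <span class="pre">utf8=1</span></code> here only so that it is known to write UTF-8.</p>

```python
import subprocess
import sys


def run_utf8(code):
    """Run Python code in a child interpreter and decode its stdout as UTF-8."""
    result = subprocess.run(
        # -X utf8=1 makes the child write UTF-8 to its stdout pipe.
        [sys.executable, "-X", "utf8=1", "-c", code],
        capture_output=True,
        encoding="utf-8",  # explicit, instead of text=True
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_utf8("print(chr(233))").strip())
```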
</section>
</section>
<section id="how-to-teach-this">
<h2><a class="toc-backref" href="#how-to-teach-this" role="doc-backlink">How to teach this</a></h2>
<p>For new users, this change reduces the number of things that need to be taught.
Users don't need to learn about text encoding in their first year.
They should learn it when they need to use non-UTF-8 text files.</p>
<p>For existing users, see the <a class="reference internal" href="#backward-compatibility">Backward compatibility</a> section.</p>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0686.rst">https://github.com/python/peps/blob/main/peps/pep-0686.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0686.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>