<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 331 Locale-Independent Float/String Conversions | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0331/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 331 Locale-Independent Float/String Conversions | peps.python.org'>
<meta property="og:description" content="Support for the LC_NUMERIC locale category in Python 2.3 is implemented only in Python-space. This causes inconsistent behavior and thread-safety issues for applications that use extension modules and libraries implemented in C that parse and generate ...">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0331/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="Support for the LC_NUMERIC locale category in Python 2.3 is implemented only in Python-space. This causes inconsistent behavior and thread-safety issues for applications that use extension modules and libraries implemented in C that parse and generate ...">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<ul class="breadcrumbs">
<li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li>
<li><a href="../pep-0000/">PEP Index</a> &raquo; </li>
<li>PEP 331</li>
</ul>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 331 Locale-Independent Float/String Conversions</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Christian R. Reis &lt;kiko at async.com.br&gt;</dd>
<dt class="field-even">Status<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-odd">Type<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-even">Created<span class="colon">:</span></dt>
<dd class="field-even">19-Jul-2003</dd>
<dt class="field-odd">Python-Version<span class="colon">:</span></dt>
<dd class="field-odd">2.4</dd>
<dt class="field-even">Post-History<span class="colon">:</span></dt>
<dd class="field-even">21-Jul-2003, 13-Aug-2003, 18-Jun-2004</dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#introduction">Introduction</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#example-problem">Example Problem</a></li>
<li><a class="reference internal" href="#proposal">Proposal</a></li>
<li><a class="reference internal" href="#potential-code-contributions">Potential Code Contributions</a></li>
<li><a class="reference internal" href="#risks">Risks</a></li>
<li><a class="reference internal" href="#implementation">Implementation</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>Support for the <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> locale category in Python 2.3 is
implemented only in Python-space. This causes inconsistent
behavior and thread-safety issues for applications that use
extension modules and libraries implemented in C that parse and
generate floats from strings. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.</p>
</section>
<section id="introduction">
<h2><a class="toc-backref" href="#introduction" role="doc-backlink">Introduction</a></h2>
<p>Python provides generic localization services through the locale
module, which among other things allows localizing the display and
conversion process of numeric types. Locale categories, such as
<code class="docutils literal notranslate"><span class="pre">LC_TIME</span></code> and <code class="docutils literal notranslate"><span class="pre">LC_COLLATE</span></code>, allow configuring precisely what aspects
of the application are to be localized.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> category
is currently implemented only in Python-space; C libraries invoked
from the Python runtime are unaware of Python's <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code>
setting. This is done to avoid changing the behavior of certain
low-level functions that are used by the Python parser and related
code <a class="footnote-reference brackets" href="#id11" id="id1">[2]</a>.</p>
<p>However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.</p>
<p>James Henstridge, the author of PyGTK <a class="footnote-reference brackets" href="#id12" id="id2">[3]</a>, has additionally
pointed out that the <code class="docutils literal notranslate"><span class="pre">setlocale()</span></code> function also presents
thread-safety issues, since a thread may call the C library
<code class="docutils literal notranslate"><span class="pre">setlocale()</span></code> outside of the GIL, and cause Python to parse and
generate floats incorrectly.</p>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<p>The inconsistency between Python and C library localization for
<code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending
on the application, but it will most likely occur when parsing or
formatting a floating-point value.</p>
</section>
<section id="example-problem">
<h2><a class="toc-backref" href="#example-problem" role="doc-backlink">Example Problem</a></h2>
<p>The initial problem that motivated this PEP is related to the
GtkSpinButton <a class="footnote-reference brackets" href="#id13" id="id3">[4]</a> widget in the GTK+ UI toolkit, wrapped by the
PyGTK module. The widget can be set to numeric mode, and when
this occurs, characters typed into it are evaluated as a number.</p>
<p>Problems occur when <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> is set to a locale with a float
separator that differs from the C locale's standard (for instance,
"," instead of "." for the Brazilian locale "pt_BR"). Because
<code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> is not set at the libc level, float values are
displayed incorrectly (using "." as a separator) in the
spinbutton's text entry, and it is impossible to enter fractional
values using the "," separator.</p>
<p>This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.</p>
</section>
<section id="proposal">
<h2><a class="toc-backref" href="#proposal" role="doc-backlink">Proposal</a></h2>
<p>On python-dev, Martin v. Löwis outlined the initial constraints for an
acceptable solution to the problem:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> can be set at the C library level without
breaking the parser.</li>
<li><code class="docutils literal notranslate"><span class="pre">float()</span></code> and <code class="docutils literal notranslate"><span class="pre">str()</span></code> stay locale-unaware.</li>
<li>locale-aware <code class="docutils literal notranslate"><span class="pre">str()</span></code> and <code class="docutils literal notranslate"><span class="pre">atof()</span></code> stay in the locale module.</li>
</ul>
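<p>The second and third constraints can be illustrated with a short
sketch. It assumes a current Python 3 interpreter rather than the
Python 2.3 discussed here, and the helper
<code class="docutils literal notranslate"><span class="pre">try_set_numeric_locale()</span></code> and its candidate locale names are
hypothetical; whether any of those locales is installed depends on
the system.</p>

```python
import locale


def try_set_numeric_locale(candidates=("pt_BR.UTF-8", "pt_BR", "de_DE.UTF-8")):
    """Hypothetical helper: switch LC_NUMERIC to a comma-decimal locale.

    Returns the name of the locale that was set, or None if none of the
    candidates is installed on this system.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue
    return None


# Remember the current setting so it can be restored afterwards.
saved = locale.setlocale(locale.LC_NUMERIC)
active = try_set_numeric_locale()

# The builtins stay locale-unaware: "." is always the decimal separator.
builtin_text = str(1.5)
builtin_value = float("1.5")

# The locale-aware variants live in the locale module.
localized_text = locale.str(1.5)
localized_value = locale.atof(localized_text)

print(active, builtin_text, localized_text)

# Restore the original LC_NUMERIC setting.
locale.setlocale(locale.LC_NUMERIC, saved)
```

<p>Where one of the candidate locales is available, <code class="docutils literal notranslate"><span class="pre">localized_text</span></code>
should use "," as its separator while <code class="docutils literal notranslate"><span class="pre">builtin_text</span></code> keeps ".";
where none is available, both simply use ".".</p>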
<p>An analysis of the Python source suggests that the following
functions currently depend on <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> being set to the C
locale:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">Python/compile.c:parsenumber()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Python/marshal.c:r_object()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Objects/complexobject.c:complex_to_buf()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Objects/complexobject.c:complex_subtype_from_string()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Objects/floatobject.c:PyFloat_FromString()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Objects/floatobject.c:format_float()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Objects/stringobject.c:formatfloat()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Modules/stropmodule.c:strop_atof()</span></code></li>
<li><code class="docutils literal notranslate"><span class="pre">Modules/cPickle.c:load_float()</span></code></li>
</ul>
<p>The proposed approach is to implement <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code>-agnostic
counterparts of the functions that convert strings to floats
(<code class="docutils literal notranslate"><span class="pre">strtod()</span></code>/<code class="docutils literal notranslate"><span class="pre">atof()</span></code>) and floats to strings
(<code class="docutils literal notranslate"><span class="pre">snprintf()</span></code>), and to use these functions wherever the
formatting should not vary according to the user-specified locale.</p>
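<p>One way such functions could work is to call the locale-aware
routine and then undo the locale-specific decimal point. The sketch
below models that idea in Python 3, not the C code this PEP is
concerned with: <code class="docutils literal notranslate"><span class="pre">locale.format_string()</span></code> and
<code class="docutils literal notranslate"><span class="pre">locale.atof()</span></code> stand in for the locale-aware C functions, and the
names <code class="docutils literal notranslate"><span class="pre">ascii_format()</span></code> and <code class="docutils literal notranslate"><span class="pre">ascii_atof()</span></code> are hypothetical.</p>

```python
import locale


def ascii_format(fmt, value):
    """Hypothetical helper: format one number, always using "." as separator.

    locale.format_string() stands in for a locale-aware snprintf(); any
    locale-specific decimal point it produces is swapped back to ".".
    Only intended for simple single-number formats such as "%.2f".
    """
    text = locale.format_string(fmt, value)
    decimal_point = locale.localeconv()["decimal_point"]
    return text.replace(decimal_point, ".")


def ascii_atof(text):
    """Hypothetical helper: parse a number written with "." as separator.

    The "." is rewritten to the current locale's decimal point before the
    locale-aware parser (locale.atof(), standing in for strtod()) sees it.
    """
    decimal_point = locale.localeconv()["decimal_point"]
    localized = text.replace(".", decimal_point)
    return locale.atof(localized)


print(ascii_format("%.2f", 1.5))
print(ascii_atof("2.25"))
```

<p>Both helpers are meant to give the same result whatever
<code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code> is set to, which is the property the proposal
requires of the parser and of <code class="docutils literal notranslate"><span class="pre">float()</span></code> and <code class="docutils literal notranslate"><span class="pre">str()</span></code>.</p>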
<p>The locale module should also be changed to remove the
special-casing for <code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code>.</p>
<p>This change should also solve the aforementioned thread-safety
problems.</p>
</section>
<section id="potential-code-contributions">
<h2><a class="toc-backref" href="#potential-code-contributions" role="doc-backlink">Potential Code Contributions</a></h2>
<p>This problem was initially reported as a problem in the GTK+
libraries <a class="footnote-reference brackets" href="#id14" id="id4">[5]</a>; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not
to be confused with the GNU C library) implements a number of
<code class="docutils literal notranslate"><span class="pre">LC_NUMERIC</span></code>-agnostic functions (for an example, see <a class="footnote-reference brackets" href="#id15" id="id5">[6]</a>) for
reasons similar to those presented in this paper.</p>
<p>In the same GTK+ problem report, Havoc Pennington suggested that
the glib authors would be willing to contribute this code to the
PSF, which would simplify implementation of this PEP considerably.
Alex Larsson, the original author of the glib code, submitted a
PSF Contributor Agreement <a class="footnote-reference brackets" href="#id16" id="id6">[7]</a> on 2003-08-20 <a class="footnote-reference brackets" href="#id17" id="id7">[8]</a> to ensure the code
could be safely integrated; this agreement has been received and
accepted.</p>
</section>
<section id="risks">
<h2><a class="toc-backref" href="#risks" role="doc-backlink">Risks</a></h2>
<p>There may be cross-platform issues with the provided
locale-agnostic functions, though this risk is low given that the
code supplied simply reverses any locale-dependent changes made to
floating-point numbers.</p>
<p>Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area
as members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been
mailed in to ensure this safety.</p>
<p>Tim Peters has pointed out <a class="footnote-reference brackets" href="#id18" id="id8">[9]</a> that there are situations involving
threading in which the proposed change is insufficient to solve
the problem completely. A complete solution, however, does not
currently exist.</p>
</section>
<section id="implementation">
<h2><a class="toc-backref" href="#implementation" role="doc-backlink">Implementation</a></h2>
<p>An implementation was developed by Gustavo Carneiro &lt;gjc at
inescporto.pt&gt;, and attached to Sourceforge.net bug 774665 <a class="footnote-reference brackets" href="#id19" id="id9">[10]</a>.</p>
<p>The final patch <a class="footnote-reference brackets" href="#id20" id="id10">[11]</a> was integrated into Python CVS by Martin v.
Löwis on 2004-06-08, as stated in the bug report.</p>
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="id11" role="doc-footnote">
<dt class="label" id="id11">[<a href="#id1">2</a>]</dt>
<dd>Python locale documentation for embedding,
<a class="reference external" href="http://docs.python.org/library/locale.html">http://docs.python.org/library/locale.html</a></aside>
<aside class="footnote brackets" id="id12" role="doc-footnote">
<dt class="label" id="id12">[<a href="#id2">3</a>]</dt>
<dd>PyGTK homepage, <a class="reference external" href="http://www.daa.com.au/~james/pygtk/">http://www.daa.com.au/~james/pygtk/</a></aside>
<aside class="footnote brackets" id="id13" role="doc-footnote">
<dt class="label" id="id13">[<a href="#id3">4</a>]</dt>
<dd>GtkSpinButton screenshot (demonstrating problem),
<a class="reference external" href="http://www.async.com.br/~kiko/spin.png">http://www.async.com.br/~kiko/spin.png</a></aside>
<aside class="footnote brackets" id="id14" role="doc-footnote">
<dt class="label" id="id14">[<a href="#id4">5</a>]</dt>
<dd>GNOME bug report, <a class="reference external" href="http://bugzilla.gnome.org/show_bug.cgi?id=114132">http://bugzilla.gnome.org/show_bug.cgi?id=114132</a></aside>
<aside class="footnote brackets" id="id15" role="doc-footnote">
<dt class="label" id="id15">[<a href="#id5">6</a>]</dt>
<dd>Code submission of g_ascii_strtod and g_ascii_dtostr (later
renamed g_ascii_formatd) by Alex Larsson,
<a class="reference external" href="http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html">http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html</a></aside>
<aside class="footnote brackets" id="id16" role="doc-footnote">
<dt class="label" id="id16">[<a href="#id6">7</a>]</dt>
<dd>PSF Contributor Agreement,
<a class="reference external" href="https://www.python.org/psf/contrib/contrib-form/">https://www.python.org/psf/contrib/contrib-form/</a></aside>
<aside class="footnote brackets" id="id17" role="doc-footnote">
<dt class="label" id="id17">[<a href="#id7">8</a>]</dt>
<dd>Alex Larsson's email confirming his agreement was mailed in,
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2003-August/037755.html">https://mail.python.org/pipermail/python-dev/2003-August/037755.html</a></aside>
<aside class="footnote brackets" id="id18" role="doc-footnote">
<dt class="label" id="id18">[<a href="#id8">9</a>]</dt>
<dd>Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2003-September/037898.html">https://mail.python.org/pipermail/python-dev/2003-September/037898.html</a></aside>
<aside class="footnote brackets" id="id19" role="doc-footnote">
<dt class="label" id="id19">[<a href="#id9">10</a>]</dt>
<dd>Python bug report, <a class="reference external" href="https://bugs.python.org/issue774665">https://bugs.python.org/issue774665</a></aside>
<aside class="footnote brackets" id="id20" role="doc-footnote">
<dt class="label" id="id20">[<a href="#id10">11</a>]</dt>
<dd>Integrated LC_NUMERIC-agnostic patch,
<a class="reference external" href="https://sourceforge.net/tracker/download.php?group_id=5470&amp;atid=305470&amp;file_id=89685&amp;aid=774665">https://sourceforge.net/tracker/download.php?group_id=5470&amp;atid=305470&amp;file_id=89685&amp;aid=774665</a></aside>
</aside>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0331.rst">https://github.com/python/peps/blob/main/peps/pep-0331.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0331.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
<nav id="pep-sidebar">
<h2>Contents</h2>
<ul>
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#introduction">Introduction</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#example-problem">Example Problem</a></li>
<li><a class="reference internal" href="#proposal">Proposal</a></li>
<li><a class="reference internal" href="#potential-code-contributions">Potential Code Contributions</a></li>
<li><a class="reference internal" href="#risks">Risks</a></li>
<li><a class="reference internal" href="#implementation">Implementation</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
<br>
<a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0331.rst">Page Source (GitHub)</a>
</nav>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>