2004-07-19 14:08:57 -04:00
|
|
|
PEP: 331
|
|
|
|
Title: Locale-Independent Float/String Conversions
|
|
|
|
Version: $Revision$
|
|
|
|
Last-Modified: $Date$
|
|
|
|
Author: Christian R. Reis <kiko at async.com.br>
|
2007-05-18 13:41:31 -04:00
|
|
|
Status: Final
|
2004-07-19 14:08:57 -04:00
|
|
|
Type: Standards Track
|
2017-02-02 12:58:49 -05:00
|
|
|
Content-Type: text/x-rst
|
2004-07-19 14:08:57 -04:00
|
|
|
Created: 19-Jul-2003
|
|
|
|
Python-Version: 2.4
|
|
|
|
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
2017-02-02 12:58:49 -05:00
|
|
|
========
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
Support for the ``LC_NUMERIC`` locale category in Python 2.3 is
|
|
|
|
implemented only in Python-space. This causes inconsistent
|
|
|
|
behavior and thread-safety issues for applications that use
|
|
|
|
extension modules and libraries implemented in C that parse and
|
|
|
|
generate floats from strings. This document proposes a plan for
|
|
|
|
removing this inconsistency by providing and using substitute
|
|
|
|
locale-agnostic functions as necessary.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Introduction
|
2017-02-02 12:58:49 -05:00
|
|
|
============
|
|
|
|
|
|
|
|
Python provides generic localization services through the locale
|
|
|
|
module, which among other things allows localizing the display and
|
|
|
|
conversion process of numeric types. Locale categories, such as
|
|
|
|
``LC_TIME`` and ``LC_COLLATE``, allow configuring precisely what aspects
|
|
|
|
of the application are to be localized.
|
|
|
|
|
|
|
|
The ``LC_NUMERIC`` category specifies formatting for non-monetary
|
|
|
|
numeric information, such as the decimal separator in float and
|
|
|
|
fixed-precision numbers. Localization of the ``LC_NUMERIC`` category
|
|
|
|
is currently implemented only in Python-space; C libraries invoked
|
|
|
|
from the Python runtime are unaware of Python's ``LC_NUMERIC``
|
|
|
|
setting. This is done to avoid changing the behavior of certain
|
|
|
|
low-level functions that are used by the Python parser and related
|
|
|
|
code [2]_.
|
|
|
|
|
|
|
|
However, this presents a problem for extension modules that wrap C
|
|
|
|
libraries. Applications that use these extension modules will
|
|
|
|
inconsistently display and convert floating-point values.
|
|
|
|
|
|
|
|
James Henstridge, the author of PyGTK [3]_, has additionally
|
|
|
|
pointed out that the ``setlocale()`` function also presents
|
|
|
|
thread-safety issues, since a thread may call the C library
|
|
|
|
``setlocale()`` outside of the GIL, and cause Python to parse and
|
|
|
|
generate floats incorrectly.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
2017-02-02 12:58:49 -05:00
|
|
|
=========
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
The inconsistency between Python and C library localization for
|
|
|
|
``LC_NUMERIC`` is a problem for any localized application using C
|
|
|
|
extensions. The exact nature of the problem will vary depending
|
|
|
|
on the application, but it will most likely occur when parsing or
|
|
|
|
formatting a floating-point value.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Example Problem
|
2017-02-02 12:58:49 -05:00
|
|
|
===============
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
The initial problem that motivated this PEP is related to the
|
|
|
|
GtkSpinButton [4]_ widget in the GTK+ UI toolkit, wrapped by the
|
|
|
|
PyGTK module. The widget can be set to numeric mode, and when
|
|
|
|
this occurs, characters typed into it are evaluated as a number.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
Problems occur when ``LC_NUMERIC`` is set to a locale with a float
|
|
|
|
separator that differs from the C locale's standard (for instance,
|
|
|
|
',' instead of '.' for the Brazilian locale pt_BR). Because
|
|
|
|
``LC_NUMERIC`` is not set at the libc level, float values are
|
|
|
|
displayed incorrectly (using '.' as a separator) in the
|
|
|
|
spinbutton's text entry, and it is impossible to enter fractional
|
|
|
|
values using the ',' separator.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
This small example demonstrates reduced usability for localized
|
|
|
|
applications using this toolkit when coded in Python.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Proposal
|
2017-02-02 12:58:49 -05:00
|
|
|
========
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
Martin v. Löwis commented on the initial constraints for an
|
|
|
|
acceptable solution to the problem on python-dev:
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
- ``LC_NUMERIC`` can be set at the C library level without
|
|
|
|
breaking the parser.
|
|
|
|
- ``float()`` and ``str()`` stay locale-unaware.
|
|
|
|
- locale-aware ``str()`` and ``atof()`` stay in the locale module.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
An analysis of the Python source suggests that the following
|
|
|
|
functions currently depend on ``LC_NUMERIC`` being set to the C
|
|
|
|
locale:
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
- ``Python/compile.c:parsenumber()``
|
|
|
|
- ``Python/marshal.c:r_object()``
|
|
|
|
- ``Objects/complexobject.c:complex_to_buf()``
|
|
|
|
- ``Objects/complexobject.c:complex_subtype_from_string()``
|
|
|
|
- ``Objects/floatobject.c:PyFloat_FromString()``
|
|
|
|
- ``Objects/floatobject.c:format_float()``
|
|
|
|
- ``Objects/stringobject.c:formatfloat()``
|
|
|
|
- ``Modules/stropmodule.c:strop_atof()``
|
|
|
|
- ``Modules/cPickle.c:load_float()``
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
The proposed approach is to implement ``LC_NUMERIC``-agnostic
|
|
|
|
functions for converting from (``strtod()``/``atof()``) and to
|
|
|
|
(``snprintf()``) float formats, using these functions where the
|
|
|
|
formatting should not vary according to the user-specified locale.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
The locale module should also be changed to remove the
|
|
|
|
special-casing for ``LC_NUMERIC``.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
This change should also solve the aforementioned thread-safety
|
|
|
|
problems.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Potential Code Contributions
|
2017-02-02 12:58:49 -05:00
|
|
|
============================
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
This problem was initially reported as a problem in the GTK+
|
|
|
|
libraries [5]_; since then it has been correctly diagnosed as an
|
|
|
|
inconsistency in Python's implementation. However, in a fortunate
|
|
|
|
coincidence, the glib library (developed primarily for GTK+, not
|
|
|
|
to be confused with the GNU C library) implements a number of
|
|
|
|
``LC_NUMERIC``-agnostic functions (for an example, see [6]_) for
|
|
|
|
reasons similar to those presented in this paper.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
In the same GTK+ problem report, Havoc Pennington suggested that
|
|
|
|
the glib authors would be willing to contribute this code to the
|
|
|
|
PSF, which would simplify implementation of this PEP considerably.
|
|
|
|
Alex Larsson, the original author of the glib code, submitted a
|
|
|
|
PSF Contributor Agreement [7]_ on 2003-08-20 [8]_ to ensure the code
|
|
|
|
could be safely integrated; this agreement has been received and
|
|
|
|
accepted.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Risks
|
2017-02-02 12:58:49 -05:00
|
|
|
=====
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
There may be cross-platform issues with the provided
|
|
|
|
locale-agnostic functions, though this risk is low given that the
|
|
|
|
code supplied simply reverses any locale-dependent changes made to
|
|
|
|
floating-point numbers.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
Martin and Guido pointed out potential copyright issues with the
|
|
|
|
contributed code. I believe we will have no problems in this area
|
|
|
|
as members of the GTK+ and glib teams have said they are fine with
|
|
|
|
relicensing the code, and a PSF contributor agreement has been
|
|
|
|
mailed in to ensure this safety.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
Tim Peters has pointed out [9]_ that there are situations involving
|
|
|
|
threading in which the proposed change is insufficient to solve
|
|
|
|
the problem completely. A complete solution, however, does not
|
|
|
|
currently exist.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Implementation
|
2017-02-02 12:58:49 -05:00
|
|
|
==============
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
An implementation was developed by Gustavo Carneiro <gjc at
|
|
|
|
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]_
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
The final patch [11]_ was integrated into Python CVS by Martin v.
|
|
|
|
Löwis on 2004-06-08, as stated in the bug report.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
References
|
2017-02-02 12:58:49 -05:00
|
|
|
==========
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [2] Python locale documentation for embedding,
|
|
|
|
http://docs.python.org/library/locale.html
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [4] GtkSpinButton screenshot (demonstrating problem),
|
|
|
|
http://www.async.com.br/~kiko/spin.png
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
|
|
|
|
renamed g_ascii_formatd) by Alex Larsson,
|
|
|
|
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [7] PSF Contributor Agreement,
|
|
|
|
https://www.python.org/psf/contrib/contrib-form/
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [8] Alex Larsson's email confirming his agreement was mailed in,
|
2017-06-11 15:02:39 -04:00
|
|
|
https://mail.python.org/pipermail/python-dev/2003-August/037755.html
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
|
2017-06-11 15:02:39 -04:00
|
|
|
https://mail.python.org/pipermail/python-dev/2003-September/037898.html
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [10] Python bug report, https://bugs.python.org/issue774665
|
2004-07-19 14:08:57 -04:00
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
.. [11] Integrated LC_NUMERIC-agnostic patch,
|
|
|
|
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
2017-02-02 12:58:49 -05:00
|
|
|
=========
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
2017-02-02 12:58:49 -05:00
|
|
|
..
|
|
|
|
Local Variables:
|
|
|
|
mode: indented-text
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
sentence-end-double-space: t
|
|
|
|
fill-column: 70
|
|
|
|
coding: utf-8
|
|
|
|
End:
|