2004-07-19 14:08:57 -04:00
|
|
|
|
PEP: 331
|
|
|
|
|
Title: Locale-Independent Float/String Conversions
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
|
|
|
|
Author: Christian R. Reis <kiko at async.com.br>
|
2007-05-18 13:41:31 -04:00
|
|
|
|
Status: Final
|
2004-07-19 14:08:57 -04:00
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/plain
|
|
|
|
|
Created: 19-Jul-2003
|
|
|
|
|
Python-Version: 2.4
|
|
|
|
|
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
|
|
|
|
|
Support for the LC_NUMERIC locale category in Python 2.3 is
|
|
|
|
|
implemented only in Python-space. This causes inconsistent
|
|
|
|
|
behavior and thread-safety issues for applications that use
|
|
|
|
|
extension modules and libraries implemented in C that parse and
|
|
|
|
|
generate floats from strings. This document proposes a plan for
|
|
|
|
|
removing this inconsistency by providing and using substitute
|
|
|
|
|
locale-agnostic functions as necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Introduction
|
|
|
|
|
|
|
|
|
|
Python provides generic localization services through the locale
|
|
|
|
|
module, which among other things allows localizing the display and
|
|
|
|
|
conversion process of numeric types. Locale categories, such as
|
|
|
|
|
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
|
|
|
|
|
of the application are to be localized.
|
|
|
|
|
|
|
|
|
|
The LC_NUMERIC category specifies formatting for non-monetary
|
|
|
|
|
numeric information, such as the decimal separator in float and
|
|
|
|
|
fixed-precision numbers. Localization of the LC_NUMERIC category
|
|
|
|
|
is currently implemented only in Python-space; C libraries invoked
|
|
|
|
|
from the Python runtime are unaware of Python's LC_NUMERIC
|
|
|
|
|
setting. This is done to avoid changing the behavior of certain
|
|
|
|
|
low-level functions that are used by the Python parser and related
|
|
|
|
|
code [2].
|
|
|
|
|
|
|
|
|
|
However, this presents a problem for extension modules that wrap C
|
|
|
|
|
libraries. Applications that use these extension modules will
|
|
|
|
|
inconsistently display and convert floating-point values.
|
|
|
|
|
|
|
|
|
|
James Henstridge, the author of PyGTK [3], has additionally
|
|
|
|
|
pointed out that the setlocale() function also presents
|
|
|
|
|
thread-safety issues, since a thread may call the C library
|
|
|
|
|
setlocale() outside of the GIL, and cause Python to parse and
|
|
|
|
|
generate floats incorrectly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
|
|
|
|
|
|
The inconsistency between Python and C library localization for
|
|
|
|
|
LC_NUMERIC is a problem for any localized application using C
|
|
|
|
|
extensions. The exact nature of the problem will vary depending
|
|
|
|
|
on the application, but it will most likely occur when parsing or
|
|
|
|
|
formatting a floating-point value.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example Problem
|
|
|
|
|
|
|
|
|
|
The initial problem that motivated this PEP is related to the
|
|
|
|
|
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
|
|
|
|
|
PyGTK module. The widget can be set to numeric mode, and when
|
|
|
|
|
this occurs, characters typed into it are evaluated as a number.
|
|
|
|
|
|
|
|
|
|
Problems occur when LC_NUMERIC is set to a locale with a float
|
|
|
|
|
separator that differs from the C locale's standard (for instance,
|
|
|
|
|
`,' instead of `.' for the Brazilian locale pt_BR). Because
|
|
|
|
|
LC_NUMERIC is not set at the libc level, float values are
|
|
|
|
|
displayed incorrectly (using `.' as a separator) in the
|
|
|
|
|
spinbutton's text entry, and it is impossible to enter fractional
|
|
|
|
|
values using the `,' separator.
|
|
|
|
|
|
|
|
|
|
This small example demonstrates reduced usability for localized
|
|
|
|
|
applications using this toolkit when coded in Python.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Proposal
|
|
|
|
|
|
2006-03-02 14:54:50 -05:00
|
|
|
|
Martin v. Löwis commented on the initial constraints for an
|
2004-07-19 14:08:57 -04:00
|
|
|
|
acceptable solution to the problem on python-dev:
|
|
|
|
|
|
|
|
|
|
- LC_NUMERIC can be set at the C library level without
|
|
|
|
|
breaking the parser.
|
|
|
|
|
- float() and str() stay locale-unaware.
|
|
|
|
|
- locale-aware str() and atof() stay in the locale module.
|
|
|
|
|
|
|
|
|
|
An analysis of the Python source suggests that the following
|
|
|
|
|
functions currently depend on LC_NUMERIC being set to the C
|
|
|
|
|
locale:
|
|
|
|
|
|
|
|
|
|
- Python/compile.c:parsenumber()
|
|
|
|
|
- Python/marshal.c:r_object()
|
|
|
|
|
- Objects/complexobject.c:complex_to_buf()
|
|
|
|
|
- Objects/complexobject.c:complex_subtype_from_string()
|
|
|
|
|
- Objects/floatobject.c:PyFloat_FromString()
|
|
|
|
|
- Objects/floatobject.c:format_float()
|
|
|
|
|
- Objects/stringobject.c:formatfloat()
|
|
|
|
|
- Modules/stropmodule.c:strop_atof()
|
|
|
|
|
- Modules/cPickle.c:load_float()
|
|
|
|
|
|
|
|
|
|
The proposed approach is to implement LC_NUMERIC-agnostic
|
|
|
|
|
functions for converting from (strtod()/atof()) and to
|
|
|
|
|
(snprintf()) float formats, using these functions where the
|
|
|
|
|
formatting should not vary according to the user-specified locale.
|
|
|
|
|
|
|
|
|
|
The locale module should also be changed to remove the
|
|
|
|
|
special-casing for LC_NUMERIC.
|
|
|
|
|
|
|
|
|
|
This change should also solve the aforementioned thread-safety
|
|
|
|
|
problems.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Potential Code Contributions
|
|
|
|
|
|
|
|
|
|
This problem was initially reported as a problem in the GTK+
|
|
|
|
|
libraries [5]; since then it has been correctly diagnosed as an
|
|
|
|
|
inconsistency in Python's implementation. However, in a fortunate
|
|
|
|
|
coincidence, the glib library (developed primarily for GTK+, not
|
|
|
|
|
to be confused with the GNU C library) implements a number of
|
|
|
|
|
LC_NUMERIC-agnostic functions (for an example, see [6]) for
|
|
|
|
|
reasons similar to those presented in this paper.
|
|
|
|
|
|
|
|
|
|
In the same GTK+ problem report, Havoc Pennington suggested that
|
|
|
|
|
the glib authors would be willing to contribute this code to the
|
|
|
|
|
PSF, which would simplify implementation of this PEP considerably.
|
|
|
|
|
Alex Larsson, the original author of the glib code, submitted a
|
|
|
|
|
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
|
2004-07-19 19:09:04 -04:00
|
|
|
|
could be safely integrated; this agreement has been received and
|
|
|
|
|
accepted.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Risks
|
|
|
|
|
|
|
|
|
|
There may be cross-platform issues with the provided
|
|
|
|
|
locale-agnostic functions, though this risk is low given that the
|
|
|
|
|
code supplied simply reverses any locale-dependent changes made to
|
|
|
|
|
floating-point numbers.
|
|
|
|
|
|
|
|
|
|
Martin and Guido pointed out potential copyright issues with the
|
|
|
|
|
contributed code. I believe we will have no problems in this area
|
|
|
|
|
as members of the GTK+ and glib teams have said they are fine with
|
|
|
|
|
relicensing the code, and a PSF contributor agreement has been
|
|
|
|
|
mailed in to ensure this safety.
|
|
|
|
|
|
|
|
|
|
Tim Peters has pointed out [9] that there are situations involving
|
|
|
|
|
threading in which the proposed change is insufficient to solve
|
|
|
|
|
the problem completely. A complete solution, however, does not
|
|
|
|
|
currently exist.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Implementation
|
|
|
|
|
|
|
|
|
|
An implementation was developed by Gustavo Carneiro <gjc at
|
2004-07-21 08:30:07 -04:00
|
|
|
|
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
The final patch [11] was integrated into Python CVS by Martin v.
|
2006-03-02 14:54:50 -05:00
|
|
|
|
Löwis on 2004-06-08, as stated in the bug report.
|
2004-07-19 14:08:57 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
|
|
|
|
|
http://www.python.org/peps/pep-0001.html
|
|
|
|
|
|
|
|
|
|
[2] Python locale documentation for embedding,
|
|
|
|
|
http://www.python.org/doc/current/lib/embedding-locale.html
|
|
|
|
|
|
|
|
|
|
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
|
|
|
|
|
|
|
|
|
|
[4] GtkSpinButton screenshot (demonstrating problem),
|
|
|
|
|
http://www.async.com.br/~kiko/spin.png
|
|
|
|
|
|
|
|
|
|
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
|
|
|
|
|
|
|
|
|
|
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
|
|
|
|
|
renamed g_ascii_formatd) by Alex Larsson,
|
|
|
|
|
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
|
|
|
|
|
|
|
|
|
|
[7] PSF Contributor Agreement,
|
|
|
|
|
http://www.python.org/psf/psf-contributor-agreement.html
|
|
|
|
|
|
|
|
|
|
[8] Alex Larsson's email confirming his agreement was mailed in,
|
|
|
|
|
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
|
|
|
|
|
|
|
|
|
|
[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
|
|
|
|
|
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
|
|
|
|
|
|
|
|
|
|
[10] Python bug report, http://www.python.org/sf/774665
|
|
|
|
|
|
|
|
|
|
[11] Integrated LC_NUMERIC-agnostic patch,
|
|
|
|
|
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
2006-03-02 14:54:50 -05:00
|
|
|
|
coding: utf-8
|
2004-07-19 14:08:57 -04:00
|
|
|
|
End:
|