210 lines
7.9 KiB
Plaintext
210 lines
7.9 KiB
Plaintext
PEP: 331
|
||
Title: Locale-Independent Float/String Conversions
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Christian R. Reis <kiko at async.com.br>
|
||
Status: Final
|
||
Type: Standards Track
|
||
Content-Type: text/plain
|
||
Created: 19-Jul-2003
|
||
Python-Version: 2.4
|
||
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
|
||
|
||
|
||
Abstract
|
||
|
||
Support for the LC_NUMERIC locale category in Python 2.3 is
|
||
implemented only in Python-space. This causes inconsistent
|
||
behavior and thread-safety issues for applications that use
|
||
extension modules and libraries implemented in C that parse and
|
||
generate floats from strings. This document proposes a plan for
|
||
removing this inconsistency by providing and using substitute
|
||
locale-agnostic functions as necessary.
|
||
|
||
|
||
Introduction
|
||
|
||
Python provides generic localization services through the locale
|
||
module, which among other things allows localizing the display and
|
||
conversion process of numeric types. Locale categories, such as
|
||
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
|
||
of the application are to be localized.
|
||
|
||
The LC_NUMERIC category specifies formatting for non-monetary
|
||
numeric information, such as the decimal separator in float and
|
||
fixed-precision numbers. Localization of the LC_NUMERIC category
|
||
is currently implemented only in Python-space; C libraries invoked
|
||
from the Python runtime are unaware of Python's LC_NUMERIC
|
||
setting. This is done to avoid changing the behavior of certain
|
||
low-level functions that are used by the Python parser and related
|
||
code [2].
|
||
|
||
However, this presents a problem for extension modules that wrap C
|
||
libraries. Applications that use these extension modules will
|
||
inconsistently display and convert floating-point values.
|
||
|
||
James Henstridge, the author of PyGTK [3], has additionally
|
||
pointed out that the setlocale() function also presents
|
||
thread-safety issues, since a thread may call the C library
|
||
setlocale() outside of the GIL, and cause Python to parse and
|
||
generate floats incorrectly.
|
||
|
||
|
||
Rationale
|
||
|
||
The inconsistency between Python and C library localization for
|
||
LC_NUMERIC is a problem for any localized application using C
|
||
extensions. The exact nature of the problem will vary depending
|
||
on the application, but it will most likely occur when parsing or
|
||
formatting a floating-point value.
|
||
|
||
|
||
Example Problem
|
||
|
||
The initial problem that motivated this PEP is related to the
|
||
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
|
||
PyGTK module. The widget can be set to numeric mode, and when
|
||
this occurs, characters typed into it are evaluated as a number.
|
||
|
||
Problems occur when LC_NUMERIC is set to a locale with a float
|
||
separator that differs from the C locale's standard (for instance,
|
||
`,' instead of `.' for the Brazilian locale pt_BR). Because
|
||
LC_NUMERIC is not set at the libc level, float values are
|
||
displayed incorrectly (using `.' as a separator) in the
|
||
spinbutton's text entry, and it is impossible to enter fractional
|
||
values using the `,' separator.
|
||
|
||
This small example demonstrates reduced usability for localized
|
||
applications using this toolkit when coded in Python.
|
||
|
||
|
||
Proposal
|
||
|
||
Martin v. Löwis commented on the initial constraints for an
|
||
acceptable solution to the problem on python-dev:
|
||
|
||
- LC_NUMERIC can be set at the C library level without
|
||
breaking the parser.
|
||
- float() and str() stay locale-unaware.
|
||
- locale-aware str() and atof() stay in the locale module.
|
||
|
||
An analysis of the Python source suggests that the following
|
||
functions currently depend on LC_NUMERIC being set to the C
|
||
locale:
|
||
|
||
- Python/compile.c:parsenumber()
|
||
- Python/marshal.c:r_object()
|
||
- Objects/complexobject.c:complex_to_buf()
|
||
- Objects/complexobject.c:complex_subtype_from_string()
|
||
- Objects/floatobject.c:PyFloat_FromString()
|
||
- Objects/floatobject.c:format_float()
|
||
- Objects/stringobject.c:formatfloat()
|
||
- Modules/stropmodule.c:strop_atof()
|
||
- Modules/cPickle.c:load_float()
|
||
|
||
The proposed approach is to implement LC_NUMERIC-agnostic
|
||
functions for converting from (strtod()/atof()) and to
|
||
(snprintf()) float formats, using these functions where the
|
||
formatting should not vary according to the user-specified locale.
|
||
|
||
The locale module should also be changed to remove the
|
||
special-casing for LC_NUMERIC.
|
||
|
||
This change should also solve the aforementioned thread-safety
|
||
problems.
|
||
|
||
|
||
Potential Code Contributions
|
||
|
||
This problem was initially reported as a problem in the GTK+
|
||
libraries [5]; since then it has been correctly diagnosed as an
|
||
inconsistency in Python's implementation. However, in a fortunate
|
||
coincidence, the glib library (developed primarily for GTK+, not
|
||
to be confused with the GNU C library) implements a number of
|
||
LC_NUMERIC-agnostic functions (for an example, see [6]) for
|
||
reasons similar to those presented in this paper.
|
||
|
||
In the same GTK+ problem report, Havoc Pennington suggested that
|
||
the glib authors would be willing to contribute this code to the
|
||
PSF, which would simplify implementation of this PEP considerably.
|
||
Alex Larsson, the original author of the glib code, submitted a
|
||
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
|
||
could be safely integrated; this agreement has been received and
|
||
accepted.
|
||
|
||
|
||
Risks
|
||
|
||
There may be cross-platform issues with the provided
|
||
locale-agnostic functions, though this risk is low given that the
|
||
code supplied simply reverses any locale-dependent changes made to
|
||
floating-point numbers.
|
||
|
||
Martin and Guido pointed out potential copyright issues with the
|
||
contributed code. I believe we will have no problems in this area
|
||
as members of the GTK+ and glib teams have said they are fine with
|
||
relicensing the code, and a PSF contributor agreement has been
|
||
mailed in to ensure this safety.
|
||
|
||
Tim Peters has pointed out [9] that there are situations involving
|
||
threading in which the proposed change is insufficient to solve
|
||
the problem completely. A complete solution, however, does not
|
||
currently exist.
|
||
|
||
|
||
Implementation
|
||
|
||
An implementation was developed by Gustavo Carneiro <gjc at
|
||
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]
|
||
|
||
The final patch [11] was integrated into Python CVS by Martin v.
|
||
Löwis on 2004-06-08, as stated in the bug report.
|
||
|
||
|
||
References
|
||
|
||
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
|
||
http://www.python.org/peps/pep-0001.html
|
||
|
||
[2] Python locale documentation for embedding,
|
||
http://docs.python.org/library/locale.html
|
||
|
||
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
|
||
|
||
[4] GtkSpinButton screenshot (demonstrating problem),
|
||
http://www.async.com.br/~kiko/spin.png
|
||
|
||
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
|
||
|
||
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
|
||
renamed g_ascii_formatd) by Alex Larsson,
|
||
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
|
||
|
||
[7] PSF Contributor Agreement,
|
||
http://www.python.org/psf/psf-contributor-agreement.html
|
||
|
||
[8] Alex Larsson's email confirming his agreement was mailed in,
|
||
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
|
||
|
||
[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
|
||
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
|
||
|
||
[10] Python bug report, http://www.python.org/sf/774665
|
||
|
||
[11] Integrated LC_NUMERIC-agnostic patch,
|
||
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|