added PEP 331, Locale-Independent Float/String Conversions, by Christian R. Reis

David Goodger 2004-07-19 18:08:57 +00:00
parent 5591215fe0
commit a0686d62bb
2 changed files with 212 additions and 0 deletions

@@ -122,6 +122,7 @@ Index by Category
S 324 popen5 - New POSIX process module Astrand
S 325 Resource-Release Support for Generators Pedroni
S 330 Python Bytecode Verification Pelletier
S 331 Locale-Independent Float/String conversions Reis
S 754 IEEE 754 Floating Point Special Values Warnes
Finished PEPs (done, implemented in CVS)
@@ -353,6 +354,7 @@ Numerical Index
SA 328 Imports: Multi-Line and Absolute/Relative Aahz
SR 329 Treating Builtins as Constants in the Standard Library Hettinger
S 330 Python Bytecode Verification Pelletier
S 331 Locale-Independent Float/String conversions Reis
SR 666 Reject Foolish Indentation Creighton
S 754 IEEE 754 Floating Point Special Values Warnes
@@ -430,6 +432,7 @@ Owners
Prescod, Paul paul@prescod.net
Reedy, Terry tjreedy@udel.edu
Reifschneider, Sean jafo-pep@tummy.com
Reis, Christian R. kiko@async.com.br
Riehl, Jonathan jriehl@spaceship.com
van Rossum, Guido (GvR) guido@python.org
van Rossum, Just (JvR) just@letterror.com

pep-0331.txt Normal file

@@ -0,0 +1,209 @@
PEP: 331
Title: Locale-Independent Float/String Conversions
Version: $Revision$
Last-Modified: $Date$
Author: Christian R. Reis <kiko at async.com.br>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 19-Jul-2003
Python-Version: 2.4
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
Abstract
Support for the LC_NUMERIC locale category in Python 2.3 is
implemented only in Python-space. This causes inconsistent
behavior and thread-safety issues for applications that use
extension modules and libraries implemented in C that parse and
generate floats from strings. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.
Introduction
Python provides generic localization services through the locale
module, which among other things allows localizing the display and
conversion process of numeric types. Locale categories, such as
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
of the application are to be localized.
The LC_NUMERIC category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the LC_NUMERIC category
is currently implemented only in Python-space; C libraries invoked
from the Python runtime are unaware of Python's LC_NUMERIC
setting. This is done to avoid changing the behavior of certain
low-level functions that are used by the Python parser and related
code [2].
However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.
James Henstridge, the author of PyGTK [3], has also pointed out
that setlocale() presents thread-safety issues: a thread may call
the C library setlocale() outside of the GIL, causing Python to
parse and generate floats incorrectly.
Rationale
The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending
on the application, but it will most likely occur when parsing or
formatting a floating-point value.
Example Problem
The initial problem that motivated this PEP is related to the
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
PyGTK module. The widget can be set to numeric mode, and when
this occurs, characters typed into it are evaluated as a number.
Problems occur when LC_NUMERIC is set to a locale with a decimal
separator that differs from the C locale's standard (for instance,
`,' instead of `.' for the Brazilian locale pt_BR). Because
LC_NUMERIC is not set at the libc level, float values are
displayed incorrectly (using `.' as a separator) in the
spinbutton's text entry, and it is impossible to enter fractional
values using the `,' separator.
This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.
Proposal
On python-dev, Martin v. Löwis outlined the initial constraints for
an acceptable solution to the problem:
- LC_NUMERIC can be set at the C library level without
breaking the parser.
- float() and str() stay locale-unaware.
- locale-aware str() and atof() stay in the locale module.
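
The split these constraints describe can be observed from Python
code. The following is a minimal sketch, assuming a current Python 3
interpreter. The helper try_setlocale() and the candidate locale
names are illustrative assumptions, not part of this proposal; if no
Brazilian locale is installed, the example simply runs under the
existing LC_NUMERIC setting.

```python
import locale


def try_setlocale(names):
    """Set LC_NUMERIC to the first usable name; return it, or None.

    Hypothetical helper: locale names vary by platform and may not be
    installed at all.
    """
    for name in names:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue
    return None


saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
try:
    chosen = try_setlocale(
        ["pt_BR.UTF-8", "pt_BR", "Portuguese_Brazil.1252"])

    # Built-in conversions stay locale-unaware: they always use '.'.
    plain_text = str(1.5)
    plain_value = float("1.5")

    # The locale module provides the locale-aware counterparts.
    localized_text = locale.str(1.5)
    localized_value = locale.atof(localized_text)
finally:
    # Restore the original setting so later code is unaffected.
    locale.setlocale(locale.LC_NUMERIC, saved)

print(chosen, plain_text, plain_value, localized_text, localized_value)
```

Under a locale such as pt_BR, only the value produced by locale.str()
is expected to use `,'; str() and float() keep using `.'.
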
An analysis of the Python source suggests that the following
functions currently depend on LC_NUMERIC being set to the C
locale:
- Python/compile.c:parsenumber()
- Python/marshal.c:r_object()
- Objects/complexobject.c:complex_to_buf()
- Objects/complexobject.c:complex_subtype_from_string()
- Objects/floatobject.c:PyFloat_FromString()
- Objects/floatobject.c:format_float()
- Objects/stringobject.c:formatfloat()
- Modules/stropmodule.c:strop_atof()
- Modules/cPickle.c:load_float()
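
Several of these functions read or write floats as text in formats
that must not change with the locale. As an illustration only, the
sketch below uses the pickle module of a current Python 3 (assumed
here as the successor of cPickle). Protocol 0 stores a float as
decimal text, so that text has to use `.' whatever LC_NUMERIC is for
pickles to stay portable. The locale name is an assumption and may
not be installed.

```python
import locale
import pickle

saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
try:
    try:
        # Assumed locale name; it may not exist on this system.
        locale.setlocale(locale.LC_NUMERIC, "pt_BR.UTF-8")
    except locale.Error:
        pass  # keep whatever LC_NUMERIC already was

    # Protocol 0 is the text-based pickle format.
    payload = pickle.dumps(1.5, protocol=0)
    restored = pickle.loads(payload)
finally:
    # Restore the original setting so later code is unaffected.
    locale.setlocale(locale.LC_NUMERIC, saved)

print(payload, restored)
```
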
The proposed approach is to implement LC_NUMERIC-agnostic
functions for converting strings to floats (in place of
strtod()/atof()) and floats to strings (in place of snprintf()),
and to use these functions wherever the formatting should not vary
according to the user-specified locale. The locale module should
also be changed to remove the special-casing for LC_NUMERIC.
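
One way to picture such functions is as thin wrappers that undo the
locale's effect on the decimal point. The sketch below is not the
actual implementation. All four function names are hypothetical:
localized_format() and localized_strtod() are stand-ins for
locale-aware snprintf() and strtod(), and the decimal point is
passed explicitly (assumed non-empty) instead of being read from the
C library.

```python
def localized_format(value, decimal_point):
    """Stand-in for a locale-aware snprintf("%.12g", value)."""
    return ("%.12g" % value).replace(".", decimal_point)


def localized_strtod(text, decimal_point):
    """Stand-in for a locale-aware strtod().

    Simplification: a '.' that the locale does not use is rejected,
    whereas C strtod() would stop parsing at that character.
    """
    if decimal_point != "." and "." in text:
        raise ValueError("unexpected '.' for this locale: %r" % text)
    return float(text.replace(decimal_point, "."))


def ascii_formatd(value, decimal_point):
    """Format with the locale-aware routine, then restore '.'."""
    text = localized_format(value, decimal_point)
    return text.replace(decimal_point, ".")


def ascii_strtod(text, decimal_point):
    """Translate '.' to the locale's separator, then parse."""
    localized = text.replace(".", decimal_point)
    return localized_strtod(localized, decimal_point)


print(localized_format(1.5, ","))  # 1,5
print(ascii_formatd(1.5, ","))     # 1.5
print(ascii_strtod("1.5", ","))    # 1.5
```

Callers such as the parser would use the ascii_* variants, so their
results no longer depend on the LC_NUMERIC setting of the C library.
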
This change should also solve the aforementioned thread-safety
problems.
Potential Code Contributions
This problem was initially reported as a bug in the GTK+
libraries [5]; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not
to be confused with the GNU C library) implements a number of
LC_NUMERIC-agnostic functions (for an example, see [6]) for
reasons similar to those presented in this document.
In the same GTK+ problem report, Havoc Pennington suggested that
the glib authors would be willing to contribute this code to the
PSF, which would simplify implementation of this PEP considerably.
Alex Larsson, the original author of the glib code, submitted a
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
could be safely integrated.
[XXX: was the agreement actually received and accepted?]
Risks
There may be cross-platform issues with the provided
locale-agnostic functions, though this risk is low given that the
code supplied simply reverses any locale-dependent changes made to
floating-point numbers.
Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area
as members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been
mailed in to cover the contribution.
Tim Peters has pointed out [9] that there are situations involving
threading in which the proposed change is insufficient to solve
the problem completely. A complete solution, however, does not
currently exist.
Implementation
An implementation was developed by Gustavo Carneiro <gjc at
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10].
The final patch [11] was integrated into Python CVS by Martin v.
Löwis on 2004-06-08, as stated in the bug report.
References
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/peps/pep-0001.html
[2] Python locale documentation for embedding,
http://www.python.org/doc/current/lib/embedding-locale.html
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
[4] GtkSpinButton screenshot (demonstrating problem),
http://www.async.com.br/~kiko/spin.png
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
renamed g_ascii_formatd) by Alex Larsson,
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
[7] PSF Contributor Agreement,
http://www.python.org/psf/psf-contributor-agreement.html
[8] Alex Larsson's email confirming his agreement was mailed in,
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
[10] Python bug report, http://www.python.org/sf/774665
[11] Integrated LC_NUMERIC-agnostic patch,
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: