added PEP 331, Locale-Independent Float/String Conversions, by Christian R. Reis
This commit is contained in:
parent
5591215fe0
commit
a0686d62bb
|
@ -122,6 +122,7 @@ Index by Category
|
|||
S 324 popen5 - New POSIX process module Astrand
|
||||
S 325 Resource-Release Support for Generators Pedroni
|
||||
S 330 Python Bytecode Verification Pelletier
|
||||
S 331 Locale-Independent Float/String conversions Reis
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
|
||||
Finished PEPs (done, implemented in CVS)
|
||||
|
@ -353,6 +354,7 @@ Numerical Index
|
|||
SA 328 Imports: Multi-Line and Absolute/Relative Aahz
|
||||
SR 329 Treating Builtins as Constants in the Standard Library Hettinger
|
||||
S 330 Python Bytecode Verification Pelletier
|
||||
S 331 Locale-Independent Float/String conversions Reis
|
||||
SR 666 Reject Foolish Indentation Creighton
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
|
||||
|
@ -430,6 +432,7 @@ Owners
|
|||
Prescod, Paul paul@prescod.net
|
||||
Reedy, Terry tjreedy@udel.edu
|
||||
Reifschneider, Sean jafo-pep@tummy.com
|
||||
Reis, Christian R. kiko@async.com.br
|
||||
Riehl, Jonathan jriehl@spaceship.com
|
||||
van Rossum, Guido (GvR) guido@python.org
|
||||
van Rossum, Just (JvR) just@letterror.com
|
||||
|
|
|
@ -0,0 +1,209 @@
|
|||
PEP: 331
|
||||
Title: Locale-Independent Float/String Conversions
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Christian R. Reis <kiko at async.com.br>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/plain
|
||||
Created: 19-Jul-2003
|
||||
Python-Version: 2.4
|
||||
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
Support for the LC_NUMERIC locale category in Python 2.3 is
|
||||
implemented only in Python-space. This causes inconsistent
|
||||
behavior and thread-safety issues for applications that use
|
||||
extension modules and libraries implemented in C that parse and
|
||||
generate floats from strings. This document proposes a plan for
|
||||
removing this inconsistency by providing and using substitute
|
||||
locale-agnostic functions as necessary.
|
||||
|
||||
|
||||
Introduction
|
||||
|
||||
Python provides generic localization services through the locale
|
||||
module, which among other things allows localizing the display and
|
||||
conversion process of numeric types. Locale categories, such as
|
||||
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
|
||||
of the application are to be localized.
|
||||
|
||||
The LC_NUMERIC category specifies formatting for non-monetary
|
||||
numeric information, such as the decimal separator in float and
|
||||
fixed-precision numbers. Localization of the LC_NUMERIC category
|
||||
is currently implemented only in Python-space; C libraries invoked
|
||||
from the Python runtime are unaware of Python's LC_NUMERIC
|
||||
setting. This is done to avoid changing the behavior of certain
|
||||
low-level functions that are used by the Python parser and related
|
||||
code [2].
|
||||
|
||||
However, this presents a problem for extension modules that wrap C
|
||||
libraries. Applications that use these extension modules will
|
||||
inconsistently display and convert floating-point values.
|
||||
|
||||
James Henstridge, the author of PyGTK [3], has additionally
|
||||
pointed out that the setlocale() function also presents
|
||||
thread-safety issues, since a thread may call the C library
|
||||
setlocale() outside of the GIL, and cause Python to parse and
|
||||
generate floats incorrectly.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
The inconsistency between Python and C library localization for
|
||||
LC_NUMERIC is a problem for any localized application using C
|
||||
extensions. The exact nature of the problem will vary depending
|
||||
on the application, but it will most likely occur when parsing or
|
||||
formatting a floating-point value.
|
||||
|
||||
|
||||
Example Problem
|
||||
|
||||
The initial problem that motivated this PEP is related to the
|
||||
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
|
||||
PyGTK module. The widget can be set to numeric mode, and when
|
||||
this occurs, characters typed into it are evaluated as a number.
|
||||
|
||||
Problems occur when LC_NUMERIC is set to a locale with a float
|
||||
separator that differs from the C locale's standard (for instance,
|
||||
`,' instead of `.' for the Brazilian locale pt_BR). Because
|
||||
LC_NUMERIC is not set at the libc level, float values are
|
||||
displayed incorrectly (using `.' as a separator) in the
|
||||
spinbutton's text entry, and it is impossible to enter fractional
|
||||
values using the `,' separator.
|
||||
|
||||
This small example demonstrates reduced usability for localized
|
||||
applications using this toolkit when coded in Python.
|
||||
|
||||
|
||||
Proposal
|
||||
|
||||
Martin v. Löwis commented on the initial constraints for an
|
||||
acceptable solution to the problem on python-dev:
|
||||
|
||||
- LC_NUMERIC can be set at the C library level without
|
||||
breaking the parser.
|
||||
- float() and str() stay locale-unaware.
|
||||
- locale-aware str() and atof() stay in the locale module.
|
||||
|
||||
An analysis of the Python source suggests that the following
|
||||
functions currently depend on LC_NUMERIC being set to the C
|
||||
locale:
|
||||
|
||||
- Python/compile.c:parsenumber()
|
||||
- Python/marshal.c:r_object()
|
||||
- Objects/complexobject.c:complex_to_buf()
|
||||
- Objects/complexobject.c:complex_subtype_from_string()
|
||||
- Objects/floatobject.c:PyFloat_FromString()
|
||||
- Objects/floatobject.c:format_float()
|
||||
- Objects/stringobject.c:formatfloat()
|
||||
- Modules/stropmodule.c:strop_atof()
|
||||
- Modules/cPickle.c:load_float()
|
||||
|
||||
The proposed approach is to implement LC_NUMERIC-agnostic
|
||||
functions for converting from (strtod()/atof()) and to
|
||||
(snprintf()) float formats, using these functions where the
|
||||
formatting should not vary according to the user-specified locale.
|
||||
|
||||
The locale module should also be changed to remove the
|
||||
special-casing for LC_NUMERIC.
|
||||
|
||||
This change should also solve the aforementioned thread-safety
|
||||
problems.
|
||||
|
||||
|
||||
Potential Code Contributions
|
||||
|
||||
This problem was initially reported as a problem in the GTK+
|
||||
libraries [5]; since then it has been correctly diagnosed as an
|
||||
inconsistency in Python's implementation. However, in a fortunate
|
||||
coincidence, the glib library (developed primarily for GTK+, not
|
||||
to be confused with the GNU C library) implements a number of
|
||||
LC_NUMERIC-agnostic functions (for an example, see [6]) for
|
||||
reasons similar to those presented in this paper.
|
||||
|
||||
In the same GTK+ problem report, Havoc Pennington suggested that
|
||||
the glib authors would be willing to contribute this code to the
|
||||
PSF, which would simplify implementation of this PEP considerably.
|
||||
Alex Larsson, the original author of the glib code, submitted a
|
||||
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
|
||||
could be safely integrated.
|
||||
|
||||
[XXX: was the agreement actually received and accepted?]
|
||||
|
||||
|
||||
Risks
|
||||
|
||||
There may be cross-platform issues with the provided
|
||||
locale-agnostic functions, though this risk is low given that the
|
||||
code supplied simply reverses any locale-dependent changes made to
|
||||
floating-point numbers.
|
||||
|
||||
Martin and Guido pointed out potential copyright issues with the
|
||||
contributed code. I believe we will have no problems in this area
|
||||
as members of the GTK+ and glib teams have said they are fine with
|
||||
relicensing the code, and a PSF contributor agreement has been
|
||||
mailed in to ensure this safety.
|
||||
|
||||
Tim Peters has pointed out [9] that there are situations involving
|
||||
threading in which the proposed change is insufficient to solve
|
||||
the problem completely. A complete solution, however, does not
|
||||
currently exist.
|
||||
|
||||
|
||||
Implementation
|
||||
|
||||
An implementation was developed by Gustavo Carneiro <gjc at
|
||||
inescporto.pt>, and attached to Sourceforge.net bug 744665 [10]
|
||||
|
||||
The final patch [11] was integrated into Python CVS by Martin v.
|
||||
Löwis on 2004-06-08, as stated in the bug report.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
|
||||
http://www.python.org/peps/pep-0001.html
|
||||
|
||||
[2] Python locale documentation for embedding,
|
||||
http://www.python.org/doc/current/lib/embedding-locale.html
|
||||
|
||||
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
|
||||
|
||||
[4] GtkSpinButton screenshot (demonstrating problem),
|
||||
http://www.async.com.br/~kiko/spin.png
|
||||
|
||||
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
|
||||
|
||||
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
|
||||
renamed g_ascii_formatd) by Alex Larsson,
|
||||
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
|
||||
|
||||
[7] PSF Contributor Agreement,
|
||||
http://www.python.org/psf/psf-contributor-agreement.html
|
||||
|
||||
[8] Alex Larsson's email confirming his agreement was mailed in,
|
||||
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
|
||||
|
||||
[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
|
||||
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
|
||||
|
||||
[10] Python bug report, http://www.python.org/sf/774665
|
||||
|
||||
[11] Integrated LC_NUMERIC-agnostic patch,
|
||||
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
Loading…
Reference in New Issue