added PEP 331, Locale-Independent Float/String Conversions, by Christian R. Reis
parent 5591215fe0
commit a0686d62bb
@@ -122,6 +122,7 @@ Index by Category
  S   324  popen5 - New POSIX process module            Astrand
  S   325  Resource-Release Support for Generators      Pedroni
  S   330  Python Bytecode Verification                 Pelletier
+ S   331  Locale-Independent Float/String conversions  Reis
  S   754  IEEE 754 Floating Point Special Values       Warnes

  Finished PEPs (done, implemented in CVS)
@@ -353,6 +354,7 @@ Numerical Index
  SA  328  Imports: Multi-Line and Absolute/Relative    Aahz
  SR  329  Treating Builtins as Constants in the Standard Library  Hettinger
  S   330  Python Bytecode Verification                 Pelletier
+ S   331  Locale-Independent Float/String conversions  Reis
  SR  666  Reject Foolish Indentation                   Creighton
  S   754  IEEE 754 Floating Point Special Values       Warnes

@@ -430,6 +432,7 @@ Owners
  Prescod, Paul            paul@prescod.net
  Reedy, Terry             tjreedy@udel.edu
  Reifschneider, Sean      jafo-pep@tummy.com
+ Reis, Christian R.       kiko@async.com.br
  Riehl, Jonathan          jriehl@spaceship.com
  van Rossum, Guido (GvR)  guido@python.org
  van Rossum, Just (JvR)   just@letterror.com
@@ -0,0 +1,209 @@
PEP: 331
Title: Locale-Independent Float/String Conversions
Version: $Revision$
Last-Modified: $Date$
Author: Christian R. Reis <kiko at async.com.br>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 19-Jul-2003
Python-Version: 2.4
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004


Abstract

Support for the LC_NUMERIC locale category in Python 2.3 is
implemented only in Python-space. This causes inconsistent
behavior and thread-safety issues for applications that use
extension modules and libraries implemented in C that parse and
generate floats from strings. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.


Introduction

Python provides generic localization services through the locale
module, which among other things allows localizing the display and
conversion process of numeric types. Locale categories, such as
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
of the application are to be localized.

The LC_NUMERIC category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the LC_NUMERIC category
is currently implemented only in Python-space; C libraries invoked
from the Python runtime are unaware of Python's LC_NUMERIC
setting. This is done to avoid changing the behavior of certain
low-level functions that are used by the Python parser and related
code [2].

However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.

James Henstridge, the author of PyGTK [3], has additionally
pointed out that the setlocale() function also presents
thread-safety issues, since a thread may call the C library
setlocale() outside of the GIL, and cause Python to parse and
generate floats incorrectly.


Rationale

The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending
on the application, but it will most likely occur when parsing or
formatting a floating-point value.


Example Problem

The initial problem that motivated this PEP is related to the
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
PyGTK module. The widget can be set to numeric mode, and when
this occurs, characters typed into it are evaluated as a number.

Problems occur when LC_NUMERIC is set to a locale with a float
separator that differs from the C locale's standard (for instance,
`,' instead of `.' for the Brazilian locale pt_BR). Because
LC_NUMERIC is not set at the libc level, float values are
displayed incorrectly (using `.' as a separator) in the
spinbutton's text entry, and it is impossible to enter fractional
values using the `,' separator.

This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.


Proposal

Martin v. Löwis commented on the initial constraints for an
acceptable solution to the problem on python-dev:

- LC_NUMERIC can be set at the C library level without
  breaking the parser.
- float() and str() stay locale-unaware.
- locale-aware str() and atof() stay in the locale module.
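
The constraints above can be illustrated with a minimal Python
sketch. It assumes a modern Python in which this PEP is already
implemented. The helper name show_conversions is hypothetical and
exists only for this illustration. Only the "C" locale is used,
since it is the one locale that is available everywhere.

```python
import locale


def show_conversions(value, locale_name="C"):
    """Return (str(value), locale.str(value)) under the given LC_NUMERIC.

    show_conversions is a hypothetical helper for illustration only.
    """
    # Calling setlocale() without a locale argument returns the current one.
    previous = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        # str() stays locale-unaware; locale.str() honours LC_NUMERIC.
        return str(value), locale.str(value)
    finally:
        # Always restore the previous setting; it is process-global.
        locale.setlocale(locale.LC_NUMERIC, previous)


plain, localized = show_conversions(1.5)
print(plain, localized)
```

Under a locale such as pt_BR (if it is installed), only the second
element would be expected to use `,' as the separator; float()
accepts only `.' in either case.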

An analysis of the Python source suggests that the following
functions currently depend on LC_NUMERIC being set to the C
locale:

- Python/compile.c:parsenumber()
- Python/marshal.c:r_object()
- Objects/complexobject.c:complex_to_buf()
- Objects/complexobject.c:complex_subtype_from_string()
- Objects/floatobject.c:PyFloat_FromString()
- Objects/floatobject.c:format_float()
- Objects/stringobject.c:formatfloat()
- Modules/stropmodule.c:strop_atof()
- Modules/cPickle.c:load_float()
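
Several of the functions listed above read or write floats as text
in persistent data, which is one reason their behavior should not
follow the user's locale. A minimal sketch, assuming the modern
pickle module as a stand-in for the 2.3-era cPickle named in the
list, shows the textual form used by protocol 0:

```python
import pickle

# Protocol 0 stores a float as its textual repr, so the byte stream
# contains the decimal separator literally.
data = pickle.dumps(1.5, protocol=0)
print(data)

# Loading parses that text back into a float. Data written on one
# machine is only readable on another if both sides agree on "."
# regardless of LC_NUMERIC.
restored = pickle.loads(data)
print(restored)
```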

The proposed approach is to implement LC_NUMERIC-agnostic
functions for converting from (strtod()/atof()) and to
(snprintf()) float formats, using these functions where the
formatting should not vary according to the user-specified locale.
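
The approach can be sketched in Python. This is not the glib or
CPython code: the names ascii_strtod and ascii_formatd are borrowed
from the glib functions mentioned in [6] purely for illustration,
locale_format and locale_parse are hypothetical stand-ins for the
locale-dependent C routines, and the locale's decimal point is
passed in explicitly so that the sketch runs without any particular
locale installed. The idea shown is the separator swap: call the
locale-dependent routine, then translate between the locale's
decimal point and `.'.

```python
def locale_format(value, decimal_point):
    """Hypothetical stand-in for a locale-dependent snprintf("%.12g")."""
    return ("%.12g" % value).replace(".", decimal_point)


def locale_parse(text, decimal_point):
    """Hypothetical stand-in for a locale-dependent strtod()."""
    if decimal_point != "." and "." in text:
        raise ValueError("unexpected '.' for this locale: %r" % text)
    return float(text.replace(decimal_point, "."))


def ascii_formatd(value, decimal_point):
    """Format value with '.', whatever the active decimal point is."""
    # Undo the locale-dependent separator after formatting.
    return locale_format(value, decimal_point).replace(decimal_point, ".")


def ascii_strtod(text, decimal_point):
    """Parse '.'-separated text, whatever the active decimal point is."""
    # Rewrite the input into the form the locale-dependent parser expects.
    return locale_parse(text.replace(".", decimal_point), decimal_point)


print(locale_format(1.5, ","))   # locale-dependent output
print(ascii_formatd(1.5, ","))   # locale-agnostic output
print(ascii_strtod("1.5", ","))  # parses '.' even when ',' is active
```

Code that must not vary with the locale (the parser, marshal,
pickling) would call the agnostic pair, while the locale module
keeps using the locale-dependent routines.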

The locale module should also be changed to remove the
special-casing for LC_NUMERIC.

This change should also solve the aforementioned thread-safety
problems.


Potential Code Contributions

This problem was initially reported as a problem in the GTK+
libraries [5]; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not
to be confused with the GNU C library) implements a number of
LC_NUMERIC-agnostic functions (for an example, see [6]) for
reasons similar to those presented in this PEP.

In the same GTK+ problem report, Havoc Pennington suggested that
the glib authors would be willing to contribute this code to the
PSF, which would simplify implementation of this PEP considerably.
Alex Larsson, the original author of the glib code, submitted a
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
could be safely integrated.

[XXX: was the agreement actually received and accepted?]


Risks

There may be cross-platform issues with the provided
locale-agnostic functions, though this risk is low given that the
code supplied simply reverses any locale-dependent changes made to
floating-point numbers.

Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area
as members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been
mailed in to ensure this safety.

Tim Peters has pointed out [9] that there are situations involving
threading in which the proposed change is insufficient to solve
the problem completely. A complete solution, however, does not
currently exist.


Implementation

An implementation was developed by Gustavo Carneiro <gjc at
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10].

The final patch [11] was integrated into Python CVS by Martin v.
Löwis on 2004-06-08, as stated in the bug report.


References

[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
    http://www.python.org/peps/pep-0001.html

[2] Python locale documentation for embedding,
    http://www.python.org/doc/current/lib/embedding-locale.html

[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/

[4] GtkSpinButton screenshot (demonstrating problem),
    http://www.async.com.br/~kiko/spin.png

[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132

[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
    renamed g_ascii_formatd) by Alex Larsson,
    http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html

[7] PSF Contributor Agreement,
    http://www.python.org/psf/psf-contributor-agreement.html

[8] Alex Larsson's email confirming his agreement was mailed in,
    http://mail.python.org/pipermail/python-dev/2003-August/037755.html

[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
    http://mail.python.org/pipermail/python-dev/2003-September/037898.html

[10] Python bug report, http://www.python.org/sf/774665

[11] Integrated LC_NUMERIC-agnostic patch,
     https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: