Update PEP:

* Summarize commentary to date.
* Add APOSTROPHE and non-breaking SPACE to the list of separators.
* Add more links to external references.
* Detail issues with the locale module.
* Clarify how proposal II is parsed.
Raymond Hettinger 2009-03-13 23:28:44 +00:00
parent 08f2e14f09
commit 964ee28e3f
1 changed files with 50 additions and 13 deletions


@@ -25,6 +25,16 @@ In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious. approach to be frustrating, arcane and non-obvious.
The locale module presents two other challenges. First, it is
a global setting and not suitable for multi-threaded apps that
need to serve up requests in multiple locales. Second, the
name of a relevant locale (perhaps "de_DE") can vary from
platform to platform or may not be defined at all. The docs
for the locale module describe these and `other challenges`_
in detail.
.. _`other challenges`: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats
It is not the goal to replace locale or to accommodate every It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier possible convention. The goal is to make a common task easier
for many users. for many users.
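As a rough illustration of the two locale challenges described in this hunk, the sketch below formats a number under a named locale and then restores the previous setting. The helper name ``format_with_locale`` is hypothetical and not part of the PEP. The sketch is not thread-safe, because ``locale.setlocale`` changes process-wide state, which is the first problem the text raises. It reports a locale name that the platform rejects as ``None``, which is the second problem.

```python
import locale


def format_with_locale(value, locale_name):
    """Format ``value`` with the "n" specifier under ``locale_name``.

    Hypothetical helper for illustration only. Returns None when the
    platform does not accept the requested locale name.
    """
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        # This changes the setting for the whole process, not just
        # for the calling thread.
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        return format(value, "n")
    finally:
        # Put the previous global setting back.
        locale.setlocale(locale.LC_NUMERIC, saved)


if __name__ == "__main__":
    print(format_with_locale(1234567, "C"))
    # "de_DE" may be spelled differently or be missing on a given platform.
    print(format_with_locale(1234567, "de_DE"))
```

With the always-available ``"C"`` locale the helper returns ``'1234567'`` with no grouping. Whether a name such as ``"de_DE"`` is accepted depends on the platform.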
@@ -54,17 +64,19 @@ ten-thousands. Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only handled by the "n" specifier in the locale module (albeit only
for integers). for integers).
Visual Basic and its brethren (like MS Excel) use a completely Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format different style and have ultra-flexible custom format
specifiers like:: specifiers like::
"_($* #,##0_)". "_($* #,##0_)".
.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm
`COBOL`_ uses picture clauses like:: `COBOL`_ uses picture clauses like::
PIC $***,**9.99CR PICTURE $***,**9.99CR
.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol .. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features
`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to `Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is emit a COMMA as a thousands separator. The general form of ``~D`` is
@@ -91,7 +103,7 @@ offers a COMMA as a thousands separator::
Proposal I (from Nick Coghlan) Proposal I (from Nick Coghlan)
============================== ==============================
A comma will be added to the format() specifier mini-language: A comma will be added to the format() specifier mini-language::
[[fill]align][sign][#][0][width][,][.precision][type] [[fill]align][sign][#][0][width][,][.precision][type]
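A few calls show how the comma slot in Proposal I's specifier reads in practice. This assumes an interpreter that implements the comma option (Python 2.7 / 3.1 or later); the variable names are only for illustration.

```python
# The "," sits between the width and the precision in the format spec.
grouped_int = format(1234567, ",d")          # thousands separator on an integer
grouped_float = format(1234567.891, ",.2f")  # comma combined with a precision
padded = format(1234, "8,d")                 # width counts the separator too
via_str_format = "{:,}".format(1234567)      # same spec through str.format()

print(grouped_int)       # 1,234,567
print(grouped_float)     # 1,234,567.89
print(repr(padded))      # '   1,234'
print(via_str_format)    # 1,234,567
```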
@@ -124,15 +136,15 @@ Proposal II (from Eric Smith)
Make both the thousands separator and decimal separator user Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the specifiable but not locale aware. For simplicity, limit the
choices to a COMMA, DOT, SPACE, or UNDERSCORE. choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
The SPACE can be either U+0020 or U+00A0.
Whenever the separator is followed by a precision, it is a Whenever the separator is followed by a precision, it is a
decimal separator and the optional separator preceding it is a decimal separator and an optional separator preceding it is a
thousands separator. When the precision is absent, the thousands separator. When the precision is absent, a lone
context is integral and a lone specifier means a thousands specifier means a thousands separator::
separator::
[[fill]align][sign][#][0][width][tsep|([tsep] dsep precision)][type] [[fill]align][sign][#][0][width][tsep][dsep precision][type]
Examples:: Examples::
@@ -142,13 +154,12 @@ Examples::
format(1234, "8 ,f") --> ' 1 234,0' format(1234, "8 ,f") --> ' 1 234,0'
format(1234, "8d") --> ' 1234' format(1234, "8d") --> ' 1234'
format(1234, "8,d") --> ' 1,234' format(1234, "8,d") --> ' 1,234'
format(1234, "8_d") --> ' 1_234'
This proposal meets most needs (except for people wanting This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom remember.
__format__ methods that follow the format specification
mini-language.
As shown in the examples, the *width* argument means the total As shown in the examples, the *width* argument means the total
length including the thousands separators and decimal separators. length including the thousands separators and decimal separators.
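The parsing rule for Proposal II can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the PEP's implementation: the function name ``format_proposal_ii`` and the regular expression are hypothetical, only the ``[width][tsep][dsep precision][type]`` tail of the specifier is handled, and only the ``d`` and ``f`` types are accepted. A separator followed by digits is read as the decimal separator with its precision; a separator before that, or a lone separator, is read as the thousands separator.

```python
import re

# Separators allowed by Proposal II: COMMA, DOT, SPACE (U+0020 or U+00A0),
# APOSTROPHE and UNDERSCORE.
_SEP = "[,. \u00a0'_]"

# Hypothetical pattern covering only [width][tsep][dsep precision][type].
_SPEC = re.compile(
    r"(?P<width>\d+)?(?P<tsep>{sep})??"
    r"(?:(?P<dsep>{sep})(?P<precision>\d+))?(?P<type>[df])?\Z".format(sep=_SEP)
)


def format_proposal_ii(value, spec):
    """Format ``value`` per a subset of Proposal II (illustrative only)."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("unsupported spec: %r" % (spec,))
    width = int(m.group("width") or 0)
    tsep = m.group("tsep") or ""
    dsep = m.group("dsep") or "."
    if m.group("precision") is not None:
        # A separator followed by a precision is the decimal separator.
        text = format(value, ".%sf" % m.group("precision"))
    else:
        # No precision: the value is treated as an integer.
        text = format(value, "d")
    sign = "-" if text.startswith("-") else ""
    whole, _, frac = text.lstrip("-").partition(".")
    # Split the integer part into groups of three digits from the right.
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    body = sign + tsep.join(groups)
    if frac:
        body += dsep + frac
    # The width is the total length, separators included.
    return body.rjust(width)


if __name__ == "__main__":
    for spec in ("8.1f", "8,1f", "8.,1f", "8d", "8,d", "8_d"):
        print(repr(format_proposal_ii(1234, spec)))
```

The sketch matches the hunk's examples that pair a decimal separator with a precision or use a lone separator. The ``"8 ,f"`` example, which has a decimal separator but no precision, is outside what it accepts.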
@@ -179,6 +190,32 @@ Other Ideas
to write custom __format__ methods. That way to write custom __format__ methods. That way
Decimal.__format__ would not have to be written from scratch. Decimal.__format__ would not have to be written from scratch.
* Antoine Pitrou noted that the provision for a SPACE separator
should also allow a non-breaking space (U+00A0).
* A poster on the newsgroup, Wolfgang Rohdewald, noted that a
convention in Switzerland is to use an APOSTROPHE as a
thousands separator, ``12'000.99``.
Commentary
==========
* Some commenters do not like the idea of format strings at all
and find them to be unreadable. Suggested alternatives include
the COBOL style PICTURE approach or a convenience function with
keyword arguments for every possible combination.
* Some newsgroup respondents think there is no place for any
scripts that are not internationalized and that it is a step
backwards to provide a simple way to hardwire a given convention.
* Another thought is that embedding some particular convention in
individual format strings makes it hard to change that convention
later. No workable alternative was suggested but the general idea
is to set the convention once and have it apply everywhere (others
commented that locale already does this).
Copyright Copyright
========= =========