Rearranged and updated to reflect Guido's comments.

Raymond Hettinger 2009-03-16 22:16:11 +00:00
parent 51a6726ee3
commit f4d2fefd2c
1 changed file with 87 additions and 113 deletions


@@ -33,7 +33,7 @@ platform to platform or may not be defined at all. The docs
for the locale module describe these and `many other challenges`_
in detail.
.. _`many other challenges`: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats
.. _`many other challenges`: http://www.python.org/doc/2.6.1/library/locale.html#background-details-hints-tips-and-caveats
It is not the goal to replace the locale module, to perform
internationalization tasks, or accommodate every possible
@@ -44,95 +44,8 @@ task easier for many users.
.. _`Babel`: http://babel.edgewall.org/
Current Version of the Mini-Language
====================================
* `Python 2.6 docs`_
.. _Python 2.6 docs: http://www.python.org/doc/2.6.1/library/string.html#formatstrings
* PEP 3101 Advanced String Formatting
Research so far
===============
Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::
"_($* #,##0_)".
.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm
`COBOL`_ uses picture clauses like::
PICTURE $***,**9.99CR
.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features
`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.
::
(format nil "~:D" 229345007) => "229,345,007"
.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html
`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::
String.Format("{0:n}", 12400) ==> "12,400"
String.Format("{0:0,0}", 12400) ==> "12,400"
.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/
Proposal I (from Nick Coghlan)
==============================
A comma will be added to the format() specifier mini-language::
[[fill]align][sign][#][0][width][,][.precision][type]
The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::
format(n, "6,d").replace(",", "_")
This technique is completely general, but it is awkward in the
one case where the commas and periods need to be swapped::
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
The *width* argument means the total length including the commas
and decimal point::
format(1234, "08,d") --> '0001,234'
format(1234.5, "08,.1f") --> '01,234.5'
The ',' option is defined as shown above for types 'd', 'f',
and 'F'. It is undefined for other types (binary, octal, hex,
character, exponential, general, percentage, etc.)
Proposal II (from Eric Smith)
=============================
Main Proposal (from Eric Smith)
===============================
Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
@@ -168,45 +81,106 @@ length including the thousands separators and decimal separators.
No change is proposed for the locale module.
The thousands separator is defined as shown above for types
'd', 'f', and 'F'. It is undefined for other types (binary,
octal, hex, character, exponential, general, percentage, etc.)
'd', 'e', 'f', 'g', 'E', 'G' and 'F'. To allow future extensions, it is
undefined for other types: binary, octal, hex, character, etc.
Comparison
==========
Current Version of the Mini-Language
====================================
The difference between the two proposals is that the first is hard-wired
to a COMMA for a thousands separator and a DOT as a decimal separator.
The second allows either separator to be one of several possibilities.
* `Python 2.6 docs`_
The PEP author recommends Proposal II.
.. _Python 2.6 docs: http://www.python.org/doc/2.6.1/library/string.html#formatstrings
* PEP 3101 Advanced String Formatting
Other Ideas
===========
Research into what Other Languages Do
=====================================
* Lie Ryan suggested a convenience function of the form::
Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
create_format(self, type='i', base=16, seppos=4, sep=':',
charset='0123456789abcdef', maxwidth=32,
minwidth=32, pad='0')
`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::
* Eric Smith would like the C version of the mini-language
parser to be exposed with hooks that would make it easier
to write custom *__format__* methods. That way, methods like
*Decimal.__format__* would not have to be written from scratch.
String.Format("{0:n}", 12400) ==> "12,400"
String.Format("{0:0,0}", 12400) ==> "12,400"
* Antoine Pitrou noted that the provision for a SPACE separator
should also allow a non-breaking space (U+00A0).
.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/
`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.
::
(format nil "~:D" 229345007) => "229,345,007"
.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html
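To make the roles of the four ``~D`` parameters concrete, here is a rough Python approximation. The function ``lisp_d`` is a hypothetical sketch, not part of any library; it assumes the colon modifier is always present (so grouping is always on) and ignores other ``~D`` modifiers such as ``@``.

```python
def lisp_d(n, mincol=0, padchar=" ", commachar=",", commainterval=3):
    """Hypothetical emulation of Common Lisp's ~mincol,padchar,commachar,commaintervalD
    with the colon modifier (grouping enabled).

    Groups the digits of the integer n from the right, joins the groups with
    commachar, then left-pads the result with padchar to mincol columns.
    """
    if commainterval < 1:
        raise ValueError("commainterval must be a positive integer")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off commainterval digits at a time, starting from the right.
    while len(digits) > commainterval:
        groups.append(digits[-commainterval:])
        digits = digits[:-commainterval]
    groups.append(digits)
    text = sign + commachar.join(reversed(groups))
    # ~D pads on the left; str.rjust requires a single-character padchar.
    return text.rjust(mincol, padchar)


print(lisp_d(229345007))                  # mirrors (format nil "~:D" 229345007)
print(lisp_d(229345007, commainterval=4))
print(lisp_d(42, mincol=6, padchar="*"))
```

Changing *commachar* or *commainterval* is all it takes to reach the other conventions listed in this section.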
* A poster on the newsgroup, Wolfgang Rohdewald, noted that a
convention in Switzerland is to use an APOSTROPHE as a
thousands separator, ``12'000.99``.
* The `ADA language`_ allows UNDERSCORES in its numeric literals.
.. _`ADA language`: http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html
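As a sketch of how the separators surveyed above could be produced today, assuming Python 3.6 or later (for the ',' and '_' format options and f-strings), one can format once with COMMA grouping and substitute the separator afterwards. ``with_separator`` is a hypothetical helper name, and the substitution is only safe when the formatted text contains no COMMA other than the separators.

```python
# Separators found in the survey above, plus the no-break space (U+00A0).
SEPARATORS = {
    "COMMA": ",",
    "DOT": ".",
    "SPACE": " ",
    "NO-BREAK SPACE": "\u00a0",
    "APOSTROPHE": "'",
    "UNDERSCORE": "_",
}


def with_separator(n, sep):
    """Hypothetical helper: group an integer in threes, then swap in sep."""
    return format(n, ",d").replace(",", sep)


for name, sep in SEPARATORS.items():
    print(f"{name:>14}: {with_separator(1234567, sep)!r}")

# Swiss style with a fractional part: only the COMMAs are replaced,
# so the decimal DOT is left alone.
swiss = format(12000.99, ",.2f").replace(",", "'")
print(swiss)

# Python 3.6+ also accepts UNDERSCORE directly as a grouping option (PEP 515).
print(format(1234567, "_d"))
```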
Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::
"_($* #,##0_)".
.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm
`COBOL`_ uses picture clauses like::
PICTURE $***,**9.99CR
.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features
Alternative Proposal (from Nick Coghlan)
========================================
A comma will be added to the format() specifier mini-language::
[[fill]align][sign][#][0][width][,][.precision][type]
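The grammar reads as a sequence of optional fields. A rough regular-expression rendering in Python follows; ``parse_spec`` and the group names are hypothetical, and the regex only checks the shape of a spec, not semantic rules such as which presentation types accept ','.

```python
import re

# One optional, named group per bracketed field of
# [[fill]align][sign][#][0][width][,][.precision][type]
_SPEC = re.compile(
    r"""
    (?:(?P<fill>.)?(?P<align>[<>=^]))?   # a fill character is only allowed with an align
    (?P<sign>[-+ ])?
    (?P<alt>\#)?
    (?P<zero>0)?
    (?P<width>\d+)?
    (?P<comma>,)?                        # the proposed thousands-separator flag
    (?:\.(?P<precision>\d+))?
    (?P<type>[a-zA-Z%])?
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_spec(spec):
    """Return a dict of the fields in spec, or None if it does not fit the grammar."""
    m = _SPEC.fullmatch(spec)
    return m.groupdict() if m else None


print(parse_spec("08,.1f"))
print(parse_spec("*>12,d"))
print(parse_spec("6,,d"))
```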
The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
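For reference, here is the ',' option applied to an int, a float and a ``decimal.Decimal``, assuming Python 3.1 or later, where this option was eventually implemented. The values are arbitrary examples.

```python
from decimal import Decimal

# (value, spec) pairs: the ',' option with an int, a float and a Decimal.
cases = [
    (1234567, ",d"),
    (-1234, ",d"),
    (1234567.891, ",.2f"),
    (Decimal("1234567.891"), ",.2f"),
]
results = [format(value, spec) for value, spec in cases]
for (value, spec), text in zip(cases, results):
    print(f"format({value!r}, {spec!r}) -> {text!r}")
```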
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::
format(n, "6,d").replace(",", "_")
This technique is completely general, but it is awkward in the
one case where the commas and periods need to be swapped::
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
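In Python 3, ``str.translate`` can perform the swap in one pass: each character is mapped independently, so no temporary placeholder such as ``X`` is needed. ``swap_separators`` is a hypothetical helper, and it assumes the formatted text contains no COMMA or DOT other than the separators (for example, neither is used as a fill character).

```python
# str.maketrans(",.", ".,") maps ',' -> '.' and '.' -> ',' at the same time.
_SWAP = str.maketrans(",.", ".,")


def swap_separators(text):
    """Hypothetical helper: assumes the only COMMAs and DOTs are separators."""
    return text.translate(_SWAP)


print(swap_separators(format(1234567.891, ",.2f")))
print(swap_separators(format(-9876543.21, ",.2f")))
```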
The *width* argument means the total length including the commas
and decimal point::
format(1234, "08,d") --> '0001,234'
format(1234.5, "08,.1f") --> '01,234.5'
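The width rule can be checked with space padding, assuming Python 3.1 or later. The number 1234567 formats to the 9-character ``1,234,567`` (7 digits plus 2 commas), so a width of 12 leaves 3 padding characters. This sketch deliberately sticks to space padding and does not exercise the zero-padded cases above.

```python
# Space padding only; the field width counts digits, commas and the DOT.
padded = {
    ">12,d": format(1234567, ">12,d"),  # right-aligned
    "<12,d": format(1234567, "<12,d"),  # left-aligned
    "^12,d": format(1234567, "^12,d"),  # centred
}
for spec, text in padded.items():
    print(f"{spec!r:>9} -> {text!r} (len {len(text)})")

# A float with one decimal place: "1,234.5" is 7 characters, padded to 10.
one_place = format(1234.5, ">10,.1f")
print(repr(one_place))
```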
The ',' option is defined as shown above for types 'd', 'e',
'f', 'g', 'E', 'G' and 'F'. To allow future extensions, it is
undefined for other types: binary, octal, hex, character,
etc.
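CPython 3 enforces this restriction when it parses the spec: ``format()`` accepts ',' with the listed types and raises ``ValueError`` for binary, octal and hex. The probe ``accepts_comma`` below is a hypothetical helper that simply tries the format and reports whether it succeeded.

```python
def accepts_comma(value, type_char):
    """Hypothetical probe: True if format() accepts ',' with this type."""
    try:
        format(value, "," + type_char)
    except ValueError:
        return False
    return True


# 'd' needs an int; the floating-point types are probed with a float.
ALLOWED = [(1234567, "d")] + [(1234567.0, t) for t in "efgEGF"]
REJECTED = [(1234567, t) for t in "bxXo"]

for value, t in ALLOWED + REJECTED:
    print(t, accepts_comma(value, t))
```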
This alternative proposal has the virtue of being simpler
than the main proposal but is much less flexible and meets
the needs of fewer users right out of the box.
Commentary
==========