Rearranged and updated to reflect Guido's comments.
parent
51a6726ee3
commit
f4d2fefd2c
200
pep-0378.txt
@@ -33,7 +33,7 @@ platform to platform or may not be defined at all. The docs
for the locale module describe these and `many other challenges`_
in detail.

.. _`many other challenges`: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats
.. _`many other challenges`: http://www.python.org/doc/2.6.1/library/locale.html#background-details-hints-tips-and-caveats

It is not the goal to replace the locale module, to perform
internationalization tasks, or accommodate every possible
@@ -44,95 +44,8 @@ task easier for many users.
.. _`Babel`: http://babel.edgewall.org/


Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

.. _Python 2.6 docs: http://www.python.org/doc/2.6.1/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting


Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)".

.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm

`COBOL`_ uses picture clauses like::

    PICTURE $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features

`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007) => "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::

    String.Format("{0:n}", 12400) ==> "12,400"
    String.Format("{0:0,0}", 12400) ==> "12,400"

.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/


Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d") --> '0001,234'
    format(1234.5, "08,.1f") --> '01,234.5'

The ',' option is defined as shown above for types 'd', 'f',
and 'F'. It is undefined for other types (binary, octal, hex,
character, exponential, general, percentage, etc.)


Proposal II (from Eric Smith)
=============================
Main Proposal (from Eric Smith)
===============================

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
@@ -168,45 +81,106 @@ length including the thousands separators and decimal separators.
No change is proposed for the locale module.

The thousands separator is defined as shown above for types
'd', 'f', and 'F'. It is undefined for other types (binary,
octal, hex, character, exponential, general, percentage, etc.)
'd', 'e', 'f', 'g', 'E', 'G' and 'F'. To allow future extensions, it is
undefined for other types: binary, octal, hex, character, etc.


Comparison
==========
Current Version of the Mini-Language
====================================

The difference between the two proposals is that the first is hard-wired
to a COMMA for a thousands separator and a DOT as a decimal separator.
The second allows either separator to be one of several possibilities.
* `Python 2.6 docs`_

The PEP author recommends Proposal II.
.. _Python 2.6 docs: http://www.python.org/doc/2.6.1/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting


Other Ideas
===========
Research into what Other Languages Do
=====================================

* Lie Ryan suggested a convenience function of the form::
Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

    create_format(self, type='i', base=16, seppos=4, sep=':',
                  charset='0123456789abcdef', maxwidth=32,
                  minwidth=32, pad='0')
`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::

* Eric Smith would like the C version of the mini-language
  parser to be exposed with hooks that would make it easier
  to write custom *__format__* methods. That way, methods like
  *Decimal.__format__* would not have to be written from scratch.
    String.Format("{0:n}", 12400) ==> "12,400"
    String.Format("{0:0,0}", 12400) ==> "12,400"

* Antoine Pitrou noted that the provision for a SPACE separator
  should also allow a non-breaking space (U+00A0).
.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/

`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007) => "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

* A poster on the newsgroup, Wolfgang Rohdewald, noted that a
  convention in Switzerland is to use an APOSTROPHE as a
  thousands separator, ``12`000.99``.

* The `ADA language`_ allows UNDERSCORES in its numeric literals.

.. _`ADA language`: http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html

Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)".

.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm

`COBOL`_ uses picture clauses like::

    PICTURE $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features


Alternative Proposal (from Nick Coghlan)
========================================

A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
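
The substitution technique above can be tried in any Python that
implements the ``,`` option. The following is a minimal sketch, not part
of the proposal: the helper names ``with_underscores`` and
``swap_separators``, the ``places`` parameter, and the sample values are
assumptions made for illustration.

```python
def with_underscores(n):
    """Format an int with commas, then substitute underscores.

    Hypothetical helper; only format() and str.replace() come from
    the text above.
    """
    return format(n, ",d").replace(",", "_")


def swap_separators(x, places=2):
    """Swap commas and periods using a temporary placeholder character.

    "X" never appears in fixed-point output, so it is safe as the
    intermediate marker.
    """
    text = format(x, ",.%df" % places)
    return text.replace(",", "X").replace(".", ",").replace("X", ".")


print(with_underscores(1234567))      # 1_234_567
print(swap_separators(1234567.891))   # 1.234.567,89
```

The three chained ``replace`` calls are what makes the swap case awkward:
a direct two-step replacement would turn every separator into the same
character, so a placeholder is needed.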

The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d") --> '0001,234'
    format(1234.5, "08,.1f") --> '01,234.5'
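
That *width* counts the separators can be checked with a short sketch.
It assumes a Python that implements the ``,`` option and deliberately
uses space padding only: how padding zeros interact with separators may
differ in a given implementation, so the zero-padded results above are
not asserted here. The function name ``grouped_width_demo`` is
illustrative.

```python
def grouped_width_demo():
    """Return an int and a float formatted to a total width of 8."""
    as_int = format(1234, "8,d")        # '1,234' is 5 chars, so 3 pad spaces
    as_float = format(1234.5, "8,.1f")  # '1,234.5' is 7 chars, so 1 pad space
    return as_int, as_float


as_int, as_float = grouped_width_demo()
print(repr(as_int), len(as_int))        # '   1,234' 8
print(repr(as_float), len(as_float))    # ' 1,234.5' 8
```

In both cases the comma and the decimal point are counted against the
requested width, which is the behavior the paragraph above describes.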

The ',' option is defined as shown above for types 'd', 'e',
'f', 'g', 'E', 'G' and 'F'. To allow future extensions, it is
undefined for other types: binary, octal, hex, character,
etc.

This alternative proposal has the virtue of being simpler
than the main proposal but is much less flexible and meets
the needs of fewer users right out of the box.


Commentary
==========