PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Python-Version: 2.7 and 3.1
Post-History: 12-Mar-2009

Motivation
==========

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
humanize a program's output, improving its professional appearance
and readability.

In the finance world, output with thousands separators is the norm.
Finance users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

The locale module presents two other challenges. First, it is
a global setting and not suitable for multi-threaded apps that
need to serve up requests in multiple locales. Second, the
name of a relevant locale (such as "de_DE") can vary from
platform to platform or may not be defined at all. The docs
for the locale module describe these and `many other challenges`_
in detail.

.. _`many other challenges`: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats

It is not the goal to replace the locale module or to
accommodate every possible convention. Such tasks are better
suited to robust tools like `Babel`_. Instead, our goal is to
make a common, everyday task easier for many users.

.. _`Babel`: http://babel.edgewall.org/

Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

  .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting

Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands. Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).

Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)".

.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm

`COBOL`_ uses picture clauses like::

    PICTURE $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features

`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007)    =>   "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::

    String.Format("{0:n}", 12400)     ==>    "12,400"
    String.Format("{0:0,0}", 12400)   ==>    "12,400"

.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/

Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
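
As an illustration of the ',' option described above, the following
minimal sketch assumes an interpreter that implements this proposal as
written. The variable names are chosen only for the example.

```python
from decimal import Decimal

# The ',' option inserts a comma between each group of three digits.
as_int = format(1234567, ",d")
as_float = format(1234567.891, ",.2f")
as_decimal = format(Decimal("1234567.891"), ",.2f")

# The same specifier can be used in a str.format() replacement field.
in_template = "{0:,d} bytes".format(1234567)

print(as_int, as_float, as_decimal, in_template, sep="\n")
```

No locale setup is involved, so the result does not depend on the
platform's locale names or on global state.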

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
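
One possible way to soften the awkward swap case is a single-pass
character translation, which needs no temporary placeholder. This is
only a sketch: ``swap_separators`` is a hypothetical helper, not part
of the proposal, and it assumes the ',' option is available.

```python
def swap_separators(text):
    """Hypothetical helper: exchange ',' and '.' in one pass."""
    # str.translate maps each character independently, so the two
    # replacements cannot interfere with each other.
    return text.translate({ord(","): ".", ord("."): ","})


n = 1234567.891
swapped = swap_separators(format(n, ",.2f"))

# Substituting a single separator still only needs str.replace().
underscored = format(1234567, ",d").replace(",", "_")

print(swapped)
print(underscored)
```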

The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d")     -->    '0001,234'
    format(1234.5, "08,.1f") -->    '01,234.5'
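
The width rule can be checked with space-padded fields, where the
separators visibly consume part of the requested width. This is a
small sketch assuming the ',' option is available. Zero padding is
deliberately left out, since its exact interaction with the separator
is a detail of the proposal rather than of this example.

```python
# Width 8: five characters for '1,234' plus three padding spaces.
padded_int = format(1234, "8,d")

# Width 10: seven characters for '1,234.5' plus three padding spaces.
padded_float = format(1234.5, "10,.1f")

print(repr(padded_int), len(padded_int))
print(repr(padded_float), len(padded_float))
```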

Proposal II (from Eric Smith)
=============================

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
The SPACE can be either U+0020 or U+00A0.

Whenever a separator is followed by a precision, it is a
decimal separator and an optional separator preceding it is a
thousands separator. When the precision is absent, a lone
separator means a thousands separator::

    [[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples::

    format(1234, "8.1f")     -->    '  1234.0'
    format(1234, "8,1f")     -->    '  1234,0'
    format(1234, "8.,1f")    -->    ' 1.234,0'
    format(1234, "8 ,1f")    -->    ' 1 234,0'
    format(1234, "8d")       -->    '    1234'
    format(1234, "8,d")      -->    '   1,234'
    format(1234, "8_d")      -->    '   1_234'

This proposal meets most needs, but it comes at the expense
of being a little more complicated to learn and remember.

As shown in the examples, the *width* argument means the total
length including the thousands separators and decimal separators.

No change is proposed for the locale module.
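
To make the separator rule of this proposal concrete, here is a rough
emulation built on top of the ',' option from Proposal I. Everything
in it is an assumption made for illustration: ``format_proposal_2`` is
a hypothetical helper, the regular expression covers only the
``[width][tsep][dsep precision][type]`` part of the grammar, and only
the ``d`` and ``f`` types are handled.

```python
import re

# Separators allowed by the proposal: COMMA, DOT, SPACE (U+0020 or
# U+00A0), APOSTROPHE and UNDERSCORE.
_SEP = "[,. '_\u00a0]"

# Simplified grammar: [width][tsep][dsep precision][type].  Fill, align,
# sign, '#' and '0' are omitted to keep the sketch short.
_SPEC = re.compile(
    r"(?P<width>\d*)"
    r"(?:(?P<tsep>" + _SEP + r")?(?P<dsep>" + _SEP + r")(?P<prec>\d+)"
    r"|(?P<lone>" + _SEP + r"))?"
    r"(?P<type>[df])$"
)


def format_proposal_2(value, spec):
    """Hypothetical helper emulating the proposal via the ',' option."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unsupported format spec: %r" % (spec,))
    # A separator followed by a precision is the decimal separator; a
    # lone separator (no precision) is the thousands separator.
    tsep = match.group("tsep") or match.group("lone")
    dsep = match.group("dsep") or "."
    if match.group("type") == "d":
        text = format(value, ",d")
    elif match.group("prec"):
        text = format(value, ",.%sf" % match.group("prec"))
    else:
        text = format(value, ",f")
    # Translate both characters in one pass; mapping ',' to None
    # deletes it when no thousands separator was requested.
    text = text.translate({ord(","): tsep, ord("."): dsep})
    # The width counts every separator, as the proposal requires.
    return text.rjust(int(match.group("width") or 0))


for spec in ("8.1f", "8,1f", "8.,1f", "8d", "8,d", "8_d"):
    print(repr(format_proposal_2(1234, spec)))
```

Layering the emulation on the ',' option keeps the digit grouping
itself in the built-in formatter, so the helper only has to decide
which character plays which role.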

Comparison
==========

The difference between the two proposals is that the first is hard-wired
to a COMMA for a thousands separator and a DOT as a decimal separator.
The second allows either separator to be one of several possibilities.

Other Ideas
===========

* Lie Ryan suggested a convenience function of the form::

      create_format(self, type='i', base=16, seppos=4, sep=':',
                    charset='0123456789abcdef', maxwidth=32,
                    minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
  parser to be exposed with hooks that would make it easier
  to write custom *__format__* methods. That way, methods like
  *Decimal.__format__* would not have to be written from scratch.

* Antoine Pitrou noted that the provision for a SPACE separator
  should also allow a non-breaking space (U+00A0).

* A poster on the newsgroup, Wolfgang Rohdewald, noted that a
  convention in Switzerland is to use an APOSTROPHE as a
  thousands separator, ``12'000.99``.

* The `ADA language`_ allows UNDERSCORES in its numeric literals.

.. _`ADA language`: http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html
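
The point about custom *__format__* methods can be illustrated without
any new hooks, in the simple case where a type can hand its
specifier to an existing formatter. This is a minimal sketch: the
``Money`` class and its default specifier are hypothetical, and the
code assumes the ',' option from Proposal I is available.

```python
class Money:
    """Hypothetical value type, used only for illustration."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # Reuse the float formatter rather than re-implementing the
        # mini-language; an empty spec falls back to an assumed default.
        return "$" + format(self.amount, spec or ",.2f")


default_text = format(Money(1234567.5), "")
rounded_text = "{0:,.0f}".format(Money(2500000.0))

print(default_text)
print(rounded_text)
```

Delegation like this only works when an underlying built-in type can
do the formatting, which is why an exposed parser would still help
types that must interpret the specifier themselves.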

Commentary
==========

* Some commenters do not like the idea of format strings at all
  and find them to be unreadable. Suggested alternatives include
  the COBOL style PICTURE approach or a convenience function with
  keyword arguments for every possible combination.

* Some newsgroup respondents think there is no place for any
  scripts that are not internationalized and that it is a step
  backwards to provide a simple way to hardwire a particular choice
  (thus reducing incentive to use a locale sensitive approach).

* Another thought is that embedding some particular convention in
  individual format strings makes it hard to change that convention
  later. No workable alternative was suggested but the general idea
  is to set the convention once and have it apply everywhere (others
  commented that locale already provides a way to do this).

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: