PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Python-Version: 2.7 and 3.1
Post-History: 12-Mar-2009
Motivation
==========
Provide a simple, non-locale aware way to format a number
with a thousands separator.
Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.
In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.
It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.
Current Version of the Mini-Language
====================================
* `Python 2.6 docs`_

  .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting

Research so far
===============
Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, or UNDERSCORE.
When a COMMA is the decimal separator, the thousands separator
is typically a DOT or SPACE (see examples from Denis Spir).
James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.  Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).
Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)".

`COBOL`_ uses picture clauses like::

    PIC $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol

`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE.  The *commachar* defaults to COMMA.  The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007)    =>    "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::

    String.Format("{0:n}", 12400)     ==>    "12,400"
    String.Format("{0:0,0}", 12400)   ==>    "12,400"

.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/

Proposal I (from Nick Coghlan)
==============================
A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

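
The swap can also be written as a single pass, which avoids the
temporary placeholder.  The following is only a sketch: it assumes
Python 3 (for ``str.maketrans``) and an interpreter in which the ``,``
option is available.  The helper name ``swap_separators`` is
illustrative and is not part of the proposal.

```python
# Illustrative helper (not part of the proposal): exchange commas and
# periods in one pass so that no temporary placeholder is needed.
_SWAP = str.maketrans({",": ".", ".": ","})


def swap_separators(text):
    """Return text with every ',' and '.' exchanged."""
    return text.translate(_SWAP)


n = 1234567.891
us_style = format(n, ",.2f")            # '1,234,567.89'
eu_style = swap_separators(us_style)    # '1.234.567,89'

# Substituting a single separator only needs str.replace().
underscored = format(1234567, ",d").replace(",", "_")   # '1_234_567'
```
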
The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d")     -->    '0001,234'
    format(1234.5, "08,.1f") -->    '01,234.5'

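
The way *width* counts separators can be checked with space padding.
This is a small sketch that assumes an interpreter in which the ``,``
option is available; the sample values are arbitrary.

```python
# Width counts every character, including separators and the decimal point.
samples = [
    (1234, "8,d"),        # 5 characters of content, padded to 8
    (1234.5, "10,.1f"),   # 7 characters of content, padded to 10
    (1234567, "6,d"),     # content wider than the width: nothing is truncated
]

results = [format(value, spec) for value, spec in samples]
for text in results:
    print(repr(text), len(text))
```

As with any other format specifier, the width is a minimum: the last
sample is nine characters long even though a width of six was requested.
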
Proposal II (to meet Antoine Pitrou's request)
==============================================
Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.

::

    [[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]

Examples::

    format(1234, "8.1f")    -->     '  1234.0'
    format(1234, "8,1f")    -->     '  1234,0'
    format(1234, "8T.,1f")  -->     ' 1.234,0'
    format(1234, "8T ,f")   -->     ' 1 234,0'
    format(1234, "8d")      -->     '    1234'
    format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
``__format__`` methods that follow the format specification
mini-language.
As shown in the examples, the *width* argument means the total
length including the thousands separators and decimal separators.
No change is proposed for the locale module.
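
The intended results of this proposal can be approximated with a small
helper built on the comma option from Proposal I.  This is a sketch
only: ``format_sep`` and its parameters are hypothetical names, it
assumes an interpreter in which the ``,`` option is available, and it
does not parse the ``T`` syntax itself.

```python
def format_sep(value, width=0, tsep="", dsep=".", precision=None):
    """Hypothetical emulation of Proposal II (not an existing API).

    tsep and dsep are limited to the four characters the proposal allows.
    precision=None formats an integer; otherwise a fixed-point number.
    """
    allowed = {",", ".", " ", "_"}
    if (tsep and tsep not in allowed) or dsep not in allowed:
        raise ValueError("separator must be a comma, period, space, or underscore")
    if precision is None:
        digits = format(value, ",d")
    else:
        digits = format(value, ",.{}f".format(precision))
    # Split before substituting so the two separators never collide.
    whole, point, fraction = digits.partition(".")
    whole = whole.replace(",", tsep)
    text = whole + (dsep + fraction if point else "")
    # Pad last: the width covers the separators as well.
    return text.rjust(width)


print(repr(format_sep(1234, 8, ".", ",", 1)))   # thousands '.', decimal ','
print(repr(format_sep(1234, 8, ",")))           # integer with thousands ','
```

Splitting on the decimal point before substituting is what removes the
need for the placeholder character used in the ``replace()`` chain.
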
Proposal III (from Eric Smith: like II but without the T)
=========================================================
In the second proposal, the *T* isn't strictly necessary.
In the context of an integer, an optional single specifier means
a thousands separator. In the context of a float, a single
specifier is a decimal separator while two specifiers are taken
as a thousands separator and a decimal separator::

    [[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples::

    format(1234, "8.1f")    -->     '  1234.0'
    format(1234, "8,1f")    -->     '  1234,0'
    format(1234, "8.,1f")   -->     ' 1.234,0'
    format(1234, "8 ,f")    -->     ' 1 234,0'
    format(1234, "8d")      -->     '    1234'
    format(1234, "8,d")     -->     '   1,234'

This is a cleaner looking syntax but has the minor disadvantage of
using context to direct the translation. Whenever the separator
is followed by a precision, it is a decimal separator and the separator
preceding it is a thousands separator. When the precision is
absent, the context is integral and a lone specifier means
a thousands separator.
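
The context rule can be sketched as a small classifier.  This is an
illustration under stated assumptions, not a proposed implementation:
the regular expression covers only the width, separators, precision
and type (fill, align, sign, ``#`` and ``0`` are omitted), the type is
not checked against the separators, and the name
``classify_separators`` is hypothetical.

```python
import re

# Hypothetical fragment of the grammar: width, up to two separators,
# precision, and a 'd' or 'f' type.
_SPEC = re.compile(
    r"^(?P<width>\d*)(?P<seps>[,. _]{0,2})(?P<precision>\d*)(?P<type>[df])$"
)


def classify_separators(spec):
    """Return (tsep, dsep) for a Proposal III style spec."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unsupported spec: %r" % spec)
    seps = match.group("seps")
    if match.group("precision") or len(seps) == 2:
        # The separator next to the precision (or the second of two) is
        # the decimal separator; one preceding it groups thousands.
        return seps[:-1], seps[-1]
    # No precision and at most one separator: it groups thousands.
    return seps, ""


for spec in ["8.1f", "8,1f", "8.,1f", "8 ,f", "8d", "8,d"]:
    print(spec, classify_separators(spec))
```
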
Other Ideas
===========
* Lie Ryan suggested a convenience function of the form::

      create_format(self, type='i', base=16, seppos=4, sep=':',
                    charset='0123456789abcdef', maxwidth=32,
                    minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
  parser to be exposed with hooks that would make it easier
  to write custom ``__format__`` methods.  That way
  ``Decimal.__format__`` would not have to be written from scratch.

Copyright
=========
This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: