PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Python-Version: 2.7 and 3.1
Post-History: 12-Mar-2009

Motivation
==========

Provide a simple, non-locale-aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale
approach frustrating, arcane, and non-obvious.

The goal is not to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.

Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

  .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101: Advanced String Formatting

Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, or UNDERSCORE.
When a COMMA is the decimal separator, the thousands separator
is typically a DOT or SPACE (see examples from Denis Spir).

James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese numbering groups
by ten-thousands. Eric Smith pointed out that these are already
handled by the locale-aware "n" format specifier (albeit only
for integers).

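The differing group sizes are easiest to compare side by side. What follows is a minimal pure-Python sketch, not part of any proposal. ``group_digits`` and its ``groups`` argument are hypothetical names. The argument is loosely modeled on the digit-grouping lists reported by ``locale.localeconv()``, simplified so that the last size simply repeats::

    def group_digits(n, groups=(3,), sep=","):
        """Hypothetical helper: insert sep into the decimal digits of int n.

        groups lists group sizes starting from the rightmost digits; the
        last size repeats.  (3,) is Western thousands grouping, (3, 2) is
        Indian-style grouping, and (4,) groups by ten-thousands.
        """
        if n < 0:
            return "-" + group_digits(-n, groups, sep)
        digits = str(n)
        sizes = list(groups)
        parts = []
        while len(digits) > sizes[0]:
            parts.append(digits[-sizes[0]:])
            digits = digits[:-sizes[0]]
            if len(sizes) > 1:
                sizes.pop(0)  # advance to the next size; the last one repeats
        parts.append(digits)
        return sep.join(reversed(parts))

    print(group_digits(1234567))            # 1,234,567
    print(group_digits(1234567, (3, 2)))    # 12,34,567
    print(group_digits(123456789, (4,)))    # 1,2345,6789

A general solution would need a per-locale table of group sizes, which is the locale module's job. It does not belong in a simple format option.
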
Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)"

`COBOL`_ uses picture clauses like::

    PIC $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol

`Common Lisp`_ uses a COLON before the ``~D`` decimal directive to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007) => "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale-aware. The picture formatting only
offers a COMMA as a thousands separator::

    String.Format("{0:n0}", 12400)   ==>   "12,400"
    String.Format("{0:0,0}", 12400)  ==>   "12,400"

.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/

Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ``','`` option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

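This assumes an interpreter that implements the ``','`` option as described here (Python 2.7 and 3.1 or later). Under that assumption, the same option applies to ints, floats, and decimals::

    from decimal import Decimal

    # Requires the ',' option (Python 2.7/3.1 or later).
    print(format(1234567, ",d"))                   # 1,234,567
    print(format(1234567.891, ",.2f"))             # 1,234,567.89
    print(format(Decimal("1234567.891"), ",.2f"))  # 1,234,567.89
    print(format(-9876543210, ",d"))               # -9,876,543,210

The sign stays in front of the grouped digits, and the precision rounds the fractional part as it would without the ``','``.
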
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

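On Python 3, ``str.translate`` with a table built by ``str.maketrans`` swaps both characters in a single pass, so no temporary marker is needed. This is a minimal sketch, and the ``european`` helper name is hypothetical::

    # Python 3: one translation table swaps ',' and '.' in a single pass.
    SWAP = str.maketrans(",.", ".,")

    def european(n, spec=",.2f"):
        """Hypothetical helper: format with ',' and then swap the separators."""
        return format(n, spec).translate(SWAP)

    print(european(1234567.5))   # 1.234.567,50
    print(european(1234, ",d"))  # 1.234

``translate`` maps each input character exactly once, so a comma produced from a period is never turned back into a period.
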
The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d")     -->    '0001,234'
    format(1234.5, "08,.1f") -->    '01,234.5'

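Here is a quick check of the width rule, assuming Python 2.7/3.1 or later. Numbers are right-aligned and space-padded by default, which keeps the padding easy to see::

    # Requires the ',' option (Python 2.7/3.1 or later).
    for spec in ("12,d", "12,.1f"):
        text = format(1234567, spec)
        print(repr(text), len(text))
    # '   1,234,567' 12
    # ' 1,234,567.0' 12

In both cases the commas and the decimal point count toward the 12 characters.
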
Proposal II (to meet Antoine Pitrou's request)
==============================================

Make both the thousands separator and decimal separator user-specifiable
but not locale-aware. For simplicity, limit the choices to a comma,
period, space, or underscore::

    [[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]

Examples::

    format(1234, "8.1f")    -->    '  1234.0'
    format(1234, "8,1f")    -->    '  1234,0'
    format(1234, "8T.,1f")  -->    ' 1.234,0'
    format(1234, "8T ,1f")  -->    ' 1 234,0'
    format(1234, "8d")      -->    '    1234'
    format(1234, "8T,d")    -->    '   1,234'

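The output of Proposal II can be emulated on top of the ``','`` option. The sketch below is illustrative only and assumes Python 3. ``format_with_separators`` and its keyword arguments are hypothetical stand-ins for the ``T[tsep]`` and ``dsep precision`` fields::

    def format_with_separators(value, width=0, tsep="", dsep=".", precision=None):
        """Hypothetical emulation of Proposal II (Python 3).

        tsep      -- thousands separator; "" means no grouping
        dsep      -- decimal separator, used when precision is given
        precision -- None gives integer ('d') output and expects an int;
                     otherwise fixed-point ('f') output
        """
        allowed = {",", ".", " ", "_"}
        if (tsep and tsep not in allowed) or dsep not in allowed:
            raise ValueError("separators are limited to ',', '.', ' ' and '_'")
        grouping = "," if tsep else ""
        if precision is None:
            text = format(value, grouping + "d")
        else:
            text = format(value, "%s.%df" % (grouping, precision))
        # Map the built-in ',' and '.' to the requested characters in one
        # pass, so the two can trade places safely.
        text = text.translate({ord(","): tsep, ord("."): dsep})
        return text.rjust(width)

    print(repr(format_with_separators(1234, 8, tsep=".", dsep=",", precision=1)))
    # ' 1.234,0'
    print(repr(format_with_separators(1234, 8, tsep=",")))
    # '   1,234'

Calling ``format_with_separators(1234, 8, tsep=".", dsep=",", precision=1)`` reproduces the ``"8T.,1f"`` example above.
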
This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember. Also, it makes it more challenging to write custom
``__format__`` methods that follow the format specification
mini-language.

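A custom ``__format__`` can follow the mini-language without a parser of its own if it delegates the whole specification to a built-in type. This is a minimal sketch, assuming Python 3.1 or later. The ``Money`` class is hypothetical::

    from decimal import Decimal

    class Money:
        """Hypothetical currency type; not part of any proposal."""

        def __init__(self, amount):
            self.amount = Decimal(amount)

        def __format__(self, spec):
            # Hand the whole spec, including ',', to Decimal.__format__
            # and only prepend the currency symbol; nothing is parsed here.
            return "$" + format(self.amount, spec)

    print(format(Money("1234567.5"), ",.2f"))  # $1,234,567.50
    print("{:,.0f}".format(Money("9876543")))  # $9,876,543

The shortcut has limits. A width such as ``"10.2f"`` pads the digits before the ``$`` is prepended, so the result comes out one character wider than requested. Fixing that means parsing the specification, which is the work that exposed parser hooks would save.
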
As shown in the examples, the *width* argument means the total
length including the thousands separators and decimal separators.

No change is proposed for the locale module.

Proposal III (from Eric Smith: like II but without the T)
=========================================================

In the second proposal, the *T* isn't strictly necessary.
In the context of an integer, an optional single specifier means
a thousands separator. In the context of a float, a single
specifier is a decimal separator while two specifiers are taken
as a thousands separator and a decimal separator::

    [[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples::

    format(1234, "8.1f")    -->    '  1234.0'
    format(1234, "8,1f")    -->    '  1234,0'
    format(1234, "8.,1f")   -->    ' 1.234,0'
    format(1234, "8 ,1f")   -->    ' 1 234,0'
    format(1234, "8d")      -->    '    1234'
    format(1234, "8,d")     -->    '   1,234'

This is a cleaner-looking syntax but has the minor disadvantage of
using context to direct the translation. Whenever a separator
is followed by a precision, it is a decimal separator, and any separator
preceding it is a thousands separator. When the precision is
absent, the context is integral and a lone specifier means
a thousands separator.

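The context rule can be written as a small parser. This is a hypothetical sketch that covers only the width, separator, precision, and type fields used in the examples (``fill``, ``align``, ``sign``, ``#`` and ``0`` are omitted). ``parse_separators`` is an illustrative name, not an existing API::

    import re

    # Hypothetical grammar fragment for Proposal III:
    #   [width][tsep][dsep precision][type]
    _SPEC = re.compile(
        r"(?P<width>\d*)(?P<seps>[,. _]{0,2})(?P<precision>\d*)(?P<type>[df])\Z")

    def parse_separators(spec):
        """Return (width, tsep, dsep, precision, type) for a Proposal III spec."""
        m = _SPEC.match(spec)
        if m is None:
            raise ValueError("unsupported spec: %r" % (spec,))
        seps, digits = m.group("seps"), m.group("precision")
        tsep = dsep = None
        if digits:
            # A separator followed by a precision is the decimal separator;
            # a separator before it (if any) is the thousands separator.
            if not seps:
                raise ValueError("a precision needs a decimal separator")
            dsep = seps[-1]
            if len(seps) == 2:
                tsep = seps[0]
        elif len(seps) == 2:
            raise ValueError("two separators require a precision")
        elif seps:
            # No precision: integral context, so a lone separator is tsep.
            tsep = seps
        width = int(m.group("width") or 0)
        precision = int(digits) if digits else None
        return width, tsep, dsep, precision, m.group("type")

    print(parse_separators("8.,1f"))  # (8, '.', ',', 1, 'f')
    print(parse_separators("8,d"))    # (8, ',', None, None, 'd')

A spec such as ``"8.,f"`` has two separators but no precision. The rule gives it no meaning, so the sketch rejects it.
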
Other Ideas
===========

* Lie Ryan suggested a convenience function of the form::

      create_format(self, type='i', base=16, seppos=4, sep=':',
                    charset='0123456789abcdef', maxwidth=32,
                    minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
  parser to be exposed with hooks that would make it easier
  to write custom ``__format__`` methods. That way
  ``Decimal.__format__`` would not have to be written from scratch.

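Lie Ryan's suggestion does not spell out what each parameter means, so the following is only one plausible reading. It is written as a standalone function that returns a formatter. It assumes that ``minwidth`` and ``maxwidth`` count digits (not separators), that ``pad`` fills on the left, and that only ``type='i'`` is supported. All of these are assumptions::

    def create_format(type="i", base=16, seppos=4, sep=":",
                      charset="0123456789abcdef", maxwidth=32,
                      minwidth=32, pad="0"):
        """Hypothetical reading of Lie Ryan's suggestion (integers only)."""
        if type != "i":
            raise ValueError("this sketch only handles type='i'")
        if not 2 <= base <= len(charset):
            raise ValueError("charset has too few digits for base")
        if seppos < 1:
            raise ValueError("seppos must be positive")

        def fmt(n):
            sign, n = ("-", -n) if n < 0 else ("", n)
            digits = []
            while True:
                n, r = divmod(n, base)
                digits.append(charset[r])
                if n == 0:
                    break
            # Assumed: minwidth/maxwidth count digits, not separators.
            text = "".join(reversed(digits)).rjust(minwidth, pad)
            if len(text) > maxwidth:
                raise OverflowError("value needs more than maxwidth digits")
            groups = []
            while text:
                groups.append(text[-seppos:])  # take seppos digits from the right
                text = text[:-seppos]
            return sign + sep.join(reversed(groups))

        return fmt

    print(create_format(minwidth=8)(0xDEADBEEF))  # dead:beef

With ``base=10, seppos=3, sep=","`` the same function yields ordinary thousands grouping. This shows how general, and how heavyweight, the suggested interface is.
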
Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: