PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Python-Version: 2.7 and 3.1
Post-History: 12-Mar-2009

Motivation
==========

Provide a simple, non-locale aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
humanize a program's output, improving its professional appearance
and readability.

In the finance world, output with thousands separators is the norm.
Finance users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

The locale module presents two other challenges. First, it is
a global setting and not suitable for multi-threaded apps that
need to serve requests in multiple locales. Second, the
name of a relevant locale (such as "de_DE") can vary from
platform to platform or may not be defined at all. The docs
for the locale module describe these and `many other challenges`_
in detail.

.. _`many other challenges`: https://docs.python.org/2.6/library/locale.html#background-details-hints-tips-and-caveats

It is not the goal to replace the locale module, to perform
internationalization tasks, or to accommodate every possible
convention. Such tasks are better suited to robust tools like
`Babel`_. Instead, the goal is to make a common, everyday
task easier for many users.

.. _`Babel`: http://babel.edgewall.org/

Main Proposal (from Nick Coghlan, originally called Proposal I)
===============================================================

A comma will be added to the format() specifier mini-language::

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

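
A quick illustration, assuming a Python that implements this option (2.7, or 3.1 and later), of the ',' option applied to an int, a float and a Decimal:

```python
from decimal import Decimal

# The ',' option groups the digits left of the decimal point in threes.
print(format(1234567, ",d"))              # 1,234,567
print(format(1234567.891, ",.2f"))        # 1,234,567.89
print(format(Decimal("1234567.5"), ","))  # 1,234,567.5
print(format(-1234567, ","))              # -1,234,567
```

The sign is placed before the grouped digits, and no separator is ever inserted into the fractional part.
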
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")

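
As a sketch, the same swap can be done in a single pass with the standard str.maketrans() and str.translate() methods. Because translate() maps every character independently, ',' and '.' can be exchanged directly without a temporary placeholder:

```python
n = 1234567.5

# Chained replace() needs a placeholder ("X") so the two swaps don't collide.
chained = format(n, ",.2f").replace(",", "X").replace(".", ",").replace("X", ".")

# str.translate() substitutes all characters simultaneously, so ',' and '.'
# can be exchanged directly.
swap = str.maketrans(",.", ".,")
translated = format(n, ",.2f").translate(swap)

print(chained)     # 1.234.567,50
print(translated)  # 1.234.567,50
```

The translation table can be built once and reused for every number that needs the swapped convention.
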
The *width* argument means the total length including the commas
and decimal point::

    format(1234, "08,d")     --> '0001,234'
    format(1234.5, "08,.1f") --> '01,234.5'

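
A minimal check of how *width* interacts with the ',' option, assuming CPython 3. One detail differs from the first example above: when zero padding would otherwise leave a bare comma at the front, CPython adds one more zero instead, so format(1234, "08,d") yields '0,001,234' (nine characters, one more than the requested width). Space padding and the float example behave as described:

```python
# Space padding: width counts the digits, commas and decimal point.
print(repr(format(1234, "8,d")))       # '   1,234'
print(repr(format(1234.5, "08,.1f")))  # '01,234.5'

# Zero padding is grouped too; CPython widens the result by one
# character rather than start it with a bare comma.
print(repr(format(1234, "08,d")))      # '0,001,234'
print(repr(format(123456, "010,d")))   # '00,123,456'
```
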
The ',' option is defined as shown above for types 'd', 'e',
'f', 'g', 'E', 'G', '%', 'F' and ''. To allow future extensions, it is
undefined for other types: binary, octal, hex, character,
etc.

This proposal has the virtue of being simpler than the alternative
proposal but is much less flexible and meets the needs of fewer
users right out of the box. It is expected that some other
solution will arise for specifying alternative separators.

Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

.. _Python 2.6 docs: https://docs.python.org/2.6/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting

Research into what Other Languages Do
=====================================

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

`C-Sharp`_ provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator::

    String.Format("{0:n}", 12400) ==> "12,400"
    String.Format("{0:0,0}", 12400) ==> "12,400"

.. _`C-Sharp`: http://blog.stevex.net/index.php/string-formatting-in-csharp/

`Common Lisp`_ uses a COLON before the ``~D`` decimal type specifier to
emit a COMMA as a thousands separator. The general form of ``~D`` is
``~mincol,padchar,commachar,commaintervalD``. The *padchar* defaults
to SPACE. The *commachar* defaults to COMMA. The *commainterval*
defaults to three.

::

    (format nil "~:D" 229345007) => "229,345,007"

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

* The `Ada language`_ allows UNDERSCORES in its numeric literals.

.. _`Ada language`: http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html

Visual Basic and its brethren (like `MS Excel`_) use a completely
different style and have ultra-flexible custom format
specifiers like::

    "_($* #,##0_)"

.. _`MS Excel`: http://www.brainbell.com/tutorials/ms-office/excel/Create_Custom_Number_Formats.htm

`COBOL`_ uses picture clauses like::

    PICTURE $***,**9.99CR

.. _`COBOL`: http://en.wikipedia.org/wiki/Cobol#Syntactic_features

Java offers a `DecimalFormat class`_ that uses picture patterns (one
for positive numbers and an optional one for negatives) such as:
``"#,##0.00;(#,##0.00)"``. It allows arbitrary groupings including
hundreds and ten-thousands and uneven groupings. The special pattern
characters are non-localized (using a DOT for a decimal separator and
a COMMA for a grouping separator). The user can supply an alternate
set of symbols using the formatter's *DecimalFormatSymbols* object.

.. _`DecimalFormat class`: http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html

Alternative Proposal (from Eric Smith, originally called Proposal II)
=====================================================================

Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
The SPACE can be either U+0020 or U+00A0.

Whenever a separator is followed by a precision, it is a
decimal separator and an optional separator preceding it is a
thousands separator. When the precision is absent, a lone
separator means a thousands separator::

    [[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples::

    format(1234, "8.1f")    --> '  1234.0'
    format(1234, "8,1f")    --> '  1234,0'
    format(1234, "8.,1f")   --> ' 1.234,0'
    format(1234, "8 ,1f")   --> ' 1 234,0'
    format(1234, "8d")      --> '    1234'
    format(1234, "8,d")     --> '   1,234'
    format(1234, "8_d")     --> '   1_234'

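
The parsing rule can be sketched in Python. The function alt_format below is hypothetical: this alternative was not adopted, so no Python release accepts these specifiers. The sketch assumes a simplified grammar, [width][tsep][dsep precision][type] with types 'd', 'f' and 'F' only (fill, alignment, sign, '#' and '0' are left out), and it reuses the adopted ',' option before swapping in the requested separators with str.translate:

```python
import re

# Separators allowed by the alternative proposal: COMMA, DOT, SPACE
# (U+0020 or U+00A0), APOSTROPHE and UNDERSCORE.
SEPARATORS = ",. '_\u00a0"
SEP = "[" + re.escape(SEPARATORS) + "]"

# Simplified grammar: [width][tsep][dsep precision][type].  A separator
# directly followed by digits is the decimal separator; the regex tries
# tsep first and backtracks, so "8,1f" puts ',' in dsep, not tsep.
SPEC_RE = re.compile(
    r"(?P<width>\d*)"
    r"(?P<tsep>" + SEP + r")?"
    r"(?:(?P<dsep>" + SEP + r")(?P<precision>\d+))?"
    r"(?P<type>[dfF]?)"
)


def alt_format(value, spec):
    """Hypothetical formatter for the (unadopted) alternative proposal."""
    m = SPEC_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    width, tsep, dsep, precision, ftype = m.group(
        "width", "tsep", "dsep", "precision", "type")

    # Format with the standard ',' and '.' first ...
    std_spec = ("," if tsep else "") + ("." + precision if precision else "") + ftype
    text = format(value, std_spec)

    # ... then swap both placeholders in one pass so they cannot collide.
    text = text.translate(str.maketrans({",": tsep or ",", ".": dsep or "."}))

    # Numbers are right-aligned; width counts every separator.
    return text.rjust(int(width or 0))


for spec in ("8.1f", "8,1f", "8.,1f", "8 ,1f", "8d", "8,d", "8_d"):
    print(repr(alt_format(1234, spec)))
```

The backtracking step is where the "is this a thousands or a decimal separator?" question gets answered, which illustrates the parsing cost discussed below.
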
This proposal meets most needs, but it comes at the expense
of taking a bit more effort to parse. Not every possible
convention is covered, but at least one of the options (spaces
or underscores) should be readable, understandable, and useful
to folks from many diverse backgrounds.

As shown in the examples, the *width* argument means the total
length including the thousands separators and decimal separators.

No change is proposed for the locale module.

The thousands separator is defined as shown above for types
'd', 'e', 'f', 'g', '%', 'E', 'G' and 'F'. To allow future
extensions, it is undefined for other types: binary, octal,
hex, character, etc.

The drawback to this alternative proposal is the difficulty
of mentally parsing whether a single separator is a thousands
separator or decimal separator. Perhaps it is too arcane
to link the decimal separator with the precision specifier.

Commentary
==========

* Some commenters do not like the idea of format strings at all
  and find them to be unreadable. Suggested alternatives include
  the COBOL style PICTURE approach or a convenience function with
  keyword arguments for every possible combination.

* Some newsgroup respondents think there is no place for any
  scripts that are not internationalized and that it is a step
  backwards to provide a simple way to hardwire a particular choice
  (thus reducing incentive to use a locale sensitive approach).

* Another thought is that embedding some particular convention in
  individual format strings makes it hard to change that convention
  later. No workable alternative was suggested but the general idea
  is to set the convention once and have it apply everywhere (others
  commented that locale already provides a way to do this).

* There are some precedents for grouping digits in the fractional
  part of a floating point number, but this PEP does not venture into
  that territory. Only digits to the left of the decimal point are
  grouped. This does not preclude future extensions; it just focuses
  on a single, generally useful extension to the formatting language.

* James Knight observed that Indian/Pakistani numbering systems
  group by hundreds. Ben Finney noted that Chinese numbering groups by
  ten-thousands. Eric Smith pointed out that these are already
  handled by the "n" specifier in the locale module (albeit only
  for integers). This PEP does not attempt to support all of those
  possibilities. It focuses on a single, relatively common grouping
  convention that offers a quick way to improve readability in many
  (though not all) contexts.

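
To make the difference between these conventions concrete, here is a hypothetical helper, group_digits, which is illustrative only and is part of neither this PEP nor Python. It groups an integer's digits from the right using a list of group sizes whose last entry repeats, loosely modelled on the grouping list returned by locale.localeconv(). [3] gives the convention adopted here, [3, 2] gives Indian-style lakh/crore grouping, and [4] gives grouping by ten-thousands:

```python
def group_digits(n, sizes, sep=","):
    """Hypothetical helper: group the digits of integer n from the right.

    sizes lists group widths starting at the units end; the last width
    repeats for any remaining digits.
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    i = 0  # index into sizes
    while digits:
        size = sizes[min(i, len(sizes) - 1)]
        groups.append(digits[-size:])   # take the rightmost `size` digits
        digits = digits[:-size]         # drop them and continue leftwards
        i += 1
    return sign + sep.join(reversed(groups))


print(group_digits(1234567, [3]))      # 1,234,567
print(group_digits(12345678, [3, 2]))  # 1,23,45,678
print(group_digits(123456789, [4]))    # 1,2345,6789
```

Only the [3] case matches the ',' option proposed here; the others show why a single fixed grouping cannot serve every numbering system.
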
Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: