PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Post-History: 12-Mar-2009


Motivation
==========

Provide a simple, non-locale-aware way to format a number
with a thousands separator.

Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.

In the finance world, output with commas is the norm.  Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.


Current Version of the Mini-Language
====================================

* `Python 2.6 docs`_

  .. _Python 2.6 docs: http://docs.python.org/library/string.html#formatstrings

* PEP 3101 Advanced String Formatting


Research so far
===============

Scanning the web, I've found that thousands separators are
usually one of COMMA, DOT, SPACE, or UNDERSCORE.  
When a COMMA is the decimal separator, the thousands separator
is typically a DOT or SPACE (see examples from Denis Spir).

James Knight observed that Indian/Pakistani numbering systems
group by hundreds.  Ben Finney noted that Chinese group by
ten-thousands.  Eric Smith pointed out that these are already
handled by the "n" specifier in the locale module (albeit only
for integers).

Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".

`Common Lisp`_ uses a COLON before the type specifier to emit a COMMA
as a thousands separator::

    (format nil "The answer is ~:D." 229345007) 
                                => "The answer is 229,345,007."

.. _`Common Lisp`: http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html

Proposal I (from Nick Coghlan)
==============================

A comma will be added to the format() specifier mini-language::

  [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.

The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example::

  format(n, "6,f").replace(",", "_")

This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped::

  format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
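The following sketch illustrates the two substitutions described
above.  It assumes a Python 3 interpreter whose format() accepts the
',' option as proposed here.  The ``swap_separators`` helper is
hypothetical and is not part of the proposal; it uses a translation
table so that commas and periods can be exchanged in one pass without
a placeholder character.

```python
def swap_separators(text):
    """Exchange ',' and '.' in one pass (hypothetical helper)."""
    return text.translate(str.maketrans(",.", ".,"))


n = 1234567.891

# Comma as thousands separator, two digits after the decimal point.
us_style = format(n, ",.2f")               # '1,234,567.89'

# Substituting another separator is a single replace() call.
underscored = us_style.replace(",", "_")   # '1_234_567.89'

# Swapping commas and periods without the three-step replace() chain.
european = swap_separators(us_style)       # '1.234.567,89'
```

The translation table is only one possible way to perform the swap;
the chained replace() calls shown above give the same result.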


Proposal II (to meet Antoine Pitrou's request)
==============================================

Make both the thousands separator and decimal separator
user-specifiable but not locale-aware.  For simplicity, limit the
choices to a comma, period, space, or underscore::

  [[fill]align][sign][#][0][width][T[tsep]][dsep precision][type]

Examples::

  format(1234, "8.1f")    -->     '  1234.0'
  format(1234, "8,1f")    -->     '  1234,0'
  format(1234, "8T.,1f")  -->     ' 1.234,0'
  format(1234, "8T ,1f")  -->     ' 1 234,0'
  format(1234, "8d")      -->     '    1234'
  format(1234, "8T,d")    -->     '   1,234'

This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.  Also, it makes it more challenging to write custom
__format__ methods that follow the format specification
mini-language.

No change is proposed for the locale module.
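To make the grammar concrete, here is a minimal pure-Python sketch of
how a subset of it might be interpreted.  It is an illustration under
stated assumptions, not a reference implementation: the name
``format_proposal2`` is hypothetical, only ``[width]``, ``[T[tsep]]``,
``[dsep precision]`` and the ``d`` and ``f`` types are handled, output
is always right-aligned, and a bare ``T`` with no ``tsep`` is assumed
to mean a comma.

```python
import re

# Subset of the Proposal II grammar: [width][T[tsep]][dsep precision]type
_SPEC = re.compile(
    r"(?P<width>\d+)?"
    r"(?P<grouping>T(?P<tsep>[,. _])?)?"
    r"(?:(?P<dsep>[,. _])(?P<precision>\d+))?"
    r"(?P<type>[df])\Z"
)


def _group(digits, sep):
    """Insert sep between groups of three digits, counting from the right."""
    groups = []
    while len(digits) > 3:
        groups.append(digits[-3:])
        digits = digits[:-3]
    groups.append(digits)
    return sep.join(reversed(groups))


def format_proposal2(value, spec):
    """Format value according to the subset above (hypothetical helper)."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("unsupported format spec: %r" % (spec,))
    if m.group("type") == "d":
        if m.group("dsep") is not None:
            raise ValueError("precision is not allowed with type 'd'")
        int_part, frac_part = format(abs(value), "d"), ""
    else:
        # A default precision of 6 mirrors the existing 'f' behaviour.
        precision = int(m.group("precision") or 6)
        plain = format(abs(value), ".%df" % precision)
        int_part, _, frac_part = plain.partition(".")
    if m.group("grouping"):
        # Assumption: a bare 'T' with no tsep falls back to a comma.
        int_part = _group(int_part, m.group("tsep") or ",")
    text = int_part
    if frac_part:
        text += (m.group("dsep") or ".") + frac_part
    if value < 0:
        text = "-" + text
    return text.rjust(int(m.group("width") or 0))
```

For instance, ``format_proposal2(1234, "8T.,1f")`` returns
``' 1.234,0'`` and ``format_proposal2(1234, "8T,d")`` returns
``'   1,234'``.  Even this reduced subset needs a backtracking parser
to tell a ``tsep`` from a ``dsep``, which hints at the extra burden on
authors of custom __format__ methods.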


Other Ideas
===========

* Lie Ryan suggested a convenience function of the form::

    create_format(self, type='i', base=16, seppos=4, sep=':',
                  charset='0123456789abcdef', maxwidth=32,
                  minwidth=32, pad='0')

* Eric Smith would like the C version of the mini-language
  parser to be exposed.  That would make it easier to write
  custom __format__ methods.
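As a small illustration of the difficulty Eric Smith's suggestion
addresses, consider a custom __format__ method written without access
to the parser.  The ``Dollars`` class below is hypothetical.  It can
only pass the whole specifier through to the built-in numeric
formatter and decorate the result, so it cannot take its own prefix
into account when the caller supplies a width.

```python
class Dollars:
    """Hypothetical value type used only for illustration."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # With no parser available, delegate the whole spec to the
        # built-in formatter and prepend the currency symbol afterwards.
        return "$" + format(self.amount, spec or ".2f")


plain = format(Dollars(1234.5), "")        # '$1234.50'
padded = format(Dollars(1234.5), "10.1f")  # '$    1234.5'
```

In the second call the padding lands between the symbol and the
digits, and the result is eleven characters wide rather than ten.
Correcting this would require the method to parse the fill, alignment
and width fields itself.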


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: