python-peps/pep-0515.txt

PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016

Abstract and Rationale
======================

This PEP proposes to extend Python's syntax so that underscores can be used as
visual separators for digit grouping purposes in integral, floating-point and
complex number literals.

This is a common feature of other modern languages, and can aid readability of
long literals, or literals whose value should clearly separate into parts, such
as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xDEAD_BEEF

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # making the literal suffix stand out more
    imag = 1.247812376e-15_j


Specification
=============

The current proposal is to allow one or more consecutive underscores following
digits and base specifiers in numeric literals.  The underscores have no
semantic meaning, and literals are parsed as if the underscores were absent.

The production list for integer literals would therefore look like this::

   integer: decimalinteger | octinteger | hexinteger | bininteger
   decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
   nonzerodigit: "1"..."9"
   digit: "0"..."9"
   octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
   hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
   bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
   octdigit: "0"..."7"
   hexdigit: digit | "a"..."f" | "A"..."F"
   bindigit: "0" | "1"

For floating-point and complex literals::

   floatnumber: pointfloat | exponentfloat
   pointfloat: [intpart] fraction | intpart "."
   exponentfloat: (intpart | pointfloat) exponent
   intpart: digit (digit | "_")*
   fraction: "." intpart
   exponent: ("e" | "E") ["+" | "-"] intpart
   imagnumber: (floatnumber | intpart) ("j" | "J")


Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the liberal rule specified above, the use of underscores could be
limited.  Common rules are (see the "other languages" section):

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscore allowed, but only between digits.

A less common rule would be to allow underscores only every N digits (where N
could be 3 for decimal literals, or 4 for hexadecimal ones).  This is
unnecessarily restrictive, especially considering the separator placement is
different in different cultures.

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping.  Although
strings are a precedent for combining adjoining literals, the behavior can lead
to unexpected effects which are not possible with underscores.  Also, no other
language is known to use this rule, except for languages that generally
disregard any whitespace.

C++14 introduces apostrophes for grouping, which is not considered due to the
conflict with Python's string literals. [1]_


Behavior in Other Languages
===========================

Those languages that do allow underscore grouping implement a large variety of
rules for allowed placement of underscores.  This is a listing placing the known
rules into three major groups.  In cases where the language spec contradicts the
actual behavior, the actual behavior is listed.

**Group 1: liberal**

This group is the least homogeneous: the rules vary slightly between languages.
All of them allow trailing underscores.  Some allow underscores after non-digits
like the ``e`` or the sign in exponents.

* D [2]_
* Perl 5 (underscores basically allowed anywhere, although docs say it's more
  restricted) [3]_
* Rust (allows between exponent sign and digits) [4]_
* Swift (although textual description says "between digits") [5]_

**Group 2: only between digits, multiple consecutive underscores**

* C# (open proposal for 7.0) [6]_
* Java [7]_

**Group 3: only between digits, only one underscore**

* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_


Implementation
==============

A preliminary patch that implements the specification given above has been
posted to the issue tracker. [11]_


Open Questions
==============

This PEP currently only proposes changing the literal syntax.  The following
extensions are open for discussion:

* Allowing underscores in string arguments to the ``Decimal`` constructor.  It
  could be argued that these are akin to literals, since there is no Decimal
  literal available (yet).

* Allowing underscores in string arguments to ``int()`` with base argument 0,
  ``float()`` and ``complex()``.


References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] http://bugs.python.org/issue26331


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: