PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl, Serhiy Storchaka
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016


Abstract and Rationale
======================

This PEP proposes to extend Python's syntax and number-from-string
constructors so that underscores can be used as visual separators for
digit grouping purposes in integral, floating-point and complex number
literals.

This is a common feature of other modern languages, and can aid
readability of long literals, or literals whose value should clearly
separate into parts, such as bytes or words in hexadecimal notation.

Examples::

    # grouping decimal numbers by thousands
    amount = 10_000_000.0

    # grouping hexadecimal addresses by words
    addr = 0xCAFE_F00D

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # same, for string conversions
    flags = int('0b_1111_0000', 2)
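
As a quick check of the examples above, the following sketch compares each grouped literal with its ungrouped spelling. It assumes an interpreter that implements this PEP (the header targets Python 3.6); on older interpreters the literals are a syntax error.

```python
# Assumes an interpreter that implements this PEP (targeted at Python 3.6).
amount = 10_000_000.0
addr = 0xCAFE_F00D
flags = 0b_0011_1111_0100_1110

# Underscores carry no semantic meaning: each grouped literal equals
# its ungrouped spelling.
assert amount == 10000000.0
assert addr == 0xCAFEF00D
assert flags == 0b0011111101001110

# The string conversion follows the same placement rules.
assert int('0b_1111_0000', 2) == 0b11110000
```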


Specification
=============

The current proposal is to allow one underscore between digits, and
after base specifiers in numeric literals. The underscores have no
semantic meaning, and literals are parsed as if the underscores were
absent.

Literal Grammar
---------------

The production list for integer literals would therefore look like
this::

    integer: decinteger | bininteger | octinteger | hexinteger
    decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
    bininteger: "0" ("b" | "B") (["_"] bindigit)+
    octinteger: "0" ("o" | "O") (["_"] octdigit)+
    hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
    nonzerodigit: "1"..."9"
    digit: "0"..."9"
    bindigit: "0" | "1"
    octdigit: "0"..."7"
    hexdigit: digit | "a"..."f" | "A"..."F"
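
To make the integer productions concrete, here is a minimal sketch that transcribes them by hand into a regular expression. The names ``INTEGER`` and ``is_integer_literal`` are hypothetical helpers introduced for illustration; they are not part of the proposal or of the tokenizer.

```python
import re

# Hypothetical pattern, transcribed by hand from the production list
# above; it is an illustration, not the tokenizer's implementation.
INTEGER = re.compile(r"""
    (?: [1-9](?:_?[0-9])*          # decinteger, nonzero leading digit
      | 0(?:_?0)*                  # decinteger made only of zeros
      | 0[bB](?:_?[01])+           # bininteger
      | 0[oO](?:_?[0-7])+          # octinteger
      | 0[xX](?:_?[0-9a-fA-F])+    # hexinteger
    )
""", re.VERBOSE)


def is_integer_literal(text):
    """Return True if the whole string matches the integer production."""
    return INTEGER.fullmatch(text) is not None


# One underscore between digits or after a base specifier is accepted.
assert is_integer_literal("1_000_000")
assert is_integer_literal("0x_CAFE_F00D")
# Doubled, leading and trailing underscores are rejected.
assert not is_integer_literal("1__000")
assert not is_integer_literal("_1")
assert not is_integer_literal("1_")
```

Each ``(["_"] digit)`` group in the grammar becomes ``_?[0-9]`` in the pattern, which is why an underscore can never appear twice in a row or at the end of a literal.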

For floating-point and complex literals::

    floatnumber: pointfloat | exponentfloat
    pointfloat: [digitpart] fraction | digitpart "."
    exponentfloat: (digitpart | pointfloat) exponent
    digitpart: digit (["_"] digit)*
    fraction: "." digitpart
    exponent: ("e" | "E") ["+" | "-"] digitpart
    imagnumber: (floatnumber | digitpart) ("j" | "J")
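
The floating-point and imaginary productions can be transcribed the same way. The sketch below builds one pattern per production; all names (``DIGITPART``, ``is_float_literal`` and so on) are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical patterns, one per production above, transcribed by hand.
DIGITPART = r"[0-9](?:_?[0-9])*"
FRACTION = rf"\.{DIGITPART}"
EXPONENT = rf"[eE][+-]?{DIGITPART}"
POINTFLOAT = rf"(?:(?:{DIGITPART})?{FRACTION}|{DIGITPART}\.)"
EXPONENTFLOAT = rf"(?:{DIGITPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})"
IMAGNUMBER = rf"(?:{FLOATNUMBER}|{DIGITPART})[jJ]"


def is_float_literal(text):
    """Return True if the whole string matches floatnumber."""
    return re.fullmatch(FLOATNUMBER, text) is not None


def is_imag_literal(text):
    """Return True if the whole string matches imagnumber."""
    return re.fullmatch(IMAGNUMBER, text) is not None


assert is_float_literal("10_000_000.0")
assert is_float_literal("1_000.000_1e-1_0")
# An underscore must sit between two digits, so it cannot touch the
# decimal point or the exponent marker.
assert not is_float_literal("1._5")
assert not is_float_literal("1_e5")
assert is_imag_literal("1_000j")
```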

Constructors
------------

Following the same rules for placement, underscores will be allowed in
the following constructors:

- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``
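
The following sketch exercises the listed constructors with grouped input. It assumes an interpreter that implements this PEP; ``accepts`` is a hypothetical helper used only to show which placements are rejected.

```python
from decimal import Decimal

# Assumes an interpreter that implements this PEP (targeted at Python 3.6).
assert int('1_000_000') == 1000000
assert int('CAFE_F00D', 16) == 0xCAFEF00D
assert float('1_000.000_1') == 1000.0001
assert complex('1_000+2_000j') == complex(1000, 2000)
assert Decimal('1_000.50') == Decimal('1000.50')


def accepts(constructor, text):
    """Hypothetical helper: True if constructor(text) does not raise ValueError."""
    try:
        constructor(text)
    except ValueError:
        return False
    return True


# Placements outside the literal rules are rejected by int() and float().
assert not accepts(int, '1__000')
assert not accepts(int, '_1000')
assert not accepts(float, '1000_')
```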


Prior Art
=========

Those languages that do allow underscore grouping implement a large
variety of rules for allowed placement of underscores. In cases where
the language spec contradicts the actual behavior, the actual behavior
is listed. ("Single" and "multiple" indicate whether runs of
consecutive underscores are allowed.)

* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++14: single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts)
  [9]_
* Perl 5: multiple, basically anywhere, although docs say it's
  restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere")
  [10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits
  [4]_
* Swift: multiple, between digits and trailing (although textual
  description says only "between digits") [5]_


Alternative Syntax
==================

Underscore Placement Rules
--------------------------

Instead of the relatively strict rule specified above, the use of
underscores could be governed by other rules. As seen in other
languages, common rules include:

* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
  for the start of the literal, or special positions like after a
  decimal point.

The syntax in this PEP has ultimately been selected because it covers
the common use cases, and does not allow for syntax that would have to
be discouraged in style guides anyway.

A less common rule would be to allow underscores only every N digits
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the
separator placement is different in different cultures.
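
To illustrate the point about cultures, the sketch below writes the same value with three grouping conventions, all of which the proposed syntax permits and a fixed "every N digits" rule would not. It assumes an interpreter that implements this PEP; the variable names are illustrative only.

```python
# Assumes an interpreter that implements this PEP.
western = 10_000_000      # groups of three
indian = 1_00_00_000      # three digits, then pairs (lakh, crore)
east_asian = 1000_0000    # groups of four (myriads)

# The grouping is purely visual, so all three are the same integer.
assert western == indian == east_asian == 10000000
```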

Different Separators
--------------------

A proposed alternate syntax was to use whitespace for grouping.
Although strings are a precedent for combining adjoining literals, the
behavior can lead to unexpected effects which are not possible with
underscores. Also, no other language is known to use this rule,
except for languages that generally disregard any whitespace.
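
The string precedent shows the kind of surprise meant here. In this small sketch a forgotten comma silently merges two list items; whitespace-grouped numbers would, hypothetically, be open to the same mistake, whereas an underscore cannot be introduced by accident in this way.

```python
# Adjacent string literals are concatenated, so a forgotten comma
# merges two list items instead of raising an error.
names = [
    'alpha',
    'beta'      # missing comma after this item
    'gamma',
]

assert names == ['alpha', 'betagamma']
assert len(names) == 2
```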

C++14 introduces apostrophes for grouping (because underscores
introduce ambiguity with user-defined literals). Apostrophes are not
considered here because they already delimit Python's string
literals. [1]_


Open Proposals
==============

It has been proposed [11]_ to extend the number-to-string formatting
language to allow ``_`` as a thousands separator, where currently only
``,`` is supported. This could be used to easily generate code with
more readable literals.
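
A sketch of how such generated code could look, assuming ``_`` is accepted in the same position of the format specification where ``,`` is accepted today. That is an assumption about the open proposal, so the snippet runs only on an interpreter that supports the ``_`` option (Python 3.6 and later do).

```python
value = 10000000

# Existing behavior: "," as the thousands separator.
assert format(value, ',') == '10,000,000'

# Assumed behavior of the proposal: "_" in the same position.
grouped = format(value, '_')
assert grouped == '10_000_000'

# The output is itself valid input under this PEP's rules, so it can be
# pasted into generated source code or passed back to int().
line = 'amount = {:_}'.format(value)
assert line == 'amount = 10_000_000'
assert int(grouped) == value
```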


Implementation
==============

A preliminary patch that implements the specification given above has
been posted to the issue tracker. [12]_


References
==========

.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

.. [2] http://dlang.org/spec/lex.html#integerliteral

.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

.. [4] http://doc.rust-lang.org/reference.html#number-literals

.. [5] https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

.. [6] https://github.com/dotnet/roslyn/issues/216

.. [7] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html

.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4

.. [9] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/

.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers

.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html

.. [12] http://bugs.python.org/issue26331


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: