PEP 515: major revision. Use rules preferred by Guido.

Georg Brandl 2016-02-13 09:43:02 +01:00
parent 2002aa056a
commit 3693b34730
1 changed files with 102 additions and 85 deletions


@ -2,7 +2,7 @@ PEP: 515
Title: Underscores in Numeric Literals Title: Underscores in Numeric Literals
Version: $Revision$ Version: $Revision$
Last-Modified: $Date$ Last-Modified: $Date$
Author: Georg Brandl Author: Georg Brandl, Serhiy Storchaka
Status: Draft Status: Draft
Type: Standards Track Type: Standards Track
Content-Type: text/x-rst Content-Type: text/x-rst
@ -13,13 +13,14 @@ Post-History: 10-Feb-2016, 11-Feb-2016
Abstract and Rationale Abstract and Rationale
====================== ======================
This PEP proposes to extend Python's syntax so that underscores can be used as This PEP proposes to extend Python's syntax and number-from-string
visual separators for digit grouping purposes in integral, floating-point and constructors so that underscores can be used as visual separators for
complex number literals. digit grouping purposes in integral, floating-point and complex number
literals.
This is a common feature of other modern languages, and can aid readability of This is a common feature of other modern languages, and can aid
long literals, or literals whose value should clearly separate into parts, such readability of long literals, or literals whose value should clearly
as bytes or words in hexadecimal notation. separate into parts, such as bytes or words in hexadecimal notation.
Examples:: Examples::
@ -32,39 +33,81 @@ Examples::
# grouping bits into nibbles in a binary literal # grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110 flags = 0b_0011_1111_0100_1110
# making the literal suffix stand out more # same, for string conversions
imag = 1.247812376e-15_j flags = int('0b_1111_0000', 2)
Specification Specification
============= =============
The current proposal is to allow one or more consecutive underscores following The current proposal is to allow one underscore between digits, and
digits and base specifiers in numeric literals. The underscores have no after base specifiers in numeric literals. The underscores have no
semantic meaning, and literals are parsed as if the underscores were absent. semantic meaning, and literals are parsed as if the underscores were
absent.
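The placement rule above can be checked against a running interpreter. The following is a minimal sketch, assuming Python 3.6 or later (where this syntax is available); the helper name ``is_valid_literal`` and the sample strings are illustrative and not part of the proposal.

```python
def is_valid_literal(text):
    """Return True if `text` compiles as a Python expression."""
    try:
        compile(text, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# One underscore between digits, or directly after a base specifier.
accepted = ["1_000_000", "0x_FF_FF", "0b_0011_1111_0100_1110", "1_000.000_1"]

# Doubled underscores and trailing underscores are not allowed.
rejected = ["1__000", "1000_", "0x__FF", "0xFF_"]

for text in accepted + rejected:
    print(text, is_valid_literal(text))
```

A leading underscore is deliberately left out of the samples: a string such as ``_1000`` is a valid identifier, so it compiles as a name rather than failing as a literal.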
The production list for integer literals would therefore look like this:: Literal Grammar
---------------
integer: decimalinteger | octinteger | hexinteger | bininteger The production list for integer literals would therefore look like
decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")* this::
integer: decinteger | bininteger | octinteger | hexinteger
decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
bininteger: "0" ("b" | "B") (["_"] bindigit)+
octinteger: "0" ("o" | "O") (["_"] octdigit)+
hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit: "1"..."9" nonzerodigit: "1"..."9"
digit: "0"..."9" digit: "0"..."9"
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* bindigit: "0" | "1"
hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
octdigit: "0"..."7" octdigit: "0"..."7"
hexdigit: digit | "a"..."f" | "A"..."F" hexdigit: digit | "a"..."f" | "A"..."F"
bindigit: "0" | "1"
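As an illustration only, the new integer productions can be transliterated into a regular expression. This is a sketch for experimenting with the grammar, not how the tokenizer is implemented; the names ``INTEGER_RE`` and ``matches_integer_grammar`` are hypothetical.

```python
import re

# Each alternative mirrors one production of the revised grammar.
INTEGER_RE = re.compile(
    r"""
    (?:
        [1-9](?:_?[0-9])*          # decinteger: nonzerodigit (["_"] digit)*
      | 0(?:_?0)*                  # decinteger: "0" (["_"] "0")*
      | 0[bB](?:_?[01])+           # bininteger
      | 0[oO](?:_?[0-7])+          # octinteger
      | 0[xX](?:_?[0-9a-fA-F])+    # hexinteger
    )
    """,
    re.VERBOSE,
)


def matches_integer_grammar(text):
    """Return True if `text` is derivable from the `integer` production."""
    return INTEGER_RE.fullmatch(text) is not None


for sample in ["1_000", "0x_FF", "0_0", "1__000", "1_", "0_1", "0x"]:
    print(sample, matches_integer_grammar(sample))
```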
For floating-point and complex literals:: For floating-point and complex literals::
floatnumber: pointfloat | exponentfloat floatnumber: pointfloat | exponentfloat
pointfloat: [intpart] fraction | intpart "." pointfloat: [digitpart] fraction | digitpart "."
exponentfloat: (intpart | pointfloat) exponent exponentfloat: (digitpart | pointfloat) exponent
intpart: digit (digit | "_")* digitpart: digit (["_"] digit)*
fraction: "." intpart fraction: "." digitpart
exponent: ("e" | "E") ["+" | "-"] intpart exponent: ("e" | "E") ["+" | "-"] digitpart
imagnumber: (floatnumber | intpart) ("j" | "J") imagnumber: (floatnumber | digitpart) ("j" | "J")
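The floating-point and imaginary productions can be sketched the same way, building each pattern from the one below it. Again this is only an illustration of the revised grammar; all names in the block are hypothetical.

```python
import re

# Building blocks, named after the productions they mirror.
DIGITPART = r"[0-9](?:_?[0-9])*"
FRACTION = rf"\.{DIGITPART}"
EXPONENT = rf"[eE][+-]?{DIGITPART}"
POINTFLOAT = rf"(?:(?:{DIGITPART})?{FRACTION}|{DIGITPART}\.)"
EXPONENTFLOAT = rf"(?:{DIGITPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})"
IMAGNUMBER = rf"(?:{FLOATNUMBER}|{DIGITPART})[jJ]"

FLOAT_RE = re.compile(FLOATNUMBER)
IMAG_RE = re.compile(IMAGNUMBER)


def matches_float_grammar(text):
    """Return True if `text` is derivable from `floatnumber`."""
    return FLOAT_RE.fullmatch(text) is not None


def matches_imag_grammar(text):
    """Return True if `text` is derivable from `imagnumber`."""
    return IMAG_RE.fullmatch(text) is not None


for sample in ["1_000.000_1", "1e-1_0", "1.", "1_.5", "1._5", "1e_5"]:
    print(sample, matches_float_grammar(sample))
```

Under these productions an underscore can never be adjacent to the decimal point, the exponent marker, or the ``j`` suffix, because it may only appear between two digits of a ``digitpart``.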
Constructors
------------
Following the same rules for placement, underscores will be allowed in
the following constructors:
- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``
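A short sketch of the constructor behaviour, assuming an interpreter in which this proposal is implemented (Python 3.6 or later). The helper ``is_rejected`` is hypothetical and only wraps the ``ValueError`` check.

```python
from decimal import Decimal


def is_rejected(constructor, text):
    """Return True if `constructor(text)` raises ValueError."""
    try:
        constructor(text)
    except ValueError:
        return True
    return False


# Underscores between digits, and after a base specifier, are ignored.
million = int("1_000_000")
flags = int("0b_1111_0000", 2)
mask = int("0x_FF_FF", 16)
ratio = float("1_000.5")
point = complex("1_0+2_0j")
price = Decimal("1_000.25")

print(million, flags, mask, ratio, point, price)

# Doubled, leading, or trailing underscores are refused.
print(is_rejected(int, "1__000"))
print(is_rejected(int, "_1000"))
print(is_rejected(float, "1_000_"))
```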
Prior Art
=========
Those languages that do allow underscore grouping implement a large
variety of rules for allowed placement of underscores. In cases where
the language spec contradicts the actual behavior, the actual behavior
is listed. ("single" or "multiple" refer to allowing runs of
consecutive underscores.)
* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++ (C++14): single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts)
[9]_
* Perl 5: multiple, basically anywhere, although docs say it's
restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere")
[10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits
[4]_
* Swift: multiple, between digits and trailing (although textual
description says only "between digits") [5]_
Alternative Syntax Alternative Syntax
@ -73,81 +116,53 @@ Alternative Syntax
Underscore Placement Rules Underscore Placement Rules
-------------------------- --------------------------
Instead of the liberal rule specified above, the use of underscores could be Instead of the relatively strict rule specified above, the use of
limited. Common rules are (see the "other languages" section): underscores could be less limited. As we have seen from other languages, common
rules include:
* Only one consecutive underscore allowed, and only between digits. * Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscore allowed, but only between digits. * Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
for the start of the literal, or special positions like after a
decimal point.
A less common rule would be to allow underscores only every N digits (where N The syntax in this PEP has ultimately been selected because it covers
could be 3 for decimal literals, or 4 for hexadecimal ones). This is the common use cases, and does not allow for syntax that would have to
unnecessarily restrictive, especially considering the separator placement is be discouraged in style guides anyway.
different in different cultures.
A less common rule would be to allow underscores only every N digits
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the
separator placement is different in different cultures.
Different Separators Different Separators
-------------------- --------------------
A proposed alternate syntax was to use whitespace for grouping. Although A proposed alternate syntax was to use whitespace for grouping.
strings are a precedent for combining adjoining literals, the behavior can lead Although strings are a precedent for combining adjoining literals, the
to unexpected effects which are not possible with underscores. Also, no other behavior can lead to unexpected effects which are not possible with
language is known to use this rule, except for languages that generally underscores. Also, no other language is known to use this rule,
disregard any whitespace. except for languages that generally disregard any whitespace.
C++14 introduces apostrophes for grouping, which is not considered due to the C++14 introduces apostrophes for grouping (because underscores introduce
conflict with Python's string literals. [1]_ ambiguity with user-defined literals), which is not considered because of the
use in Python's string literals. [1]_
Behavior in Other Languages Open Proposals
=========================== ==============
Those languages that do allow underscore grouping implement a large variety of It has been proposed [11]_ to extend the number-to-string formatting
rules for allowed placement of underscores. This is a listing placing the known language to allow ``_`` as a thousands separator, where currently only
rules into three major groups. In cases where the language spec contradicts the ``,`` is supported. This could be used to easily generate code with
actual behavior, the actual behavior is listed. more readable literals.
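For illustration, this is how the formatting extension behaves in interpreters that provide it (Python 3.6 and later accept ``_`` as a grouping option). At the time of this revision it was still only a proposal, so the block should be read as a sketch of the idea rather than as part of this commit.

```python
value = 1234567
word = 0xDEADBEEF

# "_" groups decimal digits in threes and hexadecimal digits in fours.
decimal_text = format(value, "_")
hex_text = format(word, "_x")

print(decimal_text)
print(hex_text)

# The generated text reads back as the same number, which is what makes
# the option useful for emitting source code with readable literals.
round_trip = int(decimal_text)
print(round_trip == value)
```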
**Group 1: liberal**
This group is the least homogeneous: the rules vary slightly between languages.
All of them allow trailing underscores. Some allow underscores after non-digits
like the ``e`` or the sign in exponents.
* D [2]_
* Perl 5 (underscores basically allowed anywhere, although docs say it's more
restricted) [3]_
* Rust (allows between exponent sign and digits) [4]_
* Swift (although textual description says "between digits") [5]_
**Group 2: only between digits, multiple consecutive underscores**
* C# (open proposal for 7.0) [6]_
* Java [7]_
**Group 3: only between digits, only one underscore**
* Ada [8]_
* Julia (but not in the exponent part of floats) [9]_
* Ruby (docs say "anywhere", in reality only between digits) [10]_
Implementation Implementation
============== ==============
A preliminary patch that implements the specification given above has been A preliminary patch that implements the specification given above has
posted to the issue tracker. [11]_ been posted to the issue tracker. [12]_
Open Questions
==============
This PEP currently only proposes changing the literal syntax. The following
extensions are open for discussion:
* Allowing underscores in string arguments to the ``Decimal`` constructor. It
could be argued that these are akin to literals, since there is no Decimal
literal available (yet).
* Allowing underscores in string arguments to ``int()`` with base argument 0,
``float()`` and ``complex()``.
References References
@ -173,7 +188,9 @@ References
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers .. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
.. [11] http://bugs.python.org/issue26331 .. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html
.. [12] http://bugs.python.org/issue26331
Copyright Copyright