From 3693b347307ad9f10de4d5895628ee632d781c2e Mon Sep 17 00:00:00 2001 From: Georg Brandl Date: Sat, 13 Feb 2016 09:43:02 +0100 Subject: [PATCH] PEP 515: major revision. Use rules preferred by Guido. --- pep-0515.txt | 187 ++++++++++++++++++++++++++++----------------------- 1 file changed, 102 insertions(+), 85 deletions(-) diff --git a/pep-0515.txt b/pep-0515.txt index 80f0d3e89..05e789103 100644 --- a/pep-0515.txt +++ b/pep-0515.txt @@ -2,7 +2,7 @@ PEP: 515 Title: Underscores in Numeric Literals Version: $Revision$ Last-Modified: $Date$ -Author: Georg Brandl +Author: Georg Brandl, Serhiy Storchaka Status: Draft Type: Standards Track Content-Type: text/x-rst @@ -13,13 +13,14 @@ Post-History: 10-Feb-2016, 11-Feb-2016 Abstract and Rationale ====================== -This PEP proposes to extend Python's syntax so that underscores can be used as -visual separators for digit grouping purposes in integral, floating-point and -complex number literals. +This PEP proposes to extend Python's syntax and number-from-string +constructors so that underscores can be used as visual separators for +digit grouping purposes in integral, floating-point and complex number +literals. -This is a common feature of other modern languages, and can aid readability of -long literals, or literals whose value should clearly separate into parts, such -as bytes or words in hexadecimal notation. +This is a common feature of other modern languages, and can aid +readability of long literals, or literals whose value should clearly +separate into parts, such as bytes or words in hexadecimal notation. 
Examples:: @@ -32,39 +33,81 @@ Examples:: # grouping bits into nibbles in a binary literal flags = 0b_0011_1111_0100_1110 - # making the literal suffix stand out more - imag = 1.247812376e-15_j + # same, for string conversions + flags = int('0b_1111_0000', 2) Specification ============= -The current proposal is to allow one or more consecutive underscores following -digits and base specifiers in numeric literals. The underscores have no -semantic meaning, and literals are parsed as if the underscores were absent. +The current proposal is to allow one underscore between digits, and +after base specifiers in numeric literals. The underscores have no +semantic meaning, and literals are parsed as if the underscores were +absent. -The production list for integer literals would therefore look like this:: +Literal Grammar +--------------- - integer: decimalinteger | octinteger | hexinteger | bininteger - decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")* +The production list for integer literals would therefore look like +this:: + + integer: decinteger | bininteger | octinteger | hexinteger + decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")* + bininteger: "0" ("b" | "B") (["_"] bindigit)+ + octinteger: "0" ("o" | "O") (["_"] octdigit)+ + hexinteger: "0" ("x" | "X") (["_"] hexdigit)+ nonzerodigit: "1"..."9" digit: "0"..."9" - octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")* - hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")* - bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")* + bindigit: "0" | "1" octdigit: "0"..."7" hexdigit: digit | "a"..."f" | "A"..."F" - bindigit: "0" | "1" For floating-point and complex literals:: floatnumber: pointfloat | exponentfloat - pointfloat: [intpart] fraction | intpart "." - exponentfloat: (intpart | pointfloat) exponent - intpart: digit (digit | "_")* - fraction: "." 
intpart - exponent: ("e" | "E") ["+" | "-"] intpart - imagnumber: (floatnumber | intpart) ("j" | "J") + pointfloat: [digitpart] fraction | digitpart "." + exponentfloat: (digitpart | pointfloat) exponent + digitpart: digit (["_"] digit)* + fraction: "." digitpart + exponent: ("e" | "E") ["+" | "-"] digitpart + imagnumber: (floatnumber | digitpart) ("j" | "J") + +Constructors +------------ + +Following the same rules for placement, underscores will be allowed in +the following constructors: + +- ``int()`` (with any base) +- ``float()`` +- ``complex()`` +- ``Decimal()`` + + +Prior Art +========= + +Those languages that do allow underscore grouping implement a large +variety of rules for allowed placement of underscores. In cases where +the language spec contradicts the actual behavior, the actual behavior +is listed. ("single" or "multiple" refer to allowing runs of +consecutive underscores.) + +* Ada: single, only between digits [8]_ +* C# (open proposal for 7.0): multiple, only between digits [6]_ +* C++ (C++14): single, between digits (different separator chosen) [1]_ +* D: multiple, anywhere, including trailing [2]_ +* Java: multiple, only between digits [7]_ +* Julia: single, only between digits (but not in float exponent parts) + [9]_ +* Perl 5: multiple, basically anywhere, although docs say it's + restricted to one underscore between digits [3]_ +* Ruby: single, only between digits (although docs say "anywhere") + [10]_ +* Rust: multiple, anywhere, except for between exponent "e" and digits + [4]_ +* Swift: multiple, between digits and trailing (although textual + description says only "between digits") [5]_ Alternative Syntax @@ -73,81 +116,53 @@ Alternative Syntax Underscore Placement Rules -------------------------- -Instead of the liberal rule specified above, the use of underscores could be -limited. Common rules are (see the "other languages" section): +Instead of the relatively strict rule specified above, the use of +underscores could be less limited. 
As we have seen from other languages, common +rules include: * Only one consecutive underscore allowed, and only between digits. -* Multiple consecutive underscore allowed, but only between digits. +* Multiple consecutive underscores allowed, but only between digits. +* Multiple consecutive underscores allowed, in most positions except + for the start of the literal, or special positions like after a + decimal point. -A less common rule would be to allow underscores only every N digits (where N -could be 3 for decimal literals, or 4 for hexadecimal ones). This is -unnecessarily restrictive, especially considering the separator placement is -different in different cultures. +The syntax in this PEP has ultimately been selected because it covers +the common use cases, and does not allow for syntax that would have to +be discouraged in style guides anyway. + +A less common rule would be to allow underscores only every N digits +(where N could be 3 for decimal literals, or 4 for hexadecimal ones). +This is unnecessarily restrictive, especially considering the +separator placement is different in different cultures. Different Separators -------------------- -A proposed alternate syntax was to use whitespace for grouping. Although -strings are a precedent for combining adjoining literals, the behavior can lead -to unexpected effects which are not possible with underscores. Also, no other -language is known to use this rule, except for languages that generally -disregard any whitespace. +A proposed alternate syntax was to use whitespace for grouping. +Although strings are a precedent for combining adjoining literals, the +behavior can lead to unexpected effects which are not possible with +underscores. Also, no other language is known to use this rule, +except for languages that generally disregard any whitespace. -C++14 introduces apostrophes for grouping, which is not considered due to the -conflict with Python's string literals. 
[1]_ +C++14 introduces apostrophes for grouping (because underscores introduce +ambiguity with user-defined literals), which is not considered because of the +use in Python's string literals. [1]_ -Behavior in Other Languages -=========================== +Open Proposals +============== -Those languages that do allow underscore grouping implement a large variety of -rules for allowed placement of underscores. This is a listing placing the known -rules into three major groups. In cases where the language spec contradicts the -actual behavior, the actual behavior is listed. - -**Group 1: liberal** - -This group is the least homogeneous: the rules vary slightly between languages. -All of them allow trailing underscores. Some allow underscores after non-digits -like the ``e`` or the sign in exponents. - -* D [2]_ -* Perl 5 (underscores basically allowed anywhere, although docs say it's more - restricted) [3]_ -* Rust (allows between exponent sign and digits) [4]_ -* Swift (although textual description says "between digits") [5]_ - -**Group 2: only between digits, multiple consecutive underscores** - -* C# (open proposal for 7.0) [6]_ -* Java [7]_ - -**Group 3: only between digits, only one underscore** - -* Ada [8]_ -* Julia (but not in the exponent part of floats) [9]_ -* Ruby (docs say "anywhere", in reality only between digits) [10]_ +It has been proposed [11]_ to extend the number-to-string formatting +language to allow ``_`` as a thousands separator, where currently only +``,`` is supported. This could be used to easily generate code with +more readable literals. Implementation ============== -A preliminary patch that implements the specification given above has been -posted to the issue tracker. [11]_ - - -Open Questions -============== - -This PEP currently only proposes changing the literal syntax. The following -extensions are open for discussion: - -* Allowing underscores in string arguments to the ``Decimal`` constructor. 
It - could be argued that these are akin to literals, since there is no Decimal - literal available (yet). - -* Allowing underscores in string arguments to ``int()`` with base argument 0, - ``float()`` and ``complex()``. +A preliminary patch that implements the specification given above has +been posted to the issue tracker. [12]_ References @@ -173,7 +188,9 @@ References .. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers -.. [11] http://bugs.python.org/issue26331 +.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html + +.. [12] http://bugs.python.org/issue26331 Copyright