PEP 515: major revision. Use rules preferred by Guido.
parent
2002aa056a
commit
3693b34730
187
pep-0515.txt
|
@ -2,7 +2,7 @@ PEP: 515
|
|||
Title: Underscores in Numeric Literals
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Georg Brandl
|
||||
Author: Georg Brandl, Serhiy Storchaka
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
|
@ -13,13 +13,14 @@ Post-History: 10-Feb-2016, 11-Feb-2016
|
|||
Abstract and Rationale
|
||||
======================
|
||||
|
||||
This PEP proposes to extend Python's syntax so that underscores can be used as
|
||||
visual separators for digit grouping purposes in integral, floating-point and
|
||||
complex number literals.
|
||||
This PEP proposes to extend Python's syntax and number-from-string
|
||||
constructors so that underscores can be used as visual separators for
|
||||
digit grouping purposes in integral, floating-point and complex number
|
||||
literals.
|
||||
|
||||
This is a common feature of other modern languages, and can aid readability of
|
||||
long literals, or literals whose value should clearly separate into parts, such
|
||||
as bytes or words in hexadecimal notation.
|
||||
This is a common feature of other modern languages, and can aid
|
||||
readability of long literals, or literals whose value should clearly
|
||||
separate into parts, such as bytes or words in hexadecimal notation.
|
||||
|
||||
Examples::
|
||||
|
||||
|
@ -32,39 +33,81 @@ Examples::
|
|||
# grouping bits into nibbles in a binary literal
|
||||
flags = 0b_0011_1111_0100_1110
|
||||
|
||||
# making the literal suffix stand out more
|
||||
imag = 1.247812376e-15_j
|
||||
# same, for string conversions
|
||||
flags = int('0b_1111_0000', 2)
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
The current proposal is to allow one or more consecutive underscores following
|
||||
digits and base specifiers in numeric literals. The underscores have no
|
||||
semantic meaning, and literals are parsed as if the underscores were absent.
|
||||
The current proposal is to allow one underscore between digits, and
|
||||
after base specifiers in numeric literals. The underscores have no
|
||||
semantic meaning, and literals are parsed as if the underscores were
|
||||
absent.
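
As an illustration of the revised rule, the following sketch checks a few literals. It assumes an interpreter that implements the rule as stated here (Python 3.6 or later); ``parse_literal`` and the two sample lists are hypothetical names chosen for this example, not part of the proposal.

```python
import ast


def parse_literal(text):
    """Return the value of a numeric literal, or None if it is rejected."""
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return None


# One underscore between digits, or directly after a base specifier.
accepted = ["1_000_000", "0x_FF_FF", "0b_1010", "1_0.2_5e1_0"]

# Doubled, trailing, or otherwise misplaced underscores.
rejected = ["1__000", "1000_", "0x__FF", "1_.5", "1e_5"]

for text in accepted:
    value = parse_literal(text)
    # The value is the same as for the literal with underscores removed.
    assert value is not None
    assert value == parse_literal(text.replace("_", ""))

for text in rejected:
    assert parse_literal(text) is None
```

The rejected samples show what the stricter rule excludes compared with the earlier "one or more consecutive underscores" wording.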
|
||||
|
||||
The production list for integer literals would therefore look like this::
|
||||
Literal Grammar
|
||||
---------------
|
||||
|
||||
integer: decimalinteger | octinteger | hexinteger | bininteger
|
||||
decimalinteger: nonzerodigit (digit | "_")* | "0" ("0" | "_")*
|
||||
The production list for integer literals would therefore look like
|
||||
this::
|
||||
|
||||
integer: decinteger | bininteger | octinteger | hexinteger
|
||||
decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
|
||||
bininteger: "0" ("b" | "B") (["_"] bindigit)+
|
||||
octinteger: "0" ("o" | "O") (["_"] octdigit)+
|
||||
hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
|
||||
nonzerodigit: "1"..."9"
|
||||
digit: "0"..."9"
|
||||
octinteger: "0" ("o" | "O") "_"* octdigit (octdigit | "_")*
|
||||
hexinteger: "0" ("x" | "X") "_"* hexdigit (hexdigit | "_")*
|
||||
bininteger: "0" ("b" | "B") "_"* bindigit (bindigit | "_")*
|
||||
bindigit: "0" | "1"
|
||||
octdigit: "0"..."7"
|
||||
hexdigit: digit | "a"..."f" | "A"..."F"
|
||||
bindigit: "0" | "1"
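
The revised integer productions can be approximated with a regular expression. This is only a sketch for experimenting with the grammar, not the tokenizer implementation; ``INTEGER`` and ``is_integer_literal`` are hypothetical names introduced here.

```python
import re

# Each alternative mirrors one production of the revised grammar.
INTEGER = re.compile(
    r"""
    (?: [1-9](?:_?[0-9])*          # decinteger, nonzero form
      | 0(?:_?0)*                  # decinteger, all-zero form
      | 0[bB](?:_?[01])+           # bininteger
      | 0[oO](?:_?[0-7])+          # octinteger
      | 0[xX](?:_?[0-9a-fA-F])+    # hexinteger
    )
    """,
    re.VERBOSE,
)


def is_integer_literal(text):
    """Return True if the whole string matches the integer production."""
    return INTEGER.fullmatch(text) is not None


print(is_integer_literal("1_000_000"))   # single underscores between digits
print(is_integer_literal("0b_0011_1111"))  # underscore after base specifier
print(is_integer_literal("1__000"))      # doubled underscore
print(is_integer_literal("100_"))        # trailing underscore
```

Because every underscore in the pattern is optional and must be followed by a digit, runs of underscores and trailing underscores cannot match.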
|
||||
|
||||
For floating-point and complex literals::
|
||||
|
||||
floatnumber: pointfloat | exponentfloat
|
||||
pointfloat: [intpart] fraction | intpart "."
|
||||
exponentfloat: (intpart | pointfloat) exponent
|
||||
intpart: digit (digit | "_")*
|
||||
fraction: "." intpart
|
||||
exponent: ("e" | "E") ["+" | "-"] intpart
|
||||
imagnumber: (floatnumber | intpart) ("j" | "J")
|
||||
pointfloat: [digitpart] fraction | digitpart "."
|
||||
exponentfloat: (digitpart | pointfloat) exponent
|
||||
digitpart: digit (["_"] digit)*
|
||||
fraction: "." digitpart
|
||||
exponent: ("e" | "E") ["+" | "-"] digitpart
|
||||
imagnumber: (floatnumber | digitpart) ("j" | "J")
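
The floating-point and imaginary productions can be sketched the same way, building each pattern from the one before it. The constant and function names below are hypothetical and follow the production names of the revised grammar; the sketch assumes Python 3.6 or later for the f-string syntax.

```python
import re

# Building blocks named after the productions of the revised grammar.
DIGITPART = r"[0-9](?:_?[0-9])*"
FRACTION = rf"\.{DIGITPART}"
EXPONENT = rf"[eE][+-]?{DIGITPART}"
POINTFLOAT = rf"(?:(?:{DIGITPART})?{FRACTION}|{DIGITPART}\.)"
EXPONENTFLOAT = rf"(?:{DIGITPART}|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})"
IMAGNUMBER = rf"(?:{FLOATNUMBER}|{DIGITPART})[jJ]"


def is_float_literal(text):
    """Return True if the whole string matches floatnumber."""
    return re.fullmatch(FLOATNUMBER, text) is not None


def is_imag_literal(text):
    """Return True if the whole string matches imagnumber."""
    return re.fullmatch(IMAGNUMBER, text) is not None


print(is_float_literal("1_000.000_1"))  # underscores on both sides of the point
print(is_float_literal("1_.5"))         # underscore next to the decimal point
print(is_imag_literal("15_j"))          # underscore before the suffix
```

Since ``digitpart`` only admits an underscore between two digits, an underscore next to the decimal point, the exponent marker, or the ``j`` suffix does not match.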
|
||||
|
||||
Constructors
|
||||
------------
|
||||
|
||||
Following the same rules for placement, underscores will be allowed in
|
||||
the following constructors:
|
||||
|
||||
- ``int()`` (with any base)
|
||||
- ``float()``
|
||||
- ``complex()``
|
||||
- ``Decimal()``
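
A brief sketch of the constructor behaviour, assuming an interpreter in which this part of the proposal has been implemented (Python 3.6 or later). The helper name ``rejects`` is hypothetical.

```python
from decimal import Decimal

# String conversions accept the same underscore placement as literals.
as_int = int("1_000_000")
as_flags = int("0b_1111_0000", 2)
as_hex = int("0x_ff", 0)           # base 0 infers the base from the prefix
as_float = float("1_000.000_1")
as_complex = complex("1_0+2_0j")
as_decimal = Decimal("1_000.5")


def rejects(constructor, text):
    """Return True if the constructor raises ValueError for the string."""
    try:
        constructor(text)
    except ValueError:
        return True
    return False


# Misplaced underscores are rejected by int() and float().
print(rejects(int, "1__000"))
print(rejects(float, "1_000_"))
```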
|
||||
|
||||
|
||||
Prior Art
|
||||
=========
|
||||
|
||||
Those languages that do allow underscore grouping implement a large
|
||||
variety of rules for allowed placement of underscores. In cases where
|
||||
the language spec contradicts the actual behavior, the actual behavior
|
||||
is listed. ("single" or "multiple" refer to allowing runs of
|
||||
consecutive underscores.)
|
||||
|
||||
* Ada: single, only between digits [8]_
|
||||
* C# (open proposal for 7.0): multiple, only between digits [6]_
|
||||
* C++ (C++14): single, between digits (different separator chosen) [1]_
|
||||
* D: multiple, anywhere, including trailing [2]_
|
||||
* Java: multiple, only between digits [7]_
|
||||
* Julia: single, only between digits (but not in float exponent parts)
|
||||
[9]_
|
||||
* Perl 5: multiple, basically anywhere, although docs say it's
|
||||
restricted to one underscore between digits [3]_
|
||||
* Ruby: single, only between digits (although docs say "anywhere")
|
||||
[10]_
|
||||
* Rust: multiple, anywhere, except for between exponent "e" and digits
|
||||
[4]_
|
||||
* Swift: multiple, between digits and trailing (although textual
|
||||
description says only "between digits") [5]_
|
||||
|
||||
|
||||
Alternative Syntax
|
||||
|
@ -73,81 +116,53 @@ Alternative Syntax
|
|||
Underscore Placement Rules
|
||||
--------------------------
|
||||
|
||||
Instead of the liberal rule specified above, the use of underscores could be
|
||||
limited. Common rules are (see the "other languages" section):
|
||||
Instead of the relatively strict rule specified above, the use of
|
||||
underscores could be less limited.  As seen in other languages, common
|
||||
rules include:
|
||||
|
||||
* Only one consecutive underscore allowed, and only between digits.
|
||||
* Multiple consecutive underscore allowed, but only between digits.
|
||||
* Multiple consecutive underscores allowed, but only between digits.
|
||||
* Multiple consecutive underscores allowed, in most positions except
|
||||
for the start of the literal, or special positions like after a
|
||||
decimal point.
|
||||
|
||||
A less common rule would be to allow underscores only every N digits (where N
|
||||
could be 3 for decimal literals, or 4 for hexadecimal ones). This is
|
||||
unnecessarily restrictive, especially considering the separator placement is
|
||||
different in different cultures.
|
||||
The syntax in this PEP has ultimately been selected because it covers
|
||||
the common use cases, and does not allow for syntax that would have to
|
||||
be discouraged in style guides anyway.
|
||||
|
||||
A less common rule would be to allow underscores only every N digits
|
||||
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
|
||||
This is unnecessarily restrictive, especially considering the
|
||||
separator placement is different in different cultures.
|
||||
|
||||
Different Separators
|
||||
--------------------
|
||||
|
||||
A proposed alternate syntax was to use whitespace for grouping. Although
|
||||
strings are a precedent for combining adjoining literals, the behavior can lead
|
||||
to unexpected effects which are not possible with underscores. Also, no other
|
||||
language is known to use this rule, except for languages that generally
|
||||
disregard any whitespace.
|
||||
A proposed alternate syntax was to use whitespace for grouping.
|
||||
Although strings are a precedent for combining adjoining literals, the
|
||||
behavior can lead to unexpected effects which are not possible with
|
||||
underscores. Also, no other language is known to use this rule,
|
||||
except for languages that generally disregard any whitespace.
|
||||
|
||||
C++14 introduces apostrophes for grouping, which is not considered due to the
|
||||
conflict with Python's string literals. [1]_
|
||||
C++14 introduces apostrophes for grouping (because underscores introduce
|
||||
ambiguity with user-defined literals), which is not considered because of the
|
||||
use in Python's string literals. [1]_
|
||||
|
||||
|
||||
Behavior in Other Languages
|
||||
===========================
|
||||
Open Proposals
|
||||
==============
|
||||
|
||||
Those languages that do allow underscore grouping implement a large variety of
|
||||
rules for allowed placement of underscores. This is a listing placing the known
|
||||
rules into three major groups. In cases where the language spec contradicts the
|
||||
actual behavior, the actual behavior is listed.
|
||||
|
||||
**Group 1: liberal**
|
||||
|
||||
This group is the least homogeneous: the rules vary slightly between languages.
|
||||
All of them allow trailing underscores. Some allow underscores after non-digits
|
||||
like the ``e`` or the sign in exponents.
|
||||
|
||||
* D [2]_
|
||||
* Perl 5 (underscores basically allowed anywhere, although docs say it's more
|
||||
restricted) [3]_
|
||||
* Rust (allows between exponent sign and digits) [4]_
|
||||
* Swift (although textual description says "between digits") [5]_
|
||||
|
||||
**Group 2: only between digits, multiple consecutive underscores**
|
||||
|
||||
* C# (open proposal for 7.0) [6]_
|
||||
* Java [7]_
|
||||
|
||||
**Group 3: only between digits, only one underscore**
|
||||
|
||||
* Ada [8]_
|
||||
* Julia (but not in the exponent part of floats) [9]_
|
||||
* Ruby (docs say "anywhere", in reality only between digits) [10]_
|
||||
It has been proposed [11]_ to extend the number-to-string formatting
|
||||
language to allow ``_`` as a thousands separator, where currently only
|
||||
``,`` is supported. This could be used to easily generate code with
|
||||
more readable literals.
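
At the time of this revision the formatting extension was only a proposal. The sketch below assumes an interpreter in which the ``_`` option has since been adopted in the format specification (Python 3.6 or later); the variable names are illustrative.

```python
# "_" groups decimal digits by three.
decimal_text = format(1234567, "_")

# With the b, o, x and X presentation types, digits are grouped by four.
hex_text = format(0xCAFEF00D, "_x")
bin_text = "{:_b}".format(0b11110000)

# The option can be combined with a float precision.
float_text = format(1234567.891, "_.2f")

print(decimal_text, hex_text, bin_text, float_text)

# The generated text is itself a valid literal under this PEP's rules.
assert int(decimal_text) == 1234567
```

This covers the code generation use case mentioned above: the formatted output round-trips through ``int()``.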
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
A preliminary patch that implements the specification given above has been
|
||||
posted to the issue tracker. [11]_
|
||||
|
||||
|
||||
Open Questions
|
||||
==============
|
||||
|
||||
This PEP currently only proposes changing the literal syntax. The following
|
||||
extensions are open for discussion:
|
||||
|
||||
* Allowing underscores in string arguments to the ``Decimal`` constructor. It
|
||||
could be argued that these are akin to literals, since there is no Decimal
|
||||
literal available (yet).
|
||||
|
||||
* Allowing underscores in string arguments to ``int()`` with base argument 0,
|
||||
``float()`` and ``complex()``.
|
||||
A preliminary patch that implements the specification given above has
|
||||
been posted to the issue tracker. [12]_
|
||||
|
||||
|
||||
References
|
||||
|
@ -173,7 +188,9 @@ References
|
|||
|
||||
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
|
||||
|
||||
.. [11] http://bugs.python.org/issue26331
|
||||
.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html
|
||||
|
||||
.. [12] http://bugs.python.org/issue26331
|
||||
|
||||
|
||||
Copyright
|
||||
|
|