PEP 701: Correct handling of format specifiers and nested expressions (#3151)
parent 69b9d2b3d3
commit 295769391a

pep-0701.rst (45 changed lines)
@@ -228,13 +228,28 @@ for details on the syntax):
 The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
 :ref:`later in this document <701-new-tokens>`.
 
-This PEP leaves up to the implementation the level of f-string nesting allowed but
-**specifies a lower bound of 5 levels of nesting**. This is to ensure that users can
-have a reasonable expectation of being able to nest f-strings with "reasonable" depth.
+This PEP leaves up to the implementation the level of f-string nesting allowed
+(f-strings within the expression parts of other f-strings) but **specifies a
+lower bound of 5 levels of nesting**. This is to ensure that users can have a
+reasonable expectation of being able to nest f-strings with "reasonable" depth.
 This PEP implies that limiting nesting is **not part of the language
 specification** but also the language specification **doesn't mandate arbitrary
 nesting**.
 
+Similarly, this PEP leaves up to the implementation the level of expression nesting
+in format specifiers but **specifies a lower bound of 2 levels of nesting**. This means
+that the following should always be valid:
+
+.. code-block:: python
+
+f"{'':*^{1:{1}}}"
+
+but the following can be valid or not depending on the implementation:
+
+.. code-block:: python
+
+f"{'':*^{1:{1:{1}}}}"
+
 The new grammar will preserve the Abstract Syntax Tree (AST) of the current
 implementation. This means that no semantic changes will be introduced by this
 PEP on existing code that uses f-strings.
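Note on the format-specifier hunk above: a minimal sketch, not part of the commit, for checking how a given interpreter treats the two examples. It relies only on the built-in ``compile()``; the helper name ``compiles`` is hypothetical. Per the added text, the two-level example should be accepted by an implementation of this PEP, while the three-level result is implementation-dependent, so the script reports the outcome rather than asserting it.

```python
def compiles(source: str) -> bool:
    """Return True if `source` parses as an expression on this interpreter."""
    try:
        compile(source, "<pep-701-example>", "eval")
    except SyntaxError:
        return False
    return True


# The two examples from the PEP text, kept as plain strings so that this
# script itself still parses on interpreters that reject them.
TWO_LEVELS = '''f"{'':*^{1:{1}}}"'''
THREE_LEVELS = '''f"{'':*^{1:{1:{1}}}}"'''

for label, source in [("2 levels", TWO_LEVELS), ("3 levels", THREE_LEVELS)]:
    print(f"{label}: {source} -> compiles: {compiles(source)}")
```

Interpreters that predate this PEP may reject both strings, which is why the examples are compiled from strings instead of being written inline.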
@@ -362,8 +377,11 @@ tokens:
 2. Keep consuming tokens until one of the following is encountered:
 
 * A closing quote equal to the opening quote.
-* An opening brace (``{``) or a closing brace (``}``) that is not immediately
-followed by another opening/closing brace.
+* If in "format specifier mode" (see step 3), an opening brace (``{``) or a
+closing brace (``}``).
+* If not in "format specifier mode" (see step 3), an opening brace (``{``) or
+a closing brace (``}``) that is not immediately followed by another opening/closing
+brace.
 
 In all cases, if the character buffer is not empty, emit a ``FSTRING_MIDDLE``
 token with the contents captured so far but transform any double
@@ -375,16 +393,15 @@ tokens:
 is encountered, go to step 3.
 * If a closing bracket (not immediately followed by another closing bracket)
 is encountered, emit a token for the closing bracket and go to step 2.
-
 3. Push a new tokenizer mode to the tokenizer mode stack for "Regular Python
 tokenization within f-string" and proceed to tokenize with it. This mode
-tokenizes as the "Regular Python tokenization" until a ``!``, ``:``, ``=``
-character is encountered or if a ``}`` character is encountered with the same
-level of nesting as the opening bracket token that was pushed when we enter the
-f-string part. Using this mode, emit tokens until one of the stop points are
-reached. When this happens, emit the corresponding token for the stopping
-character encountered and, pop the current tokenizer mode from the tokenizer mode
-stack and go to step 2.
+tokenizes as the "Regular Python tokenization" until a ``:`` or a ``}``
+character is encountered with the same level of nesting as the opening
+bracket token that was pushed when we enter the f-string part. Using this mode,
+emit tokens until one of the stop points are reached. When this happens, emit
+the corresponding token for the stopping character encountered and, pop the
+current tokenizer mode from the tokenizer mode stack and go to step 2. If the
+stopping point is a ``:`` character, enter step 2 in "format specifier" mode.
 4. Emit a ``FSTRING_END`` token with the contents captured and pop the current
 tokenizer mode (corresponding to "F-string tokenization") and go back to
 "Regular Python mode".
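Note on the tokenizer hunks above: a rough sketch, not the reference implementation, of how the stop conditions in step 2 differ between the two modes. The function name ``scan_middle`` and its signature are hypothetical. Two points are assumptions on my part: "immediately followed by another opening/closing brace" is read as "the same brace repeated", and a doubled brace is collapsed to a single one in the emitted contents (the sentence describing that transformation is cut off at the hunk boundary).

```python
def scan_middle(text, pos, quote, in_format_spec):
    """Consume text[pos:] until one of the step 2 stop points.

    Returns (middle, stop, stop_pos): the contents that would go into a
    FSTRING_MIDDLE token, the stopping character(s) ('' if the input ran
    out), and the index at which the stop was found.
    """
    buf = []
    i = pos
    while i < len(text):
        # Stop point: a closing quote equal to the opening quote.
        if text.startswith(quote, i):
            return "".join(buf), quote, i
        ch = text[i]
        if ch in "{}":
            doubled = text[i + 1 : i + 2] == ch
            # In format specifier mode every brace stops the scan;
            # otherwise only a brace that is not doubled does.
            if in_format_spec or not doubled:
                return "".join(buf), ch, i
            buf.append(ch)  # assumption: "{{" / "}}" collapse to one brace
            i += 2
            continue
        buf.append(ch)
        i += 1
    return "".join(buf), "", i


# Outside a format specifier, doubled braces are literal text.
print(scan_middle("a{{b}}c{x}", 0, '"', False))
# Inside a format specifier, the first brace already ends the scan.
print(scan_middle("*^{{w}}", 0, '"', True))
```

The single ``in_format_spec`` flag stands in for the "format specifier mode" that step 3 enters after a ``:``; a real tokenizer would track it on the mode stack.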
@@ -561,7 +578,7 @@ Rejected Ideas
 >>> f'Useless use of lambdas: { lambda x: x*2 }'
 SyntaxError: unexpected EOF while parsing
 
-The reason is that this will introduce a considerable amount of
+The reason is that this will introduce a considerable amount of
 complexity for no real benefit. This is due to the fact that the ``:`` character
 normally separates the f-string format specification. This format specification
 is currently tokenized as a string. As the tokenizer MUST tokenize what's on the