PEP 701: Correct handling of format specifiers and nested expressions (#3151)

This commit is contained in:
Pablo Galindo Salgado 2023-05-22 13:29:47 +01:00 committed by GitHub
parent 69b9d2b3d3
commit 295769391a
1 changed file with 31 additions and 14 deletions


@@ -228,13 +228,28 @@ for details on the syntax):
The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
:ref:`later in this document <701-new-tokens>`.

This PEP leaves up to the implementation the level of f-string nesting allowed
(f-strings within the expression parts of other f-strings) but **specifies a
lower bound of 5 levels of nesting**. This is to ensure that users can have a
reasonable expectation of being able to nest f-strings with "reasonable" depth.
This PEP implies that limiting nesting is **not part of the language
specification** but also that the language specification **doesn't mandate
arbitrary nesting**.

Similarly, this PEP leaves up to the implementation the level of expression nesting
in format specifiers but **specifies a lower bound of 2 levels of nesting**. This means
that the following should always be valid:

.. code-block:: python

    f"{'':*^{1:{1}}}"

but the following can be valid or not depending on the implementation:

.. code-block:: python

    f"{'':*^{1:{1:{1}}}}"

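The guaranteed two-level case can be checked directly in current Python; a small illustration (not part of the PEP text):

.. code-block:: python

    # Two levels of expression nesting in a format specifier, which the PEP
    # guarantees is always valid. The innermost {1} formats to "1", which
    # becomes the width of the middle field, whose result "1" becomes the
    # width in the outer spec "*^1": '' centered in width 1, padded with '*'.
    result = f"{'':*^{1:{1}}}"
    print(result)  # -> "*"
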
The new grammar will preserve the Abstract Syntax Tree (AST) of the current
implementation. This means that no semantic changes will be introduced by this
PEP on existing code that uses f-strings.
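The AST-preservation claim can be observed with the ``ast`` module; a minimal check (not from the PEP itself) that f-strings still parse to the familiar ``JoinedStr``/``FormattedValue`` shape:

.. code-block:: python

    import ast

    # Parsing (not executing) an f-string: the names inside need not exist.
    tree = ast.parse('f"hello {name!r:>{width}}"')
    dump = ast.dump(tree)

    # The f-string is a JoinedStr node whose children include FormattedValue
    # nodes for the replacement fields, exactly as before this PEP.
    print(dump)
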
@@ -362,8 +377,11 @@ tokens:
2. Keep consuming tokens until one of the following is encountered:

   * A closing quote equal to the opening quote.
   * If in "format specifier mode" (see step 3), an opening brace (``{``) or a
     closing brace (``}``).
   * If not in "format specifier mode" (see step 3), an opening brace (``{``)
     or a closing brace (``}``) that is not immediately followed by another
     opening/closing brace.
In all cases, if the character buffer is not empty, emit a ``FSTRING_MIDDLE``
token with the contents captured so far but transform any double
@@ -375,16 +393,15 @@ tokens:
is encountered, go to step 3.
* If a closing bracket (not immediately followed by another closing bracket)
is encountered, emit a token for the closing bracket and go to step 2.
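The scanning loop of step 2 can be sketched as a toy function (this is an illustration only, not CPython's actual tokenizer; the names are invented):

.. code-block:: python

    # Toy sketch of step 2: scan the body of an f-string, stopping at the
    # closing quote or at a single (non-doubled) brace, while collapsing
    # doubled braces into literal ones for the FSTRING_MIDDLE contents.
    def scan_fstring_middle(src: str, pos: int, quote: str):
        """Return (middle_text, stop_reason, new_pos)."""
        buf = []
        while pos < len(src):
            ch = src[pos]
            if src.startswith(quote, pos):
                return "".join(buf), "quote", pos
            if ch in "{}":
                # Doubled braces are escapes for literal braces.
                if pos + 1 < len(src) and src[pos + 1] == ch:
                    buf.append(ch)  # keep a single brace in the buffer
                    pos += 2
                    continue
                return "".join(buf), ch, pos
            buf.append(ch)
            pos += 1
        raise SyntaxError("unterminated f-string")

    # Body of f"a{{b}}c{x}" after the opening quote: stops at the single '{'.
    middle, stop, pos = scan_fstring_middle('a{{b}}c{x}"', 0, '"')
    print(middle, stop)  # -> a{b}c {
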
3. Push a new tokenizer mode to the tokenizer mode stack for "Regular Python
tokenization within f-string" and proceed to tokenize with it. This mode
tokenizes as the "Regular Python tokenization" until a ``!``, ``:``, ``=``
character is encountered or if a ``}`` character is encountered with the same
level of nesting as the opening bracket token that was pushed when we enter the
f-string part. Using this mode, emit tokens until one of the stop points are
reached. When this happens, emit the corresponding token for the stopping
character encountered and, pop the current tokenizer mode from the tokenizer mode
stack and go to step 2.
tokenizes as the "Regular Python tokenization" until a ``:`` or a ``}``
character is encountered with the same level of nesting as the opening
bracket token that was pushed when we enter the f-string part. Using this mode,
emit tokens until one of the stop points are reached. When this happens, emit
the corresponding token for the stopping character encountered and, pop the
current tokenizer mode from the tokenizer mode stack and go to step 2. If the
stopping point is a ``:`` character, enter step 2 in "format specifier" mode.
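The push/pop discipline of steps 3 and 4 can be sketched with an explicit stack (invented names; a schematic of the mode transitions, not CPython internals):

.. code-block:: python

    # Toy sketch of the tokenizer mode stack from steps 3-4: entering an
    # f-string or a '{' expression pushes a mode; hitting a stop character
    # (':' or '}') or the closing quote pops it again.
    mode_stack = ["regular"]

    def push(mode):
        mode_stack.append(mode)

    def pop(expected):
        assert mode_stack[-1] == expected, "unbalanced mode stack"
        mode_stack.pop()

    push("fstring")               # FSTRING_START seen
    push("regular-in-fstring")    # '{' seen -> step 3
    pop("regular-in-fstring")     # '}' or ':' seen -> back to step 2
    pop("fstring")                # closing quote -> FSTRING_END, step 4
    print(mode_stack)  # -> ['regular']
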
4. Emit a ``FSTRING_END`` token with the contents captured and pop the current
tokenizer mode (corresponding to "F-string tokenization") and go back to
"Regular Python mode".
@@ -561,7 +578,7 @@ Rejected Ideas
.. code-block:: python

    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    SyntaxError: unexpected EOF while parsing

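For context (not part of the diff), the usual workaround already works in current Python: parenthesizing the lambda hides its ``:`` from the format-specifier parsing:

.. code-block:: python

    # A bare lambda is a syntax error inside an f-string because ':' would
    # start a format specifier, but parentheses make the lambda unambiguous.
    doubled = f"{(lambda x: x * 2)(4)}"
    print(doubled)  # -> "8"
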
The reason is that this will introduce a considerable amount of
complexity for no real benefit. This is due to the fact that the ``:`` character
normally separates the f-string format specification. This format specification
is currently tokenized as a string. As the tokenizer MUST tokenize what's on the