PEP 701: Correct handling of format specifiers and nested expressions (#3151)

Pablo Galindo Salgado committed 2023-05-22 13:29:47 +01:00 (via GitHub)
parent 69b9d2b3d3
commit 295769391a
1 changed file with 31 additions and 14 deletions


@@ -228,13 +228,28 @@ for details on the syntax):
The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
:ref:`later in this document <701-new-tokens>`.
This PEP leaves up to the implementation the level of f-string nesting allowed
(f-strings within the expression parts of other f-strings) but **specifies a
lower bound of 5 levels of nesting**. This is to ensure that users can have a
reasonable expectation of being able to nest f-strings with "reasonable" depth.
This PEP implies that limiting nesting is **not part of the language
specification** but also the language specification **doesn't mandate arbitrary
nesting**.
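As an illustrative aside (not part of the PEP text), nesting below the 5-level bound can already be exercised on pre-PEP 701 interpreters by alternating quote styles per level; the new grammar additionally permits reusing the same quote character at every level:

```python
# Nested f-strings: each level's expression part contains another f-string.
# Pre-PEP 701 tokenizers require a distinct quote style per nesting level;
# under PEP 701 the same quote character may be reused at any depth.
level_one = f"{1 + 1}"                  # plain f-string
level_two = f"{f'{1 + 1}'}"             # f-string inside an expression part
level_three = f"""{f"{f'{1 + 1}'}"}"""  # three levels via alternating quotes
print(level_one, level_two, level_three)
```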
Similarly, this PEP leaves up to the implementation the level of expression nesting
in format specifiers but **specifies a lower bound of 2 levels of nesting**. This means
that the following should always be valid:

.. code-block:: python

    f"{'':*^{1:{1}}}"

but the following can be valid or not depending on the implementation:

.. code-block:: python

    f"{'':*^{1:{1:{1}}}}"
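To make the semantics of the always-valid example concrete, here is a sketch of how the nested specifier evaluates inside-out, reproduced with the built-in ``format`` so it runs even on interpreters that predate the new grammar:

```python
# f"{'':*^{1:{1}}}" evaluates inside-out:
#   {1}        -> format(1, "")  -> "1"
#   {1:{1}}    -> format(1, "1") -> "1"   (width 1)
#   the outer spec therefore becomes "*^1":
#   center '' in a field of width 1, filled with '*'.
innermost = format(1, "")            # "1"
middle = format(1, innermost)        # "1"
result = format("", "*^" + middle)   # "*"
print(result)
```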
The new grammar will preserve the Abstract Syntax Tree (AST) of the current
implementation. This means that no semantic changes will be introduced by this
PEP on existing code that uses f-strings.
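The AST-preservation claim can be checked directly with the ``ast`` module: an f-string parses to the same ``JoinedStr``/``FormattedValue`` node shapes under either grammar. A quick sanity check (not part of the PEP):

```python
import ast

# An f-string with a conversion and a nested-width format spec.
tree = ast.parse("f'{x!r:>{w}}'")
node = tree.body[0].value
print(type(node).__name__)  # the whole f-string is a JoinedStr

# Its single replacement field is a FormattedValue whose conversion
# is stored as the ordinal of the conversion character ('r' here).
field = node.values[0]
print(type(field).__name__, chr(field.conversion))
```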
@@ -362,8 +377,11 @@ tokens:
2. Keep consuming tokens until one of the following is encountered:
   * A closing quote equal to the opening quote.
   * If in "format specifier mode" (see step 3), an opening brace (``{``) or a
     closing brace (``}``).
   * If not in "format specifier mode" (see step 3), an opening brace (``{``) or
     a closing brace (``}``) that is not immediately followed by another
     opening/closing brace.
   In all cases, if the character buffer is not empty, emit a ``FSTRING_MIDDLE``
   token with the contents captured so far but transform any double
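The buffering loop of step 2 can be sketched in plain Python. Everything below (the function name, the return convention) is illustrative only and is not CPython's actual tokenizer API:

```python
def scan_fstring_middle(src, pos, quote, in_format_spec=False):
    """Toy sketch of step 2: buffer characters until a stopping point.

    Stopping points: the closing quote; in format-specifier mode, any
    brace; otherwise, a brace not immediately followed by another brace
    (doubled braces collapse into one literal brace in the buffer).
    Returns (buffered_text, stop_position, stop_kind).
    """
    buf = []
    while pos < len(src):
        if src.startswith(quote, pos):
            return "".join(buf), pos, "quote"
        ch = src[pos]
        if ch in "{}":
            if in_format_spec:
                # Format-specifier mode: braces always stop the scan.
                return "".join(buf), pos, ch
            if pos + 1 < len(src) and src[pos + 1] == ch:
                buf.append(ch)  # "{{" or "}}" -> single literal brace
                pos += 2
                continue
            return "".join(buf), pos, ch
        buf.append(ch)
        pos += 1
    raise SyntaxError("unterminated f-string")
```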
@@ -375,16 +393,15 @@ tokens:
   is encountered, go to step 3.
   * If a closing bracket (not immediately followed by another closing bracket)
     is encountered, emit a token for the closing bracket and go to step 2.
3. Push a new tokenizer mode to the tokenizer mode stack for "Regular Python
   tokenization within f-string" and proceed to tokenize with it. This mode
   tokenizes as the "Regular Python tokenization" until a ``:`` or a ``}``
   character is encountered with the same level of nesting as the opening
   bracket token that was pushed when we entered the f-string part. Using this
   mode, emit tokens until one of the stop points is reached. When this
   happens, emit the corresponding token for the stopping character
   encountered, pop the current tokenizer mode from the tokenizer mode stack
   and go to step 2. If the stopping point is a ``:`` character, enter step 2
   in "format specifier" mode.
4. Emit a ``FSTRING_END`` token with the contents captured and pop the current
   tokenizer mode (corresponding to "F-string tokenization") and go back to
   "Regular Python mode".
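The nesting-level bookkeeping of step 3 can also be sketched in plain Python. This toy version tracks only bracket depth (it ignores string literals inside the expression for brevity) and stops at a ``:`` or ``}`` at the opening bracket's level; names and the return convention are hypothetical, not CPython's:

```python
def scan_expression_part(src, pos):
    """Sketch of step 3: consume an f-string expression part until a ':'
    or '}' at the same nesting level as the opening brace.

    Returns (expression_text, stop_position, stop_char). A ':' stop
    means the caller should re-enter step 2 in "format specifier" mode;
    a '}' stop closes the replacement field. Ignores string literals
    inside the expression for brevity.
    """
    depth = 0
    start = pos
    while pos < len(src):
        ch = src[pos]
        if ch in "([{":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "}":
            if depth == 0:
                return src[start:pos], pos, "}"
            depth -= 1
        elif ch == ":" and depth == 0:
            return src[start:pos], pos, ":"
        pos += 1
    raise SyntaxError("f-string: expecting '}'")
```

Note how a ``:`` inside a slice such as ``d[1:2]`` does not stop the scan, because it sits at a deeper nesting level than the opening brace.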
@@ -561,7 +578,7 @@ Rejected Ideas
    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    SyntaxError: unexpected EOF while parsing
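A short runnable aside (my illustration, consistent with the ambiguity described below): wrapping the lambda in parentheses removes the problem, because the ``:`` then sits inside an ordinary parenthesized expression rather than at the top level of the replacement field where it would start a format specification:

```python
# Unparenthesized, the ':' would be read as introducing a format spec.
# Parenthesized, the lambda is an ordinary expression and works today:
doubled = f'{(lambda x: x * 2)(3)}'
print(doubled)  # -> 6
```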
The reason is that this will introduce a considerable amount of
complexity for no real benefit. This is due to the fact that the ``:`` character
normally separates the f-string format specification. This format specification
is currently tokenized as a string. As the tokenizer MUST tokenize what's on the