PEP 701: Correct handling of format specifiers and nested expressions (#3151)

This commit is contained in:
Pablo Galindo Salgado 2023-05-22 13:29:47 +01:00 committed by GitHub
parent 69b9d2b3d3
commit 295769391a
1 changed file with 31 additions and 14 deletions


@@ -228,13 +228,28 @@ for details on the syntax):
The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
:ref:`later in this document <701-new-tokens>`.

This PEP leaves up to the implementation the level of f-string nesting allowed
(f-strings within the expression parts of other f-strings) but **specifies a
lower bound of 5 levels of nesting**. This is to ensure that users can have a
reasonable expectation of being able to nest f-strings with "reasonable" depth.
This PEP implies that limiting nesting is **not part of the language
specification** but also that the language specification **doesn't mandate
arbitrary nesting**.

Similarly, this PEP leaves up to the implementation the level of expression nesting
in format specifiers but **specifies a lower bound of 2 levels of nesting**. This means
that the following should always be valid:

.. code-block:: python

    f"{'':*^{1:{1}}}"

but the following can be valid or not depending on the implementation:

.. code-block:: python

    f"{'':*^{1:{1:{1}}}}"

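The guaranteed two-level case can be checked directly in current Python; a small illustration (not part of the PEP text):

.. code-block:: python

    # Two levels of expression nesting in a format specifier, which the PEP
    # guarantees is always valid. The innermost {1} formats to "1", which
    # becomes the width of the middle field, whose result "1" becomes the
    # width in the outer spec "*^1": '' centered in width 1, padded with '*'.
    result = f"{'':*^{1:{1}}}"
    print(result)  # -> "*"
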
The new grammar will preserve the Abstract Syntax Tree (AST) of the current
implementation. This means that no semantic changes will be introduced by this
PEP on existing code that uses f-strings.
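The AST-preservation claim can be observed with the ``ast`` module; a minimal check (not from the PEP itself) that f-strings still parse to the familiar ``JoinedStr``/``FormattedValue`` shape:

.. code-block:: python

    import ast

    # Parsing (not executing) an f-string: the names inside need not exist.
    tree = ast.parse('f"hello {name!r:>{width}}"')
    dump = ast.dump(tree)

    # The f-string is a JoinedStr node whose children include FormattedValue
    # nodes for the replacement fields, exactly as before this PEP.
    print(dump)
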
@@ -362,8 +377,11 @@ tokens:
2. Keep consuming tokens until one of the following is encountered:

   * A closing quote equal to the opening quote.
   * If in "format specifier mode" (see step 3), an opening brace (``{``) or a
     closing brace (``}``).
   * If not in "format specifier mode" (see step 3), an opening brace (``{``)
     or a closing brace (``}``) that is not immediately followed by another
     opening/closing brace.
In all cases, if the character buffer is not empty, emit a ``FSTRING_MIDDLE``
token with the contents captured so far but transform any double
@@ -375,16 +393,15 @@ tokens:
is encountered, go to step 3.
* If a closing bracket (not immediately followed by another closing bracket)
is encountered, emit a token for the closing bracket and go to step 2.
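The scanning loop of step 2 can be sketched as a toy function (this is an illustration only, not CPython's actual tokenizer; the names are invented):

.. code-block:: python

    # Toy sketch of step 2: scan the body of an f-string, stopping at the
    # closing quote or at a single (non-doubled) brace, while collapsing
    # doubled braces into literal ones for the FSTRING_MIDDLE contents.
    def scan_fstring_middle(src: str, pos: int, quote: str):
        """Return (middle_text, stop_reason, new_pos)."""
        buf = []
        while pos < len(src):
            ch = src[pos]
            if src.startswith(quote, pos):
                return "".join(buf), "quote", pos
            if ch in "{}":
                # Doubled braces are escapes for literal braces.
                if pos + 1 < len(src) and src[pos + 1] == ch:
                    buf.append(ch)  # keep a single brace in the buffer
                    pos += 2
                    continue
                return "".join(buf), ch, pos
            buf.append(ch)
            pos += 1
        raise SyntaxError("unterminated f-string")

    # Body of f"a{{b}}c{x}" after the opening quote: stops at the single '{'.
    middle, stop, pos = scan_fstring_middle('a{{b}}c{x}"', 0, '"')
    print(middle, stop)  # -> a{b}c {
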
3. Push a new tokenizer mode to the tokenizer mode stack for "Regular Python
tokenization within f-string" and proceed to tokenize with it. This mode
tokenizes as the "Regular Python tokenization" until a ``!``, ``:``, ``=``
character is encountered or if a ``}`` character is encountered with the same
level of nesting as the opening bracket token that was pushed when we enter the
f-string part. Using this mode, emit tokens until one of the stop points are
reached. When this happens, emit the corresponding token for the stopping
character encountered and, pop the current tokenizer mode from the tokenizer mode
stack and go to step 2.
tokenizes as the "Regular Python tokenization" until a ``:`` or a ``}``
character is encountered with the same level of nesting as the opening
bracket token that was pushed when we enter the f-string part. Using this mode,
emit tokens until one of the stop points are reached. When this happens, emit
the corresponding token for the stopping character encountered and, pop the
current tokenizer mode from the tokenizer mode stack and go to step 2. If the
stopping point is a ``:`` character, enter step 2 in "format specifier" mode.
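The push/pop discipline of steps 3 and 4 can be sketched with an explicit stack (invented names; a schematic of the mode transitions, not CPython internals):

.. code-block:: python

    # Toy sketch of the tokenizer mode stack from steps 3-4: entering an
    # f-string or a '{' expression pushes a mode; hitting a stop character
    # (':' or '}') or the closing quote pops it again.
    mode_stack = ["regular"]

    def push(mode):
        mode_stack.append(mode)

    def pop(expected):
        assert mode_stack[-1] == expected, "unbalanced mode stack"
        mode_stack.pop()

    push("fstring")               # FSTRING_START seen
    push("regular-in-fstring")    # '{' seen -> step 3
    pop("regular-in-fstring")     # '}' or ':' seen -> back to step 2
    pop("fstring")                # closing quote -> FSTRING_END, step 4
    print(mode_stack)  # -> ['regular']
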
4. Emit a ``FSTRING_END`` token with the contents captured and pop the current
tokenizer mode (corresponding to "F-string tokenization") and go back to
"Regular Python mode".
@@ -561,7 +578,7 @@ Rejected Ideas
.. code-block:: python

    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    SyntaxError: unexpected EOF while parsing

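For context (not part of the diff), the usual workaround already works in current Python: parenthesizing the lambda hides its ``:`` from the format-specifier parsing:

.. code-block:: python

    # A bare lambda is a syntax error inside an f-string because ':' would
    # start a format specifier, but parentheses make the lambda unambiguous.
    doubled = f"{(lambda x: x * 2)(4)}"
    print(doubled)  # -> "8"
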
The reason is that this will introduce a considerable amount of
complexity for no real benefit. This is due to the fact that the ``:`` character
normally separates the f-string format specification. This format specification
is currently tokenized as a string. As the tokenizer MUST tokenize what's on the