PEP 701: Add some clarifications to f-string debug expressions and tokens (#2929)

2022-12-16 21:41:26 +00:00 · 2022-12-16 21:41:26 +00:00 · 9de4efd734
parent b4d7ea782d
commit 9de4efd734
1 changed file with 45 additions and 2 deletions
--- a/pep-0701.rst
+++ b/pep-0701.rst
@@ -213,7 +213,47 @@ This PEP leaves up to the implementation the level of f-string nesting allowed.
 This means that limiting nesting is **not part of the language specification**
 but also the language specification **doesn't mandate arbitrary nesting**. 
-Three new tokens are introduced:
+Handling of f-string debug expressions
 --------------------------------------
 Since Python 3.8, f-strings can be used to debug expressions by using the
 ``=`` operator. For example::
    >>> a = 1
    >>> f"{1+1=}"
    '1+1=2'
 These semantics were not formally introduced in a PEP; they were implemented
 in the current string parser as a special case in `bpo-36817
 <https://bugs.python.org/issue?@action=redirect&bpo=36817>`_ and documented in
 `the f-string lexical analysis section
 <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_.
 This feature is not affected by the changes proposed in this PEP, but it is
 important to specify that the formal handling of this feature requires the lexer
 to be able to "untokenize" the expression part of the f-string. This is not a
 problem for the current string parser as it can operate directly on the string
 token contents. However, incorporating this feature into a given parser
 implementation requires the lexer to keep track of the raw string contents of
 the expression part of the f-string and make them available to the parser when
 the parse tree is constructed for f-string nodes. A pure "untokenization" is not
 enough because, as currently specified, f-string debug expressions preserve
 whitespace in the expression, including spaces after the ``{`` and the ``=``
 characters. This means that the
 raw string contents of the expression part of the f-string must be kept intact
 and not just the associated tokens.
 How parser/lexer implementations deal with this problem is of course up to the
 implementation.
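
 As an illustration of why the raw text is needed, the following minimal sketch
 compares the raw source of a debug expression with a reconstruction built only
 from its tokens. The helper ``token_strings`` is a hypothetical name introduced
 for this example, and the standard ``tokenize`` module merely stands in for an
 implementation's lexer; it is not how any particular parser handles f-strings.

```python
import io
import tokenize


def token_strings(expr):
    """Return the significant token strings of a one-line expression.

    Hypothetical helper for illustration only.
    """
    readline = io.StringIO(expr).readline
    # Layout-only tokens carry no expression text, so they are skipped.
    skip = {
        tokenize.NEWLINE,
        tokenize.NL,
        tokenize.ENDMARKER,
        tokenize.INDENT,
        tokenize.DEDENT,
    }
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


raw = "1 +  1"  # note the two spaces after the plus sign

# Rebuilding the text from the tokens alone loses the original spacing.
rebuilt = "".join(token_strings(raw))

# The debug form echoes the expression exactly as written, including the
# whitespace around the operator and around the "=" character.
debug_output = f"{1 +  1 = }"
```

 Here ``rebuilt`` no longer matches ``raw``, while ``debug_output`` still starts
 with the raw text followed by ``" = "``. A lexer that only hands tokens to the
 parser therefore cannot reproduce the debug output on its own.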
 New tokens
 ----------
 Three new tokens are introduced: ``FSTRING_START``, ``FSTRING_MIDDLE`` and
 ``FSTRING_END``. This PEP does not mandate the precise definitions of these tokens,
 as a particular lexer may have a more efficient implementation than the one
 described here. However, the following definitions are provided as a reference so
 that the reader can better understand the proposed grammar changes and how the
 tokens are used:
 * ``FSTRING_START``: This token includes the f-string prefix character (``f``/``F``) and the opening quote(s).
 * ``FSTRING_MIDDLE``: This token includes the text between the opening quote
@@ -254,6 +294,9 @@ while ``f"""some words"""`` will be tokenized simply as::
    FSTRING_START - 'f"""'
    FSTRING_MIDDLE - 'some words'
    FSTRING_END - '"""'
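
 The three-token split for an f-string without replacement fields can be
 sketched as follows. The function ``split_literal_fstring`` is a hypothetical
 name used only for this example. It assumes a bare ``f``/``F`` prefix, rejects
 any braces, and emits no ``FSTRING_MIDDLE`` for an empty body; these are
 simplifying assumptions, not requirements of this PEP.

```python
def split_literal_fstring(source):
    """Split a brace-free f-string literal into (token name, text) pairs.

    Hypothetical sketch: only a bare f/F prefix is recognised and
    replacement fields are rejected outright.
    """
    if source[:1] not in ("f", "F"):
        raise ValueError("expected an f/F prefix")
    rest = source[1:]
    # Try triple quotes first so that f"""...""" is not read as f"...".
    for quote in ('"""', "'''", '"', "'"):
        if (
            rest.startswith(quote)
            and rest.endswith(quote)
            and len(rest) >= 2 * len(quote)
        ):
            break
    else:
        raise ValueError("not a quoted f-string literal")
    body = rest[len(quote):len(rest) - len(quote)]
    if "{" in body or "}" in body:
        raise ValueError("replacement fields are out of scope for this sketch")
    tokens = [("FSTRING_START", source[0] + quote)]
    if body:
        # Assumption: an empty body produces no FSTRING_MIDDLE token.
        tokens.append(("FSTRING_MIDDLE", body))
    tokens.append(("FSTRING_END", quote))
    return tokens


for name, text in split_literal_fstring('f"""some words"""'):
    print(name, "-", repr(text))
```

 A real lexer would additionally have to switch into regular Python tokenization
 whenever it meets a ``{``, which this sketch deliberately leaves out.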
 Consequences of the new grammar
 -------------------------------
 All restrictions mentioned in the PEP are lifted from f-literals, as explained below:
 * Expression portions may now contain strings delimited with the same kind of
@@ -291,7 +334,7 @@ limited to be different from the quotes of the enclosing string, because this is
 now allowed: as an arbitrary Python string can contain any possible choice of
 quotes, so can any f-string expression. Additionally there is no need to clarify
 that certain things are not allowed in the expression part because of
-implementation restructions such as comments, new line characters or
+implementation restrictions such as comments, new line characters or
 backslashes. 
 The only "surprising" difference is that as f-strings allow specifying a