PEP 701: Add some clarifications to f-string debug expressions and tokens (#2929)

Pablo Galindo Salgado 2022-12-16 21:41:26 +00:00 committed by GitHub
parent b4d7ea782d
commit 9de4efd734
1 changed files with 45 additions and 2 deletions

@ -213,7 +213,47 @@ This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**
but also the language specification **doesn't mandate arbitrary nesting**.
Handling of f-string debug expressions
--------------------------------------
Since Python 3.8, f-strings can be used to debug expressions by using the
``=`` operator. For example::

    >>> a = 1
    >>> f"{1+1=}"
    '1+1=2'

These semantics were not formally introduced in a PEP; they were implemented
in the current string parser as a special case in `bpo-36817
<https://bugs.python.org/issue?@action=redirect&bpo=36817>`_ and documented in
`the f-string lexical analysis section
<https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_.
This feature is not affected by the changes proposed in this PEP, but it is
important to specify that handling it formally requires the lexer to be able
to "untokenize" the expression part of the f-string. This is not a problem for
the current string parser, as it can operate directly on the contents of the
string token. However, incorporating this feature into a given parser
implementation requires the lexer to keep track of the raw string contents of
the expression part of the f-string and make them available to the parser when
the parse tree is constructed for f-string nodes. A pure "untokenization" is not
enough because, as currently specified, f-string debugging preserves whitespace,
including spaces after the ``{`` and ``=`` characters. This means that the
raw string contents of the expression part of the f-string must be kept intact,
not just the associated tokens.
How parser/lexer implementations deal with this problem is up to the
implementation.
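
The whitespace requirement can be observed in any Python 3.8+ interpreter. The
sketch below is illustrative only: the helper name ``join_tokens`` is
hypothetical, and concatenating token strings merely stands in for what a naive
"untokenization" would be able to reconstruct.

```python
import io
import tokenize

x = 1

# Whitespace inside the braces is echoed verbatim in the debug output.
compact = f"{x+1=}"
spaced = f"{ x + 1 = }"


def join_tokens(source: str) -> str:
    """Concatenate token strings, discarding inter-token whitespace.

    Hypothetical helper: it mimics a lexer that only keeps the tokens
    and not the raw text of the expression.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    keep = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return "".join(tok.string for tok in tokens if tok.type in keep)


print(repr(compact))
print(repr(spaced))
print(repr(join_tokens("x  +  1")))
```

The ``spaced`` result keeps every space written between the braces, while
``join_tokens`` collapses ``x  +  1`` to ``x+1``. This is why the tokens alone
are not sufficient and the raw expression text has to be retained.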
New tokens
----------
Three new tokens are introduced: ``FSTRING_START``, ``FSTRING_MIDDLE`` and
``FSTRING_END``. This PEP does not mandate the precise definitions of these tokens,
as different lexers may use implementations that are more efficient than the ones
proposed here in the context of a particular implementation. However, the following
definitions are provided as a reference so that the reader can better understand
the proposed grammar changes and how the tokens are used:
* ``FSTRING_START``: This token includes the f-string prefix character (``f``/``F``) and the opening quote(s).
* ``FSTRING_MIDDLE``: This token includes the text between the opening quote
@ -254,6 +294,9 @@ while ``f"""some words"""`` will be tokenized simply as::
FSTRING_START - 'f"""'
FSTRING_END - 'some words'
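
For experimentation, the token stream of an f-string can be inspected with the
standard ``tokenize`` module. This is only a sketch: the helper name
``fstring_tokens`` is hypothetical, the split into ``FSTRING_*`` tokens is
assumed to be visible only on CPython 3.12 and later (older versions emit a
single ``STRING`` token), and the exact token boundaries are an implementation
detail that may differ from the reference definitions in this PEP.

```python
import io
import tokenize


def fstring_tokens(source: str) -> list:
    """Return (token name, token text) pairs for a single line of source.

    Hypothetical helper for illustration; NEWLINE and ENDMARKER are
    dropped because they carry no text for a one-line input.
    """
    skip = (tokenize.NEWLINE, tokenize.ENDMARKER)
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


for name, text in fstring_tokens('f"a{x}b"'):
    print(f"{name:15} {text!r}")
```

Whatever the interpreter version, concatenating the token texts returned for
this input gives back the original literal, which makes the helper a convenient
way to compare how different lexers divide the same f-string.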
Consequences of the new grammar
-------------------------------
All restrictions mentioned in the PEP are lifted from f-string literals, as explained below:
* Expression portions may now contain strings delimited with the same kind of
@ -291,7 +334,7 @@ limited to be different from the quotes of the enclosing string, because this is
now allowed: as an arbitrary Python string can contain any possible choice of
quotes, so can any f-string expression. Additionally, there is no need to clarify
that certain things are not allowed in the expression part because of
implementation restrictions such as comments, new line characters or
backslashes.
The only "surprising" difference is that as f-strings allow specifying a