PEP: 701
Title: Syntactic formalization of f-strings
Author: Pablo Galindo <pablogsal@python.org>,
        Batuhan Taskaya <batuhan@python.org>,
        Lysandros Nikolaou <lisandrosnik@gmail.com>
Discussions-To:
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Nov-2022
Python-Version: 3.12

Abstract
========

This document proposes to lift some of the restrictions originally formulated in
:pep:`498` and to provide a formalized grammar for f-strings that can be
integrated into the parser directly. The proposed syntactic formalization of
f-strings will have some small side-effects on how f-strings are parsed and
interpreted, allowing for a considerable number of advantages for end users and
library developers, while also dramatically reducing the maintenance cost of
the code dedicated to parsing f-strings.

Motivation
==========

When f-strings were originally introduced in :pep:`498`, the specification was
provided without a formal grammar for f-strings. Additionally, the
specification contains several restrictions that were imposed so that the parsing of
f-strings could be implemented into CPython without modifying the existing
lexer. These limitations have been recognized previously and attempts
have been made to lift them in :pep:`536`, but `none of this work was ever implemented`_.
Some of these limitations (collected originally by :pep:`536`) are:

#. It is impossible to use the quote character delimiting the f-string
   within the expression portion::

      >>> f'Magic wand: { bag['wand'] }'
                              ^
      SyntaxError: invalid syntax

#. A previously considered way around it would lead to escape sequences
   in executed code and is prohibited in f-strings::

      >>> f'Magic wand { bag[\'wand\'] } string'
      SyntaxError: f-string expression portion cannot include a backslash

#. Comments are forbidden even in multi-line f-strings::

      >>> f'''A complex trick: {
      ... bag['bag']  # recursive bags!
      ... }'''
      SyntaxError: f-string expression part cannot include '#'

#. Arbitrary nesting of expressions without expansion of escape sequences is
   available in every single other language employing a string interpolation
   method that uses expressions instead of just variable names, `per Wikipedia`_.

These limitations serve no purpose from a language user perspective and
can be lifted by giving f-literals a regular grammar without exceptions
and implementing it using dedicated parse code.

The other issue that f-strings have is that the current implementation in
CPython relies on tokenising f-strings as ``STRING`` tokens and a post-processing of
these tokens. This has the following problems:

#. It adds a considerable maintenance cost to the CPython parser. This is because
   the parsing code needs to be written by hand, which has historically led to a
   considerable number of inconsistencies and bugs. Writing and maintaining parsing
   code by hand in C has always been considered error prone and dangerous as it needs
   to deal with a lot of manual memory management over the original lexer buffers.

#. The f-string parsing code is not able to use the new improved error message mechanisms
   that the new PEG parser, originally introduced in :pep:`617`, has allowed. The
   improvements that these error messages brought have been greatly celebrated but
   unfortunately f-strings cannot benefit from them because they are parsed in a
   separate piece of the parsing machinery. This is especially unfortunate, since
   there are several syntactical features of f-strings that can be confusing due
   to the different implicit tokenization that happens inside the expression
   part (for instance ``f"{y:=3}"`` is not an assignment expression; see the
   example after this list).

#. Other Python implementations have no way to know if they have implemented
   f-strings correctly because, contrary to other language features, they are not
   part of the :ref:`official Python grammar <f-strings>`.
   This is important because several prominent
   alternative implementations are using CPython's PEG parser, `such as PyPy`_,
   and/or are basing their grammars on the official PEG grammar. The
   fact that f-strings use a separate parser prevents these alternative implementations
   from leveraging the official grammar and benefiting from improvements in error messages derived
   from the grammar.
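
To illustrate the tokenization confusion mentioned in the second point above,
consider the following example (added here purely for illustration; both
snippets behave the same before and after this PEP)::

   >>> y = 10
   >>> f"{y:=3}"       # the top-level ":" starts a format spec, so "=3" pads y to width 3
   ' 10'
   >>> f"{(y := 3)}"   # only when parenthesized is this an assignment expression
   '3'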

A version of this proposal was originally `discussed on Python-Dev`_ and
`presented at the Python Language Summit 2022`_ where it was enthusiastically
received.

Rationale
=========

By building on top of the new Python PEG Parser (:pep:`617`), this PEP proposes
to redefine “f-strings”, especially emphasizing the clear separation of the
string component and the expression (or replacement, ``{...}``) component. :pep:`498`
summarizes the syntactical part of “f-strings” as the following:

   In Python source code, an f-string is a literal string, prefixed with ‘f’, which
   contains expressions inside braces. The expressions are replaced with their values.

However, :pep:`498` also contained a formal list of exclusions on what
can or cannot be contained inside the expression component (primarily due to the
limitations of the existing parser). By clearly establishing the formal grammar, we
now also have the ability to define the expression component of an f-string as truly "any
applicable Python expression" (in that particular context) without being bound
by the limitations imposed by the details of our implementation.

The formalization effort and the premise above also have a significant benefit for
Python programmers, as they simplify and eliminate obscure
limitations. This reduces the mental burden and the cognitive complexity of
f-string literals (as well as of the Python language in general).

#. The expression component can include any string literal that a normal Python expression
   can include. This opens up the possibility of nesting string literals (formatted or
   not) inside the expression component of an f-string with the same quote type (and length)::

      >>> f"These are the things: {", ".join(things)}"

      >>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

      >>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"

   This choice not only allows for a more consistent and predictable behavior of what can be
   placed in f-strings but provides an intuitive way to manipulate string literals in a
   more flexible way without having to fight the limitations of the implementation.

#. Another issue that has felt unintuitive to most users is the lack of support for backslashes
   within the expression component of an f-string. One example that keeps coming up is including
   a newline character in the expression part for joining containers. For example::

      >>> a = ["hello", "world"]
      >>> f"{'\n'.join(a)}"
        File "<stdin>", line 1
          f"{'\n'.join(a)}"
                          ^
      SyntaxError: f-string expression part cannot include a backslash

   A common work-around for this was to either assign the newline to an intermediate variable or
   pre-create the whole string prior to creating the f-string::

      >>> a = ["hello", "world"]
      >>> joined = '\n'.join(a)
      >>> f"{joined}"
      'hello\nworld'

   It only feels natural to allow backslashes in the expression part now that the new PEG parser
   can easily support it::

      >>> a = ["hello", "world"]
      >>> f"{'\n'.join(a)}"
      'hello\nworld'

#. Before the changes proposed in this document, there was no explicit limit on
   how f-strings can be nested, but the fact that string quotes cannot be reused
   inside the expression component of f-strings made it impossible to nest
   f-strings arbitrarily. In fact, this is the most nested f-string that can be
   written::

      >>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
      '2'

   As this PEP allows placing **any** valid Python expression inside the
   expression component of the f-strings, it is now possible to reuse quotes and
   it is therefore possible to nest f-strings arbitrarily::

      >>> f"{f"{f"{f"{f"{f"{1+1}"}"}"}"}"}"
      '2'

   Although this is just a consequence of allowing arbitrary expressions, the
   authors of this PEP do not believe that this is a fundamental benefit and we
   have decided that the language specification will not explicitly mandate that
   this nesting can be arbitrary. This is because allowing arbitrarily-deep
   nesting imposes a lot of extra complexity on the lexer implementation
   (particularly as lexer/parser pipelines need to allow "untokenizing" to
   support the f-string debugging expressions, and this is especially taxing when
   arbitrary nesting is allowed). Implementations are therefore free to impose a
   limit on the nesting depth if they need to. Note that this is not an uncommon
   situation, as the CPython implementation already imposes several limits all
   over the place, including a limit on the nesting depth of parentheses and
   brackets, a limit on the nesting of blocks, a limit on the number of
   branches in ``if`` statements, a limit on the number of expressions in
   star-unpacking, etc.

Specification
=============

The formal proposed PEG grammar specification for f-strings is (see :pep:`617`
for details on the syntax):

.. code-block:: peg

   fstring
      | FSTRING_START fstring_middle* FSTRING_END
   fstring_middle
      | fstring_replacement_field
      | FSTRING_MIDDLE
   fstring_replacement_field
      | '{' (yield_expr | star_expressions) "="? [ "!" NAME ] [ ':' fstring_format_spec* ] '}'
   fstring_format_spec:
      | FSTRING_MIDDLE
      | fstring_replacement_field
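
As an informal illustration (added here for clarity and not part of the grammar
itself), a single replacement field can exercise the optional ``"="``, ``"!"``
and ``':'`` components of ``fstring_replacement_field``::

   >>> name = "world"
   >>> f"{name=}"          # the optional "=" produces a self-documenting form
   "name='world'"
   >>> f"{name!r:>10}"     # a "!" conversion followed by a ':' format specification
   "   'world'"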

This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**
but also the language specification **doesn't mandate arbitrary nesting**.

Three new tokens are introduced:

* ``FSTRING_START``: This token includes the f-string prefix character (``f``/``F``) and the opening quote(s).
* ``FSTRING_MIDDLE``: This token includes the text between the opening quote
  and the first expression brace (``{``) and the text between two expression braces (``}`` and ``{``).
* ``FSTRING_END``: This token includes everything after the last expression brace (or the whole literal part
  if no expression exists) until the closing quote.

These tokens are always string parts and they are semantically equivalent to the
``STRING`` token with the restrictions specified. These tokens must be produced by the lexer
when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**. How
the lexer emits these tokens is **not specified**, as this will heavily depend on every
implementation (even the Python version of the lexer in the standard library is
implemented differently from the one used by the PEG parser).

As an example::

    f'some words {a+b} more words {c+d} final words'

will be tokenized as::

    FSTRING_START - "f'"
    FSTRING_MIDDLE - 'some words '
    LBRACE - '{'
    NAME - 'a'
    PLUS - '+'
    NAME - 'b'
    RBRACE - '}'
    FSTRING_MIDDLE - ' more words '
    LBRACE - '{'
    NAME - 'c'
    PLUS - '+'
    NAME - 'd'
    RBRACE - '}'
    FSTRING_END - ' final words' (without the end quote)

while ``f"""some words"""`` will be tokenized simply as::

    FSTRING_START - 'f"""'
    FSTRING_END - 'some words'
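
The following is a minimal sketch (not part of the specification) of how such a
token stream could be inspected, assuming an implementation exposes the new
tokens through the standard ``tokenize`` module, as the reference implementation
is expected to do::

    import io
    import tokenize

    # Under the new grammar the tokenizer is expected to emit FSTRING_START,
    # FSTRING_MIDDLE and FSTRING_END instead of a single STRING token.
    source = "f'some words {a+b} more words {c+d} final words'"
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[token.type], repr(token.string))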

All the restrictions mentioned in the Motivation section are lifted from f-literals,
as explained below (and illustrated in the example following this list):

* Expression portions may now contain strings delimited with the same kind of
  quote that is used to delimit the f-literal.
* Backslashes may now appear within expressions just like anywhere else in
  Python code. In case of strings nested within f-literals, escape sequences are
  expanded when the innermost string is evaluated.
* Comments, using the ``#`` character, are possible only in multi-line f-literals,
  since comments are terminated by the end of the line (which makes closing a
  single-line f-literal impossible).
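
For example, the comment example from the Motivation section is now accepted,
since the f-literal spans multiple lines (the output shown is the behaviour
expected from the reference implementation)::

   >>> bag = {'bag': 'a bag'}
   >>> f'''A complex trick: {
   ... bag['bag']  # recursive bags!
   ... }'''
   'A complex trick: a bag'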

Backwards Compatibility
=======================

This PEP is backwards compatible: any valid Python code will continue to
be valid if this PEP is implemented and it will not change semantically.

How to Teach This
=================

As the concept of f-strings is already ubiquitous in the Python community, there is
no fundamental need for users to learn anything new. However, as the formalized grammar
allows some new possibilities, it is important that the formal grammar is added to the
documentation and explained in detail, explicitly mentioning which constructs are now
possible, since this PEP aims to avoid confusion.

It is also beneficial to provide users with a simple framework for understanding what can
be placed inside an f-string expression. In this case the authors think that this work will
make it even simpler to explain this aspect of the language, since it can be summarized as:

   You can place any valid Python expression inside an f-string expression.

With the changes in this PEP, there is no need to clarify that string quotes are
limited to be different from the quotes of the enclosing string, because this is
now allowed: as an arbitrary Python string can contain any possible choice of
quotes, so can any f-string expression. Additionally there is no need to clarify
that certain things are not allowed in the expression part because of
implementation restrictions, such as comments, newline characters or
backslashes.

The only "surprising" difference is that as f-strings allow specifying a
|
|||
|
format, expressions that allow a ``:`` character at the top level still need to be
|
|||
|
enclosed in parenthesis. This is not new to this work, but it is important to
|
|||
|
emphasize that this restriction is still in place. This allows for an easier
|
|||
|
modification of the summary:
|
|||
|
|
|||
|
You can place any valid Python expression inside
|
|||
|
an f-string expression, and everything after a ``:`` character at the top level will
|
|||
|
be identified as a format specification.
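
For example (an illustration added here for clarity; this behaviour is the same
before and after this PEP)::

   >>> value = 42
   >>> f"{value:>10}"                 # everything after the top-level ':' is a format spec
   '        42'
   >>> f"{(lambda x: x * 2)(value)}"  # a lambda must be parenthesized because of its ':'
   '84'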

Reference Implementation
========================

A reference implementation can be found in the implementation_ fork.

Rejected Ideas
==============

#. We have decided not to lift the restriction that some expression portions
   need to wrap ``':'`` and ``'!'`` in parentheses at the top level, e.g.::

      >>> f'Useless use of lambdas: { lambda x: x*2 }'
      SyntaxError: unexpected EOF while parsing

   The reason is that this would introduce a considerable amount of
   complexity for no real benefit. This is due to the fact that the ``:`` character
   normally separates the f-string format specification. This format specification
   is currently tokenized as a string. As the tokenizer MUST tokenize what's on the
   right of the ``:`` as either a string or a stream of tokens, this won't allow the
   parser to differentiate between the different semantics as that would require the
   tokenizer to backtrack and produce a different set of tokens (that is, first try
   as a stream of tokens, and if it fails, try as a string for a format specifier).
   The example at the end of this item illustrates how everything after a
   top-level ``:`` is handed over as the format specification.

   As there is no fundamental advantage in being able to allow lambdas and similar
   expressions at the top level, we have decided to keep the restriction that these must
   be parenthesized if needed::

      >>> f'Useless use of lambdas: { (lambda x: x*2) }'
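
   To make the format-specification behaviour concrete, the following
   illustration (added here for clarity; it behaves the same before and after
   this PEP) shows that everything after the top-level ``:`` is passed as a
   plain string to the object's ``__format__`` method::

      >>> class Spec:
      ...     def __format__(self, spec):
      ...         return f"format spec was {spec!r}"
      ...
      >>> f"{Spec(): x * 2 }"
      "format spec was ' x * 2 '"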

Open Issues
===========

None yet


Footnotes
=========

.. _official Python grammar: https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals

.. _none of this work was ever implemented: https://mail.python.org/archives/list/python-dev@python.org/thread/N43O4KNLZW4U7YZC4NVPCETZIVRDUVU2/#NM2A37THVIXXEYR4J5ZPTNLXGGUNFRLZ

.. _such as PyPy: https://foss.heptapod.net/pypy/pypy/-/commit/fe120f89bf07e64a41de62b224e4a3d80e0fe0d4/pipelines?ref=branch%2Fpy3.9

.. _discussed on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/thread/54N3MOYVBDSJQZTU6MTCPLUPIFSDN5IS/#SAYU6SMP4KT7G7AQ6WVQYUDOSZPKHJMS

.. _presented at the Python Language Summit 2022: https://pyfound.blogspot.com/2022/05/the-2022-python-language-summit-f.html

.. _per Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples

.. _implementation: https://github.com/we-like-parsers/cpython/tree/fstring-grammar

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.