PEP 701: Incorporate feedback from the discussion thread (#2939)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Pablo Galindo Salgado 2022-12-24 21:42:21 +00:00 committed by GitHub
parent 574e82c2f4
commit d547ef7ef4
1 changed file with 110 additions and 10 deletions


@ -143,9 +143,8 @@ f-string literals (as well as the Python language in general).
>>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"
This choice not only allows for more consistent and predictable behavior of what can be
placed in f-strings, but also provides an intuitive way to manipulate string literals more
flexibly, without having to fight the limitations of the implementation.
This "feature" is not universally agreed to be desirable, and some users find it unreadable.
For a discussion on the different views on this, see the :ref:`701-considerations-of-quote-reuse` section.
#. Another issue that has felt unintuitive to many users is the lack of support for backslashes
within the expression component of an f-string. One example that keeps coming up is including
@ -223,10 +222,17 @@ for details on the syntax):
| FSTRING_MIDDLE
| fstring_replacement_field
The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
:ref:`later in this document <701-new-tokens>`.
This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**,
but the language specification also **doesn't mandate arbitrary nesting**.
The new grammar will preserve the Abstract Syntax Tree (AST) of the current
implementation. This means that no semantic changes will be introduced by this
PEP on existing code that uses f-strings.
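As a purely illustrative check (not part of the PEP itself), one way to see that the AST shape
is preserved is to inspect the nodes produced for an f-string with the ``ast`` module; the same
``JoinedStr``, ``FormattedValue`` and ``Constant`` nodes are expected before and after this change::

    import ast

    # An f-string parses to a JoinedStr node whose values are Constant and
    # FormattedValue nodes; only the tokenization feeding the parser changes.
    tree = ast.parse('f"hello {name}!"', mode="eval")
    print(ast.dump(tree.body, indent=4))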
Handling of f-string debug expressions
--------------------------------------
@ -259,6 +265,8 @@ and not just the associated tokens.
How parser/lexer implementations deal with this problem is, of course, up to each
implementation.
.. _701-new-tokens:
New tokens
----------
@ -277,10 +285,10 @@ better understanding of the proposed grammar changes and how the tokens are used
These tokens are always string parts and they are semantically equivalent to the
``STRING`` token with the restrictions specified. These tokens must be produced by the lexer
when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**.
How the lexer emits this token is **not specified** as this will heavily depend on every
implementation (even the Python version of the lexer in the standard library is implemented
differently to the one used by the PEG parser).
As an example::
@ -308,6 +316,20 @@ while ``f"""some words"""`` will be tokenized simply as::
FSTRING_START - 'f"""'
FSTRING_MIDDLE - 'some words'
FSTRING_END - '"""'
One way existing lexers can be adapted to emit these tokens is to incorporate a stack of "lexer modes"
or to use a stack of different lexers. This is because the lexer needs to switch from "regular Python
lexing" to "f-string lexing" when it encounters an f-string start token and, as f-strings can be nested,
the context needs to be preserved until the f-string closes. Also, the "lexer mode" inside an f-string
expression part needs to behave as a "superset" of the regular Python lexer (as it needs to be able to
switch back to f-string lexing when it encounters the ``}`` terminator for the expression part, as well
as handle f-string formatting and debug expressions). Of course, as mentioned before, it is not possible to
provide a precise specification of how this should be done, as it will depend on the specific implementation
and nature of the lexer to be changed.
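To make the idea more concrete, the following is a deliberately simplified, purely illustrative
sketch of a mode-stack tokenizer; it is **not** CPython's implementation and is not part of this
PEP. The function name and the token tuples it emits are made up for the example, and it only
handles double-quoted, single-line f-strings with plain names inside replacement fields (no
escapes, format specs or debug expressions)::

    # Purely illustrative sketch of the "stack of lexer modes" idea; this is
    # not CPython's tokenizer and handles only a tiny subset of the syntax.

    def toy_tokenize(src):
        tokens = []
        modes = ["REGULAR"]          # the innermost lexing mode sits on top of the stack
        i = 0
        while i < len(src):
            mode = modes[-1]
            # An f-string can start both in regular code and inside an expression
            # part, which is what makes arbitrary nesting possible.
            if mode in ("REGULAR", "EXPRESSION") and src.startswith('f"', i):
                tokens.append(("FSTRING_START", 'f"'))
                modes.append("FSTRING")
                i += 2
            elif mode == "FSTRING":
                start = i
                while i < len(src) and src[i] not in '{"':
                    i += 1
                if i > start:
                    tokens.append(("FSTRING_MIDDLE", src[start:i]))
                if i < len(src) and src[i] == "{":
                    tokens.append(("OP", "{"))
                    modes.append("EXPRESSION")   # behave (mostly) like the regular lexer
                    i += 1
                elif i < len(src):               # closing quote of this f-string
                    tokens.append(("FSTRING_END", '"'))
                    modes.pop()                  # resume whatever we were lexing before
                    i += 1
            elif mode == "EXPRESSION" and src[i] == "}":
                tokens.append(("OP", "}"))
                modes.pop()                      # switch back to f-string lexing
                i += 1
            elif src[i].isidentifier():
                start = i
                while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                    i += 1
                tokens.append(("NAME", src[start:i]))
            else:
                i += 1                           # skip anything else in this toy example
        return tokens

    for kind, text in toy_tokenize('f"some words {name} and {f"nested {x}"}!"'):
        print(kind, repr(text))

Running this on the nested input above shows ``FSTRING_START``, ``FSTRING_MIDDLE`` and
``FSTRING_END`` tokens being emitted at each nesting level as modes are pushed and popped.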
The specifics of how (or if) the ``tokenize`` module will emit these tokens (or others) and what
is included in the emitted tokens are left out of this document and must be decided later in a regular
CPython issue.
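For instance, purely as a hypothetical illustration (since, as stated above, that decision is
deferred), if the ``tokenize`` module were updated to emit the new tokens, a small script along
these lines could be used to inspect whatever token stream it produces for an f-string::

    import io
    import tokenize

    # What this prints depends entirely on what the tokenize module ends up
    # emitting for f-strings; the script itself just iterates over the tokens.
    source = 'f"some words {a + b} more words"\n'
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))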
Consequences of the new grammar
-------------------------------
@ -320,7 +342,76 @@ All restrictions mentioned in the PEP are lifted from f-string literals, as expl
expanded when the innermost string is evaluated.
* Comments, using the ``#`` character, are possible only in multi-line f-string literals,
since comments are terminated by the end of the line (which makes closing a
single-line f-string literal impossible). Comments in multi-line f-string literals require
the closing ``}`` of the expression part to be placed on a different line from the one the
comment is in, as illustrated below.
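As a purely illustrative example of this rule (not taken from the PEP text), under the proposed
grammar a comment inside the expression part of a triple-quoted f-string forces the closing brace
onto a later line::

    >>> f"""{
    ...     1 + 1  # the comment runs to the end of this line,
    ... }"""
    '2'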
.. _701-considerations-of-quote-reuse:
Considerations regarding quote reuse
------------------------------------
One of the consequences of the grammar proposed here is that, as mentioned above,
f-string expressions can now contain strings delimited with the same kind of quote
that is used to delimit the external f-string literal. For example:
>>> f" something { my_dict["key"] } something else "
In the `discussion thread for this PEP <https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046>`_,
several concerns have been raised regarding this aspect and we want to collect them here,
as these should be taken into consideration when accepting or rejecting this PEP.
Some of these objections include:
* Many people find quote reuse within the same string confusing and hard to read. This is because
allowing quote reuse will violate a current property of Python as it stands today: the fact that
strings are fully delimited by two consecutive occurrences of the same kind of quote, which by itself is a very simple rule.
One of the reasons quote reuse may be harder for humans to parse, leading to less readable
code, is that the quote character is the same for both start and
end (as opposed to other delimiters).
* Some users have raised concerns that quote reuse may break some lexer and syntax highlighting tools that rely
on simple mechanisms to detect strings and f-strings, such as regular expressions or simple delimiter
matching tools. Introducing quote reuse in f-strings will either make it trickier to keep these tools
working or will break the tools altogether (as, for instance, regular expressions cannot parse arbitrary nested
structures with delimiters). The IDLE editor, included in the standard library, is an example of a
tool which may need some work to correctly apply syntax highlighting to f-strings.
Here are some of the arguments in favour:
* Many languages that allow similar syntactic constructs (normally called "string interpolation") allow quote
reuse and arbitrary nesting. These languages include JavaScript, Ruby, C#, Bash, Swift and many others.
The fact that many languages allow quote reuse can be a compelling argument in favour of allowing it in Python. This
is because it will make the language more familiar to users coming from other languages.
* As many other popular languages allow quote reuse in string interpolation constructs, editors
that support syntax highlighting for these languages will already have the necessary machinery to support syntax
highlighting for f-strings with quote reuse in Python. Although the files that handle syntax
highlighting for Python will need to be updated to support this new feature, this is not expected to be impossible
or very hard to do.
* One advantage of allowing quote reuse is that it composes cleanly with other syntax. Sometimes this is referred to
as "referential transparency". An example of this is that if we have ``f(x+1)``, assuming ``a`` is a brand new variable, it
should behave the same as ``a = x+1; f(a)``. And vice versa. So if we have::
def py2c(source):
prefix = source.removesuffix(".py")
return f"{prefix}.c"
It should be expected that if we replace the variable ``prefix`` with its definition, the answer remains the same::
def py2c(source):
return f"{source.removesuffix(".py")}.c"
* Limiting quote reuse will considerably increase the complexity of the implementation of the proposed changes. This is because
it will force the parser to carry the context that it is parsing an expression part of an f-string with a given quote in order
to know if it needs to reject an expression that reuses the quote. Carrying this context around is not trivial in parsers that
can backtrack arbitrarily (such as the PEG parser). The issue becomes even more complex if we consider that f-strings can be
arbitrarily nested and therefore several quote types may need to be rejected.
To gather feedback,
`a poll <https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046/24>`__
has been initiated to get a sense of how the community feels about this aspect of the PEP.
Backwards Compatibility
=======================
@ -370,8 +461,18 @@ A reference implementation can be found in the implementation_ fork.
Rejected Ideas
==============
#. Although we think the readability arguments that have been raised against
allowing quote reuse in f-string expressions are valid and very important,
we have decided to propose not rejecting quote reuse in f-strings at the parser
level. The reason is that one of the cornerstones of this PEP is to reduce the
complexity and maintenance cost of parsing f-strings in CPython, and rejecting
quote reuse at the parser level would not only work against that goal, but may
even make the implementation more complex than the current one. We believe that
forbidding quote reuse should be done in linters and code style tools and not in
the parser, the same way other confusing or hard-to-read constructs in the
language are handled today.
#. We have decided not to lift the restriction that some expression portions
need to wrap ``':'`` and ``'!'`` in parentheses at the top level, e.g.::
>>> f'Useless use of lambdas: { lambda x: x*2 }'
SyntaxError: unexpected EOF while parsing
@ -390,7 +491,6 @@ Rejected Ideas
be parenthesized if needed::
>>> f'Useless use of lambdas: { (lambda x: x*2) }'
Open Issues
===========