From d547ef7ef43190be69d13a87dd15f7983025cf26 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Sat, 24 Dec 2022 21:42:21 +0000
Subject: [PATCH] PEP 701: Incorporate feedback from the discussion thread
 (#2939)

Co-authored-by: Alex Waygood
Co-authored-by: Jelle Zijlstra
Co-authored-by: C.A.M. Gerlach
---
 pep-0701.rst | 120 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 110 insertions(+), 10 deletions(-)

diff --git a/pep-0701.rst b/pep-0701.rst
index 09a0c0ed1..3661e6a2f 100644
--- a/pep-0701.rst
+++ b/pep-0701.rst
@@ -143,9 +143,8 @@ f-string literals (as well as the Python language in general).
 
       >>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"
 
-   This choice not only allows for a more consistent and predictable behavior of what can be
-   placed in f-strings but provides an intuitive way to manipulate string literals in a
-   more flexible way without to having to fight the limitations of the implementation.
+   This "feature" is not universally agreed to be desirable, and some users find this unreadable.
+   For a discussion on the different views on this, see the :ref:`701-considerations-of-quote-reuse` section.
 
 #. Another issue that has felt unintuitive to most is the lack of support for backslashes
    within the expression component of an f-string. One example that keeps coming up is including
@@ -223,10 +222,17 @@ for details on the syntax):
           | FSTRING_MIDDLE
           | fstring_replacement_field
 
+The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
+:ref:`later in this document <701-new-tokens>`.
+
 This PEP leaves up to the implementation the level of f-string nesting allowed.
 This means that limiting nesting is **not part of the language specification**
 but also the language specification **doesn't mandate arbitrary nesting**.
 
+The new grammar will preserve the Abstract Syntax Tree (AST) of the current
+implementation. This means that no semantic changes will be introduced by this
+PEP on existing code that uses f-strings.
+
 Handling of f-string debug expressions
 --------------------------------------
@@ -259,6 +265,8 @@ and not just the associated tokens.
 How parser/lexer implementations deal with this problem is of course up to the
 implementation.
 
+.. _701-new-tokens:
+
 New tokens
 ----------
@@ -277,10 +285,10 @@ better understanding of the proposed grammar changes and how the tokens are used
 
 These tokens are always string parts and they are semantically equivalent to the
 ``STRING`` token with the restrictions specified. These tokens must be produced by the lexer
-when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**. How
-the lexer emits this token is **not specified** as this will heavily depend on every
-implementation (even the Python version of the lexer in the standard library is
-implemented differently to the one used by the PEG parser).
+when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**.
+How the lexer emits these tokens is **not specified** as this will heavily depend on every
+implementation (even the Python version of the lexer in the standard library is implemented
+differently to the one used by the PEG parser).
 
 As an example::
@@ -308,6 +316,20 @@ while ``f"""some words"""`` will be tokenized simply as::
 
     FSTRING_START - 'f"""'
     FSTRING_MIDDLE - 'some words'
    FSTRING_END - '"""'
 
+One way existing lexers can be adapted to emit these tokens is to incorporate a stack of "lexer modes"
+or to use a stack of different lexers.
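To make the idea concrete, here is a minimal, illustrative sketch of a mode-stack lexer
(not part of this patch, the PEP, or any real implementation; every name in it is invented,
and it handles only a toy language of double-quoted f-strings with ``{expr}`` fields, with
no format specs, escapes, or error handling)::

    from enum import Enum, auto

    class Mode(Enum):
        REGULAR = auto()  # lexing normal Python code
        FSTRING = auto()  # lexing the literal part of an f-string

    def tokenize_toy(source):
        modes = [Mode.REGULAR]  # the mode stack: nesting pushes, closing pops
        i = 0
        while i < len(source):
            if modes[-1] is Mode.REGULAR:
                if source.startswith('f"', i):
                    yield ("FSTRING_START", 'f"')
                    modes.append(Mode.FSTRING)  # switch to f-string lexing
                    i += 2
                elif source[i] == "}" and len(modes) > 1:
                    yield ("RBRACE", "}")
                    modes.pop()  # expression part ended: back to f-string lexing
                    i += 1
                else:
                    yield ("CHAR", source[i])  # stand-in for real Python tokens
                    i += 1
            else:  # Mode.FSTRING: consume literal text up to '{' or the closing quote
                start = i
                while i < len(source) and source[i] not in '{"':
                    i += 1
                if i > start:
                    yield ("FSTRING_MIDDLE", source[start:i])
                if i < len(source) and source[i] == "{":
                    yield ("LBRACE", "{")
                    modes.append(Mode.REGULAR)  # expression part: regular lexing again
                    i += 1
                elif i < len(source):
                    yield ("FSTRING_END", '"')
                    modes.pop()  # f-string closed: restore the previous mode
                    i += 1

    # Nested f-strings with quote reuse lex cleanly because the stack
    # remembers the enclosing contexts:
    for token in tokenize_toy('f"a {f"b {x} c"} d"'):
        print(token)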
+Such a stack of modes is needed because the lexer must switch from "regular Python
+lexing" to "f-string lexing" when it encounters an f-string start token and, as f-strings can be nested,
+the context needs to be preserved until the f-string closes. Also, the "lexer mode" inside an f-string
+expression part needs to behave as a "superset" of the regular Python lexer (as it needs to be able to
+switch back to f-string lexing when it encounters the ``}`` terminator for the expression part, as well
+as to handle f-string formatting and debug expressions). Of course, as mentioned before, it is not possible to
+provide a precise specification of how this should be done as it will depend on the specific implementation
+and nature of the lexer to be changed.
+
+The specifics of how (or if) the ``tokenize`` module will emit these tokens (or others) and what
+is included in the emitted tokens are left out of this document and must be decided later in a regular
+CPython issue.
+
 Consequences of the new grammar
 -------------------------------
 
@@ -320,7 +342,76 @@ All restrictions mentioned in the PEP are lifted from f-string literals, as expl
   expanded when the innermost string is evaluated.
 * Comments, using the ``#`` character, are possible only in multi-line f-string literals,
   since comments are terminated by the end of the line (which makes closing a
-  single-line f-string literal impossible)
+  single-line f-string literal impossible). Comments in multi-line f-string literals require
+  the closing ``}`` of the expression part to be on a different line from the one the
+  comment is in.
+
+.. _701-considerations-of-quote-reuse:
+
+Considerations regarding quote reuse
+------------------------------------
+
+One of the consequences of the grammar proposed here is that, as mentioned above,
+f-string expressions can now contain strings delimited with the same kind of quote
+that is used to delimit the external f-string literal. For example:
+
+    >>> f" something { my_dict["key"] } something else "
+
+In the `discussion thread for this PEP `_,
+several concerns have been raised regarding this aspect and we want to collect them here,
+as these should be taken into consideration when accepting or rejecting this PEP.
+
+Some of these objections include:
+
+* Many people find quote reuse within the same string confusing and hard to read. This is because
+  allowing quote reuse will violate a current property of Python as it stands today: the fact that
+  strings are fully delimited by two consecutive occurrences of the same kind of quote, which by
+  itself is a very simple rule. One of the reasons quote reuse may be harder for humans to parse,
+  leading to less readable code, is that the quote character is the same for both start and
+  end (as opposed to other delimiters).
+
+* Some users have raised concerns that quote reuse may break some lexers and syntax highlighting tools that rely
+  on simple mechanisms to detect strings and f-strings, such as regular expressions or simple delimiter
+  matching tools. Introducing quote reuse in f-strings will either make it trickier to keep these tools
+  working or will break the tools altogether (as, for instance, regular expressions cannot parse arbitrary nested
+  structures with delimiters; a minimal illustration of this failure mode follows this list). The IDLE editor,
+  included in the standard library, is an example of a tool which may need some work to correctly apply syntax
+  highlighting to f-strings.
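As a minimal illustration of the tooling concern above (not from the PEP; the pattern below
is a deliberately naive stand-in for a real highlighter rule)::

    import re

    # A naive highlighter rule: a string is anything between two double quotes.
    naive_string = re.compile(r'"[^"]*"')

    src = 'f" something { my_dict["key"] } something else "'
    print(naive_string.findall(src))
    # Prints: ['" something { my_dict["', '"] } something else "']
    # The reused quotes get paired up incorrectly, so the highlighted
    # regions no longer match the real boundaries of the f-string.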
+
+Here are some of the arguments in favour:
+
+* Many languages that allow similar syntactic constructs (normally called "string interpolation") allow quote
+  reuse and arbitrary nesting. These languages include JavaScript, Ruby, C#, Bash, Swift and many others.
+  The fact that many languages allow quote reuse can be a compelling argument in favour of allowing it in Python. This
+  is because it will make the language more familiar to users coming from other languages.
+
+* As many other popular languages allow quote reuse in string interpolation constructs, this means that editors
+  that support syntax highlighting for these languages will already have the necessary tools to support syntax
+  highlighting for f-strings with quote reuse in Python. This means that although the files that handle syntax
+  highlighting for Python will need to be updated to support this new feature, doing so is not expected to be
+  very hard.
+
+* One advantage of allowing quote reuse is that it composes cleanly with other syntax. Sometimes this is referred to
+  as "referential transparency". An example of this is that if we have ``f(x+1)``, assuming ``a`` is a brand new variable, it
+  should behave the same as ``a = x+1; f(a)``. And vice versa. So if we have::
+
+      def py2c(source):
+          prefix = source.removesuffix(".py")
+          return f"{prefix}.c"
+
+  It should be expected that if we replace the variable ``prefix`` with its definition, the answer should be the same::
+
+      def py2c(source):
+          return f"{source.removesuffix(".py")}.c"
+
+* Limiting quote reuse will considerably increase the complexity of the implementation of the proposed changes. This is because
+  it will force the parser to keep track of the fact that it is parsing an expression part of an f-string with a given quote in order
+  to know if it needs to reject an expression that reuses the quote. Carrying this context around is not trivial in parsers that
+  can backtrack arbitrarily (such as the PEG parser). The issue becomes even more complex if we consider that f-strings can be
+  arbitrarily nested and therefore several quote types may need to be rejected.
+
+  To gather feedback from the community,
+  `a poll `__
+  has been initiated to get a sense of how the community feels about this aspect of the PEP.
 
 Backwards Compatibility
 =======================
@@ -370,8 +461,18 @@ A reference implementation can be found in the implementation_ fork.
 Rejected Ideas
 ==============
 
+#. Although we think the readability arguments that have been raised against
+   allowing quote reuse in f-string expressions are valid and very important,
+   we have decided to propose not rejecting quote reuse in f-strings at the parser
+   level. The reason is that one of the cornerstones of this PEP is to reduce the
+   complexity and maintenance of parsing f-strings in CPython, and rejecting quote
+   reuse at the parser level would not only work against that goal, but may even
+   make the implementation more complex than the current one. We believe that
+   forbidding quote reuse should be done in linters and code style tools and not
+   in the parser, the same way other confusing or hard-to-read constructs in the
+   language are handled today.
+
 #. We have decided not to lift the restriction that some expression portions
-   need to wrap ``':'`` and ``'!'`` in braces at the top level, e.g.::
+   need to wrap ``':'`` and ``'!'`` in parentheses at the top level, e.g.::
 
       >>> f'Useless use of lambdas: { lambda x: x*2 }'
       SyntaxError: unexpected EOF while parsing
@@ -390,7 +491,6 @@ Rejected Ideas
    be parenthesized if needed::
 
       >>> f'Useless use of lambdas: { (lambda x: x*2) }'
-
 
 Open Issues
 ===========