PEP 701: Incorporate feedback from the discussion thread (#2939)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
parent 574e82c2f4
commit d547ef7ef4

pep-0701.rst (+120 −120)
@@ -143,9 +143,8 @@ f-string literals (as well as the Python language in general).

      >>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"

   This choice not only allows for a more consistent and predictable behavior of what can be
   placed in f-strings but provides an intuitive way to manipulate string literals in a
   more flexible way without having to fight the limitations of the implementation.
   This "feature" is not universally agreed to be desirable, and some users find this unreadable.
   For a discussion on the different views on this, see the :ref:`701-considerations-of-quote-reuse` section.
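
   For instance, a quote-reuse sketch in the spirit of the PEP's own examples
   (the values here are illustrative only; this requires an implementation of
   this PEP, such as CPython 3.12)::

      >>> songs = ['Take me back to Eden', 'Alkaline', 'Ascensionism']
      >>> f"This is the playlist: {", ".join(songs)}"
      'This is the playlist: Take me back to Eden, Alkaline, Ascensionism'
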
#. Another issue that has felt unintuitive to most is the lack of support for backslashes
   within the expression component of an f-string. One example that keeps coming up is including

@@ -223,10 +222,17 @@ for details on the syntax):

       | FSTRING_MIDDLE
       | fstring_replacement_field

The new tokens (``FSTRING_START``, ``FSTRING_MIDDLE``, ``FSTRING_END``) are defined
:ref:`later in this document <701-new-tokens>`.

This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**
but also the language specification **doesn't mandate arbitrary nesting**.

The new grammar will preserve the Abstract Syntax Tree (AST) of the current
implementation. This means that no semantic changes will be introduced by this
PEP on existing code that uses f-strings.
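
One way to see this invariant (an illustrative check, not part of the PEP's
specification) is that a simple f-string keeps parsing to the same
``JoinedStr`` node containing ``Constant`` and ``FormattedValue`` children::

   >>> import ast
   >>> tree = ast.parse('f"hello {name}"')
   >>> type(tree.body[0].value).__name__
   'JoinedStr'
   >>> [type(v).__name__ for v in tree.body[0].value.values]
   ['Constant', 'FormattedValue']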

Handling of f-string debug expressions
--------------------------------------

@@ -259,6 +265,8 @@ and not just the associated tokens.

How parser/lexer implementations deal with this problem is of course up to the
implementation.
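
For context, a debug expression reproduces the exact source text of the
expression in its output (standard behavior since Python 3.8, shown here only
as an illustration), which is why the raw text must be available and not just
the associated tokens::

   >>> user = "amelia"
   >>> f"{user=}"
   "user='amelia'"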

.. _701-new-tokens:

New tokens
----------

@@ -277,10 +285,10 @@ better understanding of the proposed grammar changes and how the tokens are used

These tokens are always string parts and they are semantically equivalent to the
``STRING`` token with the restrictions specified. These tokens must be produced by the lexer
-when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**. How
-the lexer emits this token is **not specified** as this will heavily depend on every
-implementation (even the Python version of the lexer in the standard library is
-implemented differently to the one used by the PEG parser).
+when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**.
+How the lexer emits this token is **not specified** as this will heavily depend on every
+implementation (even the Python version of the lexer in the standard library is implemented
+differently to the one used by the PEG parser).

As an example::

@@ -308,6 +316,20 @@ while ``f"""some words"""`` will be tokenized simply as::

   FSTRING_START - 'f"""'
   FSTRING_MIDDLE - 'some words'
   FSTRING_END - '"""'

One way existing lexers can be adapted to emit these tokens is to incorporate a stack of "lexer modes"
or to use a stack of different lexers. This is because the lexer needs to switch from "regular Python
lexing" to "f-string lexing" when it encounters an f-string start token and, as f-strings can be nested,
the context needs to be preserved until the f-string closes. Also, the "lexer mode" inside an f-string
expression part needs to behave as a "super-set" of the regular Python lexer (as it needs to be able to
switch back to f-string lexing when it encounters the ``}`` terminator for the expression part as well
as handling f-string formatting and debug expressions). Of course, as mentioned before, it is not possible to
provide a precise specification of how this should be done as it will depend on the specific implementation
and nature of the lexer to be changed.
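
To make the idea concrete, here is a deliberately minimal sketch of such a
mode stack (illustrative only, not CPython's implementation: it handles a toy
subset of double-quoted f-strings whose expression parts contain plain
identifiers or further f-strings, with no escapes, format specs, or debug
expressions)::

   def lex(src):
       """Tokenize a toy subset of f-strings using a stack of lexer modes."""
       tokens = []
       modes = []  # stack of "fstring" / "expr" lexer modes
       i = 0
       while i < len(src):
           c = src[i]
           if modes and modes[-1] == "fstring":
               if c == '"':  # the f-string closes: pop back to the outer mode
                   tokens.append(("FSTRING_END", '"'))
                   modes.pop()
                   i += 1
               elif c == "{":  # an expression part begins
                   tokens.append(("LBRACE", "{"))
                   modes.append("expr")
                   i += 1
               else:  # a run of literal text
                   j = i
                   while j < len(src) and src[j] not in '"{':
                       j += 1
                   tokens.append(("FSTRING_MIDDLE", src[i:j]))
                   i = j
           elif modes and modes[-1] == "expr":
               if c == "}":  # expression part ends: back to f-string lexing
                   tokens.append(("RBRACE", "}"))
                   modes.pop()
                   i += 1
               elif src.startswith('f"', i):  # a nested f-string: push a mode
                   tokens.append(("FSTRING_START", 'f"'))
                   modes.append("fstring")
                   i += 2
               elif c.isspace():
                   i += 1
               else:  # a plain identifier
                   j = i
                   while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                       j += 1
                   if j == i:
                       raise SyntaxError(f"unexpected character: {c!r}")
                   tokens.append(("NAME", src[i:j]))
                   i = j
           else:  # regular top-level lexing (everything else is skipped here)
               if src.startswith('f"', i):
                   tokens.append(("FSTRING_START", 'f"'))
                   modes.append("fstring")
                   i += 2
               else:
                   i += 1
       return tokens

Because the mode stack preserves context, a nested, quote-reusing input such as
``lex('f"a{f"b"}c"')`` comes out as ``FSTRING_START``, ``FSTRING_MIDDLE``
(``'a'``), ``LBRACE``, ``FSTRING_START``, ``FSTRING_MIDDLE`` (``'b'``),
``FSTRING_END``, ``RBRACE``, ``FSTRING_MIDDLE`` (``'c'``), ``FSTRING_END``.
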
The specifics of how (or if) the ``tokenize`` module will emit these tokens (or others) and what
is included in the emitted tokens are left out of this document and must be decided later in a regular
CPython issue.

Consequences of the new grammar
-------------------------------

@@ -320,7 +342,76 @@ All restrictions mentioned in the PEP are lifted from f-string literals, as expl

  expanded when the innermost string is evaluated.
* Comments, using the ``#`` character, are possible only in multi-line f-string literals,
  since comments are terminated by the end of the line (which makes closing a
-  single-line f-string literal impossible)
+  single-line f-string literal impossible). Comments in multi-line f-string literals require
+  the closing ``}`` of the expression part to be present on a different line from the one the
+  comment is in.
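
  For example, the following is valid under this PEP (a sketch; any
  implementation of this PEP, such as CPython 3.12, should accept it)::

     >>> f"""{
     ...     1 + 1  # a comment inside the expression part
     ... }"""
     '2'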

.. _701-considerations-of-quote-reuse:

Considerations regarding quote reuse
------------------------------------

One of the consequences of the grammar proposed here is that, as mentioned above,
f-string expressions can now contain strings delimited with the same kind of quote
that is used to delimit the external f-string literal. For example::

>>> f" something { my_dict["key"] } something else "
|
||||
|
||||
In the `discussion thread for this PEP <https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046>`_,
several concerns have been raised regarding this aspect and we want to collect them here,
as these should be taken into consideration when accepting or rejecting this PEP.

Some of these objections include:

* Many people find quote reuse within the same string confusing and hard to read. This is because
  allowing quote reuse will violate a current property of Python as it stands today: the fact that
  strings are fully delimited by two consecutive pairs of the same kind of quote, which by itself is a very simple rule.
  One of the reasons quote reuse may be harder for humans to parse, leading to less readable
  code, is that the quote character is the same for both start and
  end (as opposed to other delimiters).

* Some users have raised concerns that quote reuse may break some lexer and syntax highlighting tools that rely
  on simple mechanisms to detect strings and f-strings, such as regular expressions or simple delimiter
  matching tools. Introducing quote reuse in f-strings will either make it trickier to keep these tools
  working or will break the tools altogether (as, for instance, regular expressions cannot parse arbitrary nested
  structures with delimiters). The IDLE editor, included in the standard library, is an example of a
  tool which may need some work to correctly apply syntax highlighting to f-strings.
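
  As an illustrative sketch of the failure mode, a naive regular expression
  that assumes a double-quoted f-string ends at the next ``"`` stops too
  early as soon as the quote is reused::

     >>> import re
     >>> naive_fstring = re.compile(r'f"[^"]*"')
     >>> naive_fstring.search('f"{ my_dict["key"] }"').group()
     'f"{ my_dict["'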

Here are some of the arguments in favour:

* Many languages that allow similar syntactic constructs (normally called "string interpolation") allow quote
  reuse and arbitrary nesting. These languages include JavaScript, Ruby, C#, Bash, Swift and many others.
  The fact that many languages allow quote reuse can be a compelling argument in favour of allowing it in Python. This
  is because it will make the language more familiar to users coming from other languages.

* As many other popular languages allow quote reuse in string interpolation constructs, this means that editors
  that support syntax highlighting for these languages will already have the necessary tools to support syntax
  highlighting for f-strings with quote reuse in Python. This means that although the files that handle syntax
  highlighting for Python will need to be updated to support this new feature, this is not expected to be impossible
  or very hard to do.

* One advantage of allowing quote reuse is that it composes cleanly with other syntax. Sometimes this is referred to
  as "referential transparency". An example of this is that if we have ``f(x+1)``, assuming ``a`` is a brand new variable, it
  should behave the same as ``a = x+1; f(a)``. And vice versa. So if we have::

      def py2c(source):
          prefix = source.removesuffix(".py")
          return f"{prefix}.c"

  It should be expected that if we replace the variable ``prefix`` with its definition, the answer should be the same::

      def py2c(source):
          return f"{source.removesuffix(".py")}.c"

* Limiting quote reuse will considerably increase the complexity of the implementation of the proposed changes. This is because
  it will force the parser to have the context that it is parsing an expression part of an f-string with a given quote in order
  to know if it needs to reject an expression that reuses the quote. Carrying this context around is not trivial in parsers that
  can backtrack arbitrarily (such as the PEG parser). The issue becomes even more complex if we consider that f-strings can be
  arbitrarily nested and therefore several quote types may need to be rejected.

To gather feedback from the community,
`a poll <https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046/24>`__
has been initiated to get a sense of how the community feels about this aspect of the PEP.

Backwards Compatibility
=======================

@@ -370,8 +461,18 @@ A reference implementation can be found in the implementation_ fork.

Rejected Ideas
==============

#. Although we think the readability arguments that have been raised against
   allowing quote reuse in f-string expressions are valid and very important,
   we have decided to propose not rejecting quote reuse in f-strings at the parser
   level. The reason is that one of the cornerstones of this PEP is to reduce the
   complexity and maintenance of parsing f-strings in CPython, and this will not
   only work against that goal, but may even make the implementation more
   complex than the current one. We believe that forbidding quote reuse should be
   done in linters and code style tools and not in the parser, the same way other
   confusing or hard-to-read constructs in the language are handled today.

#. We have decided not to lift the restriction that some expression portions
-   need to wrap ``':'`` and ``'!'`` in braces at the top level, e.g.::
+   need to wrap ``':'`` and ``'!'`` in parentheses at the top level, e.g.::

      >>> f'Useless use of lambdas: { lambda x: x*2 }'
      SyntaxError: unexpected EOF while parsing

@@ -390,7 +491,6 @@ Rejected Ideas

   be parenthesized if needed::

      >>> f'Useless use of lambdas: { (lambda x: x*2) }'

Open Issues
===========