From b1d591f26f5f0458155665b5da6bb91f03c004c3 Mon Sep 17 00:00:00 2001 From: Jim Baker Date: Fri, 9 Aug 2024 08:49:09 -0600 Subject: [PATCH] PEP 750: Tag Strings For Writing Domain-Specific Languages (#3858) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Lysandros Nikolaou Co-authored-by: Jelle Zijlstra Co-authored-by: Paul Everitt Co-authored-by: Carol Willing --- .github/CODEOWNERS | 1 + peps/pep-0750.rst | 891 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 892 insertions(+) create mode 100644 peps/pep-0750.rst diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index bb5213dce..dfd387325 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -629,6 +629,7 @@ peps/pep-0747.rst @JelleZijlstra # ... peps/pep-0749.rst @JelleZijlstra # ... +peps/pep-0750.rst @gvanrossum @lysnikolaou peps/pep-0751.rst @brettcannon # ... # peps/pep-0754.rst diff --git a/peps/pep-0750.rst b/peps/pep-0750.rst new file mode 100644 index 000000000..03dfc932c --- /dev/null +++ b/peps/pep-0750.rst @@ -0,0 +1,891 @@ +PEP: 750 +Title: Tag Strings For Writing Domain-Specific Languages +Author: Jim Baker , Guido van Rossum , Paul Everitt +Sponsor: Lysandros Nikolaou +Status: Draft +Type: Standards Track +Created: 08-Jul-2024 +Python-Version: 3.14 + +Abstract +======== + +This PEP introduces tag strings for custom, repeatable string processing. Tag strings +are an extension to f-strings, with a custom function -- the "tag" -- in place of the +``f`` prefix. This function can then provide rich features such as safety checks, lazy +evaluation, domain-specific languages (DSLs) for web templating, and more. + +Tag strings are similar to `JavaScript tagged template literals `_ +and related ideas in other languages. The following tag string usage shows how similar it is to an ``f`` string, albeit +with the ability to process the literal string and embedded values: + +.. 
code-block:: python + + name = "World" + greeting = greet"hello {name}" + assert greeting == "Hello WORLD!" + + +Tag functions accept prepared arguments and return a string: + +.. code-block:: python + + def greet(*args): + """Tag function to return a greeting with an upper-case recipient.""" + salutation, recipient, *_ = args + getvalue, *_ = recipient # getvalue is the first item of an interpolation + return f"{salutation.title().strip()} {getvalue().upper()}!" + +Below you can find richer examples. As a note, an implementation based on CPython 3.12 +exists, as discussed in this document. + +Relationship With Other PEPs +============================ + +Python introduced f-strings in Python 3.6 with :pep:`498`. The grammar was +then formalized in :pep:`701` which also lifted some restrictions. This PEP +is based on PEP 701. + +At nearly the same time PEP 498 arrived, :pep:`501` was written to provide +"i-strings" -- that is, "interpolation template strings". The PEP was +deferred pending further experience with f-strings. Work on PEP 501 was +resumed by a different author in March 2023, introducing "t-strings" as template +literal strings, and built atop PEP 701. + +The authors of this PEP consider tag strings as a generalization of the +updated work in PEP 501. + +Motivation +========== + +Python f-strings became very popular, very fast. The syntax was simple, convenient, and +interpolated expressions had access to regular scoping rules. However, f-strings have +two main limitations: expressions are eagerly evaluated, and interpolated values +cannot be intercepted. The former means that f-strings cannot be re-used like templates, +and the latter means that how values are interpolated cannot be customized. + +Templating in Python is currently achieved using packages like Jinja2 which bring their +own templating languages for generating dynamic content. In addition to being one more +thing to learn, these languages are not nearly as expressive as Python itself.
This +means that business logic, which cannot be expressed in the templating language, must be +written in Python instead, spreading the logic across different languages and files. + +Likewise, the inability to intercept interpolated values means that they cannot be +sanitized or otherwise transformed before being integrated into the final string. Here, +the convenience of f-strings could be considered a liability. For example, a user +executing a query with `sqlite3 `__ +may be tempted to use an f-string to embed values into their SQL expression instead of +using the ``?`` placeholder and passing the values as a tuple to avoid an +`SQL injection attack `__. + +Tag strings address both these problems by extending the f-string syntax to provide +developers access to the string and its interpolated values before they are combined. In +doing so, tag strings may be interpreted in many different ways, opening up the +possibility for DSLs and other custom string processing. + +Proposal +======== + +This PEP proposes customizable prefixes for f-strings. These f-strings then +become a "tag string": an f-string with a "tag function." The tag function is +a callable which is given a sequence of arguments for the parsed tokens in +the string. + +Here's a very simple example. Imagine we want a certain kind of string with +some custom business policies: uppercase the value and add an exclamation point. + +Let's start with a tag string which simply returns a static greeting: + +.. code-block:: python + + def greet(*args): + """Give a static greeting.""" + return "Hello!" + + assert greet"Hi" == "Hello!" # Use the custom "tag" on the string + +As you can see, ``greet`` is just a callable, in the place that the ``f`` +prefix would go. Let's look at the args: + +.. code-block:: python + + def greet(*args): + """Uppercase and add exclamation.""" + salutation = args[0].upper() + return f"{salutation}!" 
+ + greeting = greet"Hello" # Use the custom "tag" on the string + assert greeting == "HELLO!" + +The tag function is passed a sequence of arguments. Since our tag string is simply +``"Hello"``, the ``args`` sequence only contains a string-like value of ``'Hello'``. + +With this in place, let's introduce an *interpolation*. That is, a place where +a value should be inserted: + +.. code-block:: python + + def greet(*args): + """Handle an interpolation.""" + # The first arg is the string-like value "Hello " with a space + salutation = args[0].strip() + # The second arg is an "interpolation" + interpolation = args[1] + # Interpolations are tuples, the first item is a lambda + getvalue = interpolation[0] + # It gets called in the scope where it was defined, so + # the interpolation returns "World" + result = getvalue() + recipient = result.upper() + return f"{salutation} {recipient}!" + + name = "World" + greeting = greet"Hello {name}" + assert greeting == "Hello WORLD!" + +The f-string interpolation of ``{name}`` leads to the new machinery in tag +strings: + +- ``args[0]`` is still the string-like ``'Hello '``, this time with a trailing space +- ``args[1]`` is an expression -- the ``{name}`` part +- Tag strings represent this part as an *interpolation* object as discussed below + +The ``*args`` list is a sequence of ``Decoded`` and ``Interpolation`` values. A "decoded" object +is a string-like object with extra powers, as described below. An "interpolation" object is a +tuple-like value representing how Python processed the interpolation into a form useful for your +tag function. Both are fully described below in `Specification`_. + +Here is a more generalized version using structural pattern matching and type hints: + +.. 
code-block:: python + + from typing import Decoded, Interpolation # Get the new protocols + + def greet(*args: Decoded | Interpolation) -> str: + """Handle arbitrary args using structural pattern matching.""" + result = [] + for arg in args: + match arg: + case Decoded() as decoded: + result.append(decoded) + case Interpolation() as interpolation: + value = interpolation.getvalue() + result.append(value.upper()) + + return f"{''.join(result)}!" + + name = "World" + greeting = greet"Hello {name} nice to meet you" + assert greeting == "Hello WORLD nice to meet you!" + +Tag strings extract more than just a callable from the ``Interpolation``. They also +provide Python string formatting info, as well as the original text: + +.. code-block:: python + + def greet(*args: Decoded | Interpolation) -> str: + """Interpolations can have string formatting specs and conversions.""" + result = [] + for arg in args: + match arg: + case Decoded() as decoded: + result.append(decoded) + case getvalue, raw, conversion, format_spec: # Unpack + gv = f"gv: {getvalue()}" + r = f"r: {raw}" + c = f"c: {conversion}" + f = f"f: {format_spec}" + result.append(", ".join([gv, r, c, f])) + + return f"{''.join(result)}!" + + name = "World" + assert greet"Hello {name!r:s}" == "Hello gv: World, r: name, c: r, f: s!" + +You can see each of the ``Interpolation`` parts getting extracted: + +- The lambda expression to call and get the value in the scope where it was defined +- The raw string of the interpolation (``name``) +- The Python "conversion" field (``r``) +- Any `format specification `_ + (``s``) + +Specification +============= + +In the rest of this specification, ``mytag`` will be used for an arbitrary tag. +For example: + +.. code-block:: python + + def mytag(*args): + return args + + trade = 'shrubberies' + mytag'Did you say "{trade}"?'
+ +Valid Tag Names +--------------- + +The tag name can be any undotted name that isn't already an existing valid string or +bytes prefix, as seen in the `lexical analysis specification +`_. +Therefore these prefixes can't be used as a tag: + +.. code-block:: text + + stringprefix: "r" | "u" | "R" | "U" | "f" | "F" + : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF" + + bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB" + +Python `restricts certain keywords `_ from being +used as identifiers. This restriction also applies to tag names. Usage of keywords should +trigger a helpful error, as done in recent CPython releases. + +Tags Must Immediately Precede the Quote Mark +-------------------------------------------- + +As with other string literal prefixes, no whitespace can be between the tag and the +quote mark. + +PEP 701 +------- + +Tag strings support the full syntax of :pep:`701` in that any string literal, +with any quote mark, can be nested in the interpolation. This nesting of course +includes tag strings. + +Evaluating Tag Strings +---------------------- + +When the tag string is evaluated, the tag must have a binding, or a ``NameError`` +is raised; and it must be a callable, or a ``TypeError`` is raised. The callable +must accept a sequence of positional arguments. This behavior follows from the +de-sugaring of: + +.. code-block:: python + + trade = 'shrubberies' + mytag'Did you say "{trade}"?' + +to: + +.. code-block:: python + + mytag(DecodedConcrete(r'Did you say "'), InterpolationConcrete(lambda: trade, 'trade', None, None), DecodedConcrete(r'"?')) + +.. note:: + + ``DecodedConcrete`` and ``InterpolationConcrete`` are just example implementations. If approved, + tag strings will have concrete types in ``builtins``. + +Decoded Strings +--------------- + +In the ``mytag'Did you say "{trade}"?'`` example, there are two strings: ``r'Did you say "'`` +and ``r'"?'``.
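The desugaring above can be tried out in current Python by writing the call by hand. The following is only a sketch under stated assumptions: ``DecodedStandIn`` and ``InterpolationStandIn`` are hypothetical stand-ins invented for this illustration (they are not the proposed built-in types), and the tag string syntax itself is replaced by an explicit function call.

```python
from typing import Any, Callable, NamedTuple, Optional


class DecodedStandIn(str):
    """Hypothetical stand-in: a str that also remembers its raw source text."""

    def __new__(cls, raw: str) -> "DecodedStandIn":
        # ASCII-only input is assumed here, so this simple decoding is enough.
        cooked = raw.encode("utf-8").decode("unicode-escape")
        obj = super().__new__(cls, cooked)
        obj.raw = raw
        return obj


class InterpolationStandIn(NamedTuple):
    """Hypothetical stand-in with the four fields described in this PEP."""

    getvalue: Callable[[], Any]
    expr: str
    conv: Optional[str] = None
    format_spec: Optional[str] = None


def mytag(*args):
    return args


trade = "shrubberies"

# Hand-written equivalent of: mytag'Did you say "{trade}"?'
args = mytag(
    DecodedStandIn(r'Did you say "'),
    InterpolationStandIn(lambda: trade, "trade", None, None),
    DecodedStandIn(r'"?'),
)

# The two decoded strings surround a single interpolation.
decoded_parts = [arg for arg in args if isinstance(arg, DecodedStandIn)]
assert decoded_parts == ['Did you say "', '"?']
assert args[1].expr == "trade"
assert args[1].getvalue() == "shrubberies"
```

Because the interpolation holds a lambda rather than a value, ``trade`` is only looked up when the tag function decides to call ``getvalue``.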
+ +Strings are internally stored as objects with a ``Decoded`` structure, meaning: conforming to +a protocol ``Decoded``: + +.. code-block:: python + + @runtime_checkable + class Decoded(Protocol): + def __str__(self) -> str: + ... + + raw: str + + +These ``Decoded`` objects have access to raw strings. Raw strings are used because tag strings +are meant to target a variety of DSLs, such as the shell and regexes. Such DSLs have their +own specific treatment of metacharacters, namely the backslash. + +However, often the "cooked" string is what is needed, by decoding the string as +if it were a standard Python string. In the proposed implementation, the decoded object's +``__new__`` will *store* the raw string and *store and return* the "cooked" string. + +The protocol is marked as ``@runtime_checkable`` to allow structural pattern matching to +test against the protocol instead of a type. This can incur a small performance penalty. +Since the ``case`` tests are in user-code tag functions, authors can choose to optimize by +testing for the implementation type discussed next. + +The ``Decoded`` protocol will be available from ``typing``. In CPython, ``Decoded`` +will be implemented in C, but for discussion of this PEP, the following is a compatible +implementation: + +.. code-block:: python + + class DecodedConcrete(str): + _raw: str + + def __new__(cls, raw: str): + decoded = raw.encode("utf-8").decode("unicode-escape") + if decoded == raw: + decoded = raw + chunk = super().__new__(cls, decoded) + chunk._raw = raw + return chunk + + @property + def raw(self): + return self._raw + +Interpolation +------------- + +An ``Interpolation`` is the data structure representing an expression inside the tag +string. Interpolations enable a delayed evaluation model, where the interpolation +expression is computed, transformed, memoized, or processed in any way. + +In addition, the original text of the interpolation expression is made available to the +tag function. 
This can be useful for debugging or metaprogramming. + +``Interpolation`` is a ``Protocol`` which will be made available from ``typing``. It +has the following definition: + +.. code-block:: python + + @runtime_checkable + class Interpolation(Protocol): + def __len__(self): + ... + + def __getitem__(self, index: int): + ... + + def getvalue(self) -> Callable[[], Any]: + ... + + expr: str + conv: Literal["a", "r", "s"] | None + format_spec: str | None + +Given this example interpolation: + +.. code-block:: python + + mytag'{trade!r:some-formatspec}' + +these attributes are as follows: + +* ``getvalue`` is a zero argument closure for the interpolation. In this case, ``lambda: trade``. + +* ``expr`` is the *expression text* of the interpolation. Example: ``'trade'``. + +* ``conv`` is the + `optional conversion `_ + to be used by the tag function, one of ``r``, ``s``, and ``a``, corresponding to repr, str, + and ascii conversions. Note that as with f-strings, no other conversions are supported. + Example: ``'r'``. + +* ``format_spec`` is the optional `format_spec string `_. + A ``format_spec`` is eagerly evaluated if it contains any expressions before being passed to the tag + function. Example: ``'some-formatspec'``. + +In all cases, the tag function determines what to do with valid ``Interpolation`` +attributes. + +In the CPython reference implementation, implementing ``Interpolation`` in C would +use the equivalent `Struct Sequence Objects +`_ (see +such code as `os.stat_result +`_). For purposes of this +PEP, here is an example of a pure Python implementation: + +.. 
code-block:: python + + class InterpolationConcrete(NamedTuple): + getvalue: Callable[[], Any] + expr: str + conv: Literal['a', 'r', 's'] | None = None + format_spec: str | None = None + +Interpolation Expression Evaluation +----------------------------------- + +Expression evaluation for interpolations is the same as in :pep:`498#expression-evaluation`, +except that all expressions are always implicitly wrapped with a ``lambda``: + + The expressions that are extracted from the string are evaluated in the context + where the tag string appeared. This means the expression has full access to its + lexical scope, including local and global variables. Any valid Python expression + can be used, including function and method calls. + +However, there's one additional nuance to consider, `function scope +`_ +versus `annotation scope +`_. +Consider this somewhat contrived example to configure captions: + +.. code-block:: python + + class CaptionConfig: + tag = 'b' + figure = f'<{tag}>Figure' + +Let's now attempt to rewrite the above example to use tag strings: + +.. code-block:: python + + class CaptionConfig: + tag = 'b' + figure = html'<{tag}>Figure' + +Unfortunately, this rewrite doesn't work if using the usual lambda wrapping to +implement interpolations, namely ``lambda: tag``. When the interpolations are +evaluated by the tag function, it will result in ``NameError: name 'tag' is not +defined``. The root cause of this name error is that ``lambda: tag`` uses function scope, +and it's therefore not able to use the class definition where ``tag`` is +defined. + +Desugaring how the tag string could be evaluated will result in the same +``NameError`` even using f-strings; the lambda wrapping here also uses function +scoping: + +.. code-block:: python + + class CaptionConfig: + tag = 'b' + figure = f'<{(lambda: tag)()}>Figure' + +For tag strings, getting such a ``NameError`` would be surprising. 
It would also +be a rough edge in using tag strings in this specific case of working with class +variables. After all, tag strings are supposed to support a superset of the +capabilities of f-strings. + +The solution is to use annotation scope for tag string interpolations. While the +name "annotation scope" suggests it's only about annotations, it solves this +problem by lexically resolving names in the class definition, such as ``tag``, +unlike function scope. + +.. note:: + + The use of annotation scope means it's not possible to fully desugar + interpolations into Python code. Instead it's as if one is writing + ``interpolation_lambda: tag``, not ``lambda: tag``, where a hypothetical + ``interpolation_lambda`` keyword variant uses annotation scope instead of + the standard function scope. + + This is more or less how the reference implementation implements this + concept (but without creating a new keyword of course). + +This PEP and its reference implementation therefore use the support for +annotation scope. Note that this usage is a separable part from the +implementation of :pep:`649` and :pep:`695` which provides a somewhat similar +deferred execution model for annotations. Instead it's up to the tag function to +evaluate any interpolations. + +With annotation scope in place, lambda-wrapped expressions in interpolations +then provide the usual lexical scoping seen with f-strings. So there's no need +to use ``locals()``, ``globals()``, or frame introspection with +``sys._getframe`` to evaluate the interpolation. In addition, the code of each +expression is available and does not have to be looked up with +``inspect.getsource`` or some other means. + +Format Specification +-------------------- + +The ``format_spec`` is by default ``None`` if it is not specified in the tag string's +corresponding interpolation. 
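As an illustration of one possible interpretation, a tag function could choose to treat ``conv`` and ``format_spec`` the way f-strings do, reading ``None`` as "not specified". This is a sketch only: ``InterpolationStandIn`` and ``render`` are hypothetical names made up for this example, and the interpolations are built by hand so that it runs on current Python.

```python
from typing import Any, Callable, NamedTuple, Optional


class InterpolationStandIn(NamedTuple):
    """Hypothetical stand-in with the four fields described in this PEP."""

    getvalue: Callable[[], Any]
    expr: str
    conv: Optional[str] = None
    format_spec: Optional[str] = None


def render(interp: InterpolationStandIn) -> str:
    """Apply conversion, then format spec, mimicking f-string behaviour."""
    value = interp.getvalue()
    if interp.conv == "r":
        value = repr(value)
    elif interp.conv == "s":
        value = str(value)
    elif interp.conv == "a":
        value = ascii(value)
    # A missing format spec (None) is treated like the empty spec.
    return format(value, interp.format_spec or "")


price = 3.14159
assert render(InterpolationStandIn(lambda: price, "price")) == "3.14159"
assert render(InterpolationStandIn(lambda: price, "price", None, ".2f")) == "3.14"
assert render(InterpolationStandIn(lambda: "hi", "'hi'", "r", None)) == "'hi'"
```

A tag function for a DSL is free to ignore this convention entirely and give the format spec its own meaning.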
+ +Because the tag function is completely responsible for processing ``Decoded`` +and ``Interpolation`` values, there is no required interpretation for the format +spec and conversion in an interpolation. For example, this is a valid usage: + +.. code-block:: python + + html'
{content:HTML|str}
' + +In this case the ``format_spec`` for the second interpolation is the string +``'HTML|str'``; it is up to the ``html`` tag to do something with the +"format spec" here, if anything. + +f-string-style ``=`` Evaluation +------------------------------- + +``mytag'{expr=}'`` is parsed the same as ``mytag'expr={expr}'``, as +implemented in the issue `Add = to f-strings for +easier debugging `_. + +Tag Function Arguments +---------------------- + +The tag function has the following signature: + +.. code-block:: python + + def mytag(*args: Decoded | Interpolation) -> Any: + ... + +This corresponds to the following protocol: + +.. code-block:: python + + class TagFunction(Protocol): + def __call__(self, *args: Decoded | Interpolation) -> Any: + ... + +Because of subclassing, the signature for ``mytag`` can of course be widened to +the following, at the cost of losing some type specificity: + +.. code-block:: python + + def mytag(*args: str | tuple) -> Any: + ... + +A user might write a tag string as follows: + +.. code-block:: python + + def tag(*args): + return args + + tag"\N{{GRINNING FACE}}" + +Tag strings will represent this as exactly one ``Decoded`` argument. In this case, ``Decoded.raw`` would be +``'\\N{GRINNING FACE}'``. The "cooked" representation via encode and decode would be: + +.. code-block:: python + + '\\N{GRINNING FACE}'.encode('utf-8').decode('unicode-escape') + '😀' + +Named unicode characters immediately followed by more text will still produce +just one ``Decoded`` argument: + +.. code-block:: python + + def tag(*args): + return args + + assert tag"\N{{GRINNING FACE}}sometext" == (DecodedConcrete("😀sometext"),) + + +Return Value +------------ + +Tag functions can return any type. Often they will return a string, but +richer systems can be built by returning richer objects. See below for +a motivating example. + +Function Application +-------------------- + +Tag strings desugar as follows: + +.. 
code-block:: python + + mytag'Hi, {name!s:format_spec}!' + +This is equivalent to: + +.. code-block:: python + + mytag(DecodedConcrete(r'Hi, '), InterpolationConcrete(lambda: name, 'name', + 's', 'format_spec'), DecodedConcrete(r'!')) + +.. note:: + + To keep it simple, this and subsequent desugaring omits an important scoping + aspect in how names in interpolation expressions are resolved, specifically + when defining classes. See `Interpolation Expression Evaluation`_. + +No Empty Decoded String +----------------------- + +Alternation between decodeds and interpolations is commonly seen, but it depends +on the tag string. Decoded strings will never have a value that is the empty string: + +.. code-block:: python + + mytag'{a}{b}{c}' + +...which results in this desugaring: + +.. code-block:: python + + mytag(InterpolationConcrete(lambda: a, 'a', None, None), InterpolationConcrete(lambda: b, 'b', None, None), InterpolationConcrete(lambda: c, 'c', None, None)) + +Likewise: + +.. code-block:: python + + mytag'' + +...results in this desugaring: + +.. code-block:: python + + mytag() + +HTML Example of Rich Return Types +================================= + +Tag functions can be a powerful part of larger processing chains by returning richer objects. +JavaScript tagged template literals, for example, are not constrained by a requirement to +return a string. As an example, let's look at an HTML generation system, with a usage and +"subcomponent": + +.. code-block:: + + def Menu(*, logo: str, class_: str) -> HTML: + return html'Site Logo' + + icon = 'acme.png' + result = html'
<{Menu} logo={icon} class="my-menu"/>
' + img = result.children[0] + assert img.tag == "img" + assert img.attrs == {"src": "acme.png", "class": "my-menu", "alt": "Site Logo"} + # We can also treat the return type as a string of specially-serialized HTML + assert str(result) == '
' # etc. + +This ``html`` tag function might have the following signature: + +.. code-block:: python + + def html(*args: Decoded | Interpolation) -> HTML: + ... + +The ``HTML`` return class might have the following shape as a ``Protocol``: + +.. code-block:: python + + @runtime_checkable + class HTML(Protocol): + tag: str + attrs: dict[str, Any] + children: Sequence[str | HTML] + +In summary, the returned instance can be used as: + +- A string, for serializing to the final output +- An iterable, for working with WSGI/ASGI for output streamed and evaluated + interpolations *in the order* they are written out +- A DOM (data) structure of nested Python data + +In each case, the result can be lazily and recursively composed in a safe fashion, because +the return value isn't required to be a string. Recommended practice is that +return values are "passive" objects. + +What benefits might come from returning rich objects instead of strings? A DSL for +a domain such as HTML templating can provide a toolchain of post-processing, as +`Babel `_ does for JavaScript +`with AST-based transformation plugins `_. +Similarly, systems that provide middleware processing can operate on richer, +standard objects with more capabilities. Tag string results can be tested as +nested Python objects, rather than string manipulation. Finally, the intermediate +results can be cached/persisted in useful ways. + +Tool Support +============ + +Python Semantics in Tag Strings +------------------------------- + +Python template languages and other DSLs have semantics quite apart from Python. +Different scope rules, different calling semantics e.g. for macros, their own +grammar for loops, and the like. + +This means all tools need to write special support for each language. Even then, +it is usually difficult to find all the possible scopes, for example to autocomplete +values. + +However, f-strings do not have this issue. An f-string is considered part of Python. 
+Expressions in curly braces behave as expected and values should resolve based on +regular scoping rules. Tools such as mypy can see inside f-string expressions, +but will likely never look inside a Jinja2 template. + +DSLs written with tag strings will inherit much of this value. While we can't expect +standard tooling to understand the "domain" in the DSL, it can still inspect +anything expressible in an f-string. + +Backwards Compatibility +======================= + +Like f-strings, use of tag strings will be a syntactic backwards incompatibility +with previous versions. + +Security Implications +===================== + +The security implications of working with interpolations are as follows: + +1. Scope lookup is the same as for f-strings (lexical scope). This model has been + shown to work well in practice. + +2. Tag functions can ensure that any interpolations are done in a safe fashion, + including respecting the context in the target DSL. + +How To Teach This +================= + +Tag strings have several audiences: consumers of tag functions, authors of tag +functions, and framework authors who provide interesting machinery for tag +functions. + +All three groups can start from an important framing: + +- Existing solutions (such as template engines) can do parts of tag strings +- But tag strings move logic closer to "normal Python" + +Consumers can look at tag strings as starting from f-strings: + +- They look familiar +- Scoping and syntax rules are the same + +The first thing they need to absorb: unlike f-strings, the string isn't +immediately evaluated "in-place". Something else (the tag function) happens. +That's the second thing to teach: the tag functions do something particular. +Thus the concept of "domain specific languages" (DSLs). What's extra to +teach: you need to import the tag function before tagging a string. + +Tag function authors think in terms of making a DSL. 
They have +business policies they want to provide in a Python-familiar way. With tag +functions, Python is going to do much of the pre-processing. This lowers +the bar for making a DSL. + +Tag authors can begin with simple use cases. After authors gain experience, tag strings can be used to add larger +patterns: lazy evaluation, intermediate representations, registries, and more. + +Each of these points also matches the teaching of decorators. In that case, +a learner consumes something which applies to the code just after it. They +don't need to know too much about decorator theory to take advantage of the +utility. + +Common Patterns Seen In Writing Tag Functions +============================================= + +Structural Pattern Matching +--------------------------- + +Iterating over the arguments with structural pattern matching is the expected +best practice for many tag function implementations: + +.. code-block:: python + + def tag(*args: Decoded | Interpolation) -> Any: + for arg in args: + match arg: + case Decoded() as decoded: + ... # handle each decoded string + case Interpolation() as interpolation: + ... # handle each interpolation + +Lazy Evaluation +--------------- + +The example tag functions above each call the interpolation's ``getvalue`` lambda +immediately. Python developers have frequently wished that f-strings could be +deferred, or lazily evaluated. It would be straightforward to write a wrapper that, +for example, defers calling the lambda until ``__str__`` is invoked. + +Memoizing +--------- + +Tag function authors have control of processing the static string parts and +the dynamic interpolation parts. For higher performance, they can deploy approaches +for memoizing processing, for example by generating keys. + +Order of Evaluation +------------------- + +Imagine a tag that generates a number of sections in HTML. The tag needs inputs for each +section. But what if the last input argument takes a while? 
You can't return the HTML for +the first section until all the arguments are available. + +You'd prefer to emit markup as the inputs are available. Some templating tools support +this approach, as do tag strings. + +Reference Implementation +======================== + +At the time of this PEP's announcement, a fully-working implementation is +`available `_. + +This implementation is not final, as the PEP discussion will likely lead to changes. + +Rejected Ideas +============== + + +Enable Exact Round-Tripping of ``conv`` and ``format_spec`` +----------------------------------------------------------- + +There are two limitations with respect to exactly round-tripping to the original +source text. + +First, the ``format_spec`` can be arbitrarily nested: + +.. code-block:: python + + mytag'{x:{a{b{c}}}}' + +In this PEP and corresponding reference implementation, the format_spec +is eagerly evaluated to set the ``format_spec`` in the interpolation, thereby losing the +original expressions. + +While it would be feasible to preserve round-tripping in every usage, this would +require an extra flag ``equals`` to support, for example, ``{x=}``, and a +recursive ``Interpolation`` definition for ``format_spec``. The following is roughly the +pure Python equivalent of this type, including preserving the sequence +unpacking (as used in case statements): + +.. code-block:: python + + class InterpolationConcrete(NamedTuple): + getvalue: Callable[[], Any] + raw: str + conv: str | None = None + format_spec: str | None | tuple[Decoded | Interpolation, ...] = None + equals: bool = False + + def __len__(self): + return 4 + + def __iter__(self): + return iter((self.getvalue, self.raw, self.conv, self.format_spec)) + +However, the additional complexity to support exact round-tripping seems +unnecessary and is thus rejected. 
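To make the loss of the original expressions concrete, here is a small sketch of what eager evaluation of a nested format spec implies. It assumes a hypothetical ``InterpolationStandIn`` type written for this illustration, and it builds the interpolation by hand instead of using tag string syntax.

```python
from typing import Any, Callable, NamedTuple, Optional


class InterpolationStandIn(NamedTuple):
    """Hypothetical stand-in with the four fields described in this PEP."""

    getvalue: Callable[[], Any]
    expr: str
    conv: Optional[str] = None
    format_spec: Optional[str] = None


x = 42
width = 6

# Hand-written equivalent of what mytag'{x:>{width}}' would pass along:
# the nested {width} has already been evaluated into the spec string.
interp = InterpolationStandIn(lambda: x, "x", None, f">{width}")

assert interp.format_spec == ">6"
# The tag function can still apply the spec, but the text "{width}" is gone.
assert format(interp.getvalue(), interp.format_spec) == "    42"
```

The value itself stays lazy through ``getvalue``, while the format spec arrives as a plain string with no trace of the expression that produced it.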
+ +No Implicit String Concatenation +-------------------------------- + +Implicit tag string concatenation isn't supported, which is `unlike other string literals +`_. + +The expectation is that triple quoting is sufficient. If implicit string +concatenation were supported, results from tag evaluations would need to +support the ``+`` operator with ``__add__`` and ``__radd__``. + +Because tag strings target embedded DSLs, this complexity introduces other +issues, such as determining appropriate separators. This seems unnecessarily +complicated and is thus rejected. + +Arbitrary Conversion Values +--------------------------- + +Python allows only ``r``, ``s``, or ``a`` as possible conversion type values. +Trying to assign a different value results in ``SyntaxError``. + +In theory, tag functions could choose to handle other conversion types. But this +PEP adheres closely to :pep:`701`. Any changes to allowed values should be in a +separate PEP. + +Acknowledgements +================ + +Thanks to Ryan Morshead for contributions during development of the ideas leading +to tag strings. Thanks also to Koudai Aono for infrastructure work on contributing +materials. Special mention also to Dropbox's `pyxl `_ +for tackling similar ideas years ago. + +Copyright +========= + +This document is placed in the public domain or under the CC0-1.0-Universal +license, whichever is more permissive.