PEP 750: Tag Strings For Writing Domain-Specific Languages (#3858)

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Paul Everitt <pauleveritt@me.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Jim Baker 2024-08-09 08:49:09 -06:00 committed by GitHub
parent a9242b65d4
commit b1d591f26f
2 changed files with 892 additions and 0 deletions

.github/CODEOWNERS

@@ -629,6 +629,7 @@ peps/pep-0747.rst @JelleZijlstra
# ...
peps/pep-0749.rst @JelleZijlstra
# ...
peps/pep-0750.rst @gvanrossum @lysnikolaou
peps/pep-0751.rst @brettcannon
# ...
# peps/pep-0754.rst

peps/pep-0750.rst (new file)

@@ -0,0 +1,891 @@
PEP: 750
Title: Tag Strings For Writing Domain-Specific Languages
Author: Jim Baker <jim.baker@python.org>, Guido van Rossum <guido@python.org>, Paul Everitt <pauleveritt@me.com>
Sponsor: Lysandros Nikolaou <lisandrosnik@gmail.com>
Status: Draft
Type: Standards Track
Created: 08-Jul-2024
Python-Version: 3.14
Abstract
========
This PEP introduces tag strings for custom, repeatable string processing. Tag strings
are an extension to f-strings, with a custom function -- the "tag" -- in place of the
``f`` prefix. This function can then provide rich features such as safety checks, lazy
evaluation, domain-specific languages (DSLs) for web templating, and more.
Tag strings are similar to `JavaScript tagged template literals <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates>`_
and related ideas in other languages. The following tag string usage shows how similar it is to an ``f`` string, albeit
with the ability to process the literal string and embedded values:
.. code-block:: python
name = "World"
greeting = greet"hello {name}"
assert greeting == "Hello WORLD!"
Tag functions accept prepared arguments and return a string:
.. code-block:: python
    def greet(*args):
        """Tag function to return a greeting with an upper-case recipient."""
        salutation, recipient, *_ = args
        getvalue, *_ = recipient
        return f"{salutation.title().strip()} {getvalue().upper()}!"
Below you can find richer examples. As a note, an implementation based on CPython 3.12
exists, as discussed in this document.
Relationship With Other PEPs
============================
Python introduced f-strings in Python 3.6 with :pep:`498`. The grammar was
then formalized in :pep:`701`, which also lifted some restrictions. This PEP
builds on PEP 701.
At nearly the same time PEP 498 arrived, :pep:`501` was written to provide
"i-strings" -- that is, "interpolation template strings". The PEP was
deferred pending further experience with f-strings. Work on :pep:`501` was
resumed by a different author in March 2023, introducing "t-strings" as template
literal strings, built atop PEP 701.
The authors of this PEP consider tag strings as a generalization of the
updated work in PEP 501.
Motivation
==========
Python f-strings became very popular, very fast. The syntax was simple, convenient, and
interpolated expressions had access to regular scoping rules. However, f-strings have
two main limitations: expressions are eagerly evaluated, and interpolated values
cannot be intercepted. The former means that f-strings cannot be re-used like templates,
and the latter means that how values are interpolated cannot be customized.
Templating in Python is currently achieved using packages like Jinja2 which bring their
own templating languages for generating dynamic content. In addition to being one more
thing to learn, these languages are not nearly as expressive as Python itself. This
means that business logic, which cannot be expressed in the templating language, must be
written in Python instead, spreading the logic across different languages and files.
Likewise, the inability to intercept interpolated values means that they cannot be
sanitized or otherwise transformed before being integrated into the final string. Here,
the convenience of f-strings could be considered a liability. For example, a user
executing a query with `sqlite3 <https://docs.python.org/3/library/sqlite3.html>`__
may be tempted to use an f-string to embed values into their SQL expression instead of
using the ``?`` placeholder and passing the values as a tuple to avoid an
`SQL injection attack <https://en.wikipedia.org/wiki/SQL_injection>`__.
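
For illustration, here is a minimal sketch of that risk with ``sqlite3`` (the table and
values are hypothetical); the ``?`` placeholder form is the safe one:

.. code-block:: python

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (name TEXT)")
    user_input = "x' OR '1'='1"

    # Unsafe: the value is pasted directly into the SQL text.
    con.execute(f"SELECT * FROM users WHERE name = '{user_input}'")

    # Safe: sqlite3 treats the value as data, not as SQL.
    con.execute("SELECT * FROM users WHERE name = ?", (user_input,))
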
Tag strings address both these problems by extending the f-string syntax to provide
developers access to the string and its interpolated values before they are combined. In
doing so, tag strings may be interpreted in many different ways, opening up the
possibility for DSLs and other custom string processing.
Proposal
========
This PEP proposes customizable prefixes for f-strings. These f-strings then
become a "tag string": an f-string with a "tag function." The tag function is
a callable which is given a sequence of arguments for the parsed tokens in
the string.
Here's a very simple example. Imagine we want a certain kind of string with
some custom business policies: uppercase the value and add an exclamation point.
Let's start with a tag string which simply returns a static greeting:
.. code-block:: python
def greet(*args):
"""Give a static greeting."""
return "Hello!"
assert greet"Hi" == "Hello!" # Use the custom "tag" on the string
As you can see, ``greet`` is just a callable, in the place that the ``f``
prefix would go. Let's look at the args:
.. code-block:: python
def greet(*args):
"""Uppercase and add exclamation."""
salutation = args[0].upper()
return f"{salutation}!"
greeting = greet"Hello" # Use the custom "tag" on the string
assert greeting == "HELLO!"
The tag function is passed a sequence of arguments. Since our tag string is simply
``"Hello"``, the ``args`` sequence only contains a string-like value of ``'Hello'``.
With this in place, let's introduce an *interpolation*. That is, a place where
a value should be inserted:
.. code-block:: python
def greet(*args):
"""Handle an interpolation."""
# The first arg is the string-like value "Hello " with a space
salutation = args[0].strip()
# The second arg is an "interpolation"
interpolation = args[1]
# Interpolations are tuples, the first item is a lambda
getvalue = interpolation[0]
# It gets called in the scope where it was defined, so
# the interpolation returns "World"
result = getvalue()
recipient = result.upper()
return f"{salutation} {recipient}!"
name = "World"
greeting = greet"Hello {name}"
assert greeting == "Hello WORLD!"
The f-string interpolation of ``{name}`` leads to the new machinery in tag
strings:
- ``args[0]`` is still the string-like ``'Hello '``, this time with a trailing space
- ``args[1]`` is an expression -- the ``{name}`` part
- Tag strings represent this part as an *interpolation* object as discussed below
The ``*args`` list is a sequence of ``Decoded`` and ``Interpolation`` values. A "decoded" object
is a string-like object with extra powers, as described below. An "interpolation" object is a
tuple-like value representing how Python processed the interpolation into a form useful for your
tag function. Both are fully described below in `Specification`_.
Here is a more generalized version using structural pattern matching and type hints:
.. code-block:: python
from typing import Decoded, Interpolation # Get the new protocols
def greet(*args: Decoded | Interpolation) -> str:
"""Handle arbitrary args using structural pattern matching."""
result = []
for arg in args:
match arg:
case Decoded() as decoded:
result.append(decoded)
case Interpolation() as interpolation:
value = interpolation.getvalue()
result.append(value.upper())
return f"{''.join(result)}!"
name = "World"
greeting = greet"Hello {name} nice to meet you"
assert greeting == "Hello WORLD nice to meet you!"
Tag strings extract more than just a callable from the ``Interpolation``. They also
provide Python string formatting info, as well as the original text:
.. code-block:: python
def greet(*args: Decoded | Interpolation) -> str:
"""Interpolations can have string formatting specs and conversions."""
result = []
for arg in args:
match arg:
case Decoded() as decoded:
result.append(decoded)
case getvalue, raw, conversion, format_spec: # Unpack
gv = f"gv: {getvalue()}"
r = f"r: {raw}"
c = f"c: {conversion}"
f = f"f: {format_spec}"
result.append(", ".join([gv, r, c, f]))
return f"{''.join(result)}!"
name = "World"
assert greet"Hello {name!r:s}" == "Hello gv: World, r: name, c: r, f: s!"
You can see each of the ``Interpolation`` parts getting extracted:
- The lambda expression to call to get the value in the scope where it was defined
- The text of the interpolation expression (``name``)
- The Python "conversion" field (``r``)
- Any `format specification <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_
  (``s``)
Specification
=============
In the rest of this specification, ``mytag`` will be used for an arbitrary tag.
For example:
.. code-block:: python
def mytag(*args):
return args
trade = 'shrubberies'
mytag'Did you say "{trade}"?'
Valid Tag Names
---------------
The tag name can be any undotted name that isn't already an existing valid string or
bytes prefix, as seen in the `lexical analysis specification
<https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>`_.
Therefore these prefixes can't be used as a tag:
.. code-block:: text
stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
: | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
Python `restricts certain keywords <https://docs.python.org/3/reference/lexical_analysis.html#keywords>`_ from being
used as identifiers. This restriction also applies to tag names. Usage of keywords should
trigger a helpful error, as done in recent CPython releases.
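
For example (assuming the tag functions themselves are defined elsewhere):

.. code-block:: python

    greet"Hello"     # valid: 'greet' is an ordinary identifier
    sql'SELECT 1'    # valid
    f"Hello"         # not a tag string: 'f' is an existing string prefix
    for"Hello"       # error: 'for' is a keyword and cannot be a tag name
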
Tags Must Immediately Precede the Quote Mark
--------------------------------------------
As with other string literal prefixes, no whitespace can be between the tag and the
quote mark.
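
For example:

.. code-block:: python

    greet"Hello"     # valid: the tag immediately precedes the quote mark
    greet "Hello"    # SyntaxError: whitespace between the tag and the string
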
PEP 701
-------
Tag strings support the full syntax of :pep:`701` in that any string literal,
with any quote mark, can be nested in the interpolation. This nesting of course
includes tag strings themselves.
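
A sketch of such nesting, assuming hypothetical ``html`` and ``bold`` tag functions:

.. code-block:: python

    name = "World"
    result = html'<p>{bold"Hello {name}"}</p>'      # a tag string nested in an interpolation
    other = html'<p>{f"Hi, {name.upper()}"}</p>'    # an f-string can be nested as well
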
Evaluating Tag Strings
----------------------
When the tag string is evaluated, the tag must have a binding, or a ``NameError``
is raised; and it must be a callable, or a ``TypeError`` is raised. The callable
must accept a sequence of positional arguments. This behavior follows from the
de-sugaring of:
.. code-block:: python
trade = 'shrubberies'
mytag'Did you say "{trade}"?'
to:
.. code-block:: python
    mytag(DecodedConcrete(r'Did you say "'),
          InterpolationConcrete(lambda: trade, 'trade', None, None),
          DecodedConcrete(r'"?'))
.. note::
    ``DecodedConcrete`` and ``InterpolationConcrete`` are just example implementations. If approved,
    tag strings will have concrete types in ``builtins``.
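
To illustrate the error behavior (the tag names here are hypothetical):

.. code-block:: python

    trade = 'shrubberies'

    undefined_tag'Did you say "{trade}"?'   # NameError: name 'undefined_tag' is not defined

    not_callable = 42
    not_callable'Did you say "{trade}"?'    # TypeError: 'int' object is not callable
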
Decoded Strings
---------------
In the ``mytag'Did you say "{trade}"?'`` example, there are two strings: ``r'Did you say "'``
and ``r'"?'``.
The static string parts are stored internally as objects conforming to the ``Decoded``
protocol:
.. code-block:: python
@runtime_checkable
class Decoded(Protocol):
def __str__(self) -> str:
...
raw: str
These ``Decoded`` objects have access to raw strings. Raw strings are used because tag strings
are meant to target a variety of DSLs, such as the shell and regexes. Such DSLs have their
own specific treatment of metacharacters, namely the backslash.
However, what is often needed is the "cooked" string, obtained by decoding the raw string
as if it were a standard Python string. In the proposed implementation, the decoded object's
``__new__`` will *store* the raw string and *store and return* the "cooked" string.
The protocol is marked as ``@runtime_checkable`` to allow structural pattern matching to
test against the protocol instead of a type. This can incur a small performance penalty.
Since the ``case`` tests are in user-code tag functions, authors can choose to optimize by
testing for the implementation type discussed next.
The ``Decoded`` protocol will be available from ``typing``. In CPython, ``Decoded``
will be implemented in C, but for discussion of this PEP, the following is a compatible
implementation:
.. code-block:: python
    class DecodedConcrete(str):
        _raw: str

        def __new__(cls, raw: str):
            decoded = raw.encode("utf-8").decode("unicode-escape")
            if decoded == raw:
                # No escape sequences were decoded: reuse the original string.
                decoded = raw
            chunk = super().__new__(cls, decoded)
            chunk._raw = raw
            return chunk

        @property
        def raw(self):
            return self._raw
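
A quick usage sketch of this example implementation, showing the raw and "cooked" views
of the same chunk:

.. code-block:: python

    chunk = DecodedConcrete(r"first\nsecond")
    assert chunk.raw == r"first\nsecond"    # the original text, backslash intact
    assert str(chunk) == "first\nsecond"    # the cooked string, with a real newline
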
Interpolation
-------------
An ``Interpolation`` is the data structure representing an expression inside the tag
string. Interpolations enable a delayed evaluation model, in which the interpolation
expression can be computed, transformed, memoized, or processed in any way.
In addition, the original text of the interpolation expression is made available to the
tag function. This can be useful for debugging or metaprogramming.
``Interpolation`` is a ``Protocol`` which will be made available from ``typing``. It
has the following definition:
.. code-block:: python
@runtime_checkable
class Interpolation(Protocol):
def __len__(self):
...
def __getitem__(self, index: int):
...
def getvalue(self) -> Callable[[], Any]:
...
expr: str
conv: Literal["a", "r", "s"] | None
format_spec: str | None
Given this example interpolation:
.. code-block:: python
mytag'{trade!r:some-formatspec}'
these attributes are as follows:
* ``getvalue`` is a zero-argument closure over the interpolation expression. In this case, ``lambda: trade``.
* ``expr`` is the *expression text* of the interpolation. Example: ``'trade'``.
* ``conv`` is the
`optional conversion <https://docs.python.org/3/library/string.html#format-string-syntax>`_
to be used by the tag function, one of ``r``, ``s``, and ``a``, corresponding to repr, str,
and ascii conversions. Note that as with f-strings, no other conversions are supported.
Example: ``'r'``.
* ``format_spec`` is the optional `format_spec string <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_.
  If a ``format_spec`` contains any expressions, it is eagerly evaluated before being passed to the tag
  function. Example: ``'some-formatspec'``.
In all cases, the tag function determines what to do with valid ``Interpolation``
attributes.
In the CPython reference implementation, ``Interpolation`` would be implemented in C
using `Struct Sequence Objects
<https://docs.python.org/3/c-api/tuple.html#struct-sequence-objects>`_ (see
such code as `os.stat_result
<https://docs.python.org/3/library/os.html#os.stat_result>`_). For purposes of this
PEP, here is an example of a pure Python implementation:
.. code-block:: python
class InterpolationConcrete(NamedTuple):
getvalue: Callable[[], Any]
expr: str
conv: Literal['a', 'r', 's'] | None = None
format_spec: str | None = None
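
A usage sketch with this pure Python stand-in, mirroring the
``mytag'{trade!r:some-formatspec}'`` example above:

.. code-block:: python

    trade = 'shrubberies'
    interp = InterpolationConcrete(lambda: trade, 'trade', 'r', 'some-formatspec')

    assert interp.getvalue() == 'shrubberies'    # evaluate the closure
    assert interp.expr == 'trade'
    assert interp.conv == 'r'
    getvalue, expr, conv, format_spec = interp   # tuple-style unpacking also works
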
Interpolation Expression Evaluation
-----------------------------------
Expression evaluation for interpolations is the same as in :pep:`498#expression-evaluation`,
except that all expressions are always implicitly wrapped with a ``lambda``:
The expressions that are extracted from the string are evaluated in the context
where the tag string appeared. This means the expression has full access to its
lexical scope, including local and global variables. Any valid Python expression
can be used, including function and method calls.
However, there's one additional nuance to consider: `function scope
<https://docs.python.org/3/reference/executionmodel.html#resolution-of-names>`_
versus `annotation scope
<https://docs.python.org/3/reference/executionmodel.html#annotation-scopes>`_.
Consider this somewhat contrived example to configure captions:
.. code-block:: python
class CaptionConfig:
tag = 'b'
figure = f'<{tag}>Figure</{tag}>'
Let's now attempt to rewrite the above example to use tag strings:
.. code-block:: python
class CaptionConfig:
tag = 'b'
figure = html'<{tag}>Figure</{tag}>'
Unfortunately, this rewrite doesn't work if using the usual lambda wrapping to
implement interpolations, namely ``lambda: tag``. When the interpolations are
evaluated by the tag function, it will result in ``NameError: name 'tag' is not
defined``. The root cause of this name error is that ``lambda: tag`` uses function scope,
and it's therefore not able to use the class definition where ``tag`` is
defined.
Desugaring how the tag string could be evaluated will result in the same
``NameError`` even using f-strings; the lambda wrapping here also uses function
scoping:
.. code-block:: python
class CaptionConfig:
tag = 'b'
figure = f'<{(lambda: tag)()}>Figure</{(lambda: tag)()}>'
For tag strings, getting such a ``NameError`` would be surprising. It would also
be a rough edge in using tag strings in this specific case of working with class
variables. After all, tag strings are supposed to support a superset of the
capabilities of f-strings.
The solution is to use annotation scope for tag string interpolations. While the
name "annotation scope" suggests it's only about annotations, it solves this
problem by lexically resolving names in the class definition, such as ``tag``,
unlike function scope.
.. note::
The use of annotation scope means it's not possible to fully desugar
interpolations into Python code. Instead it's as if one is writing
``interpolation_lambda: tag``, not ``lambda: tag``, where a hypothetical
``interpolation_lambda`` keyword variant uses annotation scope instead of
the standard function scope.
This is more or less how the reference implementation implements this
concept (but without creating a new keyword of course).
This PEP and its reference implementation therefore use annotation scope. Note
that this usage is separable from the implementation of :pep:`649` and
:pep:`695`, which provide a somewhat similar deferred execution model for
annotations. Here, instead, it is up to the tag function to evaluate any
interpolations.
With annotation scope in place, lambda-wrapped expressions in interpolations
then provide the usual lexical scoping seen with f-strings. So there's no need
to use ``locals()``, ``globals()``, or frame introspection with
``sys._getframe`` to evaluate the interpolation. In addition, the code of each
expression is available and does not have to be looked up with
``inspect.getsource`` or some other means.
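
As a small sketch of that lexical scoping, reusing the ``greet`` tag from the Proposal
section: the interpolation closes over the local ``name``, so the tag function can
evaluate it later without ``locals()`` or frame introspection:

.. code-block:: python

    def make_greeting():
        name = "World"
        # The lambda behind {name} closes over this local variable.
        return greet"Hello {name}"

    assert make_greeting() == "Hello WORLD!"
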
Format Specification
--------------------
The ``format_spec`` is by default ``None`` if it is not specified in the tag string's
corresponding interpolation.
Because the tag function is completely responsible for processing ``Decoded``
and ``Interpolation`` values, there is no required interpretation for the format
spec and conversion in an interpolation. For example, this is a valid usage:
.. code-block:: python
html'<div id={id:int}>{content:HTML|str}</div>'
In this case the ``format_spec`` for the second interpolation is the string
``'HTML|str'``; it is up to the ``html`` tag to do something with the
"format spec" here, if anything.
f-string-style ``=`` Evaluation
-------------------------------
``mytag'{expr=}'`` is parsed as being the same as ``mytag'expr={expr}'``, as
implemented in the issue `Add = to f-strings for
easier debugging <https://github.com/python/cpython/issues/80998>`_.
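
In other words, roughly this desugaring (using the example concrete types):

.. code-block:: python

    value = 42
    # mytag'{value=}' is treated as mytag'value={value}', i.e. roughly:
    mytag(DecodedConcrete(r'value='),
          InterpolationConcrete(lambda: value, 'value', None, None))
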
Tag Function Arguments
----------------------
The tag function has the following signature:
.. code-block:: python
def mytag(*args: Decoded | Interpolation) -> Any:
...
This corresponds to the following protocol:
.. code-block:: python
class TagFunction(Protocol):
def __call__(self, *args: Decoded | Interpolation) -> Any:
...
Because the concrete types subclass ``str`` and ``tuple``, the signature for ``mytag``
can of course be widened to the following, at the cost of losing some type specificity:
.. code-block:: python
def mytag(*args: str | tuple) -> Any:
...
A user might write a tag string as follows:
.. code-block:: python
def tag(*args):
return args
tag"\N{{GRINNING FACE}}"
Tag strings will represent this as exactly one ``Decoded`` argument. In this case, ``Decoded.raw`` would be
``'\\N{GRINNING FACE}'``. The "cooked" representation via encode and decode would be:
.. code-block:: python
    >>> '\\N{GRINNING FACE}'.encode('utf-8').decode('unicode-escape')
    '😀'
Named unicode characters immediately followed by more text will still produce
just one ``Decoded`` argument:
.. code-block:: python
def tag(*args):
return args
assert tag"\N{{GRINNING FACE}}sometext" == (DecodedConcrete("😀sometext"),)
Return Value
------------
Tag functions can return any type. Often they will return a string, but
richer systems can be built by returning richer objects. See below for
a motivating example.
Function Application
--------------------
Tag strings desugar as follows:
.. code-block:: python
mytag'Hi, {name!s:format_spec}!'
This is equivalent to:
.. code-block:: python
mytag(DecodedConcrete(r'Hi, '), InterpolationConcrete(lambda: name, 'name',
's', 'format_spec'), DecodedConcrete(r'!'))
.. note::
    To keep it simple, this and subsequent desugarings omit an important scoping
    aspect in how names in interpolation expressions are resolved, specifically
    when defining classes. See `Interpolation Expression Evaluation`_.
No Empty Decoded String
-----------------------
Alternation between decoded strings and interpolations is commonly seen, but it depends
on the tag string. Decoded strings will never be the empty string:
.. code-block:: python
mytag'{a}{b}{c}'
...which results in this desugaring:
.. code-block:: python
    mytag(InterpolationConcrete(lambda: a, 'a', None, None),
          InterpolationConcrete(lambda: b, 'b', None, None),
          InterpolationConcrete(lambda: c, 'c', None, None))
Likewise:
.. code-block:: python
mytag''
...results in this desugaring:
.. code-block:: python
mytag()
HTML Example of Rich Return Types
=================================
Tag functions can be a powerful part of larger processing chains by returning richer objects.
JavaScript tagged template literals, for example, are not constrained by a requirement to
return a string. As an example, let's look at an HTML generation system, with a usage and
"subcomponent":
.. code-block::
    def Menu(*, logo: str, class_: str) -> HTML:
        return html'<img alt="Site Logo" src={logo} class={class_} />'

    icon = 'acme.png'
    result = html'<header><{Menu} logo={icon} class="my-menu"/></header>'
    img = result.children[0]
    assert img.tag == "img"
    assert img.attrs == {"src": "acme.png", "class": "my-menu", "alt": "Site Logo"}
    # We can also treat the return type as a string of specially-serialized HTML
    assert str(result).startswith('<header>')  # etc.
This ``html`` tag function might have the following signature:
.. code-block:: python
def html(*args: Decoded | Interpolation) -> HTML:
...
The ``HTML`` return class might have the following shape as a ``Protocol``:
.. code-block:: python
@runtime_checkable
class HTML(Protocol):
tag: str
attrs: dict[str, Any]
children: Sequence[str | HTML]
In summary, the returned instance can be used as:
- A string, for serializing to the final output
- An iterable, for working with WSGI/ASGI for output streamed and evaluated
interpolations *in the order* they are written out
- A DOM (data) structure of nested Python data
In each case, the result can be lazily and recursively composed in a safe fashion, because
the return value isn't required to be a string. Recommended practice is that
return values are "passive" objects.
What benefits might come from returning rich objects instead of strings? A DSL for
a domain such as HTML templating can provide a toolchain of post-processing, as
`Babel <https://babeljs.io>`_ does for JavaScript
`with AST-based transformation plugins <https://babeljs.io/docs/#pluggable>`_.
Similarly, systems that provide middleware processing can operate on richer,
standard objects with more capabilities. Tag string results can be tested as
nested Python objects, rather than string manipulation. Finally, the intermediate
results can be cached/persisted in useful ways.
Tool Support
============
Python Semantics in Tag Strings
-------------------------------
Python template languages and other DSLs have semantics quite apart from Python:
different scope rules, different calling semantics (e.g. for macros), their own
grammar for loops, and the like.
This means every tool needs special support for each such language. Even then,
it is usually difficult to find all the possible scopes, for example to autocomplete
values.
However, f-strings do not have this issue. An f-string is considered part of Python.
Expressions in curly braces behave as expected and values should resolve based on
regular scoping rules. Tools such as mypy can see inside f-string expressions,
but will likely never look inside a Jinja2 template.
DSLs written with tag strings will inherit much of this value. While we can't expect
standard tooling to understand the "domain" in the DSL, it can still inspect
anything expressible in an f-string.
Backwards Compatibility
=======================
Like f-strings, use of tag strings will be a syntactic backwards incompatibility
with previous versions.
Security Implications
=====================
The security implications of working with interpolations are as follows:
1. Scope lookup is the same as f-strings (lexical scope). This model has been
shown to work well in practice.
2. Tag functions can ensure that any interpolations are done in a safe fashion,
including respecting the context in the target DSL.
How To Teach This
=================
Tag strings have several audiences: consumers of tag functions, authors of tag
functions, and framework authors who provide interesting machinery for tag
functions.
All three groups can start from an important framing:
- Existing solutions (such as template engines) can do parts of tag strings
- But tag strings move logic closer to "normal Python"
Consumers can look at tag strings as starting from f-strings:
- They look familiar
- Scoping and syntax rules are the same
The first thing they need to absorb: unlike f-strings, the string isn't
immediately evaluated "in-place". Something else (the tag function) happens.
That's the second thing to teach: the tag function does something particular,
hence the concept of "domain-specific languages" (DSLs). One extra thing to
teach: you need to import the tag function before tagging a string.
Tag function authors think in terms of making a DSL. They have
business policies they want to provide in a Python-familiar way. With tag
functions, Python is going to do much of the pre-processing. This lowers
the bar for making a DSL.
Tag authors can begin with simple use cases. After authors gain experience, tag strings can be used to add larger
patterns: lazy evaluation, intermediate representations, registries, and more.
Each of these points also matches the teaching of decorators. In that case,
a learner consumes something which applies to the code just after it. They
don't need to know too much about decorator theory to take advantage of the
utility.
Common Patterns Seen In Writing Tag Functions
=============================================
Structural Pattern Matching
---------------------------
Iterating over the arguments with structural pattern matching is the expected
best practice for many tag function implementations:
.. code-block:: python
def tag(*args: Decoded | Interpolation) -> Any:
for arg in args:
match arg:
case Decoded() as decoded:
... # handle each decoded string
case Interpolation() as interpolation:
... # handle each interpolation
Lazy Evaluation
---------------
The example tag functions above each call the interpolation's ``getvalue`` lambda
immediately. Python developers have frequently wished that f-strings could be
deferred, or lazily evaluated. It would be straightforward to write a wrapper that,
for example, defers calling the lambda until ``__str__`` is invoked.
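
A minimal sketch of such a wrapper (the names are hypothetical):

.. code-block:: python

    class LazyFragment:
        """Defer evaluating an interpolation until the fragment is rendered."""

        def __init__(self, getvalue):
            self._getvalue = getvalue

        def __str__(self):
            return str(self._getvalue())

    def lazy(*args: Decoded | Interpolation) -> list:
        # Nothing is evaluated here; each interpolation is evaluated only when
        # the corresponding fragment is converted to a string.
        parts = []
        for arg in args:
            match arg:
                case Decoded() as chunk:
                    parts.append(str(chunk))
                case Interpolation() as interp:
                    parts.append(LazyFragment(interp.getvalue))
        return parts
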
Memoizing
---------
Tag function authors have control of processing the static string parts and
the dynamic interpolation parts. For higher performance, they can memoize that
processing, for example by generating cache keys from the static parts.
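
A sketch of one such approach, using the static parts as a cache key for an expensive
preprocessing step (the helper names are hypothetical):

.. code-block:: python

    from functools import lru_cache

    @lru_cache
    def _compile_static(static_parts: tuple[str, ...]) -> str:
        # Stand-in for expensive one-time work per distinct template, such as
        # parsing the static text or building a DOM skeleton.
        return "\x00".join(static_parts)

    def memo_tag(*args: Decoded | Interpolation):
        static = tuple(str(arg) for arg in args if isinstance(arg, str))
        values = [arg.getvalue() for arg in args if not isinstance(arg, str)]
        skeleton = _compile_static(static)   # cached across calls with the same static parts
        return skeleton, values
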
Order of Evaluation
-------------------
Imagine a tag that generates a number of sections in HTML. The tag needs inputs for each
section. But what if the last input argument takes a while? You can't return the HTML for
the first section until all the arguments are available.
You'd prefer to emit markup as the inputs are available. Some templating tools support
this approach, as do tag strings.
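
A sketch of a streaming-oriented tag, yielding each chunk as soon as it is available
(for example, to feed a WSGI/ASGI response):

.. code-block:: python

    def stream(*args: Decoded | Interpolation):
        # A generator: chunks are produced in document order, so earlier markup
        # can be written out before later (possibly slow) interpolations are
        # evaluated.
        for arg in args:
            match arg:
                case Decoded() as chunk:
                    yield str(chunk)
                case Interpolation() as interp:
                    yield str(interp.getvalue())
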
Reference Implementation
========================
At the time of this PEP's announcement, a fully-working implementation is
`available <https://github.com/lysnikolaou/cpython/tree/tag-strings-rebased>`_.
This implementation is not final, as the PEP discussion will likely provide changes.
Rejected Ideas
==============
Enable Exact Round-Tripping of ``conv`` and ``format_spec``
-----------------------------------------------------------
There are two limitations with respect to exactly round-tripping to the original
source text.
First, the ``format_spec`` can be arbitrarily nested:
.. code-block:: python
mytag'{x:{a{b{c}}}}'
In this PEP and its corresponding reference implementation, the ``format_spec``
is eagerly evaluated to set the ``format_spec`` in the interpolation, thereby losing the
original expressions.
While it would be feasible to preserve round-tripping in every usage, this would
require an extra flag ``equals`` to support, for example, ``{x=}``, and a
recursive ``Interpolation`` definition for ``format_spec``. The following is roughly the
pure Python equivalent of this type, including preserving the sequence
unpacking (as used in case statements):
.. code-block:: python
class InterpolationConcrete(NamedTuple):
getvalue: Callable[[], Any]
raw: str
conv: str | None = None
format_spec: str | None | tuple[Decoded | Interpolation, ...] = None
equals: bool = False
def __len__(self):
return 4
def __iter__(self):
return iter((self.getvalue, self.raw, self.conv, self.format_spec))
However, the additional complexity to support exact round-tripping seems
unnecessary and is thus rejected.
No Implicit String Concatenation
--------------------------------
Implicit tag string concatenation isn't supported, which is `unlike other string literals
<https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation>`_.
The expectation is that triple quoting is sufficient. If implicit string
concatenation were supported, results from tag evaluations would need to
support the ``+`` operator with ``__add__`` and ``__radd__``.
Because tag strings target embedded DSLs, this complexity introduces other
issues, such as determining appropriate separators. This seems unnecessarily
complicated and is thus rejected.
Arbitrary Conversion Values
---------------------------
Python allows only ``r``, ``s``, or ``a`` as possible conversion type values.
Trying to assign a different value results in ``SyntaxError``.
In theory, tag functions could choose to handle other conversion types. But this
PEP adheres closely to :pep:`701`. Any changes to allowed values should be in a
separate PEP.
Acknowledgements
================
Thanks to Ryan Morshead for contributions during development of the ideas leading
to tag strings. Thanks also to Koudai Aono for infrastructure work on contributing
materials. Special mention also to Dropbox's `pyxl <https://github.com/dropbox/pyxl>`_
for tackling similar ideas years ago.
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.