893 lines
32 KiB
ReStructuredText
893 lines
32 KiB
ReStructuredText
PEP: 750
|
|
Title: Tag Strings For Writing Domain-Specific Languages
|
|
Author: Jim Baker <jim.baker@python.org>, Guido van Rossum <guido@python.org>, Paul Everitt <pauleveritt@me.com>
|
|
Sponsor: Lysandros Nikolaou <lisandrosnik@gmail.com>
|
|
Discussions-To: https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Created: 08-Jul-2024
|
|
Python-Version: 3.14
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP introduces tag strings for custom, repeatable string processing. Tag strings
|
|
are an extension to f-strings, with a custom function -- the "tag" -- in place of the
|
|
``f`` prefix. This function can then provide rich features such as safety checks, lazy
|
|
evaluation, domain-specific languages (DSLs) for web templating, and more.
|
|
|
|
Tag strings are similar to `JavaScript tagged template literals <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates>`_
|
|
and related ideas in other languages. The following tag string usage shows how similar it is to an ``f`` string, albeit
|
|
with the ability to process the literal string and embedded values:
|
|
|
|
.. code-block:: python
|
|
|
|
name = "World"
|
|
greeting = greet"hello {name}"
|
|
assert greeting == "Hello WORLD!"
|
|
|
|
|
|
Tag functions accept prepared arguments and return a string:
|
|
|
|
.. code-block:: python
|
|
|
|
def greet(*args):
|
|
"""Tag function to return a greeting with an upper-case recipient."""
|
|
salutation, recipient, *_ = args
|
|
getvalue, *_ = recipient
|
|
return f"{salutation.title().strip()} {getvalue().upper()}!"
|
|
|
|
Below you can find richer examples. As a note, an implementation based on CPython 3.14
|
|
exists, as discussed in this document.
|
|
|
|
Relationship With Other PEPs
|
|
============================
|
|
|
|
Python introduced f-strings in Python 3.6 with :pep:`498`. The grammar was
|
|
then formalized in :pep:`701` which also lifted some restrictions. This PEP
|
|
is based off of PEP 701.
|
|
|
|
At nearly the same time PEP 498 arrived, :pep:`501` was written to provide
|
|
"i-strings" -- that is, "interpolation template strings". The PEP was
|
|
deferred pending further experience with f-strings. Work on this PEP was
|
|
resumed by a different author in March 2023, introducing "t-strings" as template
|
|
literal strings, and built atop PEP 701.
|
|
|
|
The authors of this PEP consider tag strings as a generalization of the
|
|
updated work in PEP 501.
|
|
|
|
Motivation
|
|
==========
|
|
|
|
Python f-strings became very popular, very fast. The syntax was simple, convenient, and
|
|
interpolated expressions had access to regular scoping rules. However, f-strings have
|
|
two main limitations - expressions are eagerly evaluated, and interpolated values
|
|
cannot be intercepted. The former means that f-strings cannot be re-used like templates,
|
|
and the latter means that how values are interpolated cannot be customized.
|
|
|
|
Templating in Python is currently achieved using packages like Jinja2 which bring their
|
|
own templating languages for generating dynamic content. In addition to being one more
|
|
thing to learn, these languages are not nearly as expressive as Python itself. This
|
|
means that business logic, which cannot be expressed in the templating language, must be
|
|
written in Python instead, spreading the logic across different languages and files.
|
|
|
|
Likewise, the inability to intercept interpolated values means that they cannot be
|
|
sanitized or otherwise transformed before being integrated into the final string. Here,
|
|
the convenience of f-strings could be considered a liability. For example, a user
|
|
executing a query with `sqlite3 <https://docs.python.org/3/library/sqlite3.html>`__
|
|
may be tempted to use an f-string to embed values into their SQL expression instead of
|
|
using the ``?`` placeholder and passing the values as a tuple to avoid an
|
|
`SQL injection attack <https://en.wikipedia.org/wiki/SQL_injection>`__.
|
|
|
|
Tag strings address both these problems by extending the f-string syntax to provide
|
|
developers access to the string and its interpolated values before they are combined. In
|
|
doing so, tag strings may be interpreted in many different ways, opening up the
|
|
possibility for DSLs and other custom string processing.
|
|
|
|
Proposal
|
|
========
|
|
|
|
This PEP proposes customizable prefixes for f-strings. These f-strings then
|
|
become a "tag string": an f-string with a "tag function." The tag function is
|
|
a callable which is given a sequence of arguments for the parsed tokens in
|
|
the string.
|
|
|
|
Here's a very simple example. Imagine we want a certain kind of string with
|
|
some custom business policies: uppercase the value and add an exclamation point.
|
|
|
|
Let's start with a tag string which simply returns a static greeting:
|
|
|
|
.. code-block:: python
|
|
|
|
def greet(*args):
|
|
"""Give a static greeting."""
|
|
return "Hello!"
|
|
|
|
assert greet"Hi" == "Hello!" # Use the custom "tag" on the string
|
|
|
|
As you can see, ``greet`` is just a callable, in the place that the ``f``
|
|
prefix would go. Let's look at the args:
|
|
|
|
.. code-block:: python
|
|
|
|
def greet(*args):
|
|
"""Uppercase and add exclamation."""
|
|
salutation = args[0].upper()
|
|
return f"{salutation}!"
|
|
|
|
greeting = greet"Hello" # Use the custom "tag" on the string
|
|
assert greeting == "HELLO!"
|
|
|
|
The tag function is passed a sequence of arguments. Since our tag string is simply
|
|
``"Hello"``, the ``args`` sequence only contains a string-like value of ``'Hello'``.
|
|
|
|
With this in place, let's introduce an *interpolation*. That is, a place where
|
|
a value should be inserted:
|
|
|
|
.. code-block:: python
|
|
|
|
def greet(*args):
|
|
"""Handle an interpolation."""
|
|
# The first arg is the string-like value "Hello " with a space
|
|
salutation = args[0].strip()
|
|
# The second arg is an "interpolation"
|
|
interpolation = args[1]
|
|
# Interpolations are tuples, the first item is a lambda
|
|
getvalue = interpolation[0]
|
|
# It gets called in the scope where it was defined, so
|
|
# the interpolation returns "World"
|
|
result = getvalue()
|
|
recipient = result.upper()
|
|
return f"{salutation} {recipient}!"
|
|
|
|
name = "World"
|
|
greeting = greet"Hello {name}"
|
|
assert greeting == "Hello WORLD!"
|
|
|
|
The f-string interpolation of ``{name}`` leads to the new machinery in tag
|
|
strings:
|
|
|
|
- ``args[0]`` is still the string-like ``'Hello '``, this time with a trailing space
|
|
- ``args[1]`` is an expression -- the ``{name}`` part
|
|
- Tag strings represent this part as an *interpolation* object as discussed below
|
|
|
|
The ``*args`` list is a sequence of ``Decoded`` and ``Interpolation`` values. A "decoded" object
|
|
is a string-like object with extra powers, as described below. An "interpolation" object is a
|
|
tuple-like value representing how Python processed the interpolation into a form useful for your
|
|
tag function. Both are fully described below in `Specification`_.
|
|
|
|
Here is a more generalized version using structural pattern matching and type hints:
|
|
|
|
.. code-block:: python
|
|
|
|
from typing import Decoded, Interpolation # Get the new protocols
|
|
|
|
def greet(*args: Decoded | Interpolation) -> str:
|
|
"""Handle arbitrary args using structural pattern matching."""
|
|
result = []
|
|
for arg in args:
|
|
match arg:
|
|
case Decoded() as decoded:
|
|
result.append(decoded)
|
|
case Interpolation() as interpolation:
|
|
value = interpolation.getvalue()
|
|
result.append(value.upper())
|
|
|
|
return f"{''.join(result)}!"
|
|
|
|
name = "World"
|
|
greeting = greet"Hello {name} nice to meet you"
|
|
assert greeting == "Hello WORLD nice to meet you!"
|
|
|
|
Tag strings extract more than just a callable from the ``Interpolation``. They also
|
|
provide Python string formatting info, as well as the original text:
|
|
|
|
.. code-block:: python
|
|
|
|
def greet(*args: Decoded | Interpolation) -> str:
|
|
"""Interpolations can have string formatting specs and conversions."""
|
|
result = []
|
|
for arg in args:
|
|
match arg:
|
|
case Decoded() as decoded:
|
|
result.append(decoded)
|
|
case getvalue, raw, conversion, format_spec: # Unpack
|
|
gv = f"gv: {getvalue()}"
|
|
r = f"r: {raw}"
|
|
c = f"c: {conversion}"
|
|
f = f"f: {format_spec}"
|
|
result.append(", ".join([gv, r, c, f]))
|
|
|
|
return f"{''.join(result)}!"
|
|
|
|
name = "World"
|
|
assert greet"Hello {name!r:s}" == "Hello gv: World, r: name, c: r, f: s!"
|
|
|
|
You can see each of the ``Interpolation`` parts getting extracted:
|
|
|
|
- The lambda expression to call and get the value in the scope it was defined
|
|
- The raw string of the interpolation (``name``)
|
|
- The Python "conversion" field (``r``)
|
|
- Any `format specification <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_
|
|
(``s``)
|
|
|
|
Specification
|
|
=============
|
|
|
|
In the rest of this specification, ``my_tag`` will be used for an arbitrary tag.
|
|
For example:
|
|
|
|
.. code-block:: python
|
|
|
|
def mytag(*args):
|
|
return args
|
|
|
|
trade = 'shrubberies'
|
|
mytag'Did you say "{trade}"?'
|
|
|
|
Valid Tag Names
|
|
---------------
|
|
|
|
The tag name can be any undotted name that isn't already an existing valid string or
|
|
bytes prefix, as seen in the `lexical analysis specification
|
|
<https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>`_.
|
|
Therefore these prefixes can't be used as a tag:
|
|
|
|
.. code-block:: text
|
|
|
|
stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
|
|
: | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
|
|
|
|
bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
|
|
|
|
Python `restricts certain keywords <https://docs.python.org/3/reference/lexical_analysis.html#keywords>`_ from being
|
|
used as identifiers. This restriction also applies to tag names. Usage of keywords should
|
|
trigger a helpful error, as done in recent CPython releases.
|
|
|
|
Tags Must Immediately Precede the Quote Mark
|
|
--------------------------------------------
|
|
|
|
As with other string literal prefixes, no whitespace can be between the tag and the
|
|
quote mark.
|
|
|
|
PEP 701
|
|
-------
|
|
|
|
Tag strings support the full syntax of :pep:`701` in that any string literal,
|
|
with any quote mark, can be nested in the interpolation. This nesting includes
|
|
of course tag strings.
|
|
|
|
Evaluating Tag Strings
|
|
----------------------
|
|
|
|
When the tag string is evaluated, the tag must have a binding, or a ``NameError``
|
|
is raised; and it must be a callable, or a ``TypeError`` is raised. The callable
|
|
must accept a sequence of positional arguments. This behavior follows from the
|
|
de-sugaring of:
|
|
|
|
.. code-block:: python
|
|
|
|
trade = 'shrubberies'
|
|
mytag'Did you say "{trade}"?'
|
|
|
|
to:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag(DecodedConcrete(r'Did you say "'), InterpolationConcrete(lambda: trade, 'trade', None, None), DecodedConcrete(r'"?'))
|
|
|
|
.. note::
|
|
|
|
`DecodedConcrete` and `InterpolationConcrete` are just example implementations. If approved,
|
|
tag strings will have concrete types in `builtins`.
|
|
|
|
Decoded Strings
|
|
---------------
|
|
|
|
In the ``mytag'Did you say "{trade}"?'`` example, there are two strings: ``r'Did you say "'``
|
|
and ``r'"?'``.
|
|
|
|
Strings are internally stored as objects with a ``Decoded`` structure, meaning: conforming to
|
|
a protocol ``Decoded``:
|
|
|
|
.. code-block:: python
|
|
|
|
@runtime_checkable
|
|
class Decoded(Protocol):
|
|
def __str__(self) -> str:
|
|
...
|
|
|
|
raw: str
|
|
|
|
|
|
These ``Decoded`` objects have access to raw strings. Raw strings are used because tag strings
|
|
are meant to target a variety of DSLs, such as the shell and regexes. Such DSLs have their
|
|
own specific treatment of metacharacters, namely the backslash.
|
|
|
|
However, often the "cooked" string is what is needed, by decoding the string as
|
|
if it were a standard Python string. In the proposed implementation, the decoded object's
|
|
``__new__`` will *store* the raw string and *store and return* the "cooked" string.
|
|
|
|
The protocol is marked as ``@runtime_checkable`` to allow structural pattern matching to
|
|
test against the protocol instead of a type. This can incur a small performance penalty.
|
|
Since the ``case`` tests are in user-code tag functions, authors can choose to optimize by
|
|
testing for the implementation type discussed next.
|
|
|
|
The ``Decoded`` protocol will be available from ``typing``. In CPython, ``Decoded``
|
|
will be implemented in C, but for discussion of this PEP, the following is a compatible
|
|
implementation:
|
|
|
|
.. code-block:: python
|
|
|
|
class DecodedConcrete(str):
|
|
_raw: str
|
|
|
|
def __new__(cls, raw: str):
|
|
decoded = raw.encode("utf-8").decode("unicode-escape")
|
|
if decoded == raw:
|
|
decoded = raw
|
|
chunk = super().__new__(cls, decoded)
|
|
chunk._raw = raw
|
|
return chunk
|
|
|
|
@property
|
|
def raw(self):
|
|
return self._raw
|
|
|
|
Interpolation
|
|
-------------
|
|
|
|
An ``Interpolation`` is the data structure representing an expression inside the tag
|
|
string. Interpolations enable a delayed evaluation model, where the interpolation
|
|
expression is computed, transformed, memoized, or processed in any way.
|
|
|
|
In addition, the original text of the interpolation expression is made available to the
|
|
tag function. This can be useful for debugging or metaprogramming.
|
|
|
|
``Interpolation`` is a ``Protocol`` which will be made available from ``typing``. It
|
|
has the following definition:
|
|
|
|
.. code-block:: python
|
|
|
|
@runtime_checkable
|
|
class Interpolation(Protocol):
|
|
def __len__(self):
|
|
...
|
|
|
|
def __getitem__(self, index: int):
|
|
...
|
|
|
|
def getvalue(self) -> Callable[[], Any]:
|
|
...
|
|
|
|
expr: str
|
|
conv: Literal["a", "r", "s"] | None
|
|
format_spec: str | None
|
|
|
|
Given this example interpolation:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag'{trade!r:some-formatspec}'
|
|
|
|
these attributes are as follows:
|
|
|
|
* ``getvalue`` is a zero argument closure for the interpolation. In this case, ``lambda: trade``.
|
|
|
|
* ``expr`` is the *expression text* of the interpolation. Example: ``'trade'``.
|
|
|
|
* ``conv`` is the
|
|
`optional conversion <https://docs.python.org/3/library/string.html#format-string-syntax>`_
|
|
to be used by the tag function, one of ``r``, ``s``, and ``a``, corresponding to repr, str,
|
|
and ascii conversions. Note that as with f-strings, no other conversions are supported.
|
|
Example: ``'r'``.
|
|
|
|
* ``format_spec`` is the optional `format_spec string <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_.
|
|
A ``format_spec`` is eagerly evaluated if it contains any expressions before being passed to the tag
|
|
function. Example: ``'some-formatspec'``.
|
|
|
|
In all cases, the tag function determines what to do with valid ``Interpolation``
|
|
attributes.
|
|
|
|
In the CPython reference implementation, implementing ``Interpolation`` in C would
|
|
use the equivalent `Struct Sequence Objects
|
|
<https://docs.python.org/3/c-api/tuple.html#struct-sequence-objects>`_ (see
|
|
such code as `os.stat_result
|
|
<https://docs.python.org/3/library/os.html#os.stat_result>`_). For purposes of this
|
|
PEP, here is an example of a pure Python implementation:
|
|
|
|
.. code-block:: python
|
|
|
|
class InterpolationConcrete(NamedTuple):
|
|
getvalue: Callable[[], Any]
|
|
expr: str
|
|
conv: Literal['a', 'r', 's'] | None = None
|
|
format_spec: str | None = None
|
|
|
|
Interpolation Expression Evaluation
|
|
-----------------------------------
|
|
|
|
Expression evaluation for interpolations is the same as in :pep:`498#expression-evaluation`,
|
|
except that all expressions are always implicitly wrapped with a ``lambda``:
|
|
|
|
The expressions that are extracted from the string are evaluated in the context
|
|
where the tag string appeared. This means the expression has full access to its
|
|
lexical scope, including local and global variables. Any valid Python expression
|
|
can be used, including function and method calls.
|
|
|
|
However, there's one additional nuance to consider, `function scope
|
|
<https://docs.python.org/3/reference/executionmodel.html#resolution-of-names>`_
|
|
versus `annotation scope
|
|
<https://docs.python.org/3/reference/executionmodel.html#annotation-scopes>`_.
|
|
Consider this somewhat contrived example to configure captions:
|
|
|
|
.. code-block:: python
|
|
|
|
class CaptionConfig:
|
|
tag = 'b'
|
|
figure = f'<{tag}>Figure</{tag}>'
|
|
|
|
Let's now attempt to rewrite the above example to use tag strings:
|
|
|
|
.. code-block:: python
|
|
|
|
class CaptionConfig:
|
|
tag = 'b'
|
|
figure = html'<{tag}>Figure</{tag}>'
|
|
|
|
Unfortunately, this rewrite doesn't work if using the usual lambda wrapping to
|
|
implement interpolations, namely ``lambda: tag``. When the interpolations are
|
|
evaluated by the tag function, it will result in ``NameError: name 'tag' is not
|
|
defined``. The root cause of this name error is that ``lambda: tag`` uses function scope,
|
|
and it's therefore not able to use the class definition where ``tag`` is
|
|
defined.
|
|
|
|
Desugaring how the tag string could be evaluated will result in the same
|
|
``NameError`` even using f-strings; the lambda wrapping here also uses function
|
|
scoping:
|
|
|
|
.. code-block:: python
|
|
|
|
class CaptionConfig:
|
|
tag = 'b'
|
|
figure = f'<{(lambda: tag)()}>Figure</{(lambda: tag)()}>'
|
|
|
|
For tag strings, getting such a ``NameError`` would be surprising. It would also
|
|
be a rough edge in using tag strings in this specific case of working with class
|
|
variables. After all, tag strings are supposed to support a superset of the
|
|
capabilities of f-strings.
|
|
|
|
The solution is to use annotation scope for tag string interpolations. While the
|
|
name "annotation scope" suggests it's only about annotations, it solves this
|
|
problem by lexically resolving names in the class definition, such as ``tag``,
|
|
unlike function scope.
|
|
|
|
.. note::
|
|
|
|
The use of annotation scope means it's not possible to fully desugar
|
|
interpolations into Python code. Instead it's as if one is writing
|
|
``interpolation_lambda: tag``, not ``lambda: tag``, where a hypothetical
|
|
``interpolation_lambda`` keyword variant uses annotation scope instead of
|
|
the standard function scope.
|
|
|
|
This is more or less how the reference implementation implements this
|
|
concept (but without creating a new keyword of course).
|
|
|
|
This PEP and its reference implementation therefore use the support for
|
|
annotation scope. Note that this usage is a separable part from the
|
|
implementation of :pep:`649` and :pep:`695` which provides a somewhat similar
|
|
deferred execution model for annotations. Instead it's up to the tag function to
|
|
evaluate any interpolations.
|
|
|
|
With annotation scope in place, lambda-wrapped expressions in interpolations
|
|
then provide the usual lexical scoping seen with f-strings. So there's no need
|
|
to use ``locals()``, ``globals()``, or frame introspection with
|
|
``sys._getframe`` to evaluate the interpolation. In addition, the code of each
|
|
expression is available and does not have to be looked up with
|
|
``inspect.getsource`` or some other means.
|
|
|
|
Format Specification
|
|
--------------------
|
|
|
|
The ``format_spec`` is by default ``None`` if it is not specified in the tag string's
|
|
corresponding interpolation.
|
|
|
|
Because the tag function is completely responsible for processing ``Decoded``
|
|
and ``Interpolation`` values, there is no required interpretation for the format
|
|
spec and conversion in an interpolation. For example, this is a valid usage:
|
|
|
|
.. code-block:: python
|
|
|
|
html'<div id={id:int}>{content:HTML|str}</div>'
|
|
|
|
In this case the ``format_spec`` for the second interpolation is the string
|
|
``'HTML|str'``; it is up to the ``html`` tag to do something with the
|
|
"format spec" here, if anything.
|
|
|
|
f-string-style ``=`` Evaluation
|
|
-------------------------------
|
|
|
|
``mytag'{expr=}'`` is parsed to being the same as ``mytag'expr={expr}``', as
|
|
implemented in the issue `Add = to f-strings for
|
|
easier debugging <https://github.com/python/cpython/issues/80998>`_.
|
|
|
|
Tag Function Arguments
|
|
----------------------
|
|
|
|
The tag function has the following signature:
|
|
|
|
.. code-block:: python
|
|
|
|
def mytag(*args: Decoded | Interpolation) -> Any:
|
|
...
|
|
|
|
This corresponds to the following protocol:
|
|
|
|
.. code-block:: python
|
|
|
|
class TagFunction(Protocol):
|
|
def __call__(self, *args: Decoded | Interpolation) -> Any:
|
|
...
|
|
|
|
Because of subclassing, the signature for ``mytag`` can of course be widened to
|
|
the following, at the cost of losing some type specificity:
|
|
|
|
.. code-block:: python
|
|
|
|
def mytag(*args: str | tuple) -> Any:
|
|
...
|
|
|
|
A user might write a tag string as follows:
|
|
|
|
.. code-block:: python
|
|
|
|
def tag(*args):
|
|
return args
|
|
|
|
tag"\N{{GRINNING FACE}}"
|
|
|
|
Tag strings will represent this as exactly one ``Decoded`` argument. In this case, ``Decoded.raw`` would be
|
|
``'\\N{GRINNING FACE}'``. The "cooked" representation via encode and decode would be:
|
|
|
|
.. code-block:: python
|
|
|
|
'\\N{GRINNING FACE}'.encode('utf-8').decode('unicode-escape')
|
|
'😀'
|
|
|
|
Named unicode characters immediately followed by more text will still produce
|
|
just one ``Decoded`` argument:
|
|
|
|
.. code-block:: python
|
|
|
|
def tag(*args):
|
|
return args
|
|
|
|
assert tag"\N{{GRINNING FACE}}sometext" == (DecodedConcrete("😀sometext"),)
|
|
|
|
|
|
Return Value
|
|
------------
|
|
|
|
Tag functions can return any type. Often they will return a string, but
|
|
richer systems can be built by returning richer objects. See below for
|
|
a motivating example.
|
|
|
|
Function Application
|
|
--------------------
|
|
|
|
Tag strings desugar as follows:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag'Hi, {name!s:format_spec}!'
|
|
|
|
This is equivalent to:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag(DecodedConcrete(r'Hi, '), InterpolationConcrete(lambda: name, 'name',
|
|
's', 'format_spec'), DecodedConcrete(r'!'))
|
|
|
|
.. note::
|
|
|
|
To keep it simple, this and subsequent desugaring omits an important scoping
|
|
aspect in how names in interpolation expressions are resolved, specifically
|
|
when defining classes. See `Interpolation Expression Evaluation`_.
|
|
|
|
No Empty Decoded String
|
|
-----------------------
|
|
|
|
Alternation between decodeds and interpolations is commonly seen, but it depends
|
|
on the tag string. Decoded strings will never have a value that is the empty string:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag'{a}{b}{c}'
|
|
|
|
...which results in this desugaring:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag(InterpolationConcrete(lambda: a, 'a', None, None), InterpolationConcrete(lambda: b, 'b', None, None), InterpolationConcrete(lambda: c, 'c', None, None))
|
|
|
|
Likewise:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag''
|
|
|
|
...results in this desugaring:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag()
|
|
|
|
HTML Example of Rich Return Types
|
|
=================================
|
|
|
|
Tag functions can be a powerful part of larger processing chains by returning richer objects.
|
|
JavaScript tagged template literals, for example, are not constrained by a requirement to
|
|
return a string. As an example, let's look at an HTML generation system, with a usage and
|
|
"subcomponent":
|
|
|
|
.. code-block::
|
|
|
|
def Menu(*, logo: str, class_: str) -> HTML:
|
|
return html'<img alt="Site Logo" src={logo} class={class_} />'
|
|
|
|
icon = 'acme.png'
|
|
result = html'<header><{Menu} logo={icon} class="my-menu"/></header>'
|
|
img = result.children[0]
|
|
assert img.tag == "img"
|
|
assert img.attrs == {"src": "acme.png", "class": "my-menu", "alt": "Site Logo"}
|
|
# We can also treat the return type as a string of specially-serialized HTML
|
|
assert str(result) = '<header>' # etc.
|
|
|
|
This ``html`` tag function might have the following signature:
|
|
|
|
.. code-block:: python
|
|
|
|
def html(*args: Decoded | Interpolation) -> HTML:
|
|
...
|
|
|
|
The ``HTML`` return class might have the following shape as a ``Protocol``:
|
|
|
|
.. code-block:: python
|
|
|
|
@runtime_checkable
|
|
class HTML(Protocol):
|
|
tag: str
|
|
attrs: dict[str, Any]
|
|
children: Sequence[str | HTML]
|
|
|
|
In summary, the returned instance can be used as:
|
|
|
|
- A string, for serializing to the final output
|
|
- An iterable, for working with WSGI/ASGI for output streamed and evaluated
|
|
interpolations *in the order* they are written out
|
|
- A DOM (data) structure of nested Python data
|
|
|
|
In each case, the result can be lazily and recursively composed in a safe fashion, because
|
|
the return value isn't required to be a string. Recommended practice is that
|
|
return values are "passive" objects.
|
|
|
|
What benefits might come from returning rich objects instead of strings? A DSL for
|
|
a domain such as HTML templating can provide a toolchain of post-processing, as
|
|
`Babel <https://babeljs.io>`_ does for JavaScript
|
|
`with AST-based transformation plugins <https://babeljs.io/docs/#pluggable>`_.
|
|
Similarly, systems that provide middleware processing can operate on richer,
|
|
standard objects with more capabilities. Tag string results can be tested as
|
|
nested Python objects, rather than string manipulation. Finally, the intermediate
|
|
results can be cached/persisted in useful ways.
|
|
|
|
Tool Support
|
|
============
|
|
|
|
Python Semantics in Tag Strings
|
|
-------------------------------
|
|
|
|
Python template languages and other DSLs have semantics quite apart from Python.
|
|
Different scope rules, different calling semantics e.g. for macros, their own
|
|
grammar for loops, and the like.
|
|
|
|
This means all tools need to write special support for each language. Even then,
|
|
it is usually difficult to find all the possible scopes, for example to autocomplete
|
|
values.
|
|
|
|
However, f-strings do not have this issue. An f-string is considered part of Python.
|
|
Expressions in curly braces behave as expected and values should resolve based on
|
|
regular scoping rules. Tools such as mypy can see inside f-string expressions,
|
|
but will likely never look inside a Jinja2 template.
|
|
|
|
DSLs written with tag strings will inherit much of this value. While we can't expect
|
|
standard tooling to understand the "domain" in the DSL, they can still inspect
|
|
anything expressible in an f-string.
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
Like f-strings, use of tag strings will be a syntactic backwards incompatibility
|
|
with previous versions.
|
|
|
|
Security Implications
|
|
=====================
|
|
|
|
The security implications of working with interpolations, with respect to
|
|
interpolations, are as follows:
|
|
|
|
1. Scope lookup is the same as f-strings (lexical scope). This model has been
|
|
shown to work well in practice.
|
|
|
|
2. Tag functions can ensure that any interpolations are done in a safe fashion,
|
|
including respecting the context in the target DSL.
|
|
|
|
How To Teach This
|
|
=================
|
|
|
|
Tag strings have several audiences: consumers of tag functions, authors of tag
|
|
functions, and framework authors who provide interesting machinery for tag
|
|
functions.
|
|
|
|
All three groups can start from an important framing:
|
|
|
|
- Existing solutions (such as template engines) can do parts of tag strings
|
|
- But tag strings move logic closer to "normal Python"
|
|
|
|
Consumers can look at tag strings as starting from f-strings:
|
|
|
|
- They look familiar
|
|
- Scoping and syntax rules are the same
|
|
|
|
They first thing they need to absorb: unlike f-strings, the string isn't
|
|
immediately evaluated "in-place". Something else (the tag function) happens.
|
|
That's the second thing to teach: the tag functions do something particular.
|
|
Thus the concept of "domain specific languages" (DSLs). What's extra to
|
|
teach: you need to import the tag function before tagging a string.
|
|
|
|
Tag function authors think in terms of making a DSL. They have
|
|
business policies they want to provide in a Python-familiar way. With tag
|
|
functions, Python is going to do much of the pre-processing. This lowers
|
|
the bar for making a DSL.
|
|
|
|
Tag authors can begin with simple use cases. After authors gain experience, tag strings can be used to add larger
|
|
patterns: lazy evaluation, intermediate representations, registries, and more.
|
|
|
|
Each of these points also match the teaching of decorators. In that case,
|
|
a learner consumes something which applies to the code just after it. They
|
|
don't need to know too much about decorator theory to take advantage of the
|
|
utility.
|
|
|
|
Common Patterns Seen In Writing Tag Functions
|
|
=============================================
|
|
|
|
Structural Pattern Matching
|
|
---------------------------
|
|
|
|
Iterating over the arguments with structural pattern matching is the expected
|
|
best practice for many tag function implementations:
|
|
|
|
.. code-block:: python
|
|
|
|
def tag(*args: Decoded | Interpolation) -> Any:
|
|
for arg in args:
|
|
match arg:
|
|
case Decoded() as decoded:
|
|
... # handle each decoded string
|
|
case Interpolation() as interpolation:
|
|
... # handle each interpolation
|
|
|
|
Lazy Evaluation
|
|
---------------
|
|
|
|
The example tag functions above each call the interpolation's ``getvalue`` lambda
|
|
immediately. Python developers have frequently wished that f-strings could be
|
|
deferred, or lazily evaluated. It would be straightforward to write a wrapper that,
|
|
for example, defers calling the lambda until an ``__str__`` was invoked.
|
|
|
|
Memoizing
|
|
---------
|
|
|
|
Tag function authors have control of processing the static string parts and
|
|
the dynamic interpolation parts. For higher performance, they can deploy approaches
|
|
for memoizing processing, for example by generating keys.
|
|
|
|
Order of Evaluation
|
|
-------------------
|
|
|
|
Imagine a tag that generates a number of sections in HTML. The tag needs inputs for each
|
|
section. But what if the last input argument takes a while? You can't return the HTML for
|
|
the first section until all the arguments are available.
|
|
|
|
You'd prefer to emit markup as the inputs are available. Some templating tools support
|
|
this approach, as does tag strings.
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
At the time of this PEP's announcement, a fully-working implementation is
|
|
`available <https://github.com/lysnikolaou/cpython/tree/tag-strings-rebased>`_.
|
|
|
|
This implementation is not final, as the PEP discussion will likely provide changes.
|
|
|
|
Rejected Ideas
|
|
==============
|
|
|
|
|
|
Enable Exact Round-Tripping of ``conv`` and ``format_spec``
|
|
-----------------------------------------------------------
|
|
|
|
There are two limitations with respect to exactly round-tripping to the original
|
|
source text.
|
|
|
|
First, the ``format_spec`` can be arbitrarily nested:
|
|
|
|
.. code-block:: python
|
|
|
|
mytag'{x:{a{b{c}}}}'
|
|
|
|
In this PEP and corresponding reference implementation, the format_spec
|
|
is eagerly evaluated to set the ``format_spec`` in the interpolation, thereby losing the
|
|
original expressions.
|
|
|
|
While it would be feasible to preserve round-tripping in every usage, this would
|
|
require an extra flag ``equals`` to support, for example, ``{x=}``, and a
|
|
recursive ``Interpolation`` definition for ``format_spec``. The following is roughly the
|
|
pure Python equivalent of this type, including preserving the sequence
|
|
unpacking (as used in case statements):
|
|
|
|
.. code-block:: python
|
|
|
|
class InterpolationConcrete(NamedTuple):
|
|
getvalue: Callable[[], Any]
|
|
raw: str
|
|
conv: str | None = None
|
|
format_spec: str | None | tuple[Decoded | Interpolation, ...] = None
|
|
equals: bool = False
|
|
|
|
def __len__(self):
|
|
return 4
|
|
|
|
def __iter__(self):
|
|
return iter((self.getvalue, self.raw, self.conv, self.format_spec))
|
|
|
|
However, the additional complexity to support exact round-tripping seems
|
|
unnecessary and is thus rejected.
|
|
|
|
No Implicit String Concatenation
|
|
--------------------------------
|
|
|
|
Implicit tag string concatenation isn't supported, which is `unlike other string literals
|
|
<https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation>`_.
|
|
|
|
The expectation is that triple quoting is sufficient. If implicit string
|
|
concatenation is supported, results from tag evaluations would need to
|
|
support the ``+`` operator with ``__add__`` and ``__radd__``.
|
|
|
|
Because tag strings target embedded DSLs, this complexity introduces other
|
|
issues, such as determining appropriate separators. This seems unnecessarily
|
|
complicated and is thus rejected.
|
|
|
|
Arbitrary Conversion Values
|
|
---------------------------
|
|
|
|
Python allows only ``r``, ``s``, or ``a`` as possible conversion type values.
|
|
Trying to assign a different value results in ``SyntaxError``.
|
|
|
|
In theory, tag functions could choose to handle other conversion types. But this
|
|
PEP adheres closely to :pep:`701`. Any changes to allowed values should be in a
|
|
separate PEP.
|
|
|
|
Acknowledgements
|
|
================
|
|
|
|
Thanks to Ryan Morshead for contributions during development of the ideas leading
|
|
to tag strings. Thanks also to Koudai Aono for infrastructure work on contributing
|
|
materials. Special mention also to Dropbox's `pyxl <https://github.com/dropbox/pyxl>`_
|
|
as tackling similar ideas years ago.
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the CC0-1.0-Universal
|
|
license, whichever is more permissive.
|