PEP 750: Tag Strings For Writing Domain-Specific Languages (#3858)
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> Co-authored-by: Paul Everitt <pauleveritt@me.com> Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
parent
a9242b65d4
commit
b1d591f26f
|
@ -629,6 +629,7 @@ peps/pep-0747.rst @JelleZijlstra
|
|||
# ...
|
||||
peps/pep-0749.rst @JelleZijlstra
|
||||
# ...
|
||||
peps/pep-0750.rst @gvanrossum @lysnikolaou
|
||||
peps/pep-0751.rst @brettcannon
|
||||
# ...
|
||||
# peps/pep-0754.rst
|
||||
|
|
|
@ -0,0 +1,891 @@
|
|||
PEP: 750
|
||||
Title: Tag Strings For Writing Domain-Specific Languages
|
||||
Author: Jim Baker <jim.baker@python.org>, Guido van Rossum <guido@python.org>, Paul Everitt <pauleveritt@me.com>
|
||||
Sponsor: Lysandros Nikolaou <lisandrosnik@gmail.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 08-Jul-2024
|
||||
Python-Version: 3.14
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP introduces tag strings for custom, repeatable string processing. Tag strings
|
||||
are an extension to f-strings, with a custom function -- the "tag" -- in place of the
|
||||
``f`` prefix. This function can then provide rich features such as safety checks, lazy
|
||||
evaluation, domain-specific languages (DSLs) for web templating, and more.
|
||||
|
||||
Tag strings are similar to `JavaScript tagged template literals <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates>`_
|
||||
and related ideas in other languages. The following tag string usage shows how similar it is to an ``f`` string, albeit
|
||||
with the ability to process the literal string and embedded values:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
name = "World"
|
||||
greeting = greet"hello {name}"
|
||||
assert greeting == "Hello WORLD!"
|
||||
|
||||
|
||||
Tag functions accept prepared arguments and return a string:
|
||||
|
||||
.. code-block:: python

    def greet(*args):
        """Tag function to return a greeting with an upper-case recipient."""
        salutation, recipient, *_ = args
        # An interpolation is tuple-like; its first item is the value getter
        getvalue, *_ = recipient
        return f"{salutation.title().strip()} {getvalue().upper()}!"
|
||||
|
||||
Below you can find richer examples. As a note, an implementation based on CPython 3.12
|
||||
exists, as discussed in this document.
|
||||
|
||||
Relationship With Other PEPs
|
||||
============================
|
||||
|
||||
Python introduced f-strings in Python 3.6 with :pep:`498`. The grammar was
then formalized in :pep:`701`, which also lifted some restrictions. This PEP
builds on PEP 701.
|
||||
|
||||
At nearly the same time PEP 498 arrived, :pep:`501` was written to provide
|
||||
"i-strings" -- that is, "interpolation template strings". The PEP was
|
||||
deferred pending further experience with f-strings. Work on PEP 501 was
resumed by a different author in March 2023, introducing "t-strings" as template
literal strings built atop PEP 701.
|
||||
|
||||
The authors of this PEP consider tag strings as a generalization of the
|
||||
updated work in PEP 501.
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Python f-strings became very popular, very fast. The syntax was simple, convenient, and
|
||||
interpolated expressions had access to regular scoping rules. However, f-strings have
|
||||
two main limitations: expressions are eagerly evaluated, and interpolated values
|
||||
cannot be intercepted. The former means that f-strings cannot be re-used like templates,
|
||||
and the latter means that how values are interpolated cannot be customized.
|
||||
|
||||
Templating in Python is currently achieved using packages like Jinja2 which bring their
|
||||
own templating languages for generating dynamic content. In addition to being one more
|
||||
thing to learn, these languages are not nearly as expressive as Python itself. This
|
||||
means that business logic, which cannot be expressed in the templating language, must be
|
||||
written in Python instead, spreading the logic across different languages and files.
|
||||
|
||||
Likewise, the inability to intercept interpolated values means that they cannot be
|
||||
sanitized or otherwise transformed before being integrated into the final string. Here,
|
||||
the convenience of f-strings could be considered a liability. For example, a user
|
||||
executing a query with `sqlite3 <https://docs.python.org/3/library/sqlite3.html>`__
|
||||
may be tempted to use an f-string to embed values into their SQL expression instead of
|
||||
using the ``?`` placeholder and passing the values as a tuple to avoid an
|
||||
`SQL injection attack <https://en.wikipedia.org/wiki/SQL_injection>`__.
|
||||
|
||||
Tag strings address both these problems by extending the f-string syntax to provide
|
||||
developers access to the string and its interpolated values before they are combined. In
|
||||
doing so, tag strings may be interpreted in many different ways, opening up the
|
||||
possibility for DSLs and other custom string processing.
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
||||
This PEP proposes customizable prefixes for f-strings. An f-string with such a
prefix becomes a "tag string": an f-string with a "tag function". The tag function is
a callable which is given a sequence of arguments for the parsed tokens in
the string.
|
||||
|
||||
Here's a very simple example. Imagine we want a certain kind of string with
|
||||
some custom business policies: uppercase the value and add an exclamation point.
|
||||
|
||||
Let's start with a tag function which simply returns a static greeting:
|
||||
|
||||
.. code-block:: python

    def greet(*args):
        """Give a static greeting."""
        return "Hello!"

    assert greet"Hi" == "Hello!"  # Use the custom "tag" on the string
|
||||
|
||||
As you can see, ``greet`` is just a callable, in the place that the ``f``
|
||||
prefix would go. Let's look at the args:
|
||||
|
||||
.. code-block:: python

    def greet(*args):
        """Uppercase and add exclamation."""
        salutation = args[0].upper()
        return f"{salutation}!"

    greeting = greet"Hello"  # Use the custom "tag" on the string
    assert greeting == "HELLO!"
|
||||
|
||||
The tag function is passed a sequence of arguments. Since our tag string is simply
|
||||
``"Hello"``, the ``args`` sequence only contains a string-like value of ``'Hello'``.
|
||||
|
||||
With this in place, let's introduce an *interpolation*. That is, a place where
|
||||
a value should be inserted:
|
||||
|
||||
.. code-block:: python

    def greet(*args):
        """Handle an interpolation."""
        # The first arg is the string-like value "Hello " with a space
        salutation = args[0].strip()
        # The second arg is an "interpolation"
        interpolation = args[1]
        # Interpolations are tuples, the first item is a lambda
        getvalue = interpolation[0]
        # It gets called in the scope where it was defined, so
        # the interpolation returns "World"
        result = getvalue()
        recipient = result.upper()
        return f"{salutation} {recipient}!"

    name = "World"
    greeting = greet"Hello {name}"
    assert greeting == "Hello WORLD!"
|
||||
|
||||
The f-string interpolation of ``{name}`` leads to the new machinery in tag
|
||||
strings:
|
||||
|
||||
- ``args[0]`` is still the string-like ``'Hello '``, this time with a trailing space
|
||||
- ``args[1]`` is an expression -- the ``{name}`` part
|
||||
- Tag strings represent this part as an *interpolation* object as discussed below
|
||||
|
||||
The ``*args`` list is a sequence of ``Decoded`` and ``Interpolation`` values. A "decoded" object
|
||||
is a string-like object with extra powers, as described below. An "interpolation" object is a
|
||||
tuple-like value representing how Python processed the interpolation into a form useful for your
|
||||
tag function. Both are fully described below in `Specification`_.
|
||||
|
||||
Here is a more generalized version using structural pattern matching and type hints:
|
||||
|
||||
.. code-block:: python

    from typing import Decoded, Interpolation  # Get the new protocols

    def greet(*args: Decoded | Interpolation) -> str:
        """Handle arbitrary args using structural pattern matching."""
        result = []
        for arg in args:
            match arg:
                case Decoded() as decoded:
                    result.append(decoded)
                case Interpolation() as interpolation:
                    value = interpolation.getvalue()
                    result.append(value.upper())

        return f"{''.join(result)}!"

    name = "World"
    greeting = greet"Hello {name} nice to meet you"
    assert greeting == "Hello WORLD nice to meet you!"
|
||||
|
||||
Tag strings extract more than just a callable from the ``Interpolation``. They also
|
||||
provide Python string formatting info, as well as the original text:
|
||||
|
||||
.. code-block:: python

    def greet(*args: Decoded | Interpolation) -> str:
        """Interpolations can have string formatting specs and conversions."""
        result = []
        for arg in args:
            match arg:
                case Decoded() as decoded:
                    result.append(decoded)
                case getvalue, raw, conversion, format_spec:  # Unpack
                    gv = f"gv: {getvalue()}"
                    r = f"r: {raw}"
                    c = f"c: {conversion}"
                    f = f"f: {format_spec}"
                    result.append(", ".join([gv, r, c, f]))

        return f"{''.join(result)}!"

    name = "World"
    assert greet"Hello {name!r:s}" == "Hello gv: World, r: name, c: r, f: s!"
|
||||
|
||||
You can see each of the ``Interpolation`` parts getting extracted:
|
||||
|
||||
- The lambda expression to call and get the value in the scope it was defined
|
||||
- The raw string of the interpolation (``name``)
|
||||
- The Python "conversion" field (``r``)
- Any `format specification <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_
  (``s``)
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
In the rest of this specification, ``mytag`` will be used for an arbitrary tag.
|
||||
For example:
|
||||
|
||||
.. code-block:: python

    def mytag(*args):
        return args

    trade = 'shrubberies'
    mytag'Did you say "{trade}"?'
|
||||
|
||||
Valid Tag Names
|
||||
---------------
|
||||
|
||||
The tag name can be any undotted name that isn't already an existing valid string or
|
||||
bytes prefix, as seen in the `lexical analysis specification
|
||||
<https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>`_.
|
||||
Therefore these prefixes can't be used as a tag:
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
|
||||
: | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
|
||||
|
||||
bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
|
||||
|
||||
Python `restricts certain keywords <https://docs.python.org/3/reference/lexical_analysis.html#keywords>`_ from being
|
||||
used as identifiers. This restriction also applies to tag names. Usage of keywords should
|
||||
trigger a helpful error, as done in recent CPython releases.
|
||||
|
||||
Tags Must Immediately Precede the Quote Mark
|
||||
--------------------------------------------
|
||||
|
||||
As with other string literal prefixes, no whitespace can be between the tag and the
|
||||
quote mark.
|
||||
|
||||
PEP 701
|
||||
-------
|
||||
|
||||
Tag strings support the full syntax of :pep:`701` in that any string literal,
|
||||
with any quote mark, can be nested in the interpolation. This nesting includes
|
||||
of course tag strings.
|
||||
|
||||
Evaluating Tag Strings
|
||||
----------------------
|
||||
|
||||
When the tag string is evaluated, the tag must have a binding, or a ``NameError``
|
||||
is raised; and it must be a callable, or a ``TypeError`` is raised. The callable
|
||||
must accept a sequence of positional arguments. This behavior follows from the
|
||||
de-sugaring of:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
trade = 'shrubberies'
|
||||
mytag'Did you say "{trade}"?'
|
||||
|
||||
to:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag(DecodedConcrete(r'Did you say "'), InterpolationConcrete(lambda: trade, 'trade', None, None), DecodedConcrete(r'"?'))
|
||||
|
||||
.. note::

    ``DecodedConcrete`` and ``InterpolationConcrete`` are just example implementations. If approved,
    tag strings will have concrete types in ``builtins``.
|
||||
|
||||
Decoded Strings
|
||||
---------------
|
||||
|
||||
In the ``mytag'Did you say "{trade}"?'`` example, there are two strings: ``r'Did you say "'``
|
||||
and ``r'"?'``.
|
||||
|
||||
Strings are internally stored as objects with a ``Decoded`` structure, that is, objects
conforming to the ``Decoded`` protocol:
|
||||
|
||||
.. code-block:: python

    @runtime_checkable
    class Decoded(Protocol):
        def __str__(self) -> str:
            ...

        raw: str
|
||||
|
||||
|
||||
These ``Decoded`` objects have access to raw strings. Raw strings are used because tag strings
|
||||
are meant to target a variety of DSLs, such as the shell and regexes. Such DSLs have their
|
||||
own specific treatment of metacharacters, namely the backslash.
|
||||
|
||||
However, often the "cooked" string is what is needed, by decoding the string as
|
||||
if it were a standard Python string. In the proposed implementation, the decoded object's
|
||||
``__new__`` will *store* the raw string and *store and return* the "cooked" string.
|
||||
|
||||
The protocol is marked as ``@runtime_checkable`` to allow structural pattern matching to
|
||||
test against the protocol instead of a type. This can incur a small performance penalty.
|
||||
Since the ``case`` tests are in user-code tag functions, authors can choose to optimize by
|
||||
testing for the implementation type discussed next.
|
||||
|
||||
The ``Decoded`` protocol will be available from ``typing``. In CPython, ``Decoded``
|
||||
will be implemented in C, but for discussion of this PEP, the following is a compatible
|
||||
implementation:
|
||||
|
||||
.. code-block:: python

    class DecodedConcrete(str):
        _raw: str

        def __new__(cls, raw: str):
            decoded = raw.encode("utf-8").decode("unicode-escape")
            if decoded == raw:
                decoded = raw
            chunk = super().__new__(cls, decoded)
            chunk._raw = raw
            return chunk

        @property
        def raw(self):
            return self._raw
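
A short usage sketch may help show the difference between the "cooked" value and the raw text. It repeats the example ``DecodedConcrete`` class above so that it runs on its own; the sample text is arbitrary.

```python
class DecodedConcrete(str):
    """Example implementation from this PEP, repeated for a runnable sketch."""
    _raw: str

    def __new__(cls, raw: str):
        decoded = raw.encode("utf-8").decode("unicode-escape")
        if decoded == raw:
            decoded = raw
        chunk = super().__new__(cls, decoded)
        chunk._raw = raw
        return chunk

    @property
    def raw(self):
        return self._raw


# The raw text contains a backslash-t and a backslash-n, two characters each
chunk = DecodedConcrete(r"name:\tvalue\n")

cooked = str(chunk)       # escape sequences decoded: a real tab and newline
raw = chunk.raw           # the text exactly as written in the source
is_plain_string = isinstance(chunk, str)
```

A DSL that gives backslashes its own meaning would read ``chunk.raw``, while a tag function that wants ordinary Python string semantics can use the object directly as a ``str``.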
|
||||
|
||||
Interpolation
|
||||
-------------
|
||||
|
||||
An ``Interpolation`` is the data structure representing an expression inside the tag
|
||||
string. Interpolations enable a delayed evaluation model, where the interpolation
|
||||
expression is computed, transformed, memoized, or processed in any way.
|
||||
|
||||
In addition, the original text of the interpolation expression is made available to the
|
||||
tag function. This can be useful for debugging or metaprogramming.
|
||||
|
||||
``Interpolation`` is a ``Protocol`` which will be made available from ``typing``. It
|
||||
has the following definition:
|
||||
|
||||
.. code-block:: python

    @runtime_checkable
    class Interpolation(Protocol):
        def __len__(self):
            ...

        def __getitem__(self, index: int):
            ...

        def getvalue(self) -> Callable[[], Any]:
            ...

        expr: str
        conv: Literal["a", "r", "s"] | None
        format_spec: str | None
|
||||
|
||||
Given this example interpolation:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag'{trade!r:some-formatspec}'
|
||||
|
||||
these attributes are as follows:
|
||||
|
||||
* ``getvalue`` is a zero argument closure for the interpolation. In this case, ``lambda: trade``.
|
||||
|
||||
* ``expr`` is the *expression text* of the interpolation. Example: ``'trade'``.
|
||||
|
||||
* ``conv`` is the
|
||||
`optional conversion <https://docs.python.org/3/library/string.html#format-string-syntax>`_
|
||||
to be used by the tag function, one of ``r``, ``s``, and ``a``, corresponding to repr, str,
|
||||
and ascii conversions. Note that as with f-strings, no other conversions are supported.
|
||||
Example: ``'r'``.
|
||||
|
||||
* ``format_spec`` is the optional `format_spec string <https://docs.python.org/3/library/string.html#format-specification-mini-language>`_.
|
||||
A ``format_spec`` is eagerly evaluated if it contains any expressions before being passed to the tag
|
||||
function. Example: ``'some-formatspec'``.
|
||||
|
||||
In all cases, the tag function determines what to do with valid ``Interpolation``
|
||||
attributes.
|
||||
|
||||
In the CPython reference implementation, implementing ``Interpolation`` in C would
|
||||
use the equivalent `Struct Sequence Objects
|
||||
<https://docs.python.org/3/c-api/tuple.html#struct-sequence-objects>`_ (see
|
||||
such code as `os.stat_result
|
||||
<https://docs.python.org/3/library/os.html#os.stat_result>`_). For purposes of this
|
||||
PEP, here is an example of a pure Python implementation:
|
||||
|
||||
.. code-block:: python

    class InterpolationConcrete(NamedTuple):
        getvalue: Callable[[], Any]
        expr: str
        conv: Literal['a', 'r', 's'] | None = None
        format_spec: str | None = None
|
||||
|
||||
Interpolation Expression Evaluation
|
||||
-----------------------------------
|
||||
|
||||
Expression evaluation for interpolations is the same as in :pep:`498#expression-evaluation`,
|
||||
except that all expressions are always implicitly wrapped with a ``lambda``:
|
||||
|
||||
    The expressions that are extracted from the string are evaluated in the context
    where the tag string appeared. This means the expression has full access to its
    lexical scope, including local and global variables. Any valid Python expression
    can be used, including function and method calls.
|
||||
|
||||
However, there's one additional nuance to consider, `function scope
|
||||
<https://docs.python.org/3/reference/executionmodel.html#resolution-of-names>`_
|
||||
versus `annotation scope
|
||||
<https://docs.python.org/3/reference/executionmodel.html#annotation-scopes>`_.
|
||||
Consider this somewhat contrived example to configure captions:
|
||||
|
||||
.. code-block:: python

    class CaptionConfig:
        tag = 'b'
        figure = f'<{tag}>Figure</{tag}>'
|
||||
|
||||
Let's now attempt to rewrite the above example to use tag strings:
|
||||
|
||||
.. code-block:: python

    class CaptionConfig:
        tag = 'b'
        figure = html'<{tag}>Figure</{tag}>'
|
||||
|
||||
Unfortunately, this rewrite doesn't work if using the usual lambda wrapping to
|
||||
implement interpolations, namely ``lambda: tag``. When the interpolations are
|
||||
evaluated by the tag function, it will result in ``NameError: name 'tag' is not
|
||||
defined``. The root cause of this name error is that ``lambda: tag`` uses function scope,
|
||||
and it's therefore not able to use the class definition where ``tag`` is
|
||||
defined.
|
||||
|
||||
Desugaring how the tag string could be evaluated will result in the same
|
||||
``NameError`` even using f-strings; the lambda wrapping here also uses function
|
||||
scoping:
|
||||
|
||||
.. code-block:: python

    class CaptionConfig:
        tag = 'b'
        figure = f'<{(lambda: tag)()}>Figure</{(lambda: tag)()}>'
|
||||
|
||||
For tag strings, getting such a ``NameError`` would be surprising. It would also
|
||||
be a rough edge in using tag strings in this specific case of working with class
|
||||
variables. After all, tag strings are supposed to support a superset of the
|
||||
capabilities of f-strings.
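
The scoping difference can be observed on current Python versions without any new syntax. The sketch below uses hypothetical class names (``NaiveConfig``, ``FStringConfig``) and assumes no module-level name ``tag`` exists; it contrasts a plain lambda in a class body with a regular f-string.

```python
try:
    class NaiveConfig:
        tag = "b"
        # A plain lambda uses function scope, which skips the class namespace
        figure = (lambda: tag)()
except NameError as exc:
    error_message = str(exc)
else:
    error_message = None


class FStringConfig:
    tag = "b"
    # The f-string expression is evaluated directly in the class body
    figure = f"<{tag}>Figure</{tag}>"
```

After running this, ``error_message`` holds the ``NameError`` text raised by the lambda, while ``FStringConfig.figure`` is built without error.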
|
||||
|
||||
The solution is to use annotation scope for tag string interpolations. While the
|
||||
name "annotation scope" suggests it's only about annotations, it solves this
|
||||
problem by lexically resolving names in the class definition, such as ``tag``,
|
||||
unlike function scope.
|
||||
|
||||
.. note::

    The use of annotation scope means it's not possible to fully desugar
    interpolations into Python code. Instead it's as if one is writing
    ``interpolation_lambda: tag``, not ``lambda: tag``, where a hypothetical
    ``interpolation_lambda`` keyword variant uses annotation scope instead of
    the standard function scope.

    This is more or less how the reference implementation implements this
    concept (but without creating a new keyword of course).
|
||||
|
||||
This PEP and its reference implementation therefore use the support for
annotation scope. Note that this usage is separable from the
implementation of :pep:`649` and :pep:`695`, which provide a somewhat similar
deferred execution model for annotations. Instead it's up to the tag function to
evaluate any interpolations.
|
||||
|
||||
With annotation scope in place, lambda-wrapped expressions in interpolations
|
||||
then provide the usual lexical scoping seen with f-strings. So there's no need
|
||||
to use ``locals()``, ``globals()``, or frame introspection with
|
||||
``sys._getframe`` to evaluate the interpolation. In addition, the code of each
|
||||
expression is available and does not have to be looked up with
|
||||
``inspect.getsource`` or some other means.
|
||||
|
||||
Format Specification
|
||||
--------------------
|
||||
|
||||
The ``format_spec`` is by default ``None`` if it is not specified in the tag string's
|
||||
corresponding interpolation.
|
||||
|
||||
Because the tag function is completely responsible for processing ``Decoded``
|
||||
and ``Interpolation`` values, there is no required interpretation for the format
|
||||
spec and conversion in an interpolation. For example, this is a valid usage:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
html'<div id={id:int}>{content:HTML|str}</div>'
|
||||
|
||||
In this case the ``format_spec`` for the second interpolation is the string
|
||||
``'HTML|str'``; it is up to the ``html`` tag to do something with the
|
||||
"format spec" here, if anything.
|
||||
|
||||
f-string-style ``=`` Evaluation
|
||||
-------------------------------
|
||||
|
||||
``mytag'{expr=}'`` is parsed as being the same as ``mytag'expr={expr}'``, as
implemented in the issue `Add = to f-strings for
easier debugging <https://github.com/python/cpython/issues/80998>`_.
|
||||
|
||||
Tag Function Arguments
|
||||
----------------------
|
||||
|
||||
The tag function has the following signature:
|
||||
|
||||
.. code-block:: python

    def mytag(*args: Decoded | Interpolation) -> Any:
        ...
|
||||
|
||||
This corresponds to the following protocol:
|
||||
|
||||
.. code-block:: python

    class TagFunction(Protocol):
        def __call__(self, *args: Decoded | Interpolation) -> Any:
            ...
|
||||
|
||||
Because of subclassing, the signature for ``mytag`` can of course be widened to
|
||||
the following, at the cost of losing some type specificity:
|
||||
|
||||
.. code-block:: python

    def mytag(*args: str | tuple) -> Any:
        ...
|
||||
|
||||
A user might write a tag string as follows:
|
||||
|
||||
.. code-block:: python

    def tag(*args):
        return args

    tag"\N{{GRINNING FACE}}"
|
||||
|
||||
Tag strings will represent this as exactly one ``Decoded`` argument. In this case, ``Decoded.raw`` would be
|
||||
``'\\N{GRINNING FACE}'``. The "cooked" representation via encode and decode would be:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
'\\N{GRINNING FACE}'.encode('utf-8').decode('unicode-escape')
|
||||
'😀'
|
||||
|
||||
Named unicode characters immediately followed by more text will still produce
|
||||
just one ``Decoded`` argument:
|
||||
|
||||
.. code-block:: python

    def tag(*args):
        return args

    assert tag"\N{{GRINNING FACE}}sometext" == (DecodedConcrete("😀sometext"),)
|
||||
|
||||
|
||||
Return Value
|
||||
------------
|
||||
|
||||
Tag functions can return any type. Often they will return a string, but
|
||||
richer systems can be built by returning richer objects. See below for
|
||||
a motivating example.
|
||||
|
||||
Function Application
|
||||
--------------------
|
||||
|
||||
Tag strings desugar as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag'Hi, {name!s:format_spec}!'
|
||||
|
||||
This is equivalent to:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag(DecodedConcrete(r'Hi, '), InterpolationConcrete(lambda: name, 'name',
|
||||
's', 'format_spec'), DecodedConcrete(r'!'))
|
||||
|
||||
.. note::

    To keep it simple, this and subsequent desugaring omits an important scoping
    aspect in how names in interpolation expressions are resolved, specifically
    when defining classes. See `Interpolation Expression Evaluation`_.
|
||||
|
||||
No Empty Decoded String
|
||||
-----------------------
|
||||
|
||||
Alternation between decodeds and interpolations is commonly seen, but it depends
|
||||
on the tag string. Decoded strings will never have a value that is the empty string:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag'{a}{b}{c}'
|
||||
|
||||
...which results in this desugaring:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag(InterpolationConcrete(lambda: a, 'a', None, None), InterpolationConcrete(lambda: b, 'b', None, None), InterpolationConcrete(lambda: c, 'c', None, None))
|
||||
|
||||
Likewise:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag''
|
||||
|
||||
...results in this desugaring:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
mytag()
|
||||
|
||||
HTML Example of Rich Return Types
|
||||
=================================
|
||||
|
||||
Tag functions can be a powerful part of larger processing chains by returning richer objects.
|
||||
JavaScript tagged template literals, for example, are not constrained by a requirement to
|
||||
return a string. As an example, let's look at an HTML generation system, with a usage and
|
||||
"subcomponent":
|
||||
|
||||
.. code-block::

    def Menu(*, logo: str, class_: str) -> HTML:
        return html'<img alt="Site Logo" src={logo} class={class_} />'

    icon = 'acme.png'
    result = html'<header><{Menu} logo={icon} class="my-menu"/></header>'
    img = result.children[0]
    assert img.tag == "img"
    assert img.attrs == {"src": "acme.png", "class": "my-menu", "alt": "Site Logo"}
    # We can also treat the return type as a string of specially-serialized HTML
    assert str(result) == '<header>' # etc.
|
||||
|
||||
This ``html`` tag function might have the following signature:
|
||||
|
||||
.. code-block:: python

    def html(*args: Decoded | Interpolation) -> HTML:
        ...
|
||||
|
||||
The ``HTML`` return class might have the following shape as a ``Protocol``:
|
||||
|
||||
.. code-block:: python

    @runtime_checkable
    class HTML(Protocol):
        tag: str
        attrs: dict[str, Any]
        children: Sequence[str | HTML]
|
||||
|
||||
In summary, the returned instance can be used as:
|
||||
|
||||
- A string, for serializing to the final output
|
||||
- An iterable, for working with WSGI/ASGI, so that output is streamed and
  interpolations are evaluated *in the order* they are written out
|
||||
- A DOM (data) structure of nested Python data
|
||||
|
||||
In each case, the result can be lazily and recursively composed in a safe fashion, because
|
||||
the return value isn't required to be a string. Recommended practice is that
|
||||
return values are "passive" objects.
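
To give a feel for such a "passive" return value, here is a minimal sketch of a concrete type with the same shape as the ``HTML`` protocol above. The ``Element`` dataclass and its serialization rules are assumptions for illustration; a real ``html`` tag function would also need an HTML parser, which is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape
from typing import Any, Sequence


@dataclass
class Element:
    """Hypothetical concrete type with ``tag``, ``attrs`` and ``children``."""
    tag: str
    attrs: dict[str, Any] = field(default_factory=dict)
    children: Sequence[str | Element] = ()

    def __str__(self) -> str:
        # Attribute values and text children are escaped on serialization
        attrs = "".join(f' {k}="{escape(str(v))}"' for k, v in self.attrs.items())
        inner = "".join(
            str(child) if isinstance(child, Element) else escape(child)
            for child in self.children
        )
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


link = Element("a", {"href": "/home", "class": "my-menu"}, ["Home"])
page = Element("header", children=[link, "Tom & Jerry"])

# The result can be inspected as data or serialized as a string
first_child_tag = page.children[0].tag
serialized = str(page)
```

Because the object holds plain data, tests can assert on ``tag``, ``attrs`` and ``children`` directly instead of parsing the serialized output.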
|
||||
|
||||
What benefits might come from returning rich objects instead of strings? A DSL for
|
||||
a domain such as HTML templating can provide a toolchain of post-processing, as
|
||||
`Babel <https://babeljs.io>`_ does for JavaScript
|
||||
`with AST-based transformation plugins <https://babeljs.io/docs/#pluggable>`_.
|
||||
Similarly, systems that provide middleware processing can operate on richer,
|
||||
standard objects with more capabilities. Tag string results can be tested as
|
||||
nested Python objects, rather than string manipulation. Finally, the intermediate
|
||||
results can be cached/persisted in useful ways.
|
||||
|
||||
Tool Support
|
||||
============
|
||||
|
||||
Python Semantics in Tag Strings
|
||||
-------------------------------
|
||||
|
||||
Python template languages and other DSLs have semantics quite apart from Python.
They have different scope rules, different calling semantics (e.g. for macros), their own
grammar for loops, and the like.
|
||||
|
||||
This means all tools need to write special support for each language. Even then,
|
||||
it is usually difficult to find all the possible scopes, for example to autocomplete
|
||||
values.
|
||||
|
||||
However, f-strings do not have this issue. An f-string is considered part of Python.
|
||||
Expressions in curly braces behave as expected and values should resolve based on
|
||||
regular scoping rules. Tools such as mypy can see inside f-string expressions,
|
||||
but will likely never look inside a Jinja2 template.
|
||||
|
||||
DSLs written with tag strings will inherit much of this value. While we can't expect
|
||||
standard tooling to understand the "domain" in the DSL, they can still inspect
|
||||
anything expressible in an f-string.
|
||||
|
||||
Backwards Compatibility
|
||||
=======================
|
||||
|
||||
Like f-strings, use of tag strings will be a syntactic backwards incompatibility
|
||||
with previous versions.
|
||||
|
||||
Security Implications
|
||||
=====================
|
||||
|
||||
The security implications of working with interpolations are as follows:
|
||||
|
||||
1. Scope lookup is the same as f-strings (lexical scope). This model has been
|
||||
shown to work well in practice.
|
||||
|
||||
2. Tag functions can ensure that any interpolations are done in a safe fashion,
|
||||
including respecting the context in the target DSL.
|
||||
|
||||
How To Teach This
|
||||
=================
|
||||
|
||||
Tag strings have several audiences: consumers of tag functions, authors of tag
|
||||
functions, and framework authors who provide interesting machinery for tag
|
||||
functions.
|
||||
|
||||
All three groups can start from an important framing:
|
||||
|
||||
- Existing solutions (such as template engines) can do parts of tag strings
|
||||
- But tag strings move logic closer to "normal Python"
|
||||
|
||||
Consumers can look at tag strings as starting from f-strings:
|
||||
|
||||
- They look familiar
|
||||
- Scoping and syntax rules are the same
|
||||
|
||||
The first thing they need to absorb: unlike f-strings, the string isn't
immediately evaluated "in-place". Something else (the tag function) happens.
That's the second thing to teach: the tag functions do something particular.
Thus the concept of "domain specific languages" (DSLs). What's extra to
teach: you need to import the tag function before tagging a string.
|
||||
|
||||
Tag function authors think in terms of making a DSL. They have
business policies they want to provide in a Python-familiar way. With tag
functions, Python is going to do much of the pre-processing. This lowers
the bar for making a DSL.

Tag authors can begin with simple use cases. After authors gain experience,
tag strings can be used to add larger patterns: lazy evaluation, intermediate
representations, registries, and more.

Each of these points also matches how decorators are taught. In that case,
a learner consumes something which applies to the code just after it. They
don't need to know too much about decorator theory to take advantage of the
utility.

Common Patterns Seen In Writing Tag Functions
=============================================

Structural Pattern Matching
---------------------------

Iterating over the arguments with structural pattern matching is the expected
best practice for many tag function implementations:

.. code-block:: python

    def tag(*args: Decoded | Interpolation) -> Any:
        for arg in args:
            match arg:
                case Decoded() as decoded:
                    ... # handle each decoded string
                case Interpolation() as interpolation:
                    ... # handle each interpolation

Lazy Evaluation
---------------

The example tag functions above each call the interpolation's ``getvalue`` lambda
immediately. Python developers have frequently wished that f-strings could be
deferred, or lazily evaluated. It would be straightforward to write a wrapper that,
for example, defers calling the lambda until ``__str__`` is invoked.

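
The wrapper described above could look roughly like the following sketch.
``LazyString`` and ``lazy`` are hypothetical names, ``Decoded`` and
``Interpolation`` are simplified stand-ins defined only for this example, and the
tag function is called by hand because tag string syntax needs the reference
implementation:

.. code-block:: python

    from typing import Any, Callable, NamedTuple


    class Decoded(str):
        """Stand-in for a static string part (assumption for this sketch)."""


    class Interpolation(NamedTuple):
        """Stand-in for an interpolation; only ``getvalue`` is modeled."""

        getvalue: Callable[[], Any]


    class LazyString:
        """Hypothetical wrapper: stores the arguments, renders on ``str()``."""

        def __init__(self, *args: Decoded | Interpolation) -> None:
            self.args = args

        def __str__(self) -> str:
            # ``getvalue`` is only called here, not when the object is created.
            return "".join(
                str(arg.getvalue()) if isinstance(arg, Interpolation) else str(arg)
                for arg in self.args
            )


    def lazy(*args: Decoded | Interpolation) -> LazyString:
        return LazyString(*args)


    calls = []


    def expensive() -> str:
        calls.append("called")
        return "World"


    # Roughly what lazy"Hello {expensive()}" would pass to the tag function.
    message = lazy(Decoded("Hello "), Interpolation(lambda: expensive()))
    evaluated_before_str = len(calls)  # nothing has been evaluated yet
    rendered = str(message)  # the lambda runs now

Creating ``message`` does not call ``expensive()``; the call happens only when
the result is converted to a string.
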
Memoizing
---------

Tag function authors have control of processing the static string parts and
the dynamic interpolation parts. For higher performance, they can memoize
that processing, for example by generating cache keys.

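
One possible way to generate such keys, shown here as a sketch under stated
assumptions, is to build the key from the static parts only and cache the
result of processing them with ``functools.lru_cache``. The names ``cached`` and
``compile_template`` are hypothetical, ``Decoded`` and ``Interpolation`` are
simplified stand-ins, and the "expensive" step is just building a format
string:

.. code-block:: python

    from functools import lru_cache
    from typing import Any, Callable, NamedTuple


    class Decoded(str):
        """Stand-in for a static string part (assumption for this sketch)."""


    class Interpolation(NamedTuple):
        """Stand-in for an interpolation; only ``getvalue`` is modeled."""

        getvalue: Callable[[], Any]


    parse_count = 0


    @lru_cache(maxsize=None)
    def compile_template(static_parts: tuple[str | None, ...]) -> str:
        """Hypothetical expensive step: turn static parts into a format string."""
        global parse_count
        parse_count += 1
        # None marks the position of an interpolation.
        return "".join(
            "{}" if part is None else part.replace("{", "{{").replace("}", "}}")
            for part in static_parts
        )


    def cached(*args: Decoded | Interpolation) -> str:
        # The key uses the static parts only; interpolation values change
        # between calls and must not be part of the key.
        key = tuple(
            None if isinstance(arg, Interpolation) else str(arg) for arg in args
        )
        template = compile_template(key)
        values = [arg.getvalue() for arg in args if isinstance(arg, Interpolation)]
        return template.format(*values)


    first = cached(Decoded("Hello "), Interpolation(lambda: "World"), Decoded("!"))
    second = cached(Decoded("Hello "), Interpolation(lambda: "Python"), Decoded("!"))

Both calls share the same static parts, so ``compile_template`` runs once and
only the interpolated values are recomputed.
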
Order of Evaluation
-------------------

Imagine a tag that generates a number of sections in HTML. The tag needs inputs for each
section. But what if the last input argument takes a while? You can't return the HTML for
the first section until all the arguments are available.

You'd prefer to emit markup as the inputs are available. Some templating tools support
this approach, as do tag strings.

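
A generator is one way to sketch this behavior. The example below assumes that
each interpolation is only evaluated when its ``getvalue`` is called; ``stream``,
``fast``, and ``slow`` are hypothetical names, and ``Decoded`` and
``Interpolation`` are simplified stand-ins defined for this example:

.. code-block:: python

    from typing import Any, Callable, Iterator, NamedTuple


    class Decoded(str):
        """Stand-in for a static string part (assumption for this sketch)."""


    class Interpolation(NamedTuple):
        """Stand-in for an interpolation; only ``getvalue`` is modeled."""

        getvalue: Callable[[], Any]


    def stream(*args: Decoded | Interpolation) -> Iterator[str]:
        for arg in args:
            if isinstance(arg, Interpolation):
                # The value is computed only when the consumer asks for it.
                yield str(arg.getvalue())
            else:
                yield str(arg)


    events = []


    def fast() -> str:
        events.append("fast evaluated")
        return "intro"


    def slow() -> str:
        events.append("slow evaluated")
        return "details"


    # Roughly what stream"<h1>{fast()}</h1><p>{slow()}</p>" would pass in.
    chunks = stream(
        Decoded("<h1>"),
        Interpolation(fast),
        Decoded("</h1><p>"),
        Interpolation(slow),
        Decoded("</p>"),
    )
    first_two = [next(chunks), next(chunks)]
    events_after_first_section = list(events)
    rest = list(chunks)

The first section's markup is available before ``slow()`` has been called; the
slow input is evaluated only when the consumer reaches it.
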
Reference Implementation
========================

At the time of this PEP's announcement, a fully-working implementation is
`available <https://github.com/lysnikolaou/cpython/tree/tag-strings-rebased>`_.

This implementation is not final, as the PEP discussion will likely lead to changes.

Rejected Ideas
==============

Enable Exact Round-Tripping of ``conv`` and ``format_spec``
-----------------------------------------------------------

There are two limitations with respect to exactly round-tripping to the original
source text.

First, the ``format_spec`` can be arbitrarily nested:

.. code-block:: python

    mytag'{x:{a{b{c}}}}'

In this PEP and the corresponding reference implementation, the ``format_spec``
is eagerly evaluated to set the ``format_spec`` in the interpolation, thereby losing the
original expressions.

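
The effect of eager evaluation can be seen with an ordinary f-string, which
evaluates a nested format specification in a comparable way. The names ``x`` and
``width`` are illustrative; the assumption, following the paragraph above, is
that a tag function would likewise see only the resulting specification string
(here ``'>8.2f'``) and not the expression ``width``:

.. code-block:: python

    x = 3.14159
    width = 8

    # The nested expression {width} is evaluated first, yielding '>8.2f'.
    formatted = f"{x:>{width}.2f}"
    # Same result with the already-evaluated spec; 'width' is no longer visible.
    same = format(x, ">8.2f")
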
While it would be feasible to preserve round-tripping in every usage, this would
require an extra flag ``equals`` to support, for example, ``{x=}``, and a
recursive ``Interpolation`` definition for ``format_spec``. The following is roughly the
pure Python equivalent of this type, including preserving the sequence
unpacking (as used in case statements):

.. code-block:: python

    class InterpolationConcrete(NamedTuple):
        getvalue: Callable[[], Any]
        raw: str
        conv: str | None = None
        format_spec: str | None | tuple[Decoded | Interpolation, ...] = None
        equals: bool = False

        def __len__(self):
            return 4

        def __iter__(self):
            return iter((self.getvalue, self.raw, self.conv, self.format_spec))

However, the additional complexity to support exact round-tripping seems
unnecessary and is thus rejected.

No Implicit String Concatenation
--------------------------------

Implicit tag string concatenation isn't supported, which is `unlike other string literals
<https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation>`_.

The expectation is that triple quoting is sufficient. If implicit string
concatenation were supported, results from tag evaluations would need to
support the ``+`` operator with ``__add__`` and ``__radd__``.

Because tag strings target embedded DSLs, this complexity introduces other
issues, such as determining appropriate separators. This seems unnecessarily
complicated and is thus rejected.

Arbitrary Conversion Values
---------------------------

Python allows only ``r``, ``s``, or ``a`` as possible conversion type values.
Using any other value results in a ``SyntaxError``.

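
This can be checked with today's f-strings, whose conversion rules tag strings
follow. The helper ``is_valid_conversion`` is a hypothetical name used only for
this sketch; it compiles a small f-string expression without running it:

.. code-block:: python

    def is_valid_conversion(char: str) -> bool:
        """Report whether ``!<char>`` compiles as an f-string conversion."""
        try:
            # Builds source such as f"{x!r}" and compiles it without evaluating.
            compile(f'f"{{x!{char}}}"', "<example>", "eval")
        except SyntaxError:
            return False
        return True


    results = {char: is_valid_conversion(char) for char in "rsaz"}
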
In theory, tag functions could choose to handle other conversion types. But this
PEP adheres closely to :pep:`701`. Any changes to allowed values should be in a
separate PEP.

Acknowledgements
================

Thanks to Ryan Morshead for contributions during development of the ideas leading
to tag strings. Thanks also to Koudai Aono for infrastructure work on contributing
materials. Special mention also to Dropbox's `pyxl <https://github.com/dropbox/pyxl>`_
for tackling similar ideas years ago.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.