PEP: 501
Title: General purpose template literal strings
Author: Alyssa Coghlan, Nick Humrich
Discussions-To: https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 701
Created: 08-Aug-2015
Python-Version: 3.12
Post-History: 08-Aug-2015, 05-Sep-2015, 09-Mar-2023

.. TODO: Start new PEP 501 d.p.o thread once these updates have been merged

Abstract
========

Though easy and elegant to use, Python :term:`f-string`\s
can be vulnerable to injection attacks when used to construct
shell commands, SQL queries, HTML snippets and similar
(for example, ``os.system(f"echo {message_from_user}")``).
This PEP introduces template literal strings (or "t-strings"),
which have syntax and semantics that are similar to f-strings,
but with rendering deferred until :func:`format` or another
template rendering function is called on them.
This will allow standard library calls, helper functions
and third party tools to safely and intelligently perform
appropriate escaping and other string processing on inputs
while retaining the usability and convenience of f-strings.

Relationship with other PEPs
============================

This PEP is inspired by and builds on top of the f-string syntax first implemented
in :pep:`498` and formalised in :pep:`701`.

This PEP complements the literal string typing support added to Python's formal type
system in :pep:`675` by introducing a *safe* way to do dynamic interpolation of runtime
values into security sensitive strings.

This PEP competes with some aspects of the tagged string proposal in :pep:`750`
(most notably in whether template rendering is expressed as ``render(t"template literal")``
or as ``render"template literal"``), but also shares *many* common features (after
:pep:`750` was published, this PEP was updated with several new changes
inspired by the tagged strings proposal).

This PEP does NOT propose an alternative to :pep:`292` for user interface
internationalization use cases (but does note the potential for future syntactic
enhancements aimed at that use case that would benefit from the compiler-supported
value interpolation machinery that this PEP and :pep:`750` introduce).

Motivation
==========

:pep:`498` added new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicit name references. These are referred
to in the PEP (and elsewhere) as "f-strings" (a mnemonic for "formatted strings").

Since acceptance of :pep:`498`, f-strings have become well-established and very popular.
f-strings became even more useful and flexible with the formalised grammar in :pep:`701`.
While f-strings are great, eager rendering has its limitations. For example, the
eagerness of f-strings has made code like the following unfortunately plausible:

.. code-block:: python

    os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant problem
if the interpolated value ``message_from_user`` is in fact provided by an
untrusted user: it's an opening for a form of code injection attack, where
the supplied user data has not been properly escaped before being passed to
the ``os.system`` call.

While the ``LiteralString`` type annotation introduced in :pep:`675` means that typecheckers
are able to report a type error for this kind of unsafe function usage, those errors don't
help make it easier to write code that uses safer alternatives (such as
:func:`subprocess.run`).

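As a point of comparison, the following minimal sketch shows what the safer
alternatives look like with today's standard library. The sample value of
``message_from_user`` and the child process invocation are illustrative
assumptions rather than part of this proposal:

```python
import shlex
import subprocess
import sys

# Hypothetical untrusted input containing a shell command separator
message_from_user = "hello; echo injected"

# Unsafe: eager f-string rendering lets ";" end the echo command early
unsafe_command = f"echo {message_from_user}"

# Safer: quote the untrusted value before building a shell command line
quoted_command = f"echo {shlex.quote(message_from_user)}"

# Safer still: bypass the shell and pass the arguments as a list
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", message_from_user],
    capture_output=True,
    text=True,
    check=True,
)
```

Both safer forms are noticeably more verbose than the unsafe f-string, which
is the usability gap described above.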
To address that problem (and a number of other concerns), this PEP proposes
the complementary introduction of "t-strings" (a mnemonic for "template literal strings"),
where ``format(t"Message with {data}")`` would produce the same result as
``f"Message with {data}"``, but the template literal instance can instead be passed
to other template rendering functions which process the contents of the template
differently.

Proposal
========

Dedicated template literal syntax
---------------------------------

This PEP proposes a new string prefix that declares the
string to be a template literal rather than an ordinary string:

.. code-block:: python

    template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"

This would be effectively interpreted as:

.. code-block:: python

    template = TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", names, f">{field_width}", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", expressions(), f"", "r"),
    )

(Note: this is an illustrative example implementation. The exact compile time construction
syntax of ``types.TemplateLiteral`` is considered an implementation detail not specified by
the PEP. In particular, the compiler may bypass the default constructor's runtime logic that
detects consecutive text segments and merges them into a single text segment, as well as
checking the runtime types of all supplied arguments).

The ``__format__`` method on ``types.TemplateLiteral`` would then
implement the following :meth:`str.format` inspired semantics:

.. code-block:: python-console

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> format(t'She said her name is {name!r}.')
    "She said her name is 'Jane'."

The syntax of template literals would be based on :pep:`701`, and largely use the same
syntax for the string portion of the template. Aside from using a different prefix, the one
other syntactic change is in the definition and handling of conversion specifiers, both to
allow ``!()`` as a standard conversion specifier to request evaluation of a field at
rendering time, and to allow custom renderers to also define custom conversion specifiers.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.

Lazy field evaluation conversion specifier
------------------------------------------

In addition to the existing support for the ``a``, ``r``, and ``s`` conversion specifiers,
:meth:`str.format`, :meth:`str.format_map`, and :class:`string.Formatter` will be updated
to accept ``()`` as a conversion specifier that means "call the interpolated value".

To support application of the standard conversion specifiers in custom template rendering
functions, a new :func:`!operator.convert_field` function will be added.

The signature and behaviour of the :func:`format` builtin will also be updated to accept a
conversion specifier as a third optional parameter. If a non-empty conversion specifier
is given, the value will be converted with :func:`!operator.convert_field` before looking up
the ``__format__`` method.

Custom conversion specifiers
----------------------------

To allow additional field-specific directives to be passed to custom rendering functions in
a way that still allows formatting of the template with the default renderer, the conversion
specifier field will be allowed to contain a second ``!`` character.

:func:`!operator.convert_field` and :func:`format` (and hence the default
``TemplateLiteral.render`` template rendering method), will ignore that character and any
subsequent text in the conversion specifier field.

:meth:`str.format`, :meth:`str.format_map`, and :class:`string.Formatter` will also be
updated to accept (and ignore) custom conversion specifiers.

Template renderer for POSIX shell commands
------------------------------------------

As both a practical demonstration of the benefits of delayed rendering support, and as
a valuable feature in its own right, a new ``sh`` template renderer will be added to
the :mod:`shlex` module. This renderer will produce strings where all interpolated fields
are escaped with :func:`shlex.quote`.

The :class:`subprocess.Popen` API (and higher level APIs that depend on it, such as
:func:`subprocess.run`) will be updated to accept interpolation templates and handle
them in accordance with the new ``shlex.sh`` renderer.

Background
==========

This PEP was initially proposed as a competitor to :pep:`498`. After it became clear that
the eager rendering proposal had substantially more immediate support, it then spent several
years in a deferred state, pending further experience with :pep:`498`'s simpler approach of
only supporting eager rendering without the additional complexity of also supporting deferred
rendering.

Since then, f-strings have become very popular and :pep:`701` was introduced to tidy up some
rough edges and limitations in their syntax and semantics. The template literal proposal
was updated in 2023 to reflect current knowledge of f-strings, and improvements from
:pep:`701`.

In 2024, :pep:`750` was published, proposing a general purpose mechanism for custom tagged
string prefixes, rather than the narrower template literal proposal in this PEP.

This PEP was again updated, both to incorporate new ideas inspired by the tagged strings
proposal, and to describe the perceived benefits of the narrower template literal syntax
proposal in this PEP over the more general tagged string proposal.

Summary of differences from f-strings
-------------------------------------

The key differences between f-strings and t-strings are:

* the ``t`` (template literal) prefix indicates delayed rendering, but
  otherwise largely uses the same syntax and semantics as formatted strings
* template literals are available at runtime as a new kind of object
  (``types.TemplateLiteral``)
* the default rendering used by formatted strings is invoked on a
  template literal object by calling ``format(template)`` rather than
  being done implicitly in the compiled code
* unlike f-strings (where conversion specifiers are handled directly in the compiler),
  t-string conversion specifiers are handled at rendering time by the rendering function
* the new ``!()`` conversion specifier indicates that the field expression is a callable
  that should be called when using the default :func:`format` rendering function. This
  specifier is specifically *not* being added to f-strings (since it is pointless there).
* a second ``!`` is allowed in t-string conversion specifiers (with any subsequent text
  being ignored) as a way to allow custom template rendering functions to accept custom
  conversion specifiers without breaking the default :func:`!TemplateLiteral.render`
  rendering method. This feature is specifically *not* being added to f-strings (since
  it is pointless there).
* while f-string ``f"Message {here}"`` would be *semantically* equivalent to
  ``format(t"Message {here}")``, f-strings will continue to be supported directly in the
  compiler and hence avoid the runtime overhead of actually using the delayed rendering
  machinery that is needed for t-strings

Summary of differences from tagged strings
------------------------------------------

When tagged strings were first proposed, there were several notable differences from the
proposal in PEP 501 beyond the surface syntax difference between whether rendering function
invocations are written as ``render(t"template literal")`` or as ``render"template literal"``.

Over the course of the initial PEP 750 discussion, many of those differences were eliminated,
either by PEP 501 adopting that aspect of PEP 750's proposal (such as lazily applying
conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501's proposal
(such as defining a dedicated type to hold template segments rather than representing them
as simple sequences).

The main remaining significant difference is that this PEP argues that adding *only* the
t-string prefix is a sufficient enhancement to give all the desired benefits described in
PEP 750. The expansion to a generalised "tagged string" syntax isn't necessary, and causes
additional problems that can be avoided.

The two PEPs also differ in their proposed approaches to handling lazy evaluation of template
fields.

While there *are* other differences between the two proposals, those differences are more
cosmetic than substantive.
In particular:

* this PEP proposes different names for the structural typing protocols
* this PEP proposes specific names for the concrete implementation types
* this PEP proposes exact details for the proposed APIs of the concrete implementation types
  (including concatenation and repetition support, which are not part of the structural
  typing protocols)
* this PEP proposes changes to the existing :func:`format` builtin to make it usable
  directly as a template field renderer

The two PEPs also differ in *how* they make their case for delayed rendering support. This
PEP focuses more on the concrete implementation concept of using template literals to allow
the "interpolation" and "rendering" steps in f-string processing to be separated in time,
and then taking advantage of that to reduce the potential code injection risks associated
with misuse of f-strings. PEP 750 focuses more on the way that native templating support
allows behaviours that are difficult or impossible to achieve via existing string based
templating methods. As with the cosmetic differences noted above, this is more a difference
in style than a difference in substance.

Rationale
=========

f-strings (:pep:`498`) made interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but they do so at the cost of creating a
situation where interpolating values into sensitive targets like SQL queries,
shell commands and HTML templates will enjoy a much cleaner syntax when handled
without regard for code injection attacks than when they are handled correctly.

This PEP proposes to provide the option of delaying the actual rendering
of a template literal to a formatted string to its ``__format__`` method, allowing the use
of other template renderers by passing the template around as a first class object.

While very different in the technical details, the
``types.TemplateLiteral`` interface proposed in this PEP is
conceptually quite similar to the ``FormattableString`` type underlying the
native interpolation support introduced in C# 6.0, as well as the
JavaScript template literals introduced in ES6.

While not the original motivation for developing the proposal, many of the benefits for
defining domain specific languages described in :pep:`750` also apply to this PEP
(including the potential for per-DSL semantic highlighting in code editors based on the
type specifications of declared template variables and rendering function parameters).

Specification
=============

This PEP proposes a new ``t`` string prefix that
results in the creation of an instance of a new type,
``types.TemplateLiteral``.

Template literals are Unicode strings (bytes literals are not
permitted), and string literal concatenation operates as normal, with the
entire combined literal forming the template literal.

The template string is parsed into literals, expressions, format specifiers, and conversion
specifiers as described for f-strings in :pep:`498` and :pep:`701`. The syntax for conversion
specifiers is relaxed such that arbitrary strings are accepted (excluding those containing
``{``, ``}`` or ``:``) rather than being restricted to valid Python identifiers.

However, rather than being rendered directly into a formatted string, these
components are instead organised into instances of new types with the
following behaviour:

.. code-block:: python

    from __future__ import annotations

    import operator
    from typing import Any, Iterable, NamedTuple, Sequence

    class TemplateLiteralText(str):
        # This is a renamed and extended version of the DecodedConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw: str

        def __new__(cls, raw: str):
            decoded = raw.encode("utf-8").decode("unicode-escape")
            if decoded == raw:
                # No escapes were processed, so share the original string object
                decoded = raw
            text = super().__new__(cls, decoded)
            text._raw = raw
            return text

        @staticmethod
        def merge(text_segments:Sequence[TemplateLiteralText]) -> TemplateLiteralText:
            if len(text_segments) == 1:
                return text_segments[0]
            return TemplateLiteralText("".join(t._raw for t in text_segments))

        @property
        def raw(self) -> str:
            return self._raw

        def __repr__(self) -> str:
            return f"{type(self).__name__}(r{self._raw!r})"

        def __add__(self, other:Any) -> TemplateLiteralText|NotImplemented:
            if isinstance(other, TemplateLiteralText):
                return TemplateLiteralText(self._raw + other._raw)
            return NotImplemented

        def __mul__(self, other:Any) -> TemplateLiteralText|NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            return TemplateLiteralText(self._raw * factor)
        __rmul__ = __mul__

    class TemplateLiteralField(NamedTuple):
        # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
        # However:
        #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
        #    - conversion specifiers are allowed to be arbitrary strings
        #    - order of fields is adjusted so the text form is the first field and the
        #      remaining parameters match the updated signature of the `format` builtin
        # Real type would be implemented in C, this is an API compatible Python equivalent

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

        def __repr__(self) -> str:
            return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                    f"{self.format_spec!r}, {self.conversion_spec!r})")

        def __str__(self) -> str:
            # Relies on the three-argument format() builtin proposed in this PEP
            return format(self.value, self.format_spec, self.conversion_spec)

        def __format__(self, format_override) -> str:
            if format_override:
                format_spec = format_override
            else:
                format_spec = self.format_spec
            return format(self.value, format_spec, self.conversion_spec)

    class TemplateLiteral:
        # This type corresponds to the TemplateConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw_template: str
        _segments: tuple[TemplateLiteralText|TemplateLiteralField]

        def __new__(cls, raw_template:str, *segments:TemplateLiteralText|TemplateLiteralField):
            self = super().__new__(cls)
            self._raw_template = raw_template
            # Check if there are any adjacent text segments that need merging
            # or any empty text segments that need discarding
            type_err = "Template literal segments must be template literal text or field instances"
            text_expected = True
            needs_merge = False
            for segment in segments:
                match segment:
                    case TemplateLiteralText():
                        if not text_expected or not segment:
                            needs_merge = True
                            break
                        text_expected = False
                    case TemplateLiteralField():
                        text_expected = True
                    case _:
                        raise TypeError(type_err)
            if not needs_merge:
                # Match loop above will have checked all segments
                self._segments = segments
                return self
            # Merge consecutive runs of text fields and drop any empty text fields
            merged_segments:list[TemplateLiteralText|TemplateLiteralField] = []
            pending_merge:list[TemplateLiteralText] = []
            for segment in segments:
                match segment:
                    case TemplateLiteralText() as text_segment:
                        if text_segment:
                            pending_merge.append(text_segment)
                    case TemplateLiteralField():
                        if pending_merge:
                            merged_segments.append(TemplateLiteralText.merge(pending_merge))
                            pending_merge.clear()
                        merged_segments.append(segment)
                    case _:
                        # First loop above may not check all segments when a merge is needed
                        raise TypeError(type_err)
            if pending_merge:
                merged_segments.append(TemplateLiteralText.merge(pending_merge))
                pending_merge.clear()
            self._segments = tuple(merged_segments)
            return self

        @property
        def raw_template(self) -> str:
            return self._raw_template

        @property
        def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField]:
            return self._segments

        def __len__(self) -> int:
            return len(self._segments)

        def __iter__(self) -> Iterable[TemplateLiteralText|TemplateLiteralField]:
            return iter(self._segments)

        # Note: template literals do NOT define any relative ordering
        def __eq__(self, other):
            if not isinstance(other, TemplateLiteral):
                return NotImplemented
            return (
                self._raw_template == other._raw_template
                and self._segments == other._segments
            )

        def __repr__(self) -> str:
            return (f"{type(self).__name__}(r{self._raw_template!r}, "
                    f"{', '.join(map(repr, self._segments))})")

        def __format__(self, format_specifier) -> str:
            # When formatted, render to a string, and then use string formatting
            return format(self.render(), format_specifier)

        def render(self, *, render_template=''.join, render_text=str, render_field=format):
            ... # See definition of the template rendering semantics below

        def __add__(self, other) -> TemplateLiteral|NotImplemented:
            if isinstance(other, TemplateLiteral):
                combined_raw_text = self._raw_template + other._raw_template
                combined_segments = self._segments + other._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            if isinstance(other, str):
                # Treat the given string as a new raw text segment
                combined_raw_text = self._raw_template + other
                combined_segments = self._segments + (TemplateLiteralText(other),)
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __radd__(self, other) -> TemplateLiteral|NotImplemented:
            if isinstance(other, str):
                # Treat the given string as a new raw text segment. This effectively
                # has precedence over string concatenation in CPython due to
                # https://github.com/python/cpython/issues/55686
                combined_raw_text = other + self._raw_template
                combined_segments = (TemplateLiteralText(other),) + self._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __mul__(self, other) -> TemplateLiteral|NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            if not self or factor == 1:
                return self
            if factor < 1:
                return TemplateLiteral("")
            repeated_text = self._raw_template * factor
            repeated_segments = self._segments * factor
            return TemplateLiteral(repeated_text, *repeated_segments)
        __rmul__ = __mul__

(Note: this is an illustrative example implementation. The exact compile time construction
method and internal data management details of ``types.TemplateLiteral`` are considered an
implementation detail not specified by the PEP. However, the expected post-construction
behaviour of the public APIs on ``types.TemplateLiteral`` instances is specified by the
above code, as is the constructor signature for building template instances at runtime)

The result of a template literal expression is an instance of this
type, rather than an already rendered string. Rendering only takes
place when the instance's ``render`` method is called (either directly, or
indirectly via ``__format__``).

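Since the ``t`` prefix is not yet available, the deferral described above can
only be approximated in current Python. The following is a minimal sketch
under that assumption: ``SketchText``, ``SketchField``, ``SketchTemplate`` and
``sketch_format`` are hypothetical, heavily simplified stand-ins for the types
and the three-argument :func:`format` proposed in this PEP, not the proposed
implementation itself:

```python
from typing import Any, NamedTuple


class SketchText(str):
    """Hypothetical stand-in for TemplateLiteralText (raw text tracking omitted)."""


class SketchField(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


def sketch_format(value, format_spec="", conversion_spec=""):
    # Stand-in for the proposed three-argument format() builtin;
    # only the existing 'a', 'r' and 's' conversions are handled here
    conversions = {"": lambda v: v, "a": ascii, "r": repr, "s": str}
    return format(conversions[conversion_spec](value), format_spec)


class SketchTemplate:
    """Hypothetical stand-in for TemplateLiteral: stores segments, renders on request."""

    def __init__(self, raw_template, *segments):
        self.raw_template = raw_template
        self.segments = segments

    def render(self, *, render_template="".join, render_text=str,
               render_field=sketch_format):
        rendered = []
        for segment in self.segments:
            if isinstance(segment, SketchField):
                # Pass value, format specifier and conversion specifier
                rendered.append(render_field(*segment[1:]))
            else:
                rendered.append(render_text(segment))
        return render_template(rendered)

    def __format__(self, format_spec):
        return format(self.render(), format_spec)


name = "Jane"
# Roughly what t"Hello {name!r}" would construct: the value is captured now...
template = SketchTemplate(
    r"Hello {name!r}",
    SketchText("Hello "),
    SketchField("name", name, "", "r"),
)
# ...but no string is produced until a renderer is invoked
default_rendering = format(template)
upper_rendering = template.render(render_field=lambda v, f, c: str(v).upper())
```

The same template object yields ``"Hello 'Jane'"`` through the default path
and ``"Hello JANE"`` through the custom field renderer, which is the separation
of interpolation from rendering that this PEP relies on.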
The compiler will pass the following details to the template literal for
later use:

* a string containing the raw template as written in the source code
* a sequence of template segments, with each segment being either:

  * a literal text segment (a regular Python string that also provides access
    to its raw form)
  * a parsed template interpolation field, specifying the text of the interpolated
    expression (as a regular string), its evaluated result, the format specifier text
    (with any substitution fields eagerly evaluated as an f-string), and the conversion
    specifier text (as a regular string)

The raw template is just the template literal as a string. By default,
it is used to provide a human-readable representation for the
template literal, but template renderers may also use it for other purposes (e.g. as a
cache lookup key).

The parsed template structure is taken from :pep:`750` and consists of a sequence of
template segments corresponding to the text segments and interpolation fields in the
template string.

This approach is designed to allow compilers to fully process each segment of the template
in order, before finally emitting code to pass all of the template segments to the template
literal constructor.

For example, assuming the following runtime values:

.. code-block:: python

    names = ["Alice", "Bob", "Carol", "Eve"]
    field_width = 10
    def expressions():
        return 42

The template from the proposal section would be represented at runtime as:

.. code-block:: python

    TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", 42, "", "r"),
    )

Rendering templates
-------------------

The ``TemplateLiteral.render`` implementation defines the rendering
process in terms of the following renderers:

* an overall ``render_template`` operation that defines how the sequence of
  rendered text and field segments are composed into a fully rendered result.
  The default template renderer is string concatenation using ``''.join``.
* a per text segment ``render_text`` operation that receives the individual literal
  text segments within the template. The default text renderer is the builtin ``str``
  constructor.
* a per field segment ``render_field`` operation that receives the field value, format
  specifier, and conversion specifier for substitution fields within the template. The
  default field renderer is the :func:`format` builtin.

Given the parsed template representation above, the semantics of template rendering would
then be equivalent to the following:

.. code-block:: python

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        rendered_segments = []
        for segment in self._segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    rendered_segments.append(render_text(text_segment))
                case TemplateLiteralField() as field_segment:
                    rendered_segments.append(render_field(*field_segment[1:]))
        return render_template(rendered_segments)

Format specifiers
-----------------

The syntax and processing of field specifiers in t-strings is defined to be the same as it
is for f-strings.

This includes allowing field specifiers to themselves contain f-string substitution fields.

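The existing f-string behaviour that t-strings would inherit can be checked
in current Python. This is a small sketch using only f-strings; the variable
names are illustrative:

```python
names = ["Alice", "Bob"]
field_width = 10

# The nested {field_width} field is resolved before the specifier is applied
rendered = f"{names[0]:>{field_width}}"

# A parsed t-string field would carry the already resolved specifier text,
# equivalent to evaluating the specifier itself as an f-string
resolved_format_spec = f">{field_width}"

# Applying the resolved specifier separately gives the same result
rendered_separately = format(names[0], resolved_format_spec)
```

Here ``resolved_format_spec`` is ``">10"``, matching the format specifier
shown in the runtime representation of the example template.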
The raw text of the field specifiers (without processing any substitution fields)
is retained as part of the full raw template string.

The parsed field specifiers receive the field specifier string with those
substitutions already resolved. The ``:`` prefix is also omitted.

Aside from separating them out from the substitution expression during parsing,
format specifiers are otherwise treated as opaque strings by the interpolation
template parser - assigning semantics to those (or, alternatively,
prohibiting their use) is handled at rendering time by the field renderer.

Conversion specifiers
---------------------

In addition to the existing support for ``a``, ``r``, and ``s`` conversion specifiers,
:meth:`str.format` and :meth:`str.format_map` will be updated to accept ``()`` as a
conversion specifier that means "call the interpolated value".

Where :pep:`701` restricts conversion specifiers to ``NAME`` tokens, this PEP will instead
allow ``FSTRING_MIDDLE`` tokens (such that only ``{``, ``}`` and ``:`` are disallowed). This
change is made primarily to support lazy field rendering with the ``!()`` conversion
specifier, but also allows custom rendering functions more flexibility when defining their
own conversion specifiers in preference to those defined for the default :func:`format` field
renderer.

Conversion specifiers are still handled as plain strings, and do NOT support the use
of substitution fields.

The parsed conversion specifiers receive the conversion specifier string with the
``!`` prefix omitted.

To allow custom template renderers to define their own custom conversion specifiers without
causing the default renderer to fail, conversion specifiers will be permitted to contain a
custom suffix prefixed with a second ``!`` character. That is, ``!!``,
``!a!``, ``!r!``, ``!s!``, and ``!()!`` would all be
valid conversion specifiers in a template literal.

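One way a custom renderer might separate the standard part of a parsed
conversion specifier from its custom suffix is sketched below.
``split_conversion_spec`` is a hypothetical helper name, and the ``html`` and
``raw`` suffixes are made-up examples of renderer-specific directives:

```python
def split_conversion_spec(conversion_spec):
    """Hypothetical helper: split a parsed specifier into (standard, custom) parts."""
    # The leading "!" has already been removed by the parser, so the first
    # remaining "!" (if any) separates the standard part from the custom suffix
    std_spec, _, custom_spec = conversion_spec.partition("!")
    return std_spec, custom_spec


# Parsed forms of {x!r!html}, {x!!raw}, {x!()} and {x} respectively
examples = ["r!html", "!raw", "()", ""]
split_examples = [split_conversion_spec(spec) for spec in examples]
```

A default renderer would act only on the first element of each pair, while a
custom renderer remains free to interpret the second.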
As described above, the default rendering supports the original ``!a``, ``!r`` and ``!s`` conversion specifiers defined in :pep:`3101`, together with the new ``!()`` lazy field evaluation conversion specifier defined in this PEP. The default rendering ignores any custom conversion specifier suffixes. The full mapping between the standard conversion specifiers and the special methods called on the interpolated value when the field is rendered: * No conversion (empty string): ``__format__`` (with format specifier as parameter) * ``a``: ``__repr__`` (as per the :func:`ascii` builtin) * ``r``: ``__repr__`` (as per the :func:`repr` builtin) * ``s``: ``__str__`` (as per the ``str`` builtin) * ``()``: ``__call__`` (with no parameters) When a conversion occurs, ``__format__`` (with the format specifier) is called on the result of the conversion rather than being called on the original object. The changes to :func:`format` and the addition of :func:`!operator.convert_field` make it straightforward for custom renderers to also support the standard conversion specifiers. f-strings themselves will NOT support the new ``!()`` conversion specifier (as it is redundant when value interpolation and value rendering always occur at the same time). They also will NOT support the use of custom conversion specifiers (since the rendering function is known at compile time and doesn't make use of the custom specifiers). New field conversion API in the :mod:`operator` module ------------------------------------------------------ To support application of the standard conversion specifiers in custom template rendering functions, a new :func:`!operator.convert_field` function will be added: .. 
code-block:: python

    def convert_field(value, conversion_spec=''):
        """Apply the given string formatting conversion specifier to the given value"""
        std_spec, sep, custom_spec = conversion_spec.partition("!")
        match std_spec:
            case '':
                return value
            case 'a':
                return ascii(value)
            case 'r':
                return repr(value)
            case 's':
                return str(value)
            case '()':
                return value()
        if not sep:
            err = f"Invalid conversion specifier {std_spec!r}"
        else:
            err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
        raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")

Conversion specifier parameter added to :func:`format`
------------------------------------------------------

The signature and behaviour of the :func:`format` builtin will be updated:

.. code-block:: python

    def format(value, format_spec='', conversion_spec=''):
        if conversion_spec:
            # Apply the conversion first, then format the converted value
            value_to_format = operator.convert_field(value, conversion_spec)
        else:
            value_to_format = value
        return type(value_to_format).__format__(value_to_format, format_spec)

If a non-empty conversion specifier is given, the value will be converted with
:func:`!operator.convert_field` before looking up the ``__format__`` method.

The signature of the ``__format__`` special method does NOT change (only format
specifiers are handled by the object being formatted).

Structural typing and duck typing
---------------------------------

To allow custom renderers to accept alternative interpolation template
implementations (rather than being tightly coupled to the native template
literal types), the following structural protocols will be added to the
``typing`` module:

.. code-block:: python

    @runtime_checkable
    class TemplateText(Protocol):
        # Renamed version of PEP 750's Decoded protocol
        def __str__(self) -> str:
            ...

        raw: str

    @runtime_checkable
    class TemplateField(Protocol):
        # Renamed and modified version of PEP 750's Interpolation protocol
        def __len__(self):
            ...

        def __getitem__(self, index: int):
            ...

        def __str__(self) -> str:
            ...

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

    @runtime_checkable
    class InterpolationTemplate(Protocol):
        # Corresponds to PEP 750's Template protocol
        def __iter__(self) -> Iterable[TemplateText|TemplateField]:
            ...

        raw_template: str

Note that the structural protocol APIs are substantially narrower than the
full implementation APIs defined for ``TemplateLiteralText``,
``TemplateLiteralField``, and ``TemplateLiteral``.

Code that wants to accept interpolation templates and define specific handling
for them without introducing a dependency on the ``typing`` module, or
restricting the code to handling the concrete template literal types, should
instead perform an attribute existence check on ``raw_template``.

Writing custom renderers
------------------------

Writing a custom renderer doesn't require any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly, either by calling the ``render()`` method with alternate
``render_template``, ``render_text``, and/or ``render_field`` implementations,
or by accessing the template's data attributes directly.

For example, the following function would render a template using objects'
``repr`` implementations rather than their native formatting support:

.. code-block:: python

    def repr_format(template):
        def render_field(value, format_spec, conversion_spec):
            converted_value = operator.convert_field(value, conversion_spec)
            return format(repr(converted_value), format_spec)
        return template.render(render_field=render_field)

The custom renderer shown respects the conversion specifiers in the original
template, but it is also possible to ignore them and render the interpolated
values directly:

.. code-block:: python

    def input_repr_format(template):
        def render_field(value, format_spec, __):
            return format(repr(value), format_spec)
        return template.render(render_field=render_field)

When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in
``render_template`` callable. While this will still be a string for formatting
related use cases, producing non-string objects *is* permitted. For example,
a custom SQL template renderer could involve an ``sqlalchemy.sql.text`` call
that produces an `SQL Alchemy query object `__.
A subprocess invocation related template renderer could produce a string
sequence suitable for passing to ``subprocess.run``, or it could even call
``subprocess.run`` directly, and return the result.

Non-strings may also be returned from ``render_text`` and ``render_field``,
as long as they are paired with a ``render_template`` implementation that
expects that behaviour.

Custom renderers using the pattern matching style described in :pep:`750` are
also supported:

.. code-block:: python

    # Use the structural typing protocols rather than the concrete implementation types
    from typing import InterpolationTemplate, TemplateText, TemplateField

    def greet(template: InterpolationTemplate) -> str:
        """Render an interpolation template using structural pattern matching."""
        result = []
        for segment in template:
            match segment:
                case TemplateText() as text_segment:
                    # The protocol only guarantees __str__, so convert explicitly
                    result.append(str(text_segment))
                case TemplateField() as field_segment:
                    result.append(str(field_segment).upper())
        return f"{''.join(result)}!"

Expression evaluation
---------------------

As with f-strings, the subexpressions that are extracted from the interpolation
template are evaluated in the context where the template literal appears. This
means the expression has full access to local, nonlocal and global variables.
Any valid Python expression can be used inside ``{}``, including function and
method calls.
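
As a rough illustration of these scoping rules, the sketch below uses an
ordinary f-string, since the evaluation rules described here are the ones
f-strings already follow. The names ``PUNCTUATION``, ``make_greeter`` and
``greet`` are hypothetical and exist only for this example.

```python
PUNCTUATION = "!"  # module global, visible to the replacement field below


def make_greeter(greeting):
    """Return a function whose f-string reads names from three scopes."""
    def greet(name):
        # ``name`` is local, ``greeting`` is nonlocal (enclosing function),
        # ``PUNCTUATION`` is global, and ``name.title()`` is a method call.
        return f"{greeting}, {name.title()}{PUNCTUATION}"
    return greet


hello = make_greeter("Hello")
print(hello("ada lovelace"))
```

Under this proposal, a template literal written in the same position would
see the same names, with only the rendering step being deferred.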

Because the substitution expressions are evaluated where the string appears in
the source code, there are no additional security concerns related to the
contents of the expression itself, as you could have also just written the
same expression and used runtime field parsing:

.. code-block:: python-console

    >>> bar=10
    >>> def foo(data):
    ...   return data + 20
    ...
    >>> str(t'input={bar}, output={foo(bar)}')
    'input=10, output=30'

Is essentially equivalent to:

.. code-block:: python-console

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Handling code injection attacks
-------------------------------

The :pep:`498` formatted string syntax makes it potentially attractive to write
code like the following:

.. code-block:: python

    runquery(f"SELECT {column} FROM {table};")
    runcommand(f"cat {filename}")
    return_response(f"{response.body}")

These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
specific proposal in this PEP is designed to make it straightforward to write
use case specific renderers that take care of quoting interpolated values
appropriately for the relevant security context:

.. code-block:: python

    runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
    runcommand(sh(t"cat {filename}"))
    return_response(html(t"{response.body}"))

This PEP does not cover adding all such renderers to the standard library
immediately (though one for shell escaping is proposed), but rather proposes to
ensure that they can be readily provided by third party libraries, and
potentially incorporated into the standard library at a later date.

Over time, it is expected that APIs processing potentially dangerous string
inputs may be updated to accept interpolation templates natively, allowing
problematic code examples to be fixed simply by replacing the ``f`` string
prefix with a ``t``:

.. code-block:: python

    runquery(t"SELECT {column} FROM {table};")
    runcommand(t"cat {filename}")
    return_response(t"{response.body}")

It is proposed that a renderer is included in the :mod:`shlex` module, aiming
to offer a more POSIX shell style experience for accessing external programs,
without the significant risks posed by running ``os.system`` or enabling the
system shell when using the ``subprocess`` module APIs. This renderer will
provide an interface for running external programs inspired by that offered by
the `Julia programming language `__, only with the backtick based
``\`cat $filename\``` syntax replaced by ``t"cat {filename}"`` style template
literals. See more in the :ref:`pep-501-shlex-module` section.

Error handling
--------------

Either compile time or run time errors can occur when processing interpolation
expressions. Compile time errors are limited to those errors that can be
detected when parsing a template string into its component tuples. These
errors all raise ``SyntaxError``.

Unmatched braces::

    >>> t'x={x'
      File "", line 1
        t'x={x'
            ^
    SyntaxError: missing '}' in template literal expression

Invalid expressions::

    >>> t'x={!x}'
      File "", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a template
string before creating the template literal object. See :pep:`498` for some
examples.

Different renderers may also impose additional runtime constraints on
acceptable interpolated expressions and other formatting details, which will
be reported as runtime exceptions.

.. _pep-501-shlex-module:

Renderer for shell escaping added to :mod:`shlex`
-------------------------------------------------

As a reference implementation, a renderer for safe POSIX shell escaping can be
added to the :mod:`shlex` module. This renderer would be called ``sh`` and
would be equivalent to calling ``shlex.quote`` on each field value in the
template literal.

Thus:

.. code-block:: python

    os.system(shlex.sh(t'cat {myfile}'))

would have the same behavior as:

.. code-block:: python

    os.system('cat ' + shlex.quote(myfile))

The implementation would be:

.. code-block:: python

    def sh(template: TemplateLiteral):
        def render_field(value, format_spec, conversion_spec):
            # Format the field first, then quote the resulting text
            field_text = format(value, format_spec, conversion_spec)
            return quote(field_text)
        return template.render(render_field=render_field)

The addition of ``shlex.sh`` will NOT change the existing admonishments in the
:mod:`subprocess` documentation that passing ``shell=True`` is best avoided,
nor the reference from the :func:`os.system` documentation to the higher level
``subprocess`` APIs.

Changes to subprocess module
----------------------------

With the additional renderer in the shlex module, and the addition of template
literals, the :mod:`subprocess` module can be changed to accept template
literals as an additional input type to ``Popen``, which already accepts
either a sequence or a string, with different behavior for each.

With the addition of template literals, :class:`subprocess.Popen` (and in
turn, all its higher level functions such as :func:`subprocess.run`) could
accept strings in a safe way
(at least on :ref:`POSIX systems <pep-501-defer-non-posix-shells>`).

For example:

.. code-block:: python

    subprocess.run(t'cat {myfile}', shell=True)

would automatically use the ``shlex.sh`` renderer provided in this PEP.
Therefore, using ``shlex`` inside a ``subprocess.run`` call like so:

.. code-block:: python

    subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)

would be redundant, as ``run`` would automatically render any template
literals through ``shlex.sh``.

Alternatively, when ``subprocess.Popen`` is run without ``shell=True``, it
could still provide subprocess with a more ergonomic syntax. For example:

.. code-block:: python

    subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to:

.. code-block:: python

    subprocess.run(['cat', myfile, '--flag', value])

or, more accurately:

.. code-block:: python

    subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

It would do this by first using the ``shlex.sh`` renderer, as above, then
using ``shlex.split`` on the result.

The implementation inside ``subprocess.Popen._execute_child`` would look like:

.. code-block:: python

    if hasattr(args, "raw_template"):
        import shlex
        if shell:
            args = [shlex.sh(args)]
        else:
            args = shlex.split(shlex.sh(args))

How to Teach This
=================

This PEP intentionally includes two standard renderers that will always be
available in teaching environments: the :func:`format` builtin and the new
``shlex.sh`` POSIX shell renderer.

Together, these two renderers can be used to build an initial understanding of
delayed rendering on top of a student's initial introduction to string
formatting with f-strings. This initial understanding would have the goal of
allowing students to *use* template literals effectively, in combination with
pre-existing template rendering functions.

For example, ``f"{'some text'}"``, ``f"{value}"``, ``f"{value!r}"``,
``f"{callable()}"`` could all be introduced.

Those same operations could then be rewritten as ``format(t"{'some text'}")``,
``format(t"{value}")``, ``format(t"{value!r}")``,
``format(t"{callable()}")`` to illustrate the relationship between the eager
rendering form and the delayed rendering form.

The difference between "template definition time" (or "interpolation time")
and "template rendering time" can then be investigated further by storing the
template literals as local variables and looking at their representations
separately from the results of the ``format`` calls. At this point, the
``t"{callable!()}"`` syntax can be introduced to distinguish between field
expressions that are called at template definition time and those that are
called at template rendering time.
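
The definition time versus rendering time distinction can be emulated in
current Python without any new syntax. The following is only a sketch under
stated assumptions: ``next_id`` and ``render_value`` are hypothetical helpers,
with ``render_value`` standing in for a renderer that honours the proposed
``!()`` conversion specifier for a single field.

```python
import itertools

_counter = itertools.count(1)


def next_id():
    """Return a new number on every call, so call timing is observable."""
    return next(_counter)


def render_value(value, conversion_spec=""):
    """Hypothetical stand-in for rendering one interpolated field."""
    if conversion_spec == "()":
        value = value()  # the call happens at rendering time
    return format(value)


eager_field = next_id()  # like t"{next_id()}": called at definition time
lazy_field = next_id     # like t"{next_id!()}": callable stored, not called

first = render_value(lazy_field, "()")
second = render_value(lazy_field, "()")
eager = render_value(eager_field)
print(eager, first, second)
```

The eagerly evaluated field keeps the value it had when it was defined, while
the lazily evaluated field produces a fresh value each time it is rendered.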

Finally, the differences between the results of ``f"{'some text'}"``,
``format(t"{'some text'}")``, and ``shlex.sh(t"{'some text'}")`` could be
explored to illustrate the potential for differences between the default
rendering function and custom rendering functions.

Actually defining your own custom template rendering functions would then be
a separate, more advanced topic (similar to the way students are routinely
taught to use decorators and context managers well before they learn how to
write their own custom ones).

:pep:`750` includes further ideas for teaching aspects of the delayed rendering
topic.

Discussion
==========

Refer to :pep:`498` for previous discussion, as several of the points there
also apply to this PEP. :pep:`750`'s design discussions are also highly
relevant, as that PEP inspired several aspects of the current design.

Support for binary interpolation
--------------------------------

As f-strings don't handle byte strings, neither will t-strings.

Interoperability with str-only interfaces
-----------------------------------------

For interoperability with interfaces that only accept strings, interpolation
templates can still be prerendered with :func:`format`, rather than delegating
the rendering to the called function.

This reflects the key difference from :pep:`498`, which *always* eagerly
applies the default rendering, without any way to delegate the choice of
renderer to another section of the code.

Preserving the raw template string
----------------------------------

Earlier versions of this PEP failed to make the raw template string available
on the template literal. Retaining it makes it possible to provide a more
attractive template representation, as well as providing the ability to
precisely reconstruct the original string, including both the expression text
and the details of any eagerly rendered substitution fields in format
specifiers.

Creating a rich object rather than a global name lookup
-------------------------------------------------------

Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
creating a new kind of object for later consumption by interpolation
functions. Creating a rich descriptive object with a useful default renderer
made it much easier to support customisation of the semantics of
interpolation.

Building atop f-strings rather than replacing them
--------------------------------------------------

Earlier versions of this PEP attempted to serve as a complete substitute for
:pep:`498` (f-strings). With the acceptance of that PEP and the more recent
:pep:`701`, this PEP can instead build a more flexible delayed rendering
capability on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified a
number of aspects of the proposal in this PEP (such as how to handle
substitution fields in format specifiers).

Defining repetition and concatenation semantics
-----------------------------------------------

This PEP explicitly defines repetition and concatenation semantics for
``TemplateLiteral`` and ``TemplateLiteralText``. While not strictly necessary,
defining these is expected to make the types easier to work with in code that
historically only supported regular strings.

New conversion specifier for lazy field evaluation
--------------------------------------------------

The initially published version of :pep:`750` defaulted to lazy evaluation for
all interpolation fields. While it was subsequently updated to default to
eager evaluation (as happens for f-strings and this PEP), the discussions
around the topic prompted the idea of providing a way to indicate to rendering
functions that the interpolated field value should be called at rendering time
rather than being used without modification.

Since PEP 750 also deferred the processing of conversion specifiers until
evaluation time, the suggestion was put forward that invoking ``__call__``
without arguments could be seen as similar to the existing conversion
specifiers that invoke ``__repr__`` (``!a``, ``!r``) or ``__str__`` (``!s``).

Accordingly, this PEP was updated to also make conversion specifier processing
the responsibility of rendering functions, and to introduce ``!()`` as a new
conversion specifier for lazy evaluation.

Adding :func:`!operator.convert_field` and updating the :func:`format` builtin
was then a matter of providing appropriate support to rendering function
implementations that wanted to accept the default conversion specifiers.

Allowing arbitrary conversion specifiers in custom renderers
------------------------------------------------------------

Accepting ``!()`` as a new conversion specifier necessarily requires updating
the syntax that the parser accepts for conversion specifiers (they are
currently restricted to identifiers). This then raised the question of whether
t-string compilation should enforce the additional restriction that f-string
compilation imposes: that the conversion specifier be exactly one of ``!a``,
``!r``, or ``!s``.

With t-strings already being updated to allow ``!()`` when compiled, it made
sense to treat conversion specifiers as relating to rendering functions in
much the same way that format specifiers relate to the formatting of
individual objects: aside from some characters that are excluded for parsing
reasons, they are otherwise free text fields with the meaning decided by the
consuming function or object. This reduces the temptation to introduce
renderer specific metaformatting into the template's format specifiers (since
any renderer specific information can be placed in the conversion specifier
instead).
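
To make the free text behaviour concrete, here is a sketch of how a renderer
might split a conversion specifier into its standard part and a renderer
specific part. It mirrors the ``partition("!")`` approach used by the proposed
``operator.convert_field``, but everything else is an assumption made for
illustration: ``split_conversion``, ``render_field`` and the ``raw`` marker
are hypothetical names, not part of this PEP.

```python
import html

# Standard conversions, keyed the same way as in the proposed convert_field
_STANDARD = {
    "": lambda value: value,
    "a": ascii,
    "r": repr,
    "s": str,
    "()": lambda value: value(),
}


def split_conversion(conversion_spec):
    """Split 'std!custom' into its standard and renderer specific parts."""
    std_spec, _, custom_spec = conversion_spec.partition("!")
    return std_spec, custom_spec


def render_field(value, format_spec="", conversion_spec=""):
    """Hypothetical HTML field renderer: a custom 'raw' part skips escaping."""
    std_spec, custom_spec = split_conversion(conversion_spec)
    if std_spec not in _STANDARD:
        raise ValueError(f"Invalid conversion specifier {std_spec!r}")
    text = format(_STANDARD[std_spec](value), format_spec)
    return text if custom_spec == "raw" else html.escape(text)


escaped = render_field("<b>", conversion_spec="s")
unescaped = render_field("<b>", conversion_spec="s!raw")
called = render_field(lambda: 7, ">3", "()")
print(escaped, unescaped, repr(called))
```

Keeping the renderer specific marker in the conversion specifier leaves the
format specifier untouched, so it can still be passed to :func:`format`
unchanged.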

Only reserving a single new string prefix
-----------------------------------------

The primary difference between this PEP and :pep:`750` is that the latter aims
to enable the use of arbitrary string prefixes, rather than requiring the
creation of template literal instances that are then passed to other APIs. For
example, PEP 750 would allow the ``sh`` renderer described in this PEP to be
used as ``sh"cat {somefile}"`` rather than requiring the template literal to be
created explicitly and then passed to a regular function call (as in
``sh(t"cat {somefile}")``).

The main reason the PEP authors prefer the second spelling is because it makes
it clearer to a reader what is going on: a template literal instance is being
created, and then passed to a callable that knows how to do something useful
with interpolation template instances.

A `draft proposal `__ from one of the :pep:`750` authors also suggests that
static typecheckers will be able to infer the use of particular domain specific
languages just as readily from the form that uses an explicit function call as
they would be able to infer it from a directly tagged string.

With the tagged string syntax at least arguably reducing clarity for human
readers without increasing the overall expressiveness of the construct, it
seems reasonable to start with the smallest viable proposal (a single new
string prefix), and then revisit the potential value of generalising to
arbitrary prefixes in the future.

As a lesser, but still genuine, consideration, only using a single new string
prefix for this use case leaves open the possibility of defining alternate
prefixes in the future that still produce ``TemplateLiteral`` objects, but use
a different syntax within the string to define the interpolation fields (see
the :ref:`i18n discussion <pep-501-defer-i18n>` below).

Deferring consideration of more concise delayed evaluation syntax
-----------------------------------------------------------------

During the discussions of delayed evaluation, ``{-> expr}`` was
`suggested `__ as potential syntactic sugar for the already supported
``lambda`` based syntax: ``{(lambda: expr)}`` (the parentheses are required in
the existing syntax to avoid misinterpretation of the ``:`` character as
indicating the start of the format specifier).

While adding such a spelling would complement the rendering time function call
syntax proposed in this PEP (that is, writing ``{-> expr!()}`` to evaluate
arbitrary expressions at rendering time), it is a topic that the PEP authors
consider to be better left to a future PEP if this PEP or :pep:`750` is
accepted.

Deferring consideration of possible logging integration
-------------------------------------------------------

One of the challenges with the logging module has been that we have previously
been unable to devise a reasonable migration strategy away from the use of
printf-style formatting. While the logging module does allow formatters to
specify the use of :meth:`str.format` or :class:`string.Template` style
substitution, it can be awkward to ensure that messages written that way are
only ever processed by log record formatters that are expecting that syntax.

The runtime parsing and interpolation overhead for logging messages also poses
a problem for extensive logging of runtime events for monitoring purposes.

While beyond the scope of this initial PEP, template literal support could
potentially be added to the logging module's event reporting APIs, permitting
relevant details to be captured using forms like:

.. code-block:: python

    logging.debug(t"Event: {event}; Details: {data}")
    logging.critical(t"Error: {error}; Details: {data}")

Rather than the historical mod-formatting style:

.. code-block:: python

    logging.debug("Event: %s; Details: %s", event, data)
    logging.critical("Error: %s; Details: %s", error, data)

As the template literal is passed in as an ordinary argument, other keyword
arguments would also remain available:

.. code-block:: python

    logging.critical(t"Error: {error}; Details: {data}", exc_info=True)

The approach to standardising lazy field evaluation described in this PEP is
primarily based on the anticipated needs of this hypothetical integration into
the logging module:

.. code-block:: python

    logging.debug(t"Eager evaluation of {expensive_call()}")
    logging.debug(t"Lazy evaluation of {expensive_call!()}")

    logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
    logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")

It's an open question whether the definition of logging formatters would be
updated to support template strings, but if they were, the most likely way of
defining fields which should be :ref:`looked up on the log record `
instead of being interpreted eagerly is simply to escape them so they're
available as part of the literal text:

.. code-block:: python

    proc_id = get_process_id()
    formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")

.. _pep-501-defer-i18n:

Deferring consideration of possible use in i18n use cases
---------------------------------------------------------

The initial motivating use case for this PEP was providing a cleaner syntax
for i18n (internationalization) translation, as that requires access to the
original unmodified template. As such, it focused on compatibility with the
substitution syntax used in Python's :class:`string.Template` formatting and
Mozilla's l20n project.

However, subsequent discussion revealed there are significant additional
considerations to be taken into account in the i18n use case, which don't
impact the simpler cases of handling interpolation into security sensitive
contexts (like HTML, system shells, and database queries), or producing
application debugging messages in the preferred language of the development
team (rather than the native language of end users).

Due to that realisation, the PEP was switched to use the :meth:`str.format`
substitution syntax originally defined in :pep:`3101` and subsequently used as
the basis for :pep:`498`.

While it would theoretically be possible to update :class:`string.Template` to
support the creation of instances from native template literals, and to
implement the structural ``typing.InterpolationTemplate`` protocol, the PEP
authors have not identified any practical benefit in doing so.

However, one significant benefit of the "only one string prefix" approach used
in this PEP is that while it generalises the existing f-string interpolation
syntax to support delayed rendering through t-strings, it doesn't imply that
that should be the *only* compiler supported interpolation syntax that Python
should ever offer.

Most notably, it leaves the door open to an alternate "t$-string" syntax that
would allow ``TemplateLiteral`` instances to be created using a :pep:`292`
based interpolation syntax rather than a :pep:`3101` based syntax::

    template = t$"Substitute $words and ${other_values} at runtime"

The only runtime distinction between templates created that way and templates
created from regular t-strings would be in the contents of their
``raw_template`` attributes.

.. _pep-501-defer-non-posix-shells:

Deferring escaped rendering support for non-POSIX shells
--------------------------------------------------------

:func:`shlex.quote` works by classifying the regex character set
``[\w@%+=:,./-]`` to be safe, deeming all other characters to be unsafe, and
hence requiring quoting of the string containing them. The quoting mechanism
used is then specific to the way that string quoting works in POSIX shells, so
it cannot be trusted when running a shell that doesn't follow POSIX shell
string quoting rules.

For example, running
``subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True)`` is safe when
using a shell that follows POSIX quoting rules::

    $ cat > run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    $ python3 run_quoted.py pwd
    pwd
    $ python3 run_quoted.py '; pwd'
    ; pwd
    $ python3 run_quoted.py "'pwd'"
    'pwd'

but remains unsafe when running a shell from Python invokes ``cmd.exe`` (or
PowerShell)::

    S:\> echo import sys, shlex, subprocess > run_quoted.py
    S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
    S:\> type run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    S:\> python3 run_quoted.py "echo OK"
    'echo OK'
    S:\> python3 run_quoted.py "'& echo Oh no!"
    ''"'"'
    Oh no!'

Resolving this standard library limitation is beyond the scope of this PEP.

Acknowledgements
================

* Eric V. Smith for creating :pep:`498` and demonstrating the feasibility of
  arbitrary expression substitution in string interpolation
* The authors of :pep:`750` for the substantial design improvements that tagged
  strings inspired for this PEP, their general advocacy for the value of
  language level delayed template rendering support, and their efforts to
  ensure that any native interpolation template support lays a strong
  foundation for future efforts in providing robust syntax highlighting and
  static type checking support for domain specific languages
* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
  exploring the feasibility of using this model of delayed rendering in i18n
  use cases (even though the ultimate conclusion was that it was a poor fit,
  at least for current approaches to i18n in Python)

References
==========

* `%-formatting `_
* `str.format `_
* `string.Template documentation `_
* :pep:`215`: String Interpolation
* :pep:`292`: Simpler String Substitutions
* :pep:`3101`: Advanced String Formatting
* :pep:`498`: Literal string formatting
* :pep:`675`: Arbitrary Literal String Type
* :pep:`701`: Syntactic formalization of f-strings
* `FormattableString and C# native string interpolation `_
* `IFormattable interface in C# (see remarks for globalization notes) `_
* `TemplateLiterals in Javascript `_
* `Running external commands in Julia `_

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.