PEP: 501
Title: General purpose template literal strings
Author: Alyssa Coghlan, Nick Humrich
Discussions-To: https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 701
Created: 08-Aug-2015
Python-Version: 3.12
Post-History: 08-Aug-2015, 05-Sep-2015, 09-Mar-2023

Abstract
========

Though easy and elegant to use, Python :term:`f-string`\s can be vulnerable
to injection attacks when used to construct shell commands, SQL queries,
HTML snippets and similar (for example,
``os.system(f"echo {message_from_user}")``). This PEP introduces template
literal strings (or "t-strings"), which have the same syntax and semantics
but with rendering deferred until :func:`format` or another builder function
is called on them. This will allow standard library calls, helper functions
and third party tools to safely and intelligently perform appropriate
escaping and other string processing on inputs while retaining the usability
and convenience of f-strings.

Motivation
==========

:pep:`498` added new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other
expression), rather than being limited to explicit name references. These
are referred to in the PEP as "f-strings" (a mnemonic for "formatted
strings").

Since acceptance of :pep:`498`, f-strings have become well-established and
very popular. F-strings are becoming even more useful with the addition of
:pep:`701`. While f-strings are great, eager rendering has its limitations.
For example, the eagerness of f-strings has made code like the following
likely::

    os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant problem
if the interpolated value ``message_from_user`` is in fact provided by an
untrusted user: it's an opening for a form of code injection attack, where
the supplied user data has not been properly escaped before being passed to
the ``os.system`` call.

To address that problem (and a number of other concerns), this PEP proposes
the complementary introduction of "t-strings" (a mnemonic for "template
literal strings"), where ``f"Message with {data}"`` would produce the same
result as ``format(t"Message with {data}")``.

While this PEP and :pep:`675` are similar in their goals, the two do not
compete with each other and can instead be used together.

This PEP was previously in deferred status, pending further experience with
:pep:`498`'s simpler approach of only supporting eager rendering without the
additional complexity of also supporting deferred rendering. Since then,
f-strings have become very popular and :pep:`701` has been introduced. This
PEP has been updated to reflect current knowledge of f-strings and the
improvements from :pep:`701`. It is designed to be built on top of the
:pep:`701` implementation.

Proposal
========

This PEP proposes a new string prefix that declares the string to be a
template literal rather than an ordinary string::

    template = t"Substitute {names:>10} and {expressions()} at runtime"

This would be effectively interpreted as::

    _raw_template = "Substitute {names:>10} and {expressions()} at runtime"
    _parsed_template = (
        ("Substitute ", "names"),
        (" and ", "expressions()"),
        (" at runtime", None),
    )
    _field_values = (names, expressions())
    _format_specifiers = (f">10", f"")
    template = types.TemplateLiteral(
        _raw_template, _parsed_template, _field_values, _format_specifiers)

The ``__format__`` method on ``types.TemplateLiteral`` would then implement
the following :meth:`str.format` inspired semantics::

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> format(t'She said her name is {name!r}.')
    "She said her name is 'Jane'."

The implementation of template literals would be based on :pep:`701`, and
use the same syntax.

This PEP does not propose to remove or deprecate any of the existing string
formatting mechanisms, as those will remain valuable when formatting strings
that are not present directly in the source code of the application.
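
Since the ``t`` prefix is not available in any released interpreter, the
desugaring above can only be approximated by hand. The following is a minimal
sketch, assuming a hypothetical stand-in class (``TemplateSketch``, not part
of this proposal) that carries the same four pieces of data and applies the
default rendering. It illustrates why ``format(t"...")`` and the matching
f-string are expected to agree.

```python
from typing import NamedTuple


class TemplateSketch(NamedTuple):
    """Hypothetical stand-in for the proposed types.TemplateLiteral."""

    raw_template: str
    parsed_template: tuple
    field_values: tuple
    format_specifiers: tuple

    def render(self, *, render_template="".join, render_field=format):
        # Interleave literal text with rendered fields, in source order
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(
                    render_field(self.field_values[pos], self.format_specifiers[pos])
                )
        return render_template(parts)

    def __format__(self, spec):
        # Default rendering, then ordinary string formatting
        return format(self.render(), spec)


name, age = "Jane", 50

# Hand-written equivalent of: t"My name is {name:>6}, next year I am {age + 1}."
template = TemplateSketch(
    "My name is {name:>6}, next year I am {age + 1}.",
    (("My name is ", "name"), (", next year I am ", "age + 1"), (".", None)),
    (name, age + 1),  # expressions are evaluated eagerly, like f-strings
    (">6", ""),       # format specifiers, one per field
)

rendered = format(template)
expected = f"My name is {name:>6}, next year I am {age + 1}."
print(rendered == expected)
```

Only the rendering step is deferred here: the field values are already
computed when the object is built, which matches the evaluation rules
described in this PEP.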

Summary of differences from f-strings
-------------------------------------

The key differences between f-strings and t-strings are:

* the ``t`` (template literal) prefix indicates delayed rendering, but
  otherwise uses the same syntax and semantics as formatted strings
* template literals are available at runtime as a new kind of object
  (``types.TemplateLiteral``)
* the default rendering used by formatted strings is invoked on a
  template literal object by calling ``format(template)`` rather than
  implicitly
* while f-string ``f"Message {here}"`` would be *semantically* equivalent to
  ``format(t"Message {here}")``, f-strings will continue to avoid the runtime
  overhead of using the delayed rendering machinery that is needed for
  t-strings

Rationale
=========

F-strings (:pep:`498`) made interpolating values into strings with full
access to Python's lexical namespace semantics simpler, but they do so at
the cost of creating a situation where interpolating values into sensitive
targets like SQL queries, shell commands and HTML templates enjoys a much
cleaner syntax when handled without regard for code injection attacks than
when handled correctly.

This PEP proposes to provide the option of delaying the actual rendering of
a template literal to its ``__format__`` method, allowing the use of other
template renderers by passing the template around as a first class object.

While very different in the technical details, the
``types.TemplateLiteral`` interface proposed in this PEP is conceptually
quite similar to the ``FormattableString`` type underlying the native
interpolation support introduced in C# 6.0, as well as template literals in
JavaScript introduced in ES6.

Specification
=============

This PEP proposes a new ``t`` string prefix that results in the creation of
an instance of a new type, ``types.TemplateLiteral``.

Template literals are Unicode strings (bytes literals are not permitted),
and string literal concatenation operates as normal, with the entire
combined literal forming the template literal.

The template string is parsed into literals, expressions and format
specifiers as described for f-strings in :pep:`498` and :pep:`701`.
Conversion specifiers are handled by the compiler, and appear as part of the
field text in interpolation templates.

However, rather than being rendered directly into a formatted string, these
components are instead organised into an instance of a new type with the
following semantics::

    class TemplateLiteral:
        __slots__ = ("raw_template", "parsed_template", "field_values", "format_specifiers")

        def __new__(cls, raw_template, parsed_template, field_values, format_specifiers):
            self = super().__new__(cls)
            self.raw_template = raw_template
            if len(parsed_template) == 0:
                raise ValueError("'parsed_template' must contain at least one value")
            self.parsed_template = parsed_template
            self.field_values = field_values
            self.format_specifiers = format_specifiers
            return self

        def __bool__(self):
            return bool(self.raw_template)

        def __add__(self, other):
            if isinstance(other, TemplateLiteral):
                if (
                    self.parsed_template
                    and self.parsed_template[-1][1] is None
                    and other.parsed_template
                ):
                    # merge the last string of self with the first string of other
                    content = self.parsed_template[-1][0]
                    new_parsed_template = (
                        self.parsed_template[:-1]
                        + (
                            (
                                content + other.parsed_template[0][0],
                                other.parsed_template[0][1],
                            ),
                        )
                        + other.parsed_template[1:]
                    )
                else:
                    new_parsed_template = self.parsed_template + other.parsed_template

                return TemplateLiteral(
                    self.raw_template + other.raw_template,
                    new_parsed_template,
                    self.field_values + other.field_values,
                    self.format_specifiers + other.format_specifiers,
                )

            if isinstance(other, str):
                if self.parsed_template and self.parsed_template[-1][1] is None:
                    # merge string with last value
                    new_parsed_template = self.parsed_template[:-1] + (
                        (self.parsed_template[-1][0] + other, None),
                    )
                else:
                    new_parsed_template = self.parsed_template + ((other, None),)

                return TemplateLiteral(
                    self.raw_template + other,
                    new_parsed_template,
                    self.field_values,
                    self.format_specifiers,
                )
            else:
                raise TypeError(
                    f"unsupported operand type(s) for +: '{type(self)}' and '{type(other)}'"
                )

        def __radd__(self, other):
            if isinstance(other, str):
                if self.parsed_template:
                    new_parsed_template = (
                        (other + self.parsed_template[0][0], self.parsed_template[0][1]),
                    ) + self.parsed_template[1:]
                else:
                    new_parsed_template = ((other, None),)

                return TemplateLiteral(
                    other + self.raw_template,
                    new_parsed_template,
                    self.field_values,
                    self.format_specifiers,
                )
            else:
                raise TypeError(
                    f"unsupported operand type(s) for +: '{type(other)}' and '{type(self)}'"
                )

        def __mul__(self, other):
            if isinstance(other, int):
                if not self.raw_template or other == 1:
                    return self
                if other < 1:
                    # The parsed template is a tuple of 2-tuples, even when empty
                    return TemplateLiteral("", (("", None),), (), ())
                parsed_template = self.parsed_template
                last_node = parsed_template[-1]
                trailing_field = last_node[1]
                if trailing_field is not None:
                    # With a trailing field, everything can just be repeated the requested number of times
                    new_parsed_template = parsed_template * other
                elif len(parsed_template) == 1:
                    # Text-only template: repeat the single literal segment
                    new_parsed_template = ((last_node[0] * other, None),)
                else:
                    # Without a trailing field, need to amend the parsed template repetitions to merge
                    # the trailing text from each repetition with the leading text of the next
                    first_node = parsed_template[0]
                    merged_node = (last_node[0] + first_node[0], first_node[1])
                    repeating_pattern = (merged_node,) + parsed_template[1:-1]
                    new_parsed_template = (
                        parsed_template[:-1]
                        + repeating_pattern * (other - 1)
                        + (last_node,)
                    )
                return TemplateLiteral(
                    self.raw_template * other,
                    new_parsed_template,
                    self.field_values * other,
                    self.format_specifiers * other,
                )
            else:
                raise TypeError(
                    f"unsupported operand type(s) for *: '{type(self)}' and '{type(other)}'"
                )

        def __rmul__(self, other):
            if isinstance(other, int):
                return self * other
            else:
                raise TypeError(
                    f"unsupported operand type(s) for *: '{type(other)}' and '{type(self)}'"
                )

        def __eq__(self, other):
            if not isinstance(other, TemplateLiteral):
                return False
            return (
                self.raw_template == other.raw_template
                and self.parsed_template == other.parsed_template
                and self.field_values == other.field_values
                and self.format_specifiers == other.format_specifiers
            )

        def __repr__(self):
            return (
                f"<{type(self).__qualname__} {repr(self.raw_template)} "
                f"at {id(self):#x}>"
            )

        def __format__(self, format_specifier):
            # When formatted, render to a string, and use string formatting
            return format(self.render(), format_specifier)

        def render(self, *, render_template="".join, render_field=format):
            ...  # See definition of the template rendering semantics below

The result of a template literal expression is an instance of this type,
rather than an already rendered string. Rendering only takes place when the
instance's ``render`` method is called (either directly, or indirectly via
``__format__``).

The compiler will pass the following details to the template literal for
later use:

* a string containing the raw template as written in the source code
* a parsed template tuple that allows the renderer to render the
  template without needing to reparse the raw string template for
  substitution fields
* a tuple containing the evaluated field values, in field substitution order
* a tuple containing the field format specifiers, in field substitution order

This structure is designed to take full advantage of compile time constant
folding by ensuring the parsed template is always constant, even when the
field values and format specifiers include variable substitution
expressions.

The raw template is just the template literal as a string. By default, it is
used to provide a human-readable representation for the template literal.

The parsed template consists of a tuple of 2-tuples, with each 2-tuple
containing the following fields:

* ``leading_text``: a leading string literal.
  This will be the empty string if the current field is at the start of the
  string, or immediately follows the preceding field.
* ``field_expr``: the text of the expression element in the substitution
  field. This will be None for a final trailing text segment.

The tuple of evaluated field values holds the *results* of evaluating the
substitution expressions in the scope where the template literal appears.

The tuple of field specifiers holds the *results* of evaluating the field
specifiers as f-strings in the scope where the template literal appears.

The ``TemplateLiteral.render`` implementation then defines the rendering
process in terms of the following renderers:

* an overall ``render_template`` operation that defines how the sequence of
  literal template sections and rendered fields are composed into a fully
  rendered result. The default template renderer is string concatenation
  using ``''.join``.
* a per field ``render_field`` operation that receives the field value and
  format specifier for substitution fields within the template. The default
  field renderer is the ``format`` builtin.

Given an appropriate parsed template representation and internal methods of
iterating over it, the semantics of template rendering would then be
equivalent to the following::

    def render(self, *, render_template=''.join, render_field=format):
        iter_fields = enumerate(self.parsed_template)
        values = self.field_values
        specifiers = self.format_specifiers
        template_parts = []
        for field_pos, (leading_text, field_expr) in iter_fields:
            template_parts.append(leading_text)
            if field_expr is not None:
                value = values[field_pos]
                specifier = specifiers[field_pos]
                rendered_field = render_field(value, specifier)
                template_parts.append(rendered_field)
        return render_template(template_parts)

Conversion specifiers
---------------------

The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by
``str.format`` and hence :pep:`498` are handled in template literals as
follows:

* they're included unmodified in the raw template to ensure no information
  is lost
* they're *replaced* in the parsed template with the corresponding builtin
  calls, in order to ensure that ``field_expr`` always contains a valid
  Python expression
* the corresponding field value placed in the field values tuple is
  converted appropriately *before* being passed to the template literal

This means that, for most purposes, the difference between the use of
conversion specifiers and calling the corresponding builtins in the original
template literal will be transparent to custom renderers. The difference
will only be apparent if reparsing the raw template, or attempting to
reconstruct the original template from the parsed template.

Writing custom renderers
------------------------

Writing a custom renderer doesn't require any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly either by calling the ``render()`` method with alternate
``render_template`` or ``render_field`` implementations, or by accessing the
template's data attributes directly.

For example, the following function would render a template using objects'
``repr`` implementations rather than their native formatting support::

    def reprformat(template):
        def render_field(value, specifier):
            return format(repr(value), specifier)
        return template.render(render_field=render_field)

When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in
``render_template`` callable. While this is expected to be a string in most
cases, producing non-string objects *is* permitted. For example, a custom
template renderer could involve an ``sqlalchemy.sql.text`` call that
produces an SQLAlchemy query object.

Non-strings may also be returned from ``render_field``, as long as it is
paired with a ``render_template`` implementation that expects that
behaviour.

Expression evaluation
---------------------

As with f-strings, the subexpressions that are extracted from the
interpolation template are evaluated in the context where the template
literal appears. This means the expression has full access to local,
nonlocal and global variables. Any valid Python expression can be used
inside ``{}``, including function and method calls.

Because the substitution expressions are evaluated where the string appears
in the source code, there are no additional security concerns related to the
contents of the expression itself, as you could have also just written the
same expression and used runtime field parsing::

    >>> bar=10
    >>> def foo(data):
    ...   return data + 20
    ...
    >>> format(t'input={bar}, output={foo(bar)}')
    'input=10, output=30'

Is essentially equivalent to::

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Handling code injection attacks
-------------------------------

The :pep:`498` formatted string syntax makes it potentially attractive to
write code like the following::

    runquery(f"SELECT {column} FROM {table};")
    runcommand(f"cat {filename}")
    return_response(f"{response.body}")

These all represent potential vectors for code injection attacks, if any of
the variables being interpolated happen to come from an untrusted source.
The specific proposal in this PEP is designed to make it straightforward to
write use case specific renderers that take care of quoting interpolated
values appropriately for the relevant security context::

    runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
    runcommand(sh(t"cat {filename}"))
    return_response(html(t"{response.body}"))

This PEP does not cover adding all such renderers to the standard library
immediately (though one for shell escaping is proposed), but rather proposes
to ensure that they can be readily provided by third party libraries, and
potentially incorporated into the standard library at a later date.

It is proposed that a renderer be included in the :mod:`shlex` module. It
aims to offer a POSIX shell style experience for accessing external
programs, without the significant risks posed by running ``os.system`` or
enabling the system shell when using the ``subprocess`` module APIs. The
resulting interface for running external programs is inspired by that
offered by the Julia programming language, only with the backtick based
``\`cat $filename\``` syntax replaced by ``t"cat {filename}"`` style
template literals. See more in the :ref:`501-shlex-module` section.
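
As an illustration of a use case specific renderer, the following is a
minimal sketch of what an ``html`` renderer might look like. It assumes a
hypothetical stand-in class (``TemplateSketch``) in place of the proposed
``types.TemplateLiteral``, since the ``t`` prefix is not available in any
released interpreter, and it only escapes field values with the standard
library's ``html.escape``. A production renderer would need to consider the
HTML context of each field (attribute, URL, script and so on).

```python
import html as html_lib


class TemplateSketch:
    """Hypothetical stand-in exposing the render() hook described in this PEP."""

    def __init__(self, raw_template, parsed_template, field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(
                    render_field(self.field_values[pos], self.format_specifiers[pos])
                )
        return render_template(parts)


def html(template):
    """Sketch of a renderer: literal text is trusted, field values are escaped."""

    def render_field(value, specifier):
        # Apply normal formatting first, then escape the resulting text
        return html_lib.escape(format(value, specifier))

    return template.render(render_field=render_field)


body = "<b>hi</b> & bye"  # imagine this came from an untrusted user

# Hand-written equivalent of: t"<p>{body}</p>"
template = TemplateSketch(
    "<p>{body}</p>",
    (("<p>", "body"), ("</p>", None)),
    (body,),
    ("",),
)

safe = html(template)
print(safe)
```

The literal ``<p>`` and ``</p>`` segments pass through untouched because
they come from the source code, while the interpolated value is escaped.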

Format specifiers
-----------------

Aside from separating them out from the substitution expression during
parsing, format specifiers are otherwise treated as opaque strings by the
interpolation template parser. Assigning semantics to them (or,
alternatively, prohibiting their use) is handled at runtime by the field
renderer.

Error handling
--------------

Either compile time or run time errors can occur when processing
interpolation expressions. Compile time errors are limited to those errors
that can be detected when parsing a template string into its component
tuples. These errors all raise SyntaxError.

Unmatched braces::

    >>> t'x={x'
      File "<stdin>", line 1
        t'x={x'
           ^
    SyntaxError: missing '}' in template literal expression

Invalid expressions::

    >>> t'x={!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a template
string before creating the template literal object. See :pep:`498` for some
examples.

Different renderers may also impose additional runtime constraints on
acceptable interpolated expressions and other formatting details, which will
be reported as runtime exceptions.

.. _501-shlex-module:

Renderer for shell escaping added to shlex
==========================================

As a reference implementation, a renderer for safe POSIX shell escaping can
be added to the :mod:`shlex` module. This renderer would be called ``sh``
and would be equivalent to calling ``shlex.quote`` on each field value in
the template literal.

Thus::

    os.system(shlex.sh(t'cat {myfile}'))

would have the same behavior as::

    os.system('cat ' + shlex.quote(myfile))

The implementation would be::

    def sh(template: TemplateLiteral):
        def render_field(value, format_specifier):
            # render_field receives (value, specifier); quote() takes one argument
            return quote(format(value, format_specifier))
        return template.render(render_field=render_field)

Changes to subprocess module
============================

With the additional renderer in the shlex module, and the addition of
template literals, the :mod:`subprocess` module can be changed to accept
template literals as an additional input type to ``Popen``, as it already
accepts a sequence, or a string, with different behavior for each. With the
addition of template literals, :class:`subprocess.Popen` (and in turn, all
its higher level functions such as :func:`~subprocess.run`) could accept
strings in a safe way.

For example::

    subprocess.run(t'cat {myfile}', shell=True)

would automatically use the ``shlex.sh`` renderer provided in this PEP.
Therefore, using shlex inside a ``subprocess.run`` call like so::

    subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)

would be redundant, as ``run`` would automatically render any template
literals through ``shlex.sh``.

Alternatively, when ``subprocess.Popen`` is run without ``shell=True``, it
could still provide subprocess with a more ergonomic syntax. For example::

    subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to::

    subprocess.run(['cat', myfile, '--flag', value])

or, more accurately::

    subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

It would do this by first using the ``shlex.sh`` renderer, as above, then
using ``shlex.split`` on the result.
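
The quote-then-split equivalence described above can be checked with the
existing standard library alone. The following sketch uses only
``shlex.quote`` and ``shlex.split``; the file name and flag value are made
up for illustration and deliberately contain spaces, quotes and shell
metacharacters.

```python
import shlex

# Hypothetical untrusted inputs containing shell metacharacters
myfile = "my notes; rm -rf ~.txt"
value = "it's $HOME"

# What the proposed renderer would produce for t'cat {myfile} --flag {value}'
command = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"

# Splitting the quoted command recovers each value as a single argument
argv = shlex.split(command)
print(argv)

# Without quoting, the file name is broken into several arguments
naive_argv = shlex.split(f"cat {myfile}")
print(naive_argv)
```

Because each field value survives the round trip as exactly one argument,
the list form and the template literal form would run the same program with
the same arguments.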

The implementation inside ``subprocess.Popen._execute_child`` would look
like::

    if isinstance(args, TemplateLiteral):
        import shlex
        if shell:
            args = [shlex.sh(args)]
        else:
            args = shlex.split(shlex.sh(args))

Possible integration with the logging module
============================================

One of the challenges with the logging module has been that we have
previously been unable to devise a reasonable migration strategy away from
the use of printf-style formatting. The runtime parsing and interpolation
overhead for logging messages also poses a problem for extensive logging of
runtime events for monitoring purposes.

While beyond the scope of this initial PEP, template literal support could
potentially be added to the logging module's event reporting APIs,
permitting relevant details to be captured using forms like::

    logging.debug(t"Event: {event}; Details: {data}")
    logging.critical(t"Error: {error}; Details: {data}")

Rather than the current mod-formatting style::

    logging.debug("Event: %s; Details: %s", event, data)
    logging.critical("Error: %s; Details: %s", error, data)

As the template literal is passed in as an ordinary argument, other keyword
arguments would also remain available::

    logging.critical(t"Error: {error}; Details: {data}", exc_info=True)

As part of any such integration, a recommended approach would need to be
defined for "lazy evaluation" of interpolated fields, as the ``logging``
module's existing delayed interpolation support provides access to various
attributes of the event ``LogRecord`` instance.

For example, since template literal expressions are arbitrary Python
expressions, string literals could be used to indicate cases where
evaluation itself is being deferred, not just rendering::

    logging.debug(t"Logger: {'record.name'}; Event: {event}; Details: {data}")

This could be further extended with idioms like using inline tuples to
indicate deferred function calls to be made only if the log message is
actually going to be rendered at current logging levels::

    logging.debug(t"Event: {event}; Details: {expensive_call, raw_data}")

This kind of approach would be possible as having access to the actual
*text* of the field expression would allow the logging renderer to
distinguish between inline tuples that appear in the field expression
itself, and tuples that happen to be passed in as data values in a normal
field.

Comparison to PEP 675
=====================

This PEP has similar goals to :pep:`675`. While both are attempting to
provide a way to have safer code, they are doing so in different ways.
:pep:`675` provides a way to find potential security issues via static
analysis. It does so by providing a way for the type checker to flag
sections of code that are using dynamic strings incorrectly. This requires a
user to actually run a static analysis type checker such as mypy.

If :pep:`675` tells you that you are violating a type check, it is up to the
programmer to know how to handle the dynamic nature of the string. This PEP
provides a safer alternative to f-strings at runtime. If a user receives a
type error, changing an existing f-string into a t-string could be an easy
way to solve the problem.

t-strings enable safer code by correctly escaping the dynamic sections of
strings, while maintaining the static portions.

This PEP also gives a library or codebase a way to be safe, but it does so
at runtime rather than only during static analysis.
For example, if a library wanted to ensure "only safe strings", it could
check that the type of object passed in at runtime is a template literal::

    def my_safe_function(string_like_object):
        if not isinstance(string_like_object, types.TemplateLiteral):
            raise TypeError("Argument 'string_like_object' must be a t-string")

The two PEPs could also be used together by typing your function as
accepting either a string literal or a template literal. This way, your
function can provide the same API for both static and dynamic strings::

    def my_safe_function(string_like_object: LiteralString | TemplateLiteral):
        ...

Discussion
==========

Refer to :pep:`498` for previous discussion, as several of the points there
also apply to this PEP.

Support for binary interpolation
--------------------------------

As f-strings don't handle byte strings, neither will t-strings.

Interoperability with str-only interfaces
-----------------------------------------

For interoperability with interfaces that only accept strings, interpolation
templates can still be prerendered with ``format``, rather than delegating
the rendering to the called function.

This reflects the key difference from :pep:`498`, which *always* eagerly
applies the default rendering, without any way to delegate the choice of
renderer to another section of the code.

Preserving the raw template string
----------------------------------

Earlier versions of this PEP failed to make the raw template string
available on the template literal. Retaining it makes it possible to provide
a more attractive template representation, as well as providing the ability
to precisely reconstruct the original string, including both the expression
text and the details of any eagerly rendered substitution fields in format
specifiers.

Creating a rich object rather than a global name lookup
-------------------------------------------------------

Earlier versions of this PEP used an ``__interpolate__`` builtin, rather
than creating a new kind of object for later consumption by interpolation
functions. Creating a rich descriptive object with a useful default renderer
made it much easier to support customisation of the semantics of
interpolation.

Building atop f-strings rather than replacing them
--------------------------------------------------

Earlier versions of this PEP attempted to serve as a complete substitute for
:pep:`498` (f-strings). With the acceptance of that PEP and the more recent
:pep:`701`, this PEP can now build a more flexible delayed rendering
capability on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified a
number of aspects of the proposal in this PEP (such as how to handle
substitution fields in format specifiers).

Deferring consideration of possible use in i18n use cases
---------------------------------------------------------

The initial motivating use case for this PEP was providing a cleaner syntax
for i18n translation, as that requires access to the original unmodified
template. As such, it focused on compatibility with the substitution syntax
used in Python's ``string.Template`` formatting and Mozilla's l20n project.

However, subsequent discussion revealed there are significant additional
considerations to be taken into account in the i18n use case, which don't
impact the simpler cases of handling interpolation into security sensitive
contexts (like HTML, system shells, and database queries), or producing
application debugging messages in the preferred language of the development
team (rather than the native language of end users).

Due to the original design of the ``str.format`` substitution syntax in
:pep:`3101` being inspired by C#'s string formatting syntax, the specific
field substitution syntax used in :pep:`498` is consistent not only with
Python's own ``str.format`` syntax, but also with string formatting in C#,
including the native "$-string" interpolation syntax introduced in C# 6.0
(released in July 2015). The related ``IFormattable`` interface in C# forms
the basis of a number of elements of C#'s internationalization and
localization support.

This means that while this particular substitution syntax may not currently
be widely used for translation of *Python* applications (losing out to
traditional %-formatting and the designed-specifically-for-i18n
``string.Template`` formatting), it *is* a popular translation format in the
wider software development ecosystem (since it is already the preferred
format for translating C# applications).

Acknowledgements
================

* Eric V. Smith for creating :pep:`498` and demonstrating the feasibility of
  arbitrary expression substitution in string interpolation
* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
  exploring the feasibility of using this model of delayed rendering in i18n
  use cases (even though the ultimate conclusion was that it was a poor fit,
  at least for current approaches to i18n in Python)

References
==========

* %-formatting
* str.format
* string.Template documentation
* :pep:`215`: String Interpolation
* :pep:`292`: Simpler String Substitutions
* :pep:`3101`: Advanced String Formatting
* :pep:`498`: Literal string formatting
* :pep:`675`: Arbitrary Literal String Type
* :pep:`701`: Syntactic formalization of f-strings
* FormattableString and C# native string interpolation
* IFormattable interface in C# (see remarks for globalization notes)
* Template literals in JavaScript
* Running external commands in Julia

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.