diff --git a/peps/pep-0501.rst b/peps/pep-0501.rst index 453fd7652..7bcc6162c 100644 --- a/peps/pep-0501.rst +++ b/peps/pep-0501.rst @@ -12,6 +12,8 @@ Post-History: `08-Aug-2015 `__, `09-Mar-2023 `__, +.. TODO: Start new PEP 501 d.p.o thread once these updates have been merged + Abstract ======== @@ -21,25 +23,51 @@ shell commands, SQL queries, HTML snippets and similar (for example, ``os.system(f"echo {message_from_user}")``). This PEP introduces template literal strings (or "t-strings"), which have the same syntax and semantics but with rendering deferred -until :func:`format` or another builder function is called on them. +until :func:`format` or another template rendering function is called on them. This will allow standard library calls, helper functions and third party tools to safety and intelligently perform appropriate escaping and other string processing on inputs while retaining the usability and convenience of f-strings. +Relationship with other PEPs +============================ + +This PEP is inspired by and builds on top of the f-string syntax first implemented +in :pep:`498` and formalised in :pep:`701`. + +This PEP complements the literal string typing support added to Python's formal type +system in :pep:`675` by introducing a *safe* way to do dynamic interpolation of runtime +values into security sensitive strings. + +This PEP competes with some aspects of the tagged string proposal in :pep:`750` +(most notably in whether template rendering is expressed as ``render(t"template literal")`` +or as ``render"template literal"``), but also shares *many* common features (after +:pep:`750` was published, this PEP was updated with +`several new changes `__ +inspired by the tagged strings proposal). 
+ +This PEP does NOT propose an alternative to :pep:`292` for user interface +internationalization use cases (but does note the potential for future syntactic +enhancements aimed at that use case that would benefit from the compiler-supported +value interpolation machinery that this PEP and :pep:`750` introduce). + + Motivation ========== + :pep:`498` added new syntactic support for string interpolation that is transparent to the compiler, allowing name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred -to in the PEP as "f-strings" (a mnemonic for "formatted strings"). +to in the PEP (and elsewhere) as "f-strings" (a mnemonic for "formatted strings"). Since acceptance of :pep:`498`, f-strings have become well-established and very popular. -F-strings are becoming even more useful with the addition of :pep:`701`. -While f-strings are great, eager rendering has its limitations. For example, the eagerness of f-strings -has made code like the following likely:: +f-strings became even more useful and flexible with the formalised grammar in :pep:`701`. +While f-strings are great, eager rendering has its limitations. For example, the +eagerness of f-strings has made code like the following unfortunately plausible: + +.. code-block:: python os.system(f"echo {message_from_user}") @@ -49,45 +77,54 @@ untrusted user: it's an opening for a form of code injection attack, where the supplied user data has not been properly escaped before being passed to the ``os.system`` call. +While the ``LiteralString`` type annotation introduced in :pep:`675` means that typecheckers +are able to report a type error for this kind of unsafe function usage, those errors don't +help make it easier to write code that uses safer alternatives (such as +:func:`subprocess.run`). 
+ To address that problem (and a number of other concerns), this PEP proposes the complementary introduction of "t-strings" (a mnemonic for "template literal strings"), -where ``f"Message with {data}"`` would produce the same -result as ``format(t"Message with {data}")``. - - -While this PEP and :pep:`675` are similar in their goals, neither one competes with the other, -and can instead be used together. - -This PEP was previously in deferred status, pending further experience with :pep:`498`'s -simpler approach of only supporting eager rendering without the additional -complexity of also supporting deferred rendering. Since then, f-strings have become very popular -and :pep:`701` has been introduced. This PEP has been updated to reflect current knowledge of f-strings, -and improvements from 701. It is designed to be built on top of the :pep:`701` implementation. +where ``format(t"Message with {data}")`` would produce the same result as +``f"Message with {data}"``, but the interpolation template instance can instead be passed +to other template rendering functions which process the contents of the template +differently. Proposal ======== +Dedicated template literal syntax +--------------------------------- + This PEP proposes a new string prefix that declares the -string to be a template literal rather than an ordinary string:: +string to be a template literal rather than an ordinary string: - template = t"Substitute {names:>10} and {expressions()} at runtime" +.. code-block:: python -This would be effectively interpreted as:: + template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime" - _raw_template = "Substitute {names:>10} and {expressions()} at runtime" - _parsed_template = ( - ("Substitute ", "names"), - (" and ", "expressions()"), - (" at runtime", None), +This would be effectively interpreted as: + +.. 
code-block:: python + + template = TemplateLiteral( + r"Substitute {names:>{field_width}} and {expressions()!r} at runtime", + TemplateLiteralText(r"Substitute "), + TemplateLiteralField("names", names, f">{field_width}", ""), + TemplateLiteralText(r" and "), + TemplateLiteralField("expressions()", expressions(), f"", "r"), ) - _field_values = (names, expressions()) - _format_specifiers = (f">10", f"") - template = types.TemplateLiteral( - _raw_template, _parsed_template, _field_values, _format_specifiers) + +(Note: this is an illustrative example implementation. The exact compile time construction +syntax of ``types.TemplateLiteral`` is considered an implementation detail not specified by +the PEP. In particular, the compiler may bypass the default constructor's runtime logic that +detects consecutive text segments and merges them into a single text segment, as well as +checking the runtime types of all supplied arguments). The ``__format__`` method on ``types.TemplateLiteral`` would then -implement the following :meth:`str.format` inspired semantics:: +implement the following :meth:`str.format` inspired semantics: + +.. code-block:: python-console >>> import datetime >>> name = 'Jane' @@ -98,47 +135,171 @@ implement the following :meth:`str.format` inspired semantics:: >>> format(t'She said her name is {name!r}.') "She said her name is 'Jane'." -The implementation of template literals would be based on :pep:`701`, and use the same syntax. +The syntax of template literals would be based on :pep:`701`, and largely use the same +syntax for the string portion of the template. Aside from using a different prefix, the one +other syntactic change is in the definition and handling of conversion specifiers, both to +allow ``!()`` as a standard conversion specifier to request evaluation of a field at +rendering time, and to allow custom renderers to also define custom conversion specifiers. 
This PEP does not propose to remove or deprecate any of the existing string formatting mechanisms, as those will remain valuable when formatting strings that are not present directly in the source code of the application. + +Lazy field evaluation conversion specifier +------------------------------------------ + +In addition to the existing support for the ``a``, ``r``, and ``s`` conversion specifiers, +:meth:`str.format` and :meth:`str.format_map` will be updated to accept ``()`` as a +conversion specifier that means "call the interpolated value". + +To support application of the standard conversion specifiers in custom template rendering +functions, a new :func:`!operator.convert_field` function will be added. + +The signature and behaviour of the :func:`format` builtin will also be updated to accept a +conversion specifier as a third optional parameter. If a non-empty conversion specifier +is given, the value will be converted with :func:`!operator.convert_field` before looking up +the ``__format__`` method. + + +Custom conversion specifiers +---------------------------- + +To allow additional field-specific directives to be passed to custom rendering functions in +a way that still allows formatting of the template with the default renderer, the conversion +specifier field will be allowed to contain a second ``!`` character. + +:func:`!operator.convert_field` and :func:`format` (and hence the default template rendering +function) will ignore that character and any subsequent text in the conversion specifier +field. + + +Template renderer for POSIX shell commands +------------------------------------------ + +As both a practical demonstration of the benefits of delayed rendering support, and as +a valuable feature in its own right, a new ``sh`` template renderer will be added to +the :mod:`shlex` module. This renderer will produce strings where all interpolated fields +are escaped with :func:`shlex.quote`. 
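The quoting behaviour described for the proposed ``sh`` renderer can be approximated today with a simplified stand-in for the template segments. This is only a sketch under stated assumptions: ``Field`` and ``render_sh`` are hypothetical names invented for illustration (they are not part of this PEP or of :mod:`shlex`), and literal text is modelled as plain ``str`` objects. Only :func:`shlex.quote` and :func:`shlex.split` are existing APIs.

```python
import shlex
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolated template field."""
    value: Any


def render_sh(segments):
    """Keep literal text as written and pass every field value through shlex.quote."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            # Interpolated data is always quoted before it reaches the shell
            parts.append(shlex.quote(str(segment.value)))
        else:
            # Literal text comes from the source code and is trusted as-is
            parts.append(segment)
    return "".join(parts)


message_from_user = "hello; rm -rf ~"
command = render_sh(["echo ", Field(message_from_user)])
print(command)
```

Because only the interpolated value is quoted, the untrusted text ends up as a single shell word, while the literal ``echo`` portion written by the developer is left untouched.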
+ +The :class:`subprocess.Popen` API (and higher level APIs that depend on it, such as +:func:`subprocess.run`) will be updated to accept interpolation templates and handle +them in accordance with the new ``shlex.sh`` renderer. + + +Background +========== + +This PEP was initially proposed as a competitor to :pep:`498`. After it became clear that +the eager rendering proposal had substantially more immediate support, it then spent several +years in a deferred state, pending further experience with :pep:`498`'s simpler approach of +only supporting eager rendering without the additional complexity of also supporting deferred +rendering. + +Since then, f-strings have become very popular and :pep:`701` was introduced to tidy up some +rough edges and limitations in their syntax and semantics. The template literal proposal +was updated in 2023 to reflect current knowledge of f-strings, and improvements from +:pep:`701`. + +In 2024, :pep:`750` was published, proposing a general purpose mechanism for custom tagged +string prefixes, rather than the narrower template literal proposal in this PEP. This PEP +was again updated, both to incorporate new ideas inspired by the tagged strings proposal, +and to describe the perceived benefits of the narrower template literal syntax proposal +in this PEP over the more general tagged string proposal. 
+ + Summary of differences from f-strings ------------------------------------- The key differences between f-strings and t-strings are: * the ``t`` (template literal) prefix indicates delayed rendering, but - otherwise uses the same syntax and semantics as formatted strings + otherwise largely uses the same syntax and semantics as formatted strings * template literals are available at runtime as a new kind of object (``types.TemplateLiteral``) * the default rendering used by formatted strings is invoked on a template literal object by calling ``format(template)`` rather than - implicitly + being done implicitly in the compiled code +* unlike f-strings (where conversion specifiers are handled directly in the compiler), + t-string conversion specifiers are handled at rendering time by the rendering function +* the new ``!()`` conversion specifier indicates that the field expression is a callable + that should be called when using the default :func:`format` rendering function. This specifier + is specifically *not* being added to f-strings (since it is pointless there). * while f-string ``f"Message {here}"`` would be *semantically* equivalent to - ``format(t"Message {here}")``, f-strings will continue to avoid the runtime overhead of using - the delayed rendering machinery that is needed for t-strings + ``format(t"Message {here}")``, f-strings will continue to be supported directly in the + compiler and hence avoid the runtime overhead of actually using the delayed rendering + machinery that is needed for t-strings + + +Summary of differences from tagged strings +------------------------------------------ + +When tagged strings were +`first proposed `__, +there were several notable differences from the proposal in PEP 501 beyond the surface +syntax difference between whether rendering function invocations are written as +``render(t"template literal")`` or as ``render"template literal"``. 
+ +Over the course of the initial PEP 750 discussion, many of those differences were eliminated, +either by PEP 501 adopting that aspect of PEP 750's proposal (such as lazily applying +conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501's proposal +(such as defining a dedicated type to hold template segments rather than representing them +as simple sequences). + +The main remaining significant difference is that this PEP argues that adding *only* the +t-string prefix is a sufficient enhancement to give all the desired benefits described in +PEP 750. The expansion to a generalised "tagged string" syntax isn't necessary, and causes +additional problems that can be avoided. + +The two PEPs also differ in their proposed approaches to handling lazy evaluation of template +fields. + +While there *are* other differences between the two proposals, those differences are more +cosmetic than substantive. In particular: + +* this PEP proposes different names for the structural typing protocols +* this PEP proposes specific names for the concrete implementation types +* this PEP proposes exact details for the proposed APIs of the concrete implementation types + (including concatenation and repetition support, which are not part of the structural + typing protocols) +* this PEP proposes changes to the existing :func:`format` builtin to make it usable directly as + a template field renderer + +The two PEPs also differ in *how* they make their case for delayed rendering support. This +PEP focuses more on the concrete implementation concept of using template literals to allow +the "interpolation" and "rendering" steps in f-string processing to be separated in time, +and then taking advantage of that to reduce the potential code injection risks associated +with misuse of f-strings. PEP 750 focuses more on the way that native templating support +allows behaviours that are difficult or impossible to achieve via existing string based +templating methods. 
As with the cosmetic differences noted above, this is more a difference +in style than a difference in substance. Rationale ========= -F-strings (:pep:`498`) made interpolating values into strings with full access to Python's +f-strings (:pep:`498`) made interpolating values into strings with full access to Python's lexical namespace semantics simpler, but it does so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates will enjoy a much cleaner syntax when handled without regard for code injection attacks than when they are handled correctly. This PEP proposes to provide the option of delaying the actual rendering -of a template literal to its ``__format__`` method, allowing the use of -other template renderers by passing the template around as a first class object. +of a template literal to a formatted string to its ``__format__`` method, allowing the use +of other template renderers by passing the template around as a first class object. While very different in the technical details, the ``types.TemplateLiteral`` interface proposed in this PEP is conceptually quite similar to the ``FormattableString`` type underlying the -`native interpolation `__ support introduced in C# 6.0, -as well as `template literals in Javascript `__ introduced in ES6. +`native interpolation `__ +support introduced in C# 6.0, as well as the +`JavaScript template literals `__ +introduced in ES6. + +While not the original motivation for developing the proposal, many of the benefits for +defining domain specific languages described in :pep:`750` also apply to this PEP +(including the potential for per-DSL semantic highlighting in code editors based on the +type specifications of declared template variables and rendering function parameters). 
Specification @@ -152,168 +313,216 @@ Template literals are Unicode strings (bytes literals are not permitted), and string literal concatenation operates as normal, with the entire combined literal forming the template literal. -The template string is parsed into literals, expressions and format specifiers -as described for f-strings in :pep:`498` and :pep:`701`. Conversion specifiers are handled -by the compiler, and appear as part of the field text in interpolation -templates. +The template string is parsed into literals, expressions, format specifiers, and conversion +specifiers as described for f-strings in :pep:`498` and :pep:`701`. The syntax for conversion +specifiers is relaxed such that arbitrary strings are accepted (excluding those containing +``{``, ``}`` and ``:``) rather than being restricted to valid Python identifiers. However, rather than being rendered directly into a formatted string, these -components are instead organised into an instance of a new type with the -following semantics:: +components are instead organised into instances of new types with the +following behaviour: + +.. 
code-block:: python + + class TemplateLiteralText(str): + # This is a renamed and extended version of the DecodedConcrete type in PEP 750 + # Real type would be implemented in C, this is an API compatible Python equivalent + _raw: str + + def __new__(cls, raw: str): + decoded = raw.encode("utf-8").decode("unicode-escape") + if decoded == raw: + decoded = raw + text = super().__new__(cls, decoded) + text._raw = raw + return text + + @staticmethod + def merge(text_segments:Sequence[TemplateLiteralText]) -> TemplateLiteralText: + if len(text_segments) == 1: + return text_segments[0] + return TemplateLiteralText("".join(t._raw for t in text_segments)) + + @property + def raw(self) -> str: + return self._raw + + def __repr__(self) -> str: + return f"{type(self).__name__}(r{self._raw!r})" + + def __add__(self, other:Any) -> TemplateLiteralText|NotImplemented: + if isinstance(other, TemplateLiteralText): + return TemplateLiteralText(self._raw + other._raw) + return NotImplemented + + + def __mul__(self, other:Any) -> TemplateLiteralText|NotImplemented: + try: + factor = operator.index(other) + except TypeError: + return NotImplemented + return TemplateLiteralText(self._raw * factor) + __rmul__ = __mul__ + + class TemplateLiteralField(NamedTuple): + # This is mostly a renamed version of the InterpolationConcrete type in PEP 750 + # However: + # - value is eagerly evaluated (values were all originally lazy in PEP 750) + # - conversion specifiers are allowed to be arbitrary strings + # - order of fields is adjusted so the text form is the first field and the + # remaining parameters match the updated signature of the `*format` builtin + # Real type would be implemented in C, this is an API compatible Python equivalent + + expr: str + value: Any + format_spec: str | None = None + conversion_spec: str | None = None + + def __repr__(self) -> str: + return (f"{type(self).__name__}({self.expr}, {self.value!r}, " + f"{self.format_spec!r}, {self.conversion_spec!r})") + + def 
__str__(self) -> str: + return format(self.value, self.format_spec, self.conversion_spec) + + def __format__(self, format_override) -> str: + if format_override: + format_spec = format_override + else: + format_spec = self.format_spec + return format(self.value, format_spec, self.conversion_spec) class TemplateLiteral: - __slots__ = ("raw_template", "parsed_template", "field_values", "format_specifiers") + # This type corresponds to the TemplateConcrete type in PEP 750 + # Real type would be implemented in C, this is an API compatible Python equivalent + _raw_template: str + _segments: tuple[TemplateLiteralText|TemplateLiteralField] - def __new__(cls, raw_template, parsed_template, field_values, format_specifiers): + def __new__(cls, raw_template:str, *segments:TemplateLiteralText|TemplateLiteralField): self = super().__new__(cls) - self.raw_template = raw_template - if len(parsed_template) == 0: - raise ValueError("'parsed_template' must contain at least one value") - self.parsed_template = parsed_template - self.field_values = field_values - self.format_specifiers = format_specifiers + self._raw_template = raw_template + # Check if there are any adjacent text segments that need merging + # or any empty text segments that need discarding + type_err = "Template literal segments must be template literal text or field instances" + text_expected = True + needs_merge = False + for segment in segments: + match segment: + case TemplateLiteralText(): + if not text_expected or not segment: + needs_merge = True + break + text_expected = False + case TemplateLiteralField(): + text_expected = True + case _: + raise TypeError(type_err) + if not needs_merge: + # Match loop above will have checked all segments + self._segments = segments + return self + # Merge consecutive runs of text fields and drop any empty text fields + merged_segments:list[TemplateLiteralText|TemplateLiteralField] = [] + pending_merge:list[TemplateLiteralText] = [] + for segment in segments: + match 
segment: + case TemplateLiteralText() as text_segment: + if text_segment: + pending_merge.append(text_segment) + case TemplateLiteralField(): + if pending_merge: + merged_segments.append(TemplateLiteralText.merge(pending_merge)) + pending_merge.clear() + merged_segments.append(segment) + case _: + # First loop above may not check all segments when a merge is needed + raise TypeError(type_err) + if pending_merge: + merged_segments.append(TemplateLiteralText.merge(pending_merge)) + pending_merge.clear() + self._segments = tuple(merged_segments) return self - def __bool__(self): - return bool(self.raw_template) + @property + def raw_template(self) -> str: + return self._raw_template - def __add__(self, other): - if isinstance(other, TemplateLiteral): - if ( - self.parsed_template - and self.parsed_template[-1][1] is None - and other.parsed_template - ): - # merge the last string of self with the first string of other - content = self.parsed_template[-1][0] - new_parsed_template = ( - self.parsed_template[:-1] - + ( - ( - content + other.parsed_template[0][0], - other.parsed_template[0][1], - ), - ) - + other.parsed_template[1:] - ) + @property + def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField]: + return self._segments - else: - new_parsed_template = self.parsed_template + other.parsed_template + def __len__(self) -> int: + return len(self._segments) - return TemplateLiteral( - self.raw_template + other.raw_template, - new_parsed_template, - self.field_values + other.field_values, - self.format_specifiers + other.format_specifiers, - ) - - if isinstance(other, str): - if self.parsed_template and self.parsed_template[-1][1] is None: - # merge string with last value - new_parsed_template = self.parsed_template[:-1] + ( - (self.parsed_template[-1][0] + other, None), - ) - else: - new_parsed_template = self.parsed_template + ((other, None),) - - return TemplateLiteral( - self.raw_template + other, - new_parsed_template, - self.field_values, - 
self.format_specifiers, - ) - else: - raise TypeError( - f"unsupported operand type(s) for +: '{type(self)}' and '{type(other)}'" - ) - - def __radd__(self, other): - if isinstance(other, str): - if self.parsed_template: - new_parsed_template = ( - (other + self.parsed_template[0][0], self.parsed_template[0][1]), - ) + self.parsed_template[1:] - else: - new_parsed_template = ((other, None),) - - return TemplateLiteral( - other + self.raw_template, - new_parsed_template, - self.field_values, - self.format_specifiers, - ) - else: - raise TypeError( - f"unsupported operand type(s) for +: '{type(other)}' and '{type(self)}'" - ) - - def __mul__(self, other): - if isinstance(other, int): - if not self.raw_template or other == 1: - return self - if other < 1: - return TemplateLiteral("", ("", None), (), ()) - parsed_template = self.parsed_template - last_node = parsed_template[-1] - trailing_field = last_node[1] - if trailing_field is not None: - # With a trailing field, everything can just be repeated the requested number of times - new_parsed_template = parsed_template * other - else: - # Without a trailing field, need to amend the parsed template repetitions to merge - # the trailing text from each repetition with the leading text of the next - first_node = parsed_template[0] - merged_node = (last_node[0] + first_node[0], first_node[1]) - repeating_pattern = parsed_template[1:-1] + merged_node - new_parsed_template = ( - parsed_template[:-1] - + (repeating_pattern * (other - 1))[:-1] - + last_node - ) - return TemplateLiteral( - self.raw_template * other, - new_parsed_template, - self.field_values * other, - self.format_specifiers * other, - ) - else: - raise TypeError( - f"unsupported operand type(s) for *: '{type(self)}' and '{type(other)}'" - ) - - def __rmul__(self, other): - if isinstance(other, int): - return self * other - else: - raise TypeError( - f"unsupported operand type(s) for *: '{type(other)}' and '{type(self)}'" - ) + def __iter__(self) -> 
Iterable[TemplateLiteralText|TemplateLiteralField]: + return iter(self._segments) + # Note: template literals do NOT define any relative ordering def __eq__(self, other): if not isinstance(other, TemplateLiteral): - return False + return NotImplemented return ( - self.raw_template == other.raw_template - and self.parsed_template == other.parsed_template + self._raw_template == other._raw_template + and self._segments == other._segments and self.field_values == other.field_values and self.format_specifiers == other.format_specifiers ) - def __repr__(self): - return ( - f"<{type(self).__qualname__} {repr(self.raw_template)} " - f"at {id(self):#x}>" - ) + def __repr__(self) -> str: + return (f"{type(self).__name__}(r{self._raw_template!r}, " + f"{', '.join(map(repr, self._segments))})") - def __format__(self, format_specifier): - # When formatted, render to a string, and use string formatting + def __format__(self, format_specifier) -> str: + # When formatted, render to a string, and then use string formatting return format(self.render(), format_specifier) - def render(self, *, render_template="".join, render_field=format): + def render(self, *, render_template=''.join, render_text=str, render_field=format): ... # See definition of the template rendering semantics below + def __add__(self, other) -> TemplateLiteral|NotImplemented: + if isinstance(other, TemplateLiteral): + combined_raw_text = self._raw_template + other._raw_template + combined_segments = self._segments + other._segments + return TemplateLiteral(combined_raw_text, *combined_segments) + if isinstance(other, str): + # Treat the given string as a new raw text segment + combined_raw_text = self._raw_template + other + combined_segments = self._segments + (TemplateLiteralText(other),) + return TemplateLiteral(combined_raw_text, *combined_segments) + return NotImplemented + + def __radd__(self, other) -> TemplateLiteral|NotImplemented: + if isinstance(other, str): + # Treat the given string as a new raw text segment. 
This will likely never + # run in practice due to https://github.com/python/cpython/issues/55686, + # but it at least makes the *intended* behaviour in this case clear. + combined_raw_text = other + self._raw_template + combined_segments = (TemplateLiteralText(other),) + self._segments + return TemplateLiteral(combined_raw_text, *combined_segments) + return NotImplemented + + def __mul__(self, other) -> TemplateLiteral|NotImplemented: + try: + factor = operator.index(other) + except TypeError: + return NotImplemented + if not self or factor == 1: + return self + if factor < 1: + return TemplateLiteral("") + repeated_text = self._raw_template * factor + repeated_segments = self._segments * factor + return TemplateLiteral(repeated_text, *repeated_segments) + __rmul__ = __mul__ + +(Note: this is an illustrative example implementation, the exact compile time construction +method and internal data management details of ``types.TemplateLiteral`` are considered an +implementation detail not specified by the PEP. However, the expected post-construction +behaviour of the public APIs on ``types.TemplateLiteral`` instances is specified by the +above code, as is the constructor signature for building template instances at runtime) + The result of a template literal expression is an instance of this -type, rather than an already rendered string — rendering only takes +type, rather than an already rendered string. Rendering only takes place when the instance's ``render`` method is called (either directly, or indirectly via ``__format__``). 
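The segment normalisation that the illustrative constructor performs (merging adjacent text segments and dropping empty ones, which is what makes concatenation and repetition work) can be sketched on its own. The following is a simplified approximation, not the PEP's implementation: ``Field`` and ``normalise_segments`` are hypothetical names, and literal text is modelled as plain ``str`` rather than ``TemplateLiteralText``.

```python
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField (expression text plus value)."""
    expr: str
    value: Any


def normalise_segments(segments):
    """Merge runs of adjacent text and drop empty text, keeping fields in order."""
    merged = []
    pending = []  # text segments waiting to be merged into one
    for segment in segments:
        if isinstance(segment, Field):
            if pending:
                merged.append("".join(pending))
                pending.clear()
            merged.append(segment)
        elif isinstance(segment, str):
            if segment:  # empty text segments are discarded
                pending.append(segment)
        else:
            raise TypeError("segments must be text or Field instances")
    if pending:
        merged.append("".join(pending))
    return tuple(merged)


# Concatenating two segment tuples leaves "!" and " Bye " adjacent
left = ("Hello ", Field("name", "Jane"), "!")
right = (" Bye ", Field("name", "Jane"), "")
combined = normalise_segments(left + right)
print(combined)
```

After normalisation the result alternates between text and fields, with the two neighbouring text pieces merged into ``"! Bye "`` and the trailing empty string removed.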
@@ -321,108 +530,317 @@ The compiler will pass the following details to the template literal for later use: * a string containing the raw template as written in the source code -* a parsed template tuple that allows the renderer to render the - template without needing to reparse the raw string template for substitution - fields -* a tuple containing the evaluated field values, in field substitution order -* a tuple containing the field format specifiers, in field substitution order +* a sequence of template segments, with each segment being either: -This structure is designed to take full advantage of compile time constant -folding by ensuring the parsed template is always constant, even when the -field values and format specifiers include variable substitution expressions. + * a literal text segment (a regular Python string that also provides access + to its raw form) + * a parsed template interpolation field, specifying the text of the interpolated + expression (as a regular string), its evaluated result, the format specifier text + (with any substitution fields eagerly evaluated as an f-string), and the conversion + specifier text (as a regular string) The raw template is just the template literal as a string. By default, it is used to provide a human-readable representation for the -template literal. +template literal, but template renderers may also use it for other purposes (e.g. as a +cache lookup key). -The parsed template consists of a tuple of 2-tuples, with each 2-tuple -containing the following fields: +The parsed template structure is taken from :pep:`750` and consists of a sequence of +template segments corresponding to the text segments and interpolation fields in the +template string. -* ``leading_text``: a leading string literal. This will be the empty string if - the current field is at the start of the string, or immediately follows the - preceding field. -* ``field_expr``: the text of the expression element in the substitution field. 
- This will be None for a final trailing text segment. +This approach is designed to allow compilers to fully process each segment of the template +in order, before finally emitting code to pass all of the template segments to the template +literal constructor. -The tuple of evaluated field values holds the *results* of evaluating the -substitution expressions in the scope where the template literal appears. +For example, assuming the following runtime values: -The tuple of field specifiers holds the *results* of evaluating the field -specifiers as f-strings in the scope where the template literal appears. +.. code-block:: python -The ``TemplateLiteral.render`` implementation then defines the rendering + names = ["Alice", "Bob", "Carol", "Eve"] + field_width = 10 + def expressions(): + return 42 + +The template from the proposal section would be represented at runtime as: + +.. code-block:: python + + TemplateLiteral( + r"Substitute {names:>{field_width}} and {expressions()!r} at runtime", + TemplateLiteralText(r"Substitute "), + TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""), + TemplateLiteralText(r" and "), + TemplateLiteralField("expressions()", 42, "", "r"), + ) + + +Rendering templates +------------------- + +The ``TemplateLiteral.render`` implementation defines the rendering process in terms of the following renderers: * an overall ``render_template`` operation that defines how the sequence of - literal template sections and rendered fields are composed into a fully - rendered result. The default template renderer is string concatenation - using ``''.join``. -* a per field ``render_field`` operation that receives the field value and - format specifier for substitution fields within the template. The default - field renderer is the ``format`` builtin. + rendered text and field segments are composed into a fully rendered result. + The default template renderer is string concatenation using ``''.join``. 
+* a per text segment ``render_text`` operation that receives the individual literal + text segments within the template. The default text renderer is the builtin ``str`` + constructor. +* a per field segment ``render_field`` operation that receives the field value, format + specifier, and conversion specifier for substitution fields within the template. The + default field renderer is the :func:`format` builtin. -Given an appropriate parsed template representation and internal methods of -iterating over it, the semantics of template rendering would then be equivalent -to the following:: +Given the parsed template representation above, the semantics of template rendering would +then be equivalent to the following: + +.. code-block:: python + + def render(self, *, render_template=''.join, render_text=str, render_field=format): + rendered_segments = [] + for segment in self._segments: + match segment: + case TemplateLiteralText() as text_segment: + rendered_segments.append(render_text(text_segment)) + case TemplateLiteralField() as field_segment: + rendered_segments.append(render_field(*field_segment[1:])) + return render_template(rendered_segments) + + +Format specifiers +----------------- + +The syntax and processing of field specifiers in t-strings is defined to be the same as it +is for f-strings. + +This includes allowing field specifiers to themselves contain f-string substitution fields. +The raw text of the field specifiers (without processing any substitution fields) is +retained as part of the full raw template string. + +The parsed field specifiers receive the field specifier string with those substitutions +already resolved. The ``:`` prefix is also omitted. 
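As a rough illustration that uses only current Python (f-strings and the two-argument ``format`` builtin, not the proposed t-string machinery), the sketch below shows how a substitution field nested inside a format specifier is resolved before any field renderer sees it. The values mirror the earlier ``names``/``field_width`` example; ``resolved_spec`` and ``rendered`` are names introduced here for illustration only.

```python
field_width = 10
name = "Alice"

# In the template source the specifier is written as ">{field_width}".
# Evaluating it eagerly as an f-string yields the text that a field
# renderer would receive (with the leading ":" already omitted).
resolved_spec = f">{field_width}"

# The default field renderer (the format builtin) applies the resolved text.
rendered = format(name, resolved_spec)

# Same result as the equivalent eagerly rendered f-string.
assert rendered == f"{name:>{field_width}}"
print(repr(resolved_spec), repr(rendered))
```

The raw specifier text (``>{field_width}``) would remain available only through the raw template string, while the parsed field carries the resolved ``>10``.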
+ +Aside from separating them out from the substitution expression during parsing, +format specifiers are otherwise treated as opaque strings by the interpolation +template parser - assigning semantics to those (or, alternatively, +prohibiting their use) is handled at rendering time by the field renderer. - def render(self, *, render_template=''.join, - render_field=format): - iter_fields = enumerate(self.parsed_template) - values = self.field_values - specifiers = self.format_specifiers - template_parts = [] - for field_pos, (leading_text, field_expr) in iter_fields: - template_parts.append(leading_text) - if field_expr is not None: - value = values[field_pos] - specifier = specifiers[field_pos] - rendered_field = render_field(value, specifier) - template_parts.append(rendered_field) - return render_template(template_parts) Conversion specifiers --------------------- -The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by ``str.format`` -and hence :pep:`498` are handled in template literals as follows: -* they're included unmodified in the raw template to ensure no information is - lost -* they're *replaced* in the parsed template with the corresponding builtin - calls, in order to ensure that ``field_expr`` always contains a valid - Python expression -* the corresponding field value placed in the field values tuple is - converted appropriately *before* being passed to the template literal +In addition to the existing support for ``a``, ``r``, and ``s`` conversion specifiers, +:meth:`str.format` and :meth:`str.format_map` will be updated to accept ``()`` as a +conversion specifier that means "call the interpolated value". + +Where :pep:`701` restricts conversion specifiers to ``NAME`` tokens, this PEP will instead +allow ``FSTRING_MIDDLE`` tokens (such that only ``{``, ``}`` and ``:`` are disallowed). 
This +change is made primarily to support lazy field rendering with the ``!()`` conversion +specifier, but also allows custom rendering functions more flexibility when defining their +own conversion specifiers in preference to those defined for the default :func:`format` field +renderer. + +Conversion specifiers are still handled as plain strings, and do NOT support the use +of substitution fields. + +The parsed conversion specifiers receive the conversion specifier string with the +``!`` prefix omitted. + +To allow custom template renderers to define their own custom conversion specifiers without +causing the default renderer to fail, conversion specifiers will be permitted to contain a +custom suffix prefixed with a second ``!`` character. That is, ``!!``, +``!a!``, ``!r!``, ``!s!``, and ``!()!`` would all be +valid conversion specifiers in a template literal. + +As described above, the default rendering supports the original ``!a``, ``!r`` and ``!s`` +conversion specifiers defined in :pep:`3101`, together with the new ``!()`` lazy field +evaluation conversion specifier defined in this PEP. The default rendering ignores any +custom conversion specifier suffixes. + +The full mapping between the standard conversion specifiers and the special methods called +on the interpolated value when the field is rendered: + +* No conversion (empty string): ``__format__`` (with format specifier as parameter) +* ``a``: ``__repr__`` (as per the :func:`ascii` builtin) +* ``r``: ``__repr__`` (as per the :func:`repr` builtin) +* ``s``: ``__str__`` (as per the ``str`` builtin) +* ``()``: ``__call__`` (with no parameters) + +When a conversion occurs, ``__format__`` (with the format specifier) is called on the result +of the conversion rather than being called on the original object. + +The changes to :func:`format` and the addition of :func:`!operator.convert_field` make it +straightforward for custom renderers to also support the standard conversion specifiers. 
+ +f-strings themselves will NOT support the new ``!()`` conversion specifier (as it is +redundant when value interpolation and value rendering always occur at the same time). They +also will NOT support the use of custom conversion specifiers (since the rendering function +is known at compile time and doesn't make use of the custom specifiers). + + +New field conversion API in the :mod:`operator` module +------------------------------------------------------ + +To support application of the standard conversion specifiers in custom template rendering +functions, a new :func:`!operator.convert_field` function will be added: + +.. code-block:: python + + def convert_field(value, conversion_spec=''): + """Apply the given string formatting conversion specifier to the given value""" + std_spec, sep, custom_spec = conversion_spec.partition("!") + match std_spec: + case '': + return value + case 'a': + return ascii(value) + case 'r': + return repr(value) + case 's': + return str(value) + case '()': + return value() + if not sep: + err = f"Invalid conversion specifier {std_spec!r}" + else: + err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}" + raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'") + + +Conversion specifier parameter added to :func:`format` +------------------------------------------------------ + +The signature and behaviour of the :func:`format` builtin will be updated: + +.. code-block:: python + + def format(value, format_spec='', conversion_spec=''): + if conversion_spec: + value_to_format = operator.convert_field(value, conversion_spec) + else: + value_to_format = value + return type(value_to_format).__format__(value_to_format, format_spec) + +If a non-empty conversion specifier is given, the value will be converted with +:func:`!operator.convert_field` before looking up the ``__format__`` method. + +The signature of the ``__format__`` special method does NOT change (only format specifiers +are handled by the object being formatted).
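The intended interaction between conversion and formatting can be tried out in current Python with a small stand-in. This is only a sketch under stated assumptions: neither ``operator.convert_field`` nor a three-argument ``format`` exists today, so the local names ``convert_field`` and ``format_field`` below are hypothetical substitutes, and ``if`` chains are used instead of ``match`` purely to keep the example runnable on older interpreters.

```python
def convert_field(value, conversion_spec=""):
    """Apply a standard conversion specifier, ignoring any '!custom' suffix."""
    std_spec, _sep, _custom = conversion_spec.partition("!")
    if std_spec == "":
        return value
    if std_spec == "a":
        return ascii(value)
    if std_spec == "r":
        return repr(value)
    if std_spec == "s":
        return str(value)
    if std_spec == "()":
        return value()  # lazy field: the call happens at rendering time
    raise ValueError(f"Invalid conversion specifier {std_spec!r}")


def format_field(value, format_spec="", conversion_spec=""):
    """Hypothetical stand-in for the proposed three-argument format()."""
    converted = convert_field(value, conversion_spec)
    # __format__ is applied to the converted result, not the original value.
    return format(converted, format_spec)


quoted = format_field("hi", ">6", "r")            # repr() first, then padded
lazy = format_field(lambda: 42, "03d", "()")      # called, then formatted
custom = format_field(3.14159, ".2f", "!custom")  # custom suffix is ignored
print(repr(quoted), lazy, custom)
```

Note that the format specifier is always applied to the *converted* value, which is why ``">6"`` pads the four-character ``repr`` text rather than the original two-character string.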
+ + +Structural typing and duck typing +--------------------------------- + +To allow custom renderers to accept alternative interpolation template implementations +(rather than being tightly coupled to the native interpolation template types), the +following structural protocols will be added to the ``typing`` module: + +.. code-block:: python + + @runtime_checkable + class TemplateText(Protocol): + # Renamed version of PEP 750's Decoded protocol + def __str__(self) -> str: + ... + + raw: str + + @runtime_checkable + class TemplateField(Protocol): + # Renamed and modified version of PEP 750's Interpolation protocol + def __len__(self): + ... + + def __getitem__(self, index: int): + ... + + def __str__(self) -> str: + ... + + expr: str + value: Any + format_spec: str | None = None + conversion_spec: str | None = None + + @runtime_checkable + class InterpolationTemplate(Protocol): + # Corresponds to PEP 750's Template protocol + def __iter__(self) -> Iterable[TemplateText|TemplateField]: + ... + + raw_template: str + +Note that the structural protocol APIs are substantially narrower than the full +implementation APIs defined for ``TemplateLiteralText``, ``TemplateLiteralField``, +and ``TemplateLiteral``. + +Code that wants to accept interpolation templates and define specific handling for them +without introducing a dependency on the ``typing`` module, or restricting the code to +handling the concrete template literal types, should instead perform an attribute +existence check on ``raw_template``. -This means that, for most purposes, the difference between the use of -conversion specifiers and calling the corresponding builtins in the -original template literal will be transparent to custom renderers. The -difference will only be apparent if reparsing the raw template, or attempting -to reconstruct the original template from the parsed template. Writing custom renderers ------------------------ Writing a custom renderer doesn't require any special syntax. 
Instead, custom renderers are ordinary callables that process an interpolation -template directly either by calling the ``render()`` method with alternate ``render_template`` or ``render_field`` implementations, or by accessing the -template's data attributes directly. +template directly either by calling the ``render()`` method with alternate +``render_template``, ``render_text``, and/or ``render_field`` implementations, or by +accessing the template's data attributes directly. For example, the following function would render a template using objects' -``repr`` implementations rather than their native formatting support:: +``repr`` implementations rather than their native formatting support: - def reprformat(template): - def render_field(value, specifier): - return format(repr(value), specifier) +.. code-block:: python + + def repr_format(template): + def render_field(value, format_spec, conversion_spec): + converted_value = operator.convert_field(value, conversion_spec) + return format(repr(converted_value), format_spec) + return template.render(render_field=render_field) + +The custom renderer shown respects the conversion specifiers in the original template, but +it is also possible to ignore them and render the interpolated values directly: + +.. code-block:: python + + def input_repr_format(template): + def render_field(value, format_spec, __): + return format(repr(value), format_spec) return template.render(render_field=render_field) When writing custom renderers, note that the return type of the overall -rendering operation is determined by the return type of the passed in ``render_template`` callable. While this is expected to be a string in most -cases, producing non-string objects *is* permitted. For example, a custom -template renderer could involve an ``sqlalchemy.sql.text`` call that produces -an `SQL Alchemy query object `__. +rendering operation is determined by the return type of the passed in ``render_template`` +callable.
While this will still be a string for formatting related use cases, producing +non-string objects *is* permitted. For example, a custom SQL +template renderer could involve an ``sqlalchemy.sql.text`` call that produces an +`SQL Alchemy query object `__. +A subprocess invocation related template renderer could produce a string sequence suitable +for passing to ``subprocess.run``, or it could even call ``subprocess.run`` directly, and +return the result. + +Non-strings may also be returned from ``render_text`` and ``render_field``, as long as +they are paired with a ``render_template`` implementation that expects that behaviour. + +Custom renderers using the pattern matching style described in :pep:`750` are also supported: + +.. code-block:: python + + # Use the structural typing protocols rather than the concrete implementation types + from typing import InterpolationTemplate, TemplateText, TemplateField + + def greet(template: InterpolationTemplate) -> str: + """Render an interpolation template using structural pattern matching.""" + result = [] + for segment in template: + match segment: + case TemplateText() as text_segment: + result.append(text_segment) + case TemplateField() as field_segment: + result.append(str(field_segment).upper()) + return f"{''.join(result)}!" -Non-strings may also be returned from ``render_field``, as long as it is paired -with a ``render_template`` implementation that expects that behaviour. Expression evaluation --------------------- @@ -436,7 +854,10 @@ function and method calls. Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have also just written the -same expression and used runtime field parsing:: +same expression and used runtime field parsing: + +..
code-block:: python-console + >>> bar=10 >>> def foo(data): @@ -445,16 +866,21 @@ same expression and used runtime field parsing:: >>> str(t'input={bar}, output={foo(bar)}') 'input=10, output=30' -Is essentially equivalent to:: +Is essentially equivalent to: + +.. code-block:: python-console >>> 'input={}, output={}'.format(bar, foo(bar)) 'input=10, output=30' + Handling code injection attacks ------------------------------- The :pep:`498` formatted string syntax makes it potentially attractive to write -code like the following:: +code like the following: + +.. code-block:: python runquery(f"SELECT {column} FROM {table};") runcommand(f"cat {filename}") @@ -464,34 +890,38 @@ These all represent potential vectors for code injection attacks, if any of the variables being interpolated happen to come from an untrusted source. The specific proposal in this PEP is designed to make it straightforward to write use case specific renderers that take care of quoting interpolated values -appropriately for the relevant security context:: +appropriately for the relevant security context: + +.. code-block:: python runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};")) runcommand(sh(t"cat {filename}")) return_response(html(t"{response.body}")) This PEP does not cover adding all such renderers to the standard library -immediately (though one for shell escaping is proposed), but rather proposes to ensure that they can be readily provided by -third party libraries, and potentially incorporated into the standard library -at a later date. +immediately (though one for shell escaping is proposed), but rather proposes to ensure +that they can be readily provided by third party libraries, and potentially incorporated +into the standard library at a later date. 
-It is proposed that a renderer is included in the :mod:`shlex` module, aimed to offer a POSIX shell style experience for -accessing external programs, without the significant risks posed by running -``os.system`` or enabling the system shell when using the ``subprocess`` module -APIs, which will provide an interface for running external programs inspired by that -offered by the +Over time, it is expected that APIs processing potentially dangerous string inputs may be +updated to accept interpolation templates natively, allowing problematic code examples to +be fixed simply by replacing the ``f`` string prefix with a ``t``: + +.. code-block:: python + + runquery(t"SELECT {column} FROM {table};") + runcommand(t"cat {filename}") + return_response(t"{response.body}") + +It is proposed that a renderer is included in the :mod:`shlex` module, aiming to offer a +more POSIX shell style experience for accessing external programs, without the significant +risks posed by running ``os.system`` or enabling the system shell when using the +``subprocess`` module APIs. This renderer will provide an interface for running external +programs inspired by that offered by the `Julia programming language `__, -only with the backtick based ``\`cat $filename\``` syntax replaced by -``t"cat {filename}"`` style template literals. -See more in the :ref:`501-shlex-module` section. +only with the backtick based ``\`cat $filename\``` syntax replaced by ``t"cat {filename}"`` +style template literals. See more in the :ref:`pep-501-shlex-module` section. -Format specifiers ------------------ - -Aside from separating them out from the substitution expression during parsing, -format specifiers are otherwise treated as opaque strings by the interpolation -template parser - assigning semantics to those (or, alternatively, -prohibiting their use) is handled at runtime by the field renderer. 
Error handling -------------- @@ -525,176 +955,171 @@ Different renderers may also impose additional runtime constraints on acceptable interpolated expressions and other formatting details, which will be reported as runtime exceptions. -.. _501-shlex-module: -Renderer for shell escaping added to shlex -========================================== +.. _pep-501-shlex-module: -As a reference implementation, a renderer for safe POSIX shell escaping can be added to the :mod:`shlex` -module. This renderer would be called ``sh`` and would be equivalent to calling ``shlex.quote`` on -each field value in the template literal. +Renderer for shell escaping added to :mod:`shlex` +------------------------------------------------- -Thus:: +As a reference implementation, a renderer for safe POSIX shell escaping can be added to +the :mod:`shlex` module. This renderer would be called ``sh`` and would be equivalent to +calling ``shlex.quote`` on each field value in the template literal. + +Thus: + +.. code-block:: python os.system(shlex.sh(t'cat {myfile}')) -would have the same behavior as:: +would have the same behavior as: + +.. code-block:: python os.system('cat ' + shlex.quote(myfile)) -The implementation would be:: +The implementation would be: + +.. code-block:: python def sh(template: TemplateLiteral): - return template.render(render_field=quote) + def render_field(value, format_spec, conversion_spec): + field_text = format(value, format_spec, conversion_spec) + return quote(field_text) + return template.render(render_field=render_field) + +The addition of ``shlex.sh`` will NOT change the existing admonishments in the +:mod:`subprocess` documentation that passing ``shell=True`` is best avoided, nor the +reference from the :func:`os.system` documentation to the higher level ``subprocess`` APIs.
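The quoting behaviour described for ``shlex.sh`` can be approximated today without t-string support. The following is a minimal sketch, not the proposed implementation: ``sh_sketch`` and its ``segments`` argument are hypothetical stand-ins for a template literal, in which plain strings play the role of literal text and ``(value, format_spec)`` tuples play the role of interpolation fields. Only the existing ``shlex.quote`` and ``shlex.split`` functions are relied upon.

```python
import shlex


def sh_sketch(segments):
    """Quote every field value; pass literal template text through untouched."""
    parts = []
    for segment in segments:
        if isinstance(segment, str):
            # Literal text written by the template author is trusted as-is.
            parts.append(segment)
        else:
            # Field values are formatted first, then shell-quoted.
            value, format_spec = segment
            parts.append(shlex.quote(format(value, format_spec)))
    return "".join(parts)


# Stand-in for t'cat {myfile}' with a hostile filename.
myfile = "notes.txt; rm -rf ~"
command = sh_sketch(["cat ", (myfile, "")])
print(command)

# Splitting the quoted command recovers the filename as a single argument.
print(shlex.split(command))
```

Because the whole field value ends up inside one quoted shell word, the ``;`` in the hostile filename is never interpreted as a command separator.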
Changes to subprocess module -============================ +---------------------------- With the additional renderer in the shlex module, and the addition of template literals, the :mod:`subprocess` module can be changed to handle accepting template literals as an additional input type to ``Popen``, as it already accepts a sequence, or a string, with different behavior for each. -With the addition of template literals, :class:`subprocess.Popen` (and in return, all its higher level functions such as :func:`~subprocess.run`) -could accept strings in a safe way. -For example:: + +With the addition of template literals, :class:`subprocess.Popen` (and in turn, all its +higher level functions such as :func:`subprocess.run`) could accept strings in a safe way +(at least on :ref:`POSIX systems `). + +For example: + +.. code-block:: python subprocess.run(t'cat {myfile}', shell=True) -would automatically use the ``shlex.sh`` renderer provided in this PEP. Therefore, using shlex -inside a ``subprocess.run`` call like so:: +would automatically use the ``shlex.sh`` renderer provided in this PEP. Therefore, using +``shlex`` inside a ``subprocess.run`` call like so: + +.. code-block:: python subprocess.run(shlex.sh(t'cat {myfile}'), shell=True) -would be redundant, as ``run`` would automatically render any template literals through ``shlex.sh`` +would be redundant, as ``run`` would automatically render any template literals +through ``shlex.sh``. -Alternatively, when ``subprocess.Popen`` is run without ``shell=True``, it could still provide -subprocess with a more ergonomic syntax. For example:: +Alternatively, when ``subprocess.Popen`` is run without ``shell=True``, it could still +provide subprocess with a more ergonomic syntax. For example: + +.. code-block:: python subprocess.run(t'cat {myfile} --flag {value}') -would be equivalent to:: +would be equivalent to: + +..
code-block:: python subprocess.run(['cat', myfile, '--flag', value]) -or, more accurately:: +or, more accurately: + +.. code-block:: python subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}')) -It would do this by first using the ``shlex.sh`` renderer, as above, then using ``shlex.split`` on the result. +It would do this by first using the ``shlex.sh`` renderer, as above, then using +``shlex.split`` on the result. -The implementation inside ``subprocess.Popen._execute_child`` would look like:: +The implementation inside ``subprocess.Popen._execute_child`` would look like: - if isinstance(args, TemplateLiteral): - import shlex - if shell: - args = [shlex.sh(args)] - else: - args = shlex.split(shlex.sh(args)) +.. code-block:: python + + if hasattr(args, "raw_template"): + import shlex + if shell: + args = [shlex.sh(args)] + else: + args = shlex.split(shlex.sh(args)) -Possible integration with the logging module -============================================ +How to Teach This +================= -One of the challenges with the logging module has been that we have previously -been unable to devise a reasonable migration strategy away from the use of -printf-style formatting. The runtime parsing and interpolation overhead for -logging messages also poses a problem for extensive logging of runtime events -for monitoring purposes. +This PEP intentionally includes two standard renderers that will always be available in +teaching environments: the :func:`format` builtin and the new ``shlex.sh`` POSIX shell +renderer. -While beyond the scope of this initial PEP, template literal support -could potentially be added to the logging module's event reporting APIs, -permitting relevant details to be captured using forms like:: +Together, these two renderers can be used to build an initial understanding of delayed +rendering on top of a student's initial introduction to string formatting with f-strings. 
+This initial understanding would have the goal of allowing students to *use* template +literals effectively, in combination with pre-existing template rendering functions. - logging.debug(t"Event: {event}; Details: {data}") - logging.critical(t"Error: {error}; Details: {data}") +For example, ``f"{'some text'}"``, ``f"{value}"``, ``f"{value!r}"``, ``f"{callable()}"`` +could all be introduced. -Rather than the current mod-formatting style:: +Those same operations could then be rewritten as ``format(t"{'some text'}")``, +``format(t"{value}")``, ``format(t"{value!r}")``, ``format(t"{callable()}")`` to +illustrate the relationship between the eager rendering form and the delayed rendering +form. - logging.debug("Event: %s; Details: %s", event, data) - logging.critical("Error: %s; Details: %s", event, data) +The difference between "template definition time" (or "interpolation time") and +"template rendering time" can then be investigated further by storing the template literals +as local variables and looking at their representations separately from the results of the +``format`` calls. At this point, the ``t"{callable!()}"`` syntax can be introduced to +distinguish between field expressions that are called at template definition time and those +that are called at template rendering time. -As the template literal is passed in as an ordinary argument, other -keyword arguments would also remain available:: +Finally, the differences between the results of ``f"{'some text'}"``, +``format(t"{'some text'}")``, and ``shlex.sh(t"{'some text'}")`` could be explored to +illustrate the potential for differences between the default rendering function and custom +rendering functions.
- logging.critical(t"Error: {error}; Details: {data}", exc_info=True) +Actually defining your own custom template rendering functions would then be a separate more +advanced topic (similar to the way students are routinely taught to use decorators and +context managers well before they learn how to write their own custom ones). -As part of any such integration, a recommended approach would need to be -defined for "lazy evaluation" of interpolated fields, as the ``logging`` -module's existing delayed interpolation support provides access to -:ref:`various attributes ` of the event ``LogRecord`` instance. - -For example, since template literal expressions are arbitrary Python expressions, -string literals could be used to indicate cases where evaluation itself is -being deferred, not just rendering:: - - logging.debug(t"Logger: {'record.name'}; Event: {event}; Details: {data}") - -This could be further extended with idioms like using inline tuples to indicate -deferred function calls to be made only if the log message is actually -going to be rendered at current logging levels:: - - logging.debug(t"Event: {event}; Details: {expensive_call, raw_data}") - -This kind of approach would be possible as having access to the actual *text* -of the field expression would allow the logging renderer to distinguish -between inline tuples that appear in the field expression itself, and tuples -that happen to be passed in as data values in a normal field. - - -Comparison to PEP 675 -===================== - -This PEP has similar goals to :pep:`675`. -While both are attempting to provide a way to have safer code, they are doing so in different ways. -:pep:`675` provides a way to find potential security issues via static analysis. -It does so by providing a way for the type checker to flag sections of code that are using -dynamic strings incorrectly. This requires a user to actually run a static analysis type checker such as mypy. 
- -If :pep:`675` tells you that you are violating a type check, it is up to the programmer to know how to handle the dynamic-ness of the string. -This PEP provides a safer alternative to f-strings at runtime. -If a user receives a type-error, changing an existing f-string into a t-string could be an easy way to solve the problem. - -t-strings enable safer code by correctly escaping the dynamic sections of strings, while maintaining the static portions. - -This PEP also allows a way for a library/codebase to be safe, but it does so at runtime rather than -only during static analysis. For example, if a library wanted to ensure "only safe strings", it -could check that the type of object passed in at runtime is a template literal:: - - def my_safe_function(string_like_object): - if not isinstance(string_like_object, types.TemplateLiteral): - raise TypeError("Argument 'string_like_object' must be a t-string") - -The two PEPs could also be used together by typing your function as accepting either a string literal or a template literal. -This way, your function can provide the same API for both static and dynamic strings:: - - def my_safe_function(string_like_object: LiteralString | TemplateLiteral): - ... +:pep:`750` includes further ideas for teaching aspects of the delayed rendering topic. Discussion ========== Refer to :pep:`498` for previous discussion, as several of the points there -also apply to this PEP. +also apply to this PEP. :pep:`750`'s design discussions are also highly relevant, +as that PEP inspired several aspects of the current design. + Support for binary interpolation -------------------------------- As f-strings don't handle byte strings, neither will t-strings. 
+ Interoperability with str-only interfaces ----------------------------------------- For interoperability with interfaces that only accept strings, interpolation -templates can still be prerendered with ``format``, rather than delegating the +templates can still be prerendered with :func:`format`, rather than delegating the rendering to the called function. This reflects the key difference from :pep:`498`, which *always* eagerly applies the default rendering, without any way to delegate the choice of renderer to another section of the code. + Preserving the raw template string ---------------------------------- @@ -704,33 +1129,195 @@ attractive template representation, as well as providing the ability to precisely reconstruct the original string, including both the expression text and the details of any eagerly rendered substitution fields in format specifiers. + Creating a rich object rather than a global name lookup ------------------------------------------------------- Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than -a creating a new kind of object for later consumption by interpolation +creating a new kind of object for later consumption by interpolation functions. Creating a rich descriptive object with a useful default renderer made it much easier to support customisation of the semantics of interpolation. + Building atop f-strings rather than replacing them -------------------------------------------------- Earlier versions of this PEP attempted to serve as a complete substitute for :pep:`498` (f-strings) . With the acceptance of that PEP and the more recent :pep:`701`, -this PEP can now build a more flexible delayed rendering capability +this PEP can instead build a more flexible delayed rendering capability on top of the existing f-string eager rendering. 
Assuming the presence of f-strings as a supporting capability simplified a number of aspects of the proposal in this PEP (such as how to handle substitution -fields in format specifiers) +fields in format specifiers). + + +Defining repetition and concatenation semantics +----------------------------------------------- + +This PEP explicitly defines repetition and concatenation semantics for ``TemplateLiteral`` +and ``TemplateLiteralText``. While not strictly necessary, defining these is expected +to make the types easier to work with in code that historically only supported regular +strings. + + +New conversion specifier for lazy field evaluation +-------------------------------------------------- + +The initially published version of :pep:`750` defaulted to lazy evaluation for all +interpolation fields. While it was subsequently updated to default to eager evaluation +(as happens for f-strings and this PEP), the discussions around the topic prompted the idea +of providing a way to indicate to rendering functions that the interpolated field value +should be called at rendering time rather than being used without modification. + +Since PEP 750 also deferred the processing of conversion specifiers until evaluation time, +the suggestion was put forward that invoking ``__call__`` without arguments could be seen +as similar to the existing conversion specifiers that invoke ``__repr__`` (``!a``, ``!r``) +or ``__str__`` (``!s``). + +Accordingly, this PEP was updated to also make conversion specifier processing the +responsibility of rendering functions, and to introduce ``!()`` as a new conversion +specifier for lazy evaluation. + +Adding :func:`!operator.convert_field` and updating the :func:`format` builtin was then +a matter of providing appropriate support to rendering function implementations that +wanted to accept the default conversion specifiers.
+ + +Allowing arbitrary conversion specifiers in custom renderers +------------------------------------------------------------ + +Accepting ``!()`` as a new conversion specifier necessarily requires updating the syntax +that the parser accepts for conversion specifiers (they are currently restricted to +identifiers). This then raised the question of whether t-string compilation should enforce +the additional restriction that f-string compilation imposes: that the conversion specifier +be exactly one of ``!a``, ``!r``, or ``!s``. + +With t-strings already being updated to allow ``!()`` when compiled, it made sense to treat +conversion specifiers as relating to rendering functions in much the same way that format +specifiers relate to the formatting of individual objects: aside from some characters that +are excluded for parsing reasons, they are otherwise free text fields with the meaning +decided by the consuming function or object. This reduces the temptation to introduce +renderer specific metaformatting into the template's format specifiers (since any +renderer specific information can be placed in the conversion specifier instead). + + +Only reserving a single new string prefix +----------------------------------------- + +The primary difference between this PEP and :pep:`750` is that the latter aims to enable +the use of arbitrary string prefixes, rather than requiring the creation of template +literal instances that are then passed to other APIs. For example, PEP 750 would allow +the ``sh`` renderer described in this PEP to be used as ``sh"cat {somefile}"`` rather than +requiring the template literal to be created explicitly and then passed to a regular +function call (as in ``sh(t"cat {somefile}")``).
+ +The main reason the PEP authors prefer the second spelling is because it makes it clearer +to a reader what is going on: a template literal instance is being created, and then +passed to a callable that knows how to do something useful with interpolation template +instances. + +A `draft proposal `__ +from one of the :pep:`750` authors also suggests that static typecheckers will be able +to infer the use of particular domain specific languages just as readily from the form +that uses an explicit function call as they would be able to infer it from a directly +tagged string. + +With the tagged string syntax at least arguably reducing clarity for human readers without +increasing the overall expressiveness of the construct, it seems reasonable to start with +the smallest viable proposal (a single new string prefix), and then revisit the potential +value of generalising to arbitrary prefixes in the future. + +As a lesser, but still genuine, consideration, only using a single new string prefix for +this use case leaves open the possibility of defining alternate prefixes in the future that +still produce ``TemplateLiteral`` objects, but use a different syntax within the string to +define the interpolation fields (see the :ref:`i18n discussion ` below). + + +Deferring consideration of more concise delayed evaluation syntax +----------------------------------------------------------------- + +During the discussions of delayed evaluation, ``{-> expr}`` was +`suggested `__ +as potential syntactic sugar for the already supported ``lambda`` based syntax: +``{(lambda: expr)}`` (the parentheses are required in the existing syntax to avoid +misinterpretation of the ``:`` character as indicating the start of the format specifier). 
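The parenthesisation requirement mentioned above can be observed with existing f-strings, whose field syntax t-strings share. The following is a small sketch; the helper name ``compiles`` is purely illustrative and not part of any proposal.

```python
def compiles(source):
    """Return True if ``source`` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Parenthesised lambda: the ":" belongs to the lambda expression.
assert compiles('f"{(lambda: 42)()}"')
assert eval('f"{(lambda: 42)()}"') == "42"

# Bare lambda: the ":" is taken as the start of the format specifier,
# so the field is rejected at compile time.
assert not compiles('f"{lambda: 42}"')
```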
+ +While adding such a spelling would complement the rendering time function call syntax +proposed in this PEP (that is, writing ``{-> expr!()}`` to evaluate arbitrary expressions +at rendering time), it is a topic that the PEP authors consider to be better left to a +future PEP if this PEP or :pep:`750` is accepted. + + +Deferring consideration of possible logging integration +------------------------------------------------------- + +One of the challenges with the logging module has been that we have previously +been unable to devise a reasonable migration strategy away from the use of +printf-style formatting. While the logging module does allow formatters to specify the +use of :meth:`str.format` or :class:`string.Template` style substitution, it can be awkward +to ensure that messages written that way are only ever processed by log record formatters +that are expecting that syntax. + +The runtime parsing and interpolation overhead for logging messages also poses a problem +for extensive logging of runtime events for monitoring purposes. + +While beyond the scope of this initial PEP, template literal support +could potentially be added to the logging module's event reporting APIs, +permitting relevant details to be captured using forms like: + +.. code-block:: python + + logging.debug(t"Event: {event}; Details: {data}") + logging.critical(t"Error: {error}; Details: {data}") + +Rather than the historical mod-formatting style: + +.. code-block:: python + + logging.debug("Event: %s; Details: %s", event, data) + logging.critical("Error: %s; Details: %s", error, data) + +As the template literal is passed in as an ordinary argument, other +keyword arguments would also remain available: + +.. code-block:: python + + logging.critical(t"Error: {error}; Details: {data}", exc_info=True) + +The approach to standardising lazy field evaluation described in this PEP is +primarily based on the anticipated needs of this hypothetical integration into +the logging module: + +..
code-block:: python + + logging.debug(t"Eager evaluation of {expensive_call()}") + logging.debug(t"Lazy evaluation of {expensive_call!()}") + + logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}") + logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}") + +It's an open question whether the definition of logging formatters would be updated to +support template strings, but if they were, the most likely way of defining fields which +should be :ref:`looked up on the log record ` instead of being +interpreted eagerly is simply to escape them so they're available as part of the literal +text: + +.. code-block:: python + + proc_id = get_process_id() + formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}") + + +.. _pep-501-defer-i18n: Deferring consideration of possible use in i18n use cases --------------------------------------------------------- The initial motivating use case for this PEP was providing a cleaner syntax -for i18n translation, as that requires access to the original unmodified -template. As such, it focused on compatibility with the substitution syntax used -in Python's ``string.Template`` formatting and Mozilla's l20n project. +for i18n (internationalization) translation, as that requires access to the original +unmodified template. As such, it focused on compatibility with the substitution syntax +used in Python's :class:`string.Template` formatting and Mozilla's l20n project. However, subsequent discussion revealed there are significant additional considerations to be taken into account in the i18n use case, which don't @@ -739,25 +1326,79 @@ contexts (like HTML, system shells, and database queries), or producing application debugging messages in the preferred language of the development team (rather than the native language of end users). 
-Due to the original design of the ``str.format`` substitution syntax in :pep:`3101` being inspired by C#'s string formatting syntax, the specific field -substitution syntax used in :pep:`498` is consistent not only with Python's own ``str.format`` syntax, but also with string formatting in C#, including the -native "$-string" interpolation syntax introduced in C# 6.0 (released in July -2015). The related ``IFormattable`` interface in C# forms the basis of a -`number of elements `__ of C#'s internationalization and localization -support. +Due to that realisation, the PEP was switched to use the :meth:`str.format` substitution +syntax originally defined in :pep:`3101` and subsequently used as the basis for :pep:`498`. + +While it would theoretically be possible to update :class:`string.Template` to support +the creation of instances from native template literals, and to implement the structural +``typing.Template`` protocol, the PEP authors have not identified any practical benefit +in doing so. + +However, one significant benefit of the "only one string prefix" approach used in this PEP +is that while it generalises the existing f-string interpolation syntax to support delayed +rendering through t-strings, it doesn't imply that that should be the *only* compiler +supported interpolation syntax that Python should ever offer. + +Most notably, it leaves the door open to an alternate "t$-string" syntax that would allow +``TemplateLiteral`` instances to be created using a :pep:`292` based interpolation syntax +rather than a :pep:`3101` based syntax:: + + template = t$"Substitute $words and ${other_values} at runtime" + +The only runtime distinction between templates created that way and templates created from +regular t-strings would be in the contents of their ``raw_template`` attributes. + + +..
_pep-501-defer-non-posix-shells: + +Deferring escaped rendering support for non-POSIX shells +-------------------------------------------------------- + +:func:`shlex.quote` works by treating characters in the regex character set ``[\w@%+=:,./-]`` as +safe, deeming all other characters to be unsafe, and hence requiring quoting of the string +containing them. The quoting mechanism used is then specific to the way that string quoting +works in POSIX shells, so it cannot be trusted when running a shell that doesn't follow +POSIX shell string quoting rules. + +For example, running ``subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True)`` is +safe when using a shell that follows POSIX quoting rules:: + + $ cat > run_quoted.py + import sys, shlex, subprocess + subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) + $ python3 run_quoted.py pwd + pwd + $ python3 run_quoted.py '; pwd' + ; pwd + $ python3 run_quoted.py "'pwd'" + 'pwd' + +but remains unsafe when the shell invoked from Python is ``cmd.exe`` (or PowerShell):: + + S:\> echo import sys, shlex, subprocess > run_quoted.py + S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py + S:\> type run_quoted.py + import sys, shlex, subprocess + subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) + S:\> python3 run_quoted.py "echo OK" + 'echo OK' + S:\> python3 run_quoted.py "'& echo Oh no!" + ''"'"' + Oh no!' + +Resolving this standard library limitation is beyond the scope of this PEP. -This means that while this particular substitution syntax may not -currently be widely used for translation of *Python* applications (losing out -to traditional %-formatting and the designed-specifically-for-i18n -``string.Template`` formatting), it *is* a popular translation format in the -wider software development ecosystem (since it is already the preferred -format for translating C# applications). Acknowledgements ================ * Eric V.
Smith for creating :pep:`498` and demonstrating the feasibility of arbitrary expression substitution in string interpolation +* The authors of :pep:`750` for the substantial design improvements that tagged strings + inspired for this PEP, their general advocacy for the value of language level delayed + template rendering support, and their efforts to ensure that any native interpolation + template support lays a strong foundation for future efforts in providing robust syntax + highlighting and static type checking support for domain specific languages * Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to exploring the feasibility of using this model of delayed rendering in i18n use cases (even though the ultimate conclusion was that it was a poor fit,