PEP: 501
Title: General purpose template literal strings
Author: Alyssa Coghlan <ncoghlan@gmail.com>, Nick Humrich <nick@humrich.us>
Discussions-To: https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625
Status: Withdrawn
Type: Standards Track
Requires: 701
Created: 08-Aug-2015
Python-Version: 3.12
Post-History: `08-Aug-2015 <https://mail.python.org/archives/list/python-dev@python.org/thread/EAZ3P2M3CDDIQFR764NF6FXQHWXYMKJF/>`__,
              `05-Sep-2015 <https://mail.python.org/archives/list/python-dev@python.org/thread/ILVRPS6DTFZ7IHL5HONDBB6INVXTFOZ2/>`__,
              `09-Mar-2023 <https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625>`__,
Superseded-By: 750

.. superseded:: 750


Abstract
========

Though easy and elegant to use, Python :term:`f-string`\s
can be vulnerable to injection attacks when used to construct
shell commands, SQL queries, HTML snippets and similar
(for example, ``os.system(f"echo {message_from_user}")``).
This PEP introduces template literal strings (or "t-strings"),
which have syntax and semantics that are similar to f-strings,
but with rendering deferred until :func:`format` or another
template rendering function is called on them.
This will allow standard library calls, helper functions
and third party tools to safely and intelligently perform
appropriate escaping and other string processing on inputs
while retaining the usability and convenience of f-strings.


PEP Withdrawal
==============

When :pep:`750` was first published as a "tagged strings" proposal
(allowing for arbitrary string prefixes), this PEP was kept open to
continue championing the simpler "template literal" approach that
used a single dedicated string prefix to produce instances of a new
"interpolation template" type.

The `October 2024 updates <https://github.com/python/peps/pull/4062>`__
to :pep:`750` agreed that template strings were a better fit for Python
than the broader tagged strings concept.

All of the other concerns the authors of this PEP had with :pep:`750`
were also either addressed in those updates, or else left in a state
where they could reasonably be addressed in a future change proposal.

Due to the clear improvements in the updated :pep:`750` proposal,
this PEP has been withdrawn in favour of :pep:`750`.

.. important::

    The remainder of this PEP still reflects the state of the tagged strings
    proposal in August 2024. It has *not* been updated to reflect the
    October 2024 changes to :pep:`750`, since the PEP withdrawal makes doing
    so redundant.


Relationship with other PEPs
============================

This PEP is inspired by and builds on top of the f-string syntax first implemented
in :pep:`498` and formalised in :pep:`701`.

This PEP complements the literal string typing support added to Python's formal type
system in :pep:`675` by introducing a *safe* way to do dynamic interpolation of runtime
values into security sensitive strings.

This PEP competes with some aspects of the tagged string proposal in :pep:`750`
(most notably in whether template rendering is expressed as ``render(t"template literal")``
or as ``render"template literal"``), but also shares *many* common features (after
:pep:`750` was published, this PEP was updated with
`several new changes <https://github.com/python/peps/issues/3904>`__
inspired by the tagged strings proposal).

This PEP does NOT propose an alternative to :pep:`292` for user interface
internationalization use cases (but does note the potential for future syntactic
enhancements aimed at that use case that would benefit from the compiler-supported
value interpolation machinery that this PEP and :pep:`750` introduce).


Motivation
==========

:pep:`498` added new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicit name references. These are referred
to in the PEP (and elsewhere) as "f-strings" (a mnemonic for "formatted strings").

Since acceptance of :pep:`498`, f-strings have become well-established and very popular.
f-strings became even more useful and flexible with the formalised grammar in :pep:`701`.
While f-strings are great, eager rendering has its limitations. For example, the
eagerness of f-strings has made code like the following unfortunately plausible:

.. code-block:: python

    os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant problem
if the interpolated value ``message_from_user`` is in fact provided by an
untrusted user: it's an opening for a form of code injection attack, where
the supplied user data has not been properly escaped before being passed to
the ``os.system`` call.

While the ``LiteralString`` type annotation introduced in :pep:`675` means that type checkers
are able to report a type error for this kind of unsafe function usage, those errors don't
help make it easier to write code that uses safer alternatives (such as
:func:`subprocess.run`).

To address that problem (and a number of other concerns), this PEP proposes
the complementary introduction of "t-strings" (a mnemonic for "template literal strings"),
where ``format(t"Message with {data}")`` would produce the same result as
``f"Message with {data}"``, but the template literal instance can instead be passed
to other template rendering functions which process the contents of the template
differently.


Proposal
========

Dedicated template literal syntax
---------------------------------

This PEP proposes a new string prefix that declares the
string to be a template literal rather than an ordinary string:

.. code-block:: python

    template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"

This would be effectively interpreted as:

.. code-block:: python

    template = TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", names, f">{field_width}", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", expressions(), f"", "r"),
        TemplateLiteralText(r" at runtime"),
    )

(Note: this is an illustrative example implementation. The exact compile time construction
syntax of ``types.TemplateLiteral`` is considered an implementation detail not specified by
the PEP. In particular, the compiler may bypass the default constructor's runtime logic that
detects consecutive text segments and merges them into a single text segment, as well as
checking the runtime types of all supplied arguments.)

The ``__format__`` method on ``types.TemplateLiteral`` would then
implement the following :meth:`str.format` inspired semantics:

.. code-block:: python-console

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> format(t'She said her name is {name!r}.')
    "She said her name is 'Jane'."

The syntax of template literals would be based on :pep:`701`, and largely use the same
syntax for the string portion of the template. Aside from using a different prefix, the one
other syntactic change is in the definition and handling of conversion specifiers, both to
allow ``!()`` as a standard conversion specifier to request evaluation of a field at
rendering time, and to allow custom renderers to also define custom conversion specifiers.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.


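Since the ``t`` prefix proposed here was never implemented, the following sketch emulates
the core idea with plain Python: a list of text strings and hypothetical ``Field`` tuples
stands in for the segments a template literal would capture. It shows the same captured
template being rendered the way an f-string would be, and by a different renderer that
HTML-escapes every interpolated value. ``Field``, ``render_default`` and ``render_html``
are illustrative names only, not part of the proposed API.

.. code-block:: python

    import html
    from typing import Any, NamedTuple, Union


    class Field(NamedTuple):
        """Hypothetical stand-in for an interpolation field (conversions omitted)."""
        expr: str
        value: Any
        format_spec: str = ""


    Segment = Union[str, Field]


    def render_default(segments: list[Segment]) -> str:
        # Format each value in place, as an f-string would
        return "".join(
            seg if isinstance(seg, str) else format(seg.value, seg.format_spec)
            for seg in segments
        )


    def render_html(segments: list[Segment]) -> str:
        # Same traversal, but interpolated values are HTML-escaped; literal text is trusted
        return "".join(
            seg if isinstance(seg, str) else html.escape(format(seg.value, seg.format_spec))
            for seg in segments
        )


    comment = "<script>alert('hi')</script>"
    # Roughly the segments a template like t"<p>{comment}</p>" would capture
    segments = ["<p>", Field("comment", comment), "</p>"]

    print(render_default(segments))  # <p><script>alert('hi')</script></p>
    print(render_html(segments))     # <p>&lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;</p>

The key design point is that the literal text and the interpolated values stay separate
until a renderer decides how to combine them.
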
Lazy field evaluation conversion specifier
------------------------------------------

In addition to the existing support for the ``a``, ``r``, and ``s`` conversion specifiers,
:meth:`str.format`, :meth:`str.format_map`, and :class:`string.Formatter` will be updated
to accept ``()`` as a conversion specifier that means "call the interpolated value".

To support application of the standard conversion specifiers in custom template rendering
functions, a new :func:`!operator.convert_field` function will be added.

The signature and behaviour of the :func:`format` builtin will also be updated to accept a
conversion specifier as a third optional parameter. If a non-empty conversion specifier
is given, the value will be converted with :func:`!operator.convert_field` before looking up
the ``__format__`` method.


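The conversion handling described above can be sketched in plain Python. ``convert_field``
below is a hypothetical stand-in for the proposed :func:`!operator.convert_field` (which does
not exist in any released Python version), and ``format_with_conversion`` stands in for the
proposed three-argument :func:`format`. Both cover only the standard specifiers described in
this section.

.. code-block:: python

    from typing import Any, Optional


    def convert_field(value: Any, conversion: Optional[str]) -> Any:
        """Hypothetical stand-in for the proposed operator.convert_field (standard specifiers only)."""
        if not conversion:
            return value  # No conversion requested
        if conversion == "a":
            return ascii(value)
        if conversion == "r":
            return repr(value)
        if conversion == "s":
            return str(value)
        if conversion == "()":
            return value()  # Lazy field: call the interpolated value at render time
        raise ValueError(f"unknown conversion specifier: {conversion!r}")


    def format_with_conversion(value: Any, format_spec: str = "",
                               conversion: Optional[str] = None) -> str:
        """Hypothetical stand-in for the proposed three-argument format() builtin."""
        # Convert first, then look up __format__ on the converted result
        return format(convert_field(value, conversion), format_spec)


    def current_user() -> str:
        return "jane"


    print(format_with_conversion(current_user, ">6", "()"))  # prints "  jane"
    print(format_with_conversion("jane", "", "r"))          # prints 'jane' (repr adds quotes)
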
Custom conversion specifiers
----------------------------

To allow additional field-specific directives to be passed to custom rendering functions in
a way that still allows formatting of the template with the default renderer, the conversion
specifier field will be allowed to contain a second ``!`` character.

:func:`!operator.convert_field` and :func:`format` (and hence the default
``TemplateLiteral.render`` template rendering method) will ignore that character and any
subsequent text in the conversion specifier field.

:meth:`str.format`, :meth:`str.format_map`, and :class:`string.Formatter` will also be
updated to accept (and ignore) custom conversion specifiers.


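A minimal sketch of a default field renderer ignoring a custom suffix while a custom renderer
acts on it, assuming the conversion text arrives with the leading ``!`` already stripped (so
``{name!r!upper}`` yields ``"r!upper"`` and ``{name!!upper}`` yields ``"!upper"``). The
``upper`` suffix and all function names here are hypothetical illustrations.

.. code-block:: python

    from typing import Any, Callable

    # Standard conversions, as a default renderer would apply them
    _STANDARD: dict[str, Callable[[Any], Any]] = {
        "": lambda v: v,
        "a": ascii,
        "r": repr,
        "s": str,
        "()": lambda v: v(),
    }


    def split_conversion(conversion: str) -> tuple[str, str]:
        """Split e.g. "r!upper" into ("r", "upper"); either part may be empty."""
        standard, _, custom = conversion.partition("!")
        return standard, custom


    def default_field(value: Any, format_spec: str = "", conversion: str = "") -> str:
        # The default renderer applies the standard part and ignores any custom suffix
        standard, _custom = split_conversion(conversion)
        try:
            convert = _STANDARD[standard]
        except KeyError:
            raise ValueError(f"unknown conversion specifier: {standard!r}") from None
        return format(convert(value), format_spec)


    def shouting_field(value: Any, format_spec: str = "", conversion: str = "") -> str:
        # A hypothetical custom renderer that also understands an "upper" suffix
        rendered = default_field(value, format_spec, conversion)
        _standard, custom = split_conversion(conversion)
        return rendered.upper() if custom == "upper" else rendered

With these definitions, ``default_field("jane", "", "r!upper")`` gives ``'jane'`` while
``shouting_field("jane", "", "r!upper")`` gives ``'JANE'``: the same template renders
successfully with either renderer.
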
Template renderer for POSIX shell commands
------------------------------------------

As both a practical demonstration of the benefits of delayed rendering support, and as
a valuable feature in its own right, a new ``sh`` template renderer will be added to
the :mod:`shlex` module. This renderer will produce strings where all interpolated fields
are escaped with :func:`shlex.quote`.

The :class:`subprocess.Popen` API (and higher level APIs that depend on it, such as
:func:`subprocess.run`) will be updated to accept interpolation templates and handle
them in accordance with the new ``shlex.sh`` renderer.


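The following sketch approximates what the proposed ``shlex.sh`` renderer would do, using a
plain list of text strings and hypothetical ``Field`` tuples in place of a real template
literal. Only :func:`shlex.quote` is an existing standard library API; ``sh`` and ``Field``
here are illustrative stand-ins, not the proposed implementation.

.. code-block:: python

    import shlex
    from typing import Any, NamedTuple, Union


    class Field(NamedTuple):
        """Hypothetical stand-in for TemplateLiteralField (conversions omitted)."""
        expr: str
        value: Any
        format_spec: str = ""


    def sh(segments: list[Union[str, Field]]) -> str:
        """Sketch of an sh renderer: every interpolated field is shell-quoted."""
        parts = []
        for seg in segments:
            if isinstance(seg, str):
                parts.append(seg)  # Literal text from the source code is trusted
            else:
                parts.append(shlex.quote(format(seg.value, seg.format_spec)))
        return "".join(parts)


    message_from_user = "hello; rm -rf ~"
    # Roughly the segments t"echo {message_from_user}" would capture
    cmd = sh(["echo ", Field("message_from_user", message_from_user)])
    print(cmd)  # echo 'hello; rm -rf ~'

Because the user-supplied value is quoted, a shell would see it as a single argument to
``echo`` rather than as a second command.
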
Background
==========

This PEP was initially proposed as a competitor to :pep:`498`. After it became clear that
the eager rendering proposal had substantially more immediate support, it then spent several
years in a deferred state, pending further experience with :pep:`498`'s simpler approach of
only supporting eager rendering without the additional complexity of also supporting deferred
rendering.

Since then, f-strings have become very popular and :pep:`701` was introduced to tidy up some
rough edges and limitations in their syntax and semantics. The template literal proposal
was updated in 2023 to reflect current knowledge of f-strings, and improvements from
:pep:`701`.

In 2024, :pep:`750` was published, proposing a general purpose mechanism for custom tagged
string prefixes, rather than the narrower template literal proposal in this PEP. This PEP
was again updated, both to incorporate new ideas inspired by the tagged strings proposal,
and to describe the perceived benefits of the narrower template literal syntax proposal
in this PEP over the more general tagged string proposal.


Summary of differences from f-strings
-------------------------------------

The key differences between f-strings and t-strings are:

* the ``t`` (template literal) prefix indicates delayed rendering, but
  otherwise largely uses the same syntax and semantics as formatted strings
* template literals are available at runtime as a new kind of object
  (``types.TemplateLiteral``)
* the default rendering used by formatted strings is invoked on a
  template literal object by calling ``format(template)`` rather than
  being done implicitly in the compiled code
* unlike f-strings (where conversion specifiers are handled directly in the compiler),
  t-string conversion specifiers are handled at rendering time by the rendering function
* the new ``!()`` conversion specifier indicates that the field expression is a callable
  that should be called when using the default :func:`format` rendering function. This
  specifier is specifically *not* being added to f-strings (since it is pointless there).
* a second ``!`` is allowed in t-string conversion specifiers (with any subsequent text
  being ignored) as a way to allow custom template rendering functions to accept custom
  conversion specifiers without breaking the default :func:`!TemplateLiteral.render`
  rendering method. This feature is specifically *not* being added to f-strings (since
  it is pointless there).
* while the f-string ``f"Message {here}"`` would be *semantically* equivalent to
  ``format(t"Message {here}")``, f-strings will continue to be supported directly in the
  compiler and hence avoid the runtime overhead of actually using the delayed rendering
  machinery that is needed for t-strings


Summary of differences from tagged strings
------------------------------------------

When tagged strings were
`first proposed <https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408>`__,
there were several notable differences from the proposal in PEP 501 beyond the surface
syntax difference between whether rendering function invocations are written as
``render(t"template literal")`` or as ``render"template literal"``.

Over the course of the initial PEP 750 discussion, many of those differences were eliminated,
either by PEP 501 adopting that aspect of PEP 750's proposal (such as lazily applying
conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501's proposal
(such as defining a dedicated type to hold template segments rather than representing them
as simple sequences).

The main remaining significant difference is that this PEP argues that adding *only* the
t-string prefix is a sufficient enhancement to give all the desired benefits described in
PEP 750. The expansion to a generalised "tagged string" syntax isn't necessary, and causes
additional problems that can be avoided.

The two PEPs also differ in their proposed approaches to handling lazy evaluation of template
fields.

While there *are* other differences between the two proposals, those differences are more
cosmetic than substantive. In particular:

* this PEP proposes different names for the structural typing protocols
* this PEP proposes specific names for the concrete implementation types
* this PEP proposes exact details for the proposed APIs of the concrete implementation types
  (including concatenation and repetition support, which are not part of the structural
  typing protocols)
* this PEP proposes changes to the existing :func:`format` builtin to make it usable
  directly as a template field renderer

The two PEPs also differ in *how* they make their case for delayed rendering support. This
PEP focuses more on the concrete implementation concept of using template literals to allow
the "interpolation" and "rendering" steps in f-string processing to be separated in time,
and then taking advantage of that to reduce the potential code injection risks associated
with misuse of f-strings. PEP 750 focuses more on the way that native templating support
allows behaviours that are difficult or impossible to achieve via existing string based
templating methods. As with the cosmetic differences noted above, this is more a difference
in style than a difference in substance.


Rationale
=========

f-strings (:pep:`498`) made interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but did so at the cost of creating a
situation where interpolating values into sensitive targets like SQL queries,
shell commands and HTML templates will enjoy a much cleaner syntax when handled
without regard for code injection attacks than when they are handled correctly.

This PEP proposes to provide the option of deferring the rendering of a template
literal into a formatted string until its ``__format__`` method is called, allowing
other template renderers to be used by passing the template around as a first class object.

While very different in the technical details, the
``types.TemplateLiteral`` interface proposed in this PEP is
conceptually quite similar to the ``FormattableString`` type underlying the
`native interpolation <https://msdn.microsoft.com/en-us/library/dn961160.aspx>`__
support introduced in C# 6.0, as well as the
`JavaScript template literals <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals>`__
introduced in ES6.

While not the original motivation for developing the proposal, many of the benefits for
defining domain specific languages described in :pep:`750` also apply to this PEP
(including the potential for per-DSL semantic highlighting in code editors based on the
type specifications of declared template variables and rendering function parameters).


Specification
=============

This PEP proposes a new ``t`` string prefix that
results in the creation of an instance of a new type,
``types.TemplateLiteral``.

Template literals are Unicode strings (bytes literals are not
permitted), and string literal concatenation operates as normal, with the
entire combined literal forming the template literal.

The template string is parsed into literals, expressions, format specifiers, and conversion
specifiers as described for f-strings in :pep:`498` and :pep:`701`. The syntax for conversion
specifiers is relaxed such that arbitrary strings are accepted (excluding those containing
``{``, ``}`` or ``:``) rather than being restricted to valid Python identifiers.

However, rather than being rendered directly into a formatted string, these
components are instead organised into instances of new types with the
following behaviour:

.. code-block:: python

    from __future__ import annotations

    import operator
    from collections.abc import Iterable, Sequence
    from typing import Any, NamedTuple


    class TemplateLiteralText(str):
        # This is a renamed and extended version of the DecodedConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw: str

        def __new__(cls, raw: str):
            # Process backslash escapes in the raw text. Characters outside Latin-1 are
            # escaped first so that non-ASCII text survives the round trip unchanged
            decoded = raw.encode("latin-1", "backslashreplace").decode("unicode-escape")
            if decoded == raw:
                decoded = raw  # No escapes present: reuse the original string object
            text = super().__new__(cls, decoded)
            text._raw = raw
            return text

        @staticmethod
        def merge(text_segments: Sequence[TemplateLiteralText]) -> TemplateLiteralText:
            if len(text_segments) == 1:
                return text_segments[0]
            return TemplateLiteralText("".join(t._raw for t in text_segments))

        @property
        def raw(self) -> str:
            return self._raw

        def __repr__(self) -> str:
            return f"{type(self).__name__}(r{self._raw!r})"

        def __add__(self, other: Any) -> TemplateLiteralText | NotImplemented:
            if isinstance(other, TemplateLiteralText):
                return TemplateLiteralText(self._raw + other._raw)
            return NotImplemented

        def __mul__(self, other: Any) -> TemplateLiteralText | NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            return TemplateLiteralText(self._raw * factor)

        __rmul__ = __mul__


    class TemplateLiteralField(NamedTuple):
        # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
        # However:
        # - value is eagerly evaluated (values were all originally lazy in PEP 750)
        # - conversion specifiers are allowed to be arbitrary strings
        # - order of fields is adjusted so the text form is the first field and the
        #   remaining parameters match the updated signature of the `format` builtin
        # Real type would be implemented in C, this is an API compatible Python equivalent

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

        def __repr__(self) -> str:
            return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                    f"{self.format_spec!r}, {self.conversion_spec!r})")

        def __str__(self) -> str:
            # Relies on the three-argument format() builtin proposed in this PEP
            return format(self.value, self.format_spec, self.conversion_spec)

        def __format__(self, format_override) -> str:
            if format_override:
                format_spec = format_override
            else:
                format_spec = self.format_spec
            return format(self.value, format_spec, self.conversion_spec)


    class TemplateLiteral:
        # This type corresponds to the TemplateConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw_template: str
        _segments: tuple[TemplateLiteralText | TemplateLiteralField, ...]

        def __new__(cls, raw_template: str, *segments: TemplateLiteralText | TemplateLiteralField):
            self = super().__new__(cls)
            self._raw_template = raw_template
            # Check if there are any adjacent text segments that need merging
            # or any empty text segments that need discarding
            type_err = "Template literal segments must be template literal text or field instances"
            text_expected = True
            needs_merge = False
            for segment in segments:
                match segment:
                    case TemplateLiteralText():
                        if not text_expected or not segment:
                            needs_merge = True
                            break
                        text_expected = False
                    case TemplateLiteralField():
                        text_expected = True
                    case _:
                        raise TypeError(type_err)
            if not needs_merge:
                # Match loop above will have checked all segments
                self._segments = segments
                return self
            # Merge consecutive runs of text fields and drop any empty text fields
            merged_segments: list[TemplateLiteralText | TemplateLiteralField] = []
            pending_merge: list[TemplateLiteralText] = []
            for segment in segments:
                match segment:
                    case TemplateLiteralText() as text_segment:
                        if text_segment:
                            pending_merge.append(text_segment)
                    case TemplateLiteralField():
                        if pending_merge:
                            merged_segments.append(TemplateLiteralText.merge(pending_merge))
                            pending_merge.clear()
                        merged_segments.append(segment)
                    case _:
                        # First loop above may not check all segments when a merge is needed
                        raise TypeError(type_err)
            if pending_merge:
                merged_segments.append(TemplateLiteralText.merge(pending_merge))
                pending_merge.clear()
            self._segments = tuple(merged_segments)
            return self

        @property
        def raw_template(self) -> str:
            return self._raw_template

        @property
        def segments(self) -> tuple[TemplateLiteralText | TemplateLiteralField, ...]:
            return self._segments

        def __len__(self) -> int:
            return len(self._segments)

        def __iter__(self) -> Iterable[TemplateLiteralText | TemplateLiteralField]:
            return iter(self._segments)

        # Note: template literals do NOT define any relative ordering
        def __eq__(self, other):
            if not isinstance(other, TemplateLiteral):
                return NotImplemented
            # Comparing segments also compares field values and format specifiers
            return (
                self._raw_template == other._raw_template
                and self._segments == other._segments
            )

        def __repr__(self) -> str:
            return (f"{type(self).__name__}(r{self._raw_template!r}, "
                    f"{', '.join(map(repr, self._segments))})")

        def __format__(self, format_specifier) -> str:
            # When formatted, render to a string, and then use string formatting
            return format(self.render(), format_specifier)

        def render(self, *, render_template=''.join, render_text=str, render_field=format):
            ...  # See definition of the template rendering semantics below

        def __add__(self, other) -> TemplateLiteral | NotImplemented:
            if isinstance(other, TemplateLiteral):
                combined_raw_text = self._raw_template + other._raw_template
                combined_segments = self._segments + other._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            if isinstance(other, str):
                # Treat the given string as a new raw text segment
                combined_raw_text = self._raw_template + other
                combined_segments = self._segments + (TemplateLiteralText(other),)
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __radd__(self, other) -> TemplateLiteral | NotImplemented:
            if isinstance(other, str):
                # Treat the given string as a new raw text segment. This effectively
                # has precedence over string concatenation in CPython due to
                # https://github.com/python/cpython/issues/55686
                combined_raw_text = other + self._raw_template
                combined_segments = (TemplateLiteralText(other),) + self._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __mul__(self, other) -> TemplateLiteral | NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            if not self or factor == 1:
                return self
            if factor < 1:
                return TemplateLiteral("")
            repeated_text = self._raw_template * factor
            repeated_segments = self._segments * factor
            return TemplateLiteral(repeated_text, *repeated_segments)

        __rmul__ = __mul__

(Note: this is an illustrative example implementation; the exact compile time construction
method and internal data management details of ``types.TemplateLiteral`` are considered an
implementation detail not specified by the PEP. However, the expected post-construction
behaviour of the public APIs on ``types.TemplateLiteral`` instances is specified by the
above code, as is the constructor signature for building template instances at runtime.)

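The segment normalisation performed by the ``TemplateLiteral`` constructor (merging adjacent
text segments and discarding empty ones) can be shown in isolation. This simplified sketch
uses plain strings instead of ``TemplateLiteralText`` (so raw and decoded forms are not
distinguished) and a hypothetical ``Field`` tuple in place of ``TemplateLiteralField``.

.. code-block:: python

    from typing import Any, NamedTuple, Union


    class Field(NamedTuple):
        """Hypothetical stand-in for TemplateLiteralField."""
        expr: str
        value: Any
        format_spec: str = ""
        conversion_spec: str = ""


    Segment = Union[str, Field]


    def normalize_segments(segments: list[Segment]) -> tuple[Segment, ...]:
        """Merge adjacent text segments and drop empty ones."""
        merged: list[Segment] = []
        for seg in segments:
            if isinstance(seg, Field):
                merged.append(seg)
            elif not isinstance(seg, str):
                raise TypeError("segments must be text or field instances")
            elif seg:  # Empty text segments are discarded
                if merged and isinstance(merged[-1], str):
                    merged[-1] = merged[-1] + seg  # Extend the preceding text run
                else:
                    merged.append(seg)
        return tuple(merged)


    # Adding " Bye." to a template like t"Hello {name}!" appends a new text segment,
    # which normalisation then merges with the existing trailing "!" segment
    name_field = Field("name", "Jane")
    print(normalize_segments(["Hello ", name_field, "!", " Bye.", ""]))

This normalisation is what keeps concatenation and repetition results in the same
alternating text/field shape that the compiler produces directly.
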
The result of a template literal expression is an instance of this
type, rather than an already rendered string. Rendering only takes
place when the instance's ``render`` method is called (either directly, or
indirectly via ``__format__``).

The compiler will pass the following details to the template literal for
later use:

* a string containing the raw template as written in the source code
* a sequence of template segments, with each segment being either:

  * a literal text segment (a regular Python string that also provides access
    to its raw form)
  * a parsed template interpolation field, specifying the text of the interpolated
    expression (as a regular string), its evaluated result, the format specifier text
    (with any substitution fields eagerly evaluated as an f-string), and the conversion
    specifier text (as a regular string)

The raw template is just the template literal as a string. By default,
it is used to provide a human-readable representation for the
template literal, but template renderers may also use it for other purposes (e.g. as a
cache lookup key).

The parsed template structure is taken from :pep:`750` and consists of a sequence of
template segments corresponding to the text segments and interpolation fields in the
template string.

This approach is designed to allow compilers to fully process each segment of the template
in order, before finally emitting code to pass all of the template segments to the template
literal constructor.

For example, assuming the following runtime values:

.. code-block:: python

    names = ["Alice", "Bob", "Carol", "Eve"]
    field_width = 10
    def expressions():
        return 42

The template from the proposal section would be represented at runtime as:

.. code-block:: python

    TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", 42, "", "r"),
        TemplateLiteralText(r" at runtime"),
    )


Rendering templates
-------------------

The ``TemplateLiteral.render`` implementation defines the rendering
process in terms of the following renderers:

* an overall ``render_template`` operation that defines how the sequence of
  rendered text and field segments is composed into a fully rendered result.
  The default template renderer is string concatenation using ``''.join``.
* a per text segment ``render_text`` operation that receives the individual literal
  text segments within the template. The default text renderer is the builtin ``str``
  constructor.
* a per field segment ``render_field`` operation that receives the field value, format
  specifier, and conversion specifier for substitution fields within the template. The
  default field renderer is the :func:`format` builtin.

Given the parsed template representation above, the semantics of template rendering would
then be equivalent to the following:

.. code-block:: python

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        rendered_segments = []
        for segment in self._segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    rendered_segments.append(render_text(text_segment))
                case TemplateLiteralField() as field_segment:
                    # field_segment[1:] is (value, format_spec, conversion_spec)
                    rendered_segments.append(render_field(*field_segment[1:]))
        return render_template(rendered_segments)


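To illustrate why rendering is split into ``render_template``, ``render_text`` and
``render_field`` hooks, the sketch below reuses the same loop over a plain segment list to
build a parameterised SQL query rather than a string. The ``Field`` and ``Param`` types and
the ``sql_*`` renderers are hypothetical; the default hooks are omitted because the
three-argument :func:`format` they rely on is only proposed. The query uses the ``?``
placeholder style supported by :mod:`sqlite3`.

.. code-block:: python

    import sqlite3
    from typing import Any, NamedTuple


    class Field(NamedTuple):
        """Hypothetical stand-in for TemplateLiteralField."""
        expr: str
        value: Any
        format_spec: str = ""
        conversion_spec: str = ""


    class Param(NamedTuple):
        """Marker for a value that must be sent to the database separately."""
        value: Any


    def render(segments, *, render_template, render_text=str, render_field):
        """Standalone version of the render() loop, over a plain segment list."""
        rendered = []
        for segment in segments:
            if isinstance(segment, Field):
                rendered.append(render_field(*segment[1:]))  # value, format_spec, conversion_spec
            else:
                rendered.append(render_text(segment))
        return render_template(rendered)


    def sql_field(value: Any, format_spec: str, conversion_spec: str) -> Param:
        # Never splice the value into the query text; keep it as a bound parameter
        return Param(value)


    def sql_template(parts: list) -> tuple[str, tuple]:
        query, params = [], []
        for part in parts:
            if isinstance(part, Param):
                query.append("?")  # sqlite3 "qmark" placeholder
                params.append(part.value)
            else:
                query.append(part)
        return "".join(query), tuple(params)


    name = "Robert'); DROP TABLE students;--"
    # Roughly the segments of t"SELECT {name} AS name"
    query, params = render(
        ["SELECT ", Field("name", name), " AS name"],
        render_template=sql_template,
        render_field=sql_field,
    )
    conn = sqlite3.connect(":memory:")
    row = conn.execute(query, params).fetchone()
    conn.close()
    print(query, row)

Because the field renderer never splices ``name`` into the query text, the quote characters
it contains cannot change the structure of the SQL statement.
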
Format specifiers
-----------------

The syntax and processing of format specifiers in t-strings is defined to be the same as it
is for f-strings.

This includes allowing format specifiers to themselves contain f-string substitution fields.
The raw text of the format specifiers (without processing any substitution fields) is
retained as part of the full raw template string.

The parsed format specifiers receive the format specifier string with those substitutions
already resolved. The ``:`` prefix is also omitted.

Aside from separating them out from the substitution expression during parsing,
format specifiers are otherwise treated as opaque strings by the interpolation
template parser; assigning semantics to those (or, alternatively,
prohibiting their use) is handled at rendering time by the field renderer.


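A short illustration using existing f-string behaviour: a nested substitution in a format
specifier is resolved eagerly, so a field would only ever see the resolved text, and the
field renderer decides what (if anything) that text means. ``no_spec_field`` is a
hypothetical renderer that chooses to reject format specifiers entirely.

.. code-block:: python

    from typing import Any

    width = 10
    value = "Jane"

    # For a field written as {value:>{width}}, the nested substitution is resolved
    # eagerly, so the captured format specifier is just the plain string ">10"
    captured_spec = f">{width}"

    # Rendering later with format() matches what the eager f-string produces
    assert format(value, captured_spec) == f"{value:>{width}}"


    def no_spec_field(value: Any, format_spec: str = "", conversion_spec: str = "") -> str:
        """Hypothetical field renderer that prohibits format specifiers."""
        if format_spec:
            raise ValueError(f"format specifiers are not supported here: {format_spec!r}")
        return str(value)
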
Conversion specifiers
---------------------

In addition to the existing support for the ``a``, ``r``, and ``s`` conversion specifiers,
:meth:`str.format` and :meth:`str.format_map` will be updated to accept ``()`` as a
conversion specifier that means "call the interpolated value".

Where :pep:`701` restricts conversion specifiers to ``NAME`` tokens, this PEP will instead
allow ``FSTRING_MIDDLE`` tokens (such that only ``{``, ``}`` and ``:`` are disallowed). This
change is made primarily to support lazy field rendering with the ``!()`` conversion
specifier, but it also gives custom rendering functions more flexibility when defining their
own conversion specifiers in place of those defined for the default :func:`format` field
renderer.

Conversion specifiers are still handled as plain strings, and do NOT support the use
of substitution fields.

The parsed conversion specifiers receive the conversion specifier string with the
``!`` prefix omitted.

To allow custom template renderers to define their own custom conversion specifiers without
causing the default renderer to fail, conversion specifiers will be permitted to contain a
custom suffix prefixed with a second ``!`` character. That is, ``!!<custom>``,
``!a!<custom>``, ``!r!<custom>``, ``!s!<custom>``, and ``!()!<custom>`` would all be
valid conversion specifiers in a template literal.

As described above, the default rendering supports the original ``!a``, ``!r`` and ``!s``
conversion specifiers defined in :pep:`3101`, together with the new ``!()`` lazy field
evaluation conversion specifier defined in this PEP. The default rendering ignores any
custom conversion specifier suffixes.

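Since renderers receive the specifier with its leading ``!`` already removed, a renderer
can separate the standard part from a custom suffix with a single :meth:`str.partition`
call, the same approach taken by the proposed ``operator.convert_field``. The helper name
``split_conversion_spec`` below is hypothetical:

.. code-block:: python

    from typing import Optional, Tuple

    def split_conversion_spec(conversion_spec: str) -> Tuple[str, Optional[str]]:
        """Hypothetical helper: split a conversion specifier into standard and custom parts.

        conversion_spec is the text after the first "!" (e.g. "r!sql" for a "!r!sql" field).
        Returns (standard_part, custom_part), with custom_part None if there is no suffix.
        """
        std_spec, sep, custom_spec = conversion_spec.partition("!")
        return std_spec, (custom_spec if sep else None)

    # A field written as {x!!sql} would hand the renderer "!sql"
    split_conversion_spec("!sql")     # ('', 'sql')
    split_conversion_spec("()!lazy")  # ('()', 'lazy')
    split_conversion_spec("r")        # ('r', None)
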
The full mapping between the standard conversion specifiers and the special methods called
on the interpolated value when the field is rendered:

* No conversion (empty string): ``__format__`` (with format specifier as parameter)
* ``a``: ``__repr__`` (as per the :func:`ascii` builtin)
* ``r``: ``__repr__`` (as per the :func:`repr` builtin)
* ``s``: ``__str__`` (as per the ``str`` builtin)
* ``()``: ``__call__`` (with no parameters)

When a conversion occurs, ``__format__`` (with the format specifier) is called on the result
of the conversion rather than being called on the original object.

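Existing f-strings already follow this rule, so it can be checked today. In this sketch,
:class:`datetime.date` (an arbitrary example type) understands ``strftime``-style format
specifiers, but once ``!s`` is applied the specifier is handed to the resulting
:class:`str` instead:

.. code-block:: python

    from datetime import date

    d = date(2024, 1, 2)

    # No conversion: date.__format__ receives the "%Y" format specifier
    no_conversion = f"{d:%Y}"

    # !s conversion: str(d) runs first, then str.__format__ receives ">12"
    with_conversion = f"{d!s:>12}"

By contrast, ``f"{d!s:%Y}"`` raises :exc:`ValueError`, since ``%Y`` is then passed to
``str.__format__`` rather than ``date.__format__``.
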
The changes to :func:`format` and the addition of :func:`!operator.convert_field` make it
straightforward for custom renderers to also support the standard conversion specifiers.

f-strings themselves will NOT support the new ``!()`` conversion specifier (as it is
redundant when value interpolation and value rendering always occur at the same time). They
also will NOT support the use of custom conversion specifiers (since the rendering function
is known at compile time and doesn't make use of the custom specifiers).


New field conversion API in the :mod:`operator` module
------------------------------------------------------

To support application of the standard conversion specifiers in custom template rendering
functions, a new :func:`!operator.convert_field` function will be added:

.. code-block:: python

    def convert_field(value, conversion_spec=''):
        """Apply the given string formatting conversion specifier to the given value"""
        # Any custom suffix after a second "!" is ignored by the standard conversions
        std_spec, sep, custom_spec = conversion_spec.partition("!")
        match std_spec:
            case '':
                return value
            case 'a':
                return ascii(value)
            case 'r':
                return repr(value)
            case 's':
                return str(value)
            case '()':
                return value()
        if not sep:
            err = f"Invalid conversion specifier {std_spec!r}"
        else:
            err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
        raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")


Conversion specifier parameter added to :func:`format`
------------------------------------------------------

The signature and behaviour of the :func:`format` builtin will be updated:

.. code-block:: python

    def format(value, format_spec='', conversion_spec=''):
        if conversion_spec:
            value_to_format = operator.convert_field(value, conversion_spec)
        else:
            value_to_format = value
        # __format__ is looked up on, and called with, the converted value
        return type(value_to_format).__format__(value_to_format, format_spec)

If a non-empty conversion specifier is given, the value will be converted with
:func:`!operator.convert_field` before looking up the ``__format__`` method.

The signature of the ``__format__`` special method does NOT change (only format specifiers
are handled by the object being formatted).

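Neither ``operator.convert_field`` nor the three-argument :func:`format` exists in current
Python releases, so the sketch below uses local stand-ins named ``convert_field_sketch``
and ``format_sketch`` (hypothetical names) to show the combined behaviour. The conversion
table is written as a dict rather than a ``match`` statement, which is an equivalent
choice for this illustration:

.. code-block:: python

    # Local stand-ins for the proposed operator.convert_field() and 3-argument format()
    _STANDARD_CONVERSIONS = {
        "": lambda value: value,
        "a": ascii,
        "r": repr,
        "s": str,
        "()": lambda value: value(),
    }

    def convert_field_sketch(value, conversion_spec=""):
        # Any custom suffix after a second "!" is ignored, as in the proposal
        std_spec, _sep, _custom_spec = conversion_spec.partition("!")
        try:
            convert = _STANDARD_CONVERSIONS[std_spec]
        except KeyError:
            raise ValueError(f"Invalid conversion specifier {std_spec!r}") from None
        return convert(value)

    def format_sketch(value, format_spec="", conversion_spec=""):
        # Convert first (if requested), then call __format__ on the converted value
        if conversion_spec:
            value_to_format = convert_field_sketch(value, conversion_spec)
        else:
            value_to_format = value
        return type(value_to_format).__format__(value_to_format, format_spec)

    format_sketch("hi", ">6", "r")          # "  'hi'"
    format_sketch(lambda: 42, "05d", "()")  # '00042'
    format_sketch("x", "", "r!custom")      # "'x'" (custom suffix ignored)
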
Structural typing and duck typing
---------------------------------

To allow custom renderers to accept alternative interpolation template implementations
(rather than being tightly coupled to the native template literal types), the
following structural protocols will be added to the ``typing`` module:

.. code-block:: python

    from collections.abc import Iterable
    from typing import Any, Protocol, runtime_checkable

    @runtime_checkable
    class TemplateText(Protocol):
        # Renamed version of PEP 750's Decoded protocol

        def __str__(self) -> str:
            ...

        raw: str

    @runtime_checkable
    class TemplateField(Protocol):
        # Renamed and modified version of PEP 750's Interpolation protocol

        def __len__(self):
            ...

        def __getitem__(self, index: int):
            ...

        def __str__(self) -> str:
            ...

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

    @runtime_checkable
    class InterpolationTemplate(Protocol):
        # Corresponds to PEP 750's Template protocol

        def __iter__(self) -> Iterable[TemplateText|TemplateField]:
            ...

        raw_template: str

Note that the structural protocol APIs are substantially narrower than the full
implementation APIs defined for ``TemplateLiteralText``, ``TemplateLiteralField``,
and ``TemplateLiteral``.

Code that wants to accept interpolation templates and define specific handling for them
without introducing a dependency on the ``typing`` module, or restricting the code to
handling the concrete template literal types, should instead perform an attribute
existence check on ``raw_template``.

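The two checking styles can be compared using a third-party template type that has no
inheritance relationship with the native types. Since ``typing`` does not provide the
protocol today, the sketch defines ``InterpolationTemplateSketch`` locally as a cut-down
copy of the proposed ``InterpolationTemplate``; ``HypotheticalTemplate`` and
``is_template_like`` are likewise illustrative names:

.. code-block:: python

    from dataclasses import dataclass
    from typing import Any, Iterator, Protocol, runtime_checkable

    @runtime_checkable
    class InterpolationTemplateSketch(Protocol):
        # Cut-down local copy of the proposed typing.InterpolationTemplate protocol
        raw_template: str

        def __iter__(self) -> Iterator[Any]:
            ...

    @dataclass
    class HypotheticalTemplate:
        # Third-party template type: no inheritance from any native template class
        raw_template: str
        segments: tuple = ()

        def __iter__(self):
            return iter(self.segments)

    def is_template_like(obj) -> bool:
        # Duck-typing check recommended above: no typing import needed
        return hasattr(obj, "raw_template")

    template = HypotheticalTemplate("Hello {name}", ("Hello ", "world"))
    isinstance(template, InterpolationTemplateSketch)  # True (structural match)
    is_template_like("Hello {name}")                   # False (plain str)
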
Writing custom renderers
------------------------

Writing a custom renderer doesn't require any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly, either by calling the ``render()`` method with alternate
``render_template``, ``render_text``, and/or ``render_field`` implementations, or by
accessing the template's data attributes directly.

For example, the following function would render a template using objects'
``repr`` implementations rather than their native formatting support:

.. code-block:: python

    import operator

    def repr_format(template):
        def render_field(value, format_spec, conversion_spec):
            # Apply any standard conversion first, then format the repr of the result
            converted_value = operator.convert_field(value, conversion_spec)
            return format(repr(converted_value), format_spec)
        return template.render(render_field=render_field)

The custom renderer shown respects the conversion specifiers in the original template, but
it is also possible to ignore them and render the interpolated values directly:

.. code-block:: python

    def input_repr_format(template):
        def render_field(value, format_spec, __):
            # The conversion specifier (third argument) is deliberately ignored
            return format(repr(value), format_spec)
        return template.render(render_field=render_field)

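Because t-strings cannot be written in current Python, the sketch below exercises these
render hooks against hypothetical stand-ins (``TemplateStandIn`` and ``FieldStandIn``)
that mirror the ``render(render_template=..., render_text=..., render_field=...)``
behaviour described in this PEP. The default field renderer here ignores conversion
specifiers, a simplification forced by today's two-argument :func:`format`:

.. code-block:: python

    from typing import Any, NamedTuple

    class FieldStandIn(NamedTuple):
        # Hypothetical stand-in for TemplateLiteralField
        expr: str
        value: Any
        format_spec: str = ""
        conversion_spec: str = ""

    class TemplateStandIn:
        # Hypothetical stand-in for TemplateLiteral: literal text segments are plain str,
        # interpolated fields are FieldStandIn instances
        def __init__(self, raw_template, *segments):
            self.raw_template = raw_template
            self._segments = segments

        def __iter__(self):
            return iter(self._segments)

        def render(self, *, render_template="".join, render_text=str, render_field=None):
            if render_field is None:
                # Simplified default: today's format() has no conversion_spec parameter
                def render_field(value, format_spec, conversion_spec):
                    return format(value, format_spec)
            rendered_segments = []
            for segment in self._segments:
                if isinstance(segment, FieldStandIn):
                    # Pass (value, format_spec, conversion_spec), skipping expr
                    rendered_segments.append(render_field(*segment[1:]))
                else:
                    rendered_segments.append(render_text(segment))
            return render_template(rendered_segments)

    def input_repr_format(template):
        def render_field(value, format_spec, __):
            return format(repr(value), format_spec)
        return template.render(render_field=render_field)

    # Roughly what t"name={name}" might capture when name = "Alice"
    template = TemplateStandIn("name={name}", "name=", FieldStandIn("name", "Alice"))
    template.render()            # 'name=Alice'
    input_repr_format(template)  # "name='Alice'"

Passing ``render_template=tuple`` to the stand-in returns ``('name=', 'Alice')``, a small
example of a non-string rendering result.
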
When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in ``render_template``
callable. While this will still be a string for formatting related use cases, producing
non-string objects *is* permitted. For example, a custom SQL
template renderer could involve an ``sqlalchemy.sql.text`` call that produces an
`SQL Alchemy query object <http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#using-textual-sql>`__.
A subprocess invocation related template renderer could produce a string sequence suitable
for passing to ``subprocess.run``, or it could even call ``subprocess.run`` directly, and
return the result.

Non-strings may also be returned from ``render_text`` and ``render_field``, as long as
they are paired with a ``render_template`` implementation that expects that behaviour.

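A minimal sketch of a non-string result: a hypothetical ``sql_params`` renderer that keeps
literal text, replaces each field with a ``?`` placeholder, and returns a ``(query, params)``
tuple suitable for the standard :mod:`sqlite3` module. The segment list (plain ``str`` for
literal text, ``(expr, value)`` tuples for fields) is an assumed stand-in for iterating
over a template literal:

.. code-block:: python

    import sqlite3

    def sql_params(segments):
        # Hypothetical renderer: literal text is kept, field values become bound parameters
        query_parts = []
        params = []
        for segment in segments:
            if isinstance(segment, str):
                query_parts.append(segment)
            else:
                _expr, value = segment
                query_parts.append("?")
                params.append(value)
        return "".join(query_parts), tuple(params)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")

    # Roughly t"SELECT name FROM users WHERE name = {user_input}" with a hostile input
    user_input = "alice' OR '1'='1"
    query, params = sql_params(["SELECT name FROM users WHERE name = ", ("user_input", user_input)])
    rows = conn.execute(query, params).fetchall()  # [] (the input is treated as data)

    safe_rows = conn.execute(*sql_params(["SELECT name FROM users WHERE name = ", ("n", "alice")])).fetchall()

Placeholders only cover values; interpolated identifiers such as column or table names
would need separate validation or quoting in a real SQL renderer.
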
Custom renderers using the pattern matching style described in :pep:`750` are also supported:

.. code-block:: python

    # Use the structural typing protocols rather than the concrete implementation types
    from typing import InterpolationTemplate, TemplateText, TemplateField

    def greet(template: InterpolationTemplate) -> str:
        """Render an interpolation template using structural pattern matching."""
        result = []
        for segment in template:
            match segment:
                case TemplateText() as text_segment:
                    # The protocol only guarantees __str__, so convert explicitly
                    result.append(str(text_segment))
                case TemplateField() as field_segment:
                    result.append(str(field_segment).upper())
        return f"{''.join(result)}!"


Expression evaluation
---------------------

As with f-strings, the subexpressions that are extracted from the interpolation
template are evaluated in the context where the template literal
appears. This means the expression has full access to local, nonlocal and global variables.
Any valid Python expression can be used inside ``{}``, including
function and method calls.

Because the substitution expressions are evaluated where the string appears in
the source code, there are no additional security concerns related to the
contents of the expression itself, as you could have also just written the
same expression and used runtime field parsing:

.. code-block:: python-console

    >>> bar=10
    >>> def foo(data):
    ...     return data + 20
    ...
    >>> str(t'input={bar}, output={foo(bar)}')
    'input=10, output=30'

This is essentially equivalent to:

.. code-block:: python-console

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'


Handling code injection attacks
-------------------------------

The :pep:`498` formatted string syntax makes it potentially attractive to write
code like the following:

.. code-block:: python

    runquery(f"SELECT {column} FROM {table};")
    runcommand(f"cat {filename}")
    return_response(f"<html><body>{response.body}</body></html>")

These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
specific proposal in this PEP is designed to make it straightforward to write
use case specific renderers that take care of quoting interpolated values
appropriately for the relevant security context:

.. code-block:: python

    runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
    runcommand(sh(t"cat {filename}"))
    return_response(html(t"<html><body>{response.body}</body></html>"))

This PEP does not cover adding all such renderers to the standard library
immediately (though one for shell escaping is proposed), but rather proposes to ensure
that they can be readily provided by third party libraries, and potentially incorporated
into the standard library at a later date.

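As a rough illustration of what a third party ``html`` renderer might do, the sketch below
treats literal template text as trusted markup and escapes every interpolated value with
:func:`html.escape`. The name ``html_sketch`` is hypothetical, and the segment list (plain
``str`` for literal text, ``(value, format_spec)`` tuples for fields) is an assumed stand-in
for iterating over a template literal:

.. code-block:: python

    import html

    def html_sketch(segments):
        # Literal text is trusted as written; field values are formatted, then escaped
        parts = []
        for segment in segments:
            if isinstance(segment, str):
                parts.append(segment)
            else:
                value, format_spec = segment
                parts.append(html.escape(format(value, format_spec)))
        return "".join(parts)

    # Roughly html(t"<html><body>{body}</body></html>") with untrusted input
    body = "<script>alert(1)</script>"
    page = html_sketch(["<html><body>", (body, ""), "</body></html>"])
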
Over time, it is expected that APIs processing potentially dangerous string inputs may be
updated to accept interpolation templates natively, allowing problematic code examples to
be fixed simply by replacing the ``f`` string prefix with a ``t``:

.. code-block:: python

    runquery(t"SELECT {column} FROM {table};")
    runcommand(t"cat {filename}")
    return_response(t"<html><body>{response.body}</body></html>")

It is proposed that a renderer is included in the :mod:`shlex` module, aiming to offer a
more POSIX shell style experience for accessing external programs, without the significant
risks posed by running ``os.system`` or enabling the system shell when using the
``subprocess`` module APIs. This renderer will provide an interface for running external
programs inspired by that offered by the
`Julia programming language <https://docs.julialang.org/en/v1/manual/running-external-programs/>`__,
only with the backtick based ``\`cat $filename\``` syntax replaced by ``t"cat {filename}"``
style template literals. See more in the :ref:`pep-501-shlex-module` section.


Error handling
--------------

Either compile time or run time errors can occur when processing interpolation
expressions. Compile time errors are limited to those errors that can be
detected when parsing a template string into its component tuples. These
errors all raise :exc:`SyntaxError`.

Unmatched braces::

    >>> t'x={x'
      File "<stdin>", line 1
        t'x={x'
            ^
    SyntaxError: missing '}' in template literal expression

Invalid expressions::

    >>> t'x={!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a
template string before creating the template literal object. See :pep:`498`
for some examples.

Different renderers may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
details, which will be reported as runtime exceptions.


.. _pep-501-shlex-module:

Renderer for shell escaping added to :mod:`shlex`
-------------------------------------------------

As a reference implementation, a renderer for safe POSIX shell escaping can be added to
the :mod:`shlex` module. This renderer would be called ``sh`` and would be equivalent to
calling ``shlex.quote`` on each field value in the template literal.

Thus:

.. code-block:: python

    os.system(shlex.sh(t'cat {myfile}'))

would have the same behavior as:

.. code-block:: python

    os.system('cat ' + shlex.quote(myfile))

The implementation would be:

.. code-block:: python

    def sh(template: TemplateLiteral):
        # Defined inside the shlex module, so quote() here is shlex.quote()
        def render_field(value, format_spec, conversion_spec):
            field_text = format(value, format_spec, conversion_spec)
            return quote(field_text)
        return template.render(render_field=render_field)

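The same quoting behaviour can be approximated today with :func:`shlex.quote`. In this
sketch, ``sh_sketch`` is a hypothetical name, conversion specifiers are omitted (today's
:func:`format` does not accept them), and the segment list (plain ``str`` for literal text,
``(value, format_spec)`` tuples for fields) stands in for iterating over a template literal:

.. code-block:: python

    import shlex

    def sh_sketch(segments):
        # Literal command text is kept as written; each field value is shell-quoted
        parts = []
        for segment in segments:
            if isinstance(segment, str):
                parts.append(segment)
            else:
                value, format_spec = segment
                parts.append(shlex.quote(format(value, format_spec)))
        return "".join(parts)

    # Roughly shlex.sh(t"cat {myfile}") with a hostile file name
    myfile = "my file; rm -rf ~"
    command = sh_sketch(["cat ", (myfile, "")])  # "cat 'my file; rm -rf ~'"
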
The addition of ``shlex.sh`` will NOT change the existing admonishments in the
:mod:`subprocess` documentation that passing ``shell=True`` is best avoided, nor the
reference from the :func:`os.system` documentation to the higher level ``subprocess`` APIs.


Changes to subprocess module
----------------------------

With the additional renderer in the shlex module, and the addition of template literals,
the :mod:`subprocess` module can be changed to handle accepting template literals
as an additional input type to ``Popen``, as it already accepts a sequence, or a string,
with different behavior for each.

With the addition of template literals, :class:`subprocess.Popen` (and in turn, all its
higher level functions such as :func:`subprocess.run`) could accept strings in a safe way
(at least on :ref:`POSIX systems <pep-501-defer-non-posix-shells>`).

For example:

.. code-block:: python

    subprocess.run(t'cat {myfile}', shell=True)

would automatically use the ``shlex.sh`` renderer provided in this PEP. Therefore, using
``shlex`` inside a ``subprocess.run`` call like so:

.. code-block:: python

    subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)

would be redundant, as ``run`` would automatically render any template literals
through ``shlex.sh``.

Alternatively, when ``subprocess.Popen`` is run without ``shell=True``, it could still
offer a more ergonomic syntax. For example:

.. code-block:: python

    subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to:

.. code-block:: python

    subprocess.run(['cat', myfile, '--flag', value])

or, more accurately:

.. code-block:: python

    subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

It would do this by first using the ``shlex.sh`` renderer, as above, then using
``shlex.split`` on the result.

The implementation inside ``subprocess.Popen._execute_child`` would look like:

.. code-block:: python

    if hasattr(args, "raw_template"):
        import shlex
        if shell:
            args = [shlex.sh(args)]
        else:
            args = shlex.split(shlex.sh(args))

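The render-then-split behaviour for the ``shell=False`` case can be sketched with today's
:func:`shlex.quote` and :func:`shlex.split`. The function name ``template_to_argv`` is
hypothetical and not part of the proposed ``subprocess`` change, and the segment list
(plain ``str`` for literal text, ``(value, format_spec)`` tuples for fields) is an assumed
stand-in for iterating over a template literal:

.. code-block:: python

    import shlex

    def template_to_argv(segments):
        # Step 1: render with shell quoting applied to each field value
        rendered = "".join(
            segment if isinstance(segment, str) else shlex.quote(format(*segment))
            for segment in segments
        )
        # Step 2: split the rendered command into an argument list (POSIX rules)
        return shlex.split(rendered)

    # Roughly t"cat {myfile} --flag {value}"
    argv = template_to_argv(["cat ", ("my file.txt", ""), " --flag ", ("x; rm -rf ~", "")])
    # ['cat', 'my file.txt', '--flag', 'x; rm -rf ~']

Each field value survives as exactly one argument, however many spaces or shell
metacharacters it contains.
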
How to Teach This
=================

This PEP intentionally includes two standard renderers that will always be available in
teaching environments: the :func:`format` builtin and the new ``shlex.sh`` POSIX shell
renderer.

Together, these two renderers can be used to build an initial understanding of delayed
rendering on top of a student's initial introduction to string formatting with f-strings.
This initial understanding would have the goal of allowing students to *use* template
literals effectively, in combination with pre-existing template rendering functions.

For example, ``f"{'some text'}"``, ``f"{value}"``, ``f"{value!r}"``, and ``f"{callable()}"``
could all be introduced.

Those same operations could then be rewritten as ``format(t"{'some text'}")``,
``format(t"{value}")``, ``format(t"{value!r}")``, and ``format(t"{callable()}")`` to
illustrate the relationship between the eager rendering form and the delayed rendering
form.

The difference between "template definition time" (or "interpolation time") and
"template rendering time" can then be investigated further by storing the template literals
as local variables and looking at their representations separately from the results of the
``format`` calls. At this point, the ``t"{callable!()}"`` syntax can be introduced to
distinguish between field expressions that are called at template definition time and those
that are called at template rendering time.

Finally, the differences between the results of ``f"{'some text'}"``,
``format(t"{'some text'}")``, and ``shlex.sh(t"{'some text'}")`` could be explored to
illustrate the potential for differences between the default rendering function and custom
rendering functions.

Actually defining your own custom template rendering functions would then be a separate,
more advanced topic (similar to the way students are routinely taught to use decorators and
context managers well before they learn how to write their own custom ones).

:pep:`750` includes further ideas for teaching aspects of the delayed rendering topic.


Discussion
==========

Refer to :pep:`498` for previous discussion, as several of the points there
also apply to this PEP. :pep:`750`'s design discussions are also highly relevant,
as that PEP inspired several aspects of the current design.


Support for binary interpolation
--------------------------------

As f-strings don't handle byte strings, neither will t-strings.


Interoperability with str-only interfaces
-----------------------------------------

For interoperability with interfaces that only accept strings, interpolation
templates can still be prerendered with :func:`format`, rather than delegating the
rendering to the called function.

This reflects the key difference from :pep:`498`, which *always* eagerly applies
the default rendering, without any way to delegate the choice of renderer to
another section of the code.


Preserving the raw template string
----------------------------------

Earlier versions of this PEP failed to make the raw template string available
on the template literal. Retaining it makes it possible to provide a more
attractive template representation, as well as providing the ability to
precisely reconstruct the original string, including both the expression text
and the details of any eagerly rendered substitution fields in format specifiers.


Creating a rich object rather than a global name lookup
-------------------------------------------------------

Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
creating a new kind of object for later consumption by interpolation
functions. Creating a rich descriptive object with a useful default renderer
made it much easier to support customisation of the semantics of interpolation.


Building atop f-strings rather than replacing them
--------------------------------------------------

Earlier versions of this PEP attempted to serve as a complete substitute for
:pep:`498` (f-strings). With the acceptance of that PEP and the more recent :pep:`701`,
this PEP can instead build a more flexible delayed rendering capability
on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified a
number of aspects of the proposal in this PEP (such as how to handle substitution
fields in format specifiers).


Defining repetition and concatenation semantics
-----------------------------------------------

This PEP explicitly defines repetition and concatenation semantics for ``TemplateLiteral``
and ``TemplateLiteralText``. While not strictly necessary, defining these is expected
to make the types easier to work with in code that historically only supported regular
strings.


New conversion specifier for lazy field evaluation
--------------------------------------------------

The initially published version of :pep:`750` defaulted to lazy evaluation for all
interpolation fields. While it was subsequently updated to default to eager evaluation
(as happens for f-strings and this PEP), the discussions around the topic prompted the idea
of providing a way to indicate to rendering functions that the interpolated field value
should be called at rendering time rather than being used without modification.

Since :pep:`750` also deferred the processing of conversion specifiers until evaluation time,
the suggestion was put forward that invoking ``__call__`` without arguments could be seen
as similar to the existing conversion specifiers that invoke ``__repr__`` (``!a``, ``!r``)
or ``__str__`` (``!s``).

Accordingly, this PEP was updated to also make conversion specifier processing the
responsibility of rendering functions, and to introduce ``!()`` as a new conversion
specifier for lazy evaluation.

Adding :func:`!operator.convert_field` and updating the :func:`format` builtin was then
a matter of providing appropriate support to rendering function implementations that
wanted to accept the default conversion specifiers.


Allowing arbitrary conversion specifiers in custom renderers
------------------------------------------------------------

Accepting ``!()`` as a new conversion specifier necessarily requires updating the syntax
that the parser accepts for conversion specifiers (they are currently restricted to
identifiers). This then raised the question of whether t-string compilation should enforce
the additional restriction that f-string compilation imposes: that the conversion specifier
be exactly one of ``!a``, ``!r``, or ``!s``.

With t-strings already being updated to allow ``!()`` when compiled, it made sense to treat
conversion specifiers as relating to rendering functions in the same way that format
specifiers relate to the formatting of individual objects: aside from some characters that
are excluded for parsing reasons, they are otherwise free text fields with the meaning
decided by the consuming function or object. This reduces the temptation to introduce
renderer specific metaformatting into the template's format specifiers (since any
renderer specific information can be placed in the conversion specifier instead).


Only reserving a single new string prefix
-----------------------------------------

The primary difference between this PEP and :pep:`750` is that the latter aims to enable
the use of arbitrary string prefixes, rather than requiring the creation of template
literal instances that are then passed to other APIs. For example, PEP 750 would allow
the ``sh`` renderer described in this PEP to be used as ``sh"cat {somefile}"`` rather than
requiring the template literal to be created explicitly and then passed to a regular
function call (as in ``sh(t"cat {somefile}")``).

The main reason the PEP authors prefer the second spelling is that it makes it clearer
to a reader what is going on: a template literal instance is being created, and then
passed to a callable that knows how to do something useful with interpolation template
instances.

A `draft proposal <https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408/176>`__
from one of the :pep:`750` authors also suggests that static typecheckers will be able
to infer the use of particular domain specific languages just as readily from the form
that uses an explicit function call as they would be able to infer it from a directly
tagged string.

With the tagged string syntax at least arguably reducing clarity for human readers without
increasing the overall expressiveness of the construct, it seems reasonable to start with
the smallest viable proposal (a single new string prefix), and then revisit the potential
value of generalising to arbitrary prefixes in the future.

As a lesser, but still genuine, consideration, only using a single new string prefix for
this use case leaves open the possibility of defining alternate prefixes in the future that
still produce ``TemplateLiteral`` objects, but use a different syntax within the string to
define the interpolation fields (see the :ref:`i18n discussion <pep-501-defer-i18n>` below).


Deferring consideration of more concise delayed evaluation syntax
-----------------------------------------------------------------

During the discussions of delayed evaluation, ``{-> expr}`` was
`suggested <https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408/112>`__
as potential syntactic sugar for the already supported ``lambda`` based syntax:
``{(lambda: expr)}`` (the parentheses are required in the existing syntax to avoid
misinterpretation of the ``:`` character as indicating the start of the format specifier).

While adding such a spelling would complement the rendering time function call syntax
proposed in this PEP (that is, writing ``{-> expr!()}`` to evaluate arbitrary expressions
at rendering time), it is a topic that the PEP authors consider to be better left to a
future PEP if this PEP or :pep:`750` is accepted.


Deferring consideration of possible logging integration
-------------------------------------------------------

One of the challenges with the logging module has been that we have previously
been unable to devise a reasonable migration strategy away from the use of
printf-style formatting. While the logging module does allow formatters to specify the
use of :meth:`str.format` or :class:`string.Template` style substitution, it can be awkward
to ensure that messages written that way are only ever processed by log record formatters
that are expecting that syntax.

The runtime parsing and interpolation overhead for logging messages also poses a problem
for extensive logging of runtime events for monitoring purposes.

While beyond the scope of this initial PEP, template literal support
could potentially be added to the logging module's event reporting APIs,
permitting relevant details to be captured using forms like:

.. code-block:: python

    logging.debug(t"Event: {event}; Details: {data}")
    logging.critical(t"Error: {error}; Details: {data}")

Rather than the historical mod-formatting style:

.. code-block:: python

    logging.debug("Event: %s; Details: %s", event, data)
    logging.critical("Error: %s; Details: %s", error, data)

As the template literal is passed in as an ordinary argument, other
keyword arguments would also remain available:

.. code-block:: python

    logging.critical(t"Error: {error}; Details: {data}", exc_info=True)

The approach to standardising lazy field evaluation described in this PEP is
primarily based on the anticipated needs of this hypothetical integration into
the logging module:

.. code-block:: python

    logging.debug(t"Eager evaluation of {expensive_call()}")
    logging.debug(t"Lazy evaluation of {expensive_call!()}")

    logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
    logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")

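Until such an integration exists, the closest approximation with today's logging APIs
relies on ``%``-style arguments only being converted to strings when a record is actually
emitted. The sketch below uses a hypothetical ``LazyCall`` wrapper whose ``__str__``
performs the deferred call, roughly the behaviour a ``!()`` field could give a
template-aware logging renderer (an assumption about that hypothetical integration, not a
proposed API); ``expensive_call`` is a placeholder:

.. code-block:: python

    import io
    import logging

    calls = []

    def expensive_call():
        # Placeholder for an expensive computation; records each invocation
        calls.append("called")
        return "expensive result"

    class LazyCall:
        # Hypothetical helper: calls func only when the log message is formatted
        def __init__(self, func):
            self.func = func

        def __str__(self):
            return str(self.func())

    stream = io.StringIO()
    logger = logging.getLogger("pep501_lazy_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))

    logger.debug("Lazy evaluation of %s", LazyCall(expensive_call))  # disabled: never called
    logger.info("Lazy evaluation of %s", LazyCall(expensive_call))   # emitted: called once
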
It's an open question whether the definition of logging formatters would be updated to
support template strings, but if they were, the most likely way of defining fields which
should be :ref:`looked up on the log record <logrecord-attributes>` instead of being
interpreted eagerly is simply to escape them so they're available as part of the literal
text:

.. code-block:: python

    proc_id = get_process_id()
    formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")


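The escaping approach already works with today's f-strings combined with the
``style="{"`` option of :class:`logging.Formatter`: doubled braces survive rendering as
literal ``{asctime}``-style fields, while ``proc_id`` is interpolated eagerly. In this
minimal sketch, the process id value and the record contents are hypothetical:

.. code-block:: python

    import logging

    proc_id = 1234  # hypothetical stand-in for get_process_id()

    # Doubled braces become literal {asctime}, {name}, ... fields that logging's
    # str.format-based "{" style then looks up on each log record.
    fmt = f"{{asctime}}:{proc_id}:{{name}}:{{levelname}}:{{message}}"
    formatter = logging.Formatter(fmt, style="{")

    record = logging.LogRecord("app", logging.WARNING, "example.py", 1, "disk low", None, None)
    line = formatter.format(record)  # "<timestamp>:1234:app:WARNING:disk low"
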
.. _pep-501-defer-i18n:

Deferring consideration of possible use in i18n use cases
---------------------------------------------------------

The initial motivating use case for this PEP was providing a cleaner syntax
for i18n (internationalization) translation, as that requires access to the original
unmodified template. As such, it focused on compatibility with the substitution syntax
used in Python's :class:`string.Template` formatting and Mozilla's l20n project.

However, subsequent discussion revealed there are significant additional
considerations to be taken into account in the i18n use case, which don't
impact the simpler cases of handling interpolation into security sensitive
contexts (like HTML, system shells, and database queries), or producing
application debugging messages in the preferred language of the development
team (rather than the native language of end users).

Due to that realisation, the PEP was switched to use the :meth:`str.format` substitution
syntax originally defined in :pep:`3101` and subsequently used as the basis for :pep:`498`.

While it would theoretically be possible to update :class:`string.Template` to support
the creation of instances from native template literals, and to implement the structural
``typing.InterpolationTemplate`` protocol, the PEP authors have not identified any
practical benefit in doing so.

However, one significant benefit of the "only one string prefix" approach used in this PEP
is that while it generalises the existing f-string interpolation syntax to support delayed
rendering through t-strings, it doesn't imply that this should be the *only*
compiler-supported interpolation syntax that Python ever offers.

Most notably, it leaves the door open to an alternate "t$-string" syntax that would allow
``TemplateLiteral`` instances to be created using a :pep:`292` based interpolation syntax
rather than a :pep:`3101` based syntax::

    template = t$"Substitute $words and ${other_values} at runtime"

The only runtime distinction between templates created that way and templates created from
regular t-strings would be in the contents of their ``raw_template`` attributes.


.. _pep-501-defer-non-posix-shells:
|
|
|
|
Deferring escaped rendering support for non-POSIX shells
|
|
--------------------------------------------------------
|
|
|
|
:func:`shlex.quote` works by classifying the regex character set ``[\w@%+=:,./-]`` to be
|
|
safe, deeming all other characters to be unsafe, and hence requiring quoting of the string
|
|
containing them. The quoting mechanism used is then specific to the way that string quoting
|
|
works in POSIX shells, so it cannot be trusted when running a shell that doesn't follow
|
|
POSIX shell string quoting rules.
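The quoting rules described above can be checked directly against the standard :mod:`shlex` module. This minimal sketch covers three cases:

* strings made up only of safe characters pass through unchanged;
* other strings are wrapped in single quotes;
* an embedded single quote is spliced in as ``'"'"'``.

The sketch then uses :func:`shlex.split`, which applies POSIX-style parsing, as a stand-in for a POSIX shell. This confirms that the arguments used in the transcript below survive a quote-then-parse round trip. The other sample strings are arbitrary.

.. code-block:: python

    import shlex

    # Strings made only of characters in [\w@%+=:,./-] are returned unchanged
    assert shlex.quote("file-name_1.txt") == "file-name_1.txt"

    # Any other character (here, a space) causes the whole string to be single quoted
    assert shlex.quote("two words") == "'two words'"

    # An embedded single quote closes the quoting, is emitted as "'", then quoting resumes
    assert shlex.quote("it's") == "'it'\"'\"'s'"

    # Under POSIX parsing rules (as implemented by shlex.split), the quoted
    # arguments from the transcript round-trip exactly
    for arg in ["pwd", "; pwd", "'pwd'"]:
        command = "echo " + shlex.quote(arg)
        assert shlex.split(command) == ["echo", arg]

The round trip only demonstrates safety under POSIX rules. It says nothing about how a non-POSIX shell would interpret the same command text.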

For example, running ``subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True)`` is
safe when using a shell that follows POSIX quoting rules::

    $ cat > run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    $ python3 run_quoted.py pwd
    pwd
    $ python3 run_quoted.py '; pwd'
    ; pwd
    $ python3 run_quoted.py "'pwd'"
    'pwd'

but remains unsafe when the shell invoked from Python is ``cmd.exe`` (or PowerShell)::

    S:\> echo import sys, shlex, subprocess > run_quoted.py
    S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
    S:\> type run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    S:\> python3 run_quoted.py "echo OK"
    'echo OK'
    S:\> python3 run_quoted.py "'& echo Oh no!"
    ''"'"'
    Oh no!'

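The second transcript ends up running a second command. The sketch below shows why by modelling just one ``cmd.exe`` rule: an ``&`` that is not inside double quotes separates commands, while single quotes have no special meaning. The helper ``split_unquoted_ampersands`` is hypothetical and deliberately incomplete. It ignores ``^`` escapes, pipes, redirections and every other ``cmd.exe`` metacharacter. It only shows where the :func:`shlex.quote` output gets split and is not a description of how ``cmd.exe`` actually parses command lines.

.. code-block:: python

    import shlex


    def split_unquoted_ampersands(command: str) -> list[str]:
        """Hypothetical, heavily simplified model of one cmd.exe rule.

        Splits ``command`` at every ``&`` that is outside double quotes.
        Single quotes are treated as ordinary characters, and all other
        cmd.exe syntax (``^`` escapes, pipes, redirections...) is ignored.
        """
        parts: list[str] = []
        current: list[str] = []
        in_double_quotes = False
        for ch in command:
            if ch == '"':
                in_double_quotes = not in_double_quotes
            if ch == "&" and not in_double_quotes:
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
        parts.append("".join(current))
        return parts


    arg = "'& echo Oh no!"
    command = "echo " + shlex.quote(arg)

    # A POSIX shell sees a single echo command with one argument...
    print(shlex.split(command))  # ['echo', "'& echo Oh no!"]
    # ...but under the simplified cmd.exe rule the "&" starts a second command
    print(split_unquoted_ampersands(command))

In this model, the first piece is ``echo ''"'"'`` and the second is ``echo Oh no!'``, which matches the two output lines in the transcript.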

Resolving this standard library limitation is beyond the scope of this PEP.


Acknowledgements
================

* Eric V. Smith for creating :pep:`498` and demonstrating the feasibility of
  arbitrary expression substitution in string interpolation
* The authors of :pep:`750` for the substantial design improvements that tagged strings
  inspired for this PEP, their general advocacy for the value of language level delayed
  template rendering support, and their efforts to ensure that any native interpolation
  template support lays a strong foundation for future efforts in providing robust syntax
  highlighting and static type checking support for domain specific languages
* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
  exploring the feasibility of using this model of delayed rendering in i18n
  use cases (even though the ultimate conclusion was that it was a poor fit,
  at least for current approaches to i18n in Python)

References
==========

* `%-formatting
  <https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting>`_

* `str.format
  <https://docs.python.org/3/library/string.html#formatstrings>`_

* `string.Template documentation
  <https://docs.python.org/3/library/string.html#template-strings>`_

* :pep:`215`: String Interpolation

* :pep:`292`: Simpler String Substitutions

* :pep:`3101`: Advanced String Formatting

* :pep:`498`: Literal string formatting

* :pep:`675`: Arbitrary Literal String Type

* :pep:`701`: Syntactic formalization of f-strings

* `FormattableString and C# native string interpolation
  <https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated>`_

* `IFormattable interface in C# (see remarks for globalization notes)
  <https://docs.microsoft.com/en-us/dotnet/api/system.iformattable>`_

* `TemplateLiterals in Javascript
  <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals>`_

* `Running external commands in Julia
  <https://docs.julialang.org/en/v1/manual/running-external-programs/>`_

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.