560 lines
22 KiB
Plaintext
560 lines
22 KiB
Plaintext
PEP: 501
|
||
Title: General purpose string interpolation
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Nick Coghlan <ncoghlan@gmail.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 08-Aug-2015
|
||
Python-Version: 3.6
|
||
Post-History: 08-Aug-2015, 23-Aug-2015
|
||
|
||
Abstract
|
||
========
|
||
|
||
PEP 498 proposes new syntactic support for string interpolation that is
|
||
transparent to the compiler, allow name references from the interpolation
|
||
operation full access to containing namespaces (as with any other expression),
|
||
rather than being limited to explicitly name references.
|
||
|
||
However, it only offers this capability for string formatting, making it likely
|
||
we will see code like the following::
|
||
|
||
os.system(f"echo {user_message}")
|
||
|
||
This kind of code is superficially elegant, but poses a significant problem
|
||
if the interpolated value ``user_message`` is in fact provided by a user: it's
|
||
an opening for a form of code injection attack, where the supplied user data
|
||
has not been properly escaped before being passed to the ``os.system`` call.
|
||
|
||
To address that problem (and a number of other concerns), this PEP proposes an
|
||
alternative approach to compiler supported interpolation, using ``i`` (for
|
||
"interpolation") as the new string prefix and a substitution syntax
|
||
inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
|
||
adding a 4th substitution variable syntax to Python.
|
||
|
||
Some possible examples of the proposed syntax::
|
||
|
||
msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
||
print(_(i"This is a $translated $message"))
|
||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
||
myquery = sql(i"SELECT $column FROM $table;")
|
||
mycommand = sh(i"cat $filename")
|
||
mypage = html(i"<html><body>${response.body}</body></html>")
|
||
callable = defer(i"$x + $y")
|
||
|
||
Summary of differences from PEP 498
|
||
===================================
|
||
|
||
The key differences of this proposal relative to PEP 498:
|
||
|
||
* "i" (interpolation template) prefix rather than "f" (formatted string)
|
||
* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
|
||
* interpolation templates are created at runtime as a new kind of object
|
||
* the default rendering is invoked by calling ``str()`` on a template object
|
||
rather than automatically
|
||
|
||
Proposal
|
||
========
|
||
|
||
This PEP proposes the introduction of a new string prefix that declares the
|
||
string to be an interpolation template rather than an ordinary string::
|
||
|
||
template = i"Substitute $names and ${expressions} at runtime"
|
||
|
||
This would be effectively interpreted as::
|
||
|
||
_raw_template = "Substitute $names and ${expressions} at runtime"
|
||
_parsed_fields = (
|
||
("Substitute ", 0, "names", "", ""),
|
||
(" and ", 1, "expressions", "", ""),
|
||
(" at runtime", None, None, None, None),
|
||
)
|
||
_field_values = (names, expressions)
|
||
template = types.InterpolationTemplate(_raw_template,
|
||
_parsed_fields,
|
||
_field_values)
|
||
|
||
The ``__str__`` method on ``types.InterpolationTemplate`` would then implementat
|
||
the following ``str.format`` inspired semantics::
|
||
|
||
>>> import datetime
|
||
>>> name = 'Jane'
|
||
>>> age = 50
|
||
>>> anniversary = datetime.date(1991, 10, 12)
|
||
>>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
||
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
|
||
>>> str(i'She said her name is ${name!r}.')
|
||
"She said her name is 'Jane'."
|
||
|
||
The interpolation template prefix can be combined with single-quoted,
|
||
double-quoted and triple quoted strings, including raw strings. It does not
|
||
support combination with bytes literals.
|
||
|
||
This PEP does not propose to remove or deprecate any of the existing
|
||
string formatting mechanisms, as those will remain valuable when formatting
|
||
strings that are not present directly in the source code of the application.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
PEP 498 makes interpolating values into strings with full access to Python's
|
||
lexical namespace semantics simpler, but it does so at the cost of creating a
|
||
situation where interpolating values into sensitive targets like SQL queries,
|
||
shell commands and HTML templates will enjoy a much cleaner syntax when handled
|
||
without regard for code injection attacks than when they are handled correctly.
|
||
It also has the effect of introducing yet another syntax for substitution
|
||
expressions into Python, when we already have 3 (``str.format``,
|
||
``bytes.__mod__`` and ``string.Template``)
|
||
|
||
This PEP proposes to handle the former issue by deferring the actual rendering
|
||
of the interpolation template to its ``__str__`` method (allow the use of
|
||
other template renderers by passing the template around as an object), and the
|
||
latter by adopting the ``string.Template`` substitution syntax defined in PEP
|
||
292.
|
||
|
||
The substitution syntax devised for PEP 292 is deliberately simple so that the
|
||
template strings can be extracted into an i18n message catalog, and passed to
|
||
translators who may not themselves be developers. For these use cases, it is
|
||
important that the interpolation syntax be as simple as possible, as the
|
||
translators are responsible for preserving the substition markers, even as
|
||
they translate the surrounding text. The PEP 292 syntax is also a common mesage
|
||
catalog syntax already supporting by many commercial software translation
|
||
support tools.
|
||
|
||
PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
|
||
introduced for general purpose string formatting in PEP 3101, so this PEP adds
|
||
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
|
||
tools the option of rejecting usage of that more advanced syntax at runtime,
|
||
rather than categorically rejecting it at compile time. The proposed permitted
|
||
expressions, conversion specifiers, and format specifiers inside ``${ref}`` are
|
||
exactly as defined for ``{ref}`` substituion in PEP 498.
|
||
|
||
The specific proposal in this PEP is also deliberately close in both syntax
|
||
and semantics to the general purpose interpolation syntax introduced to
|
||
JavaScript in ES6, as we can reasonably expect a great many Python developers
|
||
to be regularly switching back and forth between user interface code written in
|
||
JavaScript and core application code written in Python.
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
This PEP proposes the introduction of ``i`` as a new string prefix that
|
||
results in the creation of an instance of a new type,
|
||
``types.InterpolationTemplate``.
|
||
|
||
Interpolation template literals are Unicode strings (bytes literals are not
|
||
permitted), and string literal concatenation operates as normal, with the
|
||
entire combined literal forming the interpolation template.
|
||
|
||
The template string is parsed into literals and expressions. Expressions
|
||
appear as either identifiers prefixed with a single "$" character, or
|
||
surrounded be a leading '${' and a trailing '}. The parts of the format string
|
||
that are not expressions are separated out as string literals.
|
||
|
||
While parsing the string, any doubled ``$$`` is replaced with a single ``$``
|
||
and is considered part of the literal text, rather than as introducing an
|
||
expression.
|
||
|
||
These components are then organised into an instance of a new type with the
|
||
following semantics::
|
||
|
||
class InterpolationTemplate:
|
||
__slots__ = ("raw_template", "parsed_fields", "field_values")
|
||
|
||
def __new__(cls, raw_template, parsed_fields, field_values):
|
||
self = super().__new__()
|
||
self.raw_template = raw_template
|
||
self.parsed_fields = parsed_fields
|
||
self.field_values = field_values
|
||
return self
|
||
|
||
def __iter__(self):
|
||
# Support iterable unpacking
|
||
yield self.raw_template
|
||
yield self.parsed_fields
|
||
yield self.field_values
|
||
|
||
def __repr__(self):
|
||
return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
|
||
"at ${id(self):#x}>")
|
||
|
||
def __str__(self):
|
||
# See definition of the default template rendering below
|
||
|
||
The result of the interpolation template expression is an instance of this
|
||
type, rather than an already rendered string - default rendering only takes
|
||
place when the instance's ``__str__`` method is called.
|
||
|
||
The format of the parsed fields tuple is inspired by the interface of
|
||
``string.Formatter.parse``, and consists of a series of 5-tuples each
|
||
containing:
|
||
|
||
* a leading string literal (may be the empty string)
|
||
* the substitution field position (zero-based enumeration)
|
||
* the substitution expression text
|
||
* the substitution conversion specifier (as defined by str.format)
|
||
* the substitution format specifier (as defined by str.format)
|
||
|
||
This field ordering is defined such that reading the parsed field tuples from
|
||
left to right will have all the subcomponents displayed in the same order as
|
||
they appear in the original template string.
|
||
|
||
For ease of access the sequence elements will be available as attributes in
|
||
addition to being available by position:
|
||
|
||
* ``leading_text``
|
||
* ``field_position``
|
||
* ``expression``
|
||
* ``conversion``
|
||
* ``format``
|
||
|
||
The expression text is simply the text of the substitution expression, as it
|
||
appeared in the original string, but without the leading and/or surrounding
|
||
expression markers. The conversion specifier and format specifier are separated
|
||
from the substition expression by ``!`` and ``:`` as defined for ``str.format``.
|
||
|
||
If a given substition field has no leading literal section, conversion specifier
|
||
or format specifier, then the corresponding elements in the tuple are the
|
||
empty string. If the final part of the string has no trailing substitution
|
||
field, then the field position, field expression, conversion specifier and
|
||
format specifier will all be ``None``.
|
||
|
||
The substitution field values tuple is created by evaluating the interpolated
|
||
expressions in the exact runtime context where the interpolation expression
|
||
appears in the source code.
|
||
|
||
For the following example interpolation template::
|
||
|
||
i'abc${expr1:spec1}${expr2!r:spec2}def${expr3:!s}ghi $ident $$jkl'
|
||
|
||
the parsed fields tuple would be::
|
||
|
||
(
|
||
('abc', 0, 'expr1', '', 'spec1'),
|
||
('', 1, 'expr2', 'r', 'spec2'),
|
||
(def', 2, 'expr3', 's', ''),
|
||
('ghi', 3, 'ident', '', ''),
|
||
('$jkl', None, None, None, None)
|
||
)
|
||
|
||
While the field values tuple would be::
|
||
|
||
(expr1, expr2, expr3, ident)
|
||
|
||
The parsed fields tuple can be constant folded at compile time, while the
|
||
expression values tuple will always need to be constructed at runtime.
|
||
|
||
The ``InterpolationTemplate.__str__`` implementation would have the following
|
||
semantics, with field processing being defined in terms of the ``format``
|
||
builtin and ``str.format`` conversion specifiers::
|
||
|
||
_converter = string.Formatter().convert_field
|
||
|
||
def __str__(self):
|
||
raw_template, fields, values = self
|
||
template_parts = []
|
||
for leading_text, field_num, expr, conversion, format_spec in fields:
|
||
template_parts.append(leading_text)
|
||
if field_num is not None:
|
||
value = values[field_num]
|
||
if conversion:
|
||
value = _converter(value, conversion)
|
||
field_text = format(value, format_spec)
|
||
template_parts.append(field_str)
|
||
return "".join(template_parts)
|
||
|
||
Writing custom interpolators
|
||
----------------------------
|
||
|
||
Writing a custom interpolator doesn't requiring any special syntax. Instead,
|
||
custom interpolators are ordinary callables that process an interpolation
|
||
template directly based on the ``raw_template``, ``parsed_fields`` and
|
||
``field_values`` attributes, rather than relying on the default rendered.
|
||
|
||
|
||
Expression evaluation
|
||
---------------------
|
||
|
||
The subexpressions that are extracted from the interpolation expression are
|
||
evaluated in the context where the interpolation expression appears. This means
|
||
the expression has full access to local, nonlocal and global variables. Any
|
||
valid Python expression can be used inside ``${}``, including function and
|
||
method calls. References without the surrounding braces are limited to looking
|
||
up single identifiers.
|
||
|
||
Because the substitution expressions are evaluated where the string appears in
|
||
the source code, there are no additional security concerns related to the
|
||
contents of the expression itself, as you could have also just written the
|
||
same expression and used runtime field parsing::
|
||
|
||
>>> bar=10
|
||
>>> def foo(data):
|
||
... return data + 20
|
||
...
|
||
>>> str(i'input=$bar, output=${foo(bar)}')
|
||
'input=10, output=30'
|
||
|
||
Is essentially equivalent to::
|
||
|
||
>>> 'input={}, output={}'.format(bar, foo(bar))
|
||
'input=10, output=30'
|
||
|
||
Handling code injection attacks
|
||
-------------------------------
|
||
|
||
The proposed interpolation syntax makes it potentially attractive to write
|
||
code like the following::
|
||
|
||
myquery = str(i"SELECT $column FROM $table;")
|
||
mycommand = str(i"cat $filename")
|
||
mypage = str(i"<html><body>${response.body}</body></html>")
|
||
|
||
These all represent potential vectors for code injection attacks, if any of the
|
||
variables being interpolated happen to come from an untrusted source. The
|
||
specific proposal in this PEP is designed to make it straightforward to write
|
||
use case specific interpolators that take care of quoting interpolated values
|
||
appropriately for the relevant security context::
|
||
|
||
myquery = sql(i"SELECT $column FROM $table;")
|
||
mycommand = sh(i"cat $filename")
|
||
mypage = html(i"<html><body>${response.body}</body></html>")
|
||
|
||
This PEP does not cover adding such interpolators to the standard library,
|
||
but instead ensures they can be readily provided by third party libraries.
|
||
|
||
(Although it's tempting to propose adding InterpolationTemplate support at
|
||
least to ``subprocess.call``, ``subprocess.check_call`` and
|
||
``subprocess.check_output``)
|
||
|
||
Format and conversion specifiers
|
||
--------------------------------
|
||
|
||
Aside from separating them out from the substitution expression, format and
|
||
conversion specifiers are otherwise treated as opaque strings by the
|
||
interpolation template parser - assigning semantics to those (or, alternatively,
|
||
prohibiting their use) is handled at runtime by the specified interpolator.
|
||
|
||
Error handling
|
||
--------------
|
||
|
||
Either compile time or run time errors can occur when processing interpolation
|
||
expressions. Compile time errors are limited to those errors that can be
|
||
detected when parsing a template string into its component tuples. These
|
||
errors all raise SyntaxError.
|
||
|
||
Unmatched braces::
|
||
|
||
>>> i'x=${x'
|
||
File "<stdin>", line 1
|
||
SyntaxError: missing '}' in interpolation expression
|
||
|
||
Invalid expressions::
|
||
|
||
>>> i'x=${!x}'
|
||
File "<fstring>", line 1
|
||
!x
|
||
^
|
||
SyntaxError: invalid syntax
|
||
|
||
Run time errors occur when evaluating the expressions inside a
|
||
template string before creating the interpolation template object. See PEP 498
|
||
for some examples.
|
||
|
||
Different interpolators may also impose additional runtime
|
||
constraints on acceptable interpolated expressions and other formatting
|
||
details, which will be reported as runtime exceptions.
|
||
|
||
|
||
Internationalising interpolated strings
|
||
=======================================
|
||
|
||
Since this PEP derives its interpolation syntax from the internationalisation
|
||
focused PEP 292, it's worth considering the potential implications this PEP
|
||
may have for the internationalisation use case.
|
||
|
||
Internationalisation enters the picture by writing a custom interpolator that
|
||
performs internationalisation. For example, the following implementation
|
||
would delegate interpolation calls to ``string.Template``::
|
||
|
||
def i18n(template):
|
||
# A real implementation would also handle normal strings
|
||
raw_template, fields, values = template
|
||
translated = gettext.gettext(raw_template)
|
||
value_map = _build_interpolation_map(fields, values)
|
||
return string.Template(translated).safe_substitute(value_map)
|
||
|
||
def _build_interpolation_map(fields, values):
|
||
field_values = {}
|
||
for literal_text, field_num, expr, conversion, format_spec in fields:
|
||
assert expr.isidentifier() and not conversion and not format_spec
|
||
if field_num is not None:
|
||
field_values[expr] = values[field_num]
|
||
return field_values
|
||
|
||
And could then be invoked as::
|
||
|
||
# _ = i18n at top of module or injected into the builtins module
|
||
print(_(i"This is a $translated $message"))
|
||
|
||
Any actual i18n implementation would need to address other issues (most notably
|
||
message catalog extraction), but this gives the general idea of what might be
|
||
possible.
|
||
|
||
It's also worth noting that one of the benefits of the ``$`` based substitution
|
||
syntax in this PEP is its compatibility with Mozilla's
|
||
`l20n syntax <http://l20n.org/>`__, which uses ``{{ name }}`` for global
|
||
substitution, and ``{{ $user }}`` for local context substitution.
|
||
|
||
With the syntax in this PEP, an l20n interpolator could be written as::
|
||
|
||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
||
|
||
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
|
||
catalog lookups using PEP 498's semantics), the necessary brace escaping would
|
||
make the string look like this in order to interpolate the user variable
|
||
while preserving all of the expected braces::
|
||
|
||
locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"
|
||
|
||
|
||
Possible integration with the logging module
|
||
============================================
|
||
|
||
One of the challenges with the logging module has been that previously been
|
||
unable to devise a reasonable migration strategy away from the use of
|
||
printf-style formatting. The runtime parsing and interpolation overhead for
|
||
logging messages also poses a problem for extensive logging of runtime events
|
||
for monitoring purposes.
|
||
|
||
While beyond the scope of this initial PEP, interpolation template support
|
||
could potentially be added to the logging module's event reporting APIs,
|
||
permitting relevant details to be captured using forms like::
|
||
|
||
logging.debug(i"Event: $event; Details: $data")
|
||
logging.critical(i"Error: $error; Details: $data")
|
||
|
||
As the interpolation template is passed in as an ordinary argument, other
|
||
keyword arguments also remain available::
|
||
|
||
logging.critical(i"Error: $error; Details: $data", exc_info=True)
|
||
|
||
Discussion
|
||
==========
|
||
|
||
Refer to PEP 498 for additional discussion, as several of the points there
|
||
also apply to this PEP.
|
||
|
||
Deferring support for binary interpolation
|
||
------------------------------------------
|
||
|
||
Supporting binary interpolation with this syntax would be relatively
|
||
straightforward (the elements in the parsed fields tuple would just be
|
||
byte strings rather than text strings, and the default renderer would be
|
||
markedly less useful), but poses a signficant likelihood of producing
|
||
confusing type errors when a text interpolator was presented with
|
||
binary input.
|
||
|
||
Since the proposed operator is useful without binary interpolation support, and
|
||
such support can be readily added later, further consideration of binary
|
||
interpolation is considered out of scope for the current PEP.
|
||
|
||
Interoperability with str-only interfaces
|
||
-----------------------------------------
|
||
|
||
For interoperability with interfaces that only accept strings, interpolation
|
||
templates can be prerendered with ``str``, rather than delegating the rendering
|
||
to the called function.
|
||
|
||
This reflects the key difference from PEP 498, which *always* eagerly applies
|
||
the default rendering, without any convenient way to decide to do something
|
||
different.
|
||
|
||
Preserving the raw template string
|
||
----------------------------------
|
||
|
||
Earlier versions of this PEP failed to make the raw template string available
|
||
to interpolators. This greatly complicated the i18n example, as it needed to
|
||
reconstruct the original template to pass to the message catalog lookup.
|
||
|
||
Creating a rich object rather than a global name lookup
|
||
-------------------------------------------------------
|
||
|
||
Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
|
||
a creating a new kind of object for later consumption by interpolation
|
||
functions. Creating a rich descriptive object with a useful default renderer
|
||
made it much easier to support customisation of the semantics of interpolation.
|
||
|
||
Relative order of conversion and format specifier in parsed fields
|
||
------------------------------------------------------------------
|
||
|
||
The relative order of the conversion specifier and the format specifier in the
|
||
substitution field 5-tuple is defined to match the order they appear in the
|
||
format string, which is unfortunately the inverse of the way they appear in the
|
||
``string.Formatter.parse`` 4-tuple.
|
||
|
||
I consider this a design defect in ``string.Formatter.parse``, so I think it's
|
||
worth fixing it in for the customer interpolator API, since the tuple already
|
||
has other differences (like including both the field position number *and* the
|
||
text of the expression).
|
||
|
||
This PEP also makes the parsed field attributes available by name, so it's
|
||
possible to write interpolators without caring about the precise field order
|
||
at all.
|
||
|
||
|
||
Acknowledgements
|
||
================
|
||
|
||
* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
|
||
arbitrary expression substitution in string interpolation
|
||
* Barry Warsaw for the string.Template syntax defined in PEP 292
|
||
* Armin Ronacher for pointing me towards Mozilla's l20n project
|
||
* Mike Miller for his survey of programming language interpolation syntaxes in
|
||
PEP (TBD)
|
||
|
||
References
|
||
==========
|
||
|
||
.. [#] %-formatting
|
||
(https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
|
||
|
||
.. [#] str.format
|
||
(https://docs.python.org/3/library/string.html#formatstrings)
|
||
|
||
.. [#] string.Template documentation
|
||
(https://docs.python.org/3/library/string.html#template-strings)
|
||
|
||
.. [#] PEP 215: String Interpolation
|
||
(https://www.python.org/dev/peps/pep-0215/)
|
||
|
||
.. [#] PEP 292: Simpler String Substitutions
|
||
(https://www.python.org/dev/peps/pep-0292/)
|
||
|
||
.. [#] PEP 3101: Advanced String Formatting
|
||
(https://www.python.org/dev/peps/pep-3101/)
|
||
|
||
.. [#] PEP 498: Literal string formatting
|
||
(https://www.python.org/dev/peps/pep-0498/)
|
||
|
||
.. [#] string.Formatter.parse
|
||
(https://docs.python.org/3/library/string.html#string.Formatter.parse)
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|