PEP 501: Build on 498 instead of competing
parent fbe3070944
commit 651a74028d
565
pep-0501.txt
|
@ -6,9 +6,10 @@ Author: Nick Coghlan <ncoghlan@gmail.com>
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
|
Requires: 498
|
||||||
Created: 08-Aug-2015
|
Created: 08-Aug-2015
|
||||||
Python-Version: 3.6
|
Python-Version: 3.6
|
||||||
Post-History: 08-Aug-2015, 23-Aug-2015
|
Post-History: 08-Aug-2015, 23-Aug-2015, 30-Aug-2015
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
@ -16,44 +17,53 @@ Abstract
|
||||||
PEP 498 proposes new syntactic support for string interpolation that is
|
PEP 498 proposes new syntactic support for string interpolation that is
|
||||||
transparent to the compiler, allowing name references from the interpolation
|
transparent to the compiler, allowing name references from the interpolation
|
||||||
operation full access to containing namespaces (as with any other expression),
|
operation full access to containing namespaces (as with any other expression),
|
||||||
rather than being limited to explicitly name references.
|
rather than being limited to explicit name references. These are referred
|
||||||
|
to in the PEP as "f-strings" (a mnemonic for "formatted strings").
|
||||||
|
|
||||||
However, it only offers this capability for string formatting, making it likely
|
However, it only offers this capability for string formatting, making it likely
|
||||||
we will see code like the following::
|
we will see code like the following::
|
||||||
|
|
||||||
os.system(f"echo {user_message}")
|
os.system(f"echo {message_from_user}")
|
||||||
|
|
||||||
This kind of code is superficially elegant, but poses a significant problem
|
This kind of code is superficially elegant, but poses a significant problem
|
||||||
if the interpolated value ``user_message`` is in fact provided by a user: it's
|
if the interpolated value ``message_from_user`` is in fact provided by an
|
||||||
an opening for a form of code injection attack, where the supplied user data
|
untrusted user: it's an opening for a form of code injection attack, where
|
||||||
has not been properly escaped before being passed to the ``os.system`` call.
|
the supplied user data has not been properly escaped before being passed to
|
||||||
|
the ``os.system`` call.
|
||||||
|
|
||||||
To address that problem (and a number of other concerns), this PEP proposes an
|
To address that problem (and a number of other concerns), this PEP proposes
|
||||||
alternative approach to compiler supported interpolation, using ``i`` (for
|
the complementary introduction of "i-strings" (a mnemonic for "interpolation
|
||||||
"interpolation") as the new string prefix and a substitution syntax
|
template strings"), where ``f"Message with {data}"`` would produce the same
|
||||||
inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
|
result as ``format(i"Message with {data}")``.
|
||||||
adding a 4th substitution variable syntax to Python.
|
|
||||||
|
|
||||||
Some possible examples of the proposed syntax::
|
Some possible examples of the proposed syntax::
|
||||||
|
|
||||||
msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
mycommand = sh(i"cat {filename}")
|
||||||
print(_(i"This is a $translated $message"))
|
myquery = sql(i"SELECT {column} FROM {table};")
|
||||||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
myresponse = html(i"<html><body>{response.body}</body></html>")
|
||||||
myquery = sql(i"SELECT $column FROM $table;")
|
logging.debug(i"Message with {detailed} {debugging} {info}")
|
||||||
mycommand = sh(i"cat $filename")
|
|
||||||
mypage = html(i"<html><body>${response.body}</body></html>")
|
|
||||||
callable = defer(i"$x + $y")
|
|
||||||
|
|
||||||
Summary of differences from PEP 498
|
Summary of differences from PEP 498
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
The key differences of this proposal relative to PEP 498:
|
The key additions this proposal makes relative to PEP 498:
|
||||||
|
|
||||||
* "i" (interpolation template) prefix rather than "f" (formatted string)
|
* the "i" (interpolation template) prefix indicates delayed rendering, but
|
||||||
* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
|
otherwise uses the same syntax and semantics as formatted strings
|
||||||
* interpolation templates are created at runtime as a new kind of object
|
* interpolation templates are available at runtime as a new kind of object
|
||||||
* the default rendering is invoked by calling ``str()`` on a template object
|
(``types.InterpolationTemplate``)
|
||||||
rather than automatically
|
* the default rendering used by formatted strings is invoked on an
|
||||||
|
interpolation template object by calling ``format(template)`` rather than
|
||||||
|
implicitly
|
||||||
|
* while f-string ``f"Message {here}"`` would be *semantically* equivalent to
|
||||||
|
``format(i"Message {here}")``, it is expected that the explicit syntax would
|
||||||
|
avoid the runtime overhead of using the delayed rendering machinery
|
||||||
|
|
||||||
|
NOTE: This proposal spells out a draft API for ``types.InterpolationTemplate``.
|
||||||
|
The precise details of the structures and methods exposed by this type would
|
||||||
|
be informed by the reference implementation of PEP 498, so it makes sense to
|
||||||
|
gain experience with that as an internal API before locking down a public API
|
||||||
|
(if this extension proposal is accepted).
|
||||||
|
|
||||||
Proposal
|
Proposal
|
||||||
========
|
========
|
||||||
|
@ -61,38 +71,39 @@ Proposal
|
||||||
This PEP proposes the introduction of a new string prefix that declares the
|
This PEP proposes the introduction of a new string prefix that declares the
|
||||||
string to be an interpolation template rather than an ordinary string::
|
string to be an interpolation template rather than an ordinary string::
|
||||||
|
|
||||||
template = i"Substitute $names and ${expressions} at runtime"
|
template = i"Substitute {names} and {expressions()} at runtime"
|
||||||
|
|
||||||
This would be effectively interpreted as::
|
This would be effectively interpreted as::
|
||||||
|
|
||||||
_raw_template = "Substitute $names and ${expressions} at runtime"
|
_raw_template = "Substitute {names} and {expressions()} at runtime"
|
||||||
_parsed_fields = (
|
_parsed_template = (
|
||||||
("Substitute ", 0, "names", "", ""),
|
("Substitute ", "names"),
|
||||||
(" and ", 1, "expressions", "", ""),
|
(" and ", "expressions()"),
|
||||||
(" at runtime", None, None, None, None),
|
(" at runtime", None),
|
||||||
)
|
)
|
||||||
_field_values = (names, expressions)
|
_field_values = (names, expressions())
|
||||||
|
_format_specifiers = (f"", f"")
|
||||||
template = types.InterpolationTemplate(_raw_template,
|
template = types.InterpolationTemplate(_raw_template,
|
||||||
_parsed_fields,
|
_parsed_template,
|
||||||
_field_values)
|
_field_values,
|
||||||
|
_format_specifiers)
|
||||||
|
|
||||||
The ``__str__`` method on ``types.InterpolationTemplate`` would then implementat
|
The ``__format__`` method on ``types.InterpolationTemplate`` would then
|
||||||
the following ``str.format`` inspired semantics::
|
implement the following ``str.format`` inspired semantics::
|
||||||
|
|
||||||
>>> import datetime
|
>>> import datetime
|
||||||
>>> name = 'Jane'
|
>>> name = 'Jane'
|
||||||
>>> age = 50
|
>>> age = 50
|
||||||
>>> anniversary = datetime.date(1991, 10, 12)
|
>>> anniversary = datetime.date(1991, 10, 12)
|
||||||
>>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
>>> format(i'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
|
||||||
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
|
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
|
||||||
>>> str(i'She said her name is ${name!r}.')
|
>>> format(i'She said her name is {repr(name)}.')
|
||||||
"She said her name is 'Jane'."
|
"She said her name is 'Jane'."
|
||||||
|
|
||||||
The interpolation template prefix can be combined with single-quoted,
|
As with formatted strings, the interpolation template prefix can be combined with single-quoted, double-quoted and triple quoted strings, including raw strings.
|
||||||
double-quoted and triple quoted strings, including raw strings. It does not
|
It does not support combination with bytes literals.
|
||||||
support combination with bytes literals.
|
|
||||||
|
|
||||||
This PEP does not propose to remove or deprecate any of the existing
|
Similarly, this PEP does not propose to remove or deprecate any of the existing
|
||||||
string formatting mechanisms, as those will remain valuable when formatting
|
string formatting mechanisms, as those will remain valuable when formatting
|
||||||
strings that are not present directly in the source code of the application.
|
strings that are not present directly in the source code of the application.
|
||||||
|
|
||||||
|
@ -105,38 +116,15 @@ lexical namespace semantics simpler, but it does so at the cost of creating a
|
||||||
situation where interpolating values into sensitive targets like SQL queries,
|
situation where interpolating values into sensitive targets like SQL queries,
|
||||||
shell commands and HTML templates will enjoy a much cleaner syntax when handled
|
shell commands and HTML templates will enjoy a much cleaner syntax when handled
|
||||||
without regard for code injection attacks than when they are handled correctly.
|
without regard for code injection attacks than when they are handled correctly.
|
||||||
It also has the effect of introducing yet another syntax for substitution
|
|
||||||
expressions into Python, when we already have 3 (``str.format``,
|
|
||||||
``bytes.__mod__`` and ``string.Template``)
|
|
||||||
|
|
||||||
This PEP proposes to handle the former issue by deferring the actual rendering
|
This PEP proposes to provide the option of delaying the actual rendering
|
||||||
of the interpolation template to its ``__str__`` method (allow the use of
|
of an interpolation template to its ``__format__`` method, allowing the use of
|
||||||
other template renderers by passing the template around as an object), and the
|
other template renderers by passing the template around as a first class object.
|
||||||
latter by adopting the ``string.Template`` substitution syntax defined in PEP
|
|
||||||
292.
|
|
||||||
|
|
||||||
The substitution syntax devised for PEP 292 is deliberately simple so that the
|
While very different in the technical details, the
|
||||||
template strings can be extracted into an i18n message catalog, and passed to
|
``types.InterpolationTemplate`` interface proposed in this PEP is
|
||||||
translators who may not themselves be developers. For these use cases, it is
|
conceptually quite similar to the ``FormattableString`` type underlying the
|
||||||
important that the interpolation syntax be as simple as possible, as the
|
`native interpolation <https://msdn.microsoft.com/en-us/library/dn961160.aspx>`__ support introduced in C# 6.0.
|
||||||
translators are responsible for preserving the substition markers, even as
|
|
||||||
they translate the surrounding text. The PEP 292 syntax is also a common mesage
|
|
||||||
catalog syntax already supporting by many commercial software translation
|
|
||||||
support tools.
|
|
||||||
|
|
||||||
PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
|
|
||||||
introduced for general purpose string formatting in PEP 3101, so this PEP adds
|
|
||||||
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
|
|
||||||
tools the option of rejecting usage of that more advanced syntax at runtime,
|
|
||||||
rather than categorically rejecting it at compile time. The proposed permitted
|
|
||||||
expressions, conversion specifiers, and format specifiers inside ``${ref}`` are
|
|
||||||
exactly as defined for ``{ref}`` substituion in PEP 498.
|
|
||||||
|
|
||||||
The specific proposal in this PEP is also deliberately close in both syntax
|
|
||||||
and semantics to the general purpose interpolation syntax introduced to
|
|
||||||
JavaScript in ES6, as we can reasonably expect a great many Python developers
|
|
||||||
to be regularly switching back and forth between user interface code written in
|
|
||||||
JavaScript and core application code written in Python.
|
|
||||||
|
|
||||||
|
|
||||||
Specification
|
Specification
|
||||||
|
@ -150,141 +138,153 @@ Interpolation template literals are Unicode strings (bytes literals are not
|
||||||
permitted), and string literal concatenation operates as normal, with the
|
permitted), and string literal concatenation operates as normal, with the
|
||||||
entire combined literal forming the interpolation template.
|
entire combined literal forming the interpolation template.
|
||||||
|
|
||||||
The template string is parsed into literals and expressions. Expressions
|
The template string is parsed into literals, expressions and format specifiers
|
||||||
appear as either identifiers prefixed with a single "$" character, or
|
as described for f-strings in PEP 498. Conversion specifiers are handled
|
||||||
surrounded be a leading '${' and a trailing '}. The parts of the format string
|
by the compiler, and appear as part of the field text in interpolation
|
||||||
that are not expressions are separated out as string literals.
|
templates.
|
||||||
|
|
||||||
While parsing the string, any doubled ``$$`` is replaced with a single ``$``
|
However, rather than being rendered directly into a formatted string, these
|
||||||
and is considered part of the literal text, rather than as introducing an
|
components are instead organised into an instance of a new type with the
|
||||||
expression.
|
|
||||||
|
|
||||||
These components are then organised into an instance of a new type with the
|
|
||||||
following semantics::
|
following semantics::
|
||||||
|
|
||||||
class InterpolationTemplate:
|
class InterpolationTemplate:
|
||||||
__slots__ = ("raw_template", "parsed_fields", "field_values")
|
__slots__ = ("raw_template", "parsed_template",
|
||||||
|
"field_values", "format_specifiers")
|
||||||
|
|
||||||
def __new__(cls, raw_template, parsed_fields, field_values):
|
def __new__(cls, raw_template, parsed_template,
|
||||||
|
field_values, format_specifiers):
|
||||||
self = super().__new__(cls)
|
self = super().__new__(cls)
|
||||||
self.raw_template = raw_template
|
self.raw_template = raw_template
|
||||||
self.parsed_fields = parsed_fields
|
self.parsed_template = parsed_template
|
||||||
self.field_values = field_values
|
self.field_values = field_values
|
||||||
|
self.format_specifiers = format_specifiers
|
||||||
return self
|
return self
|
||||||
|
|
||||||
def __iter__(self):
|
|
||||||
# Support iterable unpacking
|
|
||||||
yield self.raw_template
|
|
||||||
yield self.parsed_fields
|
|
||||||
yield self.field_values
|
|
||||||
|
|
||||||
def __repr__(self):
|
def __repr__(self):
|
||||||
return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
|
return (f"<{type(self).__qualname__} {repr(self.raw_template)} "
|
||||||
"at ${id(self):#x}>")
|
f"at {id(self):#x}>")
|
||||||
|
|
||||||
def __str__(self):
|
def __format__(self, format_specifier):
|
||||||
# See definition of the default template rendering below
|
# When formatted, render to a string, and use string formatting
|
||||||
|
return format(self.render(), format_specifier)
|
||||||
|
|
||||||
The result of the interpolation template expression is an instance of this
|
def render(self, *, render_template=''.join,
|
||||||
type, rather than an already rendered string - default rendering only takes
|
render_field=format):
|
||||||
place when the instance's ``__str__`` method is called.
|
# See definition of the template rendering semantics below
|
||||||
|
|
||||||
The format of the parsed fields tuple is inspired by the interface of
|
The result of an interpolation template expression is an instance of this
|
||||||
``string.Formatter.parse``, and consists of a series of 5-tuples each
|
type, rather than an already rendered string - rendering only takes
|
||||||
containing:
|
place when the instance's ``render`` method is called (either directly, or
|
||||||
|
indirectly via ``__format__``).
|
||||||
|
|
||||||
* a leading string literal (may be the empty string)
|
The compiler will pass the following details to the interpolation template for
|
||||||
* the substitution field position (zero-based enumeration)
|
later use:
|
||||||
* the substitution expression text
|
|
||||||
* the substitution conversion specifier (as defined by str.format)
|
|
||||||
* the substitution format specifier (as defined by str.format)
|
|
||||||
|
|
||||||
This field ordering is defined such that reading the parsed field tuples from
|
* a string containing the raw template as written in the source code
|
||||||
left to right will have all the subcomponents displayed in the same order as
|
* a parsed template tuple that allows the renderer to render the
|
||||||
they appear in the original template string.
|
template without needing to reparse the raw string template for substitution
|
||||||
|
fields
|
||||||
|
* a tuple containing the evaluated field values, in field substitution order
|
||||||
|
* a tuple containing the field format specifiers, in field substitution order
|
||||||
|
|
||||||
For ease of access the sequence elements will be available as attributes in
|
This structure is designed to take full advantage of compile time constant
|
||||||
addition to being available by position:
|
folding by ensuring the parsed template is always constant, even when the
|
||||||
|
field values and format specifiers include variable substitution expressions.
|
||||||
|
|
||||||
* ``leading_text``
|
The raw template is just the interpolation template as a string. By default,
|
||||||
* ``field_position``
|
it is used to provide a human readable representation for the interpolation
|
||||||
* ``expression``
|
template.
|
||||||
* ``conversion``
|
|
||||||
* ``format``
|
|
||||||
|
|
||||||
The expression text is simply the text of the substitution expression, as it
|
The parsed template consists of a tuple of 2-tuples, with each 2-tuple
|
||||||
appeared in the original string, but without the leading and/or surrounding
|
containing the following fields:
|
||||||
expression markers. The conversion specifier and format specifier are separated
|
|
||||||
from the substition expression by ``!`` and ``:`` as defined for ``str.format``.
|
|
||||||
|
|
||||||
If a given substition field has no leading literal section, conversion specifier
|
* ``leading_text``: a leading string literal. This will be the empty string if
|
||||||
or format specifier, then the corresponding elements in the tuple are the
|
the current field is at the start of the string, or immediately follows the
|
||||||
empty string. If the final part of the string has no trailing substitution
|
preceding field.
|
||||||
field, then the field position, field expression, conversion specifier and
|
* ``field_expr``: the text of the expression element in the substitution field.
|
||||||
format specifier will all be ``None``.
|
This will be None for a final trailing text segment.
|
||||||
|
|
||||||
The substitution field values tuple is created by evaluating the interpolated
|
The tuple of evaluated field values holds the *results* of evaluating the
|
||||||
expressions in the exact runtime context where the interpolation expression
|
substitution expressions in the scope where the interpolation template appears.
|
||||||
appears in the source code.
|
|
||||||
|
|
||||||
For the following example interpolation template::
|
The tuple of format specifiers holds the *results* of evaluating the format
|
||||||
|
specifiers as f-strings in the scope where the interpolation template appears.
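As an illustration of the two tuples described above, the following sketch shows what the compiler might pass for a hypothetical template ``i"{value:>{width}.{precision}f}"``. The variable names are invented for this example, and the tuples are built by hand because the ``i`` prefix is only a proposal. The point is that the nested ``{width}`` and ``{precision}`` references are already resolved by the time the template object is created:

```python
# Hand-built equivalent of what the compiler is expected to produce for the
# hypothetical template i"{value:>{width}.{precision}f}" (assumed example).
value = 3.14159
width, precision = 10, 2

# Results of evaluating the substitution expressions, in field order.
field_values = (value,)

# The specifier text is evaluated as an f-string in the same scope, so the
# nested references are resolved before any renderer sees them.
format_specifiers = (f">{width}.{precision}f",)

# The default field renderer is the format builtin.
rendered_field = format(field_values[0], format_specifiers[0])
```

A custom field renderer would therefore receive the specifier ``">10.2f"`` rather than the original specifier text.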
|
||||||
|
|
||||||
i'abc${expr1:spec1}${expr2!r:spec2}def${expr3:!s}ghi $ident $$jkl'
|
The ``InterpolationTemplate.render`` implementation then defines the rendering
|
||||||
|
process in terms of the following renderers:
|
||||||
|
|
||||||
the parsed fields tuple would be::
|
* an overall ``render_template`` operation that defines how the sequence of
|
||||||
|
literal template sections and rendered fields are composed into a fully
|
||||||
|
rendered result. The default template renderer is string concatenation
|
||||||
|
using ``''.join``.
|
||||||
|
* a per field ``render_field`` operation that receives the field value and
|
||||||
|
format specifier for substitution fields within the template. The default
|
||||||
|
field renderer is the ``format`` builtin.
|
||||||
|
|
||||||
(
|
Given an appropriate parsed template representation and internal methods of
|
||||||
('abc', 0, 'expr1', '', 'spec1'),
|
iterating over it, the semantics of template rendering would then be equivalent
|
||||||
('', 1, 'expr2', 'r', 'spec2'),
|
to the following::
|
||||||
(def', 2, 'expr3', 's', ''),
|
|
||||||
('ghi', 3, 'ident', '', ''),
|
|
||||||
('$jkl', None, None, None, None)
|
|
||||||
)
|
|
||||||
|
|
||||||
While the field values tuple would be::
|
def render(self, *, render_template=''.join,
|
||||||
|
render_field=format):
|
||||||
(expr1, expr2, expr3, ident)
|
iter_fields = enumerate(self.parsed_template)
|
||||||
|
values = self.field_values
|
||||||
The parsed fields tuple can be constant folded at compile time, while the
|
specifiers = self.format_specifiers
|
||||||
expression values tuple will always need to be constructed at runtime.
|
|
||||||
|
|
||||||
The ``InterpolationTemplate.__str__`` implementation would have the following
|
|
||||||
semantics, with field processing being defined in terms of the ``format``
|
|
||||||
builtin and ``str.format`` conversion specifiers::
|
|
||||||
|
|
||||||
_converter = string.Formatter().convert_field
|
|
||||||
|
|
||||||
def __str__(self):
|
|
||||||
raw_template, fields, values = self
|
|
||||||
template_parts = []
|
template_parts = []
|
||||||
for leading_text, field_num, expr, conversion, format_spec in fields:
|
for field_pos, (leading_text, field_expr) in iter_fields:
|
||||||
template_parts.append(leading_text)
|
template_parts.append(leading_text)
|
||||||
if field_num is not None:
|
if field_expr is not None:
|
||||||
value = values[field_num]
|
value = values[field_pos]
|
||||||
if conversion:
|
specifier = specifiers[field_pos]
|
||||||
value = _converter(value, conversion)
|
rendered_field = render_field(value, specifier)
|
||||||
field_text = format(value, format_spec)
|
template_parts.append(rendered_field)
|
||||||
template_parts.append(field_str)
|
return render_template(template_parts)
|
||||||
return "".join(template_parts)
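The rendering semantics above can be tried out in current Python with a small stand-in class. This is only a sketch: the class below is an assumed approximation of the draft ``types.InterpolationTemplate`` API rather than a reference implementation, and the template is constructed by hand with the details the compiler would be expected to supply for ``i"My name is {name}, my age next year is {age+1}."``:

```python
class InterpolationTemplate:
    """Hypothetical stand-in for the draft types.InterpolationTemplate."""

    __slots__ = ("raw_template", "parsed_template",
                 "field_values", "format_specifiers")

    def __init__(self, raw_template, parsed_template,
                 field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def __format__(self, format_specifier):
        # Render to a string, then apply ordinary string formatting
        return format(self.render(), format_specifier)

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            # A trailing text segment has no field expression
            if field_expr is not None:
                parts.append(render_field(self.field_values[pos],
                                          self.format_specifiers[pos]))
        return render_template(parts)


name, age = "Jane", 50
template = InterpolationTemplate(
    "My name is {name}, my age next year is {age+1}.",
    (("My name is ", "name"),
     (", my age next year is ", "age+1"),
     (".", None)),
    (name, age + 1),   # evaluated eagerly, where the template appears
    ("", ""),          # no format specifiers were given
)
rendered = format(template)
```

Note that ``age + 1`` is evaluated when the template is created, so later changes to ``age`` do not affect the rendered result.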
|
|
||||||
|
|
||||||
Writing custom interpolators
|
Conversion specifiers
|
||||||
----------------------------
|
---------------------
|
||||||
|
|
||||||
Writing a custom interpolator doesn't requiring any special syntax. Instead,
|
The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by ``str.format``
|
||||||
custom interpolators are ordinary callables that process an interpolation
|
and hence PEP 498 are handled in interpolation templates as follows:
|
||||||
template directly based on the ``raw_template``, ``parsed_fields`` and
|
|
||||||
``field_values`` attributes, rather than relying on the default rendered.
|
* they're included unmodified in the raw template to ensure no information is
|
||||||
|
lost
|
||||||
|
* they're *replaced* in the parsed template with the corresponding builtin
|
||||||
|
calls, in order to ensure that ``field_expr`` always contains a valid
|
||||||
|
Python expression
|
||||||
|
* the corresponding field value placed in the field values tuple is
|
||||||
|
converted appropriately *before* being passed to the interpolation
|
||||||
|
template
|
||||||
|
|
||||||
|
This means that, for most purposes, the difference between the use of
|
||||||
|
conversion specifiers and calling the corresponding builtins in the
|
||||||
|
original interpolation template will be transparent to custom renderers. The
|
||||||
|
difference will only be apparent if reparsing the raw template, or attempting
|
||||||
|
to reconstruct the original template from the parsed template.
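To make the handling of conversion specifiers concrete, the sketch below compares the details that would be expected for ``i"She said her name is {name!r}."`` and ``i"She said her name is {repr(name)}."``. ``TemplateDetails`` is a hypothetical stand-in used only to hold the four values, and the tuples reflect one reading of the rules above rather than the output of a real compiler:

```python
from collections import namedtuple

# Hypothetical container for the four details passed by the compiler
TemplateDetails = namedtuple(
    "TemplateDetails",
    "raw_template parsed_template field_values format_specifiers")

name = "Jane"

# Written with a conversion specifier: the raw template keeps "!r", while
# the parsed template and field value use the equivalent builtin call.
with_conversion = TemplateDetails(
    "She said her name is {name!r}.",
    (("She said her name is ", "repr(name)"), (".", None)),
    (repr(name),),
    ("",),
)

# Written with an explicit call to the builtin.
with_builtin_call = TemplateDetails(
    "She said her name is {repr(name)}.",
    (("She said her name is ", "repr(name)"), (".", None)),
    (repr(name),),
    ("",),
)
```

Only the ``raw_template`` attribute differs between the two, which is why a renderer that works from the parsed template and field values cannot tell them apart.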
|
||||||
|
|
||||||
|
Writing custom renderers
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
Writing a custom renderer doesn't require any special syntax. Instead,
|
||||||
|
custom renderers are ordinary callables that process an interpolation
|
||||||
|
template directly either by calling the ``render()`` method with alternate ``render_template`` or ``render_field`` implementations, or by accessing the
|
||||||
|
template's data attributes directly.
|
||||||
|
|
||||||
|
For example, the following function would render a template using objects'
|
||||||
|
``repr`` implementations rather than their native formatting support::
|
||||||
|
|
||||||
|
def reprformat(template):
|
||||||
|
def render_field(value, specifier):
|
||||||
|
return format(repr(value), specifier)
|
||||||
|
return template.render(render_field=render_field)
|
||||||
|
|
||||||
|
|
||||||
Expression evaluation
|
Expression evaluation
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
The subexpressions that are extracted from the interpolation expression are
|
As with f-strings, the subexpressions that are extracted from the interpolation
|
||||||
evaluated in the context where the interpolation expression appears. This means
|
template are evaluated in the context where the interpolation template
|
||||||
the expression has full access to local, nonlocal and global variables. Any
|
appears. This means the expression has full access to local, nonlocal and global variables. Any valid Python expression can be used inside ``{}``, including
|
||||||
valid Python expression can be used inside ``${}``, including function and
|
function and method calls.
|
||||||
method calls. References without the surrounding braces are limited to looking
|
|
||||||
up single identifiers.
|
|
||||||
|
|
||||||
Because the substitution expressions are evaluated where the string appears in
|
Because the substitution expressions are evaluated where the string appears in
|
||||||
the source code, there are no additional security concerns related to the
|
the source code, there are no additional security concerns related to the
|
||||||
|
@ -295,7 +295,7 @@ same expression and used runtime field parsing::
|
||||||
>>> def foo(data):
|
>>> def foo(data):
|
||||||
... return data + 20
|
... return data + 20
|
||||||
...
|
...
|
||||||
>>> str(i'input=$bar, output=${foo(bar)}')
|
>>> str(i'input={bar}, output={foo(bar)}')
|
||||||
'input=10, output=30'
|
'input=10, output=30'
|
||||||
|
|
||||||
Is essentially equivalent to::
|
Is essentially equivalent to::
|
||||||
|
@ -306,37 +306,44 @@ Is essentially equivalent to::
|
||||||
Handling code injection attacks
|
Handling code injection attacks
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
The proposed interpolation syntax makes it potentially attractive to write
|
The PEP 498 formatted string syntax makes it potentially attractive to write
|
||||||
code like the following::
|
code like the following::
|
||||||
|
|
||||||
myquery = str(i"SELECT $column FROM $table;")
|
runquery(f"SELECT {column} FROM {table};")
|
||||||
mycommand = str(i"cat $filename")
|
runcommand(f"cat {filename}")
|
||||||
mypage = str(i"<html><body>${response.body}</body></html>")
|
return_response(f"<html><body>{response.body}</body></html>")
|
||||||
|
|
||||||
These all represent potential vectors for code injection attacks, if any of the
|
These all represent potential vectors for code injection attacks, if any of the
|
||||||
variables being interpolated happen to come from an untrusted source. The
|
variables being interpolated happen to come from an untrusted source. The
|
||||||
specific proposal in this PEP is designed to make it straightforward to write
|
specific proposal in this PEP is designed to make it straightforward to write
|
||||||
use case specific interpolators that take care of quoting interpolated values
|
use case specific renderers that take care of quoting interpolated values
|
||||||
appropriately for the relevant security context::
|
appropriately for the relevant security context::
|
||||||
|
|
||||||
myquery = sql(i"SELECT $column FROM $table;")
|
runquery(sql(i"SELECT {column} FROM {table};"))
|
||||||
mycommand = sh(i"cat $filename")
|
runcommand(sh(i"cat {filename}"))
|
||||||
mypage = html(i"<html><body>${response.body}</body></html>")
|
return_response(html(i"<html><body>{response.body}</body></html>"))
|
||||||
|
|
||||||
This PEP does not cover adding such interpolators to the standard library,
|
This PEP does not cover adding such renderers to the standard library
|
||||||
but instead ensures they can be readily provided by third party libraries.
|
immediately, but rather proposes to ensure that they can be readily provided by
|
||||||
|
third party libraries, and potentially incorporated into the standard library
|
||||||
|
at a later date.
|
||||||
|
|
||||||
(Although it's tempting to propose adding InterpolationTemplate support at
|
For example, a renderer that aimed to offer a POSIX shell style experience for
|
||||||
least to ``subprocess.call``, ``subprocess.check_call`` and
|
accessing external programs, without the significant risks posed by running
|
||||||
``subprocess.check_output``)
|
``os.system`` or enabling the system shell when using the ``subprocess`` module
|
||||||
|
APIs, might provide an interface for running external programs similar to that
|
||||||
|
offered by the
|
||||||
|
`Julia programming language <http://julia.readthedocs.org/en/latest/manual/running-external-programs/>`__,
|
||||||
|
only with the backtick based ``\`cat $filename\``` syntax replaced by
|
||||||
|
``i"cat {filename}"`` style interpolation templates.
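As a rough sketch of what such a renderer could look like, the following example quotes each interpolated value with ``shlex.quote`` and then splits the result into an argument list that could be passed to the ``subprocess`` APIs without enabling the system shell. The ``sh_argv`` name and the stand-in ``InterpolationTemplate`` class are assumptions made for this example, and a real library would need to consider many more cases (pipelines, redirection, non-POSIX platforms):

```python
import shlex


class InterpolationTemplate:
    """Hypothetical stand-in for the draft types.InterpolationTemplate."""

    def __init__(self, raw_template, parsed_template,
                 field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(render_field(self.field_values[pos],
                                          self.format_specifiers[pos]))
        return render_template(parts)


def sh_argv(template):
    """Render a template to an argv list, quoting every interpolated value."""
    def render_field(value, specifier):
        # Format first, then quote, so the value is always a single word
        return shlex.quote(format(value, specifier))

    def render_template(parts):
        # Split the quoted command line back into separate arguments
        return shlex.split("".join(parts))

    return template.render(render_template=render_template,
                           render_field=render_field)


# Hand-built equivalent of i"cat {filename}" with a hostile filename
filename = "notes.txt; rm -rf ~"
template = InterpolationTemplate(
    "cat {filename}", (("cat ", "filename"),), (filename,), ("",))
argv = sh_argv(template)
```

Here the untrusted value stays a single argument to ``cat`` instead of being interpreted as a second command.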
|
||||||
|
|
||||||
Format and conversion specifiers
|
Format specifiers
|
||||||
--------------------------------
|
-----------------
|
||||||
|
|
||||||
Aside from separating them out from the substitution expression, format and
|
Aside from separating them out from the substitution expression during parsing,
|
||||||
conversion specifiers are otherwise treated as opaque strings by the
|
format specifiers are otherwise treated as opaque strings by the interpolation
|
||||||
interpolation template parser - assigning semantics to those (or, alternatively,
|
template parser - assigning semantics to those (or, alternatively,
|
||||||
prohibiting their use) is handled at runtime by the specified interpolator.
|
prohibiting their use) is handled at runtime by the field renderer.
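For instance, a renderer that wants to prohibit format specifiers entirely could supply a field renderer along the following lines. ``strict_render_field`` is a hypothetical name, and the choice of ``ValueError`` is an assumption for this sketch rather than something this PEP specifies:

```python
def strict_render_field(value, specifier):
    """Field renderer that rejects any non-empty format specifier."""
    if specifier:
        raise ValueError(
            f"format specifiers are not supported here: {specifier!r}")
    return str(value)


# Accepted: no format specifier was supplied for the field
plain = strict_render_field(42, "")

# Rejected at runtime: the specifier is reported as an exception
try:
    strict_render_field(42, ">10")
except ValueError as exc:
    error_message = str(exc)
```

Such a function would be passed as the ``render_field`` argument of the template's ``render()`` method.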
|
||||||
|
|
||||||
Error handling
|
Error handling
|
||||||
--------------
|
--------------
|
||||||
|
@ -348,13 +355,13 @@ errors all raise SyntaxError.
|
||||||
|
|
||||||
Unmatched braces::
|
Unmatched braces::
|
||||||
|
|
||||||
>>> i'x=${x'
|
>>> i'x={x'
|
||||||
File "<stdin>", line 1
|
File "<stdin>", line 1
|
||||||
SyntaxError: missing '}' in interpolation expression
|
SyntaxError: missing '}' in interpolation expression
|
||||||
|
|
||||||
Invalid expressions::
|
Invalid expressions::
|
||||||
|
|
||||||
>>> i'x=${!x}'
|
>>> i'x={!x}'
|
||||||
File "<fstring>", line 1
|
File "<fstring>", line 1
|
||||||
!x
|
!x
|
||||||
^
|
^
|
||||||
|
@ -364,68 +371,16 @@ Run time errors occur when evaluating the expressions inside a
|
||||||
template string before creating the interpolation template object. See PEP 498
|
template string before creating the interpolation template object. See PEP 498
|
||||||
for some examples.
|
for some examples.
|
||||||
|
|
||||||
Different interpolators may also impose additional runtime
|
Different renderers may also impose additional runtime
|
||||||
constraints on acceptable interpolated expressions and other formatting
|
constraints on acceptable interpolated expressions and other formatting
|
||||||
details, which will be reported as runtime exceptions.
|
details, which will be reported as runtime exceptions.
|
||||||
|
|
||||||
|
|
||||||
Internationalising interpolated strings
|
|
||||||
=======================================
|
|
||||||
|
|
||||||
Since this PEP derives its interpolation syntax from the internationalisation
|
|
||||||
focused PEP 292, it's worth considering the potential implications this PEP
|
|
||||||
may have for the internationalisation use case.
|
|
||||||
|
|
||||||
Internationalisation enters the picture by writing a custom interpolator that
|
|
||||||
performs internationalisation. For example, the following implementation
|
|
||||||
would delegate interpolation calls to ``string.Template``::
|
|
||||||
|
|
||||||
def i18n(template):
|
|
||||||
# A real implementation would also handle normal strings
|
|
||||||
raw_template, fields, values = template
|
|
||||||
translated = gettext.gettext(raw_template)
|
|
||||||
value_map = _build_interpolation_map(fields, values)
|
|
||||||
return string.Template(translated).safe_substitute(value_map)
|
|
||||||
|
|
||||||
def _build_interpolation_map(fields, values):
|
|
||||||
field_values = {}
|
|
||||||
for literal_text, field_num, expr, conversion, format_spec in fields:
|
|
||||||
assert expr.isidentifier() and not conversion and not format_spec
|
|
||||||
if field_num is not None:
|
|
||||||
field_values[expr] = values[field_num]
|
|
||||||
return field_values
|
|
||||||
|
|
||||||
And could then be invoked as::
|
|
||||||
|
|
||||||
# _ = i18n at top of module or injected into the builtins module
|
|
||||||
print(_(i"This is a $translated $message"))
|
|
||||||
|
|
||||||
Any actual i18n implementation would need to address other issues (most notably
|
|
||||||
message catalog extraction), but this gives the general idea of what might be
|
|
||||||
possible.
|
|
||||||
|
|
||||||
It's also worth noting that one of the benefits of the ``$`` based substitution
|
|
||||||
syntax in this PEP is its compatibility with Mozilla's
|
|
||||||
`l20n syntax <http://l20n.org/>`__, which uses ``{{ name }}`` for global
|
|
||||||
substitution, and ``{{ $user }}`` for local context substitution.
|
|
||||||
|
|
||||||
With the syntax in this PEP, an l20n interpolator could be written as::
|
|
||||||
|
|
||||||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
|
||||||
|
|
||||||
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
|
|
||||||
catalog lookups using PEP 498's semantics), the necessary brace escaping would
|
|
||||||
make the string look like this in order to interpolate the user variable
|
|
||||||
while preserving all of the expected braces::
|
|
||||||
|
|
||||||
locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"
|
|
||||||
|
|
||||||
|
|
||||||
Possible integration with the logging module
|
Possible integration with the logging module
|
||||||
============================================
|
============================================
|
||||||
|
|
||||||
One of the challenges with the logging module has been that previously been
|
One of the challenges with the logging module has been that we have previously
|
||||||
unable to devise a reasonable migration strategy away from the use of
|
been unable to devise a reasonable migration strategy away from the use of
|
||||||
printf-style formatting. The runtime parsing and interpolation overhead for
|
printf-style formatting. The runtime parsing and interpolation overhead for
|
||||||
logging messages also poses a problem for extensive logging of runtime events
|
logging messages also poses a problem for extensive logging of runtime events
|
||||||
for monitoring purposes.
|
for monitoring purposes.
|
||||||
|
@ -434,13 +389,41 @@ While beyond the scope of this initial PEP, interpolation template support
|
||||||
could potentially be added to the logging module's event reporting APIs,
|
could potentially be added to the logging module's event reporting APIs,
|
||||||
permitting relevant details to be captured using forms like::
|
permitting relevant details to be captured using forms like::
|
||||||
|
|
||||||
logging.debug(i"Event: $event; Details: $data")
|
logging.debug(i"Event: {event}; Details: {data}")
|
||||||
logging.critical(i"Error: $error; Details: $data")
|
logging.critical(i"Error: {error}; Details: {data}")
|
||||||
|
|
||||||
|
Rather than the current mod-formatting style::
|
||||||
|
|
||||||
|
logging.debug("Event: %s; Details: %s", event, data)
|
||||||
|
    logging.critical("Error: %s; Details: %s", error, data)
|
||||||
|
|
||||||
As the interpolation template is passed in as an ordinary argument, other
|
As the interpolation template is passed in as an ordinary argument, other
|
||||||
keyword arguments also remain available::
|
keyword arguments would also remain available::
|
||||||
|
|
||||||
|
logging.critical(i"Error: {error}; Details: {data}", exc_info=True)
|
||||||
|
|
||||||
|
As part of any such integration, a recommended approach would need to be
|
||||||
|
defined for "lazy evaluation" of interpolated fields, as the ``logging``
|
||||||
|
module's existing delayed interpolation support provides access to
|
||||||
|
`various attributes <https://docs.python.org/3/library/logging.html#logrecord-attributes>`__ of the event ``LogRecord`` instance.
|
||||||
|
|
||||||
|
For example, since interpolation expressions are arbitrary Python expressions,
|
||||||
|
string literals could be used to indicate cases where evaluation itself is
|
||||||
|
being deferred, not just rendering::
|
||||||
|
|
||||||
|
logging.debug(i"Logger: {'record.name'}; Event: {event}; Details: {data}")
|
||||||
|
|
||||||
|
This could be further extended with idioms like using inline tuples to indicate
|
||||||
|
deferred function calls to be made only if the log message is actually
|
||||||
|
going to be rendered at current logging levels::
|
||||||
|
|
||||||
|
logging.debug(i"Event: {event}; Details: {expensive_call, raw_data}")
|
||||||
|
|
||||||
|
This kind of approach would be possible as having access to the actual *text*
|
||||||
|
of the field expression would allow the logging renderer to distinguish
|
||||||
|
between inline tuples that appear in the field expression itself, and tuples
|
||||||
|
that happen to be passed in as data values in a normal field.
|
||||||
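One way a logging renderer might make that distinction is sketched below. It assumes the renderer receives each field as a pair of expression text and evaluated value; that pairing, and the names ``is_inline_tuple`` and ``render_log_field``, are assumptions made for illustration and are not defined by this PEP. The standard ``ast`` module is used to check whether the expression *text* is itself a tuple display.

```python
import ast


def is_inline_tuple(expr_text):
    """True when the field's source text is itself a tuple display."""
    node = ast.parse(expr_text.strip(), mode="eval").body
    return isinstance(node, ast.Tuple)


def render_log_field(expr_text, value):
    """Hypothetical renderer: call deferred functions only when rendering."""
    # An inline tuple whose first item is callable is treated as a deferred call.
    if is_inline_tuple(expr_text) and value and callable(value[0]):
        func, *args = value
        return str(func(*args))
    # Anything else, including a tuple passed in as ordinary data, is rendered as-is.
    return str(value)


def expensive_call(data):
    return sorted(data)


raw_data = [3, 1, 2]
point = (1, 2)

deferred = render_log_field("expensive_call, raw_data", (expensive_call, raw_data))
plain = render_log_field("point", point)
```

Here ``point`` is a tuple value but its expression text is a plain name, so it is rendered directly rather than being treated as a deferred call.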
|
|
||||||
logging.critical(i"Error: $error; Details: $data", exc_info=True)
|
|
||||||
|
|
||||||
Discussion
|
Discussion
|
||||||
==========
|
==========
|
||||||
|
@ -455,10 +438,10 @@ Supporting binary interpolation with this syntax would be relatively
|
||||||
straightforward (the elements in the parsed fields tuple would just be
|
straightforward (the elements in the parsed fields tuple would just be
|
||||||
byte strings rather than text strings, and the default renderer would be
|
byte strings rather than text strings, and the default renderer would be
|
||||||
markedly less useful), but poses a significant likelihood of producing
|
markedly less useful), but poses a significant likelihood of producing
|
||||||
confusing type errors when a text interpolator was presented with
|
confusing type errors when a text renderer was presented with
|
||||||
binary input.
|
binary input.
|
||||||
|
|
||||||
Since the proposed operator is useful without binary interpolation support, and
|
Since the proposed syntax is useful without binary interpolation support, and
|
||||||
such support can be readily added later, further consideration of binary
|
such support can be readily added later, further consideration of binary
|
||||||
interpolation is considered out of scope for the current PEP.
|
interpolation is considered out of scope for the current PEP.
|
||||||
|
|
||||||
|
@ -466,19 +449,21 @@ Interoperability with str-only interfaces
|
||||||
-----------------------------------------
|
-----------------------------------------
|
||||||
|
|
||||||
For interoperability with interfaces that only accept strings, interpolation
|
For interoperability with interfaces that only accept strings, interpolation
|
||||||
templates can be prerendered with ``str``, rather than delegating the rendering
|
templates can still be prerendered with ``format``, rather than delegating the
|
||||||
to the called function.
|
rendering to the called function.
|
||||||
|
|
||||||
This reflects the key difference from PEP 498, which *always* eagerly applies
|
This reflects the key difference from PEP 498, which *always* eagerly applies
|
||||||
the default rendering, without any convenient way to decide to do something
|
the default rendering, without any convenient way to delegate that choice to
|
||||||
different.
|
another section of the code.
|
||||||
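A minimal sketch of that prerendering step follows. ``TemplateSketch`` is a hypothetical stand-in for the interpolation template object (its attributes and ``render`` method are assumptions, not the PEP's specification), and ``str_only_api`` represents any existing interface that rejects non-string input. ``string.Formatter.parse`` is borrowed to split the raw template.

```python
from string import Formatter


class TemplateSketch:
    """Hypothetical stand-in for an interpolation template object."""

    def __init__(self, raw_template, **values):
        self.raw_template = raw_template
        self.values = values

    def render(self, render_field=format):
        # Default rendering: apply render_field to each value and join the parts.
        parts = []
        for literal, name, spec, _conversion in Formatter().parse(self.raw_template):
            parts.append(literal)
            if name is not None:
                parts.append(render_field(self.values[name], spec))
        return "".join(parts)

    def __format__(self, format_spec):
        # format(template) triggers the default rendering and yields a plain str.
        return format(self.render(), format_spec)


def str_only_api(text):
    """Represents an interface that only accepts strings."""
    if not isinstance(text, str):
        raise TypeError("expected str")
    return text.upper()


template = TemplateSketch("hello {name}", name="world")
result = str_only_api(format(template))
```

The caller chooses when rendering happens: passing ``template`` itself would leave that decision to the callee, while ``format(template)`` settles it up front.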
|
|
||||||
Preserving the raw template string
|
Preserving the raw template string
|
||||||
----------------------------------
|
----------------------------------
|
||||||
|
|
||||||
Earlier versions of this PEP failed to make the raw template string available
|
Earlier versions of this PEP failed to make the raw template string available
|
||||||
to interpolators. This greatly complicated the i18n example, as it needed to
|
on the interpolation template. Retaining it makes it possible to provide a more
|
||||||
reconstruct the original template to pass to the message catalog lookup.
|
attractive template representation, as well as providing the ability to
|
||||||
|
precisely reconstruct the original string, including both the expression text
|
||||||
|
and the details of any eagerly rendered substitution fields in format specifiers.
|
||||||
|
|
||||||
Creating a rich object rather than a global name lookup
|
Creating a rich object rather than a global name lookup
|
||||||
-------------------------------------------------------
|
-------------------------------------------------------
|
||||||
|
@ -488,33 +473,52 @@ a creating a new kind of object for later consumption by interpolation
|
||||||
functions. Creating a rich descriptive object with a useful default renderer
|
functions. Creating a rich descriptive object with a useful default renderer
|
||||||
made it much easier to support customisation of the semantics of interpolation.
|
made it much easier to support customisation of the semantics of interpolation.
|
||||||
|
|
||||||
Relative order of conversion and format specifier in parsed fields
|
Building atop PEP 498, rather than competing with it
|
||||||
------------------------------------------------------------------
|
----------------------------------------------------
|
||||||
|
|
||||||
The relative order of the conversion specifier and the format specifier in the
|
Earlier versions of this PEP attempted to serve as a complete substitute for
|
||||||
substitution field 5-tuple is defined to match the order they appear in the
|
PEP 498, rather than building a more flexible delayed rendering capability on
|
||||||
format string, which is unfortunately the inverse of the way they appear in the
|
top of PEP 498's eager rendering.
|
||||||
``string.Formatter.parse`` 4-tuple.
|
|
||||||
|
|
||||||
I consider this a design defect in ``string.Formatter.parse``, so I think it's
|
Assuming the presence of f-strings as a supporting capability simplified a
|
||||||
worth fixing it in for the customer interpolator API, since the tuple already
|
number of aspects of the proposal in this PEP (such as how to handle substitution
|
||||||
has other differences (like including both the field position number *and* the
|
fields in format specifiers).
|
||||||
text of the expression).
|
|
||||||
|
|
||||||
This PEP also makes the parsed field attributes available by name, so it's
|
Deferring consideration of possible use in i18n use cases
|
||||||
possible to write interpolators without caring about the precise field order
|
---------------------------------------------------------
|
||||||
at all.
|
|
||||||
|
|
||||||
|
The initial motivating use case for this PEP was providing a cleaner syntax
|
||||||
|
for i18n translation, as that requires access to the original unmodified
|
||||||
|
template. As such, it focused on compatibility with the substitution syntax used
|
||||||
|
in Python's ``string.Template`` formatting and Mozilla's l20n project.
|
||||||
|
|
||||||
|
However, subsequent discussion revealed there are significant additional
|
||||||
|
considerations to be taken into account in the i18n use case, which don't
|
||||||
|
impact the simpler cases of handling interpolation into security sensitive
|
||||||
|
contexts (like HTML, system shells, and database queries), or producing
|
||||||
|
application debugging messages in the preferred language of the development
|
||||||
|
team (rather than the native language of end users).
|
||||||
|
|
||||||
|
Due to the original design of the ``str.format`` substitution syntax in PEP
|
||||||
|
3101 being inspired by C#'s string formatting syntax, the specific field
|
||||||
|
substitution syntax used in PEP 498 is consistent not only with Python's own ``str.format`` syntax, but also with string formatting in C#, including the
|
||||||
|
native "$-string" interpolation syntax introduced in C# 6.0 (released in July
|
||||||
|
2015). This means that while this particular substitution syntax may not
|
||||||
|
currently be widely used for translation of *Python* applications (losing out
|
||||||
|
to traditional %-formatting and the designed-specifically-for-i18n
|
||||||
|
``string.Template`` formatting), it *is* a popular translation format in the
|
||||||
|
wider software development ecosystem (since it is already the preferred
|
||||||
|
format for translating C# applications).
|
||||||
|
|
||||||
Acknowledgements
|
Acknowledgements
|
||||||
================
|
================
|
||||||
|
|
||||||
* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
|
* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
|
||||||
arbitrary expression substitution in string interpolation
|
arbitrary expression substitution in string interpolation
|
||||||
* Barry Warsaw for the string.Template syntax defined in PEP 292
|
* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
|
||||||
* Armin Ronacher for pointing me towards Mozilla's l20n project
|
exploring the feasibility of using this model of delayed rendering in i18n
|
||||||
* Mike Miller for his survey of programming language interpolation syntaxes in
|
use cases (even though the ultimate conclusion was that it was a poor fit,
|
||||||
PEP (TBD)
|
at least for current approaches to i18n in Python)
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
@ -540,8 +544,11 @@ References
|
||||||
.. [#] PEP 498: Literal string formatting
|
.. [#] PEP 498: Literal string formatting
|
||||||
(https://www.python.org/dev/peps/pep-0498/)
|
(https://www.python.org/dev/peps/pep-0498/)
|
||||||
|
|
||||||
.. [#] string.Formatter.parse
|
.. [#] FormattableString and C# native string interpolation
|
||||||
(https://docs.python.org/3/library/string.html#string.Formatter.parse)
|
(https://msdn.microsoft.com/en-us/library/dn961160.aspx)
|
||||||
|
|
||||||
|
.. [#] Running external commands in Julia
|
||||||
|
(http://julia.readthedocs.org/en/latest/manual/running-external-programs/)
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
=========
|
=========
|
||||||
|
|