PEP 501: Build on 498 instead of competing
parent fbe3070944
commit 651a74028d
565
pep-0501.txt
|
@@ -6,9 +6,10 @@ Author: Nick Coghlan <ncoghlan@gmail.com>
|
|||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Requires: 498
|
||||
Created: 08-Aug-2015
|
||||
Python-Version: 3.6
|
||||
Post-History: 08-Aug-2015, 23-Aug-2015
|
||||
Post-History: 08-Aug-2015, 23-Aug-2015, 30-Aug-2015
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
@@ -16,44 +17,53 @@ Abstract
|
|||
PEP 498 proposes new syntactic support for string interpolation that is
|
||||
transparent to the compiler, allowing name references from the interpolation
|
||||
operation full access to containing namespaces (as with any other expression),
|
||||
rather than being limited to explicitly name references.
|
||||
rather than being limited to explicit name references. These are referred
|
||||
to in the PEP as "f-strings" (a mnemonic for "formatted strings").
|
||||
|
||||
However, it only offers this capability for string formatting, making it likely
|
||||
we will see code like the following::
|
||||
|
||||
os.system(f"echo {user_message}")
|
||||
os.system(f"echo {message_from_user}")
|
||||
|
||||
This kind of code is superficially elegant, but poses a significant problem
|
||||
if the interpolated value ``user_message`` is in fact provided by a user: it's
|
||||
an opening for a form of code injection attack, where the supplied user data
|
||||
has not been properly escaped before being passed to the ``os.system`` call.
|
||||
if the interpolated value ``message_from_user`` is in fact provided by an
|
||||
untrusted user: it's an opening for a form of code injection attack, where
|
||||
the supplied user data has not been properly escaped before being passed to
|
||||
the ``os.system`` call.
|
||||
|
||||
To address that problem (and a number of other concerns), this PEP proposes an
|
||||
alternative approach to compiler supported interpolation, using ``i`` (for
|
||||
"interpolation") as the new string prefix and a substitution syntax
|
||||
inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
|
||||
adding a 4th substitution variable syntax to Python.
|
||||
To address that problem (and a number of other concerns), this PEP proposes
|
||||
the complementary introduction of "i-strings" (a mnemonic for "interpolation
|
||||
template strings"), where ``f"Message with {data}"`` would produce the same
|
||||
result as ``format(i"Message with {data}")``.
|
||||
|
||||
Some possible examples of the proposed syntax::
|
||||
|
||||
msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
||||
print(_(i"This is a $translated $message"))
|
||||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
||||
myquery = sql(i"SELECT $column FROM $table;")
|
||||
mycommand = sh(i"cat $filename")
|
||||
mypage = html(i"<html><body>${response.body}</body></html>")
|
||||
callable = defer(i"$x + $y")
|
||||
mycommand = sh(i"cat {filename}")
|
||||
myquery = sql(i"SELECT {column} FROM {table};")
|
||||
myresponse = html(i"<html><body>{response.body}</body></html>")
|
||||
logging.debug(i"Message with {detailed} {debugging} {info}")
|
||||
|
||||
Summary of differences from PEP 498
|
||||
===================================
|
||||
|
||||
The key differences of this proposal relative to PEP 498:
|
||||
The key additions this proposal makes relative to PEP 498:
|
||||
|
||||
* "i" (interpolation template) prefix rather than "f" (formatted string)
|
||||
* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
|
||||
* interpolation templates are created at runtime as a new kind of object
|
||||
* the default rendering is invoked by calling ``str()`` on a template object
|
||||
rather than automatically
|
||||
* the "i" (interpolation template) prefix indicates delayed rendering, but
|
||||
otherwise uses the same syntax and semantics as formatted strings
|
||||
* interpolation templates are available at runtime as a new kind of object
|
||||
(``types.InterpolationTemplate``)
|
||||
* the default rendering used by formatted strings is invoked on an
|
||||
interpolation template object by calling ``format(template)`` rather than
|
||||
implicitly
|
||||
* while f-string ``f"Message {here}"`` would be *semantically* equivalent to
|
||||
``format(i"Message {here}")``, it is expected that the explicit syntax would
|
||||
avoid the runtime overhead of using the delayed rendering machinery
|
||||
|
||||
NOTE: This proposal spells out a draft API for ``types.InterpolationTemplate``.
|
||||
The precise details of the structures and methods exposed by this type would
|
||||
be informed by the reference implementation of PEP 498, so it makes sense to
|
||||
gain experience with that as an internal API before locking down a public API
|
||||
(if this extension proposal is accepted).
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
@@ -61,38 +71,39 @@ Proposal
|
|||
This PEP proposes the introduction of a new string prefix that declares the
|
||||
string to be an interpolation template rather than an ordinary string::
|
||||
|
||||
template = i"Substitute $names and ${expressions} at runtime"
|
||||
template = i"Substitute {names} and {expressions()} at runtime"
|
||||
|
||||
This would be effectively interpreted as::
|
||||
|
||||
_raw_template = "Substitute $names and ${expressions} at runtime"
|
||||
_parsed_fields = (
|
||||
("Substitute ", 0, "names", "", ""),
|
||||
(" and ", 1, "expressions", "", ""),
|
||||
(" at runtime", None, None, None, None),
|
||||
_raw_template = "Substitute {names} and {expressions()} at runtime"
|
||||
_parsed_template = (
|
||||
("Substitute ", "names"),
|
||||
(" and ", "expressions()"),
|
||||
(" at runtime", None),
|
||||
)
|
||||
_field_values = (names, expressions)
|
||||
_field_values = (names, expressions())
|
||||
_format_specifiers = (f"", f"")
|
||||
template = types.InterpolationTemplate(_raw_template,
|
||||
_parsed_fields,
|
||||
_field_values)
|
||||
_parsed_template,
|
||||
_field_values,
|
||||
_format_specifiers)
|
||||
|
||||
The ``__str__`` method on ``types.InterpolationTemplate`` would then implementat
|
||||
the following ``str.format`` inspired semantics::
|
||||
The ``__format__`` method on ``types.InterpolationTemplate`` would then
|
||||
implement the following ``str.format`` inspired semantics::
|
||||
|
||||
>>> import datetime
|
||||
>>> name = 'Jane'
|
||||
>>> age = 50
|
||||
>>> anniversary = datetime.date(1991, 10, 12)
|
||||
>>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
|
||||
>>> format(i'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
|
||||
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
|
||||
>>> str(i'She said her name is ${name!r}.')
|
||||
>>> format(i'She said her name is {repr(name)}.')
|
||||
"She said her name is 'Jane'."
|
||||
|
||||
The interpolation template prefix can be combined with single-quoted,
|
||||
double-quoted and triple quoted strings, including raw strings. It does not
|
||||
support combination with bytes literals.
|
||||
As with formatted strings, the interpolation template prefix can be combined with single-quoted, double-quoted and triple quoted strings, including raw strings.
|
||||
It does not support combination with bytes literals.
|
||||
|
||||
This PEP does not propose to remove or deprecate any of the existing
|
||||
Similarly, this PEP does not propose to remove or deprecate any of the existing
|
||||
string formatting mechanisms, as those will remain valuable when formatting
|
||||
strings that are not present directly in the source code of the application.
|
||||
|
||||
|
@@ -105,38 +116,15 @@ lexical namespace semantics simpler, but it does so at the cost of creating a
|
|||
situation where interpolating values into sensitive targets like SQL queries,
|
||||
shell commands and HTML templates will enjoy a much cleaner syntax when handled
|
||||
without regard for code injection attacks than when they are handled correctly.
|
||||
It also has the effect of introducing yet another syntax for substitution
|
||||
expressions into Python, when we already have 3 (``str.format``,
|
||||
``bytes.__mod__`` and ``string.Template``)
|
||||
|
||||
This PEP proposes to handle the former issue by deferring the actual rendering
|
||||
of the interpolation template to its ``__str__`` method (allow the use of
|
||||
other template renderers by passing the template around as an object), and the
|
||||
latter by adopting the ``string.Template`` substitution syntax defined in PEP
|
||||
292.
|
||||
This PEP proposes to provide the option of delaying the actual rendering
|
||||
of an interpolation template to its ``__format__`` method, allowing the use of
|
||||
other template renderers by passing the template around as a first class object.
|
||||
|
||||
The substitution syntax devised for PEP 292 is deliberately simple so that the
|
||||
template strings can be extracted into an i18n message catalog, and passed to
|
||||
translators who may not themselves be developers. For these use cases, it is
|
||||
important that the interpolation syntax be as simple as possible, as the
|
||||
translators are responsible for preserving the substitution markers, even as
|
||||
they translate the surrounding text. The PEP 292 syntax is also a common message
|
||||
catalog syntax already supported by many commercial software translation
|
||||
support tools.
|
||||
|
||||
PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
|
||||
introduced for general purpose string formatting in PEP 3101, so this PEP adds
|
||||
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
|
||||
tools the option of rejecting usage of that more advanced syntax at runtime,
|
||||
rather than categorically rejecting it at compile time. The proposed permitted
|
||||
expressions, conversion specifiers, and format specifiers inside ``${ref}`` are
|
||||
exactly as defined for ``{ref}`` substitution in PEP 498.
|
||||
|
||||
The specific proposal in this PEP is also deliberately close in both syntax
|
||||
and semantics to the general purpose interpolation syntax introduced to
|
||||
JavaScript in ES6, as we can reasonably expect a great many Python developers
|
||||
to be regularly switching back and forth between user interface code written in
|
||||
JavaScript and core application code written in Python.
|
||||
While very different in the technical details, the
|
||||
``types.InterpolationTemplate`` interface proposed in this PEP is
|
||||
conceptually quite similar to the ``FormattableString`` type underlying the
|
||||
`native interpolation <https://msdn.microsoft.com/en-us/library/dn961160.aspx>`__ support introduced in C# 6.0.
|
||||
|
||||
|
||||
Specification
|
||||
|
@@ -150,141 +138,153 @@ Interpolation template literals are Unicode strings (bytes literals are not
|
|||
permitted), and string literal concatenation operates as normal, with the
|
||||
entire combined literal forming the interpolation template.
|
||||
|
||||
The template string is parsed into literals and expressions. Expressions
|
||||
appear as either identifiers prefixed with a single "$" character, or
|
||||
surrounded by a leading '${' and a trailing '}'. The parts of the format string
|
||||
that are not expressions are separated out as string literals.
|
||||
The template string is parsed into literals, expressions and format specifiers
|
||||
as described for f-strings in PEP 498. Conversion specifiers are handled
|
||||
by the compiler, and appear as part of the field text in interpolation
|
||||
templates.
|
||||
|
||||
While parsing the string, any doubled ``$$`` is replaced with a single ``$``
|
||||
and is considered part of the literal text, rather than as introducing an
|
||||
expression.
|
||||
|
||||
These components are then organised into an instance of a new type with the
|
||||
However, rather than being rendered directly into a formatted string, these
|
||||
components are instead organised into an instance of a new type with the
|
||||
following semantics::
|
||||
|
||||
class InterpolationTemplate:
|
||||
__slots__ = ("raw_template", "parsed_fields", "field_values")
|
||||
__slots__ = ("raw_template", "parsed_template",
|
||||
"field_values", "format_specifiers")
|
||||
|
||||
def __new__(cls, raw_template, parsed_fields, field_values):
|
||||
def __new__(cls, raw_template, parsed_template,
|
||||
field_values, format_specifiers):
|
||||
self = super().__new__(cls)
|
||||
self.raw_template = raw_template
|
||||
self.parsed_fields = parsed_fields
|
||||
self.parsed_template = parsed_template
|
||||
self.field_values = field_values
|
||||
self.format_specifiers = format_specifiers
|
||||
return self
|
||||
|
||||
def __iter__(self):
|
||||
# Support iterable unpacking
|
||||
yield self.raw_template
|
||||
yield self.parsed_fields
|
||||
yield self.field_values
|
||||
|
||||
def __repr__(self):
|
||||
return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
|
||||
"at ${id(self):#x}>")
|
||||
return (f"<{type(self).__qualname__} {repr(self.raw_template)} "
|
||||
f"at {id(self):#x}>")
|
||||
|
||||
def __str__(self):
|
||||
# See definition of the default template rendering below
|
||||
def __format__(self, format_specifier):
|
||||
# When formatted, render to a string, and use string formatting
|
||||
return format(self.render(), format_specifier)
|
||||
|
||||
The result of the interpolation template expression is an instance of this
|
||||
type, rather than an already rendered string - default rendering only takes
|
||||
place when the instance's ``__str__`` method is called.
|
||||
def render(self, *, render_template=''.join,
|
||||
render_field=format):
|
||||
# See definition of the template rendering semantics below
|
||||
|
||||
The format of the parsed fields tuple is inspired by the interface of
|
||||
``string.Formatter.parse``, and consists of a series of 5-tuples each
|
||||
containing:
|
||||
The result of an interpolation template expression is an instance of this
|
||||
type, rather than an already rendered string - rendering only takes
|
||||
place when the instance's ``render`` method is called (either directly, or
|
||||
indirectly via ``__format__``).
|
||||
|
||||
* a leading string literal (may be the empty string)
|
||||
* the substitution field position (zero-based enumeration)
|
||||
* the substitution expression text
|
||||
* the substitution conversion specifier (as defined by str.format)
|
||||
* the substitution format specifier (as defined by str.format)
|
||||
The compiler will pass the following details to the interpolation template for
|
||||
later use:
|
||||
|
||||
This field ordering is defined such that reading the parsed field tuples from
|
||||
left to right will have all the subcomponents displayed in the same order as
|
||||
they appear in the original template string.
|
||||
* a string containing the raw template as written in the source code
|
||||
* a parsed template tuple that allows the renderer to render the
|
||||
template without needing to reparse the raw string template for substitution
|
||||
fields
|
||||
* a tuple containing the evaluated field values, in field substitution order
|
||||
* a tuple containing the field format specifiers, in field substitution order
|
||||
|
||||
For ease of access the sequence elements will be available as attributes in
|
||||
addition to being available by position:
|
||||
This structure is designed to take full advantage of compile time constant
|
||||
folding by ensuring the parsed template is always constant, even when the
|
||||
field values and format specifiers include variable substitution expressions.
|
||||
|
||||
* ``leading_text``
|
||||
* ``field_position``
|
||||
* ``expression``
|
||||
* ``conversion``
|
||||
* ``format``
|
||||
The raw template is just the interpolation template as a string. By default,
|
||||
it is used to provide a human-readable representation for the interpolation
|
||||
template.
|
||||
|
||||
The expression text is simply the text of the substitution expression, as it
|
||||
appeared in the original string, but without the leading and/or surrounding
|
||||
expression markers. The conversion specifier and format specifier are separated
|
||||
from the substitution expression by ``!`` and ``:`` as defined for ``str.format``.
|
||||
The parsed template consists of a tuple of 2-tuples, with each 2-tuple
|
||||
containing the following fields:
|
||||
|
||||
If a given substitution field has no leading literal section, conversion specifier
|
||||
or format specifier, then the corresponding elements in the tuple are the
|
||||
empty string. If the final part of the string has no trailing substitution
|
||||
field, then the field position, field expression, conversion specifier and
|
||||
format specifier will all be ``None``.
|
||||
* ``leading_text``: a leading string literal. This will be the empty string if
|
||||
the current field is at the start of the string, or immediately follows the
|
||||
preceding field.
|
||||
* ``field_expr``: the text of the expression element in the substitution field.
|
||||
This will be None for a final trailing text segment.
|
||||
|
||||
The substitution field values tuple is created by evaluating the interpolated
|
||||
expressions in the exact runtime context where the interpolation expression
|
||||
appears in the source code.
|
||||
The tuple of evaluated field values holds the *results* of evaluating the
|
||||
substitution expressions in the scope where the interpolation template appears.
|
||||
|
||||
For the following example interpolation template::
|
||||
The tuple of field specifiers holds the *results* of evaluating the field
|
||||
specifiers as f-strings in the scope where the interpolation template appears.
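
As an illustration of the last point, the following sketch writes out by hand what the compiler might pass for ``i"Total: {value:>{width}.2f} units"``. The variable names and the example template are assumptions made for this illustration only; the point is that the nested ``{width}`` reference is already resolved by the time the specifier reaches the template object.

```python
value = 3.14159
width = 8

# Hand-written equivalents of the compiler-generated pieces (assumed layout,
# following the 2-tuple parsed template described in this PEP).
raw_template = "Total: {value:>{width}.2f} units"
parsed_template = (("Total: ", "value"), (" units", None))
field_values = (value,)
# The format specifier is itself evaluated as an f-string, so the nested
# {width} reference is resolved here, in the defining scope.
format_specifiers = (f">{width}.2f",)

# Default rendering of the single field, followed by plain concatenation.
rendered = "".join([
    parsed_template[0][0],
    format(field_values[0], format_specifiers[0]),
    parsed_template[1][0],
])
print(rendered)
```

The renderer only ever sees the evaluated specifier ``">8.2f"``, never the nested ``{width}`` expression.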
|
||||
|
||||
i'abc${expr1:spec1}${expr2!r:spec2}def${expr3:!s}ghi $ident $$jkl'
|
||||
The ``InterpolationTemplate.render`` implementation then defines the rendering
|
||||
process in terms of the following renderers:
|
||||
|
||||
the parsed fields tuple would be::
|
||||
* an overall ``render_template`` operation that defines how the sequence of
|
||||
literal template sections and rendered fields are composed into a fully
|
||||
rendered result. The default template renderer is string concatenation
|
||||
using ``''.join``.
|
||||
* a per field ``render_field`` operation that receives the field value and
|
||||
format specifier for substitution fields within the template. The default
|
||||
field renderer is the ``format`` builtin.
|
||||
|
||||
(
|
||||
('abc', 0, 'expr1', '', 'spec1'),
|
||||
('', 1, 'expr2', 'r', 'spec2'),
|
||||
('def', 2, 'expr3', 's', ''),
|
||||
('ghi', 3, 'ident', '', ''),
|
||||
('$jkl', None, None, None, None)
|
||||
)
|
||||
Given an appropriate parsed template representation and internal methods of
|
||||
iterating over it, the semantics of template rendering would then be equivalent
|
||||
to the following::
|
||||
|
||||
While the field values tuple would be::
|
||||
|
||||
(expr1, expr2, expr3, ident)
|
||||
|
||||
The parsed fields tuple can be constant folded at compile time, while the
|
||||
expression values tuple will always need to be constructed at runtime.
|
||||
|
||||
The ``InterpolationTemplate.__str__`` implementation would have the following
|
||||
semantics, with field processing being defined in terms of the ``format``
|
||||
builtin and ``str.format`` conversion specifiers::
|
||||
|
||||
_converter = string.Formatter().convert_field
|
||||
|
||||
def __str__(self):
|
||||
raw_template, fields, values = self
|
||||
def render(self, *, render_template=''.join,
|
||||
render_field=format):
|
||||
iter_fields = enumerate(self.parsed_template)
|
||||
values = self.field_values
|
||||
specifiers = self.format_specifiers
|
||||
template_parts = []
|
||||
for leading_text, field_num, expr, conversion, format_spec in fields:
|
||||
for field_pos, (leading_text, field_expr) in iter_fields:
|
||||
template_parts.append(leading_text)
|
||||
if field_num is not None:
|
||||
value = values[field_num]
|
||||
if conversion:
|
||||
value = _converter(value, conversion)
|
||||
field_text = format(value, format_spec)
|
||||
template_parts.append(field_text)
|
||||
return "".join(template_parts)
|
||||
if field_expr is not None:
|
||||
value = values[field_pos]
|
||||
specifier = specifiers[field_pos]
|
||||
rendered_field = render_field(value, specifier)
|
||||
template_parts.append(rendered_field)
|
||||
return render_template(template_parts)
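
The rendering semantics above can be exercised without compiler support by building the template pieces by hand. The following is a minimal sketch, not the reference implementation: the class is a pure-Python stand-in for the proposed ``types.InterpolationTemplate``, and the tuples are written out manually in place of what the compiler would generate for ``i"Substitute {names} and {expressions()} at runtime"``.

```python
class InterpolationTemplate:
    """Pure-Python stand-in for the proposed type (an assumption, not a real API)."""

    __slots__ = ("raw_template", "parsed_template",
                 "field_values", "format_specifiers")

    def __init__(self, raw_template, parsed_template,
                 field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def __format__(self, format_specifier):
        # Render first, then apply ordinary string formatting to the result.
        return format(self.render(), format_specifier)

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(render_field(self.field_values[pos],
                                          self.format_specifiers[pos]))
        return render_template(parts)


names = "alpha"


def expressions():
    return 42


# What the compiler would effectively construct for the i-string above.
template = InterpolationTemplate(
    "Substitute {names} and {expressions()} at runtime",
    (("Substitute ", "names"), (" and ", "expressions()"), (" at runtime", None)),
    (names, expressions()),
    ("", ""),
)

rendered = format(template)
print(rendered)
```

Note that ``expressions()`` is called when the template is created, not when it is rendered; only the formatting step is delayed.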
|
||||
|
||||
Writing custom interpolators
|
||||
----------------------------
|
||||
Conversion specifiers
|
||||
---------------------
|
||||
|
||||
Writing a custom interpolator doesn't require any special syntax. Instead,
|
||||
custom interpolators are ordinary callables that process an interpolation
|
||||
template directly based on the ``raw_template``, ``parsed_fields`` and
|
||||
``field_values`` attributes, rather than relying on the default renderer.
|
||||
The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by ``str.format``
|
||||
and hence PEP 498 are handled in interpolation templates as follows:
|
||||
|
||||
* they're included unmodified in the raw template to ensure no information is
|
||||
lost
|
||||
* they're *replaced* in the parsed template with the corresponding builtin
|
||||
calls, in order to ensure that ``field_expr`` always contains a valid
|
||||
Python expression
|
||||
* the corresponding field value placed in the field values tuple is
|
||||
converted appropriately *before* being passed to the interpolation
|
||||
template
|
||||
|
||||
This means that, for most purposes, the difference between the use of
|
||||
conversion specifiers and calling the corresponding builtins in the
|
||||
original interpolation template will be transparent to custom renderers. The
|
||||
difference will only be apparent if reparsing the raw template, or attempting
|
||||
to reconstruct the original template from the parsed template.
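
A small sketch of these three rules, using hand-written tuples in place of compiler output for ``i"She said her name is {name!r}."`` (the tuple layout follows this PEP's description; the variable names are chosen for this illustration only):

```python
import ast

name = "Jane"

# Rule 1: the raw template keeps the conversion specifier as written.
raw_template = "She said her name is {name!r}."
# Rule 2: the parsed template replaces "!r" with the equivalent builtin call.
parsed_template = (("She said her name is ", "repr(name)"), (".", None))
# Rule 3: the field value is converted before the template object sees it.
field_values = (repr(name),)
format_specifiers = ("",)

# The rewritten field text parses as an ordinary Python expression...
ast.parse(parsed_template[0][1], mode="eval")

# ...whereas the field text as originally written would not.
try:
    ast.parse("name!r", mode="eval")
except SyntaxError:
    original_text_is_expression = False
else:
    original_text_is_expression = True

rendered = "".join([
    parsed_template[0][0],
    format(field_values[0], format_specifiers[0]),
    parsed_template[1][0],
])
print(rendered)
```

A renderer working only from ``parsed_template`` and ``field_values`` therefore cannot tell ``{name!r}`` apart from ``{repr(name)}``; the distinction survives only in ``raw_template``.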
|
||||
|
||||
Writing custom renderers
|
||||
------------------------
|
||||
|
||||
Writing a custom renderer doesn't require any special syntax. Instead,
|
||||
custom renderers are ordinary callables that process an interpolation
|
||||
template directly either by calling the ``render()`` method with alternate ``render_template`` or ``render_field`` implementations, or by accessing the
|
||||
template's data attributes directly.
|
||||
|
||||
For example, the following function would render a template using objects'
|
||||
``repr`` implementations rather than their native formatting support::
|
||||
|
||||
def reprformat(template):
|
||||
def render_field(value, specifier):
|
||||
return format(repr(value), specifier)
|
||||
return template.render(render_field=render_field)
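
The same pattern extends to context-specific escaping. The sketch below is illustrative only: ``render_html`` is a hypothetical name, the class is a pure-Python stand-in for the proposed ``types.InterpolationTemplate``, and the template tuples are built by hand because ``i"..."`` literals are not available. It escapes every interpolated field with the standard library's ``html.escape`` while leaving the literal template text untouched.

```python
from html import escape


class InterpolationTemplate:
    """Pure-Python stand-in for the proposed type (an assumption, not a real API)."""

    def __init__(self, raw_template, parsed_template,
                 field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(render_field(self.field_values[pos],
                                          self.format_specifiers[pos]))
        return render_template(parts)


def render_html(template):
    """Hypothetical renderer: format each field, then HTML-escape it."""
    def render_field(value, specifier):
        return escape(format(value, specifier))
    return template.render(render_field=render_field)


body = "<b>Tom & Jerry</b>"  # imagine this came from an untrusted source

# Hand-built equivalent of i"<html><body>{body}</body></html>"
template = InterpolationTemplate(
    "<html><body>{body}</body></html>",
    (("<html><body>", "body"), ("</body></html>", None)),
    (body,),
    ("",),
)

page = render_html(template)
print(page)
```

Only the field values pass through ``escape``; the markup written by the developer in the template itself is emitted as-is.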
|
||||
|
||||
|
||||
Expression evaluation
|
||||
---------------------
|
||||
|
||||
The subexpressions that are extracted from the interpolation expression are
|
||||
evaluated in the context where the interpolation expression appears. This means
|
||||
the expression has full access to local, nonlocal and global variables. Any
|
||||
valid Python expression can be used inside ``${}``, including function and
|
||||
method calls. References without the surrounding braces are limited to looking
|
||||
up single identifiers.
|
||||
As with f-strings, the subexpressions that are extracted from the interpolation
|
||||
template are evaluated in the context where the interpolation template
|
||||
appears. This means the expression has full access to local, nonlocal and global variables. Any valid Python expression can be used inside ``{}``, including
|
||||
function and method calls.
|
||||
|
||||
Because the substitution expressions are evaluated where the string appears in
|
||||
the source code, there are no additional security concerns related to the
|
||||
|
@@ -295,7 +295,7 @@ same expression and used runtime field parsing::
|
|||
>>> def foo(data):
|
||||
... return data + 20
|
||||
...
|
||||
>>> str(i'input=$bar, output=${foo(bar)}')
|
||||
>>> str(i'input={bar}, output={foo(bar)}')
|
||||
'input=10, output=30'
|
||||
|
||||
Is essentially equivalent to::
|
||||
|
@@ -306,37 +306,44 @@ Is essentially equivalent to::
|
|||
Handling code injection attacks
|
||||
-------------------------------
|
||||
|
||||
The proposed interpolation syntax makes it potentially attractive to write
|
||||
The PEP 498 formatted string syntax makes it potentially attractive to write
|
||||
code like the following::
|
||||
|
||||
myquery = str(i"SELECT $column FROM $table;")
|
||||
mycommand = str(i"cat $filename")
|
||||
mypage = str(i"<html><body>${response.body}</body></html>")
|
||||
runquery(f"SELECT {column} FROM {table};")
|
||||
runcommand(f"cat {filename}")
|
||||
return_response(f"<html><body>{response.body}</body></html>")
|
||||
|
||||
These all represent potential vectors for code injection attacks, if any of the
|
||||
variables being interpolated happen to come from an untrusted source. The
|
||||
specific proposal in this PEP is designed to make it straightforward to write
|
||||
use case specific interpolators that take care of quoting interpolated values
|
||||
use case specific renderers that take care of quoting interpolated values
|
||||
appropriately for the relevant security context::
|
||||
|
||||
myquery = sql(i"SELECT $column FROM $table;")
|
||||
mycommand = sh(i"cat $filename")
|
||||
mypage = html(i"<html><body>${response.body}</body></html>")
|
||||
runquery(sql(i"SELECT {column} FROM {table};"))
|
||||
runcommand(sh(i"cat {filename}"))
|
||||
return_response(html(i"<html><body>{response.body}</body></html>"))
|
||||
|
||||
This PEP does not cover adding such interpolators to the standard library,
|
||||
but instead ensures they can be readily provided by third party libraries.
|
||||
This PEP does not cover adding such renderers to the standard library
|
||||
immediately, but rather proposes to ensure that they can be readily provided by
|
||||
third party libraries, and potentially incorporated into the standard library
|
||||
at a later date.
|
||||
|
||||
(Although it's tempting to propose adding InterpolationTemplate support at
|
||||
least to ``subprocess.call``, ``subprocess.check_call`` and
|
||||
``subprocess.check_output``)
|
||||
For example, a renderer that aimed to offer a POSIX shell style experience for
|
||||
accessing external programs, without the significant risks posed by running
|
||||
``os.system`` or enabling the system shell when using the ``subprocess`` module
|
||||
APIs, might provide an interface for running external programs similar to that
|
||||
offered by the
|
||||
`Julia programming language <http://julia.readthedocs.org/en/latest/manual/running-external-programs/>`__,
|
||||
only with the backtick based ``\`cat $filename\``` syntax replaced by
|
||||
``i"cat {filename}"`` style interpolation templates.
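
A much simpler variant of such a renderer can be sketched with the standard library's ``shlex.quote``. This is an assumption-laden illustration rather than a proposed API: ``sh`` here is a hypothetical function, the class is a pure-Python stand-in for ``types.InterpolationTemplate``, the template is built by hand, and it is assumed that no trailing tuple is emitted when the template ends with a field.

```python
import shlex


class InterpolationTemplate:
    """Pure-Python stand-in for the proposed type (an assumption, not a real API)."""

    def __init__(self, raw_template, parsed_template,
                 field_values, format_specifiers):
        self.raw_template = raw_template
        self.parsed_template = parsed_template
        self.field_values = field_values
        self.format_specifiers = format_specifiers

    def render(self, *, render_template="".join, render_field=format):
        parts = []
        for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
            parts.append(leading_text)
            if field_expr is not None:
                parts.append(render_field(self.field_values[pos],
                                          self.format_specifiers[pos]))
        return render_template(parts)


def sh(template):
    """Hypothetical renderer: quote each field for a POSIX shell."""
    def render_field(value, specifier):
        return shlex.quote(format(value, specifier))
    return template.render(render_field=render_field)


filename = "notes.txt; rm -rf ~"  # hostile input

# Hand-built equivalent of i"cat {filename}"
template = InterpolationTemplate(
    "cat {filename}",
    (("cat ", "filename"),),
    (filename,),
    ("",),
)

command = sh(template)
# Splitting the quoted command yields exactly two arguments, so the
# hostile value is treated as a single file name.
argv = shlex.split(command)
print(argv)
```

A renderer in the style described above would go further and avoid the shell entirely, for example by handing ``argv`` to the ``subprocess`` APIs rather than producing a command string.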
|
||||
|
||||
Format and conversion specifiers
|
||||
--------------------------------
|
||||
Format specifiers
|
||||
-----------------
|
||||
|
||||
Aside from separating them out from the substitution expression, format and
|
||||
conversion specifiers are otherwise treated as opaque strings by the
|
||||
interpolation template parser - assigning semantics to those (or, alternatively,
|
||||
prohibiting their use) is handled at runtime by the specified interpolator.
|
||||
Aside from separating them out from the substitution expression during parsing,
|
||||
format specifiers are otherwise treated as opaque strings by the interpolation
|
||||
template parser - assigning semantics to those (or, alternatively,
|
||||
prohibiting their use) is handled at runtime by the field renderer.
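
For instance, a field renderer that prohibits format specifiers altogether could look like the following sketch. ``strict_field`` is a hypothetical name; it has the ``(value, specifier)`` signature this PEP describes for ``render_field`` callables and could be passed as ``render_field`` to ``InterpolationTemplate.render``.

```python
def strict_field(value, specifier):
    """Hypothetical field renderer that rejects any format specifier."""
    if specifier:
        raise ValueError(
            f"format specifier {specifier!r} is not permitted here")
    return str(value)


# A field without a specifier renders normally.
plain = strict_field(42, "")

# A field with a specifier is rejected at render time, not at compile time.
try:
    strict_field(42, ">8")
except ValueError as exc:
    error = str(exc)
else:
    error = None

print(plain, error)
```

Because the check runs inside the renderer, the same template could still be rendered with the default ``format`` based field renderer elsewhere.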
|
||||
|
||||
Error handling
|
||||
--------------
|
||||
|
@@ -348,13 +355,13 @@ errors all raise SyntaxError.
|
|||
|
||||
Unmatched braces::
|
||||
|
||||
>>> i'x=${x'
|
||||
>>> i'x={x'
|
||||
File "<stdin>", line 1
|
||||
SyntaxError: missing '}' in interpolation expression
|
||||
|
||||
Invalid expressions::
|
||||
|
||||
>>> i'x=${!x}'
|
||||
>>> i'x={!x}'
|
||||
File "<fstring>", line 1
|
||||
!x
|
||||
^
|
||||
|
@@ -364,68 +371,16 @@ Run time errors occur when evaluating the expressions inside a
|
|||
template string before creating the interpolation template object. See PEP 498
|
||||
for some examples.
|
||||
|
||||
Different interpolators may also impose additional runtime
|
||||
Different renderers may also impose additional runtime
|
||||
constraints on acceptable interpolated expressions and other formatting
|
||||
details, which will be reported as runtime exceptions.
|
||||
|
||||
|
||||
Internationalising interpolated strings
|
||||
=======================================
|
||||
|
||||
Since this PEP derives its interpolation syntax from the internationalisation
|
||||
focused PEP 292, it's worth considering the potential implications this PEP
|
||||
may have for the internationalisation use case.
|
||||
|
||||
Internationalisation enters the picture by writing a custom interpolator that
|
||||
performs internationalisation. For example, the following implementation
|
||||
would delegate interpolation calls to ``string.Template``::
|
||||
|
||||
def i18n(template):
|
||||
# A real implementation would also handle normal strings
|
||||
raw_template, fields, values = template
|
||||
translated = gettext.gettext(raw_template)
|
||||
value_map = _build_interpolation_map(fields, values)
|
||||
return string.Template(translated).safe_substitute(value_map)
|
||||
|
||||
def _build_interpolation_map(fields, values):
|
||||
field_values = {}
|
||||
for literal_text, field_num, expr, conversion, format_spec in fields:
|
||||
assert expr.isidentifier() and not conversion and not format_spec
|
||||
if field_num is not None:
|
||||
field_values[expr] = values[field_num]
|
||||
return field_values
|
||||
|
||||
And could then be invoked as::
|
||||
|
||||
# _ = i18n at top of module or injected into the builtins module
|
||||
print(_(i"This is a $translated $message"))
|
||||
|
||||
Any actual i18n implementation would need to address other issues (most notably
|
||||
message catalog extraction), but this gives the general idea of what might be
|
||||
possible.
|
||||
|
||||
It's also worth noting that one of the benefits of the ``$`` based substitution
|
||||
syntax in this PEP is its compatibility with Mozilla's
|
||||
`l20n syntax <http://l20n.org/>`__, which uses ``{{ name }}`` for global
|
||||
substitution, and ``{{ $user }}`` for local context substitution.
|
||||
|
||||
With the syntax in this PEP, an l20n interpolator could be written as::
|
||||
|
||||
translated = l20n(i"{{ $user }} is running {{ appname }}")
|
||||
|
||||
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
|
||||
catalog lookups using PEP 498's semantics), the necessary brace escaping would
|
||||
make the string look like this in order to interpolate the user variable
|
||||
while preserving all of the expected braces::
|
||||
|
||||
locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"
|
||||
|
||||
|
||||
Possible integration with the logging module
|
||||
============================================
|
||||
|
||||
One of the challenges with the logging module has been that previously been
|
||||
unable to devise a reasonable migration strategy away from the use of
|
||||
One of the challenges with the logging module has been that we have previously
|
||||
been unable to devise a reasonable migration strategy away from the use of
|
||||
printf-style formatting. The runtime parsing and interpolation overhead for
|
||||
logging messages also poses a problem for extensive logging of runtime events
|
||||
for monitoring purposes.
|
||||
|
@ -434,13 +389,41 @@ While beyond the scope of this initial PEP, interpolation template support
|
|||
could potentially be added to the logging module's event reporting APIs,
|
||||
permitting relevant details to be captured using forms like::
|
||||
|
||||
logging.debug(i"Event: $event; Details: $data")
|
||||
logging.critical(i"Error: $error; Details: $data")
|
||||
logging.debug(i"Event: {event}; Details: {data}")
|
||||
logging.critical(i"Error: {error}; Details: {data}")
|
||||
|
||||
Rather than the current mod-formatting style::
|
||||
|
||||
logging.debug("Event: %s; Details: %s", event, data)
|
||||
logging.critical("Error: %s; Details: %s", error, data)
|
||||
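The mod-formatting style above already defers rendering: the ``logging`` module only interpolates the arguments when a handler actually emits the record. The following is a minimal sketch of that existing behaviour, not part of the proposal itself; the ``Expensive`` class and the ``pep501_demo`` logger name are hypothetical and exist only for illustration.

```python
import io
import logging


class Expensive:
    """Illustrative helper: counts how many times it is rendered to text."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive details"


# Capture output in memory so the example has no side effects.
stream = io.StringIO()
logger = logging.getLogger("pep501_demo")  # hypothetical logger name
logger.propagate = False
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

data = Expensive()

# Below the configured level: the record is dropped before interpolation,
# so str(data) is never called.
logger.debug("Event: %s; Details: %s", "startup", data)

# At or above the configured level: the handler renders the message once.
logger.critical("Error: %s; Details: %s", "boom", data)
```

Any interpolation-template integration would presumably need to preserve this property, which is why passing the template object itself (rather than an eagerly rendered string) matters for logging.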
|
||||
As the interpolation template is passed in as an ordinary argument, other
|
||||
keyword arguments also remain available::
|
||||
keyword arguments would also remain available::
|
||||
|
||||
logging.critical(i"Error: {error}; Details: {data}", exc_info=True)
|
||||
|
||||
As part of any such integration, a recommended approach would need to be
|
||||
defined for "lazy evaluation" of interpolated fields, as the ``logging``
|
||||
module's existing delayed interpolation support provides access to
|
||||
`various attributes <https://docs.python.org/3/library/logging.html#logrecord-attributes>`__ of the event ``LogRecord`` instance.
|
||||
|
||||
For example, since interpolation expressions are arbitrary Python expressions,
|
||||
string literals could be used to indicate cases where evaluation itself is
|
||||
being deferred, not just rendering::
|
||||
|
||||
logging.debug(i"Logger: {'record.name'}; Event: {event}; Details: {data}")
|
||||
|
||||
This could be further extended with idioms like using inline tuples to indicate
|
||||
deferred function calls to be made only if the log message is actually
|
||||
going to be rendered at current logging levels::
|
||||
|
||||
logging.debug(i"Event: {event}; Details: {expensive_call, raw_data}")
|
||||
|
||||
This kind of approach would be possible as having access to the actual *text*
|
||||
of the field expression would allow the logging renderer to distinguish
|
||||
between inline tuples that appear in the field expression itself, and tuples
|
||||
that happen to be passed in as data values in a normal field.
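The distinction described above can be sketched with the standard ``ast`` module. This is only an illustration under stated assumptions: the ``Field`` class is a hypothetical stand-in for one parsed substitution field (the PEP does not define it in this form), and ``render_field`` is an assumed helper name rather than any proposed logging API.

```python
import ast
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for one parsed substitution field."""

    expr_text: str  # source text of the field expression
    value: object   # result of evaluating that expression


def is_inline_tuple(expr_text):
    """Return True if the expression text itself is written as a tuple."""
    node = ast.parse(expr_text.strip(), mode="eval").body
    return isinstance(node, ast.Tuple)


def render_field(field):
    """Render one field, treating inline tuples as deferred calls."""
    if is_inline_tuple(field.expr_text):
        # The author wrote "func, arg, ..." in the template, so the call
        # is made only now, when the message is actually being rendered.
        func, *args = field.value
        return format(func(*args))
    # Anything else, including a tuple passed in as ordinary data,
    # is rendered as-is.
    return format(field.value)
```

With this sketch, a field whose text is ``expensive_call, raw_data`` triggers the call at render time, while a field whose text is ``data`` is rendered directly even if its value happens to be a tuple.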
|
||||
|
||||
logging.critical(i"Error: $error; Details: $data", exc_info=True)
|
||||
|
||||
Discussion
|
||||
==========
|
||||
|
@ -455,10 +438,10 @@ Supporting binary interpolation with this syntax would be relatively
|
|||
straightforward (the elements in the parsed fields tuple would just be
|
||||
byte strings rather than text strings, and the default renderer would be
|
||||
markedly less useful), but poses a significant likelihood of producing
|
||||
confusing type errors when a text interpolator was presented with
|
||||
confusing type errors when a text renderer was presented with
|
||||
binary input.
|
||||
|
||||
Since the proposed operator is useful without binary interpolation support, and
|
||||
Since the proposed syntax is useful without binary interpolation support, and
|
||||
such support can be readily added later, further consideration of binary
|
||||
interpolation is considered out of scope for the current PEP.
|
||||
|
||||
|
@ -466,19 +449,21 @@ Interoperability with str-only interfaces
|
|||
-----------------------------------------
|
||||
|
||||
For interoperability with interfaces that only accept strings, interpolation
|
||||
templates can be prerendered with ``str``, rather than delegating the rendering
|
||||
to the called function.
|
||||
templates can still be prerendered with ``format``, rather than delegating the
|
||||
rendering to the called function.
|
||||
|
||||
This reflects the key difference from PEP 498, which *always* eagerly applies
|
||||
the default rendering, without any convenient way to decide to do something
|
||||
different.
|
||||
the default rendering, without any convenient way to delegate that choice to
|
||||
another section of the code.
|
||||
|
||||
Preserving the raw template string
|
||||
----------------------------------
|
||||
|
||||
Earlier versions of this PEP failed to make the raw template string available
|
||||
to interpolators. This greatly complicated the i18n example, as it needed to
|
||||
reconstruct the original template to pass to the message catalog lookup.
|
||||
on the interpolation template. Retaining it makes it possible to provide a more
|
||||
attractive template representation, as well as providing the ability to
|
||||
precisely reconstruct the original string, including both the expression text
|
||||
and the details of any eagerly rendered substitution fields in format specifiers.
|
||||
|
||||
Creating a rich object rather than a global name lookup
|
||||
-------------------------------------------------------
|
||||
|
@ -488,33 +473,52 @@ a creating a new kind of object for later consumption by interpolation
|
|||
functions. Creating a rich descriptive object with a useful default renderer
|
||||
made it much easier to support customisation of the semantics of interpolation.
|
||||
|
||||
Relative order of conversion and format specifier in parsed fields
|
||||
------------------------------------------------------------------
|
||||
Building atop PEP 498, rather than competing with it
|
||||
----------------------------------------------------
|
||||
|
||||
The relative order of the conversion specifier and the format specifier in the
|
||||
substitution field 5-tuple is defined to match the order they appear in the
|
||||
format string, which is unfortunately the inverse of the way they appear in the
|
||||
``string.Formatter.parse`` 4-tuple.
|
||||
Earlier versions of this PEP attempted to serve as a complete substitute for
|
||||
PEP 498, rather than building a more flexible delayed rendering capability on
|
||||
top of PEP 498's eager rendering.
|
||||
|
||||
I consider this a design defect in ``string.Formatter.parse``, so I think it's
|
||||
worth fixing it for the custom interpolator API, since the tuple already
|
||||
has other differences (like including both the field position number *and* the
|
||||
text of the expression).
|
||||
Assuming the presence of f-strings as a supporting capability simplified a
|
||||
number of aspects of the proposal in this PEP (such as how to handle substitution
|
||||
fields in format specifiers).
|
||||
|
||||
This PEP also makes the parsed field attributes available by name, so it's
|
||||
possible to write interpolators without caring about the precise field order
|
||||
at all.
|
||||
Deferring consideration of possible use in i18n use cases
|
||||
---------------------------------------------------------
|
||||
|
||||
The initial motivating use case for this PEP was providing a cleaner syntax
|
||||
for i18n translation, as that requires access to the original unmodified
|
||||
template. As such, it focused on compatibility with the substitution syntax used
|
||||
in Python's ``string.Template`` formatting and Mozilla's l20n project.
|
||||
|
||||
However, subsequent discussion revealed there are significant additional
|
||||
considerations to be taken into account in the i18n use case, which don't
|
||||
impact the simpler cases of handling interpolation into security sensitive
|
||||
contexts (like HTML, system shells, and database queries), or producing
|
||||
application debugging messages in the preferred language of the development
|
||||
team (rather than the native language of end users).
|
||||
|
||||
Due to the original design of the ``str.format`` substitution syntax in PEP
|
||||
3101 being inspired by C#'s string formatting syntax, the specific field
|
||||
substitution syntax used in PEP 498 is consistent not only with Python's own
``str.format`` syntax, but also with string formatting in C#, including the
|
||||
native "$-string" interpolation syntax introduced in C# 6.0 (released in July
|
||||
2015). This means that while this particular substitution syntax may not
|
||||
currently be widely used for translation of *Python* applications (losing out
|
||||
to traditional %-formatting and the designed-specifically-for-i18n
|
||||
``string.Template`` formatting), it *is* a popular translation format in the
|
||||
wider software development ecosystem (since it is already the preferred
|
||||
format for translating C# applications).
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
|
||||
arbitrary expression substitution in string interpolation
|
||||
* Barry Warsaw for the string.Template syntax defined in PEP 292
|
||||
* Armin Ronacher for pointing me towards Mozilla's l20n project
|
||||
* Mike Miller for his survey of programming language interpolation syntaxes in
|
||||
PEP (TBD)
|
||||
* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
|
||||
exploring the feasibility of using this model of delayed rendering in i18n
|
||||
use cases (even though the ultimate conclusion was that it was a poor fit,
|
||||
at least for current approaches to i18n in Python)
|
||||
|
||||
References
|
||||
==========
|
||||
|
@ -540,8 +544,11 @@ References
|
|||
.. [#] PEP 498: Literal string formatting
|
||||
(https://www.python.org/dev/peps/pep-0498/)
|
||||
|
||||
.. [#] string.Formatter.parse
|
||||
(https://docs.python.org/3/library/string.html#string.Formatter.parse)
|
||||
.. [#] FormattableString and C# native string interpolation
|
||||
(https://msdn.microsoft.com/en-us/library/dn961160.aspx)
|
||||
|
||||
.. [#] Running external commands in Julia
|
||||
(http://julia.readthedocs.org/en/latest/manual/running-external-programs/)
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
|