PEP 501: string prefix redux, now with template objects

Nick Coghlan 2015-08-23 12:38:55 +10:00
parent 150cb59734
commit e27e7a0338
1 changed files with 135 additions and 129 deletions

@ -29,29 +29,39 @@ an opening for a form of code injection attack, where the supplied user data
has not been properly escaped before being passed to the ``os.system`` call. has not been properly escaped before being passed to the ``os.system`` call.
To address that problem (and a number of other concerns), this PEP proposes an To address that problem (and a number of other concerns), this PEP proposes an
alternative approach to compiler supported interpolation, based on a new ``$`` alternative approach to compiler supported interpolation, using ``i`` (for
binary operator with a syntactically constrained right hand side, a new "interpolation") as the new string prefix and a substitution syntax
``__interpolate__`` magic method, and a substitution syntax inspired by inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th adding a 4th substitution variable syntax to Python.
substitution variable syntax to Python.
Some examples of the proposed syntax:: Some possible examples of the proposed syntax::
msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.' msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
print(_$"This is a $translated $message") print(_(i"This is a $translated $message"))
translated = l20n$"{{ $user }} is running {{ appname }}" translated = l20n(i"{{ $user }} is running {{ appname }}")
myquery = sql$"SELECT $column FROM $table;" myquery = sql(i"SELECT $column FROM $table;")
mycommand = sh$"cat $filename" mycommand = sh(i"cat $filename")
mypage = html$"<html><body>${response.body}</body></html>" mypage = html(i"<html><body>${response.body}</body></html>")
callable = defer$ "$x + $y" callable = defer(i"$x + $y")
Summary of differences from PEP 498
===================================
The key differences of this proposal relative to PEP 498:
* "i" (interpolation template) prefix rather than "f" (formatted string)
* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
* interpolation templates are created at runtime as a new kind of object
* the default rendering is invoked by calling ``str()`` on a template object
rather than automatically
Proposal Proposal
======== ========
This PEP proposes the introduction of a new binary operator specifically for This PEP proposes the introduction of a new string prefix that declares the
interpolation of arbitrary expressions:: string to be an interpolation template rather than an ordinary string::
value = interpolator $ "Substitute $names and ${expressions} at runtime" template = i"Substitute $names and ${expressions} at runtime"
This would be effectively interpreted as:: This would be effectively interpreted as::
@ -62,28 +72,25 @@ This would be effectively interpreted as::
(" at runtime", None, None, None, None), (" at runtime", None, None, None, None),
) )
_field_values = (names, expressions) _field_values = (names, expressions)
value = interpolator.__interpolate__(_raw_template, template = types.InterpolationTemplate(_raw_template,
_parsed_fields, _parsed_fields,
_field_values) _field_values)
The right hand side of the new operator would be syntactically constrained to The ``__str__`` method on ``types.InterpolationTemplate`` would then implement
be a string literal. the following ``str.format`` inspired semantics::
The ``str`` builtin type would gain an ``__interpolate__`` implementation that
supported the following ``str.format`` inspired semantics::
>>> import datetime >>> import datetime
>>> name = 'Jane' >>> name = 'Jane'
>>> age = 50 >>> age = 50
>>> anniversary = datetime.date(1991, 10, 12) >>> anniversary = datetime.date(1991, 10, 12)
>>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.' >>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.' 'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> str$'She said her name is ${name!r}.' >>> str(i'She said her name is ${name!r}.')
"She said her name is 'Jane'." "She said her name is 'Jane'."
The interpolation operator could be used with single-quoted, double-quoted and The interpolation template prefix can be combined with single-quoted,
triple quoted strings, including raw strings. It would not support bytes double-quoted and triple quoted strings, including raw strings. It does not
literals as the right hand side of the expression. support combination with bytes literals.
This PEP does not propose to remove or deprecate any of the existing This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting string formatting mechanisms, as those will remain valuable when formatting
@ -102,9 +109,11 @@ It also has the effect of introducing yet another syntax for substitution
expressions into Python, when we already have 3 (``str.format``, expressions into Python, when we already have 3 (``str.format``,
``bytes.__mod__`` and ``string.Template``) ``bytes.__mod__`` and ``string.Template``)
This PEP proposes to handle the former issue by always specifying an explicit This PEP proposes to handle the former issue by deferring the actual rendering
interpolator for interpolation operations, and the latter by adopting the of the interpolation template to its ``__str__`` method (allowing the use of
``string.Template`` substitution syntax defined in PEP 292. other template renderers by passing the template around as an object), and the
latter by adopting the ``string.Template`` substitution syntax defined in PEP
292.
The substitution syntax devised for PEP 292 is deliberately simple so that the The substitution syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to template strings can be extracted into an i18n message catalog, and passed to
@ -133,18 +142,13 @@ JavaScript and core application code written in Python.
Specification Specification
============= =============
This PEP proposes the introduction of ``$`` as a new binary operator designed This PEP proposes the introduction of ``i`` as a new string prefix that
specifically to support interpolation of template strings:: results in the creation of an instance of a new type,
``types.InterpolationTemplate``.
INTERPOLATOR $ TEMPLATE_STRING Interpolation template literals are Unicode strings (bytes literals are not
permitted), and string literal concatenation operates as normal, with the
This would work as a normal binary operator (precedence TBD), with the entire combined literal forming the interpolation template.
exception that the template string would be syntactically constrained to be a
string literal, rather than permitting arbitrary expressions.
The template string must be a Unicode string (bytes literals are not permitted),
and string literal concatenation operates as normal within the template string
component of the expression.
The template string is parsed into literals and expressions. Expressions The template string is parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single "$" character, or appear as either identifiers prefixed with a single "$" character, or
@ -155,15 +159,37 @@ While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an and is considered part of the literal text, rather than as introducing an
expression. expression.
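The ``$$`` escaping rule is the same one ``string.Template`` (PEP 292) already applies, so the existing standard library class gives a rough feel for it. The following is only a sketch that uses ``string.Template`` as a stand-in; the parser proposed by the PEP is not part of this change, and the template text and variable names are made up for illustration.

```python
import string

# string.Template treats "$$" as an escaped, literal "$".
# Here "$$" is followed by a real substitution, "${amount}".
template = string.Template("Total: $$${amount} for $name")
rendered = template.substitute(amount=5, name="Jane")
print(rendered)
```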
These components are then organised into a tuple of tuples, and passed to the These components are then organised into an instance of a new type with the
``__interpolate__`` method of the interpolator identified by the given following semantics::
name along with the runtime values of any expressions to be interpolated::
DOTTED_NAME.__interpolate__(TEMPLATE_STRING, class InterpolationTemplate:
<parsed_fields>, __slots__ = ("raw_template", "parsed_fields", "field_values")
<field_values>)
The template string field tuple is inspired by the interface of def __new__(cls, raw_template, parsed_fields, field_values):
self = super().__new__(cls)
self.raw_template = raw_template
self.parsed_fields = parsed_fields
self.field_values = field_values
return self
def __iter__(self):
# Support iterable unpacking
yield self.raw_template
yield self.parsed_fields
yield self.field_values
def __repr__(self):
return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
"at ${id(self):#x}>")
def __str__(self):
# See definition of the default template rendering below
The result of the interpolation template expression is an instance of this
type, rather than an already rendered string - default rendering only takes
place when the instance's ``__str__`` method is called.
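The deferred rendering described here can be tried out today with a hand-built stand-in. The sketch below is an assumption-laden approximation, not the proposed ``types.InterpolationTemplate``: it uses ``__init__`` instead of ``__new__``, ignores conversion specifiers, and builds by hand the tuples that the compiler would produce for a hypothetical ``i"Hello $name, next year you are ${age+1}!"``.

```python
class InterpolationTemplate:
    """Runnable stand-in for the proposed template type (simplified)."""

    __slots__ = ("raw_template", "parsed_fields", "field_values")

    def __init__(self, raw_template, parsed_fields, field_values):
        self.raw_template = raw_template
        self.parsed_fields = parsed_fields
        self.field_values = field_values

    def __iter__(self):
        # Support iterable unpacking
        yield self.raw_template
        yield self.parsed_fields
        yield self.field_values

    def __str__(self):
        # Simplified default rendering: conversion specifiers are ignored
        parts = []
        for leading_text, field_num, expr, conversion, format_spec in self.parsed_fields:
            parts.append(leading_text)
            if field_num is not None:
                parts.append(format(self.field_values[field_num], format_spec))
        return "".join(parts)


# Hand-built equivalent of i"Hello $name, next year you are ${age+1}!"
name, age = "Jane", 50
template = InterpolationTemplate(
    "Hello $name, next year you are ${age+1}!",
    (
        ("Hello ", 0, "name", "", ""),
        (", next year you are ", 1, "age+1", "", ""),
        ("!", None, None, None, None),
    ),
    (name, age + 1),  # expressions are evaluated eagerly
)

raw, fields, values = template  # nothing has been rendered yet
rendered = str(template)        # rendering happens only here
print(rendered)
```

The expressions are evaluated when the template is created, while the string itself is only assembled when ``str()`` is called.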
The format of the parsed fields tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 5-tuples each ``string.Formatter.parse``, and consists of a series of 5-tuples each
containing: containing:
@ -191,7 +217,7 @@ appeared in the original string, but without the leading and/or surrounding
expression markers. The conversion specifier and format specifier are separated expression markers. The conversion specifier and format specifier are separated
from the substitution expression by ``!`` and ``:`` as defined for ``str.format``. from the substitution expression by ``!`` and ``:`` as defined for ``str.format``.
If a given substitution field has no leading literal section, coversion specifier If a given substitution field has no leading literal section, conversion specifier
or format specifier, then the corresponding elements in the tuple are the or format specifier, then the corresponding elements in the tuple are the
empty string. If the final part of the string has no trailing substitution empty string. If the final part of the string has no trailing substitution
field, then the field position, field expression, conversion specifier and field, then the field position, field expression, conversion specifier and
@ -222,13 +248,14 @@ While the field values tuple would be::
The parsed fields tuple can be constant folded at compile time, while the The parsed fields tuple can be constant folded at compile time, while the
expression values tuple will always need to be constructed at runtime. expression values tuple will always need to be constructed at runtime.
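Since the parsed fields format is described as inspired by ``string.Formatter.parse``, it may help to see what that existing API yields. This is only a comparison sketch using the standard library as it exists today; the format string is made up for illustration.

```python
import string

# string.Formatter.parse yields 4-tuples:
#   (literal_text, field_name, format_spec, conversion)
parsed = list(string.Formatter().parse("x={x!r:>5} end"))
print(parsed)
```

Note the differences from the tuples in this PEP: the existing API has no field position element, and it places the format specifier before the conversion specifier, whereas the proposed 5-tuples place the conversion specifier first.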
The ``str.__interpolate__`` implementation would have the following The ``InterpolationTemplate.__str__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format`` semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers:: builtin and ``str.format`` conversion specifiers::
_converter = string.Formatter().convert_field _converter = string.Formatter().convert_field
def __interpolate__(raw_template, fields, values): def __str__(self):
raw_template, fields, values = self
template_parts = [] template_parts = []
for leading_text, field_num, expr, conversion, format_spec in fields: for leading_text, field_num, expr, conversion, format_spec in fields:
template_parts.append(leading_text) template_parts.append(leading_text)
@ -243,18 +270,10 @@ builtin and ``str.format`` conversion specifiers::
Writing custom interpolators Writing custom interpolators
---------------------------- ----------------------------
To simplify the process of writing custom interpolators, it is proposed to add Writing a custom interpolator doesn't require any special syntax. Instead,
a new builtin decorator, ``interpolator``, which would be defined as:: custom interpolators are ordinary callables that process an interpolation
template directly based on the ``raw_template``, ``parsed_fields`` and
def interpolator(f): ``field_values`` attributes, rather than relying on the default renderer.
f.__interpolate__ = f.__call__
return f
This allows new interpolators to be written as::
@interpolator
def my_custom_interpolator(raw_template, parsed_fields, field_values):
...
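Under the new side of this change, a custom interpolator is just a function that takes a template object. A minimal sketch follows, assuming a hypothetical ``html`` interpolator that escapes every interpolated value. Because the proposed syntax is not available, a plain 3-tuple stands in for the template object and relies only on the iterable unpacking the template type is specified to support.

```python
import html as html_lib


def html(template):
    """Hypothetical interpolator: HTML-escape every interpolated value."""
    raw_template, parsed_fields, field_values = template
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in parsed_fields:
        parts.append(leading_text)  # literal text is trusted as written
        if field_num is not None:
            value = format(field_values[field_num], format_spec)
            parts.append(html_lib.escape(value))
    return "".join(parts)


# Stand-in for i"<p>${comment}</p>"
comment = "<script>alert(1)</script>"
template = (
    "<p>${comment}</p>",
    (("<p>", 0, "comment", "", ""), ("</p>", None, None, None, None)),
    (comment,),
)
page = html(template)
print(page)
```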
Expression evaluation Expression evaluation
@ -287,12 +306,12 @@ Is essentially equivalent to::
Handling code injection attacks Handling code injection attacks
------------------------------- -------------------------------
The proposed interpolation expressions make it potentially attractive to write The proposed interpolation syntax makes it potentially attractive to write
code like the following:: code like the following::
myquery = str$"SELECT $column FROM $table;" myquery = str(i"SELECT $column FROM $table;")
mycommand = str$"cat $filename" mycommand = str(i"cat $filename")
mypage = str$"<html><body>${response.body}</body></html>" mypage = str(i"<html><body>${response.body}</body></html>")
These all represent potential vectors for code injection attacks, if any of the These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The variables being interpolated happen to come from an untrusted source. The
@ -300,15 +319,16 @@ specific proposal in this PEP is designed to make it straightforward to write
use case specific interpolators that take care of quoting interpolated values use case specific interpolators that take care of quoting interpolated values
appropriately for the relevant security context:: appropriately for the relevant security context::
myquery = sql$"SELECT $column FROM $table;" myquery = sql(i"SELECT $column FROM $table;")
mycommand = sh$"cat $filename" mycommand = sh(i"cat $filename")
mypage = html$"<html><body>${response.body}</body></html>" mypage = html(i"<html><body>${response.body}</body></html>")
This PEP does not cover adding such interpolators to the standard library, This PEP does not cover adding such interpolators to the standard library,
but instead ensures they can be readily provided by third party libraries. but instead ensures they can be readily provided by third party libraries.
(Although it's tempting to propose adding __interpolate__ implementations to (Although it's tempting to propose adding InterpolationTemplate support at
``subprocess.call``, ``subprocess.check_call`` and ``subprocess.check_output``) least to ``subprocess.call``, ``subprocess.check_call`` and
``subprocess.check_output``)
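As an illustration of the kind of third party interpolator this section has in mind, the following sketch shows a hypothetical ``sh`` function that shell-quotes each interpolated value with ``shlex.quote``. It is an assumption about how such a helper might look, not an API defined by the PEP, and a plain 3-tuple again stands in for the template object.

```python
import shlex


def sh(template):
    """Hypothetical interpolator: shell-quote every interpolated value."""
    raw_template, parsed_fields, field_values = template
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in parsed_fields:
        parts.append(leading_text)
        if field_num is not None:
            parts.append(shlex.quote(str(field_values[field_num])))
    return "".join(parts)


# Stand-in for i"cat $filename" with a hostile filename
filename = "notes.txt; rm -rf ~"
template = ("cat $filename", (("cat ", 0, "filename", "", ""),), (filename,))
command = sh(template)
print(command)
```

The hostile value ends up as a single quoted argument rather than as a second shell command.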
Format and conversion specifiers Format and conversion specifiers
-------------------------------- --------------------------------
@ -328,20 +348,21 @@ errors all raise SyntaxError.
Unmatched braces:: Unmatched braces::
>>> str$'x=${x' >>> i'x=${x'
File "<stdin>", line 1 File "<stdin>", line 1
SyntaxError: missing '}' in interpolation expression SyntaxError: missing '}' in interpolation expression
Invalid expressions:: Invalid expressions::
>>> str$'x=${!x}' >>> i'x=${!x}'
File "<fstring>", line 1 File "<fstring>", line 1
!x !x
^ ^
SyntaxError: invalid syntax SyntaxError: invalid syntax
Run time errors occur when evaluating the expressions inside an Run time errors occur when evaluating the expressions inside a
template string. See PEP 498 for some examples. template string before creating the interpolation template object. See PEP 498
for some examples.
Different interpolators may also impose additional runtime Different interpolators may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting constraints on acceptable interpolated expressions and other formatting
@ -359,9 +380,10 @@ Internationalisation enters the picture by writing a custom interpolator that
performs internationalisation. For example, the following implementation performs internationalisation. For example, the following implementation
would delegate interpolation calls to ``string.Template``:: would delegate interpolation calls to ``string.Template``::
@interpolator def i18n(template):
def i18n(template, fields, values): # A real implementation would also handle normal strings
translated = gettext.gettext(template) raw_template, fields, values = template
translated = gettext.gettext(raw_template)
value_map = _build_interpolation_map(fields, values) value_map = _build_interpolation_map(fields, values)
return string.Template(translated).safe_substitute(value_map) return string.Template(translated).safe_substitute(value_map)
@ -376,7 +398,7 @@ would delegate interpolation calls to ``string.Template``::
And could then be invoked as:: And could then be invoked as::
# _ = i18n at top of module or injected into the builtins module # _ = i18n at top of module or injected into the builtins module
print(_$"This is a $translated $message") print(_(i"This is a $translated $message"))
Any actual i18n implementation would need to address other issues (most notably Any actual i18n implementation would need to address other issues (most notably
message catalog extraction), but this gives the general idea of what might be message catalog extraction), but this gives the general idea of what might be
@ -389,14 +411,14 @@ substitution, and ``{{ $user }}`` for local context substitution.
With the syntax in this PEP, an l20n interpolator could be written as:: With the syntax in this PEP, an l20n interpolator could be written as::
translated = l20n$"{{ $user }} is running {{ appname }}" translated = l20n(i"{{ $user }} is running {{ appname }}")
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
catalog lookups using PEP 498's semantics), the necessary brace escaping would catalog lookups using PEP 498's semantics), the necessary brace escaping would
make the string look like this in order to interpolate the user variable make the string look like this in order to interpolate the user variable
while preserving all of the expected braces:: while preserving all of the expected braces::
interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}" locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"
Possible integration with the logging module Possible integration with the logging module
@ -408,13 +430,17 @@ printf-style formatting. The runtime parsing and interpolation overhead for
logging messages also poses a problem for extensive logging of runtime events logging messages also poses a problem for extensive logging of runtime events
for monitoring purposes. for monitoring purposes.
While beyond the scope of this initial PEP, the proposal described here could While beyond the scope of this initial PEP, interpolation template support
potentially be applied to the logging module's event reporting APIs, permitting could potentially be added to the logging module's event reporting APIs,
relevant details to be captured using forms like:: permitting relevant details to be captured using forms like::
logging.debug$"Event: $event; Details: $data" logging.debug(i"Event: $event; Details: $data")
logging.critical$"Error: $error; Details: $data" logging.critical(i"Error: $error; Details: $data")
As the interpolation template is passed in as an ordinary argument, other
keyword arguments also remain available::
logging.critical(i"Error: $error; Details: $data", exc_info=True)
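Part of the appeal is that the logging module already calls ``str()`` on a message object only when a record is actually emitted. The sketch below demonstrates that existing behaviour with a hypothetical ``CountingTemplate`` class standing in for an interpolation template; the logger name and messages are made up for illustration.

```python
import io
import logging


class CountingTemplate:
    """Hypothetical stand-in that counts how often it is rendered."""

    def __init__(self, text):
        self.text = text
        self.render_count = 0

    def __str__(self):
        self.render_count += 1
        return self.text


stream = io.StringIO()
logger = logging.getLogger("pep501_sketch")
logger.propagate = False  # keep the root logger out of the picture
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)

debug_msg = CountingTemplate("Event: startup; Details: none")
error_msg = CountingTemplate("Error: disk full; Details: /dev/sda1")

logger.debug(debug_msg)     # below the threshold: never rendered
logger.critical(error_msg)  # rendered once, when the handler formats it

print(debug_msg.render_count, error_msg.render_count)
```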
Discussion Discussion
========== ==========
@ -422,45 +448,14 @@ Discussion
Refer to PEP 498 for additional discussion, as several of the points there Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP. also apply to this PEP.
Using call syntax to support keyword-only parameters
----------------------------------------------------
The logging examples raise the question of whether or not it may be desirable
to allow interpolators to accept arbitrary keyword arguments, and allow folks
to write things like::
logging.critical$"Error: $error; Details: $data"(exc_info=True)
in order to pass additional keyword only arguments to the interpolator.
With the current PEP, such code would attempt to call the result of the
interpolation operation. If interpolation keyword support was added, then
calling the result of an interpolation operation directly would require
parentheses for disambiguation::
(defer$ "$x + $y")()
("defer" here would be an interpolator that compiled the supplied string as
a piece of Python code with eagerly bound references to the containing
namespace)
Determining relative precedence
-------------------------------
The PEP doesn't currently specify the relative precedence of the new operator,
as the only examples considered so far concern standalone expressions or simple
variable assignments.
Development of a reference implementation based on the PEP 498 reference
implementation may help answer that question.
Deferring support for binary interpolation Deferring support for binary interpolation
------------------------------------------ ------------------------------------------
Supporting binary interpolation with this syntax would be relatively Supporting binary interpolation with this syntax would be relatively
straightforward (just a matter of relaxing the syntactic restrictions on the straightforward (the elements in the parsed fields tuple would just be
right hand side of the operator), but poses a significant likelihood of byte strings rather than text strings, and the default renderer would be
producing confusing type errors when a text interpolator was presented with markedly less useful), but poses a significant likelihood of producing
confusing type errors when a text interpolator was presented with
binary input. binary input.
Since the proposed operator is useful without binary interpolation support, and Since the proposed operator is useful without binary interpolation support, and
@ -474,13 +469,13 @@ Earlier versions of this PEP failed to make the raw template string available
to interpolators. This greatly complicated the i18n example, as it needed to to interpolators. This greatly complicated the i18n example, as it needed to
reconstruct the original template to pass to the message catalog lookup. reconstruct the original template to pass to the message catalog lookup.
Using a magic method rather than a global name lookup Creating a rich object rather than a global name lookup
----------------------------------------------------- -------------------------------------------------------
Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
a magic method on an explicitly named interpolator. Naming the interpolator a creating a new kind of object for later consumption by interpolation
eliminated a lot of the complexity otherwise associated with shadowing the functions. Creating a rich descriptive object with a useful default renderer
builtin function in order to modify the semantics of interpolation. made it much easier to support customisation of the semantics of interpolation.
Relative order of conversion and format specifier in parsed fields Relative order of conversion and format specifier in parsed fields
------------------------------------------------------------------ ------------------------------------------------------------------
@ -499,6 +494,17 @@ This PEP also makes the parsed field attributes available by name, so it's
possible to write interpolators without caring about the precise field order possible to write interpolators without caring about the precise field order
at all. at all.
Acknowledgements
================
* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
arbitrary expression substitution in string interpolation
* Barry Warsaw for the string.Template syntax defined in PEP 292
* Armin Ronacher for pointing me towards Mozilla's l20n project
* Mike Miller for his survey of programming language interpolation syntaxes in
PEP (TBD)
References References
========== ==========