PEP 501: string prefix redux, now with template objects

Nick Coghlan 2015-08-23 12:38:55 +10:00
parent 150cb59734
commit e27e7a0338
1 changed file with 135 additions and 129 deletions


@@ -29,29 +29,39 @@ an opening for a form of code injection attack, where the supplied user data
has not been properly escaped before being passed to the ``os.system`` call.
To address that problem (and a number of other concerns), this PEP proposes an
-alternative approach to compiler supported interpolation, based on a new ``$``
-binary operator with a syntactically constrained right hand side, a new
-``__interpolate__`` magic method, and a substitution syntax inspired by
-that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th
-substitution variable syntax to Python.
+alternative approach to compiler supported interpolation, using ``i`` (for
+"interpolation") as the new string prefix and a substitution syntax
+inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
+adding a 4th substitution variable syntax to Python.
-Some examples of the proposed syntax::
+Some possible examples of the proposed syntax::
-    msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
-    print(_$"This is a $translated $message")
-    translated = l20n$"{{ $user }} is running {{ appname }}"
-    myquery = sql$"SELECT $column FROM $table;"
-    mycommand = sh$"cat $filename"
-    mypage = html$"<html><body>${response.body}</body></html>"
-    callable = defer$ "$x + $y"
+    msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
+    print(_(i"This is a $translated $message"))
+    translated = l20n(i"{{ $user }} is running {{ appname }}")
+    myquery = sql(i"SELECT $column FROM $table;")
+    mycommand = sh(i"cat $filename")
+    mypage = html(i"<html><body>${response.body}</body></html>")
+    callable = defer(i"$x + $y")
+Summary of differences from PEP 498
+===================================
+The key differences of this proposal relative to PEP 498:
+* "i" (interpolation template) prefix rather than "f" (formatted string)
+* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
+* interpolation templates are created at runtime as a new kind of object
+* the default rendering is invoked by calling ``str()`` on a template object
+  rather than automatically
Proposal
========
-This PEP proposes the introduction of a new binary operator specifically for
-interpolation of arbitrary expressions::
+This PEP proposes the introduction of a new string prefix that declares the
+string to be an interpolation template rather than an ordinary string::
-    value = interpolator $ "Substitute $names and ${expressions} at runtime"
+    template = i"Substitute $names and ${expressions} at runtime"
This would be effectively interpreted as::
@@ -62,28 +72,25 @@ This would be effectively interpreted as::
        (" at runtime", None, None, None, None),
    )
    _field_values = (names, expressions)
-    value = interpolator.__interpolate__(_raw_template,
-                                         _parsed_fields,
-                                         _field_values)
+    template = types.InterpolationTemplate(_raw_template,
+                                           _parsed_fields,
+                                           _field_values)
-The right hand side of the new operator would be syntactically constrained to
-be a string literal.
-The ``str`` builtin type would gain an ``__interpolate__`` implementation that
-supported the following ``str.format`` inspired semantics::
+The ``__str__`` method on ``types.InterpolationTemplate`` would then implement
+the following ``str.format`` inspired semantics::
    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
-    >>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+    >>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
-    >>> str$'She said her name is ${name!r}.'
+    >>> str(i'She said her name is ${name!r}.')
    "She said her name is 'Jane'."
-The interpolation operator could be used with single-quoted, double-quoted and
-triple quoted strings, including raw strings. It would not support bytes
-literals as the right hand side of the expression.
+The interpolation template prefix can be combined with single-quoted,
+double-quoted and triple quoted strings, including raw strings. It does not
+support combination with bytes literals.
This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
@@ -102,9 +109,11 @@ It also has the effect of introducing yet another syntax for substitution
expressions into Python, when we already have 3 (``str.format``,
``bytes.__mod__`` and ``string.Template``).
-This PEP proposes to handle the former issue by always specifying an explicit
-interpolator for interpolation operations, and the latter by adopting the
-``string.Template`` substitution syntax defined in PEP 292.
+This PEP proposes to handle the former issue by deferring the actual rendering
+of the interpolation template to its ``__str__`` method (allowing the use of
+other template renderers by passing the template around as an object), and the
+latter by adopting the ``string.Template`` substitution syntax defined in PEP
+292.
The substitution syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to
@@ -133,18 +142,13 @@ JavaScript and core application code written in Python.
Specification
=============
-This PEP proposes the introduction of ``$`` as a new binary operator designed
-specifically to support interpolation of template strings::
+This PEP proposes the introduction of ``i`` as a new string prefix that
+results in the creation of an instance of a new type,
+``types.InterpolationTemplate``.
-    INTERPOLATOR $ TEMPLATE_STRING
-This would work as a normal binary operator (precedence TBD), with the
-exception that the template string would be syntactically constrained to be a
-string literal, rather than permitting arbitrary expressions.
-The template string must be a Unicode string (bytes literals are not permitted),
-and string literal concatenation operates as normal within the template string
-component of the expression.
+Interpolation template literals are Unicode strings (bytes literals are not
+permitted), and string literal concatenation operates as normal, with the
+entire combined literal forming the interpolation template.
The template string is parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single "$" character, or
@@ -155,15 +159,37 @@ While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an
expression.
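As a rough illustration of these parsing rules, the sketch below splits a template into the 5-tuples this PEP describes. Each tuple holds the leading text, field position, field expression, conversion specifier and format specifier, in the order used by the default rendering loop. It is not the reference implementation. The names `parse_template` and `_split_field` are hypothetical. It also rests on three assumptions: `$name` fields are ASCII identifiers, `!` and `:` only act as separators outside brackets, and a lone `$` that does not start an identifier is rejected.

```python
import re

# Assumption: a "$name" field is an ASCII identifier.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _split_field(body):
    """Split the text inside "${...}" into (expression, conversion, format_spec).

    Assumption: "!" (when not part of "!=") and ":" only act as separators
    outside brackets, mirroring how str.format replacement fields are split.
    """
    depth = 0
    for pos, char in enumerate(body):
        if char in "([{":
            depth += 1
        elif char in ")]}":
            depth -= 1
        elif depth == 0 and char == "!" and body[pos + 1:pos + 2] != "=":
            conversion, _, format_spec = body[pos + 1:].partition(":")
            return body[:pos], conversion, format_spec
        elif depth == 0 and char == ":":
            return body[:pos], "", body[pos + 1:]
    return body, "", ""


def parse_template(template):
    """Return a tuple of (leading_text, field_position, field_expr,
    conversion, format_spec) 5-tuples for an interpolation template."""
    fields = []
    literal = []
    pos = 0
    while pos < len(template):
        char = template[pos]
        if char != "$":
            literal.append(char)
            pos += 1
            continue
        following = template[pos + 1:pos + 2]
        if following == "$":
            # "$$" is an escaped dollar sign and stays in the literal text
            literal.append("$")
            pos += 2
        elif following == "{":
            # Find the matching "}", allowing nested braces in the expression
            depth = 0
            for end in range(pos + 1, len(template)):
                if template[end] == "{":
                    depth += 1
                elif template[end] == "}":
                    depth -= 1
                    if depth == 0:
                        break
            else:
                raise SyntaxError("missing '}' in interpolation expression")
            expr, conversion, format_spec = _split_field(template[pos + 2:end])
            fields.append(("".join(literal), len(fields), expr, conversion, format_spec))
            literal = []
            pos = end + 1
        else:
            match = _IDENTIFIER.match(template, pos + 1)
            if match is None:
                # Assumption: a lone "$" is rejected rather than kept literally
                raise SyntaxError("invalid interpolation expression")
            fields.append(("".join(literal), len(fields), match.group(), "", ""))
            literal = []
            pos = match.end()
    if literal or not fields:
        # Trailing literal text with no substitution field of its own
        fields.append(("".join(literal), None, None, None, None))
    return tuple(fields)
```

For example, `parse_template("Cost: $$5 for $item")` yields a single field tuple whose leading text is `"Cost: $5 for "`. An unterminated `${` raises `SyntaxError`, in line with the compile-time errors this PEP describes.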
-These components are then organised into a tuple of tuples, and passed to the
-``__interpolate__`` method of the interpolator identified by the given
-name along with the runtime values of any expressions to be interpolated::
+These components are then organised into an instance of a new type with the
+following semantics::
-    DOTTED_NAME.__interpolate__(TEMPLATE_STRING,
-                                <parsed_fields>,
-                                <field_values>)
+    class InterpolationTemplate:
+        __slots__ = ("raw_template", "parsed_fields", "field_values")
-The template string field tuple is inspired by the interface of
+        def __new__(cls, raw_template, parsed_fields, field_values):
+            self = super().__new__(cls)
+            self.raw_template = raw_template
+            self.parsed_fields = parsed_fields
+            self.field_values = field_values
+            return self
+        def __iter__(self):
+            # Support iterable unpacking
+            yield self.raw_template
+            yield self.parsed_fields
+            yield self.field_values
+        def __repr__(self):
+            return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
+                        "at ${id(self):#x}>")
+        def __str__(self):
+            # See definition of the default template rendering below
+            ...
+The result of the interpolation template expression is an instance of this
+type, rather than an already rendered string; default rendering only takes
+place when the instance's ``__str__`` method is called.
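To make these semantics concrete, here is a runnable sketch of the proposed type in today's Python. Since current Python does not accept the `i` prefix, `__repr__` uses an f-string in its place. `__str__` fills in the default rendering with the `format` builtin and `string.Formatter().convert_field`, following the ``str.format`` inspired semantics described in this PEP. It is an approximation for illustration, not the reference implementation.

```python
import datetime
import string

# Applies "!r", "!s" and "!a" conversions exactly as str.format does
_converter = string.Formatter().convert_field


class InterpolationTemplate:
    __slots__ = ("raw_template", "parsed_fields", "field_values")

    def __new__(cls, raw_template, parsed_fields, field_values):
        self = super().__new__(cls)
        self.raw_template = raw_template
        self.parsed_fields = parsed_fields
        self.field_values = field_values
        return self

    def __iter__(self):
        # Support iterable unpacking: raw, fields, values = template
        yield self.raw_template
        yield self.parsed_fields
        yield self.field_values

    def __repr__(self):
        # f-string used as a stand-in for the proposed i-prefixed literal
        return f"<{type(self).__qualname__} {self.raw_template!r} at {id(self):#x}>"

    def __str__(self):
        # Default rendering: literal text, then each converted, formatted value
        raw_template, fields, values = self
        template_parts = []
        for leading_text, field_num, _expr, conversion, format_spec in fields:
            template_parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                template_parts.append(format(value, format_spec))
        return "".join(template_parts)


# Hand-built equivalent of the str(i'My name is $name, ...') example in the
# Proposal section, since today's parser cannot produce it from a literal.
name, age, anniversary = "Jane", 50, datetime.date(1991, 10, 12)
template = InterpolationTemplate(
    "My name is $name, my age next year is ${age+1}, "
    "my anniversary is ${anniversary:%A, %B %d, %Y}.",
    (
        ("My name is ", 0, "name", "", ""),
        (", my age next year is ", 1, "age+1", "", ""),
        (", my anniversary is ", 2, "anniversary", "", "%A, %B %d, %Y"),
        (".", None, None, None, None),
    ),
    (name, age + 1, anniversary),
)
```

Calling `str(template)` produces the same text as the interactive example earlier in this PEP. Note that the field values are computed before the object exists, while formatting waits for `__str__`.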
+The format of the parsed fields tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 5-tuples each
containing:
@@ -191,7 +217,7 @@ appeared in the original string, but without the leading and/or surrounding
expression markers. The conversion specifier and format specifier are separated
from the substitution expression by ``!`` and ``:`` as defined for ``str.format``.
-If a given substition field has no leading literal section, coversion specifier
+If a given substitution field has no leading literal section, conversion specifier
or format specifier, then the corresponding elements in the tuple are the
empty string. If the final part of the string has no trailing substitution
field, then the field position, field expression, conversion specifier and
@@ -222,13 +248,14 @@ While the field values tuple would be::
The parsed fields tuple can be constant folded at compile time, while the
expression values tuple will always need to be constructed at runtime.
-The ``str.__interpolate__`` implementation would have the following
+The ``InterpolationTemplate.__str__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers::
    _converter = string.Formatter().convert_field
-    def __interpolate__(raw_template, fields, values):
+    def __str__(self):
+        raw_template, fields, values = self
        template_parts = []
        for leading_text, field_num, expr, conversion, format_spec in fields:
            template_parts.append(leading_text)
@@ -243,18 +270,10 @@ builtin and ``str.format`` conversion specifiers::
Writing custom interpolators
----------------------------
-To simplify the process of writing custom interpolators, it is proposed to add
-a new builtin decorator, ``interpolator``, which would be defined as::
-    def interpolator(f):
-        f.__interpolate__ = f.__call__
-        return f
-This allows new interpolators to be written as::
-    @interpolator
-    def my_custom_interpolator(raw_template, parsed_fields, field_values):
-        ...
+Writing a custom interpolator doesn't require any special syntax. Instead,
+custom interpolators are ordinary callables that process an interpolation
+template directly based on the ``raw_template``, ``parsed_fields`` and
+``field_values`` attributes, rather than relying on the default renderer.
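Because a custom interpolator is just a callable, a database-oriented one can avoid quoting altogether by emitting parameter placeholders. The sketch below is hypothetical: `sql` is not defined by this PEP or by any library. The template is represented by a plain ``(raw_template, parsed_fields, field_values)`` tuple, relying on the iterable unpacking the proposed type supports. The ``?`` placeholder style is the one accepted by the standard library's `sqlite3` module.

```python
import sqlite3


def sql(template):
    """Hypothetical interpolator: replace each field with a "?" placeholder.

    Returns (query, params) so the database driver handles the quoting.
    """
    _raw_template, fields, values = template
    query_parts = []
    params = []
    for leading_text, field_num, _expr, _conversion, _format_spec in fields:
        query_parts.append(leading_text)
        if field_num is not None:
            query_parts.append("?")
            params.append(values[field_num])
    return "".join(query_parts), tuple(params)


# Tuple stand-in for i"SELECT age FROM people WHERE name = $name"
name = "Jane"
template = (
    "SELECT age FROM people WHERE name = $name",
    (("SELECT age FROM people WHERE name = ", 0, "name", "", ""),),
    (name,),
)
query, params = sql(template)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("Jane", 50))
rows = conn.execute(query, params).fetchall()
```

A value such as `"Jane' OR '1'='1"` would be compared as a literal string instead of altering the query. Placeholders can only stand in for values, though. A template like `SELECT $column FROM $table` would need identifier validation instead.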
Expression evaluation
@@ -287,12 +306,12 @@ Is essentially equivalent to::
Handling code injection attacks
-------------------------------
-The proposed interpolation expressions make it potentially attractive to write
+The proposed interpolation syntax makes it potentially attractive to write
code like the following::
-    myquery = str$"SELECT $column FROM $table;"
-    mycommand = str$"cat $filename"
-    mypage = str$"<html><body>${response.body}</body></html>"
+    myquery = str(i"SELECT $column FROM $table;")
+    mycommand = str(i"cat $filename")
+    mypage = str(i"<html><body>${response.body}</body></html>")
These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
@@ -300,15 +319,16 @@ specific proposal in this PEP is designed to make it straightforward to write
use case specific interpolators that take care of quoting interpolated values
appropriately for the relevant security context::
-    myquery = sql$"SELECT $column FROM $table;"
-    mycommand = sh$"cat $filename"
-    mypage = html$"<html><body>${response.body}</body></html>"
+    myquery = sql(i"SELECT $column FROM $table;")
+    mycommand = sh(i"cat $filename")
+    mypage = html(i"<html><body>${response.body}</body></html>")
This PEP does not cover adding such interpolators to the standard library,
but instead ensures they can be readily provided by third party libraries.
-(Although it's tempting to propose adding __interpolate__ implementations to
-``subprocess.call``, ``subprocess.check_call`` and ``subprocess.check_output``)
+(Although it's tempting to propose adding InterpolationTemplate support at
+least to ``subprocess.call``, ``subprocess.check_call`` and
+``subprocess.check_output``.)
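As an illustration of such a third party interpolator, here is a hypothetical `sh` callable. It applies the standard library's `shlex.quote` to every interpolated value, so a hostile file name stays a single shell argument. Several parts of this sketch are assumptions: the name `sh`, the tuple stand-in for the template object, and the choice to ignore conversion specifiers.

```python
import shlex


def sh(template):
    """Hypothetical interpolator: shell-quote every interpolated value."""
    _raw_template, fields, values = template
    parts = []
    for leading_text, field_num, _expr, _conversion, format_spec in fields:
        parts.append(leading_text)  # literal text is trusted as written
        if field_num is not None:
            formatted = format(values[field_num], format_spec)
            parts.append(shlex.quote(formatted))
    return "".join(parts)


# Tuple stand-in for i"cat $filename" with an untrusted filename
filename = "notes.txt; rm -rf ~"
template = ("cat $filename", (("cat ", 0, "filename", "", ""),), (filename,))
command = sh(template)
```

Here `command` is `cat 'notes.txt; rm -rf ~'`, so the `;` can no longer start a second command.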
Format and conversion specifiers
--------------------------------
@@ -328,20 +348,21 @@ errors all raise SyntaxError.
Unmatched braces::
-    >>> str$'x=${x'
+    >>> i'x=${x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in interpolation expression
Invalid expressions::
-    >>> str$'x=${!x}'
+    >>> i'x=${!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax
-Run time errors occur when evaluating the expressions inside an
-template string. See PEP 498 for some examples.
+Run time errors occur when evaluating the expressions inside a
+template string before creating the interpolation template object. See PEP 498
+for some examples.
Different interpolators may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
@@ -359,9 +380,10 @@ Internationalisation enters the picture by writing a custom interpolator that
performs internationalisation. For example, the following implementation
would delegate interpolation calls to ``string.Template``::
-    @interpolator
-    def i18n(template, fields, values):
-        translated = gettext.gettext(template)
+    def i18n(template):
+        # A real implementation would also handle normal strings
+        raw_template, fields, values = template
+        translated = gettext.gettext(raw_template)
        value_map = _build_interpolation_map(fields, values)
        return string.Template(translated).safe_substitute(value_map)
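The definition of ``_build_interpolation_map`` is not shown here, so the following is only a hedged, self-contained approximation of the same idea. `build_value_map` is a hypothetical stand-in that maps each field expression to its runtime value. That mapping only lines up with `string.Template` placeholders for simple `$name` fields. With no message catalog installed, `gettext.gettext` returns the text unchanged, so the sketch runs anywhere.

```python
import gettext
import string


def build_value_map(fields, values):
    """Hypothetical helper: map each field expression to its runtime value.

    Only simple "$name" fields produce keys that string.Template can use.
    """
    value_map = {}
    for _leading_text, field_num, expr, _conversion, _format_spec in fields:
        if field_num is not None:
            value_map[expr] = values[field_num]
    return value_map


def i18n(template):
    raw_template, fields, values = template
    # Look up the *raw* template, so the catalog key keeps its $placeholders
    translated = gettext.gettext(raw_template)
    return string.Template(translated).safe_substitute(build_value_map(fields, values))


# Tuple stand-in for i"This is a $translated $message"
template = (
    "This is a $translated $message",
    (("This is a ", 0, "translated", "", ""), (" ", 1, "message", "", "")),
    ("localised", "example"),
)
result = i18n(template)
```

Looking up the raw template rather than the rendered text is what lets a translator see `$translated` and `$message` as placeholders in the catalog.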
@@ -376,7 +398,7 @@ would delegate interpolation calls to ``string.Template``::
And it could then be invoked as::
    # _ = i18n at top of module or injected into the builtins module
-    print(_$"This is a $translated $message")
+    print(_(i"This is a $translated $message"))
Any actual i18n implementation would need to address other issues (most notably
message catalog extraction), but this gives the general idea of what might be
@@ -389,14 +411,14 @@ substitution, and ``{{ $user }}`` for local context substitution.
With the syntax in this PEP, an l20n interpolator could be written as::
-    translated = l20n$"{{ $user }} is running {{ appname }}"
+    translated = l20n(i"{{ $user }} is running {{ appname }}")
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
catalog lookups using PEP 498's semantics), the necessary brace escaping would
make the string look like this in order to interpolate the user variable
while preserving all of the expected braces::
-    interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}"
+    locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"
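The escaping burden can be checked with `str.format`, which follows the same brace-doubling rules as PEP 498 f-strings. It is used here instead of an f-string only so that the value is passed explicitly. For comparison, `string.Template` (the PEP 292 syntax this PEP adopts) leaves the l20n braces alone. `Jane` is an arbitrary sample value.

```python
import string

user = "Jane"  # sample value for illustration

# str.format / PEP 498 style: each literal "{{" or "}}" must be doubled again
format_style = "{{{{ ${user} }}}} is running {{{{ appname }}}}".format(user=user)

# PEP 292 style: braces are ordinary text and only "$user" is substituted
template_style = string.Template("{{ $user }} is running {{ appname }}").safe_substitute(user=user)
```

Both calls keep the l20n braces intact, but only the first needs quadrupled braces to do so. The `$` written before `{user}` in the brace-escaped form is kept as ordinary text.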
Possible integration with the logging module
@@ -408,13 +430,17 @@ printf-style formatting. The runtime parsing and interpolation overhead for
logging messages also poses a problem for extensive logging of runtime events
for monitoring purposes.
-While beyond the scope of this initial PEP, the proposal described here could
-potentially be applied to the logging module's event reporting APIs, permitting
-relevant details to be captured using forms like::
+While beyond the scope of this initial PEP, interpolation template support
+could potentially be added to the logging module's event reporting APIs,
+permitting relevant details to be captured using forms like::
-    logging.debug$"Event: $event; Details: $data"
-    logging.critical$"Error: $error; Details: $data"
+    logging.debug(i"Event: $event; Details: $data")
+    logging.critical(i"Error: $error; Details: $data")
+As the interpolation template is passed in as an ordinary argument, other
+keyword arguments also remain available::
+    logging.critical(i"Error: $error; Details: $data", exc_info=True)
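Logging integration is left out of scope here, but the module's existing design already suggests how it could work. `Logger.debug` and its siblings check the level before creating a record, and `LogRecord.getMessage` calls `str()` on the message object only when a handler processes the record. The sketch below uses a hypothetical `DeferredMessage` class as a stand-in for an interpolation template. It shows that such an object would only be rendered for records that are actually emitted. `ListHandler` is likewise a helper invented for this demonstration.

```python
import logging


class DeferredMessage:
    """Hypothetical stand-in for an interpolation template passed to logging.

    Rendering happens only in __str__, which LogRecord.getMessage() calls.
    """

    def __init__(self, parsed_fields, field_values):
        self.parsed_fields = parsed_fields
        self.field_values = field_values
        self.render_count = 0  # instrumentation for this demo only

    def __str__(self):
        self.render_count += 1
        parts = []
        for leading_text, field_num, _expr, _conversion, format_spec in self.parsed_fields:
            parts.append(leading_text)
            if field_num is not None:
                parts.append(format(self.field_values[field_num], format_spec))
        return "".join(parts)


class ListHandler(logging.Handler):
    """Collects rendered messages so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("pep501.demo")
logger.setLevel(logging.INFO)  # DEBUG records are discarded before rendering
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Stand-in for i"Error: $error; Details: $data"
message = DeferredMessage(
    (("Error: ", 0, "error", "", ""), ("; Details: ", 1, "data", "", "")),
    ("disk full", 28),
)
logger.debug(message)     # level disabled: __str__ is never called
logger.critical(message)  # level enabled: rendered once by the handler
```

After both calls, `message.render_count` is 1. The disabled `debug` call paid only for building the field values, not for formatting them.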
Discussion
==========
@@ -422,45 +448,14 @@ Discussion
Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.
-Using call syntax to support keyword-only parameters
------------------------------------------------------
-The logging examples raise the question of whether or not it may be desirable
-to allow interpolators to accept arbitrary keyword arguments, and allow folks
-to write things like::
-    logging.critical$"Error: $error; Details: $data"(exc_info=True)
-in order to pass additional keyword only arguments to the interpolator.
-With the current PEP, such code would attempt to call the result of the
-interpolation operation. If interpolation keyword support was added, then
-calling the result of an interpolation operation directly would require
-parentheses for disambiguation::
-    (defer$ "$x + $y")()
-("defer" here would be an interpolator that compiled the supplied string as
-a piece of Python code with eagerly bound references to the containing
-namespace)
-Determining relative precedence
---------------------------------
-The PEP doesn't currently specify the relative precedence of the new operator,
-as the only examples considered so far concern standalone expressions or simple
-variable assignments.
-Development of a reference implementation based on the PEP 498 reference
-implementation may help answer that question.
Deferring support for binary interpolation
------------------------------------------
Supporting binary interpolation with this syntax would be relatively
-straightforward (just a matter of relaxing the syntactic restrictions on the
-right hand side of the operator), but poses a signficant likelihood of
-producing confusing type errors when a text interpolator was presented with
+straightforward (the elements in the parsed fields tuple would just be
+byte strings rather than text strings, and the default renderer would be
+markedly less useful), but poses a significant likelihood of producing
+confusing type errors when a text interpolator was presented with
binary input.
Since the proposed operator is useful without binary interpolation support, and
@@ -474,13 +469,13 @@ Earlier versions of this PEP failed to make the raw template string available
to interpolators. This greatly complicated the i18n example, as it needed to
reconstruct the original template to pass to the message catalog lookup.
-Using a magic method rather than a global name lookup
------------------------------------------------------
+Creating a rich object rather than a global name lookup
+-------------------------------------------------------
Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
-a magic method on an explicitly named interpolator. Naming the interpolator
-eliminated a lot of the complexity otherwise associated with shadowing the
-builtin function in order to modify the semantics of interpolation.
+creating a new kind of object for later consumption by interpolation
+functions. Creating a rich descriptive object with a useful default renderer
+made it much easier to support customisation of the semantics of interpolation.
Relative order of conversion and format specifier in parsed fields
------------------------------------------------------------------
@@ -499,6 +494,17 @@ This PEP also makes the parsed field attributes available by name, so it's
possible to write interpolators without caring about the precise field order
at all.
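One way to expose the parsed field elements by name while keeping them ordinary tuples is `collections.namedtuple`. The type name `ParsedField` and its attribute names below are assumptions, not names defined by this PEP. They follow the element order used in the default rendering loop.

```python
from collections import namedtuple

# Hypothetical type and attribute names, in the element order used by the
# default rendering loop: leading text, position, expression, conversion, spec.
ParsedField = namedtuple(
    "ParsedField",
    ["leading_text", "field_position", "field_expr", "conversion", "format_spec"],
)


def field_expressions(parsed_fields):
    """List the expression text of every substitution field, by attribute name."""
    return [
        field.field_expr
        for field in parsed_fields
        if field.field_position is not None
    ]


fields = (
    ParsedField("My name is ", 0, "name", "", ""),
    ParsedField(", my age next year is ", 1, "age+1", "", ""),
    ParsedField(".", None, None, None, None),
)
```

A namedtuple is still a tuple, so code that unpacks the 5-tuples positionally keeps working unchanged. Interpolators that use attribute names no longer depend on the element order.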
+Acknowledgements
+================
+* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
+  arbitrary expression substitution in string interpolation
+* Barry Warsaw for the string.Template syntax defined in PEP 292
+* Armin Ronacher for pointing me towards Mozilla's l20n project
+* Mike Miller for his survey of programming language interpolation syntaxes in
+  PEP (TBD)
References
==========