PEP 501: string prefix redux, now with template objects

parent 150cb59734
commit e27e7a0338

264 pep-0501.txt
@@ -29,29 +29,39 @@ an opening for a form of code injection attack, where the supplied user data
 has not been properly escaped before being passed to the ``os.system`` call.

 To address that problem (and a number of other concerns), this PEP proposes an
-alternative approach to compiler supported interpolation, based on a new ``$``
-binary operator with a syntactically constrained right hand side, a new
-``__interpolate__`` magic method, and a substitution syntax inspired by
-that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th
-substitution variable syntax to Python.
+alternative approach to compiler supported interpolation, using ``i`` (for
+"interpolation") as the new string prefix and a substitution syntax
+inspired by that used in ``string.Template`` and ES6 JavaScript, rather than
+adding a 4th substitution variable syntax to Python.

-Some examples of the proposed syntax::
+Some possible examples of the proposed syntax::

-    msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
-    print(_$"This is a $translated $message")
-    translated = l20n$"{{ $user }} is running {{ appname }}"
-    myquery = sql$"SELECT $column FROM $table;"
-    mycommand = sh$"cat $filename"
-    mypage = html$"<html><body>${response.body}</body></html>"
-    callable = defer$ "$x + $y"
+    msg = str(i'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
+    print(_(i"This is a $translated $message"))
+    translated = l20n(i"{{ $user }} is running {{ appname }}")
+    myquery = sql(i"SELECT $column FROM $table;")
+    mycommand = sh(i"cat $filename")
+    mypage = html(i"<html><body>${response.body}</body></html>")
+    callable = defer(i"$x + $y")
+
+Summary of differences from PEP 498
+===================================
+
+The key differences of this proposal relative to PEP 498:
+
+* "i" (interpolation template) prefix rather than "f" (formatted string)
+* string.Template/JavaScript inspired substitution syntax, rather than str.format/C# inspired
+* interpolation templates are created at runtime as a new kind of object
+* the default rendering is invoked by calling ``str()`` on a template object
+  rather than automatically

 Proposal
 ========

-This PEP proposes the introduction of a new binary operator specifically for
-interpolation of arbitrary expressions::
+This PEP proposes the introduction of a new string prefix that declares the
+string to be an interpolation template rather than an ordinary string::

-    value = interpolator $ "Substitute $names and ${expressions} at runtime"
+    template = i"Substitute $names and ${expressions} at runtime"

 This would be effectively interpreted as::

@@ -62,28 +72,25 @@ This would be effectively interpreted as::
         (" at runtime", None, None, None, None),
     )
     _field_values = (names, expressions)
-    value = interpolator.__interpolate__(_raw_template,
-                                         _parsed_fields,
-                                         _field_values)
+    template = types.InterpolationTemplate(_raw_template,
+                                           _parsed_fields,
+                                           _field_values)

-The right hand side of the new operator would be syntactically constrained to
-be a string literal.
-
-The ``str`` builtin type would gain an ``__interpolate__`` implementation that
-supported the following ``str.format`` inspired semantics::
+The ``__str__`` method on ``types.InterpolationTemplate`` would then implement
+the following ``str.format`` inspired semantics::

     >>> import datetime
     >>> name = 'Jane'
     >>> age = 50
     >>> anniversary = datetime.date(1991, 10, 12)
-    >>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+    >>> str(i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.')
     'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
-    >>> str$'She said her name is ${name!r}.'
+    >>> str(i'She said her name is ${name!r}.')
     "She said her name is 'Jane'."

-The interpolation operator could be used with single-quoted, double-quoted and
-triple quoted strings, including raw strings. It would not support bytes
-literals as the right hand side of the expression.
+The interpolation template prefix can be combined with single-quoted,
+double-quoted and triple quoted strings, including raw strings. It does not
+support combination with bytes literals.

 This PEP does not propose to remove or deprecate any of the existing
 string formatting mechanisms, as those will remain valuable when formatting
@@ -102,9 +109,11 @@ It also has the effect of introducing yet another syntax for substitution
 expressions into Python, when we already have 3 (``str.format``,
 ``bytes.__mod__`` and ``string.Template``)

-This PEP proposes to handle the former issue by always specifying an explicit
-interpolator for interpolation operations, and the latter by adopting the
-``string.Template`` substitution syntax defined in PEP 292.
+This PEP proposes to handle the former issue by deferring the actual rendering
+of the interpolation template to its ``__str__`` method (allowing the use of
+other template renderers by passing the template around as an object), and the
+latter by adopting the ``string.Template`` substitution syntax defined in PEP
+292.

 The substitution syntax devised for PEP 292 is deliberately simple so that the
 template strings can be extracted into an i18n message catalog, and passed to
@@ -133,18 +142,13 @@ JavaScript and core application code written in Python.
 Specification
 =============

-This PEP proposes the introduction of ``$`` as a new binary operator designed
-specifically to support interpolation of template strings::
+This PEP proposes the introduction of ``i`` as a new string prefix that
+results in the creation of an instance of a new type,
+``types.InterpolationTemplate``.

-    INTERPOLATOR $ TEMPLATE_STRING
-
-This would work as a normal binary operator (precedence TBD), with the
-exception that the template string would be syntactically constrained to be a
-string literal, rather than permitting arbitrary expressions.
-
-The template string must be a Unicode string (bytes literals are not permitted),
-and string literal concatenation operates as normal within the template string
-component of the expression.
+Interpolation template literals are Unicode strings (bytes literals are not
+permitted), and string literal concatenation operates as normal, with the
+entire combined literal forming the interpolation template.

 The template string is parsed into literals and expressions. Expressions
 appear as either identifiers prefixed with a single "$" character, or
@@ -155,15 +159,37 @@ While parsing the string, any doubled ``$$`` is replaced with a single ``$``
 and is considered part of the literal text, rather than as introducing an
 expression.

-These components are then organised into a tuple of tuples, and passed to the
-``__interpolate__`` method of the interpolator identified by the given
-name along with the runtime values of any expressions to be interpolated::
+These components are then organised into an instance of a new type with the
+following semantics::

-    DOTTED_NAME.__interpolate__(TEMPLATE_STRING,
-                                <parsed_fields>,
-                                <field_values>)
+    class InterpolationTemplate:
+        __slots__ = ("raw_template", "parsed_fields", "field_values")

-The template string field tuple is inspired by the interface of
+        def __new__(cls, raw_template, parsed_fields, field_values):
+            self = super().__new__(cls)
+            self.raw_template = raw_template
+            self.parsed_fields = parsed_fields
+            self.field_values = field_values
+            return self
+
+        def __iter__(self):
+            # Support iterable unpacking
+            yield self.raw_template
+            yield self.parsed_fields
+            yield self.field_values
+
+        def __repr__(self):
+            return str(i"<${type(self).__qualname__} ${self.raw_template!r} "
+                        "at ${id(self):#x}>")
+
+        def __str__(self):
+            # See definition of the default template rendering below
+
+The result of the interpolation template expression is an instance of this
+type, rather than an already rendered string - default rendering only takes
+place when the instance's ``__str__`` method is called.
+
+The format of the parsed fields tuple is inspired by the interface of
 ``string.Formatter.parse``, and consists of a series of 5-tuples each
 containing:

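The ``i`` prefix does not exist in any released Python, so the following is only a sketch: it approximates ``types.InterpolationTemplate`` as an ordinary class (a hypothetical stand-in, not a real ``types`` member) and writes out by hand the three components the compiler would produce. It illustrates two properties described above: field values are evaluated eagerly when the object is created, and iterable unpacking yields the raw template, the parsed fields and the field values. An f-string stands in for the proposed ``i``-prefixed ``__repr__``, and the default renderer is left out, so ``str()`` simply falls back to ``repr()`` here.

```python
class InterpolationTemplate:
    """Hypothetical stand-in for the proposed types.InterpolationTemplate."""

    __slots__ = ("raw_template", "parsed_fields", "field_values")

    def __new__(cls, raw_template, parsed_fields, field_values):
        self = super().__new__(cls)
        self.raw_template = raw_template
        self.parsed_fields = parsed_fields
        self.field_values = field_values
        return self

    def __iter__(self):
        # Support iterable unpacking: raw, fields, values = template
        yield self.raw_template
        yield self.parsed_fields
        yield self.field_values

    def __repr__(self):
        # An f-string stands in for the proposed i-prefixed literal
        return f"<{type(self).__qualname__} {self.raw_template!r} at {id(self):#x}>"

    # __str__ (the default renderer) is deliberately omitted from this sketch


def make_template(names, expressions):
    # Hand-written equivalent of
    #   i"Substitute $names and ${expressions} at runtime"
    # The argument expressions are evaluated before the object is created.
    return InterpolationTemplate(
        "Substitute $names and ${expressions} at runtime",
        (
            ("Substitute ", 0, "names", "", ""),
            (" and ", 1, "expressions", "", ""),
            (" at runtime", None, None, None, None),
        ),
        (names, expressions),
    )


template = make_template("a", 1 + 1)
raw_template, parsed_fields, field_values = template  # iterable unpacking
print(field_values)    # ('a', 2)
print(repr(template))
```

Using ``__slots__`` keeps the object small and fixed in shape, which suits a value that the compiler would create on every evaluation of a template literal.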
@@ -191,7 +217,7 @@ appeared in the original string, but without the leading and/or surrounding
 expression markers. The conversion specifier and format specifier are separated
 from the substitution expression by ``!`` and ``:`` as defined for ``str.format``.

-If a given substition field has no leading literal section, coversion specifier
+If a given substitution field has no leading literal section, conversion specifier
 or format specifier, then the corresponding elements in the tuple are the
 empty string. If the final part of the string has no trailing substitution
 field, then the field position, field expression, conversion specifier and
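As a rough illustration of these parsing rules (``$name``, ``${expr}``, ``$$`` escapes and the ``!``/``:`` separators), here is a hypothetical ``parse_template`` helper that produces the 5-tuples described above. It is a sketch under simplifying assumptions: it splits ``${...}`` on the first ``:`` and then on the first ``!``, so expressions that themselves contain ``:``, ``!`` or ``}`` (slices, lambdas, ``!=``, dict displays) are not handled, and it only checks expression syntax with ``compile`` rather than evaluating anything.

```python
import re

# Identifier pattern for the bare $name form (ASCII-only, an assumption)
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_template(raw):
    """Split raw into (leading_text, field_num, expr, conversion, format_spec)."""
    fields = []
    literal = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "$":
            literal.append(ch)
            i += 1
            continue
        nxt = raw[i + 1:i + 2]
        if nxt == "$":
            # A doubled $$ is literal text, not the start of a field
            literal.append("$")
            i += 2
            continue
        if nxt == "{":
            end = raw.find("}", i + 2)
            if end == -1:
                raise SyntaxError("missing '}' in interpolation expression")
            body = raw[i + 2:end]
            i = end + 1
            # Naive split: assumes the expression contains no ':' or '!'
            body, _, format_spec = body.partition(":")
            expr, _, conversion = body.partition("!")
        else:
            match = _IDENT.match(raw, i + 1)
            if match is None:
                raise SyntaxError("invalid interpolation field")
            expr, conversion, format_spec = match.group(), "", ""
            i = match.end()
        # Syntax check only; raises SyntaxError for things like ${!x}
        compile(expr.strip(), "<interpolation>", "eval")
        fields.append(("".join(literal), len(fields), expr, conversion, format_spec))
        literal = []
    if literal:
        # Trailing text with no substitution field after it
        fields.append(("".join(literal), None, None, None, None))
    return tuple(fields)


print(parse_template("Substitute $names and ${expressions} at runtime"))
```

The first field number is ``0`` and increases by one per field, matching the position of the corresponding value in the field values tuple.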
@@ -222,13 +248,14 @@ While the field values tuple would be::
 The parsed fields tuple can be constant folded at compile time, while the
 expression values tuple will always need to be constructed at runtime.

-The ``str.__interpolate__`` implementation would have the following
+The ``InterpolationTemplate.__str__`` implementation would have the following
 semantics, with field processing being defined in terms of the ``format``
 builtin and ``str.format`` conversion specifiers::

     _converter = string.Formatter().convert_field

-    def __interpolate__(raw_template, fields, values):
+    def __str__(self):
+        raw_template, fields, values = self
         template_parts = []
         for leading_text, field_num, expr, conversion, format_spec in fields:
             template_parts.append(leading_text)
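The hunk above shows only the start of the default renderer, so the following is a self-contained sketch of the same semantics, assuming each value is looked up by ``field_num`` and that the conversion is applied before ``format``. A ``namedtuple`` called ``Template`` (hypothetical) stands in for ``InterpolationTemplate``, since only the three unpacked components matter, and the parsed fields for the earlier doctest example are written out by hand.

```python
import datetime
import string
from collections import namedtuple

# Hypothetical stand-in: any object unpacking to these three items would do
Template = namedtuple("Template", "raw_template parsed_fields field_values")

_converter = string.Formatter().convert_field


def render_default(template):
    """Sketch of the proposed InterpolationTemplate.__str__ semantics."""
    raw_template, fields, values = template
    template_parts = []
    for leading_text, field_num, expr, conversion, format_spec in fields:
        template_parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                # 'r', 's' or 'a', exactly as in str.format
                value = _converter(value, conversion)
            template_parts.append(format(value, format_spec))
    return "".join(template_parts)


name = "Jane"
age = 50
anniversary = datetime.date(1991, 10, 12)

# Hand-written equivalent of
#   i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
t1 = Template(
    "My name is $name, my age next year is ${age+1}, "
    "my anniversary is ${anniversary:%A, %B %d, %Y}.",
    (
        ("My name is ", 0, "name", "", ""),
        (", my age next year is ", 1, "age+1", "", ""),
        (", my anniversary is ", 2, "anniversary", "", "%A, %B %d, %Y"),
        (".", None, None, None, None),
    ),
    (name, age + 1, anniversary),
)
# Hand-written equivalent of i'She said her name is ${name!r}.'
t2 = Template(
    "She said her name is ${name!r}.",
    (("She said her name is ", 0, "name", "r", ""), (".", None, None, None, None)),
    (name,),
)
print(render_default(t1))
print(render_default(t2))
```

This reproduces the doctest output from the proposal. The date fields rely on ``date.__format__`` delegating to ``strftime``, so the weekday and month names assume the default C locale.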
@@ -243,18 +270,10 @@ builtin and ``str.format`` conversion specifiers::
 Writing custom interpolators
 ----------------------------

-To simplify the process of writing custom interpolators, it is proposed to add
-a new builtin decorator, ``interpolator``, which would be defined as::
-
-    def interpolator(f):
-        f.__interpolate__ = f.__call__
-        return f
-
-This allows new interpolators to be written as::
-
-    @interpolator
-    def my_custom_interpolator(raw_template, parsed_fields, field_values):
-        ...
+Writing a custom interpolator doesn't require any special syntax. Instead,
+custom interpolators are ordinary callables that process an interpolation
+template directly based on the ``raw_template``, ``parsed_fields`` and
+``field_values`` attributes, rather than relying on the default renderer.


 Expression evaluation
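To illustrate that a custom interpolator is just an ordinary callable, here is a hypothetical ``html_interpolator`` that mirrors the default renderer but passes each substituted value through ``html.escape``, while the literal text written in the source code is trusted as-is. The ``Template`` namedtuple is again an assumed stand-in for ``InterpolationTemplate``.

```python
import html
import string
from collections import namedtuple

# Hypothetical stand-in for InterpolationTemplate
Template = namedtuple("Template", "raw_template parsed_fields field_values")

_converter = string.Formatter().convert_field


def html_interpolator(template):
    """Hypothetical custom interpolator: escapes every substituted value."""
    raw_template, fields, values = template
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in fields:
        # Literal text comes from the source code, so it is not escaped
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            # Only the interpolated (possibly untrusted) values are escaped
            parts.append(html.escape(format(value, format_spec)))
    return "".join(parts)


body = "<script>alert('hi')</script>"
# Hand-written equivalent of i"<html><body>${response.body}</body></html>"
page = Template(
    "<html><body>${response.body}</body></html>",
    (
        ("<html><body>", 0, "response.body", "", ""),
        ("</body></html>", None, None, None, None),
    ),
    (body,),
)
print(html_interpolator(page))
```

Keeping the split between trusted literal text and untrusted values is what the parsed fields tuple makes possible; a plain rendered string would already have lost that distinction.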
@@ -287,12 +306,12 @@ Is essentially equivalent to::
 Handling code injection attacks
 -------------------------------

-The proposed interpolation expressions make it potentially attractive to write
+The proposed interpolation syntax makes it potentially attractive to write
 code like the following::

-    myquery = str$"SELECT $column FROM $table;"
-    mycommand = str$"cat $filename"
-    mypage = str$"<html><body>${response.body}</body></html>"
+    myquery = str(i"SELECT $column FROM $table;")
+    mycommand = str(i"cat $filename")
+    mypage = str(i"<html><body>${response.body}</body></html>")

 These all represent potential vectors for code injection attacks, if any of the
 variables being interpolated happen to come from an untrusted source. The
@@ -300,15 +319,16 @@ specific proposal in this PEP is designed to make it straightforward to write
 use case specific interpolators that take care of quoting interpolated values
 appropriately for the relevant security context::

-    myquery = sql$"SELECT $column FROM $table;"
-    mycommand = sh$"cat $filename"
-    mypage = html$"<html><body>${response.body}</body></html>"
+    myquery = sql(i"SELECT $column FROM $table;")
+    mycommand = sh(i"cat $filename")
+    mypage = html(i"<html><body>${response.body}</body></html>")

 This PEP does not cover adding such interpolators to the standard library,
 but instead ensures they can be readily provided by third party libraries.

-(Although it's tempting to propose adding __interpolate__ implementations to
-``subprocess.call``, ``subprocess.check_call`` and ``subprocess.check_output``)
+(Although it's tempting to propose adding InterpolationTemplate support at
+least to ``subprocess.call``, ``subprocess.check_call`` and
+``subprocess.check_output``)

 Format and conversion specifiers
 --------------------------------
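A shell-quoting interpolator along the lines of the ``sh`` example could be sketched with ``shlex.quote``, which quotes a string for safe use as a single token on a POSIX shell command line. The ``sh`` function and the ``Template`` stand-in below are hypothetical; as stated above, real implementations are left to third party libraries.

```python
import shlex
import string
from collections import namedtuple

# Hypothetical stand-in for InterpolationTemplate
Template = namedtuple("Template", "raw_template parsed_fields field_values")

_converter = string.Formatter().convert_field


def sh(template):
    """Hypothetical shell interpolator: each value becomes one quoted token."""
    raw_template, fields, values = template
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            parts.append(shlex.quote(format(value, format_spec)))
    return "".join(parts)


filename = "my file; rm -rf ~"
# Hand-written equivalent of i"cat $filename"
mycommand = sh(Template("cat $filename", (("cat ", 0, "filename", "", ""),), (filename,)))
print(mycommand)  # cat 'my file; rm -rf ~'
```

Because the filename is quoted as a single token, the ``;`` no longer starts a second command, and ``shlex.split`` recovers the original two arguments. This assumes a POSIX shell; the ``shlex`` module is only designed for Unix shells.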
@@ -328,20 +348,21 @@ errors all raise SyntaxError.

 Unmatched braces::

-    >>> str$'x=${x'
+    >>> i'x=${x'
       File "<stdin>", line 1
     SyntaxError: missing '}' in interpolation expression

 Invalid expressions::

-    >>> str$'x=${!x}'
+    >>> i'x=${!x}'
       File "<fstring>", line 1
         !x
         ^
     SyntaxError: invalid syntax

-Run time errors occur when evaluating the expressions inside an
-template string. See PEP 498 for some examples.
+Run time errors occur when evaluating the expressions inside a
+template string before creating the interpolation template object. See PEP 498
+for some examples.

 Different interpolators may also impose additional runtime
 constraints on acceptable interpolated expressions and other formatting
@@ -359,9 +380,10 @@ Internationalisation enters the picture by writing a custom interpolator that
 performs internationalisation. For example, the following implementation
 would delegate interpolation calls to ``string.Template``::

-    @interpolator
-    def i18n(template, fields, values):
-        translated = gettext.gettext(template)
+    def i18n(template):
+        # A real implementation would also handle normal strings
+        raw_template, fields, values = template
+        translated = gettext.gettext(raw_template)
         value_map = _build_interpolation_map(fields, values)
         return string.Template(translated).safe_substitute(value_map)

@@ -376,7 +398,7 @@ would delegate interpolation calls to ``string.Template``::
 And could then be invoked as::

     # _ = i18n at top of module or injected into the builtins module
-    print(_$"This is a $translated $message")
+    print(_(i"This is a $translated $message"))

 Any actual i18n implementation would need to address other issues (most notably
 message catalog extraction), but this gives the general idea of what might be
@@ -389,14 +411,14 @@ substitution, and ``{{ $user }}`` for local context substitution.

 With the syntax in this PEP, an l20n interpolator could be written as::

-    translated = l20n$"{{ $user }} is running {{ appname }}"
+    translated = l20n(i"{{ $user }} is running {{ appname }}")

 With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
 catalog lookups using PEP 498's semantics), the necessary brace escaping would
 make the string look like this in order to interpolate the user variable
 while preserving all of the expected braces::

-    interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}"
+    locally_interpolated = f"{{{{ ${user} }}}} is running {{{{ appname }}}}"


 Possible integration with the logging module
@@ -408,13 +430,17 @@ printf-style formatting. The runtime parsing and interpolation overhead for
 logging messages also poses a problem for extensive logging of runtime events
 for monitoring purposes.

-While beyond the scope of this initial PEP, the proposal described here could
-potentially be applied to the logging module's event reporting APIs, permitting
-relevant details to be captured using forms like::
+While beyond the scope of this initial PEP, interpolation template support
+could potentially be added to the logging module's event reporting APIs,
+permitting relevant details to be captured using forms like::

-    logging.debug$"Event: $event; Details: $data"
-    logging.critical$"Error: $error; Details: $data"
+    logging.debug(i"Event: $event; Details: $data")
+    logging.critical(i"Error: $error; Details: $data")
+
+As the interpolation template is passed in as an ordinary argument, other
+keyword arguments also remain available::
+
+    logging.critical(i"Error: $error; Details: $data", exc_info=True)

 Discussion
 ==========
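The standard ``logging`` module already calls ``str()`` on a non-string message when a record is formatted (in ``LogRecord.getMessage``), so a template object passed as the message would only be rendered if the event is actually emitted. The following sketch demonstrates that deferral with a hypothetical ``Template`` stand-in that counts how often its default renderer runs; the logger name and the counter are illustrative assumptions.

```python
import io
import logging
import string
from collections import namedtuple

_converter = string.Formatter().convert_field


class Template(namedtuple("Template", "raw_template parsed_fields field_values")):
    """Hypothetical stand-in with a default renderer and a render counter."""

    renders = 0

    def __str__(self):
        type(self).renders += 1
        parts = []
        for leading_text, field_num, expr, conversion, format_spec in self.parsed_fields:
            parts.append(leading_text)
            if field_num is not None:
                value = self.field_values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                parts.append(format(value, format_spec))
        return "".join(parts)


stream = io.StringIO()
logger = logging.getLogger("pep501.sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
logger.propagate = False

event, data = "startup", {"pid": 1234}
# Hand-written equivalent of i"Event: $event; Details: $data"
msg = Template(
    "Event: $event; Details: $data",
    (("Event: ", 0, "event", "", ""), ("; Details: ", 1, "data", "", "")),
    (event, data),
)
logger.debug(msg)  # below the INFO level: no record is created, nothing rendered
logger.info(msg)   # rendered once, when the handler formats the record
print(Template.renders)           # 1
print(stream.getvalue().strip())  # Event: startup; Details: {'pid': 1234}
```

The field values are still captured eagerly when the template is created, so only the string rendering (not the expression evaluation) is skipped for filtered-out events.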
@@ -422,45 +448,14 @@ Discussion
 Refer to PEP 498 for additional discussion, as several of the points there
 also apply to this PEP.

-Using call syntax to support keyword-only parameters
-----------------------------------------------------
-
-The logging examples raise the question of whether or not it may be desirable
-to allow interpolators to accept arbitrary keyword arguments, and allow folks
-to write things like::
-
-    logging.critical$"Error: $error; Details: $data"(exc_info=True)
-
-in order to pass additional keyword only arguments to the interpolator.
-
-With the current PEP, such code would attempt to call the result of the
-interpolation operation. If interpolation keyword support was added, then
-calling the result of an interpolation operation directly would require
-parentheses for disambiguation::
-
-    (defer$ "$x + $y")()
-
-("defer" here would be an interpolator that compiled the supplied string as
-a piece of Python code with eagerly bound references to the containing
-namespace)
-
-Determining relative precedence
--------------------------------
-
-The PEP doesn't currently specify the relative precedence of the new operator,
-as the only examples considered so far concern standalone expressions or simple
-variable assignments.
-
-Development of a reference implementation based on the PEP 498 reference
-implementation may help answer that question.
-
 Deferring support for binary interpolation
 ------------------------------------------

 Supporting binary interpolation with this syntax would be relatively
-straightforward (just a matter of relaxing the syntactic restrictions on the
-right hand side of the operator), but poses a signficant likelihood of
-producing confusing type errors when a text interpolator was presented with
+straightforward (the elements in the parsed fields tuple would just be
+byte strings rather than text strings, and the default renderer would be
+markedly less useful), but poses a significant likelihood of producing
+confusing type errors when a text interpolator was presented with
 binary input.

 Since the proposed operator is useful without binary interpolation support, and
@@ -474,13 +469,13 @@ Earlier versions of this PEP failed to make the raw template string available
 to interpolators. This greatly complicated the i18n example, as it needed to
 reconstruct the original template to pass to the message catalog lookup.

-Using a magic method rather than a global name lookup
------------------------------------------------------
+Creating a rich object rather than a global name lookup
+-------------------------------------------------------

 Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
-a magic method on an explicitly named interpolator. Naming the interpolator
-eliminated a lot of the complexity otherwise associated with shadowing the
-builtin function in order to modify the semantics of interpolation.
+creating a new kind of object for later consumption by interpolation
+functions. Creating a rich descriptive object with a useful default renderer
+made it much easier to support customisation of the semantics of interpolation.

 Relative order of conversion and format specifier in parsed fields
 ------------------------------------------------------------------
@@ -499,6 +494,17 @@ This PEP also makes the parsed field attributes available by name, so it's
 possible to write interpolators without caring about the precise field order
 at all.

+
+Acknowledgements
+================
+
+* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
+  arbitrary expression substitution in string interpolation
+* Barry Warsaw for the string.Template syntax defined in PEP 292
+* Armin Ronacher for pointing me towards Mozilla's l20n project
+* Mike Miller for his survey of programming language interpolation syntaxes in
+  PEP (TBD)
+

 References
 ==========