PEP: 501
Title: General purpose string interpolation
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Aug-2015
Python-Version: 3.6
Post-History: 08-Aug-2015

Abstract
========

PEP 498 proposes new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicitly named references.

However, it only offers this capability for string formatting, making it likely
we will see code like the following::

    os.system(f"echo {user_message}")

This kind of code is superficially elegant, but poses a significant problem
if the interpolated value ``user_message`` is in fact provided by a user: it's
an opening for a form of code injection attack, where the supplied user data
has not been properly escaped before being passed to the ``os.system`` call.

To address that problem (and a number of other concerns), this PEP proposes an
alternative approach to compiler supported interpolation, based on a new ``$``
binary operator with a syntactically constrained right hand side, a new
``__interpolate__`` magic method, and a substitution syntax inspired by
that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th
substitution variable syntax to Python.

Some examples of the proposed syntax::

    msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    print(_$"This is a $translated $message")
    translated = l20n$"{{ $user }} is running {{ appname }}"
    myquery = sql$"SELECT $column FROM $table;"
    mycommand = sh$"cat $filename"
    mypage = html$"<html><body>${response.body}</body></html>"

Proposal
========

This PEP proposes the introduction of a new binary operator specifically for
interpolation of arbitrary expressions::

    value = interpolator $ "Substitute $names and ${expressions} at runtime"

This would be effectively interpreted as::

    _raw_template = "Substitute $names and ${expressions} at runtime"
    _parsed_fields = (
        ("Substitute ", 0, "names", "", ""),
        (" and ", 1, "expressions", "", ""),
        (" at runtime", None, None, None, None),
    )
    _field_values = (names, expressions)
    value = interpolator.__interpolate__(_raw_template,
                                         _parsed_fields,
                                         _field_values)

The right hand side of the new operator would be syntactically constrained to
be a string literal.

The ``str`` builtin type would gain an ``__interpolate__`` implementation that
supported the following ``str.format`` inspired semantics::

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> str$'She said her name is ${name!r}.'
    "She said her name is 'Jane'."

The interpolation operator could be used with single-quoted, double-quoted and
triple quoted strings, including raw strings. It would not support bytes
literals as the right hand side of the expression.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.

Rationale
=========

PEP 498 makes interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but it does so at the cost of creating a
situation where interpolating values into sensitive targets like SQL queries,
shell commands and HTML templates will enjoy a much cleaner syntax when handled
without regard for code injection attacks than when they are handled correctly.
It also has the effect of introducing yet another syntax for substitution
expressions into Python, when we already have 3 (``str.format``,
``bytes.__mod__`` and ``string.Template``).

This PEP proposes to handle the former issue by always specifying an explicit
interpolator for interpolation operations, and the latter by adopting the
``string.Template`` substitution syntax defined in PEP 292.

The interpolation syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to
translators who may not themselves be developers. For these use cases, it is
important that the interpolation syntax be as simple as possible, as the
translators are responsible for preserving the substitution markers, even as
they translate the surrounding text. The PEP 292 syntax is also a common message
catalog syntax already supported by many commercial software translation
support tools.

PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
introduced for general purpose string formatting in PEP 3101, so this PEP adds
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
tools the option of rejecting usage of that more advanced syntax at runtime,
rather than categorically rejecting it at compile time. The proposed permitted
expressions, conversion specifiers, and format specifiers inside ``${ref}`` are
exactly as defined in PEP 498.

The specific proposal in this PEP is also deliberately close in both syntax
and semantics to the general purpose interpolation syntax introduced to
JavaScript in ES6, as we can reasonably expect a great many Python developers
to be regularly switching back and forth between user interface code written in
JavaScript and core application code written in Python.

Specification
=============

This PEP proposes the introduction of ``$`` as a new binary operator designed
specifically to support interpolation of template strings::

    INTERPOLATOR $ TEMPLATE_STRING

This would work as a normal binary operator (precedence TBD), with the
exception that the template string would be syntactically constrained to be a
string literal, rather than permitting arbitrary expressions.

The template string must be a Unicode string (bytes literals are not permitted),
and string literal concatenation operates as normal within the template string
component of the expression.

The template string is parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single ``$`` character, or
surrounded by a leading ``${`` and a trailing ``}``. The parts of the template
string that are not expressions are separated out as string literals.

While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an
expression.

These components are then organised into a tuple of tuples, and passed to the
``__interpolate__`` method of the interpolator identified by the given
name::

    DOTTED_NAME.__interpolate__(TEMPLATE_STRING,
                                <parsed_fields>,
                                <field_values>)

The template string field tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 5-tuples each
containing:

* a leading string literal (may be the empty string)
* the substitution field position (zero-based enumeration)
* the substitution expression text
* the substitution conversion specifier (as defined by ``str.format``)
* the substitution format specifier (as defined by ``str.format``)

If a given substitution field has no leading literal section, conversion
specifier or format specifier, then the corresponding elements in the tuple are
the empty string. If the final part of the string has no trailing substitution
field, then the field position, expression text, conversion specifier and
format specifier will all be ``None``.

The expression text is simply the text of each interpolated expression, as it
appeared in the original string, but without the leading and/or surrounding
expression markers.

The substitution field values tuple is created by evaluating the interpolated
expressions in the exact runtime context where the interpolation expression
appears in the source code.

For the following example interpolation expression::

    str$'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'

the parsed fields tuple would be::

    (
      ('abc', 0, 'expr1', '', 'spec1'),
      ('', 1, 'expr2', 'r', 'spec2'),
      ('def', 2, 'expr3', 's', ''),
      ('ghi ', 3, 'ident', '', ''),
      (' $jkl', None, None, None, None)
    )

While the field values tuple would be::

    (expr1, expr2, expr3, ident)

The parsed fields tuple can be constant folded at compile time, while the
expression values tuple will always need to be constructed at runtime.

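The parsing rules above can be sketched in pure Python. This is a simplified
illustration rather than the proposed compiler implementation: the
``parse_template`` name is hypothetical, the sketch assumes that expressions
contain no nested braces and no ``!`` or ``:`` characters of their own, and a
stray ``$`` that does not start a field is simply left as literal text::

    import re

    # One alternative per construct: "$$", "$identifier" or "${...}"
    _FIELD = re.compile(
        r"\$(?:(?P<escaped>\$)|(?P<ident>[^\W\d]\w*)|\{(?P<braced>[^{}]*)\})")

    def parse_template(template):
        """Split a template into the 5-tuples described above (simplified)."""
        fields = []
        literal = []    # pieces of the pending leading literal
        position = 0    # zero-based field counter
        end = 0
        for match in _FIELD.finditer(template):
            literal.append(template[end:match.start()])
            end = match.end()
            if match.group("escaped"):
                literal.append("$")    # "$$" becomes literal "$"
                continue
            if match.group("ident"):
                expr, conversion, format_spec = match.group("ident"), "", ""
            else:
                # "${expr!conversion:format_spec}"
                expr, _, format_spec = match.group("braced").partition(":")
                expr, _, conversion = expr.partition("!")
            fields.append(
                ("".join(literal), position, expr, conversion, format_spec))
            literal = []
            position += 1
        literal.append(template[end:])
        trailing = "".join(literal)
        if trailing:
            fields.append((trailing, None, None, None, None))
        return tuple(fields)

    parsed = parse_template(
        'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl')

Applied to the example template, this sketch should yield the same parsed
fields tuple as the one listed above.
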
The ``str.__interpolate__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers::

    import string

    _converter = string.Formatter().convert_field

    def __interpolate__(raw_template, fields, values):
        template_parts = []
        for leading_text, field_num, expr, conversion, format_spec in fields:
            template_parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                field_text = format(value, format_spec)
                template_parts.append(field_text)
        return "".join(template_parts)

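To experiment with these semantics in current Python, the compiler's part of
the job can be emulated by hand. The sketch below is illustrative only:
``str_interpolate`` restates the function above under a different name,
``emulate_interpolation`` is a hypothetical helper that uses ``eval`` where the
real proposal would compile the expressions in place, and the parsed fields
tuple is written out manually::

    import datetime
    import string

    _converter = string.Formatter().convert_field

    def str_interpolate(raw_template, fields, values):
        """Restatement of the str.__interpolate__ semantics given above."""
        parts = []
        for leading_text, field_num, expr, conversion, format_spec in fields:
            parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                parts.append(format(value, format_spec))
        return "".join(parts)

    def emulate_interpolation(interpolate, raw_template, fields, namespace):
        """Evaluate each expression in ``namespace``, then interpolate."""
        values = tuple(eval(expr, namespace)
                       for _, field_num, expr, _, _ in fields
                       if field_num is not None)
        return interpolate(raw_template, fields, values)

    namespace = {"name": "Jane", "age": 50,
                 "anniversary": datetime.date(1991, 10, 12)}
    raw = ("My name is $name, my age next year is ${age+1}, "
           "my anniversary is ${anniversary:%A, %B %d, %Y}.")
    fields = (
        ("My name is ", 0, "name", "", ""),
        (", my age next year is ", 1, "age+1", "", ""),
        (", my anniversary is ", 2, "anniversary", "", "%A, %B %d, %Y"),
        (".", None, None, None, None),
    )
    result = emulate_interpolation(str_interpolate, raw, fields, namespace)

In the default locale, ``result`` should match the output shown for the
equivalent ``str$'...'`` example in the Proposal section.
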
Writing custom interpolators
----------------------------

To simplify the process of writing custom interpolators, it is proposed to add
a new builtin decorator, ``interpolator``, which would be defined as::

    def interpolator(f):
        f.__interpolate__ = f.__call__
        return f

This allows new interpolators to be written as::

    @interpolator
    def my_custom_interpolator(raw_template, parsed_fields, field_values):
        ...

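As a concrete but purely illustrative case, an interpolator that renders every
substituted value with ``repr`` could be written with the decorator as follows.
The ``debug_repr`` name is hypothetical, and the ``__interpolate__`` method is
called directly with hand-written tuples because the ``$`` operator is not
available in current Python::

    def interpolator(f):
        f.__interpolate__ = f.__call__
        return f

    @interpolator
    def debug_repr(raw_template, parsed_fields, field_values):
        """Render each field with repr(), ignoring any specifiers."""
        parts = []
        for leading_text, field_num, expr, conv, spec in parsed_fields:
            parts.append(leading_text)
            if field_num is not None:
                parts.append(repr(field_values[field_num]))
        return "".join(parts)

    # Hand-written equivalent of: debug_repr$"user=$user id=$uid"
    rendered = debug_repr.__interpolate__(
        "user=$user id=$uid",
        (("user=", 0, "user", "", ""), (" id=", 1, "uid", "", "")),
        ("jane", 42),
    )

Because the decorator only adds an attribute, the decorated function also
remains callable in the ordinary way.
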
Expression evaluation
---------------------

The subexpressions that are extracted from the interpolation expression are
evaluated in the context where the interpolation expression appears. This means
the expression has full access to local, nonlocal and global variables. Any
valid Python expression can be used inside ``${}``, including function and
method calls. References without the surrounding braces are limited to looking
up single identifiers.

Because the substitution expressions are evaluated where the string appears in
the source code, there are no additional security concerns related to the
contents of the expression itself, as you could have also just written the
same expression and used runtime field parsing::

    >>> bar=10
    >>> def foo(data):
    ...     return data + 20
    ...
    >>> str$'input=$bar, output=${foo(bar)}'
    'input=10, output=30'

Is essentially equivalent to::

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Handling code injection attacks
-------------------------------

The proposed interpolation expressions make it potentially attractive to write
code like the following::

    myquery = str$"SELECT $column FROM $table;"
    mycommand = str$"cat $filename"
    mypage = str$"<html><body>${response.body}</body></html>"

These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
specific proposal in this PEP is designed to make it straightforward to write
use case specific interpolators that take care of quoting interpolated values
appropriately for the relevant security context::

    myquery = sql$"SELECT $column FROM $table;"
    mycommand = sh$"cat $filename"
    mypage = html$"<html><body>${response.body}</body></html>"

This PEP does not cover adding such interpolators to the standard library,
but instead ensures they can be readily provided by third party libraries.

(Although it's tempting to propose adding ``__interpolate__`` implementations
to ``subprocess.call``, ``subprocess.check_call`` and
``subprocess.check_output``.)

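A minimal sketch of what such third party interpolators might look like is
given below. The ``sh`` and ``html`` implementations and the ``_render`` helper
are hypothetical; they lean on the existing ``shlex.quote`` and ``html.escape``
functions, reject conversion and format specifiers outright, and are invoked
through ``__interpolate__`` with hand-written tuples. A production quality
interpolator would likely need to be far more aware of the surrounding
context::

    import html as _html
    import shlex

    def interpolator(f):
        f.__interpolate__ = f.__call__
        return f

    def _render(parsed_fields, field_values, quote):
        """Join the literal text with quoted versions of the values."""
        parts = []
        for leading_text, field_num, expr, conv, spec in parsed_fields:
            parts.append(leading_text)
            if field_num is not None:
                if conv or spec:
                    raise ValueError("specifiers not supported: " + expr)
                parts.append(quote(str(field_values[field_num])))
        return "".join(parts)

    @interpolator
    def sh(raw_template, parsed_fields, field_values):
        return _render(parsed_fields, field_values, shlex.quote)

    @interpolator
    def html(raw_template, parsed_fields, field_values):
        return _render(parsed_fields, field_values, _html.escape)

    # Hand-written equivalent of: sh$"cat $filename"
    filename = "notes.txt; rm -rf ~"
    mycommand = sh.__interpolate__(
        "cat $filename", (("cat ", 0, "filename", "", ""),), (filename,))

With this sketch the hostile file name ends up as a single quoted shell word
instead of being interpreted as a second command.
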
Format and conversion specifiers
--------------------------------

Aside from separating them out from the substitution expression, format and
conversion specifiers are otherwise treated as opaque strings by the
interpolation template parser - assigning semantics to those (or, alternatively,
prohibiting their use) is handled at runtime by the specified interpolator.

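To illustrate both options, the following sketch assigns its own meaning to
format specifiers while prohibiting conversion specifiers. Everything in it is
an assumption made for the example: the ``filtered_interpolate`` function, the
``_FILTERS`` table and the ``upper`` and ``title`` filter names are not part of
this proposal::

    # Hypothetical mapping from format specifier text to a rendering function
    _FILTERS = {
        "": str,
        "upper": lambda value: str(value).upper(),
        "title": lambda value: str(value).title(),
    }

    def filtered_interpolate(raw_template, parsed_fields, field_values):
        parts = []
        for leading_text, field_num, expr, conv, spec in parsed_fields:
            parts.append(leading_text)
            if field_num is None:
                continue
            if conv:
                raise ValueError("conversion specifiers are prohibited: !" + conv)
            if spec not in _FILTERS:
                raise ValueError("unknown filter: " + spec)
            parts.append(_FILTERS[spec](field_values[field_num]))
        return "".join(parts)

    # Hand-written equivalent of: filtered$"Hello ${name:title}!"
    greeting = filtered_interpolate(
        "Hello ${name:title}!",
        (("Hello ", 0, "name", "", "title"), ("!", None, None, None, None)),
        ("jane doe",),
    )
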
Error handling
--------------

Either compile time or run time errors can occur when processing interpolation
expressions. Compile time errors are limited to those errors that can be
detected when parsing a template string into its component tuples. These
errors all raise SyntaxError.

Unmatched braces::

    >>> str$'x=${x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in interpolation expression

Invalid expressions::

    >>> str$'x=${!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a
template string. See PEP 498 for some examples.

Different interpolators may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
details, which will be reported as runtime exceptions.

Internationalising interpolated strings
=======================================

Since this PEP derives its interpolation syntax from the internationalisation
focused PEP 292, it's worth considering the potential implications this PEP
may have for the internationalisation use case.

Internationalisation enters the picture by writing a custom interpolator that
performs internationalisation. For example, the following implementation
would delegate interpolation calls to ``string.Template``::

    import gettext
    import string

    @interpolator
    def i18n(template, fields, values):
        translated = gettext.gettext(template)
        value_map = _build_interpolation_map(fields, values)
        return string.Template(translated).safe_substitute(value_map)

    def _build_interpolation_map(fields, values):
        field_values = {}
        for literal_text, field_num, expr, conversion, format_spec in fields:
            if field_num is not None:
                # A trailing literal has no expression, so only check real fields
                assert expr.isidentifier() and not conversion and not format_spec
                field_values[expr] = values[field_num]
        return field_values

And could then be invoked as::

    print(_$"This is a $translated $message")

Any actual implementation would need to address other issues (most notably
message catalog extraction), but this gives the general idea of what might be
possible.

It's also worth noting that one of the benefits of the ``$`` based substitution
syntax in this PEP is its compatibility with Mozilla's
`l20n syntax <http://l20n.org/>`__, which uses ``{{ name }}`` for global
substitution, and ``{{ $user }}`` for local context substitution.

With the syntax in this PEP, an l20n interpolator could be written as::

    translated = l20n$"{{ $user }} is running {{ appname }}"

With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
catalog lookups using PEP 498's semantics), the necessary brace escaping would
make the string look like this in order to interpolate the user variable
while preserving all of the expected braces::

    interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}"

Possible integration with the logging module
============================================

One of the challenges with the logging module has been that we have previously
been unable to devise a reasonable migration strategy away from the use of
printf-style formatting. The runtime parsing and interpolation overhead for
logging messages also poses a problem for extensive logging of runtime events
for monitoring purposes.

While beyond the scope of this initial PEP, the proposal described here could
potentially be applied to the logging module's event reporting APIs, permitting
relevant details to be captured using forms like::

    logging.debug$"Event: $event; Details: $data"
    logging.critical$"Error: $error; Details: $data"

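One way such an integration might work, shown here only as a rough sketch, is
to translate the parsed fields back into a printf-style message so that the
existing deferred formatting in the logging module is retained. The
``LazyLogInterpolator`` class is hypothetical, format specifiers are ignored
for brevity, and the call is made through ``__interpolate__`` with hand-written
tuples::

    import logging

    class LazyLogInterpolator:
        """Hypothetical adapter exposing a logger method as an interpolator."""

        def __init__(self, log_method):
            self._log = log_method

        def __interpolate__(self, raw_template, parsed_fields, field_values):
            # logging only applies %-formatting when arguments are supplied,
            # so literal "%" characters need escaping only in that case
            has_fields = bool(field_values)
            parts = []
            for leading_text, field_num, expr, conv, spec in parsed_fields:
                if has_fields:
                    leading_text = leading_text.replace("%", "%%")
                parts.append(leading_text)
                if field_num is not None:
                    parts.append("%r" if conv == "r" else "%s")
            # Values are passed separately, so they are only rendered if a
            # handler actually emits the record
            self._log("".join(parts), *field_values)

    logger = logging.getLogger("pep501.demo")
    debug = LazyLogInterpolator(logger.debug)

    # Hand-written equivalent of: debug$"Event: $event; Details: $data"
    debug.__interpolate__(
        "Event: $event; Details: $data",
        (("Event: ", 0, "event", "", ""), ("; Details: ", 1, "data", "", "")),
        ("startup", {"workers": 4}),
    )
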
Discussion
==========

Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.

Determining relative precedence
-------------------------------

The PEP doesn't currently specify the relative precedence of the new operator,
as the only examples considered so far concern standalone expressions or simple
variable assignments.

Development of a reference implementation based on the PEP 498 reference
implementation may help answer that question.

Deferring support for binary interpolation
------------------------------------------

Supporting binary interpolation with this syntax would be relatively
straightforward (just a matter of relaxing the syntactic restrictions on the
right hand side of the operator), but poses a significant likelihood of
producing confusing type errors when a text interpolator was presented with
binary input.

Since the proposed operator is useful without binary interpolation support, and
such support can be readily added later, further consideration of binary
interpolation is considered out of scope for the current PEP.

Preserving the raw template string
----------------------------------

Earlier versions of this PEP failed to make the raw template string available
to interpolators. This greatly complicated the i18n example, as it needed to
reconstruct the original template to pass to the message catalog lookup.

Using a magic method rather than a global name lookup
-----------------------------------------------------

Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
a magic method on an explicitly named interpolator. Naming the interpolator
eliminated a lot of the complexity otherwise associated with shadowing the
builtin function in order to modify the semantics of interpolation.

Relative order of conversion and format specifier in parsed fields
------------------------------------------------------------------

The relative order of the conversion specifier and the format specifier in the
substitution field 5-tuple is defined to match the order they appear in the
format string, which is unfortunately the inverse of the way they appear in the
``string.Formatter.parse`` 4-tuple.

I consider this a design defect in ``string.Formatter.parse``, so I think it's
worth fixing it for the custom interpolator API, since the tuple already
has other differences (like including both the field position number *and* the
text of the expression).

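The difference in ordering can be seen by running the existing parser on a
``str.format`` style field. The ``proposed`` tuple below is written by hand to
show the corresponding entry under this PEP; it is not produced by any existing
API::

    import string

    # Existing API: (literal_text, field_name, format_spec, conversion)
    legacy = list(string.Formatter().parse("abc{expr!r:>10}"))

    # This PEP: (leading_text, position, expression, conversion, format_spec)
    proposed = ("abc", 0, "expr", "r", ">10")

Here ``legacy`` holds a single 4-tuple with the format specifier ``'>10'``
ahead of the conversion ``'r'``, while the proposed 5-tuple keeps them in the
order they are written in the template.
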
Using call syntax to support keyword-only parameters
----------------------------------------------------

The logging examples could potentially be better written as::

    !logging.debug("Event: $event; Details: $data")
    !logging.critical("Error: $error; Details: $data")

The key benefit this would provide is access to keyword arguments, so you
could write::

    !logging.critical("Error: $error; Details: $data", exc_info=True)

In this version, an interpolation expression would largely be syntactically
equivalent to a normal function call, except that it would be restricted to
accepting a single string literal as its sole positional argument.

References
==========

.. [#] %-formatting
   (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

.. [#] str.format
   (https://docs.python.org/3/library/string.html#formatstrings)

.. [#] string.Template documentation
   (https://docs.python.org/3/library/string.html#template-strings)

.. [#] PEP 215: String Interpolation
   (https://www.python.org/dev/peps/pep-0215/)

.. [#] PEP 292: Simpler String Substitutions
   (https://www.python.org/dev/peps/pep-0292/)

.. [#] PEP 3101: Advanced String Formatting
   (https://www.python.org/dev/peps/pep-3101/)

.. [#] PEP 498: Literal string formatting
   (https://www.python.org/dev/peps/pep-0498/)

.. [#] string.Formatter.parse
   (https://docs.python.org/3/library/string.html#string.Formatter.parse)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: