PEP: 501
Title: Translation ready string interpolation
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Aug-2015
Python-Version: 3.6
Post-History: 08-Aug-2015

Abstract
========

PEP 498 proposes new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicit name references.

This PEP agrees with the basic motivation of PEP 498, but proposes to focus
both the syntax and the implementation on the i18n use case, drawing on the
previous proposals in PEP 292 (which added string.Template) and its predecessor
PEP 215 (which proposed syntactic support, rather than a runtime string
manipulation based approach). The text of this PEP currently assumes that the
reader is familiar with these three previous related proposals.

The interpolation syntax proposed for this PEP is that of PEP 292, but expanded
to allow arbitrary expressions and format specifiers when using the ``${ref}``
interpolation syntax. The suggested new string prefix is "i" rather than "f",
with the intended mnemonics being either "interpolated string" or
"i18n string"::

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> i'She said her name is ${name!r}.'
    "She said her name is 'Jane'."

This PEP also proposes the introduction of three new builtin functions,
``__interpolate__``, ``__interpolateb__`` and ``__interpolateu__``, which
implement key aspects of the interpolation process, and may be overridden in
accordance with the usual mechanisms for shadowing builtin functions.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.

The key aim of this PEP that isn't inherited from PEP 498 is to help ensure
that future Python applications are written in a "translation ready" way, where
many interface strings that may need to be translated to allow an application
to be used in multiple languages are flagged as a natural consequence of the
development process, even though they won't be translated by default.


Rationale
=========

PEP 498 makes interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but it does so at the cost of introducing
yet another string interpolation syntax.

The interpolation syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to
translators who may not themselves be developers. For these use cases, it is
important that the interpolation syntax be as simple as possible, as the
translators are responsible for preserving the substitution markers, even as
they translate the surrounding text. The PEP 292 syntax is also a common message
catalog syntax already supported by many commercial software translation
support tools.

PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
introduced for general purpose string formatting in PEP 3101, so this PEP adds
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
tools the option of rejecting usage of that more advanced syntax at runtime,
rather than categorically rejecting it at compile time. The proposed permitted
expressions inside ``${ref}`` are exactly as defined in PEP 498.


Specification
=============

In source code, i-strings are string literals that are prefixed by the
letter 'i'. The string will be parsed into its components at compile time,
which will then be passed to the new ``__interpolate__`` builtin at runtime.

The 'i' prefix may be combined with 'b', where the 'i' must appear first, in
which case ``__interpolateb__`` will be called rather than ``__interpolate__``.
Similarly, 'i' may also be combined with 'u' to call ``__interpolateu__``
rather than ``__interpolate__``.

The 'i' prefix may also be combined with 'r', with or without 'b' or 'u', to
produce raw i-strings. This disables backslash escape sequences in the string
literal as usual, but has no effect on the runtime interpolation behaviour.

In all cases, the only permitted location for the 'i' prefix is before all other
prefix characters - it indicates a runtime operation, which is largely
independent of the compile time prefixes (aside from calling different
interpolation functions when combined with 'b' or 'u').

i-strings are parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single "$" character, or
surrounded by a leading '${' and a trailing '}'. The parts of the format string
that are not expressions are separated out as string literals.

While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an
expression.

These components are then organised into three parallel tuples:

* parsed format string fields
* expression text
* expression values

They are then passed to the ``__interpolate__`` builtin at runtime::

    __interpolate__(fields, expressions, values)

The format string field tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 4-tuples each containing
a leading literal, together with a trailing field number, format specifier,
and conversion specifier. If a given substitution field has no leading literal
section, format specifier or conversion specifier, then the corresponding
elements in the tuple are the empty string. If the final part of the string
has no trailing substitution field, then the field number, format specifier
and conversion specifier will all be ``None``.

The expression text is simply the text of each interpolated expression, as it
appeared in the original string, but without the leading and/or surrounding
expression markers.

The expression values are the result of evaluating the interpolated expressions
in the exact runtime context where the i-string appears in the source code.

For the following example i-string::

    i'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'

the fields tuple would be::

    (
      ('abc', 0, 'spec1', ''),
      ('', 1, 'spec2', 'r'),
      ('def', 2, '', 's'),
      ('ghi ', 3, '', ''),
      (' $jkl', None, None, None)
    )

For the same example, the expression text and value tuples would be::

    ('expr1', 'expr2', 'expr3', 'ident') # Expression text
    (expr1, expr2, expr3, ident)         # Expression values

The fields and expression text tuples can be constant folded at compile time,
while the expression values tuple will always need to be constructed at runtime.

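The parsing step described above can be approximated in pure Python. The
following is only an illustrative sketch, not the proposed compiler
implementation: ``parse_istring`` is a hypothetical helper, and it makes the
simplifying assumption that expressions inside ``${}`` contain no ``!``,
``:``, ``{`` or ``}`` characters. It produces the fields and expression text
tuples; the expression values would be evaluated by the compiler at the point
where the i-string appears.

```python
import re

# Hypothetical helper, for illustration only. Simplifying assumption:
# expressions inside ${...} contain no '!', ':', '{' or '}' characters.
_TOKEN = re.compile(r"""
    \$\$                             # doubled dollar sign (literal "$")
  | \$(?P<ident>[^\W\d]\w*)          # $identifier
  | \$\{(?P<expr>[^!:{}]+)           # ${expression
        (?:!(?P<conv>[^:{}]+))?      #   optional !conversion
        (?::(?P<spec>[^{}]*))?       #   optional :format_spec
    \}
""", re.VERBOSE)


def parse_istring(template):
    """Split template text into (fields, expressions) tuples."""
    fields = []
    expressions = []
    literal = []  # pieces of the literal text preceding the next field
    pos = 0
    for match in _TOKEN.finditer(template):
        literal.append(template[pos:match.start()])
        pos = match.end()
        if match.group() == "$$":
            # A doubled "$$" becomes a single "$" in the literal text
            literal.append("$")
            continue
        expr = match.group("ident") or match.group("expr")
        fields.append(("".join(literal), len(expressions),
                       match.group("spec") or "",
                       match.group("conv") or ""))
        expressions.append(expr)
        literal = []
    literal.append(template[pos:])
    trailing = "".join(literal)
    if trailing:
        # Trailing literal text with no substitution field after it
        fields.append((trailing, None, None, None))
    return tuple(fields), tuple(expressions)


fields, expressions = parse_istring(
    "abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl")
```

A real implementation would need to parse full Python expressions and report
malformed fields as ``SyntaxError``; this sketch silently treats anything it
does not recognise as literal text.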
The default ``__interpolate__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers::

    import string

    _converter = string.Formatter().convert_field

    def __interpolate__(fields, expressions, values):
        template_parts = []
        for leading_text, field_num, format_spec, conversion in fields:
            template_parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                field_text = format(value, format_spec)
                template_parts.append(field_text)
        return "".join(template_parts)

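To make the calling convention concrete, the following sketch invokes an
ordinary Python function with the semantics given above, using hand-built
tuples in place of the ones the compiler would generate. The tuple contents
are an assumption about how the first example from the Abstract would be
broken down under the rules in this section.

```python
import datetime
import string

_converter = string.Formatter().convert_field


def __interpolate__(fields, expressions, values):
    # Plain-function stand-in with the default semantics described above
    parts = []
    for leading_text, field_num, format_spec, conversion in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            parts.append(format(value, format_spec))
    return "".join(parts)


name = 'Jane'
age = 50
anniversary = datetime.date(1991, 10, 12)

# Hand-built equivalent of:
#   i'My name is $name, my age next year is ${age+1}, '
#   'my anniversary is ${anniversary:%A, %B %d, %Y}.'
fields = (
    ('My name is ', 0, '', ''),
    (', my age next year is ', 1, '', ''),
    (', my anniversary is ', 2, '%A, %B %d, %Y', ''),
    ('.', None, None, None),
)
expressions = ('name', 'age+1', 'anniversary')
values = (name, age + 1, anniversary)  # evaluated at the call site

result = __interpolate__(fields, expressions, values)

# Hand-built equivalent of i'She said her name is ${name!r}.'
quoted = __interpolate__(
    (('She said her name is ', 0, '', 'r'), ('.', None, None, None)),
    ('name',),
    (name,),
)
```

Note that the default implementation never looks at ``expressions``; the
expression text is only there for the benefit of alternative interpolation
functions.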
The default ``__interpolateu__`` implementation would be the
``__interpolate__`` builtin.

The default ``__interpolateb__`` implementation would be defined in terms of
the binary mod-formatting reintroduced in PEP 461::

    def __interpolateb__(fields, expressions, values):
        template_parts = []
        for leading_data, field_num, format_spec, conversion in fields:
            template_parts.append(leading_data)
            if field_num is not None:
                if conversion:
                    raise ValueError("Conversion specifiers not supported "
                                     "in default binary interpolation")
                value = values[field_num]
                # PEP 461 formatting operates on bytes, so the format
                # specifier is combined with a bytes "%" prefix
                field_data = (b"%" + format_spec) % (value,)
                template_parts.append(field_data)
        return b"".join(template_parts)

This definition permits examples like the following::

    >>> data = 10
    >>> ib'${data:d}'
    b'10'
    >>> ib'${data:4x}'
    b'   a'
    >>> ib'${data:#4x}'
    b' 0xa'
    >>> ib'${data:04X}'
    b'000A'


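The binary variant can be exercised in the same way. This sketch assumes that
the literal sections and format specifiers of an ``ib`` string reach the
interpolation function as ``bytes`` objects, which the text above does not
state explicitly, and uses hand-built tuples in place of compiler output.

```python
def __interpolateb__(fields, expressions, values):
    # Plain-function stand-in for the default binary interpolation.
    # Assumption: literal sections and format specifiers are bytes.
    parts = []
    for leading_data, field_num, format_spec, conversion in fields:
        parts.append(leading_data)
        if field_num is not None:
            if conversion:
                raise ValueError("Conversion specifiers not supported "
                                 "in default binary interpolation")
            parts.append((b"%" + format_spec) % (values[field_num],))
    return b"".join(parts)


data = 10

# Hand-built equivalent of ib'value=${data:#4x}'
hex_result = __interpolateb__(
    ((b'value=', 0, b'#4x', b''),), ('data',), (data,))

# Hand-built equivalent of ib'${data:04X}'
padded_result = __interpolateb__(
    ((b'', 0, b'04X', b''),), ('data',), (data,))

# Hand-built equivalent of ib'${data!r}': conversions are rejected at runtime
try:
    __interpolateb__(((b'', 0, b'', b'r'),), ('data',), (data,))
except ValueError:
    rejected = True
else:
    rejected = False
```

The rejected conversion specifier illustrates how an interpolation function
can impose constraints at runtime that the i-string parser itself does not
enforce.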
Expression evaluation
---------------------

The expressions that are extracted from the string are evaluated in
the context where the i-string appeared. This means the expression has
full access to local, nonlocal and global variables. Any valid Python
expression can be used inside ``${}``, including function and method calls.
References without the surrounding braces are limited to looking up single
identifiers.

Because the i-strings are evaluated where the string appears in the
source code, there is no additional expressiveness available with
i-strings. There are also no additional security concerns: you could
have also just written the same expression, not inside of an
i-string::

    >>> bar = 10
    >>> def foo(data):
    ...     return data + 20
    ...
    >>> i'input=$bar, output=${foo(bar)}'
    'input=10, output=30'

This is equivalent to::

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Format specifiers
-----------------

Format specifiers are not interpreted by the i-string parser - that is
handled at runtime by the called interpolation function.

Concatenating strings
---------------------

As i-strings are shorthand for a runtime builtin function call, implicit
concatenation is a syntax error (similar to attempting implicit concatenation
between bytes and str literals)::

    >>> i"interpolated" "not interpolated"
      File "<stdin>", line 1
    SyntaxError: cannot mix interpolation call with plain literal

Error handling
--------------

Either compile time or run time errors can occur when processing
i-strings. Compile time errors are limited to those errors that can be
detected when parsing an i-string into its component tuples. These errors all
raise SyntaxError.

Unmatched braces::

    >>> i'x=${x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in interpolation expression

Invalid expressions::

    >>> i'x=${!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside an
i-string. See PEP 498 for some examples.

Different interpolation functions may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
details, which will be reported as runtime exceptions.

Leading whitespace in expressions is not skipped
------------------------------------------------

Unlike PEP 498, leading whitespace in expressions doesn't need to be skipped -
'$' is not a legal character in Python's syntax, so it can't appear inside
a ``${}`` field except as part of another string, whether interpolated or not.


Internationalising interpolated strings
=======================================

So far, this PEP has said nothing practical about internationalisation - only
formatting text using either str.format or bytes.__mod__ semantics, depending
on whether a str or bytes object is being interpolated.

Internationalisation enters the picture by overriding the ``__interpolate__``
builtin on a module-by-module basis. For example, the following implementation
would delegate interpolation calls to string.Template::

    import string
    from gettext import gettext as _

    def _interpolation_fields_to_template(fields, expressions):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions permitted for i18n")
        template_parts = []
        for literal_text, field_num, format_spec, conversion in fields:
            if format_spec:
                raise ValueError("Format specifiers not permitted for i18n")
            if conversion:
                raise ValueError("Conversion specifiers not permitted for i18n")
            # Re-escape literal "$" characters for string.Template
            template_parts.append(literal_text.replace("$", "$$"))
            if field_num is not None:
                template_parts.append("${" + expressions[field_num] + "}")
        return "".join(template_parts)

    def __interpolate__(fields, expressions, values):
        catalog_str = _interpolation_fields_to_template(fields, expressions)
        translated = _(catalog_str)
        values = dict(zip(expressions, values))
        return string.Template(translated).safe_substitute(values)

If a module were to import that definition of ``__interpolate__`` into the
module namespace, then:

* Any i"translated & interpolated" strings would be translated
* Any iu"untranslated & interpolated" strings would not be translated
* Any ib"untranslated & interpolated" strings would not be translated
* Any other string and bytes literals would not be translated unless explicitly
  passed to the relevant translation machinery at runtime

This shifts the behaviour from the status quo, where translation support needs
to be added explicitly to each string requiring translation, to one where
opting *in* to translation is done on a module by module basis, and
individual interpolated strings can then be opted *out* of translation by
adding the "u" prefix to the string literal in order to call
``__interpolateu__`` instead of ``__interpolate__``.


Discussion
==========

Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.

Preserving the unmodified format string
---------------------------------------

A lot of the complexity in the i18n example is actually in recreating the
original format string from its component parts. It may make sense to preserve
and pass that entire string to the interpolation function, in addition to
the broken down field definitions.

This approach would also allow translators to more consistently benefit from
the simplicity of the PEP 292 approach to string formatting (in the example
above, surrounding braces are added to the catalog strings even for cases that
don't need them).


References
==========

.. [#] %-formatting
       (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

.. [#] str.format
       (https://docs.python.org/3/library/string.html#formatstrings)

.. [#] string.Template documentation
       (https://docs.python.org/3/library/string.html#template-strings)

.. [#] PEP 215: String Interpolation
       (https://www.python.org/dev/peps/pep-0215/)

.. [#] PEP 292: Simpler String Substitutions
       (https://www.python.org/dev/peps/pep-0292/)

.. [#] PEP 3101: Advanced String Formatting
       (https://www.python.org/dev/peps/pep-3101/)

.. [#] PEP 498: Literal string formatting
       (https://www.python.org/dev/peps/pep-0498/)

.. [#] string.Formatter.parse
       (https://docs.python.org/3/library/string.html#string.Formatter.parse)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: