diff --git a/pep-0500.txt b/pep-0500.txt
new file mode 100644
index 000000000..d9a13a928
--- /dev/null
+++ b/pep-0500.txt
@@ -0,0 +1,401 @@
+PEP: 500
+Title: Translation ready string interpolation
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nick Coghlan
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 08-Aug-2015
+Python-Version: 3.6
+Post-History: 08-Aug-2015
+
+Abstract
+========
+
+PEP 498 proposes new syntactic support for string interpolation that is
+transparent to the compiler, allowing name references from the interpolation
+operation full access to containing namespaces (as with any other expression),
+rather than being limited to explicitly named references.
+
+This PEP agrees with the basic motivation of PEP 498, but proposes to focus
+both the syntax and the implementation on the i18n use case, drawing on the
+previous proposals in PEP 292 (which added string.Template) and its predecessor
+PEP 215 (which proposed syntactic support, rather than a runtime string
+manipulation based approach). The text of this PEP currently assumes that the
+reader is familiar with these three previous related proposals.
+
+The interpolation syntax proposed for this PEP is that of PEP 292, but expanded
+to allow arbitrary expressions and format specifiers when using the ``${ref}``
+interpolation syntax. The suggested new string prefix is "i" rather than "f",
+with the intended mnemonics being either "interpolated string" or
+"i18n string"::
+
+    >>> import datetime
+    >>> name = 'Jane'
+    >>> age = 50
+    >>> anniversary = datetime.date(1991, 10, 12)
+    >>> i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+    >>> i'She said her name is ${name!r}.'
+    "She said her name is 'Jane'."
+
+This PEP also proposes the introduction of three new builtin functions,
+``__interpolate__``, ``__interpolateb__`` and ``__interpolateu__``, which
+implement key aspects of the interpolation process, and may be overridden in
+accordance with the usual mechanisms for shadowing builtin functions.
+
+This PEP does not propose to remove or deprecate any of the existing
+string formatting mechanisms, as those will remain valuable when formatting
+strings that are not present directly in the source code of the application.
+
+The key aim of this PEP that isn't inherited from PEP 498 is to help ensure
+that future Python applications are written in a "translation ready" way, where
+many interface strings that may need to be translated to allow an application
+to be used in multiple languages are flagged as a natural consequence of the
+development process, even though they won't be translated by default.
+
+
+Rationale
+=========
+
+PEP 498 makes interpolating values into strings with full access to Python's
+lexical namespace semantics simpler, but it does so at the cost of introducing
+yet another string interpolation syntax.
+
+The interpolation syntax devised for PEP 292 is deliberately simple so that the
+template strings can be extracted into an i18n message catalog, and passed to
+translators who may not themselves be developers. For these use cases, it is
+important that the interpolation syntax be as simple as possible, as the
+translators are responsible for preserving the substitution markers, even as
+they translate the surrounding text. The PEP 292 syntax is also a common message
+catalog syntax already supported by many commercial software translation
+support tools.
+
+PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
+introduced for general purpose string formatting in PEP 3101, so this PEP adds
+that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
+tools the option of rejecting usage of that more advanced syntax at runtime,
+rather than categorically rejecting it at compile time. The proposed permitted
+expressions inside ``${ref}`` are exactly as defined in PEP 498.
+
+
+Specification
+=============
+
+In source code, i-strings are string literals that are prefixed by the
+letter 'i'. The string will be parsed into its components at compile time,
+which will then be passed to the new ``__interpolate__`` builtin at runtime.
+
+The 'i' prefix may be combined with 'b', where the 'i' must appear first, in
+which case ``__interpolateb__`` will be called rather than ``__interpolate__``.
+Similarly, 'i' may also be combined with 'u' to call ``__interpolateu__``
+rather than ``__interpolate__``.
+
+The 'i' prefix may also be combined with 'r', with or without 'b' or 'u', to
+produce raw i-strings. This disables backslash escape sequences in the string
+literal as usual, but has no effect on the runtime interpolation behaviour.
+
+In all cases, the only permitted location for the 'i' prefix is before all other
+prefix characters - it indicates a runtime operation, which is largely
+independent of the compile time prefixes (aside from calling different
+interpolation functions when combined with 'b' or 'u').
+
+i-strings are parsed into literals and expressions. Expressions
+appear as either identifiers prefixed with a single "$" character, or
+surrounded by a leading '${' and a trailing '}'. The parts of the format string
+that are not expressions are separated out as string literals.
+
+While parsing the string, any doubled ``$$`` is replaced with a single ``$``
+and is considered part of the literal text, rather than as introducing an
+expression.
+
+These components are then organised into 3 parallel tuples:
+
+* parsed format string fields
+* expression text
+* expression values
+
+The three tuples are then passed to the ``__interpolate__`` builtin at runtime::
+
+    __interpolate__(fields, expressions, values)
+
+The format string field tuple is inspired by the interface of
+``string.Formatter.parse``, and consists of a series of 4-tuples each containing
+a leading literal, together with a trailing field number, format specifier,
+and conversion specifier. If a given substitution field has no leading literal
+section, format specifier or conversion specifier, then the corresponding
+elements in the tuple are the empty string. If the final part of the string
+has no trailing substitution field, then the field number, format specifier
+and conversion specifier will all be ``None``.
+
+The expression text is simply the text of each interpolated expression, as it
+appeared in the original string, but without the leading and/or surrounding
+expression markers.
+
+The expression values are the result of evaluating the interpolated expressions
+in the exact runtime context where the i-string appears in the source code.
+
+For the following example i-string::
+
+    i'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'
+
+the fields tuple would be::
+
+    (
+      ('abc', 0, 'spec1', ''),
+      ('', 1, 'spec2', 'r'),
+      ('def', 2, '', 's'),
+      ('ghi ', 3, '', ''),
+      (' $jkl', None, None, None)
+    )
+
+For the same example, the expression text and value tuples would be::
+
+    ('expr1', 'expr2', 'expr3', 'ident') # Expression text
+    (expr1, expr2, expr3, ident)         # Expression values
+
+The fields and expression text tuples can be constant folded at compile time,
+while the expression values tuple will always need to be constructed at runtime.
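The decomposition described above can be approximated in current Python with a
regular expression. The following is only a minimal sketch, not the proposed
compiler behaviour: ``parse_istring`` and ``_TOKEN`` are hypothetical names, the
sketch assumes that expressions contain no ``{``, ``}``, ``!`` or ``:``
characters, and a stray ``$`` is left in the literal text rather than reported
as a ``SyntaxError``. The third field of the example is written as
``${expr3!s}`` here.

```python
import re

# Hypothetical tokenizer for the body of an i-string (simplified grammar).
_TOKEN = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                      # "$$" is a literal dollar sign
      | (?P<named>[A-Za-z_][A-Za-z0-9_]*)    # "$ident"
      | \{(?P<expr>[^{}!:]+)                 # "${expr"
          (?:!(?P<conv>[rsa]))?              #   optional conversion, e.g. "!r"
          (?::(?P<spec>[^{}]*))?             #   optional ":format_spec"
        \}
    )
""", re.VERBOSE)


def parse_istring(template):
    """Split template text into the fields and expression text tuples."""
    fields = []
    expressions = []
    literal = []          # pieces of the literal text seen since the last field
    pos = 0
    for match in _TOKEN.finditer(template):
        literal.append(template[pos:match.start()])
        pos = match.end()
        if match.group("escaped"):
            literal.append("$")
            continue
        expressions.append(match.group("named") or match.group("expr"))
        fields.append(("".join(literal), len(expressions) - 1,
                       match.group("spec") or "", match.group("conv") or ""))
        literal = []
    literal.append(template[pos:])
    trailing = "".join(literal)
    if trailing:
        # Trailing literal text with no substitution field after it.
        fields.append((trailing, None, None, None))
    return tuple(fields), tuple(expressions)


fields, expressions = parse_istring(
    "abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl")
```

In this sketch the spaces around ``$ident`` stay with the neighbouring literal
segments, so the last two entries of ``fields`` carry ``'ghi '`` and ``' $jkl'``.
The values tuple is not produced here, since it depends on evaluating each
expression in the scope where the i-string appears.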
+
+The default ``__interpolate__`` implementation would have the following
+semantics, with field processing being defined in terms of the ``format``
+builtin and ``str.format`` conversion specifiers::
+
+    _converter = string.Formatter().convert_field
+
+    def __interpolate__(fields, expressions, values):
+        template_parts = []
+        for leading_text, field_num, format_spec, conversion in fields:
+            template_parts.append(leading_text)
+            if field_num is not None:
+                value = values[field_num]
+                if conversion:
+                    value = _converter(value, conversion)
+                field_text = format(value, format_spec)
+                template_parts.append(field_text)
+        return "".join(template_parts)
+
+The default ``__interpolateu__`` implementation would be the
+``__interpolate__`` builtin.
+
+The default ``__interpolateb__`` implementation would be defined in terms of
+the binary mod-formatting reintroduced in PEP 461::
+
+    def __interpolateb__(fields, expressions, values):
+        template_parts = []
+        for leading_data, field_num, format_spec, conversion in fields:
+            template_parts.append(leading_data)
+            if field_num is not None:
+                if conversion:
+                    raise ValueError("Conversion specifiers not supported "
+                                     "in default binary interpolation")
+                value = values[field_num]
+                field_data = (b"%" + format_spec) % (value,)  # assumes a bytes format_spec
+                template_parts.append(field_data)
+        return b"".join(template_parts)
+
+This definition permits examples like the following::
+
+    >>> data = 10
+    >>> ib'$data'
+    b'10'
+    >>> ib'${data:4x}'
+    b'   a'
+    >>> ib'${data:#4x}'
+    b' 0xa'
+    >>> ib'${data:04X}'
+    b'000A'
+
+
+Expression evaluation
+---------------------
+
+The expressions that are extracted from the string are evaluated in
+the context where the i-string appeared. This means the expression has
+full access to local, nonlocal and global variables. Any valid Python
+expression can be used inside ``${}``, including function and method calls.
+References without the surrounding braces are limited to looking up single
+identifiers.
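The evaluation rule can be imitated in current Python by evaluating each
expression text in the calling frame and handing the results to a text
renderer with the default semantics. This is only a rough sketch under stated
assumptions: ``render`` and ``emulate_istring`` are hypothetical helper names,
``sys._getframe`` is a CPython implementation detail, and ``eval`` against the
caller's frame does not see closure variables the way real compiler support
would.

```python
import string
import sys

_converter = string.Formatter().convert_field


def render(fields, values):
    """Join literal text and formatted values (default text semantics)."""
    parts = []
    for leading_text, field_num, format_spec, conversion in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            parts.append(format(value, format_spec))
    return "".join(parts)


def emulate_istring(fields, expressions):
    """Evaluate each expression text in the caller's scope, then render."""
    caller = sys._getframe(1)  # CPython-specific way to reach the caller
    values = tuple(eval(text, caller.f_globals, caller.f_locals)
                   for text in expressions)
    return render(fields, values)


def shout(text):
    return text.upper()


def demo():
    name = "Jane"  # local variable, visible to the evaluated expressions
    # Hand-built tuples standing in for i'name=${name!r}, loud=${shout(name)}'
    fields = (("name=", 0, "", "r"), (", loud=", 1, "", ""))
    expressions = ("name", "shout(name)")
    return emulate_istring(fields, expressions)


result = demo()
```

Here ``name`` is resolved from the local scope of ``demo`` and ``shout`` from
the module globals, which mirrors the lookup the section describes for local
and global names.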
+
+Because the i-strings are evaluated where the string appears in the
+source code, there is no additional expressiveness available with
+i-strings. There are also no additional security concerns: you could
+have also just written the same expression, not inside of an
+i-string::
+
+    >>> bar=10
+    >>> def foo(data):
+    ...     return data + 20
+    ...
+    >>> i'input=$bar, output=${foo(bar)}'
+    'input=10, output=30'
+
+This is equivalent to::
+
+    >>> 'input={}, output={}'.format(bar, foo(bar))
+    'input=10, output=30'
+
+Format specifiers
+-----------------
+
+Format specifiers are not interpreted by the i-string parser - that is
+handled at runtime by the called interpolation function.
+
+Concatenating strings
+---------------------
+
+As i-strings are shorthand for a runtime builtin function call, implicit
+concatenation is a syntax error (similar to attempting implicit concatenation
+between bytes and str literals)::
+
+    >>> i"interpolated" "not interpolated"
+      File "", line 1
+    SyntaxError: cannot mix interpolation call with plain literal
+
+Error handling
+--------------
+
+Either compile time or run time errors can occur when processing
+i-strings. Compile time errors are limited to those errors that can be
+detected when parsing an i-string into its component tuples. These errors all
+raise SyntaxError.
+
+Unmatched braces::
+
+    >>> i'x=${x'
+      File "", line 1
+    SyntaxError: missing '}' in interpolation expression
+
+Invalid expressions::
+
+    >>> i'x=${!x}'
+      File "", line 1
+        !x
+        ^
+    SyntaxError: invalid syntax
+
+Run time errors occur when evaluating the expressions inside an
+i-string. See PEP 498 for some examples.
+
+Different interpolation functions may also impose additional runtime
+constraints on acceptable interpolated expressions and other formatting
+details, which will be reported as runtime exceptions.
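The existing behaviour that the concatenation rule is compared against can be
checked in current Python: mixing bytes and str literals in an implicit
concatenation is already rejected when the source is compiled, before any code
runs. The snippet below is a small illustration of that existing rule only;
``raises_syntax_error`` is a hypothetical helper name, and i-strings themselves
cannot be compiled by current interpreters.

```python
def raises_syntax_error(source):
    """Return True if compiling ``source`` as an expression fails."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


# Implicit concatenation of a bytes literal and a str literal is rejected
# at compile time, while two str literals concatenate without complaint.
mixed_literals = raises_syntax_error('b"raw bytes" "text"')
plain_literals = raises_syntax_error('"text" " and more text"')
```

With this check, ``mixed_literals`` is true and ``plain_literals`` is false,
which is the kind of compile time ``SyntaxError`` the proposal extends to an
i-string placed next to a plain literal.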
+
+Leading whitespace in expressions is not skipped
+------------------------------------------------
+
+Unlike PEP 498, leading whitespace in expressions doesn't need to be skipped -
+'$' is not a legal character in Python's syntax, so it can't appear inside
+a ``${}`` field except as part of another string, whether interpolated or not.
+
+
+Internationalising interpolated strings
+=======================================
+
+So far, this PEP has said nothing practical about internationalisation - only
+formatting text using either str.format or bytes.__mod__ semantics depending
+on whether or not a str or bytes object is being interpolated.
+
+Internationalisation enters the picture by overriding the ``__interpolate__``
+builtin on a module-by-module basis. For example, the following implementation
+would delegate interpolation calls to string.Template::
+
+    def _interpolation_fields_to_template(fields, expressions):
+        if not all(expr.isidentifier() for expr in expressions):
+            raise ValueError("Only variable substitutions permitted for i18n")
+        template_parts = []
+        for literal_text, field_num, format_spec, conversion in fields:
+            if format_spec:
+                raise ValueError("Format specifiers not permitted for i18n")
+            if conversion:
+                raise ValueError("Conversion specifiers not permitted for i18n")
+            template_parts.append(literal_text.replace("$", "$$"))  # re-escape literal "$"
+            if field_num is not None:
+                template_parts.append("${" + expressions[field_num] + "}")
+        return "".join(template_parts)
+
+    def __interpolate__(fields, expressions, values):
+        catalog_str = _interpolation_fields_to_template(fields, expressions)
+        translated = _(catalog_str)
+        values = dict(zip(expressions, values))
+        return string.Template(translated).safe_substitute(values)
+
+If a module were to import that definition of __interpolate__ into the
+module namespace, then:
+
+* Any i"translated & interpolated" strings would be translated
+* Any iu"untranslated & interpolated" strings would not be translated
+* Any ib"untranslated & interpolated" strings would not be translated
+* Any other string and bytes literals would not be translated unless explicitly
+  passed to the relevant translation machinery at runtime
+
+This shifts the behaviour from the status quo, where translation support needs
+to be added explicitly to each string requiring translation, to one where
+opting *in* to translation is done on a module by module basis, and
+individual interpolated strings can then be opted *out* of translation by
+adding the "u" prefix to the string literal in order to call
+``__interpolateu__`` instead of ``__interpolate__``.
+
+
+Discussion
+==========
+
+Refer to PEP 498 for additional discussion, as several of the points there
+also apply to this PEP.
+
+Preserving the unmodified format string
+---------------------------------------
+
+A lot of the complexity in the i18n example is actually in recreating the
+original format string from its component parts. It may make sense to preserve
+and pass that entire string to the interpolation function, in addition to
+the broken down field definitions.
+
+This approach would also allow translators to more consistently benefit from
+the simplicity of the PEP 292 approach to string formatting (in the example
+above, surrounding braces are added to the catalog strings even for cases that
+don't need them).
+
+
+References
+==========
+
+.. [#] %-formatting
+   (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
+
+.. [#] str.format
+   (https://docs.python.org/3/library/string.html#formatstrings)
+
+.. [#] string.Template documentation
+   (https://docs.python.org/3/library/string.html#template-strings)
+
+.. [#] PEP 215: String Interpolation
+   (https://www.python.org/dev/peps/pep-0215/)
+
+.. [#] PEP 292: Simpler String Substitutions
+   (https://www.python.org/dev/peps/pep-0292/)
+
+.. [#] PEP 3101: Advanced String Formatting
+   (https://www.python.org/dev/peps/pep-3101/)
+
+.. [#] PEP 498: Literal string formatting
+   (https://www.python.org/dev/peps/pep-0498/)
+
+.. [#] string.Formatter.parse
+   (https://docs.python.org/3/library/string.html#string.Formatter.parse)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: