PEP: 501
Title: Translation ready string interpolation
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Aug-2015
Python-Version: 3.6
Post-History: 08-Aug-2015
Abstract
========
PEP 498 proposes new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicit name references.
This PEP agrees with the basic motivation of PEP 498, but proposes to focus
both the syntax and the implementation on the i18n use case, drawing on the
previous proposals in PEP 292 (which added string.Template) and its predecessor
PEP 215 (which proposed syntactic support, rather than a runtime string
manipulation based approach). The text of this PEP currently assumes that the
reader is familiar with these three previous related proposals.
The interpolation syntax proposed for this PEP is that of PEP 292, but expanded
to allow arbitrary expressions and format specifiers when using the ``${ref}``
interpolation syntax. The suggested new string prefix is "i" rather than "f",
with the intended mnemonics being either "interpolated string" or
"i18n string"::

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> i'She said her name is ${name!r}.'
    "She said her name is 'Jane'."

This PEP also proposes the introduction of three new builtin functions,
``__interpolate__``, ``__interpolateb__`` and ``__interpolateu__``, which
implement key aspects of the interpolation process, and may be overridden in
accordance with the usual mechanisms for shadowing builtin functions.
This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.
The key aim of this PEP that isn't inherited from PEP 498 is to help ensure
that future Python applications are written in a "translation ready" way, where
many interface strings that may need to be translated to allow an application
to be used in multiple languages are flagged as a natural consequence of the
development process, even though they won't be translated by default.
Rationale
=========
PEP 498 makes interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but it does so at the cost of introducing
yet another string interpolation syntax.
The interpolation syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to
translators who may not themselves be developers. For these use cases, it is
important that the interpolation syntax be as simple as possible, as the
translators are responsible for preserving the substitution markers, even as
they translate the surrounding text. The PEP 292 syntax is also a common message
catalog syntax already supported by many commercial software translation
support tools.
PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
introduced for general purpose string formatting in PEP 3101, so this PEP adds
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
tools the option of rejecting usage of that more advanced syntax at runtime,
rather than categorically rejecting it at compile time. The proposed permitted
expressions inside ``${ref}`` are exactly as defined in PEP 498.
Specification
=============
In source code, i-strings are string literals that are prefixed by the
letter 'i'. The string will be parsed into its components at compile time,
which will then be passed to the new ``__interpolate__`` builtin at runtime.
The 'i' prefix may be combined with 'b', where the 'i' must appear first, in
which case ``__interpolateb__`` will be called rather than ``__interpolate__``.
Similarly, 'i' may also be combined with 'u' to call ``__interpolateu__``
rather than ``__interpolate__``.
The 'i' prefix may also be combined with 'r', with or without 'b' or 'u', to
produce raw i-strings. This disables backslash escape sequences in the string
literal as usual, but has no effect on the runtime interpolation behaviour.
In all cases, the only permitted location for the 'i' prefix is before all other
prefix characters - it indicates a runtime operation, which is largely
independent of the compile time prefixes (aside from calling different
interpolation functions when combined with 'b' or 'u').
i-strings are parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single "$" character, or
surrounded by a leading '${' and a trailing '}'. The parts of the format string
that are not expressions are separated out as string literals.
While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an
expression.
These components are then organised into 3 parallel tuples:

* parsed format string fields
* expression text
* expression values

And then passed to the ``__interpolate__`` builtin at runtime::

    __interpolate__(fields, expressions, values)

The format string field tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 4-tuples each containing
a leading literal, together with a trailing field number, format specifier,
and conversion specifier. If a given substitution field has no leading literal
section, format specifier or conversion specifier, then the corresponding
elements in the tuple are the empty string. If the final part of the string
has no trailing substitution field, then the field number, format specifier
and conversion specifier will all be ``None``.
The expression text is simply the text of each interpolated expression, as it
appeared in the original string, but without the leading and/or surrounding
expression markers.
The expression values are the result of evaluating the interpolated expressions
in the exact runtime context where the i-string appears in the source code.
For the following example i-string::

    i'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'

the fields tuple would be::

    (
      ('abc', 0, 'spec1', ''),
      ('', 1, 'spec2', 'r'),
      ('def', 2, '', 's'),
      ('ghi ', 3, '', ''),
      (' $jkl', None, None, None)
    )

For the same example, the expression text and value tuples would be::

    ('expr1', 'expr2', 'expr3', 'ident')  # Expression text
    (expr1, expr2, expr3, ident)          # Expression values

The fields and expression text tuples can be constant folded at compile time,
while the expression values tuple will always need to be constructed at runtime.
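The compile time half of this breakdown can be sketched in ordinary Python.
This is only an illustration under simplifying assumptions: the helper name
``parse_istring`` and its regular expression are hypothetical, expressions
inside ``${}`` may not contain ``{``, ``}``, ``!`` or ``:``, and a stray ``$``
is left in the literal text instead of raising ``SyntaxError``. A real
implementation would rely on the compiler's own expression parser.

```python
import re

# One alternative per construct: "$$", "$identifier", or "${expr!conv:spec}"
_TOKEN = re.compile(
    r"\$(?:"
    r"(?P<escaped>\$)"                     # $$ is an escaped literal $
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"  # $name
    r"|\{(?P<expr>[^{}!:]+)"               # ${expr (simplified)
    r"(?:!(?P<conv>[rsa]))?"               # optional conversion specifier
    r"(?::(?P<spec>[^{}]*))?\}"            # optional format specifier, then }
    r")"
)


def parse_istring(template):
    """Split a template into (fields, expression text) tuples."""
    fields = []
    expressions = []
    literal = []  # pieces of literal text seen since the last field
    pos = 0
    for match in _TOKEN.finditer(template):
        literal.append(template[pos:match.start()])
        pos = match.end()
        if match.group("escaped"):
            literal.append("$")
            continue
        fields.append((
            "".join(literal),
            len(expressions),           # field number
            match.group("spec") or "",  # empty string when absent
            match.group("conv") or "",
        ))
        expressions.append(match.group("ident") or match.group("expr"))
        literal = []
    trailing = "".join(literal) + template[pos:]
    if trailing:
        # Trailing literal text with no substitution field
        fields.append((trailing, None, None, None))
    return tuple(fields), tuple(expressions)


fields, expressions = parse_istring(
    "abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl"
)
```

Both results depend only on the text of the literal, which is why they can be
computed once at compile time. The values tuple is not produced here because
it requires evaluating each expression in the enclosing scope at runtime.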
The default ``__interpolate__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers::

    import string

    _converter = string.Formatter().convert_field

    def __interpolate__(fields, expressions, values):
        template_parts = []
        for leading_text, field_num, format_spec, conversion in fields:
            template_parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    # Apply the !r, !s or !a conversion before formatting
                    value = _converter(value, conversion)
                field_text = format(value, format_spec)
                template_parts.append(field_text)
        return "".join(template_parts)

The default ``__interpolateu__`` implementation would be the
``__interpolate__`` builtin.
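To make the default semantics concrete, the following sketch calls an
equivalent function by hand. The function name ``interpolate`` and the
hand-written tuples are assumptions standing in for the builtin and for what
the compiler would generate from
``i'My name is $name, my age next year is ${age+1}.'`` and
``i'She said her name is ${name!r}.'``.

```python
import string

_converter = string.Formatter().convert_field


def interpolate(fields, expressions, values):
    """Stand-in for the proposed default __interpolate__ builtin."""
    parts = []
    for leading_text, field_num, format_spec, conversion in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            parts.append(format(value, format_spec))
    return "".join(parts)


name = "Jane"
age = 50

# Hand-built equivalent of the first i-string
greeting = interpolate(
    (
        ("My name is ", 0, "", ""),
        (", my age next year is ", 1, "", ""),
        (".", None, None, None),
    ),
    ("name", "age+1"),  # expression text, unused by the default function
    (name, age + 1),    # expression values, evaluated by the caller
)

# Hand-built equivalent of the second i-string, using the !r conversion
quoted = interpolate(
    (("She said her name is ", 0, "", "r"), (".", None, None, None)),
    ("name",),
    (name,),
)
```

The default function ignores the expression text entirely. That tuple exists
for the benefit of alternative interpolation functions, such as ones that
need to reconstruct a message catalog key.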
The default ``__interpolateb__`` implementation would be defined in terms of
the binary mod-formatting reintroduced in PEP 461::

    def __interpolateb__(fields, expressions, values):
        template_parts = []
        for leading_data, field_num, format_spec, conversion in fields:
            template_parts.append(leading_data)
            if field_num is not None:
                if conversion:
                    raise ValueError("Conversion specifiers not supported "
                                     "in default binary interpolation")
                value = values[field_num]
                # Assumes the parsed fields of an ib-string are bytes objects
                field_data = (b"%" + format_spec) % (value,)
                template_parts.append(field_data)
        return b"".join(template_parts)

This definition permits examples like the following::

    >>> data = 10
    >>> ib'$data'
    b'10'
    >>> ib'${data:4x}'
    b'   a'
    >>> ib'${data:#4x}'
    b' 0xa'
    >>> ib'${data:04X}'
    b'000A'

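The field formatting in these examples comes straight from PEP 461 bytes
%-formatting, which can be checked in Python 3.5 or later without any new
syntax. The helper name ``format_bytes_field`` is hypothetical, and the sketch
assumes the format specifier reaches the interpolation function as a bytes
object without the leading ``%``.

```python
def format_bytes_field(value, format_spec):
    """Format one value the way the default binary interpolation would."""
    # Prepend the "%" that the i-string syntax leaves out of the specifier
    return (b"%" + format_spec) % (value,)


data = 10
padded = format_bytes_field(data, b"4x")     # hex, space padded to width 4
prefixed = format_bytes_field(data, b"#4x")  # hex with 0x prefix, width 4
upper = format_bytes_field(data, b"04X")     # upper case hex, zero padded
```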
Expression evaluation
---------------------
The expressions that are extracted from the string are evaluated in
the context where the i-string appeared. This means the expression has
full access to local, nonlocal and global variables. Any valid Python
expression can be used inside ``${}``, including function and method calls.
References without the surrounding braces are limited to looking up single
identifiers.
Because the i-strings are evaluated where the string appears in the
source code, there is no additional expressiveness available with
i-strings. There are also no additional security concerns: you could
have also just written the same expression, not inside of an
i-string::

    >>> bar=10
    >>> def foo(data):
    ...     return data + 20
    ...
    >>> i'input=$bar, output=${foo(bar)}'
    'input=10, output=30'

Is equivalent to::

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Format specifiers
-----------------
Format specifiers are not interpreted by the i-string parser - that is
handled at runtime by the called interpolation function.
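With the default text interpolation function, this means the specifier is
passed through unchanged to the ``format`` builtin, which in turn delegates
to the ``__format__`` method of the value. A short illustration using only
existing Python features (the variable names are arbitrary):

```python
import datetime

anniversary = datetime.date(1991, 10, 12)

# date.__format__ treats the specifier as a strftime pattern
date_text = format(anniversary, "%A, %B %d, %Y")

# int.__format__ uses the standard format specification mini-language
hex_text = format(255, "#06x")
```

The same specifier text would be meaningless, or rejected outright, under a
different interpolation function, which is why the parser leaves it alone.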
Concatenating strings
---------------------
As i-strings are shorthand for a runtime builtin function call, implicit
concatenation is a syntax error (similar to attempting implicit concatenation
between bytes and str literals)::

    >>> i"interpolated" "not interpolated"
      File "<stdin>", line 1
    SyntaxError: cannot mix interpolation call with plain literal

Error handling
--------------
Either compile time or run time errors can occur when processing
i-strings. Compile time errors are limited to those errors that can be
detected when parsing an i-string into its component tuples. These errors all
raise SyntaxError.
Unmatched braces::

    >>> i'x=${x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in interpolation expression

Invalid expressions::

    >>> i'x=${!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside an
i-string. See PEP 498 for some examples.
Different interpolation functions may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
details, which will be reported as runtime exceptions.
Leading whitespace in expressions is not skipped
------------------------------------------------
Unlike PEP 498, leading whitespace in expressions doesn't need to be skipped -
'$' is not a legal character in Python's syntax, so it can't appear inside
a ``${}`` field except as part of another string, whether interpolated or not.
Internationalising interpolated strings
=======================================
So far, this PEP has said nothing practical about internationalisation - only
formatting text using either str.format or bytes.__mod__ semantics depending
on whether or not a str or bytes object is being interpolated.
Internationalisation enters the picture by overriding the ``__interpolate__``
builtin on a module-by-module basis. For example, the following implementation
would delegate interpolation calls to string.Template::

    import string
    # Assumption: ``_`` is the translation function, e.g. from gettext
    from gettext import gettext as _

    def _interpolation_fields_to_template(fields, expressions):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions permitted for i18n")
        template_parts = []
        for literal_text, field_num, format_spec, conversion in fields:
            if format_spec:
                raise ValueError("Format specifiers not permitted for i18n")
            if conversion:
                raise ValueError("Conversion specifiers not permitted for i18n")
            template_parts.append(literal_text)
            if field_num is not None:
                template_parts.append("${" + expressions[field_num] + "}")
        return "".join(template_parts)

    def __interpolate__(fields, expressions, values):
        catalog_str = _interpolation_fields_to_template(fields, expressions)
        translated = _(catalog_str)
        values = {k: v for k, v in zip(expressions, values)}
        return string.Template(translated).safe_substitute(values)

If a module were to import that definition of ``__interpolate__`` into the
module namespace, then:

* Any ``i"translated & interpolated"`` strings would be translated
* Any ``iu"untranslated & interpolated"`` strings would not be translated
* Any ``ib"untranslated & interpolated"`` strings would not be translated
* Any other string and bytes literals would not be translated unless explicitly
  passed to the relevant translation machinery at runtime

This shifts the behaviour from the status quo, where translation support needs
to be added explicitly to each string requiring translation to one where
opting *in* to translation is done on a module by module basis, and
individual interpolated strings can then be opted *out* of translation by
adding the "u" prefix to the string literal in order to call
``__interpolateu__`` instead of ``__interpolate__``.
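The effect of such an override can be tried out today by calling the function
directly. In this sketch the dictionary ``_CATALOG``, the stand-in translation
function ``_`` and the names ``fields_to_template`` and ``interpolate_i18n``
are all hypothetical, and the tuples are written by hand in place of what the
compiler would produce for ``i'Hello $name!'``.

```python
import string

# Toy message catalog; a real application would use gettext or similar
_CATALOG = {"Hello ${name}!": "Bonjour ${name} !"}


def _(message):
    """Stand-in translation function: fall back to the original text."""
    return _CATALOG.get(message, message)


def fields_to_template(fields, expressions):
    """Rebuild a PEP 292 style catalog key from the parsed fields."""
    if not all(expr.isidentifier() for expr in expressions):
        raise ValueError("Only variable substitutions permitted for i18n")
    parts = []
    for literal_text, field_num, format_spec, conversion in fields:
        if format_spec or conversion:
            raise ValueError("Only plain substitutions permitted for i18n")
        parts.append(literal_text)
        if field_num is not None:
            parts.append("${" + expressions[field_num] + "}")
    return "".join(parts)


def interpolate_i18n(fields, expressions, values):
    """Translate the template first, then substitute the runtime values."""
    translated = _(fields_to_template(fields, expressions))
    mapping = dict(zip(expressions, values))
    return string.Template(translated).safe_substitute(mapping)


fields = (("Hello ", 0, "", ""), ("!", None, None, None))
translated = interpolate_i18n(fields, ("name",), ("Jane",))

# A message missing from the catalog is interpolated untranslated
fallback = interpolate_i18n((("Bye ", 0, "", ""),), ("name",), ("Jane",))
```

Note that the lookup key is the reconstructed template, not the final text,
so a single catalog entry covers every value of ``name``.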
Discussion
==========
Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.
Preserving the unmodified format string
---------------------------------------
A lot of the complexity in the i18n example is actually in recreating the
original format string from its component parts. It may make sense to preserve
and pass that entire string to the interpolation function, in addition to
the broken down field definitions.
This approach would also allow translators to more consistently benefit from
the simplicity of the PEP 292 approach to string formatting (in the example
above, surrounding braces are added to the catalog strings even for cases that
don't need them).
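As a rough sketch of what that could look like, the function below takes the
unmodified template as an extra leading argument. This four-argument
signature is purely hypothetical and is not specified anywhere in this PEP,
as are the names ``interpolate_with_template`` and ``_CATALOG``.

```python
import string

# Toy catalog keyed by the template exactly as written in the source code
_CATALOG = {"Hello $name!": "Bonjour $name !"}


def interpolate_with_template(template, fields, expressions, values):
    """Hypothetical variant that also receives the raw template string."""
    # No reconstruction step: the raw template is the catalog key
    translated = _CATALOG.get(template, template)
    mapping = dict(zip(expressions, values))
    return string.Template(translated).safe_substitute(mapping)


result = interpolate_with_template(
    "Hello $name!",
    (("Hello ", 0, "", ""), ("!", None, None, None)),
    ("name",),
    ("Jane",),
)
```

The catalog key keeps the brace-free ``$name`` form that the developer wrote,
which is the form translators would see.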
References
==========
.. [#] %-formatting
       (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

.. [#] str.format
       (https://docs.python.org/3/library/string.html#formatstrings)

.. [#] string.Template documentation
       (https://docs.python.org/3/library/string.html#template-strings)

.. [#] PEP 215: String Interpolation
       (https://www.python.org/dev/peps/pep-0215/)

.. [#] PEP 292: Simpler String Substitutions
       (https://www.python.org/dev/peps/pep-0292/)

.. [#] PEP 3101: Advanced String Formatting
       (https://www.python.org/dev/peps/pep-3101/)

.. [#] PEP 498: Literal string formatting
       (https://www.python.org/dev/peps/pep-0498/)

.. [#] string.Formatter.parse
       (https://docs.python.org/3/library/string.html#string.Formatter.parse)

Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: