PEP: 501
Title: General purpose string interpolation
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Aug-2015
Python-Version: 3.6
Post-History: 08-Aug-2015
Abstract
========
PEP 498 proposes new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the interpolation
operation full access to containing namespaces (as with any other expression),
rather than being limited to explicit name references.
However, it only offers this capability for string formatting, making it likely
we will see code like the following::

    os.system(f"echo {user_message}")

This kind of code is superficially elegant, but poses a significant problem
if the interpolated value ``user_message`` is in fact provided by a user: it's
an opening for a form of code injection attack, where the supplied user data
has not been properly escaped before being passed to the ``os.system`` call.
To address that problem (and a number of other concerns), this PEP proposes an
alternative approach to compiler supported interpolation, based on a new ``$``
binary operator with a syntactically constrained right hand side, a new
``__interpolate__`` magic method, and a substitution syntax inspired by
that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th
substitution variable syntax to Python.
Some examples of the proposed syntax::

    msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    print(_$"This is a $translated $message")
    translated = l20n$"{{ $user }} is running {{ appname }}"
    myquery = sql$"SELECT $column FROM $table;"
    mycommand = sh$"cat $filename"
    mypage = html$"<html><body>${response.body}</body></html>"
    callable = defer$ "$x + $y"

Proposal
========
This PEP proposes the introduction of a new binary operator specifically for
interpolation of arbitrary expressions::

    value = interpolator $ "Substitute $names and ${expressions} at runtime"

This would be effectively interpreted as::

    _raw_template = "Substitute $names and ${expressions} at runtime"
    _parsed_fields = (
        ("Substitute ", 0, "names", "", ""),
        (" and ", 1, "expressions", "", ""),
        (" at runtime", None, None, None, None),
    )
    _field_values = (names, expressions)
    value = interpolator.__interpolate__(_raw_template,
                                         _parsed_fields,
                                         _field_values)

The right hand side of the new operator would be syntactically constrained to
be a string literal.
The ``str`` builtin type would gain an ``__interpolate__`` implementation that
supported the following ``str.format`` inspired semantics::

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> str$'She said her name is ${name!r}.'
    "She said her name is 'Jane'."

The interpolation operator could be used with single-quoted, double-quoted and
triple-quoted strings, including raw strings. It would not support bytes
literals as the right hand side of the expression.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
strings that are not present directly in the source code of the application.
Rationale
=========
PEP 498 makes interpolating values into strings with full access to Python's
lexical namespace semantics simpler, but it does so at the cost of creating a
situation where interpolating values into sensitive targets like SQL queries,
shell commands and HTML templates will enjoy a much cleaner syntax when handled
without regard for code injection attacks than when they are handled correctly.
It also has the effect of introducing yet another syntax for substitution
expressions into Python, when we already have 3 (``str.format``,
``bytes.__mod__`` and ``string.Template``).

This PEP proposes to handle the former issue by always specifying an explicit
interpolator for interpolation operations, and the latter by adopting the
``string.Template`` substitution syntax defined in PEP 292.

The substitution syntax devised for PEP 292 is deliberately simple so that the
template strings can be extracted into an i18n message catalog, and passed to
translators who may not themselves be developers. For these use cases, it is
important that the interpolation syntax be as simple as possible, as the
translators are responsible for preserving the substitution markers, even as
they translate the surrounding text. The PEP 292 syntax is also a common message
catalog syntax already supported by many commercial software translation
support tools.
PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
introduced for general purpose string formatting in PEP 3101, so this PEP adds
that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
tools the option of rejecting usage of that more advanced syntax at runtime,
rather than categorically rejecting it at compile time. The proposed permitted
expressions, conversion specifiers, and format specifiers inside ``${ref}`` are
exactly as defined for ``{ref}`` substitution in PEP 498.
The specific proposal in this PEP is also deliberately close in both syntax
and semantics to the general purpose interpolation syntax introduced to
JavaScript in ES6, as we can reasonably expect a great many Python developers
to be regularly switching back and forth between user interface code written in
JavaScript and core application code written in Python.
Specification
=============
This PEP proposes the introduction of ``$`` as a new binary operator designed
specifically to support interpolation of template strings::

    INTERPOLATOR $ TEMPLATE_STRING

This would work as a normal binary operator (precedence TBD), with the
exception that the template string would be syntactically constrained to be a
string literal, rather than permitting arbitrary expressions.

The template string must be a Unicode string (bytes literals are not permitted),
and string literal concatenation operates as normal within the template string
component of the expression.

The template string is parsed into literals and expressions. Expressions
appear as either identifiers prefixed with a single ``$`` character, or
surrounded by a leading ``${`` and a trailing ``}``. The parts of the format string
that are not expressions are separated out as string literals.

While parsing the string, any doubled ``$$`` is replaced with a single ``$``
and is considered part of the literal text, rather than as introducing an
expression.

These components are then organised into a tuple of tuples, and passed to the
``__interpolate__`` method of the given interpolator, along with the runtime
values of any expressions to be interpolated::

    INTERPOLATOR.__interpolate__(TEMPLATE_STRING,
                                 <parsed_fields>,
                                 <field_values>)

The template string field tuple is inspired by the interface of
``string.Formatter.parse``, and consists of a series of 5-tuples each
containing:

* a leading string literal (may be the empty string)
* the substitution field position (zero-based enumeration)
* the substitution expression text
* the substitution conversion specifier (as defined by ``str.format``)
* the substitution format specifier (as defined by ``str.format``)

This field ordering is defined such that reading the parsed field tuples from
left to right will have all the subcomponents displayed in the same order as
they appear in the original template string.
For ease of access the sequence elements will be available as attributes in
addition to being available by position:
* ``leading_text``
* ``field_position``
* ``expression``
* ``conversion``
* ``format``
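
As an illustration only, the combination of positional and named access
described above could be modelled with ``collections.namedtuple``. The class
name ``ParsedField`` is an assumption made for this sketch; the PEP only
specifies the attribute names.

```python
from collections import namedtuple

# Hypothetical type name; only the five attribute names come from the PEP.
ParsedField = namedtuple(
    "ParsedField",
    ["leading_text", "field_position", "expression", "conversion", "format"],
)

field = ParsedField("abc", 0, "expr1", "", "spec1")

# Access by attribute name...
expression_text = field.expression
# ...or by position, including plain tuple unpacking.
leading_text, position, expression, conversion, format_spec = field
```

An interpolator written against the attribute names would not need to care
about the exact ordering of the tuple elements.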
The expression text is simply the text of the substitution expression, as it
appeared in the original string, but without the leading and/or surrounding
expression markers. The conversion specifier and format specifier are separated
from the substitution expression by ``!`` and ``:`` as defined for ``str.format``.

If a given substitution field has no leading literal section, conversion specifier
or format specifier, then the corresponding elements in the tuple are the
empty string. If the final part of the string has no trailing substitution
field, then the field position, field expression, conversion specifier and
format specifier will all be ``None``.
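
The parsing rules above can be sketched in pure Python. This is a simplified,
hypothetical parser rather than the proposed compiler support: the names
``parse_template`` and ``_split_field`` are invented here, nested braces inside
``${...}`` are not handled, the first ``:`` is always treated as the start of
the format specifier, and raising ``SyntaxError`` for a stray ``$`` is an
assumption, since the PEP does not say how that case is reported. Omitting the
final tuple when no literal text follows the last field is likewise one
possible reading of the text.

```python
import re

# Approximation of a Python identifier (letters, digits, underscore).
_IDENT = re.compile(r"[^\W\d]\w*")


def _split_field(body):
    """Split 'expr!c:spec' into (expr, conversion, format_spec)."""
    expr, _, format_spec = body.partition(":")
    conversion = ""
    if len(expr) >= 2 and expr[-2] == "!" and expr[-1] in "rsa":
        expr, conversion = expr[:-2], expr[-1]
    return expr, conversion, format_spec


def parse_template(template):
    """Return the parsed fields for a template as a tuple of 5-tuples."""
    fields = []
    literal = []          # pieces of the pending leading text
    pos = 0
    field_num = 0
    while pos < len(template):
        dollar = template.find("$", pos)
        if dollar < 0:
            literal.append(template[pos:])
            break
        literal.append(template[pos:dollar])
        marker = template[dollar + 1:dollar + 2]
        if marker == "$":
            # A doubled $$ is a literal $ and does not start a field.
            literal.append("$")
            pos = dollar + 2
            continue
        if marker == "{":
            close = template.find("}", dollar + 2)
            if close < 0:
                raise SyntaxError("missing '}' in interpolation expression")
            expr, conversion, format_spec = _split_field(template[dollar + 2:close])
            pos = close + 1
        else:
            match = _IDENT.match(template, dollar + 1)
            if match is None:
                # Assumption: a lone $ is rejected rather than kept as text.
                raise SyntaxError("'$' must be followed by a name, '{' or '$'")
            expr, conversion, format_spec = match.group(), "", ""
            pos = match.end()
        fields.append(("".join(literal), field_num, expr, conversion, format_spec))
        literal = []
        field_num += 1
    trailing = "".join(literal)
    if trailing:
        fields.append((trailing, None, None, None, None))
    return tuple(fields)


fields = parse_template("Substitute $names and ${expressions} at runtime")
```

A real implementation would need a proper expression parser to find the end of
each ``${...}`` field, since Python expressions may themselves contain ``:``,
``!`` and braces.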
The substitution field values tuple is created by evaluating the interpolated
expressions in the exact runtime context where the interpolation expression
appears in the source code.
For the following example interpolation expression::

    str$'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'

the parsed fields tuple would be::

    (
      ('abc', 0, 'expr1', '', 'spec1'),
      ('', 1, 'expr2', 'r', 'spec2'),
      ('def', 2, 'expr3', 's', ''),
      ('ghi ', 3, 'ident', '', ''),
      (' $jkl', None, None, None, None)
    )

While the field values tuple would be::

    (expr1, expr2, expr3, ident)

The parsed fields tuple can be constant folded at compile time, while the
expression values tuple will always need to be constructed at runtime.
The ``str.__interpolate__`` implementation would have the following
semantics, with field processing being defined in terms of the ``format``
builtin and ``str.format`` conversion specifiers::

    import string

    _converter = string.Formatter().convert_field

    def __interpolate__(raw_template, fields, values):
        template_parts = []
        for leading_text, field_num, expr, conversion, format_spec in fields:
            template_parts.append(leading_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    # Apply the !r, !s or !a conversion before formatting
                    value = _converter(value, conversion)
                field_text = format(value, format_spec)
                template_parts.append(field_text)
        return "".join(template_parts)

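
To see these semantics in action without compiler support, the call that a
``str$'...'`` expression would compile to can be written out by hand. The
helper name ``str_interpolate`` and the hand-built tuples below are assumptions
made for illustration; they mirror the logic described above rather than any
shipped API.

```python
import datetime
import string

_converter = string.Formatter().convert_field


def str_interpolate(raw_template, fields, values):
    """Stand-in for the proposed str.__interpolate__ semantics."""
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                value = _converter(value, conversion)
            parts.append(format(value, format_spec))
    return "".join(parts)


name = "Jane"
age = 50
anniversary = datetime.date(1991, 10, 12)

# What the compiler would derive from the template string literal.
raw_template = ("My name is $name, my age next year is ${age+1}, "
                "my anniversary is ${anniversary:%A, %B %d, %Y}.")
parsed_fields = (
    ("My name is ", 0, "name", "", ""),
    (", my age next year is ", 1, "age+1", "", ""),
    (", my anniversary is ", 2, "anniversary", "", "%A, %B %d, %Y"),
    (".", None, None, None, None),
)
# The expressions are evaluated at the point of use, in field order.
field_values = (name, age + 1, anniversary)

result = str_interpolate(raw_template, parsed_fields, field_values)
```

Note that the interpolator never evaluates the expression text itself; it only
receives the already computed values.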
Writing custom interpolators
----------------------------
To simplify the process of writing custom interpolators, it is proposed to add
a new builtin decorator, ``interpolator``, which would be defined as::

    def interpolator(f):
        f.__interpolate__ = f.__call__
        return f

This allows new interpolators to be written as::

    @interpolator
    def my_custom_interpolator(raw_template, parsed_fields, field_values):
        ...

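
As a rough sketch of how the decorator would be used, the following defines it
as an ordinary function (it is not a builtin today) and applies it to a
hypothetical interpolator named ``shout``, which upper-cases each interpolated
value. Since the ``$`` operator does not exist, the ``__interpolate__`` call is
spelled out by hand with tuples the compiler would otherwise generate.

```python
def interpolator(f):
    # Expose the callable's own call behaviour under the magic method name.
    f.__interpolate__ = f.__call__
    return f


@interpolator
def shout(raw_template, parsed_fields, field_values):
    """Hypothetical interpolator: upper-case values, keep literal text as is."""
    parts = []
    for leading_text, field_num, expr, conversion, format_spec in parsed_fields:
        parts.append(leading_text)
        if field_num is not None:
            parts.append(str(field_values[field_num]).upper())
    return "".join(parts)


name = "world"

# Hand-written equivalent of: greeting = shout$"Hello $name!"
greeting = shout.__interpolate__(
    "Hello $name!",
    (("Hello ", 0, "name", "", ""), ("!", None, None, None, None)),
    (name,),
)
```

The decorated function remains directly callable, so it can also be tested
without going through ``__interpolate__``.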
Expression evaluation
---------------------
The subexpressions that are extracted from the interpolation expression are
evaluated in the context where the interpolation expression appears. This means
the expression has full access to local, nonlocal and global variables. Any
valid Python expression can be used inside ``${}``, including function and
method calls. References without the surrounding braces are limited to looking
up single identifiers.
Because the substitution expressions are evaluated where the string appears in
the source code, there are no additional security concerns related to the
contents of the expression itself, as you could have also just written the
same expression and used runtime field parsing::

    >>> bar = 10
    >>> def foo(data):
    ...   return data + 20
    ...
    >>> str$'input=$bar, output=${foo(bar)}'
    'input=10, output=30'

Is essentially equivalent to::

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Handling code injection attacks
-------------------------------
The proposed interpolation expressions make it potentially attractive to write
code like the following::

    myquery = str$"SELECT $column FROM $table;"
    mycommand = str$"cat $filename"
    mypage = str$"<html><body>${response.body}</body></html>"

These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
specific proposal in this PEP is designed to make it straightforward to write
use case specific interpolators that take care of quoting interpolated values
appropriately for the relevant security context::

    myquery = sql$"SELECT $column FROM $table;"
    mycommand = sh$"cat $filename"
    mypage = html$"<html><body>${response.body}</body></html>"

This PEP does not cover adding such interpolators to the standard library,
but instead ensures they can be readily provided by third party libraries.

(Although it's tempting to propose adding ``__interpolate__`` implementations to
``subprocess.call``, ``subprocess.check_call`` and ``subprocess.check_output``.)
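
A third party library might build such interpolators along the following
lines. This is a minimal sketch, not a vetted security tool: the factory name
``quoting_interpolator`` is hypothetical, ``sh`` and ``html`` are thin wrappers
over the existing ``shlex.quote`` and ``html.escape`` functions, and rejecting
conversion and format specifiers outright is a design choice made here for
simplicity. An ``sql`` interpolator is deliberately left out, as quoting
identifiers such as column and table names depends on the database in use.

```python
from html import escape as html_escape
from shlex import quote as shell_quote


def quoting_interpolator(quote):
    """Build an interpolator that passes every value through ``quote``."""
    def interpolate(raw_template, fields, values):
        parts = []
        for leading_text, field_num, expr, conversion, format_spec in fields:
            parts.append(leading_text)      # literal text is trusted as written
            if field_num is not None:
                if conversion or format_spec:
                    raise ValueError("specifiers are not supported: %r" % expr)
                parts.append(quote(str(values[field_num])))
        return "".join(parts)
    # Equivalent to applying the proposed @interpolator decorator.
    interpolate.__interpolate__ = interpolate
    return interpolate


sh = quoting_interpolator(shell_quote)
html = quoting_interpolator(html_escape)

filename = "notes.txt; rm -rf ~"
# Hand-written equivalent of: command = sh$"cat $filename"
command = sh.__interpolate__(
    "cat $filename",
    (("cat ", 0, "filename", "", ""),),
    (filename,),
)

body = "<script>alert(1)</script>"
# Hand-written equivalent of: page = html$"<html><body>${body}</body></html>"
page = html.__interpolate__(
    "<html><body>${body}</body></html>",
    (("<html><body>", 0, "body", "", ""),
     ("</body></html>", None, None, None, None)),
    (body,),
)
```

In both cases only the interpolated values are quoted, while the literal parts
of the template, which come from the source code, pass through unchanged.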
Format and conversion specifiers
--------------------------------
Aside from separating them out from the substitution expression, format and
conversion specifiers are otherwise treated as opaque strings by the
interpolation template parser - assigning semantics to those (or, alternatively,
prohibiting their use) is handled at runtime by the specified interpolator.
Error handling
--------------
Either compile time or run time errors can occur when processing interpolation
expressions. Compile time errors are limited to those errors that can be
detected when parsing a template string into its component tuples. These
errors all raise SyntaxError.
Unmatched braces::

    >>> str$'x=${x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in interpolation expression

Invalid expressions::

    >>> str$'x=${!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a
template string. See PEP 498 for some examples.

Different interpolators may also impose additional runtime
constraints on acceptable interpolated expressions and other formatting
details, which will be reported as runtime exceptions.
Internationalising interpolated strings
=======================================
Since this PEP derives its interpolation syntax from the internationalisation
focused PEP 292, it's worth considering the potential implications this PEP
may have for the internationalisation use case.
Internationalisation enters the picture by writing a custom interpolator that
performs internationalisation. For example, the following implementation
would delegate interpolation calls to ``string.Template``::

    import gettext
    import string

    @interpolator
    def i18n(template, fields, values):
        translated = gettext.gettext(template)
        value_map = _build_interpolation_map(fields, values)
        return string.Template(translated).safe_substitute(value_map)

    def _build_interpolation_map(fields, values):
        field_values = {}
        for literal_text, field_num, expr, conversion, format_spec in fields:
            if field_num is not None:
                # Only plain identifiers without specifiers can be translated
                assert expr.isidentifier() and not conversion and not format_spec
                field_values[expr] = values[field_num]
        return field_values

And could then be invoked as::

    # _ = i18n at top of module or injected into the builtins module
    print(_$"This is a $translated $message")

Any actual i18n implementation would need to address other issues (most notably
message catalog extraction), but this gives the general idea of what might be
possible.
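
The following self-contained variant shows the same idea running end to end.
It is only a sketch: the in-memory ``CATALOG`` dictionary is a hypothetical
stand-in for a real ``gettext`` catalog, the French text is invented for the
example, and the ``__interpolate__`` call is written by hand because the ``$``
operator is not available.

```python
import string

# Hypothetical message catalog, keyed by the raw template string.
CATALOG = {
    "This is a $translated $message": "Ceci est un $message $translated",
}


def i18n(template, fields, values):
    # Look up the translation using the untouched template text.
    translated = CATALOG.get(template, template)
    value_map = {}
    for leading_text, field_num, expr, conversion, format_spec in fields:
        if field_num is None:
            continue
        if not expr.isidentifier() or conversion or format_spec:
            raise ValueError("only plain $name fields can be translated")
        value_map[expr] = values[field_num]
    # Substitute by name, so translators are free to reorder the fields.
    return string.Template(translated).safe_substitute(value_map)


i18n.__interpolate__ = i18n

translated = "traduit"
message = "message"

# Hand-written equivalent of: text = i18n$"This is a $translated $message"
text = i18n.__interpolate__(
    "This is a $translated $message",
    (("This is a ", 0, "translated", "", ""), (" ", 1, "message", "", "")),
    (translated, message),
)
```

Because the lookup key is the raw template and substitution is done by name,
the translated string can place ``$message`` before ``$translated`` without
any change to the calling code.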
It's also worth noting that one of the benefits of the ``$`` based substitution
syntax in this PEP is its compatibility with Mozilla's
`l20n syntax <http://l20n.org/>`__, which uses ``{{ name }}`` for global
substitution, and ``{{ $user }}`` for local context substitution.
With the syntax in this PEP, an l20n interpolator could be written as::

    translated = l20n$"{{ $user }} is running {{ appname }}"

With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
catalog lookups using PEP 498's semantics), the necessary brace escaping would
make the string look like this in order to interpolate the user variable
while preserving all of the expected braces::

    interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}"

Possible integration with the logging module
============================================
One of the challenges with the logging module has been that we have previously
been unable to devise a reasonable migration strategy away from the use of
printf-style formatting. The runtime parsing and interpolation overhead for
logging messages also poses a problem for extensive logging of runtime events
for monitoring purposes.

While beyond the scope of this initial PEP, the proposal described here could
potentially be applied to the logging module's event reporting APIs, permitting
relevant details to be captured using forms like::

    logging.debug$"Event: $event; Details: $data"
    logging.critical$"Error: $error; Details: $data"

Discussion
==========
Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.
Using call syntax to support keyword-only parameters
----------------------------------------------------
The logging examples raise the question of whether or not it may be desirable
to allow interpolators to accept arbitrary keyword arguments, and allow folks
to write things like::

    logging.critical$"Error: $error; Details: $data"(exc_info=True)

in order to pass additional keyword-only arguments to the interpolator.

With the current PEP, such code would attempt to call the result of the
interpolation operation. If interpolation keyword support was added, then
calling the result of an interpolation operation directly would require
parentheses for disambiguation::

    (defer$ "$x + $y")()

(``defer`` here would be an interpolator that compiled the supplied string as
a piece of Python code with eagerly bound references to the containing
namespace.)
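
One way such a ``defer`` interpolator might work is sketched below. The PEP
does not define it, so everything here is an assumption: each field is
replaced by a generated placeholder name (``_field_0``, ``_field_1``, ...),
the resulting source is compiled once as an expression, and the already
evaluated field values are bound into a private namespace.

```python
def defer(raw_template, fields, values):
    """Hypothetical interpolator returning a zero-argument callable."""
    source_parts = []
    namespace = {}
    for leading_text, field_num, expr, conversion, format_spec in fields:
        source_parts.append(leading_text)
        if field_num is not None:
            placeholder = "_field_%d" % field_num
            # Eager binding: the value is captured now, not when called.
            namespace[placeholder] = values[field_num]
            source_parts.append(placeholder)
    code = compile("".join(source_parts), "<deferred>", "eval")
    return lambda: eval(code, namespace)


defer.__interpolate__ = defer

x, y = 2, 3
# Hand-written equivalent of: deferred_sum = defer$ "$x + $y"
deferred_sum = defer.__interpolate__(
    "$x + $y",
    (("", 0, "x", "", ""), (" + ", 1, "y", "", "")),
    (x, y),
)

x = 100  # rebinding afterwards has no effect on the deferred expression
result = deferred_sum()
```

The literal text of the template is treated as Python source here, so this
style of interpolator is only appropriate for templates written in the
program's own source code.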
Determining relative precedence
-------------------------------
The PEP doesn't currently specify the relative precedence of the new operator,
as the only examples considered so far concern standalone expressions or simple
variable assignments.
Development of a reference implementation based on the PEP 498 reference
implementation may help answer that question.
Deferring support for binary interpolation
------------------------------------------
Supporting binary interpolation with this syntax would be relatively
straightforward (just a matter of relaxing the syntactic restrictions on the
right hand side of the operator), but poses a significant likelihood of
producing confusing type errors when a text interpolator was presented with
binary input.
Since the proposed operator is useful without binary interpolation support, and
such support can be readily added later, further consideration of binary
interpolation is considered out of scope for the current PEP.
Preserving the raw template string
----------------------------------
Earlier versions of this PEP failed to make the raw template string available
to interpolators. This greatly complicated the i18n example, as it needed to
reconstruct the original template to pass to the message catalog lookup.
Using a magic method rather than a global name lookup
-----------------------------------------------------
Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
a magic method on an explicitly named interpolator. Naming the interpolator
eliminated a lot of the complexity otherwise associated with shadowing the
builtin function in order to modify the semantics of interpolation.
Relative order of conversion and format specifier in parsed fields
------------------------------------------------------------------
The relative order of the conversion specifier and the format specifier in the
substitution field 5-tuple is defined to match the order they appear in the
format string, which is unfortunately the inverse of the way they appear in the
``string.Formatter.parse`` 4-tuple.
I consider this a design defect in ``string.Formatter.parse``, so I think it's
worth fixing in the custom interpolator API, since the tuple already
has other differences (like including both the field position number *and* the
text of the expression).
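
The ordering difference can be checked against the existing standard library
API, where ``string.Formatter.parse`` yields 4-tuples of
``(literal_text, field_name, format_spec, conversion)``. The helper
``to_pep501_order`` below is a hypothetical name used only to show how those
results map onto the 5-tuple layout described in this PEP; note that the
example uses ``str.format`` braces, since that is the syntax ``parse``
understands.

```python
import string

# Existing behaviour: the format spec comes *before* the conversion.
parsed = list(string.Formatter().parse("abc{expr2!r:spec2}def"))


def to_pep501_order(parse_result):
    """Rearrange Formatter.parse output into this PEP's 5-tuple layout."""
    fields = []
    position = 0
    for literal_text, field_name, format_spec, conversion in parse_result:
        if field_name is None:
            # Trailing literal text with no substitution field.
            fields.append((literal_text, None, None, None, None))
        else:
            fields.append(
                (literal_text, position, field_name, conversion or "", format_spec)
            )
            position += 1
    return tuple(fields)


reordered = to_pep501_order(parsed)
```

In the reordered tuples the conversion precedes the format specifier, matching
the left to right order in which they appear in the template text.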
This PEP also makes the parsed field attributes available by name, so it's
possible to write interpolators without caring about the precise field order
at all.
References
==========
.. [#] %-formatting
(https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
.. [#] str.format
(https://docs.python.org/3/library/string.html#formatstrings)
.. [#] string.Template documentation
(https://docs.python.org/3/library/string.html#template-strings)
.. [#] PEP 215: String Interpolation
(https://www.python.org/dev/peps/pep-0215/)
.. [#] PEP 292: Simpler String Substitutions
(https://www.python.org/dev/peps/pep-0292/)
.. [#] PEP 3101: Advanced String Formatting
(https://www.python.org/dev/peps/pep-3101/)
.. [#] PEP 498: Literal string formatting
(https://www.python.org/dev/peps/pep-0498/)
.. [#] string.Formatter.parse
(https://docs.python.org/3/library/string.html#string.Formatter.parse)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: