PEP: 498
Title: Literal String Formatting
Version: $Revision$
Last-Modified: $Date$
Author: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Aug-2015
Python-Version: 3.6
Post-History: 07-Aug-2015

Abstract
========
Python supports multiple ways to format text strings. These include
%-formatting [#]_, str.format [#]_, and string.Template [#]_. Each of
these methods has its advantages, but each also has disadvantages
that make it cumbersome to use in practice. This PEP proposes to add
a new string formatting mechanism: Literal String Formatting. In this
PEP, such strings will be referred to as "f-strings", taken from the
leading character used to denote such strings.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms.

f-strings provide a way to combine string literals with Python
expressions, using a minimal syntax. It should be noted that an
f-string is really an expression evaluated at run time, not a constant
value. An f-string is a string, prefixed with 'f', that contains
expressions inside braces. The expressions are replaced with their
values. Some examples are::
>>> import datetime
>>> name = 'Fred'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> f'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.'
'My name is Fred, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> f'He said his name is {name!r}.'
"He said his name is 'Fred'."
This PEP proposes a new method on the str type:
str.interpolate(). This method will be used to implement f-strings.
A similar feature was proposed in PEP 215 [#]_. PEP 215 proposed to
support a subset of Python expressions, and did not support the
type-specific string formatting (the __format__ method) which was
introduced with PEP 3101 [#]_.
Rationale
=========
This PEP is driven by the desire to have a simpler way to format
strings in Python. The existing ways of formatting are either error
prone, inflexible, or cumbersome.
%-formatting is limited as to the types it supports. Only ints, strs,
and doubles can be formatted. All other types are either not
supported, or converted to one of these types before formatting. In
addition, there's a well-known trap where a single value is passed::
>>> msg = 'disk failure'
>>> 'error: %s' % msg
'error: disk failure'
But if msg were ever to be a tuple, the same code would fail::
>>> msg = ('disk failure', 32)
>>> 'error: %s' % msg
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
To be defensive, the following code should be used::
>>> 'error: %s' % (msg,)
"error: ('disk failure', 32)"
str.format() was added to address some of these problems with
%-formatting. In particular, it uses normal function call syntax (and
therefore supports multiple parameters) and it is extensible through
the __format__() method on the object being converted to a string. See
PEP 3101 for a detailed rationale. This PEP reuses much of the
str.format() syntax and machinery, in order to provide continuity with
an existing Python string formatting mechanism.

However, str.format() is not without its issues. Chief among them is
its verbosity. For example, the text 'value' is repeated here::
>>> value = 4 * 20
>>> 'The value is {value}.'.format(value=value)
'The value is 80.'
Even in its simplest form, there is a bit of boilerplate, and the
value that's inserted into the placeholder is sometimes far removed
from where the placeholder is situated::
>>> 'The value is {}.'.format(value)
'The value is 80.'
With an f-string, this becomes::
>>> f'The value is {value}.'
'The value is 80.'
f-strings provide a concise, readable way to include expressions
inside strings.
string.Template has similar shortcomings to str.format(), but also
supports fewer formatting options. In particular, it does not support
__format__.
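As a rough illustration of that last point, the sketch below formats
the same value with string.Template and with an f-string. The names
price and tmpl are illustrative only and do not appear elsewhere in
this PEP.

```python
import string

price = 3.14159  # hypothetical value, for illustration only

# string.Template substitutes str(price); it has no format specifier syntax.
tmpl = string.Template('price: $price')
template_result = tmpl.substitute(price=price)

# An f-string passes the format specifier to price.__format__().
fstring_result = f'price: {price:.2f}'

print(template_result)
print(fstring_result)
```

Getting two decimal places out of string.Template requires formatting
the value separately before substitution.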
No use of globals() or locals()
-------------------------------
In the discussions on python-dev [#]_, a number of solutions were
presented that used locals() and globals() or their equivalents. All
of these have various problems. Among these are referencing variables
that are not otherwise used in a closure. Consider::

    >>> def outer(x):
    ...     def inner():
    ...         return 'x={x}'.format_map(locals())
    ...     return inner
    ...
    >>> outer(42)()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in inner
    KeyError: 'x'

This raises a KeyError because the compiler has not added a reference
to x inside the closure. You need to manually add a reference to x in
order for this to work::

    >>> def outer(x):
    ...     def inner():
    ...         x
    ...         return 'x={x}'.format_map(locals())
    ...     return inner
    ...
    >>> outer(42)()
    'x=42'

In addition, using locals() or globals() introduces an information
leak. A called routine that has access to the caller's locals() or
globals() has access to far more information than needed to do the
string interpolation.
Guido stated [#]_ that any solution to better string interpolation
would not use locals() or globals().
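For comparison, here is a minimal sketch of the closure case written
both ways, assuming f-strings behave as specified in this PEP. Because
the f-string expression is part of the source code, the compiler sees
the reference to x and captures it, so no call to locals() is needed.
The function names outer_locals and outer_fstring are illustrative
only.

```python
def outer_locals(x):
    def inner():
        # x is never referenced in inner's code, so it is not in locals().
        return 'x={x}'.format_map(locals())
    return inner


def outer_fstring(x):
    def inner():
        # The compiler sees x inside the f-string and captures it.
        return f'x={x}'
    return inner


try:
    outer_locals(42)()
    locals_result = 'no error'
except KeyError as exc:
    locals_result = f'KeyError: {exc}'

fstring_result = outer_fstring(42)()
print(locals_result)
print(fstring_result)
```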
Specification
=============
In source code, f-strings are string literals that are prefixed by the
letter 'f'. 'f' may be combined with 'r', in either order, to produce
raw f-string literals. 'f' may not be combined with 'b': there are no
binary f-strings. 'f' may also be combined with 'u', in either order,
although adding 'u' has no effect.
f-strings are parsed into literals and expressions. Expressions
appear within curly braces '{' and '}'. The parts of the string outside
of braces are literals. The expressions are evaluated, formatted with
the existing __format__ protocol, then the results are concatenated
together with the string literals. While scanning the string for
expressions, any doubled braces '{{' or '}}' are replaced by the
corresponding single brace. Doubled opening braces do not signify the
start of an expression.
Following the expression, an optional type conversion may be
specified. The allowed conversions are '!s', '!r', or '!a'. These are
treated the same as in str.format: '!s' calls str() on the expression,
'!r' calls repr() on the expression, and '!a' calls ascii() on the
expression. These conversions are applied before the call to
__format__. The only reason to use '!s' is if you want to specify a
format specifier that applies to str, not to the type of the
expression.
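A minimal sketch of how the conversions interact with __format__ may
help here. The Celsius class is hypothetical and exists only for this
illustration; it is not part of the proposal.

```python
class Celsius:
    """Hypothetical type with its own __format__, for illustration only."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f'{self.degrees} C'

    def __repr__(self):
        return f'Celsius({self.degrees!r})'

    def __format__(self, spec):
        # The spec is applied to the number; the unit is appended afterwards.
        return format(self.degrees, spec) + ' C'


t = Celsius(21.5)

plain = f'{t:.2f}'      # the spec goes to Celsius.__format__()
as_str = f'{t!s:>10}'   # str(t) is taken first, then the spec applies to that str
as_repr = f'{t!r}'      # repr(t), formatted with an empty spec

print(plain, as_str, as_repr, sep='\n')
```

Without '!s', the '>10' specifier would be handed to
Celsius.__format__() and applied to the number rather than to the
string form.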
Similar to str.format, optional format specifiers may be included
inside the f-string, separated from the expression (or the type
conversion, if specified) by a colon. If a format specifier is not
provided, an empty string is used.
So, an f-string looks like::
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } text ... '
The resulting expression's __format__ method is called with the format
specifier. The resulting value is used when building the value of the
f-string.
Expressions cannot contain ':' or '!' outside of strings or parens,
brackets, or braces. The exception is that the '!=' operator is
special cased.
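The following sketch shows a few expressions that stay within these
rules, along with the doubled-brace escape described earlier. The
variable names are illustrative only.

```python
a, b = 3, 4
scores = {'alice': 9.5}

# '!=' is special cased, so it is not read as the start of a conversion.
not_equal = f'{a != b}'

# A ':' inside brackets belongs to the expression (here, a slice).
sliced = f'{"abcdef"[1:3]}'

# A ':' outside brackets starts the format specifier.
padded = f'{scores["alice"]:>6.2f}'

# Doubled braces produce literal braces and do not start an expression.
braces = f'{{a}} is {a}'

print(not_equal, sliced, padded, braces, sep='\n')
```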
str.interpolate()
-----------------
str.interpolate(mapping) will be a new method. It takes one argument:
a mapping of field names to values. This method is the same as
str.format_map() [#]_, with one difference: it does not interpret the
field_name [#]_ in any way. The field_name is only used to look up the
replacement value in the supplied mapping object. Like str.format()
and str.format_map(), str.interpolate() does interpret and apply the
optional conversion and format_spec. Thus, a field_name may not
contain the characters ':' or '}', nor the strings '!s', '!r', or
'!a'.
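To make the intended behavior concrete, here is a rough pure-Python
approximation. It is only a sketch under several assumptions:
interpolate() is written as a free function because new methods cannot
be added to str from Python code, it relies on the existing
string.Formatter.parse() to split the string, and it does not handle
nested format specifiers or the '!=' special case.

```python
import string

# Conversion characters and the callables they map to.
_CONVERSIONS = {'s': str, 'r': repr, 'a': ascii}


def interpolate(template, mapping):
    """Hypothetical stand-in for the proposed str.interpolate()."""
    parts = []
    for literal, field_name, format_spec, conversion in \
            string.Formatter().parse(template):
        parts.append(literal)
        if field_name is None:
            # Trailing literal text with no replacement field.
            continue
        # The field name is used verbatim as the key; it is not
        # interpreted as an attribute or index lookup.
        value = mapping[field_name]
        if conversion is not None:
            value = _CONVERSIONS[conversion](value)
        parts.append(format(value, format_spec or ''))
    return ''.join(parts)


print(interpolate('abc{expr1:>4}{expr2!r}def', {'expr1': 7, 'expr2': 'hi'}))
print(interpolate('{age+1} {{literal}}', {'age+1': 51}))
```

Note that the key 'age+1' is looked up as an opaque string. Evaluating
the expression is the job of the compiler, not of interpolate().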
Code equivalence
----------------
An f-string is evaluated at run time as a call to str.interpolate().
For example, this code::
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3:!s}ghi'
Will be evaluated as::
'abc{expr1:spec1}{expr2!r:spec2}def{expr3:!s}ghi'.interpolate({'expr1': expr1, 'expr2': expr2, 'expr3': expr3})
Note that the string on which interpolate() is being called is
identical to the text of the f-string literal, without the 'f' prefix.
Expression evaluation
---------------------
The expressions that are extracted from the string are evaluated in
the context where the f-string appeared. This means the expression has
full access to local and global variables. Any valid Python expression
can be used, including function and method calls.
Because the f-strings are evaluated where the string appears in the
source code, there is no additional expressiveness available with
f-strings. There are also no additional security concerns: you could
have also just written the same expression, not inside of an
f-string::

    >>> def foo():
    ...     return 20
    ...
    >>> f'result={foo()}'
    'result=20'

Is equivalent to::
>>> 'result=' + str(foo())
'result=20'
After stripping leading and trailing whitespace (see below), the
expression is parsed with the equivalent of ast.parse(expression,
'<fstring>', 'eval') [#]_. Note that this restricts the expression: it
cannot contain any newlines, for example::

    >>> x = 0
    >>> f'''{x
    ... +1}'''
      File "<fstring>", line 2
        +1
        ^
    SyntaxError: invalid syntax

But note that this works, since the newline is removed from the
string, and the spaces in front of the '1' are allowed in an
expression::

    >>> f'{x+\
    ...    1}'
    '1'

Format specifiers
-----------------
Format specifiers may also contain evaluated expressions. This allows
code such as::

    >>> import decimal
    >>> width = 10
    >>> precision = 4
    >>> value = decimal.Decimal('12.34567')
    >>> f'result: {value:{width}.{precision}}'
    'result:      12.35'

Once expressions in a format specifier are evaluated (if necessary),
format specifiers are not interpreted by the f-string evaluator. Just as
in str.format(), they are merely passed in to the __format__() method
of the object being formatted.
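The two steps can be sketched separately. This example uses a float
and an explicit 'f' presentation type, which are choices made for the
illustration and not requirements of the proposal.

```python
width = 10
precision = 4
value = 12.34567

# Step 1: the inner expressions are evaluated to build the specifier text.
spec = f'{width}.{precision}f'

# Step 2: the finished specifier is passed unchanged to value.__format__().
nested = f'{value:{width}.{precision}f}'
explicit = format(value, spec)

print(repr(spec))
print(repr(nested))
```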
Concatenating strings
---------------------
Adjacent f-strings and regular strings are concatenated. Regular
strings are concatenated at compile time, and f-strings are
concatenated at run time. For example, the expression::
>>> x = 10
>>> y = 'hi'
>>> 'a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e'
yields the value::
'ab10{c}str< hi >de'
While the exact method of this run time concatenation is unspecified,
the above code might evaluate to::

    ''.join(['ab', '{x}'.interpolate({'x': x}), '{c}', 'str<{y:^4}>'.interpolate({'y': y}), 'de'])

You are guaranteed, however, that there will be no compile time
combination of f-strings::
>>> x = 0
>>> y = 1
>>> f'{x}' f'{y}'
'01'
Will result in 2 calls to str.interpolate(): once on the string '{x}',
and again on the string '{y}'. This guarantee is needed to facilitate
proposed future internationalization.
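As a small check of the concatenation rule, the sketch below assigns
the mixed expression shown earlier in this section to a name
(combined, chosen for this illustration) and inspects the result.

```python
x = 10
y = 'hi'

# Adjacent regular strings and f-strings form a single value.
# Braces in the regular string '{c}' are left alone, because only
# the f-prefixed pieces are scanned for expressions.
combined = 'a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e'

print(combined)
```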
Error handling
--------------
Either compile time or run time errors can occur when processing
f-strings. Compile time errors are limited to those errors that can be
detected when scanning an f-string. These errors all raise
SyntaxError.
Unmatched braces::

    >>> f'x={x'
      File "<stdin>", line 1
    SyntaxError: missing '}' in format string expression

Invalid expressions::

    >>> f'x={!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside an
f-string. Note that an f-string can be evaluated multiple times, and
work sometimes and raise an error at other times::

    >>> d = {0:10, 1:20}
    >>> for i in range(3):
    ...     print(f'{i}:{d[i]}')
    ...
    0:10
    1:20
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    KeyError: 2

or::

    >>> for x in (32, 100, 'fifty'):
    ...     print(f'x = {x:+3}')
    ...
    x = +32
    x = +100
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ValueError: Sign not allowed in string format specifier

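The distinction between the two kinds of error can be sketched in a
script. Here compile() is used to trigger the compile-time error
without stopping the script itself; the exact SyntaxError message is
not checked, since it is not fixed by this PEP.

```python
# Compile time: an unmatched brace is rejected before any code runs.
try:
    compile("f'x={x'", '<example>', 'eval')
    compile_error = None
except SyntaxError as exc:
    compile_error = type(exc).__name__

# Run time: the same f-string succeeds or fails depending on its operands.
d = {0: 10, 1: 20}
results = []
for i in range(3):
    try:
        results.append(f'{i}:{d[i]}')
    except KeyError as exc:
        results.append(f'KeyError: {exc}')

print(compile_error)
print(results)
```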
Leading and trailing whitespace in expressions is skipped
---------------------------------------------------------
For ease of readability, leading and trailing whitespace in
expressions is ignored. However, this does not affect the string or
keys passed to str.interpolate()::

    >>> x = 100
    >>> f'x = { x }'
    'x = 100'

This would be evaluated as::
'x = { x }'.interpolate({' x ': 100})
Discussion
==========
Most of the discussions on python-ideas [#]_ focused on a few issues:

- Whether to allow full Python expressions.
- How to designate f-strings, and how to specify the location of
  expressions in them.

XXX: more on the above issues.
Similar support in other languages
----------------------------------
Wikipedia has a good discussion of string interpolation in other
programming languages [#]_. This feature is implemented in many
languages, with a variety of syntaxes and restrictions.
Differences between f-string and str.format expressions
-------------------------------------------------------
There is one small difference between the limited expressions allowed
in str.format() and the full expressions allowed inside f-strings. The
difference is in how index lookups are performed. In str.format(),
index values that do not look like numbers are converted to strings::
>>> d = {'a': 10, 'b': 20}
>>> 'a={d[a]}'.format(d=d)
'a=10'
Notice that the index value is converted to the string "a" when it is
looked up in the dict.
However, in f-strings, you would need to use a literal for the value
of 'a'::
>>> f'a={d["a"]}'
'a=10'
This difference is required because otherwise you would not be able to
use variables as index values::
>>> a = 'b'
>>> f'a={d[a]}'
'a=20'
See [#]_ for a further discussion. It was this observation that led to
full Python expressions being supported in f-strings.
Triple-quoted f-strings
-----------------------
Triple-quoted f-strings are allowed. These strings are parsed just as
normal triple-quoted strings are. After parsing, the normal f-string
logic is applied, and str.interpolate() is called.
Raw f-strings
-------------
Raw and f-strings may be combined. For example they could be used to
build up regular expressions::
>>> header = 'Subject'
>>> fr'{header}:\s+'
'Subject:\\s+'
In addition, raw f-strings may be combined with triple-quoted strings.
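A slightly fuller sketch of the regular expression use case follows.
The pattern and the sample input are invented for this illustration.
Note that braces belonging to the regular expression itself must be
doubled, since single braces delimit f-string expressions.

```python
import re

header = 'Subject'

# fr'' keeps backslashes literal while still evaluating {header}.
# The quantifier {4} is written as {{4}} so it is not read as an expression.
pattern = fr'{header}:\s+(\d{{4}})'

match = re.match(pattern, 'Subject:   2015 draft')
print(pattern)
print(match.group(1))
```

If the interpolated value could contain characters that are special
in regular expressions, passing it through re.escape() first would be
the safer choice.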
No binary f-strings
-------------------
For the same reason that we don't support bytes.format(), you may not
combine 'f' with 'b' string literals. The primary problem is that an
object's __format__() method may return Unicode data that is not
compatible with a bytes string.
#XXX: maybe allow this, but encode the output as ascii?
!s, !r, and !a are redundant
----------------------------
The !s, !r, and !a conversions are not strictly required. Because arbitrary
expressions are allowed inside the f-strings, this code::
>>> a = 'some string'
>>> f'{a!r}'
"'some string'"
Is identical to::
>>> f'{repr(a)}'
"'some string'"
Similarly, !s can be replaced by calls to str() and !a by calls to
ascii().
However, !s, !r, and !a are supported by this PEP in order to minimize
the differences with str.format(). !s, !r, and !a are required in
str.format() because it does not allow the execution of arbitrary
expressions.
Lambdas inside expressions
--------------------------
Because lambdas use the ':' character, they cannot appear outside of
parentheses in an expression. The colon is interpreted as the start of
the format specifier, which means only the start of the lambda
expression is seen, and that fragment is syntactically invalid. As
there's no practical use for a plain lambda in an f-string expression,
this is not seen as much of a limitation.
Lambdas may be used inside of parens::
>>> f'{(lambda x: x*2)(3)}'
'6'
Future extensions:
==================
XXX: By using another leading character (say, 'i'), we could extend
this proposal to cover internationalization and localization. The idea
is that the string would be passed to some lookup function before
.interpolate() is called on it::

    >>> name = 'Eric'
    >>> fi'Name: {name}'

Could be translated as::
gettext.gettext('Name: {name}').interpolate({'name': name})
If gettext.gettext() returned '{name} es mi nombre', then the
resulting string would be 'Eric es mi nombre'.
Any such internationalization work will be specified in an additional
PEP. In all likelihood, such a PEP will need to propose one or more
additional optional parameters to str.interpolate() in order to handle
the string.Template case of "safe substitution", where the substituted
field_names are not found in the mapping argument. The choices might
be: use the field_name, use a default (possibly empty) string, or
raise an exception.
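The three choices can be sketched today with the existing
str.format_map() standing in for the proposed str.interpolate(). The
mapping classes KeepFieldName and UseDefault are hypothetical and are
not part of this proposal; they only illustrate what each choice would
produce.

```python
class KeepFieldName(dict):
    """Hypothetical mapping: leave unknown fields in place."""

    def __missing__(self, key):
        return '{' + key + '}'


class UseDefault(dict):
    """Hypothetical mapping: substitute an empty string for unknown fields."""

    def __missing__(self, key):
        return ''


template = '{name} es mi nombre, {title}'
values = {'name': 'Eric'}

kept = template.format_map(KeepFieldName(values))
defaulted = template.format_map(UseDefault(values))

try:
    template.format_map(values)  # a plain dict raises for unknown fields
    raised = None
except KeyError as exc:
    raised = exc.args[0]

print(kept, defaulted, raised, sep='\n')
```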
References
==========
.. [#] %-formatting
       (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

.. [#] str.format
       (https://docs.python.org/3/library/string.html#formatstrings)

.. [#] string.Template documentation
       (https://docs.python.org/3/library/string.html#template-strings)

.. [#] PEP 215: String Interpolation
       (https://www.python.org/dev/peps/pep-0215/)

.. [#] PEP 3101: Advanced String Formatting
       (https://www.python.org/dev/peps/pep-3101/)

.. [#] Formatting using locals() and globals()
       (https://mail.python.org/pipermail/python-ideas/2015-July/034671.html)

.. [#] Avoid locals() and globals()
       (https://mail.python.org/pipermail/python-ideas/2015-July/034701.html)

.. [#] str.format_map() documentation
       (https://docs.python.org/3/library/stdtypes.html#str.format_map)

.. [#] Format string syntax
       (https://docs.python.org/3/library/string.html#format-string-syntax)

.. [#] ast.parse() documentation
       (https://docs.python.org/3/library/ast.html#ast.parse)

.. [#] Start of python-ideas discussion
       (https://mail.python.org/pipermail/python-ideas/2015-July/034657.html)

.. [#] Wikipedia article on string interpolation
       (https://en.wikipedia.org/wiki/String_interpolation)

.. [#] Differences in str.format() and f-string expressions
       (https://mail.python.org/pipermail/python-ideas/2015-July/034726.html)

Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: