diff --git a/pep-0498.txt b/pep-0498.txt
new file mode 100644
index 000000000..93cfc8d6d
--- /dev/null
+++ b/pep-0498.txt
@@ -0,0 +1,441 @@
+PEP: XXX
+Title: Literal String Formatting
+Version: $Revision$
+Last-Modified: $Date$
+Author: Eric V. Smith
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 01-Aug-2015
+Python-Version: 3.6
+Post-History: 07-Aug-2015
+
+Abstract
+========
+
+Python supports multiple ways to format text strings. These include
+%-formatting [#]_, str.format [#]_, and string.Template [#]_. Each of
+these methods has its advantages, but each also has
+disadvantages that make it cumbersome to use in practice. This PEP
+proposes to add a new string formatting mechanism: Literal String
+Formatting. In this PEP, such strings will be referred to as
+"f-strings", taken from the leading character used to denote such
+strings.
+
+f-strings provide a way to combine string literals with Python
+expressions, using a minimal syntax. It should be noted that an
+f-string is really an expression evaluated at run time, not a constant
+value. An f-string is a string, prefixed with 'f', that contains
+expressions inside braces. The expressions are replaced with their
+values. Some examples are::
+
+  >>> import datetime
+  >>> name = 'Fred'
+  >>> age = 50
+  >>> anniversary = datetime.date(1991, 10, 12)
+  >>> f'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.'
+  'My name is Fred, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+  >>> f'He said his name is {name!r}.'
+  "He said his name is 'Fred'."
+
+This PEP does not propose to remove or deprecate any of the existing
+string formatting mechanisms.
+
+A similar feature was proposed in PEP 215 [#]_. PEP 215 proposed to
+support a subset of Python expressions, and did not support the
+type-specific string formatting (the __format__ method) which was
+introduced with PEP 3101 [#]_.
+
+Rationale
+=========
+
+This PEP is driven by the desire to have a simpler way to format
+strings in Python. The existing ways of formatting are either error
+prone, inflexible, or cumbersome.
+
+%-formatting is limited as to the types it supports. Only ints, strs,
+and doubles can be formatted. All other types are either not
+supported, or converted to one of these types before formatting. In
+addition, there's a well-known trap where a single value is passed::
+
+  >>> msg = 'disk failure'
+  >>> 'error: %s' % msg
+  'error: disk failure'
+
+But if msg were ever to be a tuple, the same code would fail::
+
+  >>> msg = ('disk failure', 32)
+  >>> 'error: %s' % msg
+  Traceback (most recent call last):
+    File "<stdin>", line 1, in <module>
+  TypeError: not all arguments converted during string formatting
+
+To be defensive, the following code should be used::
+
+  >>> 'error: %s' % (msg,)
+  "error: ('disk failure', 32)"
+
+str.format() was added to address some of these problems with
+%-formatting. In particular, it uses normal function call syntax (and
+therefore supports multiple parameters) and it is extensible through
+the __format__() method on the object being converted to a string. See
+PEP 3101 for a detailed rationale.
+
+However, str.format() is not without its issues. Chief among them is
+its verbosity. For example, the text 'value' is repeated here::
+
+  >>> value = 4 * 20
+  >>> 'The value is {value}.'.format(value=value)
+  'The value is 80.'
+
+Even in its simplest form, there is a bit of boilerplate, and the
+value that's inserted into the placeholder is sometimes far removed
+from where the placeholder is situated::
+
+  >>> 'The value is {}.'.format(value)
+  'The value is 80.'
+
+With an f-string, this becomes::
+
+  >>> f'The value is {value}.'
+  'The value is 80.'
+
+f-strings provide a concise, readable way to include expressions
+inside strings.
+
+string.Template has similar shortcomings to str.format(), but also
+supports fewer formatting options. In particular, it does not support
+__format__.
+
+No use of globals() or locals()
+-------------------------------
+
+In the discussions on python-dev [#]_, a number of solutions were
+presented that used locals() and globals() or their equivalents. All
+of these have various problems. Among these is referencing variables
+that are not otherwise used in a closure. Consider::
+
+  >>> def outer(x):
+  ...     def inner():
+  ...         return 'x={x}'.format_map(locals())
+  ...     return inner
+  ...
+  >>> outer(42)()
+  Traceback (most recent call last):
+    File "<stdin>", line 1, in <module>
+    File "<stdin>", line 3, in inner
+  KeyError: 'x'
+
+This returns an error because the compiler has not added a reference
+to x inside the closure. You need to manually add a reference to x in
+order for this to work::
+
+  >>> def outer(x):
+  ...     def inner():
+  ...         x
+  ...         return 'x={x}'.format_map(locals())
+  ...     return inner
+  ...
+  >>> outer(42)()
+  'x=42'
+
+Guido stated [#]_ that any solution to better string interpolation
+would not use locals() or globals().
+
+Specification
+=============
+
+In source code, f-strings are string literals that are prefixed by the
+letter 'f'. 'f' may be combined with 'r', in either order, to produce
+raw f-string literals. 'f' may not be combined with 'b': there are no
+binary f-strings. 'f' may also be combined with 'u', in either order,
+although adding 'u' has no effect.
+
+f-strings are parsed into literals and expressions. Expressions
+appear within curly braces '{' and '}'. The parts of the string outside
+of braces are literals. The expressions are evaluated, formatted with
+the existing __format__ protocol, then the results are concatenated
+together with the string literals. While scanning the string for
+expressions, any doubled braces '{{' or '}}' are replaced by the
+corresponding single brace. Doubled opening braces do not signify the
+start of an expression.
+
+Following the expression, an optional type conversion may be
+specified. The allowed conversions are '!s' or '!r'. These are
+treated the same as in str.format: '!s' calls str() on the expression,
+and '!r' calls repr() on the expression. These conversions are applied
+before the call to __format__. The only reason to use '!s' is if you
+want to specify a format specifier that applies to str, not to the
+type of the expression.
+
+Similar to str.format, optional format specifiers may be included
+inside the f-string, separated from the expression (or the type
+conversion, if specified) by a colon. If a format specifier is not
+provided, an empty string is used.
+
+So, an f-string looks like::
+
+  f ' <text> { <expression> <optional !s or !r> <optional : format specifier> } <text> ... '
+
+The resulting expression's __format__ method is called with the format
+specifier. The resulting value is used when building the value of the
+f-string.
+
+Expressions cannot contain ':' or '!' outside of strings or parens,
+brackets, or braces. The exception is that the '!=' operator is
+special cased.
+
+Code equivalence
+----------------
+
+The exact code that is executed when converting expressions to strings
+is unspecified by this PEP. However, it is specified that once the
+expression is evaluated, the result's __format__() method will be
+called with the given format specifier.
+
+For example, this code::
+
+  f'abc{expr1:spec1}{expr2!r:spec2}def{expr3!s}ghi'
+
+May be evaluated as::
+
+  ''.join(['abc', expr1.__format__('spec1'), repr(expr2).__format__('spec2'), 'def', str(expr3).__format__(''), 'ghi'])
+
+Expression evaluation
+---------------------
+
+The expressions that are extracted from the string are evaluated in
+the context where the f-string appeared. This means the expression has
+full access to local and global variables. Any valid Python expression
+can be used, including function and method calls.
+
+Because the f-strings are evaluated where the string appears in the
+source code, there is no additional expressiveness available with
+f-strings. There are also no additional security concerns: you could
+have also just written the same expression, not inside of an
+f-string::
+
+  >>> def foo():
+  ...     return 20
+  ...
+  >>> f'result={foo()}'
+  'result=20'
+
+Is equivalent to::
+
+  >>> 'result=' + str(foo())
+  'result=20'
+
+Format specifiers
+-----------------
+
+Format specifiers are not interpreted by the f-string parser. Just as
+in str.format(), they are merely passed in to the __format__() method
+of the object being formatted.
+
+Concatenating strings
+---------------------
+
+Adjacent f-strings and regular strings are concatenated. Regular
+strings are concatenated at compile time, and f-strings are
+concatenated at run time. For example, the expression::
+
+  >>> x = 10
+  >>> y = 'hi'
+  >>> 'a' 'b' f'{x}' 'c' f'str<{y:^4}>' 'd' 'e'
+
+yields the value::
+
+  'ab10cstr< hi >de'
+
+While the exact code that is executed when evaluating this f-string is
+not specified, one possible strategy is to evaluate::
+
+  ''.join(['ab', x.__format__(''), 'c', 'str<', y.__format__('^4'), '>', 'de'])
+
+Error handling
+--------------
+
+Either compile time or run time errors can occur when processing
+f-strings. Compile time errors are limited to those errors that can be
+detected when scanning an f-string. These errors all raise
+SyntaxError.
+
+Unmatched braces::
+
+  >>> f'x={x'
+    File "<stdin>", line 1
+  SyntaxError: missing '}' in format string expression
+
+Invalid expressions::
+
+  >>> f'x={!x}'
+    File "<stdin>", line 1
+      !x
+      ^
+  SyntaxError: invalid syntax
+
+Run time errors occur when evaluating the expressions inside an
+f-string. Note that an f-string can be executed multiple times, and
+work sometimes and raise an error other times::
+
+  >>> d = {0:10, 1:20}
+  >>> for i in range(3):
+  ...     print(f'{i}:{d[i]}')
+  ...
+  0:10
+  1:20
+  Traceback (most recent call last):
+    File "<stdin>", line 2, in <module>
+  KeyError: 2
+
+Leading whitespace in expressions is skipped
+--------------------------------------------
+
+Because expressions may begin with a left brace ('{'), there is a
+problem when parsing such expressions. For example::
+
+  >>> f'{{k:v for k, v in [(1, 2), (3, 4)]}}'
+  '{k:v for k, v in [(1, 2), (3, 4)]}'
+
+In this case, the doubled left braces and doubled right braces are
+interpreted as single braces, and the string becomes just a normal
+string literal. There is no expression evaluation being performed.
+
+To account for this, whitespace characters at the beginning of an
+expression are skipped::
+
+  >>> f'{ {k:v for k, v in [(1, 2), (3, 4)]}}'
+  '{1: 2, 3: 4}'
+
+Discussion
+==========
+
+Most of the discussions on python-ideas [#]_ focused on a few issues:
+
+ - Whether to allow full Python expressions.
+ - How to designate f-strings, and how to specify the location of
+   expressions in them.
+ - How to concatenate adjacent strings and f-strings.
+
+XXX: more on the above issues.
+
+Differences between f-string and str.format expressions
+-------------------------------------------------------
+
+There is one small difference between the limited expressions allowed
+in str.format() and the full expressions allowed inside f-strings. The
+difference is in how index lookups are performed. In str.format(),
+index values that do not look like numbers are converted to strings::
+
+  >>> d = {'a': 10, 'b': 20}
+  >>> 'a={d[a]}'.format(d=d)
+  'a=10'
+
+Notice that the index value is converted to the string "a" when it is
+looked up in the dict.
+
+However, in f-strings, you would need to use a literal for the value
+of 'a'::
+
+  >>> f'a={d["a"]}'
+  'a=10'
+
+This difference is required because otherwise you would not be able to
+use variables as index values::
+
+  >>> a = 'b'
+  >>> f'a={d[a]}'
+  'a=20'
+
+See [#]_ for a further discussion. It was this observation that led to
+full Python expressions being supported in f-strings.
+
+No binary f-strings
+-------------------
+
+For the same reason that we don't support bytes.format(), you may not
+combine 'f' with 'b' string literals. The primary problem is that an
+object's __format__() method may return Unicode data that is not
+compatible with a bytes string.
+
+!s and !r are redundant
+-----------------------
+
+The !s and !r conversions are not strictly required. Because arbitrary expressions
+are allowed inside the f-strings, this code::
+
+  >>> a = 'some string'
+  >>> f'{a!r}'
+  "'some string'"
+
+Is identical to::
+
+  >>> f'{repr(a)}'
+  "'some string'"
+
+Similarly, !s can be replaced by calls to str().
+
+However, !s and !r are supported by this PEP in order to minimize the
+differences with str.format(). !s and !r are required in str.format()
+because it does not allow the execution of arbitrary expressions.
+
+Lambdas inside expressions
+--------------------------
+
+Because lambdas use the ':' character, they cannot appear outside of
+parentheses in an expression. The colon is interpreted as the start of
+the format specifier, which means the start of the lambda expression
+is seen and is syntactically invalid. As there's no practical use for
+a plain lambda in an f-string expression, this is not seen as much of
+a limitation.
+
+Lambdas may be used inside of parens::
+
+  >>> f'{(lambda x: x*2)(3)}'
+  '6'
+
+References
+==========
+
+.. [#] %-formatting
+   (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
+
+.. [#] str.format
+   (https://docs.python.org/3/library/string.html#formatstrings)
+
+.. [#] string.Template documentation
+   (https://docs.python.org/3/library/string.html#template-strings)
+
+.. [#] PEP 215: String Interpolation
+   (https://www.python.org/dev/peps/pep-0215/)
+
+.. [#] PEP 3101: Advanced String Formatting
+   (https://www.python.org/dev/peps/pep-3101/)
+
+.. [#] Formatting using locals() and globals()
+   (https://mail.python.org/pipermail/python-ideas/2015-July/034671.html)
+
+.. [#] Avoid locals() and globals()
+   (https://mail.python.org/pipermail/python-ideas/2015-July/034701.html)
+
+.. [#] Start of python-ideas discussion
+   (https://mail.python.org/pipermail/python-ideas/2015-July/034657.html)
+
+.. [#] Differences in str.format() and f-string expressions
+   (https://mail.python.org/pipermail/python-ideas/2015-July/034726.html)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: