diff --git a/pep-0502.txt b/pep-0502.txt new file mode 100644 index 000000000..a51b7eba6 --- /dev/null +++ b/pep-0502.txt @@ -0,0 +1,846 @@ +PEP: 502 +Title: String Interpolation Redux +Version: $Revision$ +Last-Modified: $Date$ +Author: Mike G. Miller +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 10-Aug-2015 +Python-Version: 3.6 + +Note: Open issues below are stated with a question mark (?), +and are therefore searchable. + + +Abstract +======== + +This proposal describes a new string interpolation feature for Python, +called an *expression-string*, +that is both concise and powerful, +improves readability in most cases, +yet does not conflict with existing code. + +To achieve this end, +a new string prefix is introduced, +which expands at compile-time into an equivalent expression-string object, +with requested variables from its context passed as keyword arguments. +At runtime, +the new object uses these passed values to render a string to given +specifications, building on `the existing syntax`_ of ``str.format()``:: + + >>> location = 'World' + >>> e'Hello, {location} !' # new prefix: e'' + 'Hello, World !' # interpolated result + +.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax + +This PEP does not recommend to remove or deprecate any of the existing string +formatting mechanisms. + + +Motivation +========== + +Though string formatting and manipulation features are plentiful in Python, +one area where it falls short +is the lack of a convenient string interpolation syntax. +In comparison to other dynamic scripting languages +with similar use cases, +the amount of code necessary to build similar strings is substantially higher, +while at times offering lower readability due to verbosity, dense syntax, +or identifier duplication. 
[1]_ + +Furthermore, replacement of the print statement with the more consistent print +function of Python 3 (PEP 3105) has added one additional minor burden, +an additional set of parentheses to type and read. +Combined with the verbosity of current formatting solutions, +this puts an otherwise simple language at an unfortunate disadvantage to its +peers:: + + echo "Hello, user: $user, id: $id, on host: $hostname" # bash + say "Hello, user: $user, id: $id, on host: $hostname"; # perl + puts "Hello, user: #{user}, id: #{id}, on host: #{hostname}\n" # ruby + # 80 ch -->| + # Python 3, str.format with named parameters + print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals())) + + # Python 3, variation B, worst case + print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user, + id=id, + hostname= + hostname)) + +In Python, the formatting and printing of a string with multiple variables in a +single line of code of standard width is noticeably harder and more verbose, +indentation often exacerbating the issue. + +For use cases such as smaller projects, systems programming, +shell script replacements, and even one-liners, +where message formatting complexity has yet to be encapsulated, +this verbosity has likely led a significant number of developers and +administrators to choose other languages over the years. + + +Rationale +========= + + +Naming +------ + +The term expression-string was chosen because other applicable terms, +such as format-string and template are already well used in the Python standard +library. + +The string prefix itself, ``e''`` was chosen to demonstrate that the +specification enables expressions, +is not limited to ``str.format()`` syntax, +and also does not lend itself to `the shorthand term`_ "f-string". +It is also slightly easier to type than other choices such as ``_''`` and +``i''``, +while perhaps `less odd-looking`_ to C-developers. +``printf('')`` vs. ``print(f'')``. + +.. 
_the shorthand term: reference_needed +.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html + + + +Goals +------------- + +The design goals of expression-strings are as follows: + +#. Eliminate need to pass variables manually. +#. Eliminate repetition of identifiers and redundant parentheses. +#. Reduce awkward syntax, punctuation characters, and visual noise. +#. Improve readability and eliminate mismatch errors, + by preferring named parameters to positional arguments. +#. Avoid need for ``locals()`` and ``globals()`` usage, + instead parsing the given string for named parameters, + then passing them automatically. [2]_ [3]_ + + +Limitations +------------- + +In contrast to other languages that take design cues from Unix and its +shells, +and in common with JavaScript, +Python specified both single (``'``) and double (``"``) ASCII quote +characters to enclose strings. +It is not reasonable to choose one of them now to enable interpolation, +while leaving the other for uninterpolated strings. +"Backtick" characters (`````) are also `constrained by history`_ as a shortcut +for ``repr()``. + +This leaves a few remaining options for the design of such a feature: + +* An operator, as in printf-style string formatting via ``%``. +* A class, such as ``string.Template()``. +* A function, such as ``str.format()``. +* New syntax +* A new string prefix marker, such as the well-known ``r''`` or ``u''``. + +The first three options above currently work well. +Each has specific use cases and drawbacks, +yet each also suffers from the verbosity and visual noise mentioned previously. +All are discussed in the next section. + +.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html + +Background +------------- + +This proposal builds on several existing techniques and proposals and what +we've collectively learned from them. 
+ +The following examples focus on the design goals of readability and +error-prevention using named parameters. +Let's assume we have the following dictionary, +and would like to print out its items as an informative string for end users:: + + >>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'} + + +Printf-style formatting +''''''''''''''''''''''' + +This `venerable technique`_ continues to have its uses, +such as with byte-based protocols, +simplicity in simple cases, +and familiarity to many programmers:: + + >>> 'Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s' % params + 'Hello, user: nobody, id: 9, on host: darkstar' + +In this form, considering the prerequisite dictionary creation, +the technique is verbose, a tad noisy, +and relatively readable. +Additional issues are that an operator can only take one argument besides the +original string, +meaning multiple parameters must be passed in a tuple or dictionary. +Also, it is relatively easy to make an error in the number of arguments passed, +the expected type, +have a missing key, +or forget the trailing type, e.g. (``s`` or ``d``). + +.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting + + +string.Template +''''''''''''''' + +The ``string.Template`` `class from`_ PEP 292 +(Simpler String Substitutions) +is a purposely simplified design, +using familiar shell interpolation syntax, +with `safe-substitution feature`_, +that finds its main use cases in shell and internationalization tools:: + + Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params) + +Also verbose, however the string itself is readable. +Though functionality is limited, +it meets its requirements well. +It isn't powerful enough for many cases, +and that helps keep inexperienced users out of trouble, +as well as avoiding issues with moderately-trusted input (i18n) from +third-parties. 
+It unfortunately takes enough code to discourage its use for ad-hoc string +interpolation, +unless encapsulated in a `convenience library`_ such as ``flufl.i18n``. + +.. _class from: https://docs.python.org/3/library/string.html#template-strings +.. _safe-substitution feature: https://docs.python.org/3/library/string.html#string.Template.safe_substitute +.. _convenience library: http://pythonhosted.org/flufl.i18n/ + + +PEP 215 - String Interpolation +'''''''''''''''''''''''''''''' + +PEP 215 was a former proposal with which this one has a lot in common. +Apparently, the world was not ready for it at the time, +but considering recent support in a number of other languages, +its day may have come. + +The large number of dollar sign (``$``) characters it included may have +led it to resemble Python's arch-nemesis Perl, +and likely contributed to the PEP's lack of acceptance. +It was superseded by the following proposal. + + +str.format() +'''''''''''' + +The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the +existing options. +It is also more powerful and usually easier to read than the others. +It avoids many of the drawbacks and limits of the previous techniques. + +However, due to its necessary function call and parameter passing, +it runs from verbose to very verbose in various situations with +string literals:: + + >>> 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**params) + 'Hello, user: nobody, id: 9, on host: darkstar' + + # when using keyword args, var name shortening sometimes needed to fit :/ + >>> 'Hello, user: {user}, id: {id}, on host: {host}'.format(user=user, + id=id, + host=hostname) + 'Hello, user: nobody, id: 9, on host: darkstar' + +.. 
_syntax of: https://docs.python.org/3/library/string.html#format-string-syntax + + +PEP 498 -- Literal String Formatting +'''''''''''''''''''''''''''''''''''' + +PEP 498 discusses and delves partially into implementation details of +expression-strings, +which it calls f-strings, +the idea and syntax +(with exception of the prefix letter) +of which is identical to that discussed here. +The resulting compile-time transformation however +returns a string joined from parts at runtime, +rather than an object. + +It also, somewhat controversially to those first exposed to it, +introduces the idea that these strings shall be augmented with support for +arbitrary expressions, +which is discussed further in the following sections. + + +PEP 501 -- Translation ready string interpolation +''''''''''''''''''''''''''''''''''''''''''''''''' + +The complementary PEP 501 brings internationalization into the discussion as a +first-class concern, with its proposal of i-strings, +``string.Template`` syntax integration compatible with ES6 (JavaScript), +deferred rendering, +and a similar object return value. + + +Implementations in Other Languages +---------------------------------- + +String interpolation is now well supported by various programming languages +used in multiple industries, +and is converging into a standard of sorts. +It is centered around ``str.format()`` style syntax in minor variations, +with the addition of arbitrary expressions to expand utility. + +In the `Motivation`_ section it was shown how convenient interpolation syntax +existed in Bash, Perl, and Ruby. +Let's take a look at their expression support. + + +Bash +'''' + +Bash supports a number of arbitrary, even recursive constructs inside strings:: + + > echo "user: $USER, id: $((id + 6)) on host: $(echo is $(hostname))" + user: nobody, id: 15 on host: is darkstar + +* Explicit interpolation within double quotes. +* Direct environment variable access supported. +* Arbitrary expressions are supported. 
[4]_ +* External process execution and output capture supported. [5]_ +* Recursive expressions are supported. + + +Perl +'''' + + +Perl also has arbitrary expression constructs, perhaps not as well known:: + + say "I have @{[$id + 6]} guanacos."; # lists + say "I have ${\($id + 6)} guanacos."; # scalars + say "Hello { @names.join(', ') } how are you?"; # Perl 6 version + +* Explicit interpolation within double quotes. +* Arbitrary expressions are supported. [6]_ [7]_ + + +Ruby +'''' + +Ruby allows arbitrary expressions in its interpolated strings:: + + puts "One plus one is two: #{1 + 1}\n" + +* Explicit interpolation within double quotes. +* Arbitrary expressions are supported. [8]_ [9]_ +* Possible to change delimiter chars with ``%``. +* See the Reference Implementation(s) section for an implementation in Python. + + +Others +'''''' + +Let's look at some less-similar modern languages recently implementing string +interpolation. + + +Scala +''''' + +`Scala interpolation`_ is directed through string prefixes. +Each prefix has a different result:: + + s"Hello, $name ${1 + 1}" # arbitrary + f"$name%s is $height%2.2f meters tall" # printf-style + raw"a\nb" # raw, like r'' + +These prefixes may also be implemented by the user, +by extending Scala's ``StringContext`` class. + +* Explicit interpolation within double quotes with literal prefix. +* User implemented prefixes supported. +* Arbitrary expressions are supported. + +.. _Scala interpolation: http://docs.scala-lang.org/overviews/core/string-interpolation.html + + +ES6 (Javascript) +''''''''''''''''''' + +Designers of `Template strings`_ faced the same issue as Python where single +and double quotes were taken. +Unlike Python however, "backticks" were not. 
+They were chosen as part of the ECMAScript 2015 (ES6) standard:: + + console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`); + +Custom prefixes are also supported by implementing a function with the same name +as the tag:: + + function tag(strings, ...values) { + console.log(strings.raw[0]); // raw string is also available + return "Bazinga!"; + } + tag`Hello ${ a + b } world ${ a * b}`; + +* Explicit interpolation within backticks. +* User implemented prefixes supported. +* Arbitrary expressions are supported. + +.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings + +C#, Version 6 +''''''''''''' + +C# has a useful new `interpolation feature`_ as well, +with some ability to `customize interpolation`_ via the ``IFormattable`` +interface:: + + $"{person.Name, 20} is {person.Age:D3} year{(person.Age == 1 ? "" : "s")} old."; + +* Explicit interpolation with double quotes and ``$`` prefix. +* Custom interpolations are available. +* Arbitrary expressions are supported. + +.. _interpolation feature: https://msdn.microsoft.com/en-us/library/Dn961160.aspx +.. _customize interpolation: http://www.thomaslevesque.com/2015/02/24/customizing-string-interpolation-in-c-6/ + +Apple's Swift +''''''''''''' + +Arbitrary `interpolation under Swift`_ is available on all strings:: + + let multiplier = 3 + let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)" + // message is "3 times 2.5 is 7.5" + +* Implicit interpolation with double quotes. +* Arbitrary expressions are supported. +* Cannot contain CR/LF. + +.. _interpolation under Swift: https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html#//apple_ref/doc/uid/TP40014097-CH7-ID292 + + +Additional examples +''''''''''''''''''' + +A number of additional examples may be `found at Wikipedia`_. + +.. 
_found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples + +Now that background and implementation history have been covered, +let's continue on to a solution. + + +New Syntax +---------- + +This should be an option of last resort, +as every new syntax feature has a cost in terms of real-estate in a brain it +inhabits. +There is one alternative left on our list of possibilities, +which follows. + + +New String Prefix +----------------- + +Given the history of string formatting in Python, +backwards-compatibility, +implementations in other languages, +and the avoidance of new syntax unless necessary, +an acceptable design is reached through elimination +rather than unique insight. +Therefore, we choose to explicitly mark interpolated string literals with a +string prefix. + +We also choose an expression syntax that reuses and builds on the strongest of +the existing choices, +``str.format()`` to avoid further duplication. + + +Specification +============= + +String literals with the prefix of ``e`` shall be converted at compile-time to +the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object. +Strings and values are parsed from the literal and passed as tuples to the +constructor:: + + >>> location = 'World' + >>> e'Hello, {location} !' + + # becomes + # estr('Hello, {location} !', # template + ('Hello, ', ' !'), # string fragments + ('location',), # expressions + ('World',), # values + ) + +The object interpolates its result immediately at run-time:: + + 'Hello, World !' + + +ExpressionString Objects +------------------------ + +The ExpressionString object supports both immediate and deferred rendering of +its given template and parameters. +It does this by immediately rendering its inputs to its internal string and +``.rendered`` string member (still necessary?), +useful in the majority of use cases. 
+To allow for deferred rendering and caller-specified escaping, +all inputs are saved for later inspection, +with convenience methods available. + +Notes: + +* Inputs are saved to the object as ``.template`` and ``.context`` members + for later use. +* No explicit ``str(estr)`` call is necessary to render the result, + though doing so might be desired to free resources if significant. +* Additional or deferred rendering is available through the ``.render()`` + method, which allows template and context to be overridden for flexibility. +* Manual escaping of potentially dangerous input is available through the + ``.escape(escape_function)`` method, + the rules of which may therefore be specified by the caller. + The given function should both accept and return a single modified string. + +* A sample Python implementation can be `found at Bitbucket`_: + +.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py + + +Inherits From ``str`` Type +''''''''''''''''''''''''''' + +Inheriting from the ``str`` class is one of the techniques available to improve +compatibility with code expecting a string object, +as it will pass an ``isinstance(obj, str)`` test. +ExpressionString implements this and also renders its result into the "raw" +string of its string superclass, +providing compatibility with a majority of code. + + +Interpolation Syntax +-------------------- + +The strongest of the existing string formatting syntaxes is chosen, +``str.format()`` as a base to build on. [10]_ [11]_ + +.. + +* Additionally, single arbitrary expressions shall also be supported inside + braces as an extension:: + + >>> e'My age is {age + 1} years.' + + See below for section on safety. + +* Triple quoted strings with multiple lines shall be supported:: + + >>> e'''Hello, + {location} !''' + 'Hello,\n World !' 
+ +* Adjacent implicit concatenation shall be supported; + interpolation does `not bleed into`_ other strings:: + + >>> 'Hello {1, 2, 3} ' e'{location} !' + 'Hello {1, 2, 3} World !' + +* Additional implementation details, + for example expression and error-handling, + are specified in the compatible PEP 498. + +.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html + + +Composition with Other Prefixes +------------------------------- + +* Expression-strings apply to unicode objects only, + therefore ``u''`` is never needed. + Should it be prevented? + +* Bytes objects are not included here and do not compose with e'' as they + do not support ``__format__()``. + +* Complementary to raw strings, + backslash codes shall not be converted in the expression-string, + when combined with ``r''`` as ``re''``. + + +Examples +-------- + +A more complicated example follows:: + + n = 5; # t0, t1 = … TODO + a = e"Sliced {n} onions in {t1-t0:.3f} seconds." + # returns the equivalent of + estr("Sliced {n} onions in {t1-t0:.3f} seconds.", # template + ('Sliced ', ' onions in ', ' seconds.'), # strings + ('n', 't1-t0:.3f'), # expressions + (5, 0.555555) # values + ) + +With expressions only:: + + b = e"Three random numbers: {rand()}, {rand()}, {rand()}." + # returns the equivalent of + estr("Three random numbers: {rand()}, {rand()}, {rand()}.", # template + ('Three random numbers: ', ', ', ', ', '.'), # strings + ('rand()', 'rand()', 'rand()'), # expressions + (rand(), rand(), rand()) # values + ) + + +Safety +----------- + +In this section we will describe the safety situation and precautions taken +in support of expression-strings. + +#. Only string literals shall be considered here, + not variables to be taken as input or passed around, + making external attacks difficult to accomplish. + + * ``str.format()`` `already handles`_ this use-case. 
+ * Direct instantiation of the ExpressionString object with non-literal input + shall not be allowed. (Practicality?) + +#. Neither ``locals()`` nor ``globals()`` is necessary or used during the + transformation, + avoiding leakage of information. + +#. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion + depth, recursive interpolation is not supported. + +#. Restricted characters or expression classes?, such as ``=`` for assignment. + +However, +mistakes or malicious code could be missed inside string literals. +Though that can be said of code in general, +that these expressions are inside strings means they are a bit more likely +to be obscured. + +.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html + + +Mitigation via tools +'''''''''''''''''''' + +The idea is that tools or linters such as pyflakes, pylint, or Pycharm, +could check inside strings for constructs that exceed project policy. +As this is a common task with languages these days, +tools won't have to implement this feature solely for Python, +significantly shortening time to implementation. + +Additionally the Python interpreter could check(?) and warn with appropriate +command-line parameters passed. + + +Backwards Compatibility +----------------------- + +By using existing syntax and avoiding use of current or historical features, +expression-strings (and any associated sub-features), +were designed so as not to interfere with existing code and are not expected +to cause any issues. 
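To make the Specification and ExpressionString Objects sections above more concrete, the following is a minimal pure-Python sketch of how such an object might behave. It is not the sample implementation linked earlier, and several names are assumptions made only for illustration: ``EStr``, ``parse_template``, ``make_estr`` and the ``.segments`` attribute do not appear in the proposal, and ``.context`` is simply stored as a tuple of values here. Because no ``e''`` prefix exists, the compile-time step is emulated at run time with ``eval()`` on caller-supplied names, and ``string.Formatter().parse()`` is used to split the template, which will mis-split expressions that themselves contain ``:`` or ``!``.

```python
import html
import string


def parse_template(template):
    """Split a template into (literal, expression, format_spec) segments.

    The expression is None for a segment that holds only literal text.
    Conversion flags such as !r are ignored in this sketch.
    """
    parsed = string.Formatter().parse(template)
    return [(lit, expr, spec) for lit, expr, spec, _conv in parsed]


class EStr(str):
    """Hypothetical stand-in for the proposed estr object (assumed name)."""

    def __new__(cls, template, segments, values):
        # Render immediately, so the instance is usable as a plain str.
        self = super().__new__(cls, cls._join(segments, values))
        self.template = template       # saved for later inspection
        self.segments = segments       # assumed attribute, not in the PEP
        self.context = tuple(values)   # saved for deferred rendering
        return self

    @staticmethod
    def _join(segments, values, escape=None):
        out, remaining = [], iter(values)
        for literal, expression, spec in segments:
            out.append(literal)
            if expression is not None:
                text = format(next(remaining), spec)
                out.append(escape(text) if escape else text)
        return "".join(out)

    def render(self, template=None, context=None):
        """Re-render, optionally overriding the template or the values."""
        segments = self.segments if template is None else parse_template(template)
        values = self.context if context is None else context
        return self._join(segments, values)

    def escape(self, escape_function):
        """Render with each interpolated value passed through escape_function."""
        return self._join(self.segments, self.context, escape_function)


def make_estr(template, **names):
    """Emulate the compile-time transformation (assumption: eval at run time)."""
    segments = parse_template(template)
    values = [eval(expr, {"__builtins__": {}}, names)
              for _lit, expr, _spec in segments if expr is not None]
    return EStr(template, segments, values)


greeting = make_estr("Hello, {location} !", location="World")
print(greeting)                  # rendered immediately, like a str
print(greeting.template)         # original template is still available
print(make_estr("Hi {name}", name="<b>").escape(html.escape))
```

Keeping the template and values on the object is what would let a caller defer rendering or apply its own escaping rules, as the Notes list describes. A real implementation would receive already-evaluated values from the compiler and would not need ``eval()`` at all.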
+ + +Postponed Ideas +--------------- + +Internationalization +'''''''''''''''''''' + +Though it was highly desired to integrate internationalization support, +(see PEP 501), +the finer details diverge at almost every point, +making a common solution unlikely: [15]_ + +* Use-cases +* Compile and run-time tasks +* Interpolation Syntax +* Intended audience +* Security policy + +Rather than try to fit a "square peg in a round hole," +this PEP attempts to allow internationalization to be supported in the future +by not preventing it. +In this proposal, +expression-string inputs are saved for inspection and re-rendering at a later +time, +allowing for their use by an external library of any sort. + + +Rejected Ideas +-------------- + +Restricting Syntax to ``str.format()`` Only +''''''''''''''''''''''''''''''''''''''''''' + +This was deemed not enough of a solution to the problem. +It can be seen in the `Implementations in Other Languages`_ section that the +developer community at large tends to agree. + +The common `arguments against`_ arbitrary expressions were: + +#. YAGNI, "You ain't gonna need it." +#. The change is not congruent with historical Python conservatism. +#. Postpone - can implement in a future version if need is demonstrated. + +.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html + + +Additional/Custom String-Prefixes +''''''''''''''''''''''''''''''''' + +As seen in the `Implementations in Other Languages`_ section, +many modern languages have extensible string prefixes with a common interface. +This could be a way to generalize and reduce lines of code in common +situations. +Examples are found in ES6 (JavaScript), Scala, Nim, and C# +(to a lesser extent). +This was rejected by the BDFL. 
[14]_ + + +Automated Escaping of Input Variables +''''''''''''''''''''''''''''''''''''' + +While helpful in some cases, +this was thought to create too much uncertainty of when and where string +expressions could be used safely or not. +The concept was also difficult to describe to others. [12]_ + +Always consider expression-string variables to be unescaped, +unless the developer has explicitly escaped them. + + +Environment Access and Command Substitution +''''''''''''''''''''''''''''''''''''''''''' + +For systems programming and shell-script replacements, +it would be useful to handle environment variables and capture output of +commands directly in an expression string. +This was rejected as not important enough, +and looking too much like bash/perl, +which could encourage bad habits. [13]_ + + +Reference Implementation(s) +=========================== + +An expression-string implementation is currently attached to PEP 498, +under the ``f''`` prefix, +and may be available in nightly builds. + +A Python implementation of Ruby interpolation `is also available`_, +which is similar to this proposal. +It uses the codecs module to do its work:: + + > pip install interpy + + # coding: interpy + location = 'World' + print("Hello #{location}.") + +.. _is also available: https://github.com/syrusakbary/interpy + + +Acknowledgements +================ + +* Eric V. Smith for providing invaluable implementation work and design + opinions, helping to focus this PEP. +* Others on the python-ideas mailing list for rejecting the craziest of ideas, + also helping to achieve focus. + + +References +========== + +.. [1] Briefer String Format + + (https://mail.python.org/pipermail/python-ideas/2015-July/034659.html) + + +.. [2] Briefer String Format + + (https://mail.python.org/pipermail/python-ideas/2015-July/034669.html) + +.. [3] Briefer String Format + + (https://mail.python.org/pipermail/python-ideas/2015-July/034701.html) + +.. 
[4] Bash Docs + + (http://www.tldp.org/LDP/abs/html/arithexp.html) + +.. [5] Bash Docs + + (http://www.tldp.org/LDP/abs/html/commandsub.html) + +.. [6] Perl Cookbook + + (http://docstore.mik.ua/orelly/perl/cookbook/ch01_11.htm) + +.. [7] Perl Docs + + (http://perl6maven.com/perl6-scalar-array-and-hash-interpolation) + +.. [8] Ruby Docs + + (http://ruby-doc.org/core-2.1.1/doc/syntax/literals_rdoc.html#label-Strings) + +.. [9] Ruby Docs + + (https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#Interpolation) + +.. [10] Python Str.Format Syntax + + (https://docs.python.org/3/library/string.html#format-string-syntax) + +.. [11] Python Format-Spec Mini Language + + (https://docs.python.org/3/library/string.html#format-specification-mini-language) + +.. [12] Escaping of Input Variables + + (https://mail.python.org/pipermail/python-ideas/2015-August/035532.html) + +.. [13] Environment Access and Command Substitution + + (https://mail.python.org/pipermail/python-ideas/2015-August/035554.html) + +.. [14] Extensible String Prefixes + + (https://mail.python.org/pipermail/python-ideas/2015-August/035336.html) + + +.. [15] Literal String Formatting + + (https://mail.python.org/pipermail/python-dev/2015-August/141289.html) + + +Copyright +========= + +This document has been placed in the public domain. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: