PEP: 502
Title: String Interpolation Redux
Version: $Revision$
Last-Modified: $Date$
Author: Mike G. Miller
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Aug-2015
Python-Version: 3.6
Note: Open issues below are stated with a question mark (?),
and are therefore searchable.
Abstract
========
This proposal describes a new string interpolation feature for Python,
called an *expression-string*,
that is both concise and powerful,
improves readability in most cases,
yet does not conflict with existing code.
To achieve this end,
a new string prefix is introduced,
which expands at compile-time into an equivalent expression-string object,
with requested variables from its context passed as keyword arguments.
At runtime,
the new object uses these passed values to render a string to given
specifications, building on `the existing syntax`_ of ``str.format()``::
>>> location = 'World'
>>> e'Hello, {location} !' # new prefix: e''
'Hello, World !' # interpolated result
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
This PEP does not recommend removing or deprecating any of the existing string
formatting mechanisms.
Motivation
==========
Though string formatting and manipulation features are plentiful in Python,
one area where it falls short
is the lack of a convenient string interpolation syntax.
In comparison to other dynamic scripting languages
with similar use cases,
the amount of code necessary to build similar strings is substantially higher,
while at times offering lower readability due to verbosity, dense syntax,
or identifier duplication. [1]_
Furthermore, replacement of the print statement with the more consistent print
function of Python 3 (PEP 3105) has added one minor burden:
an additional set of parentheses to type and read.
Combined with the verbosity of current formatting solutions,
this puts an otherwise simple language at an unfortunate disadvantage to its
peers::
echo "Hello, user: $user, id: $id, on host: $hostname" # bash
say "Hello, user: $user, id: $id, on host: $hostname"; # perl
puts "Hello, user: #{user}, id: #{id}, on host: #{hostname}\n" # ruby
# 80 ch -->|
# Python 3, str.format with named parameters
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
# Python 3, variation B, worst case
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
id=id,
hostname=
hostname))
In Python, the formatting and printing of a string with multiple variables in a
single line of code of standard width is noticeably harder and more verbose,
indentation often exacerbating the issue.
For use cases such as smaller projects, systems programming,
shell script replacements, and even one-liners,
where message formatting complexity has yet to be encapsulated,
this verbosity has likely led a significant number of developers and
administrators to choose other languages over the years.
Rationale
=========
Naming
------
The term expression-string was chosen because other applicable terms,
such as format-string and template, are already well used in the Python standard
library.
The string prefix itself, ``e''``, was chosen to demonstrate that the
specification enables expressions,
is not limited to ``str.format()`` syntax,
and also does not lend itself to `the shorthand term`_ "f-string".
It is also slightly easier to type than other choices such as ``_''`` and
``i''``,
while perhaps `less odd-looking`_ to C-developers:
``printf('')`` vs. ``print(f'')``.
.. _the shorthand term: reference_needed
.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html
Goals
-------------
The design goals of expression-strings are as follows:
#. Eliminate need to pass variables manually.
#. Eliminate repetition of identifiers and redundant parentheses.
#. Reduce awkward syntax, punctuation characters, and visual noise.
#. Improve readability and eliminate mismatch errors,
by preferring named parameters to positional arguments.
#. Avoid need for ``locals()`` and ``globals()`` usage,
instead parsing the given string for named parameters,
then passing them automatically. [2]_ [3]_
Limitations
-------------
In contrast to other languages that take design cues from Unix and its
shells,
and in common with JavaScript,
Python specified both single (``'``) and double (``"``) ASCII quote
characters to enclose strings.
It is not reasonable to choose one of them now to enable interpolation,
while leaving the other for uninterpolated strings.
"Backtick" characters (`````) are also `constrained by history`_ as a shortcut
for ``repr()``.
This leaves a few remaining options for the design of such a feature:
* An operator, as in printf-style string formatting via ``%``.
* A class, such as ``string.Template()``.
* A function, such as ``str.format()``.
* New syntax
* A new string prefix marker, such as the well-known ``r''`` or ``u''``.
The first three options above currently work well.
Each has specific use cases and drawbacks,
yet all suffer from the verbosity and visual noise mentioned previously.
All are discussed in the next section.
.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
Background
-------------
This proposal builds on several existing techniques and proposals and what
we've collectively learned from them.
The following examples focus on the design goals of readability and
error-prevention using named parameters.
Let's assume we have the following dictionary,
and would like to print out its items as an informative string for end users::
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
Printf-style formatting
'''''''''''''''''''''''
This `venerable technique`_ continues to have its uses,
such as with byte-based protocols,
simplicity in simple cases,
and familiarity to many programmers::
>>> 'Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s' % params
'Hello, user: nobody, id: 9, on host: darkstar'
In this form, considering the prerequisite dictionary creation,
the technique is verbose and a tad noisy,
yet relatively readable.
Additional issues are that an operator can only take one argument besides the
original string,
meaning multiple parameters must be passed in a tuple or dictionary.
Also, it is relatively easy to pass the wrong number of arguments,
pass a value of an unexpected type,
omit a key,
or forget the trailing type character (e.g. ``s`` or ``d``).
.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
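The error cases listed above can be reproduced with a short sketch.
The ``try_format`` helper is hypothetical and exists only for this illustration;
it reports the name of the exception raised by the ``%`` operator:

```python
params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}


def try_format(template, args):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return template % args
    except (KeyError, TypeError, ValueError) as exc:
        return type(exc).__name__


# A key that is missing from the mapping.
print(try_format('host: %(host)s', params))   # KeyError
# Fewer positional arguments than placeholders.
print(try_format('%s and %s', ('one',)))      # TypeError
# A value of the wrong type for the conversion.
print(try_format('id: %d', 'nine'))           # TypeError
# A forgotten trailing type character at the end of the string.
print(try_format('id: %(id)', params))        # ValueError
```

All four mistakes surface only at runtime,
when the offending line is actually executed.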
string.Template
'''''''''''''''
The ``string.Template`` `class from`_ PEP 292
(Simpler String Substitutions)
is a purposely simplified design,
using familiar shell interpolation syntax,
with a `safe-substitution feature`_,
that finds its main use cases in shell and internationalization tools::
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
This is also verbose; however, the string itself is readable.
Though functionality is limited,
it meets its requirements well.
It isn't powerful enough for many cases,
and that helps keep inexperienced users out of trouble,
as well as avoiding issues with moderately-trusted input (i18n) from
third-parties.
It unfortunately takes enough code to discourage its use for ad-hoc string
interpolation,
unless encapsulated in a `convenience library`_ such as ``flufl.i18n``.
.. _class from: https://docs.python.org/3/library/string.html#template-strings
.. _safe-substitution feature: https://docs.python.org/3/library/string.html#string.Template.safe_substitute
.. _convenience library: http://pythonhosted.org/flufl.i18n/
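As a brief illustration of the safe-substitution feature mentioned above,
the following sketch omits ``hostname`` from the mapping on purpose.
The variable names ``missing`` and ``partial`` are chosen only for this example:

```python
from string import Template

params = {'user': 'nobody', 'id': 9}   # 'hostname' deliberately left out
tmpl = Template('Hello, user: $user, id: ${id}, on host: $hostname')

# substitute() is strict and raises KeyError for the missing name.
missing = None
try:
    tmpl.substitute(params)
except KeyError as exc:
    missing = exc.args[0]
print(missing)   # hostname

# safe_substitute() leaves unknown placeholders in place instead.
partial = tmpl.safe_substitute(params)
print(partial)   # Hello, user: nobody, id: 9, on host: $hostname
```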
PEP 215 - String Interpolation
''''''''''''''''''''''''''''''
PEP 215 was a former proposal with which this one has a lot in common.
Apparently, the world was not ready for it at the time,
but considering recent support in a number of other languages,
its day may have come.
The large number of dollar sign (``$``) characters it included may have
led it to resemble Python's arch-nemesis Perl,
and likely contributed to the PEP's lack of acceptance.
It was superseded by the following proposal.
str.format()
''''''''''''
The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the
existing options.
It is also more powerful and usually easier to read than the others.
It avoids many of the drawbacks and limits of the previous techniques.
However, due to its necessary function call and parameter passing,
it runs from verbose to very verbose in various situations with
string literals::
>>> 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**params)
'Hello, user: nobody, id: 9, on host: darkstar'
# when using keyword args, var name shortening sometimes needed to fit :/
>>> 'Hello, user: {user}, id: {id}, on host: {host}'.format(user=user,
id=id,
host=hostname)
'Hello, user: nobody, id: 9, on host: darkstar'
.. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax
PEP 498 -- Literal String Formatting
''''''''''''''''''''''''''''''''''''
PEP 498 discusses and delves partially into implementation details of
expression-strings,
which it calls f-strings,
the idea and syntax
(with exception of the prefix letter)
of which is identical to that discussed here.
The resulting compile-time transformation however
returns a string joined from parts at runtime,
rather than an object.
It also, somewhat controversially to those first exposed to it,
introduces the idea that these strings shall be augmented with support for
arbitrary expressions,
which is discussed further in the following sections.
PEP 501 -- Translation ready string interpolation
'''''''''''''''''''''''''''''''''''''''''''''''''
The complementary PEP 501 brings internationalization into the discussion as a
first-class concern, with its proposal of i-strings,
``string.Template`` syntax integration compatible with ES6 (JavaScript),
deferred rendering,
and a similar object return value.
Implementations in Other Languages
----------------------------------
String interpolation is now well supported by various programming languages
used in multiple industries,
and is converging into a standard of sorts.
It is centered around ``str.format()`` style syntax in minor variations,
with the addition of arbitrary expressions to expand utility.
In the `Motivation`_ section it was shown how convenient interpolation syntax
existed in Bash, Perl, and Ruby.
Let's take a look at their expression support.
Bash
''''
Bash supports a number of arbitrary, even recursive constructs inside strings::
> echo "user: $USER, id: $((id + 6)) on host: $(echo is $(hostname))"
user: nobody, id: 15 on host: is darkstar
* Explicit interpolation within double quotes.
* Direct environment variable access supported.
* Arbitrary expressions are supported. [4]_
* External process execution and output capture supported. [5]_
* Recursive expressions are supported.
Perl
''''
Perl also has arbitrary expression constructs, perhaps not as well known::
say "I have @{[$id + 6]} guanacos."; # lists
say "I have ${\($id + 6)} guanacos."; # scalars
say "Hello { @names.join(', ') } how are you?"; # Perl 6 version
* Explicit interpolation within double quotes.
* Arbitrary expressions are supported. [6]_ [7]_
Ruby
''''
Ruby allows arbitrary expressions in its interpolated strings::
puts "One plus one is two: #{1 + 1}\n"
* Explicit interpolation within double quotes.
* Arbitrary expressions are supported. [8]_ [9]_
* Possible to change delimiter chars with ``%``.
* See the Reference Implementation(s) section for an implementation in Python.
Others
''''''
Let's look at some less-similar modern languages recently implementing string
interpolation.
Scala
'''''
`Scala interpolation`_ is directed through string prefixes.
Each prefix has a different result::
s"Hello, $name ${1 + 1}" # arbitrary
f"$name%s is $height%2.2f meters tall" # printf-style
raw"a\nb" # raw, like r''
These prefixes may also be implemented by the user,
by extending Scala's ``StringContext`` class.
* Explicit interpolation within double quotes with literal prefix.
* User implemented prefixes supported.
* Arbitrary expressions are supported.
.. _Scala interpolation: http://docs.scala-lang.org/overviews/core/string-interpolation.html
ES6 (JavaScript)
'''''''''''''''''''
Designers of `Template strings`_ faced the same issue as Python where single
and double quotes were taken.
Unlike Python however, "backticks" were not.
They were chosen as part of the ECMAScript 2015 (ES6) standard::
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
Custom prefixes are also supported by implementing a function with the same name
as the tag::
function tag(strings, ...values) {
console.log(strings.raw[0]); // raw string is also available
return "Bazinga!";
}
tag`Hello ${ a + b } world ${ a * b}`;
* Explicit interpolation within backticks.
* User implemented prefixes supported.
* Arbitrary expressions are supported.
.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
C#, Version 6
'''''''''''''
C# has a useful new `interpolation feature`_ as well,
with some ability to `customize interpolation`_ via the ``IFormattable``
interface::
$"{person.Name, 20} is {person.Age:D3} year{(person.Age == 1 ? "" : "s")} old.";
* Explicit interpolation with double quotes and ``$`` prefix.
* Custom interpolations are available.
* Arbitrary expressions are supported.
.. _interpolation feature: https://msdn.microsoft.com/en-us/library/Dn961160.aspx
.. _customize interpolation: http://www.thomaslevesque.com/2015/02/24/customizing-string-interpolation-in-c-6/
Apple's Swift
'''''''''''''
Arbitrary `interpolation under Swift`_ is available on all strings::
let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"
* Implicit interpolation with double quotes.
* Arbitrary expressions are supported.
* Cannot contain CR/LF.
.. _interpolation under Swift: https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html#//apple_ref/doc/uid/TP40014097-CH7-ID292
Additional examples
'''''''''''''''''''
A number of additional examples may be `found at Wikipedia`_.
.. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
Now that background and implementation history have been covered,
let's continue on for a solution.
New Syntax
----------
This should be an option of last resort,
as every new syntax feature has a cost in terms of real-estate in a brain it
inhabits.
There is one alternative left on our list of possibilities,
which follows.
New String Prefix
-----------------
Given the history of string formatting in Python,
backwards-compatibility,
implementations in other languages,
and the avoidance of new syntax unless necessary,
an acceptable design is reached through elimination
rather than unique insight.
Therefore, we choose to explicitly mark interpolated string literals with a
string prefix.
We also choose an expression syntax that reuses and builds on the strongest of
the existing choices,
``str.format()`` to avoid further duplication.
Specification
=============
String literals with the prefix of ``e`` shall be converted at compile-time to
the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object.
Strings and values are parsed from the literal and passed as tuples to the
constructor::
>>> location = 'World'
>>> e'Hello, {location} !'
# becomes
# estr('Hello, {location} !', # template
('Hello, ', ' !'), # string fragments
('location',), # expressions
('World',), # values
)
The object interpolates its result immediately at run-time::
'Hello, World !'
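The splitting of a literal into string fragments and expressions
can be approximated at runtime with the standard library.
The sketch below is not the proposed compile-time transformation;
``split_literal`` is a hypothetical helper built on ``string.Formatter.parse()``,
which only separates the pieces and does not validate them as Python expressions:

```python
from string import Formatter


def split_literal(template):
    """Split a str.format()-style template into fragments and expressions."""
    fragments, expressions = [], []
    for text, field, spec, conversion in Formatter().parse(template):
        fragments.append(text)
        if field is not None:
            # Re-attach the conversion and format spec, as in the PEP's tuples.
            expr = field
            if conversion:
                expr += '!' + conversion
            if spec:
                expr += ':' + spec
            expressions.append(expr)
    return tuple(fragments), tuple(expressions)


fragments, expressions = split_literal('Hello, {location} !')
print(fragments)     # ('Hello, ', ' !')
print(expressions)   # ('location',)
```

Because the parser splits on the first ``:`` or ``!`` it finds outside brackets,
expressions that contain those characters would need a real compile-time parser.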
ExpressionString Objects
------------------------
The ExpressionString object supports both immediate and deferred rendering of
its given template and parameters.
It does this by immediately rendering its inputs to its internal string and
``.rendered`` string member (still necessary?),
useful in the majority of use cases.
To allow for deferred rendering and caller-specified escaping,
all inputs are saved for later inspection,
with convenience methods available.
Notes:
* Inputs are saved to the object as ``.template`` and ``.context`` members
for later use.
* No explicit ``str(estr)`` call is necessary to render the result,
though doing so might be desired to free resources if significant.
* Additional or deferred rendering is available through the ``.render()``
method, which allows template and context to be overridden for flexibility.
* Manual escaping of potentially dangerous input is available through the
``.escape(escape_function)`` method,
the rules of which may therefore be specified by the caller.
The given function should both accept and return a single modified string.
* A sample Python implementation can be `found at Bitbucket`_.
.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py
Inherits From ``str`` Type
'''''''''''''''''''''''''''
Inheriting from the ``str`` class is one of the techniques available to improve
compatibility with code expecting a string object,
as it will pass an ``isinstance(obj, str)`` test.
ExpressionString implements this and also renders its result into the "raw"
string of its string superclass,
providing compatibility with a majority of code.
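The behaviour described in this section can be sketched as a small ``str`` subclass.
This is a simplified illustration, not the sample implementation linked above:
the constructor signature (a template plus a mapping of names to values)
is an assumption made for brevity,
and only plain names, not arbitrary expressions, are handled:

```python
import html


class ExpressionString(str):
    """Hypothetical stand-in for the proposed expression-string object."""

    def __new__(cls, template, context):
        # Render immediately into the underlying str value.
        self = super().__new__(cls, template.format(**context))
        # Keep the inputs for later inspection and deferred rendering.
        self.template = template
        self.context = context
        return self

    def render(self, template=None, context=None):
        """Render again, optionally overriding template or context."""
        template = self.template if template is None else template
        context = self.context if context is None else context
        return template.format(**context)

    def escape(self, escape_function):
        """Render with every value passed through escape_function first."""
        escaped = {name: escape_function(str(value))
                   for name, value in self.context.items()}
        return self.template.format(**escaped)


greeting = ExpressionString('Hello, {location} !', {'location': '<World>'})
print(greeting)                                           # Hello, <World> !
print(isinstance(greeting, str))                          # True
print(greeting.escape(html.escape))                       # Hello, &lt;World&gt; !
print(greeting.render(template='Bonjour, {location} !'))  # Bonjour, <World> !
```

Since ``escape()`` converts values to strings before formatting,
numeric format specifications would not work in this simplified version.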
Interpolation Syntax
--------------------
The strongest of the existing string formatting syntaxes,
``str.format()``, is chosen as a base to build on. [10]_ [11]_
..
* Additionally, single arbitrary expressions shall also be supported inside
braces as an extension::
>>> e'My age is {age + 1} years.'
See below for section on safety.
* Triple quoted strings with multiple lines shall be supported::
>>> e'''Hello,
{location} !'''
'Hello,\n World !'
* Adjacent implicit concatenation shall be supported;
interpolation does `not bleed into`_ other strings::
>>> 'Hello {1, 2, 3} ' e'{location} !'
'Hello {1, 2, 3} World !'
* Additional implementation details,
for example expression and error-handling,
are specified in the compatible PEP 498.
.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html
Composition with Other Prefixes
-------------------------------
* Expression-strings apply to unicode objects only,
therefore ``u''`` is never needed.
Should it be prevented?
* Bytes objects are not included here and do not compose with ``e''``, as they
do not support ``__format__()``.
* Complementary to raw strings,
backslash codes shall not be converted in the expression-string,
when combined with ``r''`` as ``re''``.
Examples
--------
A more complicated example follows::
n = 5; # t0, t1 = … TODO
a = e"Sliced {n} onions in {t1-t0:.3f} seconds."
# returns the equivalent of
estr("Sliced {n} onions in {t1-t0:.3f} seconds.", # template
('Sliced ', ' onions in ', ' seconds.'), # strings
('n', 't1-t0:.3f'), # expressions
(5, 0.555555) # values
)
With expressions only::
b = e"Three random numbers: {rand():f}, {rand():f}, {rand():f}."
# returns the equivalent of
estr("Three random numbers: {rand():f}, {rand():f}, {rand():f}.", # template
('Three random numbers: ', ', ', ', ', '.'), # strings
('rand():f', 'rand():f', 'rand():f'), # expressions
(rand(), rand(), rand()) # values
)
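For readers who wish to experiment,
the evaluation and rendering steps of the examples above can be imitated at runtime.
The ``expand`` function below is a hypothetical sketch:
it uses ``eval()`` against an explicitly supplied namespace as a stand-in
for the compile-time transformation this PEP proposes,
and so should only ever be given trusted literals:

```python
from string import Formatter


def expand(template, namespace):
    """Evaluate each braced expression and join the formatted results."""
    formatter = Formatter()
    parts = []
    for text, expr, spec, conversion in formatter.parse(template):
        parts.append(text)
        if expr is not None:
            # eval() stands in for compile-time evaluation (an assumption).
            value = eval(expr, {}, namespace)
            # Apply an optional !r, !s or !a conversion, then the format spec.
            value = formatter.convert_field(value, conversion)
            parts.append(format(value, spec))
    return ''.join(parts)


values = {'n': 5, 't0': 1.0, 't1': 1.555555}
print(expand("Sliced {n} onions in {t1-t0:.3f} seconds.", values))
# Sliced 5 onions in 0.556 seconds.
```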
Safety
-----------
In this section we will describe the safety situation and precautions taken
in support of expression-strings.
#. Only string literals shall be considered here,
not variables to be taken as input or passed around,
making external attacks difficult to accomplish.
* ``str.format()`` `already handles`_ this use-case.
* Direct instantiation of the ExpressionString object with non-literal input
shall not be allowed. (Practicality?)
#. Neither ``locals()`` nor ``globals()`` is necessary or used during the
transformation,
avoiding leakage of information.
#. To eliminate complexity as well as ``RuntimeError`` exceptions due to recursion
depth, recursive interpolation is not supported.
#. Restricted characters or expression classes?, such as ``=`` for assignment.
However,
mistakes or malicious code could be missed inside string literals.
Though that can be said of code in general,
that these expressions are inside strings means they are a bit more likely
to be obscured.
.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
Mitigation via tools
''''''''''''''''''''
The idea is that tools or linters such as pyflakes, pylint, or PyCharm
could check inside strings for constructs that exceed project policy.
As this is a common task with languages these days,
tools won't have to implement this feature solely for Python,
significantly shortening time to implementation.
Additionally the Python interpreter could check(?) and warn with appropriate
command-line parameters passed.
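A check of this kind might look like the following sketch.
The policy it enforces (no function calls inside braces) is purely an example,
and ``find_calls`` is a hypothetical helper,
not part of any of the tools named above:

```python
import ast
from string import Formatter


def find_calls(template):
    """Return the braced expressions in template that contain a call."""
    flagged = []
    for _text, expr, _spec, _conversion in Formatter().parse(template):
        if not expr:
            continue
        # Parse the expression without evaluating it.
        tree = ast.parse(expr, mode='eval')
        if any(isinstance(node, ast.Call) for node in ast.walk(tree)):
            flagged.append(expr)
    return flagged


print(find_calls('Total: {price * qty}, run: {os.system(cmd)}'))
# ['os.system(cmd)']
```

Because the expressions are only parsed and never executed,
such a check is safe to run over untrusted source files.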
Backwards Compatibility
-----------------------
By using existing syntax and avoiding use of current or historical features,
expression-strings (and any associated sub-features)
were designed so as to not interfere with existing code and are not expected
to cause any issues.
Postponed Ideas
---------------
Internationalization
''''''''''''''''''''
Though it was highly desired to integrate internationalization support
(see PEP 501),
the finer details diverge at almost every point,
making a common solution unlikely: [15]_
* Use-cases
* Compile and run-time tasks
* Interpolation Syntax
* Intended audience
* Security policy
Rather than try to fit a "square peg in a round hole,"
this PEP attempts to allow internationalization to be supported in the future
by not preventing it.
In this proposal,
expression-string inputs are saved for inspection and re-rendering at a later
time,
allowing for their use by an external library of any sort.
Rejected Ideas
--------------
Restricting Syntax to ``str.format()`` Only
'''''''''''''''''''''''''''''''''''''''''''
This was deemed not enough of a solution to the problem.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
The common `arguments against`_ arbitrary expressions were:
#. YAGNI, "You ain't gonna need it."
#. The change is not congruent with historical Python conservatism.
#. Postpone - can implement in a future version if need is demonstrated.
.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html
Additional/Custom String-Prefixes
'''''''''''''''''''''''''''''''''
As seen in the `Implementations in Other Languages`_ section,
many modern languages have extensible string prefixes with a common interface.
This could be a way to generalize and reduce lines of code in common
situations.
Examples are found in ES6 (JavaScript), Scala, Nim, and C#
(to a lesser extent).
This was rejected by the BDFL. [14]_
Automated Escaping of Input Variables
'''''''''''''''''''''''''''''''''''''
While helpful in some cases,
this was thought to create too much uncertainty of when and where string
expressions could be used safely or not.
The concept was also difficult to describe to others. [12]_
Always consider expression-string variables to be unescaped,
unless the developer has explicitly escaped them.
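Explicit escaping by the developer could look like the sketch below.
It uses today's ``str.format()`` in place of the proposed prefix,
and ``shlex.quote()`` is just one example of an escaping function
appropriate to one target context (a POSIX shell command line):

```python
import shlex

filename = 'my file; rm -rf ~'   # potentially hostile input

# The value is escaped explicitly, at the point of use.
command = 'ls -l {}'.format(shlex.quote(filename))
print(command)   # ls -l 'my file; rm -rf ~'
```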
Environment Access and Command Substitution
'''''''''''''''''''''''''''''''''''''''''''
For systems programming and shell-script replacements,
it would be useful to handle environment variables and capture output of
commands directly in an expression string.
This was rejected as not important enough,
and looking too much like bash/perl,
which could encourage bad habits. [13]_
Reference Implementation(s)
===========================
An expression-string implementation is currently attached to PEP 498,
under the ``f''`` prefix,
and may be available in nightly builds.
A Python implementation of Ruby interpolation `is also available`_,
which is similar to this proposal.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _is also available: https://github.com/syrusakbary/interpy
Acknowledgements
================
* Eric V. Smith for providing invaluable implementation work and design
opinions, helping to focus this PEP.
* Others on the python-ideas mailing list for rejecting the craziest of ideas,
also helping to achieve focus.
References
==========
.. [1] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034659.html)
.. [2] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034669.html)
.. [3] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034701.html)
.. [4] Bash Docs
(http://www.tldp.org/LDP/abs/html/arithexp.html)
.. [5] Bash Docs
(http://www.tldp.org/LDP/abs/html/commandsub.html)
.. [6] Perl Cookbook
(http://docstore.mik.ua/orelly/perl/cookbook/ch01_11.htm)
.. [7] Perl Docs
(http://perl6maven.com/perl6-scalar-array-and-hash-interpolation)
.. [8] Ruby Docs
(http://ruby-doc.org/core-2.1.1/doc/syntax/literals_rdoc.html#label-Strings)
.. [9] Ruby Docs
(https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#Interpolation)
.. [10] Python Str.Format Syntax
(https://docs.python.org/3/library/string.html#format-string-syntax)
.. [11] Python Format-Spec Mini Language
(https://docs.python.org/3/library/string.html#format-specification-mini-language)
.. [12] Escaping of Input Variables
(https://mail.python.org/pipermail/python-ideas/2015-August/035532.html)
.. [13] Environment Access and Command Substitution
(https://mail.python.org/pipermail/python-ideas/2015-August/035554.html)
.. [14] Extensible String Prefixes
(https://mail.python.org/pipermail/python-ideas/2015-August/035336.html)
.. [15] Literal String Formatting
(https://mail.python.org/pipermail/python-dev/2015-August/141289.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: