Apply PEP 502 changes from Mike Miller

This commit is contained in:
Chris Angelico 2015-09-15 11:37:22 +10:00
parent 2e0c8ba1b0
commit 1bd2f20a00
1 changed files with 161 additions and 288 deletions

View File

@ -1,44 +1,46 @@
PEP: 502
Title: String Interpolation Redux
Title: String Interpolation - Extended Discussion
Version: $Revision$
Last-Modified: $Date$
Author: Mike G. Miller
Status: Draft
Type: Standards Track
Type: Informational
Content-Type: text/x-rst
Created: 10-Aug-2015
Python-Version: 3.6
Note: Open issues below are stated with a question mark (?),
and are therefore searchable.
Abstract
========
This proposal describes a new string interpolation feature for Python,
called an *expression-string*,
that is both concise and powerful,
improves readability in most cases,
yet does not conflict with existing code.
PEP 498: *Literal String Interpolation*, which proposed "formatted strings" was
accepted September 9th, 2015.
Additional background and rationale given during its design phase is detailed
below.
To recap that PEP,
a string prefix was introduced that marks the string as a template to be
rendered.
These formatted strings may contain one or more expressions
built on `the existing syntax`_ of ``str.format()``.
The formatted string expands at compile-time into a conventional string format
operation,
with the given expressions from its text extracted and passed instead as
positional arguments.
To achieve this end,
a new string prefix is introduced,
which expands at compile-time into an equivalent expression-string object,
with requested variables from its context passed as keyword arguments.
At runtime,
the new object uses these passed values to render a string to given
specifications, building on `the existing syntax`_ of ``str.format()``::
the resulting expressions are evaluated to render a string to given
specifications::
>>> location = 'World'
>>> e'Hello, {location} !' # new prefix: e''
'Hello, World !' # interpolated result
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
Format-strings may be thought of as merely syntactic sugar to simplify traditional
calls to ``str.format()``.
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
This PEP does not recommend to remove or deprecate any of the existing string
formatting mechanisms.
Motivation
==========
@ -50,12 +52,16 @@ In comparison to other dynamic scripting languages
with similar use cases,
the amount of code necessary to build similar strings is substantially higher,
while at times offering lower readability due to verbosity, dense syntax,
or identifier duplication. [1]_
or identifier duplication.
These difficulties are described at moderate length in the original
`post to python-ideas`_
that started the snowball (that became PEP 498) rolling. [1]_
Furthermore, replacement of the print statement with the more consistent print
function of Python 3 (PEP 3105) has added one additional minor burden,
an additional set of parentheses to type and read.
Combined with the verbosity of current formatting solutions,
Combined with the verbosity of current string formatting solutions,
this puts an otherwise simple language at an unfortunate disadvantage to its
peers::
@ -66,7 +72,7 @@ peers::
# Python 3, str.format with named parameters
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
# Python 3, variation B, worst case
# Python 3, worst case
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
id=id,
hostname=
@ -74,7 +80,7 @@ peers::
In Python, the formatting and printing of a string with multiple variables in a
single line of code of standard width is noticeably harder and more verbose,
indentation often exacerbating the issue.
with indentation exacerbating the issue.
For use cases such as smaller projects, systems programming,
shell script replacements, and even one-liners,
@ -82,36 +88,17 @@ where message formatting complexity has yet to be encapsulated,
this verbosity has likely lead a significant number of developers and
administrators to choose other languages over the years.
.. _post to python-ideas: https://mail.python.org/pipermail/python-ideas/2015-July/034659.html
Rationale
=========
Naming
------
The term expression-string was chosen because other applicable terms,
such as format-string and template are already well used in the Python standard
library.
The string prefix itself, ``e''`` was chosen to demonstrate that the
specification enables expressions,
is not limited to ``str.format()`` syntax,
and also does not lend itself to `the shorthand term`_ "f-string".
It is also slightly easier to type than other choices such as ``_''`` and
``i''``,
while perhaps `less odd-looking`_ to C-developers.
``printf('')`` vs. ``print(f'')``.
.. _the shorthand term: reference_needed
.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html
Goals
-------------
The design goals of expression-strings are as follows:
The design goals of format strings are as follows:
#. Eliminate need to pass variables manually.
#. Eliminate repetition of identifiers and redundant parentheses.
@ -133,40 +120,44 @@ Python specified both single (``'``) and double (``"``) ASCII quote
characters to enclose strings.
It is not reasonable to choose one of them now to enable interpolation,
while leaving the other for uninterpolated strings.
"Backtick" characters (`````) are also `constrained by history`_ as a shortcut
for ``repr()``.
Other characters,
such as the "Backtick" (or grave accent `````) are also
`constrained by history`_
as a shortcut for ``repr()``.
This leaves a few remaining options for the design of such a feature:
* An operator, as in printf-style string formatting via ``%``.
* A class, such as ``string.Template()``.
* A function, such as ``str.format()``.
* New syntax
* A method or function, such as ``str.format()``.
* New syntax, or
* A new string prefix marker, such as the well-known ``r''`` or ``u''``.
The first three options above currently work well.
The first three options above are mature.
Each has specific use cases and drawbacks,
yet also suffer from the verbosity and visual noise mentioned previously.
All are discussed in the next section.
All options are discussed in the next sections.
.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
Background
-------------
This proposal builds on several existing techniques and proposals and what
Formatted strings build on several existing techniques and proposals and what
we've collectively learned from them.
In keeping with the design goals of readability and error-prevention,
the following examples therefore use named,
not positional arguments.
The following examples focus on the design goals of readability and
error-prevention using named parameters.
Let's assume we have the following dictionary,
and would like to print out its items as an informative string for end users::
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
Printf-style formatting
'''''''''''''''''''''''
Printf-style formatting, via operator
'''''''''''''''''''''''''''''''''''''
This `venerable technique`_ continues to have its uses,
such as with byte-based protocols,
@ -178,7 +169,7 @@ and familiarity to many programmers::
In this form, considering the prerequisite dictionary creation,
the technique is verbose, a tad noisy,
and relatively readable.
yet relatively readable.
Additional issues are that an operator can only take one argument besides the
original string,
meaning multiple parameters must be passed in a tuple or dictionary.
@ -190,8 +181,8 @@ or forget the trailing type, e.g. (``s`` or ``d``).
.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
string.Template
'''''''''''''''
string.Template Class
'''''''''''''''''''''
The ``string.Template`` `class from`_ PEP 292
(Simpler String Substitutions)
@ -202,7 +193,7 @@ that finds its main use cases in shell and internationalization tools::
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
Also verbose, however the string itself is readable.
While also verbose, the string itself is readable.
Though functionality is limited,
it meets its requirements well.
It isn't powerful enough for many cases,
@ -232,8 +223,8 @@ and likely contributed to the PEP's lack of acceptance.
It was superseded by the following proposal.
str.format()
''''''''''''
str.format() Method
'''''''''''''''''''
The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the
existing options.
@ -253,36 +244,32 @@ string literals::
host=hostname)
'Hello, user: nobody, id: 9, on host: darkstar'
The verbosity of the method-based approach is illustrated here.
.. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax
PEP 498 -- Literal String Formatting
''''''''''''''''''''''''''''''''''''
PEP 498 discusses and delves partially into implementation details of
expression-strings,
which it calls f-strings,
the idea and syntax
(with exception of the prefix letter)
of which is identical to that discussed here.
The resulting compile-time transformation however
returns a string joined from parts at runtime,
rather than an object.
It also, somewhat controversially to those first exposed to it,
introduces the idea that these strings shall be augmented with support for
arbitrary expressions,
which is discussed further in the following sections.
PEP 498 defines and discusses format strings,
as also described in the `Abstract`_ above.
It also, somewhat controversially to those first exposed,
introduces the idea that format-strings shall be augmented with support for
arbitrary expressions.
This is discussed further in the
Restricting Syntax section under
`Rejected Ideas`_.
PEP 501 -- Translation ready string interpolation
'''''''''''''''''''''''''''''''''''''''''''''''''
The complimentary PEP 501 brings internationalization into the discussion as a
first-class concern, with its proposal of i-strings,
first-class concern, with its proposal of the i-prefix,
``string.Template`` syntax integration compatible with ES6 (Javascript),
deferred rendering,
and a similar object return value.
and an object return value.
Implementations in Other Languages
@ -374,7 +361,8 @@ ES6 (Javascript)
Designers of `Template strings`_ faced the same issue as Python where single
and double quotes were taken.
Unlike Python however, "backticks" were not.
They were chosen as part of the ECMAScript 2015 (ES6) standard::
Despite `their issues`_,
they were chosen as part of the ECMAScript 2015 (ES6) standard::
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
@ -391,8 +379,10 @@ as the tag::
* User implemented prefixes supported.
* Arbitrary expressions are supported.
.. _their issues: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
C#, Version 6
'''''''''''''
@ -428,13 +418,14 @@ Arbitrary `interpolation under Swift`_ is available on all strings::
Additional examples
'''''''''''''''''''
A number of additional examples may be `found at Wikipedia`_.
A number of additional examples of string interpolation may be
`found at Wikipedia`_.
Now that background and history have been covered,
let's continue on for a solution.
.. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
Now that background and imlementation history have been covered,
let's continue on for a solution.
New Syntax
----------
@ -442,178 +433,47 @@ New Syntax
This should be an option of last resort,
as every new syntax feature has a cost in terms of real-estate in a brain it
inhabits.
There is one alternative left on our list of possibilities,
There is however one alternative left on our list of possibilities,
which follows.
New String Prefix
-----------------
Given the history of string formatting in Python,
backwards-compatibility,
Given the history of string formatting in Python and backwards-compatibility,
implementations in other languages,
and the avoidance of new syntax unless necessary,
avoidance of new syntax unless necessary,
an acceptable design is reached through elimination
rather than unique insight.
Therefore, we choose to explicitly mark interpolated string literals with a
string prefix.
Therefore, marking interpolated string literals with a string prefix is chosen.
We also choose an expression syntax that reuses and builds on the strongest of
We also choose an expression syntax that reuses and builds on the strongest of
the existing choices,
``str.format()`` to avoid further duplication.
Specification
=============
String literals with the prefix of ``e`` shall be converted at compile-time to
the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object.
Strings and values are parsed from the literal and passed as tuples to the
constructor::
``str.format()`` to avoid further duplication of functionality::
>>> location = 'World'
>>> e'Hello, {location} !'
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
# becomes
# estr('Hello, {location} !', # template
('Hello, ', ' !'), # string fragments
('location',), # expressions
('World',), # values
)
The object interpolates its result immediately at run-time::
'Hello, World !'
PEP 498 -- Literal String Formatting, delves into the mechanics and
implementation of this design.
ExpressionString Objects
------------------------
The ExpressionString object supports both immediate and deferred rendering of
its given template and parameters.
It does this by immediately rendering its inputs to its internal string and
``.rendered`` string member (still necessary?),
useful in the majority of use cases.
To allow for deferred rendering and caller-specified escaping,
all inputs are saved for later inspection,
with convenience methods available.
Notes:
* Inputs are saved to the object as ``.template`` and ``.context`` members
for later use.
* No explicit ``str(estr)`` call is necessary to render the result,
though doing so might be desired to free resources if significant.
* Additional or deferred rendering is available through the ``.render()``
method, which allows template and context to be overriden for flexibility.
* Manual escaping of potentially dangerous input is available through the
``.escape(escape_function)`` method,
the rules of which may therefore be specified by the caller.
The given function should both accept and return a single modified string.
* A sample Python implementation can `found at Bitbucket`_:
.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py
Inherits From ``str`` Type
'''''''''''''''''''''''''''
Inheriting from the ``str`` class is one of the techniques available to improve
compatibility with code expecting a string object,
as it will pass an ``isinstance(obj, str)`` test.
ExpressionString implements this and also renders its result into the "raw"
string of its string superclass,
providing compatibility with a majority of code.
Interpolation Syntax
--------------------
The strongest of the existing string formatting syntaxes is chosen,
``str.format()`` as a base to build on. [10]_ [11]_
..
* Additionally, single arbitrary expressions shall also be supported inside
braces as an extension::
>>> e'My age is {age + 1} years.'
See below for section on safety.
* Triple quoted strings with multiple lines shall be supported::
>>> e'''Hello,
{location} !'''
'Hello,\n World !'
* Adjacent implicit concatenation shall be supported;
interpolation does not `not bleed into`_ other strings::
>>> 'Hello {1, 2, 3} ' e'{location} !'
'Hello {1, 2, 3} World !'
* Additional implementation details,
for example expression and error-handling,
are specified in the compatible PEP 498.
.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html
Composition with Other Prefixes
-------------------------------
* Expression-strings apply to unicode objects only,
therefore ``u''`` is never needed.
Should it be prevented?
* Bytes objects are not included here and do not compose with e'' as they
do not support ``__format__()``.
* Complimentary to raw strings,
backslash codes shall not be converted in the expression-string,
when combined with ``r''`` as ``re''``.
Examples
--------
A more complicated example follows::
n = 5; # t0, t1 = … TODO
a = e"Sliced {n} onions in {t1-t0:.3f} seconds."
# returns the equvalent of
estr("Sliced {n} onions in {t1-t0:.3f} seconds", # template
('Sliced ', ' onions in ', ' seconds'), # strings
('n', 't1-t0:.3f'), # expressions
(5, 0.555555) # values
)
With expressions only::
b = e"Three random numbers: {rand()}, {rand()}, {rand()}."
# returns the equvalent of
estr("Three random numbers: {rand():f}, {rand():f}, {rand():}.", # template
('Three random numbers: ', ', ', ', ', '.'), # strings
('rand():f', 'rand():f', 'rand():f'), # expressions
(rand(), rand(), rand()) # values
)
Additional Topics
=================
Safety
-----------
In this section we will describe the safety situation and precautions taken
in support of expression-strings.
in support of format-strings.
#. Only string literals shall be considered here,
#. Only string literals have been considered for format-strings,
not variables to be taken as input or passed around,
making external attacks difficult to accomplish.
* ``str.format()`` `already handles`_ this use-case.
* Direct instantiation of the ExpressionString object with non-literal input
shall not be allowed. (Practicality?)
``str.format()`` and alternatives `already handle`_ this use-case.
#. Neither ``locals()`` nor ``globals()`` are necessary nor used during the
transformation,
@ -622,37 +482,72 @@ in support of expression-strings.
#. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion
depth, recursive interpolation is not supported.
#. Restricted characters or expression classes?, such as ``=`` for assignment.
However,
mistakes or malicious code could be missed inside string literals.
Though that can be said of code in general,
that these expressions are inside strings means they are a bit more likely
to be obscured.
.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
.. _already handle: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
Mitigation via tools
Mitigation via Tools
''''''''''''''''''''
The idea is that tools or linters such as pyflakes, pylint, or Pycharm,
could check inside strings for constructs that exceed project policy.
As this is a common task with languages these days,
tools won't have to implement this feature solely for Python,
may check inside strings with expressions and mark them up appropriately.
As this is a common task with programming languages today,
multi-language tools won't have to implement this feature solely for Python,
significantly shortening time to implementation.
Additionally the Python interpreter could check(?) and warn with appropriate
command-line parameters passed.
Farther in the future,
strings might also be checked for constructs that exceed the safety policy of
a project.
Style Guide/Precautions
-----------------------
As arbitrary expressions may accomplish anything a Python expression is
able to,
it is highly recommended to avoid constructs inside format-strings that could
cause side effects.
Further guidelines may be written once usage patterns and true problems are
known.
Reference Implementation(s)
---------------------------
The `say module on PyPI`_ implements string interpolation as described here
with the small burden of a callable interface::
pip install say
from say import say
nums = list(range(4))
say("Nums has {len(nums)} items: {nums}")
A Python implementation of Ruby interpolation `is also available`_.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _say module on PyPI: https://pypi.python.org/pypi/say/
.. _is also available: https://github.com/syrusakbary/interpy
Backwards Compatibility
-----------------------
By using existing syntax and avoiding use of current or historical features,
expression-strings (and any associated sub-features),
were designed so as to not interfere with existing code and is not expected
to cause any issues.
By using existing syntax and avoiding current or historical features,
format strings were designed so as to not interfere with existing code and are
not expected to cause any issues.
Postponed Ideas
@ -666,20 +561,12 @@ Though it was highly desired to integrate internationalization support,
the finer details diverge at almost every point,
making a common solution unlikely: [15]_
* Use-cases
* Compile and run-time tasks
* Interpolation Syntax
* Use-cases differ
* Compile vs. run-time tasks
* Interpolation syntax needs
* Intended audience
* Security policy
Rather than try to fit a "square peg in a round hole,"
this PEP attempts to allow internationalization to be supported in the future
by not preventing it.
In this proposal,
expression-string inputs are saved for inspection and re-rendering at a later
time,
allowing for their use by an external library of any sort.
Rejected Ideas
--------------
@ -687,18 +574,25 @@ Rejected Ideas
Restricting Syntax to ``str.format()`` Only
'''''''''''''''''''''''''''''''''''''''''''
This was deemed not enough of a solution to the problem.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
The common `arguments against`_ support of arbitrary expresssions were:
The common `arguments against`_ arbitrary expresssions were:
#. YAGNI, "You ain't gonna need it."
#. The change is not congruent with historical Python conservatism.
#. `YAGNI`_, "You aren't gonna need it."
#. The feature is not congruent with historical Python conservatism.
#. Postpone - can implement in a future version if need is demonstrated.
.. _YAGNI: https://en.wikipedia.org/wiki/You_aren't_gonna_need_it
.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html
Support of only ``str.format()`` syntax however,
was deemed not enough of a solution to the problem.
Often a simple length or increment of an object, for example,
is desired before printing.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
String interpolation with arbitrary expresssions is becoming an industry
standard in modern languages due to its utility.
Additional/Custom String-Prefixes
'''''''''''''''''''''''''''''''''
@ -720,7 +614,7 @@ this was thought to create too much uncertainty of when and where string
expressions could be used safely or not.
The concept was also difficult to describe to others. [12]_
Always consider expression-string variables to be unescaped,
Always consider format string variables to be unescaped,
unless the developer has explicitly escaped them.
@ -735,33 +629,13 @@ and looking too much like bash/perl,
which could encourage bad habits. [13]_
Reference Implementation(s)
===========================
An expression-string implementation is currently attached to PEP 498,
under the ``f''`` prefix,
and may be available in nightly builds.
A Python implementation of Ruby interpolation `is also available`_,
which is similar to this proposal.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _is also available: https://github.com/syrusakbary/interpy
Acknowledgements
================
* Eric V. Smith for providing invaluable implementation work and design
opinions, helping to focus this PEP.
* Others on the python-ideas mailing list for rejecting the craziest of ideas,
also helping to achieve focus.
* Eric V. Smith for the authoring and implementation of PEP 498.
* Everyone on the python-ideas mailing list for rejecting the various crazy
ideas that came up,
helping to keep the final design in focus.
References
@ -771,7 +645,6 @@ References
(https://mail.python.org/pipermail/python-ideas/2015-July/034659.html)
.. [2] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034669.html)