PEP: 292
Title: Simpler String Substitutions
Version: $Revision$
Last-Modified: $Date$
Author: barry@python.org (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004


Abstract

This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:

1. Python's current string substitution feature
   (i.e. %-substitution) is complicated and error prone. This PEP
   is simpler at the cost of some expressiveness.

2. PEP 215 proposed an alternative string interpolation feature,
   introducing a new `$' string prefix. PEP 292 is simpler than
   this because it involves no syntax changes and has much simpler
   rules for what substitutions can occur in the string.


Rationale

Python currently supports a string substitution syntax based on
C's printf() '%' formatting character[1]. While quite rich,
%-formatting codes are also error prone, even for
experienced Python programmers. A common mistake is to leave off
the trailing format character, e.g. the `s' in "%(name)s".

In addition, the rules for what can follow a % sign are fairly
complex, while the usual application rarely needs such complexity.
Most scripts need to do some string interpolation, but most of
those use simple `stringification' formats, i.e. %s or %(name)s.
This form should be made simpler and less error prone.


A Simpler Proposal

We propose the addition of a new class -- called 'Template', which
will live in the string module -- derived from the built-in
unicode type. The Template class supports new rules for string
substitution; its value contains placeholders, introduced with the
$ character. The following rules for $-placeholders apply:

1. $$ is an escape; it is replaced with a single $

2. $identifier names a substitution placeholder matching a mapping
   key of "identifier". By default, "identifier" must spell a
   Python identifier as defined in [2]. The first non-identifier
   character after the $ character terminates this placeholder
   specification.

3. ${identifier} is equivalent to $identifier. It is required
   when valid identifier characters follow the placeholder but are
   not part of the placeholder, e.g. "${noun}ification".

If the $ character appears at the end of the line, or is followed
by any character other than those described above, it is treated
as if it had been escaped, appearing in the resulting string
unchanged. NOTE: see open issues below.
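
The rules above can be sketched in a few lines of Python. This is an
illustration only, not the reference implementation: it is written
for Python 3, so the built-in str stands in for unicode, and the
regular expression and the convert() helper are assumptions made for
the sketch. The three named groups correspond to the three rules.

```python
import re


class Template(str):
    """Sketch of the draft's Template: a string whose % does $-substitution."""

    # Three named groups, one per rule (an assumed spelling of the pattern).
    pattern = re.compile(r"""
        \$(?:
            (?P<escaped>\$)                        |  # rule 1: $$
            (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)      |  # rule 2: $identifier
            \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}    # rule 3: ${identifier}
        )
        """, re.VERBOSE)

    def __mod__(self, mapping):
        def convert(match):
            if match.group('escaped') is not None:
                return '$'
            key = match.group('named') or match.group('braced')
            return mapping[key]  # a missing key raises KeyError

        # A $ that matches none of the groups never reaches convert(),
        # so it passes through to the result unchanged.
        return self.pattern.sub(convert, str(self))


mapping = dict(name='Guido', country='the Netherlands')
print(Template('${name} was born in ${country}') % mapping)
print(Template('$$5, $100, ${name}ification, and a trailing $') % mapping)
```

Because a bare $ matches none of the three groups, it is left in the
result as-is, which is the behaviour described in the preceding
paragraph.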

No other characters have special meaning; however, it is possible
to derive from the Template class to define different rules for
the placeholder. For example, a derived class could allow for
periods in the placeholder (e.g. to support a kind of dynamic
namespace and attribute path lookup).
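
A minimal sketch of such a derived class follows. It is built on the
string.Template class found in current Python releases, whose
interface differs from this draft (substitution is spelled
substitute() rather than %). The names AttributeTemplate and
AttributePathMapping are hypothetical, chosen only for illustration.

```python
import os
from string import Template


class AttributeTemplate(Template):
    # Hypothetical subclass: allow dotted names such as $os.path.sep.
    idpattern = r'[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*'


class AttributePathMapping:
    """Hypothetical mapping that resolves 'a.b.c' by attribute lookup."""

    def __init__(self, namespace):
        self.namespace = namespace

    def __getitem__(self, path):
        head, *attrs = path.split('.')
        obj = self.namespace[head]      # KeyError if the root name is unknown
        for attr in attrs:
            obj = getattr(obj, attr)    # AttributeError if the path is wrong
        return obj


t = AttributeTemplate('The path separator is "$os.path.sep".')
print(t.substitute(AttributePathMapping({'os': os})))
```

The template class only widens what counts as a placeholder; the
lookup along the attribute path is done entirely by the mapping.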

Once the Template has been created, substitutions can be performed
using traditional Python syntax. For example:

    >>> from string import Template
    >>> mapping = dict(name='Guido', country='the Netherlands')
    >>> s = Template('${name} was born in ${country}')
    >>> print s % mapping
    Guido was born in the Netherlands

Another class is provided which derives from Template. This class
is called 'SafeTemplate' and supports rules identical to those
above. The difference between Template instances and SafeTemplate
instances is that in SafeTemplate if a placeholder is missing from
the interpolation mapping, no KeyError is raised. Instead, the
original placeholder is included in the result string unchanged.
For example:

    >>> from string import Template, SafeTemplate
    >>> mapping = dict(name='Guido', country='the Netherlands')
    >>> s = Template('$who was born in $country')
    >>> print s % mapping
    Traceback (most recent call last):
        [...traceback omitted...]
    KeyError: u'who'
    >>> s = SafeTemplate('$who was born in $country')
    >>> print s % mapping
    $who was born in the Netherlands
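
For readers who want to try the strict and the forgiving behaviour
side by side today, the following sketch uses the string.Template
class in current Python releases. Note that its interface is not the
one proposed in this draft: it offers substitute() and
safe_substitute() methods instead of the % operator and a separate
SafeTemplate class.

```python
from string import Template

mapping = dict(name='Guido', country='the Netherlands')
s = Template('$who was born in $country')

# Strict substitution: a placeholder missing from the mapping is an error.
try:
    s.substitute(mapping)
except KeyError as exc:
    print('KeyError:', exc)

# Safe substitution: the missing placeholder is left in the result.
print(s.safe_substitute(mapping))
```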


Why `$' and Braces?

The BDFL said it best[3]: The $ means "substitution" in so many
languages besides Perl that I wonder where you've been. [...]
We're copying this from the shell.


Comparison to PEP 215

PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
library module. PEP 215 proposes a new string prefix
representation such as $"" which signals to Python that a new type
of string is present. $-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.

PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:

    import sys
    print $"sys = $sys, sys = $sys.modules['sys']"

which would print

    sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see the substitution
placeholder in the original $-string.

The interesting thing is that the Template class defined in this
PEP has nothing to say about the values that are substituted for
the placeholders. Thus, with a little extra work, it's possible
to support PEP 215's functionality using existing Python syntax.

For example, one could define subclasses of Template and dict that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
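
One way this might look is sketched below, again on top of the
string.Template class in current Python releases (Python 3.7 or
later, since it relies on the braceidpattern attribute) rather than
the class proposed here. ExpressionTemplate and EvalMapping are
hypothetical names, and a plain class is used for the mapping where a
dict subclass would serve equally well. The sketch uses eval(), so it
is only appropriate for templates from a trusted source.

```python
import sys
from string import Template


class ExpressionTemplate(Template):
    # Hypothetical subclass: anything without braces may appear in ${...}.
    # braceidpattern requires Python 3.7 or later.
    braceidpattern = r'[^{}]+'


class EvalMapping:
    """Hypothetical mapping whose "keys" are expressions to evaluate."""

    def __init__(self, namespace):
        self.namespace = namespace

    def __getitem__(self, expression):
        # WARNING: eval() runs arbitrary code; use only with trusted input.
        return eval(expression, {}, self.namespace)


t = ExpressionTemplate("sys = ${sys.modules['sys'].__name__}, sum = ${1 + 2}")
print(t.substitute(EvalMapping({'sys': sys})))
```

As in the attribute path case, the template class decides only what a
placeholder looks like; what the placeholder means is left to the
mapping.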


Internationalization

The implementation supports internationalization magic by keeping
the original string value intact. In fact, all the work of the
special substitution rules is implemented by overriding the
__mod__() operator. However, the string value of a Template (or
SafeTemplate) is the string that was passed to its constructor.

This approach allows a gettext-based internationalized program to
use the Template instance as a lookup into the catalog; in fact
gettext doesn't care that the catalog key is a Template. Because
the value of the Template is the original $-string, translators
also never need to use %-strings. The right thing will happen at
run-time.


Open Issues

- Should the Template and SafeTemplate classes convert mapping
  values to strings (or unicodes)? I.e. what should this code do:

      >>> from string import Template
      >>> Template('The cost was $amount euros') % {'amount': 7}

  Should this raise an exception such as TypeError, or should this
  return the string 'The cost was 7 euros'?

  PEP author preference: no automatic stringification.

- The pattern for placeholders in the Template and SafeTemplate
  classes matches Python identifiers. Some people want to match
  Python attribute paths, e.g. "$os.path.sep". This can be useful
  in some applications; however, note that it is up to the
  interpolation mapping to provide namespace lookup for the
  attribute paths.

  Should we include AttributeTemplate and SafeAttributeTemplate in
  the standard library? What about more complex patterns such as
  Python expressions?

  PEP author preference: No, we don't include them for now. Such
  classes are easily derived, and besides, we're not proposing to
  include any interpolation mappings, and without such a
  specialized mapping, a pattern matching attribute paths or
  expressions isn't useful.

- Where do the Template and SafeTemplate classes live? Some
  people have suggested creating a stringtools or stringlib module
  to house these two classes. The PEP author has proposed a
  re-organization of the existing string module, turning it into a
  string package.

  PEP author preference: There seems little consensus around
  either suggestion, and since the classes are just a few lines of
  Python, I propose no string module re-organization, but to add
  these two classes to string.py.

- Should the $-placeholder rules be more strict? Specifically,
  objections have been raised about 'magically' escaping $'s at
  the end of strings, or in strings like '$100'. The suggestion
  was that we add another matching group which matches bare $'s,
  raising a ValueError if we find such a match.

  PEP author preference: This sounds fine to me, although because
  the pattern is part of the public interface for the class, we
  will have to document that 4 groups are expected instead of 3.
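
Two of the questions above, stringification of values and strictness
about bare $'s, can be explored with the string.Template class in
current Python releases. That class is not the one specified in this
draft (it uses substitute() and safe_substitute() rather than %), so
the sketch below shows one possible resolution, not the outcome this
PEP prescribes. StrictMapping is a hypothetical helper showing how
the no-stringification preference could be layered on from the
mapping side.

```python
from string import Template


class StrictMapping:
    """Hypothetical wrapper that refuses non-string values."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        value = self.data[key]
        if not isinstance(value, str):
            raise TypeError('value for %r is not a string' % key)
        return value


t = Template('The cost was $amount euros')

# The current class converts values with str().
print(t.substitute({'amount': 7}))

# The stricter alternative, enforced by the mapping instead.
try:
    t.substitute(StrictMapping({'amount': 7}))
except TypeError as exc:
    print('TypeError:', exc)

# The current pattern has four named groups; the extra one catches bare $'s.
print(sorted(Template.pattern.groupindex))
try:
    Template('That will be $100').substitute({})
except ValueError:
    print('ValueError raised for the bare $')

# safe_substitute() keeps the forgiving behaviour for bare $'s.
print(Template('That will be $100').safe_substitute({}))
```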


References

[1] String Formatting Operations
    http://www.python.org/doc/current/lib/typesseq-strings.html

[2] Identifiers and Keywords
    http://www.python.org/doc/current/ref/identifiers.html

[3] Guido's python-dev posting from 21-Jul-2002
    http://mail.python.org/pipermail/python-dev/2002-July/026397.html


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: