PEP: 292
Title: Simpler String Substitutions
Version: $Revision$
Last-Modified: $Date$
Author: barry@python.org (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004


Abstract

    This PEP describes a simpler string substitution feature, also
    known as string interpolation.  This PEP is "simpler" in two
    respects:

    1. Python's current string substitution feature
       (i.e. %-substitution) is complicated and error-prone.  This PEP
       is simpler at the cost of some expressiveness.

    2. PEP 215 proposed an alternative string interpolation feature,
       introducing a new `$' string prefix.  PEP 292 is simpler than
       this because it involves no syntax changes and has much simpler
       rules for what substitutions can occur in the string.


Rationale

    Python currently supports a string substitution syntax based on
    C's printf() '%' formatting character[1].  While quite rich,
    %-formatting codes are also error-prone, even for experienced
    Python programmers.  A common mistake is to leave off the trailing
    format character, e.g. the `s' in "%(name)s".

    In addition, the rules for what can follow a % sign are fairly
    complex, while the usual application rarely needs such complexity.
    Most scripts need to do some string interpolation, but most of
    those use simple `stringification' formats, i.e. %s or %(name)s.
    This form should be made simpler and less error-prone.


A Simpler Proposal

    We propose the addition of a new class, called 'Template', which
    will live in the string module and derive from the built-in
    unicode type.  The Template class supports new rules for string
    substitution; its value contains placeholders, introduced with the
    $ character.  The following rules for $-placeholders apply:

    1. $$ is an escape; it is replaced with a single $.

    2. $identifier names a substitution placeholder matching a mapping
       key of "identifier".  By default, "identifier" must spell a
       Python identifier as defined in [2].
       The first non-identifier character after the $ character
       terminates this placeholder specification.

    3. ${identifier} is equivalent to $identifier.  It is required
       when valid identifier characters follow the placeholder but are
       not part of the placeholder, e.g. "${noun}ification".

    If the $ character appears at the end of the line, or is followed
    by any character other than those described above, a ValueError
    will be raised at interpolation time.  Values in the mapping will
    be converted to Unicode strings by calling the built-in unicode()
    function, using its default arguments.

    No other characters have special meaning; however, it is possible
    to derive from the Template class to define different rules for
    the placeholder.  For example, a derived class could allow for
    periods in the placeholder (e.g. to support a kind of dynamic
    namespace and attribute path lookup).

    Once the Template has been created, substitutions can be performed
    using traditional Python syntax, i.e. the % operator.  For example:

        >>> from string import Template
        >>> mapping = dict(name='Guido', country='the Netherlands')
        >>> s = Template('${name} was born in ${country}')
        >>> print s % mapping
        Guido was born in the Netherlands

    Another class is provided which derives from Template.  This class
    is called 'SafeTemplate' and supports rules identical to those
    above.  The two differ only in how they handle a placeholder that
    is missing from the interpolation mapping: a Template raises a
    KeyError, while a SafeTemplate includes the original placeholder
    in the result string unchanged.  For example:

        >>> from string import Template, SafeTemplate
        >>> mapping = dict(name='Guido', country='the Netherlands')
        >>> s = Template('$who was born in $country')
        >>> print s % mapping
        Traceback (most recent call last):
        [...traceback omitted...]
        KeyError: u'who'
        >>> s = SafeTemplate('$who was born in $country')
        >>> print s % mapping
        $who was born in the Netherlands


Why `$' and Braces?
    The BDFL said it best[3]: "The $ means 'substitution' in so many
    languages besides Perl that I wonder where you've been. [...]
    We're copying this from the shell."


Comparison to PEP 215

    PEP 215 describes an alternate proposal for string interpolation.
    Unlike that PEP, this one does not propose any new syntax for
    Python.  All the proposed new features are embodied in a new
    library module.  PEP 215 proposes a new string prefix
    representation such as $"" which signals to Python that a new
    type of string is present.  $-strings would have to interact with
    the existing r-prefixes and u-prefixes, essentially doubling the
    number of string prefix combinations.

    PEP 215 also allows for arbitrary Python expressions inside the
    $-strings, so that you could do things like:

        import sys
        print $"sys = $sys, sys = $sys.modules['sys']"

    which would print:

        sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

    It's generally accepted that the rules in PEP 215 are safe in the
    sense that they introduce no new security issues (see PEP 215,
    "Security Issues" for details).  However, the rules are still
    quite complex, and make it more difficult to see the substitution
    placeholder in the original $-string.

    The interesting thing is that the Template class defined in this
    PEP has nothing to say about the values that are substituted for
    the placeholders.  Thus, with a little extra work, it's possible
    to support PEP 215's functionality using existing Python syntax.

    For example, one could define subclasses of Template and dict that
    allowed for a more complex placeholder syntax and a mapping that
    evaluated those placeholders.


Internationalization

    The implementation supports internationalization by keeping the
    original string value intact.  In fact, all the work of the
    special substitution rules is implemented by overriding the
    __mod__() operator.  However, the string value of a Template (or
    SafeTemplate) is the string that was passed to its constructor.
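    The mechanism just described can be made concrete with a short
    sketch.  It is written in Python 3 rather than the Python 2.4 this
    PEP targets, so str stands in for unicode and str() for unicode().
    The regular expression, the _substitute() hook, and the error
    message are illustrative assumptions, not the reference
    implementation[4].  The $-rules from "A Simpler Proposal" are
    applied only inside __mod__(), so each instance still compares
    equal to the string passed to its constructor.

```python
import re


class Template(str):
    """Sketch of the proposed Template: a str whose value is the original
    $-string, with the $-rules applied only by the % operator."""

    # Hypothetical pattern; identifiers follow the ASCII rule of [2].
    pattern = re.compile(r"""
        \$(?:
            (?P<escaped>\$)                          # $$ is an escape for $
          | \{(?P<braced>[_A-Za-z][_A-Za-z0-9]*)\}   # ${identifier}
          | (?P<named>[_A-Za-z][_A-Za-z0-9]*)        # $identifier
          | (?P<invalid>)                            # any other use of $
        )
    """, re.VERBOSE)

    def __mod__(self, mapping):
        def convert(mo):
            if mo.group("escaped") is not None:
                return "$"
            name = mo.group("named") or mo.group("braced")
            if name is None:
                # $ at the end of the string or before an unsupported character
                raise ValueError("invalid placeholder at index %d" % mo.start())
            return self._substitute(mapping, name, mo.group(0))

        # The instance itself is never modified; a new plain str is returned.
        return self.pattern.sub(convert, self)

    def _substitute(self, mapping, name, placeholder):
        # Hypothetical hook: a missing key propagates as KeyError.
        return str(mapping[name])


class SafeTemplate(Template):
    """Same rules, but a missing key leaves the placeholder unchanged."""

    def _substitute(self, mapping, name, placeholder):
        try:
            return str(mapping[name])
        except KeyError:
            return placeholder


if __name__ == "__main__":
    mapping = dict(name="Guido", country="the Netherlands")
    print(Template("${name} was born in ${country}") % mapping)
    # Guido was born in the Netherlands
    print(SafeTemplate("$who was born in $country") % mapping)
    # $who was born in the Netherlands
```

    Folding the rules into one regular expression with a catch-all
    alternative means that any $ not covered by the three rules is
    reported as a ValueError rather than silently passed through.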

    Keeping the string value intact allows a gettext-based
    internationalized program to use the Template instance as a lookup
    key into the catalog; in fact, gettext doesn't care that the
    catalog key is a Template.  Because the value of the Template is
    the original $-string, translators also never need to use
    %-strings.  The right thing will happen at run-time.


Reference Implementation

    A SourceForge patch[4] is available which implements this
    proposal, including unit tests and documentation changes.


References

    [1] String Formatting Operations
        http://www.python.org/doc/current/lib/typesseq-strings.html

    [2] Identifiers and Keywords
        http://www.python.org/doc/current/ref/identifiers.html

    [3] Guido's python-dev posting from 21-Jul-2002
        http://mail.python.org/pipermail/python-dev/2002-July/026397.html

    [4] Reference Implementation
        http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: