PEP: 292
Title: Simpler String Substitutions
Version: $Revision$
Last-Modified: $Date$
Author: barry@python.org (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004
Abstract
This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:
1. Python's current string substitution feature
(i.e. %-substitution) is complicated and error prone. This PEP
is simpler at the cost of some expressiveness.
2. PEP 215 proposed an alternative string interpolation feature,
introducing a new `$' string prefix. PEP 292 is simpler than
this because it involves no syntax changes and has much simpler
rules for what substitutions can occur in the string.
Rationale
Python currently supports a string substitution syntax based on
C's printf() '%' formatting character[1]. While quite rich,
%-formatting codes are also error prone, even for
experienced Python programmers. A common mistake is to leave off
the trailing format character, e.g. the `s' in "%(name)s".
In addition, the rules for what can follow a % sign are fairly
complex, while the usual application rarely needs such complexity.
Most scripts need to do some string interpolation, but most of
those use simple `stringification' formats, i.e. %s or %(name)s.
This form should be made simpler and less error prone.
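As a small illustration of the mistake described above, the snippet
below drops the trailing `s' from a named format. It is a sketch run
under Python 3 (an assumption; the PEP itself targets Python 2.4),
and the dictionary contents are made up for the example.

```python
values = {"name": "Guido"}

# Correct: the trailing 's' completes the format specifier.
good = "Hello %(name)s" % values

# Mistake: the trailing 's' is missing, so the specifier is incomplete
# and the % operator fails at run-time instead of producing a string.
try:
    bad = "Hello %(name)" % values
except ValueError:
    bad = None
```

The error only shows up when the line is executed, which is part of
what makes this form error prone.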
A Simpler Proposal
We propose the addition of a new class -- called 'Template', which
will live in the string module -- derived from the built-in
unicode type. The Template class supports new rules for string
substitution; its value contains placeholders, introduced with the
$ character. The following rules for $-placeholders apply:
1. $$ is an escape; it is replaced with a single $
2. $identifier names a substitution placeholder matching a mapping
key of "identifier". By default, "identifier" must spell a
Python identifier as defined in [2]. The first non-identifier
character after the $ character terminates this placeholder
specification.
3. ${identifier} is equivalent to $identifier. It is required
when valid identifier characters follow the placeholder but are
not part of the placeholder, e.g. "${noun}ification".
If the $ character appears at the end of the line, or is followed
by any other character than those described above, it is treated
as if it had been escaped, appearing in the resulting string
unchanged. NOTE: see open issues below.
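The three rules, together with the pass-through treatment of a bare
$, could be expressed as one regular expression with a group per
rule. This is only a sketch, written for Python 3; the names
_PLACEHOLDER and substitute are hypothetical and not part of the
proposal, and mapping values are assumed to be strings already.

```python
import re

# Hypothetical pattern: one alternative per rule above.
_PLACEHOLDER = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                         # rule 1: $$
      | (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)       # rule 2: $identifier
      | \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}  # rule 3: ${identifier}
    )
""", re.VERBOSE)


def substitute(template, mapping):
    """Replace $-placeholders in template with values from mapping."""
    def replace(match):
        if match.group("escaped") is not None:
            return "$"
        key = match.group("named") or match.group("braced")
        return mapping[key]  # a missing key raises KeyError
    # A $ that matches none of the alternatives is never passed to
    # replace(), so it appears in the result unchanged.
    return _PLACEHOLDER.sub(replace, template)


result = substitute(
    "$$5 for ${noun}ification by $who",
    {"noun": "class", "who": "Guido"},
)
untouched = substitute("$100 and a trailing $", {})
```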
No other characters have special meaning; however, it is possible
to derive from the Template class to define different rules for
the placeholder. For example, a derived class could allow for
periods in the placeholder (e.g. to support a kind of dynamic
namespace and attribute path lookup).
Once the Template has been created, substitutions can be performed
using traditional Python syntax. For example:
>>> from string import Template
>>> mapping = dict(name='Guido', country='the Netherlands')
>>> s = Template('${name} was born in ${country}')
>>> print s % mapping
Guido was born in the Netherlands
Another class is provided which derives from Template. This class
is called 'SafeTemplate' and supports rules identical to those
above. The difference between Template instances and SafeTemplate
instances is that in SafeTemplate if a placeholder is missing from
the interpolation mapping, no KeyError is raised. Instead, the
original placeholder is included in the result string unchanged.
For example:
>>> from string import Template, SafeTemplate
>>> mapping = dict(name='Guido', country='the Netherlands')
>>> s = Template('$who was born in $country')
>>> print s % mapping
Traceback (most recent call last):
[...traceback omitted...]
KeyError: u'who'
>>> s = SafeTemplate('$who was born in $country')
>>> print s % mapping
$who was born in the Netherlands
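The difference between the two classes can be tried out today with a
small adapter. The sketch below is not the implementation proposed
in this PEP: it assumes Python 3, derives from str rather than
unicode, and delegates to the string.Template class found in current
Python releases, which exposes substitute() and safe_substitute()
methods instead of the % operator. That class also converts values
with str() and rejects a stray $ in substitute(), so it differs from
the rules above in those two respects.

```python
import string


class Template(str):
    """Sketch: a string subclass whose % operator fills $-placeholders."""

    def __mod__(self, mapping):
        return string.Template(self).substitute(mapping)


class SafeTemplate(Template):
    """Sketch: like Template, but missing placeholders are left in place."""

    def __mod__(self, mapping):
        return string.Template(self).safe_substitute(mapping)


mapping = dict(name="Guido", country="the Netherlands")

try:
    strict_result = Template("$who was born in $country") % mapping
except KeyError as exc:
    strict_result = None
    missing_key = exc.args[0]

safe_result = SafeTemplate("$who was born in $country") % mapping
```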
Why `$' and Braces?
The BDFL said it best[3]: The $ means "substitution" in so many
languages besides Perl that I wonder where you've been. [...]
We're copying this from the shell.
Comparison to PEP 215
PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
library module. PEP 215 proposes a new string prefix
representation such as $"" which signals to Python that a new type
of string is present. $-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.
PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:
import sys
print $"sys = $sys, sys = $sys.modules['sys']"
which would print
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see the substitution
placeholder in the original $-string.
The interesting thing is that the Template class defined in this
PEP has nothing to say about the values that are substituted for
the placeholders. Thus, with a little extra work, it's possible
to support PEP 215's functionality using existing Python syntax.
For example, one could define subclasses of Template and dict that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
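A rough idea of such a pairing, limited to attribute paths rather
than PEP 215's full expressions, is sketched below. It assumes
Python 3 and the string.Template class of current releases (whose
idpattern attribute and substitute() method are used here in place
of the % operator proposed in this PEP). DottedTemplate and
AttrPathMapping are hypothetical names invented for the example.

```python
import os
from string import Template


class DottedTemplate(Template):
    # Identifiers joined by single periods, e.g. os.path.sep.
    idpattern = r"[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*"


class AttrPathMapping(dict):
    """Hypothetical mapping that resolves 'a.b.c' via attribute access."""

    def __missing__(self, path):
        head, _, rest = path.partition(".")
        if not rest:
            # A plain name that is simply absent.
            raise KeyError(path)
        value = self[head]
        for name in rest.split("."):
            value = getattr(value, name)
        return value


namespace = AttrPathMapping(os=os)
text = DottedTemplate("Paths are joined with $os.path.sep").substitute(
    namespace
)
```

The template class only decides what a placeholder looks like; all
of the lookup logic lives in the mapping.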
Internationalization
The implementation supports internationalization magic by keeping
the original string value intact. In fact, all the work of the
special substitution rules is implemented by overriding the
__mod__() method. However, the string value of a Template (or
SafeTemplate) is the string that was passed to its constructor.
This approach allows a gettext-based internationalized program to
use the Template instance as a lookup into the catalog; in fact
gettext doesn't care that the catalog key is a Template. Because
the value of the Template is the original $-string, translators
also never need to use %-strings. The right thing will happen at
run-time.
Open Issues
- Should the Template and SafeTemplate classes convert mapping
values to strings (or unicodes)? I.e. what should this code do:
>>> from string import Template
>>> Template('The cost was $amount euros') % {'amount': 7}
Should this raise an exception such as TypeError, or should this
return the string 'The cost was 7 euros'?
PEP author preference: no automatic stringification.
- The pattern for placeholders in the Template and SafeTemplate
classes matches Python identifiers. Some people want to match
Python attribute paths, e.g. "$os.path.sep". This can be useful
in some applications, however note that it is up to the
interpolation mapping to provide namespace lookup for the
attribute paths.
Should we include AttributeTemplate and SafeAttributeTemplate in
the standard library? What about more complex patterns such as
Python expressions?
PEP author preference: No, we don't include them for now. Such
classes are easily derived, and besides, we're not proposing to
include any interpolation mappings, and without such a
specialized mapping, a pattern matching attribute paths or
expressions isn't useful.
- Where do the Template and SafeTemplate classes live? Some
people have suggested creating a stringtools or stringlib module
to house these two classes. The PEP author has proposed a
re-organization of the existing string module, turning it into a
string package.
PEP author preference: There seems to be little consensus around
either suggestion, and since the classes are just a few lines of
Python, I propose no string module re-organization, but to add
these two classes to string.py.
- Should the $-placeholder rules be more strict? Specifically,
objections have been raised about 'magically' escaping $'s at
the end of strings, or in strings like '$100'. The suggestion
was that we add another matching group which matches bare $'s,
raising a ValueError if we find such a match.
PEP author preference: This sounds fine to me, although because
the pattern is part of the public interface for the class, we
will have to document that 4 groups are expected instead of 3.
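To make two of the author preferences above concrete, the sketch
below adds a fourth group that catches a bare $ and raises
ValueError, and refuses non-string values instead of converting
them. It assumes Python 3; the names _STRICT, strict_substitute and
the group name "bogus" are hypothetical, and the pattern is not
taken from any actual implementation.

```python
import re

_STRICT = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                         # $$
      | (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)       # $identifier
      | \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}  # ${identifier}
      | (?P<bogus>)                             # fourth group: bare $
    )
""", re.VERBOSE)


def strict_substitute(template, mapping):
    """Substitute placeholders; reject bare $ and non-string values."""
    def replace(match):
        if match.group("bogus") is not None:
            raise ValueError("bare $ at offset %d" % match.start())
        if match.group("escaped") is not None:
            return "$"
        key = match.group("named") or match.group("braced")
        value = mapping[key]
        if not isinstance(value, str):
            # No automatic stringification.
            raise TypeError("value for %r is not a string" % key)
        return value
    return _STRICT.sub(replace, template)


ok = strict_substitute("$$5 for $who", {"who": "Guido"})
```

With this pattern, '$100' and a trailing '$' both raise ValueError,
and {'amount': 7} raises TypeError.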
References
[1] String Formatting Operations
http://www.python.org/doc/current/lib/typesseq-strings.html
[2] Identifiers and Keywords
http://www.python.org/doc/current/ref/identifiers.html
[3] Guido's python-dev posting from 21-Jul-2002
http://mail.python.org/pipermail/python-dev/2002-July/026397.html
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: