PEP 292, Simpler String Substitutions, Warsaw
parent d5c82d205f
commit 8f14274321

@ -0,0 +1,267 @
PEP: 292
Title: Simpler String Substitutions
Version: $Revision$
Last-Modified: $Date$
Author: barry@zope.com (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:

1. Python's current string substitution feature (commonly known as
   %-substitutions) is complicated and error-prone. This PEP is
   simpler, at the cost of some expressiveness.

2. PEP 215 proposed an alternative string interpolation feature,
   introducing a new `$' string prefix. PEP 292 is simpler than
   this because it involves no syntax changes and has much simpler
   rules for what substitutions can occur in the string.


Rationale

Python currently supports a string substitution (a.k.a. string
interpolation) syntax based on C's printf() % formatting
character[1]. While quite rich, %-formatting codes are also quite
error-prone, even for experienced Python programmers. A common
mistake is to leave off the trailing format character, e.g. the
`s' in "%(name)s".

In addition, the rules for what can follow a % sign are fairly
complex, while the usual application rarely needs such complexity.


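To make the missing-format-character mistake concrete, here is a
small sketch written for Python 3 (the variable names are purely
illustrative). Depending on what happens to follow the closing
parenthesis, the mistake either raises ValueError or quietly
produces the wrong text:

```python
data = {'name': 'Guido', 'count': 3}

# Correct: the trailing 's' completes the format specifier.
ok = '%(name)s was born' % data

# Mistake 1: the string ends before any conversion character appears,
# so the formatting operation raises ValueError.
try:
    '%(name)' % data
    error = None
except ValueError as exc:
    error = exc

# Mistake 2: ' d' is parsed as a space flag followed by the 'd'
# conversion, so "dogs" loses its first letter and no error is raised.
wrong = '%(count) dogs' % data

print(ok)           # Guido was born
print(type(error).__name__)   # ValueError
print(repr(wrong))  # ' 3ogs'
```

The second case is the more troublesome one, since nothing signals
that the output is wrong.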
A Simpler Proposal

Here we propose the addition of a new string method, called .sub(),
which substitutes mapping values into a string containing
special substitution placeholders. These placeholders are
introduced with the $ character. The following rules for
$-placeholders apply:

1. $$ is an escape; it is replaced with a single $.

2. $identifier names a substitution placeholder matching a mapping
   key of "identifier". "identifier" must be a Python identifier
   as defined in [2]. The first non-identifier character after
   the $ character terminates this placeholder specification.

3. ${identifier} is equivalent to $identifier and, for clarity,
   is the preferred form. It is required when valid
   identifier characters follow the placeholder but are not part of
   the placeholder, e.g. "${noun}ification".

No other characters have special meaning.

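The three rules above can be sketched in a few lines of Python 3.
This is only an illustration: the `substitute` function and the
`PLACEHOLDER` pattern are hypothetical stand-ins for the proposed
.sub() method, and the pattern accepts ASCII-initial identifiers
only.

```python
import re

# Hypothetical pattern mirroring the rules: $$, $identifier, ${identifier}.
PLACEHOLDER = re.compile(r'\$(?:(\$)|([_a-zA-Z]\w*)|\{([_a-zA-Z]\w*)\})')

def substitute(template, mapping):
    """Illustrative stand-in for the proposed .sub() method."""
    def replace(match):
        escaped, bare, braced = match.groups()
        if escaped:
            return '$'                       # rule 1: $$ becomes $
        return str(mapping[bare or braced])  # rules 2 and 3
    return PLACEHOLDER.sub(replace, template)

values = {'noun': 'object', 'price': 5}
print(substitute('Cost: $$$price', values))    # Cost: $5
print(substitute('$noun-oriented', values))    # object-oriented
print(substitute('${noun}ification', values))  # objectification
```

In the second call the hyphen is not an identifier character, so it
ends the placeholder. In the third, the braces are what keep
"ification" from being read as part of the name.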
The .sub() method takes an optional mapping (e.g. dictionary)
where the keys match placeholders in the string, and the values
are substituted for the placeholders. For example:

    '${name} was born in ${country}'.sub({'name': 'Guido',
                                          'country': 'the Netherlands'})

returns

    'Guido was born in the Netherlands'

The mapping argument is optional; if it is omitted then the
mapping is taken from the locals and globals of the context in
which the .sub() method is executed. For example:

    def birth(self, name):
        country = self.countryOfOrigin[name]
        return '${name} was born in ${country}'.sub()

    birth('Guido')

returns (assuming countryOfOrigin maps 'Guido' to 'the Netherlands')

    'Guido was born in the Netherlands'


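How a default mapping might be built from the caller's namespace can
be sketched as follows, in Python 3. The helper name
`caller_namespace` is an assumption made for illustration, and
sys._getframe() is a CPython implementation detail, so this is a
sketch under those assumptions and not a portable recipe.

```python
import sys

def caller_namespace():
    """Return the caller's globals overlaid with its locals."""
    frame = sys._getframe(1)          # frame of whoever called us
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)  # locals shadow globals
    return namespace

country = 'the USA'   # outer name, shadowed inside birth()

def birth(name):
    country = 'the Netherlands'   # local name wins over the outer one
    namespace = caller_namespace()
    return '%(name)s was born in %(country)s' % namespace

print(birth('Guido'))   # Guido was born in the Netherlands
```

Updating the copy of the globals with the locals gives local names
precedence, which matches ordinary name lookup for these two scopes.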
Reference Implementation

Here's a Python 2.2-based reference implementation. Of course the
real implementation would be in C, would not require a string
subclass, and would not be modeled on the existing %-interpolation
feature.

    import sys
    import re

    # One group each for $$, $identifier and ${identifier}
    dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.I)
    EMPTYSTRING = ''

    class dstr(str):
        def sub(self, mapping=None):
            # Default mapping is locals/globals of caller
            if mapping is None:
                frame = sys._getframe(1)
                mapping = frame.f_globals.copy()
                mapping.update(frame.f_locals)
            # Escape %'s
            s = self.replace('%', '%%')
            # Convert $name and ${name} to %(name)s
            parts = dre.split(s)
            for i in range(1, len(parts), 4):
                if parts[i] is not None:
                    parts[i] = '$'
                elif parts[i+1] is not None:
                    parts[i+1] = '%(' + parts[i+1] + ')s'
                else:
                    parts[i+2] = '%(' + parts[i+2] + ')s'
            # Interpolate
            return EMPTYSTRING.join(filter(None, parts)) % mapping

And here are some examples:

    s = dstr('${name} was born in ${country}')
    print s.sub({'name': 'Guido',
                 'country': 'the Netherlands'})

    name = 'Barry'
    country = 'the USA'
    print s.sub()

This will print "Guido was born in the Netherlands" followed by
"Barry was born in the USA".


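The loop in the reference implementation walks the split result four
items at a time. The Python 3 snippet below shows why, using the same
pattern on a sample string chosen for illustration: with three
capturing groups, re.split() returns one text segment followed by
three group slots per match, and the groups that did not take part in
a match come back as None.

```python
import re

# Same pattern as the reference implementation: three capturing groups.
dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.I)

parts = dre.split('$$5 for ${name}, $title')
print(parts)
# ['', '$$', None, None, '5 for ', None, None, 'name', ', ', None, 'title', None, '']

# Index 1, 5, 9, ... is the $$ group of each match; the next two slots
# hold the $identifier and ${identifier} groups respectively.
for i in range(1, len(parts), 4):
    print(parts[i:i + 3])
```

Literal text therefore sits at indices 0, 4, 8 and so on, and is left
untouched by the loop.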
Handling Missing Keys

What should happen when one of the substitution keys is missing
from the mapping (or the locals/globals namespace if no argument
is given)? There are two possibilities:

- We can simply allow the exception (likely a NameError or
  KeyError) to propagate.

- We can return the original substitution placeholder unchanged.

Under the first option,

    print dstr('${name} was born in ${country}').sub({'name': 'Bob'})

would raise:

    Traceback (most recent call last):
      File "sub.py", line 66, in ?
        print s.sub({'name': 'Bob'})
      File "sub.py", line 26, in sub
        return EMPTYSTRING.join(filter(None, parts)) % mapping
    KeyError: country

Under the second option,

    print dstr('${name} was born in ${country}').sub({'name': 'Bob'})

would print:

    Bob was born in ${country}

The PEP author would prefer the latter interpretation, although a
case can be made for raising the exception instead. We could
almost ignore the issue, since the latter example could be
accomplished by passing in a "safe-dictionary" instead of a
normal dictionary, like so:

    class safedict(dict):
        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                return '${%s}' % key

so that

    d = safedict({'name': 'Bob'})
    print dstr('${name} was born in ${country}').sub(d)

would print:

    Bob was born in ${country}

The one place where this won't work is when no arguments are given
to the .sub() method. .sub() wouldn't know whether to wrap
locals/globals in a safedict or not.

This ambiguity can be solved in several ways:

- We could have a parallel method called .safesub() which always
  wraps its argument in a safedict().

- .sub() could take an optional keyword argument flag which
  indicates whether to wrap the argument in a safedict or not.

- .sub() could take an optional keyword argument which is a
  callable that would get called with the original mapping and
  return the mapping to be used for the substitution. By default,
  this callable would be the identity function, but you could
  easily pass in the safedict constructor instead.

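The third option might look roughly like the Python 3 sketch below.
Since no .sub() method exists, the sketch borrows string.Template
from the standard library as a stand-in that follows closely related
$-placeholder rules. The free function `sub` and its `wrap` keyword
are hypothetical names chosen for illustration, not part of this
proposal.

```python
from string import Template

class safedict(dict):
    """Return the placeholder itself when a key is missing."""
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return '${%s}' % key

def sub(template, mapping, wrap=lambda mapping: mapping):
    """Substitute after passing the mapping through `wrap`.

    `wrap` defaults to the identity function, so a missing key
    raises KeyError unless the caller opts in to soft failure.
    """
    return Template(template).substitute(wrap(mapping))

text = '${name} was born in ${country}'

# Soft failure: the missing placeholder is left in place.
print(sub(text, {'name': 'Bob'}, wrap=safedict))   # Bob was born in ${country}

# Default: the exception propagates.
try:
    sub(text, {'name': 'Bob'})
except KeyError as exc:
    print('KeyError:', exc)
```

One appeal of this design is that the method itself never needs to
know about safedict; any callable returning a mapping would do.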
BDFL proto-pronouncement: It should always raise a NameError when
the key is missing. There may not be sufficient use case for soft
failures in the no-argument version.


Comparison to PEP 215

PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
string method. PEP 215 proposes a new string prefix
representation such as $"" which signals to Python that a new type
of string is present. $-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.

PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:

    import sys
    print $"sys = $sys, sys = $sys.modules['sys']"

which would print

    sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see what exactly is
the substitution placeholder in the original $-string.

By design, this PEP does not provide as much interpolation power
as PEP 215; however, it is expected that the no-argument version of
.sub() allows at least as much power with no loss of readability.


References

[1] String Formatting Operations
    http://www.python.org/doc/current/lib/typesseq-strings.html

[2] Identifiers and Keywords
    http://www.python.org/doc/current/ref/identifiers.html


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: