PEP 292, Simpler String Substitutions, Warsaw

Barry Warsaw 2002-06-19 02:53:41 +00:00
parent d5c82d205f
commit 8f14274321
1 changed files with 267 additions and 0 deletions

pep-0292.txt Normal file
PEP: 292
Title: Simpler String Substitutions
Version: $Revision$
Last-Modified: $Date$
Author: barry@zope.com (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.3
Post-History:
Abstract
This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:
1. Python's current string substitution feature (commonly known as
%-substitutions) is complicated and error prone. This PEP is
simpler at the cost of less expressiveness.
2. PEP 215 proposed an alternative string interpolation feature,
introducing a new `$' string prefix. PEP 292 is simpler than
this because it involves no syntax changes and has much simpler
rules for what substitutions can occur in the string.
Rationale
Python currently supports a string substitution (a.k.a. string
interpolation) syntax based on C's printf() % formatting
character[1]. While quite rich, %-formatting codes are also quite
error prone, even for experienced Python programmers. A common
mistake is to leave off the trailing format character, e.g. the
`s' in "%(name)s".
In addition, the rules for what can follow a % sign are fairly
complex, while the usual application rarely needs such complexity.
A Simpler Proposal
Here we propose the addition of a new string method, called .sub(),
which performs substitution of mapping values into a string with
special substitution placeholders. These placeholders are
introduced with the $ character. The following rules for
$-placeholders apply:
1. $$ is an escape; it is replaced with a single $
2. $identifier names a substitution placeholder matching a mapping
key of "identifier". "identifier" must be a Python identifier
as defined in [2]. The first non-identifier character after
the $ character terminates this placeholder specification.
3. ${identifier} is equivalent to $identifier and for clarity,
this is the preferred form. It is required when valid
identifier characters follow the placeholder but are not part of
the placeholder, e.g. "${noun}ification".
No other characters have special meaning.
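
The three rules above can be sketched with a single regular
expression. This is only an illustration in modern Python 3, not the
proposed implementation: the function name substitute() and the
restriction to ASCII identifiers are assumptions made here for
brevity.

```python
import re

# One alternative per rule: $$, $identifier, ${identifier}.
# ASCII-only identifiers are an assumption made for this sketch.
PLACEHOLDER = re.compile(
    r'\$(?:(\$)|([_a-zA-Z][_a-zA-Z0-9]*)|\{([_a-zA-Z][_a-zA-Z0-9]*)\})')

def substitute(template, mapping):
    """Hypothetical helper applying the $-placeholder rules to template."""
    def replace(match):
        escaped, bare, braced = match.groups()
        if escaped is not None:
            return '$'                       # rule 1: $$ becomes $
        return str(mapping[bare or braced])  # rules 2 and 3
    return PLACEHOLDER.sub(replace, template)

print(substitute('$$5 for ${noun}ification of $who.',
                 {'noun': 'class', 'who': 'Guido'}))
# prints: $5 for classification of Guido.
```

Note how the braces in ${noun}ification keep the trailing identifier
characters out of the placeholder name, while $who ends at the first
non-identifier character (the period).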
The .sub() method takes an optional mapping (e.g. dictionary)
where the keys match placeholders in the string, and the values
are substituted for the placeholders. For example:
'${name} was born in ${country}'.sub({'name': 'Guido',
                                      'country': 'the Netherlands'})
returns
'Guido was born in the Netherlands'
The mapping argument is optional; if it is omitted then the
mapping is taken from the locals and globals of the context in
which the .sub() method is executed. For example:
def birth(self, name):
    country = self.countryOfOrigin[name]
    return '${name} was born in ${country}'.sub()
birth('Guido')
returns
'Guido was born in the Netherlands'
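
As a rough sketch of how the no-argument form could find its mapping,
the following modern Python 3 code merges the caller's globals and
locals using sys._getframe(), the same CPython-specific call used by
the reference implementation. The helper name caller_namespace() is
hypothetical, and ordinary %-formatting stands in for .sub() so that
the example runs without the proposed method.

```python
import sys

def caller_namespace(depth=1):
    """Hypothetical helper: merge the caller's globals and locals."""
    frame = sys._getframe(depth)      # CPython-specific frame access
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)  # locals take precedence
    return namespace

country = 'the USA'  # module-level name

def birth(name):
    country = 'the Netherlands'  # local name shadows the global
    return '%(name)s was born in %(country)s' % caller_namespace()

print(birth('Guido'))
# prints: Guido was born in the Netherlands
```

Because the locals are applied after the globals, a local variable
wins whenever both namespaces define the same name.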
Reference Implementation
Here's a Python 2.2-based reference implementation. Of course the
real implementation would be in C, would not require a string
subclass, and would not be modeled on the existing %-interpolation
feature.
import sys
import re
dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.I)
EMPTYSTRING = ''
class dstr(str):
    def sub(self, mapping=None):
        # Default mapping is locals/globals of caller
        if mapping is None:
            frame = sys._getframe(1)
            mapping = frame.f_globals.copy()
            mapping.update(frame.f_locals)
        # Escape %'s
        s = self.replace('%', '%%')
        # Convert $name and ${name} to %(name)s
        parts = dre.split(s)
        for i in range(1, len(parts), 4):
            if parts[i] is not None:
                parts[i] = '$'
            elif parts[i+1] is not None:
                parts[i+1] = '%(' + parts[i+1] + ')s'
            else:
                parts[i+2] = '%(' + parts[i+2] + ')s'
        # Interpolate
        return EMPTYSTRING.join(filter(None, parts)) % mapping
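
The stride of 4 in the loop above follows from how re.split() treats
capturing groups: each match contributes one list entry per group
(None for groups that did not participate), so literal text sits at
every fourth index. The short check below, written for modern
Python 3, shows this for the same pattern; the sample string is made
up for illustration.

```python
import re

dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.I)

parts = dre.split('a $$ b $x c ${y} d')
print(parts)
# ['a ', '$$', None, None, ' b ', None, 'x', None, ' c ', None, None, 'y', ' d']

# Indices 1, 5, 9 hold the "$$" group; i+1 and i+2 hold the two name groups.
print(list(range(1, len(parts), 4)))
# [1, 5, 9]
```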
And here are some examples:
s = dstr('${name} was born in ${country}')
print s.sub({'name': 'Guido',
             'country': 'the Netherlands'})
name = 'Barry'
country = 'the USA'
print s.sub()
This will print "Guido was born in the Netherlands" followed by
"Barry was born in the USA".
Handling Missing Keys
What should happen when one of the substitution keys is missing
from the mapping (or the locals/globals namespace if no argument
is given)? There are two possibilities:
- We can simply allow the exception (likely a NameError or
KeyError) to propagate.
- We can return the original substitution placeholder unchanged.
An example of the first is:
print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
would raise:
Traceback (most recent call last):
  File "sub.py", line 66, in ?
    print s.sub({'name': 'Bob'})
  File "sub.py", line 26, in sub
    return EMPTYSTRING.join(filter(None, parts)) % mapping
KeyError: country
An example of the second is:
print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
would print:
Bob was born in ${country}
The PEP author would prefer the latter interpretation, although a
case can be made for raising the exception instead. We could
almost ignore the issue, since the latter example could be
accomplished by passing in a "safe-dictionary" instead of a
normal dictionary, like so:
class safedict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return '${%s}' % key
so that
d = safedict({'name': 'Bob'})
print dstr('${name} was born in ${country}').sub(d)
would print:
Bob was born in ${country}
The one place where this won't work is when no arguments are given
to the .sub() method. .sub() wouldn't know whether to wrap
locals/globals in a safedict or not.
This ambiguity can be solved in several ways:
- we could have a parallel method called .safesub() which always
wrapped its argument in a safedict()
- .sub() could take an optional keyword argument flag which
indicates whether to wrap the argument in a safedict or not.
- .sub() could take an optional keyword argument which is a
callable that would get called with the original mapping and
return the mapping to be used for the substitution. By default,
this callable would be the identity function, but you could
easily pass in the safedict constructor instead.
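
To make the third option concrete, here is a minimal sketch in modern
Python 3. It is not part of the proposal: sub() is written as a plain
function rather than a string method, and the keyword name "wrap" is
an assumption chosen for this example. The default is the identity
function, so a missing key raises KeyError unless the caller passes
the safedict constructor.

```python
import re

_PLACEHOLDER = re.compile(r'\$(?:(\$)|([_a-zA-Z]\w*)|\{([_a-zA-Z]\w*)\})')

class safedict(dict):
    """Return the original placeholder text for missing keys."""
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return '${%s}' % key

def sub(template, mapping, wrap=lambda m: m):
    """Hypothetical function form of .sub(); 'wrap' is an assumed name."""
    mapping = wrap(mapping)  # identity by default, or e.g. safedict
    def replace(match):
        escaped, bare, braced = match.groups()
        if escaped is not None:
            return '$'
        return str(mapping[bare or braced])
    return _PLACEHOLDER.sub(replace, template)

template = '${name} was born in ${country}'
print(sub(template, {'name': 'Bob'}, wrap=safedict))
# prints: Bob was born in ${country}

try:
    sub(template, {'name': 'Bob'})
except KeyError as exc:
    print('KeyError:', exc)
# prints: KeyError: 'country'
```

With this shape the strict and the lenient behaviors share one code
path, and the choice between them is visible at the call site.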
BDFL proto-pronouncement: It should always raise a NameError when
the key is missing. There may not be sufficient use cases for soft
failures in the no-argument version.
Comparison to PEP 215
PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
string method. PEP 215 proposes a new string prefix
representation such as $"" which signals to Python that a new type
of string is present. $-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.
PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:
import sys
print $"sys = $sys, sys = $sys.modules['sys']"
which would print
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see what exactly is
the substitution placeholder in the original $-string.
By design, this PEP does not provide as much interpolation power
as PEP 215; however, it is expected that the no-argument version of
.sub() allows at least as much power with no loss of readability.
References
[1] String Formatting Operations
http://www.python.org/doc/current/lib/typesseq-strings.html
[2] Identifiers and Keywords
http://www.python.org/doc/current/ref/identifiers.html
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: