Updated based on recent Pycon discussions, with a pointer to the SF patch
containing the reference implementation.
parent b2436faf3b
commit fa6ff92955

313 pep-0292.txt
@@ -6,8 +6,8 @@ Author: barry@python.org (Barry A. Warsaw)
 Status: Draft
 Type: Standards Track
 Created: 18-Jun-2002
-Python-Version: 2.3
-Post-History: 18-Jun-2002
+Python-Version: 2.4
+Post-History: 18-Jun-2002, 23-Mar-2004
 
 
 Abstract
@@ -16,30 +16,26 @@ Abstract
 known as string interpolation. This PEP is "simpler" in two
 respects:
 
-1. Python's current string substitution feature (commonly known as
-%-substitutions) is complicated and error prone. This PEP is
-simpler at the cost of less expressiveness.
+1. Python's current string substitution feature
+(i.e. %-substitution) is complicated and error prone. This PEP
+is simpler at the cost of some expressiveness.
 
 2. PEP 215 proposed an alternative string interpolation feature,
 introducing a new `$' string prefix. PEP 292 is simpler than
 this because it involves no syntax changes and has much simpler
 rules for what substitutions can occur in the string.
 
 
 Rationale
 
-Python currently supports a string substitution (a.k.a. string
-interpolation) syntax based on C's printf() % formatting
-character[1]. While quite rich, %-formatting codes are also quite
-error prone, even for experienced Python programmers. A common
-mistake is to leave off the trailing format character, e.g. the
-`s' in "%(name)s".
+Python currently supports a string substitution syntax based on
+C's printf() '%' formatting character[1]. While quite rich,
+%-formatting codes are also error prone, even for
+experienced Python programmers. A common mistake is to leave off
+the trailing format character, e.g. the `s' in "%(name)s".
 
 In addition, the rules for what can follow a % sign are fairly
-complex, while the usual application rarely needs such
-complexity. Also error prone is the right-hand side of the %
-operator: e.g. singleton tuples.
-
+complex, while the usual application rarely needs such complexity.
 Most scripts need to do some string interpolation, but most of
 those use simple `stringification' formats, i.e. %s or %(name)s
 This form should be made simpler and less error prone.
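Both %-formatting pitfalls named in the Rationale are easy to reproduce: leaving off the trailing conversion character, and passing a tuple on the right-hand side of `%`. The sketch below uses Python 3 rather than the Python 2 of the PEP. `format_or_error` is a hypothetical helper that records which exception each case raises.

```python
def format_or_error(fmt, args):
    """Apply %-formatting; return the result, or the exception's type name."""
    try:
        return fmt % args
    except (ValueError, TypeError) as exc:
        return type(exc).__name__


# Pitfall 1: leaving off the trailing conversion character ('s').
missing_s = format_or_error("Hello %(name)", {"name": "Guido"})
with_s = format_or_error("Hello %(name)s", {"name": "Guido"})

# Pitfall 2: a tuple on the right-hand side of % is treated as the
# argument list, so a single tuple value must be wrapped in a 1-tuple.
point = (3, 4)
unwrapped = format_or_error("point = %s", point)
wrapped = format_or_error("point = %s", (point,))

print(missing_s, with_s, unwrapped, wrapped, sep="\n")
```

Running it prints `ValueError`, `Hello Guido`, `TypeError` and `point = (3, 4)`, one per line.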
@@ -47,226 +43,53 @@ Rationale
 
 A Simpler Proposal
 
-Here we propose the addition of a new string method, called .sub()
-which performs substitution of mapping values into a string with
-special substitution placeholders. These placeholders are
+We propose the addition of a new class -- called 'dstring' --
+derived from the built-in unicode type, which supports new rules
+for string substitution. dstring's value contains placeholders,
 introduced with the $ character. The following rules for
 $-placeholders apply:
 
 1. $$ is an escape; it is replaced with a single $
 
 2. $identifier names a substitution placeholder matching a mapping
-key of "identifier". "identifier" must be a Python identifier
-as defined in [2]. The first non-identifier character after
-the $ character terminates this placeholder specification.
+key of "identifier". By default, "identifier" must spell a
+Python identifier as defined in [2]. The first non-identifier
+character after the $ character terminates this placeholder
+specification.
 
 3. ${identifier} is equivalent to $identifier. It is required for
 when valid identifier characters follow the placeholder but are
 not part of the placeholder, e.g. "${noun}ification".
 
-No other characters have special meaning.
+No other characters have special meaning, however it is possible
+to derive from the dstring class to define different rules for the
+placeholder. For example, a derived class could allow for periods
+in the placeholder (e.g. to support a kind of dynamic namespace
+and attribute path lookup).
 
-The .sub() method takes an optional mapping (e.g. dictionary)
-where the keys match placeholders in the string, and the values
-are substituted for the placeholders. For example:
+Once the dstring has been created, substitutions can be performed
+using traditional Python syntax. For example:
 
-'${name} was born in ${country}'.sub({'name': 'Guido',
-'country': 'the Netherlands'})
-
-returns
-
-'Guido was born in the Netherlands'
+>>> mapping = dict(name='Guido', country='the Netherlands')
+>>> s = dstring('${name} was born in ${country})
+>>> print s % mapping
+Guido was born in the Netherlands
 
-The mapping argument is optional; if it is omitted then the
-mapping is taken from the locals and globals of the context in
-which the .sub() method is executed. For example:
-
-def birth(self, name):
-    country = self.countryOfOrigin[name]
-    return '${name} was born in ${country}'.sub()
-
-birth('Guido')
-
-returns
-
-'Guido was born in the Netherlands'
-
 
 Why `$' and Braces?
 
 The BDFL said it best: The $ means "substitution" in so many
 languages besides Perl that I wonder where you've been. [...]
 We're copying this from the shell.
 
 
-Security Issues
-
-Never use no-arg .sub() on strings that come from untrusted
-sources. It could be used to gain unauthorized information about
-variables in your local or global scope.
-
-
 Reference Implementation
 
-Here's a Python 2.2-based reference implementation. Of course the
-real implementation would be in C, would not require a string
-subclass, and would not be modeled on the existing %-interpolation
-feature.
-
-import sys
-import re
-
-class dstr(str):
-    def sub(self, mapping=None):
-        # Default mapping is locals/globals of caller
-        if mapping is None:
-            frame = sys._getframe(1)
-            mapping = frame.f_globals.copy()
-            mapping.update(frame.f_locals)
-        def repl(m):
-            return mapping[m.group(m.lastindex)]
-        return re.sub(r'\$(?:([_a-z]\w*)|\{([_a-z]\w*)\})', repl, self)
-
-And here are some examples:
-
-s = dstr('${name} was born in ${country}')
-print s.sub({'name': 'Guido',
-'country': 'the Netherlands'})
-
-name = 'Barry'
-country = 'the USA'
-print s.sub()
-
-This will print "Guido was born in the Netherlands" followed by
-"Barry was born in the USA".
-
-
-Handling Missing Keys
-
-What should happen when one of the substitution keys is missing
-from the mapping (or the locals/globals namespace if no argument
-is given)? There are two possibilities:
-
-- We can simply allow the exception.
-
-- We can return the original substitution placeholder unchanged.
-
-An example of the first is:
-
-print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
-
-would raise:
-
-Traceback (most recent call last):
-  File "sub.py", line 66, in ?
-    print s.sub({'name': 'Bob'})
-  File "sub.py", line 26, in sub
-    return EMPTYSTRING.join(filter(None, parts)) % mapping
-KeyError: country
-
-An example of the second is:
-
-print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
-
-would print:
-
-Bob was born in ${country}
-
-We could almost ignore the issue, since the latter example could
-be accomplished by passing in a "safe-dictionary" in instead of a
-normal dictionary, like so:
-
-class safedict(dict):
-    def __getitem__(self, key):
-        try:
-            return dict.__getitem__(self, key)
-        except KeyError:
-            return '${%s}' % key
-
-so that
-
-d = safedict({'name': 'Bob'})
-print dstr('${name} was born in ${country}').sub(d)
-
-would print:
-
-Bob was born in ${country}
-
-The one place where this won't work is when no arguments are given
-to the .sub() method. .sub() wouldn't know whether to wrap
-locals/globals in a safedict or not.
-
-This ambiguity can be solved in several ways:
-
-- we could have a parallel method called .safesub() which always
-wrapped its argument in a safedict()
-
-- .sub() could take an optional keyword argument flag which
-indicates whether to wrap the argument in a safedict or not.
-
-- .sub() could take an optional keyword argument which is a
-callable that would get called with the original mapping and
-return the mapping to be used for the substitution. By default,
-this callable would be the identity function, but you could
-easily pass in the safedict constructor instead.
-
-BDFL proto-pronouncement: Strongly in favor of raising the
-exception, with KeyError when a dict is used and NameError when
-locals/globals are used. There may not be sufficient use case for
-soft failures in the no-argument version.
-
-
-Open Issues, Comments, and Suggestions
-
-- Ka-Ping Yee makes the suggestion that .sub() should take keyword
-arguments instead of a dictionary, and that if a dictionary was
-to be passed in it should be done with **dict. For example:
-
-s = '${name} was born in ${country}'
-print s.sub(name='Guido', country='the Netherlands')
-
-or
-
-print s.sub(**{'name': 'Guido', 'country': 'the Netherlands'})
-
-- Paul Prescod wonders whether having a method use sys._getframe()
-doesn't set a bad precedent.
-
-- Oren Tirosh suggests that .sub() take an optional argument which
-would be used as a default for missing keys. If the optional
-argument were not given, an exception would be raised. This may
-not play well with Ka-Ping's suggestion.
-
-- Other suggestions have been made as an alternative to a string
-method including: a builtin function, a function in a module, an
-operator (similar to "string % dict", e.g. "string / dict").
-One strong argument for making it a built-in is given by Paul
-Prescod:
-
-"I really hate putting things in modules that will be needed in
-a Python programmer's second program (the one after "Hello
-world"). If this is to be the *simpler* way of doing
-introspection then getting at it should be simpler than getting
-at "%". $ is taught in hour 2, import is taught on day 2.
-Some people may never make it to the metaphorical day 2 if they
-are doing simple text processing in some kind of
-embedded-Python environment."
-
-- Should we take a cue from the `make' program and allow $(name)
-as an alternative (or instead of) ${name}?
-
-- Should we require a dictionary to the .sub() method? Some
-people feel that it could be a security risk allowing implicit
-access to globals/locals, even with the proper admonitions in
-the documentation. In that case, a new built-in would be
-necessary (because none of globals(), locals(), or vars() does
-the right the w.r.t. nested scopes, etc.). Chirstian Tismer
-has suggested allvars(). Perhaps allvars() should be a method
-on a frame object (too?)?
-
-- It has been suggested that using $ at all violates TOOWTDI.
-Some other suggestions include using the % sign in the
-following way: %{name}
+A reference implementation is available at [4]. The
+implementation contains the dstring class described above,
+situated in a new standard library package called 'stringlib'.
+Inside the reference implementation stringlib package are a few
+other related nifty tools that aren't described in this PEP.
 
 
 Comparison to PEP 215
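The three $-placeholder rules in "A Simpler Proposal" can be sketched as a single regular-expression substitution. The sketch also covers both missing-key behaviours from "Handling Missing Keys". It is a hypothetical standalone Python 3 function, not the dstring class and not the removed .sub() method. The `substitute` name and its `safe` flag are assumptions made for illustration. Identifiers are restricted to ASCII letters, digits and underscores, which is a simplification of the full identifier definition the PEP cites.

```python
import re

# Rule 1: "$$" is an escape.  Rule 2: "$identifier".  Rule 3: "${identifier}".
# ASCII-only identifiers are an assumption of this sketch.
_PLACEHOLDER = re.compile(
    r"""
    \$(?:
        (?P<escaped>\$)                          # rule 1: $$ -> $
      | (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)        # rule 2: $identifier
      | \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}   # rule 3: ${identifier}
    )
    """,
    re.VERBOSE,
)


def substitute(template, mapping, safe=False):
    """Replace $-placeholders in template with values from mapping.

    With safe=False a missing key raises KeyError (the behaviour the
    proto-pronouncement favours).  With safe=True the placeholder is left
    unchanged, which is what the safedict workaround achieves.
    """
    def repl(match):
        if match.group("escaped") is not None:
            return "$"
        key = match.group("named") or match.group("braced")
        if safe and key not in mapping:
            return match.group(0)  # leave "$key" / "${key}" as written
        return str(mapping[key])  # KeyError propagates when safe is False

    return _PLACEHOLDER.sub(repl, template)


people = {"name": "Guido", "country": "the Netherlands"}
print(substitute("${name} was born in ${country}", people))
print(substitute("${noun}ification", {"noun": "string"}))
print(substitute("costs $$5", {}))
print(substitute("${name} was born in ${country}", {"name": "Bob"}, safe=True))
```

The pattern in the removed dstr sketch only accepts identifiers that start with `_` or a lowercase letter, and it has no branch for `$$`. This pattern covers all three rules. A `$` that begins none of the three forms (for example `$5`) is left untouched. That is a choice made by this sketch, not something the rules specify.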
@@ -274,7 +97,7 @@ Comparison to PEP 215
 PEP 215 describes an alternate proposal for string interpolation.
 Unlike that PEP, this one does not propose any new syntax for
 Python. All the proposed new features are embodied in a new
-string method. PEP 215 proposes a new string prefix
+library module. PEP 215 proposes a new string prefix
 representation such as $"" which signal to Python that a new type
 of string is present. $-strings would have to interact with the
 existing r-prefixes and u-prefixes, essentially doubling the
@@ -283,47 +106,44 @@ Comparison to PEP 215
 PEP 215 also allows for arbitrary Python expressions inside the
 $-strings, so that you could do things like:
 
 import sys
 print $"sys = $sys, sys = $sys.modules['sys']"
 
 which would return
 
 sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
 
 It's generally accepted that the rules in PEP 215 are safe in the
 sense that they introduce no new security issues (see PEP 215,
 "Security Issues" for details). However, the rules are still
-quite complex, and make it more difficult to see what exactly is
-the substitution placeholder in the original $-string.
+quite complex, and make it more difficult to see the substitution
+placeholder in the original $-string.
 
-By design, this PEP does not provide as much interpolation power
-as PEP 215, however it is expected that the no-argument version of
-.sub() allows at least as much power with no loss of readability.
+The interesting thing is that the dstring class defined in this
+PEP has nothing to say about the values that are substituted for
+the placeholders. Thus, with a little extra work, it's possible
+to support PEP 215's functionality using existing Python syntax.
+
+For example, one could define a subclass of dict that allowed a
+more complex placeholder syntax and a mapping that evaluated those
+placeholders.
 
 
-BDFL Weathervane
+Internationalization
 
-Guido lays out[3] what he feels are the real issues that need to
-be fleshed out in this PEP:
+The reference implementation accomplishes this magic by parsing
+the constructor string, transforming $-strings into standard
+Python %-strings. dstring caches this value and uses it whenever
+the special __mod__() method is called via the % operator.
+However the string value of a dstring is the string that was
+passed to its constructor.
 
-- Compile-time vs. run-time parsing. I've become convinced that
-the compiler should do the parsing: this is the only way to make
-access to variables in nested scopes work, avoids security
-issues, and makes it easier to diagnose errors (e.g. in
-PyChecker).
-
-- How to support translation. Here the template must be replaced
-at run-time, but it is still desirable that the collection of
-available names is known at compile time (to avoid the security
-issues).
-
-- Optional formatting specifiers. I agree with Lalo that these
-should not be part of the interpolation syntax but need to be
-dealt with at a different level. I think these are only
-relevant for numeric data. Funny, there's still a
-(now-deprecated) module fpformat.py that supports arbitrary
-floating point formatting, and string.zfill() supports a bit of
-integer formatting.
+This approach allows a gettext-based internationalized program to
+use the dstring instance as a lookup into the catalog; in fact
+gettext doesn't care that the catalog key is a dstring. Because
+the value of the dstring is the original $-string, translators
+also never need to use %-strings. The right thing will happen at
+run-time.
 
 
 References
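The Internationalization text describes how the mechanism works. The constructor string is translated once into a standard %-string and cached, and `__mod__()` uses that cached form. The string value stays the original $-string, so the instance can serve directly as a catalog key. Below is a minimal Python 3 sketch of that idea, subclassing `str` because Python 3 strings are Unicode. It is not the stringlib reference implementation. The `catalog` dictionary and the `translate()` helper are hypothetical stand-ins for a gettext message catalog.

```python
import re

# "$$" escape, "$identifier", or "${identifier}" (ASCII identifiers assumed).
_PLACEHOLDER = re.compile(
    r"\$(?:(\$)|([_a-zA-Z][_a-zA-Z0-9]*)|\{([_a-zA-Z][_a-zA-Z0-9]*)\})"
)


def _to_percent(template):
    """Rewrite a $-template as an equivalent %-format string."""

    def repl(match):
        escaped, named, braced = match.groups()
        if escaped is not None:
            return "$"  # "$$" -> literal "$"
        return "%%(%s)s" % (named or braced)  # "$x" / "${x}" -> "%(x)s"

    # Escape literal '%' first so only the placeholders become specifiers.
    return _PLACEHOLDER.sub(repl, template.replace("%", "%%"))


class dstring(str):
    """A str whose value is the $-template; the % operator substitutes."""

    def __new__(cls, template):
        self = super().__new__(cls, template)
        self._percent = _to_percent(template)  # translated once, then cached
        return self

    def __mod__(self, mapping):
        return self._percent % mapping


# Hypothetical catalog standing in for a gettext translation: translators
# work only with $-strings and may reorder the placeholders freely.
catalog = {
    "${name} was born in ${country}": "${name} wurde in ${country} geboren",
}


def translate(message):
    """Look the dstring up by its $-string value; fall back to itself."""
    return dstring(catalog.get(message, message))


s = dstring("${name} was born in ${country}")
print(s)
print(s % {"name": "Guido", "country": "the Netherlands"})
print(translate(s) % {"name": "Guido", "country": "den Niederlanden"})
```

The catalog lookup works because a `str` subclass that defines neither `__eq__` nor `__hash__` hashes and compares by its string value. Escaping literal `%` as `%%` before rewriting the placeholders also matters. Without it, a template such as `100% of $who` would produce an invalid %-string.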
@@ -332,11 +152,14 @@ References
 http://www.python.org/doc/current/lib/typesseq-strings.html
 
 [2] Identifiers and Keywords
 http://www.python.org/doc/current/ref/identifiers.html
 
 [3] Guido's python-dev posting from 21-Jul-2002
 http://mail.python.org/pipermail/python-dev/2002-July/026397.html
 
+[4] Reference implementation
+http://sourceforge.net/tracker/index.php?func=detail&aid=922115&group_id=5470&atid=305470
+
 
 Copyright
 