PEP 292, Simpler String Substitutions, Warsaw

2002-06-19 02:53:41 +00:00 · 2002-06-19 02:53:41 +00:00 · 8f14274321
parent d5c82d205f
commit 8f14274321
1 changed files with 267 additions and 0 deletions
--- a/pep-0292.txt
+++ b/pep-0292.txt
@@ -0,0 +1,267 @@
+PEP: 292
+Title: Simpler String Substitutions
+Version: $Revision$
+Last-Modified: $Date$
+Author: barry@zope.com (Barry A. Warsaw)
+Status: Draft
+Type: Standards Track
+Created: 18-Jun-2002
+Python-Version: 2.3
+Post-History:
+
+
+Abstract
+
+    This PEP describes a simpler string substitution feature, also
+    known as string interpolation.  This PEP is "simpler" in two
+    respects:
+
+    1. Python's current string substitution feature (commonly known as
+       %-substitutions) is complicated and error prone.  This PEP is
+       simpler at the cost of less expressiveness.
+
+    2. PEP 215 proposed an alternative string interpolation feature,
+       introducing a new `$' string prefix.  PEP 292 is simpler than
+       this because it involves no syntax changes and has much simpler
+       rules for what substitutions can occur in the string.
+       
+
+Rationale
+
+    Python currently supports a string substitution (a.k.a. string
+    interpolation) syntax based on C's printf() % formatting
+    character [1].  While quite rich, %-formatting codes are also quite
+    error prone, even for experienced Python programmers.  A common
+    mistake is to leave off the trailing format character, e.g. the
+    `s' in "%(name)s".
+
+    In addition, the rules for what can follow a % sign are fairly
+    complex, while the usual application rarely needs such complexity.
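+
+    The following Python 3 sketch is not part of this proposal.  It
+    illustrates the mistake described above.  When the trailing `s' is
+    dropped, the characters after the closing parenthesis are parsed as
+    the rest of the conversion specification, so the error only
+    appears at run time.

```python
values = {'name': 'Guido'}

# Correct placeholder: 's' is the conversion character.
good = '%(name)s was born' % values
print(good)  # Guido was born

# Same template with the 's' forgotten.  The space after ')' is taken as
# a conversion flag and 'w' as the conversion character, which is invalid.
error = None
try:
    '%(name) was born' % values
except ValueError as exc:
    error = exc
print(error)  # reports an unsupported format character
```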
+
+
+A Simpler Proposal
+
+    Here we propose the addition of a new string method, called .sub(),
+    which performs substitution of mapping values into a string with
+    special substitution placeholders.  These placeholders are
+    introduced with the $ character.  The following rules for
+    $-placeholders apply:
+
+    1. $$ is an escape; it is replaced with a single $.
+
+    2. $identifier names a substitution placeholder matching a mapping
+       key of "identifier".  "identifier" must be a Python identifier
+       as defined in [2].  The first non-identifier character after
+       the $ character terminates this placeholder specification.
+
+    3. ${identifier} is equivalent to $identifier and, for clarity,
+       is the preferred form.  It is required when valid identifier
+       characters follow the placeholder but are not part of the
+       placeholder, e.g. "${noun}ification".
+
+    No other characters have special meaning.
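+
+    A single regular expression can check all three rules.  The Python 3
+    sketch below reuses the pattern from the reference implementation.
+    It adds re.ASCII so that \w matches only ASCII identifier
+    characters, as in Python 2.2.  The helper name placeholder_names()
+    is hypothetical; the helper only lists which mapping keys a
+    template refers to.

```python
import re

# Pattern from the reference implementation, one alternative per rule:
#   group 1: '$$' escape, group 2: $identifier, group 3: ${identifier}
# re.ASCII (an addition here) keeps \w to ASCII letters, digits and '_'.
PLACEHOLDER = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}',
                         re.IGNORECASE | re.ASCII)


def placeholder_names(template):
    """Hypothetical helper: mapping keys a template refers to, in order."""
    names = []
    for match in PLACEHOLDER.finditer(template):
        if match.group(1):      # '$$' is an escape, not a placeholder
            continue
        names.append(match.group(2) or match.group(3))
    return names


print(placeholder_names('$$5 for ${noun}ification by $who'))
# ['noun', 'who']
```
+
+    A `$' followed by a non-identifier character, such as the `5' in
+    "just $5", matches none of the alternatives and is left as-is.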
+
+    The .sub() method takes an optional mapping (e.g. dictionary)
+    where the keys match placeholders in the string, and the values
+    are substituted for the placeholders.  For example:
+
+	'${name} was born in ${country}'.sub({'name': 'Guido',
+					      'country': 'the Netherlands'})
+
+    returns
+
+        'Guido was born in the Netherlands'
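+
+    The explicit-mapping case does not need the %-operator at all.
+    Here is a minimal Python 3 sketch.  It assumes a hypothetical free
+    function named sub() in place of the proposed string method, and it
+    replaces each match through a callback passed to re.sub().
+    Converting values with str() mirrors the %s conversion used by the
+    reference implementation.

```python
import re

# Same three alternatives as the placeholder rules: $$, $identifier,
# ${identifier}.  re.ASCII keeps identifiers ASCII-only, as in Python 2.
_PATTERN = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}',
                      re.IGNORECASE | re.ASCII)


def sub(template, mapping):
    """Hypothetical free-function version of the proposed str.sub()."""
    def replace(match):
        if match.group(1):                      # '$$' -> '$'
            return '$'
        key = match.group(2) or match.group(3)  # $name or ${name}
        return str(mapping[key])                # missing key -> KeyError
    return _PATTERN.sub(replace, template)


print(sub('${name} was born in ${country}',
          {'name': 'Guido', 'country': 'the Netherlands'}))
# Guido was born in the Netherlands
```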
+
+    The mapping argument is optional; if it is omitted then the
+    mapping is taken from the locals and globals of the context in
+    which the .sub() method is executed.  For example:
+
+	countryOfOrigin = {'Guido': 'the Netherlands'}
+
+	def birth(name):
+	    country = countryOfOrigin[name]
+	    return '${name} was born in ${country}'.sub()
+
+	birth('Guido')
+
+    returns
+
+	'Guido was born in the Netherlands'
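+
+    The no-argument form needs access to the caller's namespace.  The
+    Python 3 sketch below shows that lookup.  It assumes a hypothetical
+    free function sub() and uses the same CPython-specific
+    sys._getframe() call as the reference implementation.  For the
+    substitution itself it borrows string.Template, which the standard
+    library added in Python 2.4 with placeholder rules close to the
+    ones above.

```python
import string
import sys


def sub(template, mapping=None):
    """Hypothetical stand-in for str.sub(); mapping defaults to caller's names."""
    if mapping is None:
        # sys._getframe(1) is the frame of whoever called sub().  Start from
        # its globals and overlay its locals, so a local shadows a global of
        # the same name.  (CPython-specific, like the reference implementation.)
        caller = sys._getframe(1)
        mapping = dict(caller.f_globals)
        mapping.update(caller.f_locals)
    # string.Template (Python 2.4+) handles $$, $name and ${name}.
    return string.Template(template).substitute(mapping)


countryOfOrigin = {'Guido': 'the Netherlands'}


def birth(name):
    country = countryOfOrigin[name]
    return sub('${name} was born in ${country}')


print(birth('Guido'))  # Guido was born in the Netherlands
```
+
+    Because locals are applied after globals, a local variable wins
+    over a global one of the same name.  This matches the reference
+    implementation's f_globals.copy() followed by update(f_locals).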
+
+
+Reference Implementation
+
+    Here's a Python 2.2-based reference implementation.  Of course the
+    real implementation would be in C, would not require a string
+    subclass, and would not be modeled on the existing %-interpolation
+    feature.
+
+	import sys
+	import re
+
+	dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.I)
+	EMPTYSTRING = ''
+
+	class dstr(str):
+	    def sub(self, mapping=None):
+		# Default mapping is locals/globals of caller
+		if mapping is None:
+		    frame = sys._getframe(1)
+		    mapping = frame.f_globals.copy()
+		    mapping.update(frame.f_locals)
+		# Escape %'s
+		s = self.replace('%', '%%')
+		# Convert $$ to $, and $name and ${name} to %(name)s
+		parts = dre.split(s)
+		for i in range(1, len(parts), 4):
+		    if parts[i] is not None:
+			parts[i] = '$'
+		    elif parts[i+1] is not None:
+			parts[i+1] = '%(' + parts[i+1] + ')s'
+		    else:
+			parts[i+2] = '%(' + parts[i+2] + ')s'
+		# Interpolate
+		return EMPTYSTRING.join(filter(None, parts)) % mapping
+    
+    And here are some examples:
+
+	s = dstr('${name} was born in ${country}')
+	print s.sub({'name': 'Guido',
+		     'country': 'the Netherlands'})
+
+	name = 'Barry'
+	country = 'the USA'
+	print s.sub()
+
+    This will print "Guido was born in the Netherlands" followed by
+    "Barry was born in the USA".
+
+
+Handling Missing Keys
+
+    What should happen when one of the substitution keys is missing
+    from the mapping (or the locals/globals namespace if no argument
+    is given)?  There are two possibilities:
+
+    - We can simply allow the exception (likely a NameError or
+      KeyError) to propagate.
+
+    - We can return the original substitution placeholder unchanged.
+
+    Under the first approach, the statement
+
+        print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
+
+    would raise:
+
+	Traceback (most recent call last):
+	  File "sub.py", line 66, in ?
+	    print s.sub({'name': 'Bob'})
+	  File "sub.py", line 26, in sub
+	    return EMPTYSTRING.join(filter(None, parts)) % mapping
+	KeyError: country
+
+    Under the second approach, the same statement
+
+        print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
+
+    would print:
+
+	Bob was born in ${country}
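+
+    The standard library later gained string.Template (Python 2.4),
+    which can be used to compare the two behaviors.  Its substitute()
+    method lets the KeyError propagate.  Its safe_substitute() method
+    leaves unresolved placeholders as written.  This Python 3 sketch
+    illustrates the two behaviors only; it is not the .sub() API
+    proposed here.

```python
import string

t = string.Template('${name} was born in ${country}')

# First behavior: the missing key propagates as a KeyError.
missing = None
try:
    t.substitute({'name': 'Bob'})
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # country

# Second behavior: unresolved placeholders are left exactly as written.
print(t.safe_substitute({'name': 'Bob'}))  # Bob was born in ${country}
```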
+
+    The PEP author would prefer the latter interpretation, although a
+    case can be made for raising the exception instead.  We could
+    almost ignore the issue, since the latter behavior could be
+    obtained by passing in a "safe dictionary" instead of a normal
+    dictionary, like so:
+
+	class safedict(dict):
+	    def __getitem__(self, key):
+		try:
+		    return dict.__getitem__(self, key)
+		except KeyError:
+		    return '${%s}' % key
+
+    so that
+
+	d = safedict({'name': 'Bob'})
+	print dstr('${name} was born in ${country}').sub(d)
+
+    would print:
+
+	Bob was born in ${country}
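+
+    This works with the reference implementation because the rewritten
+    template goes straight to the % operator.  That operator looks up
+    each key through the mapping's __getitem__(), so it reaches the
+    override.  The Python 3 check below covers just that step and uses
+    the safedict class above unchanged.

```python
class safedict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Re-create the placeholder so it survives substitution.
            return '${%s}' % key


d = safedict({'name': 'Bob'})

# The reference implementation rewrites '${name} was born in ${country}'
# into this %-form before applying %, so this is the step that
# consults safedict.
result = '%(name)s was born in %(country)s' % d
print(result)  # Bob was born in ${country}
```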
+
+    The one place where this won't work is when no arguments are given
+    to the .sub() method.  .sub() wouldn't know whether to wrap
+    locals/globals in a safedict or not.
+
+    This ambiguity can be solved in several ways:
+
+    - a parallel method called .safesub() could always wrap its
+      argument in a safedict.
+
+    - .sub() could take an optional keyword argument flag which
+      indicates whether to wrap the argument in a safedict or not.
+
+    - .sub() could take an optional keyword argument which is a
+      callable that would get called with the original mapping and
+      return the mapping to be used for the substitution.  By default,
+      this callable would be the identity function, but you could
+      easily pass in the safedict constructor instead.
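+
+    The third option can emulate the other two, since passing safedict
+    as the callable gives the .safesub() behavior.  The Python 3 sketch
+    below assumes a hypothetical free function sub() with a keyword
+    argument named wrap; neither name comes from this PEP.  It uses
+    string.Template (Python 2.4+) for the substitution step.  The
+    callable is also applied to the locals/globals mapping in the
+    no-argument case.

```python
import string
import sys


class safedict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return '${%s}' % key


def sub(template, mapping=None, wrap=lambda m: m):
    """Hypothetical: 'wrap' gets the mapping and returns the one to use.

    The default is the identity function, so missing keys raise KeyError.
    """
    if mapping is None:
        # Same caller-namespace lookup as the reference implementation.
        caller = sys._getframe(1)
        mapping = dict(caller.f_globals)
        mapping.update(caller.f_locals)
    return string.Template(template).substitute(wrap(mapping))


t = '${name} was born in ${country}'
print(sub(t, {'name': 'Bob'}, wrap=safedict))  # Bob was born in ${country}
```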
+
+    BDFL proto-pronouncement: It should always raise a NameError when
+    the key is missing.  There may not be a sufficient use case for
+    soft failures in the no-argument version.
+
+
+Comparison to PEP 215
+
+    PEP 215 describes an alternate proposal for string interpolation.
+    Unlike that PEP, this one does not propose any new syntax for
+    Python.  All the proposed new features are embodied in a new
+    string method.  PEP 215 proposes a new string prefix
+    representation such as $"" which signals to Python that a new type
+    of string is present.  $-strings would have to interact with the
+    existing r-prefixes and u-prefixes, essentially doubling the
+    number of string prefix combinations.
+
+    PEP 215 also allows for arbitrary Python expressions inside the
+    $-strings, so that you could do things like:
+
+	import sys
+	print $"sys = $sys, sys = $sys.modules['sys']"
+
+    which would print
+
+	sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
+ 
+    It's generally accepted that the rules in PEP 215 are safe in the
+    sense that they introduce no new security issues (see PEP 215,
+    "Security Issues" for details).  However, the rules are still
+    quite complex, and make it more difficult to see exactly what the
+    substitution placeholder is in the original $-string.
+
+    By design, this PEP does not provide as much interpolation power
+    as PEP 215; however, it is expected that the no-argument version
+    of .sub() allows at least as much power with no loss of
+    readability.
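+
+    The PEP 215 example itself shows the difference.  Under this PEP's
+    rules the `.' after $sys ends the placeholder, so the attribute
+    access and subscript are copied literally instead of being
+    evaluated.  The Python 3 illustration below uses string.Template
+    (Python 2.4+), whose placeholder rules closely follow the ones
+    proposed here.

```python
import string
import sys

# '.' is not an identifier character, so only "$sys" is a placeholder;
# ".modules['sys']" is ordinary text.
t = string.Template("sys = $sys.modules['sys']")
result = t.substitute({'sys': sys})
print(result)  # sys = <module 'sys' (built-in)>.modules['sys']
```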
+
+
+References
+
+    [1] String Formatting Operations
+        http://www.python.org/doc/current/lib/typesseq-strings.html
+
+    [2] Identifiers and Keywords
+        http://www.python.org/doc/current/ref/identifiers.html
+
+
+Copyright
+
+    This document has been placed in the public domain.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+sentence-end-double-space: t
+fill-column: 70
+End: