Updated based on recent PyCon discussions, with a pointer to the SF patch
containing the reference implementation.
2004-03-24 03:08:02 +00:00 · 2004-03-24 03:08:02 +00:00 · fa6ff92955
parent b2436faf3b
commit fa6ff92955
1 changed files with 68 additions and 245 deletions
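
The dstring behavior described in the revised PEP text can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the code in the SF patch: it uses Python 3, where str stands in for the
unicode type, and the names `pattern`, `_convert`, and `_percent_form` are
hypothetical. It follows the approach the PEP text describes: parse the
constructor string once, cache an equivalent %-string, and use that cache
when the % operator is applied.

```python
import re


class dstring(str):
    """Illustrative sketch: a string whose % operator fills $-placeholders."""

    # Matches $$ (escape), $identifier, or ${identifier}.
    # Assumption: ASCII-only identifiers, for brevity.
    pattern = re.compile(
        r'\$(?:(\$)|([_a-zA-Z][_a-zA-Z0-9]*)|\{([_a-zA-Z][_a-zA-Z0-9]*)\})')

    def __new__(cls, value):
        self = super().__new__(cls, value)
        # Cache the %-string form once; the string value itself stays
        # the original $-string.
        self._percent_form = cls._convert(value)
        return self

    @classmethod
    def _convert(cls, value):
        def repl(match):
            escaped, bare, braced = match.groups()
            if escaped:
                return '$'
            return '%(' + (bare or braced) + ')s'
        # Double literal % signs first so they survive %-formatting.
        return cls.pattern.sub(repl, value.replace('%', '%%'))

    def __mod__(self, mapping):
        # The cached form is a plain str, so this uses ordinary
        # %-substitution.
        return self._percent_form % mapping


mapping = dict(name='Guido', country='the Netherlands')
s = dstring('${name} was born in ${country}')
print(s % mapping)  # Guido was born in the Netherlands
```

Because the instance's own value remains the $-string, it could still serve
as a gettext catalog key, which is the property the Internationalization
section relies on. A subclass could override `pattern` to accept a different
placeholder syntax.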
--- a/pep-0292.txt
+++ b/pep-0292.txt
@@ -6,8 +6,8 @@ Author: barry@python.org (Barry A. Warsaw)
 Status: Draft
 Type: Standards Track
 Created: 18-Jun-2002
-Python-Version: 2.3
-Post-History: 18-Jun-2002
+Python-Version: 2.4
+Post-History: 18-Jun-2002, 23-Mar-2004


 Abstract
@@ -16,30 +16,26 @@ Abstract
    known as string interpolation.  This PEP is "simpler" in two
    respects:

-    1. Python's current string substitution feature (commonly known as
-       %-substitutions) is complicated and error prone.  This PEP is
-       simpler at the cost of less expressiveness.
+    1. Python's current string substitution feature
+       (i.e. %-substitution) is complicated and error prone.  This PEP
+       is simpler at the cost of some expressiveness.

    2. PEP 215 proposed an alternative string interpolation feature,
       introducing a new `$' string prefix.  PEP 292 is simpler than
       this because it involves no syntax changes and has much simpler
       rules for what substitutions can occur in the string.
-       
+

 Rationale

-    Python currently supports a string substitution (a.k.a. string
-    interpolation) syntax based on C's printf() % formatting
-    character[1].  While quite rich, %-formatting codes are also quite
-    error prone, even for experienced Python programmers.  A common
-    mistake is to leave off the trailing format character, e.g. the
-    `s' in "%(name)s".
+    Python currently supports a string substitution syntax based on
+    C's printf() '%' formatting character[1].  While quite rich,
+    %-formatting codes are also error prone, even for
+    experienced Python programmers.  A common mistake is to leave off
+    the trailing format character, e.g. the `s' in "%(name)s".

    In addition, the rules for what can follow a % sign are fairly
-    complex, while the usual application rarely needs such
-    complexity.  Also error prone is the right-hand side of the %
-    operator: e.g. singleton tuples.
-
+    complex, while the usual application rarely needs such complexity.
    Most scripts need to do some string interpolation, but most of
    those use simple `stringification' formats, i.e. %s or %(name)s
    This form should be made simpler and less error prone.
@@ -47,226 +43,53 @@ Rationale

 A Simpler Proposal

-    Here we propose the addition of a new string method, called .sub()
-    which performs substitution of mapping values into a string with
-    special substitution placeholders.  These placeholders are
+    We propose the addition of a new class -- called 'dstring' --
+    derived from the built-in unicode type, which supports new rules
+    for string substitution.  dstring's value contains placeholders,
    introduced with the $ character.  The following rules for
    $-placeholders apply:

    1. $$ is an escape; it is replaced with a single $

    2. $identifier names a substitution placeholder matching a mapping
-       key of "identifier".  "identifier" must be a Python identifier
-       as defined in [2].  The first non-identifier character after
-       the $ character terminates this placeholder specification.
+       key of "identifier".  By default, "identifier" must spell a
+       Python identifier as defined in [2].  The first non-identifier
+       character after the $ character terminates this placeholder
+       specification.

    3. ${identifier} is equivalent to $identifier.  It is required for
       when valid identifier characters follow the placeholder but are
       not part of the placeholder, e.g. "${noun}ification".

-    No other characters have special meaning.
+    No other characters have special meaning; however, it is possible
+    to derive from the dstring class to define different rules for the
+    placeholder.  For example, a derived class could allow for periods
+    in the placeholder (e.g. to support a kind of dynamic namespace
+    and attribute path lookup).

-    The .sub() method takes an optional mapping (e.g. dictionary)
-    where the keys match placeholders in the string, and the values
-    are substituted for the placeholders.  For example:
+    Once the dstring has been created, substitutions can be performed
+    using traditional Python syntax.  For example:

-	'${name} was born in ${country}'.sub({'name': 'Guido',
-					      'country': 'the Netherlands'})
-
-    returns
-
-        'Guido was born in the Netherlands'
-
-    The mapping argument is optional; if it is omitted then the
-    mapping is taken from the locals and globals of the context in
-    which the .sub() method is executed.  For example:
-
-	def birth(self, name):
-	    country = self.countryOfOrigin[name]
-	    return '${name} was born in ${country}'.sub()
-
-	birth('Guido')
-
-    returns
-
-	'Guido was born in the Netherlands'
+        >>> mapping = dict(name='Guido', country='the Netherlands')
+        >>> s = dstring('${name} was born in ${country}')
+        >>> print s % mapping
+        Guido was born in the Netherlands


 Why `$' and Braces?

    The BDFL said it best: The $ means "substitution" in so many
-    languages besides Perl that I wonder where you've been. [...] 
+    languages besides Perl that I wonder where you've been. [...]
    We're copying this from the shell.


-Security Issues
-
-    Never use no-arg .sub() on strings that come from untrusted
-    sources.  It could be used to gain unauthorized information about
-    variables in your local or global scope.
-
-
 Reference Implementation

-    Here's a Python 2.2-based reference implementation.  Of course the
-    real implementation would be in C, would not require a string
-    subclass, and would not be modeled on the existing %-interpolation
-    feature.
-
-	import sys
-	import re
-
-	class dstr(str):
-            def sub(self, mapping=None):
-                # Default mapping is locals/globals of caller
-                if mapping is None:
-                    frame = sys._getframe(1)
-                    mapping = frame.f_globals.copy()
-                    mapping.update(frame.f_locals)
-                def repl(m):
-                    return mapping[m.group(m.lastindex)]
-                return re.sub(r'\$(?:([_a-z]\w*)|\{([_a-z]\w*)\})', repl, self)
-    
-    And here are some examples:
-
-	s = dstr('${name} was born in ${country}')
-	print s.sub({'name': 'Guido',
-		     'country': 'the Netherlands'})
-
-	name = 'Barry'
-	country = 'the USA'
-	print s.sub()
-
-    This will print "Guido was born in the Netherlands" followed by
-    "Barry was born in the USA".
-
-
-Handling Missing Keys
-
-    What should happen when one of the substitution keys is missing
-    from the mapping (or the locals/globals namespace if no argument
-    is given)?  There are two possibilities:
-
-    - We can simply allow the exception.
-
-    - We can return the original substitution placeholder unchanged.
-
-    An example of the first is:
-
-        print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
-
-    would raise:
-
-	Traceback (most recent call last):
-	  File "sub.py", line 66, in ?
-	    print s.sub({'name': 'Bob'})
-	  File "sub.py", line 26, in sub
-	    return EMPTYSTRING.join(filter(None, parts)) % mapping
-	KeyError: country
-
-    An example of the second is:
-
-        print dstr('${name} was born in ${country}').sub({'name': 'Bob'})
-
-    would print:
-
-	Bob was born in ${country}
-
-    We could almost ignore the issue, since the latter example could
-    be accomplished by passing in a "safe-dictionary" in instead of a
-    normal dictionary, like so:
-
-	class safedict(dict):
-	    def __getitem__(self, key):
-		try:
-		    return dict.__getitem__(self, key)
-		except KeyError:
-		    return '${%s}' % key
-
-    so that
-
-	d = safedict({'name': 'Bob'})
-	print dstr('${name} was born in ${country}').sub(d)
-
-    would print:
-
-	Bob was born in ${country}
-
-    The one place where this won't work is when no arguments are given
-    to the .sub() method.  .sub() wouldn't know whether to wrap
-    locals/globals in a safedict or not.
-
-    This ambiguity can be solved in several ways:
-
-    - we could have a parallel method called .safesub() which always
-      wrapped its argument in a safedict()
-
-    - .sub() could take an optional keyword argument flag which
-      indicates whether to wrap the argument in a safedict or not.
-
-    - .sub() could take an optional keyword argument which is a
-      callable that would get called with the original mapping and
-      return the mapping to be used for the substitution.  By default,
-      this callable would be the identity function, but you could
-      easily pass in the safedict constructor instead.
-
-    BDFL proto-pronouncement: Strongly in favor of raising the
-    exception, with KeyError when a dict is used and NameError when
-    locals/globals are used.  There may not be sufficient use case for
-    soft failures in the no-argument version.
-
-
-Open Issues, Comments, and Suggestions
-
-    - Ka-Ping Yee makes the suggestion that .sub() should take keyword
-      arguments instead of a dictionary, and that if a dictionary was
-      to be passed in it should be done with **dict.  For example:
-
-      s = '${name} was born in ${country}'
-      print s.sub(name='Guido', country='the Netherlands')
-
-      or
-
-      print s.sub(**{'name': 'Guido', 'country': 'the Netherlands'})
-
-    - Paul Prescod wonders whether having a method use sys._getframe()
-      doesn't set a bad precedent.
-
-    - Oren Tirosh suggests that .sub() take an optional argument which
-      would be used as a default for missing keys.  If the optional
-      argument were not given, an exception would be raised.  This may
-      not play well with Ka-Ping's suggestion.
-
-    - Other suggestions have been made as an alternative to a string
-      method including: a builtin function, a function in a module, an
-      operator (similar to "string % dict", e.g. "string / dict").
-      One strong argument for making it a built-in is given by Paul
-      Prescod:
-
-      "I really hate putting things in modules that will be needed in
-       a Python programmer's second program (the one after "Hello
-       world").  If this is to be the *simpler* way of doing
-       introspection then getting at it should be simpler than getting
-       at "%".  $ is taught in hour 2, import is taught on day 2.
-       Some people may never make it to the metaphorical day 2 if they
-       are doing simple text processing in some kind of
-       embedded-Python environment."
-
-     - Should we take a cue from the `make' program and allow $(name)
-       as an alternative (or instead of) ${name}?
-
-     - Should we require a dictionary to the .sub() method?  Some
-       people feel that it could be a security risk allowing implicit
-       access to globals/locals, even with the proper admonitions in
-       the documentation.  In that case, a new built-in would be
-       necessary (because none of globals(), locals(), or vars() does
-       the right the w.r.t. nested scopes, etc.).  Chirstian Tismer
-       has suggested allvars().  Perhaps allvars() should be a method
-       on a frame object (too?)?
-
-     - It has been suggested that using $ at all violates TOOWTDI.
-       Some other suggestions include using the % sign in the
-       following way: %{name}
+    A reference implementation is available at [4].  The
+    implementation contains the dstring class described above,
+    situated in a new standard library package called 'stringlib'.
+    Inside the reference implementation stringlib package are a few
+    other related nifty tools that aren't described in this PEP.


 Comparison to PEP 215
@@ -274,7 +97,7 @@ Comparison to PEP 215
    PEP 215 describes an alternate proposal for string interpolation.
    Unlike that PEP, this one does not propose any new syntax for
    Python.  All the proposed new features are embodied in a new
-    string method.  PEP 215 proposes a new string prefix
+    library module.  PEP 215 proposes a new string prefix
    representation such as $"" which signal to Python that a new type
    of string is present.  $-strings would have to interact with the
    existing r-prefixes and u-prefixes, essentially doubling the
@@ -283,47 +106,44 @@ Comparison to PEP 215
    PEP 215 also allows for arbitrary Python expressions inside the
    $-strings, so that you could do things like:

-	import sys
-	print $"sys = $sys, sys = $sys.modules['sys']"
+        import sys
+        print $"sys = $sys, sys = $sys.modules['sys']"

    which would return

-	sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
- 
+        sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
+
    It's generally accepted that the rules in PEP 215 are safe in the
    sense that they introduce no new security issues (see PEP 215,
    "Security Issues" for details).  However, the rules are still
-    quite complex, and make it more difficult to see what exactly is
-    the substitution placeholder in the original $-string.
+    quite complex, and make it more difficult to see the substitution
+    placeholder in the original $-string.

-    By design, this PEP does not provide as much interpolation power
-    as PEP 215, however it is expected that the no-argument version of
-    .sub() allows at least as much power with no loss of readability.
+    The interesting thing is that the dstring class defined in this
+    PEP has nothing to say about the values that are substituted for
+    the placeholders.  Thus, with a little extra work, it's possible
+    to support PEP 215's functionality using existing Python syntax.
+
+    For example, one could define a subclass of dstring that allowed a
+    more complex placeholder syntax, together with a mapping (e.g. a
+    subclass of dict) that evaluated those placeholders.


-BDFL Weathervane
+Internationalization

-    Guido lays out[3] what he feels are the real issues that need to
-    be fleshed out in this PEP:
+    The reference implementation accomplishes this magic by parsing
+    the constructor string, transforming $-strings into standard
+    Python %-strings.  dstring caches this value and uses it whenever
+    the special __mod__() method is called via the % operator.
+    However the string value of a dstring is the string that was
+    passed to its constructor.

-    - Compile-time vs. run-time parsing.  I've become convinced that
-      the compiler should do the parsing: this is the only way to make
-      access to variables in nested scopes work, avoids security
-      issues, and makes it easier to diagnose errors (e.g. in
-      PyChecker).
-
-    - How to support translation.  Here the template must be replaced
-      at run-time, but it is still desirable that the collection of
-      available names is known at compile time (to avoid the security
-      issues).
-
-    - Optional formatting specifiers.  I agree with Lalo that these
-      should not be part of the interpolation syntax but need to be
-      dealt with at a different level.  I think these are only
-      relevant for numeric data.  Funny, there's still a
-      (now-deprecated) module fpformat.py that supports arbitrary
-      floating point formatting, and string.zfill() supports a bit of
-      integer formatting.
+    This approach allows a gettext-based internationalized program to
+    use the dstring instance as a lookup into the catalog; in fact
+    gettext doesn't care that the catalog key is a dstring.  Because
+    the value of the dstring is the original $-string, translators
+    also never need to use %-strings.  The right thing will happen at
+    run-time.


 References
@@ -332,11 +152,14 @@ References
        http://www.python.org/doc/current/lib/typesseq-strings.html

    [2] Identifiers and Keywords
-	http://www.python.org/doc/current/ref/identifiers.html
+        http://www.python.org/doc/current/ref/identifiers.html

    [3] Guido's python-dev posting from 21-Jul-2002
        http://mail.python.org/pipermail/python-dev/2002-July/026397.html

+    [4] Reference implementation
+        http://sourceforge.net/tracker/index.php?func=detail&aid=922115&group_id=5470&atid=305470
+

 Copyright