python-peps/peps/pep-0292.rst

207 lines
7.7 KiB
ReStructuredText

PEP: 292
Title: Simpler String Substitutions
Author: Barry Warsaw <barry@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004
Replaces: 215
Abstract
========
This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:
1. Python's current string substitution feature
(i.e. ``%``-substitution) is complicated and error prone. This PEP
is simpler at the cost of some expressiveness.
2. :pep:`215` proposed an alternative string interpolation feature,
introducing a new ``$`` string prefix. :pep:`292` is simpler than
this because it involves no syntax changes and has much simpler
rules for what substitutions can occur in the string.
Rationale
=========
Python currently supports a string substitution syntax based on
C's ``printf()`` '``%``' formatting character [1]_. While quite rich,
``%``-formatting codes are also error prone, even for
experienced Python programmers. A common mistake is to leave off
the trailing format character, e.g. the '``s``' in ``"%(name)s"``.
In addition, the rules for what can follow a ``%`` sign are fairly
complex, while the usual application rarely needs such complexity.
Most scripts need to do some string interpolation, but most of
those use simple 'stringification' formats, i.e. ``%s`` or ``%(name)s``
This form should be made simpler and less error prone.
A Simpler Proposal
==================
We propose the addition of a new class, called ``Template``, which
will live in the string module. The ``Template`` class supports new
rules for string substitution; its value contains placeholders,
introduced with the ``$`` character. The following rules for
``$``-placeholders apply:
1. ``$$`` is an escape; it is replaced with a single ``$``
2. ``$identifier`` names a substitution placeholder matching a mapping
key of "identifier". By default, "identifier" must spell a
Python identifier as defined in [2]_. The first non-identifier
character after the ``$`` character terminates this placeholder
specification.
3. ``${identifier}`` is equivalent to ``$identifier``. It is required
when valid identifier characters follow the placeholder but are
not part of the placeholder, e.g. ``"${noun}ification"``.
If the ``$`` character appears at the end of the line, or is followed
by any other character than those described above, a ``ValueError``
will be raised at interpolation time. Values in mapping are
converted automatically to strings.
No other characters have special meaning, however it is possible
to derive from the ``Template`` class to define different substitution
rules. For example, a derived class could allow for periods in
the placeholder (e.g. to support a kind of dynamic namespace and
attribute path lookup), or could define a delimiter character
other than ``$``.
Once the ``Template`` has been created, substitutions can be performed
by calling one of two methods:
- ``substitute()``. This method returns a new string which results
when the values of a mapping are substituted for the
placeholders in the ``Template``. If there are placeholders which
are not present in the mapping, a ``KeyError`` will be raised.
- ``safe_substitute()``. This is similar to the ``substitute()`` method,
except that ``KeyErrors`` are never raised (due to placeholders
missing from the mapping). When a placeholder is missing, the
original placeholder will appear in the resulting string.
Here are some examples::
>>> from string import Template
>>> s = Template('${name} was born in ${country}')
>>> print s.substitute(name='Guido', country='the Netherlands')
Guido was born in the Netherlands
>>> print s.substitute(name='Guido')
Traceback (most recent call last):
[...]
KeyError: 'country'
>>> print s.safe_substitute(name='Guido')
Guido was born in ${country}
The signature of ``substitute()`` and ``safe_substitute()`` allows for
passing the mapping of placeholders to values, either as a single
dictionary-like object in the first positional argument, or as
keyword arguments as shown above. The exact details and
signatures of these two methods is reserved for the standard
library documentation.
Why ``$`` and Braces?
=====================
The BDFL said it best [3]_: "The ``$`` means "substitution" in so many
languages besides Perl that I wonder where you've been. [...]
We're copying this from the shell."
Thus the substitution rules are chosen because of the similarity
with so many other languages. This makes the substitution rules
easier to teach, learn, and remember.
Comparison to PEP 215
=====================
:pep:`215` describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
library module. :pep:`215` proposes a new string prefix
representation such as ``$""`` which signal to Python that a new type
of string is present. ``$``-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.
:pep:`215` also allows for arbitrary Python expressions inside the
``$``-strings, so that you could do things like::
import sys
print $"sys = $sys, sys = $sys.modules['sys']"
which would return::
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
It's generally accepted that the rules in :pep:`215` are safe in the
sense that they introduce no new security issues (see :pep:`215`,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see the substitution
placeholder in the original ``$``-string.
The interesting thing is that the ``Template`` class defined in this
PEP is designed for inheritance and, with a little extra work,
it's possible to support :pep:`215`'s functionality using existing
Python syntax.
For example, one could define subclasses of ``Template`` and dict that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
Internationalization
====================
The implementation supports internationalization by recording the
original template string in the ``Template`` instance's ``template``
attribute. This attribute would serve as the lookup key in an
gettext-based catalog. It is up to the application to turn the
resulting string back into a ``Template`` for substitution.
However, the ``Template`` class was designed to work more intuitively
in an internationalized application, by supporting the mixing-in
of ``Template`` and unicode subclasses. Thus an internationalized
application could create an application-specific subclass,
multiply inheriting from ``Template`` and unicode, and using instances
of that subclass as the gettext catalog key. Further, the
subclass could alias the special ``__mod__()`` method to either
``.substitute()`` or ``.safe_substitute()`` to provide a more traditional
string/unicode like ``%``-operator substitution syntax.
Reference Implementation
========================
The implementation [4]_ has been committed to the Python 2.4 source tree.
References
==========
.. [1] String Formatting Operations
https://docs.python.org/release/2.6/library/stdtypes.html#string-formatting-operations
.. [2] Identifiers and Keywords
https://docs.python.org/release/2.6/reference/lexical_analysis.html#identifiers-and-keywords
.. [3] https://mail.python.org/pipermail/python-dev/2002-June/025652.html
.. [4] Reference Implementation
http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470
Copyright
=========
This document has been placed in the public domain.