PEP: 3101
|
||
Title: Advanced String Formatting
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Talin <talin at acm.org>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/plain
|
||
Created: 16-Apr-2006
|
||
Python-Version: 3.0
|
||
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
|
||
|
||
|
||
Abstract
|
||
|
||
This PEP proposes a new system for built-in string formatting
|
||
operations, intended as a replacement for the existing '%' string
|
||
formatting operator.
|
||
|
||
|
||
Rationale
|
||
|
||
Python currently provides two methods of string interpolation:
|
||
|
||
- The '%' operator for strings. [1]
|
||
|
||
- The string.Template module. [2]
|
||
|
||
The primary scope of this PEP concerns proposals for built-in
|
||
string formatting operations (in other words, methods of the
|
||
built-in string type).
|
||
|
||
The '%' operator is primarily limited by the fact that it is a
|
||
binary operator, and therefore can take at most two arguments.
|
||
One of those arguments is already dedicated to the format string,
|
||
leaving all other variables to be squeezed into the remaining
|
||
argument. The current practice is to use either a dictionary or a
|
||
tuple as the second argument, but as many people have commented
|
||
[3], this lacks flexibility. The "all or nothing" approach
|
||
(meaning that one must choose between only positional arguments,
|
||
or only named arguments) is felt to be overly constraining.
|
||
|
||
While there is some overlap between this proposal and
|
||
string.Template, it is felt that each serves a distinct need,
|
||
and that one does not obviate the other. This proposal is for
|
||
a mechanism which, like '%', is efficient for small strings
|
||
which are only used once, so, for example, compilation of a
|
||
string into a template is not contemplated in this proposal,
|
||
although the proposal does take care to define format strings
|
||
and the API in such a way that an efficient template package
|
||
could reuse the syntax and even some of the underlying
|
||
formatting code.
|
||
|
||
|
||
Specification
|
||
|
||
The specification will consist of the following parts:
|
||
|
||
- Specification of a new formatting method to be added to the
|
||
built-in string class.
|
||
|
||
- Specification of functions and flag values to be added to
|
||
the string module, so that the underlying formatting engine
|
||
can be used with additional options.
|
||
|
||
- Specification of a new syntax for format strings.
|
||
|
||
- Specification of a new set of special methods to control the
|
||
formatting and conversion of objects.
|
||
|
||
- Specification of an API for user-defined formatting classes.
|
||
|
||
- Specification of how formatting errors are handled.
|
||
|
||
Note on string encodings: When discussing this PEP in the context
|
||
of Python 3.0, it is assumed that all strings are unicode strings,
|
||
and that the use of the word 'string' in the context of this
|
||
document will generally refer to a Python 3.0 string, which is
|
||
the same as a Python 2.x unicode object.
|
||
|
||
In the context of Python 2.x, the use of the word 'string' in this
|
||
document refers to an object which may either be a regular string
|
||
or a unicode object. All of the function call interfaces
|
||
described in this PEP can be used for both strings and unicode
|
||
objects, and in all cases there is sufficient information
|
||
to be able to properly deduce the output string type (in
|
||
other words, there is no need for two separate APIs).
|
||
In all cases, the type of the format string dominates - that
|
||
is, the conversion will always result in an object
|
||
that contains the same representation of characters as the
|
||
input format string.
|
||
|
||
|
||
String Methods
|
||
|
||
The built-in string class (and also the unicode class in 2.6) will
|
||
gain a new method, 'format', which takes an arbitrary number of
|
||
positional and keyword arguments:
|
||
|
||
"The story of {0}, {1}, and {c}".format(a, b, c=d)
|
||
|
||
Within a format string, each positional argument is identified
|
||
with a number, starting from zero, so in the above example, 'a' is
|
||
argument 0 and 'b' is argument 1. Each keyword argument is
|
||
identified by its keyword name, so in the above example, 'c' is
|
||
used to refer to the third argument.
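As a brief illustration, the call above can be tried with concrete
values. The values bound to 'a', 'b' and 'd' below are placeholders
chosen for this sketch; they are not part of the proposal.

```python
# Placeholder values; any objects with a str() form would do.
a, b, d = "Arthur", "Lancelot", "Galahad"

# {0} and {1} select positional arguments; {c} selects the keyword argument.
result = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(result)
```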
|
||
|
||
|
||
Format Strings
|
||
|
||
Format strings consist of intermingled character data and markup.
|
||
|
||
Character data is data which is transferred unchanged from the
|
||
format string to the output string; markup is not transferred from
|
||
the format string directly to the output, but instead is used to
|
||
define 'replacement fields' that describe to the format engine
|
||
what should be placed in the output string in the place of the
|
||
markup.
|
||
|
||
Brace characters ('curly braces') are used to indicate a
|
||
replacement field within the string:
|
||
|
||
"My name is {0}".format('Fred')
|
||
|
||
The result of this is the string:
|
||
|
||
"My name is Fred"
|
||
|
||
Braces can be escaped by doubling:
|
||
|
||
"My name is {0} :-{{}}".format('Fred')
|
||
|
||
Which would produce:
|
||
|
||
"My name is Fred :-{}"
|
||
|
||
The element within the braces is called a 'field'. Fields consist
|
||
of a 'field name', which can either be simple or compound, and an
|
||
optional 'conversion specifier'.
|
||
|
||
|
||
Simple and Compound Field Names
|
||
|
||
Simple field names are either names or numbers. If numbers, they
|
||
must be valid base-10 integers; if names, they must be valid
|
||
Python identifiers. A number is used to identify a positional
|
||
argument, while a name is used to identify a keyword argument.
|
||
|
||
A compound field name is a combination of multiple simple field
|
||
names in an expression:
|
||
|
||
"My name is {0.name}".format(open('out.txt'))
|
||
|
||
This example shows the use of the 'getattr' or 'dot' operator
|
||
in a field expression. The dot operator allows an attribute of
|
||
an input value to be specified as the field value.
|
||
|
||
The types of expressions that can be used in a compound name
|
||
have been deliberately limited in order to prevent potential
|
||
security exploits resulting from the ability to place arbitrary
|
||
Python expressions inside of strings. Only two operators are
|
||
supported, the '.' (getattr) operator, and the '[]' (getitem)
|
||
operator.
|
||
|
||
Another limitation that is defined to limit potential security
|
||
issues is that field names or attribute names beginning with an
|
||
underscore are disallowed. This enforces the common convention
|
||
that names beginning with an underscore are 'private'.
|
||
|
||
An example of the 'getitem' syntax:
|
||
|
||
"My name is {0[name]}".format(dict(name='Fred'))
|
||
|
||
It should be noted that the use of 'getitem' within a string is
|
||
much more limited than its normal use. In the above example, the
|
||
string 'name' really is the literal string 'name', not a variable
|
||
named 'name'. The rules for parsing an item key are very simple.
|
||
If it starts with a digit, then it is treated as a number; otherwise
|
||
it is used as a string.
|
||
|
||
It is not possible to specify arbitrary dictionary keys from
|
||
within a format string.
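The two supported operators can be sketched as follows. The 'Person'
class is hypothetical and exists only for this illustration; the
behaviour shown is that of the str.format method as it ships in
current Python, which is assumed here to match the draft on these
points.

```python
class Person:
    """Hypothetical class used only for this illustration."""
    def __init__(self, name):
        self.name = name

# '.' performs an attribute lookup on the selected argument.
by_attr = "My name is {0.name}".format(Person("Fred"))

# '[]' performs an item lookup; 'name' is a literal string key, not a variable.
by_key = "My name is {0[name]}".format({"name": "Fred"})

# A numeric key is treated as a number, so it can index a list.
by_index = "First item: {0[0]}".format(["spam", "eggs"])
```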
|
||
|
||
Implementation note: The implementation of this proposal is
|
||
not required to enforce the rule about a name being a valid
|
||
Python identifier. Instead, it will rely on the getattr function
|
||
of the underlying object to throw an exception if the identifier
|
||
is not legal. The format function will have a minimalist parser
|
||
which only attempts to figure out when it is "done" with an
|
||
identifier (by finding a '.' or a ']', or '}', etc.). The only
|
||
exception to this laissez-faire approach is that, by default,
|
||
strings are not allowed to have leading underscores.
|
||
|
||
|
||
Conversion Specifiers
|
||
|
||
Each field can also specify an optional set of 'conversion
|
||
specifiers' which can be used to adjust the format of that field.
|
||
Conversion specifiers follow the field name, with a colon (':')
|
||
character separating the two:
|
||
|
||
"My name is {0:8}".format('Fred')
|
||
|
||
The meaning and syntax of the conversion specifiers depends on the
|
||
type of object that is being formatted; however, there is a
|
||
standard set of conversion specifiers used for any object that
|
||
does not override them.
|
||
|
||
Conversion specifiers can themselves contain replacement fields.
|
||
For example, a field whose field width is itself a parameter
|
||
could be specified via:
|
||
|
||
"{0:{1}}".format(a, b)
|
||
|
||
Note that the doubled '}' at the end, which would normally be
|
||
escaped, is not escaped in this case. This is because
|
||
the '{{' and '}}' syntax for escapes is only applied when used
|
||
*outside* of a format field. Within a format field, the brace
|
||
characters always have their normal meaning.
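A minimal sketch of a parameterized field width, using placeholder
values for the two arguments. The inner field {1} is replaced first,
and its value then serves as the width of field {0}.

```python
name, width = "Fred", 8

# The nested field {1} supplies the width, so this behaves like "[{0:8}]".
padded = "[{0:{1}}]".format(name, width)
fixed = "[{0:8}]".format(name)
```

The square brackets are ordinary character data; they are included
only to make the padding visible.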
|
||
|
||
The syntax for conversion specifiers is open-ended, since a class
|
||
can override the standard conversion specifiers. In such cases,
|
||
the format() method merely passes all of the characters between
|
||
the first colon and the matching brace to the relevant underlying
|
||
formatting method.
|
||
|
||
|
||
Standard Conversion Specifiers
|
||
|
||
If an object does not define its own conversion specifiers, a
|
||
standard set of conversion specifiers is used. These are similar
|
||
in concept to the conversion specifiers used by the existing '%'
|
||
operator; however, there are also a number of significant
|
||
differences. The standard conversion specifiers fall into three
|
||
major categories: string conversions, integer conversions and
|
||
floating point conversions.
|
||
|
||
The general form of a standard conversion specifier is:
|
||
|
||
[[fill]align][sign][width][.precision][type]
|
||
|
||
The brackets ([]) indicate an optional element.
|
||
|
||
The optional align flag can be one of the following:
|
||
|
||
'<' - Forces the field to be left-aligned within the available
|
||
space (This is the default.)
|
||
'>' - Forces the field to be right-aligned within the
|
||
available space.
|
||
'=' - Forces the padding to be placed after the sign (if any)
|
||
but before the digits. This is used for printing fields
|
||
in the form '+000000120'.
|
||
'^' - Forces the field to be centered within the available
|
||
space.
|
||
|
||
Note that unless a minimum field width is defined, the field
|
||
width will always be the same size as the data to fill it, so
|
||
that the alignment option has no meaning in this case.
|
||
|
||
The optional 'fill' character defines the character to be used to
|
||
pad the field to the minimum width. The alignment flag must be
|
||
supplied if the character is a number other than 0 (otherwise the
|
||
character would be interpreted as part of the field width
|
||
specifier). A zero fill character without an alignment flag
|
||
implies an alignment type of '='.
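The alignment and fill options can be sketched as follows. The
examples use the str.format method as it ships in current Python,
which is assumed to agree with the draft for these cases; the square
brackets are ordinary character data that make the padding visible.

```python
left = "[{0:<8}]".format("Fred")      # pad on the right
right = "[{0:>8}]".format("Fred")     # pad on the left
center = "[{0:^8}]".format("Fred")    # pad on both sides
starred = "[{0:*^8}]".format("Fred")  # '*' is the fill character

# '=' places the zero fill between the sign and the digits.
signed = "{0:0=+10}".format(120)
```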
|
||
|
||
The 'sign' element can be one of the following:
|
||
|
||
'+' - indicates that a sign should be used for both
|
||
positive as well as negative numbers
|
||
'-' - indicates that a sign should be used only for negative
|
||
numbers (this is the default behaviour)
|
||
' ' - indicates that a leading space should be used on
|
||
positive numbers
|
||
'()' - indicates that negative numbers should be surrounded
|
||
by parentheses
|
||
|
||
'width' is a decimal integer defining the minimum field width. If
|
||
not specified, then the field width will be determined by the
|
||
content.
|
||
|
||
The 'precision' is a decimal number indicating how many digits
|
||
should be displayed after the decimal point in a floating point
|
||
conversion. In a string conversion the field indicates how many
|
||
characters will be used from the field content. The precision is
|
||
ignored for integer conversions.
|
||
|
||
Finally, the 'type' determines how the data should be presented.
|
||
If the type field is absent, an appropriate type will be assigned
|
||
based on the value to be formatted ('d' for integers and longs,
|
||
'g' for floats, and 's' for everything else.)
|
||
|
||
The available string conversion types are:
|
||
|
||
's' - String format. Invokes str() on the object.
|
||
This is the default conversion specifier type.
|
||
'r' - Repr format. Invokes repr() on the object.
|
||
|
||
There are several integer conversion types. All invoke int() on
|
||
the object before attempting to format it.
|
||
|
||
The available integer conversion types are:
|
||
|
||
'b' - Binary. Outputs the number in base 2.
|
||
'c' - Character. Converts the integer to the corresponding
|
||
unicode character before printing.
|
||
'd' - Decimal Integer. Outputs the number in base 10.
|
||
'o' - Octal format. Outputs the number in base 8.
|
||
'x' - Hex format. Outputs the number in base 16, using lower-
|
||
case letters for the digits above 9.
|
||
'X' - Hex format. Outputs the number in base 16, using upper-
|
||
case letters for the digits above 9.
|
||
|
||
There are several floating point conversion types. All invoke
|
||
float() on the object before attempting to format it.
|
||
|
||
The available floating point conversion types are:
|
||
|
||
'e' - Exponent notation. Prints the number in scientific
|
||
notation using the letter 'e' to indicate the exponent.
|
||
'E' - Exponent notation. Same as 'e' except it uses an upper
|
||
case 'E' as the separator character.
|
||
'f' - Fixed point. Displays the number as a fixed-point
|
||
number.
|
||
'F' - Fixed point. Same as 'f'.
|
||
'g' - General format. This prints the number as a fixed-point
|
||
number, unless the number is too large, in which case
|
||
it switches to 'e' exponent notation.
|
||
'G' - General format. Same as 'g' except switches to 'E'
|
||
if the number gets too large.
|
||
'n' - Number. This is the same as 'g', except that it uses the
|
||
current locale setting to insert the appropriate
|
||
number separator characters.
|
||
'%' - Percentage. Multiplies the number by 100 and displays
|
||
in fixed ('f') format, followed by a percent sign.
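A few of the conversion types listed above, sketched with the
str.format method as it ships in current Python. The input values are
arbitrary, and the '()' sign option is left out because it is not
assumed to be available outside this draft.

```python
examples = {
    "binary": "{0:b}".format(10),
    "char": "{0:c}".format(65),
    "decimal": "{0:d}".format(42),
    "octal": "{0:o}".format(8),
    "hex_lower": "{0:x}".format(255),
    "hex_upper": "{0:X}".format(255),
    "exponent": "{0:.2e}".format(1234.5),
    "fixed": "{0:.2f}".format(3.14159),
    "percent": "{0:.1%}".format(0.25),
    # For strings, the precision limits how many characters are used.
    "truncated": "{0:.3s}".format("Python"),
}

for label, text in examples.items():
    print(label, text)
```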
|
||
|
||
Objects are able to define their own conversion specifiers to
|
||
replace the standard ones. An example is the 'datetime' class,
|
||
whose conversion specifiers might look something like the
|
||
arguments to the strftime() function:
|
||
|
||
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
|
||
|
||
|
||
Controlling Formatting on a Per-Type Basis
|
||
|
||
A class that wishes to implement a custom interpretation of its
|
||
conversion specifiers can implement a __format__ method:
|
||
|
||
class AST:
|
||
def __format__(self, specifiers):
|
||
...
|
||
|
||
The 'specifiers' argument will be either a string object or a
|
||
unicode object, depending on the type of the original format
|
||
string. The __format__ method should test the type of the
|
||
specifiers parameter to determine whether to return a string or
|
||
unicode object. It is the responsibility of the __format__ method
|
||
to return an object of the proper type.
|
||
|
||
string.format() will format each field using the following steps:
|
||
|
||
1) See if the value to be formatted has a __format__ method. If
|
||
it does, then call it.
|
||
|
||
2) Otherwise, check the internal formatter within string.format
|
||
that contains knowledge of certain builtin types.
|
||
|
||
3) Otherwise, call str() or unicode() as appropriate.
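A minimal sketch of step 1, assuming a hypothetical 'Temperature'
class whose conversion specifiers are the single letters 'C' and 'F'.
Neither the class nor its specifiers are part of this proposal.

```python
class Temperature:
    """Hypothetical class; it interprets the specifiers 'C' and 'F' itself."""
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, specifiers):
        if specifiers == "F":
            return "{0:.1f} F".format(self.celsius * 9 / 5 + 32)
        # Treat an empty specifier the same as 'C'.
        if specifiers in ("", "C"):
            return "{0:.1f} C".format(self.celsius)
        raise ValueError("unknown specifier: " + repr(specifiers))

in_fahrenheit = "{0:F}".format(Temperature(20))
in_celsius = "{0}".format(Temperature(20))
```

Everything between the colon and the closing brace reaches
__format__ unchanged, so the class is free to define any specifier
syntax it likes.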
|
||
|
||
|
||
User-Defined Formatting
|
||
|
||
There will be times when customizing the formatting of fields
|
||
on a per-type basis is not enough. An example might be a
|
||
spreadsheet application, which displays hash marks '#' when a value
|
||
is too large to fit in the available space.
|
||
|
||
For more powerful and flexible formatting, access to the underlying
|
||
format engine can be obtained through the 'Formatter' class that
|
||
lives in the 'string' module. This class takes additional options
|
||
which are not accessible via the normal str.format method.
|
||
|
||
An application can create its own Formatter instance which has
|
||
customized behavior, either by setting the properties of the
|
||
Formatter instance, or by subclassing the Formatter class.
|
||
|
||
The PEP does not attempt to exactly specify all methods and
|
||
properties defined by the Formatter class; instead, those will be
|
||
defined and documented in the initial implementation. However, this
|
||
PEP will specify the general requirements for the Formatter class,
|
||
which are listed below.
|
||
|
||
|
||
Formatter Creation and Initialization
|
||
|
||
The Formatter class takes a single initialization argument, 'flags':
|
||
|
||
Formatter(flags=0)
|
||
|
||
The 'flags' argument is used to control certain subtle behavioral
|
||
differences in formatting that would be cumbersome to change via
|
||
subclassing. The flags values are defined as static variables
|
||
in the "Formatter" class:
|
||
|
||
Formatter.ALLOW_LEADING_UNDERSCORES
|
||
|
||
By default, leading underscores are not allowed in identifier
|
||
lookups (getattr or getitem). Setting this flag will allow
|
||
this.
|
||
|
||
Formatter.CHECK_UNUSED_POSITIONAL
|
||
|
||
If this flag is set, then any positional arguments which are
|
||
supplied to the 'format' method but which are not used by
|
||
the format string will cause an error.
|
||
|
||
Formatter.CHECK_UNUSED_NAME
|
||
|
||
If this flag is set, then any named arguments which are
|
||
supplied to the 'format' method but which are not used by
|
||
the format string will cause an error.
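For comparison, the Formatter class in the string module of current
Python takes no 'flags' argument; unused-argument checking is done by
overriding its check_unused_args(used_args, args, kwargs) method. The
sketch below assumes that shipped API rather than the flags described
here, and 'StrictFormatter' is a hypothetical name.

```python
from string import Formatter

class StrictFormatter(Formatter):
    """Hypothetical subclass that rejects any argument the format string ignores."""
    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the indexes and names the format string referred to.
        unused_positional = [i for i in range(len(args)) if i not in used_args]
        unused_named = [name for name in kwargs if name not in used_args]
        if unused_positional or unused_named:
            raise ValueError(
                "unused arguments: {0} {1}".format(unused_positional, unused_named))

strict = StrictFormatter()
ok = strict.format("{0} and {name}", "spam", name="eggs")

try:
    strict.format("{0}", "spam", "unused")
    unused_detected = False
except ValueError:
    unused_detected = True
```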
|
||
|
||
|
||
Formatter Methods
|
||
|
||
The methods of class Formatter are as follows:
|
||
|
||
-- format(format_string, *args, **kwargs)
|
||
-- vformat(format_string, args, kwargs)
|
||
-- get_positional(args, index)
|
||
-- get_named(kwds, name)
|
||
-- format_field(value, conversion)
|
||
|
||
'format' is the primary API method. It takes a format template,
|
||
and an arbitrary set of positional and keyword arguments. 'format'
|
||
is just a wrapper that calls 'vformat'.
|
||
|
||
'vformat' is the function that does the actual work of formatting. It
|
||
is exposed as a separate function for cases where you want to pass in
|
||
a predefined dictionary of arguments, rather than unpacking and
|
||
repacking the dictionary as individual arguments using the '*args' and
|
||
'**kwds' syntax. 'vformat' does the work of breaking up the format
|
||
template string into character data and replacement fields. It calls
|
||
the 'get_positional' and 'get_named' methods as appropriate.
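The relationship between 'format' and 'vformat' can be sketched with
the Formatter class that ships in the string module of current
Python. That class is constructed without arguments, which is an
assumption relative to this draft; the argument values are
placeholders.

```python
from string import Formatter

args = ("Fred",)
kwargs = {"job": "plumber"}
formatter = Formatter()

# vformat takes the sequence and the dictionary directly.
via_vformat = formatter.vformat("{0} is a {job}", args, kwargs)

# format unpacks them, only for vformat to collect them again.
via_format = formatter.format("{0} is a {job}", *args, **kwargs)
```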
|
||
|
||
Note that the checking of unused arguments, and the restriction on
|
||
leading underscores in attribute names are also done in this function.
|
||
|
||
'get_positional' and 'get_named' are used to retrieve a given field
|
||
value. For compound field names, these functions are only called for
|
||
the first component of the field name; subsequent components are
|
||
handled through normal attribute and indexing operations. So for
|
||
example, the field expression '0.name' would cause 'get_positional' to
|
||
be called with the list of positional arguments and a numeric index of
|
||
0, and then the standard 'getattr' function would be called to get the
|
||
'name' attribute of the result.
|
||
|
||
If the index or keyword refers to an item that does not exist, then an
|
||
IndexError/KeyError will be raised.
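A short sketch of both failure modes, using the str.format method as
it ships in current Python. The 'error_name' helper is hypothetical
and exists only to report which exception was raised.

```python
def error_name(format_string, *args, **kwargs):
    """Hypothetical helper: return the name of the exception raised, or None."""
    try:
        format_string.format(*args, **kwargs)
    except (IndexError, KeyError) as exc:
        return type(exc).__name__
    return None

# Field {1} has no matching positional argument.
missing_positional = error_name("{0} {1}", "only one")

# Field {name} has no matching keyword argument.
missing_keyword = error_name("{name}", other="x")
```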
|
||
|
||
'format_field' actually generates the text for a replacement field.
|
||
The 'value' argument corresponds to the value being formatted, which
|
||
was retrieved from the arguments using the field name. The
|
||
'conversion' argument is the conversion spec part of the field, which
|
||
will be either a string or unicode object, depending on the type of
|
||
the original format string.
|
||
|
||
Note: The final implementation of the Formatter class may define
|
||
additional overridable methods and hooks. In particular, it may be
|
||
that 'vformat' is itself a composition of several additional,
|
||
overridable methods. (Depending on whether it is convenient to the
|
||
implementor of Formatter.)
|
||
|
||
|
||
Customizing Formatters
|
||
|
||
This section describes some typical ways that Formatter objects
|
||
can be customized.
|
||
|
||
To support alternative format-string syntax, the 'vformat' method
|
||
can be overridden to alter the way format strings are parsed.
|
||
|
||
One common desire is to support a 'default' namespace, so that
|
||
you don't need to pass in keyword arguments to the format()
|
||
method, but can instead use values in a pre-existing namespace.
|
||
This can easily be done by overriding get_named() as follows:
|
||
|
||
class NamespaceFormatter(Formatter):
    def __init__(self, namespace=None, flags=0):
        Formatter.__init__(self, flags)
        # Avoid sharing one mutable default dictionary between instances
        self.namespace = {} if namespace is None else namespace

    def get_named(self, kwds, name):
        try:
            # Check explicitly passed arguments first
            return kwds[name]
        except KeyError:
            return self.namespace[name]
|
||
|
||
One can use this to easily create a formatting function that allows
|
||
access to global variables, for example:
|
||
|
||
fmt = NamespaceFormatter(globals())
|
||
|
||
greeting = "hello"
|
||
print(fmt.format("{greeting}, world!"))
|
||
|
||
A similar technique can be used with the locals() dictionary to
gain access to local variables.
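For readers who want to try the idea today, here is a sketch against
the Formatter class that ships in the string module of current
Python. That class has no 'flags' argument and no get_named() method;
its lookup hook is get_value(key, args, kwargs), so the code below is
an adaptation under that assumption and not the API described in this
PEP. An explicit dictionary stands in for globals().

```python
from string import Formatter

class NamespaceFormatter(Formatter):
    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional; defer to the default lookup.
        return super().get_value(key, args, kwargs)

fmt = NamespaceFormatter({"greeting": "hello"})
message = fmt.format("{greeting}, world!")
overridden = fmt.format("{greeting}, world!", greeting="goodbye")
```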
|
||
|
||
It would also be possible to create a 'smart' namespace formatter
|
||
that could automatically access both locals and globals through
|
||
snooping of the calling stack. Due to the need for compatibility with
|
||
the different versions of Python, such a capability will not be
|
||
included in the standard library; however, it is anticipated that
|
||
someone will create and publish a recipe for doing this.
|
||
|
||
Another type of customization is to change the way that built-in
|
||
types are formatted by overriding the 'format_field' method. (For
|
||
non-built-in types, you can simply define a __format__ special
|
||
method on that type.) So for example, you could override the
|
||
formatting of numbers to output scientific notation when needed.
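A minimal sketch of such an override, written against the Formatter
class that ships in the string module of current Python, whose
format_field(value, format_spec) method is assumed to correspond to
the one described here. The class name, the threshold and the '.3e'
specifier are arbitrary choices for illustration.

```python
from string import Formatter

class ScientificFormatter(Formatter):
    """Hypothetical formatter: large floats with no specifier use 'e' notation."""
    THRESHOLD = 1e6  # arbitrary cutoff chosen for this sketch

    def format_field(self, value, format_spec):
        is_large = isinstance(value, float) and abs(value) >= self.THRESHOLD
        if is_large and not format_spec:
            format_spec = ".3e"
        return super().format_field(value, format_spec)

sci = ScientificFormatter()
small = sci.format("{0}", 12.5)
large = sci.format("{0}", 123456789.0)
```

An explicit conversion specifier in the format string still takes
precedence, since the override only acts when none is given.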
|
||
|
||
|
||
Error handling
|
||
|
||
There are two classes of exceptions which can occur during formatting:
|
||
exceptions generated by the formatter code itself, and exceptions
|
||
generated by user code (such as a field object's getattr function, or
|
||
a __format__ method).
|
||
|
||
In general, exceptions generated by the formatter code itself are
|
||
of the "ValueError" variety -- there is an error in the actual "value"
|
||
of the format string. (This is not always true; for example, the
|
||
string.format() function might be passed a non-string as its first
|
||
parameter, which would result in a TypeError.)
|
||
|
||
The text associated with these internally generated ValueError
|
||
exceptions will indicate the location of the exception inside
|
||
the format string, as well as the nature of the exception.
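The ValueError case can be sketched with the str.format method as it
ships in current Python. The helper below is hypothetical, and the
sketch makes no claim about the exact wording of the error text.

```python
def raises_value_error(format_string, *args):
    """Hypothetical helper: report whether formatting raises ValueError."""
    try:
        format_string.format(*args)
    except ValueError:
        return True
    return False

unclosed = raises_value_error("My name is {0", "Fred")    # missing '}'
stray_close = raises_value_error("My name is }", "Fred")  # unescaped '}'
well_formed = raises_value_error("My name is {0}", "Fred")
```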
|
||
|
||
For exceptions generated by user code, a trace record and
|
||
dummy frame will be added to the traceback stack to help
|
||
in determining the location in the string where the exception
|
||
occurred. The inserted traceback will indicate that the
|
||
error occurred at:
|
||
|
||
File "<format_string>", line XX, in column_YY
|
||
|
||
where XX and YY represent the line and character position
|
||
information in the string, respectively.
|
||
|
||
|
||
Alternate Syntax
|
||
|
||
Naturally, one of the most contentious issues is the syntax of the
|
||
format strings, and in particular the markup conventions used to
|
||
indicate fields.
|
||
|
||
Rather than attempting to exhaustively list all of the various
|
||
proposals, I will cover the ones that are most widely used
|
||
already.
|
||
|
||
- Shell variable syntax: $name and $(name) (or in some variants,
|
||
${name}). This is probably the oldest convention out there, and
|
||
is used by Perl and many others. When used without the braces,
|
||
the length of the variable is determined by lexically scanning
|
||
until an invalid character is found.
|
||
|
||
This scheme is generally used in cases where interpolation is
|
||
implicit - that is, in environments where any string can contain
|
||
interpolation variables, and no special substitution function
|
||
need be invoked. In such cases, it is important to prevent the
|
||
interpolation behavior from occurring accidentally, so the '$'
|
||
(which is otherwise a relatively uncommonly-used character) is
|
||
used to signal when the behavior should occur.
|
||
|
||
It is the author's opinion, however, that in cases where the
|
||
formatting is explicitly invoked, less care needs to be
|
||
taken to prevent accidental interpolation, in which case a
|
||
lighter and less unwieldy syntax can be used.
|
||
|
||
- Printf and its cousins ('%'), including variations that add a
|
||
field index, so that fields can be interpolated out of order.
|
||
|
||
- Other bracket-only variations. Various MUDs (Multi-User
|
||
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
|
||
string interpolation. The Microsoft .Net libraries use braces
|
||
({}), and a syntax which is very similar to the one in this
|
||
proposal, although the syntax for conversion specifiers is quite
|
||
different. [4]
|
||
|
||
- Backquoting. This method has the benefit of minimal syntactical
|
||
clutter, however it lacks many of the benefits of a function
|
||
call syntax (such as complex expression arguments, custom
|
||
formatters, etc.).
|
||
|
||
- Other variations include Ruby's #{}, PHP's {$name}, and so
|
||
on.
|
||
|
||
Some specific aspects of the syntax warrant additional comments:
|
||
|
||
1) Backslash character for escapes. The original version of
|
||
this PEP used backslash rather than doubling to escape a bracket.
|
||
This worked because backslashes in Python string literals that
|
||
don't conform to a standard backslash sequence such as '\n'
|
||
are left unmodified. However, this caused a certain amount
|
||
of confusion, and led to potential situations of multiple
|
||
recursive escapes, i.e. '\\\\{' to place a literal backslash
|
||
in front of a bracket.
|
||
|
||
2) The use of the colon character (':') as a separator for
|
||
conversion specifiers. This was chosen simply because that's
|
||
what .Net uses.
|
||
|
||
|
||
Security Considerations
|
||
|
||
Historically, string formatting has been a common source of
|
||
security holes in web-based applications, particularly if the
|
||
string templating system allows arbitrary expressions to be
|
||
embedded in format strings.
|
||
|
||
The typical scenario is one where the string data being processed
|
||
is coming from outside the application, perhaps from HTTP headers
|
||
or fields within a web form. An attacker could substitute their
|
||
own strings designed to cause havoc.
|
||
|
||
The string formatting system outlined in this PEP is by no means
|
||
'secure', in the sense that no Python library module can, on its
|
||
own, guarantee security, especially given the open nature of
|
||
the Python language. Building a secure application requires a
|
||
secure approach to design.
|
||
|
||
What this PEP does attempt to do is make the job of designing a
|
||
secure application easier, by making it easier for a programmer
|
||
to reason about the possible consequences of a string formatting
|
||
operation. It does this by limiting those consequences to a smaller
|
||
and more easily understood subset.
|
||
|
||
For example, because it is possible in Python to override the
|
||
'getattr' operation of a type, the interpretation of a compound
|
||
replacement field such as "0.name" could potentially run
|
||
arbitrary code.
|
||
|
||
However, it is *extremely* rare for the mere retrieval of an
|
||
attribute to have side effects. Other operations which are more
|
||
likely to have side effects - such as method calls - are disallowed.
|
||
Thus, a programmer can be reasonably assured that no string
|
||
formatting operation will cause a state change in the program.
|
||
This assurance is not only useful in securing an application, but
|
||
in debugging it as well.
|
||
|
||
Similarly, the restriction on field names beginning with
|
||
underscores is intended to provide similar assurances about the
|
||
visibility of private data.
|
||
|
||
Of course, programmers would be well-advised to avoid using
|
||
any external data as format strings, and instead use that data
|
||
as the format arguments.
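The difference between the two uses of external data can be sketched
as follows. The 'Account' class and its attribute values are
hypothetical, and the string bound to 'untrusted' stands in for text
received from outside the application.

```python
class Account:
    """Hypothetical class holding one public and one sensitive attribute."""
    def __init__(self, owner, secret):
        self.owner = owner
        self.secret = secret

account = Account("Fred", "hunter2")
untrusted = "{0.secret}"  # stands in for text supplied by an attacker

# Risky: the external text is the format string, so it can reach attributes.
leaked = untrusted.format(account)

# Safer: the external text is only an argument and is copied through verbatim.
safe = "You wrote: {0}".format(untrusted)
```

In the second call the braces inside the argument are never parsed,
because only the format string itself is scanned for markup.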
|
||
|
||
|
||
Sample Implementation
|
||
|
||
An implementation of an earlier version of this PEP was created by
|
||
Patrick Maupin and Eric V. Smith, and can be found in the pep3101
|
||
sandbox at:
|
||
|
||
http://svn.python.org/view/sandbox/trunk/pep3101/
|
||
|
||
|
||
Backwards Compatibility
|
||
|
||
Backwards compatibility can be maintained by leaving the existing
|
||
mechanisms in place. The new system does not collide with any of
|
||
the method names of the existing string formatting techniques, so
|
||
both systems can co-exist until it comes time to deprecate the
|
||
older system.
|
||
|
||
|
||
References
|
||
|
||
[1] Python Library Reference - String formatting operations
|
||
http://docs.python.org/lib/typesseq-strings.html
|
||
|
||
[2] Python Library Reference - Template strings
|
||
http://docs.python.org/lib/node109.html
|
||
|
||
[3] [Python-3000] String formating operations in python 3k
|
||
http://mail.python.org/pipermail/python-3000/2006-April/000285.html
|
||
|
||
[4] Composite Formatting - [.Net Framework Developer's Guide]
|
||
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|