A substantial rewrite of PEP3101.
parent 9e84963d32
commit fa5ea5f886
435
pep-3101.txt
|
@ -26,10 +26,10 @@ Rationale
|
||||||
|
|
||||||
- The string.Template module. [2]
|
- The string.Template module. [2]
|
||||||
|
|
||||||
The scope of this PEP will be restricted to proposals for built-in
|
The primary scope of this PEP concerns proposals for built-in
|
||||||
string formatting operations (in other words, methods of the
|
string formatting operations (in other words, methods of the
|
||||||
built-in string type).
|
built-in string type).
|
||||||
|
|
||||||
The '%' operator is primarily limited by the fact that it is a
|
The '%' operator is primarily limited by the fact that it is a
|
||||||
binary operator, and therefore can take at most two arguments.
|
binary operator, and therefore can take at most two arguments.
|
||||||
One of those arguments is already dedicated to the format string,
|
One of those arguments is already dedicated to the format string,
|
||||||
|
@ -42,8 +42,14 @@ Rationale
|
||||||
|
|
||||||
While there is some overlap between this proposal and
|
While there is some overlap between this proposal and
|
||||||
string.Template, it is felt that each serves a distinct need,
|
string.Template, it is felt that each serves a distinct need,
|
||||||
and that one does not obviate the other. In any case,
|
and that one does not obviate the other. This proposal is for
|
||||||
string.Template will not be discussed here.
|
a mechanism which, like '%', is efficient for small strings
|
||||||
|
which are only used once, so, for example, compilation of a
|
||||||
|
string into a template is not contemplated in this proposal,
|
||||||
|
although the proposal does take care to define format strings
|
||||||
|
and the API in such a way that an efficient template package
|
||||||
|
could reuse the syntax and even some of the underlying
|
||||||
|
formatting code.
|
||||||
|
|
||||||
|
|
||||||
Specification
|
Specification
|
||||||
|
@ -53,39 +59,43 @@ Specification
|
||||||
- Specification of a new formatting method to be added to the
|
- Specification of a new formatting method to be added to the
|
||||||
built-in string class.
|
built-in string class.
|
||||||
|
|
||||||
|
- Specification of functions and flag values to be added to
|
||||||
|
the string module, so that the underlying formatting engine
|
||||||
|
can be used with additional options.
|
||||||
|
|
||||||
- Specification of a new syntax for format strings.
|
- Specification of a new syntax for format strings.
|
||||||
|
|
||||||
- Specification of a new set of class methods to control the
|
- Specification of a new set of special methods to control the
|
||||||
formatting and conversion of objects.
|
formatting and conversion of objects.
|
||||||
|
|
||||||
- Specification of an API for user-defined formatting classes.
|
- Specification of an API for user-defined formatting classes.
|
||||||
|
|
||||||
- Specification of how formatting errors are handled.
|
- Specification of how formatting errors are handled.
|
||||||
|
|
||||||
Note on string encodings: Since this PEP is being targeted
|
Note on string encodings: When discussing this PEP in the context
|
||||||
at Python 3.0, it is assumed that all strings are unicode strings,
|
of Python 3.0, it is assumed that all strings are unicode strings,
|
||||||
and that the use of the word 'string' in the context of this
|
and that the use of the word 'string' in the context of this
|
||||||
document will generally refer to a Python 3.0 string, which is
|
document will generally refer to a Python 3.0 string, which is
|
||||||
the same as Python 2.x unicode object.
|
the same as Python 2.x unicode object.
|
||||||
|
|
||||||
If it should happen that this functionality is backported to
|
In the context of Python 2.x, the use of the word 'string' in this
|
||||||
the 2.x series, then it will be necessary to handle both regular
|
document refers to an object which may either be a regular string
|
||||||
string as well as unicode objects. All of the function call
|
or a unicode object. All of the function call interfaces
|
||||||
interfaces described in this PEP can be used for both strings
|
described in this PEP can be used for both strings and unicode
|
||||||
and unicode objects, and in all cases there is sufficient
|
objects, and in all cases there is sufficient information
|
||||||
information to be able to properly deduce the output string
|
to be able to properly deduce the output string type (in
|
||||||
type (in other words, there is no need for two separate APIs).
|
other words, there is no need for two separate APIs).
|
||||||
In all cases, the type of the template string dominates - that
|
In all cases, the type of the format string dominates - that
|
||||||
is, the result of the conversion will always result in an object
|
is, the result of the conversion will always result in an object
|
||||||
that contains the same representation of characters as the
|
that contains the same representation of characters as the
|
||||||
input template string.
|
input format string.
|
||||||
|
|
||||||
|
|
||||||
String Methods
|
String Methods
|
||||||
|
|
||||||
The build-in string class will gain a new method, 'format',
|
The built-in string class (and also the unicode class in 2.6) will
|
||||||
which takes takes an arbitrary number of positional and keyword
|
gain a new method, 'format', which takes an arbitrary number of
|
||||||
arguments:
|
positional and keyword arguments:
|
||||||
|
|
||||||
"The story of {0}, {1}, and {c}".format(a, b, c=d)
|
"The story of {0}, {1}, and {c}".format(a, b, c=d)
|
||||||
|
|
||||||
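As a rough illustration of the call shown above, the sketch below fills in values for a, b and d. The names come from the PEP's example; the values are assumptions chosen only for demonstration.

```python
# Hypothetical values standing in for a, b and d in the example above.
a, b, d = "Tom", "Dick", "Harry"

# {0} and {1} select positional arguments; {c} selects the keyword argument c.
story = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(story)
```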
|
@ -98,6 +108,15 @@ String Methods
|
||||||
|
|
||||||
Format Strings
|
Format Strings
|
||||||
|
|
||||||
|
Format strings consist of intermingled character data and markup.
|
||||||
|
|
||||||
|
Character data is data which is transferred unchanged from the
|
||||||
|
format string to the output string; markup is not transferred from
|
||||||
|
the format string directly to the output, but instead is used to
|
||||||
|
define 'replacement fields' that describe to the format engine
|
||||||
|
what should be placed in the output string in the place of the
|
||||||
|
markup.
|
||||||
|
|
||||||
Brace characters ('curly braces') are used to indicate a
|
Brace characters ('curly braces') are used to indicate a
|
||||||
replacement field within the string:
|
replacement field within the string:
|
||||||
|
|
||||||
|
@ -114,11 +133,11 @@ Format Strings
|
||||||
Which would produce:
|
Which would produce:
|
||||||
|
|
||||||
"My name is Fred :-{}"
|
"My name is Fred :-{}"
|
||||||
|
|
||||||
The element within the braces is called a 'field'. Fields consist
|
The element within the braces is called a 'field'. Fields consist
|
||||||
of a 'field name', which can either be simple or compound, and an
|
of a 'field name', which can either be simple or compound, and an
|
||||||
optional 'conversion specifier'.
|
optional 'conversion specifier'.
|
||||||
|
|
||||||
|
|
||||||
Simple and Compound Field Names
|
Simple and Compound Field Names
|
||||||
|
|
||||||
|
@ -126,12 +145,12 @@ Simple and Compound Field Names
|
||||||
must be valid base-10 integers; if names, they must be valid
|
must be valid base-10 integers; if names, they must be valid
|
||||||
Python identifiers. A number is used to identify a positional
|
Python identifiers. A number is used to identify a positional
|
||||||
argument, while a name is used to identify a keyword argument.
|
argument, while a name is used to identify a keyword argument.
|
||||||
|
|
||||||
A compound field name is a combination of multiple simple field
|
A compound field name is a combination of multiple simple field
|
||||||
names in an expression:
|
names in an expression:
|
||||||
|
|
||||||
"My name is {0.name}".format(file('out.txt'))
|
"My name is {0.name}".format(file('out.txt'))
|
||||||
|
|
||||||
This example shows the use of the 'getattr' or 'dot' operator
|
This example shows the use of the 'getattr' or 'dot' operator
|
||||||
in a field expression. The dot operator allows an attribute of
|
in a field expression. The dot operator allows an attribute of
|
||||||
an input value to be specified as the field value.
|
an input value to be specified as the field value.
|
||||||
|
@ -142,22 +161,36 @@ Simple and Compound Field Names
|
||||||
Python expressions inside of strings. Only two operators are
|
Python expressions inside of strings. Only two operators are
|
||||||
supported, the '.' (getattr) operator, and the '[]' (getitem)
|
supported, the '.' (getattr) operator, and the '[]' (getitem)
|
||||||
operator.
|
operator.
|
||||||
|
|
||||||
|
Another limitation that is defined to limit potential security
|
||||||
|
issues is that field names or attribute names beginning with an
|
||||||
|
underscore are disallowed. This enforces the common convention
|
||||||
|
that names beginning with an underscore are 'private'.
|
||||||
|
|
||||||
An example of the 'getitem' syntax:
|
An example of the 'getitem' syntax:
|
||||||
|
|
||||||
"My name is {0[name]}".format(dict(name='Fred'))
|
"My name is {0[name]}".format(dict(name='Fred'))
|
||||||
|
|
||||||
It should be noted that the use of 'getitem' within a string is
|
It should be noted that the use of 'getitem' within a string is
|
||||||
much more limited than its normal use. In the above example, the
|
much more limited than its normal use. In the above example, the
|
||||||
string 'name' really is the literal string 'name', not a variable
|
string 'name' really is the literal string 'name', not a variable
|
||||||
named 'name'. The rules for parsing an item key are the same as
|
named 'name'. The rules for parsing an item key are very simple.
|
||||||
for parsing a simple name - in other words, if it looks like a
|
If it starts with a digit, then it is treated as a number, otherwise
|
||||||
number, then its treated as a number, if it looks like an
|
it is used as a string.
|
||||||
identifier, then it is used as a string.
|
|
||||||
|
|
||||||
It is not possible to specify arbitrary dictionary keys from
|
It is not possible to specify arbitrary dictionary keys from
|
||||||
within a format string.
|
within a format string.
|
||||||
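The 'getattr' and 'getitem' forms described above can be tried with str.format as it exists in current Python. This is a minimal sketch; the Person class is a hypothetical stand-in for any object that has a 'name' attribute.

```python
class Person:
    """Hypothetical stand-in for any object with a 'name' attribute."""

    def __init__(self, name):
        self.name = name


# '.' looks up an attribute on the argument selected by the field name.
by_attr = "My name is {0.name}".format(Person("Fred"))

# '[name]' uses the literal string 'name' as the key, not a variable.
by_key = "My name is {0[name]}".format(dict(name="Fred"))

# A key made of digits is treated as a number, so it can index a list.
by_index = "Second item: {0[1]}".format(["zero", "one"])

print(by_attr, by_key, by_index, sep="\n")
```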
|
|
||||||
|
Implementation note: The implementation of this proposal is
|
||||||
|
not required to enforce the rule about a name being a valid
|
||||||
|
Python identifier. Instead, it will rely on the getattr function
|
||||||
|
of the underlying object to throw an exception if the identifier
|
||||||
|
is not legal. The format function will have a minimalist parser
|
||||||
|
which only attempts to figure out when it is "done" with an
|
||||||
|
identifier (by finding a '.' or a ']', or '}', etc.) The only
|
||||||
|
exception to this laissez-faire approach is that, by default,
|
||||||
|
strings are not allowed to have leading underscores.
|
||||||
|
|
||||||
|
|
||||||
Conversion Specifiers
|
Conversion Specifiers
|
||||||
|
|
||||||
|
@ -176,9 +209,9 @@ Conversion Specifiers
|
||||||
Conversion specifiers can themselves contain replacement fields.
|
Conversion specifiers can themselves contain replacement fields.
|
||||||
For example, a field whose field width is itself a parameter
|
For example, a field whose field width is itself a parameter
|
||||||
could be specified via:
|
could be specified via:
|
||||||
|
|
||||||
"{0:{1}}".format(a, b, c)
|
"{0:{1}}".format(a, b, c)
|
||||||
|
|
||||||
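A small sketch of a parameterized field width, assuming str.format as shipped in current Python. The values are illustrative only.

```python
value, width = "abc", 10

# The inner {1} is replaced first, so this behaves like "{0:10}".format(value).
padded = "{0:{1}}".format(value, width)

print(repr(padded))
```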
Note that the doubled '}' at the end, which would normally be
|
Note that the doubled '}' at the end, which would normally be
|
||||||
escaped, is not escaped in this case. The reason is because
|
escaped, is not escaped in this case. The reason is because
|
||||||
the '{{' and '}}' syntax for escapes is only applied when used
|
the '{{' and '}}' syntax for escapes is only applied when used
|
||||||
|
@ -201,13 +234,13 @@ Standard Conversion Specifiers
|
||||||
differences. The standard conversion specifiers fall into three
|
differences. The standard conversion specifiers fall into three
|
||||||
major categories: string conversions, integer conversions and
|
major categories: string conversions, integer conversions and
|
||||||
floating point conversions.
|
floating point conversions.
|
||||||
|
|
||||||
The general form of a standard conversion specifier is:
|
The general form of a standard conversion specifier is:
|
||||||
|
|
||||||
[[fill]align][sign][width][.precision][type]
|
[[fill]align][sign][width][.precision][type]
|
||||||
|
|
||||||
The brackets ([]) indicate an optional element.
|
The brackets ([]) indicate an optional element.
|
||||||
|
|
||||||
Then the optional align flag can be one of the following:
|
Then the optional align flag can be one of the following:
|
||||||
|
|
||||||
'<' - Forces the field to be left-aligned within the available
|
'<' - Forces the field to be left-aligned within the available
|
||||||
|
@ -217,18 +250,20 @@ Standard Conversion Specifiers
|
||||||
'=' - Forces the padding to be placed after the sign (if any)
|
'=' - Forces the padding to be placed after the sign (if any)
|
||||||
but before the digits. This is used for printing fields
|
but before the digits. This is used for printing fields
|
||||||
in the form '+000000120'.
|
in the form '+000000120'.
|
||||||
|
'^' - Forces the field to be centered within the available
|
||||||
|
space.
|
||||||
|
|
||||||
Note that unless a minimum field width is defined, the field
|
Note that unless a minimum field width is defined, the field
|
||||||
width will always be the same size as the data to fill it, so
|
width will always be the same size as the data to fill it, so
|
||||||
that the alignment option has no meaning in this case.
|
that the alignment option has no meaning in this case.
|
||||||
|
|
||||||
The optional 'fill' character defines the character to be used to
|
The optional 'fill' character defines the character to be used to
|
||||||
pad the field to the minimum width. The alignment flag must be
|
pad the field to the minimum width. The alignment flag must be
|
||||||
supplied if the character is a number other than 0 (otherwise the
|
supplied if the character is a number other than 0 (otherwise the
|
||||||
character would be interpreted as part of the field width
|
character would be interpreted as part of the field width
|
||||||
specifier). A zero fill character without an alignment flag
|
specifier). A zero fill character without an alignment flag
|
||||||
implies an alignment type of '='.
|
implies an alignment type of '='.
|
||||||
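The alignment, fill and sign rules above can be checked against str.format in current Python. The following is a sketch with arbitrary sample values, not an exhaustive test of the specifier grammar.

```python
# Fill character '*' with left alignment inside a field of width 8.
left = "{0:*<8}".format("ab")

# Right alignment with the default fill (a space).
right = "{0:>6}".format(42)

# Centered within a field of width 9.
center = "{0:^9}".format("mid")

# '=' places the padding after the sign but before the digits.
signed = "{0:0=+10d}".format(120)

# A zero fill without an alignment flag implies '='.
implied = "{0:+010d}".format(120)

print(left, right, center, signed, implied, sep="\n")
```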
|
|
||||||
The 'sign' element can be one of the following:
|
The 'sign' element can be one of the following:
|
||||||
|
|
||||||
'+' - indicates that a sign should be used for both
|
'+' - indicates that a sign should be used for both
|
||||||
|
@ -249,7 +284,7 @@ Standard Conversion Specifiers
|
||||||
conversion. In a string conversion the field indicates how many
|
conversion. In a string conversion the field indicates how many
|
||||||
characters will be used from the field content. The precision is
|
characters will be used from the field content. The precision is
|
||||||
ignored for integer conversions.
|
ignored for integer conversions.
|
||||||
|
|
||||||
Finally, the 'type' determines how the data should be presented.
|
Finally, the 'type' determines how the data should be presented.
|
||||||
If the type field is absent, an appropriate type will be assigned
|
If the type field is absent, an appropriate type will be assigned
|
||||||
based on the value to be formatted ('d' for integers and longs,
|
based on the value to be formatted ('d' for integers and longs,
|
||||||
|
@ -307,7 +342,7 @@ Standard Conversion Specifiers
|
||||||
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
|
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
|
||||||
|
|
||||||
|
|
||||||
Controlling Formatting
|
Controlling Formatting on a Per-Type Basis
|
||||||
|
|
||||||
A class that wishes to implement a custom interpretation of its
|
A class that wishes to implement a custom interpretation of its
|
||||||
conversion specifiers can implement a __format__ method:
|
conversion specifiers can implement a __format__ method:
|
||||||
|
@ -334,107 +369,187 @@ Controlling Formatting
|
||||||
3) Otherwise, call str() or unicode() as appropriate.
|
3) Otherwise, call str() or unicode() as appropriate.
|
||||||
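As one possible illustration of a per-type __format__ method, the sketch below defines a hypothetical Money class with an invented 'acct' specifier. Neither the class nor the specifier is part of this PEP; the code assumes the __format__ protocol as it exists in current Python.

```python
class Money:
    """Hypothetical type that interprets its own conversion specifier."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # 'acct' is an invented specifier: negatives appear in parentheses.
        if spec == "acct":
            text = "{0:.2f}".format(abs(self.amount))
            return "({0})".format(text) if self.amount < 0 else text
        # Any other specifier is delegated to the float formatter.
        return format(self.amount, spec)


loss = "{0:acct}".format(Money(-12.5))
plain = "{0:.1f}".format(Money(3.14159))
print(loss, plain)
```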
|
|
||||||
|
|
||||||
User-Defined Formatting Classes
|
User-Defined Formatting
|
||||||
|
|
||||||
There will be times when customizing the formatting of fields
|
There will be times when customizing the formatting of fields
|
||||||
on a per-type basis is not enough. An example might be an
|
on a per-type basis is not enough. An example might be a
|
||||||
accounting application, which displays negative numbers in
|
spreadsheet application, which displays hash marks '#' when a value
|
||||||
parentheses rather than using a negative sign.
|
is too large to fit in the available space.
|
||||||
|
|
||||||
The string formatting system facilitates this kind of application-
|
|
||||||
specific formatting by allowing user code to directly invoke
|
|
||||||
the code that interprets format strings and fields. User-written
|
|
||||||
code can intercept the normal formatting operations on a per-field
|
|
||||||
basis, substituting their own formatting methods.
|
|
||||||
|
|
||||||
For example, in the aforementioned accounting application, there
|
|
||||||
could be an application-specific number formatter, which reuses
|
|
||||||
the string.format templating code to do most of the work. The
|
|
||||||
API for such an application-specific formatter is up to the
|
|
||||||
application; here are several possible examples:
|
|
||||||
|
|
||||||
cell_format("The total is: {0}", total)
|
|
||||||
|
|
||||||
TemplateString("The total is: {0}").format(total)
|
|
||||||
|
|
||||||
Creating an application-specific formatter is relatively straight-
|
|
||||||
forward. The string and unicode classes will have a class method
|
|
||||||
called 'cformat' that does all the actual work of formatting; The
|
|
||||||
built-in format() method is just a wrapper that calls cformat.
|
|
||||||
|
|
||||||
The type signature for the cFormat function is as follows:
|
|
||||||
|
|
||||||
cformat(template, format_hook, args, kwargs)
|
|
||||||
|
|
||||||
The parameters to the cformat function are:
|
For more powerful and flexible formatting, access to the underlying
|
||||||
|
format engine can be obtained through the 'Formatter' class that
|
||||||
|
lives in the 'string' module. This class takes additional options
|
||||||
|
which are not accessible via the normal str.format method.
|
||||||
|
|
||||||
-- The format template string.
|
An application can create their own Formatter instance which has
|
||||||
-- A callable 'format hook', which is called once per field
|
customized behavior, either by setting the properties of the
|
||||||
-- A tuple containing the positional arguments
|
Formatter instance, or by subclassing the Formatter class.
|
||||||
-- A dict containing the keyword arguments
|
|
||||||
|
|
||||||
The cformat function will parse all of the fields in the format
|
The PEP does not attempt to exactly specify all methods and
|
||||||
string, and return a new string (or unicode) with all of the
|
properties defined by the Formatter class; instead, those will be
|
||||||
fields replaced with their formatted values.
|
defined and documented in the initial implementation. However, this
|
||||||
|
PEP will specify the general requirements for the Formatter class,
|
||||||
|
which are listed below.
|
||||||
|
|
||||||
The format hook is a callable object supplied by the user, which
|
|
||||||
is invoked once per field, and which can override the normal
|
|
||||||
formatting for that field. For each field, the cformat function
|
|
||||||
will attempt to call the field format hook with the following
|
|
||||||
arguments:
|
|
||||||
|
|
||||||
format_hook(value, conversion)
|
Formatter Creation and Initialization
|
||||||
|
|
||||||
The 'value' field corresponds to the value being formatted, which
|
The Formatter class takes a single initialization argument, 'flags':
|
||||||
was retrieved from the arguments using the field name.
|
|
||||||
|
|
||||||
The 'conversion' argument is the conversion spec part of the
|
Formatter(flags=0)
|
||||||
field, which will be either a string or unicode object, depending
|
|
||||||
on the type of the original format string.
|
|
||||||
|
|
||||||
The field_hook will be called once per field. The field_hook may
|
The 'flags' argument is used to control certain subtle behavioral
|
||||||
take one of two actions:
|
differences in formatting that would be cumbersome to change via
|
||||||
|
subclassing. The flags values are defined as static variables
|
||||||
1) Return a string or unicode object that is the result
|
in the "Formatter" class:
|
||||||
of the formatting operation.
|
|
||||||
|
|
||||||
2) Return None, indicating that the field_hook will not
|
Formatter.ALLOW_LEADING_UNDERSCORES
|
||||||
process this field and the default formatting should be
|
|
||||||
used. This decision should be based on the type of the
|
By default, leading underscores are not allowed in identifier
|
||||||
value object, and the contents of the conversion string.
|
lookups (getattr or getitem). Setting this flag will allow
|
||||||
|
this.
|
||||||
|
|
||||||
|
Formatter.CHECK_UNUSED_POSITIONAL
|
||||||
|
|
||||||
|
If this flag is set, then any positional arguments which are
|
||||||
|
supplied to the 'format' method but which are not used by
|
||||||
|
the format string will cause an error.
|
||||||
|
|
||||||
|
Formatter.CHECK_UNUSED_NAME
|
||||||
|
|
||||||
|
If this flag is set, then any named arguments which are
|
||||||
|
supplied to the 'format' method but which are not used by
|
||||||
|
the format string will cause an error.
|
||||||
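The flag names above belong to this draft. The string.Formatter class in current Python takes no 'flags' argument and instead exposes unused-argument checking as an overridable check_unused_args method. The sketch below assumes that shipped API; StrictFormatter is a hypothetical name.

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical subclass: reject arguments the format string never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the integer indexes and keyword names that the
        # format string actually referenced.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused format arguments: {0!r}".format(unused))


strict = StrictFormatter()
ok = strict.format("{0} and {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```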
|
|
||||||
|
|
||||||
|
Formatter Methods
|
||||||
|
|
||||||
|
The methods of class Formatter are as follows:
|
||||||
|
|
||||||
|
-- format(format_string, *args, **kwargs)
|
||||||
|
-- vformat(format_string, args, kwargs)
|
||||||
|
-- get_positional(args, index)
|
||||||
|
-- get_named(kwds, name)
|
||||||
|
-- format_field(value, conversion)
|
||||||
|
|
||||||
|
'format' is the primary API method. It takes a format template,
|
||||||
|
and an arbitrary set of positional and keyword arguments. 'format'
|
||||||
|
is just a wrapper that calls 'vformat'.
|
||||||
|
|
||||||
|
'vformat' is the function that does the actual work of formatting. It
|
||||||
|
is exposed as a separate function for cases where you want to pass in
|
||||||
|
a predefined dictionary of arguments, rather than unpacking and
|
||||||
|
repacking the dictionary as individual arguments using the '*args' and
|
||||||
|
'**kwds' syntax. 'vformat' does the work of breaking up the format
|
||||||
|
template string into character data and replacement fields. It calls
|
||||||
|
the 'get_positional' and 'get_named' methods as appropriate.
|
||||||
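The relationship between 'format' and 'vformat' can be sketched with the string.Formatter class in current Python, whose format and vformat methods take the same arguments as listed here. The 'get_positional' and 'get_named' methods of this draft are not part of that class, so they are not shown. The sample values are assumptions.

```python
import string

fmt = string.Formatter()
args = ("Fred",)
kwargs = {"city": "Oslo"}

# 'format' takes the arguments unpacked...
via_format = fmt.format("{0} lives in {city}", *args, **kwargs)

# ...while 'vformat' takes the tuple and dict directly, with no repacking.
via_vformat = fmt.vformat("{0} lives in {city}", args, kwargs)

print(via_format)
```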
|
|
||||||
|
Note that the checking of unused arguments, and the restriction on
|
||||||
|
leading underscores in attribute names are also done in this function.
|
||||||
|
|
||||||
|
'get_positional' and 'get_named' are used to retrieve a given field
|
||||||
|
value. For compound field names, these functions are only called for
|
||||||
|
the first component of the field name; subsequent components are
|
||||||
|
handled through normal attribute and indexing operations. So for
|
||||||
|
example, the field expression '0.name' would cause 'get_positional' to
|
||||||
|
be called with the list of positional arguments and a numeric index of
|
||||||
|
0, and then the standard 'getattr' function would be called to get the
|
||||||
|
'name' attribute of the result.
|
||||||
|
|
||||||
|
If the index or keyword refers to an item that does not exist, then an
|
||||||
|
IndexError/KeyError will be raised.
|
||||||
|
|
||||||
|
'format_field' actually generates the text for a replacement field.
|
||||||
|
The 'value' argument corresponds to the value being formatted, which
|
||||||
|
was retrieved from the arguments using the field name. The
|
||||||
|
'conversion' argument is the conversion spec part of the field, which
|
||||||
|
will be either a string or unicode object, depending on the type of
|
||||||
|
the original format string.
|
||||||
|
|
||||||
|
Note: The final implementation of the Formatter class may define
|
||||||
|
additional overridable methods and hooks. In particular, it may be
|
||||||
|
that 'vformat' is itself a composition of several additional,
|
||||||
|
overridable methods. (Depending on whether it is convenient to the
|
||||||
|
implementor of Formatter.)
|
||||||
|
|
||||||
|
|
||||||
|
Customizing Formatters
|
||||||
|
|
||||||
|
This section describes some typical ways that Formatter objects
|
||||||
|
can be customized.
|
||||||
|
|
||||||
|
To support alternative format-string syntax, the 'vformat' method
|
||||||
|
can be overridden to alter the way format strings are parsed.
|
||||||
|
|
||||||
|
One common desire is to support a 'default' namespace, so that
|
||||||
|
you don't need to pass in keyword arguments to the format()
|
||||||
|
method, but can instead use values in a pre-existing namespace.
|
||||||
|
This can easily be done by overriding get_named() as follows:
|
||||||
|
|
||||||
|
class NamespaceFormatter(Formatter):
|
||||||
|
def __init__(self, namespace={}, flags=0):
|
||||||
|
Formatter.__init__(self, flags)
|
||||||
|
self.namespace = namespace
|
||||||
|
|
||||||
|
def get_named(self, kwds, name):
|
||||||
|
try:
|
||||||
|
# Check explicitly passed arguments first
|
||||||
|
return kwds[name]
|
||||||
|
except KeyError:
|
||||||
|
return self.namespace[name]
|
||||||
|
|
||||||
|
One can use this to easily create a formatting function that allows
|
||||||
|
access to global variables, for example:
|
||||||
|
|
||||||
|
fmt = NamespaceFormatter(globals())
|
||||||
|
|
||||||
|
greeting = "hello"
|
||||||
|
print(fmt.format("{greeting}, world!"))
|
||||||
|
|
||||||
|
A similar technique can be done with the locals() dictionary to
|
||||||
|
gain access to the locals dictionary.
|
||||||
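For comparison, here is a sketch of the same idea against the string.Formatter class in current Python. That class has no 'flags' argument or get_named method; the closest hook is get_value(key, args, kwargs), which is what this version overrides. An explicit dictionary is used as the namespace; passing globals() or locals() instead would follow the same pattern.

```python
import string


class NamespaceFormatter(string.Formatter):
    """Look up named fields in a fallback namespace when not passed explicitly."""

    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first.
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Positional (integer) keys use the default lookup.
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"greeting": "hello"})
message = fmt.format("{greeting}, world!")
override = fmt.format("{greeting}, world!", greeting="goodbye")
print(message)
```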
|
|
||||||
|
It would also be possible to create a 'smart' namespace formatter
|
||||||
|
that could automatically access both locals and globals through
|
||||||
|
snooping of the calling stack. Due to the need for compatibility
|
||||||
|
with the different versions of Python, such a capability will not be
|
||||||
|
included in the standard library, however it is anticipated that
|
||||||
|
someone will create and publish a recipe for doing this.
|
||||||
|
|
||||||
|
Another type of customization is to change the way that built-in
|
||||||
|
types are formatted by overriding the 'format_field' method. (For
|
||||||
|
non-built-in types, you can simply define a __format__ special
|
||||||
|
method on that type.) So for example, you could override the
|
||||||
|
formatting of numbers to output scientific notation when needed.
|
||||||
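One way to read the scientific-notation idea above is sketched below, using the format_field(value, format_spec) hook of string.Formatter in current Python. The SciFormatter name, the 1e6 threshold and the '.2e' spec are all assumptions made for illustration.

```python
import string


class SciFormatter(string.Formatter):
    """Hypothetical formatter: large floats default to scientific notation."""

    def format_field(self, value, format_spec):
        # Only intervene when the field gave no explicit specifier.
        if isinstance(value, float) and not format_spec and abs(value) >= 1e6:
            format_spec = ".2e"
        return super().format_field(value, format_spec)


sci = SciFormatter()
big = sci.format("{0}", 12345678.0)
small = sci.format("{0}", 12.5)
print(big, small)
```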
|
|
||||||
|
|
||||||
Error handling
|
Error handling
|
||||||
|
|
||||||
The string formatting system has two error handling modes, which
|
There are two classes of exceptions which can occur during formatting:
|
||||||
are controlled by the value of a class variable:
|
exceptions generated by the formatter code itself, and exceptions
|
||||||
|
generated by user code (such as a field object's getattr function, or
|
||||||
string.strict_format_errors = True
|
the field_hook function).
|
||||||
|
|
||||||
The 'strict_format_errors' flag defaults to False, or 'lenient'
|
In general, exceptions generated by the formatter code itself are
|
||||||
mode. Setting it to True enables 'strict' mode. The current mode
|
of the "ValueError" variety -- there is an error in the actual "value"
|
||||||
determines how errors are handled, depending on the type of the
|
of the format string. (This is not always true; for example, the
|
||||||
error.
|
string.format() function might be passed a non-string as its first
|
||||||
|
parameter, which would result in a TypeError.)
|
||||||
The types of errors that can occur are:
|
|
||||||
|
The text associated with these internally generated ValueError
|
||||||
1) Reference to a missing or invalid argument from within a
|
exceptions will indicate the location of the exception inside
|
||||||
field specifier. In strict mode, this will raise an exception.
|
the format string, as well as the nature of the exception.
|
||||||
In lenient mode, this will cause the value of the field to be
|
|
||||||
replaced with the string '?name?', where 'name' will be the
|
For exceptions generated by user code, a trace record and
|
||||||
type of error (KeyError, IndexError, or AttributeError).
|
dummy frame will be added to the traceback stack to help
|
||||||
|
in determining the location in the string where the exception
|
||||||
So for example:
|
occurred. The inserted traceback will indicate that the
|
||||||
|
error occurred at:
|
||||||
>>> string.strict_format_errors = False
|
|
||||||
>>> print 'Item 2 of argument 0 is: {0[2]}'.format( [0,1] )
|
File "<format_string>;", line XX, in column_YY
|
||||||
"Item 2 of argument 0 is: ?IndexError?"
|
|
||||||
|
where XX and YY represent the line and character position
|
||||||
2) Unused argument. In strict mode, this will raise an exception.
|
information in the string, respectively.
|
||||||
In lenient mode, this will be ignored.
|
|
||||||
|
|
||||||
3) Exception raised by underlying formatter. These exceptions
|
|
||||||
are always passed through, regardless of the current mode.
|
|
||||||
|
|
||||||
|
|
||||||
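The exception types involved can be observed with str.format in current Python. The sketch below only reports which exception class surfaces for a few malformed calls; it does not demonstrate the traceback annotation described in this section, and error_name is a hypothetical helper.

```python
def error_name(format_string, *args, **kwargs):
    """Return the name of the exception raised by str.format, or None."""
    try:
        format_string.format(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


# Errors raised while looking up the field value.
missing_index = error_name("Item 2 of argument 0 is: {0[2]}", [0, 1])
missing_name = error_name("{name}")
missing_attr = error_name("{0.nope}", 1)

# An error in the format string itself.
bad_syntax = error_name("{0", 1)

print(missing_index, missing_name, missing_attr, bad_syntax)
```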
Alternate Syntax
|
Alternate Syntax
|
||||||
|
@ -483,9 +598,9 @@ Alternate Syntax
|
||||||
|
|
||||||
- Other variations include Ruby's #{}, PHP's {$name}, and so
|
- Other variations include Ruby's #{}, PHP's {$name}, and so
|
||||||
on.
|
on.
|
||||||
|
|
||||||
Some specific aspects of the syntax warrant additional comments:
|
Some specific aspects of the syntax warrant additional comments:
|
||||||
|
|
||||||
1) Backslash character for escapes. The original version of
|
1) Backslash character for escapes. The original version of
|
||||||
this PEP used backslash rather than doubling to escape a bracket.
|
this PEP used backslash rather than doubling to escape a bracket.
|
||||||
This worked because backslashes in Python string literals that
|
This worked because backslashes in Python string literals that
|
||||||
|
@ -494,18 +609,66 @@ Alternate Syntax
|
||||||
of confusion, and led to potential situations of multiple
|
of confusion, and led to potential situations of multiple
|
||||||
recursive escapes, i.e. '\\\\{' to place a literal backslash
|
recursive escapes, i.e. '\\\\{' to place a literal backslash
|
||||||
in front of a bracket.
|
in front of a bracket.
|
||||||
|
|
||||||
2) The use of the colon character (':') as a separator for
|
2) The use of the colon character (':') as a separator for
|
||||||
conversion specifiers. This was chosen simply because that's
|
conversion specifiers. This was chosen simply because that's
|
||||||
what .Net uses.
|
what .Net uses.
|
||||||
|
|
||||||
|
|
||||||
|
Security Considerations
|
||||||
|
|
||||||
|
Historically, string formatting has been a common source of
|
||||||
|
security holes in web-based applications, particularly if the
|
||||||
|
string templating system allows arbitrary expressions to be
|
||||||
|
embedded in format strings.
|
||||||
|
|
||||||
|
The typical scenario is one where the string data being processed
|
||||||
|
is coming from outside the application, perhaps from HTTP headers
|
||||||
|
or fields within a web form. An attacker could substitute their
|
||||||
|
own strings designed to cause havoc.
|
||||||
|
|
||||||
|
The string formatting system outlined in this PEP is by no means
|
||||||
|
'secure', in the sense that no Python library module can, on its
|
||||||
|
own, guarantee security, especially given the open nature of
|
||||||
|
the Python language. Building a secure application requires a
|
||||||
|
secure approach to design.
|
||||||
|
|
||||||
|
What this PEP does attempt to do is make the job of designing a
|
||||||
|
secure application easier, by making it easier for a programmer
|
||||||
|
to reason about the possible consequences of a string formatting
|
||||||
|
operation. It does this by limiting those consequences to a smaller
|
||||||
|
and more easily understood subset.
|
||||||
|
|
||||||
|
For example, because it is possible in Python to override the
|
||||||
|
'getattr' operation of a type, the interpretation of a compound
|
||||||
|
replacement field such as "0.name" could potentially run
|
||||||
|
arbitrary code.
|
||||||
|
|
||||||
|
However, it is *extremely* rare for the mere retrieval of an
|
||||||
|
attribute to have side effects. Other operations which are more
|
||||||
|
likely to have side effects - such as method calls - are disallowed.
|
||||||
|
Thus, a programmer can be reasonably assured that no string
|
||||||
|
formatting operation will cause a state change in the program.
|
||||||
|
This assurance is not only useful in securing an application, but
|
||||||
|
in debugging it as well.
|
||||||
|
|
||||||
|
Similarly, the restriction on field names beginning with
|
||||||
|
underscores is intended to provide similar assurances about the
|
||||||
|
visibility of private data.
|
||||||
|
|
||||||
|
Of course, programmers would be well-advised to avoid using
|
||||||
|
any external data as format strings, and instead use that data
|
||||||
|
as the format arguments.
|
||||||
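The difference between using external data as the format string and using it as an argument can be sketched as follows. The Account class and its attributes are hypothetical, and the code assumes str.format as it exists in current Python.

```python
class Account:
    """Hypothetical object holding one public and one sensitive attribute."""

    def __init__(self, owner, pin):
        self.owner = owner
        self.pin = pin


acct = Account("Fred", "1234")
external = "{0.pin}"  # imagine this string arrived from a web form

# Risky: external data used as the format string can reach attributes
# of whatever objects are passed to format().
leaked = external.format(acct)

# Safer: a fixed format string, with the external data passed as an argument.
safe = "You wrote: {0}".format(external)

print(leaked, safe, sep="\n")
```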
|
|
||||||
|
|
||||||
Sample Implementation
|
Sample Implementation
|
||||||
|
|
||||||
A rough prototype of the underlying 'cformat' function has been
|
An implementation of an earlier version of this PEP was created by
|
||||||
coded in Python, however it needs much refinement before being
|
Patrick Maupin and Eric V. Smith, and can be found in the pep3101
|
||||||
submitted.
|
sandbox at:
|
||||||
|
|
||||||
|
http://svn.python.org/view/sandbox/trunk/pep3101/
|
||||||
|
|
||||||
|
|
||||||
Backwards Compatibility
|
Backwards Compatibility
|
||||||
|
|
||||||
|
|