PEP: 3101
Title: Advanced String Formatting
Version: $Revision$
Last-Modified: $Date$
Author: Talin <viridia@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008


Abstract

This PEP proposes a new system for built-in string formatting
operations, intended as a replacement for the existing '%' string
formatting operator.


Rationale

Python currently provides two methods of string interpolation:

- The '%' operator for strings. [1]

- The string.Template module. [2]

This PEP is primarily concerned with built-in string formatting
operations (in other words, methods of the built-in string type).

The '%' operator is primarily limited by the fact that it is a
binary operator, and therefore can take at most two arguments.
One of those arguments is already dedicated to the format string,
leaving all other variables to be squeezed into the remaining
argument. The current practice is to use either a dictionary or a
tuple as the second argument, but as many people have commented
[3], this lacks flexibility. The "all or nothing" approach
(meaning that one must choose between only positional arguments,
or only named arguments) is felt to be overly constraining.

While there is some overlap between this proposal and
string.Template, it is felt that each serves a distinct need,
and that one does not obviate the other. This proposal is for
a mechanism which, like '%', is efficient for small strings
that are used only once. Compiling a string into a template is
therefore not contemplated in this proposal, although the
proposal does take care to define format strings and the API in
such a way that an efficient template package could reuse the
syntax and even some of the underlying formatting code.


Specification

The specification will consist of the following parts:

- Specification of a new formatting method to be added to the
  built-in string class.

- Specification of functions and flag values to be added to
  the string module, so that the underlying formatting engine
  can be used with additional options.

- Specification of a new syntax for format strings.

- Specification of a new set of special methods to control the
  formatting and conversion of objects.

- Specification of an API for user-defined formatting classes.

- Specification of how formatting errors are handled.

Note on string encodings: When discussing this PEP in the context
of Python 3.0, it is assumed that all strings are unicode strings,
and that the use of the word 'string' in the context of this
document will generally refer to a Python 3.0 string, which is
the same as a Python 2.x unicode object.

In the context of Python 2.x, the use of the word 'string' in this
document refers to an object which may either be a regular string
or a unicode object. All of the function call interfaces
described in this PEP can be used for both strings and unicode
objects, and in all cases there is sufficient information
to be able to properly deduce the output string type (in
other words, there is no need for two separate APIs).
In all cases, the type of the format string dominates - that
is, the conversion will always produce an object that contains
the same representation of characters as the input format string.


String Methods

The built-in string class (and also the unicode class in 2.6) will
gain a new method, 'format', which takes an arbitrary number of
positional and keyword arguments:

    "The story of {0}, {1}, and {c}".format(a, b, c=d)

Within a format string, each positional argument is identified
with a number, starting from zero, so in the above example, 'a' is
argument 0 and 'b' is argument 1. Each keyword argument is
identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument.
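
As a minimal sketch of the call shown above, the following binds
placeholder values to 'a', 'b' and 'd' (the values are assumptions
chosen only for illustration) and shows that positional fields are
looked up by index while keyword fields are looked up by name:

```python
# Placeholder values; any objects could be passed instead.
a, b, d = "Alice", "Bob", "Carol"

# {0} and {1} select positional arguments; {c} selects the keyword argument.
story = "The story of {0}, {1}, and {c}".format(a, b, c=d)

# A positional field may be reused or referenced out of order.
swapped = "{1} before {0}, then {0} again".format(a, b)

print(story)    # The story of Alice, Bob, and Carol
print(swapped)  # Bob before Alice, then Alice again
```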

There is also a global built-in function, 'format', which formats
a single value:

    print(format(10.0, "7.3g"))

This function is described in a later section.


Format Strings

Format strings consist of intermingled character data and markup.

Character data is data which is transferred unchanged from the
format string to the output string; markup is not transferred from
the format string directly to the output, but instead is used to
define 'replacement fields' that describe to the format engine
what should be placed in the output string in place of the markup.

Brace characters ('curly braces') are used to indicate a
replacement field within the string:

    "My name is {0}".format('Fred')

The result of this is the string:

    "My name is Fred"

Braces can be escaped by doubling:

    "My name is {0} :-{{}}".format('Fred')

Which would produce:

    "My name is Fred :-{}"

The element within the braces is called a 'field'. Fields consist
of a 'field name', which can either be simple or compound, and an
optional 'format specifier'.
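
The brace-escaping rule described above can be checked with a short
sketch. The second string is a hypothetical use case (emitting
brace-delimited text) and is not taken from this PEP:

```python
# Doubled braces outside a field produce literal braces in the output.
greeting = "My name is {0} :-{{}}".format("Fred")

# Escaping makes it possible to emit brace-delimited text around a field.
record = '{{"name": "{0}"}}'.format("Fred")

print(greeting)  # My name is Fred :-{}
print(record)    # {"name": "Fred"}
```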


Simple and Compound Field Names

Simple field names are either names or numbers. If numbers, they
must be valid base-10 integers; if names, they must be valid
Python identifiers. A number is used to identify a positional
argument, while a name is used to identify a keyword argument.

A compound field name is a combination of multiple simple field
names in an expression:

    "My name is {0.name}".format(open('out.txt', 'w'))

This example shows the use of the 'getattr' or 'dot' operator
in a field expression. The dot operator allows an attribute of
an input value to be specified as the field value.
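
A small sketch of attribute lookup in field names follows. The
'person' object and its attributes are hypothetical, built with
types.SimpleNamespace purely for illustration; it also shows that
dots can be chained:

```python
from types import SimpleNamespace

# Hypothetical objects used only for illustration.
person = SimpleNamespace(name="Fred", home=SimpleNamespace(city="Paris"))

# '0.name' fetches argument 0, then applies getattr(value, 'name').
simple = "My name is {0.name}".format(person)

# Attribute lookups may be chained and combined with keyword arguments.
chained = "{p.name} lives in {p.home.city}".format(p=person)

print(simple)   # My name is Fred
print(chained)  # Fred lives in Paris
```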

Unlike some other programming languages, you cannot embed arbitrary
expressions in format strings. This is by design - the types of
expressions that you can use are deliberately limited. Only two
operators are supported: the '.' (getattr) operator, and the '[]'
(getitem) operator. The reason for allowing these operators is that
they don't normally have side effects in non-pathological code.

An example of the 'getitem' syntax:

    "My name is {0[name]}".format(dict(name='Fred'))

It should be noted that the use of 'getitem' within a format string
is much more limited than its conventional usage. In the above example,
the string 'name' really is the literal string 'name', not a variable
named 'name'. The rules for parsing an item key are very simple.
If it starts with a digit, then it is treated as a number; otherwise
it is used as a string.

Because keys are not quote-delimited, it is not possible to
specify arbitrary dictionary keys (e.g., the strings "10" or
":-]") from within a format string.

Implementation note: The implementation of this proposal is
not required to enforce the rule about a simple or dotted name
being a valid Python identifier. Instead, it will rely on the
getattr function of the underlying object to throw an exception if
the identifier is not legal. The str.format() function will have
a minimalist parser which only attempts to figure out when it is
"done" with an identifier (by finding a '.' or a ']', or '}',
etc.).
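
The item-key parsing rule described above (a key made of digits
becomes a number, anything else is used as a literal string) can be
illustrated as follows. The dictionary contents are assumptions made
up for this sketch:

```python
# Illustrative data: note the int key 10 and the distinct str key "10".
data = {"name": "Fred", 10: "integer key", "10": "string key"}
letters = ["a", "b", "c"]

by_name = "{0[name]}".format(data)    # key is the literal string 'name'
by_number = "{0[10]}".format(data)    # digits are converted to the int 10
by_index = "{0[1]}".format(letters)   # the same rule serves sequence indexing

print(by_name)    # Fred
print(by_number)  # integer key
print(by_index)   # b
```

In this sketch the entry stored under the string "10" cannot be
reached from a format string, which is the limitation noted above.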


Format Specifiers

Each field can also specify an optional set of 'format
specifiers' which can be used to adjust the format of that field.
Format specifiers follow the field name, with a colon (':')
character separating the two:

    "My name is {0:8}".format('Fred')

The meaning and syntax of the format specifiers depend on the
type of object that is being formatted, but there is a standard
set of format specifiers used for any object that does not
override them.

Format specifiers can themselves contain replacement fields.
For example, a field whose field width is itself a parameter
could be specified via:

    "{0:{1}}".format(a, b)

These 'internal' replacement fields can only occur in the format
specifier part of the replacement field. Internal replacement fields
cannot themselves have format specifiers. This implies also that
replacement fields cannot be nested to arbitrary levels.

Note that the doubled '}' at the end, which would normally be
escaped, is not escaped in this case. This is because
the '{{' and '}}' syntax for escapes is only applied when used
*outside* of a format field. Within a format field, the brace
characters always have their normal meaning.
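
A brief sketch of internal replacement fields follows. The keyword
names 'width' and 'precision' are arbitrary choices for this
illustration, not names defined by the PEP:

```python
# The inner field {1} supplies the width, so the spec becomes "6".
padded = "{0:{1}}".format("abc", 6)

# Several inner fields may appear in one spec; here it becomes "8.2f".
number = "{0:{width}.{precision}f}".format(3.14159, width=8, precision=2)

print(repr(padded))  # 'abc   '
print(repr(number))  # '    3.14'
```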

The syntax for format specifiers is open-ended, since a class
can override the standard format specifiers. In such cases,
the str.format() method merely passes all of the characters between
the first colon and the matching brace to the relevant underlying
formatting method.


Standard Format Specifiers

If an object does not define its own format specifiers, a standard
set of format specifiers is used. These are similar in concept to
the format specifiers used by the existing '%' operator; however,
there are also a number of differences.

The general form of a standard format specifier is:

    [[fill]align][sign][#][0][minimumwidth][.precision][type]

The brackets ([]) indicate an optional element.

The optional align flag can be one of the following:

    '<' - Forces the field to be left-aligned within the available
          space (This is the default.)
    '>' - Forces the field to be right-aligned within the
          available space.
    '=' - Forces the padding to be placed after the sign (if any)
          but before the digits. This is used for printing fields
          in the form '+000000120'. This alignment option is only
          valid for numeric types.
    '^' - Forces the field to be centered within the available
          space.

Note that unless a minimum field width is defined, the field
width will always be the same size as the data to fill it, so
that the alignment option has no meaning in this case.

The optional 'fill' character defines the character to be used to
pad the field to the minimum width. The fill character, if present,
must be followed by an alignment flag.
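
The alignment and fill options can be sketched as below, using a
two-character string in a field of width 8. One caveat that is not
part of the text above: in current CPython, numbers are right-aligned
when no align flag is given, so the last line is included as an
observation about the implementation rather than about this
specification:

```python
left = "{0:<8}".format("ab")       # 'ab      '
right = "{0:>8}".format("ab")      # '      ab'
centered = "{0:^8}".format("ab")   # '   ab   '
filled = "{0:*^8}".format("ab")    # '***ab***'

# '=' places the padding between the sign and the digits.
signed = "{0:0=+10}".format(120)   # '+000000120'

# Implementation note: with no align flag, CPython right-aligns numbers.
bare_number = "{0:6}".format(42)   # '    42'

for text in (left, right, centered, filled, signed, bare_number):
    print(repr(text))
```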

The 'sign' option is only valid for numeric types, and can be one
of the following:

    '+'  - indicates that a sign should be used for both
           positive as well as negative numbers
    '-'  - indicates that a sign should be used only for negative
           numbers (this is the default behavior)
    ' '  - indicates that a leading space should be used on
           positive numbers
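
The three sign options can be compared side by side. This is only a
sketch; the values 5 and -5 are arbitrary:

```python
values = (5, -5)

plus = ["{0:+d}".format(n) for n in values]    # sign always shown
minus = ["{0:-d}".format(n) for n in values]   # sign only when negative
space = ["{0: d}".format(n) for n in values]   # space in place of '+'

print(plus)   # ['+5', '-5']
print(minus)  # ['5', '-5']
print(space)  # [' 5', '-5']
```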

If the '#' character is present, integers use the 'alternate form'
for formatting. This means that binary, octal, and hexadecimal
output will be prefixed with '0b', '0o', and '0x', respectively.

'minimumwidth' is a decimal integer defining the minimum field
width. If not specified, then the field width will be determined by
the content.

If the width field is preceded by a zero ('0') character, this enables
zero-padding. This is equivalent to an alignment type of '=' and a
fill character of '0'.
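
A short sketch of the alternate form and of zero-padding follows,
using the built-in format() function with arbitrary sample values.
The last two results are expected to be identical, since a leading
'0' is described above as shorthand for fill '0' with '=' alignment:

```python
# '#' adds the base prefix to binary, octal and hexadecimal output.
prefixed = [format(10, spec) for spec in ("#b", "#o", "#x")]

# A '0' before the width pads with zeros after the sign.
zero_padded = format(-42, "06d")

# The same request spelled out with an explicit fill and alignment.
spelled_out = format(-42, "0=6d")

print(prefixed)      # ['0b1010', '0o12', '0xa']
print(zero_padded)   # -00042
print(spelled_out)   # -00042
```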

The 'precision' is a decimal number indicating how many digits
should be displayed after the decimal point in a floating point
conversion. For non-numeric types the field indicates the maximum
field size - in other words, how many characters will be used from
the field content. The precision is ignored for integer conversions.
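
The two uses of precision can be sketched as follows. The final part
of the sketch probes how the running interpreter treats a precision
on an integer conversion; as far as the editor is aware, CPython
raises ValueError in that case instead of ignoring the precision, so
the code records the outcome rather than assuming one:

```python
# Floating point: digits after the decimal point.
rounded = format(3.14159, ".2f")

# Non-numeric: maximum number of characters taken from the content.
truncated = format("abcdef", ".3")

# Integer conversion: record whether the interpreter accepts a precision.
try:
    format(42, ".2d")
    integer_precision_accepted = True
except ValueError:
    integer_precision_accepted = False

print(rounded)    # 3.14
print(truncated)  # abc
print(integer_precision_accepted)
```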

Finally, the 'type' determines how the data should be presented.

The available integer presentation types are:

    'b' - Binary. Outputs the number in base 2.
    'c' - Character. Converts the integer to the corresponding
          Unicode character before printing.
    'd' - Decimal Integer. Outputs the number in base 10.
    'o' - Octal format. Outputs the number in base 8.
    'x' - Hex format. Outputs the number in base 16, using lower-
          case letters for the digits above 9.
    'X' - Hex format. Outputs the number in base 16, using upper-
          case letters for the digits above 9.
    'n' - Number. This is the same as 'd', except that it uses the
          current locale setting to insert the appropriate
          number separator characters.
    '' (None) - the same as 'd'
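
The integer presentation types listed above can be compared on one
sample value. The 'n' type is left out of this sketch because its
output depends on the current locale:

```python
value = 255

# Map each presentation type to the text it produces for the same integer.
shown = {spec: format(value, spec) for spec in ("b", "d", "o", "x", "X", "")}

# 'c' interprets the integer as a Unicode code point.
letter = format(65, "c")

print(shown)
# {'b': '11111111', 'd': '255', 'o': '377', 'x': 'ff', 'X': 'FF', '': '255'}
print(letter)  # A
```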

The available floating point presentation types are:

    'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
    'E' - Exponent notation. Same as 'e' except it converts the
          number to uppercase.
    'f' - Fixed point. Displays the number as a fixed-point
          number.
    'F' - Fixed point. Same as 'f' except it converts the number
          to uppercase.
    'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case
          it switches to 'e' exponent notation.
    'G' - General format. Same as 'g' except switches to 'E'
          if the number gets too large.
    'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate
          number separator characters.
    '%' - Percentage. Multiplies the number by 100 and displays
          in fixed ('f') format, followed by a percent sign.
    '' (None) - similar to 'g', except that it prints at least one
          digit after the decimal point.
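
A sketch comparing the floating point presentation types follows.
The sample values are arbitrary, and the locale-dependent 'n' type
is again omitted:

```python
value = 1234.5

# Same number under several presentation types (default precision is 6).
shown = {spec: format(value, spec) for spec in ("e", "E", "f", "g", "")}

# 'g' switches to exponent notation for large magnitudes.
big = format(1e20, "g")

# '%' multiplies by 100 and appends a percent sign.
percent = format(0.25, ".1%")

# The empty type keeps at least one digit after the point; 'g' does not.
whole = (format(3.0, "g"), format(3.0, ""))

print(shown)
# {'e': '1.234500e+03', 'E': '1.234500E+03', 'f': '1234.500000',
#  'g': '1234.5', '': '1234.5'}
print(big)      # 1e+20
print(percent)  # 25.0%
print(whole)    # ('3', '3.0')
```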

Objects are able to define their own format specifiers to
replace the standard ones. An example is the 'datetime' class,
whose format specifiers might look something like the
arguments to the strftime() function:

    "Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())

For all built-in types, an empty format specification will produce
the equivalent of str(value). It is recommended that objects
defining their own format specifiers follow this convention as
well.
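
The following sketch uses a fixed datetime value (an arbitrary date
chosen for reproducibility) and only numeric strftime directives, so
that the output does not depend on the locale. It also checks the
convention that an empty specification matches str(value):

```python
from datetime import datetime

# A fixed moment, so the result is reproducible.
moment = datetime(2006, 4, 16, 9, 30)

# Everything after the first colon is handed to datetime.__format__.
stamp = "Created: {0:%Y-%m-%d %H:%M}".format(moment)

# An empty format specification gives the same text as str().
plain = format(moment, "")

print(stamp)  # Created: 2006-04-16 09:30
print(plain == str(moment))  # True
```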


Explicit Conversion Flag

The explicit conversion flag is used to transform the format field value
before it is formatted. This can be used to override the type-specific
formatting behavior, and format the value as if it were a more
generic type. Currently, two explicit conversion flags are
recognized:

    !r - convert the value to a string using repr().
    !s - convert the value to a string using str().

These flags are placed before the format specifier:

    "{0!r:20}".format("Hello")

In the preceding example, the string "Hello" will be printed, with quotes,
in a field of at least 20 characters width.

A custom Formatter class can define additional conversion flags.
The built-in formatter will raise a ValueError if an invalid
conversion flag is specified.
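
A minimal sketch of the two conversion flags and of the error case
follows. The flag '!x' is simply an arbitrary unrecognized flag used
to trigger the ValueError mentioned above:

```python
# repr() adds quotes before the 20-character field is applied.
quoted = "{0!r:20}".format("Hello")

# str() leaves the text unquoted.
unquoted = "{0!s:20}".format("Hello")

# An unrecognized conversion flag is rejected by the built-in formatter.
try:
    "{0!x}".format("Hello")
    rejected = False
except ValueError:
    rejected = True

print(repr(quoted))    # "'Hello'             "
print(repr(unquoted))  # 'Hello               '
print(rejected)        # True
```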


Controlling Formatting on a Per-Type Basis

Each Python type can control formatting of its instances by defining
a __format__ method. The __format__ method is responsible for
interpreting the format specifier, formatting the value, and
returning the resulting string.

The new, global built-in function 'format' simply calls this special
method, similar to how len() and str() simply call their respective
special methods:

    def format(value, format_spec):
        return value.__format__(format_spec)

It is safe to call this function with a value of "None" (because the
"None" value in Python is an object and can have methods.)

Several built-in types, including 'str', 'int', 'float', and 'object'
define __format__ methods. This means that if you derive from any of
those types, your class will know how to format itself.

The object.__format__ method is the simplest: It simply converts the
object to a string, and then calls format again:

    class object:
        def __format__(self, format_spec):
            return format(str(self), format_spec)

The __format__ methods for 'int' and 'float' will do numeric formatting
based on the format specifier. In some cases, these formatting
operations may be delegated to other types. So for example, in the case
where the 'int' formatter sees a format type of 'f' (meaning 'float')
it can simply cast the value to a float and call format() again.

Any class can override the __format__ method to provide custom
formatting for that type:

    class AST:
        def __format__(self, format_spec):
            ...

Note for Python 2.x: The 'format_spec' argument will be either
a string object or a unicode object, depending on the type of the
original format string. The __format__ method should test the type
of the 'format_spec' parameter to determine whether to return a
string or unicode object. It is the responsibility of the __format__
method to return an object of the proper type.

Note that the 'explicit conversion' flag mentioned above is not passed
to the __format__ method. Rather, it is expected that the conversion
specified by the flag will be performed before calling __format__.
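
The protocol described in this section can be sketched with a small
user-defined type. The 'Point' class and its behavior (applying the
specifier to each coordinate) are hypothetical choices made for this
illustration; they are not part of the PEP:

```python
class Point:
    """Hypothetical type used only to illustrate __format__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point({0}, {1})".format(self.x, self.y)

    def __format__(self, format_spec):
        # Follow the convention that an empty spec behaves like str(value).
        if not format_spec:
            return str(self)
        # Otherwise apply the spec to each coordinate separately.
        return "({0}, {1})".format(format(self.x, format_spec),
                                   format(self.y, format_spec))


p = Point(1.0, 2.5)
fixed = format(p, ".2f")             # calls p.__format__(".2f")
in_string = "at {0:.2f}".format(p)   # str.format passes the same spec
default = "{0}".format(p)            # empty spec
converted = "{0!s:>20}".format(p)    # !s runs first; the str is then padded
int_as_float = format(3, ".2f")      # 'int' accepts the float type 'f'

print(fixed)         # (1.00, 2.50)
print(in_string)     # at (1.00, 2.50)
print(default)       # Point(1.0, 2.5)
print(int_as_float)  # 3.00
```

In the 'converted' line the '>20' specifier never reaches
Point.__format__; it is applied to the string produced by the '!s'
conversion, as the note above describes.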


User-Defined Formatting

There will be times when customizing the formatting of fields
on a per-type basis is not enough. An example might be a
spreadsheet application, which displays hash marks '#' when a value
is too large to fit in the available space.

For more powerful and flexible formatting, access to the underlying
format engine can be obtained through the 'Formatter' class that
lives in the 'string' module. This class takes additional options
which are not accessible via the normal str.format method.

An application can subclass the Formatter class to create its own
customized formatting behavior.

The PEP does not attempt to exactly specify all methods and
properties defined by the Formatter class; instead, those will be
defined and documented in the initial implementation. However, this
PEP will specify the general requirements for the Formatter class,
which are listed below.

Although str.format() does not directly use the Formatter class
to do formatting, both use the same underlying implementation. The
reason that str.format() does not use the Formatter class directly
is that "string" is a built-in type, which means that all of its
methods must be implemented in C, whereas Formatter is a Python
class. Formatter provides an extensible wrapper around the same
C functions as are used by str.format().


Formatter Methods

The Formatter class takes no initialization arguments:

    fmt = Formatter()

The public API methods of class Formatter are as follows:

    -- format(format_string, *args, **kwargs)
    -- vformat(format_string, args, kwargs)

'format' is the primary API method. It takes a format template,
and an arbitrary set of positional and keyword arguments.
'format' is just a wrapper that calls 'vformat'.

'vformat' is the function that does the actual work of formatting. It
is exposed as a separate function for cases where you want to pass in
a predefined dictionary of arguments, rather than unpacking and
repacking the dictionary as individual arguments using the '*args' and
'**kwargs' syntax. 'vformat' does the work of breaking up the format
template string into character data and replacement fields. It calls
the overridable 'get_value', 'format_field', and 'check_unused_args'
methods as appropriate (described below.)
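
The relationship between the two public methods can be sketched as
follows, using the Formatter class from the 'string' module. The
format string and argument values are placeholders:

```python
from string import Formatter

fmt = Formatter()

# 'format' accepts the arguments directly.
via_format = fmt.format("{0} and {name}", "a", name="b")

# 'vformat' takes an existing sequence and mapping without unpacking them.
args = ("a",)
kwargs = {"name": "b"}
via_vformat = fmt.vformat("{0} and {name}", args, kwargs)

print(via_format)   # a and b
print(via_vformat)  # a and b
```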

Formatter defines the following overridable methods:

    -- get_value(key, args, kwargs)
    -- check_unused_args(used_args, args, kwargs)
    -- format_field(value, format_spec)

'get_value' is used to retrieve a given field value. The 'key' argument
will be either an integer or a string. If it is an integer, it represents
the index of the positional argument in 'args'; if it is a string, then
it represents a named argument in 'kwargs'.

The 'args' parameter is set to the list of positional arguments to
'vformat', and the 'kwargs' parameter is set to the dictionary of
keyword arguments.

For compound field names, this method is only called for the
first component of the field name; subsequent components are handled
through normal attribute and indexing operations.

So for example, the field expression '0.name' would cause 'get_value'
to be called with a 'key' argument of 0. The 'name' attribute will be
looked up after 'get_value' returns by calling the built-in 'getattr'
function.

If the index or keyword refers to an item that does not exist, then an
IndexError/KeyError should be raised.
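
As a sketch of overriding 'get_value', the hypothetical subclass
below substitutes a placeholder for missing arguments instead of
letting the IndexError or KeyError propagate. The class name and the
placeholder text are assumptions made for this example:

```python
from string import Formatter


class DefaultingFormatter(Formatter):
    """Hypothetical subclass: missing fields become a visible placeholder."""

    def get_value(self, key, args, kwargs):
        try:
            # The base class indexes args for ints and kwargs for strings.
            return super().get_value(key, args, kwargs)
        except (IndexError, KeyError):
            return "<missing {0!r}>".format(key)


text = DefaultingFormatter().format("{0} {name}", "a")

# The unmodified Formatter raises for the same missing keyword.
try:
    Formatter().format("{name}")
    base_raised = False
except KeyError:
    base_raised = True

print(text)         # a <missing 'name'>
print(base_raised)  # True
```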

'check_unused_args' is used to implement checking for unused arguments
if desired. The arguments to this function are the set of all argument
keys that were actually referred to in the format string (integers for
positional arguments, and strings for named arguments), and a reference
to the args and kwargs that were passed to vformat. The set of unused
args can be calculated from these parameters. 'check_unused_args'
is assumed to throw an exception if the check fails.
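
One possible way to use 'check_unused_args' is sketched below. The
'StrictFormatter' name and the choice of ValueError are assumptions
for illustration; the text above only requires that some exception
be raised when the check fails:

```python
from string import Formatter


class StrictFormatter(Formatter):
    """Hypothetical subclass that rejects arguments the template ignores."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds ints for positional and strs for keyword fields.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [name for name in kwargs if name not in used_args]
        if unused:
            raise ValueError("unused arguments: {0!r}".format(unused))


strict = StrictFormatter()
ok = strict.format("{0} {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
    complained = False
except ValueError:
    complained = True

print(ok)          # a b
print(complained)  # True
```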

'format_field' simply calls the global 'format' built-in. The method
is provided so that subclasses can override it.
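
Overriding 'format_field' is one way to approach the spreadsheet
scenario mentioned earlier in this document. The sketch below is
only an illustration under stated assumptions: the 'CellFormatter'
name and the fixed 'cell_width' of 6 are invented for this example:

```python
from string import Formatter


class CellFormatter(Formatter):
    """Hypothetical subclass: overflowing fields are shown as hash marks."""

    cell_width = 6  # assumed column width for this sketch

    def format_field(self, value, format_spec):
        # Let the default implementation call format(value, format_spec).
        text = super().format_field(value, format_spec)
        if len(text) > self.cell_width:
            return "#" * self.cell_width
        return text


cells = CellFormatter()
fits = cells.format("{0:d}", 42)
overflow = cells.format("{0:d}", 1234567)

print(fits)      # 42
print(overflow)  # ######
```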

To get a better understanding of how these functions relate to each
other, here is pseudocode that explains the general operation of
vformat:

    def vformat(self, format_string, args, kwargs):

        # Output buffer and set of used args
        buffer = io.StringIO()  # assumes 'import io'
        used_args = set()

        # Tokens are either format fields or literal strings
        for token in self.parse(format_string):
            if is_format_field(token):
                # Split the token into field value and format spec
                field_spec, _, format_spec = token.partition(":")

                # Check for explicit type conversion; the flag, if
                # any, follows the '!' at the end of the field name
                field_spec, _, explicit = field_spec.partition("!")

                # 'first_part' is the part before the first '.' or '['
                # Assume that 'get_first_part' returns either an int or
                # a string, depending on the syntax.
                first_part = get_first_part(field_spec)
                value = self.get_value(first_part, args, kwargs)

                # Record the fact that we used this arg
                used_args.add(first_part)

                # Handle [subfield] or .subfield. Assume that 'components'
                # returns an iterator of the various subfields, not including
                # the first part.
                for comp in components(field_spec):
                    value = resolve_subfield(value, comp)

                # Handle explicit type conversion
                if explicit == 'r':
|
|
|
|
value = repr(value)
|
|
|
|
|
elif explicit == 's':
|
|
|
|
|
value = str(value)
|
|
|
|
|
|
|
|
|
|
# Call the global 'format' function and write out the converted
|
|
|
|
|
# value.
|
|
|
|
|
buffer.write(self.format_field(value, format_spec))
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2007-07-24 19:36:34 -04:00
|
|
|
|
else:
|
|
|
|
|
buffer.write(token)
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2007-07-24 19:36:34 -04:00
|
|
|
|
self.check_unused_args(used_args, args, kwargs)
|
|
|
|
|
return buffer.getvalue()
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2007-08-14 20:14:29 -04:00
|
|
|
|
Note that the actual algorithm of the Formatter class (which will be
|
|
|
|
|
implemented in C) may not be the one presented here. (It's likely
|
|
|
|
|
that the actual implementation won't be a 'class' at all - rather,
|
|
|
|
|
vformat may just call a C function which accepts the other overridable
|
|
|
|
|
methods as arguments.) The primary purpose of this code example is to
|
|
|
|
|
illustrate the order in which overridable methods are called.
|
2007-06-03 14:53:34 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
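The call order can also be observed at run time. The following
sketch assumes the string.Formatter class as shipped in the Python 3
standard library; the class name TracingFormatter and its 'calls'
list are hypothetical and exist only for this illustration.

```python
from string import Formatter


class TracingFormatter(Formatter):
    """Hypothetical helper: logs which overridable methods run."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def get_value(self, key, args, kwargs):
        self.calls.append("get_value")
        return super().get_value(key, args, kwargs)

    def format_field(self, value, format_spec):
        self.calls.append("format_field")
        return super().format_field(value, format_spec)

    def check_unused_args(self, used_args, args, kwargs):
        self.calls.append("check_unused_args")
        return super().check_unused_args(used_args, args, kwargs)


tracer = TracingFormatter()
text = tracer.format("{0} and {1!r}", "a", "b")
print(text)
print(tracer.calls)
```

For each field, 'get_value' runs before 'format_field', and
'check_unused_args' runs once after all fields have been processed.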
Customizing Formatters

This section describes some typical ways that Formatter objects
can be customized.

To support alternative format-string syntax, the 'vformat' method
can be overridden to alter the way format strings are parsed.

One common desire is to support a 'default' namespace, so that
you don't need to pass in keyword arguments to the format()
method, but can instead use values in a pre-existing namespace.
This can easily be done by overriding get_value() as follows:

    from string import Formatter

    class NamespaceFormatter(Formatter):
        def __init__(self, namespace=None):
            Formatter.__init__(self)
            # Avoid sharing one mutable default dict between instances
            self.namespace = {} if namespace is None else namespace

        def get_value(self, key, args, kwds):
            if isinstance(key, str):
                try:
                    # Check explicitly passed arguments first
                    return kwds[key]
                except KeyError:
                    # Fall back to the default namespace
                    return self.namespace[key]
            else:
                # Positional keys use the default lookup
                return Formatter.get_value(self, key, args, kwds)

One can use this to easily create a formatting function that allows
access to global variables, for example:

    fmt = NamespaceFormatter(globals())

    greeting = "hello"
    print(fmt.format("{greeting}, world!"))

A similar technique can be used with the locals() dictionary to
gain access to local variables.

It would also be possible to create a 'smart' namespace formatter
that could automatically access both locals and globals through
snooping of the calling stack. Due to the need for compatibility
with the different versions of Python, such a capability will not
be included in the standard library; however, it is anticipated
that someone will create and publish a recipe for doing this.

Another type of customization is to change the way that built-in
types are formatted by overriding the 'format_field' method. (For
non-built-in types, you can simply define a __format__ special
method on that type.) So for example, you could override the
formatting of numbers to output scientific notation when needed.

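A minimal sketch of the scientific-notation idea follows, again
using the standard library's string.Formatter. The class name
SciFormatter, the thresholds 1e-4 and 1e6, and the '.3e' format spec
are arbitrary choices made for this example, not values taken from
the PEP.

```python
from string import Formatter


class SciFormatter(Formatter):
    """Hypothetical formatter: very large or small floats use '.3e'."""

    def format_field(self, value, format_spec):
        # Only step in when the caller gave no explicit format spec.
        if isinstance(value, float) and not format_spec and value != 0:
            if not (1e-4 <= abs(value) < 1e6):
                format_spec = ".3e"
        return super().format_field(value, format_spec)


sci = SciFormatter()
mixed = sci.format("{0} {1}", 1234567.0, 2.5)
explicit = sci.format("{0:.1f}", 1234567.0)
print(mixed)
print(explicit)
```

An explicit format spec such as '.1f' is left untouched, so callers
can still opt out of the custom behavior field by field.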
Error handling

There are two classes of exceptions which can occur during formatting:
exceptions generated by the formatter code itself, and exceptions
generated by user code (such as a field object's 'getattr' function).

In general, exceptions generated by the formatter code itself are
of the "ValueError" variety -- there is an error in the actual "value"
of the format string. (This is not always true; for example, the
string.format() function might be passed a non-string as its first
parameter, which would result in a TypeError.)

The text associated with these internally generated ValueError
exceptions will indicate the location of the exception inside
the format string, as well as the nature of the exception.

For exceptions generated by user code, a trace record and
dummy frame will be added to the traceback stack to help
in determining the location in the string where the exception
occurred. The inserted traceback will indicate that the
error occurred at:

    File "<format_string>", line XX, in column_YY

where XX and YY represent the line and character position
information in the string, respectively.

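The two classes of exceptions can be told apart by type. The sketch
below uses the built-in str.format method; the helper
describe_failure and the class Broken are hypothetical names
introduced for this example. It deliberately does not depend on the
position information or the inserted traceback frame described
above, since it makes no assumption about whether a given
implementation provides them.

```python
def describe_failure(fmt, *args, **kwargs):
    """Return the formatted text, or the name of the exception raised."""
    try:
        return fmt.format(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__


class Broken:
    """Hypothetical user type whose attribute lookup fails."""

    @property
    def name(self):
        raise RuntimeError("lookup failed")


# Errors detected by the formatter itself
malformed = describe_failure("{0", 1)            # unterminated field
bad_index = describe_failure("{1}", "a")         # no such position
bad_key = describe_failure("{missing}")          # no such keyword

# Error raised by user code during attribute access
user_error = describe_failure("{0.name}", Broken())

print(malformed, bad_index, bad_key, user_error)
```

The exception raised by user code passes through the formatter with
its original type, so callers can handle it separately from
malformed format strings.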
Alternate Syntax

Naturally, one of the most contentious issues is the syntax of the
format strings, and in particular the markup conventions used to
indicate fields.

Rather than attempting to exhaustively list all of the various
proposals, I will cover the ones that are most widely used
already.

- Shell variable syntax: $name and $(name) (or in some variants,
  ${name}). This is probably the oldest convention out there, and
  is used by Perl and many others. When used without the braces,
  the length of the variable is determined by lexically scanning
  until an invalid character is found.

  This scheme is generally used in cases where interpolation is
  implicit - that is, in environments where any string can contain
  interpolation variables, and no special substitution function
  need be invoked. In such cases, it is important to prevent the
  interpolation behavior from occurring accidentally, so the '$'
  (which is otherwise a relatively uncommonly-used character) is
  used to signal when the behavior should occur.

  It is the author's opinion, however, that in cases where the
  formatting is explicitly invoked, less care needs to be
  taken to prevent accidental interpolation, in which case a
  lighter and less unwieldy syntax can be used.

- printf and its cousins ('%'), including variations that add a
  field index, so that fields can be interpolated out of order.

- Other bracket-only variations. Various MUDs (Multi-User
  Dungeons) such as MUSH have used brackets (e.g. [name]) to do
  string interpolation. The Microsoft .Net libraries use braces
  ({}), and a syntax which is very similar to the one in this
  proposal, although the syntax for format specifiers is quite
  different. [4]

- Backquoting. This method has the benefit of minimal syntactical
  clutter; however, it lacks many of the benefits of a function
  call syntax (such as complex expression arguments, custom
  formatters, etc.).

- Other variations include Ruby's #{}, PHP's {$name}, and so
  on.

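Three of the conventions listed above are available in Python
itself, which allows a side-by-side comparison. The sketch below
uses string.Template for the shell-style syntax, the '%' operator
for the printf style, and str.format for the brace style; the sample
data in 'values' is made up for the example.

```python
from string import Template

values = {"name": "world", "count": 3}   # hypothetical sample data

# Shell-style: '$' marks the field, braces are optional
shell_style = Template("hello $name, ${count} items").substitute(values)

# printf-style: '%' marks the field and a type code is required
printf_style = "hello %(name)s, %(count)d items" % values

# Brace-style, as proposed in this PEP
brace_style = "hello {name}, {count} items".format(**values)

print(shell_style)
print(printf_style)
print(brace_style)
```

All three produce the same text; they differ only in how a field is
marked up inside the format string.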
Some specific aspects of the syntax warrant additional comments:

1) Backslash character for escapes. The original version of
   this PEP used backslash rather than doubling to escape a bracket.
   This worked because backslashes in Python string literals that
   don't conform to a standard backslash sequence such as '\n'
   are left unmodified. However, this caused a certain amount
   of confusion, and led to potential situations of multiple
   recursive escapes, e.g. '\\\\{' to place a literal backslash
   in front of a bracket.

2) The use of the colon character (':') as a separator for
   format specifiers. This was chosen simply because that's
   what .Net uses.

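Both points can be checked with the built-in str.format method. The
variable names below are illustrative only.

```python
# Doubled braces produce literal braces; no backslash is involved.
literal = "{{0}}".format("unused")
wrapped = "{{{0}}}".format("x")

# A backslash is an ordinary character to the formatter.
backslash = "\\{0}".format("x")

# The colon separates the field name from the format specifier.
padded = "{0:>5}".format("ab")

print(literal, wrapped, backslash, repr(padded))
```

Because the backslash has no special meaning to the formatter, the
only escaping rule a reader has to remember is the doubling of
braces.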
Alternate Feature Proposals

Restricting attribute access: An earlier version of the PEP
restricted the ability to access attributes beginning with a
leading underscore, for example "{0._private}". However, this
is a useful ability to have when debugging, so the restriction
was dropped.

Some developers suggested that the ability to do 'getattr' and
'getitem' access should be dropped entirely. However, this
is in conflict with the needs of another set of developers who
strongly lobbied for the ability to pass in a large dict as a
single argument (without flattening it into individual keyword
arguments using the **kwargs syntax) and then have the format
string refer to dict entries individually.

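Both capabilities discussed above can be seen with the built-in
str.format method. The dictionary contents and the Account class are
hypothetical and serve only as sample data.

```python
# A dict passed as a single positional argument; the format string
# picks out individual entries with 'getitem' access.
record = {"name": "Ada", "lang": "Python"}
by_index = "{0[name]} writes {0[lang]}".format(record)


class Account:
    def __init__(self):
        self._balance = 42


# 'getattr' access, including an attribute with a leading underscore
debug_text = "balance={0._balance}".format(Account())

print(by_index)
print(debug_text)
```

Note that the key inside the brackets is written without quotes in
the format string.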
There have also been suggestions to expand the set of expressions
that are allowed in a format string. However, this was seen
to go against the spirit of TOOWTDI, since the same effect can
be achieved in most cases by executing the same expression on
the parameter before it's passed in to the formatting function.
For cases where the format string is being used to do arbitrary
formatting in a data-rich environment, it's recommended to use
a template engine specialized for this purpose, such as
Genshi [5] or Cheetah [6].

Many other features were considered and rejected because they
could easily be achieved by subclassing Formatter instead of
building the feature into the base implementation. This includes
alternate syntax, comments in format strings, and many others.


Security Considerations

Historically, string formatting has been a common source of
security holes in web-based applications, particularly if the
string formatting system allows arbitrary expressions to be
embedded in format strings.

The best way to use string formatting in a way that does not
create potential security holes is to never use format strings
that come from an untrusted source.

Barring that, the next best approach is to ensure that string
formatting has no side effects. Because of the open nature of
Python, it is impossible to guarantee that any non-trivial
operation has this property. What this PEP does is limit the
types of expressions in format strings to those in which visible
side effects are both rare and strongly discouraged by the
culture of Python developers. So for example, attribute access
is allowed because it would be considered pathological to write
code where the mere access of an attribute has visible side
effects (whether the code has *invisible* side effects - such
as creating a cache entry for faster lookup - is irrelevant.)

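Even without side effects, attribute access lets a format string
read any data reachable from its arguments, which is one reason to
avoid format strings from untrusted sources. The sketch below
illustrates this with the built-in str.format method; the Settings
and User classes and the 'api_key' value are invented for the
example.

```python
class Settings:
    def __init__(self):
        self.api_key = "s3cr3t"      # hypothetical sensitive value


class User:
    def __init__(self, name):
        self.name = name
        self.settings = Settings()


user = User("ann")

# The application author expects something like "Hello {u.name}" ...
expected = "Hello {u.name}".format(u=user)

# ... but a format string supplied by an untrusted party can walk
# to other attributes of the same object.
untrusted = "{u.settings.api_key}"
leaked = untrusted.format(u=user)

print(expected)
print(leaked)
```

Nothing here has a side effect, yet the second call exposes data the
caller never intended to display.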
Sample Implementation

An implementation of an earlier version of this PEP was created by
Patrick Maupin and Eric V. Smith, and can be found in the pep3101
sandbox at:

    http://svn.python.org/view/sandbox/trunk/pep3101/


Backwards Compatibility

Backwards compatibility can be maintained by leaving the existing
mechanisms in place. The new system does not collide with any of
the method names of the existing string formatting techniques, so
both systems can co-exist until it comes time to deprecate the
older system.


References

[1] Python Library Reference - String formatting operations
    http://docs.python.org/library/stdtypes.html#string-formatting-operations

[2] Python Library Reference - Template strings
    http://docs.python.org/library/string.html#string.Template

[3] [Python-3000] String formating operations in python 3k
    http://mail.python.org/pipermail/python-3000/2006-April/000285.html

[4] Composite Formatting - [.Net Framework Developer's Guide]
    http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true

[5] Genshi templating engine.
    http://genshi.edgewall.org/

[6] Cheetah - The Python-Powered Template Engine.
    http://www.cheetahtemplate.org/


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: