Lots of changes - added specification for conversions, error handling, complex field specs and general cleanup.

This commit is contained in:
Talin 2006-06-11 00:59:06 +00:00
parent 7e4415edea
commit 0ee269ce09
1 changed files with 228 additions and 78 deletions

@ -8,7 +8,7 @@ Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
Abstract
@ -48,7 +48,7 @@ Rationale
Specification
The specification will consist of 4 parts:
The specification will consist of the following parts:
- Specification of a new formatting method to be added to the
built-in string class.
@ -60,6 +60,26 @@ Specification
- Specification of an API for user-defined formatting classes.
- Specification of how formatting errors are handled.
Note on string encodings: Since this PEP is being targeted
at Python 3.0, it is assumed that all strings are unicode strings,
and that the use of the word 'string' in the context of this
document will generally refer to a Python 3.0 string, which is
the same as a Python 2.x unicode object.
If it should happen that this functionality is backported to
the 2.x series, then it will be necessary to handle both regular
strings as well as unicode objects. All of the function call
interfaces described in this PEP can be used for both strings
and unicode objects, and in all cases there is sufficient
information to be able to properly deduce the output string
type (in other words, there is no need for two separate APIs).
In all cases, the type of the template string dominates - that
is, the result of the conversion will always be an object
that contains the same representation of characters as the
input template string.
String Methods
@ -75,9 +95,6 @@ String Methods
identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument.
The result of the format call is an object of the same type
(string or unicode) as the format string.
Format Strings
@ -90,9 +107,9 @@ Format Strings
"My name is Fred"
Braces can be escaped using a backslash:
Braces can be escaped by doubling:
"My name is {0} :-\{\}".format('Fred')
"My name is {0} :-{{}}".format('Fred')
Which would produce:
@ -102,20 +119,47 @@ Format Strings
of a 'field name', which can either be simple or compound, and an
optional 'conversion specifier'.
Simple and Compound Field Names
Simple field names are either names or numbers. If numbers, they
must be valid base-10 integers; if names, they must be valid
Python identifiers. A number is used to identify a positional
argument, while a name is used to identify a keyword argument.
Compound names are a sequence of simple names separated by
periods:
A compound field name is a combination of multiple simple field
names in an expression:
"My name is {0.name} :-\{\}".format(dict(name='Fred'))
"My name is {0.name}".format(file('out.txt'))
Compound names can be used to access specific dictionary entries,
array elements, or object attributes. In the above example, the
'{0.name}' field refers to the dictionary entry 'name' within
positional argument 0.
This example shows the use of the 'getattr' or 'dot' operator
in a field expression. The dot operator allows an attribute of
an input value to be specified as the field value.
The types of expressions that can be used in a compound name
have been deliberately limited in order to prevent potential
security exploits resulting from the ability to place arbitrary
Python expressions inside of strings. Only two operators are
supported, the '.' (getattr) operator, and the '[]' (getitem)
operator.
An example of the 'getitem' syntax:
"My name is {0[name]}".format(dict(name='Fred'))
It should be noted that the use of 'getitem' within a string is
much more limited than its normal use. In the above example, the
string 'name' really is the literal string 'name', not a variable
named 'name'. The rules for parsing an item key are the same as
for parsing a simple name - in other words, if it looks like a
number, then it is treated as a number; if it looks like an
identifier, then it is used as a string.
It is not possible to specify arbitrary dictionary keys from
within a format string.
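As an illustration of the two supported operators, here is a minimal sketch using the str.format() method as it exists in released Python 3. The 'person' object and the sample values are hypothetical, chosen only for the example.

```python
from types import SimpleNamespace

# Hypothetical sample data for the illustration.
person = SimpleNamespace(name="Fred")
record = {"name": "Fred"}
letters = ["a", "b", "c"]

# '.' (getattr): reads the attribute 'name' of positional argument 0.
by_attr = "My name is {0.name}".format(person)

# '[]' (getitem): 'name' is the literal key, not a variable lookup.
name = "something else"  # has no effect on the field below
by_key = "My name is {0[name]}".format(record)

# A key that looks like a number is treated as an integer index.
by_index = "Second letter: {0[1]}".format(letters)

print(by_attr)
print(by_key)
print(by_index)
```

Because the key is parsed rather than evaluated, rebinding the variable 'name' does not change which dictionary entry is selected.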
Conversion Specifiers
Each field can also specify an optional set of 'conversion
specifiers' which can be used to adjust the format of that field.
@ -129,53 +173,135 @@ Format Strings
built-in types will recognize a standard set of conversion
specifiers.
The conversion specifier consists of a sequence of zero or more
characters, each of which can consist of any printable character
except for a non-escaped '}'.
Conversion specifiers can themselves contain replacement fields.
For example, a field whose field width is itself a parameter
could be specified via:
Conversion specifiers can themselves contain replacement fields;
this will be described in a later section. Except for this
replacement, the format() method does not attempt to interpret the
conversion specifiers in any way; it merely passes all of the
characters between the first colon ':' and the matching right
brace ('}') to the various underlying formatters (described
later.)
"{0:{1}}".format(a, b, c)
Note that the doubled '}' at the end, which would normally be
escaped, is not escaped in this case. The reason is that
the '{{' and '}}' syntax for escapes is only applied when used
*outside* of a format field. Within a format field, the brace
characters always have their normal meaning.
The syntax for conversion specifiers is open-ended, since other
than doing field replacements, the format() method does not
attempt to interpret them in any way; it merely passes all of the
characters between the first colon and the matching brace to
the various underlying formatter methods.
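The interaction between doubled braces and nested replacement fields can be sketched with str.format() from released Python 3. The argument values are arbitrary, and an explicit '>' alignment is used so that the result does not depend on any default.

```python
# Outside a field, doubled braces produce literal braces.
escaped = "My name is {0} :-{{}}".format("Fred")

# Inside a field, braces keep their normal meaning, so the width
# can come from another argument.
padded = "{0:{1}}".format("abc", 6)

# Width and precision both taken from arguments; the closing '}}'
# ends the inner field and then the outer field.
number = "{0:>{1}.{2}f}".format(3.14159, 8, 2)

# A literal brace pair wrapped around a replacement field.
wrapped = "{{{0}}}".format("x")

print(repr(escaped), repr(padded), repr(number), repr(wrapped))
```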
Standard Conversion Specifiers
For most built-in types, the conversion specifiers will be the
same or similar to the existing conversion specifiers used with
the '%' operator. Thus, instead of '%02.2x', you will say
'{0:02.2x}'.
If an object does not define its own conversion specifiers, a
standard set of conversion specifiers is used. These are similar
in concept to the conversion specifiers used by the existing '%'
operator, however there are also a number of significant
differences. The standard conversion specifiers fall into three
major categories: string conversions, integer conversions and
floating point conversions.
There are a few differences however:
The general form of a standard conversion specifier is:
- The trailing letter is optional - you don't need to say '2.2d',
you can instead just say '2.2'. If the letter is omitted, a
default will be assumed based on the type of the argument.
The defaults will be as follows:
[[fill]align][sign][width][.precision][type]
string or unicode object: 's'
integer: 'd'
floating-point number: 'f'
all other types: 's'
The brackets ([]) indicate an optional field.
- Variable field width specifiers use a nested version of the {}
syntax, allowing the width specifier to be either a positional
or keyword argument:
Then the optional align flag can be one of the following:
"{0:{1}.{2}d}".format(a, b, c)
'<' - Forces the field to be left-aligned within the available
space (This is the default.)
'>' - Forces the field to be right-aligned within the
available space.
'=' - Forces the padding to be placed immediately
after the sign, if any. This is used for printing fields
in the form '+000000120'.
- The support for length modifiers (which are ignored by Python
anyway) is dropped.
Note that unless a minimum field width is defined, the field
width will always be the same size as the data to fill it, so
that the alignment option has no meaning in this case.
For non-built-in types, the conversion specifiers will be specific
to that type. An example is the 'datetime' class, whose
conversion specifiers are identical to the arguments to the
strftime() function:
The optional 'fill' character defines the character to be used to
pad the field to the minimum width. The alignment flag must be
supplied if the character is a number other than 0 (otherwise the
character would be interpreted as part of the field width
specifier). A zero fill character without an alignment flag
implies an alignment type of '='.
"Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())
The 'sign' field can be one of the following:
'+' - indicates that a sign should be used for both
positive as well as negative numbers
'-' - indicates that a sign should be used only for negative
numbers (this is the default behaviour)
' ' - indicates that a leading space should be used on
positive numbers
'()' - indicates that negative numbers should be surrounded
by parentheses
'width' is a decimal integer defining the minimum field width. If
not specified, then the field width will be determined by the
content.
The 'precision' field is a decimal number indicating how many
digits should be displayed after the decimal point.
Finally, the 'type' determines how the data should be presented.
If the type field is absent, an appropriate type will be assigned
based on the value to be formatted ('d' for integers and longs,
'g' for floats, and 's' for everything else.)
The available string conversion types are:
's' - String format. Invokes str() on the object.
This is the default conversion specifier type.
'r' - Repr format. Invokes repr() on the object.
There are several integer conversion types. All invoke int() on
the object before attempting to format it.
The available integer conversion types are:
'b' - Binary. Outputs the number in base 2.
'c' - Character. Converts the integer to the corresponding
unicode character before printing.
'd' - Decimal Integer. Outputs the number in base 10.
'o' - Octal format. Outputs the number in base 8.
'x' - Hex format. Outputs the number in base 16, using lower-
case letters for the digits above 9.
'X' - Hex format. Outputs the number in base 16, using upper-
case letters for the digits above 9.
There are several floating point conversion types. All invoke
float() on the object before attempting to format it.
The available floating point conversion types are:
'e' - Exponent notation. Prints the number in scientific
notation using the letter 'e' to indicate the exponent.
'E' - Exponent notation. Same as 'e' except it uses an upper
case 'E' as the separator character.
'f' - Fixed point. Displays the number as a fixed-point
number.
'F' - Fixed point. Same as 'f'.
'g' - General format. This prints the number as a fixed-point
number, unless the number is too large, in which case
it switches to 'e' exponent notation.
'G' - General format. Same as 'g' except switches to 'E'
if the number gets too large.
'n' - Number. This is the same as 'g', except that it uses the
current locale setting to insert the appropriate
number separator characters.
'%' - Percentage. Multiplies the number by 100 and displays
in fixed ('f') format, followed by a percent sign.
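A few of the standard specifiers listed above can be tried with the built-in format(value, spec) function of released Python 3. This is only a sketch: it covers the subset that the released implementation is known to accept, and it leaves out the '()' sign option described in this draft.

```python
# [[fill]align][sign][width]: fill '0', align '=', sign '+', width 10
padded_sign = format(120, "0=+10d")

# String alignment with and without an explicit fill character
right_text = format("hi", ">6")
starred = format("hi", "*<6")

# Integer conversion types
hex_lower = format(255, "x")
hex_upper = format(255, "X")
binary = format(5, "b")
octal = format(8, "o")
char = format(65, "c")

# Floating point conversion types with a precision
fixed = format(3.14159, ".2f")
exponent = format(1234.5, ".2e")
percent = format(0.25, ".1%")

for label, text in [("sign", padded_sign), ("hex", hex_lower),
                    ("fixed", fixed), ("percent", percent)]:
    print(label, repr(text))
```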
Objects are able to define their own conversion specifiers to
replace the standard ones. An example is the 'datetime' class,
whose conversion specifiers might look something like the
arguments to the strftime() function:
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
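To make the idea of type-specific conversion specifiers concrete, here is a sketch based on released Python 3, where an object customizes its formatting through a '__format__' method and datetime accepts strftime() directives (with the '%' prefix). The 'Temperature' class and its 'F' specifier are hypothetical and exist only for this example.

```python
from datetime import datetime


class Temperature:
    """Hypothetical type with its own one-letter conversion specifier."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # 'F' selects Fahrenheit; anything else falls back to Celsius.
        if spec == "F":
            return "{:.1f}F".format(self.celsius * 9 / 5 + 32)
        return "{:.1f}C".format(self.celsius)


boiling_f = "{0:F}".format(Temperature(100))
boiling_c = "{0}".format(Temperature(100))

# datetime interprets everything after the first colon as strftime input.
moment = datetime(2006, 6, 11, 0, 59, 6)
stamp = "Committed on {0:%Y-%m-%d at %H:%M:%S}".format(moment)

print(boiling_f, boiling_c, stamp)
```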
Controlling Formatting
@ -224,19 +350,22 @@ User-Defined Formatting Classes
API for such an application-specific formatter is up to the
application; here are several possible examples:
cell_format( "The total is: {0}", total )
cell_format("The total is: {0}", total)
TemplateString( "The total is: {0}" ).format( total )
TemplateString("The total is: {0}").format(total)
Creating an application-specific formatter is relatively
straightforward. The string and unicode classes will have a class
method called 'cformat' that does all the actual work of formatting;
the built-in format() method is just a wrapper that calls cformat.
The type signature for the cformat function is as follows:
cformat(template, format_hook, args, kwargs)
The parameters to the cformat function are:
-- The format string (or unicode; the same function handles
both.)
-- The format template string.
-- A callable 'format hook', which is called once per field
-- A tuple containing the positional arguments
-- A dict containing the keyword arguments
@ -251,7 +380,7 @@ User-Defined Formatting Classes
will attempt to call the field format hook with the following
arguments:
format_hook(value, conversion, buffer)
format_hook(value, conversion)
The 'value' field corresponds to the value being formatted, which
was retrieved from the arguments using the field name.
@ -260,20 +389,49 @@ User-Defined Formatting Classes
field, which will be either a string or unicode object, depending
on the type of the original format string.
The 'buffer' argument is a Python array object, either a byte
array or unicode character array. The buffer object will contain
the partially constructed string; the field hook is free to modify
the contents of this buffer if needed.
The field_hook will be called once per field. The field_hook may
take one of two actions:
1) Return False, indicating that the field_hook will not
1) Return a string or unicode object that is the result
of the formatting operation.
2) Return None, indicating that the field_hook will not
process this field and the default formatting should be
used. This decision should be based on the type of the
value object, and the contents of the conversion string.
2) Append the formatted field to the buffer, and return True.
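The hook protocol with a string-or-None return value can be approximated today on top of the standard library's string.Formatter class. In the sketch below, 'cformat' is a hypothetical stand-in written for illustration, not the function proposed in this PEP, and 'money_hook' with its '$' specifier is likewise invented for the example.

```python
import string


class _HookFormatter(string.Formatter):
    """Calls a user hook once per field before the default formatting."""

    def __init__(self, format_hook):
        super().__init__()
        self._hook = format_hook

    def format_field(self, value, format_spec):
        result = self._hook(value, format_spec)
        if result is None:
            # The hook declined this field: use the default formatting.
            return super().format_field(value, format_spec)
        return result


def cformat(template, format_hook, args, kwargs):
    """Hypothetical stand-in with the parameter order given above."""
    return _HookFormatter(format_hook).vformat(template, args, kwargs)


def money_hook(value, conversion):
    # Handle only numbers carrying the invented '$' conversion.
    if conversion == "$" and isinstance(value, (int, float)):
        return "${:,.2f}".format(value)
    return None


total_line = cformat("The total is: {0:$} for {1}", money_hook,
                     (1234.5, "Fred"), {})
print(total_line)
```

The hook decides per field, based on the value's type and the conversion string, whether to format the field itself or to return None and defer to the default.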
Error handling
The string formatting system has two error handling modes, which
are controlled by the value of a class variable:
string.strict_format_errors = True
The 'strict_format_errors' flag defaults to False, or 'lenient'
mode. Setting it to True enables 'strict' mode. The current mode
determines how errors are handled, depending on the type of the
error.
The types of errors that can occur are:
1) Reference to a missing or invalid argument from within a
field specifier. In strict mode, this will raise an exception.
In lenient mode, this will cause the value of the field to be
replaced with the string '?name?', where 'name' will be the
type of error (KeyError, IndexError, or AttributeError).
So for example:
>>> string.strict_format_errors = False
>>> print 'Item 2 of argument 0 is: {0[2]}'.format( [0,1] )
"Item 2 of argument 0 is: ?IndexError?"
2) Unused argument. In strict mode, this will raise an exception.
In lenient mode, this will be ignored.
3) Exception raised by underlying formatter. These exceptions
are always passed through, regardless of the current mode.
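The two modes can be imitated with a subclass of the standard library's string.Formatter. This is a sketch under stated assumptions: the 'LenientFormatter' class and its 'strict' attribute are hypothetical, the exception type raised for unused arguments is an arbitrary choice, and fields that carry a conversion specifier are not given any special treatment after a failed lookup.

```python
import string


class LenientFormatter(string.Formatter):
    """Hypothetical formatter imitating the lenient and strict modes."""

    strict = False  # plays the role of 'strict_format_errors'

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError) as exc:
            if self.strict:
                raise
            # Lenient mode: substitute a marker naming the error type.
            return "?{}?".format(type(exc).__name__), field_name

    def check_unused_args(self, used_args, args, kwargs):
        if not self.strict:
            return  # lenient mode ignores unused arguments
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused arguments: {!r}".format(unused))


lenient = LenientFormatter()
message = lenient.format("Item 2 of argument 0 is: {0[2]}", [0, 1])
print(message)

strict = LenientFormatter()
strict.strict = True
try:
    strict.format("{0}", "used", "unused")
    strict_raised = False
except ValueError:
    strict_raised = True
```

Exceptions raised inside the underlying formatting step are not caught here, which matches the third rule above: they pass through in either mode.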
Alternate Syntax
@ -325,22 +483,14 @@ Alternate Syntax
Some specific aspects of the syntax warrant additional comments:
1) The use of the backslash character for escapes. A few people
suggested doubling the brace characters to indicate a literal
brace rather than using backslash as an escape character. This is
also the convention used in the .Net libraries. Here's how the
previously-given example would look with this convention:
"My name is {0} :-{{}}".format('Fred')
One problem with this syntax is that it conflicts with the use of
nested braces to allow parameterization of the conversion
specifiers:
"{0:{1}.{2}}".format(a, b, c)
(There are alternative solutions, but they are too long to go
into here.)
1) Backslash character for escapes. The original version of
this PEP used backslash rather than doubling to escape a bracket.
This worked because backslashes in Python string literals that
don't conform to a standard backslash sequence such as '\n'
are left unmodified. However, this caused a certain amount
of confusion, and led to potential situations of multiple
recursive escapes, i.e. '\\\\{' to place a literal backslash
in front of a bracket.
2) The use of the colon character (':') as a separator for
conversion specifiers. This was chosen simply because that's