Lots of changes - added specification for conversions, error handling, complex field specs and general cleanup.
parent 7e4415edea
commit 0ee269ce09
302 pep-3101.txt
@@ -8,7 +8,7 @@ Type: Standards
 Content-Type: text/plain
 Created: 16-Apr-2006
 Python-Version: 3.0
-Post-History: 28-Apr-2006
+Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006


 Abstract
@@ -48,7 +48,7 @@ Rationale

 Specification

-The specification will consist of 4 parts:
+The specification will consist of the following parts:

 - Specification of a new formatting method to be added to the
 built-in string class.
@@ -60,6 +60,26 @@ Specification

 - Specification of an API for user-defined formatting classes.

+- Specification of how formatting errors are handled.
+
+Note on string encodings: Since this PEP is being targeted
+at Python 3.0, it is assumed that all strings are unicode strings,
+and that the use of the word 'string' in the context of this
+document will generally refer to a Python 3.0 string, which is
+the same as a Python 2.x unicode object.
+
+If it should happen that this functionality is backported to
+the 2.x series, then it will be necessary to handle both regular
+strings as well as unicode objects. All of the function call
+interfaces described in this PEP can be used for both strings
+and unicode objects, and in all cases there is sufficient
+information to be able to properly deduce the output string
+type (in other words, there is no need for two separate APIs).
+In all cases, the type of the template string dominates - that
+is, the result of the conversion will always be an object
+that contains the same representation of characters as the
+input template string.
+

 String Methods

@@ -75,9 +95,6 @@ String Methods
 identified by its keyword name, so in the above example, 'c' is
 used to refer to the third argument.

-The result of the format call is an object of the same type
-(string or unicode) as the format string.
-

 Format Strings

@@ -90,9 +107,9 @@ Format Strings

 "My name is Fred"

-Braces can be escaped using a backslash:
+Braces can be escaped by doubling:

-"My name is {0} :-\{\}".format('Fred')
+"My name is {0} :-{{}}".format('Fred')

 Which would produce:

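For comparison only, and not part of this commit: the doubled-brace escape adopted in this revision is the one that Python 3's str.format() eventually implemented. The following is a minimal check, assuming any Python 3 interpreter. It also shows that the released implementation rejects a lone brace instead of passing it through.

```python
# Doubled braces outside a replacement field become literal braces.
result = "My name is {0} :-{{}}".format('Fred')
print(result)  # My name is Fred :-{}

# A single unescaped brace is rejected rather than passed through.
try:
    "My name is {0} :-{".format('Fred')
except ValueError as exc:
    print("ValueError:", exc)
```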
@@ -102,20 +119,47 @@ Format Strings
 of a 'field name', which can either be simple or compound, and an
 optional 'conversion specifier'.

+
+Simple and Compound Field Names
+
 Simple field names are either names or numbers. If numbers, they
 must be valid base-10 integers; if names, they must be valid
 Python identifiers. A number is used to identify a positional
 argument, while a name is used to identify a keyword argument.

-Compound names are a sequence of simple names separated by
-periods:
+A compound field name is a combination of multiple simple field
+names in an expression:

-"My name is {0.name} :-\{\}".format(dict(name='Fred'))
+"My name is {0.name}".format(file('out.txt'))

-Compound names can be used to access specific dictionary entries,
-array elements, or object attributes. In the above example, the
-'{0.name}' field refers to the dictionary entry 'name' within
-positional argument 0.
+This example shows the use of the 'getattr' or 'dot' operator
+in a field expression. The dot operator allows an attribute of
+an input value to be specified as the field value.
+
+The types of expressions that can be used in a compound name
+have been deliberately limited in order to prevent potential
+security exploits resulting from the ability to place arbitrary
+Python expressions inside of strings. Only two operators are
+supported, the '.' (getattr) operator, and the '[]' (getitem)
+operator.
+
+An example of the 'getitem' syntax:
+
+"My name is {0[name]}".format(dict(name='Fred'))
+
+It should be noted that the use of 'getitem' within a string is
+much more limited than its normal use. In the above example, the
+string 'name' really is the literal string 'name', not a variable
+named 'name'. The rules for parsing an item key are the same as
+for parsing a simple name - in other words, if it looks like a
+number, then it's treated as a number; if it looks like an
+identifier, then it is used as a string.
+
+It is not possible to specify arbitrary dictionary keys from
+within a format string.
+
+
+Conversion Specifiers

 Each field can also specify an optional set of 'conversion
 specifiers' which can be used to adjust the format of that field.
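The following sketch shows the same getattr and getitem rules as they behave in the released Python 3 str.format(), which kept exactly these two operators. Python 3 has no file() builtin, so types.SimpleNamespace stands in here for an object with a 'name' attribute. The last case illustrates the "looks like a number" rule: a digit-only key is converted to an int, so a dict keyed by the string '1' cannot be reached from inside a format string.

```python
from types import SimpleNamespace

# '.' (getattr): read an attribute of positional argument 0.
person = SimpleNamespace(name='Fred')
print("My name is {0.name}".format(person))               # My name is Fred

# '[]' (getitem): an identifier-like key is passed as the string 'name'.
print("My name is {0[name]}".format(dict(name='Fred')))   # My name is Fred

# A key made only of digits is converted to an int before the lookup.
print("Item 1 is {0[1]}".format(['zero', 'one']))         # Item 1 is one

# ...so a dict keyed by the *string* '1' cannot be reached that way.
try:
    "{0[1]}".format({'1': 'string key'})
except KeyError as exc:
    print("KeyError:", exc)                               # KeyError: 1
```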
@@ -129,53 +173,135 @@ Format Strings
 built-in types will recognize a standard set of conversion
 specifiers.

-The conversion specifier consists of a sequence of zero or more
-characters, each of which can consist of any printable character
-except for a non-escaped '}'.
+Conversion specifiers can themselves contain replacement fields.
+For example, a field whose field width is itself a parameter
+could be specified via:

-Conversion specifiers can themselves contain replacement fields;
-this will be described in a later section. Except for this
-replacement, the format() method does not attempt to interpret the
-conversion specifiers in any way; it merely passes all of the
-characters between the first colon ':' and the matching right
-brace ('}') to the various underlying formatters (described
-later.)
+"{0:{1}}".format(a, b, c)

+Note that the doubled '}' at the end, which would normally be
+escaped, is not escaped in this case. This is because
+the '{{' and '}}' syntax for escapes is only applied when used
+*outside* of a format field. Within a format field, the brace
+characters always have their normal meaning.
+
+The syntax for conversion specifiers is open-ended, since other
+than doing field replacements, the format() method does not
+attempt to interpret them in any way; it merely passes all of the
+characters between the first colon and the matching brace to
+the various underlying formatter methods.
+

 Standard Conversion Specifiers

-For most built-in types, the conversion specifiers will be the
-same or similar to the existing conversion specifiers used with
-the '%' operator. Thus, instead of '%02.2x', you will say
-'{0:02.2x}'.
+If an object does not define its own conversion specifiers, a
+standard set of conversion specifiers is used. These are similar
+in concept to the conversion specifiers used by the existing '%'
+operator, however there are also a number of significant
+differences. The standard conversion specifiers fall into three
+major categories: string conversions, integer conversions and
+floating point conversions.

-There are a few differences however:
+The general form of a standard conversion specifier is:

-- The trailing letter is optional - you don't need to say '2.2d',
-you can instead just say '2.2'. If the letter is omitted, a
-default will be assumed based on the type of the argument.
-The defaults will be as follows:
+[[fill]align][sign][width][.precision][type]

-string or unicode object: 's'
-integer: 'd'
-floating-point number: 'f'
-all other types: 's'
+The brackets ([]) indicate an optional field.

-- Variable field width specifiers use a nested version of the {}
-syntax, allowing the width specifier to be either a positional
-or keyword argument:
+Then the optional align flag can be one of the following:

-"{0:{1}.{2}d}".format(a, b, c)
+'<' - Forces the field to be left-aligned within the available
+space (This is the default.)
+'>' - Forces the field to be right-aligned within the
+available space.
+'=' - Forces the padding to be placed immediately
+after the sign, if any. This is used for printing fields
+in the form '+000000120'.

-- The support for length modifiers (which are ignored by Python
-anyway) is dropped.
+Note that unless a minimum field width is defined, the field
+width will always be the same size as the data to fill it, so
+that the alignment option has no meaning in this case.

-For non-built-in types, the conversion specifiers will be specific
-to that type. An example is the 'datetime' class, whose
-conversion specifiers are identical to the arguments to the
-strftime() function:
+The optional 'fill' character defines the character to be used to
+pad the field to the minimum width. The alignment flag must be
+supplied if the character is a number other than 0 (otherwise the
+character would be interpreted as part of the field width
+specifier). A zero fill character without an alignment flag
+implies an alignment type of '='.

-"Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())
+The 'sign' field can be one of the following:
+
+'+' - indicates that a sign should be used for both
+positive as well as negative numbers
+'-' - indicates that a sign should be used only for negative
+numbers (this is the default behaviour)
+' ' - indicates that a leading space should be used on
+positive numbers
+'()' - indicates that negative numbers should be surrounded
+by parentheses
+
+'width' is a decimal integer defining the minimum field width. If
+not specified, then the field width will be determined by the
+content.
+
+The 'precision' field is a decimal number indicating how many
+digits should be displayed after the decimal point.
+
+Finally, the 'type' determines how the data should be presented.
+If the type field is absent, an appropriate type will be assigned
+based on the value to be formatted ('d' for integers and longs,
+'g' for floats, and 's' for everything else.)
+
+The available string conversion types are:
+
+'s' - String format. Invokes str() on the object.
+This is the default conversion specifier type.
+'r' - Repr format. Invokes repr() on the object.
+
+There are several integer conversion types. All invoke int() on
+the object before attempting to format it.
+
+The available integer conversion types are:
+
+'b' - Binary. Outputs the number in base 2.
+'c' - Character. Converts the integer to the corresponding
+unicode character before printing.
+'d' - Decimal Integer. Outputs the number in base 10.
+'o' - Octal format. Outputs the number in base 8.
+'x' - Hex format. Outputs the number in base 16, using lower-
+case letters for the digits above 9.
+'X' - Hex format. Outputs the number in base 16, using upper-
+case letters for the digits above 9.
+
+There are several floating point conversion types. All invoke
+float() on the object before attempting to format it.
+
+The available floating point conversion types are:
+
+'e' - Exponent notation. Prints the number in scientific
+notation using the letter 'e' to indicate the exponent.
+'E' - Exponent notation. Same as 'e' except it uses an upper
+case 'E' as the separator character.
+'f' - Fixed point. Displays the number as a fixed-point
+number.
+'F' - Fixed point. Same as 'f'.
+'g' - General format. This prints the number as a fixed-point
+number, unless the number is too large, in which case
+it switches to 'e' exponent notation.
+'G' - General format. Same as 'g' except switches to 'E'
+if the number gets too large.
+'n' - Number. This is the same as 'g', except that it uses the
+current locale setting to insert the appropriate
+number separator characters.
+'%' - Percentage. Multiplies the number by 100 and displays
+in fixed ('f') format, followed by a percent sign.
+
+Objects are able to define their own conversion specifiers to
+replace the standard ones. An example is the 'datetime' class,
+whose conversion specifiers might look something like the
+arguments to the strftime() function:
+
+"Today is: {0:a b d H:M:S Y}".format(datetime.now())


 Controlling Formatting
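For comparison, here is a quick sketch of the standard specifier syntax as it later shipped in Python 3's format-specification mini-language. The shipped version follows this draft closely but not exactly. The '()' sign option was not adopted. Numbers are right-aligned by default rather than left-aligned. datetime objects accept strftime-style '%' directives rather than the bare letters shown in the draft's last example. All outputs below assume a standard Python 3 interpreter.

```python
import datetime

# Nested replacement field: the width is taken from argument 1.
print("[{0:{1}}]".format('abc', 6))                # [abc   ]

# [[fill]align][sign][width][.precision][type]
print("[{0:*<8}]".format('left'))                  # [left****]
print("[{0:*>8}]".format('right'))                 # [***right]
print("[{0:=+10}]".format(120))                    # [+      120]
print("[{0:+010}]".format(120))                    # [+000000120]
print("[{0:08.3f}]".format(3.14159))               # [0003.142]

# Released Python right-aligns numbers by default (the draft says '<').
print("[{0:5}]".format(42))                        # [   42]

# Integer and floating point type letters from the lists above.
print("{0:b} {0:o} {0:x} {0:X} {0:c}".format(65))  # 1000001 101 41 41 A
print("{0:e} {0:.1%}".format(0.125))               # 1.250000e-01 12.5%

# The '()' sign option was dropped, so this spec is rejected.
try:
    "{0:()}".format(-5)
except ValueError as exc:
    print("ValueError:", exc)

# datetime accepts strftime-style directives after the colon.
d = datetime.date(2006, 6, 10)
print("Posted: {0:%Y-%m-%d}".format(d))            # Posted: 2006-06-10
```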
@@ -224,19 +350,22 @@ User-Defined Formatting Classes
 API for such an application-specific formatter is up to the
 application; here are several possible examples:

-cell_format( "The total is: {0}", total )
+cell_format("The total is: {0}", total)

-TemplateString( "The total is: {0}" ).format( total )
+TemplateString("The total is: {0}").format(total)

 Creating an application-specific formatter is relatively straight-
 forward. The string and unicode classes will have a class method
 called 'cformat' that does all the actual work of formatting; the
 built-in format() method is just a wrapper that calls cformat.

+The type signature for the cformat function is as follows:
+
+cformat(template, format_hook, args, kwargs)
+
 The parameters to the cformat function are:

--- The format string (or unicode; the same function handles
-both.)
+-- The format template string.
 -- A callable 'format hook', which is called once per field
 -- A tuple containing the positional arguments
 -- A dict containing the keyword arguments
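The cformat()/format_hook API described here did not make it into the released language. The closest Python 3 equivalent is string.Formatter. Its vformat(format_string, args, kwargs) method takes the template, the positional-argument sequence and the keyword-argument mapping, much like the cformat signature above. Below is a minimal sketch of the hypothetical cell_format() helper from the examples above, built on that class. cell_format is illustrative only, not a real library function.

```python
import string

# A single formatter instance is reused for every call.
_formatter = string.Formatter()


def cell_format(template, *args, **kwargs):
    """Hypothetical application-specific formatter from the examples above."""
    # vformat() takes the positional args as a sequence and the keyword
    # args as a mapping, mirroring cformat(template, ..., args, kwargs).
    return _formatter.vformat(template, args, kwargs)


print(cell_format("The total is: {0}", 42))                # The total is: 42
print(cell_format("{who} owes {0:.2f}", 9.5, who="Fred"))  # Fred owes 9.50
```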
@@ -251,7 +380,7 @@ User-Defined Formatting Classes
 will attempt to call the field format hook with the following
 arguments:

-format_hook(value, conversion, buffer)
+format_hook(value, conversion)

 The 'value' field corresponds to the value being formatted, which
 was retrieved from the arguments using the field name.
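In string.Formatter, the per-field hook role is played by format_field(value, format_spec), which is called once per replacement field. The sketch below is an analogy, not the PEP's format_hook. The class name YesNoFormatter and the 'yn' specifier are hypothetical. Returning a string corresponds to the "return a string" action described in this revision. Delegating to the base class corresponds to "return None" (use the default formatting).

```python
import string


class YesNoFormatter(string.Formatter):
    """Hypothetical formatter with one extra, application-specific spec."""

    def format_field(self, value, format_spec):
        # Handle the field ourselves only for booleans with the 'yn' spec.
        if isinstance(value, bool) and format_spec == 'yn':
            return 'yes' if value else 'no'
        # Otherwise defer to the default: format(value, format_spec).
        return super().format_field(value, format_spec)


fmt = YesNoFormatter()
print(fmt.format("Paid: {0:yn}, amount: {1:>6.2f}", True, 12.5))
# Paid: yes, amount:  12.50
```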
@@ -260,20 +389,49 @@ User-Defined Formatting Classes
 field, which will be either a string or unicode object, depending
 on the type of the original format string.

-The 'buffer' argument is a Python array object, either a byte
-array or unicode character array. The buffer object will contain
-the partially constructed string; the field hook is free to modify
-the contents of this buffer if needed.
-
 The field_hook will be called once per field. The field_hook may
 take one of two actions:

-1) Return False, indicating that the field_hook will not
+1) Return a string or unicode object that is the result
+of the formatting operation.
+
+2) Return None, indicating that the field_hook will not
 process this field and the default formatting should be
 used. This decision should be based on the type of the
 value object, and the contents of the conversion string.

-2) Append the formatted field to the buffer, and return True.
+
+Error handling
+
+The string formatting system has two error handling modes, which
+are controlled by the value of a class variable:
+
+string.strict_format_errors = True
+
+The 'strict_format_errors' flag defaults to False, or 'lenient'
+mode. Setting it to True enables 'strict' mode. The current mode
+determines how errors are handled, depending on the type of the
+error.
+
+The types of errors that can occur are:
+
+1) Reference to a missing or invalid argument from within a
+field specifier. In strict mode, this will raise an exception.
+In lenient mode, this will cause the value of the field to be
+replaced with the string '?name?', where 'name' will be the
+type of error (KeyError, IndexError, or AttributeError).
+
+So for example:
+
+>>> string.strict_format_errors = False
+>>> print 'Item 2 of argument 0 is: {0[2]}'.format( [0,1] )
+"Item 2 of argument 0 is: ?IndexError?"
+
+2) Unused argument. In strict mode, this will raise an exception.
+In lenient mode, this will be ignored.
+
+3) Exception raised by underlying formatter. These exceptions
+are always passed through, regardless of the current mode.


 Alternate Syntax
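The strict_format_errors flag was not adopted. The released str.format() always raises KeyError, IndexError or AttributeError for a bad reference, and it silently ignores unused arguments. Both modes described above can still be approximated with string.Formatter by overriding its get_field() and check_unused_args() methods. The following is a sketch under that assumption; the class names LenientFormatter and StrictFormatter are hypothetical.

```python
import string


class LenientFormatter(string.Formatter):
    """Hypothetical 'lenient' mode: a bad reference becomes '?ErrorName?'."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError) as exc:
            # get_field() must return (value, key_used); the placeholder
            # string is then formatted like any other value.
            return '?%s?' % type(exc).__name__, field_name


class StrictFormatter(string.Formatter):
    """Hypothetical 'strict' mode: unused arguments raise ValueError."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds ints for positional and strings for keyword args.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError('unused arguments: %r' % (unused,))


print(LenientFormatter().format('Item 2 of argument 0 is: {0[2]}', [0, 1]))
# Item 2 of argument 0 is: ?IndexError?

try:
    StrictFormatter().format('{0}', 'used', 'unused')
except ValueError as exc:
    print(exc)  # unused arguments: [1]
```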
@@ -325,22 +483,14 @@ Alternate Syntax

 Some specific aspects of the syntax warrant additional comments:

-1) The use of the backslash character for escapes. A few people
-suggested doubling the brace characters to indicate a literal
-brace rather than using backslash as an escape character. This is
-also the convention used in the .Net libraries. Here's how the
-previously-given example would look with this convention:
-
-"My name is {0} :-{{}}".format('Fred')
-
-One problem with this syntax is that it conflicts with the use of
-nested braces to allow parameterization of the conversion
-specifiers:
-
-"{0:{1}.{2}}".format(a, b, c)
-
-(There are alternative solutions, but they are too long to go
-into here.)
+1) Backslash character for escapes. The original version of
+this PEP used backslash rather than doubling to escape a bracket.
+This worked because backslashes in Python string literals that
+don't conform to a standard backslash sequence such as '\n'
+are left unmodified. However, this caused a certain amount
+of confusion, and led to potential situations of multiple
+recursive escapes, i.e. '\\\\{' to place a literal backslash
+in front of a bracket.

 2) The use of the colon character (':') as a separator for
 conversion specifiers. This was chosen simply because that's