Latest updates to PEP 3101, incorporating recent discussions.

Talin 2007-08-15 00:14:29 +00:00
parent 0801f23a05
commit 8089f06be4
1 changed files with 171 additions and 111 deletions

@ -8,7 +8,7 @@ Type: Standards Track
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007
Abstract
@ -105,6 +105,13 @@ String Methods
identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument.
There is also a global built-in function, 'format' which formats
a single value:
print format(10.0, "7.3g")
This function is described in a later section.
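As a quick illustration, the call above can be written as follows in released Python 3, where 'print' is a function rather than a statement. The variable name 'result' is only for this sketch.

```python
# Format a single value: minimum width 7, 3 significant digits, 'g' type.
result = format(10.0, "7.3g")

# The number is right-aligned in a 7-character field.
print(repr(result))  # '     10'
```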
Format Strings
@ -136,7 +143,7 @@ Format Strings
The element within the braces is called a 'field'. Fields consist
of a 'field name', which can either be simple or compound, and an
optional 'conversion specifier'.
optional 'format specifier'.
Simple and Compound Field Names
@ -159,7 +166,7 @@ Simple and Compound Field Names
expressions in format strings. This is by design - the types of
expressions that you can use is deliberately limited. Only two operators
are supported: the '.' (getattr) operator, and the '[]' (getitem)
operator. The reason for allowing these operators is that they dont'
operator. The reason for allowing these operators is that they don't
normally have side effects in non-pathological code.
An example of the 'getitem' syntax:
@ -180,26 +187,26 @@ Simple and Compound Field Names
not required to enforce the rule about a name being a valid
Python identifier. Instead, it will rely on the getattr function
of the underlying object to throw an exception if the identifier
is not legal. The format function will have a minimalist parser
which only attempts to figure out when it is "done" with an
is not legal. The str.format() function will have a minimalist
parser which only attempts to figure out when it is "done" with an
identifier (by finding a '.' or a ']', or '}', etc.).
Conversion Specifiers
Format Specifiers
Each field can also specify an optional set of 'conversion
Each field can also specify an optional set of 'format
specifiers' which can be used to adjust the format of that field.
Conversion specifiers follow the field name, with a colon (':')
Format specifiers follow the field name, with a colon (':')
character separating the two:
"My name is {0:8}".format('Fred')
The meaning and syntax of the conversion specifiers depends on the
The meaning and syntax of the format specifiers depends on the
type of object that is being formatted, however there is a
standard set of conversion specifiers used for any object that
standard set of format specifiers used for any object that
does not override them.
Conversion specifiers can themselves contain replacement fields.
Format specifiers can themselves contain replacement fields.
For example, a field whose field width is itself a parameter
could be specified via:
@ -211,24 +218,21 @@ Conversion Specifiers
*outside* of a format field. Within a format field, the brace
characters always have their normal meaning.
The syntax for conversion specifiers is open-ended, since a class
can override the standard conversion specifiers. In such cases,
the format() method merely passes all of the characters between
The syntax for format specifiers is open-ended, since a class
can override the standard format specifiers. In such cases,
the str.format() method merely passes all of the characters between
the first colon and the matching brace to the relevant underlying
formatting method.
Standard Conversion Specifiers
Standard Format Specifiers
If an object does not define its own conversion specifiers, a
standard set of conversion specifiers are used. These are similar
in concept to the conversion specifiers used by the existing '%'
operator, however there are also a number of significant
differences. The standard conversion specifiers fall into three
major categories: string conversions, integer conversions and
floating point conversions.
If an object does not define its own format specifiers, a
standard set of format specifiers is used. These are similar
in concept to the format specifiers used by the existing '%'
operator, although there are also a number of differences.
The general form of a standard conversion specifier is:
The general form of a standard format specifier is:
[[fill]align][sign][width][.precision][type]
@ -265,8 +269,6 @@ Standard Conversion Specifiers
numbers (this is the default behavior)
' ' - indicates that a leading space should be used on
positive numbers
'()' - indicates that negative numbers should be surrounded
by parentheses
'width' is a decimal integer defining the minimum field width. If
not specified, then the field width will be determined by the
@ -274,25 +276,16 @@ Standard Conversion Specifiers
The 'precision' is a decimal number indicating how many digits
should be displayed after the decimal point in a floating point
conversion. In a string conversion the field indicates how many
characters will be used from the field content. The precision is
ignored for integer conversions.
conversion. For non-number types the field indicates the maximum
field size - in other words, how many characters will be used from
the field content. The precision is ignored for integer conversions.
Finally, the 'type' determines how the data should be presented.
If the type field is absent, an appropriate type will be assigned
based on the value to be formatted ('d' for integers and longs,
'g' for floats, and 's' for everything else.)
'g' for floats and decimals, and 's' for everything else.)
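The following sketch shows how the parts of the general form combine, using the behavior of released Python 3. The variable names are illustrative only.

```python
# fill='*', align='>', sign='+', width=10, precision=2, type='f'
padded = format(3.14159, "*>+10.2f")

# sign=' ' puts a leading space on positive numbers
spaced = format(42, " d")

# For non-number types, precision is the maximum field size
truncated = format("abcdef", ".3")

# No type given: 's' is assumed for strings, which are left-aligned
left = format("abc", "6")

print(padded)     # *****+3.14
print(repr(spaced))     # ' 42'
print(truncated)  # abc
print(repr(left))       # 'abc   '
```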
The available string conversion types are:
's' - String format. Invokes str() on the object.
This is the default conversion specifier type.
'r' - Repr format. Invokes repr() on the object.
There are several integer conversion types. All invoke int() on
the object before attempting to format it.
The available integer conversion types are:
The available integer presentation types are:
'b' - Binary. Outputs the number in base 2.
'c' - Character. Converts the integer to the corresponding
@ -304,10 +297,7 @@ Standard Conversion Specifiers
'X' - Hex format. Outputs the number in base 16, using upper-
case letters for the digits above 9.
There are several floating point conversion types. All invoke
float() on the object before attempting to format it.
The available floating point conversion types are:
The available floating point presentation types are:
'e' - Exponent notation. Prints the number in scientific
notation using the letter 'e' to indicate the exponent.
@ -327,39 +317,88 @@ Standard Conversion Specifiers
'%' - Percentage. Multiplies the number by 100 and displays
in fixed ('f') format, followed by a percent sign.
Objects are able to define their own conversion specifiers to
For non-numeric types, there's only one presentation type, 's', which
does nothing - its only purpose is to ease the transition from the
old '%' style formatting.
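A few of the presentation types listed above, shown with the behavior of released Python 3. This is a small sample, not a full list.

```python
# Integer presentation types
binary = format(10, "b")        # base 2
char = format(65, "c")          # integer to character
hex_lower = format(255, "x")    # base 16, lower case
hex_upper = format(255, "X")    # base 16, upper case

# Floating point presentation types
exponent = format(1234.5, "e")  # scientific notation
percent = format(0.25, ".0%")   # multiply by 100, append '%'

# The only presentation type for non-numeric values
text = format("text", "s")

print(binary, char, hex_lower, hex_upper)  # 1010 A ff FF
print(exponent, percent, text)             # 1.234500e+03 25% text
```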
Objects are able to define their own format specifiers to
replace the standard ones. An example is the 'datetime' class,
whose conversion specifiers might look something like the
whose format specifiers might look something like the
arguments to the strftime() function:
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
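For comparison, in released Python 3 the 'datetime' class passes its format specifier to strftime(), so the directives keep their '%' prefix. The spelling in the example above is the draft's sketch. The fixed timestamp here is chosen only to make the output reproducible.

```python
from datetime import datetime

# A fixed moment so the result does not depend on the current time.
moment = datetime(2007, 8, 15, 0, 14, 29)

# Everything after the first colon is handed to datetime.__format__,
# which treats it as a strftime() pattern.
stamp = "Committed at: {0:%Y-%m-%d %H:%M:%S}".format(moment)

print(stamp)  # Committed at: 2007-08-15 00:14:29
```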
Explicit Conversion Flag
The explicit conversion flag is used to transform the format field value
before it is formatted. This can be used to override the type-specific
formatting behavior, and format the value as if it were a more
generic type. Currently, two explicit conversion flags are
recognized:
!r - convert the value to a string using repr().
!s - convert the value to a string using str().
These flags are typically placed before the format specifier:
"{0!r:20}".format("Hello")
In the preceding example, the string "Hello" will be printed, with quotes,
in a field of at least 20 characters width.
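The example above can be checked directly. This sketch also shows '!s' forcing a float through str() so that it is formatted as a string.

```python
# repr() is applied first, then the result is formatted as a string
# in a field 20 characters wide (strings are left-aligned by default).
quoted = "{0!r:20}".format("Hello")

# str() is applied first, so the string spec '>8' is used rather than
# any float-specific formatting.
as_text = "{0!s:>8}".format(3.5)

print(repr(quoted))   # "'Hello'             "
print(repr(as_text))  # '     3.5'
```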
Controlling Formatting on a Per-Type Basis
A class that wishes to implement a custom interpretation of its
conversion specifiers can implement a __format__ method:
Each Python type can control formatting of its instances by defining
a __format__ method. The __format__ method is responsible for
interpreting the format specifier, formatting the value, and
returning the resulting string.
The new, global built-in function 'format' simply calls this special
method, similar to how len() and str() simply call their respective
special methods:
    def format(value, format_spec):
        return value.__format__(format_spec)
It is safe to call this function with a value of "None" (because the
"None" value in Python is an object and can have methods.)
Several built-in types, including 'str', 'int', 'float', and 'object'
define __format__ methods. This means that if you derive from any of
those types, your class will know how to format itself.
The object.__format__ method is the simplest: It simply converts the
object to a string, and then calls format again:
    class object:
        def __format__(self, format_spec):
            return format(str(self), format_spec)
The __format__ methods for 'int' and 'float' will do numeric formatting
based on the format specifier. In some cases, these formatting
operations may be delegated to other types. So for example, in the case
where the 'int' formatter sees a format type of 'f' (meaning 'float')
it can simply cast the value to a float and call format() again.
Any class can override the __format__ method to provide custom
formatting for that type:
class AST:
def __format__(self, specifiers):
def __format__(self, format_spec):
...
The 'specifiers' argument will be either a string object or a
unicode object, depending on the type of the original format
string. The __format__ method should test the type of the
specifiers parameter to determine whether to return a string or
Note for Python 2.x: The 'format_spec' argument will be either
a string object or a unicode object, depending on the type of the
original format string. The __format__ method should test the type
of the 'format_spec' parameter to determine whether to return a string or
unicode object. It is the responsibility of the __format__ method
to return an object of the proper type.
string.format() will format each field using the following steps:
1) See if the value to be formatted has a __format__ method. If
it does, then call it.
2) Otherwise, check the internal formatter within string.format
that contains knowledge of certain builtin types.
3) Otherwise, call str() or unicode() as appropriate.
Note that the 'explicit conversion' flag mentioned above is not passed
to the __format__ method. Rather, it is expected that the conversion
specified by the flag will be performed before calling __format__.
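A minimal sketch of a class that defines its own __format__ method, run against released Python 3. The 'Point' class and its convention of applying the specifier to each coordinate are hypothetical, chosen only for illustration. The last line shows that an explicit conversion flag replaces the value with a string before any formatting, so Point.__format__ is never called.

```python
class Point:
    """Hypothetical example type; not part of the proposal."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point({0!r}, {1!r})".format(self.x, self.y)

    def __format__(self, format_spec):
        # Assumed convention: apply the spec to each coordinate,
        # defaulting to 'g' when no spec is given.
        spec = format_spec or "g"
        return "({0}, {1})".format(format(self.x, spec),
                                   format(self.y, spec))


p = Point(1.5, 2.25)

via_builtin = format(p, ".2f")         # calls Point.__format__
via_method = "{0:.2f}".format(p)       # same path, through str.format()
bypassed = "{0!r:>20}".format(p)       # repr() first, then str formatting

print(via_builtin)  # (1.50, 2.25)
print(via_method)   # (1.50, 2.25)
print(repr(bypassed))
```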
User-Defined Formatting
@ -418,24 +457,30 @@ Formatter Methods
Formatter defines the following overridable methods:
-- get_positional(args, index)
-- get_named(kwds, name)
-- get_value(key, args, kwargs)
-- check_unused_args(used_args, args, kwargs)
-- format_field(value, conversion)
-- format_field(value, format_spec)
'get_positional' and 'get_named' are used to retrieve a given field
value. For compound field names, these functions are only called for
the first component of the field name; Subsequent components are
handled through normal attribute and indexing operations.
'get_value' is used to retrieve a given field value. The 'key' argument
will be either an integer or a string. If it is an integer, it represents
the index of the positional argument in 'args'; if it is a string, then
it represents a named argument in 'kwargs'.
So for example, the field expression '0.name' would cause
'get_positional' to be called with the parameter 'args' set to the
list of positional arguments to vformat, and 'index' set to zero;
the returned value would then be passed to the standard 'getattr'
function to get the 'name' attribute.
The 'args' parameter is set to the list of positional arguments to
'vformat', and the 'kwargs' parameter is set to the dictionary of
keyword arguments.
For compound field names, 'get_value' is only called for the
first component of the field name; subsequent components are handled
through normal attribute and indexing operations.
So for example, the field expression '0.name' would cause 'get_value'
to be called with a 'key' argument of 0. The 'name' attribute will be
looked up after 'get_value' returns by calling the built-in 'getattr'
function.
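The lookup order described above can be observed with the 'string.Formatter' class that shipped in released Python 3. 'TracingFormatter' and its 'seen_keys' attribute are hypothetical names for this sketch. Only the first component of each compound field name reaches 'get_value'.

```python
from string import Formatter


class TracingFormatter(Formatter):
    """Hypothetical subclass that records each key passed to get_value."""

    def __init__(self):
        super().__init__()
        self.seen_keys = []

    def get_value(self, key, args, kwargs):
        # 'key' is an int for positional fields, a str for named ones.
        self.seen_keys.append(key)
        return super().get_value(key, args, kwargs)


tracer = TracingFormatter()
# '.real' and '[0]' are resolved after get_value returns.
text = tracer.format("{0.real} {who[0]}", 3 + 4j, who="ab")

print(text)              # 3.0 a
print(tracer.seen_keys)  # [0, 'who']
```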
If the index or keyword refers to an item that does not exist, then an
IndexError/KeyError will be raised.
IndexError/KeyError should be raised.
'check_unused_args' is used to implement checking for unused arguments
if desired. The arguments to this function are the set of all argument
@ -445,16 +490,12 @@ Formatter Methods
of these two sets will be the set of unused args. 'check_unused_args'
is assumed to throw an exception if the check fails.
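One possible 'check_unused_args' override, sketched against 'string.Formatter' from released Python 3. 'StrictFormatter' is a hypothetical name, and raising ValueError is an assumption; the text only requires that some exception be thrown.

```python
from string import Formatter


class StrictFormatter(Formatter):
    """Hypothetical subclass that rejects arguments no field refers to."""

    def check_unused_args(self, used_args, args, kwargs):
        # Positional arguments are identified by index, keyword
        # arguments by name; the difference is the set of unused args.
        supplied = set(range(len(args))) | set(kwargs)
        unused = supplied - set(used_args)
        if unused:
            raise ValueError(
                "unused arguments: {0!r}".format(sorted(unused, key=str)))


strict = StrictFormatter()
ok = strict.format("{0} {name}", 1, name="x")

try:
    strict.format("{0}", "used", "unused")
    error = None
except ValueError as exc:
    error = str(exc)

print(ok)     # 1 x
print(error)  # unused arguments: [1]
```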
'format_field' actually generates the text for a replacement field.
The 'value' argument corresponds to the value being formatted, which
was retrieved from the arguments using the field name. The
'conversion' argument is the conversion spec part of the field, which
will be either a string or unicode object, depending on the type of
the original format string.
'format_field' simply calls the global 'format' built-in. The method
is provided so that subclasses can override it.
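A sketch of a 'format_field' override, again using 'string.Formatter' from released Python 3. 'NoneFriendlyFormatter' and the "n/a" placeholder are hypothetical; the point is only that the subclass can intercept a value before the global 'format' built-in sees it.

```python
from string import Formatter


class NoneFriendlyFormatter(Formatter):
    """Hypothetical subclass that prints a placeholder for None."""

    def format_field(self, value, format_spec):
        if value is None:
            # Ignore the spec: a numeric spec would not apply to None.
            return "n/a"
        # Everything else goes through the default, i.e. format().
        return super().format_field(value, format_spec)


fmt = NoneFriendlyFormatter()
line = fmt.format("{0:.2f} / {1:.2f}", 1.5, None)

print(line)  # 1.50 / n/a
```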
To get a better understanding of how these functions relate to each
other, here is pseudocode that explains the general operation of
vformat:
vformat.
def vformat(format_string, args, kwargs):
@ -465,22 +506,36 @@ Formatter Methods
# Tokens are either format fields or literal strings
for token in self.parse(format_string):
if is_format_field(token):
field_spec, conversion_spec = token.rsplit(":", 2)
# Split the token into field value and format spec
field_spec, _, format_spec = token.partition(":")
# Check for explicit type conversion
field_spec, _, explicit = field_spec.partition("!")
# 'first_part' is the part before the first '.' or '['
first_part = get_first_part(token)
used_args.add(first_part)
if is_positional(first_part):
value = self.get_positional(args, first_part)
else:
value = self.get_named(kwargs, first_part)
# Assume that 'get_first_part' returns either an int or
# a string, depending on the syntax.
first_part = get_first_part(field_spec)
value = self.get_value(first_part, args, kwargs)
# Handle [subfield] or .subfield
for comp in components(token):
# Record the fact that we used this arg
used_args.add(first_part)
# Handle [subfield] or .subfield. Assume that 'components'
# returns an iterator of the various subfields, not including
# the first part.
for comp in components(field_spec):
value = resolve_subfield(value, comp)
# Write out the converted value
buffer.write(format_field(value, conversion))
# Handle explicit type conversion
if explicit == 'r':
value = repr(value)
elif explicit == 's':
value = str(value)
# Call the global 'format' function and write out the converted
# value.
buffer.write(self.format_field(value, format_spec))
else:
buffer.write(token)
@ -488,10 +543,12 @@ Formatter Methods
self.check_unused_args(used_args, args, kwargs)
return buffer.getvalue()
Note that the actual algorithm of the Formatter class may not be the
one presented here. In particular, the final implementation of
the Formatter class may define additional overridable methods and
hooks. Also, the final implementation will be written in C.
Note that the actual algorithm of the Formatter class (which will be
implemented in C) may not be the one presented here. (It's likely
that the actual implementation won't be a 'class' at all - rather,
vformat may just call a C function which accepts the other overridable
methods as arguments.) The primary purpose of this code example is to
illustrate the order in which overridable methods are called.
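For readers who want something executable, here is a heavily simplified, runnable approximation of the pseudocode. It assumes simple field names only (no '.attr' or '[index]' components, no empty '{}' fields), skips the unused-argument check, and relies on 'string.Formatter.parse' from released Python 3 to tokenize. 'simple_vformat' is a hypothetical name.

```python
from string import Formatter


def simple_vformat(format_string, args, kwargs):
    """Hypothetical, reduced version of vformat for simple field names."""
    pieces = []
    parser = Formatter()
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when only literal text remains.
    for literal, field_name, format_spec, conversion in parser.parse(
            format_string):
        pieces.append(literal)
        if field_name is None:
            continue

        # Integer keys select positional args, strings select kwargs.
        if field_name.isdigit():
            value = args[int(field_name)]
        else:
            value = kwargs[field_name]

        # Handle explicit type conversion before formatting.
        if conversion == "r":
            value = repr(value)
        elif conversion == "s":
            value = str(value)

        # Delegate to the global 'format' function.
        pieces.append(format(value, format_spec))
    return "".join(pieces)


out = simple_vformat("{0!r:>8}|{count:03d}", ("hi",), {"count": 7})
print(out)  #     'hi'|007
```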
Customizing Formatters
@ -512,12 +569,15 @@ Customizing Formatters
Formatter.__init__(self, flags)
self.namespace = namespace
def get_named(self, kwds, name):
def get_value(self, key, args, kwds):
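    if isinstance(key, str):
        try:
            # Check explicitly passed arguments first
            return kwds[key]
        except KeyError:
            # Fall back to the namespace supplied at construction
            return self.namespace[key]
    else:
        # Positional keys go to the base class; its result must be returned
        return Formatter.get_value(self, key, args, kwds)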
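A self-contained variant of this class that runs against 'string.Formatter' in released Python 3 might look like the following. The 'flags' argument is omitted here because the released 'Formatter' constructor takes no arguments; that omission and the sample namespace are assumptions of this sketch.

```python
from string import Formatter


class NamespaceFormatter(Formatter):
    def __init__(self, namespace=None):
        Formatter.__init__(self)
        self.namespace = namespace if namespace is not None else {}

    def get_value(self, key, args, kwds):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwds[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional: defer to the base class.
        return Formatter.get_value(self, key, args, kwds)


fmt = NamespaceFormatter({"greeting": "hello"})
message = fmt.format("{greeting}, {name}! {0}", 1, name="world")

print(message)  # hello, world! 1
```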
One can use this to easily create a formatting function that allows
access to global variables, for example:
@ -608,7 +668,7 @@ Alternate Syntax
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
string interpolation. The Microsoft .Net libraries use braces
({}), and a syntax which is very similar to the one in this
proposal, although the syntax for conversion specifiers is quite
proposal, although the syntax for format specifiers is quite
different. [4]
- Backquoting. This method has the benefit of minimal syntactical
@ -631,7 +691,7 @@ Alternate Syntax
in front of a bracket.
2) The use of the colon character (':') as a separator for
conversion specifiers. This was chosen simply because that's
format specifiers. This was chosen simply because that's
what .Net uses.