Latest updates to PEP 3101, incorporating recent discussions.
parent 0801f23a05
commit 8089f06be4
258
pep-3101.txt
|
@ -8,7 +8,7 @@ Type: Standards Track
|
||||||
Content-Type: text/plain
|
Content-Type: text/plain
|
||||||
Created: 16-Apr-2006
|
Created: 16-Apr-2006
|
||||||
Python-Version: 3.0
|
Python-Version: 3.0
|
||||||
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
|
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007
|
||||||
|
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
|
@ -105,6 +105,13 @@ String Methods
|
||||||
identified by its keyword name, so in the above example, 'c' is
|
identified by its keyword name, so in the above example, 'c' is
|
||||||
used to refer to the third argument.
|
used to refer to the third argument.
|
||||||
|
|
||||||
|
There is also a global built-in function, 'format' which formats
|
||||||
|
a single value:
|
||||||
|
|
||||||
|
print format(10.0, "7.3g")
|
||||||
|
|
||||||
|
This function is described in a later section.
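
A minimal sketch of the same call in present-day Python 3, where print is a function rather than the statement used in the example above. The variable name is only illustrative:

```python
# format() applies one format specifier to one value.
# "7.3g" asks for a minimum width of 7 and 3 significant digits.
formatted = format(10.0, "7.3g")
print(repr(formatted))  # '     10'
```
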
|
||||||
|
|
||||||
|
|
||||||
Format Strings
|
Format Strings
|
||||||
|
|
||||||
|
@ -136,7 +143,7 @@ Format Strings
|
||||||
|
|
||||||
The element within the braces is called a 'field'. Fields consist
|
The element within the braces is called a 'field'. Fields consist
|
||||||
of a 'field name', which can either be simple or compound, and an
|
of a 'field name', which can either be simple or compound, and an
|
||||||
optional 'conversion specifier'.
|
optional 'format specifier'.
|
||||||
|
|
||||||
|
|
||||||
Simple and Compound Field Names
|
Simple and Compound Field Names
|
||||||
|
@ -159,7 +166,7 @@ Simple and Compound Field Names
|
||||||
expressions in format strings. This is by design - the types of
|
expressions in format strings. This is by design - the types of
|
||||||
expressions that you can use is deliberately limited. Only two operators
|
expressions that you can use is deliberately limited. Only two operators
|
||||||
are supported: the '.' (getattr) operator, and the '[]' (getitem)
|
are supported: the '.' (getattr) operator, and the '[]' (getitem)
|
||||||
operator. The reason for allowing these operators is that they dont'
|
operator. The reason for allowing these operators is that they don't
|
||||||
normally have side effects in non-pathological code.
|
normally have side effects in non-pathological code.
|
||||||
|
|
||||||
An example of the 'getitem' syntax:
|
An example of the 'getitem' syntax:
|
||||||
|
@ -180,26 +187,26 @@ Simple and Compound Field Names
|
||||||
not required to enforce the rule about a name being a valid
|
not required to enforce the rule about a name being a valid
|
||||||
Python identifier. Instead, it will rely on the getattr function
|
Python identifier. Instead, it will rely on the getattr function
|
||||||
of the underlying object to throw an exception if the identifier
|
of the underlying object to throw an exception if the identifier
|
||||||
is not legal. The format function will have a minimalist parser
|
is not legal. The str.format() function will have a minimalist
|
||||||
which only attempts to figure out when it is "done" with an
|
parser which only attempts to figure out when it is "done" with an
|
||||||
identifier (by finding a '.' or a ']', or '}', etc.).
|
identifier (by finding a '.' or a ']', or '}', etc.).
|
||||||
|
|
||||||
|
|
||||||
Conversion Specifiers
|
Format Specifiers
|
||||||
|
|
||||||
Each field can also specify an optional set of 'conversion
|
Each field can also specify an optional set of 'format
|
||||||
specifiers' which can be used to adjust the format of that field.
|
specifiers' which can be used to adjust the format of that field.
|
||||||
Conversion specifiers follow the field name, with a colon (':')
|
Format specifiers follow the field name, with a colon (':')
|
||||||
character separating the two:
|
character separating the two:
|
||||||
|
|
||||||
"My name is {0:8}".format('Fred')
|
"My name is {0:8}".format('Fred')
|
||||||
|
|
||||||
The meaning and syntax of the conversion specifiers depends on the
|
The meaning and syntax of the format specifiers depends on the
|
||||||
type of object that is being formatted, however there is a
|
type of object that is being formatted, however there is a
|
||||||
standard set of conversion specifiers used for any object that
|
standard set of format specifiers used for any object that
|
||||||
does not override them.
|
does not override them.
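
As a small illustration of the width specifier above (the second example and the variable names are our additions, not part of the PEP text), strings are padded on the right by default while numbers are padded on the left:

```python
# A width of 8 pads the argument to at least 8 characters.
padded_name = "My name is {0:8}".format("Fred")
padded_count = "Count: {0:8}".format(42)

print(repr(padded_name))   # 'My name is Fred    '
print(repr(padded_count))  # 'Count:       42'
```
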
|
||||||
|
|
||||||
Conversion specifiers can themselves contain replacement fields.
|
Format specifiers can themselves contain replacement fields.
|
||||||
For example, a field whose field width is itself a parameter
|
For example, a field whose field width is itself a parameter
|
||||||
could be specified via:
|
could be specified via:
|
||||||
|
|
||||||
|
@ -211,24 +218,21 @@ Conversion Specifiers
|
||||||
*outside* of a format field. Within a format field, the brace
|
*outside* of a format field. Within a format field, the brace
|
||||||
characters always have their normal meaning.
|
characters always have their normal meaning.
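
A brief sketch of both points, assuming the behavior of str.format() as it shipped in Python 3: a width taken from a second argument, and doubled braces producing literal braces outside a field. The variable names are illustrative:

```python
# The inner {1} is replaced first, so the spec becomes "5".
width_from_arg = "[{0:{1}}]".format("ab", 5)

# Outside a field, '{{' and '}}' produce literal braces.
literal_braces = "{{{0}}}".format("x")

print(width_from_arg)  # [ab   ]
print(literal_braces)  # {x}
```
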
|
||||||
|
|
||||||
The syntax for conversion specifiers is open-ended, since a class
|
The syntax for format specifiers is open-ended, since a class
|
||||||
can override the standard conversion specifiers. In such cases,
|
can override the standard format specifiers. In such cases,
|
||||||
the format() method merely passes all of the characters between
|
the str.format() method merely passes all of the characters between
|
||||||
the first colon and the matching brace to the relevant underlying
|
the first colon and the matching brace to the relevant underlying
|
||||||
formatting method.
|
formatting method.
|
||||||
|
|
||||||
|
|
||||||
Standard Conversion Specifiers
|
Standard Format Specifiers
|
||||||
|
|
||||||
If an object does not define its own conversion specifiers, a
|
If an object does not define its own format specifiers, a
|
||||||
standard set of conversion specifiers are used. These are similar
|
standard set of format specifiers are used. These are similar
|
||||||
in concept to the conversion specifiers used by the existing '%'
|
in concept to the format specifiers used by the existing '%'
|
||||||
operator, however there are also a number of significant
|
operator, however there are also a number of differences.
|
||||||
differences. The standard conversion specifiers fall into three
|
|
||||||
major categories: string conversions, integer conversions and
|
|
||||||
floating point conversions.
|
|
||||||
|
|
||||||
The general form of a standard conversion specifier is:
|
The general form of a standard format specifier is:
|
||||||
|
|
||||||
[[fill]align][sign][width][.precision][type]
|
[[fill]align][sign][width][.precision][type]
|
||||||
|
|
||||||
|
@ -265,8 +269,6 @@ Standard Conversion Specifiers
|
||||||
numbers (this is the default behavior)
|
numbers (this is the default behavior)
|
||||||
' ' - indicates that a leading space should be used on
|
' ' - indicates that a leading space should be used on
|
||||||
positive numbers
|
positive numbers
|
||||||
'()' - indicates that negative numbers should be surrounded
|
|
||||||
by parentheses
|
|
||||||
|
|
||||||
'width' is a decimal integer defining the minimum field width. If
|
'width' is a decimal integer defining the minimum field width. If
|
||||||
not specified, then the field width will be determined by the
|
not specified, then the field width will be determined by the
|
||||||
|
@ -274,25 +276,16 @@ Standard Conversion Specifiers
|
||||||
|
|
||||||
The 'precision' is a decimal number indicating how many digits
|
The 'precision' is a decimal number indicating how many digits
|
||||||
should be displayed after the decimal point in a floating point
|
should be displayed after the decimal point in a floating point
|
||||||
conversion. In a string conversion the field indicates how many
|
conversion. For non-number types the field indicates the maximum
|
||||||
characters will be used from the field content. The precision is
|
field size - in other words, how many characters will be used from
|
||||||
ignored for integer conversions.
|
the field content. The precision is ignored for integer conversions.
|
||||||
|
|
||||||
Finally, the 'type' determines how the data should be presented.
|
Finally, the 'type' determines how the data should be presented.
|
||||||
If the type field is absent, an appropriate type will be assigned
|
If the type field is absent, an appropriate type will be assigned
|
||||||
based on the value to be formatted ('d' for integers and longs,
|
based on the value to be formatted ('d' for integers and longs,
|
||||||
'g' for floats, and 's' for everything else.)
|
'g' for floats and decimals, and 's' for everything else.)
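
The pieces of the general form can be combined as in the following sketch. The values and variable names are ours, and the results shown are those produced by current Python 3 releases:

```python
# fill='*', align='>', sign='+', width=10, precision=2, type='f'
full_spec = format(3.14159, "*>+10.2f")

# For a string, the precision acts as a maximum field size.
truncated = format("abcdef", ".3")

# With no type given, a presentation is chosen from the value's type.
default_int = format(42, "")

print(full_spec)    # *****+3.14
print(truncated)    # abc
print(default_int)  # 42
```
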
|
||||||
|
|
||||||
The available string conversion types are:
|
The available integer presentation types are:
|
||||||
|
|
||||||
's' - String format. Invokes str() on the object.
|
|
||||||
This is the default conversion specifier type.
|
|
||||||
'r' - Repr format. Invokes repr() on the object.
|
|
||||||
|
|
||||||
There are several integer conversion types. All invoke int() on
|
|
||||||
the object before attempting to format it.
|
|
||||||
|
|
||||||
The available integer conversion types are:
|
|
||||||
|
|
||||||
'b' - Binary. Outputs the number in base 2.
|
'b' - Binary. Outputs the number in base 2.
|
||||||
'c' - Character. Converts the integer to the corresponding
|
'c' - Character. Converts the integer to the corresponding
|
||||||
|
@ -304,10 +297,7 @@ Standard Conversion Specifiers
|
||||||
'X' - Hex format. Outputs the number in base 16, using upper-
|
'X' - Hex format. Outputs the number in base 16, using upper-
|
||||||
case letters for the digits above 9.
|
case letters for the digits above 9.
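
For illustration, a few of the integer presentation types applied to sample values (the values and the dictionary name are our own choice):

```python
n = 255
rendered = {
    "b": format(n, "b"),   # binary
    "o": format(n, "o"),   # octal
    "d": format(n, "d"),   # decimal
    "x": format(n, "x"),   # lower-case hex
    "X": format(n, "X"),   # upper-case hex
    "c": format(65, "c"),  # character with code point 65
}
print(rendered)
```
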
|
||||||
|
|
||||||
There are several floating point conversion types. All invoke
|
The available floating point presentation types are:
|
||||||
float() on the object before attempting to format it.
|
|
||||||
|
|
||||||
The available floating point conversion types are:
|
|
||||||
|
|
||||||
'e' - Exponent notation. Prints the number in scientific
|
'e' - Exponent notation. Prints the number in scientific
|
||||||
notation using the letter 'e' to indicate the exponent.
|
notation using the letter 'e' to indicate the exponent.
|
||||||
|
@ -327,39 +317,88 @@ Standard Conversion Specifiers
|
||||||
'%' - Percentage. Multiplies the number by 100 and displays
|
'%' - Percentage. Multiplies the number by 100 and displays
|
||||||
in fixed ('f') format, followed by a percent sign.
|
in fixed ('f') format, followed by a percent sign.
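
A short sketch of some floating point presentation types, using sample values and variable names of our own choosing:

```python
exponent = format(1234.5, "e")   # scientific notation, default precision 6
fixed = format(1234.5, ".2f")    # fixed point, two digits after the point
percent = format(0.25, ".1%")    # multiplied by 100, then 'f' plus '%'

print(exponent)  # 1.234500e+03
print(fixed)     # 1234.50
print(percent)   # 25.0%
```
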
|
||||||
|
|
||||||
Objects are able to define their own conversion specifiers to
|
For non-numeric types, there's only one presentation type, 's', which
|
||||||
|
does nothing - its only purpose is to ease the transition from the
|
||||||
|
old '%' style formatting.
|
||||||
|
|
||||||
|
Objects are able to define their own format specifiers to
|
||||||
replace the standard ones. An example is the 'datetime' class,
|
replace the standard ones. An example is the 'datetime' class,
|
||||||
whose conversion specifiers might look something like the
|
whose format specifiers might look something like the
|
||||||
arguments to the strftime() function:
|
arguments to the strftime() function:
|
||||||
|
|
||||||
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
|
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
|
||||||
|
|
||||||
|
|
||||||
|
Explicit Conversion Flag
|
||||||
|
|
||||||
|
The explicit conversion flag is used to transform the format field value
|
||||||
|
before it is formatted. This can be used to override the type-specific
|
||||||
|
formatting behavior, and format the value as if it were a more
|
||||||
|
generic type. Currently, two explicit conversion flags are
|
||||||
|
recognized:
|
||||||
|
|
||||||
|
!r - convert the value to a string using repr().
|
||||||
|
!s - convert the value to a string using str().
|
||||||
|
|
||||||
|
These flags are typically placed before the format specifier:
|
||||||
|
|
||||||
|
"{0!r:20}".format("Hello")
|
||||||
|
|
||||||
|
In the preceding example, the string "Hello" will be printed, with quotes,
|
||||||
|
in a field of at least 20 characters width.
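
The difference between the two flags can be seen by formatting the same value both ways. This is only a sketch, with variable names chosen for illustration:

```python
with_repr = "{0!r:20}".format("Hello")  # repr() first, then width 20
with_str = "{0!s:20}".format("Hello")   # str() first, then width 20

print(repr(with_repr))  # the quotes from repr() count toward the width
print(repr(with_str))
```
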
|
||||||
|
|
||||||
|
|
||||||
Controlling Formatting on a Per-Type Basis
|
Controlling Formatting on a Per-Type Basis
|
||||||
|
|
||||||
A class that wishes to implement a custom interpretation of its
|
Each Python type can control formatting of its instances by defining
|
||||||
conversion specifiers can implement a __format__ method:
|
a __format__ method. The __format__ method is responsible for
|
||||||
|
interpreting the format specifier, formatting the value, and
|
||||||
|
returning the resulting string.
|
||||||
|
|
||||||
|
The new, global built-in function 'format' simply calls this special
|
||||||
|
method, similar to how len() and str() simply call their respective
|
||||||
|
special methods:
|
||||||
|
|
||||||
|
def format(value, format_spec):
|
||||||
|
return value.__format__(format_spec)
|
||||||
|
|
||||||
|
It is safe to call this function with a value of "None" (because the
|
||||||
|
"None" value in Python is an object and can have methods.)
|
||||||
|
|
||||||
|
Several built-in types, including 'str', 'int', 'float', and 'object'
|
||||||
|
define __format__ methods. This means that if you derive from any of
|
||||||
|
those types, your class will know how to format itself.
|
||||||
|
|
||||||
|
The object.__format__ method is the simplest: It simply converts the
|
||||||
|
object to a string, and then calls format again:
|
||||||
|
|
||||||
|
class object:
|
||||||
|
def __format__(self, format_spec):
|
||||||
|
return format(str(self), format_spec)
|
||||||
|
|
||||||
|
The __format__ methods for 'int' and 'float' will do numeric formatting
|
||||||
|
based on the format specifier. In some cases, these formatting
|
||||||
|
operations may be delegated to other types. So for example, in the case
|
||||||
|
where the 'int' formatter sees a format type of 'f' (meaning 'float')
|
||||||
|
it can simply cast the value to a float and call format() again.
|
||||||
|
|
||||||
|
Any class can override the __format__ method to provide custom
|
||||||
|
formatting for that type:
|
||||||
|
|
||||||
class AST:
|
class AST:
|
||||||
def __format__(self, specifiers):
|
def __format__(self, format_spec):
|
||||||
...
|
...
|
||||||
|
|
||||||
The 'specifiers' argument will be either a string object or a
|
Note for Python 2.x: The 'format_spec' argument will be either
|
||||||
unicode object, depending on the type of the original format
|
a string object or a unicode object, depending on the type of the
|
||||||
string. The __format__ method should test the type of the
|
original format string. The __format__ method should test the type
|
||||||
specifiers parameter to determine whether to return a string or
|
of the format_spec parameter to determine whether to return a string or
|
||||||
unicode object. It is the responsibility of the __format__ method
|
unicode object. It is the responsibility of the __format__ method
|
||||||
to return an object of the proper type.
|
to return an object of the proper type.
|
||||||
|
|
||||||
string.format() will format each field using the following steps:
|
Note that the 'explicit conversion' flag mentioned above is not passed
|
||||||
|
to the __format__ method. Rather, it is expected that the conversion
|
||||||
1) See if the value to be formatted has a __format__ method. If
|
specified by the flag will be performed before calling __format__.
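
The following sketch shows one way a class might interpret its own format specifier. The class 'Celsius' and its 'F' prefix convention are hypothetical, invented purely for this example. It also shows that an explicit '!r' conversion is applied first, so the class's __format__ never sees the flag:

```python
class Celsius:
    """Hypothetical example type; not part of the PEP."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Celsius({0!r})".format(self.degrees)

    def __format__(self, format_spec):
        # Assumed convention: a leading 'F' requests Fahrenheit; the
        # remainder of the spec is delegated to float formatting.
        if format_spec.startswith("F"):
            return format(self.degrees * 9 / 5 + 32, format_spec[1:]) + " F"
        return format(self.degrees, format_spec) + " C"


boiling = Celsius(100.0)
in_celsius = "{0:.1f}".format(boiling)      # '100.0 C'
in_fahrenheit = "{0:F.1f}".format(boiling)  # '212.0 F'
converted = "{0!r:>16}".format(boiling)     # repr() string, right-aligned
```
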
|
||||||
it does, then call it.
|
|
||||||
|
|
||||||
2) Otherwise, check the internal formatter within string.format
|
|
||||||
that contains knowledge of certain builtin types.
|
|
||||||
|
|
||||||
3) Otherwise, call str() or unicode() as appropriate.
|
|
||||||
|
|
||||||
|
|
||||||
User-Defined Formatting
|
User-Defined Formatting
|
||||||
|
@ -418,24 +457,30 @@ Formatter Methods
|
||||||
|
|
||||||
Formatter defines the following overridable methods:
|
Formatter defines the following overridable methods:
|
||||||
|
|
||||||
-- get_positional(args, index)
|
-- get_value(key, args, kwargs)
|
||||||
-- get_named(kwds, name)
|
|
||||||
-- check_unused_args(used_args, args, kwargs)
|
-- check_unused_args(used_args, args, kwargs)
|
||||||
-- format_field(value, conversion)
|
-- format_field(value, format_spec)
|
||||||
|
|
||||||
'get_positional' and 'get_named' are used to retrieve a given field
|
'get_value' is used to retrieve a given field value. The 'key' argument
|
||||||
value. For compound field names, these functions are only called for
|
will be either an integer or a string. If it is an integer, it represents
|
||||||
the first component of the field name; Subsequent components are
|
the index of the positional argument in 'args'; If it is a string, then
|
||||||
handled through normal attribute and indexing operations.
|
it represents a named argument in 'kwargs'.
|
||||||
|
|
||||||
So for example, the field expression '0.name' would cause
|
The 'args' parameter is set to the list of positional arguments to
|
||||||
'get_positional' to be called with the parameter 'args' set to the
|
'vformat', and the 'kwargs' parameter is set to the dictionary of
|
||||||
list of positional arguments to vformat, and 'index' set to zero;
|
keyword arguments.
|
||||||
the returned value would then be passed to the standard 'getattr'
|
|
||||||
function to get the 'name' attribute.
|
For compound field names, these functions are only called for the
|
||||||
|
first component of the field name; Subsequent components are handled
|
||||||
|
through normal attribute and indexing operations.
|
||||||
|
|
||||||
|
So for example, the field expression '0.name' would cause 'get_value'
|
||||||
|
to be called with a 'key' argument of 0. The 'name' attribute will be
|
||||||
|
looked up after 'get_value' returns by calling the built-in 'getattr'
|
||||||
|
function.
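
As a sketch of this behavior using the 'string.Formatter' class that eventually shipped in the standard library, the subclass below records each key passed to 'get_value'. The class name 'TracingFormatter' and the attribute 'seen_keys' are our own:

```python
import string
from types import SimpleNamespace


class TracingFormatter(string.Formatter):
    """Records every key that get_value is asked to look up."""

    def __init__(self):
        super().__init__()
        self.seen_keys = []

    def get_value(self, key, args, kwargs):
        self.seen_keys.append(key)
        return super().get_value(key, args, kwargs)


fmt = TracingFormatter()
user = SimpleNamespace(name="Fred")
result = fmt.format("{0.name} has {count} items", user, count=3)

print(result)         # Fred has 3 items
print(fmt.seen_keys)  # [0, 'count']: only the first component is looked up
```
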
|
||||||
|
|
||||||
If the index or keyword refers to an item that does not exist, then an
|
If the index or keyword refers to an item that does not exist, then an
|
||||||
IndexError/KeyError will be raised.
|
IndexError/KeyError should be raised.
|
||||||
|
|
||||||
'check_unused_args' is used to implement checking for unused arguments
|
'check_unused_args' is used to implement checking for unused arguments
|
||||||
if desired. The arguments to this function are the set of all argument
|
if desired. The arguments to this function are the set of all argument
|
||||||
|
@ -445,16 +490,12 @@ Formatter Methods
|
||||||
of these two sets will be the set of unused args. 'check_unused_args'
|
of these two sets will be the set of unused args. 'check_unused_args'
|
||||||
is assumed to throw an exception if the check fails.
|
is assumed to throw an exception if the check fails.
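
A possible strict checker, sketched against the standard library's 'string.Formatter'. The class name 'StrictFormatter' and the choice of ValueError are assumptions made for this example:

```python
import string


class StrictFormatter(string.Formatter):
    """Rejects calls that pass arguments the format string never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the integer indexes and string names that were used.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused arguments: {0!r}".format(unused))


strict = StrictFormatter()
accepted = strict.format("{0} {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
    error = None
except ValueError as exc:
    error = str(exc)
```
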
|
||||||
|
|
||||||
'format_field' actually generates the text for a replacement field.
|
'format_field' simply calls the global 'format' built-in. The method
|
||||||
The 'value' argument corresponds to the value being formatted, which
|
is provided so that subclasses can override it.
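
For example, a subclass might post-process every formatted field. This sketch uses the standard library's 'string.Formatter'; the class name 'UpperFormatter' is hypothetical:

```python
import string


class UpperFormatter(string.Formatter):
    """Upper-cases replacement fields but leaves literal text alone."""

    def format_field(self, value, format_spec):
        # The base implementation calls the built-in format().
        return super().format_field(value, format_spec).upper()


shout = UpperFormatter().format("hello {0:>6}!", "fred")
print(repr(shout))  # 'hello   FRED!'
```
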
|
||||||
was retrieved from the arguments using the field name. The
|
|
||||||
'conversion' argument is the conversion spec part of the field, which
|
|
||||||
will be either a string or unicode object, depending on the type of
|
|
||||||
the original format string.
|
|
||||||
|
|
||||||
To get a better understanding of how these functions relate to each
|
To get a better understanding of how these functions relate to each
|
||||||
other, here is pseudocode that explains the general operation of
|
other, here is pseudocode that explains the general operation of
|
||||||
vformat:
|
vformat.
|
||||||
|
|
||||||
def vformat(format_string, args, kwargs):
|
def vformat(format_string, args, kwargs):
|
||||||
|
|
||||||
|
@ -465,22 +506,36 @@ Formatter Methods
|
||||||
# Tokens are either format fields or literal strings
|
# Tokens are either format fields or literal strings
|
||||||
for token in self.parse(format_string):
|
for token in self.parse(format_string):
|
||||||
if is_format_field(token):
|
if is_format_field(token):
|
||||||
field_spec, conversion_spec = token.rsplit(":", 2)
|
# Split the token into field value and format spec
|
||||||
|
field_spec, _, format_spec = token.partition(":")
|
||||||
|
|
||||||
|
# Check for explicit type conversion
|
||||||
|
field_spec, _, explicit = field_spec.partition("!")
|
||||||
|
|
||||||
# 'first_part' is the part before the first '.' or '['
|
# 'first_part' is the part before the first '.' or '['
|
||||||
first_part = get_first_part(token)
|
# Assume that 'get_first_part' returns either an int or
|
||||||
used_args.add(first_part)
|
# a string, depending on the syntax.
|
||||||
if is_positional(first_part):
|
first_part = get_first_part(field_spec)
|
||||||
value = self.get_positional(args, first_part)
|
value = self.get_value(first_part, args, kwargs)
|
||||||
else:
|
|
||||||
value = self.get_named(kwargs, first_part)
|
|
||||||
|
|
||||||
# Handle [subfield] or .subfield
|
# Record the fact that we used this arg
|
||||||
for comp in components(token):
|
used_args.add(first_part)
|
||||||
|
|
||||||
|
# Handle [subfield] or .subfield. Assume that 'components'
|
||||||
|
# returns an iterator of the various subfields, not including
|
||||||
|
# the first part.
|
||||||
|
for comp in components(field_spec):
|
||||||
value = resolve_subfield(value, comp)
|
value = resolve_subfield(value, comp)
|
||||||
|
|
||||||
# Write out the converted value
|
# Handle explicit type conversion
|
||||||
buffer.write(format_field(value, conversion))
|
if explicit == 'r':
|
||||||
|
value = repr(value)
|
||||||
|
elif explicit == 's':
|
||||||
|
value = str(value)
|
||||||
|
|
||||||
|
# Call the global 'format' function and write out the converted
|
||||||
|
# value.
|
||||||
|
buffer.write(self.format_field(value, format_spec))
|
||||||
|
|
||||||
else:
|
else:
|
||||||
buffer.write(token)
|
buffer.write(token)
|
||||||
|
@ -488,10 +543,12 @@ Formatter Methods
|
||||||
self.check_unused_args(used_args, args, kwargs)
|
self.check_unused_args(used_args, args, kwargs)
|
||||||
return buffer.getvalue()
|
return buffer.getvalue()
|
||||||
|
|
||||||
Note that the actual algorithm of the Formatter class may not be the
|
Note that the actual algorithm of the Formatter class (which will be
|
||||||
one presented here. In particular, the final implementation of
|
implemented in C) may not be the one presented here. (It's likely
|
||||||
the Formatter class may define additional overridable methods and
|
that the actual implementation won't be a 'class' at all - rather,
|
||||||
hooks. Also, the final implementation will be written in C.
|
vformat may just call a C function which accepts the other overridable
|
||||||
|
methods as arguments.) The primary purpose of this code example is to
|
||||||
|
illustrate the order in which overridable methods are called.
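
To see the tokenizing step in isolation, one can call the 'parse' method of the 'string.Formatter' class as it exists in today's standard library. Its tuple layout differs from the 'token' strings in the pseudocode above, so treat this as a sketch of the same idea rather than the same interface:

```python
import string

# Each tuple is (literal_text, field_name, format_spec, conversion).
tokens = list(string.Formatter().parse("Hi {0.name!r:>10}!"))

for token in tokens:
    print(token)
# ('Hi ', '0.name', '>10', 'r')
# ('!', None, None, None)
```
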
|
||||||
|
|
||||||
|
|
||||||
Customizing Formatters
|
Customizing Formatters
|
||||||
|
@ -512,12 +569,15 @@ Customizing Formatters
|
||||||
Formatter.__init__(self, flags)
|
Formatter.__init__(self, flags)
|
||||||
self.namespace = namespace
|
self.namespace = namespace
|
||||||
|
|
||||||
def get_named(self, kwds, name):
|
def get_value(self, key, args, kwds):
|
||||||
|
if isinstance(key, str):
|
||||||
try:
|
try:
|
||||||
# Check explicitly passed arguments first
|
# Check explicitly passed arguments first
|
||||||
return kwds[name]
|
return kwds[key]
|
||||||
except KeyError:
|
except KeyError:
|
||||||
return self.namespace[name]
|
return self.namespace[key]
|
||||||
|
else:
|
||||||
|
return Formatter.get_value(self, key, args, kwds)
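
A runnable variant of this class, written against the standard library's 'string.Formatter'. That class takes no 'flags' argument, so the constructor here omits it. The sample namespace and format string are illustrative assumptions:

```python
import string


class NamespaceFormatter(string.Formatter):
    """Falls back to a namespace dict for names not passed as keywords."""

    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional; defer to the base class.
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"greeting": "Hello", "name": "default"})
text = fmt.format("{greeting}, {name} and {0}!", "you", name="Fred")
print(text)  # Hello, Fred and you!
```
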
|
||||||
|
|
||||||
One can use this to easily create a formatting function that allows
|
One can use this to easily create a formatting function that allows
|
||||||
access to global variables, for example:
|
access to global variables, for example:
|
||||||
|
@ -608,7 +668,7 @@ Alternate Syntax
|
||||||
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
|
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
|
||||||
string interpolation. The Microsoft .Net libraries uses braces
|
string interpolation. The Microsoft .Net libraries uses braces
|
||||||
({}), and a syntax which is very similar to the one in this
|
({}), and a syntax which is very similar to the one in this
|
||||||
proposal, although the syntax for conversion specifiers is quite
|
proposal, although the syntax for format specifiers is quite
|
||||||
different. [4]
|
different. [4]
|
||||||
|
|
||||||
- Backquoting. This method has the benefit of minimal syntactical
|
- Backquoting. This method has the benefit of minimal syntactical
|
||||||
|
@ -631,7 +691,7 @@ Alternate Syntax
|
||||||
in front of a bracket.
|
in front of a bracket.
|
||||||
|
|
||||||
2) The use of the colon character (':') as a separator for
|
2) The use of the colon character (':') as a separator for
|
||||||
conversion specifiers. This was chosen simply because that's
|
format specifiers. This was chosen simply because that's
|
||||||
what .Net uses.
|
what .Net uses.
|
||||||
|
|
||||||
|
|
||||||
|
|