Latest updates to PEP 3101, incorporating recent discussions.

This commit is contained in:
Talin 2007-08-15 00:14:29 +00:00
parent 0801f23a05
commit 8089f06be4
1 changed files with 171 additions and 111 deletions

View File

@ -8,7 +8,7 @@ Type: Standards Track
Content-Type: text/plain Content-Type: text/plain
Created: 16-Apr-2006 Created: 16-Apr-2006
Python-Version: 3.0 Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006 Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007
Abstract Abstract
@ -105,6 +105,13 @@ String Methods
identified by its keyword name, so in the above example, 'c' is identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument. used to refer to the third argument.
There is also a global built-in function, 'format' which formats
a single value:
print format(10.0, "7.3g")
This function is described in a later section.
Format Strings Format Strings
@ -136,7 +143,7 @@ Format Strings
The element within the braces is called a 'field'. Fields consist The element within the braces is called a 'field'. Fields consist
of a 'field name', which can either be simple or compound, and an of a 'field name', which can either be simple or compound, and an
optional 'conversion specifier'. optional 'format specifier'.
Simple and Compound Field Names Simple and Compound Field Names
@ -159,7 +166,7 @@ Simple and Compound Field Names
expressions in format strings. This is by design - the types of expressions in format strings. This is by design - the types of
expressions that you can use is deliberately limited. Only two operators expressions that you can use is deliberately limited. Only two operators
are supported: the '.' (getattr) operator, and the '[]' (getitem) are supported: the '.' (getattr) operator, and the '[]' (getitem)
operator. The reason for allowing these operators is that they dont' operator. The reason for allowing these operators is that they don't
normally have side effects in non-pathological code. normally have side effects in non-pathological code.
An example of the 'getitem' syntax: An example of the 'getitem' syntax:
@ -180,26 +187,26 @@ Simple and Compound Field Names
not required to enforce the rule about a name being a valid not required to enforce the rule about a name being a valid
Python identifier. Instead, it will rely on the getattr function Python identifier. Instead, it will rely on the getattr function
of the underlying object to throw an exception if the identifier of the underlying object to throw an exception if the identifier
is not legal. The format function will have a minimalist parser is not legal. The str.format() function will have a minimalist
which only attempts to figure out when it is "done" with an parser which only attempts to figure out when it is "done" with an
identifier (by finding a '.' or a ']', or '}', etc.). identifier (by finding a '.' or a ']', or '}', etc.).
Conversion Specifiers Format Specifiers
Each field can also specify an optional set of 'conversion Each field can also specify an optional set of 'format
specifiers' which can be used to adjust the format of that field. specifiers' which can be used to adjust the format of that field.
Conversion specifiers follow the field name, with a colon (':') Format specifiers follow the field name, with a colon (':')
character separating the two: character separating the two:
"My name is {0:8}".format('Fred') "My name is {0:8}".format('Fred')
The meaning and syntax of the conversion specifiers depends on the The meaning and syntax of the format specifiers depends on the
type of object that is being formatted, however there is a type of object that is being formatted, however there is a
standard set of conversion specifiers used for any object that standard set of format specifiers used for any object that
does not override them. does not override them.
Conversion specifiers can themselves contain replacement fields. Format specifiers can themselves contain replacement fields.
For example, a field whose field width is itself a parameter For example, a field whose field width is itself a parameter
could be specified via: could be specified via:
@ -211,24 +218,21 @@ Conversion Specifiers
*outside* of a format field. Within a format field, the brace *outside* of a format field. Within a format field, the brace
characters always have their normal meaning. characters always have their normal meaning.
The syntax for conversion specifiers is open-ended, since a class The syntax for format specifiers is open-ended, since a class
can override the standard conversion specifiers. In such cases, can override the standard format specifiers. In such cases,
the format() method merely passes all of the characters between the str.format() method merely passes all of the characters between
the first colon and the matching brace to the relevant underlying the first colon and the matching brace to the relevant underlying
formatting method. formatting method.
Standard Conversion Specifiers Standard Format Specifiers
If an object does not define its own conversion specifiers, a If an object does not define its own format specifiers, a
standard set of conversion specifiers are used. These are similar standard set of format specifiers are used. These are similar
in concept to the conversion specifiers used by the existing '%' in concept to the format specifiers used by the existing '%'
operator, however there are also a number of significant operator, however there are also a number of differences.
differences. The standard conversion specifiers fall into three
major categories: string conversions, integer conversions and
floating point conversions.
The general form of a standard conversion specifier is: The general form of a standard format specifier is:
[[fill]align][sign][width][.precision][type] [[fill]align][sign][width][.precision][type]
@ -265,8 +269,6 @@ Standard Conversion Specifiers
numbers (this is the default behavior) numbers (this is the default behavior)
' ' - indicates that a leading space should be used on ' ' - indicates that a leading space should be used on
positive numbers positive numbers
'()' - indicates that negative numbers should be surrounded
by parentheses
'width' is a decimal integer defining the minimum field width. If 'width' is a decimal integer defining the minimum field width. If
not specified, then the field width will be determined by the not specified, then the field width will be determined by the
@ -274,25 +276,16 @@ Standard Conversion Specifiers
The 'precision' is a decimal number indicating how many digits The 'precision' is a decimal number indicating how many digits
should be displayed after the decimal point in a floating point should be displayed after the decimal point in a floating point
conversion. In a string conversion the field indicates how many conversion. For non-number types the field indicates the maximum
characters will be used from the field content. The precision is field size - in other words, how many characters will be used from
ignored for integer conversions. the field content. The precision is ignored for integer conversions.
Finally, the 'type' determines how the data should be presented. Finally, the 'type' determines how the data should be presented.
If the type field is absent, an appropriate type will be assigned If the type field is absent, an appropriate type will be assigned
based on the value to be formatted ('d' for integers and longs, based on the value to be formatted ('d' for integers and longs,
'g' for floats, and 's' for everything else.) 'g' for floats and decimals, and 's' for everything else.)
The available string conversion types are: The available integer presentation types are:
's' - String format. Invokes str() on the object.
This is the default conversion specifier type.
'r' - Repr format. Invokes repr() on the object.
There are several integer conversion types. All invoke int() on
the object before attempting to format it.
The available integer conversion types are:
'b' - Binary. Outputs the number in base 2. 'b' - Binary. Outputs the number in base 2.
'c' - Character. Converts the integer to the corresponding 'c' - Character. Converts the integer to the corresponding
@ -304,10 +297,7 @@ Standard Conversion Specifiers
'X' - Hex format. Outputs the number in base 16, using upper- 'X' - Hex format. Outputs the number in base 16, using upper-
case letters for the digits above 9. case letters for the digits above 9.
There are several floating point conversion types. All invoke The available floating point presentation types are:
float() on the object before attempting to format it.
The available floating point conversion types are:
'e' - Exponent notation. Prints the number in scientific 'e' - Exponent notation. Prints the number in scientific
notation using the letter 'e' to indicate the exponent. notation using the letter 'e' to indicate the exponent.
@ -327,39 +317,88 @@ Standard Conversion Specifiers
'%' - Percentage. Multiplies the number by 100 and displays '%' - Percentage. Multiplies the number by 100 and displays
in fixed ('f') format, followed by a percent sign. in fixed ('f') format, followed by a percent sign.
Objects are able to define their own conversion specifiers to For non-numeric types, there's only one presentation type, 's', which
does nothing - its only purpose is to ease the transition from the
old '%' style formatting.
Objects are able to define their own format specifiers to
replace the standard ones. An example is the 'datetime' class, replace the standard ones. An example is the 'datetime' class,
whose conversion specifiers might look something like the whose format specifiers might look something like the
arguments to the strftime() function: arguments to the strftime() function:
"Today is: {0:a b d H:M:S Y}".format(datetime.now()) "Today is: {0:a b d H:M:S Y}".format(datetime.now())
Explicit Conversion Flag
The explicit conversion flag is used to transform the format field value
before it is formatted. This can be used to override the type-specific
formatting behavior, and format the value as if it were a more
generic type. Currently, two explicit conversion flags are
recognized:
!r - convert the value to a string using repr().
!s - convert the value to a string using str().
These flags are typically placed before the format specifier:
"{0!r:20}".format("Hello")
In the preceding example, the string "Hello" will be printed, with quotes,
in a field of at least 20 characters width.
Controlling Formatting on a Per-Type Basis Controlling Formatting on a Per-Type Basis
A class that wishes to implement a custom interpretation of its Each Python type can control formatting of its instances by defining
conversion specifiers can implement a __format__ method: a __format__ method. The __format__ method is responsible for
interpreting the format specifier, formatting the value, and
returning the resulting string.
class AST: The new, global built-in function 'format' simply calls this special
def __format__(self, specifiers): method, similar to how len() and str() simply call their respective
... special methods:
The 'specifiers' argument will be either a string object or a def format(value, format_spec):
unicode object, depending on the type of the original format return value.__format__(format_spec)
string. The __format__ method should test the type of the
specifiers parameter to determine whether to return a string or It is safe to call this function with a value of "None" (because the
"None" value in Python is an object and can have methods.)
Several built-in types, including 'str', 'int', 'float', and 'object'
define __format__ methods. This means that if you derive from any of
those types, your class will know how to format itself.
The object.__format__ method is the simplest: It simply converts the
object to a string, and then calls format again:
class object:
def __format__(self, format_spec):
return format(str(self), format_spec)
The __format__ methods for 'int' and 'float' will do numeric formatting
based on the format specifier. In some cases, these formatting
operations may be delegated to other types. So for example, in the case
where the 'int' formatter sees a format type of 'f' (meaning 'float')
it can simply cast the value to a float and call format() again.
Any class can override the __format__ method to provide custom
formatting for that type:
class AST:
def __format__(self, format_spec):
...
Note for Python 2.x: The 'format_spec' argument will be either
a string object or a unicode object, depending on the type of the
original format string. The __format__ method should test the type
of the specifiers parameter to determine whether to return a string or
unicode object. It is the responsibility of the __format__ method unicode object. It is the responsibility of the __format__ method
to return an object of the proper type. to return an object of the proper type.
string.format() will format each field using the following steps: Note that the 'explicit conversion' flag mentioned above is not passed
to the __format__ method. Rather, it is expected that the conversion
1) See if the value to be formatted has a __format__ method. If specified by the flag will be performed before calling __format__.
it does, then call it.
2) Otherwise, check the internal formatter within string.format
that contains knowledge of certain builtin types.
3) Otherwise, call str() or unicode() as appropriate.
User-Defined Formatting User-Defined Formatting
@ -418,24 +457,30 @@ Formatter Methods
Formatter defines the following overridable methods: Formatter defines the following overridable methods:
-- get_positional(args, index) -- get_value(key, args, kwargs)
-- get_named(kwds, name)
-- check_unused_args(used_args, args, kwargs) -- check_unused_args(used_args, args, kwargs)
-- format_field(value, conversion) -- format_field(value, format_spec)
'get_positional' and 'get_named' are used to retrieve a given field 'get_value' is used to retrieve a given field value. The 'key' argument
value. For compound field names, these functions are only called for will be either an integer or a string. If it is an integer, it represents
the first component of the field name; Subsequent components are the index of the positional argument in 'args'; If it is a string, then
handled through normal attribute and indexing operations. it represents a named argument in 'kwargs'.
So for example, the field expression '0.name' would cause The 'args' parameter is set to the list of positional arguments to
'get_positional' to be called with the parameter 'args' set to the 'vformat', and the 'kwargs' parameter is set to the dictionary of
list of positional arguments to vformat, and 'index' set to zero; positional arguments.
the returned value would then be passed to the standard 'getattr'
function to get the 'name' attribute. For compound field names, these functions are only called for the
first component of the field name; Subsequent components are handled
through normal attribute and indexing operations.
So for example, the field expression '0.name' would cause 'get_value'
to be called with a 'key' argument of 0. The 'name' attribute will be
looked up after 'get_value' returns by calling the built-in 'getattr'
function.
If the index or keyword refers to an item that does not exist, then an If the index or keyword refers to an item that does not exist, then an
IndexError/KeyError will be raised. IndexError/KeyError should be raised.
'check_unused_args' is used to implement checking for unused arguments 'check_unused_args' is used to implement checking for unused arguments
if desired. The arguments to this function is the set of all argument if desired. The arguments to this function is the set of all argument
@ -445,16 +490,12 @@ Formatter Methods
of these two sets will be the set of unused args. 'check_unused_args' of these two sets will be the set of unused args. 'check_unused_args'
is assumed to throw an exception if the check fails. is assumed to throw an exception if the check fails.
'format_field' actually generates the text for a replacement field. 'format_field' simply calls the global 'format' built-in. The method
The 'value' argument corresponds to the value being formatted, which is provided so that subclasses can override it.
was retrieved from the arguments using the field name. The
'conversion' argument is the conversion spec part of the field, which
will be either a string or unicode object, depending on the type of
the original format string.
To get a better understanding of how these functions relate to each To get a better understanding of how these functions relate to each
other, here is pseudocode that explains the general operation of other, here is pseudocode that explains the general operation of
vformat: vformat.
def vformat(format_string, args, kwargs): def vformat(format_string, args, kwargs):
@ -465,22 +506,36 @@ Formatter Methods
# Tokens are either format fields or literal strings # Tokens are either format fields or literal strings
for token in self.parse(format_string): for token in self.parse(format_string):
if is_format_field(token): if is_format_field(token):
field_spec, conversion_spec = token.rsplit(":", 2) # Split the token into field value and format spec
field_spec, _, format_spec = token.partition(":")
# Check for explicit type conversion
field_spec, _, explicit = field_spec.partition("!")
# 'first_part' is the part before the first '.' or '[' # 'first_part' is the part before the first '.' or '['
first_part = get_first_part(token) # Assume that 'get_first_part' returns either an int or
used_args.add(first_part) # a string, depending on the syntax.
if is_positional(first_part): first_part = get_first_part(field_spec)
value = self.get_positional(args, first_part) value = self.get_value(first_part, args, kwargs)
else:
value = self.get_named(kwargs, first_part)
# Handle [subfield] or .subfield # Record the fact that we used this arg
for comp in components(token): used_args.add(first_part)
# Handle [subfield] or .subfield. Assume that 'components'
# returns an iterator of the various subfields, not including
# the first part.
for comp in components(field_spec):
value = resolve_subfield(value, comp) value = resolve_subfield(value, comp)
# Write out the converted value # Handle explicit type conversion
buffer.write(format_field(value, conversion)) if explicit == 'r':
value = repr(value)
elif explicit == 's':
value = str(value)
# Call the global 'format' function and write out the converted
# value.
buffer.write(self.format_field(value, format_spec))
else: else:
buffer.write(token) buffer.write(token)
@ -488,10 +543,12 @@ Formatter Methods
self.check_unused_args(used_args, args, kwargs) self.check_unused_args(used_args, args, kwargs)
return buffer.getvalue() return buffer.getvalue()
Note that the actual algorithm of the Formatter class may not be the Note that the actual algorithm of the Formatter class (which will be
one presented here. In particular, the final implementation of implemented in C) may not be the one presented here. (It's likely
the Formatter class may define additional overridable methods and that the actual implementation won't be a 'class' at all - rather,
hooks. Also, the final implementation will be written in C. vformat may just call a C function which accepts the other overridable
methods as arguments.) The primary purpose of this code example is to
illustrate the order in which overridable methods are called.
Customizing Formatters Customizing Formatters
@ -512,12 +569,15 @@ Customizing Formatters
Formatter.__init__(self, flags) Formatter.__init__(self, flags)
self.namespace = namespace self.namespace = namespace
def get_named(self, kwds, name): def get_value(self, key, args, kwds):
try: if isinstance(key, str):
# Check explicitly passed arguments first try:
return kwds[name] # Check explicitly passed arguments first
except KeyError: return kwds[name]
return self.namespace[name] except KeyError:
return self.namespace[name]
else:
Formatter.get_value(key, args, kwds)
One can use this to easily create a formatting function that allows One can use this to easily create a formatting function that allows
access to global variables, for example: access to global variables, for example:
@ -608,7 +668,7 @@ Alternate Syntax
Dungeons) such as MUSH have used brackets (e.g. [name]) to do Dungeons) such as MUSH have used brackets (e.g. [name]) to do
string interpolation. The Microsoft .Net libraries uses braces string interpolation. The Microsoft .Net libraries uses braces
({}), and a syntax which is very similar to the one in this ({}), and a syntax which is very similar to the one in this
proposal, although the syntax for conversion specifiers is quite proposal, although the syntax for format specifiers is quite
different. [4] different. [4]
- Backquoting. This method has the benefit of minimal syntactical - Backquoting. This method has the benefit of minimal syntactical
@ -631,7 +691,7 @@ Alternate Syntax
in front of a bracket. in front of a bracket.
2) The use of the colon character (':') as a separator for 2) The use of the colon character (':') as a separator for
conversion specifiers. This was chosen simply because that's format specifiers. This was chosen simply because that's
what .Net uses. what .Net uses.