diff --git a/pep-3101.txt b/pep-3101.txt index a097ac062..bef669afe 100644 --- a/pep-3101.txt +++ b/pep-3101.txt @@ -5,822 +5,843 @@ Last-Modified: $Date$ Author: Talin Status: Final Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 16-Apr-2006 Python-Version: 3.0 Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008 Abstract +======== - This PEP proposes a new system for built-in string formatting - operations, intended as a replacement for the existing '%' string - formatting operator. +This PEP proposes a new system for built-in string formatting +operations, intended as a replacement for the existing '%' string +formatting operator. Rationale +========= - Python currently provides two methods of string interpolation: +Python currently provides two methods of string interpolation: - - The '%' operator for strings. [1] +- The '%' operator for strings. [1]_ - - The string.Template module. [2] +- The string.Template module. [2]_ - The primary scope of this PEP concerns proposals for built-in - string formatting operations (in other words, methods of the - built-in string type). +The primary scope of this PEP concerns proposals for built-in +string formatting operations (in other words, methods of the +built-in string type). - The '%' operator is primarily limited by the fact that it is a - binary operator, and therefore can take at most two arguments. - One of those arguments is already dedicated to the format string, - leaving all other variables to be squeezed into the remaining - argument. The current practice is to use either a dictionary or a - tuple as the second argument, but as many people have commented - [3], this lacks flexibility. The "all or nothing" approach - (meaning that one must choose between only positional arguments, - or only named arguments) is felt to be overly constraining. +The '%' operator is primarily limited by the fact that it is a +binary operator, and therefore can take at most two arguments. +One of those arguments is already dedicated to the format string, +leaving all other variables to be squeezed into the remaining +argument. The current practice is to use either a dictionary or a +tuple as the second argument, but as many people have commented +[3]_, this lacks flexibility. The "all or nothing" approach +(meaning that one must choose between only positional arguments, +or only named arguments) is felt to be overly constraining. - While there is some overlap between this proposal and - string.Template, it is felt that each serves a distinct need, - and that one does not obviate the other. This proposal is for - a mechanism which, like '%', is efficient for small strings - which are only used once, so, for example, compilation of a - string into a template is not contemplated in this proposal, - although the proposal does take care to define format strings - and the API in such a way that an efficient template package - could reuse the syntax and even some of the underlying - formatting code. +While there is some overlap between this proposal and +string.Template, it is felt that each serves a distinct need, +and that one does not obviate the other. This proposal is for +a mechanism which, like '%', is efficient for small strings +which are only used once, so, for example, compilation of a +string into a template is not contemplated in this proposal, +although the proposal does take care to define format strings +and the API in such a way that an efficient template package +could reuse the syntax and even some of the underlying +formatting code. Specification +============= - The specification will consist of the following parts: +The specification will consist of the following parts: - - Specification of a new formatting method to be added to the - built-in string class. +- Specification of a new formatting method to be added to the + built-in string class. - - Specification of functions and flag values to be added to - the string module, so that the underlying formatting engine - can be used with additional options. +- Specification of functions and flag values to be added to + the string module, so that the underlying formatting engine + can be used with additional options. - - Specification of a new syntax for format strings. +- Specification of a new syntax for format strings. - - Specification of a new set of special methods to control the - formatting and conversion of objects. +- Specification of a new set of special methods to control the + formatting and conversion of objects. - - Specification of an API for user-defined formatting classes. +- Specification of an API for user-defined formatting classes. - - Specification of how formatting errors are handled. +- Specification of how formatting errors are handled. - Note on string encodings: When discussing this PEP in the context - of Python 3.0, it is assumed that all strings are unicode strings, - and that the use of the word 'string' in the context of this - document will generally refer to a Python 3.0 string, which is - the same as Python 2.x unicode object. +Note on string encodings: When discussing this PEP in the context +of Python 3.0, it is assumed that all strings are unicode strings, +and that the use of the word 'string' in the context of this +document will generally refer to a Python 3.0 string, which is +the same as Python 2.x unicode object. - In the context of Python 2.x, the use of the word 'string' in this - document refers to an object which may either be a regular string - or a unicode object. All of the function call interfaces - described in this PEP can be used for both strings and unicode - objects, and in all cases there is sufficient information - to be able to properly deduce the output string type (in - other words, there is no need for two separate APIs). - In all cases, the type of the format string dominates - that - is, the result of the conversion will always result in an object - that contains the same representation of characters as the - input format string. +In the context of Python 2.x, the use of the word 'string' in this +document refers to an object which may either be a regular string +or a unicode object. All of the function call interfaces +described in this PEP can be used for both strings and unicode +objects, and in all cases there is sufficient information +to be able to properly deduce the output string type (in +other words, there is no need for two separate APIs). +In all cases, the type of the format string dominates - that +is, the result of the conversion will always result in an object +that contains the same representation of characters as the +input format string. String Methods +-------------- - The built-in string class (and also the unicode class in 2.6) will - gain a new method, 'format', which takes an arbitrary number of - positional and keyword arguments: +The built-in string class (and also the unicode class in 2.6) will +gain a new method, 'format', which takes an arbitrary number of +positional and keyword arguments:: - "The story of {0}, {1}, and {c}".format(a, b, c=d) + "The story of {0}, {1}, and {c}".format(a, b, c=d) - Within a format string, each positional argument is identified - with a number, starting from zero, so in the above example, 'a' is - argument 0 and 'b' is argument 1. Each keyword argument is - identified by its keyword name, so in the above example, 'c' is - used to refer to the third argument. +Within a format string, each positional argument is identified +with a number, starting from zero, so in the above example, 'a' is +argument 0 and 'b' is argument 1. Each keyword argument is +identified by its keyword name, so in the above example, 'c' is +used to refer to the third argument. - There is also a global built-in function, 'format' which formats - a single value: +There is also a global built-in function, 'format' which formats +a single value:: - print(format(10.0, "7.3g")) + print(format(10.0, "7.3g")) - This function is described in a later section. +This function is described in a later section. Format Strings +-------------- - Format strings consist of intermingled character data and markup. +Format strings consist of intermingled character data and markup. - Character data is data which is transferred unchanged from the - format string to the output string; markup is not transferred from - the format string directly to the output, but instead is used to - define 'replacement fields' that describe to the format engine - what should be placed in the output string in place of the markup. +Character data is data which is transferred unchanged from the +format string to the output string; markup is not transferred from +the format string directly to the output, but instead is used to +define 'replacement fields' that describe to the format engine +what should be placed in the output string in place of the markup. - Brace characters ('curly braces') are used to indicate a - replacement field within the string: +Brace characters ('curly braces') are used to indicate a +replacement field within the string:: - "My name is {0}".format('Fred') + "My name is {0}".format('Fred') - The result of this is the string: +The result of this is the string:: - "My name is Fred" + "My name is Fred" - Braces can be escaped by doubling: +Braces can be escaped by doubling:: - "My name is {0} :-{{}}".format('Fred') + "My name is {0} :-{{}}".format('Fred') - Which would produce: +Which would produce:: - "My name is Fred :-{}" + "My name is Fred :-{}" - The element within the braces is called a 'field'. Fields consist - of a 'field name', which can either be simple or compound, and an - optional 'format specifier'. +The element within the braces is called a 'field'. Fields consist +of a 'field name', which can either be simple or compound, and an +optional 'format specifier'. Simple and Compound Field Names +------------------------------- - Simple field names are either names or numbers. If numbers, they - must be valid base-10 integers; if names, they must be valid - Python identifiers. A number is used to identify a positional - argument, while a name is used to identify a keyword argument. +Simple field names are either names or numbers. If numbers, they +must be valid base-10 integers; if names, they must be valid +Python identifiers. A number is used to identify a positional +argument, while a name is used to identify a keyword argument. - A compound field name is a combination of multiple simple field - names in an expression: +A compound field name is a combination of multiple simple field +names in an expression:: - "My name is {0.name}".format(open('out.txt', 'w')) + "My name is {0.name}".format(open('out.txt', 'w')) - This example shows the use of the 'getattr' or 'dot' operator - in a field expression. The dot operator allows an attribute of - an input value to be specified as the field value. +This example shows the use of the 'getattr' or 'dot' operator +in a field expression. The dot operator allows an attribute of +an input value to be specified as the field value. - Unlike some other programming languages, you cannot embed arbitrary - expressions in format strings. This is by design - the types of - expressions that you can use is deliberately limited. Only two operators - are supported: the '.' (getattr) operator, and the '[]' (getitem) - operator. The reason for allowing these operators is that they don't - normally have side effects in non-pathological code. +Unlike some other programming languages, you cannot embed arbitrary +expressions in format strings. This is by design - the types of +expressions that you can use is deliberately limited. Only two operators +are supported: the '.' (getattr) operator, and the '[]' (getitem) +operator. The reason for allowing these operators is that they don't +normally have side effects in non-pathological code. - An example of the 'getitem' syntax: +An example of the 'getitem' syntax:: - "My name is {0[name]}".format(dict(name='Fred')) + "My name is {0[name]}".format(dict(name='Fred')) - It should be noted that the use of 'getitem' within a format string - is much more limited than its conventional usage. In the above example, - the string 'name' really is the literal string 'name', not a variable - named 'name'. The rules for parsing an item key are very simple. - If it starts with a digit, then it is treated as a number, otherwise - it is used as a string. +It should be noted that the use of 'getitem' within a format string +is much more limited than its conventional usage. In the above example, +the string 'name' really is the literal string 'name', not a variable +named 'name'. The rules for parsing an item key are very simple. +If it starts with a digit, then it is treated as a number, otherwise +it is used as a string. - Because keys are not quote-delimited, it is not possible to - specify arbitrary dictionary keys (e.g., the strings "10" or - ":-]") from within a format string. +Because keys are not quote-delimited, it is not possible to +specify arbitrary dictionary keys (e.g., the strings "10" or +":-]") from within a format string. - Implementation note: The implementation of this proposal is - not required to enforce the rule about a simple or dotted name - being a valid Python identifier. Instead, it will rely on the - getattr function of the underlying object to throw an exception if - the identifier is not legal. The str.format() function will have - a minimalist parser which only attempts to figure out when it is - "done" with an identifier (by finding a '.' or a ']', or '}', - etc.). +Implementation note: The implementation of this proposal is +not required to enforce the rule about a simple or dotted name +being a valid Python identifier. Instead, it will rely on the +getattr function of the underlying object to throw an exception if +the identifier is not legal. The ``str.format()`` function will have +a minimalist parser which only attempts to figure out when it is +"done" with an identifier (by finding a '.' or a ']', or '}', +etc.). Format Specifiers +----------------- - Each field can also specify an optional set of 'format - specifiers' which can be used to adjust the format of that field. - Format specifiers follow the field name, with a colon (':') - character separating the two: +Each field can also specify an optional set of 'format +specifiers' which can be used to adjust the format of that field. +Format specifiers follow the field name, with a colon (':') +character separating the two:: - "My name is {0:8}".format('Fred') + "My name is {0:8}".format('Fred') - The meaning and syntax of the format specifiers depends on the - type of object that is being formatted, but there is a standard - set of format specifiers used for any object that does not - override them. +The meaning and syntax of the format specifiers depends on the +type of object that is being formatted, but there is a standard +set of format specifiers used for any object that does not +override them. - Format specifiers can themselves contain replacement fields. - For example, a field whose field width is itself a parameter - could be specified via: +Format specifiers can themselves contain replacement fields. +For example, a field whose field width is itself a parameter +could be specified via:: - "{0:{1}}".format(a, b) + "{0:{1}}".format(a, b) - These 'internal' replacement fields can only occur in the format - specifier part of the replacement field. Internal replacement fields - cannot themselves have format specifiers. This implies also that - replacement fields cannot be nested to arbitrary levels. +These 'internal' replacement fields can only occur in the format +specifier part of the replacement field. Internal replacement fields +cannot themselves have format specifiers. This implies also that +replacement fields cannot be nested to arbitrary levels. - Note that the doubled '}' at the end, which would normally be - escaped, is not escaped in this case. The reason is because - the '{{' and '}}' syntax for escapes is only applied when used - *outside* of a format field. Within a format field, the brace - characters always have their normal meaning. +Note that the doubled '}' at the end, which would normally be +escaped, is not escaped in this case. The reason is because +the '{{' and '}}' syntax for escapes is only applied when used +**outside** of a format field. Within a format field, the brace +characters always have their normal meaning. - The syntax for format specifiers is open-ended, since a class - can override the standard format specifiers. In such cases, - the str.format() method merely passes all of the characters between - the first colon and the matching brace to the relevant underlying - formatting method. +The syntax for format specifiers is open-ended, since a class +can override the standard format specifiers. In such cases, +the ``str.format()`` method merely passes all of the characters between +the first colon and the matching brace to the relevant underlying +formatting method. Standard Format Specifiers +-------------------------- - If an object does not define its own format specifiers, a standard - set of format specifiers is used. These are similar in concept to - the format specifiers used by the existing '%' operator, however - there are also a number of differences. +If an object does not define its own format specifiers, a standard +set of format specifiers is used. These are similar in concept to +the format specifiers used by the existing '%' operator, however +there are also a number of differences. - The general form of a standard format specifier is: +The general form of a standard format specifier is:: - [[fill]align][sign][#][0][minimumwidth][.precision][type] + [[fill]align][sign][#][0][minimumwidth][.precision][type] - The brackets ([]) indicate an optional element. +The brackets ([]) indicate an optional element. - Then the optional align flag can be one of the following: +Then the optional align flag can be one of the following:: - '<' - Forces the field to be left-aligned within the available - space (This is the default.) - '>' - Forces the field to be right-aligned within the - available space. - '=' - Forces the padding to be placed after the sign (if any) - but before the digits. This is used for printing fields - in the form '+000000120'. This alignment option is only - valid for numeric types. - '^' - Forces the field to be centered within the available - space. + '<' - Forces the field to be left-aligned within the available + space (This is the default.) + '>' - Forces the field to be right-aligned within the + available space. + '=' - Forces the padding to be placed after the sign (if any) + but before the digits. This is used for printing fields + in the form '+000000120'. This alignment option is only + valid for numeric types. + '^' - Forces the field to be centered within the available + space. - Note that unless a minimum field width is defined, the field - width will always be the same size as the data to fill it, so - that the alignment option has no meaning in this case. +Note that unless a minimum field width is defined, the field +width will always be the same size as the data to fill it, so +that the alignment option has no meaning in this case. - The optional 'fill' character defines the character to be used to - pad the field to the minimum width. The fill character, if present, - must be followed by an alignment flag. +The optional 'fill' character defines the character to be used to +pad the field to the minimum width. The fill character, if present, +must be followed by an alignment flag. - The 'sign' option is only valid for numeric types, and can be one - of the following: +The 'sign' option is only valid for numeric types, and can be one +of the following:: - '+' - indicates that a sign should be used for both - positive as well as negative numbers - '-' - indicates that a sign should be used only for negative - numbers (this is the default behavior) - ' ' - indicates that a leading space should be used on - positive numbers + '+' - indicates that a sign should be used for both + positive as well as negative numbers + '-' - indicates that a sign should be used only for negative + numbers (this is the default behavior) + ' ' - indicates that a leading space should be used on + positive numbers - If the '#' character is present, integers use the 'alternate form' - for formatting. This means that binary, octal, and hexadecimal - output will be prefixed with '0b', '0o', and '0x', respectively. +If the '#' character is present, integers use the 'alternate form' +for formatting. This means that binary, octal, and hexadecimal +output will be prefixed with '0b', '0o', and '0x', respectively. - 'width' is a decimal integer defining the minimum field width. If - not specified, then the field width will be determined by the - content. +'width' is a decimal integer defining the minimum field width. If +not specified, then the field width will be determined by the +content. - If the width field is preceded by a zero ('0') character, this enables - zero-padding. This is equivalent to an alignment type of '=' and a - fill character of '0'. +If the width field is preceded by a zero ('0') character, this enables +zero-padding. This is equivalent to an alignment type of '=' and a +fill character of '0'. - The 'precision' is a decimal number indicating how many digits - should be displayed after the decimal point in a floating point - conversion. For non-numeric types the field indicates the maximum - field size - in other words, how many characters will be used from - the field content. The precision is ignored for integer conversions. +The 'precision' is a decimal number indicating how many digits +should be displayed after the decimal point in a floating point +conversion. For non-numeric types the field indicates the maximum +field size - in other words, how many characters will be used from +the field content. The precision is ignored for integer conversions. - Finally, the 'type' determines how the data should be presented. +Finally, the 'type' determines how the data should be presented. - The available integer presentation types are: +The available integer presentation types are:: - 'b' - Binary. Outputs the number in base 2. - 'c' - Character. Converts the integer to the corresponding - Unicode character before printing. - 'd' - Decimal Integer. Outputs the number in base 10. - 'o' - Octal format. Outputs the number in base 8. - 'x' - Hex format. Outputs the number in base 16, using lower- - case letters for the digits above 9. - 'X' - Hex format. Outputs the number in base 16, using upper- - case letters for the digits above 9. - 'n' - Number. This is the same as 'd', except that it uses the - current locale setting to insert the appropriate - number separator characters. - '' (None) - the same as 'd' + 'b' - Binary. Outputs the number in base 2. + 'c' - Character. Converts the integer to the corresponding + Unicode character before printing. + 'd' - Decimal Integer. Outputs the number in base 10. + 'o' - Octal format. Outputs the number in base 8. + 'x' - Hex format. Outputs the number in base 16, using lower- + case letters for the digits above 9. + 'X' - Hex format. Outputs the number in base 16, using upper- + case letters for the digits above 9. + 'n' - Number. This is the same as 'd', except that it uses the + current locale setting to insert the appropriate + number separator characters. + '' (None) - the same as 'd' - The available floating point presentation types are: +The available floating point presentation types are:: - 'e' - Exponent notation. Prints the number in scientific - notation using the letter 'e' to indicate the exponent. - 'E' - Exponent notation. Same as 'e' except it converts the - number to uppercase. - 'f' - Fixed point. Displays the number as a fixed-point - number. - 'F' - Fixed point. Same as 'f' except it converts the number - to uppercase. - 'g' - General format. This prints the number as a fixed-point - number, unless the number is too large, in which case - it switches to 'e' exponent notation. - 'G' - General format. Same as 'g' except switches to 'E' - if the number gets to large. - 'n' - Number. This is the same as 'g', except that it uses the - current locale setting to insert the appropriate - number separator characters. - '%' - Percentage. Multiplies the number by 100 and displays - in fixed ('f') format, followed by a percent sign. - '' (None) - similar to 'g', except that it prints at least one - digit after the decimal point. + 'e' - Exponent notation. Prints the number in scientific + notation using the letter 'e' to indicate the exponent. + 'E' - Exponent notation. Same as 'e' except it converts the + number to uppercase. + 'f' - Fixed point. Displays the number as a fixed-point + number. + 'F' - Fixed point. Same as 'f' except it converts the number + to uppercase. + 'g' - General format. This prints the number as a fixed-point + number, unless the number is too large, in which case + it switches to 'e' exponent notation. + 'G' - General format. Same as 'g' except switches to 'E' + if the number gets to large. + 'n' - Number. This is the same as 'g', except that it uses the + current locale setting to insert the appropriate + number separator characters. + '%' - Percentage. Multiplies the number by 100 and displays + in fixed ('f') format, followed by a percent sign. + '' (None) - similar to 'g', except that it prints at least one + digit after the decimal point. - Objects are able to define their own format specifiers to - replace the standard ones. An example is the 'datetime' class, - whose format specifiers might look something like the - arguments to the strftime() function: +Objects are able to define their own format specifiers to +replace the standard ones. An example is the 'datetime' class, +whose format specifiers might look something like the +arguments to the ``strftime()`` function:: - "Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now()) + "Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now()) - For all built-in types, an empty format specification will produce - the equivalent of str(value). It is recommended that objects - defining their own format specifiers follow this convention as - well. +For all built-in types, an empty format specification will produce +the equivalent of ``str(value)``. It is recommended that objects +defining their own format specifiers follow this convention as +well. Explicit Conversion Flag +------------------------ - The explicit conversion flag is used to transform the format field value - before it is formatted. This can be used to override the type-specific - formatting behavior, and format the value as if it were a more - generic type. Currently, two explicit conversion flags are - recognized: +The explicit conversion flag is used to transform the format field value +before it is formatted. This can be used to override the type-specific +formatting behavior, and format the value as if it were a more +generic type. Currently, two explicit conversion flags are +recognized:: - !r - convert the value to a string using repr(). - !s - convert the value to a string using str(). + !r - convert the value to a string using repr(). + !s - convert the value to a string using str(). - These flags are placed before the format specifier: +These flags are placed before the format specifier:: - "{0!r:20}".format("Hello") + "{0!r:20}".format("Hello") - In the preceding example, the string "Hello" will be printed, with quotes, - in a field of at least 20 characters width. +In the preceding example, the string "Hello" will be printed, with quotes, +in a field of at least 20 characters width. - A custom Formatter class can define additional conversion flags. - The built-in formatter will raise a ValueError if an invalid - conversion flag is specified. +A custom Formatter class can define additional conversion flags. +The built-in formatter will raise a ValueError if an invalid +conversion flag is specified. Controlling Formatting on a Per-Type Basis +------------------------------------------ - Each Python type can control formatting of its instances by defining - a __format__ method. The __format__ method is responsible for - interpreting the format specifier, formatting the value, and - returning the resulting string. +Each Python type can control formatting of its instances by defining +a ``__format__`` method. The ``__format__`` method is responsible for +interpreting the format specifier, formatting the value, and +returning the resulting string. - The new, global built-in function 'format' simply calls this special - method, similar to how len() and str() simply call their respective - special methods: +The new, global built-in function 'format' simply calls this special +method, similar to how ``len()`` and ``str()`` simply call their respective +special methods:: - def format(value, format_spec): - return value.__format__(format_spec) + def format(value, format_spec): + return value.__format__(format_spec) - It is safe to call this function with a value of "None" (because the - "None" value in Python is an object and can have methods.) +It is safe to call this function with a value of "None" (because the +"None" value in Python is an object and can have methods.) - Several built-in types, including 'str', 'int', 'float', and 'object' - define __format__ methods. This means that if you derive from any of - those types, your class will know how to format itself. +Several built-in types, including 'str', 'int', 'float', and 'object' +define ``__format__`` methods. This means that if you derive from any of +those types, your class will know how to format itself. - The object.__format__ method is the simplest: It simply converts the - object to a string, and then calls format again: +The ``object.__format__`` method is the simplest: It simply converts the +object to a string, and then calls format again:: - class object: - def __format__(self, format_spec): - return format(str(self), format_spec) + class object: + def __format__(self, format_spec): + return format(str(self), format_spec) - The __format__ methods for 'int' and 'float' will do numeric formatting - based on the format specifier. In some cases, these formatting - operations may be delegated to other types. So for example, in the case - where the 'int' formatter sees a format type of 'f' (meaning 'float') - it can simply cast the value to a float and call format() again. +The ``__format__`` methods for 'int' and 'float' will do numeric formatting +based on the format specifier. In some cases, these formatting +operations may be delegated to other types. So for example, in the case +where the 'int' formatter sees a format type of 'f' (meaning 'float') +it can simply cast the value to a float and call ``format()`` again. - Any class can override the __format__ method to provide custom - formatting for that type: +Any class can override the ``__format__`` method to provide custom +formatting for that type:: - class AST: - def __format__(self, format_spec): - ... + class AST: + def __format__(self, format_spec): + ... - Note for Python 2.x: The 'format_spec' argument will be either - a string object or a unicode object, depending on the type of the - original format string. The __format__ method should test the type - of the specifiers parameter to determine whether to return a string or - unicode object. It is the responsibility of the __format__ method - to return an object of the proper type. +Note for Python 2.x: The 'format_spec' argument will be either +a string object or a unicode object, depending on the type of the +original format string. The ``__format__`` method should test the type +of the specifiers parameter to determine whether to return a string or +unicode object. It is the responsibility of the ``__format__`` method +to return an object of the proper type. - Note that the 'explicit conversion' flag mentioned above is not passed - to the __format__ method. Rather, it is expected that the conversion - specified by the flag will be performed before calling __format__. +Note that the 'explicit conversion' flag mentioned above is not passed +to the ``__format__`` method. Rather, it is expected that the conversion +specified by the flag will be performed before calling ``__format__``. User-Defined Formatting +----------------------- - There will be times when customizing the formatting of fields - on a per-type basis is not enough. An example might be a - spreadsheet application, which displays hash marks '#' when a value - is too large to fit in the available space. +There will be times when customizing the formatting of fields +on a per-type basis is not enough. An example might be a +spreadsheet application, which displays hash marks '#' when a value +is too large to fit in the available space. - For more powerful and flexible formatting, access to the underlying - format engine can be obtained through the 'Formatter' class that - lives in the 'string' module. This class takes additional options - which are not accessible via the normal str.format method. +For more powerful and flexible formatting, access to the underlying +format engine can be obtained through the 'Formatter' class that +lives in the 'string' module. This class takes additional options +which are not accessible via the normal str.format method. - An application can subclass the Formatter class to create its own - customized formatting behavior. +An application can subclass the Formatter class to create its own +customized formatting behavior. - The PEP does not attempt to exactly specify all methods and - properties defined by the Formatter class; instead, those will be - defined and documented in the initial implementation. However, this - PEP will specify the general requirements for the Formatter class, - which are listed below. +The PEP does not attempt to exactly specify all methods and +properties defined by the ``Formatter`` class; instead, those will be +defined and documented in the initial implementation. However, this +PEP will specify the general requirements for the ``Formatter`` class, +which are listed below. - Although string.format() does not directly use the Formatter class - to do formatting, both use the same underlying implementation. The - reason that string.format() does not use the Formatter class directly - is because "string" is a built-in type, which means that all of its - methods must be implemented in C, whereas Formatter is a Python - class. Formatter provides an extensible wrapper around the same - C functions as are used by string.format(). +Although ``string.format()`` does not directly use the ``Formatter`` class +to do formatting, both use the same underlying implementation. The +reason that ``string.format()`` does not use the ``Formatter`` class directly +is because "string" is a built-in type, which means that all of its +methods must be implemented in C, whereas ``Formatter`` is a Python +class. ``Formatter`` provides an extensible wrapper around the same +C functions as are used by ``string.format()``. Formatter Methods +----------------- - The Formatter class takes no initialization arguments: +The ``Formatter`` class takes no initialization arguments:: - fmt = Formatter() + fmt = Formatter() - The public API methods of class Formatter are as follows: +The public API methods of class ``Formatter`` are as follows:: - -- format(format_string, *args, **kwargs) - -- vformat(format_string, args, kwargs) + -- format(format_string, *args, **kwargs) + -- vformat(format_string, args, kwargs) - 'format' is the primary API method. It takes a format template, - and an arbitrary set of positional and keyword arguments. - 'format' is just a wrapper that calls 'vformat'. +'format' is the primary API method. It takes a format template, +and an arbitrary set of positional and keyword arguments. +'format' is just a wrapper that calls 'vformat'. - 'vformat' is the function that does the actual work of formatting. It - is exposed as a separate function for cases where you want to pass in - a predefined dictionary of arguments, rather than unpacking and - repacking the dictionary as individual arguments using the '*args' and - '**kwds' syntax. 'vformat' does the work of breaking up the format - template string into character data and replacement fields. It calls - the 'get_positional' and 'get_index' methods as appropriate (described - below.) +'vformat' is the function that does the actual work of formatting. It +is exposed as a separate function for cases where you want to pass in +a predefined dictionary of arguments, rather than unpacking and +repacking the dictionary as individual arguments using the ``*args`` and +``**kwds`` syntax. 'vformat' does the work of breaking up the format +template string into character data and replacement fields. It calls +the 'get_positional' and 'get_index' methods as appropriate (described +below.) - Formatter defines the following overridable methods: +``Formatter`` defines the following overridable methods:: - -- get_value(key, args, kwargs) - -- check_unused_args(used_args, args, kwargs) - -- format_field(value, format_spec) + -- get_value(key, args, kwargs) + -- check_unused_args(used_args, args, kwargs) + -- format_field(value, format_spec) - 'get_value' is used to retrieve a given field value. The 'key' argument - will be either an integer or a string. If it is an integer, it represents - the index of the positional argument in 'args'; If it is a string, then - it represents a named argument in 'kwargs'. +'get_value' is used to retrieve a given field value. The 'key' argument +will be either an integer or a string. If it is an integer, it represents +the index of the positional argument in 'args'; If it is a string, then +it represents a named argument in 'kwargs'. - The 'args' parameter is set to the list of positional arguments to - 'vformat', and the 'kwargs' parameter is set to the dictionary of - positional arguments. +The 'args' parameter is set to the list of positional arguments to +'vformat', and the 'kwargs' parameter is set to the dictionary of +positional arguments. - For compound field names, these functions are only called for the - first component of the field name; subsequent components are handled - through normal attribute and indexing operations. +For compound field names, these functions are only called for the +first component of the field name; subsequent components are handled +through normal attribute and indexing operations. - So for example, the field expression '0.name' would cause 'get_value' - to be called with a 'key' argument of 0. The 'name' attribute will be - looked up after 'get_value' returns by calling the built-in 'getattr' - function. +So for example, the field expression '0.name' would cause 'get_value' +to be called with a 'key' argument of 0. The 'name' attribute will be +looked up after 'get_value' returns by calling the built-in 'getattr' +function. - If the index or keyword refers to an item that does not exist, then an - IndexError/KeyError should be raised. +If the index or keyword refers to an item that does not exist, then an +``IndexError/KeyError`` should be raised. - 'check_unused_args' is used to implement checking for unused arguments - if desired. The arguments to this function is the set of all argument - keys that were actually referred to in the format string (integers for - positional arguments, and strings for named arguments), and a reference - to the args and kwargs that was passed to vformat. The set of unused - args can be calculated from these parameters. 'check_unused_args' - is assumed to throw an exception if the check fails. +'check_unused_args' is used to implement checking for unused arguments +if desired. The arguments to this function is the set of all argument +keys that were actually referred to in the format string (integers for +positional arguments, and strings for named arguments), and a reference +to the args and kwargs that was passed to vformat. The set of unused +args can be calculated from these parameters. 'check_unused_args' +is assumed to throw an exception if the check fails. - 'format_field' simply calls the global 'format' built-in. The method - is provided so that subclasses can override it. +'format_field' simply calls the global 'format' built-in. The method +is provided so that subclasses can override it. - To get a better understanding of how these functions relate to each - other, here is pseudocode that explains the general operation of - vformat. +To get a better understanding of how these functions relate to each +other, here is pseudocode that explains the general operation of +vformat:: - def vformat(format_string, args, kwargs): + def vformat(format_string, args, kwargs): - # Output buffer and set of used args - buffer = StringIO.StringIO() - used_args = set() + # Output buffer and set of used args + buffer = StringIO.StringIO() + used_args = set() - # Tokens are either format fields or literal strings - for token in self.parse(format_string): - if is_format_field(token): - # Split the token into field value and format spec - field_spec, _, format_spec = token.partition(":") + # Tokens are either format fields or literal strings + for token in self.parse(format_string): + if is_format_field(token): + # Split the token into field value and format spec + field_spec, _, format_spec = token.partition(":") - # Check for explicit type conversion - explicit, _, field_spec = field_spec.rpartition("!") + # Check for explicit type conversion + explicit, _, field_spec = field_spec.rpartition("!") - # 'first_part' is the part before the first '.' or '[' - # Assume that 'get_first_part' returns either an int or - # a string, depending on the syntax. - first_part = get_first_part(field_spec) - value = self.get_value(first_part, args, kwargs) + # 'first_part' is the part before the first '.' or '[' + # Assume that 'get_first_part' returns either an int or + # a string, depending on the syntax. + first_part = get_first_part(field_spec) + value = self.get_value(first_part, args, kwargs) - # Record the fact that we used this arg - used_args.add(first_part) + # Record the fact that we used this arg + used_args.add(first_part) - # Handle [subfield] or .subfield. Assume that 'components' - # returns an iterator of the various subfields, not including - # the first part. - for comp in components(field_spec): - value = resolve_subfield(value, comp) + # Handle [subfield] or .subfield. Assume that 'components' + # returns an iterator of the various subfields, not including + # the first part. + for comp in components(field_spec): + value = resolve_subfield(value, comp) - # Handle explicit type conversion - if explicit == 'r': - value = repr(value) - elif explicit == 's': - value = str(value) + # Handle explicit type conversion + if explicit == 'r': + value = repr(value) + elif explicit == 's': + value = str(value) - # Call the global 'format' function and write out the converted - # value. - buffer.write(self.format_field(value, format_spec)) + # Call the global 'format' function and write out the converted + # value. + buffer.write(self.format_field(value, format_spec)) - else: - buffer.write(token) + else: + buffer.write(token) - self.check_unused_args(used_args, args, kwargs) - return buffer.getvalue() + self.check_unused_args(used_args, args, kwargs) + return buffer.getvalue() - Note that the actual algorithm of the Formatter class (which will be - implemented in C) may not be the one presented here. (It's likely - that the actual implementation won't be a 'class' at all - rather, - vformat may just call a C function which accepts the other overridable - methods as arguments.) The primary purpose of this code example is to - illustrate the order in which overridable methods are called. +Note that the actual algorithm of the Formatter class (which will be +implemented in C) may not be the one presented here. (It's likely +that the actual implementation won't be a 'class' at all - rather, +vformat may just call a C function which accepts the other overridable +methods as arguments.) The primary purpose of this code example is to +illustrate the order in which overridable methods are called. Customizing Formatters +---------------------- - This section describes some typical ways that Formatter objects - can be customized. +This section describes some typical ways that Formatter objects +can be customized. - To support alternative format-string syntax, the 'vformat' method - can be overridden to alter the way format strings are parsed. +To support alternative format-string syntax, the 'vformat' method +can be overridden to alter the way format strings are parsed. - One common desire is to support a 'default' namespace, so that - you don't need to pass in keyword arguments to the format() - method, but can instead use values in a pre-existing namespace. - This can easily be done by overriding get_value() as follows: +One common desire is to support a 'default' namespace, so that +you don't need to pass in keyword arguments to the ``format()`` +method, but can instead use values in a pre-existing namespace. +This can easily be done by overriding ``get_value()`` as follows:: - class NamespaceFormatter(Formatter): - def __init__(self, namespace={}): - Formatter.__init__(self) - self.namespace = namespace + class NamespaceFormatter(Formatter): + def __init__(self, namespace={}): + Formatter.__init__(self) + self.namespace = namespace - def get_value(self, key, args, kwds): - if isinstance(key, str): - try: - # Check explicitly passed arguments first - return kwds[key] - except KeyError: - return self.namespace[key] - else: - Formatter.get_value(key, args, kwds) + def get_value(self, key, args, kwds): + if isinstance(key, str): + try: + # Check explicitly passed arguments first + return kwds[key] + except KeyError: + return self.namespace[key] + else: + Formatter.get_value(key, args, kwds) - One can use this to easily create a formatting function that allows - access to global variables, for example: +One can use this to easily create a formatting function that allows +access to global variables, for example:: - fmt = NamespaceFormatter(globals()) + fmt = NamespaceFormatter(globals()) - greeting = "hello" - print(fmt.format("{greeting}, world!")) + greeting = "hello" + print(fmt.format("{greeting}, world!")) - A similar technique can be done with the locals() dictionary to - gain access to the locals dictionary. +A similar technique can be done with the ``locals()`` dictionary to +gain access to the locals dictionary. - It would also be possible to create a 'smart' namespace formatter - that could automatically access both locals and globals through - snooping of the calling stack. Due to the need for compatibility - with the different versions of Python, such a capability will not - be included in the standard library, however it is anticipated - that someone will create and publish a recipe for doing this. +It would also be possible to create a 'smart' namespace formatter +that could automatically access both locals and globals through +snooping of the calling stack. Due to the need for compatibility +with the different versions of Python, such a capability will not +be included in the standard library, however it is anticipated +that someone will create and publish a recipe for doing this. - Another type of customization is to change the way that built-in - types are formatted by overriding the 'format_field' method. (For - non-built-in types, you can simply define a __format__ special - method on that type.) So for example, you could override the - formatting of numbers to output scientific notation when needed. +Another type of customization is to change the way that built-in +types are formatted by overriding the 'format_field' method. (For +non-built-in types, you can simply define a ``__format__`` special +method on that type.) So for example, you could override the +formatting of numbers to output scientific notation when needed. Error handling +-------------- - There are two classes of exceptions which can occur during formatting: - exceptions generated by the formatter code itself, and exceptions - generated by user code (such as a field object's 'getattr' function). +There are two classes of exceptions which can occur during formatting: +exceptions generated by the formatter code itself, and exceptions +generated by user code (such as a field object's 'getattr' function). - In general, exceptions generated by the formatter code itself are - of the "ValueError" variety -- there is an error in the actual "value" - of the format string. (This is not always true; for example, the - string.format() function might be passed a non-string as its first - parameter, which would result in a TypeError.) +In general, exceptions generated by the formatter code itself are +of the "ValueError" variety -- there is an error in the actual "value" +of the format string. (This is not always true; for example, the +``string.format()`` function might be passed a non-string as its first +parameter, which would result in a ``TypeError``.) - The text associated with these internally generated ValueError - exceptions will indicate the location of the exception inside - the format string, as well as the nature of the exception. +The text associated with these internally generated ``ValueError`` +exceptions will indicate the location of the exception inside +the format string, as well as the nature of the exception. - For exceptions generated by user code, a trace record and - dummy frame will be added to the traceback stack to help - in determining the location in the string where the exception - occurred. The inserted traceback will indicate that the - error occurred at: +For exceptions generated by user code, a trace record and +dummy frame will be added to the traceback stack to help +in determining the location in the string where the exception +occurred. The inserted traceback will indicate that the +error occurred at:: - File ";", line XX, in column_YY + File ";", line XX, in column_YY - where XX and YY represent the line and character position - information in the string, respectively. +where XX and YY represent the line and character position +information in the string, respectively. Alternate Syntax +================ - Naturally, one of the most contentious issues is the syntax of the - format strings, and in particular the markup conventions used to - indicate fields. +Naturally, one of the most contentious issues is the syntax of the +format strings, and in particular the markup conventions used to +indicate fields. - Rather than attempting to exhaustively list all of the various - proposals, I will cover the ones that are most widely used - already. +Rather than attempting to exhaustively list all of the various +proposals, I will cover the ones that are most widely used +already. - - Shell variable syntax: $name and $(name) (or in some variants, - ${name}). This is probably the oldest convention out there, and - is used by Perl and many others. When used without the braces, - the length of the variable is determined by lexically scanning - until an invalid character is found. +- Shell variable syntax: ``$name`` and ``$(name)`` (or in some variants, + ``${name}``). This is probably the oldest convention out there, and + is used by Perl and many others. When used without the braces, + the length of the variable is determined by lexically scanning + until an invalid character is found. - This scheme is generally used in cases where interpolation is - implicit - that is, in environments where any string can contain - interpolation variables, and no special substitution function - need be invoked. In such cases, it is important to prevent the - interpolation behavior from occurring accidentally, so the '$' - (which is otherwise a relatively uncommonly-used character) is - used to signal when the behavior should occur. + This scheme is generally used in cases where interpolation is + implicit - that is, in environments where any string can contain + interpolation variables, and no special substitution function + need be invoked. In such cases, it is important to prevent the + interpolation behavior from occurring accidentally, so the '$' + (which is otherwise a relatively uncommonly-used character) is + used to signal when the behavior should occur. - It is the author's opinion, however, that in cases where the - formatting is explicitly invoked, that less care needs to be - taken to prevent accidental interpolation, in which case a - lighter and less unwieldy syntax can be used. + It is the author's opinion, however, that in cases where the + formatting is explicitly invoked, that less care needs to be + taken to prevent accidental interpolation, in which case a + lighter and less unwieldy syntax can be used. - - printf and its cousins ('%'), including variations that add a - field index, so that fields can be interpolated out of order. +- printf and its cousins ('%'), including variations that add a + field index, so that fields can be interpolated out of order. - - Other bracket-only variations. Various MUDs (Multi-User - Dungeons) such as MUSH have used brackets (e.g. [name]) to do - string interpolation. The Microsoft .Net libraries uses braces - ({}), and a syntax which is very similar to the one in this - proposal, although the syntax for format specifiers is quite - different. [4] +- Other bracket-only variations. Various MUDs (Multi-User + Dungeons) such as MUSH have used brackets (e.g. ``[name]``) to do + string interpolation. The Microsoft .Net libraries uses braces + (``{}``), and a syntax which is very similar to the one in this + proposal, although the syntax for format specifiers is quite + different. [4]_ - - Backquoting. This method has the benefit of minimal syntactical - clutter, however it lacks many of the benefits of a function - call syntax (such as complex expression arguments, custom - formatters, etc.). +- Backquoting. This method has the benefit of minimal syntactical + clutter, however it lacks many of the benefits of a function + call syntax (such as complex expression arguments, custom + formatters, etc.). - - Other variations include Ruby's #{}, PHP's {$name}, and so - on. +- Other variations include Ruby's ``#{}``, PHP's ``{$name}``, and so + on. - Some specific aspects of the syntax warrant additional comments: +Some specific aspects of the syntax warrant additional comments: - 1) Backslash character for escapes. The original version of - this PEP used backslash rather than doubling to escape a bracket. - This worked because backslashes in Python string literals that - don't conform to a standard backslash sequence such as '\n' - are left unmodified. However, this caused a certain amount - of confusion, and led to potential situations of multiple - recursive escapes, i.e. '\\\\{' to place a literal backslash - in front of a bracket. +1) Backslash character for escapes. The original version of +this PEP used backslash rather than doubling to escape a bracket. +This worked because backslashes in Python string literals that +don't conform to a standard backslash sequence such as ``\n`` +are left unmodified. However, this caused a certain amount +of confusion, and led to potential situations of multiple +recursive escapes, i.e. ``\\\\{`` to place a literal backslash +in front of a bracket. - 2) The use of the colon character (':') as a separator for - format specifiers. This was chosen simply because that's - what .Net uses. +2) The use of the colon character (':') as a separator for +format specifiers. This was chosen simply because that's +what .Net uses. Alternate Feature Proposals +=========================== - Restricting attribute access: An earlier version of the PEP - restricted the ability to access attributes beginning with a - leading underscore, for example "{0}._private". However, this - is a useful ability to have when debugging, so the feature - was dropped. +Restricting attribute access: An earlier version of the PEP +restricted the ability to access attributes beginning with a +leading underscore, for example "{0}._private". However, this +is a useful ability to have when debugging, so the feature +was dropped. - Some developers suggested that the ability to do 'getattr' and - 'getitem' access should be dropped entirely. However, this - is in conflict with the needs of another set of developers who - strongly lobbied for the ability to pass in a large dict as a - single argument (without flattening it into individual keyword - arguments using the **kwargs syntax) and then have the format - string refer to dict entries individually. +Some developers suggested that the ability to do 'getattr' and +'getitem' access should be dropped entirely. However, this +is in conflict with the needs of another set of developers who +strongly lobbied for the ability to pass in a large dict as a +single argument (without flattening it into individual keyword +arguments using the ``**kwargs`` syntax) and then have the format +string refer to dict entries individually. - There has also been suggestions to expand the set of expressions - that are allowed in a format string. However, this was seen - to go against the spirit of TOOWTDI, since the same effect can - be achieved in most cases by executing the same expression on - the parameter before it's passed in to the formatting function. - For cases where the format string is being use to do arbitrary - formatting in a data-rich environment, it's recommended to use - a template engine specialized for this purpose, such as - Genshi [5] or Cheetah [6]. +There has also been suggestions to expand the set of expressions +that are allowed in a format string. However, this was seen +to go against the spirit of TOOWTDI, since the same effect can +be achieved in most cases by executing the same expression on +the parameter before it's passed in to the formatting function. +For cases where the format string is being use to do arbitrary +formatting in a data-rich environment, it's recommended to use +a template engine specialized for this purpose, such as +Genshi [5]_ or Cheetah [6]_. - Many other features were considered and rejected because they - could easily be achieved by subclassing Formatter instead of - building the feature into the base implementation. This includes - alternate syntax, comments in format strings, and many others. +Many other features were considered and rejected because they +could easily be achieved by subclassing ``Formatter`` instead of +building the feature into the base implementation. This includes +alternate syntax, comments in format strings, and many others. Security Considerations +======================= - Historically, string formatting has been a common source of - security holes in web-based applications, particularly if the - string formatting system allows arbitrary expressions to be - embedded in format strings. +Historically, string formatting has been a common source of +security holes in web-based applications, particularly if the +string formatting system allows arbitrary expressions to be +embedded in format strings. - The best way to use string formatting in a way that does not - create potential security holes is to never use format strings - that come from an untrusted source. +The best way to use string formatting in a way that does not +create potential security holes is to never use format strings +that come from an untrusted source. - Barring that, the next best approach is to ensure that string - formatting has no side effects. Because of the open nature of - Python, it is impossible to guarantee that any non-trivial - operation has this property. What this PEP does is limit the - types of expressions in format strings to those in which visible - side effects are both rare and strongly discouraged by the - culture of Python developers. So for example, attribute access - is allowed because it would be considered pathological to write - code where the mere access of an attribute has visible side - effects (whether the code has *invisible* side effects - such - as creating a cache entry for faster lookup - is irrelevant.) +Barring that, the next best approach is to ensure that string +formatting has no side effects. Because of the open nature of +Python, it is impossible to guarantee that any non-trivial +operation has this property. What this PEP does is limit the +types of expressions in format strings to those in which visible +side effects are both rare and strongly discouraged by the +culture of Python developers. So for example, attribute access +is allowed because it would be considered pathological to write +code where the mere access of an attribute has visible side +effects (whether the code has **invisible** side effects - such +as creating a cache entry for faster lookup - is irrelevant.) Sample Implementation +===================== - An implementation of an earlier version of this PEP was created by - Patrick Maupin and Eric V. Smith, and can be found in the pep3101 - sandbox at: +An implementation of an earlier version of this PEP was created by +Patrick Maupin and Eric V. Smith, and can be found in the pep3101 +sandbox at: - http://svn.python.org/view/sandbox/trunk/pep3101/ + http://svn.python.org/view/sandbox/trunk/pep3101/ Backwards Compatibility +======================= - Backwards compatibility can be maintained by leaving the existing - mechanisms in place. The new system does not collide with any of - the method names of the existing string formatting techniques, so - both systems can co-exist until it comes time to deprecate the - older system. +Backwards compatibility can be maintained by leaving the existing +mechanisms in place. The new system does not collide with any of +the method names of the existing string formatting techniques, so +both systems can co-exist until it comes time to deprecate the +older system. References +========== - [1] Python Library Reference - String formating operations - http://docs.python.org/library/stdtypes.html#string-formatting-operations +.. [1] Python Library Reference - String formating operations + http://docs.python.org/library/stdtypes.html#string-formatting-operations - [2] Python Library References - Template strings - http://docs.python.org/library/string.html#string.Template +.. [2] Python Library References - Template strings + http://docs.python.org/library/string.html#string.Template - [3] [Python-3000] String formating operations in python 3k - https://mail.python.org/pipermail/python-3000/2006-April/000285.html +.. [3] [Python-3000] String formating operations in python 3k + https://mail.python.org/pipermail/python-3000/2006-April/000285.html - [4] Composite Formatting - [.Net Framework Developer's Guide] - http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true +.. [4] Composite Formatting - [.Net Framework Developer's Guide] + http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true - [5] Genshi templating engine. - http://genshi.edgewall.org/ +.. [5] Genshi templating engine. + http://genshi.edgewall.org/ - [5] Cheetah - The Python-Powered Template Engine. - http://www.cheetahtemplate.org/ +.. [6] Cheetah - The Python-Powered Template Engine. + http://www.cheetahtemplate.org/ Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: