PEP: 3101
Title: Advanced String Formatting
Version: $Revision$
Last-Modified: $Date$
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History:

Abstract

This PEP proposes a new system for built-in string formatting
operations, intended as a replacement for the existing '%' string
formatting operator.

Rationale

Python currently provides two methods of string interpolation:

- The '%' operator for strings. [1]

- The string.Template module. [2]

The scope of this PEP will be restricted to proposals for built-in
string formatting operations (in other words, methods of the
built-in string type).

The '%' operator is primarily limited by the fact that it is a
binary operator, and therefore can take at most two arguments.
One of those arguments is already dedicated to the format string,
leaving all other variables to be squeezed into the remaining
argument. The current practice is to use either a dictionary or a
tuple as the second argument, but as many people have commented
[3], this lacks flexibility. The "all or nothing" approach
(meaning that one must choose between only positional arguments,
or only named arguments) is felt to be overly constraining.

While there is some overlap between this proposal and
string.Template, it is felt that each serves a distinct need,
and that one does not obviate the other. In any case,
string.Template will not be discussed here.

Specification

The specification will consist of 4 parts:

- Specification of a set of methods to be added to the built-in
  string class.

- Specification of a new syntax for format strings.

- Specification of a new set of class methods to control the
  formatting and conversion of objects.

- Specification of an API for user-defined formatting classes.

String Methods

The built-in string class will gain a new method, 'format',
which takes an arbitrary number of positional and keyword
arguments:

    "The story of {0}, {1}, and {c}".format(a, b, c=d)

Within a format string, each positional argument is identified
with a number, starting from zero, so in the above example, 'a' is
argument 0 and 'b' is argument 1. Each keyword argument is
identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument.

The result of the format call is an object of the same type
(string or unicode) as the format string.

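
As a small illustration of the call shown in this section, the
sketch below fills in concrete values. The values are made up, and
the sketch assumes an interpreter in which a 'format' method of this
shape is available on strings, as is the case for the str.format
that later shipped with Python.

```python
# Illustrative values only; the names a, b and d mirror the example
# in the text.
a, b, d = "Tom", "Dick", "Harry"

# {0} and {1} select positional arguments; {c} selects the keyword
# argument named c.
result = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(result)

# The result has the same type as the format string.
assert type(result) is str
```

Note that positional and keyword arguments are mixed freely in one
call, which is the flexibility the '%' operator lacks.
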

Format Strings

Brace characters ('curly braces') are used to indicate a
replacement field within the string:

    "My name is {0}".format('Fred')

The result of this is the string:

    "My name is Fred"

Braces can be escaped using a backslash:

    "My name is {0} :-\{\}".format('Fred')

Which would produce:

    "My name is Fred :-{}"

The element within the braces is called a 'field'. Fields consist
of a name, which can either be simple or compound, and an optional
'conversion specifier'.

Simple names are either names or numbers. If numbers, they must
be valid decimal numbers; if names, they must be valid Python
identifiers. A number is used to identify a positional argument,
while a name is used to identify a keyword argument.

Compound names are a sequence of simple names separated by
periods:

    "My name is {0.name} :-\{\}".format(dict(name='Fred'))

Compound names can be used to access specific dictionary entries,
array elements, or object attributes. In the above example, the
'{0.name}' field refers to the dictionary entry 'name' within
positional argument 0.

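
The lookup of simple and compound names described above can be
sketched roughly as follows. This is only an illustration: the
helper name resolve_field_name is hypothetical, and the order in
which dictionary entries, array elements and attributes are tried
is an assumption, since the text does not fix it.

```python
def resolve_field_name(name, args, kwargs):
    """Look up a simple or compound field name (hypothetical helper)."""
    first, *rest = name.split(".")
    # A decimal number selects a positional argument, a name selects
    # a keyword argument.
    value = args[int(first)] if first.isdecimal() else kwargs[first]
    for part in rest:
        key = int(part) if part.isdecimal() else part
        try:
            # Assumed order: dictionary entry or array element first ...
            value = value[key]
        except (TypeError, KeyError, IndexError):
            # ... then fall back to an object attribute.
            value = getattr(value, part)
    return value


class Person:
    """Made-up class used to show attribute access."""

    def __init__(self, name):
        self.name = name


from_dict = resolve_field_name("0.name", (dict(name="Fred"),), {})
from_attr = resolve_field_name("who.name", (), {"who": Person("Fred")})
from_list = resolve_field_name("0.1", (["a", "b"],), {})
print(from_dict, from_attr, from_list)
```
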

Each field can also specify an optional set of 'conversion
specifiers'. Conversion specifiers follow the field name, with a
colon (':') character separating the two:

    "My name is {0:8}".format('Fred')

The meaning and syntax of the conversion specifiers depend on the
type of object that is being formatted; however, many of the
built-in types will recognize a standard set of conversion
specifiers.

The conversion specifier consists of a sequence of zero or more
characters, each of which can be any printable character
except for a non-escaped '}'. The format() method does not
attempt to interpret the conversion specifiers in any way; it
merely passes all of the characters between the first colon ':'
and the matching right brace ('}') to the various underlying
formatters (described later).

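
The rule that only the first colon separates the field name from
the conversion specifiers can be illustrated with a short sketch.
The helper name split_field is hypothetical, and escaped braces and
nested fields are deliberately left out.

```python
def split_field(field):
    """Split the text between the braces into (name, specifiers)."""
    # Only the first colon separates the two parts; any later colon
    # belongs to the specifiers and is passed through untouched.
    name, _, specifiers = field.partition(":")
    return name, specifiers


print(split_field("0:8"))       # ('0', '8')
print(split_field("0.name"))    # ('0.name', '')
print(split_field("0:%H:%M"))   # ('0', '%H:%M')
```
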

Standard Conversion Specifiers

For most built-in types, the conversion specifiers will be the
same as or similar to the existing conversion specifiers used with
the '%' operator. Thus, instead of '%02.2x', you will say
'{0:2.2x}'.

There are a few differences, however:

- The trailing letter is optional - you don't need to say '2.2d',
  you can instead just say '2.2'. If the letter is omitted, the
  value will be converted into its 'natural' form (that is, the
  form that it would take if str() or unicode() were called on it)
  subject to the field length and precision specifiers (if
  supplied).

- Variable field width specifiers use a nested version of the {}
  syntax, allowing the width specifier to be either a positional
  or keyword argument:

      "{0:{1}.{2}d}".format(a, b, c)

  (Note: It might be easier to parse if these used a different
  type of delimiter, such as parens - avoiding the need to create
  a regex that handles the recursive case.)

- The support for length modifiers (which are ignored by Python
  anyway) is dropped.

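
A brief sketch of the nested width syntax from the list above,
using made-up values. It is run against the str.format that later
shipped with Python, which is an assumption about the final
behavior rather than something this draft guarantees. The 'f' type
letter is used instead of 'd' because that shipped implementation
rejects a precision on integer conversions.

```python
value, width, precision = 3.14159, 10, 3

# The inner {1} and {2} are replaced first, giving the specifier
# "10.3f", which is then applied to argument 0.
nested = "{0:{1}.{2}f}".format(value, width, precision)
fixed = "{0:10.3f}".format(value)
print(repr(nested))

# Keyword arguments work for the nested fields as well.
by_name = "{0:{w}.{p}f}".format(value, w=width, p=precision)
```
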

For non-built-in types, the conversion specifiers will be specific
to that type. An example is the 'datetime' class, whose
conversion specifiers are identical to the arguments to the
strftime() function:

    "Today is: {0:%x}".format(datetime.now())

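
To make the datetime case reproducible, the sketch below uses a
fixed, made-up moment instead of datetime.now(). It assumes the
behavior of the datetime class as it later shipped, where the
specifier text is treated like a strftime() format.

```python
from datetime import datetime

# A fixed, made-up moment so that the output does not change from
# run to run.
moment = datetime(2006, 4, 16, 9, 30)

# Everything after the first colon is handed to the datetime object,
# which treats it like a strftime() format.
formatted = "Created: {0:%Y-%m-%d %H:%M}".format(moment)
print(formatted)
```
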

Controlling Formatting

A class that wishes to implement a custom interpretation of its
conversion specifiers can implement a __format__ method:

    class AST:
        def __format__(self, specifiers):
            ...

The 'specifiers' argument will be either a string object or a
unicode object, depending on the type of the original format
string. The __format__ method should test the type of the
specifiers parameter to determine whether to return a string or
unicode object. It is the responsibility of the __format__ method
to return an object of the proper type.

string.format() will format each field using the following steps:

1) See if the value to be formatted has a __format__ method. If
   it does, then call it.

2) Otherwise, check the internal formatter within string.format
   that contains knowledge of certain built-in types.

3) Otherwise, call str() or unicode() as appropriate.

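
A minimal sketch of a class that takes the first of the steps
above by supplying its own __format__ method. The Point class is
hypothetical, and the sketch relies on the format() built-in and
str.format as they later shipped with Python, so it should be read
as an illustration rather than as part of the specification.

```python
class Point:
    """Hypothetical class; not part of the proposal."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __format__(self, specifiers):
        # With no specifiers, fall back to the natural str() form of
        # each coordinate.
        if not specifiers:
            return "({0}, {1})".format(self.x, self.y)
        # Otherwise reuse the same specifiers for both coordinates.
        return "({0}, {1})".format(format(self.x, specifiers),
                                   format(self.y, specifiers))


plain = "{0}".format(Point(1, 2))
one_decimal = "{0:.1f}".format(Point(1, 2))
print(plain, one_decimal)
```
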

User-Defined Formatting Classes

The code that interprets format strings can be called explicitly
from user code. This allows the creation of custom formatter
classes that can override the normal formatting rules.

The string and unicode classes will have a class method called
'cformat' that does all the actual work of formatting; the
format() method is just a wrapper that calls cformat.

The parameters to the cformat function are:

    -- The format string (or unicode; the same function handles
       both.)
    -- A field format hook (see below)
    -- A tuple containing the positional arguments
    -- A dict containing the keyword arguments

The cformat function will parse all of the fields in the format
string, and return a new string (or unicode) with all of the
fields replaced with their formatted values.

For each field, the cformat function will attempt to call the
field format hook with the following arguments:

    field_hook(value, conversion, buffer)

The 'value' argument corresponds to the value being formatted, which
was retrieved from the arguments using the field name. (The
field_hook has no control over the selection of values, only
how they are formatted.)

The 'conversion' argument is the conversion spec part of the
field, which will be either a string or unicode object, depending
on the type of the original format string.

The 'buffer' argument is a Python array object, either a byte
array or unicode character array. The buffer object will contain
the partially constructed string; the field hook is free to modify
the contents of this buffer if needed.

The field_hook will be called once per field. The field_hook may
take one of two actions:

    1) Return False, indicating that the field_hook will not
       process this field and the default formatting should be
       used. This decision should be based on the type of the
       value object, and the contents of the conversion string.

    2) Append the formatted field to the buffer, and return True.

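
The hook protocol described above can be sketched in pure Python as
follows. This is not the prototype mentioned elsewhere in this PEP:
the free function cformat below is a hypothetical stand-in for the
proposed class method, a plain list stands in for the array buffer,
and the field syntax is simplified (simple names only, no nested
fields, no escaped braces). Default formatting is delegated to the
format() built-in, which is likewise an assumption of the sketch.

```python
import re

# Simplified field syntax: {name} or {name:conversion}.
FIELD = re.compile(r"\{([^{}:]*)(?::([^{}]*))?\}")


def cformat(template, field_hook, args, kwargs):
    """Hypothetical stand-in for the proposed cformat class method."""
    buffer = []   # a list of pieces stands in for the array buffer
    position = 0
    for match in FIELD.finditer(template):
        buffer.append(template[position:match.start()])
        name, conversion = match.group(1), match.group(2) or ""
        # Value selection is not under the hook's control.
        value = args[int(name)] if name.isdecimal() else kwargs[name]
        # The hook returns True if it appended the field itself;
        # otherwise the default formatting is used.
        if field_hook is None or not field_hook(value, conversion, buffer):
            buffer.append(format(value, conversion))
        position = match.end()
    buffer.append(template[position:])
    return "".join(buffer)


def shout_strings(value, conversion, buffer):
    """Example hook: upper-case plain strings, decline everything else."""
    if isinstance(value, str) and conversion == "":
        buffer.append(value.upper())
        return True
    return False


hooked = cformat("{0} is {age:3d} years old", shout_strings,
                 ("Fred",), {"age": 42})
default = cformat("{0} is {age:3d} years old", None,
                  ("Fred",), {"age": 42})
print(hooked)
print(default)
```

The hook decides per field, based on the value's type and the
conversion string, whether to handle the field or to decline it.
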

Alternate Syntax

Naturally, one of the most contentious issues is the syntax of the
format strings, and in particular the markup conventions used to
indicate fields.

Rather than attempting to exhaustively list all of the various
proposals, I will cover the ones that are most widely used
already.

- Shell variable syntax: $name and $(name) (or in some variants,
  ${name}). This is probably the oldest convention out there, and
  is used by Perl and many others. When used without the braces,
  the length of the variable name is determined by lexically
  scanning until an invalid character is found.

  This scheme is generally used in cases where interpolation is
  implicit - that is, in environments where any string can contain
  interpolation variables, and no special substitution function
  need be invoked. In such cases, it is important to prevent the
  interpolation behavior from occurring accidentally, so the '$'
  (which is otherwise a relatively uncommonly-used character) is
  used to signal when the behavior should occur.

  It is the author's opinion, however, that in cases where the
  formatting is explicitly invoked, less care needs to be
  taken to prevent accidental interpolation, in which case a
  lighter and less unwieldy syntax can be used.

- Printf and its cousins ('%'), including variations that add a
  field index, so that fields can be interpolated out of order.

- Other bracket-only variations. Various MUDs (Multi-User
  Dungeons) such as MUSH have used brackets (e.g. [name]) to do
  string interpolation. The Microsoft .Net libraries use braces
  ({}), and a syntax which is very similar to the one in this
  proposal, although the syntax for conversion specifiers is quite
  different. [4]

- Backquoting. This method has the benefit of minimal syntactical
  clutter; however, it lacks many of the benefits of a function
  call syntax (such as complex expression arguments, custom
  formatters, etc.).

- Other variations include Ruby's #{}, PHP's {$name}, and so
  on.

Sample Implementation

A rough prototype of the underlying 'cformat' function has been
coded in Python; however, it needs much refinement before being
submitted.

Backwards Compatibility

Backwards compatibility can be maintained by leaving the existing
mechanisms in place. The new system does not collide with any of
the method names of the existing string formatting techniques, so
both systems can co-exist until it comes time to deprecate the
older system.

References

[1] Python Library Reference - String formatting operations
    http://docs.python.org/lib/typesseq-strings.html

[2] Python Library Reference - Template strings
    http://docs.python.org/lib/node109.html

[3] [Python-3000] String formating operations in python 3k
    http://mail.python.org/pipermail/python-3000/2006-April/000285.html

[4] Composite Formatting - [.Net Framework Developer's Guide]
    http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true

Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: