PEP: 3101
Title: Advanced String Formatting
Version: $Revision$
Last-Modified: $Date$
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006


Abstract

    This PEP proposes a new system for built-in string formatting
    operations, intended as a replacement for the existing '%' string
    formatting operator.


Rationale

    Python currently provides two methods of string interpolation:

    - The '%' operator for strings. [1]

    - The string.Template module. [2]

    The primary scope of this PEP concerns proposals for built-in
    string formatting operations (in other words, methods of the
    built-in string type).

    The '%' operator is primarily limited by the fact that it is a
    binary operator, and therefore can take at most two arguments.
    One of those arguments is already dedicated to the format string,
    leaving all other variables to be squeezed into the remaining
    argument.  The current practice is to use either a dictionary or a
    tuple as the second argument, but as many people have commented
    [3], this lacks flexibility.  The "all or nothing" approach
    (meaning that one must choose between only positional arguments,
    or only named arguments) is felt to be overly constraining.

    While there is some overlap between this proposal and
    string.Template, it is felt that each serves a distinct need,
    and that one does not obviate the other.  This proposal is for
    a mechanism which, like '%', is efficient for small strings
    which are only used once, so, for example, compilation of a
    string into a template is not contemplated in this proposal,
    although the proposal does take care to define format strings
    and the API in such a way that an efficient template package
    could reuse the syntax and even some of the underlying
    formatting code.


Specification

    The specification will consist of the following parts:

    - Specification of a new formatting method to be added to the
      built-in string class.

    - Specification of functions and flag values to be added to
      the string module, so that the underlying formatting engine
      can be used with additional options.

    - Specification of a new syntax for format strings.

    - Specification of a new set of special methods to control the
      formatting and conversion of objects.

    - Specification of an API for user-defined formatting classes.

    - Specification of how formatting errors are handled.

    Note on string encodings: When discussing this PEP in the context
    of Python 3.0, it is assumed that all strings are unicode strings,
    and that the use of the word 'string' in the context of this
    document will generally refer to a Python 3.0 string, which is
    the same as a Python 2.x unicode object.

    In the context of Python 2.x, the use of the word 'string' in this
    document refers to an object which may either be a regular string
    or a unicode object.  All of the function call interfaces
    described in this PEP can be used for both strings and unicode
    objects, and in all cases there is sufficient information
    to be able to properly deduce the output string type (in
    other words, there is no need for two separate APIs).
    In all cases, the type of the format string dominates - that
    is, the conversion will always produce an object
    that contains the same representation of characters as the
    input format string.


String Methods

    The built-in string class (and also the unicode class in 2.6) will
    gain a new method, 'format', which takes an arbitrary number of
    positional and keyword arguments:

        "The story of {0}, {1}, and {c}".format(a, b, c=d)

    Within a format string, each positional argument is identified
    with a number, starting from zero, so in the above example, 'a' is
    argument 0 and 'b' is argument 1.  Each keyword argument is
    identified by its keyword name, so in the above example, 'c' is
    used to refer to the third argument.
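
    As a minimal sketch, the call above can be tried with concrete
    values.  The variable values below are illustrative only and are
    not part of the proposal; the snippet assumes an interpreter in
    which the 'format' method is available as described here.

```python
# Illustrative values; the names are not part of the proposal.
a, b, d = "Larry", "Moe", "Curly"

# Positional arguments are addressed by index, keyword arguments by name.
story = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(story)

# Indexes may be repeated or used out of order.
swapped = "{1} before {0}, then {1} again".format(a, b)
print(swapped)
```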


Format Strings

    Format strings consist of intermingled character data and markup.

    Character data is data which is transferred unchanged from the
    format string to the output string; markup is not transferred from
    the format string directly to the output, but instead is used to
    define 'replacement fields' that describe to the format engine
    what should be placed in the output string in the place of the
    markup.

    Brace characters ('curly braces') are used to indicate a
    replacement field within the string:

        "My name is {0}".format('Fred')

    The result of this is the string:

        "My name is Fred"

    Braces can be escaped by doubling:

        "My name is {0} :-{{}}".format('Fred')

    Which would produce:

        "My name is Fred :-{}"

    The element within the braces is called a 'field'.  Fields consist
    of a 'field name', which can either be simple or compound, and an
    optional 'conversion specifier'.


Simple and Compound Field Names

    Simple field names are either names or numbers. If numbers, they
    must be valid base-10 integers; if names, they must be valid
    Python identifiers.  A number is used to identify a positional
    argument, while a name is used to identify a keyword argument.

    A compound field name is a combination of multiple simple field
    names in an expression:

        "My name is {0.name}".format(open('out.txt'))

    This example shows the use of the 'getattr' or 'dot' operator
    in a field expression. The dot operator allows an attribute of
    an input value to be specified as the field value.

    The types of expressions that can be used in a compound name
    have been deliberately limited in order to prevent potential
    security exploits resulting from the ability to place arbitrary
    Python expressions inside of strings. Only two operators are
    supported, the '.' (getattr) operator, and the '[]' (getitem)
    operator.

    Another limitation that is defined to limit potential security
    issues is that field names or attribute names beginning with an
    underscore are disallowed. This enforces the common convention
    that names beginning with an underscore are 'private'.

    An example of the 'getitem' syntax:

        "My name is {0[name]}".format(dict(name='Fred'))

    It should be noted that the use of 'getitem' within a string is
    much more limited than its normal use. In the above example, the
    string 'name' really is the literal string 'name', not a variable
    named 'name'. The rules for parsing an item key are very simple.
    If it starts with a digit, then it is treated as a number; otherwise
    it is used as a string.

    It is not possible to specify arbitrary dictionary keys from
    within a format string.
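
    The two operators can be sketched side by side as follows.  The
    'Person' type and all values are hypothetical and exist only for
    this illustration.  The underscore restriction is deliberately not
    exercised, since the sketch was only checked against a released
    str.format.

```python
from collections import namedtuple

# Hypothetical record type used only for this illustration.
Person = namedtuple("Person", ["name", "age"])
fred = Person(name="Fred", age=42)

# '.' performs an attribute lookup on argument 0.
by_attr = "My name is {0.name}".format(fred)

# '[name]' uses the literal string 'name' as a dictionary key.
by_key = "My name is {0[name]}".format({"name": "Fred"})

# A key that starts with a digit is treated as a number,
# so it can index a list.
by_index = "Second item: {0[1]}".format(["zero", "one", "two"])

# Lookups can be chained after the first component.
chained = "{people[0].name} is {people[0].age}".format(people=[fred])
```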

    Implementation note:  The implementation of this proposal is
    not required to enforce the rule about a name being a valid
    Python identifier.  Instead, it will rely on the getattr function
    of the underlying object to throw an exception if the identifier
    is not legal.  The format function will have a minimalist parser
    which only attempts to figure out when it is "done" with an
    identifier (by finding a '.' or a ']', or '}', etc.).  The only
    exception to this laissez-faire approach is that, by default,
    names are not allowed to have leading underscores.


Conversion Specifiers

    Each field can also specify an optional set of 'conversion
    specifiers' which can be used to adjust the format of that field.
    Conversion specifiers follow the field name, with a colon (':')
    character separating the two:

        "My name is {0:8}".format('Fred')

    The meaning and syntax of the conversion specifiers depends on the
    type of object that is being formatted; however, there is a
    standard set of conversion specifiers used for any object that
    does not override them.

    Conversion specifiers can themselves contain replacement fields.
    For example, a field whose field width is itself a parameter
    could be specified via:

        "{0:{1}}".format(a, b, c)

    Note that the doubled '}' at the end, which would normally be
    escaped, is not escaped in this case.  This is because
    the '{{' and '}}' syntax for escapes is only applied when used
    *outside* of a format field. Within a format field, the brace
    characters always have their normal meaning.
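
    A short sketch of a parameterized field width, using illustrative
    values.  It assumes that string values are left-aligned when no
    alignment flag is given.

```python
# The inner field {1} supplies the width for the outer field {0}.
padded = "{0:{1}}".format("Fred", 8)

# The inner field may also refer to a keyword argument.
right = "{0:>{width}}".format("Fred", width=8)

print(repr(padded))
print(repr(right))
```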

    The syntax for conversion specifiers is open-ended, since a class
    can override the standard conversion specifiers. In such cases,
    the format() method merely passes all of the characters between
    the first colon and the matching brace to the relevant underlying
    formatting method.


Standard Conversion Specifiers

    If an object does not define its own conversion specifiers, a
    standard set of conversion specifiers is used.  These are similar
    in concept to the conversion specifiers used by the existing '%'
    operator; however, there are also a number of significant
    differences.  The standard conversion specifiers fall into three
    major categories: string conversions, integer conversions and
    floating point conversions.

    The general form of a standard conversion specifier is:

        [[fill]align][sign][width][.precision][type]

    The brackets ([]) indicate an optional element.

    The optional align flag can be one of the following:

        '<' - Forces the field to be left-aligned within the available
              space (This is the default.)
        '>' - Forces the field to be right-aligned within the
              available space.
        '=' - Forces the padding to be placed after the sign (if any)
              but before the digits. This is used for printing fields
              in the form '+000000120'.
        '^' - Forces the field to be centered within the available
              space.

    Note that unless a minimum field width is defined, the field
    width will always be the same size as the data to fill it, so
    that the alignment option has no meaning in this case.

    The optional 'fill' character defines the character to be used to
    pad the field to the minimum width.  The alignment flag must be
    supplied if the character is a digit other than 0 (otherwise the
    character would be interpreted as part of the field width
    specifier). A zero fill character without an alignment flag
    implies an alignment type of '='.
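
    The alignment and fill options might be exercised as in the
    sketch below.  Every example gives the alignment flag explicitly,
    so nothing here depends on the default alignment; the values are
    illustrative.

```python
left = "{0:<8}".format("Fred")       # padded on the right
right = "{0:>8}".format("Fred")      # padded on the left
centered = "{0:^8}".format("Fred")   # padded on both sides
starred = "{0:*^8}".format("Fred")   # '*' used as the fill character

# '=' places the padding between the sign and the digits.
signed = "{0:0=+10}".format(120)

for text in (left, right, centered, starred, signed):
    print(repr(text))
```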

    The 'sign' element can be one of the following:

        '+'  - indicates that a sign should be used for both
               positive as well as negative numbers
        '-'  - indicates that a sign should be used only for negative
               numbers (this is the default behaviour)
        ' '  - indicates that a leading space should be used on
               positive numbers
        '()' - indicates that negative numbers should be surrounded
               by parentheses

    'width' is a decimal integer defining the minimum field width. If
    not specified, then the field width will be determined by the
    content.

    The 'precision' is a decimal number indicating how many digits
    should be displayed after the decimal point in a floating point
    conversion. In a string conversion the field indicates how many
    characters will be used from the field content. The precision is
    ignored for integer conversions.
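
    A sketch of the sign, width, and precision elements, with
    illustrative values.  The '()' sign option is not exercised,
    because the released str.format that this sketch was checked
    against does not accept it.

```python
plus = "{0:+d} {1:+d}".format(5, -5)    # sign shown for both numbers
minus = "{0:-d} {1:-d}".format(5, -5)   # sign shown only when negative
space = "{0: d} {1: d}".format(5, -5)   # leading space on positives

fixed = "{0:.2f}".format(3.14159)       # two digits after the point
clipped = "{0:.3}".format("abcdef")     # at most three characters used
wide = "{0:10.3}".format("abcdef")      # clipped, then padded to width 10

for text in (plus, minus, space, fixed, clipped, wide):
    print(repr(text))
```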

    Finally, the 'type' determines how the data should be presented.
    If the type field is absent, an appropriate type will be assigned
    based on the value to be formatted ('d' for integers and longs,
    'g' for floats, and 's' for everything else.)

    The available string conversion types are:

        's' - String format. Invokes str() on the object.
              This is the default conversion specifier type.
        'r' - Repr format. Invokes repr() on the object.

    There are several integer conversion types. All invoke int() on
    the object before attempting to format it.

    The available integer conversion types are:

        'b' - Binary. Outputs the number in base 2.
        'c' - Character. Converts the integer to the corresponding
              unicode character before printing.
        'd' - Decimal Integer. Outputs the number in base 10.
        'o' - Octal format. Outputs the number in base 8.
        'x' - Hex format. Outputs the number in base 16, using lower-
              case letters for the digits above 9.
        'X' - Hex format. Outputs the number in base 16, using upper-
              case letters for the digits above 9.
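
    For illustration, the integer conversion types applied to sample
    values (255 and 65 are arbitrary choices):

```python
value = 255

binary = "{0:b}".format(value)    # base 2
decimal = "{0:d}".format(value)   # base 10
octal = "{0:o}".format(value)     # base 8
lower = "{0:x}".format(value)     # base 16, lower-case digits
upper = "{0:X}".format(value)     # base 16, upper-case digits
char = "{0:c}".format(65)         # character with code point 65

print(binary, decimal, octal, lower, upper, char)
```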

    There are several floating point conversion types. All invoke
    float() on the object before attempting to format it.

    The available floating point conversion types are:

        'e' - Exponent notation. Prints the number in scientific
              notation using the letter 'e' to indicate the exponent.
        'E' - Exponent notation. Same as 'e' except it uses an upper
              case 'E' as the separator character.
        'f' - Fixed point. Displays the number as a fixed-point
              number.
        'F' - Fixed point. Same as 'f'.
        'g' - General format. This prints the number as a fixed-point
              number, unless the number is too large, in which case
              it switches to 'e' exponent notation.
        'G' - General format. Same as 'g' except switches to 'E'
              if the number gets too large.
        'n' - Number. This is the same as 'g', except that it uses the
              current locale setting to insert the appropriate
              number separator characters.
        '%' - Percentage. Multiplies the number by 100 and displays
              in fixed ('f') format, followed by a percent sign.
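
    The floating point conversion types can be sketched with sample
    values.  The 'n' type is left out because its output depends on
    the current locale.

```python
x = 1234.5

exp_lower = "{0:e}".format(x)        # scientific notation, 'e'
exp_upper = "{0:E}".format(x)        # scientific notation, 'E'
fixed = "{0:f}".format(x)            # fixed point, default precision
general = "{0:g}".format(x)          # fixed point while small enough
general_big = "{0:g}".format(1e20)   # switches to exponent notation
percent = "{0:.1%}".format(0.25)     # multiplied by 100, '%' appended

for text in (exp_lower, exp_upper, fixed, general, general_big, percent):
    print(text)
```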

    Objects are able to define their own conversion specifiers to
    replace the standard ones.  An example is the 'datetime' class,
    whose conversion specifiers might look something like the
    arguments to the strftime() function:

        "Today is: {0:a b d H:M:S Y}".format(datetime.now())


Controlling Formatting on a Per-Type Basis

    A class that wishes to implement a custom interpretation of its
    conversion specifiers can implement a __format__ method:

        class AST:
            def __format__(self, specifiers):
                ...

    The 'specifiers' argument will be either a string object or a
    unicode object, depending on the type of the original format
    string.  The __format__ method should test the type of the
    specifiers parameter to determine whether to return a string or
    unicode object.  It is the responsibility of the __format__ method
    to return an object of the proper type.

    string.format() will format each field using the following steps:

     1) See if the value to be formatted has a __format__ method.  If
        it does, then call it.

     2) Otherwise, check the internal formatter within string.format
        that contains knowledge of certain builtin types.

     3) Otherwise, call str() or unicode() as appropriate.
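
    A minimal sketch of a type that interprets its own conversion
    specifiers.  The 'Celsius' class and its one-letter 'K' specifier
    are hypothetical.  The last line relies on the datetime class in
    released Python, whose __format__ passes the specifiers to
    strftime(), so the directives are written with '%' signs.

```python
from datetime import datetime


class Celsius:
    """Hypothetical type with its own conversion specifiers."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, specifiers):
        # Assumed mini-language: 'K' selects kelvin, anything else celsius.
        if specifiers == "K":
            return "{0:.2f} K".format(self.degrees + 273.15)
        return "{0:.1f} C".format(self.degrees)


plain = "{0}".format(Celsius(21.5))
kelvin = "{0:K}".format(Celsius(0))

# Everything after the first colon is handed to datetime.__format__.
stamp = "{0:%Y-%m-%d %H:%M}".format(datetime(2006, 4, 16, 12, 30))

print(plain, kelvin, stamp, sep=" | ")
```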


User-Defined Formatting

    There will be times when customizing the formatting of fields
    on a per-type basis is not enough.  An example might be a
    spreadsheet application, which displays hash marks '#' when a value
    is too large to fit in the available space.

    For more powerful and flexible formatting, access to the underlying
    format engine can be obtained through the 'Formatter' class that
    lives in the 'string' module.  This class takes additional options
    which are not accessible via the normal str.format method.

    An application can create its own Formatter instance which has
    customized behavior, either by setting the properties of the
    Formatter instance, or by subclassing the Formatter class.

    The PEP does not attempt to exactly specify all methods and
    properties defined by the Formatter class; instead, those will be
    defined and documented in the initial implementation. However, this
    PEP will specify the general requirements for the Formatter class,
    which are listed below.


Formatter Creation and Initialization

    The Formatter class takes a single initialization argument, 'flags':

        Formatter(flags=0)

    The 'flags' argument is used to control certain subtle behavioral
    differences in formatting that would be cumbersome to change via
    subclassing. The flags values are defined as static variables
    in the "Formatter" class:

        Formatter.ALLOW_LEADING_UNDERSCORES

            By default, leading underscores are not allowed in identifier
            lookups (getattr or getitem).  Setting this flag will allow
            this.

        Formatter.CHECK_UNUSED_POSITIONAL

            If this flag is set, then any positional arguments which are
            supplied to the 'format' method but which are not used by
            the format string will cause an error.

        Formatter.CHECK_UNUSED_NAME

            If this flag is set, then any named arguments which are
            supplied to the 'format' method but which are not used by
            the format string will cause an error.
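
    The effect of the two CHECK_UNUSED flags can be approximated as
    sketched below.  Note the assumption: the Formatter class in the
    released 'string' module takes no 'flags' argument and instead
    offers a check_unused_args() hook, which this sketch overrides.
    'StrictFormatter' is a hypothetical name, and the choice of
    ValueError is an assumption of this example.

```python
from string import Formatter


class StrictFormatter(Formatter):
    """Hypothetical subclass that rejects unused arguments."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the indexes and names the format string used.
        unused_positional = [i for i in range(len(args))
                             if i not in used_args]
        unused_named = sorted(name for name in kwargs
                              if name not in used_args)
        if unused_positional or unused_named:
            raise ValueError(
                "unused arguments: positions {0}, names {1}".format(
                    unused_positional, unused_named))


strict = StrictFormatter()
ok = strict.format("{0} and {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```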


Formatter Methods

    The methods of class Formatter are as follows:

        -- format(format_string, *args, **kwargs)
        -- vformat(format_string, args, kwargs)
        -- get_positional(args, index)
        -- get_named(kwds, name)
        -- format_field(value, conversion)

    'format' is the primary API method. It takes a format template,
    and an arbitrary set of positional and keyword arguments. 'format'
    is just a wrapper that calls 'vformat'.

    'vformat' is the function that does the actual work of formatting. It
    is exposed as a separate function for cases where you want to pass in
    a predefined dictionary of arguments, rather than unpacking and
    repacking the dictionary as individual arguments using the '*args' and
    '**kwds' syntax. 'vformat' does the work of breaking up the format
    template string into character data and replacement fields. It calls
    the 'get_positional' and 'get_named' methods as appropriate.

    Note that the checking of unused arguments, and the restriction on
    leading underscores in attribute names are also done in this function.
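
    The relationship between 'format' and 'vformat' can be sketched
    with the Formatter class from the released 'string' module, whose
    two methods take the argument lists shown above.  The values are
    illustrative.

```python
from string import Formatter

formatter = Formatter()
args = ("Fred",)
kwargs = {"city": "Boston"}

# 'format' receives the arguments unpacked ...
via_format = formatter.format("{0} lives in {city}", *args, **kwargs)

# ... while 'vformat' takes the tuple and dictionary as they are.
via_vformat = formatter.vformat("{0} lives in {city}", args, kwargs)

print(via_format)
print(via_vformat)
```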

    'get_positional' and 'get_named' are used to retrieve a given field
    value. For compound field names, these functions are only called for
    the first component of the field name; subsequent components are
    handled through normal attribute and indexing operations. So for
    example, the field expression '0.name' would cause 'get_positional' to
    be called with the list of positional arguments and a numeric index of
    0, and then the standard 'getattr' function would be called to get the
    'name' attribute of the result.

    If the index or keyword refers to an item that does not exist, then an
    IndexError/KeyError will be raised.
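
    A small sketch of the two lookup failures, using the 'format'
    method of the built-in string type.  'error_name' is a
    hypothetical helper written for this example.

```python
def error_name(template, *args, **kwargs):
    """Return the name of the lookup error raised, or None."""
    try:
        template.format(*args, **kwargs)
    except (IndexError, KeyError) as exc:
        return type(exc).__name__
    return None


missing_index = error_name("{0} {1}", "only one")   # no argument 1
missing_name = error_name("{name}")                 # no keyword 'name'
no_error = error_name("{0} {name}", "a", name="b")

print(missing_index, missing_name, no_error)
```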

    'format_field' actually generates the text for a replacement field.
    The 'value' argument corresponds to the value being formatted, which
    was retrieved from the arguments using the field name. The
    'conversion' argument is the conversion spec part of the field, which
    will be either a string or unicode object, depending on the type of
    the original format string.

    Note: The final implementation of the Formatter class may define
    additional overridable methods and hooks. In particular, it may be
    that 'vformat' is itself a composition of several additional,
    overridable methods. (Depending on whether it is convenient to the
    implementor of Formatter.)


Customizing Formatters

    This section describes some typical ways that Formatter objects
    can be customized.

    To support alternative format-string syntax, the 'vformat' method
    can be overridden to alter the way format strings are parsed.

    One common desire is to support a 'default' namespace, so that
    you don't need to pass in keyword arguments to the format()
    method, but can instead use values in a pre-existing namespace.
    This can easily be done by overriding get_named() as follows:

        class NamespaceFormatter(Formatter):
            def __init__(self, namespace={}, flags=0):
                Formatter.__init__(self, flags)
                self.namespace = namespace

            def get_named(self, kwds, name):
                try:
                    # Check explicitly passed arguments first
                    return kwds[name]
                except KeyError:
                    # Fall back to the default namespace
                    return self.namespace[name]

    One can use this to easily create a formatting function that allows
    access to global variables, for example:

        fmt = NamespaceFormatter(globals())

        greeting = "hello"
        print(fmt.format("{greeting}, world!"))

    A similar technique can be used with the locals() dictionary to
    gain access to local variables.

    It would also be possible to create a 'smart' namespace formatter
    that could automatically access both locals and globals through
    snooping of the calling stack. Due to the need for compatibility
    with the different versions of Python, such a capability will not be
    included in the standard library; however, it is anticipated that
    someone will create and publish a recipe for doing this.
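
    For readers who want to run the default-namespace idea, here is a
    sketch against the Formatter class in the released 'string'
    module.  It rests on one assumption that differs from the method
    list in this PEP: the released class has a single
    get_value(key, args, kwargs) hook rather than separate
    'get_positional' and 'get_named' methods, and it takes no 'flags'
    argument.

```python
from string import Formatter


class NamespaceFormatter(Formatter):
    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first.
                return kwargs[key]
            except KeyError:
                # Fall back to the default namespace.
                return self.namespace[key]
        # Integer keys are positional; use the stock behaviour.
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"greeting": "hello"})
from_namespace = fmt.format("{greeting}, world!")
overridden = fmt.format("{greeting}, world!", greeting="goodbye")

print(from_namespace)
print(overridden)
```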

    Another type of customization is to change the way that built-in
    types are formatted by overriding the 'format_field' method. (For
    non-built-in types, you can simply define a __format__ special
    method on that type.) So for example, you could override the
    formatting of numbers to output scientific notation when needed.
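
    Overriding 'format_field' also covers the spreadsheet case
    mentioned under User-Defined Formatting.  The sketch below is one
    possible approach, not a specified behaviour: 'CellFormatter' and
    its 'cell_width' parameter are hypothetical, and the code assumes
    the format_field(value, format_spec) signature of the released
    'string' module.

```python
from string import Formatter


class CellFormatter(Formatter):
    """Hypothetical formatter that shows '#' marks on overflow."""

    def __init__(self, cell_width):
        super().__init__()
        self.cell_width = cell_width

    def format_field(self, value, format_spec):
        # Let the standard machinery produce the text first.
        text = super().format_field(value, format_spec)
        if len(text) > self.cell_width:
            # Too wide for the cell: fill it with hash marks instead.
            return "#" * self.cell_width
        return text


cells = CellFormatter(cell_width=6)
fits = cells.format("{0:d}", 1234)
overflow = cells.format("{0:d}", 1234567)

print(fits, overflow)
```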


Error handling

    There are two classes of exceptions which can occur during formatting:
    exceptions generated by the formatter code itself, and exceptions
    generated by user code (such as a field object's getattr function, or
    a __format__ method).

    In general, exceptions generated by the formatter code itself are
    of the "ValueError" variety -- there is an error in the actual "value"
    of the format string.  (This is not always true; for example, the
    string.format() function might be passed a non-string as its first
    parameter, which would result in a TypeError.)

    The text associated with these internally generated ValueError
    exceptions will indicate the location of the exception inside
    the format string, as well as the nature of the exception.
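
    A sketch of how malformed format strings surface as ValueError.
    'value_error_message' is a hypothetical helper.  The message text
    is printed rather than assumed, since its exact wording and the
    amount of location detail depend on the implementation.

```python
def value_error_message(template, *args):
    """Return the ValueError message raised by formatting, or None."""
    try:
        template.format(*args)
    except ValueError as exc:
        return str(exc)
    return None


unclosed = value_error_message("My name is {0", "Fred")   # missing '}'
stray = value_error_message("50 percent off}")            # lone '}'
well_formed = value_error_message("My name is {0}", "Fred")

print(unclosed)
print(stray)
print(well_formed)
```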

    For exceptions generated by user code, a trace record and
    dummy frame will be added to the traceback stack to help
    in determining the location in the string where the exception
    occurred.  The inserted traceback will indicate that the
    error occurred at:

        File "<format_string>;", line XX, in column_YY

    where XX and YY represent the line and character position
    information in the string, respectively.


Alternate Syntax

    Naturally, one of the most contentious issues is the syntax of the
    format strings, and in particular the markup conventions used to
    indicate fields.

    Rather than attempting to exhaustively list all of the various
    proposals, I will cover the ones that are most widely used
    already.

    - Shell variable syntax: $name and $(name) (or in some variants,
      ${name}).  This is probably the oldest convention out there, and
      is used by Perl and many others.  When used without the braces,
      the length of the variable is determined by lexically scanning
      until an invalid character is found.

      This scheme is generally used in cases where interpolation is
      implicit - that is, in environments where any string can contain
      interpolation variables, and no special substitution function
      need be invoked.  In such cases, it is important to prevent the
      interpolation behavior from occurring accidentally, so the '$'
      (which is otherwise a relatively uncommonly-used character) is
      used to signal when the behavior should occur.

      It is the author's opinion, however, that in cases where the
      formatting is explicitly invoked, less care needs to be
      taken to prevent accidental interpolation, in which case a
      lighter and less unwieldy syntax can be used.

    - Printf and its cousins ('%'), including variations that add a
      field index, so that fields can be interpolated out of order.

    - Other bracket-only variations.  Various MUDs (Multi-User
      Dungeons) such as MUSH have used brackets (e.g. [name]) to do
      string interpolation.  The Microsoft .Net libraries use braces
      ({}), and a syntax which is very similar to the one in this
      proposal, although the syntax for conversion specifiers is quite
      different. [4]

    - Backquoting.  This method has the benefit of minimal syntactical
      clutter; however, it lacks many of the benefits of a function
      call syntax (such as complex expression arguments, custom
      formatters, etc.).

    - Other variations include Ruby's #{}, PHP's {$name}, and so
      on.

    Some specific aspects of the syntax warrant additional comments:

    1) Backslash character for escapes.  The original version of
    this PEP used backslash rather than doubling to escape a bracket.
    This worked because backslashes in Python string literals that
    don't conform to a standard backslash sequence such as '\n'
    are left unmodified. However, this caused a certain amount
    of confusion, and led to potential situations of multiple
    recursive escapes, i.e. '\\\\{' to place a literal backslash
    in front of a bracket.

    2) The use of the colon character (':') as a separator for
    conversion specifiers.  This was chosen simply because that's
    what .Net uses.


Security Considerations

    Historically, string formatting has been a common source of
    security holes in web-based applications, particularly if the
    string templating system allows arbitrary expressions to be
    embedded in format strings.

    The typical scenario is one where the string data being processed
    is coming from outside the application, perhaps from HTTP headers
    or fields within a web form. An attacker could substitute their
    own strings designed to cause havoc.

    The string formatting system outlined in this PEP is by no means
    'secure', in the sense that no Python library module can, on its
    own, guarantee security, especially given the open nature of
    the Python language. Building a secure application requires a
    secure approach to design.

    What this PEP does attempt to do is make the job of designing a
    secure application easier, by making it easier for a programmer
    to reason about the possible consequences of a string formatting
    operation. It does this by limiting those consequences to a smaller
    and more easily understood subset.

    For example, because it is possible in Python to override the
    'getattr' operation of a type, the interpretation of a compound
    replacement field such as "0.name" could potentially run
    arbitrary code.

    However, it is *extremely* rare for the mere retrieval of an
    attribute to have side effects. Other operations which are more
    likely to have side effects - such as method calls - are disallowed.
    Thus, a programmer can be reasonably assured that no string
    formatting operation will cause a state change in the program.
    This assurance is not only useful in securing an application, but
    in debugging it as well.

    Similarly, the restriction on field names beginning with
    underscores is intended to provide similar assurances about the
    visibility of private data.

    Of course, programmers would be well-advised to avoid using
    any external data as format strings, and to pass that data
    as the format arguments instead.
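
    The difference between the two uses of external data can be
    sketched as follows.  The 'Account' class and the attacker's text
    are hypothetical; the point is only that text passed as an
    argument is copied to the output without its braces being
    interpreted.

```python
class Account:
    """Hypothetical object holding data that should stay private."""

    def __init__(self, owner, secret):
        self.owner = owner
        self.secret = secret


account = Account("Fred", "s3cret")
untrusted = "{0.secret}"   # text supplied by an outside party

# Risky: the external text is the format string, so its replacement
# field is evaluated against the arguments.
leaked = untrusted.format(account)

# Safer: the external text is an argument and is inserted verbatim.
safe = "Comment from {0.owner}: {1}".format(account, untrusted)

print(leaked)
print(safe)
```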


Sample Implementation

    An implementation of an earlier version of this PEP was created by
    Patrick Maupin and Eric V. Smith, and can be found in the pep3101
    sandbox at:

       http://svn.python.org/view/sandbox/trunk/pep3101/


Backwards Compatibility

    Backwards compatibility can be maintained by leaving the existing
    mechanisms in place.  The new system does not collide with any of
    the method names of the existing string formatting techniques, so
    both systems can co-exist until it comes time to deprecate the
    older system.


References

    [1] Python Library Reference - String formatting operations
        http://docs.python.org/lib/typesseq-strings.html

    [2] Python Library Reference - Template strings
        http://docs.python.org/lib/node109.html

    [3] [Python-3000] String formating operations in python 3k
        http://mail.python.org/pipermail/python-3000/2006-April/000285.html

    [4] Composite Formatting - [.Net Framework Developer's Guide]
        http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
-												added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

											
										
										
											2006-04-26 16:33:25 -04:00
+								PEP: 3101
 								Title: Advanced String Formatting
 								Version: $Revision$
 								Last-Modified: $Date$
 								Author: Talin <talin at acm.org>
 								Status: Draft
-												Make Type field values consistent across all PEPs.

											
										
										
											2007-04-14 22:10:27 -04:00
+								Type: Standards Track
-												added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

											
										
										
											2006-04-26 16:33:25 -04:00
+								Content-Type: text/plain
 								Created: 16-Apr-2006
 								Python-Version: 3.0
-												Lots of changes - added specification for conversions, error handling, complex field specs and general cleanup.

											
										
										
											2006-06-10 20:59:06 -04:00
+								Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
-												added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

											
										
										
											2006-04-26 16:33:25 -04:00
 								Abstract
 								    This PEP proposes a new system for built-in string formatting
 								    operations, intended as a replacement for the existing '%' string
 								    formatting operator.
 								Rationale
 								    Python currently provides two methods of string interpolation:
-												Removed references to previous 'fformat' proposal.
Added clarification about relationship with string.Template

											
										
										
											2006-04-27 12:53:54 -04:00
+								    - The '%' operator for strings. [1]
-												added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

											
										
										
											2006-04-26 16:33:25 -04:00
-												Removed references to previous 'fformat' proposal.
Added clarification about relationship with string.Template

											
										
										
											2006-04-27 12:53:54 -04:00
+								    - The string.Template module. [2]
-												added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

											
										
										
											2006-04-26 16:33:25 -04:00
-												A substantial rewrite of PEP3101.

											
										
										
											2007-06-03 14:53:34 -04:00
+								    The primary scope of this PEP concerns proposals for built-in
+								    string formatting operations (in other words, methods of the
+								    built-in string type).
+								    The '%' operator is primarily limited by the fact that it is a
 								    binary operator, and therefore can take at most two arguments.
 								    One of those arguments is already dedicated to the format string,
 								    leaving all other variables to be squeezed into the remaining
 								    argument.  The current practice is to use either a dictionary or a
 								    tuple as the second argument, but as many people have commented
+								    [3], this lacks flexibility.  The "all or nothing" approach
+								    (meaning that one must choose between only positional arguments,
 								    or only named arguments) is felt to be overly constraining.
+								    While there is some overlap between this proposal and
 								    string.Template, it is felt that each serves a distinct need,
+								    and that one does not obviate the other.  This proposal is for
 								    a mechanism which, like '%', is efficient for small strings
 								    which are only used once, so, for example, compilation of a
 								    string into a template is not contemplated in this proposal,
 								    although the proposal does take care to define format strings
 								    and the API in such a way that an efficient template package
 								    could reuse the syntax and even some of the underlying
 								    formatting code.
 								Specification
+								    The specification will consist of the following parts:
+								    - Specification of a new formatting method to be added to the
 								      built-in string class.
+								    - Specification of functions and flag values to be added to
 								      the string module, so that the underlying formatting engine
 								      can be used with additional options.
+								    - Specification of a new syntax for format strings.
+								    - Specification of a new set of special methods to control the
+								      formatting and conversion of objects.
 								    - Specification of an API for user-defined formatting classes.
+								    - Specification of how formatting errors are handled.
 								    Note on string encodings: When discussing this PEP in the context
 								    of Python 3.0, it is assumed that all strings are unicode strings,
+								    and that the use of the word 'string' in the context of this
 								    document will generally refer to a Python 3.0 string, which is
 								    the same as a Python 2.x unicode object.
 								    In the context of Python 2.x, the use of the word 'string' in this
 								    document refers to an object which may either be a regular string
 								    or a unicode object.  All of the function call interfaces
 								    described in this PEP can be used for both strings and unicode
 								    objects, and in all cases there is sufficient information
 								    to be able to properly deduce the output string type (in
 								    other words, there is no need for two separate APIs).
 								    In all cases, the type of the format string dominates - that
+								    is, the conversion will always result in an object
 								    that contains the same representation of characters as the
+								    input format string.
 								String Methods
+								    The built-in string class (and also the unicode class in 2.6) will
 								    gain a new method, 'format', which takes an arbitrary number of
 								    positional and keyword arguments:
 								        "The story of {0}, {1}, and {c}".format(a, b, c=d)
 								    Within a format string, each positional argument is identified
 								    with a number, starting from zero, so in the above example, 'a' is
 								    argument 0 and 'b' is argument 1.  Each keyword argument is
 								    identified by its keyword name, so in the above example, 'c' is
 								    used to refer to the third argument.
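 								    As a minimal sketch of the call above, the following uses the
 								    str.format method as it later shipped in Python.  The sample
 								    values bound to 'a', 'b' and 'd' are hypothetical and chosen
 								    only for illustration.

```python
# Hypothetical sample values; any objects with a str() form would do.
a, b, d = "Alice", "Bob", "Carol"

# {0} and {1} pick positional arguments; {c} picks the keyword argument c.
story = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(story)

# A field may be referenced more than once, and in any order.
swapped = "{1} met {0}; {0} greeted {1}".format(a, b)
print(swapped)
```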
 								Format Strings
+								    Format strings consist of intermingled character data and markup.
 								    Character data is data which is transferred unchanged from the
 								    format string to the output string; markup is not transferred from
 								    the format string directly to the output, but instead is used to
 								    define 'replacement fields' that describe to the format engine
 								    what should be placed in the output string in the place of the
 								    markup.
+								    Brace characters ('curly braces') are used to indicate a
 								    replacement field within the string:
 								        "My name is {0}".format('Fred')
 								    The result of this is the string:
 								        "My name is Fred"
+								    Braces can be escaped by doubling:
+								        "My name is {0} :-{{}}".format('Fred')
 								    Which would produce:
 								        "My name is Fred :-{}"
+								    The element within the braces is called a 'field'.  Fields consist
+								    of a 'field name', which can either be simple or compound, and an
 								    optional 'conversion specifier'.
 								Simple and Compound Field Names
+								    Simple field names are either names or numbers. If numbers, they
 								    must be valid base-10 integers; if names, they must be valid
 								    Python identifiers.  A number is used to identify a positional
 								    argument, while a name is used to identify a keyword argument.
+								    A compound field name is a combination of multiple simple field
 								    names in an expression:
+								        "My name is {0.name}".format(open('out.txt'))
+								    This example shows the use of the 'getattr' or 'dot' operator
 								    in a field expression. The dot operator allows an attribute of
 								    an input value to be specified as the field value.
 								    The types of expressions that can be used in a compound name
 								    have been deliberately limited in order to prevent potential
 								    security exploits resulting from the ability to place arbitrary
 								    Python expressions inside of strings. Only two operators are
 								    supported, the '.' (getattr) operator, and the '[]' (getitem)
 								    operator.
 								    Another limitation that is defined to limit potential security
 								    issues is that field names or attribute names beginning with an
 								    underscore are disallowed. This enforces the common convention
 								    that names beginning with an underscore are 'private'.
+								    An example of the 'getitem' syntax:
+								        "My name is {0[name]}".format(dict(name='Fred'))
+								    It should be noted that the use of 'getitem' within a string is
 								    much more limited than its normal use. In the above example, the
 								    string 'name' really is the literal string 'name', not a variable
+								    named 'name'. The rules for parsing an item key are very simple.
 								    If it starts with a digit, then it is treated as a number, otherwise
 								    it is used as a string.
+								    It is not possible to specify arbitrary dictionary keys from
 								    within a format string.
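 								    The two compound-name operators can be sketched as follows,
 								    using str.format as it later shipped in Python.  The 'person',
 								    'record' and 'names' objects are hypothetical, and the details
 								    of key parsing in a given release may differ from this draft.

```python
from types import SimpleNamespace

# Hypothetical object exposing a 'name' attribute.
person = SimpleNamespace(name="Fred")
by_attr = "My name is {0.name}".format(person)      # '.' performs getattr

# Inside the brackets, name is the literal string key 'name', not a variable.
record = {"name": "Fred"}
by_key = "My name is {0[name]}".format(record)      # '[]' performs getitem

# A key made of digits is looked up as an integer index.
names = ["Fred", "Wilma"]
by_index = "Second entry: {0[1]}".format(names)

print(by_attr, by_key, by_index, sep="\n")
```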
+								    Implementation note:  The implementation of this proposal is
 								    not required to enforce the rule about a name being a valid
 								    Python identifier.  Instead, it will rely on the getattr function
 								    of the underlying object to throw an exception if the identifier
 								    is not legal.  The format function will have a minimalist parser
 								    which only attempts to figure out when it is "done" with an
 								    identifier (by finding a '.' or a ']', or '}', etc.)  The only
 								    exception to this laissez-faire approach is that, by default,
 								    strings are not allowed to have leading underscores.
+								Conversion Specifiers
 								    Each field can also specify an optional set of 'conversion
+								    specifiers' which can be used to adjust the format of that field.
 								    Conversion specifiers follow the field name, with a colon (':')
 								    character separating the two:
 								        "My name is {0:8}".format('Fred')
 								    The meaning and syntax of the conversion specifiers depend on the
+								    type of object that is being formatted; however, there is a
 								    standard set of conversion specifiers used for any object that
 								    does not override them.
+								    Conversion specifiers can themselves contain replacement fields.
+								    For example, a field whose field width is itself a parameter
+								    could be specified via:
+								        "{0:{1}}".format(a, b, c)
+								    Note that the doubled '}' at the end, which would normally be
 								    escaped, is not escaped in this case.  The reason is that
 								    the '{{' and '}}' syntax for escapes is only applied when used
 								    *outside* of a format field. Within a format field, the brace
 								    characters always have their normal meaning.
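 								    A short sketch of a fixed width, a width supplied by another
 								    argument, and doubled braces used outside a field.  It relies
 								    on str.format as later shipped in Python; the values 'Fred',
 								    8 and 6 are arbitrary.

```python
# Minimum field width of 8; strings are padded on the right.
fixed = "My name is {0:8}.".format("Fred")

# The width is taken from argument 1 via a nested replacement field.
nested = "[{0:{1}}]".format("Fred", 8)

# Outside a field, '{{' and '}}' are escapes; inside, braces keep their meaning.
braced = "{{{0:{1}}}}".format("Fred", 6)

print(repr(fixed), repr(nested), repr(braced), sep="\n")
```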
 								    The syntax for conversion specifiers is open-ended, since a class
 								    can override the standard conversion specifiers. In such cases,
 								    the format() method merely passes all of the characters between
 								    the first colon and the matching brace to the relevant underlying
 								    formatting method.
 								Standard Conversion Specifiers
+								    If an object does not define its own conversion specifiers, a
 								    standard set of conversion specifiers is used.  These are similar
 								    in concept to the conversion specifiers used by the existing '%'
 								    operator; however, there are also a number of significant
 								    differences.  The standard conversion specifiers fall into three
 								    major categories: string conversions, integer conversions and
 								    floating point conversions.
+								    The general form of a standard conversion specifier is:
+								        [[fill]align][sign][width][.precision][type]
+								    The brackets ([]) indicate an optional element.
+								    The optional align flag can be one of the following:
 								        '<' - Forces the field to be left-aligned within the available
 								              space (This is the default.)
 								        '>' - Forces the field to be right-aligned within the
 								              available space.
+								        '=' - Forces the padding to be placed after the sign (if any)
 								              but before the digits. This is used for printing fields
+								              in the form '+000000120'.
+								        '^' - Forces the field to be centered within the available
 								              space.
+								    Note that unless a minimum field width is defined, the field
 								    width will always be the same size as the data to fill it, so
 								    that the alignment option has no meaning in this case.
+								    The optional 'fill' character defines the character to be used to
 								    pad the field to the minimum width.  The alignment flag must be
 								    supplied if the character is a number other than 0 (otherwise the
 								    character would be interpreted as part of the field width
 								    specifier). A zero fill character without an alignment flag
 								    implies an alignment type of '='.
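 								    The align and fill options can be illustrated with a few
 								    calls to str.format as later shipped in Python.  Explicit
 								    align flags are used throughout, since the default alignment
 								    in a given release may not match this draft.

```python
left = "[{0:<8}]".format("ab")        # left-aligned in 8 columns
right = "[{0:>8}]".format("ab")       # right-aligned
center = "[{0:^8}]".format("ab")      # centered
starred = "{0:*^9}".format("mid")     # '*' as the fill character

# '=' places the padding after the sign but before the digits.
padded = "{0:0=+10}".format(120)

# A zero fill with no align flag behaves like '=' for numbers.
zeroed = "{0:010}".format(-42)

for text in (left, right, center, starred, padded, zeroed):
    print(text)
```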
+								    The 'sign' element can be one of the following:
 								        '+'  - indicates that a sign should be used for both
 								               positive as well as negative numbers
 								        '-'  - indicates that a sign should be used only for negative
 								               numbers (this is the default behaviour)
 								        ' '  - indicates that a leading space should be used on
 								               positive numbers
 								        '()' - indicates that negative numbers should be surrounded
 								               by parentheses
 								    'width' is a decimal integer defining the minimum field width. If
 								    not specified, then the field width will be determined by the
 								    content.
+								    The 'precision' is a decimal number indicating how many digits
 								    should be displayed after the decimal point in a floating point
 								    conversion. In a string conversion the field indicates how many
 								    characters will be used from the field content. The precision is
 								    ignored for integer conversions.
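 								    A brief sketch of the sign, width and precision elements,
 								    again using str.format as later shipped in Python.  The '()'
 								    sign option is omitted because released Python versions do
 								    not accept it, and precision is shown only for float and
 								    string conversions.

```python
plus_pos = "{0:+d}".format(5)       # sign shown for positive numbers
plus_neg = "{0:+d}".format(-5)      # and for negative numbers
minus_pos = "{0:-d}".format(5)      # sign only for negatives (the default)
space_pos = "{0: d}".format(5)      # leading space on positive numbers

two_places = "{0:.2f}".format(3.14159)      # digits after the decimal point
truncated = "{0:.3s}".format("abcdef")      # characters taken from the string
combined = "{0:>8.2f}".format(3.14159)      # width and precision together

print(plus_pos, plus_neg, minus_pos, repr(space_pos))
print(two_places, truncated, repr(combined))
```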
+								    Finally, the 'type' determines how the data should be presented.
 								    If the type field is absent, an appropriate type will be assigned
 								    based on the value to be formatted ('d' for integers and longs,
 								    'g' for floats, and 's' for everything else.)
 								    The available string conversion types are:
 								        's' - String format. Invokes str() on the object.
 								              This is the default conversion specifier type.
 								        'r' - Repr format. Invokes repr() on the object.
 								    There are several integer conversion types. All invoke int() on
 								    the object before attempting to format it.
 								    The available integer conversion types are:
 								        'b' - Binary. Outputs the number in base 2.
 								        'c' - Character. Converts the integer to the corresponding
 								              unicode character before printing.
 								        'd' - Decimal Integer. Outputs the number in base 10.
 								        'o' - Octal format. Outputs the number in base 8.
 								        'x' - Hex format. Outputs the number in base 16, using lower-
 								              case letters for the digits above 9.
 								        'X' - Hex format. Outputs the number in base 16, using upper-
 								              case letters for the digits above 9.
 								    There are several floating point conversion types. All invoke
 								    float() on the object before attempting to format it.
 								    The available floating point conversion types are:
 								        'e' - Exponent notation. Prints the number in scientific
 								              notation using the letter 'e' to indicate the exponent.
 								        'E' - Exponent notation. Same as 'e' except it uses an upper
 								              case 'E' as the separator character.
 								        'f' - Fixed point. Displays the number as a fixed-point
 								              number.
 								        'F' - Fixed point. Same as 'f'.
 								        'g' - General format. This prints the number as a fixed-point
 								              number, unless the number is too large, in which case
 								              it switches to 'e' exponent notation.
 								        'G' - General format. Same as 'g' except switches to 'E'
 								              if the number gets too large.
 								        'n' - Number. This is the same as 'g', except that it uses the
 								              current locale setting to insert the appropriate
 								              number separator characters.
 								        '%' - Percentage. Multiplies the number by 100 and displays
 								              in fixed ('f') format, followed by a percent sign.
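 								    The integer and floating point types listed above can be
 								    exercised as follows with str.format as later shipped in
 								    Python.  The 'n' type is left out because its output depends
 								    on the current locale, and the input values are arbitrary.

```python
# Integer conversion types.
binary = "{0:b}".format(10)
char = "{0:c}".format(65)
decimal = "{0:d}".format(42)
octal = "{0:o}".format(8)
hex_lower = "{0:x}".format(255)
hex_upper = "{0:X}".format(255)

# Floating point conversion types.
exp_lower = "{0:e}".format(12345.678)
exp_upper = "{0:E}".format(12345.678)
fixed = "{0:f}".format(1.5)
general_small = "{0:g}".format(0.5)
general_large = "{0:g}".format(1234567.0)   # switches to exponent notation
percent = "{0:.1%}".format(0.256)

print(binary, char, decimal, octal, hex_lower, hex_upper)
print(exp_lower, exp_upper, fixed, general_small, general_large, percent)
```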
 								    Objects are able to define their own conversion specifiers to
 								    replace the standard ones.  An example is the 'datetime' class,
 								    whose conversion specifiers might look something like the
 								    arguments to the strftime() function:
 								        "Today is: {0:a b d H:M:S Y}".format(datetime.now())
+								Controlling Formatting on a Per-Type Basis
 								    A class that wishes to implement a custom interpretation of its
 								    conversion specifiers can implement a __format__ method:
 								        class AST:
 								            def __format__(self, specifiers):
 								                ...
 								    The 'specifiers' argument will be either a string object or a
 								    unicode object, depending on the type of the original format
 								    string.  The __format__ method should test the type of the
 								    specifiers parameter to determine whether to return a string or
 								    unicode object.  It is the responsibility of the __format__ method
 								    to return an object of the proper type.
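 								    A minimal sketch of such a method follows, assuming Python 3,
 								    where there is a single string type and the string/unicode
 								    distinction does not arise.  The Point class is hypothetical; the
 								    datetime line relies on the datetime.__format__ method that later
 								    shipped, which hands its specifier to strftime().

```python
from datetime import datetime


class Point:
    """Hypothetical class used only to illustrate __format__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, specifiers):
        # Apply the same specifier to each coordinate; an empty
        # specifier falls back to each number's default rendering.
        return "({0}, {1})".format(format(self.x, specifiers),
                                   format(self.y, specifiers))


rendered = "{0:.1f}".format(Point(1, 2.5))

# A fixed, arbitrary timestamp keeps the result reproducible.
stamp = "{0:%Y-%m-%d %H:%M:%S}".format(datetime(2007, 6, 3, 14, 53, 34))

print(rendered, stamp)
```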
 								    string.format() will format each field using the following steps:
 								        1) See if the value to be formatted has a __format__ method.  If
 								           it does, then call it.
 								        2) Otherwise, check the internal formatter within string.format
 								           that contains knowledge of certain builtin types.
 								        3) Otherwise, call str() or unicode() as appropriate.
 								User-Defined Formatting
 								    There will be times when customizing the formatting of fields
 								    on a per-type basis is not enough.  An example might be a
 								    spreadsheet application, which displays hash marks '#' when a value
 								    is too large to fit in the available space.
 								    For more powerful and flexible formatting, access to the underlying
 								    format engine can be obtained through the 'Formatter' class that
 								    lives in the 'string' module.  This class takes additional options
 								    which are not accessible via the normal str.format method.
 								    An application can create its own Formatter instance which has
 								    customized behavior, either by setting the properties of the
 								    Formatter instance, or by subclassing the Formatter class.
 								    The PEP does not attempt to exactly specify all methods and
 								    properties defined by the Formatter class; instead, those will be
 								    defined and documented in the initial implementation. However, this
 								    PEP will specify the general requirements for the Formatter class,
 								    which are listed below.
 								Formatter Creation and Initialization
 								    The Formatter class takes a single initialization argument, 'flags':
 								        Formatter(flags=0)
 								    The 'flags' argument is used to control certain subtle behavioral
 								    differences in formatting that would be cumbersome to change via
 								    subclassing. The flags values are defined as static variables
 								    in the "Formatter" class:
 								        Formatter.ALLOW_LEADING_UNDERSCORES
 								            By default, leading underscores are not allowed in identifier
 								            lookups (getattr or getitem).  Setting this flag will allow
 								            this.
 								        Formatter.CHECK_UNUSED_POSITIONAL
 								            If this flag is set, then any positional arguments which are
 								            supplied to the 'format' method but which are not used by
 								            the format string will cause an error.
 								        Formatter.CHECK_UNUSED_NAME
 								            If this flag is set, then any named arguments which are
 								            supplied to the 'format' method but which are not used by
 								            the format string will cause an error.
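 								    The flags above are part of this proposal.  The string.Formatter
 								    class that later shipped exposes a check_unused_args() hook
 								    instead, so comparable checking can be sketched as below.  The
 								    StrictFormatter name and the error message are hypothetical.

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical subclass: reject arguments the format string ignores."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the integer indexes and keyword names that
        # the format string actually referenced.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [name for name in kwargs if name not in used_args]
        if unused:
            raise ValueError("unused format arguments: {0!r}".format(unused))


strict = StrictFormatter()
accepted = strict.format("{0} and {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
except ValueError as exc:
    error_text = str(exc)
```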
 								Formatter Methods
 								    The methods of class Formatter are as follows:
 								        -- format(format_string, *args, **kwargs)
 								        -- vformat(format_string, args, kwargs)
 								        -- get_positional(args, index)
 								        -- get_named(kwds, name)
 								        -- format_field(value, conversion)
 								    'format' is the primary API method. It takes a format template,
 								    and an arbitrary set of positional and keyword arguments. 'format'
 								    is just a wrapper that calls 'vformat'.
 								    'vformat' is the function that does the actual work of formatting. It
 								    is exposed as a separate function for cases where you want to pass in
 								    a predefined dictionary of arguments, rather than unpacking and
 								    repacking the dictionary as individual arguments using the '*args' and
 								    '**kwds' syntax. 'vformat' does the work of breaking up the format
 								    template string into character data and replacement fields. It calls
 								    the 'get_positional' and 'get_named' methods as appropriate.
 								    Note that the checking of unused arguments, and the restriction on
 								    leading underscores in attribute names are also done in this function.
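 								    The relationship between the two entry points can be sketched with
 								    the string.Formatter class that later shipped, whose 'format' and
 								    'vformat' signatures match the ones listed above.  The sample
 								    values are arbitrary.

```python
import string

formatter = string.Formatter()
positional = ("spam",)
named = {"count": 3}

# format() takes the arguments unpacked ...
via_format = formatter.format("{0} x {count}", *positional, **named)

# ... while vformat() takes the tuple and the dictionary as they are,
# with no unpacking and repacking.
via_vformat = formatter.vformat("{0} x {count}", positional, named)

print(via_format, via_vformat)
```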
 								    'get_positional' and 'get_named' are used to retrieve a given field
 								    value. For compound field names, these functions are only called for
 								    the first component of the field name; subsequent components are
 								    handled through normal attribute and indexing operations. So for
 								    example, the field expression '0.name' would cause 'get_positional' to
 								    be called with the list of positional arguments and a numeric index of
 								    0, and then the standard 'getattr' function would be called to get the
 								    'name' attribute of the result.
 								    If the index or keyword refers to an item that does not exist, then an
 								    IndexError/KeyError will be raised.
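 								    The string.Formatter class that later shipped merges the two
 								    lookup methods into a single get_value(key, args, kwargs) hook.
 								    Under that assumption, the sketch below records which keys reach
 								    the hook for compound field names.  The Person and
 								    TracingFormatter classes are hypothetical.

```python
import string


class Person:
    """Hypothetical value with a 'name' attribute."""

    def __init__(self, name):
        self.name = name


class TracingFormatter(string.Formatter):
    """Hypothetical subclass that records each first-component lookup."""

    def __init__(self):
        super().__init__()
        self.lookups = []

    def get_value(self, key, args, kwargs):
        # Only the first component of a compound field name arrives
        # here; '.name' is resolved afterwards with getattr.
        self.lookups.append(key)
        return super().get_value(key, args, kwargs)


tracer = TracingFormatter()
text = tracer.format("{0.name} / {who.name}", Person("Ada"), who=Person("Bob"))
print(text, tracer.lookups)
```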
 								    'format_field' actually generates the text for a replacement field.
 								    The 'value' argument corresponds to the value being formatted, which
 								    was retrieved from the arguments using the field name. The
 								    'conversion' argument is the conversion spec part of the field, which
 								    will be either a string or unicode object, depending on the type of
 								    the original format string.
 								    Note: The final implementation of the Formatter class may define
 								    additional overridable methods and hooks. In particular, it may be
 								    that 'vformat' is itself a composition of several additional,
 								    overridable methods. (Depending on whether it is convenient to the
 								    implementor of Formatter.)
 								Customizing Formatters
 								    This section describes some typical ways that Formatter objects
 								    can be customized.
 								    To support alternative format-string syntax, the 'vformat' method
 								    can be overridden to alter the way format strings are parsed.
 								    One common desire is to support a 'default' namespace, so that
 								    you don't need to pass in keyword arguments to the format()
 								    method, but can instead use values in a pre-existing namespace.
 								    This can easily be done by overriding get_named() as follows:
 								       class NamespaceFormatter(Formatter):
 								          def __init__(self, namespace=None, flags=0):
 								              Formatter.__init__(self, flags)
 								              # Avoid sharing one mutable default dict between instances
 								              self.namespace = {} if namespace is None else namespace
 								          def get_named(self, kwds, name):
 								              try:
 								                  # Check explicitly passed arguments first
 								                  return kwds[name]
 								              except KeyError:
 								                  # Fall back to the default namespace
 								                  return self.namespace[name]
 								    One can use this to easily create a formatting function that allows
 								    access to global variables, for example:
 								        fmt = NamespaceFormatter(globals())
 								        greeting = "hello"
 								        print(fmt.format("{greeting}, world!"))
 								    A similar technique can be used with the locals() dictionary to
 								    gain access to local variables.
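 								    For comparison, a runnable variant of the same idea can be
 								    sketched against the string.Formatter class that later shipped.
 								    It assumes that class's get_value() hook in place of get_named()
 								    and drops the 'flags' argument.  An explicit dictionary stands in
 								    for globals() or locals(), which could be passed the same way.

```python
import string


class NamespaceFormatter(string.Formatter):
    """Sketch: fall back to a default namespace for missing names."""

    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = {} if namespace is None else namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional and handled by the base class
        return super().get_value(key, args, kwargs)


namespace = {"greeting": "hello"}
fmt = NamespaceFormatter(namespace)
message = fmt.format("{greeting}, world!")
overridden = fmt.format("{greeting}, world!", greeting="goodbye")
print(message, overridden)
```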
 								    It would also be possible to create a 'smart' namespace formatter
 								    that could automatically access both locals and globals through
 								    snooping of the calling stack. Due to the need for compatibility
 								    with the different versions of Python, such a capability will not
 								    be included in the standard library; however, it is anticipated
 								    that someone will create and publish a recipe for doing this.
 								    Another type of customization is to change the way that built-in
 								    types are formatted by overriding the 'format_field' method. (For
 								    non-built-in types, you can simply define a __format__ special
 								    method on that type.) So for example, you could override the
 								    formatting of numbers to output scientific notation when needed.
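 								    One way this might look, again assuming the string.Formatter
 								    class that later shipped, where format_field receives the value
 								    and its specifier.  The SciFormatter name, the 1e6 threshold and
 								    the '.3e' specifier are arbitrary choices for illustration.

```python
import string


class SciFormatter(string.Formatter):
    """Hypothetical subclass: large floats default to scientific notation."""

    def format_field(self, value, format_spec):
        # Only step in when the caller gave no specifier of their own.
        if isinstance(value, float) and not format_spec and abs(value) >= 1e6:
            format_spec = ".3e"
        return super().format_field(value, format_spec)


sci = SciFormatter()
large = sci.format("{0}", 12345678.0)
small = sci.format("{0}", 1.5)
explicit = sci.format("{0:.1f}", 12345678.0)
print(large, small, explicit)
```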
 								Error handling
 								    There are two classes of exceptions which can occur during formatting:
 								    exceptions generated by the formatter code itself, and exceptions
 								    generated by user code (such as a field object's getattr function, or
 								    the field_hook function).
 								    In general, exceptions generated by the formatter code itself are
 								    of the "ValueError" variety -- there is an error in the actual "value"
 								    of the format string.  (This is not always true; for example, the
 								    string.format() function might be passed a non-string as its first
 								    parameter, which would result in a TypeError.)
 								    The text associated with these internally generated ValueError
 								    exceptions will indicate the location of the exception inside
 								    the format string, as well as the nature of the exception.
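 								    The exception types involved can be observed with a small helper.
 								    The types shown are those raised by current Python releases; the
 								    wording of the messages, and whether they report a position, is
 								    not assumed here.  The error_type helper is hypothetical.

```python
def error_type(format_string, *args, **kwargs):
    """Return the name of the exception raised, or None if none is."""
    try:
        format_string.format(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


malformed = error_type("{0", "x")        # unterminated replacement field
missing_index = error_type("{1}", "x")   # no positional argument 1
missing_name = error_type("{name}")      # no keyword argument 'name'
well_formed = error_type("{0}", "x")     # nothing goes wrong

print(malformed, missing_index, missing_name, well_formed)
```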
 								    For exceptions generated by user code, a trace record and
 								    dummy frame will be added to the traceback stack to help
 								    in determining the location in the string where the exception
 								    occurred.  The inserted traceback will indicate that the
 								    error occurred at:
 								        File "<format_string>", line XX, in column_YY
 								    where XX and YY represent the line and character position
 								    information in the string, respectively.
 								Alternate Syntax
 								    Naturally, one of the most contentious issues is the syntax of the
 								    format strings, and in particular the markup conventions used to
 								    indicate fields.
 								    Rather than attempting to exhaustively list all of the various
 								    proposals, I will cover the ones that are most widely used
 								    already.
 								    - Shell variable syntax: $name and $(name) (or in some variants,
 								      ${name}).  This is probably the oldest convention out there, and
 								      is used by Perl and many others.  When used without the braces,
 								      the length of the variable is determined by lexically scanning
 								      until an invalid character is found.
 								      This scheme is generally used in cases where interpolation is
 								      implicit - that is, in environments where any string can contain
 								      interpolation variables, and no special substitution function
 								      need be invoked.  In such cases, it is important to prevent the
 								      interpolation behavior from occurring accidentally, so the '$'
 								      (which is otherwise a relatively uncommonly-used character) is
 								      used to signal when the behavior should occur.
 								      It is the author's opinion, however, that in cases where the
 								      formatting is explicitly invoked, less care needs to be
 								      taken to prevent accidental interpolation, in which case a
 								      lighter and less unwieldy syntax can be used.
 								    - Printf and its cousins ('%'), including variations that add a
 								      field index, so that fields can be interpolated out of order.
 								    - Other bracket-only variations.  Various MUDs (Multi-User
 								      Dungeons) such as MUSH have used brackets (e.g. [name]) to do
 								      string interpolation.  The Microsoft .Net libraries use braces
 								      ({}), and a syntax which is very similar to the one in this
 								      proposal, although the syntax for conversion specifiers is quite
 								      different. [4]
 								    - Backquoting.  This method has the benefit of minimal syntactical
 								      clutter, however it lacks many of the benefits of a function
 								      call syntax (such as complex expression arguments, custom
 								      formatters, etc.).
 								    - Other variations include Ruby's #{}, PHP's {$name}, and so
 								      on.
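 								    Three of the conventions above are available in Python itself and
 								    can be put side by side.  The sketch uses string.Template for the
 								    shell style, the '%' operator for the printf style, and
 								    str.format, as it later shipped, for the brace style.

```python
from string import Template

values = {"name": "world"}

# Shell-style: '$name', or '${name}' where the end must be explicit
shell_style = Template("Hello, $name and ${name}!").substitute(values)

# Printf-style with named fields
printf_style = "Hello, %(name)s and %(name)s!" % values

# Brace-style, as in this proposal
brace_style = "Hello, {name} and {name}!".format(**values)

print(shell_style, printf_style, brace_style)
```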
 								    Some specific aspects of the syntax warrant additional comments:
 								    1) Backslash character for escapes.  The original version of
 								    this PEP used backslash rather than doubling to escape a bracket.
 								    This worked because backslashes in Python string literals that
 								    don't conform to a standard backslash sequence such as '\n'
 								    are left unmodified. However, this caused a certain amount
 								    of confusion, and led to potential situations of multiple
 								    recursive escapes, i.e. '\\\\{' to place a literal backslash
 								    in front of a bracket.
 								    2) The use of the colon character (':') as a separator for
 								    conversion specifiers.  This was chosen simply because that's
 								    what .Net uses.
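 								    The doubling convention that replaced backslash escapes (item 1
 								    above) can be illustrated with str.format as it later shipped.
 								    The sample strings are arbitrary.

```python
# '{{' and '}}' each produce one literal brace
literal_braces = "{{}}".format()

# A replacement field wrapped in literal braces
wrapped_field = "{{{0}}}".format(5)

# Literal braces and fields mixed in one string
mixed = "set = {{{0}, {1}}}".format(1, 2)

print(literal_braces, wrapped_field, mixed)
```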
 								Security Considerations
 								    Historically, string formatting has been a common source of
 								    security holes in web-based applications, particularly if the
 								    string templating system allows arbitrary expressions to be
 								    embedded in format strings.
 								    The typical scenario is one where the string data being processed
 								    is coming from outside the application, perhaps from HTTP headers
 								    or fields within a web form. An attacker could substitute their
 								    own strings designed to cause havoc.
 								    The string formatting system outlined in this PEP is by no means
 								    'secure', in the sense that no Python library module can, on its
 								    own, guarantee security, especially given the open nature of
 								    the Python language. Building a secure application requires a
 								    secure approach to design.
 								    What this PEP does attempt to do is make the job of designing a
 								    secure application easier, by making it easier for a programmer
 								    to reason about the possible consequences of a string formatting
 								    operation. It does this by limiting those consequences to a smaller
 								    and more easily understood subset.
 								    For example, because it is possible in Python to override the
 								    'getattr' operation of a type, the interpretation of a compound
 								    replacement field such as "0.name" could potentially run
 								    arbitrary code.
 								    However, it is *extremely* rare for the mere retrieval of an
 								    attribute to have side effects. Other operations which are more
 								    likely to have side effects - such as method calls - are disallowed.
 								    Thus, a programmer can be reasonably assured that no string
 								    formatting operation will cause a state change in the program.
 								    This assurance is not only useful in securing an application, but
 								    in debugging it as well.
 								    Similarly, the restriction on field names beginning with
 								    underscores is intended to provide similar assurances about the
 								    visibility of private data.
 								    Of course, programmers would be well-advised to avoid using
 								    any external data as format strings, and instead use that data
 								    as the format arguments.
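 								    The difference between the two uses of external data can be
 								    sketched as follows, using str.format as it later shipped.  The
 								    Account class and the 'untrusted' string are hypothetical
 								    stand-ins for application data and for text received from a web
 								    form.

```python
class Account:
    """Hypothetical object holding a value that should stay private."""

    def __init__(self, owner, token):
        self.owner = owner
        self.token = token


account = Account("ada", "s3cr3t")
untrusted = "{0.token}"   # imagine this text arrived from outside

# Risky: the external text is the format string, so it can reach
# any attribute of the arguments.
leaked = untrusted.format(account)

# Safer: the external text is only ever a format argument.
echoed = "You sent: {0}".format(untrusted)

# Method calls are not part of the field syntax: 'upper()' is looked
# up as an attribute name and the lookup fails.
try:
    "{0.owner.upper()}".format(account)
    call_result = "called"
except AttributeError:
    call_result = "rejected"

print(leaked, echoed, call_result)
```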
 								Sample Implementation
 								    An implementation of an earlier version of this PEP was created by
 								    Patrick Maupin and Eric V. Smith, and can be found in the pep3101
 								    sandbox at:
 								       http://svn.python.org/view/sandbox/trunk/pep3101/
 								Backwards Compatibility
 								    Backwards compatibility can be maintained by leaving the existing
 								    mechanisms in place.  The new system does not collide with any of
 								    the method names of the existing string formatting techniques, so
 								    both systems can co-exist until it comes time to deprecate the
 								    older system.
 								References
 								    [1] Python Library Reference - String formatting operations
 								        http://docs.python.org/lib/typesseq-strings.html
 								    [2] Python Library Reference - Template strings
 								        http://docs.python.org/lib/node109.html
 								    [3] [Python-3000] String formating operations in python 3k
 								        http://mail.python.org/pipermail/python-3000/2006-April/000285.html
 								    [4] Composite Formatting - [.Net Framework Developer's Guide]
 								        http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
 								Copyright
 								    This document has been placed in the public domain.
 								Local Variables:
 								mode: indented-text
 								indent-tabs-mode: nil
 								sentence-end-double-space: t
 								fill-column: 70
 								coding: utf-8
 								End: