Latest updates to PEP 3101, incorporating recent discussions.

2007-08-15 00:14:29 +00:00 · 8089f06be4
parent 0801f23a05
commit 8089f06be4
1 changed files with 171 additions and 111 deletions
--- a/pep-3101.txt
+++ b/pep-3101.txt
@ -8,7 +8,7 @@ Type: Standards Track
 Content-Type: text/plain
 Created: 16-Apr-2006
 Python-Version: 3.0
-Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
+Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007


 Abstract
@ -104,6 +104,13 @@ String Methods
    argument 0 and 'b' is argument 1.  Each keyword argument is
    identified by its keyword name, so in the above example, 'c' is
    used to refer to the third argument.
+    
+    There is also a global built-in function, 'format', which formats
+    a single value:
+    
+       print(format(10.0, "7.3g"))
+       
+    This function is described in a later section.
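As a quick check of what the single-value call produces, here is a minimal sketch run against the built-in format() as it exists in current Python 3. The variable names are illustrative only.

```python
# "7.3g" means: minimum field width 7, 3 significant digits,
# general floating point presentation.
text = format(10.0, "7.3g")

# The same specifier can be used inside a replacement field.
same = "{0:7.3g}".format(10.0)

print(repr(text))
```

Numbers are right-aligned by default, so the two-character result "10" is padded on the left with five spaces.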


 Format Strings
@ -136,7 +143,7 @@ Format Strings

    The element within the braces is called a 'field'.  Fields consist
    of a 'field name', which can either be simple or compound, and an
-    optional 'conversion specifier'.
+    optional 'format specifier'.


 Simple and Compound Field Names
@ -159,7 +166,7 @@ Simple and Compound Field Names
    expressions in format strings.  This is by design - the types of
    expressions that you can use is deliberately limited.  Only two operators
    are supported: the '.' (getattr) operator, and the '[]' (getitem)
-    operator.  The reason for allowing these operators is that they dont'
+    operator.  The reason for allowing these operators is that they don't
    normally have side effects in non-pathological code.

    An example of the 'getitem' syntax:
@ -180,26 +187,26 @@ Simple and Compound Field Names
    not required to enforce the rule about a name being a valid
    Python identifier.  Instead, it will rely on the getattr function
    of the underlying object to throw an exception if the identifier
-    is not legal.  The format function will have a minimalist parser
-    which only attempts to figure out when it is "done" with an
+    is not legal.  The str.format() function will have a minimalist
+    parser which only attempts to figure out when it is "done" with an
    identifier (by finding a '.' or a ']', or '}', etc.).


-Conversion Specifiers
+Format Specifiers

-    Each field can also specify an optional set of 'conversion
+    Each field can also specify an optional set of 'format
    specifiers' which can be used to adjust the format of that field.
-    Conversion specifiers follow the field name, with a colon (':')
+    Format specifiers follow the field name, with a colon (':')
    character separating the two:

        "My name is {0:8}".format('Fred')

-    The meaning and syntax of the conversion specifiers depends on the
+    The meaning and syntax of the format specifiers depend on the
    type of object that is being formatted; however, there is a
-    standard set of conversion specifiers used for any object that
+    standard set of format specifiers used for any object that
    does not override them.

-    Conversion specifiers can themselves contain replacement fields.
+    Format specifiers can themselves contain replacement fields.
    For example, a field whose field width is itself a parameter
    could be specified via:

@ -211,24 +218,21 @@ Conversion Specifiers
    *outside* of a format field.  Within a format field, the brace
    characters always have their normal meaning.

-    The syntax for conversion specifiers is open-ended, since a class
-    can override the standard conversion specifiers.  In such cases,
-    the format() method merely passes all of the characters between
+    The syntax for format specifiers is open-ended, since a class
+    can override the standard format specifiers.  In such cases,
+    the str.format() method merely passes all of the characters between
    the first colon and the matching brace to the relevant underlying
    formatting method.


-Standard Conversion Specifiers
+Standard Format Specifiers

-    If an object does not define its own conversion specifiers, a
-    standard set of conversion specifiers are used.  These are similar
-    in concept to the conversion specifiers used by the existing '%'
-    operator, however there are also a number of significant
-    differences.  The standard conversion specifiers fall into three
-    major categories: string conversions, integer conversions and
-    floating point conversions.
-
-    The general form of a standard conversion specifier is:
+    If an object does not define its own format specifiers, a
+    standard set of format specifiers is used.  These are similar
+    in concept to the format specifiers used by the existing '%'
+    operator; however, there are also a number of differences.
+    
+    The general form of a standard format specifier is:

        [[fill]align][sign][width][.precision][type]
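To make the general form concrete, the following sketch exercises each optional part of the specifier using the built-in format() and str.format() from current Python 3. The sample values are arbitrary.

```python
value = 3.14159

# fill='*', align='>', sign='+', width=10, precision=.2, type='f'
padded = format(value, "*>+10.2f")

# Only align and width: left-align a string in 8 columns.
name = format("Fred", "<8")

# Only precision: for strings it caps the number of characters used.
clipped = format("Frederick", ".4")

# The same specifiers work after the colon in a replacement field.
centered = "{0:^6}".format("ab")

print(padded, repr(name), clipped, repr(centered))
```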

@ -265,8 +269,6 @@ Standard Conversion Specifiers
               numbers (this is the default behavior)
        ' '  - indicates that a leading space should be used on
               positive numbers
-        '()' - indicates that negative numbers should be surrounded
-               by parentheses

    'width' is a decimal integer defining the minimum field width.  If
    not specified, then the field width will be determined by the
@ -274,25 +276,16 @@ Standard Conversion Specifiers

    The 'precision' is a decimal number indicating how many digits
    should be displayed after the decimal point in a floating point
-    conversion.  In a string conversion the field indicates how many
-    characters will be used from the field content.  The precision is
-    ignored for integer conversions.
+    conversion.  For non-number types the field indicates the maximum
+    field size - in other words, how many characters will be used from
+    the field content.  The precision is ignored for integer conversions.

    Finally, the 'type' determines how the data should be presented.
    If the type field is absent, an appropriate type will be assigned
    based on the value to be formatted ('d' for integers and longs,
-    'g' for floats, and 's' for everything else.)
-
-    The available string conversion types are:
-
-        's' - String format.  Invokes str() on the object.
-              This is the default conversion specifier type.
-        'r' - Repr format.  Invokes repr() on the object.
-
-    There are several integer conversion types.  All invoke int() on
-    the object before attempting to format it.
-
-    The available integer conversion types are:
+    'g' for floats and decimals, and 's' for everything else.)
+    
+    The available integer presentation types are:

        'b' - Binary. Outputs the number in base 2.
        'c' - Character. Converts the integer to the corresponding
@ -304,10 +297,7 @@ Standard Conversion Specifiers
        'X' - Hex format. Outputs the number in base 16, using upper-
              case letters for the digits above 9.

-    There are several floating point conversion types.  All invoke
-    float() on the object before attempting to format it.
-
-    The available floating point conversion types are:
+    The available floating point presentation types are:

        'e' - Exponent notation. Prints the number in scientific
              notation using the letter 'e' to indicate the exponent.
@ -326,40 +316,89 @@ Standard Conversion Specifiers
              number separator characters.
        '%' - Percentage. Multiplies the number by 100 and displays
              in fixed ('f') format, followed by a percent sign.
-
-    Objects are able to define their own conversion specifiers to
+              
+    For non-numeric types, there's only one presentation type, 's', which
+    does nothing - its only purpose is to ease the transition from the
+    old '%' style formatting.
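The presentation types listed above can be tried directly with the built-in format(). This is a small sketch against current Python 3, covering only a subset of the types.

```python
# Integer presentation types
binary = format(10, "b")       # base 2
char = format(65, "c")         # integer to character
hex_lower = format(255, "x")   # base 16, lower-case digits
hex_upper = format(255, "X")   # base 16, upper-case digits

# Floating point presentation types
exponent = format(1234.5, "e")  # scientific notation
percent = format(0.25, "%")     # multiplied by 100, fixed format, then '%'

# The single non-numeric presentation type leaves the string unchanged
plain = format("abc", "s")

print(binary, char, hex_lower, hex_upper, exponent, percent, plain)
```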
+              
+    Objects are able to define their own format specifiers to
    replace the standard ones.  An example is the 'datetime' class,
-    whose conversion specifiers might look something like the
+    whose format specifiers might look something like the
    arguments to the strftime() function:

        "Today is: {0:a b d H:M:S Y}".format(datetime.now())


+Explicit Conversion Flag
+
+    The explicit conversion flag is used to transform the format field value
+    before it is formatted.  This can be used to override the type-specific
+    formatting behavior, and format the value as if it were a more
+    generic type.  Currently, two explicit conversion flags are
+    recognized:
+    
+        !r - convert the value to a string using repr().
+        !s - convert the value to a string using str().
+        
+    These flags are typically placed before the format specifier:
+
+        "{0!r:20}".format("Hello")
+        
+    In the preceding example, the string "Hello" will be printed, with quotes,
+    in a field at least 20 characters wide.
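The effect of the conversion flag can be checked with str.format() in current Python 3. In this sketch the second call is an extra illustration that is not taken from the text above.

```python
# repr("Hello") is the 7-character string 'Hello' (with quotes);
# strings are left-aligned by default, so 13 spaces of padding follow.
quoted = "{0!r:20}".format("Hello")

# '!s' converts with str() first, so string alignment rules apply
# even though the argument is an integer.
as_text = "{0!s:<6}|".format(42)

print(repr(quoted), repr(as_text))
```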
+
+
 Controlling Formatting on a Per-Type Basis

-    A class that wishes to implement a custom interpretation of its
-    conversion specifiers can implement a __format__ method:
+    Each Python type can control formatting of its instances by defining
+    a __format__ method.  The __format__ method is responsible for
+    interpreting the format specifier, formatting the value, and
+    returning the resulting string.
+    
+    The new, global built-in function 'format' simply calls this special
+    method, similar to how len() and str() simply call their respective
+    special methods:
+    
+        def format(value, format_spec):
+            return value.__format__(format_spec)
+            
+    It is safe to call this function with a value of "None" (because the
+    "None" value in Python is an object and can have methods.)

-    class AST:
-        def __format__(self, specifiers):
-            ...
+    Several built-in types, including 'str', 'int', 'float', and 'object'
+    define __format__ methods.  This means that if you derive from any of
+    those types, your class will know how to format itself.
+    
+    The object.__format__ method is the simplest: It simply converts the
+    object to a string, and then calls format again:
+    
+        class object:
+            def __format__(self, format_spec):
+                return format(str(self), format_spec)
+                
+    The __format__ methods for 'int' and 'float' will do numeric formatting
+    based on the format specifier.  In some cases, these formatting
+    operations may be delegated to other types.  So for example, in the case
+    where the 'int' formatter sees a format type of 'f' (meaning 'float')
+    it can simply cast the value to a float and call format() again.
+    
+    Any class can override the __format__ method to provide custom
+    formatting for that type:

-    The 'specifiers' argument will be either a string object or a
-    unicode object, depending on the type of the original format
-    string.  The __format__ method should test the type of the
-    specifiers parameter to determine whether to return a string or
+        class AST:
+            def __format__(self, format_spec):
+                ...
+
+    Note for Python 2.x: The 'format_spec' argument will be either
+    a string object or a unicode object, depending on the type of the
+    original format string.  The __format__ method should test the type
+    of the 'format_spec' parameter to determine whether to return a string or
    unicode object.  It is the responsibility of the __format__ method
    to return an object of the proper type.
-
-    string.format() will format each field using the following steps:
-
-     1) See if the value to be formatted has a __format__ method.  If
-        it does, then call it.
-
-     2) Otherwise, check the internal formatter within string.format
-        that contains knowledge of certain builtin types.
-
-     3) Otherwise, call str() or unicode() as appropriate.
+    
+    Note that the 'explicit conversion' flag mentioned above is not passed
+    to the __format__ method.  Rather, it is expected that the conversion
+    specified by the flag will be performed before calling __format__.
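A minimal sketch of a type that controls its own formatting is shown below. The Point class, its fallback specifier of ".1f", and its output layout are hypothetical choices made for illustration; only the __format__ hook and the built-in format() come from the text.

```python
class Point:
    """Hypothetical example type, not part of the PEP."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, format_spec):
        # Reuse the float specifier for each coordinate; an empty
        # specifier falls back to one decimal place (an assumption).
        spec = format_spec or ".1f"
        return "({0}, {1})".format(format(self.x, spec), format(self.y, spec))

    def __repr__(self):
        return "Point({0!r}, {1!r})".format(self.x, self.y)


p = Point(1.0, 2.5)
default = "{0}".format(p)         # __format__ receives ''
two_places = "{0:.2f}".format(p)  # __format__ receives '.2f'
converted = "{0!r}".format(p)     # repr() is applied before formatting

print(default, two_places, converted)
```

With the '!r' flag the Point's own __format__ method is never consulted, because the value has already been replaced by its repr() string.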


 User-Defined Formatting
@ -418,24 +457,30 @@ Formatter Methods

    Formatter defines the following overridable methods:
        
-        -- get_positional(args, index)
-        -- get_named(kwds, name)
+        -- get_value(key, args, kwargs)
        -- check_unused_args(used_args, args, kwargs)
-        -- format_field(value, conversion)
+        -- format_field(value, format_spec)

-    'get_positional' and 'get_named' are used to retrieve a given field
-    value.  For compound field names, these functions are only called for
-    the first component of the field name; Subsequent components are
-    handled through normal attribute and indexing operations.
+    'get_value' is used to retrieve a given field value.  The 'key' argument
+    will be either an integer or a string.  If it is an integer, it represents
+    the index of the positional argument in 'args'; if it is a string, then
+    it represents a named argument in 'kwargs'.
    
-    So for example, the field expression '0.name' would cause
-    'get_positional' to be called with the parameter 'args' set to the
-    list of positional arguments to vformat, and 'index' set to zero;
-    the returned value would then be passed to the standard 'getattr'
-    function to get the 'name' attribute.
+    The 'args' parameter is set to the list of positional arguments to
+    'vformat', and the 'kwargs' parameter is set to the dictionary of
+    keyword arguments.
+    
+    For compound field names, this function is only called for the
+    first component of the field name; subsequent components are handled
+    through normal attribute and indexing operations.
+    
+    So for example, the field expression '0.name' would cause 'get_value'
+    to be called with a 'key' argument of 0.  The 'name' attribute will be
+    looked up after 'get_value' returns by calling the built-in 'getattr'
+    function.

    If the index or keyword refers to an item that does not exist, then an
-    IndexError/KeyError will be raised.
+    IndexError/KeyError should be raised.
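The lookup behaviour described above can be observed with the string.Formatter class in today's standard library, which is assumed here to match the proposal. TracingFormatter and User are hypothetical names introduced only for this sketch.

```python
from collections import namedtuple
from string import Formatter


class TracingFormatter(Formatter):
    """Records every key passed to get_value (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def get_value(self, key, args, kwargs):
        self.keys.append(key)
        return super().get_value(key, args, kwargs)


User = namedtuple("User", ["name"])

fmt = TracingFormatter()
out = fmt.format("{0.name} has {items[1]}", User("Fred"), items=["a", "b"])

# Only the first component of each compound name reaches get_value;
# '.name' and '[1]' are resolved afterwards.
print(out, fmt.keys)
```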
    
    'check_unused_args' is used to implement checking for unused arguments
    if desired.  The arguments to this function are the set of all argument
@ -444,17 +489,13 @@ Formatter Methods
    to the args and kwargs that were passed to vformat.  The supplied
    arguments minus the used ones will be the set of unused args.  'check_unused_args'
    is assumed to throw an exception if the check fails.
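One possible way to implement such a check is sketched below against the string.Formatter class in the current standard library. StrictFormatter and its choice of ValueError are assumptions made for this example, not something the text prescribes.

```python
from string import Formatter


class StrictFormatter(Formatter):
    """Rejects calls that pass arguments the format string never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        # Every positional index and every keyword name that was supplied.
        supplied = set(range(len(args))) | set(kwargs)
        unused = supplied - set(used_args)
        if unused:
            raise ValueError(
                "unused arguments: %r" % sorted(unused, key=str))


ok = StrictFormatter().format("{0} {x}", 1, x=2)
print(ok)
```

Calling StrictFormatter().format("{0}", 1, 2) raises ValueError in this sketch because positional argument 1 is never referenced.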
-
-    'format_field' actually generates the text for a replacement field.
-    The 'value' argument corresponds to the value being formatted, which
-    was retrieved from the arguments using the field name.  The
-    'conversion' argument is the conversion spec part of the field, which
-    will be either a string or unicode object, depending on the type of
-    the original format string.
    
+    'format_field' simply calls the global 'format' built-in.  The method
+    is provided so that subclasses can override it.
+
    To get a better understanding of how these functions relate to each
    other, here is pseudocode that explains the general operation of
-    vformat:
+    vformat.
    
        def vformat(format_string, args, kwargs):
        
@ -465,22 +506,36 @@ Formatter Methods
          # Tokens are either format fields or literal strings
          for token in self.parse(format_string):
            if is_format_field(token):
-              field_spec, conversion_spec = token.rsplit(":", 2)
+              # Split the token into field value and format spec
+              field_spec, _, format_spec = token.partition(":")
+              
+              # Check for explicit type conversion
+              field_spec, _, explicit = field_spec.partition("!")
              
              # 'first_part' is the part before the first '.' or '['
-              first_part = get_first_part(token)
+              # Assume that 'get_first_part' returns either an int or
+              # a string, depending on the syntax.
+              first_part = get_first_part(field_spec)
+              value = self.get_value(first_part, args, kwargs)
+              
+              # Record the fact that we used this arg
              used_args.add(first_part)
-              if is_positional(first_part):
-                value = self.get_positional(args, first_part) 
-              else:
-                value = self.get_named(kwargs, first_part)
-                
-              # Handle [subfield] or .subfield
-              for comp in components(token):
+              
+              # Handle [subfield] or .subfield. Assume that 'components'
+              # returns an iterator of the various subfields, not including
+              # the first part.
+              for comp in components(field_spec):
                value = resolve_subfield(value, comp)

-              # Write out the converted value
-              buffer.write(format_field(value, conversion))
+              # Handle explicit type conversion
+              if explicit == 'r':
+                value = repr(value)
+              elif explicit == 's':
+                value = str(value)
+
+              # Call the global 'format' function and write out the converted
+              # value.
+              buffer.write(self.format_field(value, format_spec))
              
            else:
              buffer.write(token)
@ -488,10 +543,12 @@ Formatter Methods
          self.check_unused_args(used_args, args, kwargs)
          return buffer.getvalue()
          
-    Note that the actual algorithm of the Formatter class may not be the
-    one presented here.  In particular, the final implementation of
-    the Formatter class may define additional overridable methods and
-    hooks.  Also, the final implementation will be written in C.
+    Note that the actual algorithm of the Formatter class (which will be
+    implemented in C) may not be the one presented here.  (It's likely
+    that the actual implementation won't be a 'class' at all - rather,
+    vformat may just call a C function which accepts the other overridable
+    methods as arguments.)  The primary purpose of this code example is to
+    illustrate the order in which overridable methods are called.
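For readers who want to run something close to the pseudocode, here is a simplified sketch built on the parse() and get_field() helpers of today's string.Formatter. It is an approximation under stated assumptions: simple_vformat is a hypothetical name, nested replacement fields inside a format specifier are not expanded, and unused-argument checking is omitted.

```python
from string import Formatter


def simple_vformat(format_string, args, kwargs):
    """Simplified walk over the fields of a format string."""
    helper = Formatter()
    pieces = []
    for literal, field_name, format_spec, conversion in helper.parse(format_string):
        # Literal text that precedes the field (possibly empty).
        pieces.append(literal)
        if field_name is None:
            continue  # trailing literal with no field after it

        # Resolve the first component, then any .attr or [index] parts.
        value, _first = helper.get_field(field_name, args, kwargs)

        # Explicit conversion happens before formatting.
        if conversion == "r":
            value = repr(value)
        elif conversion == "s":
            value = str(value)

        pieces.append(format(value, format_spec))
    return "".join(pieces)


result = simple_vformat("{0!r:>9}|{x[1]}", ("Hi",), {"x": [1, 2]})
print(result)
```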


 Customizing Formatters
@ -512,12 +569,15 @@ Customizing Formatters
              Formatter.__init__(self, flags)
              self.namespace = namespace

-          def get_named(self, kwds, name):
-              try:
-                  # Check explicitly passed arguments first
-                  return kwds[name]
-            except KeyError:
-                  return self.namespace[name]
+          def get_value(self, key, args, kwds):
+              if isinstance(key, str):
+                  try:
+                      # Check explicitly passed arguments first
+                      return kwds[key]
+                  except KeyError:
+                      # Fall back to the namespace given at construction
+                      return self.namespace[key]
+              else:
+                  return Formatter.get_value(self, key, args, kwds)
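A runnable variant of this idea, written against the string.Formatter class in the current standard library, might look like the sketch below. Note the assumptions: the standard class takes no 'flags' argument, so the constructor here accepts only a namespace, and the sample namespace contents are invented for the example.

```python
from string import Formatter


class NamespaceFormatter(Formatter):
    """Looks up missing keyword names in a fallback namespace."""

    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = namespace if namespace is not None else {}

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Explicitly passed keyword arguments win
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional: defer to the base class
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"greeting": "Hello", "name": "World"})
result = fmt.format("{greeting}, {name}! {0}", "Bye", name="Fred")
print(result)
```

Here 'greeting' comes from the namespace, 'name' from the explicit keyword argument, and field 0 from the positional arguments.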

    One can use this to easily create a formatting function that allows
    access to global variables, for example:
@ -608,7 +668,7 @@ Alternate Syntax
      Dungeons) such as MUSH have used brackets (e.g. [name]) to do
      string interpolation.  The Microsoft .Net libraries uses braces
      ({}), and a syntax which is very similar to the one in this
-      proposal, although the syntax for conversion specifiers is quite
+      proposal, although the syntax for format specifiers is quite
      different. [4]

    - Backquoting.  This method has the benefit of minimal syntactical
@ -631,7 +691,7 @@ Alternate Syntax
    in front of a bracket.

    2) The use of the colon character (':') as a separator for
-    conversion specifiers.  This was chosen simply because that's
+    format specifiers.  This was chosen simply because that's
    what .Net uses.