Latest updates to PEP 3101, incorporating recent discussions.

2007-08-15 00:14:29 +00:00 · 2007-08-15 00:14:29 +00:00 · 8089f06be4
parent 0801f23a05
commit 8089f06be4
1 changed file with 171 additions and 111 deletions
--- a/pep-3101.txt
+++ b/pep-3101.txt
@@ -8,7 +8,7 @@ Type: Standards Track
 Content-Type: text/plain
 Created: 16-Apr-2006
 Python-Version: 3.0
-Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006
+Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2007, 14-Aug-2007
 Abstract
@@ -105,6 +105,13 @@ String Methods
    identified by its keyword name, so in the above example, 'c' is
    used to refer to the third argument.
    There is also a global built-in function, 'format' which formats
    a single value:
       print(format(10.0, "7.3g"))
    This function is described in a later section.
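    A minimal sketch of the built-in described above, assuming the
    behavior that eventually shipped in Python 3.  The variable names
    are illustrative only.

```python
# format() applies one format specifier to one value.
padded = format(10.0, "7.3g")   # general format, 3 significant digits, width 7
plain = format(10.0, "")        # an empty specifier falls back to str()

print(repr(padded))   # '     10'
print(repr(plain))    # '10.0'
```

    Numbers are right-aligned by default, which is why the padding
    appears on the left.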
 Format Strings
@@ -136,7 +143,7 @@ Format Strings
    The element within the braces is called a 'field'.  Fields consist
    of a 'field name', which can either be simple or compound, and an
-    optional 'conversion specifier'.
+    optional 'format specifier'.
 Simple and Compound Field Names
@@ -159,7 +166,7 @@ Simple and Compound Field Names
    expressions in format strings.  This is by design - the types of
    expressions that you can use are deliberately limited.  Only two operators
    are supported: the '.' (getattr) operator, and the '[]' (getitem)
-    operator.  The reason for allowing these operators is that they dont'
+    operator.  The reason for allowing these operators is that they don't
    normally have side effects in non-pathological code.
    An example of the 'getitem' syntax:
@@ -180,26 +187,26 @@ Simple and Compound Field Names
    not required to enforce the rule about a name being a valid
    Python identifier.  Instead, it will rely on the getattr function
    of the underlying object to throw an exception if the identifier
-    is not legal.  The format function will have a minimalist parser
+    is not legal.  The str.format() function will have a minimalist
-    which only attempts to figure out when it is "done" with an
+    parser which only attempts to figure out when it is "done" with an
    identifier (by finding a '.' or a ']', or '}', etc.).
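    The two supported operators can be tried out as follows.  This is
    only a sketch against the str.format() that shipped in Python 3;
    'user', 'scores' and 'letters' are hypothetical values made up for
    the illustration.

```python
from types import SimpleNamespace

user = SimpleNamespace(name="Fred")   # hypothetical object with a 'name' attribute
scores = {"math": 92}                 # hypothetical mapping
letters = ["a", "b", "c"]             # hypothetical sequence

# '.' performs getattr and '[]' performs getitem; keys inside '[]' are unquoted.
line = "{0.name} scored {1[math]}".format(user, scores)

# An all-digit subscript is treated as an integer index.
second = "{0[1]}".format(letters)

print(line)     # Fred scored 92
print(second)   # b
```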
-Conversion Specifiers
+Format Specifiers
-    Each field can also specify an optional set of 'conversion
+    Each field can also specify an optional set of 'format
    specifiers' which can be used to adjust the format of that field.
-    Conversion specifiers follow the field name, with a colon (':')
+    Format specifiers follow the field name, with a colon (':')
    character separating the two:
        "My name is {0:8}".format('Fred')
-    The meaning and syntax of the conversion specifiers depends on the
+    The meaning and syntax of the format specifiers depends on the
    type of object that is being formatted, however there is a
-    standard set of conversion specifiers used for any object that
+    standard set of format specifiers used for any object that
    does not override them.
-    Conversion specifiers can themselves contain replacement fields.
+    Format specifiers can themselves contain replacement fields.
    For example, a field whose field width is itself a parameter
    could be specified via:
@@ -211,24 +218,21 @@ Conversion Specifiers
    *outside* of a format field.  Within a format field, the brace
    characters always have their normal meaning.
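    A short sketch of both points: a replacement field nested inside a
    format specifier, and literal braces outside a field.  It assumes
    the doubled-brace escape that shipped in Python 3.

```python
# The inner field {1} supplies the width for the outer field {0}.
padded = "{0:{1}}".format("Fred", 8)

# Doubled braces outside a field produce literal brace characters.
literal = "{{{0}}}".format("x")

print(repr(padded))   # 'Fred    '
print(literal)        # {x}
```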
-    The syntax for conversion specifiers is open-ended, since a class
+    The syntax for format specifiers is open-ended, since a class
-    can override the standard conversion specifiers.  In such cases,
+    can override the standard format specifiers.  In such cases,
-    the format() method merely passes all of the characters between
+    the str.format() method merely passes all of the characters between
    the first colon and the matching brace to the relevant underlying
    formatting method.
-Standard Conversion Specifiers
+Standard Format Specifiers
-    If an object does not define its own conversion specifiers, a
+    If an object does not define its own format specifiers, a
-    standard set of conversion specifiers are used.  These are similar
+    standard set of format specifiers are used.  These are similar
-    in concept to the conversion specifiers used by the existing '%'
+    in concept to the format specifiers used by the existing '%'
-    operator, however there are also a number of significant
+    operator, however there are also a number of differences.
-    differences.  The standard conversion specifiers fall into three
-    major categories: string conversions, integer conversions and
-    floating point conversions.
-    The general form of a standard conversion specifier is:
+    The general form of a standard format specifier is:
        [[fill]align][sign][width][.precision][type]
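    A few hedged examples of how the pieces of this grammar combine,
    checked against the Python 3 implementation rather than against
    this draft.  The dictionary keys are just labels.

```python
examples = {
    "fill, align, width": format(42, "*>8d"),    # pad with '*' on the left to width 8
    "sign and precision": format(3.14159, "+.2f"),
    "centered string": format("abc", "^7"),
    "precision as max size": format("abcdef", ".3"),
}

for label, text in examples.items():
    print(label, repr(text))
```

    For non-number types the precision acts as a maximum field size,
    so the last entry is truncated to 'abc'.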
@@ -265,8 +269,6 @@ Standard Conversion Specifiers
               numbers (this is the default behavior)
        ' '  - indicates that a leading space should be used on
               positive numbers
-        '()' - indicates that negative numbers should be surrounded
-               by parentheses
    'width' is a decimal integer defining the minimum field width.  If
    not specified, then the field width will be determined by the
@@ -274,25 +276,16 @@ Standard Conversion Specifiers
    The 'precision' is a decimal number indicating how many digits
    should be displayed after the decimal point in a floating point
-    conversion.  In a string conversion the field indicates how many
+    conversion.  For non-number types the field indicates the maximum
-    characters will be used from the field content.  The precision is
+    field size - in other words, how many characters will be used from
-    ignored for integer conversions.
+    the field content.  The precision is ignored for integer conversions.
    Finally, the 'type' determines how the data should be presented.
    If the type field is absent, an appropriate type will be assigned
    based on the value to be formatted ('d' for integers and longs,
-    'g' for floats, and 's' for everything else.)
+    'g' for floats and decimals, and 's' for everything else.)
-    The available string conversion types are:
+    The available integer presentation types are:
-        's' - String format.  Invokes str() on the object.
-              This is the default conversion specifier type.
-        'r' - Repr format.  Invokes repr() on the object.
-    There are several integer conversion types.  All invoke int() on
-    the object before attempting to format it.
-    The available integer conversion types are:
        'b' - Binary. Outputs the number in base 2.
        'c' - Character. Converts the integer to the corresponding
@@ -304,10 +297,7 @@ Standard Conversion Specifiers
        'X' - Hex format. Outputs the number in base 16, using upper-
              case letters for the digits above 9.
-    There are several floating point conversion types.  All invoke
+    The available floating point presentation types are:
-    float() on the object before attempting to format it.
-    The available floating point conversion types are:
        'e' - Exponent notation. Prints the number in scientific
              notation using the letter 'e' to indicate the exponent.
@@ -327,39 +317,88 @@ Standard Conversion Specifiers
        '%' - Percentage. Multiplies the number by 100 and displays
              in fixed ('f') format, followed by a percent sign.
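    The presentation types listed above can be exercised directly with
    the built-in format().  The values below are arbitrary, and the
    results reflect the Python 3 implementation.

```python
n = 255

binary = format(n, "b")       # base 2
octal = format(n, "o")        # base 8
lower_hex = format(n, "x")    # base 16, lower-case digits
upper_hex = format(n, "X")    # base 16, upper-case digits
char = format(65, "c")        # integer to the corresponding character

sci = format(1234.5, "e")     # exponent notation
pct = format(0.25, "%")       # multiplied by 100, fixed format, percent sign

print(binary, octal, lower_hex, upper_hex, char)   # 11111111 377 ff FF A
print(sci, pct)                                    # 1.234500e+03 25.000000%
```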
-    Objects are able to define their own conversion specifiers to
+    For non-numeric types, there's only one presentation type, 's', which
    does nothing - its only purpose is to ease the transition from the
    old '%' style formatting.
    Objects are able to define their own format specifiers to
    replace the standard ones.  An example is the 'datetime' class,
-    whose conversion specifiers might look something like the
+    whose format specifiers might look something like the
    arguments to the strftime() function:
        "Today is: {0:a b d H:M:S Y}".format(datetime.now())
 Explicit Conversion Flag
    The explicit conversion flag is used to transform the format field value
    before it is formatted.  This can be used to override the type-specific
    formatting behavior, and format the value as if it were a more
    generic type.  Currently, two explicit conversion flags are
    recognized:
        !r - convert the value to a string using repr().
        !s - convert the value to a string using str().
    These flags are typically placed before the format specifier:
        "{0!r:20}".format("Hello")
    In the preceding example, the string "Hello" will be printed, with quotes,
    in a field of at least 20 characters width.
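    The effect of the two flags can be compared side by side.  This is
    a small sketch using the str.format() that shipped in Python 3.

```python
quoted = "{0!r:20}".format("Hello")   # repr() first, then pad to width 20
plain = "{0!s:20}".format("Hello")    # str() first, then pad to width 20

print(repr(quoted))
print(repr(plain))
```

    Both results are 20 characters wide and left-aligned, since the
    converted value is a string; only the first one carries the quotes
    added by repr().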
 Controlling Formatting on a Per-Type Basis
-    A class that wishes to implement a custom interpretation of its
+    Each Python type can control formatting of its instances by defining
-    conversion specifiers can implement a __format__ method:
+    a __format__ method.  The __format__ method is responsible for
    interpreting the format specifier, formatting the value, and
    returning the resulting string.
-    class AST:
+    The new, global built-in function 'format' simply calls this special
-        def __format__(self, specifiers):
+    method, similar to how len() and str() simply call their respective
-            ...
+    special methods:
-    The 'specifiers' argument will be either a string object or a
+        def format(value, format_spec):
-    unicode object, depending on the type of the original format
+            return value.__format__(format_spec)
-    string.  The __format__ method should test the type of the
+            
-    specifiers parameter to determine whether to return a string or
+    It is safe to call this function with a value of "None" (because the
    "None" value in Python is an object and can have methods.)
    Several built-in types, including 'str', 'int', 'float', and 'object'
    define __format__ methods.  This means that if you derive from any of
    those types, your class will know how to format itself.
    The object.__format__ method is the simplest: It simply converts the
    object to a string, and then calls format again:
        class object:
            def __format__(self, format_spec):
                return format(str(self), format_spec)
    The __format__ methods for 'int' and 'float' will do numeric formatting
    based on the format specifier.  In some cases, these formatting
    operations may be delegated to other types.  So for example, in the case
    where the 'int' formatter sees a format type of 'f' (meaning 'float')
    it can simply cast the value to a float and call format() again.
    Any class can override the __format__ method to provide custom
    formatting for that type:
        class AST:
            def __format__(self, format_spec):
                ...
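    As an illustration of such an override, here is a sketch with a
    hypothetical 'Celsius' class (not part of this proposal) that
    interprets a trailing 'F' in the format specifier itself and
    delegates the rest to float.  The datetime lines rely on the
    standard library's datetime.__format__, which in the released
    Python 3 takes strftime() directives with their '%' prefix.

```python
from datetime import datetime


class Celsius:
    """Hypothetical type used only to illustrate __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # A trailing 'F' means "convert to Fahrenheit"; whatever precedes it
        # is handed to float's own formatter.
        if format_spec.endswith("F"):
            value = self.degrees * 9 / 5 + 32
            return format(value, format_spec[:-1]) + " F"
        return format(self.degrees, format_spec) + " C"


boiling = Celsius(100.0)
via_builtin = format(boiling, ".1f")       # the built-in calls __format__
via_method = "{0:.0fF}".format(boiling)    # str.format() passes '.0fF' through

moment = datetime(2007, 8, 14, 17, 30, 0)  # fixed timestamp so the output is stable
stamp = "{0:%Y-%m-%d %H:%M:%S}".format(moment)

print(via_builtin)   # 100.0 C
print(via_method)    # 212 F
print(stamp)         # 2007-08-14 17:30:00
```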
    Note for Python 2.x: The 'format_spec' argument will be either
    a string object or a unicode object, depending on the type of the
    original format string.  The __format__ method should test the type
    of the 'format_spec' parameter to determine whether to return a string or
    unicode object.  It is the responsibility of the __format__ method
    to return an object of the proper type.
-    string.format() will format each field using the following steps:
+    Note that the 'explicit conversion' flag mentioned above is not passed
-
+    to the __format__ method.  Rather, it is expected that the conversion
-     1) See if the value to be formatted has a __format__ method.  If
+    specified by the flag will be performed before calling __format__.
-        it does, then call it.
-     2) Otherwise, check the internal formatter within string.format
-        that contains knowledge of certain builtin types.
-     3) Otherwise, call str() or unicode() as appropriate.
 User-Defined Formatting
@@ -418,24 +457,30 @@ Formatter Methods
    Formatter defines the following overridable methods:
-        -- get_positional(args, index)
+        -- get_value(key, args, kwargs)
-        -- get_named(kwds, name)
        -- check_unused_args(used_args, args, kwargs)
-        -- format_field(value, conversion)
+        -- format_field(value, format_spec)
-    'get_positional' and 'get_named' are used to retrieve a given field
+    'get_value' is used to retrieve a given field value.  The 'key' argument
-    value.  For compound field names, these functions are only called for
+    will be either an integer or a string.  If it is an integer, it represents
-    the first component of the field name; Subsequent components are
+    the index of the positional argument in 'args'; if it is a string, then
-    handled through normal attribute and indexing operations.
+    it represents a named argument in 'kwargs'.
-    So for example, the field expression '0.name' would cause
+    The 'args' parameter is set to the list of positional arguments to
-    'get_positional' to be called with the parameter 'args' set to the
+    'vformat', and the 'kwargs' parameter is set to the dictionary of
-    list of positional arguments to vformat, and 'index' set to zero;
+    keyword arguments.
-    the returned value would then be passed to the standard 'getattr'
+    
-    function to get the 'name' attribute.
+    For compound field names, this function is only called for the
    first component of the field name; subsequent components are handled
    through normal attribute and indexing operations.
    So for example, the field expression '0.name' would cause 'get_value'
    to be called with a 'key' argument of 0.  The 'name' attribute will be
    looked up after 'get_value' returns by calling the built-in 'getattr'
    function.
    If the index or keyword refers to an item that does not exist, then an
-    IndexError/KeyError will be raised.
+    IndexError/KeyError should be raised.
    'check_unused_args' is used to implement checking for unused arguments
    if desired.  The arguments to this function are the set of all argument
@@ -445,16 +490,12 @@ Formatter Methods
    of these two sets will be the set of unused args.  'check_unused_args'
    is assumed to throw an exception if the check fails.
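    A sketch of such a check, written against string.Formatter as it
    exists in the Python 3 standard library.  'StrictFormatter' is a
    hypothetical name, and raising ValueError is an arbitrary choice
    made for the example.

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical subclass that rejects arguments no field refers to."""

    def check_unused_args(self, used_args, args, kwargs):
        # Positional arguments are identified by index, keyword arguments
        # by name; 'used_args' holds whichever keys the fields referred to.
        all_args = set(range(len(args))) | set(kwargs)
        unused = all_args - set(used_args)
        if unused:
            raise ValueError(
                "unused arguments: {0!r}".format(sorted(unused, key=str)))


strict = StrictFormatter()
ok = strict.format("{0} and {name}", "a", name="b")

try:
    strict.format("{0}", "a", "extra")
    raised = False
except ValueError:
    raised = True

print(ok)       # a and b
print(raised)   # True
```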
-    'format_field' actually generates the text for a replacement field.
+    'format_field' simply calls the global 'format' built-in.  The method
-    The 'value' argument corresponds to the value being formatted, which
+    is provided so that subclasses can override it.
-    was retrieved from the arguments using the field name.  The
-    'conversion' argument is the conversion spec part of the field, which
-    will be either a string or unicode object, depending on the type of
-    the original format string.
    To get a better understanding of how these functions relate to each
    other, here is pseudocode that explains the general operation of
-    vformat:
+    vformat.
        def vformat(format_string, args, kwargs):
@@ -465,22 +506,36 @@ Formatter Methods
          # Tokens are either format fields or literal strings
          for token in self.parse(format_string):
            if is_format_field(token):
-              field_spec, conversion_spec = token.rsplit(":", 2)
+              # Split the token into field value and format spec
              field_spec, _, format_spec = token.partition(":")
              # Check for explicit type conversion
              field_spec, _, explicit = field_spec.partition("!")
              # 'first_part' is the part before the first '.' or '['
-              first_part = get_first_part(token)
+              # Assume that 'get_first_part' returns either an int or
-              used_args.add(first_part)
+              # a string, depending on the syntax.
-              if is_positional(first_part):
+              first_part = get_first_part(field_spec)
-                value = self.get_positional(args, first_part) 
+              value = self.get_value(first_part, args, kwargs)
-              else:
-                value = self.get_named(kwargs, first_part)
-              # Handle [subfield] or .subfield
+              # Record the fact that we used this arg
-              for comp in components(token):
+              used_args.add(first_part)
              # Handle [subfield] or .subfield. Assume that 'components'
              # returns an iterator of the various subfields, not including
              # the first part.
              for comp in components(field_spec):
                value = resolve_subfield(value, comp)
-              # Write out the converted value
+              # Handle explicit type conversion
-              buffer.write(format_field(value, conversion))
+              if explicit == 'r':
                value = repr(value)
              elif explicit == 's':
                value = str(value)
              # Call the global 'format' function and write out the converted
              # value.
              buffer.write(self.format_field(value, format_spec))
            else:
              buffer.write(token)
@@ -488,10 +543,12 @@ Formatter Methods
          self.check_unused_args(used_args, args, kwargs)
          return buffer.getvalue()
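    For comparison, the same order of operations can be approximated
    in runnable form on top of string.Formatter from the Python 3
    standard library.  'sketch_vformat' is a hypothetical helper, not
    the proposed implementation: it leans on Formatter.parse() and
    Formatter.get_field(), and it ignores nested replacement fields,
    automatic numbering and the unused-argument check.

```python
import string


def sketch_vformat(format_string, args, kwargs):
    """Simplified illustration of the vformat steps; see caveats above."""
    helper = string.Formatter()
    pieces = []
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when a chunk of literal text has no field after it.
    for literal, field_name, format_spec, conversion in helper.parse(format_string):
        pieces.append(literal)
        if field_name is None:
            continue
        # Resolve the first part and then any '.attr' or '[key]' components.
        value, _first_part = helper.get_field(field_name, args, kwargs)
        # Apply the explicit conversion before formatting.
        if conversion == "r":
            value = repr(value)
        elif conversion == "s":
            value = str(value)
        # Hand the remaining specifier to the global format() built-in.
        pieces.append(format(value, format_spec))
    return "".join(pieces)


template = "{0!r:>8}|{name.real}|{{}}"
sketched = sketch_vformat(template, ("hi",), {"name": 5})
builtin = template.format("hi", name=5)

print(repr(sketched))
print(sketched == builtin)
```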
-    Note that the actual algorithm of the Formatter class may not be the
+    Note that the actual algorithm of the Formatter class (which will be
-    one presented here.  In particular, the final implementation of
+    implemented in C) may not be the one presented here.  (It's likely
-    the Formatter class may define additional overridable methods and
+    that the actual implementation won't be a 'class' at all - rather,
-    hooks.  Also, the final implementation will be written in C.
+    vformat may just call a C function which accepts the other overridable
    methods as arguments.)  The primary purpose of this code example is to
    illustrate the order in which overridable methods are called.
 Customizing Formatters
@@ -512,12 +569,15 @@ Customizing Formatters
              Formatter.__init__(self, flags)
              self.namespace = namespace
-          def get_named(self, kwds, name):
+          def get_value(self, key, args, kwds):
-              try:
+              if isinstance(key, str):
-                  # Check explicitly passed arguments first
+                  try:
-                  return kwds[name]
+                      # Check explicitly passed arguments first
-            except KeyError:
+                      return kwds[key]
-                  return self.namespace[name]
+                  except KeyError:
                      return self.namespace[key]
              else:
                  return Formatter.get_value(self, key, args, kwds)
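    A runnable variant of the class above, adapted to string.Formatter
    as it shipped in the Python 3 standard library.  That class takes
    no 'flags' argument, so the constructor here differs from the
    listing; the sample namespace is made up for the illustration.

```python
import string


class NamespaceFormatter(string.Formatter):
    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = namespace if namespace is not None else {}

    def get_value(self, key, args, kwargs):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwargs[key]
            except KeyError:
                return self.namespace[key]
        # Integer keys are positional; defer to the default lookup.
        return super().get_value(key, args, kwargs)


fmt = NamespaceFormatter({"greeting": "hello"})   # hypothetical namespace
from_namespace = fmt.format("{greeting}, {0}!", "world")
overridden = fmt.format("{greeting}, {0}!", "world", greeting="hi")

print(from_namespace)   # hello, world!
print(overridden)       # hi, world!
```

    Passing globals() as the namespace would give field names access
    to module-level variables.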
    One can use this to easily create a formatting function that allows
    access to global variables, for example:
@@ -608,7 +668,7 @@ Alternate Syntax
      Dungeons) such as MUSH have used brackets (e.g. [name]) to do
      string interpolation.  The Microsoft .Net libraries use braces
      ({}), and a syntax which is very similar to the one in this
-      proposal, although the syntax for conversion specifiers is quite
+      proposal, although the syntax for format specifiers is quite
      different. [4]
    - Backquoting.  This method has the benefit of minimal syntactical
@@ -631,7 +691,7 @@ Alternate Syntax
    in front of a bracket.
    2) The use of the colon character (':') as a separator for
-    conversion specifiers.  This was chosen simply because that's
+    format specifiers.  This was chosen simply because that's
    what .Net uses.