added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments

David Goodger 2006-04-26 20:33:25 +00:00
parent a30697598f
commit 0ac7bc2d0c
3 changed files with 535 additions and 1 deletions


@ -104,6 +104,8 @@ Index by Category
S 358 The "bytes" Object Schemenauer
S 359 The "make" Statement Bethard
S 754 IEEE 754 Floating Point Special Values Warnes
S 3101 Advanced String Formatting Talin
S 3102 Keyword-Only Arguments Talin
Finished PEPs (done, implemented in Subversion)
@ -425,7 +427,8 @@ Numerical Index
P 3002 Procedure for Backwards-Incompatible Changes Bethard
I 3099 Things that will Not Change in Python 3000 Brandl
I 3100 Python 3.0 Plans Kuchling, Cannon
S 3101 Advanced String Formatting Talin
S 3102 Keyword-Only Arguments Talin
Key
@ -522,6 +525,7 @@ Owners
Smith, Kevin D. Kevin.Smith@theMorgue.org
Stein, Greg gstein@lyra.org
Suzi, Roman rnd@onego.ru
Talin talin at acm.org
Taschuk, Steven staschuk@telusplanet.net
Tirosh, Oren oren at hishome.net
Warnes, Gregory R. warnes@users.sourceforge.net

pep-3101.txt Normal file

@ -0,0 +1,346 @@
PEP: 3101
Title: Advanced String Formatting
Version: $Revision$
Last-Modified: $Date$
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History:
Abstract
This PEP proposes a new system for built-in string formatting
operations, intended as a replacement for the existing '%' string
formatting operator.
Rationale
Python currently provides two methods of string interpolation:
- The '%' operator for strings.
- The string.Template module.
The scope of this PEP will be restricted to proposals for built-in
string formatting operations (in other words, methods of the
built-in string type). This does not obviate the need for more
sophisticated string-manipulation modules in the standard library
such as string.Template. In any case, string.Template will not be
discussed here, except to say that this proposal will most
likely have some overlapping functionality with that module.
The '%' operator is primarily limited by the fact that it is a
binary operator, and therefore can take at most two arguments.
One of those arguments is already dedicated to the format string,
leaving all other variables to be squeezed into the remaining
argument. The current practice is to use either a dictionary or a
tuple as the second argument, but as many people have commented
[1], this lacks flexibility. The "all or nothing" approach
(meaning that one must choose between only positional arguments,
or only named arguments) is felt to be overly constraining.
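The constraint described above can be seen in a short sketch.  The
variable names and sample values are illustrative only and do not
come from the referenced discussion:

```python
# Positional values must all be packed into one tuple...
by_position = "%s is %d years old" % ("Fred", 42)

# ...or named values must all be packed into one dictionary.  The two
# styles cannot be mixed within a single '%' operation.
by_name = "%(name)s is %(age)d years old" % {"name": "Fred", "age": 42}

# Because a tuple on the right-hand side is unpacked as the argument
# list, formatting a tuple as a single value requires wrapping it in
# another one-element tuple.
point = (3, 4)
wrapped = "point = %s" % (point,)

print(by_position)
print(by_name)
print(wrapped)
```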
Specification
The specification will consist of 4 parts:
- Specification of a set of methods to be added to the built-in
string class.
- Specification of a new syntax for format strings.
- Specification of a new set of class methods to control the
formatting and conversion of objects.
- Specification of an API for user-defined formatting classes.
String Methods
The built-in string class will gain two new methods. The first
method is 'format', and takes an arbitrary number of positional
and keyword arguments:
"The story of {0}, {1}, and {c}".format(a, b, c=d)
Within a format string, each positional argument is identified
with a number, starting from zero, so in the above example, 'a' is
argument 0 and 'b' is argument 1. Each keyword argument is
identified by its keyword name, so in the above example, 'c' is
used to refer to the third argument.
The result of the format call is an object of the same type
(string or unicode) as the format string.
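As a small illustration of the call shown above, the following
sketch binds 'a', 'b', and 'd' to sample values (chosen here purely
for demonstration) and runs on a Python in which str.format() is
available:

```python
# Sample values; the names a, b, and d mirror the example in the text.
a, b, d = "Larry", "Moe", "Curly"

# {0} and {1} select positional arguments; {c} selects the keyword
# argument named 'c'.
result = "The story of {0}, {1}, and {c}".format(a, b, c=d)
print(result)
```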
Format Strings
Brace characters ('curly braces') are used to indicate a
replacement field within the string:
"My name is {0}".format('Fred')
The result of this is the string:
"My name is Fred"
Braces can be escaped using a backslash:
"My name is {0} :-\{\}".format('Fred')
Which would produce:
"My name is Fred :-{}"
The element within the braces is called a 'field'. Fields consist
of a name, which can either be simple or compound, and an optional
'conversion specifier'.
Simple names are either names or numbers. If numbers, they must
be valid decimal numbers; if names, they must be valid Python
identifiers. A number is used to identify a positional argument,
while a name is used to identify a keyword argument.
Compound names are a sequence of simple names separated by
periods:
"My name is {0.name} :-\{\}".format(dict(name='Fred'))
Compound names can be used to access specific dictionary entries,
array elements, or object attributes. In the above example, the
'{0.name}' field refers to the dictionary entry 'name' within
positional argument 0.
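For comparison, here is a sketch of compound names as they behave
in the str.format() that eventually shipped with Python, which
differs from this draft in two ways: a period selects an attribute
while square brackets select a dictionary entry or sequence element,
and literal braces are written by doubling them rather than with a
backslash.  The sample objects are invented for illustration:

```python
from types import SimpleNamespace

person = SimpleNamespace(name="Fred")   # object with a 'name' attribute
record = {"name": "Fred"}               # dictionary with a 'name' entry
items = ["spam", "eggs"]                # sequence indexed by number

by_attribute = "My name is {0.name}".format(person)
by_key = "My name is {0[name]}".format(record)
by_index = "I ordered {0[1]}".format(items)

# Doubled braces produce literal braces in the output.
escaped = "My name is {0} :-{{}}".format("Fred")

print(by_attribute, by_key, by_index, escaped, sep="\n")
```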
Each field can also specify an optional set of 'conversion
specifiers'. Conversion specifiers follow the field name, with a
colon (':') character separating the two:
"My name is {0:8}".format('Fred')
The meaning and syntax of the conversion specifiers depend on the
type of object that is being formatted; however, many of the
built-in types will recognize a standard set of conversion
specifiers.
The conversion specifier consists of a sequence of zero or more
characters, each of which can consist of any printable character
except for a non-escaped '}'. The format() method does not
attempt to interpret the conversion specifiers in any way; it
merely passes all of the characters between the first colon ':'
and the matching right brace ('}') to the various underlying
formatters (described later.)
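The pass-through behavior can be sketched with the format() built-in
found in released versions of Python, which hands a specifier string
to the value's own formatter.  Its use here is an illustration of
the idea, not part of this proposal:

```python
# The text after the colon ("8") is not interpreted by format(); it is
# handed to the formatter for the value's type.  For strings, a bare
# number is a minimum field width, padded on the right with spaces.
padded = "My name is {0:8}|".format("Fred")

# The same specifier applied directly to the value.
direct = format("Fred", "8")

print(padded)
print(repr(direct))
```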
When using the 'fformat' variant, it is possible to omit the field
name entirely, and simply include the conversion specifiers:
"My name is {:pad(23)}"
This syntax is used to send special instructions to the custom
formatter object (such as instructing it to insert padding
characters up to a given column.) The interpretation of this
'empty' field is entirely up to the custom formatter; no
standard interpretation will be defined in this PEP.
If a custom formatter is not being used, then it is an error to
omit the field name.
Standard Conversion Specifiers
For most built-in types, the conversion specifiers will be the
same or similar to the existing conversion specifiers used with
the '%' operator. Thus, instead of '%02.2x', you will say
'{0:2.2x}'.
There are a few differences, however:
- The trailing letter is optional - you don't need to say '2.2d',
you can instead just say '2.2'. If the letter is omitted, the
value will be converted into its 'natural' form (that is, the
form that it would take if str() or unicode() were called on it)
subject to the field length and precision specifiers (if
supplied).
- Variable field width specifiers use a nested version of the {}
syntax, allowing the width specifier to be either a positional
or keyword argument:
"{0:{1}.{2}d}".format(a, b, c)
(Note: It might be easier to parse if these used a different
type of delimiter, such as parens - avoiding the need to create
a regex that handles the recursive case.)
- The support for length modifiers (which are ignored by Python
anyway) is dropped.
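The first two differences listed above can be tried out in a short
sketch.  It assumes the str.format() found in released versions of
Python, and it uses the 'f' conversion rather than 'd' because that
implementation rejects a precision on integer conversions.  The
sample values are arbitrary:

```python
# Width and precision taken from positional arguments 1 and 2.
nested = "{0:{1}.{2}f}".format(3.14159, 10, 3)

# Width taken from a keyword argument.
by_keyword = "{value:{width}}|".format(value="ab", width=5)

# Trailing letter omitted: the string keeps its natural form, subject
# to the field width (6) and precision (3, which truncates strings).
natural = "{0:6.3}|".format("abcdef")

print(repr(nested))
print(by_keyword)
print(natural)
```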
For non-built-in types, the conversion specifiers will be specific
to that type. An example is the 'datetime' class, whose
conversion specifiers are identical to the arguments to the
strftime() function:
"Today is: {0:%x}".format(datetime.now())
Controlling Formatting
A class that wishes to implement a custom interpretation of its
conversion specifiers can implement a __format__ method:
class AST:
    def __format__(self, specifiers):
        ...
The 'specifiers' argument will be either a string object or a
unicode object, depending on the type of the original format
string. The __format__ method should test the type of the
specifiers parameter to determine whether to return a string or
unicode object. It is the responsibility of the __format__ method
to return an object of the proper type.
string.format() will format each field using the following steps:
1) See if the value to be formatted has a __format__ method. If
it does, then call it.
2) Otherwise, check the internal formatter within string.format
that contains knowledge of certain builtin types.
3) Otherwise, call str() or unicode() as appropriate.
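A minimal sketch of a class that implements __format__ follows.
The Temperature class and its "F" suffix convention are invented for
illustration; the datetime line relies on the fact that, in released
versions of Python, datetime objects pass their specifier to
strftime():

```python
from datetime import datetime


class Temperature:
    """Hypothetical value type with its own conversion specifiers."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, specifiers):
        # A trailing "F" requests Fahrenheit; the rest of the specifier
        # is reused as an ordinary float format.
        if specifiers.endswith("F"):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, specifiers[:-1] + "f") + " F"
        # No specifier (or any other one): format the Celsius value.
        return format(self.celsius, specifiers or ".1f") + " C"


boiling = Temperature(100)
in_fahrenheit = "{0:.1F}".format(boiling)
in_celsius = "{0}".format(boiling)
date_text = "{0:%Y-%m-%d}".format(datetime(2006, 4, 26))

print(in_fahrenheit, in_celsius, date_text, sep="\n")
```

Because the specifier string is passed through untouched, the class
is free to define any mini-language it likes after the colon.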
User-Defined Formatting Classes
The code that interprets format strings can be called explicitly
from user code. This allows the creation of custom formatter
classes that can override the normal formatting rules.
The string and unicode classes will have a class method called
'cformat' that does all the actual work of formatting; the
format() method is just a wrapper that calls cformat.
The parameters to the cformat function are:
-- The format string (or unicode; the same function handles
both.)
-- A field format hook (see below)
-- A tuple containing the positional arguments
-- A dict containing the keyword arguments
The cformat function will parse all of the fields in the format
string, and return a new string (or unicode) with all of the
fields replaced with their formatted values.
For each field, the cformat function will attempt to call the
field format hook with the following arguments:
field_hook(value, conversion, buffer)
The 'value' argument corresponds to the value being formatted, which
was retrieved from the arguments using the field name. (The
field_hook has no control over the selection of values, only
how they are formatted.)
The 'conversion' argument is the conversion spec part of the
field, which will be either a string or unicode object, depending
on the type of the original format string.
The 'buffer' argument is a Python array object, either a byte
array or unicode character array. The buffer object will contain
the partially constructed string; the field hook is free to modify
the contents of this buffer if needed.
The field_hook will be called once per field. The field_hook may
take one of two actions:
1) Return False, indicating that the field_hook will not
process this field and the default formatting should be
used. This decision should be based on the type of the
value object, and the contents of the conversion string.
2) Append the formatted field to the buffer, and return True.
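The hook protocol described above might be sketched in pure Python
as follows.  This is only an approximation under several
assumptions: 'cformat' is written as a free function rather than a
class method, the buffer is a plain list of string fragments rather
than an array object, and the field parsing is borrowed from the
standard library's string.Formatter class (so it follows that
class's field syntax, not this draft's).  The 'upper_hook' function
and its "upper" conversion are hypothetical:

```python
import string


def cformat(fmt, field_hook, args, kwargs):
    """Format fmt, giving field_hook first refusal on every field."""
    parser = string.Formatter()
    buffer = []  # stands in for the byte/unicode array in the text
    for literal, name, conversion, _ in parser.parse(fmt):
        buffer.append(literal)
        if name is None:  # trailing literal text, no field follows
            continue
        # Value selection is not under the hook's control.
        value, _ = parser.get_field(name, args, kwargs)
        handled = field_hook is not None and field_hook(
            value, conversion, buffer)
        if not handled:
            # Hook returned False: fall back to default formatting.
            buffer.append(format(value, conversion))
    return "".join(buffer)


def upper_hook(value, conversion, buffer):
    """Hypothetical hook: handle only strings marked 'upper'."""
    if isinstance(value, str) and conversion == "upper":
        buffer.append(value.upper())
        return True
    return False


text = cformat("{0:upper} has {1:3d} items", upper_hook, ("Fred", 7), {})
print(text)
```

The hook decides per field, based on the value's type and the
conversion string, whether to take over; everything it declines goes
through the default path.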
Alternate Syntax
Naturally, one of the most contentious issues is the syntax of the
format strings, and in particular the markup conventions used to
indicate fields.
Rather than attempting to exhaustively list all of the various
proposals, I will cover the ones that are most widely used
already.
- Shell variable syntax: $name and $(name) (or in some variants,
${name}). This is probably the oldest convention out there, and
is used by Perl and many others. When used without the braces,
the length of the variable is determined by lexically scanning
until an invalid character is found.
This scheme is generally used in cases where interpolation is
implicit - that is, in environments where any string can contain
interpolation variables, and no special substitution function
need be invoked. In such cases, it is important to prevent the
interpolation behavior from occurring accidentally, so the '$'
(which is otherwise a relatively uncommonly-used character) is
used to signal when the behavior should occur.
It is the author's opinion, however, that in cases where the
formatting is explicitly invoked, less care needs to be taken to
prevent accidental interpolation, in which case a lighter and
less unwieldy syntax can be used.
- Printf and its cousins ('%'), including variations that add a
field index, so that fields can be interpolated out of order.
- Other bracket-only variations. Various MUDs (Multi-User
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
string interpolation. The Microsoft .Net libraries use braces
({}), and a syntax which is very similar to the one in this
proposal, although the syntax for conversion specifiers is quite
different. [2]
- Backquoting. This method has the benefit of minimal syntactical
clutter; however, it lacks many of the benefits of a function
call syntax (such as complex expression arguments, custom
formatters, etc.).
- Other variations include Ruby's #{}, PHP's {$name}, and so
on.
Backwards Compatibility
Backwards compatibility can be maintained by leaving the existing
mechanisms in place. The new system does not collide with any of
the method names of the existing string formatting techniques, so
both systems can co-exist until it comes time to deprecate the
older system.
References
[1] [Python-3000] String formating operations in python 3k
http://mail.python.org/pipermail/python-3000/2006-April/000285.html
[2] Composite Formatting - [.Net Framework Developer's Guide]
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

pep-3102.txt Normal file

@ -0,0 +1,184 @@
PEP: 3102
Title: Keyword-Only Arguments
Version: $Revision$
Last-Modified: $Date$
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 22-Apr-2006
Python-Version: 3.0
Post-History:
Abstract
This PEP proposes a change to the way that function arguments are
assigned to named parameter slots. In particular, it enables the
declaration of "keyword-only" arguments: arguments that can only
be supplied by keyword and which will never be automatically
filled in by a positional argument.
Rationale
The current Python function-calling paradigm allows arguments to
be specified either by position or by keyword. An argument can be
filled in either explicitly by name, or implicitly by position.
There are often cases where it is desirable for a function to take
a variable number of arguments. The Python language supports this
using the 'varargs' syntax ('*name'), which specifies that any
'left over' arguments be passed into the varargs parameter as a
tuple.
One limitation on this is that currently, all of the regular
argument slots must be filled before the vararg slot can be.
This is not always desirable. One can easily envision a function
which takes a variable number of arguments, but also takes one
or more 'options' in the form of keyword arguments. Currently,
the only way to do this is to define both a varargs argument,
and a 'keywords' argument (**kwargs), and then manually extract
the desired keywords from the dictionary.
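The manual workaround described above might look like the following
sketch.  The body of 'sortwords' (a case-insensitive sort by
default) is invented here to give the option something to do:

```python
def sortwords(*wordlist, **kwargs):
    # Pull the one supported option out of the keyword dictionary.
    case_sensitive = kwargs.pop("case_sensitive", False)
    # Anything left over is a misspelled or unsupported option, which
    # the function has to detect and reject by hand.
    if kwargs:
        raise TypeError(
            "unexpected keyword arguments: " + ", ".join(sorted(kwargs)))
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)


print(sortwords("b", "a", "C"))
print(sortwords("b", "a", "C", case_sensitive=True))
```

Note that the accepted option names do not appear in the function
signature at all, so they are invisible to introspection tools.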
Specification
Syntactically, the proposed changes are fairly simple. The first
change is to allow regular arguments to appear after a varargs
argument:
def sortwords(*wordlist, case_sensitive=False):
    ...
This function accepts any number of positional arguments, and it
also accepts a keyword option called 'case_sensitive'. This
option will never be filled in by a positional argument, but
must be explicitly specified by name.
Keyword-only arguments are not required to have a default value.
Since Python requires that all arguments be bound to a value,
and since the only way to bind a value to a keyword-only argument
is via keyword, such arguments are therefore 'required keyword'
arguments. Such arguments must be supplied by the caller, and
they must be supplied via keyword.
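The two cases above (a keyword-only argument with a default, and a
required one without) can be sketched as follows, on a Python that
implements this syntax.  The function bodies and the 'connect'
function are invented for illustration:

```python
def sortwords(*wordlist, case_sensitive=False):
    # 'case_sensitive' can only be set by name; every positional
    # argument lands in 'wordlist'.
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)


def connect(*hosts, timeout):
    # 'timeout' has no default, so it is a required keyword argument.
    return [(host, timeout) for host in hosts]


print(sortwords("b", "a", "C"))
print(sortwords("b", "a", "C", case_sensitive=True))
print(connect("alpha", "beta", timeout=5))

try:
    connect("alpha", "beta")        # 'timeout' not supplied
except TypeError as exc:
    print("rejected:", exc)
```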
The second syntactical change is to allow the argument name to
be omitted for a varargs argument:
def compare(a, b, *, key=None):
    ...
The reasoning behind this change is as follows. Imagine for a
moment a function which takes several positional arguments, as
well as a keyword argument:
def compare(a, b, key=None):
    ...
Now, suppose you wanted to have 'key' be a keyword-only argument.
Under the above syntax, you could accomplish this by adding a
varargs argument immediately before the keyword argument:
def compare(a, b, *ignore, key=None):
    ...
Unfortunately, the 'ignore' argument will also suck up any
erroneous positional arguments that may have been supplied by the
caller. Given that we'd prefer any unwanted arguments to raise an
error, we could do this:
def compare(a, b, *ignore, key=None):
    if ignore:  # If ignore is not empty
        raise TypeError
As a convenient shortcut, we can simply omit the 'ignore' name,
meaning 'don't allow any positional arguments beyond this point'.
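A short sketch of the bare '*' form in use, on a Python that
implements this syntax.  The body of 'compare' (a three-way
comparison returning -1, 0, or 1) is an assumption made for the
example:

```python
def compare(a, b, *, key=None):
    # Apply the optional key function, then do a three-way comparison.
    if key is not None:
        a, b = key(a), key(b)
    return (a > b) - (a < b)


print(compare("a", "B", key=str.lower))   # 'key' supplied by name

try:
    compare("a", "B", str.lower)          # third positional argument
except TypeError as exc:
    print("rejected:", exc)
```

No explicit check is needed in the function body; the extra
positional argument is rejected when the call is made.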
Function Calling Behavior
The previous section describes the difference between the old
behavior and the new. However, it is also useful to have a
description of the new behavior that stands by itself, without
reference to the previous model. So this next section will
attempt to provide such a description.
When a function is called, the input arguments are assigned to
formal parameters as follows:
- For each formal parameter, there is a slot which will be used
to contain the value of the argument assigned to that
parameter.
- Slots which have had values assigned to them are marked as
'filled'. Slots which have no value assigned to them yet are
considered 'empty'.
- Initially, all slots are marked as empty.
- Positional arguments are assigned first, followed by keyword
arguments.
- For each positional argument:
o Attempt to bind the argument to the first unfilled
parameter slot. If the slot is not a vararg slot, then
mark the slot as 'filled'.
o If the next unfilled slot is a vararg slot, and it does
not have a name, then it is an error.
o Otherwise, if the next unfilled slot is a vararg slot then
all remaining non-keyword arguments are placed into the
vararg slot.
- For each keyword argument:
o If there is a parameter with the same name as the keyword,
then the argument value is assigned to that parameter slot.
However, if the parameter slot is already filled, then that
is an error.
o Otherwise, if there is a 'keyword dictionary' argument,
the argument is added to the dictionary using the keyword
name as the dictionary key, unless there is already an
entry with that key, in which case it is an error.
o Otherwise, if there is no keyword dictionary, and no
matching named parameter, then it is an error.
- Finally:
o If the vararg slot is not yet filled, assign an empty tuple
as its value.
o For each remaining empty slot: if there is a default value
for that slot, then fill the slot with the default value.
If there is no default value, then it is an error.
In accordance with the current Python implementation, any errors
encountered will be signaled by raising TypeError. (If you want
something different, that's a subject for a different PEP.)
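The slot-filling procedure above can be modelled in a few lines of
Python.  This is an illustrative sketch only, not how the
interpreter implements argument binding; the 'bind_arguments'
function, its parameter layout, and the use of an empty string to
represent a nameless vararg slot are all assumptions made for the
example:

```python
_EMPTY = object()  # marker for a slot that has not been filled


def bind_arguments(positional, vararg, keyword_only, kwdict,
                   defaults, args, kwargs):
    """Return a dict mapping parameter names to bound values.

    positional   -- names of the regular parameters before the vararg
    vararg       -- vararg name, "" for a bare '*', None if absent
    keyword_only -- names of the parameters after the vararg
    kwdict       -- name of the keyword dictionary, or None
    defaults     -- dict of default values by parameter name
    """
    named = list(positional) + list(keyword_only)
    slots = {name: _EMPTY for name in named}  # all slots start empty

    # Positional arguments fill the regular slots from the left.
    for name, value in zip(positional, args):
        slots[name] = value
    extra = tuple(args[len(positional):])
    if extra and not vararg:
        raise TypeError("too many positional arguments")
    if vararg:
        slots[vararg] = extra  # an empty tuple if nothing is left over

    # Keyword arguments fill slots by name or go to the dictionary.
    if kwdict:
        slots[kwdict] = {}
    for key, value in kwargs.items():
        if key in named:
            if slots[key] is not _EMPTY:
                raise TypeError("multiple values for %r" % key)
            slots[key] = value
        elif kwdict:
            slots[kwdict][key] = value
        else:
            raise TypeError("unexpected keyword argument %r" % key)

    # Remaining empty slots take their defaults, or it is an error.
    for name in named:
        if slots[name] is _EMPTY:
            if name not in defaults:
                raise TypeError("missing argument %r" % name)
            slots[name] = defaults[name]
    return slots


# def compare(a, b, *, key=None), called as compare(1, 2)
print(bind_arguments(["a", "b"], "", ["key"], None,
                     {"key": None}, (1, 2), {}))
```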
Backwards Compatibility
The function calling behavior specified in this PEP is a superset
of the existing behavior - that is, it is expected that any
existing programs will continue to work.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: