added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments
parent a30697598f
commit 0ac7bc2d0c
|
@ -104,6 +104,8 @@ Index by Category
|
|||
S 358 The "bytes" Object Schemenauer
|
||||
S 359 The "make" Statement Bethard
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
S 3101 Advanced String Formatting Talin
|
||||
S 3102 Keyword-Only Arguments Talin
|
||||
|
||||
Finished PEPs (done, implemented in Subversion)
|
||||
|
||||
|
@ -425,7 +427,8 @@ Numerical Index
|
|||
P 3002 Procedure for Backwards-Incompatible Changes Bethard
|
||||
I 3099 Things that will Not Change in Python 3000 Brandl
|
||||
I 3100 Python 3.0 Plans Kuchling, Cannon
|
||||
|
||||
S 3101 Advanced String Formatting Talin
|
||||
S 3102 Keyword-Only Arguments Talin
|
||||
|
||||
Key
|
||||
|
||||
|
@ -522,6 +525,7 @@ Owners
|
|||
Smith, Kevin D. Kevin.Smith@theMorgue.org
|
||||
Stein, Greg gstein@lyra.org
|
||||
Suzi, Roman rnd@onego.ru
|
||||
Talin talin at acm.org
|
||||
Taschuk, Steven staschuk@telusplanet.net
|
||||
Tirosh, Oren oren at hishome.net
|
||||
Warnes, Gregory R. warnes@users.sourceforge.net
|
||||
|
|
|
@ -0,0 +1,346 @@
|
|||
PEP: 3101
|
||||
Title: Advanced String Formatting
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Talin <talin at acm.org>
|
||||
Status: Draft
|
||||
Type: Standards
|
||||
Content-Type: text/plain
|
||||
Created: 16-Apr-2006
|
||||
Python-Version: 3.0
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP proposes a new system for built-in string formatting
|
||||
operations, intended as a replacement for the existing '%' string
|
||||
formatting operator.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
Python currently provides two methods of string interpolation:
|
||||
|
||||
- The '%' operator for strings.
|
||||
|
||||
- The string.Template module.
|
||||
|
||||
The scope of this PEP will be restricted to proposals for built-in
|
||||
string formatting operations (in other words, methods of the
|
||||
built-in string type). This does not obviate the need for more
|
||||
sophisticated string-manipulation modules in the standard library
|
||||
such as string.Template. In any case, string.Template will not be
|
||||
discussed here, except to say that this proposal will most
|
||||
likely have some overlapping functionality with that module.
|
||||
|
||||
The '%' operator is primarily limited by the fact that it is a
|
||||
binary operator, and therefore can take at most two arguments.
|
||||
One of those arguments is already dedicated to the format string,
|
||||
leaving all other variables to be squeezed into the remaining
|
||||
argument. The current practice is to use either a dictionary or a
|
||||
tuple as the second argument, but as many people have commented
|
||||
[1], this lacks flexibility. The "all or nothing" approach
|
||||
(meaning that one must choose between only positional arguments,
|
||||
or only named arguments) is felt to be overly constraining.
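As a small illustration of the "all or nothing" limitation, the sketch below uses only the existing '%' operator. The variable names are illustrative and not part of the proposal.

```python
# Positional style: the right-hand operand is a tuple.
by_position = "%s is %d years old" % ("Fred", 42)

# Named style: the right-hand operand is a dictionary.
by_name = "%(name)s is %(age)d years old" % {"name": "Fred", "age": 42}

# A single format string cannot draw some values from a tuple and
# others from a dictionary, because '%' has only one right-hand operand.
```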
|
||||
|
||||
|
||||
Specification
|
||||
|
||||
The specification will consist of 4 parts:
|
||||
|
||||
- Specification of a set of methods to be added to the built-in
|
||||
string class.
|
||||
|
||||
- Specification of a new syntax for format strings.
|
||||
|
||||
- Specification of a new set of class methods to control the
|
||||
formatting and conversion of objects.
|
||||
|
||||
- Specification of an API for user-defined formatting classes.
|
||||
|
||||
|
||||
String Methods
|
||||
|
||||
The built-in string class will gain two new methods. The first
|
||||
method is 'format', and takes an arbitrary number of positional
|
||||
and keyword arguments:
|
||||
|
||||
"The story of {0}, {1}, and {c}".format(a, b, c=d)
|
||||
|
||||
Within a format string, each positional argument is identified
|
||||
with a number, starting from zero, so in the above example, 'a' is
|
||||
argument 0 and 'b' is argument 1. Each keyword argument is
|
||||
identified by its keyword name, so in the above example, 'c' is
|
||||
used to refer to the third argument.
|
||||
|
||||
The result of the format call is an object of the same type
|
||||
(string or unicode) as the format string.
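A minimal sketch of the call shape described above, mixing positional and keyword arguments in one call. The sample values are assumptions chosen for illustration; the call itself matches the str.format() method found in released Python versions.

```python
a, b, d = "Alice", "Bob", "Carol"

# {0} and {1} pick positional arguments; {c} picks the keyword argument.
story = "The story of {0}, {1}, and {c}".format(a, b, c=d)
```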
|
||||
|
||||
|
||||
Format Strings
|
||||
|
||||
Brace characters ('curly braces') are used to indicate a
|
||||
replacement field within the string:
|
||||
|
||||
"My name is {0}".format('Fred')
|
||||
|
||||
The result of this is the string:
|
||||
|
||||
"My name is Fred"
|
||||
|
||||
Braces can be escaped using a backslash:
|
||||
|
||||
"My name is {0} :-\{\}".format('Fred')
|
||||
|
||||
Which would produce:
|
||||
|
||||
"My name is Fred :-{}"
|
||||
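For comparison, the str.format() method in released Python versions does not use the backslash form drafted here; literal braces are written by doubling them. A small sketch of that released behavior:

```python
# '{{' and '}}' produce literal braces; '{0}' is still a replacement field.
escaped = "My name is {0} :-{{}}".format("Fred")
```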
|
||||
The element within the braces is called a 'field'. Fields consist
|
||||
of a name, which can either be simple or compound, and an optional
|
||||
'conversion specifier'.
|
||||
|
||||
Simple names are either names or numbers. If numbers, they must
|
||||
be valid decimal numbers; if names, they must be valid Python
|
||||
identifiers. A number is used to identify a positional argument,
|
||||
while a name is used to identify a keyword argument.
|
||||
|
||||
Compound names are a sequence of simple names separated by
|
||||
periods:
|
||||
|
||||
"My name is {0.name} :-\{\}".format(dict(name='Fred'))
|
||||
|
||||
Compound names can be used to access specific dictionary entries,
|
||||
array elements, or object attributes. In the above example, the
|
||||
'{0.name}' field refers to the dictionary entry 'name' within
|
||||
positional argument 0.
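As a point of comparison with this draft, released Python versions split the two kinds of lookup: a period performs attribute access, while dictionary entries and sequence elements use square brackets. The sketch below shows that released behavior; the 'person' object is a hypothetical stand-in.

```python
from types import SimpleNamespace

person = SimpleNamespace(name="Fred")

# A period after the field number looks up an attribute.
by_attribute = "My name is {0.name}".format(person)

# Square brackets look up a dictionary key (written without quotes)...
by_key = "My name is {0[name]}".format({"name": "Fred"})

# ...or a sequence index when the bracketed text is all digits.
by_index = "My name is {0[1]}".format(["Barney", "Fred"])
```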
|
||||
|
||||
Each field can also specify an optional set of 'conversion
|
||||
specifiers'. Conversion specifiers follow the field name, with a
|
||||
colon (':') character separating the two:
|
||||
|
||||
"My name is {0:8}".format('Fred')
|
||||
|
||||
The meaning and syntax of the conversion specifiers depend on the
|
||||
type of object that is being formatted; however, many of the
|
||||
built-in types will recognize a standard set of conversion
|
||||
specifiers.
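A short sketch of a width-only conversion specifier, as it behaves in released Python versions. The trailing '|' is added only to make the padding visible.

```python
# A bare width pads the field; strings are left-aligned by default.
padded_text = "My name is {0:8}|".format("Fred")

# The same specifier applied to an integer right-aligns the value.
padded_number = "{0:8}|".format(42)
```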
|
||||
|
||||
The conversion specifier consists of a sequence of zero or more
|
||||
characters, each of which can consist of any printable character
|
||||
except for a non-escaped '}'. The format() method does not
|
||||
attempt to interpret the conversion specifiers in any way; it
|
||||
merely passes all of the characters between the first colon ':'
|
||||
and the matching right brace ('}') to the various underlying
|
||||
formatters (described later.)
|
||||
|
||||
When using the 'fformat' variant, it is possible to omit the field
|
||||
name entirely, and simply include the conversion specifiers:
|
||||
|
||||
"My name is {:pad(23)}"
|
||||
|
||||
This syntax is used to send special instructions to the custom
|
||||
formatter object (such as instructing it to insert padding
|
||||
characters up to a given column.) The interpretation of this
|
||||
'empty' field is entirely up to the custom formatter; no
|
||||
standard interpretation will be defined in this PEP.
|
||||
|
||||
If a custom formatter is not being used, then it is an error to
|
||||
omit the field name.
|
||||
|
||||
|
||||
Standard Conversion Specifiers
|
||||
|
||||
For most built-in types, the conversion specifiers will be the
|
||||
same or similar to the existing conversion specifiers used with
|
||||
the '%' operator. Thus, instead of '%02.2x', you will say
|
||||
'{0:2.2x}'.
|
||||
|
||||
There are a few differences, however:
|
||||
|
||||
- The trailing letter is optional - you don't need to say '2.2d',
|
||||
you can instead just say '2.2'. If the letter is omitted, the
|
||||
value will be converted into its 'natural' form (that is, the
|
||||
form that it would take if str() or unicode() were called on it)
|
||||
subject to the field length and precision specifiers (if
|
||||
supplied).
|
||||
|
||||
- Variable field width specifiers use a nested version of the {}
|
||||
syntax, allowing the width specifier to be either a positional
|
||||
or keyword argument:
|
||||
|
||||
"{0:{1}.{2}d}".format(a, b, c)
|
||||
|
||||
(Note: It might be easier to parse if these used a different
|
||||
type of delimiter, such as parens - avoiding the need to create
|
||||
a regex that handles the recursive case.)
|
||||
|
||||
- The support for length modifiers (which are ignored by Python
|
||||
anyway) is dropped.
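The nested-field form of variable width and precision can be sketched as follows, using the str.format() method of released Python versions. The example uses the 'f' type rather than 'd', because released versions reject a precision on integer formats; the sample values are assumptions.

```python
value, width, precision = 3.14159, 10, 3

# The inner {1} and {2} are replaced first, yielding the spec '10.3f'.
nested = "{0:{1}.{2}f}".format(value, width, precision)

# A nested field may also name a keyword argument.
named_width = "{0:{width}}|".format("ab", width=4)
```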
|
||||
|
||||
For non-built-in types, the conversion specifiers will be specific
|
||||
to that type. An example is the 'datetime' class, whose
|
||||
conversion specifiers are identical to the arguments to the
|
||||
strftime() function:
|
||||
|
||||
"Today is: {0:%x}".format(datetime.now())
|
||||
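A deterministic variant of the datetime example, using a fixed date and numeric strftime() codes so the result does not depend on the locale. The date chosen is arbitrary.

```python
from datetime import datetime

created = datetime(2006, 4, 16)

# Everything after the colon is handed to the datetime object,
# which interprets it as a strftime() format.
stamp = "Created: {0:%Y-%m-%d}".format(created)
```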
|
||||
|
||||
Controlling Formatting
|
||||
|
||||
A class that wishes to implement a custom interpretation of its
|
||||
conversion specifiers can implement a __format__ method:
|
||||
|
||||
class AST:
    def __format__(self, specifiers):
        ...
|
||||
|
||||
The 'specifiers' argument will be either a string object or a
|
||||
unicode object, depending on the type of the original format
|
||||
string. The __format__ method should test the type of the
|
||||
specifiers parameter to determine whether to return a string or
|
||||
unicode object. It is the responsibility of the __format__ method
|
||||
to return an object of the proper type.
|
||||
|
||||
string.format() will format each field using the following steps:
|
||||
|
||||
1) See if the value to be formatted has a __format__ method. If
|
||||
it does, then call it.
|
||||
|
||||
2) Otherwise, check the internal formatter within string.format
|
||||
that contains knowledge of certain builtin types.
|
||||
|
||||
3) Otherwise, call str() or unicode() as appropriate.
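The __format__ hook can be sketched with a small class. The 'Temperature' class and its one-letter 'F' suffix are hypothetical, invented here to show a type interpreting its own specifiers; the sketch runs on released Python versions, where __format__ returns a str.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, specifiers):
        # Hypothetical mini-language: a trailing 'F' selects Fahrenheit;
        # whatever remains is passed on to ordinary float formatting.
        if specifiers.endswith("F"):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, specifiers[:-1]) + " F"
        return format(self.celsius, specifiers) + " C"

reading = Temperature(100.0)
text = "{0:.1f} / {0:.1fF}".format(reading)
```

Because the object has a __format__ method, step 1 applies and the built-in formatter never sees the 'F' suffix.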
|
||||
|
||||
|
||||
User-Defined Formatting Classes
|
||||
|
||||
The code that interprets format strings can be called explicitly
|
||||
from user code. This allows the creation of custom formatter
|
||||
classes that can override the normal formatting rules.
|
||||
|
||||
The string and unicode classes will have a class method called
|
||||
'cformat' that does all the actual work of formatting; the
|
||||
format() method is just a wrapper that calls cformat.
|
||||
|
||||
The parameters to the cformat function are:
|
||||
|
||||
-- The format string (or unicode; the same function handles
|
||||
both.)
|
||||
-- A field format hook (see below)
|
||||
-- A tuple containing the positional arguments
|
||||
-- A dict containing the keyword arguments
|
||||
|
||||
The cformat function will parse all of the fields in the format
|
||||
string, and return a new string (or unicode) with all of the
|
||||
fields replaced with their formatted values.
|
||||
|
||||
For each field, the cformat function will attempt to call the
|
||||
field format hook with the following arguments:
|
||||
|
||||
field_hook(value, conversion, buffer)
|
||||
|
||||
The 'value' field corresponds to the value being formatted, which
|
||||
was retrieved from the arguments using the field name. (The
|
||||
field_hook has no control over the selection of values, only
|
||||
how they are formatted.)
|
||||
|
||||
The 'conversion' argument is the conversion spec part of the
|
||||
field, which will be either a string or unicode object, depending
|
||||
on the type of the original format string.
|
||||
|
||||
The 'buffer' argument is a Python array object, either a byte
|
||||
array or unicode character array. The buffer object will contain
|
||||
the partially constructed string; the field hook is free to modify
|
||||
the contents of this buffer if needed.
|
||||
|
||||
The field_hook will be called once per field. The field_hook may
|
||||
take one of two actions:
|
||||
|
||||
1) Return False, indicating that the field_hook will not
|
||||
process this field and the default formatting should be
|
||||
used. This decision should be based on the type of the
|
||||
value object, and the contents of the conversion string.
|
||||
|
||||
2) Append the formatted field to the buffer, and return True.
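The hook protocol above can be approximated in pure Python. This is only a sketch under several assumptions: 'cformat' is written as a plain function rather than a class method, the buffer is a list of string pieces rather than an array object, and the regular expression handles only simple '{name}' and '{name:conversion}' fields with no escapes, compound names, or nesting. The 'shout' conversion is invented for the example.

```python
import re

# Matches '{name}' or '{name:conversion}'; deliberately simplistic.
_FIELD = re.compile(r"\{([^{}:]*)(?::([^{}]*))?\}")

def cformat(template, field_hook, args, kwargs):
    buffer = []  # stands in for the partially constructed string
    pos = 0
    for match in _FIELD.finditer(template):
        buffer.append(template[pos:match.start()])
        name, conversion = match.group(1), match.group(2) or ""
        # The hook has no say in which value is selected.
        value = args[int(name)] if name.isdigit() else kwargs[name]
        # The hook either appends to the buffer and returns True,
        # or returns False to request the default formatting.
        if not field_hook(value, conversion, buffer):
            buffer.append(format(value, conversion))
        pos = match.end()
    buffer.append(template[pos:])
    return "".join(buffer)

def shout_hook(value, conversion, buffer):
    # Decide based on the value's type and the conversion string.
    if isinstance(value, str) and conversion == "shout":
        buffer.append(value.upper() + "!")
        return True
    return False

result = cformat("{0:shout} is {age:3d}", shout_hook, ("fred",), {"age": 7})
```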
|
||||
|
||||
|
||||
Alternate Syntax
|
||||
|
||||
Naturally, one of the most contentious issues is the syntax of the
|
||||
format strings, and in particular the markup conventions used to
|
||||
indicate fields.
|
||||
|
||||
Rather than attempting to exhaustively list all of the various
|
||||
proposals, I will cover the ones that are most widely used
|
||||
already.
|
||||
|
||||
- Shell variable syntax: $name and $(name) (or in some variants,
|
||||
${name}). This is probably the oldest convention out there, and
|
||||
is used by Perl and many others. When used without the braces,
|
||||
the length of the variable is determined by lexically scanning
|
||||
until an invalid character is found.
|
||||
|
||||
This scheme is generally used in cases where interpolation is
|
||||
implicit - that is, in environments where any string can contain
|
||||
interpolation variables, and no special substitution function
|
||||
need be invoked. In such cases, it is important to prevent the
|
||||
interpolation behavior from occurring accidentally, so the '$'
|
||||
(which is otherwise a relatively uncommonly-used character) is
|
||||
used to signal when the behavior should occur.
|
||||
|
||||
It is the author's opinion, however, that in cases where the
|
||||
formatting is explicitly invoked, less care needs to be
|
||||
taken to prevent accidental interpolation, in which case a
|
||||
lighter and less unwieldy syntax can be used.
|
||||
|
||||
- Printf and its cousins ('%'), including variations that add a
|
||||
field index, so that fields can be interpolated out of order.
|
||||
|
||||
- Other bracket-only variations. Various MUDs (Multi-User
|
||||
Dungeons) such as MUSH have used brackets (e.g. [name]) to do
|
||||
string interpolation. The Microsoft .Net libraries use braces
|
||||
({}), and a syntax which is very similar to the one in this
|
||||
proposal, although the syntax for conversion specifiers is quite
|
||||
different. [2]
|
||||
|
||||
- Backquoting. This method has the benefit of minimal syntactical
|
||||
clutter, however it lacks many of the benefits of a function
|
||||
call syntax (such as complex expression arguments, custom
|
||||
formatters, etc.).
|
||||
|
||||
- Other variations include Ruby's #{}, PHP's {$name}, and so
|
||||
on.
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
|
||||
Backwards compatibility can be maintained by leaving the existing
|
||||
mechanisms in place. The new system does not collide with any of
|
||||
the method names of the existing string formatting techniques, so
|
||||
both systems can co-exist until it comes time to deprecate the
|
||||
older system.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] [Python-3000] String formating operations in python 3k
|
||||
http://mail.python.org/pipermail/python-3000/2006-April/000285.html
|
||||
|
||||
[2] Composite Formatting - [.Net Framework Developer's Guide]
|
||||
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
|
@ -0,0 +1,184 @@
|
|||
PEP: 3102
|
||||
Title: Keyword-Only Arguments
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Talin <talin at acm.org>
|
||||
Status: Draft
|
||||
Type: Standards
|
||||
Content-Type: text/plain
|
||||
Created: 22-Apr-2006
|
||||
Python-Version: 3.0
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP proposes a change to the way that function arguments are
|
||||
assigned to named parameter slots. In particular, it enables the
|
||||
declaration of "keyword-only" arguments: arguments that can only
|
||||
be supplied by keyword and which will never be automatically
|
||||
filled in by a positional argument.
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
The current Python function-calling paradigm allows arguments to
|
||||
be specified either by position or by keyword. An argument can be
|
||||
filled in either explicitly by name, or implicitly by position.
|
||||
|
||||
There are often cases where it is desirable for a function to take
|
||||
a variable number of arguments. The Python language supports this
|
||||
using the 'varargs' syntax ('*name'), which specifies that any
|
||||
'left over' arguments be passed into the varargs parameter as a
|
||||
tuple.
|
||||
|
||||
One limitation on this is that currently, all of the regular
|
||||
argument slots must be filled before the vararg slot can be.
|
||||
|
||||
This is not always desirable. One can easily envision a function
|
||||
which takes a variable number of arguments, but also takes one
|
||||
or more 'options' in the form of keyword arguments. Currently,
|
||||
the only way to do this is to define both a varargs argument,
|
||||
and a 'keywords' argument (**kwargs), and then manually extract
|
||||
the desired keywords from the dictionary.
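The manual workaround described above might look like the following sketch. The function body and the error message are assumptions; only the technique of pulling options out of **kwargs by hand comes from the text.

```python
def sortwords(*wordlist, **kwargs):
    # Extract the one supported option by hand.
    case_sensitive = kwargs.pop("case_sensitive", False)
    # Anything left over is a caller error that must be checked manually.
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)

folded = sortwords("b", "a", "C")
exact = sortwords("b", "a", "C", case_sensitive=True)
```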
|
||||
|
||||
|
||||
Specification
|
||||
|
||||
Syntactically, the proposed changes are fairly simple. The first
|
||||
change is to allow regular arguments to appear after a varargs
|
||||
argument:
|
||||
|
||||
def sortwords(*wordlist, case_sensitive=False):
    ...
|
||||
|
||||
This function accepts any number of positional arguments, and it
|
||||
also accepts a keyword option called 'case_sensitive'. This
|
||||
option will never be filled in by a positional argument, but
|
||||
must be explicitly specified by name.
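A runnable sketch of this signature, using the syntax as it exists in released Python 3 versions. The function body is an assumption added for illustration.

```python
def sortwords(*wordlist, case_sensitive=False):
    # Every positional argument lands in wordlist; case_sensitive can
    # only be set by name.
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)

# The trailing True is treated as just another word candidate would be:
# positional values never reach case_sensitive.
folded = sortwords("b", "a", "C")
exact = sortwords("b", "a", "C", case_sensitive=True)
```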
|
||||
|
||||
Keyword-only arguments are not required to have a default value.
|
||||
Since Python requires that all arguments be bound to a value,
|
||||
and since the only way to bind a value to a keyword-only argument
|
||||
is via keyword, such arguments are therefore 'required keyword'
|
||||
arguments. Such arguments must be supplied by the caller, and
|
||||
they must be supplied via keyword.
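A 'required keyword' argument can be sketched as follows, again using released Python 3 syntax. The 'join_words' function is hypothetical.

```python
def join_words(*words, separator):
    # separator has no default, so every caller must pass it by name.
    return separator.join(words)

joined = join_words("a", "b", "c", separator="-")

try:
    join_words("a", "b", "c")  # separator omitted
except TypeError:
    missing_is_error = True
else:
    missing_is_error = False
```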
|
||||
|
||||
The second syntactical change is to allow the argument name to
|
||||
be omitted for a varargs argument:
|
||||
|
||||
def compare(a, b, *, key=None):
    ...
|
||||
|
||||
The reasoning behind this change is as follows. Imagine for a
|
||||
moment a function which takes several positional arguments, as
|
||||
well as a keyword argument:
|
||||
|
||||
def compare(a, b, key=None):
    ...
|
||||
|
||||
Now, suppose you wanted to have 'key' be a keyword-only argument.
|
||||
Under the above syntax, you could accomplish this by adding a
|
||||
varargs argument immediately before the keyword argument:
|
||||
|
||||
def compare(a, b, *ignore, key=None):
    ...
|
||||
|
||||
Unfortunately, the 'ignore' argument will also suck up any
|
||||
erroneous positional arguments that may have been supplied by the
|
||||
caller. Given that we'd prefer any unwanted arguments to raise an
|
||||
error, we could do this:
|
||||
|
||||
def compare(a, b, *ignore, key=None):
    if ignore:  # If ignore is not empty
        raise TypeError
|
||||
|
||||
As a convenient shortcut, we can simply omit the 'ignore' name,
|
||||
meaning 'don't allow any positional arguments beyond this point'.
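The bare '*' form can be exercised directly in released Python 3 versions. In the sketch below the comparison logic is an assumption; only the signature comes from the text.

```python
def compare(a, b, *, key=None):
    # Apply the optional key function, then do a three-way comparison.
    if key is not None:
        a, b = key(a), key(b)
    return (a > b) - (a < b)

ordered = compare(1, 2)
folded = compare("b", "A", key=str.lower)

try:
    compare("b", "A", str.lower)  # a third positional argument
except TypeError:
    extra_positional_is_error = True
else:
    extra_positional_is_error = False
```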
|
||||
|
||||
|
||||
Function Calling Behavior
|
||||
|
||||
The previous section describes the difference between the old
|
||||
behavior and the new. However, it is also useful to have a
|
||||
description of the new behavior that stands by itself, without
|
||||
reference to the previous model. So this next section will
|
||||
attempt to provide such a description.
|
||||
|
||||
When a function is called, the input arguments are assigned to
|
||||
formal parameters as follows:
|
||||
|
||||
- For each formal parameter, there is a slot which will be used
|
||||
to contain the value of the argument assigned to that
|
||||
parameter.
|
||||
|
||||
- Slots which have had values assigned to them are marked as
|
||||
'filled'. Slots which have no value assigned to them yet are
|
||||
considered 'empty'.
|
||||
|
||||
- Initially, all slots are marked as empty.
|
||||
|
||||
- Positional arguments are assigned first, followed by keyword
|
||||
arguments.
|
||||
|
||||
- For each positional argument:
|
||||
|
||||
o Attempt to bind the argument to the first unfilled
|
||||
parameter slot. If the slot is not a vararg slot, then
|
||||
mark the slot as 'filled'.
|
||||
|
||||
o If the next unfilled slot is a vararg slot, and it does
|
||||
not have a name, then it is an error.
|
||||
|
||||
o Otherwise, if the next unfilled slot is a vararg slot then
|
||||
all remaining non-keyword arguments are placed into the
|
||||
vararg slot.
|
||||
|
||||
- For each keyword argument:
|
||||
|
||||
o If there is a parameter with the same name as the keyword,
|
||||
then the argument value is assigned to that parameter slot.
|
||||
However, if the parameter slot is already filled, then that
|
||||
is an error.
|
||||
|
||||
o Otherwise, if there is a 'keyword dictionary' argument,
|
||||
the argument is added to the dictionary using the keyword
|
||||
name as the dictionary key, unless there is already an
|
||||
entry with that key, in which case it is an error.
|
||||
|
||||
o Otherwise, if there is no keyword dictionary, and no
|
||||
matching named parameter, then it is an error.
|
||||
|
||||
- Finally:
|
||||
|
||||
o If the vararg slot is not yet filled, assign an empty tuple
|
||||
as its value.
|
||||
|
||||
o For each remaining empty slot: if there is a default value
|
||||
for that slot, then fill the slot with the default value.
|
||||
If there is no default value, then it is an error.
|
||||
|
||||
In accordance with the current Python implementation, any errors
|
||||
encountered will be signaled by raising TypeError. (If you want
|
||||
something different, that's a subject for a different PEP.)
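The binding rules above can be modelled in a few dozen lines. This is an illustrative sketch, not the interpreter's implementation: 'bind_arguments' and its parameters are hypothetical names, the signature is described by plain lists and dicts, and a bare '*' is represented by passing vararg=None together with a non-empty keyword_only list.

```python
_EMPTY = object()  # sentinel for a slot that has not been filled yet

def bind_arguments(positional, keyword_only, defaults, args, kwargs,
                   vararg=None, kwdict=None):
    """Return a dict mapping parameter names to bound values.

    positional   -- names of parameters declared before the '*'
    keyword_only -- names of parameters declared after the '*'
    defaults     -- dict of default values, keyed by parameter name
    vararg       -- name of the '*name' slot; None for a bare '*' or no '*'
    kwdict       -- name of the '**name' slot, or None
    """
    # Initially, all slots are marked as empty.
    slots = {name: _EMPTY for name in (*positional, *keyword_only)}
    extra_positional = ()
    extra_keywords = {}

    # Positional arguments are assigned first.
    for index, value in enumerate(args):
        if index < len(positional):
            slots[positional[index]] = value
        elif vararg is not None:
            # All remaining positional arguments go into the vararg slot.
            extra_positional = tuple(args[index:])
            break
        else:
            raise TypeError("too many positional arguments")

    # Keyword arguments are assigned next.
    for name, value in kwargs.items():
        if name in slots:
            if slots[name] is not _EMPTY:
                raise TypeError("multiple values for %r" % name)
            slots[name] = value
        elif kwdict is not None:
            extra_keywords[name] = value
        else:
            raise TypeError("unexpected keyword argument %r" % name)

    # Finally, fill the remaining empty slots from the defaults.
    for name in list(slots):
        if slots[name] is _EMPTY:
            if name not in defaults:
                raise TypeError("missing argument %r" % name)
            slots[name] = defaults[name]

    if vararg is not None:
        slots[vararg] = extra_positional
    if kwdict is not None:
        slots[kwdict] = extra_keywords
    return slots

# Models: def compare(a, b, *, key=None)
bound = bind_arguments(("a", "b"), ("key",), {"key": None}, (1, 2), {})
```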
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
|
||||
The function calling behavior specified in this PEP is a superset
|
||||
of the existing behavior - that is, it is expected that any
|
||||
existing programs will continue to work.
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|