From 0ac7bc2d0cb7e740b2edca0fa1adf26ad0d51f21 Mon Sep 17 00:00:00 2001 From: David Goodger Date: Wed, 26 Apr 2006 20:33:25 +0000 Subject: [PATCH] added two PEPs by Talin: 3101, Advanced String Formatting; and 3102, Keyword-Only Arguments --- pep-0000.txt | 6 +- pep-3101.txt | 346 +++++++++++++++++++++++++++++++++++++++++++++++++++ pep-3102.txt | 184 +++++++++++++++++++++++++++ 3 files changed, 535 insertions(+), 1 deletion(-) create mode 100644 pep-3101.txt create mode 100644 pep-3102.txt diff --git a/pep-0000.txt b/pep-0000.txt index 91fa9b7c0..423006a5a 100644 --- a/pep-0000.txt +++ b/pep-0000.txt @@ -104,6 +104,8 @@ Index by Category S 358 The "bytes" Object Schemenauer S 359 The "make" Statement Bethard S 754 IEEE 754 Floating Point Special Values Warnes + S 3101 Advanced String Formatting Talin + S 3102 Keyword-Only Arguments Talin Finished PEPs (done, implemented in Subversion) @@ -425,7 +427,8 @@ Numerical Index P 3002 Procedure for Backwards-Incompatible Changes Bethard I 3099 Things that will Not Change in Python 3000 Brandl I 3100 Python 3.0 Plans Kuchling, Cannon - + S 3101 Advanced String Formatting Talin + S 3102 Keyword-Only Arguments Talin Key @@ -522,6 +525,7 @@ Owners Smith, Kevin D. Kevin.Smith@theMorgue.org Stein, Greg gstein@lyra.org Suzi, Roman rnd@onego.ru + Talin talin at acm.org Taschuk, Steven staschuk@telusplanet.net Tirosh, Oren oren at hishome.net Warnes, Gregory R. warnes@users.sourceforge.net diff --git a/pep-3101.txt b/pep-3101.txt new file mode 100644 index 000000000..edf990f0f --- /dev/null +++ b/pep-3101.txt @@ -0,0 +1,346 @@ +PEP: 3101 +Title: Advanced String Formatting +Version: $Revision$ +Last-Modified: $Date$ +Author: Talin +Status: Draft +Type: Standards +Content-Type: text/plain +Created: 16-Apr-2006 +Python-Version: 3.0 +Post-History: + + +Abstract + + This PEP proposes a new system for built-in string formatting + operations, intended as a replacement for the existing '%' string + formatting operator. 
+ + +Rationale + + Python currently provides two methods of string interpolation: + + - The '%' operator for strings. + + - The string.Template module. + + The scope of this PEP will be restricted to proposals for built-in + string formatting operations (in other words, methods of the + built-in string type). This does not obviate the need for more + sophisticated string-manipulation modules in the standard library + such as string.Template. In any case, string.Template will not be + discussed here, except to say that this proposal will most + likely have some overlapping functionality with that module. + + The '%' operator is primarily limited by the fact that it is a + binary operator, and therefore can take at most two arguments. + One of those arguments is already dedicated to the format string, + leaving all other variables to be squeezed into the remaining + argument. The current practice is to use either a dictionary or a + tuple as the second argument, but as many people have commented + [1], this lacks flexibility. The "all or nothing" approach + (meaning that one must choose between only positional arguments, + or only named arguments) is felt to be overly constraining. + + +Specification + + The specification will consist of 4 parts: + + - Specification of a set of methods to be added to the built-in + string class. + + - Specification of a new syntax for format strings. + + - Specification of a new set of class methods to control the + formatting and conversion of objects. + + - Specification of an API for user-defined formatting classes. + + +String Methods + + The built-in string class will gain two new methods. The first + method is 'format', and takes an arbitrary number of positional + and keyword arguments: + + "The story of {0}, {1}, and {c}".format(a, b, c=d) + + Within a format string, each positional argument is identified + with a number, starting from zero, so in the above example, 'a' is + argument 0 and 'b' is argument 1. 
Each keyword argument is + identified by its keyword name, so in the above example, 'c' is + used to refer to the third argument. + + The result of the format call is an object of the same type + (string or unicode) as the format string. + + +Format Strings + + Brace characters ('curly braces') are used to indicate a + replacement field within the string: + + "My name is {0}".format('Fred') + + The result of this is the string: + + "My name is Fred" + + Braces can be escaped using a backslash: + + "My name is {0} :-\{\}".format('Fred') + + This would produce: + + "My name is Fred :-{}" + + The element within the braces is called a 'field'. Fields consist + of a name, which can either be simple or compound, and an optional + 'conversion specifier'. + + Simple names are either names or numbers. If numbers, they must + be valid decimal numbers; if names, they must be valid Python + identifiers. A number is used to identify a positional argument, + while a name is used to identify a keyword argument. + + Compound names are a sequence of simple names separated by + periods: + + "My name is {0.name} :-\{\}".format(dict(name='Fred')) + + Compound names can be used to access specific dictionary entries, + array elements, or object attributes. In the above example, the + '{0.name}' field refers to the dictionary entry 'name' within + positional argument 0. + + Each field can also specify an optional set of 'conversion + specifiers'. Conversion specifiers follow the field name, with a + colon (':') character separating the two: + + "My name is {0:8}".format('Fred') + + The meaning and syntax of the conversion specifiers depend on the + type of object that is being formatted; however, many of the + built-in types will recognize a standard set of conversion + specifiers. + + The conversion specifier consists of a sequence of zero or more + characters, each of which can consist of any printable character + except for a non-escaped '}'. 
The format() method does not + attempt to interpret the conversion specifiers in any way; it + merely passes all of the characters between the first colon ':' + and the matching right brace ('}') to the various underlying + formatters (described later.) + + When using the 'fformat' variant, it is possible to omit the field + name entirely, and simply include the conversion specifiers: + + "My name is {:pad(23)}" + + This syntax is used to send special instructions to the custom + formatter object (such as instructing it to insert padding + characters up to a given column.) The interpretation of this + 'empty' field is entirely up to the custom formatter; no + standard interpretation will be defined in this PEP. + + If a custom formatter is not being used, then it is an error to + omit the field name. + + +Standard Conversion Specifiers + + For most built-in types, the conversion specifiers will be the + same or similar to the existing conversion specifiers used with + the '%' operator. Thus, instead of '%02.2x', you will say + '{0:2.2x}'. + + There are a few differences, however: + + - The trailing letter is optional - you don't need to say '2.2d', + you can instead just say '2.2'. If the letter is omitted, the + value will be converted into its 'natural' form (that is, the + form that it would take if str() or unicode() were called on it) + subject to the field length and precision specifiers (if + supplied). + + - Variable field width specifiers use a nested version of the {} + syntax, allowing the width specifier to be either a positional + or keyword argument: + + "{0:{1}.{2}d}".format(a, b, c) + + (Note: It might be easier to parse if these used a different + type of delimiter, such as parens - avoiding the need to create + a regex that handles the recursive case.) + + - The support for length modifiers (which are ignored by Python + anyway) is dropped. + + For non-built-in types, the conversion specifiers will be specific + to that type. 
An example is the 'datetime' class, whose + conversion specifiers are identical to the arguments to the + strftime() function: + + "Today is: {0:%x}".format(datetime.now()) + + +Controlling Formatting + + A class that wishes to implement a custom interpretation of its + conversion specifiers can implement a __format__ method: + + class AST: + def __format__(self, specifiers): + ... + + The 'specifiers' argument will be either a string object or a + unicode object, depending on the type of the original format + string. The __format__ method should test the type of the + specifiers parameter to determine whether to return a string or + unicode object. It is the responsibility of the __format__ method + to return an object of the proper type. + + string.format() will format each field using the following steps: + + 1) See if the value to be formatted has a __format__ method. If + it does, then call it. + + 2) Otherwise, check the internal formatter within string.format + that contains knowledge of certain builtin types. + + 3) Otherwise, call str() or unicode() as appropriate. + + +User-Defined Formatting Classes + + The code that interprets format strings can be called explicitly + from user code. This allows the creation of custom formatter + classes that can override the normal formatting rules. + + The string and unicode classes will have a class method called + 'cformat' that does all the actual work of formatting; the + format() method is just a wrapper that calls cformat. + + The parameters to the cformat function are: + + -- The format string (or unicode; the same function handles + both.) + -- A field format hook (see below) + -- A tuple containing the positional arguments + -- A dict containing the keyword arguments + + The cformat function will parse all of the fields in the format + string, and return a new string (or unicode) with all of the + fields replaced with their formatted values. 
+ + For each field, the cformat function will attempt to call the + field format hook with the following arguments: + + field_hook(value, conversion, buffer) + + The 'value' field corresponds to the value being formatted, which + was retrieved from the arguments using the field name. (The + field_hook has no control over the selection of values, only + how they are formatted.) + + The 'conversion' argument is the conversion spec part of the + field, which will be either a string or unicode object, depending + on the type of the original format string. + + The 'buffer' argument is a Python array object, either a byte + array or unicode character array. The buffer object will contain + the partially constructed string; the field hook is free to modify + the contents of this buffer if needed. + + The field_hook will be called once per field. The field_hook may + take one of two actions: + + 1) Return False, indicating that the field_hook will not + process this field and the default formatting should be + used. This decision should be based on the type of the + value object, and the contents of the conversion string. + + 2) Append the formatted field to the buffer, and return True. + + +Alternate Syntax + + Naturally, one of the most contentious issues is the syntax of the + format strings, and in particular the markup conventions used to + indicate fields. + + Rather than attempting to exhaustively list all of the various + proposals, I will cover the ones that are most widely used + already. + + - Shell variable syntax: $name and $(name) (or in some variants, + ${name}). This is probably the oldest convention out there, and + is used by Perl and many others. When used without the braces, + the length of the variable is determined by lexically scanning + until an invalid character is found. 
+ + This scheme is generally used in cases where interpolation is + implicit - that is, in environments where any string can contain + interpolation variables, and no special substitution function + need be invoked. In such cases, it is important to prevent the + interpolation behavior from occurring accidentally, so the '$' + (which is otherwise a relatively uncommonly-used character) is + used to signal when the behavior should occur. + + It is the author's opinion, however, that in cases where the + formatting is explicitly invoked, less care needs to be + taken to prevent accidental interpolation, in which case a + lighter and less unwieldy syntax can be used. + + - Printf and its cousins ('%'), including variations that add a + field index, so that fields can be interpolated out of order. + + - Other bracket-only variations. Various MUDs (Multi-User + Dungeons) such as MUSH have used brackets (e.g. [name]) to do + string interpolation. The Microsoft .Net libraries use braces + ({}), and a syntax which is very similar to the one in this + proposal, although the syntax for conversion specifiers is quite + different. [2] + + - Backquoting. This method has the benefit of minimal syntactical + clutter; however, it lacks many of the benefits of a function + call syntax (such as complex expression arguments, custom + formatters, etc.). + + - Other variations include Ruby's #{}, PHP's {$name}, and so + on. + + +Backwards Compatibility + + Backwards compatibility can be maintained by leaving the existing + mechanisms in place. The new system does not collide with any of + the method names of the existing string formatting techniques, so + both systems can co-exist until it comes time to deprecate the + older system. 
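For illustration, the core features of the PEP 3101 proposal (positional and keyword fields, conversion specifiers after a colon, nested width fields, type-specific specifiers, and the __format__ hook) can be exercised with a short sketch. It assumes an interpreter in which str.format() and __format__ are available in the form that was eventually implemented, which differs from this draft in places: brace escaping and compound-name lookup are left out because their final semantics differ. The Temperature class and its 'F' suffix are hypothetical, and the nested-width example uses 'f' rather than the draft's 'd'.

```python
from datetime import datetime


class Temperature:
    """Hypothetical class illustrating a __format__ hook."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, specifiers):
        # Hypothetical convention: a trailing 'F' selects Fahrenheit; the
        # rest of the specifier is passed on to ordinary float formatting.
        if specifiers.endswith("F"):
            return format(self.celsius * 9 / 5 + 32, specifiers[:-1]) + " F"
        return format(self.celsius, specifiers) + " C"


# Positional fields are numbered from zero; keyword fields use their name.
story = "The story of {0}, {1}, and {c}".format("a", "b", c="d")

# Everything after the colon is the conversion specifier (here, a width).
padded = "My name is {0:8}|".format("Fred")

# Nested fields supply the width and precision from other arguments.
nested = "{0:{1}.{2}f}".format(3.14159, 8, 2)

# datetime interprets its specifier as a strftime() pattern.
year = "Year: {0:%Y}".format(datetime(2006, 4, 26))

# The specifier string reaches Temperature.__format__ untouched.
warm = "{0:.1f} / {0:.1fF}".format(Temperature(21.5))
```

In this sketch the format() call itself never inspects the specifier; each value's own formatter decides what '.1fF' or '%Y' means, which is the division of labour the draft describes.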
+ + +References + + [1] [Python-3000] String formating operations in python 3k + http://mail.python.org/pipermail/python-3000/2006-April/000285.html + + [2] Composite Formatting - [.Net Framework Developer's Guide] + http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true + + +Copyright + + This document has been placed in the public domain. + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +sentence-end-double-space: t +fill-column: 70 +coding: utf-8 +End: diff --git a/pep-3102.txt b/pep-3102.txt new file mode 100644 index 000000000..8be4d999c --- /dev/null +++ b/pep-3102.txt @@ -0,0 +1,184 @@ +PEP: 3102 +Title: Keyword-Only Arguments +Version: $Revision$ +Last-Modified: $Date$ +Author: Talin +Status: Draft +Type: Standards +Content-Type: text/plain +Created: 22-Apr-2006 +Python-Version: 3.0 +Post-History: + + +Abstract + + This PEP proposes a change to the way that function arguments are + assigned to named parameter slots. In particular, it enables the + declaration of "keyword-only" arguments: arguments that can only + be supplied by keyword and which will never be automatically + filled in by a positional argument. + + +Rationale + + The current Python function-calling paradigm allows arguments to + be specified either by position or by keyword. An argument can be + filled in either explicitly by name, or implicitly by position. + + There are often cases where it is desirable for a function to take + a variable number of arguments. The Python language supports this + using the 'varargs' syntax ('*name'), which specifies that any + 'left over' arguments be passed into the varargs parameter as a + tuple. + + One limitation on this is that currently, all of the regular + argument slots must be filled before the vararg slot can be. + + This is not always desirable. 
One can easily envision a function + which takes a variable number of arguments, but also takes one + or more 'options' in the form of keyword arguments. Currently, + the only way to do this is to define both a varargs argument, + and a 'keywords' argument (**kwargs), and then manually extract + the desired keywords from the dictionary. + + +Specification + + Syntactically, the proposed changes are fairly simple. The first + change is to allow regular arguments to appear after a varargs + argument: + + def sortwords(*wordlist, case_sensitive=False): + ... + + This function accepts any number of positional arguments, and it + also accepts a keyword option called 'case_sensitive'. This + option will never be filled in by a positional argument, but + must be explicitly specified by name. + + Keyword-only arguments are not required to have a default value. + Since Python requires that all arguments be bound to a value, + and since the only way to bind a value to a keyword-only argument + is via keyword, such arguments are therefore 'required keyword' + arguments. Such arguments must be supplied by the caller, and + they must be supplied via keyword. + + The second syntactical change is to allow the argument name to + be omitted for a varargs argument: + + def compare(a, b, *, key=None): + ... + + The reasoning behind this change is as follows. Imagine for a + moment a function which takes several positional arguments, as + well as a keyword argument: + + def compare(a, b, key=None): + ... + + Now, suppose you wanted to have 'key' be a keyword-only argument. + Under the above syntax, you could accomplish this by adding a + varargs argument immediately before the keyword argument: + + def compare(a, b, *ignore, key=None): + ... + + Unfortunately, the 'ignore' argument will also suck up any + erroneous positional arguments that may have been supplied by the + caller. 
Given that we'd prefer any unwanted arguments to raise an + error, we could do this: + + def compare(a, b, *ignore, key=None): + if ignore: # If ignore is not empty + raise TypeError + + As a convenient shortcut, we can simply omit the 'ignore' name, + meaning 'don't allow any positional arguments beyond this point'. + + +Function Calling Behavior + + The previous section describes the difference between the old + behavior and the new. However, it is also useful to have a + description of the new behavior that stands by itself, without + reference to the previous model. So this next section will + attempt to provide such a description. + + When a function is called, the input arguments are assigned to + formal parameters as follows: + + - For each formal parameter, there is a slot which will be used + to contain the value of the argument assigned to that + parameter. + + - Slots which have had values assigned to them are marked as + 'filled'. Slots which have no value assigned to them yet are + considered 'empty'. + + - Initially, all slots are marked as empty. + + - Positional arguments are assigned first, followed by keyword + arguments. + + - For each positional argument: + + o Attempt to bind the argument to the first unfilled + parameter slot. If the slot is not a vararg slot, then + mark the slot as 'filled'. + + o If the next unfilled slot is a vararg slot, and it does + not have a name, then it is an error. + + o Otherwise, if the next unfilled slot is a vararg slot then + all remaining non-keyword arguments are placed into the + vararg slot. + + - For each keyword argument: + + o If there is a parameter with the same name as the keyword, + then the argument value is assigned to that parameter slot. + However, if the parameter slot is already filled, then that + is an error. 
+ + o Otherwise, if there is a 'keyword dictionary' argument, + the argument is added to the dictionary using the keyword + name as the dictionary key, unless there is already an + entry with that key, in which case it is an error. + + o Otherwise, if there is no keyword dictionary, and no + matching named parameter, then it is an error. + + - Finally: + + o If the vararg slot is not yet filled, assign an empty tuple + as its value. + + o For each remaining empty slot: if there is a default value + for that slot, then fill the slot with the default value. + If there is no default value, then it is an error. + + In accordance with the current Python implementation, any errors + encountered will be signaled by raising TypeError. (If you want + something different, that's a subject for a different PEP.) + + +Backwards Compatibility + + The function calling behavior specified in this PEP is a superset + of the existing behavior - that is, it is expected that any + existing programs will continue to work. + + +Copyright + + This document has been placed in the public domain. + + +Local Variables: +mode: indented-text +indent-tabs-mode: nil +sentence-end-double-space: t +fill-column: 70 +coding: utf-8 +End:
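The two syntactic changes proposed in PEP 3102 can be tried out with a minimal sketch, assuming an interpreter that accepts the proposed syntax (Python 3 does). The function bodies are illustrative assumptions, since the PEP text leaves them as '...', and rejects_extra_positional is a hypothetical helper added only to show the error case.

```python
def sortwords(*wordlist, case_sensitive=False):
    """Sort the positional words; the option can only be given by keyword."""
    key = None if case_sensitive else str.lower
    return sorted(wordlist, key=key)


def compare(a, b, *, key=None):
    """Three-way comparison; the bare '*' rejects extra positional arguments."""
    if key is not None:
        a, b = key(a), key(b)
    return (a > b) - (a < b)


def rejects_extra_positional():
    """Return True if passing 'key' positionally raises TypeError."""
    try:
        compare("a", "B", str.lower)
    except TypeError:
        return True
    return False
```

Calling sortwords("b", "A", "C", case_sensitive=True) binds all three strings to the vararg slot and the option by keyword, while compare("a", "B", str.lower) fails with TypeError because no positional slot remains after 'b'.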