Updated PEP 3101 to incorporate latest feedback, and simplify even further. Also added additional explanation of custom formatting classes.
parent 935f64f730
commit 00d28204ef
246
pep-3101.txt
|
@ -155,25 +155,20 @@ Simple and Compound Field Names
|
||||||
in a field expression. The dot operator allows an attribute of
|
in a field expression. The dot operator allows an attribute of
|
||||||
an input value to be specified as the field value.
|
an input value to be specified as the field value.
|
||||||
|
|
||||||
The types of expressions that can be used in a compound name
|
Unlike some other programming languages, you cannot embed arbitrary
|
||||||
have been deliberately limited in order to prevent potential
|
expressions in format strings. This is by design - the types of
|
||||||
security exploits resulting from the ability to place arbitrary
|
expressions that you can use are deliberately limited. Only two operators
|
||||||
Python expressions inside of strings. Only two operators are
|
are supported: the '.' (getattr) operator, and the '[]' (getitem)
|
||||||
supported, the '.' (getattr) operator, and the '[]' (getitem)
|
operator. The reason for allowing these operators is that they don't
|
||||||
operator.
|
normally have side effects in non-pathological code.
|
||||||
|
|
||||||
Another limitation that is defined to limit potential security
|
|
||||||
issues is that field names or attribute names beginning with an
|
|
||||||
underscore are disallowed. This enforces the common convention
|
|
||||||
that names beginning with an underscore are 'private'.
|
|
||||||
|
|
||||||
An example of the 'getitem' syntax:
|
An example of the 'getitem' syntax:
|
||||||
|
|
||||||
"My name is {0[name]}".format(dict(name='Fred'))
|
"My name is {0[name]}".format(dict(name='Fred'))
|
||||||
|
|
||||||
It should be noted that the use of 'getitem' within a string is
|
It should be noted that the use of 'getitem' within a format string
|
||||||
much more limited than its normal use. In the above example, the
|
is much more limited than its conventional usage. In the above example,
|
||||||
string 'name' really is the literal string 'name', not a variable
|
the string 'name' really is the literal string 'name', not a variable
|
||||||
named 'name'. The rules for parsing an item key are very simple.
|
named 'name'. The rules for parsing an item key are very simple.
|
||||||
If it starts with a digit, then it is treated as a number, otherwise
|
If it starts with a digit, then it is treated as a number, otherwise
|
||||||
it is used as a string.
|
it is used as a string.
|
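The attribute and item lookups described above can be tried out with a short
sketch. It uses the str.format method of current Python, which implements the
final form of this PEP and may differ in detail from this draft; the Person
class and the sample values are made up for illustration.

```python
class Person:
    """Hypothetical class, used only to demonstrate attribute lookup."""

    def __init__(self, name):
        self.name = name


# '.' performs a getattr on the argument.
attr_result = "My name is {0.name}".format(Person("Fred"))

# '[]' performs a getitem; the key 'name' is the literal string 'name'.
name = "ignored"  # a variable called 'name' plays no part in the lookup
item_result = "My name is {0[name]}".format(dict(name="Fred"))

# A key made of digits is treated as a number, so it can index a list.
index_result = "Second item: {0[1]}".format(["a", "b"])
```

Here attr_result and item_result are both "My name is Fred", and index_result
is "Second item: b".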
||||||
|
@ -187,9 +182,7 @@ Simple and Compound Field Names
|
||||||
of the underlying object to throw an exception if the identifier
|
of the underlying object to throw an exception if the identifier
|
||||||
is not legal. The format function will have a minimalist parser
|
is not legal. The format function will have a minimalist parser
|
||||||
which only attempts to figure out when it is "done" with an
|
which only attempts to figure out when it is "done" with an
|
||||||
identifier (by finding a '.' or a ']', or '}', etc.) The only
|
identifier (by finding a '.' or a ']', or '}', etc.).
|
||||||
exception to this laissez-faire approach is that, by default,
|
|
||||||
strings are not allowed to have leading underscores.
|
|
||||||
|
|
||||||
|
|
||||||
Conversion Specifiers
|
Conversion Specifiers
|
||||||
|
@ -269,7 +262,7 @@ Standard Conversion Specifiers
|
||||||
'+' - indicates that a sign should be used for both
|
'+' - indicates that a sign should be used for both
|
||||||
positive as well as negative numbers
|
positive as well as negative numbers
|
||||||
'-' - indicates that a sign should be used only for negative
|
'-' - indicates that a sign should be used only for negative
|
||||||
numbers (this is the default behaviour)
|
numbers (this is the default behavior)
|
||||||
' ' - indicates that a leading space should be used on
|
' ' - indicates that a leading space should be used on
|
||||||
positive numbers
|
positive numbers
|
||||||
'()' - indicates that negative numbers should be surrounded
|
'()' - indicates that negative numbers should be surrounded
|
||||||
|
@ -381,9 +374,8 @@ User-Defined Formatting
|
||||||
lives in the 'string' module. This class takes additional options
|
lives in the 'string' module. This class takes additional options
|
||||||
which are not accessible via the normal str.format method.
|
which are not accessible via the normal str.format method.
|
||||||
|
|
||||||
An application can create their own Formatter instance which has
|
An application can subclass the Formatter class to create its
|
||||||
customized behavior, either by setting the properties of the
|
own customized formatting behavior.
|
||||||
Formatter instance, or by subclassing the Formatter class.
|
|
||||||
|
|
||||||
The PEP does not attempt to exactly specify all methods and
|
The PEP does not attempt to exactly specify all methods and
|
||||||
properties defined by the Formatter class; instead, those will be
|
properties defined by the Formatter class; instead, those will be
|
||||||
|
@ -391,46 +383,25 @@ User-Defined Formatting
|
||||||
PEP will specify the general requirements for the Formatter class,
|
PEP will specify the general requirements for the Formatter class,
|
||||||
which are listed below.
|
which are listed below.
|
||||||
|
|
||||||
|
Although string.format() does not directly use the Formatter class
|
||||||
Formatter Creation and Initialization
|
to do formatting, both use the same underlying implementation. The
|
||||||
|
reason that string.format() does not use the Formatter class directly
|
||||||
The Formatter class takes a single initialization argument, 'flags':
|
is that "string" is a built-in type, which means that all of its
|
||||||
|
methods must be implemented in C, whereas Formatter is a Python
|
||||||
Formatter(flags=0)
|
class. Formatter provides an extensible wrapper around the same
|
||||||
|
C functions as are used by string.format().
|
||||||
The 'flags' argument is used to control certain subtle behavioral
|
|
||||||
differences in formatting that would be cumbersome to change via
|
|
||||||
subclassing. The flags values are defined as static variables
|
|
||||||
in the "Formatter" class:
|
|
||||||
|
|
||||||
Formatter.ALLOW_LEADING_UNDERSCORES
|
|
||||||
|
|
||||||
By default, leading underscores are not allowed in identifier
|
|
||||||
lookups (getattr or getitem). Setting this flag will allow
|
|
||||||
this.
|
|
||||||
|
|
||||||
Formatter.CHECK_UNUSED_POSITIONAL
|
|
||||||
|
|
||||||
If this flag is set, the any positional arguments which are
|
|
||||||
supplied to the 'format' method but which are not used by
|
|
||||||
the format string will cause an error.
|
|
||||||
|
|
||||||
Formatter.CHECK_UNUSED_NAME
|
|
||||||
|
|
||||||
If this flag is set, the any named arguments which are
|
|
||||||
supplied to the 'format' method but which are not used by
|
|
||||||
the format string will cause an error.
|
|
||||||
|
|
||||||
|
|
||||||
Formatter Methods
|
Formatter Methods
|
||||||
|
|
||||||
The methods of class Formatter are as follows:
|
The Formatter class takes no initialization arguments:
|
||||||
|
|
||||||
|
fmt = Formatter()
|
||||||
|
|
||||||
|
The public API methods of class Formatter are as follows:
|
||||||
|
|
||||||
-- format(format_string, *args, **kwargs)
|
-- format(format_string, *args, **kwargs)
|
||||||
-- vformat(format_string, args, kwargs)
|
-- vformat(format_string, args, kwargs)
|
||||||
-- get_positional(args, index)
|
|
||||||
-- get_named(kwds, name)
|
|
||||||
-- format_field(value, conversion)
|
|
||||||
|
|
||||||
'format' is the primary API method. It takes a format template,
|
'format' is the primary API method. It takes a format template,
|
||||||
and an arbitrary set of positional and keyword arguments. 'format'
|
and an arbitrary set of positional and keyword arguments. 'format'
|
||||||
|
@ -442,23 +413,38 @@ Formatter Methods
|
||||||
repacking the dictionary as individual arguments using the '*args' and
|
repacking the dictionary as individual arguments using the '*args' and
|
||||||
'**kwds' syntax. 'vformat' does the work of breaking up the format
|
'**kwds' syntax. 'vformat' does the work of breaking up the format
|
||||||
template string into character data and replacement fields. It calls
|
template string into character data and replacement fields. It calls
|
||||||
the 'get_positional' and 'get_index' methods as appropriate.
|
the 'get_positional' and 'get_named' methods as appropriate (described
|
||||||
|
below).
|
||||||
|
|
||||||
Note that the checking of unused arguments, and the restriction on
|
Formatter defines the following overridable methods:
|
||||||
leading underscores in attribute names are also done in this function.
|
|
||||||
|
-- get_positional(args, index)
|
||||||
|
-- get_named(kwds, name)
|
||||||
|
-- check_unused_args(used_args, args, kwargs)
|
||||||
|
-- format_field(value, conversion)
|
||||||
|
|
||||||
'get_positional' and 'get_named' are used to retrieve a given field
|
'get_positional' and 'get_named' are used to retrieve a given field
|
||||||
value. For compound field names, these functions are only called for
|
value. For compound field names, these functions are only called for
|
||||||
the first component of the field name; subsequent components are
|
the first component of the field name; subsequent components are
|
||||||
handled through normal attribute and indexing operations. So for
|
handled through normal attribute and indexing operations.
|
||||||
example, the field expression '0.name' would cause 'get_positional' to
|
|
||||||
be called with the list of positional arguments and a numeric index of
|
So for example, the field expression '0.name' would cause
|
||||||
0, and then the standard 'getattr' function would be called to get the
|
'get_positional' to be called with the parameter 'args' set to the
|
||||||
'name' attribute of the result.
|
list of positional arguments to vformat, and 'index' set to zero;
|
||||||
|
the returned value would then be passed to the standard 'getattr'
|
||||||
|
function to get the 'name' attribute.
|
||||||
|
|
||||||
If the index or keyword refers to an item that does not exist, then an
|
If the index or keyword refers to an item that does not exist, then an
|
||||||
IndexError/KeyError will be raised.
|
IndexError/KeyError will be raised.
|
||||||
|
|
||||||
|
'check_unused_args' is used to implement checking for unused arguments
|
||||||
|
if desired. The arguments to this function are the set of all argument
|
||||||
|
keys that were actually referred to in the format string (integers for
|
||||||
|
positional arguments, and strings for named arguments), and a reference
|
||||||
|
to the args and kwargs that were passed to vformat. The difference
|
||||||
|
of these two sets will be the set of unused args. 'check_unused_args'
|
||||||
|
is assumed to throw an exception if the check fails.
|
||||||
|
|
||||||
'format_field' actually generates the text for a replacement field.
|
'format_field' actually generates the text for a replacement field.
|
||||||
The 'value' argument corresponds to the value being formatted, which
|
The 'value' argument corresponds to the value being formatted, which
|
||||||
was retrieved from the arguments using the field name. The
|
was retrieved from the arguments using the field name. The
|
||||||
|
@ -466,11 +452,46 @@ Formatter Methods
|
||||||
will be either a string or unicode object, depending on the type of
|
will be either a string or unicode object, depending on the type of
|
||||||
the original format string.
|
the original format string.
|
||||||
|
|
||||||
Note: The final implementation of the Formatter class may define
|
To get a better understanding of how these functions relate to each
|
||||||
additional overridable methods and hooks. In particular, it may be
|
other, here is pseudocode that explains the general operation of
|
||||||
that 'vformat' is itself a composition of several additional,
|
vformat:
|
||||||
overridable methods. (Depending on whether it is convenient to the
|
|
||||||
implementor of Formatter.)
|
def vformat(self, format_string, args, kwargs):
|
||||||
|
|
||||||
|
# Output buffer and set of used args
|
||||||
|
buffer = StringIO.StringIO()
|
||||||
|
used_args = set()
|
||||||
|
|
||||||
|
# Tokens are either format fields or literal strings
|
||||||
|
for token in self.parse(format_string):
|
||||||
|
if is_format_field(token):
|
||||||
|
field_spec, conversion_spec = token.split(":", 1)
|
||||||
|
|
||||||
|
# 'first_part' is the part before the first '.' or '['
|
||||||
|
first_part = get_first_part(field_spec)
|
||||||
|
used_args.add(first_part)
|
||||||
|
if is_positional(first_part):
|
||||||
|
value = self.get_positional(args, first_part)
|
||||||
|
else:
|
||||||
|
value = self.get_named(kwargs, first_part)
|
||||||
|
|
||||||
|
# Handle [subfield] or .subfield
|
||||||
|
for comp in components(field_spec):
|
||||||
|
value = resolve_subfield(value, comp)
|
||||||
|
|
||||||
|
# Write out the converted value
|
||||||
|
buffer.write(self.format_field(value, conversion_spec))
|
||||||
|
|
||||||
|
else:
|
||||||
|
buffer.write(token)
|
||||||
|
|
||||||
|
self.check_unused_args(used_args, args, kwargs)
|
||||||
|
return buffer.getvalue()
|
||||||
|
|
||||||
|
Note that the actual algorithm of the Formatter class may not be the
|
||||||
|
one presented here. In particular, the final implementation of
|
||||||
|
the Formatter class may define additional overridable methods and
|
||||||
|
hooks. Also, the final implementation will be written in C.
|
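The overridable methods described above can be sketched with the Formatter
class that ships in the 'string' module of current Python. That class is the
later, final form of this API, so names differ from this draft (for example,
a single get_value method takes the place of get_positional and get_named).
The StrictFormatter class and its "upper" conversion are hypothetical and
exist only for this illustration.

```python
import string


class StrictFormatter(string.Formatter):
    """Formatter that rejects arguments the format string never uses."""

    def check_unused_args(self, used_args, args, kwargs):
        # Keys supplied by the caller: integer positions plus keyword names.
        supplied = set(range(len(args))) | set(kwargs)
        # The set difference (supplied minus used) is the set of unused keys.
        unused = supplied - set(used_args)
        if unused:
            raise ValueError("unused arguments: %s" % sorted(map(str, unused)))

    def format_field(self, value, format_spec):
        # Hypothetical extra conversion: "upper" upper-cases the text.
        if format_spec == "upper":
            return str(value).upper()
        return super().format_field(value, format_spec)


fmt = StrictFormatter()
ok = fmt.format("{0} is {name:upper}", "This", name="fine")

try:
    fmt.format("{0}", "used", "never used")
    error = None
except ValueError as exc:
    error = str(exc)
```

With these definitions, ok is "This is FINE", and the second call raises a
ValueError because positional argument 1 is never referenced.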
||||||
|
|
||||||
|
|
||||||
Customizing Formatters
|
Customizing Formatters
|
||||||
|
@ -527,8 +548,7 @@ Error handling
|
||||||
|
|
||||||
There are two classes of exceptions which can occur during formatting:
|
There are two classes of exceptions which can occur during formatting:
|
||||||
exceptions generated by the formatter code itself, and exceptions
|
exceptions generated by the formatter code itself, and exceptions
|
||||||
generated by user code (such as a field object's getattr function, or
|
generated by user code (such as a field object's 'getattr' function).
|
||||||
the field_hook function).
|
|
||||||
|
|
||||||
In general, exceptions generated by the formatter code itself are
|
In general, exceptions generated by the formatter code itself are
|
||||||
of the "ValueError" variety -- there is an error in the actual "value"
|
of the "ValueError" variety -- there is an error in the actual "value"
|
||||||
|
@ -615,6 +635,38 @@ Alternate Syntax
|
||||||
what .Net uses.
|
what .Net uses.
|
||||||
|
|
||||||
|
|
||||||
|
Alternate Feature Proposals
|
||||||
|
|
||||||
|
Restricting attribute access: An earlier version of the PEP
|
||||||
|
restricted the ability to access attributes beginning with a
|
||||||
|
leading underscore, for example "{0._private}". However, this
|
||||||
|
is a useful ability to have when debugging, so the feature
|
||||||
|
was dropped.
|
||||||
|
|
||||||
|
Some developers suggested that the ability to do 'getattr' and
|
||||||
|
'getitem' access should be dropped entirely. However, this
|
||||||
|
is in conflict with the needs of another set of developers who
|
||||||
|
strongly lobbied for the ability to pass in a large dict as a
|
||||||
|
single argument (without flattening it into individual keyword
|
||||||
|
arguments using the **kwargs syntax) and then have the format
|
||||||
|
string refer to dict entries individually.
|
||||||
|
|
||||||
|
There have also been suggestions to expand the set of expressions
|
||||||
|
that are allowed in a format string. However, this was seen
|
||||||
|
to go against the spirit of TOOWTDI, since the same effect can
|
||||||
|
be achieved in most cases by executing the same expression on
|
||||||
|
the parameter before it's passed in to the formatting function.
|
||||||
|
For cases where the format string is being used to do arbitrary
|
||||||
|
formatting in a data-rich environment, it's recommended to use
|
||||||
|
a templating engine specialized for this purpose, such as
|
||||||
|
Genshi [5] or Cheetah [6].
|
||||||
|
|
||||||
|
Many other features were considered and rejected because they
|
||||||
|
could easily be achieved by subclassing Formatter instead of
|
||||||
|
building the feature into the base implementation. This includes
|
||||||
|
alternate syntax, comments in format strings, and many others.
|
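The trade-offs discussed in this section can be illustrated with a small
sketch, again using the str.format method of current Python. The config
dictionary and its keys are invented for the example.

```python
# Hypothetical settings dictionary, passed around as a single object.
config = {"host": "example.org", "port": 8080, "user": "fred"}

# 'getitem' access: the dict is one argument and the format string
# refers to its entries individually.
by_item = "{0[user]}@{0[host]}:{0[port]}".format(config)

# The alternative: flatten the dict into keyword arguments.
by_keyword = "{user}@{host}:{port}".format(**config)

# Arbitrary expressions are not allowed inside the braces; evaluate the
# expression first and pass the result in as an argument.
shouted = "User: {0}".format(config["user"].upper())
```

Both by_item and by_keyword come out as "fred@example.org:8080", and shouted
is "User: FRED".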
||||||
|
|
||||||
|
|
||||||
Security Considerations
|
Security Considerations
|
||||||
|
|
||||||
Historically, string formatting has been a common source of
|
Historically, string formatting has been a common source of
|
||||||
|
@ -622,43 +674,21 @@ Security Considerations
|
||||||
string templating system allows arbitrary expressions to be
|
string templating system allows arbitrary expressions to be
|
||||||
embedded in format strings.
|
embedded in format strings.
|
||||||
|
|
||||||
The typical scenario is one where the string data being processed
|
The best way to use string formatting in a way that does not
|
||||||
is coming from outside the application, perhaps from HTTP headers
|
create potential security holes is to never use format strings
|
||||||
or fields within a web form. An attacker could substitute their
|
that come from an untrusted source.
|
||||||
own strings designed to cause havok.
|
|
||||||
|
|
||||||
The string formatting system outlined in this PEP is by no means
|
Barring that, the next best approach is to ensure that string
|
||||||
'secure', in the sense that no Python library module can, on its
|
formatting has no side effects. Because of the open nature of
|
||||||
own, guarantee security, especially given the open nature of
|
Python, it is impossible to guarantee that any non-trivial
|
||||||
the Python language. Building a secure application requires a
|
operation has this property. What this PEP does is limit the
|
||||||
secure approach to design.
|
types of expressions in format strings to those in which visible
|
||||||
|
side effects are both rare and strongly discouraged by the
|
||||||
What this PEP does attempt to do is make the job of designing a
|
culture of Python developers. So for example, attribute access
|
||||||
secure application easier, by making it easier for a programmer
|
is allowed because it would be considered pathological to write
|
||||||
to reason about the possible consequences of a string formatting
|
code where the mere access of an attribute has visible side
|
||||||
operation. It does this by limiting those consequences to a smaller
|
effects (whether the code has *invisible* side effects - such
|
||||||
and more easier understood subset.
|
as creating a cache entry for faster lookup - is irrelevant.)
|
||||||
|
|
||||||
For example, because it is possible in Python to override the
|
|
||||||
'getattr' operation of a type, the interpretation of a compound
|
|
||||||
replacement field such as "0.name" could potentially run
|
|
||||||
arbitrary code.
|
|
||||||
|
|
||||||
However, it is *extremely* rare for the mere retrieval of an
|
|
||||||
attribute to have side effects. Other operations which are more
|
|
||||||
likely to have side effects - such as method calls - are disallowed.
|
|
||||||
Thus, a programmer can be reasonably assured that no string
|
|
||||||
formatting operation will cause a state change in the program.
|
|
||||||
This assurance is not only useful in securing an application, but
|
|
||||||
in debugging it as well.
|
|
||||||
|
|
||||||
Similarly, the restriction on field names beginning with
|
|
||||||
underscores is intended to provide similar assurances about the
|
|
||||||
visibility of private data.
|
|
||||||
|
|
||||||
Of course, programmers would be well-advised to avoid using
|
|
||||||
any external data as format strings, and instead use that data
|
|
||||||
as the format arguments instead.
|
|
||||||
|
|
||||||
|
|
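A minimal sketch of why untrusted format strings are risky, and why passing
external data as a format argument is safer. It assumes the behavior of
str.format in current Python; the Account class and the sample strings are
hypothetical.

```python
class Account:
    """Hypothetical object holding one public and one sensitive field."""

    def __init__(self, name, secret):
        self.name = name
        self.secret = secret


acct = Account("fred", "hunter2")

# Text supplied by an outside party, for example from a web form.
untrusted = "{0.secret}"

# Risky: using the external text as the format string lets it reach
# any attribute of the arguments.
leaked = untrusted.format(acct)

# Safer: the external text is only an argument, so its braces are
# copied to the output and never interpreted.
safe = "Comment: {0}".format(untrusted)
```

Here leaked is "hunter2", while safe is "Comment: {0.secret}".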
||||||
Sample Implementation
|
Sample Implementation
|
||||||
|
@ -693,6 +723,12 @@ References
|
||||||
[4] Composite Formatting - [.Net Framework Developer's Guide]
|
[4] Composite Formatting - [.Net Framework Developer's Guide]
|
||||||
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
|
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
|
||||||
|
|
||||||
|
[5] Genshi templating engine.
|
||||||
|
http://genshi.edgewall.org/
|
||||||
|
|
||||||
|
[6] Cheetah - The Python-Powered Template Engine.
|
||||||
|
http://www.cheetahtemplate.org/
|
||||||
|
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
|
|
||||||
|
|