PEP 501: switch to a more normal binary operator

Nick Coghlan 2015-08-23 07:04:33 +10:00
parent d377d6216a
commit 98fc50b68c
1 changed file with 67 additions and 45 deletions


@@ -29,19 +29,30 @@ an opening for a form of code injection attack, where the supplied user data
has not been properly escaped before being passed to the ``os.system`` call.
To address that problem (and a number of other concerns), this PEP proposes an
-alternative approach to compiler supported interpolation, based on a new
-``__interpolate__`` magic method, and using a substitution syntax inspired by
+alternative approach to compiler supported interpolation, based on a new ``$``
+binary operator with a syntactically constrained right hand side, a new
+``__interpolate__`` magic method, and a substitution syntax inspired by
that used in ``string.Template`` and ES6 JavaScript, rather than adding a 4th
substitution variable syntax to Python.
+Some examples of the proposed syntax::
+msg = str$'My age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+print(_$"This is a $translated $message")
+translated = l20n$"{{ $user }} is running {{ appname }}"
+myquery = sql$"SELECT $column FROM $table;"
+mycommand = sh$"cat $filename"
+mypage = html$"<html><body>${response.body}</body></html>"
Proposal
========
-This PEP proposes that the new syntax::
+This PEP proposes the introduction of a new binary operator specifically for
+interpolation of arbitrary expressions::
-value = !interpolator "Substitute $names and ${expressions} at runtime"
+value = interpolator $ "Substitute $names and ${expressions} at runtime"
-be interpreted as::
+This would be effectively interpreted as::
_raw_template = "Substitute $names and ${expressions} at runtime"
_parsed_fields = (
@@ -54,25 +65,24 @@ be interpreted as::
_parsed_fields,
_field_values)
-Whitespace would be permitted between the interpolator name and the opening
-quote, but not required in most cases.
+The right hand side of the new operator would be syntactically constrained to
+be a string literal.
The ``str`` builtin type would gain an ``__interpolate__`` implementation that
-supported the following ``str.format`` based semantics::
+supported the following ``str.format`` inspired semantics::
>>> import datetime
>>> name = 'Jane'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
->>> !str'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+>>> str$'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
->>> !str'She said her name is ${name!r}.'
+>>> str$'She said her name is ${name!r}.'
"She said her name is 'Jane'."
-The interpolation prefix could be used with single-quoted, double-quoted and
-triple quoted strings. It may also be used with raw strings, but in that case
-whitespace would be required between the interpolator name and the trailing
-string.
+The interpolation operator could be used with single-quoted, double-quoted and
+triple quoted strings, including raw strings. It would not support bytes
+literals as the right hand side of the expression.
This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when formatting
@@ -122,18 +132,16 @@ JavaScript and core application code written in Python.
Specification
=============
-In source code, interpolation expressions are introduced by the new character
-``!``. This is a new kind of expression, consisting of::
+This PEP proposes the introduction of ``$`` as a new binary operator designed
+specifically to support interpolation of template strings::
-!DOTTED_NAME TEMPLATE_STRING
+INTERPOLATOR $ TEMPLATE_STRING
-Similar to ``yield`` expressions, this construct can be used without
-parentheses as a standalone expression statement, as the sole expression on the
-right hand side of an assignment or return statement, and as the sole argument
-to a function. In other situations, it requires containing parentheses to avoid
-ambiguity.
+This would work as a normal binary operator (precedence TBD), with the
+exception that the template string would be syntactically constrained to be a
+string literal, rather than permitting arbitrary expressions.
-The template string must be a Unicode string (byte strings are not permitted),
+The template string must be a Unicode string (bytes literals are not permitted),
and string literal concatenation operates as normal within the template string
component of the expression.
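No released Python has the proposed ``$`` operator, so the desugaring described here can only be approximated today. The following is a minimal sketch under several assumptions: a hypothetical ``interpolate()`` helper evaluates each substitution with ``eval()`` instead of compiling it in place; a hypothetical ``StrInterpolator`` class stands in for the proposed ``str.__interpolate__`` (Python code cannot add methods to the built-in ``str``); and each parsed field uses a ``(leading_text, expression, conversion, format_spec)`` layout, which is this sketch's own choice rather than the PEP's exact tuple format:

```python
import datetime
import re

# Hypothetical tokenizer for the template syntax: "$$" escapes a dollar sign,
# "${...}" wraps an expression, "$name" names a variable; a bare "$" is an error.
_FIELD = re.compile(r"\$\$|\$\{(?P<braced>[^}]*)\}|\$(?P<name>[A-Za-z_]\w*)|\$")


def _split_field(body):
    """Split "expr!conv:spec" naively (assumes no ':' inside the expression)."""
    expr, _, spec = body.partition(":")
    conversion = ""
    if len(expr) >= 2 and expr[-2] == "!" and expr[-1] in "rsa":
        expr, conversion = expr[:-2], expr[-1]
    return expr.strip(), conversion, spec


def parse_template(template):
    """Return (leading_text, expression, conversion, format_spec) tuples.

    The last tuple carries the trailing text and has expression None.
    """
    fields, text, pos = [], [], 0
    for match in _FIELD.finditer(template):
        text.append(template[pos:match.start()])
        pos = match.end()
        if match.group(0) == "$$":
            text.append("$")
        elif match.group("name") is not None:
            fields.append(("".join(text), match.group("name"), "", ""))
            text = []
        elif match.group("braced") is not None:
            expr, conversion, spec = _split_field(match.group("braced"))
            fields.append(("".join(text), expr, conversion, spec))
            text = []
        else:
            raise SyntaxError("invalid '$' substitution in template")
    text.append(template[pos:])
    fields.append(("".join(text), None, None, None))
    return tuple(fields)


def interpolate(interpolator, template, namespace):
    """Emulate ``interpolator $ template``, using eval() in place of the compiler."""
    parsed_fields = parse_template(template)
    field_values = tuple(
        eval(expr, dict(namespace))
        for _, expr, _, _ in parsed_fields
        if expr is not None
    )
    return interpolator.__interpolate__(template, parsed_fields, field_values)


class StrInterpolator:
    """Hypothetical stand-in for the proposed ``str.__interpolate__``."""

    def __interpolate__(self, raw_template, parsed_fields, field_values):
        values = iter(field_values)
        parts = []
        for leading_text, expr, conversion, spec in parsed_fields:
            parts.append(leading_text)
            if expr is None:
                continue  # final tuple: trailing text only
            value = next(values)
            if conversion:
                value = {"r": repr, "s": str, "a": ascii}[conversion](value)
            parts.append(format(value, spec))
        return "".join(parts)


namespace = {"name": "Jane", "age": 50, "anniversary": datetime.date(1991, 10, 12)}
print(interpolate(StrInterpolator(), "She said her name is ${name!r}.", namespace))
```

The naive ``_split_field()`` is the main simplification: expressions that themselves contain ``:`` (slices, ``lambda``) are not handled.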
@@ -180,7 +188,7 @@ appears in the source code.
For the following example interpolation expression::
-!str 'abc${expr1:spec1}${expr2!r:spec2}def${expr3:!s}ghi $ident $$jkl'
+str$'abc${expr1:spec1}${expr2!r:spec2}def${expr3:!s}ghi $ident $$jkl'
the parsed fields tuple would be::
@@ -253,7 +261,7 @@ same expression and used runtime field parsing::
>>> def foo(data):
... return data + 20
...
->>> !str 'input=$bar, output=${foo(bar)}'
+>>> str$'input=$bar, output=${foo(bar)}'
'input=10, output=30'
Is essentially equivalent to::
@@ -267,9 +275,9 @@ Handling code injection attacks
The proposed interpolation expressions make it potentially attractive to write
code like the following::
-myquery = !str "SELECT $column FROM $table;"
-mycommand = !str "cat $filename"
-mypage = !str "<html><body>$content</body></html>"
+myquery = str$"SELECT $column FROM $table;"
+mycommand = str$"cat $filename"
+mypage = str$"<html><body>${response.body}</body></html>"
These all represent potential vectors for code injection attacks, if any of the
variables being interpolated happen to come from an untrusted source. The
@@ -277,9 +285,9 @@ specific proposal in this PEP is designed to make it straightforward to write
use case specific interpolators that take care of quoting interpolated values
appropriately for the relevant security context::
-myquery = !sql "SELECT $column FROM $table;"
-mycommand = !sh "cat $filename"
-mypage = !html "<html><body>$content</body></html>"
+myquery = sql$"SELECT $column FROM $table;"
+mycommand = sh$"cat $filename"
+mypage = html$"<html><body>${response.body}</body></html>"
This PEP does not cover adding such interpolators to the standard library,
but instead ensures they can be readily provided by third party libraries.
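The quoting responsibility can be made concrete with a minimal, hypothetical ``sh`` interpolator. It assumes ``__interpolate__`` receives the raw template, the parsed fields and the field values, as in the desugaring above, and it uses a simplified ``(leading_text, expression)`` layout for each parsed field (an assumption of this sketch, not the PEP's tuple format). With no compiler support available, the arguments that ``sh$"cat $filename"`` would produce are written out by hand; ``shlex.quote()`` is the standard library's POSIX shell quoting function:

```python
import shlex


class ShellInterpolator:
    """Hypothetical ``sh`` interpolator that quotes every substituted value."""

    def __interpolate__(self, raw_template, parsed_fields, field_values):
        values = iter(field_values)
        parts = []
        for leading_text, expr in parsed_fields:
            parts.append(leading_text)
            if expr is not None:
                # shlex.quote() turns the value into a single inert shell word
                parts.append(shlex.quote(str(next(values))))
        return "".join(parts)


sh = ShellInterpolator()

# Hand-written stand-in for what sh$"cat $filename" might desugar to
filename = "notes.txt; rm -rf ~"
mycommand = sh.__interpolate__(
    "cat $filename",
    (("cat ", "filename"), ("", None)),
    (filename,),
)
print(mycommand)  # cat 'notes.txt; rm -rf ~'
```

Because every value passes through ``shlex.quote()``, a hostile ``filename`` reaches ``cat`` as one literal argument instead of starting a second command.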
@@ -305,13 +313,13 @@ errors all raise SyntaxError.
Unmatched braces::
->>> !str 'x=${x'
+>>> str$'x=${x'
File "<stdin>", line 1
SyntaxError: missing '}' in interpolation expression
Invalid expressions::
->>> !str 'x=${!x}'
+>>> str$'x=${!x}'
File "<fstring>", line 1
!x
^
@@ -350,9 +358,9 @@ would delegate interpolation calls to ``string.Template``::
field_values[expr] = values[field_num]
return field_values
-And would then be invoked as::
+And could then be invoked as::
-print(!i18n "This is a $translated $message")
+print(_$"This is a $translated $message")
Any actual implementation would need to address other issues (most notably
message catalog extraction), but this gives the general idea of what might be
@@ -365,11 +373,11 @@ substitution, and ``{{ $user }}`` for local context substitution.
With the syntax in this PEP, an l20n interpolator could be written as::
-translated = !l20n "{{ $user }} is running {{ appname }}"
+translated = l20n$"{{ $user }} is running {{ appname }}"
With the syntax proposed in PEP 498 (and neglecting the difficulty of doing
catalog lookups using PEP 498's semantics), the necessary brace escaping would
-make the string look like this in order to interpolating the user variable
+make the string look like this in order to interpolate the user variable
while preserving all of the expected braces::
interpolated = "{{{{ ${user} }}}} is running {{{{ appname }}}}"
@@ -388,8 +396,8 @@ While beyond the scope of this initial PEP, the proposal described here could
potentially be applied to the logging module's event reporting APIs, permitting
relevant details to be captured using forms like::
-!logging.debug "Event: $event; Details: $data"
-!logging.critical "Error: $error; Details: $data"
+logging.debug$"Event: $event; Details: $data"
+logging.critical$"Error: $error; Details: $data"
Discussion
@@ -398,14 +406,28 @@ Discussion
Refer to PEP 498 for additional discussion, as several of the points there
also apply to this PEP.
-Compatibility with IPython magic strings
-----------------------------------------
+Determining relative precedence
+-------------------------------
-IPython uses "!" to introduce custom interactive constructs. These are only
-used at statement level, and could continue to be special cased in the
-IPython runtime.
+The PEP doesn't currently specify the relative precedence of the new operator,
+as the only examples considered so far concern standalone expressions or simple
+variable assignments.
-This existing usage *did* help inspire the syntax proposed in this PEP.
+Development of a reference implementation based on the PEP 498 reference
+implementation may help answer that question.
+Deferring support for binary interpolation
+------------------------------------------
+Supporting binary interpolation with this syntax would be relatively
+straightforward (just a matter of relaxing the syntactic restrictions on the
+right hand side of the operator), but poses a significant likelihood of
+producing confusing type errors when a text interpolator was presented with
+binary input.
+Since the proposed operator is useful without binary interpolation support, and
+such support can be readily added later, further consideration of binary
+interpolation is considered out of scope for the current PEP.
Preserving the raw template string
----------------------------------